Creating a Level 2 Accessible PDF using OCR

Jessica Blackwood; Kate Brown


Accessibility 101 – Creating a Level 2 PDF using OCR

Library Accessibility Services, McMaster University
Paige Maylott (maylotp@mcmaster.ca)

Definitions

OCR: Optical Character Recognition – a tool or app which recognizes and (sometimes) enhances letters in an image so that words may be manipulated with word processing tools or screen readers.
Tagging: This is the markup that PDFs (and HTML) use to describe a document's structure, such as headings, paragraphs, images, and tables. If a webpage or document has remediated and accessible tagging, screen readers will likely read it properly.

As you saw in the PDF accessibility levels introduced in the last chapter, many PDFs you encounter will be categorized as level 1 (and unreadable by screen readers). This brief tutorial will show you how to use Adobe Acrobat Pro to improve your document to level 2, making it accessible to most screen reading software.

Important Note on Adobe Acrobat Pro and Optical Character Recognition (OCR)

While this demonstration shows how to use Adobe Acrobat Pro’s OCR tool, many other programs have an OCR function that you could use instead (with varying levels of conversion power). ABBYY FineReader has a robust OCR engine that is, perhaps, more effective than Adobe Acrobat Pro’s, and there are a number of free PDF conversion tools that could also perform this function.
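As one illustration of the free tools mentioned above, the sketch below drives OCRmyPDF, an open-source command-line program, from Python to OCR every PDF in a folder. OCRmyPDF is not part of this tutorial's workflow; the choice of tool, the function names, and the folder names are assumptions made for this example, and the script only works if OCRmyPDF is already installed.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_command(src: Path, dst: Path, language: str = "eng") -> list[str]:
    """Build an OCRmyPDF command that adds a text layer to src and writes dst."""
    return [
        "ocrmypdf",
        "--skip-text",   # leave pages that already contain text untouched
        "-l", language,  # OCR language code, e.g. "eng" for English
        str(src),
        str(dst),
    ]


def ocr_folder(in_dir: Path, out_dir: Path) -> list[Path]:
    """Run OCR on every PDF in in_dir; return the output files that were written."""
    if shutil.which("ocrmypdf") is None:
        raise RuntimeError("ocrmypdf is not installed or not on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_dir.glob("*.pdf")):
        dst = out_dir / src.name
        result = subprocess.run(build_ocr_command(src, dst))
        if result.returncode == 0:  # only report files that converted cleanly
            written.append(dst)
    return written
```

Calling ocr_folder(Path("scans"), Path("ocr_output")) would process a hypothetical scans folder and leave the originals untouched. As with Acrobat, the results are level 2 at best and still need a human check before they can be called fully accessible.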

What Exactly does OCR do?

At a basic level, OCR is the computer retyping what it interprets as letters onto the document. An unprocessed PDF is effectively a picture, so this process of retyping is necessary not only for a user to copy and paste text from a PDF, but also for a screen reader to recognize and read aloud any existing text. OCR tools can also be used to create and organize HTML-style tags for your document (thereby identifying images, tables, and text).

Why Create a Level 2 PDF if it’s not Fully Accessible?

While it’s true that simply running OCR on a PDF does not make it fully accessible, sometimes it is still sufficient for the end user’s needs. A level 1 PDF is not acceptable from an accessibility standpoint, but a level 2 PDF can still be read by most screen readers. So, why not go the extra step and make a fully accessible level 3 PDF?

Time. Currently, creating a level 3 PDF means not only running OCR on the document, but also having a human manually check for errors in headings, reading order, alt text, tables, and more. If you know that the PDF will have a limited audience, you are aware of that audience’s needs, and a level 2 PDF will suffice, performing OCR will often be enough. However, if a PDF will have a wide distribution and it is not possible to know who will access it, creating a fully accessible level 3 PDF is the only way to know for sure that your document will be accessible to all.

How to OCR with Adobe Acrobat Pro

Click Tools; a dropdown will appear.
In the dropdown, expand Text Recognition.
Click In This File to OCR the single file you’ve opened. Alternatively, you can OCR multiple files at once with the In Multiple Files function.
When the popup appears, select All Pages (or Current Page if the document is only a single page long).
Wait for the process to finish.
Check to see if you can now highlight a line of text.
That’s it, you’re done!

If you were to run the Accessibility Checker now, you would see that several errors still exist. You can turn your level 2 PDF into a level 3 by going through that error list and fixing each until no more Xs remain.
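When you have many files to triage, the "can I highlight text?" check can be roughly approximated in a script. The sketch below is a crude heuristic, not a substitute for the Accessibility Checker: it scans a PDF's raw bytes for font resources (a sign that real text exists) and for a structure tree (a sign that tags exist). The function name and the mapping to levels are assumptions made for this example, and PDFs that store these objects in compressed streams may be misjudged.

```python
import re

# A PDF name ends at the first non-name character, so the lookahead keeps
# /Font from matching longer names such as /FontDescriptor or /FontFile.
FONT = re.compile(rb"/Font(?![A-Za-z0-9])")
STRUCT_TREE = re.compile(rb"/StructTreeRoot(?![A-Za-z0-9])")


def guess_pdf_level(data: bytes) -> int:
    """Guess an accessibility level (1, 2, or 3) from the raw bytes of a PDF.

    1 = no fonts found, so the file is probably image only
    2 = fonts found but no tags, so text exists (for example after OCR)
    3 = fonts and tags found; only a *candidate* for level 3, because a
        human still has to confirm headings, reading order, and alt text
    """
    has_text = FONT.search(data) is not None
    has_tags = STRUCT_TREE.search(data) is not None
    if not has_text:
        return 1
    if not has_tags:
        return 2
    return 3
```

To try it on a file, read the file in binary mode and pass the bytes in, for example guess_pdf_level(open("scan.pdf", "rb").read()), where scan.pdf is a hypothetical filename. Treat a result of 1 as a prompt to run OCR and a result of 3 as a prompt to run the Accessibility Checker, not as proof of full accessibility.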