Program 4: Creating Accessible PDFs from Scanned Documents


When talking about electronic documents that are inaccessible, we’re typically talking about text that’s been scanned into a digital format and placed online, such as book chapters or journal articles.

Instructors typically use this text as supplementary material in a class, without even realizing that it might be inaccessible to a student with a disability.

Fortunately, that text doesn’t have to stay inaccessible for someone with a disability. In this video we’re going to walk through some steps that an individual can take to turn an inaccessible PDF into an accessible one.

The first issue we need to look at, though, is the idea of a clean scan. Careless scans can lead to words that are blurred, or a corner of a page that is unreadable.

This is an accessibility issue for everyone, but for students who rely on text being read aloud, this can be especially frustrating, as the computer will not be able to recognize the text or read it aloud. Correcting this issue is simple.

When scanning the item, make sure it’s lying flat on the scanner, and double check that the scan looks clean before posting it.

If you still don’t have a clean scan, try re-positioning the document on the scanner and scanning it again. If trouble persists, you may have to go to the library to use a more precise scanner. Once you do get that clean scan, there are still some steps that instructors must follow to make sure that the document is accessible to all.

To do this, you’re going to need a copy of Adobe Acrobat Professional. Most universities offer access to that software to employees at a discounted rate. Alright, let’s walk through the steps of how to create an accessible scanned PDF.

This is a document that was scanned in, and it can’t be read correctly by text readers.

To determine what needs to be fixed, we can run an accessibility check. To do this, we’ll go to Advanced, click Accessibility, and first we’ll run a Quick Check. Quick Check will tell us if there are major problems with this document. Right now this one says, “This document is not structured, so the reading order may not be correct. Try different reading orders using the Reading Preferences panel.” So we’ve got some big problems here.

Let’s run a full check now by going to Advanced, Accessibility, and Full Check. In Full Check we select what we want checked, and we’re going to check everything, so it’s going to look for all kinds of accessibility issues. We hit Start Checking, and it tells us there was a problem. It’s going to give us a report that lists all kinds of things that are wrong with this document. In this report, it tells us that we don’t have a language selected, the document is not tagged, and none of the images have alternative text.
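To make those three findings concrete, here is a minimal sketch in Python. It is only a toy model of the report we just saw, not how Acrobat’s Full Check actually works. The names ScannedPdf, Figure, and full_check are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Figure:
    # Hypothetical stand-in for an image in the document.
    name: str
    alt_text: str = ""  # empty string means no alternative text was set


@dataclass
class ScannedPdf:
    # Hypothetical stand-in for the accessibility-related properties of a PDF.
    language: str = ""
    is_tagged: bool = False
    figures: list = field(default_factory=list)


def full_check(doc):
    """Return a list of problems, mirroring the three findings in the report."""
    problems = []
    if not doc.language:
        problems.append("no language selected")
    if not doc.is_tagged:
        problems.append("document is not tagged")
    for fig in doc.figures:
        if not fig.alt_text.strip():
            problems.append(f"figure '{fig.name}' has no alternative text")
    return problems


# A freshly scanned document with two images and nothing fixed yet.
scan = ScannedPdf(figures=[Figure("logo"), Figure("photo")])
report = full_check(scan)
for line in report:
    print(line)

# The same document after the language, tags, and alternative text are set.
fixed = ScannedPdf(
    language="English",
    is_tagged=True,
    figures=[Figure("logo", "University of Iowa Old Capitol logo")],
)
fixed_report = full_check(fixed)
```

The point of the sketch is simply that each problem in the report maps to one fix, and the check comes back empty once every fix has been made.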

These are all fairly common problems with scanned text. Let’s tackle some of these issues, starting with the language problem.

That’s the easiest thing to fix, and it’s probably the most common reason a PDF is not going to read properly.

To fix the language we go up to the top and click File, we click Properties, and then we choose the Advanced tab, which is already chosen. We look down here and we see that nothing is selected in Language. We hit the drop down menu and we choose the language we want, in our case English, and we hit OK, and the language issue should be fixed at this point.

The next step, which I don’t think needs to be done on this document but does need to be done on many documents, is to run Optical Character Recognition. This allows the text on the page to be recognized as text. To do this, we’re going to click on Document up along the top. From Document we’ll click on OCR Text Recognition, that’s Optical Character Recognition. We’ll hit Recognize Text Using OCR. We’ll hit OK. This says Acrobat cannot perform OCR because this page contains renderable text. So it’s already recognizing text, and we’re in good shape.

If Acrobat had not recognized this as text, running the OCR would have done that for us. After running the OCR we need to make sure the document is tagged. We already know that this one is not tagged. Once it is tagged, the text that’s now recognized will be readable. To make sure this is tagged, we go up and we click Advanced, and Accessibility, and then we’ll click “add tags to document.”

If the document is already tagged, you will not be able to click this. We’re going to click “add tags to document”, and you’re going to see a little blue line down at the corner of the screen as it’s tagging. It just went through and tagged, so now we should have the document tagged, and we should be able to manipulate the tags at this point.

Let’s take a look at what these tags look like. We want to make sure these tags represent what they’re supposed to represent.

To find out what’s tagged as what right now, we’re going to click on Advanced, Accessibility, and Touch up Reading Order. When we do that, it’s going to highlight all of the tags within the document and give us a touch up reading order panel, which is right here. As you can see in this PDF, everything has a blue box around it. The blue boxes with Xs through them, like this, this, and in this case, this, are recognized as either figures or tables. Anything without the Xs, such as this and this, is recognized as just text. What we need to do now is look and make sure that everything is recognized the way it is supposed to be. In our case, we’re going to go up here and make this figure option a little bit smaller. So if we see something that we don’t want, we can left click on it, then right click on it and choose “Delete Selected Item Structure.”

Now suppose we want to draw in a new item for something that’s not covered. In this case, our University of Iowa logo is not tagged at all, so nothing is indicated for it at this point. We can just draw a box around it, and then come into our touch up reading order menu and say that we want this to be a figure. We click on Figure, and it becomes a figure. Now if we look down here, we’ve got an issue as well.

We’ve got text that’s listed as a table. We need to change that. We want everything that’s text to be just text. To change this, we’re going to again left click on the box, then right click, and choose “Delete Selected Item Structure.” That gets rid of our table, but now this text is not recognized as anything down there, and we want that to be recognized as text. So if we want that to be recognized as text, and in this case associated with the text above it, we can draw a box around that text that extends into the box of text above it.

You’ll see everything that was highlighted is now boxed in blue, and then we come over to our touch up reading order menu and we select text. We can do that for both of these other options as well. We can select our text, choose text, and it becomes text as well. So we’ve got just a little more text here that’s not included in anything. We’ll draw our box around it, and then we’ll go and we’ll turn that into text.

So now we’ve cleaned this up so we’ve got our figures the way we want them, and we’ve got the text indicated the way we want it as well.

We may also want to make sure that up here, Education First is indicated as a top level heading. If we want to do that, we can click on our box, go to our touch up reading order menu and select Heading 1, and that becomes a heading.

So now that we’ve corrected all our tags, the next step is to add alternative text to pictures or graphics. This will allow text readers to indicate what exactly is in the picture for students with visual impairments.

As you can see, on our document right now, the alternative text says “figure, no alternative text exists.”

If we were to read this, that is exactly what would be read by a text reader.

We need to change this to indicate what that is to a text reader user.

To change this, we’ll left click on our box, right click, and click Edit Alternate Text, and we’ll type in “University of Iowa Old Capitol logo.” We’ll hit OK and we’ll see that replaced up here as well. Now it says “figure University of Iowa old capitol logo.” We’d also want to do this with the picture here.

We’d want to indicate what was going on in this picture in our alternative text box. We’d want to say, this is Professor John Achrazoglou working with a life sciences student on a computer on a technology project. I’ll show you what the end product looks like with that, but we would add the text the same way we did on the figure up here. So now we’ve got alternative text added to our figures, and we’ve got everything spaced out the way we want. The next thing we need to do is check the order.
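As a rough illustration of why the alternative text matters, here is a small Python sketch of what a text reader might announce for a figure. It is an assumption for illustration only, not the behavior of any particular reader. The function name announce_figure is hypothetical, and the two phrasings are the ones we saw in the document.

```python
def announce_figure(alt_text=None):
    """Return what a (hypothetical) text reader would say for a figure tag."""
    if alt_text and alt_text.strip():
        # Alternative text exists, so it is read after the word "figure".
        return f"figure {alt_text.strip()}"
    # Nothing was entered, so the user only learns that a figure is there.
    return "figure, no alternative text exists"


before = announce_figure()
after = announce_figure("University of Iowa Old Capitol logo")
print(before)
print(after)
```

Without the alternative text, the student hears only that a figure is present. With it, the student hears what the figure actually shows.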

We need to make sure that the numbers associated with each tag correspond with the order that we want things to be read. Let me show you how we’re going to change the order. First of all, when I talk about numbers, look at the upper left corner of each one of these boxes. There’s a number there.

So the way this is going to work right now is this will read Education First first, then jump over and read the University of Iowa logo, and then down here it will start to go through our text.

What I don’t like here is that I would rather have it read the UI logo first, then Education First. Then when it gets down into this, I’d like it to read this column and then this column. Right now what it’s going to do is read this picture and then this column, but we want it to go down and read all of our text first, and then finish with the picture.

What we’re going to do is go down and click Show Order Panel. When we click Show Order Panel, that same list that we see with our numbers is listed out here numerically so that we can see our reading order. Our Iowa logo is number 2, and I want to move that to number one. What I’ll do is I’ll just click on the box next to number 2, hold that down, and drag it up above number one. When I drop that there, it should change, and you can see that Education First is now number 2. The logo used to be number 2, but now it’s number one. This figure is number eight, and we want that to be number nine. We want those two to change places. So we’ll go back to our reading order, select the tag next to number 8, click it, and drag it below number nine. We can see the numbers changed here. This is now number eight and this is number nine. So it will read this column, this column, this chunk of text, and then this picture.
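The drag and drop we just did in the order panel amounts to moving an item within a numbered list. Here is a minimal Python sketch of that idea. The helper move_item is hypothetical, and the labels for items 3 through 7 are placeholders, since we only named items 1, 2, 8, and 9 in the walkthrough.

```python
def move_item(order, item, new_position):
    """Return a new list with item moved to the 1-based new_position."""
    remaining = [entry for entry in order if entry != item]
    remaining.insert(new_position - 1, item)
    return remaining


# Reading order as Acrobat first tagged it (items 3-7 are placeholder labels).
order = [
    "Education First heading",   # 1
    "University of Iowa logo",   # 2
    "text block 3",
    "text block 4",
    "text block 5",
    "text block 6",
    "text block 7",
    "photo",                     # 8
    "last chunk of text",        # 9
]

# Drag the logo above Education First, then drag the photo below the last text.
new_order = move_item(order, "University of Iowa logo", 1)
new_order = move_item(new_order, "photo", 9)

for number, entry in enumerate(new_order, start=1):
    print(number, entry)
```

Notice that moving one item renumbers everything after it, which is why the numbers in the corners of the boxes change as soon as we drop a tag in its new spot.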

So now that we’ve changed our reading order, the last thing we need to do is make sure everything is correct.

So we’ll close out of our order panel, and we will run our accessibility full check again. To do this we’ll go up to Advanced, Accessibility, and Full Check. We’ll use the same options, hit Start Checking, and we’ll see that the checker found no problems with the document. We’ll click OK, and it’s still going to give us a report, which says the checker found no problems within the document.

At this point, we have an accessible document, and here’s an example of how it would read within a text reader. Once you’ve done this, students using reading technology should be able to access the text fairly easily.

Again, these are basic steps. If you have forms, you’ll have to follow some additional steps, which we’ll look at in the next video.

Implementing UDL