Content Quality Issues in the Collection

HathiTrust materials are subject to quality review as a fundamental step in the digitization process. However, our books were typically scanned through mass digitization processes that focused on speed, so even after review the quality of the content may not meet your needs.

For books that were digitized by Google: Google is continually improving the quality of the digital scans and Optical Character Recognition (OCR) it delivers to HathiTrust members. Over time, we will be replacing low-quality scans with better ones.

 Seeking a better quality image on a scanned page

If you are seeking a better quality image on a scanned page, all we can offer are the image format options available to you in the left sidebar of the HathiTrust book viewer. If those options won't work for what you need, you may want to find a library near you that holds a print copy of the work and contact them for a new scan from their print copy.

 Reporting images that are hard to read, missing, or otherwise problematic

To report quality issues with page scans, use our Book Viewer to navigate to the first page scan that has quality issues. Then click on “Get Help” at the top of the page, and select “Report a problem” from the drop-down menu.

 Image quality and search

The quality of scanned images can affect the way they can be searched. Poor quality scanned images contribute to OCR errors. In some cases, the poor quality images cause the OCR engine to guess the wrong language. In other cases, only some occurrences of keywords may be affected. Depending on the severity of the OCR errors, the text may not be searchable at all, or searches for most words in the text may still succeed. Better OCR software and human correction can both improve text quality. We will incorporate better text whenever possible.
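
As a rough illustration of how a single OCR error can hide one occurrence of a keyword while others remain findable, here is a minimal sketch. The sample text and the find_keyword helper are hypothetical and are not part of HathiTrust's search system.

```python
def find_keyword(pages, keyword):
    """Return the 1-based positions of pages whose text contains the keyword."""
    needle = keyword.lower()
    return [n for n, text in enumerate(pages, start=1) if needle in text.lower()]


# Hypothetical OCR output for three page scans that all print the word "modern".
ocr_pages = [
    "A history of modern printing",       # clean scan
    "The rnodern press and its readers",  # poor scan: "m" misread as "rn"
    "Modern type design",                 # clean scan
]

# The second page is not found, even though the printed page contains the word.
print(find_keyword(ocr_pages, "modern"))
```

A search for "modern" here finds the first and third pages only, which is the "only some occurrences are affected" case described above.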

 “Page not available” message on images

This message is displayed in three scenarios:

  • Pages were missing from the library's print copy of the work. A missing leaf generally means that two pages are missing, since books are printed on both sides of each leaf. So, if pages 81-82 of a work are missing, there should be two pages with the message "Page not available" between pages 80 and 83.

  • One or more pages were not scanned.

  • In some cases, Google will misidentify a page and conclude that a page is missing when it is not. For example, if p. 206 is misidentified as "205," the scan appears to have no p. 206, so Google inserts a page displaying "Page not available" although no page is missing. Please notify us if you believe a page has been misidentified in this way.
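
The first and third scenarios can look identical from the page numbers alone. The sketch below is only an illustration of that reasoning, not Google's actual process: the apparent_gaps function is a hypothetical helper that reports which numbers seem to be absent from a run of detected page numbers.

```python
def apparent_gaps(detected):
    """Return page numbers that seem to be missing from a run of detected numbers."""
    seen = set(detected)
    return [n for n in range(min(detected), max(detected) + 1) if n not in seen]


# A leaf really is missing: pages 81-82 are absent between 80 and 83.
missing_leaf = [79, 80, 83, 84]

# Nothing is missing, but p. 206 was misidentified as "205".
misidentified = [204, 205, 205, 207]

print(apparent_gaps(missing_leaf))    # pages 81 and 82 appear missing
print(apparent_gaps(misidentified))   # page 206 appears missing, but is not
```

In the second case the repeated 205 is a hint that the gap comes from misidentification rather than from a page that is actually absent.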

The difference between page numbers and scan numbers: Page numbers are the numbers originally printed in the book. Page numbering typically starts with number 1 at the first page of the text. Some books may include roman numerals for introductory material. Journals may contain duplicate page numbers in a digital scan; for example, two different issues may both have a page 10.

Scan numbers are the sequence numbers of the digital images of the book. The very first scanned image (typically the cover of the book) is number 1, and all images are counted, even blank or rubbish pages. We display both page and scan numbers in the book viewer.
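
To make the distinction concrete, here is a minimal sketch that treats a volume as a list of printed page labels, one per scanned image. The sample volumes and the scans_for_page helper are hypothetical; the point is only that a scan number is always unique, while a page number may match no scan, one scan, or several.

```python
def scans_for_page(page_labels, page):
    """Return the scan numbers (counted from 1) whose printed page label matches."""
    return [scan for scan, label in enumerate(page_labels, start=1) if label == page]


# Hypothetical book: cover and blank page carry no printed number (None),
# introductory material uses roman numerals, and the text starts at page 1.
book = [None, None, "i", "ii", "1", "2", "3"]

# Hypothetical journal volume containing two issues that each have a page 10.
journal = [None, "9", "10", None, "9", "10"]

print(scans_for_page(book, "1"))      # printed page 1 is the fifth scan
print(scans_for_page(journal, "10"))  # page 10 occurs on two different scans
```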
