Formats for Downloading Materials
You can download certain public domain materials from the HathiTrust collection in a variety of formats. Download capabilities follow our current download restrictions. “Temporary access” books cannot be downloaded and will not have these options. Most materials that are available to view in full are also available to download in full, but only for users who are affiliated with a HathiTrust member library and are logged in. Some books do not have download restrictions and may be downloaded by all users. Here is one such book: A Handbook of Greek Lace Making.
For more information about what you can download, please see our help article on What You Can Download from the Collection.
Formats in which you can download a book
PDF: Select this file when you wish to print out the pages or to read the book on your computer.
Details: This file format can be opened in your browser or a dedicated PDF viewer, such as Adobe Reader. The scanned page images are embedded within the file, as well as the plain text OCR that accompanies each page.
EPUB: Select this file when you wish to read the book on your mobile device or eReader.
Details: This file is created from automatically generated OCR, which may contain errors, misspellings, or nonsense characters. You cannot open this file format on a computer unless you have installed an EPUB reader. EPUBs are compatible with many eReaders, such as Nook or Kobo, but are not compatible with Kindle eReaders. To convert to a format compatible with Kindle or open EPUB files on your computer, you will need to use conversion or EPUB display software such as Calibre.
Text (.txt): Select this file when you wish to work with or read the text of the book in a text editor or in word processing software.
Details: This file is created from automatically generated OCR, which may contain errors, misspellings, or nonsense characters. The OCR for the entire book or for the selected pages is compiled into one file. You can open this file with your computer’s default text editor, such as Notepad or TextEdit, or in other text software, such as Microsoft Word.
Text (.zip): Select this file when you wish to do text mining or data analysis with the book.
Details: This file is created from automatically generated OCR, which may contain errors, misspellings, or nonsense characters. The OCR is provided as individual .txt files, one for each page you have selected, zipped together into a single package.
Image (JPEG) or Image (TIFF): Select one of these files when you want to view the scanned images associated with a book.
Details: The images provided are lower resolution versions of the images we received from the contributor. Watermarks are added to the bottom of the image. Each book page is provided as a separate image file.
Because downloading pages as Image (TIFF) is more resource-intensive, you can download only up to ten pages at a time in this format. Using the “Whole item” page range will not succeed unless the entire book is ten pages or fewer in length.
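If you need more than ten pages as Image (TIFF), one approach is to plan several page ranges of ten pages or fewer and request them one after another. The following Python sketch only computes those ranges; it does not contact HathiTrust. The function name tiff_batches and the assumption that pages are identified by consecutive numbers starting at 1 are ours, for illustration only.

```python
from typing import Iterator, Tuple

TIFF_PAGE_LIMIT = 10  # per-download page limit for Image (TIFF) stated above


def tiff_batches(first_page: int, last_page: int,
                 limit: int = TIFF_PAGE_LIMIT) -> Iterator[Tuple[int, int]]:
    """Yield inclusive (start, end) page ranges of at most `limit` pages.

    Hypothetical helper: assumes pages are numbered consecutively from 1.
    """
    if first_page < 1 or last_page < first_page:
        raise ValueError("expected 1 <= first_page <= last_page")
    start = first_page
    while start <= last_page:
        end = min(start + limit - 1, last_page)
        yield (start, end)
        start = end + 1


if __name__ == "__main__":
    # Plan the page ranges for a 34-page book.
    for start, end in tiff_batches(1, 34):
        print(f"pages {start}-{end}")
```

Each printed range can then be entered as a separate page range in the download options.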
Some text or EPUB files may contain nonsense characters or other text problems. These files are created from automatically generated Optical Character Recognition (OCR), an automated process that is subject to error. To interpret the images of letters on a page, the OCR software must first determine the script and language of the page. When it misidentifies the script or language of the volume or page, it tends to produce gibberish.
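For text mining with the Text (.zip) download, a minimal Python sketch like the one below can load the per-page .txt files and count words. It rests on assumptions that this article does not confirm: that the page files are UTF-8 encoded and that their file names sort in page order. The function names read_pages and word_counts are hypothetical. Bytes that cannot be decoded are replaced rather than raising an error, because OCR output may contain nonsense characters.

```python
import zipfile
from collections import Counter
from typing import Dict


def read_pages(zip_path: str) -> Dict[str, str]:
    """Return {file name: page text} for every .txt file in the archive.

    Assumptions (not confirmed by the article): files are UTF-8 encoded
    and sorting the file names puts the pages in reading order.
    """
    pages: Dict[str, str] = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith(".txt"):
                raw = archive.read(name)
                # Replace undecodable bytes instead of failing on them.
                pages[name] = raw.decode("utf-8", errors="replace")
    return pages


def word_counts(pages: Dict[str, str]) -> Counter:
    """Count whitespace-separated, lowercased tokens across all pages."""
    counts: Counter = Counter()
    for text in pages.values():
        counts.update(text.lower().split())
    return counts
```

For example, word_counts(read_pages("book.zip")).most_common(20) would list the twenty most frequent tokens, where book.zip is a placeholder for the file you downloaded.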
Factors that determine the resolution in which you can download a book
Authenticated users (from member institutions or guest users) receive additional download options: they can download full books as TIFF, which is fairly resource-intensive. Any user can download one page at a time.
The maximum resolution at which you can download a file depends on the format in which the document was originally contributed to HathiTrust. Black & white TIFFs are usually 600 ppi, while JPEG2000s are usually 300 or 400 ppi. Some volumes combine both formats, in which case only the lower resolution applies.
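As a rough illustration of the mixed-format case, the Python sketch below takes the resolutions of a volume's page images and returns the lowest one. This is our reading of the rule described above, not a description of how HathiTrust computes it, and the function name effective_ppi is hypothetical.

```python
from typing import Iterable


def effective_ppi(page_ppis: Iterable[int]) -> int:
    """Return the lowest resolution (in ppi) among a volume's page images.

    Illustrative only: assumes the lowest contributed resolution is the
    one that applies to a mixed-format volume.
    """
    values = list(page_ppis)
    if not values:
        raise ValueError("need at least one page resolution")
    return min(values)


if __name__ == "__main__":
    # A volume mixing 600 ppi TIFF pages with 400 ppi JPEG2000 pages.
    print(effective_ppi([600, 600, 400]))  # prints 400
```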
Most materials in the collection are contributed as JPEG2000, a format that does not tend to work well with Firefox. If you have trouble viewing a downloaded file, we recommend using a different browser or opening the file in Adobe Reader.
If our download options do not meet your needs, we recommend contacting the interlibrary loan department at your local library to see whether they can obtain better-quality scans for you. Alternatively, you could travel to a library that holds the physical materials and scan them at a higher resolution yourself.