Economics of Book Digitization

Digitizing books still has some challenges, but I believe the economics of it are clear.  Nonetheless, some misunderstandings persist. I’d like to review some of the most basic facts about book digitization that I’ve learned over the past seven or so years.

Most attention is paid to the cost of scanning (photographing the pages and processing them), but I cannot emphasize enough that the greatest costs of building a digital library are those borne by the brick-and-mortar libraries.  Libraries spend billions each year building, curating, and maintaining their collections.  So, the real value, and costs, are in the books and the libraries. This aspect is too often overlooked and undervalued.

As for the cost of scanning books, let’s look at some numbers.

  • The Million Books Project in China cost around $6/book.
  • I estimate that Google’s library project costs well below $10/book, maybe as low as $5/book.
  • The Internet Archive scans books at a cost of 10 cents/page, or about $30/book. It is more expensive, but you get superior quality (I may be biased, but check it out), and that cost also covers periodically reprocessing the books with new techniques and technologies, as well as perpetual storage.
  • All of these projects produce page images for reading, optical character recognition for searching, and access formats like PDF and on-screen viewing.

As for the number of books that have been scanned:

  • Google is now presenting 7 million scanned books, which I would estimate to represent a $35-70 million project. (They have likely scanned many more books than those they are presenting.)
  • China’s government has scanned 1.4 million books for $9 million.  They have told me they are going to scan another 3 million books starting this summer.
  • India’s government has scanned 600,000 – 1 million books, but I don’t have any indication of their costs.
  • The U.S. government has scanned probably fewer than 100,000 books. Clearly, the U.S. government has a “scanning gap” relative to other governments.
  • U.S. foundations and companies such as the Sloan Foundation, Microsoft, and Yahoo together helped the Internet Archive and Kirtas scan 600,000 books, for about $14 million.
  • There are now nearly 1.3 million public domain books from various projects on archive.org, which are full-text searchable on openlibrary.org.
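
As a quick sanity check on the totals above, here is a minimal Python sketch that derives the per-book cost each project implies. The project labels and the helper name `cost_per_book` are illustrative choices, not anything from the projects themselves, and the Google range simply applies the estimated $5-$10/book to the 7 million books presented.

```python
# Figures quoted in the list above: (books scanned, total cost in USD).
projects = {
    "China (government)": (1_400_000, 9_000_000),
    "Internet Archive and Kirtas (funded)": (600_000, 14_000_000),
}

def cost_per_book(books, total_cost):
    """Average cost per scanned book, in dollars."""
    return total_cost / books

for name, (books, total) in projects.items():
    print(f"{name}: ${cost_per_book(books, total):.2f}/book")

# Google's total is an estimate, not a reported cost:
# 7 million books at an assumed $5 to $10 per book.
google_books = 7_000_000
google_low = google_books * 5
google_high = google_books * 10
print(f"Google (estimated): ${google_low / 1e6:.0f}-{google_high / 1e6:.0f} million")
```

The funded Internet Archive and Kirtas work comes out somewhat below the $30/book figure quoted earlier, which is plausible if the books scanned averaged fewer pages or if some costs were covered elsewhere; the post does not say which.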

So, putting these two sets of figures together, the #1 takeaway from my adventures in book digitization is that building a great digital library the size of Harvard’s or the Library of Congress’s, on the order of 10 million books at $30/book, would require a one-time cost of about $300 million for the highest-quality scans. That is a small price tag in the scheme of things. As federal spending goes, it’s a drop in the bucket (remember the $231 million Bridge to Nowhere?).
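
The $300 million figure can be reproduced with one multiplication. The sketch below assumes a target of 10 million books (the collection size used elsewhere in this post) and compares the per-book rates quoted earlier; `library_cost` and the rate labels are hypothetical names chosen for illustration.

```python
# Assumed target: a 10 million-book digital library.
TARGET_BOOKS = 10_000_000

# Per-book scanning costs quoted earlier in the post, in USD.
COST_PER_BOOK = {
    "Google (low estimate)": 5,
    "Million Books Project": 6,
    "Internet Archive": 30,
}

def library_cost(books, per_book):
    """One-time scanning cost for a collection, in dollars."""
    return books * per_book

for label, rate in COST_PER_BOOK.items():
    total = library_cost(TARGET_BOOKS, rate)
    print(f"{label}: ${total / 1e6:.0f} million")
```

Only the $30/book rate yields the $300 million headline number; the cheaper rates would bring the same collection in at $50-60 million, at lower scan quality.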

The US library system costs $12 billion a year (with $3-4 billion of that going to publishers’ products). To give just one example, Cornell’s library has an annual budget of $55 million.

I believe that if just 100 top libraries in the US were to put 5% of their acquisition budgets into digitizing, we could have a 10 million-book digital library done in about 5 years.
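
To see what this proposal implies per library, here is a rough sketch of the arithmetic. It assumes the $30/book rate and an even split across libraries and years; the post does not give acquisition budgets, so the "implied budget" is only what the 5% claim would require under those assumptions. The function name and parameters are hypothetical.

```python
def funding_needed(books, cost_per_book, libraries, years, share_percent):
    """Return (annual cost per library, acquisition budget implied by the share).

    Assumes costs are split evenly across libraries and years.
    """
    total = books * cost_per_book
    per_library_per_year = total / (libraries * years)
    implied_budget = per_library_per_year * 100 / share_percent
    return per_library_per_year, implied_budget

annual, budget = funding_needed(
    books=10_000_000,   # target collection size from the post
    cost_per_book=30,   # assumed: the Internet Archive rate
    libraries=100,
    years=5,
    share_percent=5,
)
print(f"Per library per year: ${annual:,.0f}")
print(f"Implied acquisition budget: ${budget:,.0f}")
```

Under these assumptions each library would contribute $600,000 a year, which is 5% of a $12 million acquisition budget. At a lower per-book rate the required budget shrinks proportionally.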

We now have over 3 million books in the growing public digital libraries. This is an alternative to the private single-access digital library Google is building.

We can build something great if we keep focused on the dream: a library and publishing system that enables communities to thrive through the meaningful sharing of works.

-brewster

10 Responses to “Economics of Book Digitization”

  1. ResourceShelf » Blog Archive » Brewster Kahle Comments on the Economics of Book Digitization Says:

    [...] Brewster begins the post: Digitizing books still has some challenges, but I believe the economics of it are clear. Nonetheless, some misunderstandings persist. I’d like to review some of the most basic facts about book digitization that I’ve learned over the past seven or so years. [...]

  2. ADA Library Blog Says:

    [...] Brewster begins the post: Digitizing books still has some challenges, but I believe the economics of it are clear. Nonetheless, some misunderstandings persist. I’d like to review some of the most basic facts about book digitization that I’ve learned over the past seven or so years. [...]

  3. is what i do :: The cost of digitizing books (and of spreading the info in them) Says:

    [...] Kahle, over at the Open Content Alliance, has an interesting post about the cost of digitizing books. His overall take: it’s very cheap, especially relative to the cost of maintaining brick [...]

  4. bowerbird Says:

    yes. our librarians have thus far demonstrated
    a lack of will which is astounding and pathetic…

    -bowerbird

  5. Economia cunoaşterii « Says:

    [...] Economics of Book Digitization, from the Open Content Alliance blog. [...]

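  6. Library Views 圖書館觀點 » Some Figures on Book Digitization Says:

    [...] The Open Content Alliance (OCA) blog has compiled some figures on book digitization, such as scanning costs and volume counts. However, the author believes these costs pale next to what a library spends on its buildings and on acquiring and maintaining its collection, yet the latter are often overlooked or undervalued. Here are the figures (those from outside the OCA are estimates): [...]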

  7. Eric Rumsey Says:

    Do you have any details on the Chinese and Indian government scanning efforts? Where are these books available?

  8. Eric Rumsey Says:

    Some of your captchas are UNREADABLE! … Others are crystal-clear

  9. kroliolitin Says:

    I need a German kroliolitin journal to contribute to my ongoing research. Can someone point me to a good source for searching German journals on various topics?

    Thanks,
    John

  10. Book digitization project doubles to serve visually impaired Says:

    [...] has its own book digitization project, and numerous other initiatives are being [...]