New Archive Book Reader

June 9th, 2009 by archive

We’ve got a new release of our book reader, which we’ve been working on for the past few months.

In addition to a new theme and user interface, the reader has the following special features:

  • the reader includes unique (and simple to understand) URLs for each page, which update as you move through a book. These URLs can be used in citations and bookmarks, making references to a particular page of a book easier to create and easier to read.
  • books can be viewed in one- or two-page mode.
  • in one-page mode, images can be zoomed up to 100% of the original scans. Because the Internet Archive scans are in color, this is an especially nice feature with illustrated books.
  • it can accommodate books that read right-to-left, such as those in our Yiddish collection.
  • the reader is supported by all browsers except IE 6.
  • there is an auto-play feature, so that you can set the pages to turn automatically.
  • As always, the reader is open source. If you have suggestions or bug reports, please add them to the book reader’s launchpad page so the engineers will see and prioritize them.

To test the reader, go to any book’s details page and, in the “View the Book” box, click on the “Read online” link or the animated gif.

Work on the book reader has been done primarily by Michael Ang and Raj Kumar with funding from the Sloan Foundation and the Library of Congress. Icons by Jeffrey Ventrella. We will be developing new features in the future.

If you have comments or questions, please share them with us below.

–brewster



Recent Published Pieces on the Future of Digital Books

June 3rd, 2009 by archive

The Washington Post recently ran this op-ed by me, titled “A Book Grab by Google,” in which I lay out some (but not all) of my problems with the proposed settlement between Google, the AAP, and the Authors Guild. It’s gotten picked up by many blogs, twitterers, and websites, but, for readers of this blog who might not have seen it, I link to it here as well.

I’ve also got a short piece in a new report from Congressional Quarterly Researcher on the future of books, in counterpoint with Dan Clancy of Google, explaining why I don’t think the settlement will increase digital access to books in the long run. Click on the tab on the left that says “Pro/Con.” (Unfortunately, I’m the “con,” but am looking forward to a return to my usual “pro” self soon enough!)

–brewster



Cornell Removes Restrictions on Public Domain

May 18th, 2009 by archive

We welcome the news that last week Cornell University Library removed all restrictions on its digital public domain holdings. It did so in conjunction with a donation of more than 70,000 digitized public domain books to the Internet Archive. As these books are processed, they will appear on archive.org.

Cornell has removed restrictions not only on non-commercial use but on commercial use as well. University Librarian Anne Kenney explains: “We decided it was more important to encourage the use of the public domain materials in our holdings than to impose roadblocks.”

We applaud Cornell for this move and hope that others will be inspired by its leadership in this vital area. The public domain belongs to everyone and to no one. Attempts to restrict it promise to have a chilling effect on the collective digital library we are trying to build.

Hats off to Cornell!

Brewster Kahle Interviewed on Democracy Now!

April 30th, 2009 by archive

Today Democracy Now! broadcast an interview between Amy Goodman and Brewster Kahle about digitization, the Google Book Search Settlement, and the future of books and libraries (taped on April 17 in San Francisco).

Internet Archive files Intervention Request

April 17th, 2009 by Peter Brantley

Greetings. The Internet Archive is seeking leave to file a motion before the Southern District of New York U.S. District Court to intervene in the matter of The Authors Guild Inc. et al. v. Google Inc. as a party defendant.

Below is the letter delivered to the Court of the Honorable Denny Chin.

View Request at Scribd: Archive intervention in Google Book Search

New Book Reader

April 1st, 2009 by Peter Brantley

The Internet Archive has released a new book reader in beta. The new version provides support for several critical new features. The reader is widgetable, so it can be embedded easily in blog posts or digital asset repository pages. We also support full-text search against books, and the new reader can display extremely high-resolution images, up to the limit of the archival scan. Finally, a highly desired feature: it supports books written in right-to-left languages, such as our fantastic new Yiddish collection, and certain CJK historical or display variants.

But best of all: it is open source.

The new reader can be chosen by selecting “FlipBook Beta” from any book page; an example featuring our home, the San Francisco Presidio, can be found in a historical survey (which is pretty cool, itself).

Comments are welcome!

Peter Brantley joins the Internet Archive

March 24th, 2009 by archive

I am thrilled to announce that Peter Brantley will be joining the Internet Archive as our newest Director. In this role, he will direct our efforts and help coordinate with partners in building an open library and distributed publishing system.

His experience in running the Digital Library Federation and coordinating between publishing, library, and high tech organizations gives him an almost unique ability to succeed in helping books find accessible and profitable forms in the digital age.

We hope you all will welcome him into this new position.

(peter at archive dot org)

-brewster

Economics of Book Digitization

March 22nd, 2009 by archive

Digitizing books still has some challenges, but I believe the economics of it are clear.  Nonetheless, some misunderstandings persist. I’d like to review some of the most basic facts about book digitization that I’ve learned over the past seven or so years.

Most attention is paid to the cost of scanning (photographing the pages and processing them), but I cannot emphasize enough that the greatest costs of building a digital library are those borne by the brick-and-mortar libraries.  Libraries spend billions each year building, curating, and maintaining their collections.  So, the real value, and costs, are in the books and the libraries. This aspect is too often overlooked and undervalued.

As for the cost of scanning books, let’s look at some numbers.

  • The Million Books Project in China cost around $6/book.
  • I estimate Google’s library project to cost well below $10/book, maybe as low as $5/book.
  • The Internet Archive scans books at a cost of 10 cents/page or $30/book. It is more expensive but you get superior quality–I may be biased, but check it out–and that cost also covers periodically reprocessing the books based on new techniques and technologies as well as perpetual storage.
  • All of these projects produce page images for reading, optical character recognition for searching, and access formats like pdf and on-screen viewing.

As for the number of books that have been scanned:

  • Google is now presenting 7 million scanned books, which I would estimate to represent a $35-70 million project. (They have likely scanned many more books than those they are presenting.)
  • China’s government has scanned 1.4 million books for $9 million.  They have told me they are going to scan another 3 million books starting this summer.
  • India’s government has scanned 600,000 - 1 million books, but I don’t have any indication of their costs.
  • The U.S. government has scanned probably fewer than 100,000 books. Clearly, the U.S. government has a “scanning gap” relative to other governments.
  • U.S. funders such as the Sloan Foundation, Microsoft, and Yahoo together helped the Internet Archive and Kirtas to scan 600,000 books, for about $14 million.
  • There are now nearly 1.3 million public domain books from various projects on archive.org, which are full-text searchable on openlibrary.org.

So, putting these two sets of figures together, the #1 takeaway from my adventures in book digitization is that building a great library of digital books the size of Harvard or the Library of Congress would require a one-time cost of $300M, for the highest quality scans. $300 million is a small price tag in the scheme of things. As federal spending goes, it’s a drop in the bucket (remember the $231 million Bridge to Nowhere?).
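The arithmetic behind these figures can be checked with a short sketch. Every input below is one of the estimates quoted in this post; the variable and function names are only illustrative, and the $300 million library size is derived by assuming the Internet Archive rate of $30/book.

```python
# Back-of-the-envelope check of the scanning figures quoted in this post.
# All inputs are the post's own estimates; names are illustrative only.

def cost_per_book(total_dollars, books):
    """Average scanning cost per book, in dollars."""
    return total_dollars / books

# China: 1.4 million books for $9 million.
china = cost_per_book(9_000_000, 1_400_000)

# Internet Archive and Kirtas: 600,000 books for about $14 million.
funder_supported = cost_per_book(14_000_000, 600_000)

# Google: 7 million books at an estimated $5 to $10 per book.
google_low = 7_000_000 * 5
google_high = 7_000_000 * 10

# Internet Archive rate: 10 cents/page and $30/book (3000 cents),
# which implies this many pages in a typical book.
pages_per_book = 3000 // 10

# Assumption: the $300 million library is priced at $30/book.
books_for_300m = 300_000_000 // 30

print(f"China: ${china:.2f}/book")
print(f"Internet Archive and Kirtas: ${funder_supported:.2f}/book")
print(f"Google: ${google_low:,} to ${google_high:,}")
print(f"Implied pages per book: {pages_per_book}")
print(f"Books for $300 million: {books_for_300m:,}")
```

At $30/book, $300 million works out to 10 million books, which is the library size used in the rest of this post.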

The US library system costs $12 billion a year (with $3-4 billion of that going to publishers’ products). To give just one example, Cornell’s library has an annual budget of $55 million.

I believe that if just 100 top libraries in the US were to put 5% of their acquisition budgets into digitizing, we could have a 10 million-book digital library done in about 5 years.
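As a rough sanity check, here is what that proposal implies for each library's budget. This is a sketch, not a forecast: it assumes the $30/book rate quoted earlier in this post and an even split across libraries and years, and the function name is hypothetical.

```python
# Implied budget arithmetic for the 100-library proposal.
# Assumptions: $30/book, costs split evenly across libraries and years.

def implied_acquisition_budget(books, dollars_per_book, years, libraries, share_percent):
    """Annual acquisition budget each library would need for the plan to add up."""
    total_cost = books * dollars_per_book
    per_library_per_year = total_cost / years / libraries
    # The digitization spend is share_percent of the acquisition budget.
    return per_library_per_year * 100 / share_percent

budget = implied_acquisition_budget(
    books=10_000_000,
    dollars_per_book=30,
    years=5,
    libraries=100,
    share_percent=5,
)
print(f"Implied annual acquisition budget per library: ${budget:,.0f}")
```

Under these assumptions each library would spend $600,000 a year on digitizing, which is 5% of a $12 million annual acquisition budget.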

We now have over 3 million books in the growing public digital libraries. This is an alternative to the private single-access digital library Google is building.

We can build something great if we keep focused on the dream–a library and publishing system that enables communities to thrive through the meaningful sharing of works.

-brewster


Does Richard Sarnoff Think the Google Settlement Is Anti-Competitive?

February 24th, 2009 by mary

According to Ars Technica, Richard Sarnoff, the chairman of the Association of American Publishers, in a public presentation at Princeton University, seems to have admitted that the Google Book Settlement is anti-competitive. The piece reports that …

Sarnoff said that the publishers he represents didn’t set out to create a monopoly in the markets for book search engines or online book sales. But he didn’t deny that the settlement could have that effect. After all, he noted, “copyright itself is a monopoly.”…

Sarnoff said that the structure of the registry will be “tough to replicate for [Google's] competitors.”

and, finally,

Sarnoff also speculated that … [l]egal hurdles may make it infeasible for any other firms to build a search engine comparable to Google Book Search.

Is the Settlement itself one of these legal hurdles?

It’s All About the Orphans

February 23rd, 2009 by archive

The Internet Archive first used the term “orphan” to describe books that are no longer commercially viable (“out of print”); still in copyright; and whose ownership is either impossible or extremely difficult to determine. In 2004 Larry Lessig, Rick Prelinger, and I brought a suit to make it easier for orphans to enter the public domain (Kahle vs. Gonzales). As that case was proceeding, the Copyright Office held hearings and issued a report, which led to proposed orphan works legislation in both the House and Senate.

While that legislation has been wending its way through the Capitol Hill meat grinder, it turns out that Google, the AAP, and the Authors Guild have been negotiating their own private solution to the problem of orphan works. Once you digest the proposed Google Book Settlement, it becomes clear that the dizzyingly complex agreement is, in essence, an elaborate scheme for the exploitation of orphan works. The class action mechanism allows the Authors Guild (8,500 members) and the AAP (260 members) to extrapolate themselves to include millions of unfindable and unknowable rightsholders of orphan works. It is to this end–the certification of a class that includes the orphans–that the parties need the blessing of the court.

The upshot, if the Settlement is approved, would be legal protection for Google, and only for Google, to scan and provide digital access to the orphan works. Presto! Like magic, Google proceeds without any need for legislation: their own private orphan works legislation.

So, should the Settlement be approved, Google will be handed exclusive access to the orphans, and the public loses out. With orphan works legislation, orphan works could have been opened up to digitization by anyone: not just Google but competitors to Google, libraries, Open Content Alliance partners, and others. Now, however, no one but Google will have access to the orphan class created by the Settlement, without enduring a similar class action lawsuit from the authors and publishers.

I, personally, am amazed at this creative use of class action law. The three parties have managed to skirt copyright law, bypass legislative efforts, and feather their own nests–all through the clever use of law intended to remedy harms.

This Settlement, if approved by the judge, will accomplish things appropriate to a legislative body, not to private corporate board rooms. Let’s live under the rule of law, as arduous as that might be, and free the orphans, legitimately, not for one corporation but for all of us.

-brewster


Google Books Acquisition Division: gBAD

February 18th, 2009 by brewster

Please forgive a light post on a heavy subject.

I went to a meeting of people concerned about the sweeping nature of the Google-AAP-Authors Guild settlement. My favorite interaction was when the group was trying to figure out what the “Book Rights Registry” really is (BRR sounds so benign).

Since the settlement immunizes only Google’s scanning, lives off of Google’s money for the foreseeable future, and helps to find more things for Google to scan, this name was proposed:

“Google Books Acquisition Division.” Pretty great. Another added that the acronym is funny as well: gBAD.

-brewster

Over Half of All Yiddish Literature Now Online

February 7th, 2009 by brewster

As discussed at the October “Using Digital Collections” meeting in San Francisco, the Yiddish collection is now online, and it was announced in the New York Times.

Magic Untapped

January 26th, 2009 by archive

Peter Brantley has written an inspired post on what’s really wrong with the Google Settlement: it lacks imagination.

An excerpt (but read the whole thing):

The settlement describes a world of time past, not a world of possibilities. … Let us imagine an alternative world where children routinely carry Alexandria in their hands. Where they experience works of literature as games, pushing at the borders of their knowledge and experience by engaging the library with others as a festschrift…. Let us say: we want our citizens to remake these books. We shall allow unceasing access to all books within our libraries; there shall be no barriers between them. Read the rest of this entry »

A Monopoly dressed in a Class-action Suit?

January 25th, 2009 by brewster

Dan Clancy, head of Google Book Search, presented and took questions at the American Library Association conference on January 24, 2009.
Read the rest of this entry »

Is OCLC Reconsidering its Proposed Records Policy?

January 14th, 2009 by archive

In a press release dated January 13, OCLC announced the creation of a Review Board to advise OCLC on the principles and best practices for sharing library data. At the same time, the proposed records policy effective date has been put off until the third quarter of 2009. Read the rest of this entry »

A Raw Deal for Libraries

December 6th, 2008 by archive

One of the most surprising, even shocking, features of the Google-AAP-Authors Guild Settlement is how hard it is on libraries. Given that Google Book Search could not have gotten off the ground without the cooperation of various university libraries, it is particularly disheartening that the proposed settlement treats them with such an iron fist at the same time as it expects them to foot much of the bill through subscriptions. It will be interesting to see how many libraries continue as partners, given Google’s bait-and-switch. Read the rest of this entry »

Libraries: We Need Them More than Ever

December 1st, 2008 by mary

Marjorie Kehe at the Christian Science Monitor reminds us of the importance of libraries, especially in tough times.

Recommended Changes to Google Book Search Settlement

November 24th, 2008 by mary

New York Law School professor James Grimmelmann has written an impressive, even-handed blog post about the Google-AAP-Authors Guild settlement in which he lays out five principles to guide the court and the public. He closes with fourteen “recommendations” to the court:

Read the rest of this entry »

A Useful Guide to Google Settlement

November 20th, 2008 by mary

The Association of Research Libraries (ARL) and the American Library Association (ALA) have released a useful 22-page summary of the key points of the Settlement entitled “A Guide for the Perplexed: Libraries and the Google Library Project Settlement” and written by Jonathan Band, JD. From the ARL website:

The guide is designed to help the library community better understand the terms and conditions of the recent settlement agreement between Google, the Authors Guild, and the Association of American Publishers concerning Google’s scanning of copyrighted works. … The guide outlines and simplifies the settlement’s provisions, with special emphasis on the provisions that apply directly to libraries.

The guide doesn’t evaluate, criticize, or take a stand on the Settlement, but it is a thoughtful and careful guide.

New Contributors to Open Content Alliance collections

November 9th, 2008 by Linda Frueh

Through all the changes in 2008, libraries and other cultural institutions continue to contribute their works to universally accessible open collections. Fifty-one new institutions have joined the ranks of our community! Read the rest of this entry »