Google Scanning Project (part deux)
Unfortunately, the New York Times article on the Google story didn’t give enough credit to the University of Michigan for not only being the first to sign on with the ambitious project, but also being willing to go to the mat on copyright issues. UM Libraries is the only one out of the four participating in the project to let Google scan books that are not yet in the public domain. Looks like we’re in for another UM Supreme Court case in the future. Here’s another article about this issue:
Chronicle of Higher Education
I’m sure I could find this on Google’s site but asking you might be faster. Do you know what process will be taken and what the final outcome of this project will be? I keep hearing “scanning project.” Does that mean an OCR technology will be used (scanning with text recognition)? Or are these documents going to be raster images (not good)? Will the final product be raw text stored in a database (like this site) or will it be numerous PDF files?
Hmm, good questions. I know that they are scanning with OCR technology, using a process and machine that is faster than any other. I’m not sure how the final product will be stored at Google. However, based upon what the DLPS at UM already does with its digital collections, they store both the OCR text and the images, giving users the option of accessing either. I actually have a good friend who is the main UM library person working on this Google thing… I’ll pass along details as I get them from her. Oh yeah, and she wants to reassure everyone that they’re not getting rid of the books ;)
Here are your answers, direct from my friend who is working on the project:
Do you know what process will be taken and what the final outcome of this project will be?
I keep hearing “scanning project.” Does that mean an OCR technology will be used (scanning with text recognition)?
Will the final product be raw text stored in a database (like this site) or will it be numerous PDF files?