<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Google Scanning Project (part duex)</title>
	<atom:link href="http://www.iddream.com/2004/12/14/google-scanning-project-part-duex/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.iddream.com/2004/12/14/google-scanning-project-part-duex/</link>
	<description>Welcome to I'dDream.com, est. Nov 1999.</description>
	<lastBuildDate>Tue, 16 Dec 2008 18:17:41 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
	<item>
		<title>By: Dharma</title>
		<link>http://www.iddream.com/2004/12/14/google-scanning-project-part-duex/comment-page-1/#comment-830</link>
		<dc:creator>Dharma</dc:creator>
		<pubDate>Fri, 17 Dec 2004 17:47:05 +0000</pubDate>
		<guid isPermaLink="false">http://www.iddream.com/plete/?p=368#comment-830</guid>
		<description>Here are your answers, direct from my friend who is working on the project:

&lt;b&gt;Do you know what process will be taken and what the final outcome of this project will be?&lt;/b&gt;

&lt;blockquote&gt;Google has a copy of all of the materials and can build services around them. I think they use JPEG as their format. We will also have copies of all materials scanned and will be receiving G4 TIFFs--the preservation standard format for libraries.&lt;/blockquote&gt;

&lt;b&gt;I keep hearing “scanning project.” Does that mean an OCR technology will be used (scanning with text recognition)?&lt;/b&gt;

&lt;blockquote&gt;Yes, the books are scanned and processed. OCR is part of that process. We&#8217;ll be using Google&#8217;s (dirty) OCR. At some point the Library may derive their own, better OCR.&lt;/blockquote&gt;

&lt;b&gt;Will the final product be raw text stored in a database (like this site) or will it be numerous PDF files?&lt;/b&gt;

&lt;blockquote&gt;No, no proprietary file formats are used.&lt;/blockquote&gt;
</description>
		<content:encoded><![CDATA[<p>Here are your answers, direct from my friend who is working on the project:</p>
<p><b>Do you know what process will be taken and what the final outcome of this project will be?</b></p>
<blockquote><p>Google has a copy of all of the materials and can build services around them. I think they use JPEG as their format. We will also have copies of all materials scanned and will be receiving G4 TIFFs&#8211;the preservation standard format for libraries.</p></blockquote>
<p><b>I keep hearing “scanning project.” Does that mean an OCR technology will be used (scanning with text recognition)?</b></p>
<blockquote><p>Yes, the books are scanned and processed. OCR is part of that process. We&#8217;ll be using Google&#8217;s (dirty) OCR. At some point the Library may derive their own, better OCR.</p></blockquote>
<p><b>Will the final product be raw text stored in a database (like this site) or will it be numerous PDF files?</b></p>
<blockquote><p>No, no proprietary file formats are used.</p></blockquote>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dharma</title>
		<link>http://www.iddream.com/2004/12/14/google-scanning-project-part-duex/comment-page-1/#comment-829</link>
		<dc:creator>Dharma</dc:creator>
		<pubDate>Fri, 17 Dec 2004 01:30:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.iddream.com/plete/?p=368#comment-829</guid>
		<description>Hmm, good questions.&#160; I know that they are scanning with OCR technology, using a process and machine that is faster than any other.&#160; I&#8217;m not sure how the final product will be stored at Google.&#160; However, based upon what the DLPS at UM already does with their digital collections, they store both the OCR text and the images, giving users the option of accessing either. I actually have a good friend who is the main UM library person working on this Google thing... I&#8217;ll pass along details as I get them from her. Oh yeah, and she wants to reassure everyone that they&#8217;re not getting rid of the books ;)
</description>
		<content:encoded><![CDATA[<p>Hmm, good questions.&nbsp; I know that they are scanning with OCR technology, using a process and machine that is faster than any other.&nbsp; I&#8217;m not sure how the final product will be stored at Google.&nbsp; However, based upon what the DLPS at UM already does with their digital collections, they store both the OCR text and the images, giving users the option of accessing either. I actually have a good friend who is the main UM library person working on this Google thing&#8230; I&#8217;ll pass along details as I get them from her. Oh yeah, and she wants to reassure everyone that they&#8217;re not getting rid of the books ;)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Randy Bishop</title>
		<link>http://www.iddream.com/2004/12/14/google-scanning-project-part-duex/comment-page-1/#comment-828</link>
		<dc:creator>Randy Bishop</dc:creator>
		<pubDate>Thu, 16 Dec 2004 17:31:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.iddream.com/plete/?p=368#comment-828</guid>
		<description>I&#8217;m sure I could find this on Google&#8217;s site but asking you might be faster.&#160; Do you know what process will be taken and what the final outcome of this project will be?&#160; I keep hearing &#8220;scanning project.&#8221;  Does that mean an OCR technology will be used (scanning with text recognition)?&#160; Or are these documents going to be raster images (not good)?&#160; Will the final product be raw text stored in a database (like this site) or will it be numerous PDF files?
</description>
		<content:encoded><![CDATA[<p>I&#8217;m sure I could find this on Google&#8217;s site but asking you might be faster.&nbsp; Do you know what process will be taken and what the final outcome of this project will be?&nbsp; I keep hearing &#8220;scanning project.&#8221;  Does that mean an OCR technology will be used (scanning with text recognition)?&nbsp; Or are these documents going to be raster images (not good)?&nbsp; Will the final product be raw text stored in a database (like this site) or will it be numerous PDF files?</p>
]]></content:encoded>
	</item>
</channel>
</rss>

