Scanning technologies aren’t that sexy, but we think they’re more important than ever. The emergence of search as the quintessential Web application has created a demand for more and better Web content. If there’s information out there, the reasoning goes, it should be accessible through your browser.
Amazingly, technologies that help convert deadwood documents into a Web-ready format are not standardized and, in most cases, offer pretty crude functionality. Adobe’s Acrobat is a great low-end product, but a PDF isn’t really the same as a Web page. At the high end, there are solutions like 4digitalbooks, which leaf through texts in an automated fashion, scanning up to 1500 pages an hour – but they’re bloody expensive.
We expect the big search players like Google, Yahoo, Microsoft, and Amazon to throw increasing resources behind projects in the digitization area. One recent example is Google’s agreement with Harvard University, Oxford University, The University of Michigan, Stanford University, and The New York Public Library to scan their books and make them available online.
Indexed and fully searchable…
An AP story, which prompted this post, also reports that scientists are now developing optical character recognition software that will be able to read scanned Arabic documents. Google and others will surely appreciate this effort.
Read: Arabic documents going digital [AP via CNN]
Read: Oxford University’s Imaging Projects [Oxford University Site]
Read: Google To Digitize 15 Million Books [Library Journal]