Monday, May 05, 2008

OCR (Optical character recognition)

This is one of the many times when I think it would be nice to have a tablet PC (yes, I know most of you are thinking “how many devices can one person use?” My answer: many).

I guess I should explain. Optical character recognition is when a computer analyses a picture and tries to make words out of it. It comes in two forms: handwriting recognition, as used on most Palm Pilots, and scanned document recognition. It comes up because one of the oldest and best math study guides on campus is slowly dwindling to a couple of copies of copies floating around. To keep this piece of wisdom from going extinct, our dear friend Ian Larson scanned it into a series of JPEG files. Unfortunately these are unwieldily large (20 MB), contain all sorts of artifacts, and in general are cumbersome to read and print. So I decided to step in and convert them to digital text. My tool of choice is SimpleOCR, which is free for non-commercial use ($1250.00 for commercial licenses, ouch).

So then comes the tablet PC. A computer can do a decent job of recognizing a character if it's really a character. Unfortunately, it freaks out and tries to recognize eraser marks and underlines as characters too. You get around this by highlighting the areas you want the computer to ignore, which is hard to do with a mouse. With a tablet (if anybody doesn’t know, this is like a laptop where the screen can fold completely flat against the top and be drawn on with a stylus), it would be vastly easier to highlight ignored areas and clean blemishes from the scans.

In any case, the end result of this was that I “entered” 15 pages of text in 2 minutes. That’s 3500 words per minute :)
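
For the curious, the "highlight the areas to ignore" step can be sketched in a few lines of Python. This is only an illustration of the idea, not how SimpleOCR does it internally: the page is assumed to be a grid of grayscale pixels (0 is black, 255 is white), and `mask_regions` and the box format are made-up names for this example.

```python
# Hypothetical sketch: paint "ignore" boxes white so the OCR step
# never sees the eraser marks or underlines inside them.
WHITE = 255


def mask_regions(page, regions, fill=WHITE):
    """Return a copy of `page` with each box filled with `fill`.

    `page` is a list of rows of grayscale values.
    Each region is (top, left, bottom, right); bottom and right are exclusive.
    """
    masked = [row[:] for row in page]  # copy so the original scan is untouched
    height = len(masked)
    width = len(masked[0]) if masked else 0
    for top, left, bottom, right in regions:
        # Clamp the box to the page so a sloppy highlight cannot go out of range.
        for y in range(max(top, 0), min(bottom, height)):
            for x in range(max(left, 0), min(right, width)):
                masked[y][x] = fill
    return masked


# Tiny made-up "scan": a dark 2x2 character plus a gray smudge in column 4.
page = [
    [255, 255, 255, 255, 255, 255],
    [255,   0,   0, 255,  90, 255],
    [255,   0,   0, 255,  90, 255],
    [255, 255, 255, 255, 255, 255],
]
ignore = [(1, 4, 3, 5)]  # the box "highlighted" around the smudge
cleaned = mask_regions(page, ignore)
```

After this, `cleaned` still has the dark character but the smudge is gone, and that cleaned grid is what would be handed to the recognizer. Drawing those boxes is exactly the part that is painful with a mouse.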


Lucas said...

I hear Evan just ordered a massive drawing tablet (13" by 16" I think). This would serve the exact same purpose in conjunction with a desktop and be a whole lot cheaper. Clearly you're making excuses to justify having yet another computer around. Can never have too many.

Jeremy said...

On a completely different note, I am really stoked about the Amazon Kindle. Brilliant!