Research/OCR
From Publication Station
OCR (optical character recognition)
tesseract
https://code.google.com/p/tesseract-ocr/
Tesseract is OCR software. It was developed at HP Labs between 1985 and 1995 and is currently developed by Google.
install
Debian:
aptitude install tesseract-ocr
Mac:
Using Homebrew, run the following commands:
brew install leptonica --with-libtiff
brew install tesseract --all-languages
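Once installed, Tesseract is run from the command line as `tesseract image outputbase`, and it writes the recognized text to `outputbase.txt`. The sketch below wraps that call in a small shell function. It assumes the `tesseract` binary is on PATH and that English language data is available (`-l eng`). The function name `ocr_image` and the image `scan.png` are hypothetical examples, not part of Tesseract.

```shell
#!/bin/sh
# Minimal sketch: run Tesseract on one image and print the recognized text.
# Assumes tesseract is on PATH and English language data is installed.
# ocr_image is a hypothetical helper name used only for illustration.

ocr_image() {
    img="$1"
    base="${img%.*}"    # strip the extension: scan.png -> scan

    # Fail early with a clear message if tesseract is not installed.
    if ! command -v tesseract >/dev/null 2>&1; then
        echo "tesseract not found; install it first" >&2
        return 1
    fi

    # tesseract writes its plain-text result to "$base.txt".
    tesseract "$img" "$base" -l eng || return 1
    cat "$base.txt"
}
```

For example, `ocr_image scan.png` would create `scan.txt` next to the image and print its contents.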