Revision as of 13:49, 4 December 2015

OCR (optical character recognition)

tesseract

https://code.google.com/p/tesseract-ocr/

Tesseract is OCR software. It was developed at HP Labs between 1985 and 1995 and is currently developed by Google.

install

Debian:

aptitude install tesseract-ocr

Mac:

Using Homebrew, run the following commands:

brew install leptonica --with-libtiff
brew install tesseract --all-languages

https://gist.github.com/henrik/1967035
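
After installing on either system, it may help to confirm that the binary is on the PATH before running anything. The following is a minimal sketch; the helper name have_cmd is made up for illustration and is not part of tesseract.

```shell
#!/bin/sh
# have_cmd is a hypothetical helper: succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd tesseract; then
    tesseract --version      # print the installed version
    tesseract --list-langs   # list the installed language data
else
    echo "tesseract not found on PATH" >&2
fi
```

If no languages are listed, the language data packages were probably not installed.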

Run

prerequisites

Source files should:

be in .tiff format
have a resolution of at least 300 dpi; otherwise the text recognition will be very sloppy
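
The two prerequisites above can be checked roughly from the shell. This is only a sketch: is_tiff and report_dpi are hypothetical helper names, the extension test does not inspect the file contents, and the resolution check assumes ImageMagick's identify tool is installed, which this page does not otherwise require.

```shell
#!/bin/sh
# is_tiff (hypothetical helper): succeeds if the file name ends in .tif or .tiff.
is_tiff() {
    case "$1" in
        *.tif|*.tiff|*.TIF|*.TIFF) return 0 ;;
        *) return 1 ;;
    esac
}

# report_dpi (hypothetical helper): print the horizontal and vertical resolution.
# Assumes ImageMagick's identify is available; fails with a message otherwise.
report_dpi() {
    if command -v identify >/dev/null 2>&1; then
        identify -units PixelsPerInch -format '%x x %y\n' "$1"
    else
        echo "identify (ImageMagick) not found; cannot read resolution" >&2
        return 1
    fi
}
```

Usage, for a scan named input.tiff: is_tiff input.tiff && report_dpi input.tiff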

command

tesseract input.tiff output

Tesseract appends .txt to the output name, so this produces the OCRed file output.txt.
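
To OCR several scans at once, the same command can be run in a loop. A minimal sketch, assuming the .tiff files sit in the current directory; output_base is a hypothetical helper that strips the extension so each result lands next to its source as a .txt file.

```shell
#!/bin/sh
# output_base (hypothetical helper): strip a trailing .tiff or .tif from a path.
output_base() {
    case "$1" in
        *.tiff) printf '%s\n' "${1%.tiff}" ;;
        *.tif)  printf '%s\n' "${1%.tif}" ;;
        *)      printf '%s\n' "$1" ;;
    esac
}

# OCR every .tiff file in the current directory.
for f in *.tiff; do
    [ -e "$f" ] || continue   # skip when the glob matches nothing
    command -v tesseract >/dev/null 2>&1 || { echo "tesseract not found" >&2; break; }
    tesseract "$f" "$(output_base "$f")"   # page1.tiff -> page1.txt
done
```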

Research/OCR: Difference between revisions
