Developing software in the Real World

Automatic OCR with Hazel and PDFPen

My networked HP printer has a useful built-in scanner that scans directly to a shared directory on my computer. Once a file lands there, I want it renamed to the current date and OCR’d so that I can search it.

To do this, I use Hazel and PDFPen; this post is a note to make sure I can remember how to set it up again if I ever need to!

Firstly, rename the file. My scanner names each file with the prefix “scan”, so the Hazel rule is quite simple.

This is the screenshot:

[Screenshot: Hazel1]
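Hazel rules are built in its GUI rather than written as code, so the following is only an illustration of what the rename rule does, not how Hazel works internally. It is a minimal Python sketch under stated assumptions: the rule matches PDF files whose names start with “scan” (case-sensitive), the new name uses an ISO-style YYYY-MM-DD date, and a numeric suffix is added if that name is already taken. The function names `rename_scans` and `dated_name` are hypothetical.

```python
from datetime import date
from pathlib import Path
from typing import List, Optional


def dated_name(folder: Path, suffix: str, today: date) -> Path:
    """Return a not-yet-existing path in folder named after today's date.

    Assumption: YYYY-MM-DD format, with "-2", "-3", ... appended on collisions.
    """
    stem = today.isoformat()
    candidate = folder / f"{stem}{suffix}"
    counter = 2
    while candidate.exists():
        candidate = folder / f"{stem}-{counter}{suffix}"
        counter += 1
    return candidate


def rename_scans(folder: Path, today: Optional[date] = None) -> List[Path]:
    """Rename every PDF in folder whose name starts with 'scan' to today's date.

    Returns the new paths in the order the files were processed.
    """
    today = today or date.today()
    renamed = []
    # sorted() takes a snapshot of the directory listing before any renames happen.
    for path in sorted(folder.iterdir()):
        if (
            path.is_file()
            and path.name.startswith("scan")
            and path.suffix.lower() == ".pdf"
        ):
            target = dated_name(folder, path.suffix, today)
            path.rename(target)
            renamed.append(target)
    return renamed
```

For example, `rename_scans(Path.home() / "Scans")` would process a Scans folder in your home directory (that location is also an assumption). Renamed files no longer start with “scan”, so running it repeatedly, as a folder watcher like Hazel effectively does, leaves already-processed files alone.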

Having renamed the file, we can use PDFPen’s AppleScript support to run OCR on the document.

The embedded AppleScript is:

This is the screenshot of it in Hazel:

[Screenshot: Hazel2]

That’s it. Scanning a document now results in a dated, OCR’d PDF file in my Scans folder.

7 thoughts on “Automatic OCR with Hazel and PDFPen”

  1. Or, in Hazel, add a check to see whether the document contents contain any of the vowels a, e, i, o, u: if the document has an OCR layer, Hazel should find a vowel (how good or accurate the OCR has been is a different matter).

  2. Hello Rob,

    Well, I am going to build my own automation with Hazel, Devonthink Pro, etc. Right now I have a small working solution that uses PDFPenPro for OCR. But if you import documents directly into Devonthink, Devonthink runs OCR with the Abbyy FineReader engine that is bundled with Devonthink.

    Do you know which application is best for OCR? And if Abbyy FineReader is the best, how can I call it from AppleScript to do the job?

    Kind regards,

    Michael

  3. I found a way to find out whether OCR needs to be run: "if needs ocr then". It works with my old PDFpenPro.
    See the following script (I hope it is correctly formatted).
