Pragmatism in the real world

Automatic OCR with Hazel and PDFPen

I have a useful scanner as part of my networked HP printer that will scan directly to a shared directory on my computer. Once there, I want the file to be renamed to the current date and the document OCR’d so that I can search it.

To do this, I use Hazel and PDFPen and this is a note to ensure that I can remember to do it again if I ever need to!

Firstly, rename the file. My scanner names each file with the prefix scan, so the Hazel rule is quite simple:

If all the following conditions are met:
	Name starts with scan

Do the following to the matched file or folder:
	Rename with pattern: [date created][extension]

This is the screenshot:

Hazel1

Having renamed the file, we can use PDFPen’s AppleScript support to perform an OCR of the document:

If all the following conditions are met:
	Extension is pdf
	Date Last Modified is after Date Last Matched

Do the following to the matched file or folder:
	Run AppleScript embedded script

The embedded AppleScript is:

tell application "PDFpen"
	open theFile as alias
	tell document 1
		ocr
		repeat while performing ocr
			delay 1
		end repeat
		delay 1
		close with saving
	end tell
	quit
end tell

This is the screenshot of it in Hazel:

Hazel2

That’s it. Scanning a document now results in a dated, OCR’d PDF file in my Scans folder.

9 thoughts on “Automatic OCR with Hazel and PDFPen

      1. Hi ,

        I don't know how to get it to work yet – but one should be able to test if the doc has already been ocr'ed by testing the property of a document

        needs ocr (boolean, r/o) : Does application think the document is a candidate for OCR?

        Thanks

        Iain.

  1. Or in Hazel do a check to see if the document contents contain one of the following a, e, i, o u = if the document has an OCR layer then hazel should find a vowel (how good/accurate the OCR has been is a different matter)

  2. Hello Rob,

    well, I am going to built up my own automation with Hazel, Devonthink Pro etc. Right now I have a small working solution which use PDFPenPro for OCRing. But if you import documents directly by Devonthink, Devonthink starts for OCR the Abbyy FineReader, which belongs to Devonthink.

    Do you know, which application is the best for OCR? And if Abby FineReader is the best, how can I call this application in the AppleScript to do the job?

    Kind regards,

    Michael

  3. I found a solution for finding out if the ocr must be started: "if needs ocr then". Works for my old PDFpenPro.
    See the following script (I hope it is correctly formatted).

    tell application "PDFpenPro"
    	open theFile as alias
    	tell document 1
    		if needs ocr then
    			ocr
    			repeat while performing ocr
    				delay 1
    			end repeat
    			delay 1
    			close with saving
    		end if
    	end tell
    	quit
    end tell
    

Comments are closed.