Developing software in the Real World

Automatic OCR with Hazel and PDFPen

I have a useful scanner as part of my networked HP printer that will scan directly to a shared directory on my computer. Once there, I want the file to be renamed to the current date and the document OCR'd so that I can search it.

To do this, I use Hazel and PDFPen and this is a note to ensure that I can remember to do it again if I ever need to!

Firstly, rename the file. My scanner names each file with the prefix scan, so the Hazel rule is quite simple:

If all the following conditions are met:
	Name starts with scan

Do the following to the matched file or folder:
	Rename with pattern: [date created][extension]

This is the screenshot:


Having renamed the file, we can use PDFPen's AppleScript support to perform an OCR of the document:

If all the following conditions are met:
	Extension is pdf
	Date Last Modified is after Date Last Matched

Do the following to the matched file or folder:
	Run AppleScript embedded script

The embedded AppleScript is:

tell application "PDFpen"
	open theFile as alias
	tell document 1
		repeat while performing ocr
			delay 1
		end repeat
		delay 1
		close with saving
	end tell
end tell

This is the screenshot of it in Hazel:


That's it. Scanning a document now results in a dated, OCR'd PDF file in my Scans folder.

Leave a Reply

Your email address will not be published. Required fields are marked *