Automatic OCR with Hazel and PDFPen

I have a useful scanner as part of my networked HP printer that will scan directly to a shared directory on my computer. Once there, I want the file to be renamed to the current date and the document OCR’d so that I can search it.

To do this, I use Hazel and PDFPen and this is a note to ensure that I can remember to do it again if I ever need to!

Firstly, rename the file. My scanner names each file with the prefix scan, so the Hazel rule is quite simple:

If all the following conditions are met:
	Name starts with scan

Do the following to the matched file or folder:
	Rename with pattern: [date created][extension]

This is the screenshot:

Hazel1

Having renamed the file, we can use PDFPen’s AppleScript support to perform an OCR of the document:

If all the following conditions are met:
	Extension is pdf
	Date Last Modified is after Date Last Matched

Do the following to the matched file or folder:
	Run AppleScript embedded script

The embedded AppleScript is:

tell application "PDFpen"
	open theFile as alias
	tell document 1
		ocr
		repeat while performing ocr
			delay 1
		end repeat
		delay 1
		close with saving
	end tell
	quit
end tell

This is the screenshot of it in Hazel:

Hazel2

That’s it. Scanning a document now results in a dated, OCR’d PDF file in my Scans folder.

This article was posted on 11 January 2017 in Computing

9 thoughts on “Automatic OCR with Hazel and PDFPen”

Is there a way to determine if the pdf has already undergone OCR?
Thanks.

Rob says:

14 September 2018 at 6:46 am

The only way I know is to add a tag to the file.
1. Iain says:
  
  8 June 2019 at 10:51 am
  
  Hi ,
  
  I don't know how to get it to work yet – but one should be able to test if the doc has already been ocr'ed by testing the property of a document
  
  needs ocr (boolean, r/o) : Does application think the document is a candidate for OCR?
  
  Thanks
  
  Iain.
  1. Iain says:
    
    8 June 2019 at 10:59 am
    
    And I scroll down and Andre Fixed it!
    
    Thanks!!

Or in Hazel do a check to see if the document contents contain one of the following a, e, i, o u = if the document has an OCR layer then hazel should find a vowel (how good/accurate the OCR has been is a different matter)

Rob says:

26 October 2018 at 12:27 pm

I like this idea, Phil!

Hello Rob,

well, I am going to built up my own automation with Hazel, Devonthink Pro etc. Right now I have a small working solution which use PDFPenPro for OCRing. But if you import documents directly by Devonthink, Devonthink starts for OCR the Abbyy FineReader, which belongs to Devonthink.

Do you know, which application is the best for OCR? And if Abby FineReader is the best, how can I call this application in the AppleScript to do the job?

Kind regards,

Michael

Rob says:

24 January 2019 at 12:02 pm

Michael,

I have no idea which application does OCR best. I find PDFPen good enough for my needs.

I found a solution for finding out if the ocr must be started: "if needs ocr then". Works for my old PDFpenPro.
See the following script (I hope it is correctly formatted).

tell application "PDFpenPro"
	open theFile as alias
	tell document 1
		if needs ocr then
			ocr
			repeat while performing ocr
				delay 1
			end repeat
			delay 1
			close with saving
		end if
	end tell
	quit
end tell

Comments are closed.

Automatic OCR with Hazel and PDFPen

9 thoughts on “Automatic OCR with Hazel and PDFPen”

Search

Follow

Books I've written