replied on October 27, 2022
Hi Mike,
I ran this question by a few people. There's no single "recognize underscores" setting you can switch on. However, there are at least two things you can try:
- If you're doing any image processing to remove lines/grids, there's a chance the underscores are getting dropped out there before OCR runs. Try running OCR without that preprocessing step and see if it makes a difference.
- It's possible OCR is missing them because they're too light and it thinks they're noise. You could try fixing that with a local smooth (grow) operation before OCR and see if it makes a difference.
As a workaround, if OCR is reliably picking up spaces rather than underscores so the resulting string is:
"2870864 Recall Packet Acknowledgement 2022 Fall Yuma"
And the format is consistently something like:
"(numbers) (doc type words) (year numbers) (season word) (location word(s))"
You can likely do the split with a regular expression for that pattern. Although that is a bit more complex than simply splitting on underscores, writing the pattern is likely much simpler than trying to coax the Quick Fields OCR engine into recognizing the underscores if the two suggestions above don't work.
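To make the workaround concrete, here is a rough sketch of that kind of pattern, written in Python only so it can be tested outside Quick Fields. The group names, the `split_ocr_name` helper, and the assumptions that the year is always four digits and the season is a single alphabetic word are mine, not anything confirmed about your documents. The pattern syntax itself is fairly standard, but check it against whatever regex flavor your pattern matching step uses before relying on it.

```python
import re

# Assumed layout, taken from the example string:
# (numbers) (doc type words) (4-digit year) (season word) (location word(s))
PATTERN = re.compile(
    r"^\s*(?P<number>\d+)\s+"      # leading run of digits
    r"(?P<doc_type>.+?)\s+"        # doc type: shortest text before the year
    r"(?P<year>\d{4})\s+"          # assumption: year is exactly four digits
    r"(?P<season>[A-Za-z]+)\s+"    # assumption: season is one alphabetic word
    r"(?P<location>.+?)\s*$"       # everything left over, one or more words
)


def split_ocr_name(text):
    """Return the named parts as a dict, or None if the text does not fit."""
    match = PATTERN.match(text)
    return match.groupdict() if match else None


parts = split_ocr_name("2870864 Recall Packet Acknowledgement 2022 Fall Yuma")
```

One caveat: because the doc type group stops at the first four-digit token, a doc type that itself contains a four-digit number would be split in the wrong place. If that can happen in your data, tightening the year group (for example to `20\d{2}`) or listing the season words explicitly would make the split more reliable.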