I am trying to create a pattern that will find any word prior to another word. Specifically, I have a document with the word "description" in it and I want to have any word directly before "description" be filled into metadata. Hope this makes sense. Thank you :)
Question
Replies
So if your text is Document Description and you want to find the word Document, I would use the following pattern.
(\w+) +Description
The \w+ means one or more word characters (letters, digits, or underscores), and the parentheses mean capture only that part of the match.
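As a quick way to try this pattern outside the capture tool, here is a minimal sketch using Python's re module. That is an assumption on my part: the engine behind your metadata pattern may differ in details. The sample text is just the example from this reply.

```python
import re

# Sample text from the example above (illustrative only).
text = "Document Description"

# (\w+) captures a run of word characters; " +" allows one or more spaces.
match = re.search(r"(\w+) +Description", text)

# group(1) is the part inside the parentheses, i.e. the word before "Description".
word_before = match.group(1) if match else None
print(word_before)  # Document
```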
Hi Nicholas! There are a few ways of doing this. If you know there is exactly one space between the word "description" and the word you want, you can use this pattern:
([^\s]+)\sdescription
That is, "one or more of anything except a whitespace character, then a whitespace character, then the word description (capturing the part before the whitespace character)".
Alternatively you can do
(\S+)\sdescription
where \S is the equivalent of [^\s].
Note that these patterns will also capture symbols and punctuation, so they'll capture #O'Malley!! in
Hello Mr. #O'Malley!! description
You can also do something like
(\S+)\s+description
if there may be more than one space before the word 'description'.
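To see the difference between \S+ and \w+ on the sample line above, here is a small sketch in Python's re module (an assumption; your tool's regex engine may behave slightly differently). It shows that \S+ picks up the surrounding symbols, while \w+ finds nothing here because the character right before the space is "!" rather than a word character.

```python
import re

# The sample line from this reply.
sample = "Hello Mr. #O'Malley!! description"

# \S+ grabs every non-whitespace character before the gap, symbols included.
non_space = re.search(r"(\S+)\s+description", sample).group(1)

# \w+ must sit directly before the whitespace, but "!!" is in the way,
# so this search finds no match at all.
word_only = re.search(r"(\w+)\s+description", sample)

print(non_space)  # #O'Malley!!
print(word_only)  # None
```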
Thank you for these responses. Those didn't quite work only because I left out a bit of information. Zone OCR reads the text as follows:
NAME
NICHOLAS J MARTIN
DESCRIPTION
I am trying to grab the last name separately from everything else for metadata. I'm not sure of the best way to do this. Some employees have a middle initial, some don't, and some have a full middle name, so the inconsistencies make it difficult. So I thought grabbing any one word before description would work, but description is on a separate line. I hope I am making sense. :)
My regex most likely didn't work here because your word DESCRIPTION is upper-case (assuming you are using a case-sensitive match).
Try
(\S+)\s+DESCRIPTION
(or try making your match case-insensitive). Make sure you use \s+ for the whitespace, as this will also match the newlines that your sample seems to contain.
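Here is a rough sketch of both options against your OCR sample, written with Python's re module purely for illustration. The function name word_before_description is made up for this example, and I'm assuming your tool's engine treats \s and case-insensitivity the same way.

```python
import re

# The OCR output from the post above, with real newlines between the lines.
ocr_text = "NAME\nNICHOLAS J MARTIN\nDESCRIPTION"

def word_before_description(text):
    """Return the last non-whitespace run before 'description', or None."""
    # \s+ matches spaces and newlines alike; IGNORECASE covers DESCRIPTION.
    m = re.search(r"(\S+)\s+description", text, flags=re.IGNORECASE)
    return m.group(1) if m else None

last_name = word_before_description(ocr_text)
print(last_name)  # MARTIN
```

Because only the word directly before DESCRIPTION is captured, it shouldn't matter whether there is a middle initial, a full middle name, or none at all.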
Yes Sir! That worked like a charm. Thank you so much.