Question

QF pattern

asked on March 2, 2015

I am trying to create a pattern that will find any word prior to another word. Specifically, I have a document with the word "description" in it and I want to have any word directly before "description" be filled into metadata. Hope this makes sense.  Thank you :)

Answer

SELECTED ANSWER
replied on March 2, 2015

The reason my regex wouldn't work here is most likely that your word DESCRIPTION is upper-case (assuming you are using a case-sensitive match).

Try 

 

(\S+)\s+DESCRIPTION

 

(or try making your match case-insensitive). Make sure you use \s+ for the space, as this will also match the newlines that your sample seems to use.
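To illustrate the two points above (case sensitivity, and \s+ crossing a line break), here is a minimal sketch in Python. It assumes Python's `re` module as a stand-in; the engine behind the tool discussed in this thread may differ in details, and the sample string is only modeled on the OCR output posted in the thread.

```python
import re

# OCR-style sample (an assumption modeled on the thread): the target word
# sits on its own indented line and is upper-case.
text = "NICHOLAS J MARTIN \n         DESCRIPTION"

# Case-sensitive search with a lower-case literal: no match on upper-case text.
lower = re.search(r"(\S+)\s+description", text)

# Same pattern, but with case-insensitive matching switched on.
ignore_case = re.search(r"(\S+)\s+description", text, re.IGNORECASE)

# Upper-case literal; \s+ spans the trailing space, the newline and the indent.
upper = re.search(r"(\S+)\s+DESCRIPTION", text)

# A single literal space cannot cross the line break, so this finds nothing.
single_space = re.search(r"(\S+) DESCRIPTION", text)

print(lower, ignore_case.group(1), upper.group(1), single_space)
```

Either fix (matching the case of the literal, or turning off case sensitivity) gives the same capture here; which one is available depends on the tool.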


Replies

replied on March 2, 2015

So if your text is "Document Description" and you want to find the word "Document", I would use the following pattern.

 

(\w+) +Description

 

The \w+ means one or more word characters (letters, digits, or underscores), and the parentheses mean grab only that part of the pattern.
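A quick way to see what \w+ does and does not capture is to try the pattern on a few strings. This is a sketch using Python's `re` module as an assumed stand-in for the tool's engine; the helper name `word_before` and the second and third sample strings are hypothetical.

```python
import re

# The pattern from the reply: a run of word characters, one or more
# literal spaces, then the word "Description".
pattern = re.compile(r"(\w+) +Description")

def word_before(text):
    """Return the captured word, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None

plain = word_before("Document Description")    # letters only
mixed = word_before("Form_42   Description")   # digits, underscore, several spaces
punct = word_before("O'Malley!! Description")  # "!!" is not a word character

print(plain, mixed, punct)
```

The last case finds nothing because \w+ has to sit directly before the spaces, and the punctuation in between breaks that.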

replied on March 2, 2015

Hi Nicholas! There are a few ways of doing this. If you know there is one space between the word "description" and the word you want, you can use this pattern:

 

  ([^\s]+)\sdescription

 

That is, "one or more of anything except a whitespace character, then a whitespace character, then the word description (capturing the part before the whitespace character)".

 

Alternatively you can do

 

  (\S+)\sdescription

 

where \S is the equivalent of [^\s].

 

Note that these patterns will also capture symbols and stuff, so they'll capture #O'Malley!! in

  Hello Mr. #O'Malley!! description
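The equivalence of [^\s] and \S, and the symbol-capturing behavior, can be checked with a short sketch. It assumes Python's `re` module rather than the tool's own engine, and the punctuation cleanup at the end is a hypothetical extra step that nobody in this thread proposed.

```python
import re
import string

text = "Hello Mr. #O'Malley!! description"

# Negated class and shorthand: both mean "any non-whitespace character".
negated_class = re.search(r"([^\s]+)\sdescription", text).group(1)
shorthand = re.search(r"(\S+)\sdescription", text).group(1)

# Hypothetical cleanup: trim punctuation from both ends of the capture
# while leaving the apostrophe inside the name alone.
cleaned = shorthand.strip(string.punctuation)

print(negated_class, shorthand, cleaned)
```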

 

You can also do something like

 

  (\S+)\s+description

 

if there is a potential of more than one space before the word 'description'.
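The difference between \s and \s+ shows up as soon as the gap is wider than one character. A minimal sketch, again assuming Python's `re` module and two made-up sample strings:

```python
import re

one_space = "Document description"
three_spaces = "Document   description"

exactly_one = re.compile(r"(\S+)\sdescription")   # exactly one whitespace character
one_or_more = re.compile(r"(\S+)\s+description")  # one or more whitespace characters

def capture(pattern, text):
    """Return the first capture group, or None when there is no match."""
    match = pattern.search(text)
    return match.group(1) if match else None

strict_narrow = capture(exactly_one, one_space)
strict_wide = capture(exactly_one, three_spaces)    # gap too wide for a single \s
relaxed_narrow = capture(one_or_more, one_space)
relaxed_wide = capture(one_or_more, three_spaces)

print(strict_narrow, strict_wide, relaxed_narrow, relaxed_wide)
```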

replied on March 2, 2015

Thank you for these responses.  Those didn't quite work only because I left out a bit of information. Zone OCR reads the text as follows:

         NAME 
NICHOLAS J MARTIN 
         DESCRIPTION

 

I am trying to grab the last name separately from everything else for metadata.  Not sure the best way to do this.  Some employees have middle initial, some don't and some have full middle name, so the inconsistencies make it difficult.  So I thought grabbing any 1 word before description would work but description is on a separate line.  I hope I am making sense. :)
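For the inconsistent middle names, capturing only the last non-whitespace run before DESCRIPTION sidesteps the problem, since whatever comes earlier on the name line is ignored. Here is a sketch under stated assumptions: Python's `re` module stands in for the tool's engine, the `zone` and `last_name` helpers are hypothetical, and the name variants other than the one posted above are invented for illustration.

```python
import re

# Last non-whitespace run before DESCRIPTION; \s+ absorbs the line break.
LAST_WORD = re.compile(r"(\S+)\s+DESCRIPTION")

def zone(name):
    """Build text laid out like the OCR output above (hypothetical helper)."""
    return f"         NAME \n{name} \n         DESCRIPTION"

def last_name(ocr_text):
    """Return the word directly before DESCRIPTION, or None if absent."""
    match = LAST_WORD.search(ocr_text)
    return match.group(1) if match else None

# No middle name, middle initial, full middle name.
names = ["NICHOLAS MARTIN", "NICHOLAS J MARTIN", "NICHOLAS JAMES MARTIN"]
results = [last_name(zone(name)) for name in names]

print(results)
```

One limitation of this approach: a surname made of several words would come back as its final word only.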

replied on March 2, 2015

Yes Sir!  That worked like a charm.  Thank you so much.
