Question

Regular Expression Needed That Matches Pattern Until Line Break With No Text

asked on January 24, 2018

I have unstructured Laserfiche (LF) documents from Outlook e-mails that can contain multiple lines of text. I need a pattern that matches all lines until it reaches a blank line. The match will then be put into a Laserfiche field. Here is an example:

Corporation A and Corporation B (this last one if applicable) need the monthly 2017 statements for all custodians printed and sent to their office in Miami via FedEx. Note the windup that went down in November 2017. I think we can spread this out over the course of this week. Please start gathering the info and printing out. If you have any questions, please let me know.

Jeffrey Lewis, CRM CIP MLS
Records Management Program Manager

I only want a regular expression that grabs the first paragraph, because I don't want too much data in the field. I figure good practice would be to stop at the first blank line, but nothing I have tried has worked.

Thanks!
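A minimal sketch of "stop at the first blank line", tested with Python's re module rather than Laserfiche Workflow's own regex engine, whose behavior may differ, so the pattern should be re-checked in Workflow. It assumes lines end in \r\n or \n and treats a line holding only spaces or tabs as blank; the function name and the abbreviated sample e-mail are hypothetical. It only helps if the stored text really contains an empty line between the paragraph and the signature.

```python
import re

# Lazily capture everything up to the first line that is empty or holds only
# spaces/tabs. (?s) lets "." match line breaks; \r?\n accepts CRLF or LF.
FIRST_PARAGRAPH = re.compile(r"(?s)^(.*?)\r?\n[ \t]*\r?\n")

def first_paragraph(text: str) -> str:
    """Return the text before the first blank line, or the whole text if none."""
    match = FIRST_PARAGRAPH.search(text)
    return match.group(1) if match else text

# Abbreviated, hypothetical version of the e-mail above (CRLF line endings).
sample = (
    "Corporation A and Corporation B need the monthly 2017 statements.\r\n"
    "Please start gathering the info and printing out.\r\n"
    "\r\n"
    "Jeffrey Lewis, CRM CIP MLS\r\n"
    "Records Management Program Manager\r\n"
)
print(first_paragraph(sample))
```

The lazy (.*?) is what makes it stop at the first blank line rather than the last one.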

Replies

replied on January 24, 2018

I'm not a regex expert by any means, so I don't know if this will work for more than just your initial sample, but I was able to get a decent result with the following: (?s)(.+)(?:\n\r)

Edit: Actually, I don't think this will work.

replied on January 24, 2018

Thanks! That works for the example above, but in other e-mails there can be text below the signature, and when there is, that regex doesn't work.
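A minimal Python sketch of why the first pattern over-captures (Python's re is used only because it is easy to run; Workflow's engine may differ). The sample e-mail is hypothetical and assumes every line ends in \r\n. In such text, \n\r can only occur where one line break is immediately followed by another, i.e. at a blank line, and the greedy (.+) backtracks from the end of the text to the last such spot, so the signature ends up in the capture whenever more text follows it.

```python
import re

# Pattern from the first reply: (?s) lets "." match line breaks, and the
# greedy (.+) backtracks from the end of the text to the LAST "\n\r".
GREEDY = re.compile(r"(?s)(.+)(?:\n\r)")

# Hypothetical e-mail with CRLF line endings and text below the signature.
email = (
    "First paragraph, line 1.\r\n"
    "First paragraph, line 2.\r\n"
    "\r\n"
    "Signature line\r\n"
    "\r\n"
    "Text below the signature\r\n"
)

captured = GREEDY.search(email).group(1)
print(repr(captured))
# 'First paragraph, line 1.\r\nFirst paragraph, line 2.\r\n\r\nSignature line\r'
```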

replied on January 25, 2018

This will capture all the text up to the first line break characters.
(?s)(.+?)\r\n
This is almost what the other Andy posted, but the second question mark makes the match non-greedy (lazy), meaning it stops capturing at the first line break instead of the last.
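A quick check of this pattern with Python's re (not Workflow's engine; both samples are hypothetical). When the paragraph is stored as one long line, the first \r\n is the end of the paragraph; when every wrapped line ends in \r\n, as the later replies describe for OCR'd text, the lazy match stops after the first line.

```python
import re

# (?s) lets "." span line breaks; the lazy (.+?) stops at the FIRST "\r\n".
LAZY_CRLF = re.compile(r"(?s)(.+?)\r\n")

# Hypothetical samples: a paragraph typed as one line vs. OCR-style text
# in which every wrapped line ends with "\r\n".
typed = "Please start gathering the info and printing out.\r\n\r\nSignature\r\n"
ocr = "Please start gathering the info\r\nand printing out.\r\n\r\nSignature\r\n"

print(LAZY_CRLF.search(typed).group(1))  # Please start gathering the info and printing out.
print(LAZY_CRLF.search(ocr).group(1))    # Please start gathering the info
```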

replied on January 25, 2018

Interestingly, when I put that into RAD Software Regular Expression Designer it worked, but in my workflow it didn't: the regex gave me only the first line instead of all the data up to the blank line. I was able to work around it by replacing the \r\n in the pattern with the company name, since that always appears in the e-mail signature, and that gave me the desired result. It captures slightly more data than needed, but it works.
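A sketch of that workaround in Python, assuming a hypothetical company name "Example Corp" and a hypothetical signer "Jane Doe"; in Workflow the literal name would simply be typed into the pattern. Because the capture runs up to the company name, the signer's name comes along too, which matches the "slightly more data" noted above.

```python
import re

# Hypothetical stand-in for the company name in the e-mail signature.
COMPANY = "Example Corp"

# Lazily capture everything before the first occurrence of the company name.
# re.escape() protects characters such as "." or "&" in a real name.
BEFORE_COMPANY = re.compile(r"(?s)^(.*?)" + re.escape(COMPANY))

ocr = (
    "Please start gathering the info\r\n"
    "and printing out.\r\n"
    "\r\n"
    "Jane Doe\r\n"
    "Example Corp\r\n"
    "\r\n"
    "Text below the signature\r\n"
)

match = BEFORE_COMPANY.search(ocr)
captured = match.group(1) if match else ocr  # fall back to the whole text
print(repr(captured))
```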

I think the problem has to do with how the text is OCR'd: if I type the text into Workflow, the pattern works, but if I paste it from the OCR'd document in Laserfiche, it doesn't. I have attached screenshots of my token editor that identify the culprit, but why it happens or how to fix it (aside from changing the regular expression as I did) baffles me.

Attachments: all matches.jpg, first match.jpg

replied on January 25, 2018

You are right: OCR text does come back as single lines, each ending with \r\n, with no paragraph delimiters. That makes sense for OCR. The only other option I can think of, besides using the company name as a delimiter as you are, is to return just the first X sentences.
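One way to approximate "the first X sentences" with a regex, sketched in Python. It assumes a sentence ends at ., ! or ?, so abbreviations ("Inc.") or decimals ("2.5") will cut it short; the helper name first_sentences and the default of two sentences are hypothetical. Because the negated class [^.!?] also matches \r and \n, the per-line breaks in OCR text do not end a sentence early.

```python
import re

def first_sentences(text: str, n: int = 2) -> str:
    """Return up to the first n sentences of text (hypothetical helper).

    A "sentence" is any run of characters other than . ! ? followed by one
    of them, so carriage returns and line feeds inside it are kept.
    """
    pattern = re.compile(rf"^(?:[^.!?]*[.!?]\s*){{1,{n}}}")
    match = pattern.match(text)
    # No sentence terminator at all: return the text unchanged.
    return match.group(0).strip() if match else text

ocr = (
    "Corporation A and Corporation B need the monthly 2017 statements\r\n"
    "printed and sent via FedEx.  Note the windup that went down in\r\n"
    "November 2017.  I think we can spread this out over the week.\r\n"
)
print(first_sentences(ocr))
```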

replied on June 10, 2020

I was going to say the same as Andrew: your best bet might be to return a fixed number of characters.
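A sketch of the character-limit approach in Python. The helper name first_chars and the 300-character default are hypothetical (size the limit to the target field). The lookahead makes the greedy .{1,limit} back off to the last whitespace so a word is not cut in half, and plain slicing is the fallback when no such break exists.

```python
import re

def first_chars(text: str, limit: int = 300) -> str:
    """Return at most `limit` characters, ending at a word break if possible."""
    # (?s) lets "." include line breaks; .{1,limit} is greedy, and the
    # lookahead (?=\s|$) makes it back off until whitespace or the end follows.
    match = re.match(rf"(?s).{{1,{limit}}}(?=\s|$)", text)
    return match.group(0) if match else text[:limit]

print(first_chars("word " * 100, 12))  # word word
```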
