I'll admit up front that regex is not my strength. I've been working on the regex below, which is supposed to look ahead up to 16 characters and then determine whether an SSN is present, based on these criteria:
^(?=.{9,16}$)(\d{3}+[ |-]{1,3}+\d{2}+[ |-]{1,3}+\d{4})$
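A few parts of that pattern look likely to cause trouble, depending on the regex flavor. `\d{3}+` is a possessive quantifier in flavors that support it (PCRE, Java) and a syntax error in flavors that don't (JavaScript, for example, or Python before 3.11). Inside a character class `|` is a literal pipe, not alternation, so `[ |-]` also accepts `|`. And because the body can only be 11 to 15 characters long (3 + 1..3 + 2 + 1..3 + 4), the `(?=.{9,16}$)` lookahead never rejects anything the body would accept. Here is a minimal Python sketch of the whole-string check with those points addressed, assuming separators are only spaces and hyphens (`SSN_FULL` and `is_ssn` are illustrative names):

```python
import re

# Sketch of the whole-string check: plain (non-possessive) quantifiers, and a
# class containing only space and hyphen. fullmatch() plays the role of ^...$.
# No length lookahead is needed: 3 + (1..3) + 2 + (1..3) + 4 caps a match at 15.
SSN_FULL = re.compile(r"\d{3}[ -]{1,3}\d{2}[ -]{1,3}\d{4}")


def is_ssn(value: str) -> bool:
    """Return True if the entire string is an SSN with 1-3 space/hyphen separators."""
    return SSN_FULL.fullmatch(value) is not None


for sample in ["123-45-6789", "123 - 45 - 6789", "123|45|6789", "123456789"]:
    print(repr(sample), is_ssn(sample))
```

Note that this only answers "is this whole string an SSN?"; the `^` and `$` anchors in the original pattern mean it can't find an SSN sitting in the middle of a page of text.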
Based on an assessment of the LF pages in our repository, I've seen various stray characters that keep simple SSN searches from achieving a high degree of accuracy. \d{3}-\d{2}-\d{4} works, but if the scanned document or converted LF page contains a space before or after a dash, it fails. I would like to use the regex above, if possible, to reduce some of these misses.
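If the only OCR problem is a stray space next to a dash, a narrower fix is to keep the dashes required and make a single space on either side optional. A Python sketch under that assumption (`DASH_TOLERANT` and the sample text are made up for illustration); `(?<!\d)` and `(?!\d)` keep it from matching inside a longer run of digits:

```python
import re

# Hypothetical variant: a dash is still required between parts, but at most
# one space on either side of each dash is tolerated.
DASH_TOLERANT = re.compile(r"(?<!\d)\d{3} ?- ?\d{2} ?- ?\d{4}(?!\d)")

text = "A: 123-45-6789  B: 123 -45- 6789  C: 123 - 45 - 6789  D: 123 45 6789"
print(DASH_TOLERANT.findall(text))
```

Because every part has a fixed or bounded width, a match here is at most 15 characters.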
I have come up with the following, but I would like to perform a lookahead to make certain that the string being matched is at most 16 characters long. We also get false positives at times, which I'm trying to avoid.
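For searching inside page text (rather than validating a whole string), one way to get both the length limit and fewer false positives is to drop `^`/`$` and the lookahead entirely: bounded quantifiers already cap the match at 15 characters, and digit lookarounds stop it from matching part of a longer number such as an account number. A Python sketch, with `SSN_IN_TEXT`, `find_ssns`, and the sample page all hypothetical:

```python
import re

# Unanchored search version: (?<!\d) and (?!\d) stand in for ^ and $ so the
# match can sit anywhere in the page text, but not inside a longer digit run.
SSN_IN_TEXT = re.compile(r"(?<!\d)\d{3}[ -]{1,3}\d{2}[ -]{1,3}\d{4}(?!\d)")


def find_ssns(text: str) -> list[str]:
    """Return every SSN-shaped substring found in text."""
    return [m.group() for m in SSN_IN_TEXT.finditer(text)]


page = "SSN: 123 - 45 - 6789, acct 9123-45-67890, ref 555-12-3456."
print(find_ssns(page))
```

The account number `9123-45-67890` is skipped because the candidate `123-45-6789` inside it is preceded by a digit.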
(\d{3})+([ |-]{1,3})+(\d{2})+([ |-]{1,3})+(\d{4}) (this works, but doesn't limit length)
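The reason this version doesn't limit length is the `+` after each group: it means "one or more repetitions of that group", so `(\d{3})+` can consume 3, 6, 9, ... digits and `([ |-]{1,3})+` can consume any number of separator characters. A small Python comparison of that pattern against the same structure without the trailing `+` (the names `LOOSE` and `STRICT` are just for the example):

```python
import re

# The pattern from the question: "+" after each group repeats the whole group.
LOOSE = re.compile(r"(\d{3})+([ |-]{1,3})+(\d{2})+([ |-]{1,3})+(\d{4})")
# Same structure with the "+" removed (and "|" dropped from the classes):
# each part now matches exactly once, so the length is bounded.
STRICT = re.compile(r"\d{3}[ -]{1,3}\d{2}[ -]{1,3}\d{4}")

samples = ["123-45-6789", "123456-12-1234", "123-4567-1234", "123 --- - 45-6789"]
for sample in samples:
    print(repr(sample), bool(LOOSE.fullmatch(sample)), bool(STRICT.fullmatch(sample)))
```

Only the first sample is SSN-shaped, but the loose pattern accepts all four.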
Could someone please tell me what the problem with my regex is? Also, if someone has a better or more robust way of doing this, I'd really appreciate that information as well.
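On the "more robust" side, one option for cutting false positives is to filter regex candidates against numbers the Social Security Administration does not issue: area number 000, 666, or 900 to 999, group number 00, or serial number 0000. This can't distinguish a real SSN from an unrelated number of the same shape, so treat it as a filter, not proof. A Python sketch with hypothetical helper names (`plausible_ssn`, `find_ssns`), which also normalizes hits to `ddd-dd-dddd`:

```python
import re

# Search pattern with named groups for the three parts of the number.
SSN_IN_TEXT = re.compile(
    r"(?<!\d)(?P<area>\d{3})[ -]{1,3}(?P<group>\d{2})[ -]{1,3}(?P<serial>\d{4})(?!\d)"
)


def plausible_ssn(area: str, group: str, serial: str) -> bool:
    """Reject area 000/666/9xx, group 00, and serial 0000 (never issued)."""
    if area in ("000", "666") or area.startswith("9"):
        return False
    return group != "00" and serial != "0000"


def find_ssns(text: str) -> list[str]:
    """Return plausible SSNs found in text, normalized to ddd-dd-dddd."""
    hits = []
    for m in SSN_IN_TEXT.finditer(text):
        if plausible_ssn(m["area"], m["group"], m["serial"]):
            hits.append(f'{m["area"]}-{m["group"]}-{m["serial"]}')
    return hits


print(find_ssns("ids: 123 - 45 - 6789, 000-12-3456, 987-65-4321, 555 12 0000"))
```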