How can I express such a requirement using a Java regular expression?

I need to check that a file contains some amounts that match a specific format:
- between 1 and 15 characters (digits or ",")
- may contain at most one "," as the decimal separator
- must have at least one digit before the separator
- the amount is supposed to be in the middle of a string, bounded by alphabetical characters (but we have to exclude malformed files).
I currently have this:
\d{1,15}(,\d{1,14})?
But it does not meet the requirement, since it can match up to 30 characters (15 digits, the comma, then 14 more digits).
Unfortunately, for reasons that are too long to explain here, I cannot simply take a substring or use any other Java call. The match has to be done with a single, Java-compatible regular expression.

^(?=.{1,15}$)\d+(,\d+)?$
^ start of the string
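(?=.{1,15}$) positive lookahead to make sure that the total length of the string is between 1 and 15
\d+ one or more digit(s)
(,\d+)? optionally followed by a comma and one or more digits
$ end of the string. This one is still needed: the lookahead only checks the length, so without the final $, find() would also accept a partial match such as "12" in "12ab". It is redundant only with String.matches(), which already requires the whole input to match.
In a Java string literal, the backslashes have to be escaped: "^(?=.{1,15}$)\\d+(,\\d+)?$"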
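A small Java program makes it easy to check the anchored version. This is a minimal sketch; the class name WholeAmount and the sample inputs are illustrative assumptions. It also shows what happens when the final $ is dropped and find() is used instead of matches():

```java
import java.util.regex.Pattern;

public class WholeAmount {
    // Whole-string check from the answer above.
    static final Pattern AMOUNT = Pattern.compile("^(?=.{1,15}$)\\d+(,\\d+)?$");
    // Same pattern without the final $, for comparison.
    static final Pattern NO_END_ANCHOR = Pattern.compile("^(?=.{1,15}$)\\d+(,\\d+)?");

    static boolean isAmount(String s) {
        return AMOUNT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"123,45", "123456789012345", "1234567890123456", "1,2,3", ",5"};
        for (String s : samples) {
            System.out.println(s + " -> " + isAmount(s));
        }
        // The lookahead only checks the length: with find() and no final $,
        // a string that merely starts with digits is accepted.
        System.out.println(NO_END_ANCHOR.matcher("12ab").find()); // true
        System.out.println(AMOUNT.matcher("12ab").find());        // false
    }
}
```

With the samples above, only "123,45" and the 15-digit string are accepted; the 16-digit string, the second comma and the missing leading digit are all rejected.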
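Update: if you're looking for the amount in the middle of another string, use word boundaries \b instead of string boundaries (^ and $). Note that \b only matches between a word character and a non-word character; letters and digits are both word characters, so \b will not fire where an amount touches a letter directly.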
\b(?=[\d,]{1,15}\b)\d+(,\d+)?\b
For java:
"\\b(?=[\\d,]{1,15}\\b)\\d+(,\\d+)?\\b"
More readable version:
"\\b(?=[0-9,]{1,15}\\b)[0-9]+(,[0-9]+)?\\b"

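Since the question says the amount is bounded by alphabetical characters, it may be worth testing the \b version against such input. The sketch below (class and method names are hypothetical) compares it with an alternative that swaps \b for the lookarounds (?<![\d,]) and (?![\d,]), meaning "not preceded or followed by a digit or comma". That variant is my own substitution, not part of the answer above. With \b, the lookahead can also stop at the boundary right after the comma, so an 18-character amount still passes; the lookaround version checks the length of the whole digit/comma run.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AmountCheck {
    // Word-boundary version from the answer above.
    static final Pattern WORD_BOUNDARY =
            Pattern.compile("\\b(?=[\\d,]{1,15}\\b)\\d+(,\\d+)?\\b");

    // Hypothetical alternative: "not preceded/followed by a digit or comma"
    // instead of \b, so letters directly next to the amount are allowed and
    // the length check covers the whole digit/comma run.
    static final Pattern LOOKAROUND =
            Pattern.compile("(?<![\\d,])(?=[\\d,]{1,15}(?![\\d,]))\\d+(?:,\\d+)?(?![\\d,])");

    // Returns the first amount found, or null if there is none.
    static String firstAmount(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "ABC123,45DEF",                  // amount glued to letters
            "amount 123,45 here",            // amount separated by spaces
            "amount 1234567890123,4567 here" // 18 characters: too long
        };
        for (String s : inputs) {
            System.out.println(s + " -> \\b: " + firstAmount(WORD_BOUNDARY, s)
                    + ", lookaround: " + firstAmount(LOOKAROUND, s));
        }
    }
}
```

For "ABC123,45DEF" the \b version finds nothing while the lookaround version finds "123,45"; for the 18-character amount it is the other way round.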
Related

Regex + sign followed by numbers

Hi, I want to find strings like "+19" in Java,
i.e. a + sign followed by any number of digits.
How do I do this?
Tried "+[0123456789]"
and "\+[0123456789]"
thank you :)

This is the regex you want to use:
\\+\\d+
Two kinds of plus are used here. The first is escaped with two backslashes because it must be treated as a literal. The second is a quantifier meaning "one or more times" (i.e. match any digit one or more times).
Code:
public class Main {
    public static void main(String[] args) {
        String input = "+19";
        if (input.matches("\\+\\d+")) {
            System.out.println("input string matches");
        }
    }
}

Yes, to match a plus you need to escape it, which takes two backslashes in a Java string literal. A literal plus needs to be either escaped or put into a character class, [+]. If you just use a plus symbol, it becomes a quantifier that matches the previous symbol or group one or more times.
Also, note that the \d shorthand digit class can match more than just ASCII digits if Pattern.UNICODE_CHARACTER_CLASS flag is passed to Pattern.compile (or embedded (?U) flag is added at the start of the pattern). It is advised to use unambiguous patterns in case the code might be maintained or enhanced/adjusted by different developers later.
Most people prefer patterns that need no escaping backslashes, if possible, since that avoids issues like the one you faced.
Here is a version of the regex that does not require any escaping:
"[+][0-9]+"
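Also, the plus quantifier cannot match a truly infinite number of digits: it is bounded by the length of the input, and a Java String holds at most Integer.MAX_VALUE characters (Java has no unsigned int).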
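The points above can be checked with a short program. This is a minimal sketch; the class name and the Arabic-Indic sample input are assumptions chosen for illustration. It compares the escaped and character-class patterns, shows the effect of the embedded (?U) flag on \d, and shows that an unescaped leading + is rejected by Pattern.compile:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PlusDigits {
    static final Pattern ESCAPED = Pattern.compile("\\+\\d+");
    static final Pattern CHAR_CLASS = Pattern.compile("[+][0-9]+");
    static final Pattern UNICODE_DIGITS = Pattern.compile("(?U)\\+\\d+");

    // true if the pattern can be compiled, false if Java rejects it
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String ascii = "+19";
        String arabicIndic = "+\u0661\u0669"; // '+' followed by Arabic-Indic digits one and nine
        System.out.println(ESCAPED.matcher(ascii).matches());               // true
        System.out.println(CHAR_CLASS.matcher(ascii).matches());            // true
        System.out.println(ESCAPED.matcher(arabicIndic).matches());         // false
        System.out.println(UNICODE_DIGITS.matcher(arabicIndic).matches());  // true
        System.out.println(CHAR_CLASS.matcher(arabicIndic).matches());      // false
        // An unescaped leading + is a quantifier with nothing to repeat.
        System.out.println(compiles("+[0123456789]"));                       // false
    }
}
```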

regular expression in Java (Spring configuration) with 2 specific characters at the beginning

I need a regular expression that matches strings starting with 2 specific letters and exactly 28 characters long.
A regular expression is required because this goes into a Spring configuration, which only accepts a regular expression.
I've been trying this, but it's not working: (^[AK][28]*)

If you mean that the string should be like "AKxxxxxxxx" (28 characters in total), then you can use:
^AK.{26}$ (using 26, since AK already accounts for 2 of the 28 characters)

Regex is nothing specific to Java, nor is it that difficult if you have a look at any tutorial (and there are plenty!).
To answer your question:
AK[a-zA-Z]{26}
The above regex should solve your issue regarding a 28 character String with the first two letters being AK.
Elaboration:
AK: characters written as such, without any special characters, are matched as is (they must appear exactly where they are written, in exactly that form).
[a-zA-Z]: square brackets define a set of characters to be matched against one character of the input by default. You can list all the allowed characters or use ranges and shorthands (e.g. a-z, or \d for digits).
{26}: for each set you can define a repetition count, which says how often the set can or must be applied. E.g. {26} means it must match exactly 26 times. Other possibilities are {2,26}, meaning at least 2 and at most 26 times (in Java a space after the comma is a syntax error), or the operators *, + and ?, which match the set 0 or more times, at least once, or 0 or 1 time respectively.
If you need it to match a whole line, you would likely want to add ^ and $ at the beginning and end respectively, to tell the regex engine that it has to match a whole line/String and not just a part:
^AK[a-zA-Z]{26}$
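Both answers can be tried with String.matches, which tests the regex against the whole value. This is a sketch with made-up sample strings and hypothetical method names; whether your Spring setting applies the regex as a whole-string or partial match is not something it covers, so the anchors ^ and $ are kept.

```java
public class AkPrefix {
    // Any 26 characters after AK (28 in total).
    static boolean anyTail(String s) {
        return s.matches("^AK.{26}$");
    }

    // Only ASCII letters after AK (28 in total).
    static boolean letterTail(String s) {
        return s.matches("^AK[a-zA-Z]{26}$");
    }

    public static void main(String[] args) {
        String letters = "AK" + "abcdefghijklmnopqrstuvwxyz"; // 2 + 26 = 28 characters
        String digits = "AK" + "12345678901234567890123456";  // 2 + 26 = 28 characters
        System.out.println(anyTail(letters) + " " + letterTail(letters)); // true true
        System.out.println(anyTail(digits) + " " + letterTail(digits));   // true false
        System.out.println(anyTail(letters + "x"));                        // false (29 characters)
    }
}
```

The choice between .{26} and [a-zA-Z]{26} depends on what the remaining 26 characters may contain; the question does not say.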

If you need to count the number of repetitions, use the {min,max} syntax. Omitting both the comma and max tells the regex parser to look for exactly min repetitions.
For example:
.{1,3} will match any sequence of 1 to 3 characters (the dot stands for any character).
[AK]{2} will match exactly 2 characters that are each either A or K:
AK, AA, KA or KK.
Additionally, your regex uses [AK]. This means it will match exactly one of the characters given, i.e. A or K. Likewise, [28]* does not mean 28 characters: it matches any run of the characters 2 and 8.
If you need to match the specific "AK" sequence, then you need to get rid of the '[' ']' tokens.
Therefore your regex could be AK.{26}, meaning it will match AK followed by exactly 26 characters (28 in total).
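A few of these quantifier cases, including the original attempt ^[AK][28]*, can be checked with String.matches (which requires the whole string to match). The constant names are illustrative:

```java
public class QuantifierDemo {
    static final String TWO_A_OR_K = "[AK]{2}";          // exactly two characters, each A or K
    static final String LITERAL_AK = "AK";               // the literal sequence A then K
    static final String ONE_TO_THREE = ".{1,3}";         // any 1 to 3 characters
    static final String ORIGINAL_ATTEMPT = "^[AK][28]*"; // one A or K, then any run of the characters 2 and 8

    public static void main(String[] args) {
        for (String s : new String[] {"AK", "AA", "KA", "KK", "AKA"}) {
            System.out.println(s + " matches [AK]{2}: " + s.matches(TWO_A_OR_K));
        }
        System.out.println("KA".matches(LITERAL_AK));          // false
        System.out.println("abc".matches(ONE_TO_THREE));       // true
        System.out.println("abcd".matches(ONE_TO_THREE));      // false
        System.out.println("A2882".matches(ORIGINAL_ATTEMPT)); // true
        System.out.println("AK".matches(ORIGINAL_ATTEMPT));    // false
    }
}
```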

Parse content-page using Regex?

I'm writing a Java code using regex to parse a content-page extracted from a PDF document.
In a string, the regex must match up to three digits, followed by one or more spaces, followed by one or more words [word: any sequence of characters]. And vice versa (word(s), space(s), digit(s)); all the parts must be present in the string. It should also allow leading spaces and be case insensitive.
The extracted content-page could look something like this:
Directors’ responsibilities 8
Corporate governance 9
Remuneration report 10
The numbering style is not consistent and the number of spaces between the number and the text varies, so it could also look like:
01 Contents
02 Strategy and highlights
04 Chairman’s statement
The regex I'm using matches any number of words followed by any number of spaces and then a number of no more than 3 digits:
(?i)([a-z\\s])*[0-9]{1,3}(?i)
It works, but not quite well, and I can't tell what I'm doing wrong. I also wish there were a way to detect both numbering styles (page number to the left or to the right of the text) instead of repeating the regex with the order flipped.
Cheers

If you want to match phrases, you should include any punctuation you want to match in your regex. AFAIK there is no way in regex to say that a part may come "before or after", so you should write the flipped version as well and join the two with |. Something along the lines of:
[a-zA-Z'".,!\s]+\d{1,3}|\d{1,3}[a-zA-Z'".,!\s]+
Also, you don't need two instances of (?i): the flag applies from where it appears until the end of the pattern (or of the enclosing group), or until it is turned off with (?-i).
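Below is a sketch applying that alternation with String.matches to the sample lines from the question; the class and method names are hypothetical. One assumption goes beyond the answer: the sample lines use the typographic apostrophe ’ (U+2019), which is not in [a-zA-Z'".,!\s], so it is added to both character classes here. Without it, "Directors’ responsibilities 8" would not match as a whole line.

```java
import java.util.regex.Pattern;

public class ContentsAlternation {
    // The answer's alternation, with U+2019 (’) added to both character
    // classes as an assumption about the input.
    static final Pattern LINE = Pattern.compile(
        "[a-zA-Z'\u2019\".,!\\s]+\\d{1,3}|\\d{1,3}[a-zA-Z'\u2019\".,!\\s]+");

    // matches() requires the whole (trimmed) line to fit one of the two orders.
    static boolean isEntry(String line) {
        return LINE.matcher(line.trim()).matches();
    }

    public static void main(String[] args) {
        String[] lines = {
            "Directors\u2019 responsibilities 8",
            "02 Strategy and highlights",
            "Remuneration report 10",
            "Contents",
            "2023 Annual report"
        };
        for (String line : lines) {
            System.out.println(line + " -> " + isEntry(line));
        }
    }
}
```

The first three lines are accepted; "Contents" has no number, and "2023" has more than three digits, so both are rejected.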

You can use this pattern in multiline mode if there is always a number before or after each item:
"^(?:(?<nb1>\\d{1,3}) +)?(?<item>\\S+(?: +\\S+)*?)(?: +(?<nb2>\\d{1,3})|$)"
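Then take m.group("nb1"), or m.group("nb2") when nb1 is null, to obtain the number for each whole match. (Java group names are passed as double-quoted strings, and a group that did not participate returns null, so concatenating the two with + would produce text like "null8".)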
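But if you must check that there is at least one number, you have to repeat the whole pattern:
"^(?:(?<nb1>\\d{1,3}) +(?<item1>\\S+(?: +\\S+)*)|(?<item2>\\S+(?: +\\S+)*) +(?<nb2>\\d{1,3}))$"
Then (only one alternative takes part in a match, so the groups of the other one are null):
item = m.group("item1") != null ? m.group("item1") : m.group("item2");
nb = m.group("nb1") != null ? m.group("nb1") : m.group("nb2");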
Note: since the patterns are anchored at the beginning and at the end, you may have to allow some optional spaces to make them work: ^\\s* and \\s*$
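A sketch of the second approach in Java. It processes the text line by line with matches() and trim() instead of MULTILINE mode with ^\s* and \s*$; that is a substitution on my part, to keep \s from running across line breaks. The class name and the "number|item" output format are hypothetical. Note that the outer (?:...) group is closed before the $:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentsParser {
    // The answer's second pattern, with the outer (?:...) group closed before $.
    static final Pattern ENTRY = Pattern.compile(
        "^(?:(?<nb1>\\d{1,3}) +(?<item1>\\S+(?: +\\S+)*)"
        + "|(?<item2>\\S+(?: +\\S+)*) +(?<nb2>\\d{1,3}))$");

    // Returns "number|item" for each line that has a page number on either side.
    static List<String> parse(String page) {
        List<String> out = new ArrayList<>();
        for (String line : page.split("\\R")) {
            Matcher m = ENTRY.matcher(line.trim()); // trim() instead of ^\s* and \s*$
            if (m.matches()) {
                // Only one alternative participates, so the other groups are null.
                String nb = m.group("nb1") != null ? m.group("nb1") : m.group("nb2");
                String item = m.group("item1") != null ? m.group("item1") : m.group("item2");
                out.add(nb + "|" + item);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String page = "Directors\u2019 responsibilities 8\n"
                + "  01 Contents\n"
                + "Remuneration report   10\n"
                + "Notes without a number\n";
        parse(page).forEach(System.out::println);
    }
}
```

The line without a page number is skipped, and both numbering styles come out in the same form.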

Pattern/Regular expression to grab a number only if it's the only field in the record

This has been driving me crazy the past couple of days. I'm trying to kill two birds with one stone by validating a record and extracting a field at the same time. My strategy has been to do this with a regular expression:
private Pattern firstNumber = Pattern.compile("\\d{1}");
Which I understand to mean "the first number in the line (record)." So far this has been effective at grabbing the first field (and ensuring that it's a number), but I want to take this a step further:
How can I tweak the regexp to specify that I want the number only if it's the sole field?
That is, if the record is simply 10, I want to grab 10. But if the record is 10 4, I don't want to grab anything (as this is an invalid record for the project).
I tried:
private Pattern oneNumberOnly = Pattern.compile("\\d{1}\n");
But -- to my chagrin -- this (and any other permutation of it) does not pick up any numbers. Is there something I'm missing here?

You can denote beginning of line/string with ^ and end of line/string with $, so the pattern would be
^\d+$
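The {1} won't work because it excludes anything with more than one digit, such as 10. Using \d+ means one or more digits. In Java, \d matches only the ASCII digits 0-9 by default (it also matches other Unicode digits when the UNICODE_CHARACTER_CLASS flag is set); it never matches a decimal point or a minus sign. If you want to be explicit about ASCII digits, replace \d with [0-9].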
Specifying {1} is always redundant, by the way, because by default an atom is matched once.
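A minimal sketch of the anchored approach, using [0-9] as suggested. The class and method names are hypothetical, and it assumes each record is passed in as its own string (e.g. one line read without its line terminator):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SoleNumber {
    static final Pattern SOLE_NUMBER = Pattern.compile("^[0-9]+$");

    // Returns the number if it is the only field in the record, otherwise null.
    static String soleNumber(String record) {
        Matcher m = SOLE_NUMBER.matcher(record);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(soleNumber("10"));   // 10
        System.out.println(soleNumber("10 4")); // null
        System.out.println(soleNumber("-10"));  // null
        System.out.println(soleNumber("1.5"));  // null
        // The original \d{1} pattern only ever grabs a single digit:
        Matcher m = Pattern.compile("\\d{1}").matcher("10 4");
        System.out.println(m.find() ? m.group() : null); // 1
    }
}
```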

You can use the start-of-line and end-of-line anchors. If you are trying to grab a number that is on its own line, you can use:
Pattern.compile("^(\\d)++$");
By adding the {1} you will only get 1 digit of a number. You should also trim the string you are comparing against to get rid of any extra whitespace.
^ - Start of line character
\\d - digit character [0-9]
+ - 1 or more characters that match \d
+ - possessive (grabs all the digits and never backtracks, so a failing match gives up sooner than with a greedy quantifier)
$ - End of line character
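Two details of this answer are easy to check in code: a repeated capturing group such as (\d)++ only keeps its last repetition, so group(1) holds a single digit (putting the quantifier inside the group avoids that), and a possessive quantifier never gives characters back. A sketch with hypothetical names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PossessiveDemo {
    static final Pattern REPEATED_GROUP = Pattern.compile("^(\\d)++$");
    static final Pattern GROUPED_RUN = Pattern.compile("^(\\d++)$");

    // group(1) of the pattern, or null if it does not match the whole trimmed record
    static String group1(Pattern p, String record) {
        Matcher m = p.matcher(record.trim());
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // A repeated capturing group only keeps its last repetition.
        System.out.println(group1(REPEATED_GROUP, " 10 ")); // 0
        System.out.println(group1(GROUPED_RUN, " 10 "));    // 10
        System.out.println(group1(GROUPED_RUN, "10 4"));    // null
        // Possessive quantifiers never give characters back:
        System.out.println("11".matches("\\d+1"));  // true: greedy \d+ backtracks one digit
        System.out.println("11".matches("\\d++1")); // false: possessive \d++ keeps both digits
    }
}
```

If you only need the whole number, m.group() (group 0) works with either pattern.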

Regex to detect number within String

I'm confronted with a String:
[something] -number OR number [something]
I want to be able to cast the number. I do not know at which position it occurs, and I cannot build a substring because there's no obvious separator.
Is there any method how I could extract the number from the String by matching a pattern like
[-]?[0..9]+
, where the minus is optional? The String can contain special characters, which is what drives me crazy when defining a regex.

-?\b\d+\b
That's broken down by:
-? (optional minus sign)
\b word boundary
\d+ 1 or more digits
[EDIT 2] - nod to Alan Moore
Unfortunately Java doesn't have verbatim strings, so you'll have to escape the regex above as:
String regex = "-?\\b\\d+\\b";
I'd also recommend a site like http://regexlib.com/RETester.aspx or a program like Expresso to help you test and design your regular expressions
[EDIT] - after some good comments
I haven't done something like .*?(-?\d+).* (from @Voo) because I wasn't sure whether you wanted to match the entire string or just the digits. Both versions tell you whether there are digits in the string; if you want the actual digits, use the first regex and look at group 0 (the whole match); with the .*? version the number would be in group 1. There are clever ways to name groups or use multiple captures, but that would be a complicated answer to a straightforward question...
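A sketch that uses the first regex with find() in a loop and parses each match with Integer.parseInt, assuming every number fits in an int. The class name and sample strings are illustrative. The last sample shows a limit of \b: a number glued to letters is not found, because letters and digits are both word characters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NumberInString {
    static final Pattern NUMBER = Pattern.compile("-?\\b\\d+\\b");

    // Returns every number found in s, parsed as an int (assumes it fits in an int).
    static List<Integer> numbers(String s) {
        List<Integer> out = new ArrayList<>();
        Matcher m = NUMBER.matcher(s);
        while (m.find()) {
            out.add(Integer.parseInt(m.group())); // group() is group 0, the whole match
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(numbers("[some$thing] -42")); // [-42]
        System.out.println(numbers("17 [x#y!]"));        // [17]
        System.out.println(numbers("abc12"));            // []: no boundary between c and 1
    }
}
```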

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

How can I express such requirement using Java regular expression? - java

Related

Regex + sign followed by numbers

regular expression in Java (Spring configuration) with 2 specific characters in begining

Parse content-page using Regex?

Pattern/Regular expression to grab a number only if it's the only field in the record

Regex to detect number within String

Categories

Resources
