Pattern Matching - String Search


I am trying to work out a pattern that matches any of the following:
input string example:
'444'/'443'/'434'/'433'/'344'/'334'/'333'
If any of the patterns above occurs in a given input string, I want to treat it as a match for the same pattern.
Also, is it possible to do variable substitution with a regex? That is, can I treat each of the 3 characters as a variable and increment/decrement it, so that I don't have to hardcode the number ranges for each different pattern?
Is there a good library for this? I have been working with the Pattern class in Java.
If you have any helpful links, please pass them along :)
Thank you.

Let's first consider this pattern: [34]{3}
The […] is a character class, it matches exactly one of the characters in the set. The {n} is an exact finite repetition.
So, [34]{3} informally means "exactly 3 of either '3' or '4'". Thus, it matches "333", "334", "343", "344", "433", "434", "443", "444", and nothing else.
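As a quick check of the claim above, here is a minimal sketch that searches an input string for the first run matching [34]{3}. The class and method names (ThreeOrFourDemo, firstMatch) are made up for illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ThreeOrFourDemo {
    // Returns the first run of three characters drawn from {'3', '4'},
    // or null when the input contains no such run.
    static String firstMatch(String input) {
        Matcher m = Pattern.compile("[34]{3}").matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("id-434-x")); // 434
        System.out.println(firstMatch("345"));      // null
    }
}
```

Note that find() looks for the pattern anywhere in the input; use matches() instead if the whole string must consist of exactly those three characters.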
As a string literal, the pattern is "[34]{3}". If you don't want to hardcode this pattern, then just generate similar-looking strings that follow the template "[…]{n}". Just put the characters that you want to match in the …, and substitute n with the number you want.
Here's an example:
String alpha = "aeiou";
int n = 5;
String pattern = String.format("[%s]{%s}", alpha, n);
System.out.println(pattern);
// [aeiou]{5}
We've now seen that the pattern is not hardcoded, but rather programmatically generated depending on the values of the variables alpha and n. The pattern [aeiou]{5} will match 5 consecutive lowercase vowels, e.g. "ooiae", "ioauu", "eeeee", etc.
It's again not clear if you just want to match these kinds of strings, or if they have to appear like '…'/'…'/'…'/'…'/'…'. If the latter is desired, then simply compose the pattern as desired, using repetition and grouping as necessary. You can also just programmatically copy and paste the pattern 5 times if that's simpler. Here's an example:
String p5 = String.format("'%s'/'%<s'/'%<s'/'%<s'/'%<s'", pattern);
System.out.println(p5);
// '[aeiou]{5}'/'[aeiou]{5}'/'[aeiou]{5}'/'[aeiou]{5}'/'[aeiou]{5}'
This will now match strings like "'aeooi'/'eeiuu'/'uaooo'/'eeeia'/'eieio'".
Caveat
Do be careful about what goes in alpha. Specifically, -, [, ], &&, ^, etc., are special metacharacters in a Java character class definition. If you restrict alpha to contain only digits/letters, then you will probably not run into any problems, but e.g. [^a] does NOT mean "either '^' or 'a'". It in fact means "anything but 'a'". See java.util.regex.Pattern for the exact character class syntax.
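One way to sidestep the caveat is to backslash-escape everything in alpha that is not a letter or digit before building the class. The sketch below is only one possible approach; SafeCharClass and build are hypothetical names, and it assumes alpha is non-empty and contains no surrogate pairs.

```java
import java.util.regex.Pattern;

final class SafeCharClass {
    // Hypothetical helper: builds "[...]{n}" from alpha, putting a backslash
    // in front of every character that is not a letter or digit.
    // Letters and digits are left alone because a backslash before them
    // can change their meaning (for example \d or \1).
    static String build(String alpha, int n) {
        StringBuilder sb = new StringBuilder("[");
        for (char c : alpha.toCharArray()) {
            if (!Character.isLetterOrDigit(c)) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append("]{").append(n).append('}').toString();
    }

    public static void main(String[] args) {
        String regex = build("^a-", 2);
        System.out.println(regex);                        // [\^a\-]{2}
        System.out.println(Pattern.matches(regex, "a^")); // true
        System.out.println(Pattern.matches(regex, "bb")); // false
    }
}
```

With the escaping in place, '^' and '-' are treated as ordinary members of the set rather than as negation or a range.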

You can use the regex:
('\\d{3}'/){6}'\\d{3}'

Pattern.compile takes a String as its parameter. Though that's probably most often supplied in the form of a string literal, if you have variable upper and lower bounds for your pattern, you can use something like StringBuilder to build your string, then pass that result to Pattern.compile.
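A minimal sketch of that idea, assuming the bounds are single characters (digits or letters) with lo not greater than hi. The names RangePattern and build are invented for this example.

```java
import java.util.regex.Pattern;

final class RangePattern {
    // Hypothetical helper: n characters, each between lo and hi inclusive.
    // Assumes lo and hi are plain digits or letters and lo <= hi.
    static Pattern build(char lo, char hi, int n) {
        StringBuilder sb = new StringBuilder();
        sb.append('[').append(lo).append('-').append(hi).append(']');
        sb.append('{').append(n).append('}');
        return Pattern.compile(sb.toString());
    }

    public static void main(String[] args) {
        Pattern p = build('3', '4', 3);
        System.out.println(p.pattern());                // [3-4]{3}
        System.out.println(p.matcher("443").matches()); // true
        System.out.println(p.matcher("453").matches()); // false
    }
}
```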

Related

String replace with condition not be a subpart of word Java

String replace changes more than I want.
For example:
String input = "The blue house Theatres";
input = input.replace("the", "AAA");
The output will be:
"AAA blue house AAAatres"
I don't want to replace it when it is part of a larger word.

First you should try to use replaceAll(regex, replacement) instead of replace(literal, replacement) since the latter works on literals only, i.e. you can't use expressions, while the former uses regular expressions to find matches.
Next your regular expression should use word boundaries, e.g. \bthe\b where \b marks a word boundary.
Finally, if you want to do a case-insensitive replacement you'll need to either handle the possible cases in the expression (e.g. \b[tT]he\b) or switch the expression to case-insensitive mode by prepending it with (?i), i.e. (?i)\bthe\b. Note that the expression [tT]he would not match THE while the case-insensitive expression would, so depending on your requirements you'd need to choose one or the other.
Using all that you'd get input = input.replaceAll("(?i)\\bthe\\b", "AAA");.
Edit:
According to your comment on the question you don't want to use word boundaries but only look for characters before and after. You can achieve that with negative look-around expressions, e.g. (?i)(?<![a-z])the(?![a-z]). Note that I used the quite simple character class [a-z] here, if you need to exclude more characters you'd need to expand it.
The above expression would match !The, the, THE? etc. but not Theatre or aether etc. since it requires the match to not be preceded by a letter ((?<![a-z])) and not be followed by one ((?![a-z])).
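Here is a small sketch of the look-around version applied to the question's input. WholeWordReplace and replaceThe are names made up for this example.

```java
final class WholeWordReplace {
    // Replaces "the" in any letter case, but only when it is neither
    // preceded nor followed by a letter. Because of (?i), the class
    // [a-z] also covers uppercase letters.
    static String replaceThe(String input) {
        return input.replaceAll("(?i)(?<![a-z])the(?![a-z])", "AAA");
    }

    public static void main(String[] args) {
        System.out.println(replaceThe("The blue house Theatres")); // AAA blue house Theatres
        System.out.println(replaceThe("!The, the, THE?"));         // !AAA, AAA, AAA?
    }
}
```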

Use a regex with word boundaries \b:
String input = "The blue house Theatres";
input = input.replaceAll("\\bThe\\b", "AAA"); // Strings are immutable, so keep the returned value

Regular expression not working despite testing

I'm trying to validate an ID whose first two characters are letters and whose next four are digits. The digits may include zeros, e.g. 0333, but can never be all zeroes, so something like ID0000 is not allowed. The expression I came up with seems to check out when testing it online but doesn't seem to work when I use it in the program:
\b(?![A-Z]{2}[0]{4})[A-Z]{2}[0-9]{4}\b
and here's the code I'm currently using to implement it:
String pattern = "/\b(?![A-Z]{2}[0]{4})[A-Z]{2}[0-9]{4}\b/";
Pattern regEx = Pattern.compile(pattern);
String ingID = ingredID.getText().toString();
Matcher m = regEx.matcher(ingID);
if (m.matches()) {
ingredID.setError("Please enter a valid Ingrediant ID");
}
For some reason it doesn't seem to validate correctly with accepting ids like ID0000 when it shouldn't be. Any thoughts folks ?

Change your regex pattern to "\\b(?![A-Z]{2}[0]{4})[A-Z]{2}[0-9]{4}\\b"

Your problem is essentially that Java isn't all that Regex-friendly; you need to deal with the limitations of Java strings in order to create a string that can be used as a Regex pattern. Since \ is the escape character in Regex and the escape character in Java strings (and since there's no such thing as a raw string literal in Java), you must double-escape anything that must be escaped in the Regex in order to create a literal \ character within the Java string, which, when parsed as a Regex pattern, will be correctly treated as the escape character.
So, for instance, the Regex pattern /\b/ (where /, as mentioned in my comment, delimits the pattern itself) would be represented in Java as the string "\\b".
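To make the double escaping concrete, here is a minimal sketch of the validation with the corrected string. IdValidator and isValid are hypothetical names, not part of the asker's code.

```java
import java.util.regex.Pattern;

final class IdValidator {
    // "\\b" in Java source is the two characters \ and b, which the regex
    // engine reads as a word boundary. A single "\b" would be a backspace.
    private static final Pattern ID =
            Pattern.compile("\\b(?![A-Z]{2}[0]{4})[A-Z]{2}[0-9]{4}\\b");

    static boolean isValid(String id) {
        return ID.matcher(id).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("ID0333")); // true
        System.out.println(isValid("ID0000")); // false
        System.out.println("\b".length());     // 1, a single backspace character
    }
}
```

In the asker's handler, the error would presumably be set when isValid returns false, rather than when the match succeeds.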

How to use two types of regex in single regex?

I have a string field. I need to pass UUID string or digits number to that field.
So I want to validate this passing value using regex.
sample :
stringField = "1af6e22e-1d7e-4dab-a31c-38e0b88de807";
stringField = "123654";
For UUID I can use,
"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
For digits I can use
"\\d+"
Is there any way to use above 2 pattern in single regex

Yes, you can use | (OR) between those two regexes:
[\\da-f]{8}-[\\da-f]{4}-[\\da-f]{4}-[\\da-f]{4}-[\\da-f]{12}|\\d+

try:
"(?:[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})|(?:\\d+)"

You can group regular expressions with () and use | to allow alternatives.
So this will work:
(([0-9a-fA-F]){8}-([0-9a-fA-F]){4}-([0-9a-fA-F]){4}-([0-9a-fA-F]){4}-([0-9a-fA-F]){12})|(\\d+)
Note that I've adjusted your UUID regular expression a little to allow for upper case letters.

How are you applying the regex? If you use matches(), all you have to do is OR them together as @Anirudh said:
return myString.matches(
"[\\da-f]{8}-[\\da-f]{4}-[\\da-f]{4}-[\\da-f]{4}-[\\da-f]{12}|\\d+");
This works because matches() acts as if the regex were enclosed in a non-capturing group and anchored at both ends, like so:
"^(?:[\\da-f]{8}-[\\da-f]{4}-[\\da-f]{4}-[\\da-f]{4}-[\\da-f]{12}|\\d+)$"
If you use Matcher's find() method, you have to add the group and the anchors yourself. That's because find() returns a positive result if any substring of the string matches the regex. For example, "xyz123<>&&" would match because the "123" matches the "\\d+" in your regex.
But I recommend you add the explicit group and anchors anyway, no matter what method you use. In fact, you probably want to add the inline modifier for case-insensitivity:
"(?i)^(?:[\\da-f]{8}-[\\da-f]{4}-[\\da-f]{4}-[\\da-f]{4}-[\\da-f]{12}|\\d+)$"
This way, anyone who looks at the regex will be able to tell exactly what it's meant to do. They won't have to notice that you're using the matches() method and remember that matches() automatically anchors the match. (This will be especially helpful for people who learned regexes in a non-Java context. Almost every other regex flavor in the world uses the find() semantics by default, and has no equivalent for Java's matches(); that's what anchors are for.)
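The difference between matches() and find() can be seen in a short sketch like the one below. The class, constant, and method names (UuidOrDigits, BARE, ANCHORED, find) are made up for illustration.

```java
import java.util.regex.Pattern;

final class UuidOrDigits {
    static final String UUID_PART =
            "[\\da-f]{8}-[\\da-f]{4}-[\\da-f]{4}-[\\da-f]{4}-[\\da-f]{12}";
    // The bare alternation, with no group and no anchors.
    static final String BARE = UUID_PART + "|\\d+";
    // Explicit group, anchors, and case-insensitivity.
    static final String ANCHORED = "(?i)^(?:" + UUID_PART + "|\\d+)$";

    static boolean find(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    public static void main(String[] args) {
        String junk = "xyz123<>&&";
        System.out.println(junk.matches(BARE));       // false, matches() needs the whole string
        System.out.println(find(BARE, junk));         // true, "123" is enough for find()
        System.out.println(find(ANCHORED, junk));     // false
        System.out.println(find(ANCHORED, "123654")); // true
    }
}
```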
In case you're wondering, the group is necessary because alternation (the | operator) has the lowest precedence of all the regex constructs. This regex would match a string that starts with something that looks like a UUID or ends with one or more digits.
"^[\\da-f]{8}-[\\da-f]{4}-[\\da-f]{4}-[\\da-f]{4}-[\\da-f]{12}|\\d+$" // WRONG

Java: how to parse double from regex

I have a string that looks like "A=1.23;B=2.345;C=3.567"
I am only interested in "C=3.567"
what i have so far is:
Matcher m = Pattern.compile("C=\\d+.\\d+").matcher("A=1.23;B=2.345;C=3.567");
while(m.find()){
double d = Double.parseDouble(m.group());
System.out.println(d);
}
The problem is that it shows the 3 as separate from the 567.
Output:
3.0
567.0
I am wondering how I can include the decimal point so it outputs "3.567".
EDIT: I would also like to match C if it does not have a decimal point,
so I would like to capture 3567 as well as 3.567.
Since the C= is built into the pattern as well, how can I strip it out before parsing the double?

I may be mistaken on this part, but the reason it's separating the two is because group() will only match the last-matched subsequence, which is whatever gets matched by each call to find(). Thanks, Mark Byers.
For sure, though, you can solve this by placing the entire part you want inside a "capturing group", which is done by placing it in parentheses. This makes it so that you can group together matched parts of your regular expression into one substring. Your pattern would then look like:
Pattern.compile("C=(\\d+\\.\\d+)")
To parse either 3567 or 3.567, your pattern would be C=(\\d+(\\.\\d+)?), with group 1 representing the whole number. Also, do note that since you specifically want to match a period, you want to escape your . (period) character so that it's not interpreted as the "any-character" token. For this input, though, it doesn't matter.
Then, to get your 3.567, you would call m.group(1) to grab the first (counting from 1) specified group. This would mean that your Double.parseDouble call would essentially become Double.parseDouble("3.567").
As for taking C= out of your pattern, since I'm not that well-versed with RegExp, I might recommend that you split your input string on the semi-colons and then check to see if each of the splits contain the C; then you could apply the pattern (with the capturing groups) to get the 3.567 from your Matcher.
Edit For the more general (and likely more useful!) cases in gawi's comment, please use the following (from http://www.regular-expressions.info/floatingpoint.html)
Pattern.compile("[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?")
This has support for optional sign, either optional integer or optional decimal parts, and optional positive/negative exponents. Insert capturing groups where desired to pick out parts individually. The exponent as a whole is in its own group to make it, as a whole, optional.
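Putting the capturing-group approach together, a minimal sketch might look like this. ValueOfC and parseC are hypothetical names, and returning an OptionalDouble is just one way to signal "no C found".

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ValueOfC {
    // Group 1 holds the number after "C=", with an optional fractional part.
    private static final Pattern C_VALUE = Pattern.compile("C=(\\d+(\\.\\d+)?)");

    static OptionalDouble parseC(String input) {
        Matcher m = C_VALUE.matcher(input);
        if (m.find()) {
            // group(1) excludes the "C=" prefix, so it parses cleanly.
            return OptionalDouble.of(Double.parseDouble(m.group(1)));
        }
        return OptionalDouble.empty();
    }

    public static void main(String[] args) {
        System.out.println(parseC("A=1.23;B=2.345;C=3.567").getAsDouble()); // 3.567
        System.out.println(parseC("A=1;C=3567").getAsDouble());             // 3567.0
    }
}
```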

Your regular expression is only matching numeric characters. To match the decimal point as well, you will need:
Pattern.compile("\\d+\\.\\d+")
The . is escaped because this would match any character when unescaped.
Note: this will then only match numbers with a decimal point which is what you have in your example.

To match any sequence of digits and dots you can change the regular expression to this:
"(?<=C=)[.\\d]+"
If you want to be certain that there is only a single dot you might want to try something like this:
"(?<=C=)\\d+(?:\\.\\d+)?"
You should also be aware that this pattern can match the 1.2 in ABC=1.2.3;. You should consider if you need to improve the regular expression to correctly handle this situation.

if you need to validate decimal with dots, commas, positives and negatives:
Object testObject = "-1.5";
boolean isDecimal = Pattern.matches("^[\\+\\-]{0,1}[0-9]+[\\.\\,][0-9]+$", (CharSequence) testObject);
Good luck.

If you want a regex for an input that might be a double or just an integer without any trailing .0, you can use this: Pattern.compile("(-?\\d+\\.?\\d*)")

Why doesn't this Java regular expression work?

I need to create a regular expression that allows a string to contain any number of:
alphanumeric characters
spaces
(
)
&
.
No other characters are permitted. I used RegexBuddy to construct the following regex, which works correctly when I test it within RegexBuddy:
\w* *\(*\)*&*\.*
Then I used RegexBuddy's "Use" feature to convert this into Java code, but it doesn't appear to work correctly using a simple test program:
public class RegexTest
{
public static void main(String[] args)
{
String test = "(AT) & (T)."; // Should be valid
System.out.println("Test string matches: "
+ test.matches("\\w* *\\(*\\)*&*\\.*")); // Outputs false
}
}
I must admit that I have a bit of a blind spot when it comes to regular expressions. Can anyone explain why it doesn't work please?

That regular expression tests for any number of word characters (letters, digits, and underscores), followed by any number of spaces, followed by any number of open parens, followed by any number of close parens, followed by any number of ampersands, followed by any number of periods.
What you want is...
test.matches("[\\w \\(\\)&\\.]*")
As mentioned by mmyers, this allows the empty string. If you do not want to allow the empty string...
test.matches("[\\w \\(\\)&\\.]+")
Though that will also allow a string that is only spaces, or only periods, etc.. If you want to ensure at least one alpha-numeric character...
test.matches("[\\w \\(\\)&\\.]*\\w+[\\w \\(\\)&\\.]*")
So you understand what the regular expression is saying... anything within the square brackets ("[]") indicates a set of characters. So, where "a*" means 0 or more a's, [abc]* means 0 or more characters, all of which being a's, b's, or c's.
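A short sketch comparing the original sequence-style regex with the character-class version on the question's test string. AllowedChars and isAllowed are names invented for this example.

```java
import java.util.regex.Pattern;

final class AllowedChars {
    // Zero or more characters, each of which must be a word character,
    // a space, a parenthesis, an ampersand, or a period.
    private static final Pattern ALLOWED = Pattern.compile("[\\w \\(\\)&\\.]*");

    static boolean isAllowed(String s) {
        return ALLOWED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String test = "(AT) & (T).";
        System.out.println(test.matches("\\w* *\\(*\\)*&*\\.*")); // false, fixed order
        System.out.println(isAllowed(test));                      // true, any order
        System.out.println(isAllowed("AT#T"));                    // false, '#' not allowed
    }
}
```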

Maybe I'm misunderstanding your description, but aren't you essentially defining a class of characters without an order rather than a specific sequence? Shouldn't your regexp have a structure of [xxxx]+, where xxxx are the actual characters you want ?

The difference between your Java code snippet and the Test tab in RegexBuddy is that the matches() method in Java requires the regular expression to match the whole string, while the Test tab in RegexBuddy allows partial matches. If you use your original regex in RegexBuddy, you'll see multiple blocks of yellow and blue highlighting. That indicates RegexBuddy found multiple partial matches in your string. To get a regex that works as intended with matches(), you need to edit it until the whole test subject is highlighted in yellow, or if you turn off highlighting, until the Find First button selects the whole text.
Alternatively, you can use the anchors \A and \Z at the start and the end of your regex to force it to match the whole string. When you do that, your regex always behaves in the same way, whether you test it in RegexBuddy, or whether you use matches() or another method in Java. Only matches() requires a full string match. All other Matcher methods in Java allow partial matches.
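The partial-match behaviour can be reproduced in Java with find(). The sketch below uses \A and \z (the lowercase variant, which does not tolerate a trailing line terminator the way \Z does); PartialVsFull and its method names are made up for illustration.

```java
import java.util.regex.Pattern;

final class PartialVsFull {
    static final String ORIGINAL = "\\w* *\\(*\\)*&*\\.*";

    // Succeeds if any substring (even an empty one) matches.
    static boolean findAnywhere(String regex, String s) {
        return Pattern.compile(regex).matcher(s).find();
    }

    // Wraps the regex in \A(?:...)\z so find() must cover the whole string.
    static boolean findAnchored(String regex, String s) {
        return Pattern.compile("\\A(?:" + regex + ")\\z").matcher(s).find();
    }

    public static void main(String[] args) {
        String test = "(AT) & (T).";
        System.out.println(findAnywhere(ORIGINAL, test)); // true, partial match on "("
        System.out.println(findAnchored(ORIGINAL, test)); // false
        System.out.println(test.matches(ORIGINAL));       // false, same as anchored
    }
}
```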

the regex
\w* *\(*\)*&*\.*
will give you the items you described, but only in the order you described, and each one can be as many as wanted. So "skjhsklasdkjgsh((((())))))&&&&&....." works, but not mixing the characters.
You want a regex like this:
[\w \(\)&\.]+
which will allow a mix of all characters.
edit: my regex knowledge is limited, so the above syntax may not be perfect.
