Is it possible, in Java, to write a regex that matches up to the end of the string rather than stopping at a newline, using the Pattern.DOTALL option on input that contains \n?
Examples:
1)
aaa\n==test==\naaa\nbbb\naaa
2)
bbb\naaa==toast==cccdd\nb\nc
3)
aaa\n==trick==\naaaDDDaaa\nbbb
I want to match
\naaa\nbbb\naaa
and
cccdd\nb\nc
but, in the third example, I don't want to match the text after DDD, only:
\naaa
Yes, there is. For example, (?-m)}$ will match a close-brace at the very end of a Java source file. The point is to disable the multiline mode. You can disable as I've shown or by setting the appropriate flag on the Pattern instance.
UPDATE: I believe that multiline is off by default when you instantiate a Pattern, but is on in Eclipse's find by regex.
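To make the effect of the multiline flag concrete, here is a minimal sketch. The class name, the helper countMatches and the sample text are made up for illustration. It counts how often a closing brace is found at a $ position with the flag off and on, and also shows that \z is stricter than $ when the input ends with a newline.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EndAnchorDemo {
    // Counts how many times the regex is found in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample text: a "}" ends line 2 and line 4.
        String source = "a {\n}\nb {\n}";
        System.out.println(countMatches("\\}$", source));            // 1: multiline off (default)
        System.out.println(countMatches("(?m)\\}$", source));        // 2: multiline on
        // With a trailing newline, $ still matches just before it, \z does not.
        System.out.println(countMatches("\\}$", source + "\n"));     // 1
        System.out.println(countMatches("\\}\\z", source + "\n"));   // 0
    }
}
```

If the very last character of the input matters, \z is probably the safer anchor, since the default $ tolerates one final line terminator.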
The regex you need is:
"(?s)==(?!.*?==)((?:(?!DDD).)*)"
Here (?:(?!DDD).)* matches any run of characters that does not contain the sequence DDD.
Here is the full code:
String[] sarr = {"aaa\n==test==\naaa\nbbb\naaa", "bbb\naaa==toast==cccdd\nb\nc",
"aaa\n==trick==\naaaDDDaaa\nbbb"};
Pattern pt = Pattern.compile("(?s)==(?!.*?==)((?:(?!DDD).)*)");
for (String s : sarr) {
Matcher m = pt.matcher(s);
System.out.print("For input: [" + s + "] => ");
if (m.find())
System.out.println("Matched: [" + m.group(1) + ']');
else
System.out.println("Didn't Match");
}
OUTPUT:
For input: [aaa\n==test==\naaa\nbbb\naaa] => Matched: [\naaa\nbbb\naaa]
For input: [bbb\naaa==toast==cccdd\nb\nc] => Matched: [cccdd\nb\nc]
For input: [aaa\n==trick==\naaaDDDaaa\nbbb] => Matched: [\naaa]
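When excluding a multi-character sequence such as DDD, keep in mind that a negated character class works on single characters, not on sequences. The sketch below compares a class-based pattern with a negative-lookahead pattern on a made-up input that contains a lone D and a colon. The class name and the helper capture are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegatedSequenceDemo {
    // Returns group 1 of the first match, or null if nothing matches.
    static String capture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // [^(?:DDD)] is a character class: it excludes '(', '?', ':', 'D' and ')'.
        String byClass = "(?s)==(?!.*?==)([^(?:DDD)]*)";
        // (?:(?!DDD).)* consumes one character at a time unless "DDD" starts there.
        String byLookahead = "(?s)==(?!.*?==)((?:(?!DDD).)*)";

        String input = "x==ab:Dcd"; // hypothetical input, no "DDD" in it
        System.out.println(capture(byClass, input));     // ab      (stops at ':')
        System.out.println(capture(byLookahead, input)); // ab:Dcd  (runs to the end)
    }
}
```

On the three sample strings from the question both patterns capture the same text, so the difference only shows up on inputs like the one above.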
It may be very simple, but I am extremely new to regex and need to match a pattern against a string and extract a number from it. Below is my code with sample input and required output. I tried to construct the Pattern with the help of https://www.freeformatter.com/java-regex-tester.html, but the match itself returns false.
Pattern pattern = Pattern.compile(".*/(a-b|c-d|e-f)/([0-9])+(#[0-9]?)");
String str = "foo/bar/Samsung-Galaxy/a-b/1"; // need to extract 1.
String str1 = "foo/bar/Samsung-Galaxy/c-d/1#P2";// need to extract 2.
String str2 = "foo.com/Samsung-Galaxy/9090/c-d/69"; // need to extract 69
System.out.println("result " + pattern.matcher(str).matches());
System.out.println("result " + pattern.matcher(str1).matches());
System.out.println("result " + pattern.matcher(str2).matches());
All of the above println calls print false. I am using Java 8; is there any way to match the pattern and extract the digits from the string in a single statement?
It would be great if somebody could point me to how to debug/develop the regex. Please let me know if something in my question is not clear.
You may use
Pattern pattern = Pattern.compile(".*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)");
When used with matches(), the pattern above does not require explicit anchors, ^ and $.
Details
.* - any 0+ chars other than line break chars, as many as possible
/ - the rightmost / that is followed with the subsequent subpatterns
(?:a-b|c-d|e-f) - a non-capturing group matching any of the alternatives inside: a-b, c-d or e-f
/ - a / char
[^/]*? - any chars other than /, as few as possible
([0-9]+) - Group 1: one or more digits.
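As a rough illustration of the remark about matches() not needing anchors, the sketch below runs the same pattern with matches() and with find(). The class and method names are made up. matches() succeeds only if the whole input fits the pattern, while find() is satisfied by a fitting substring.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    static final Pattern P = Pattern.compile(".*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)");

    // True only if the entire input fits the pattern (implicit anchoring).
    static boolean wholeMatch(String s) {
        return P.matcher(s).matches();
    }

    // True if some substring of the input fits the pattern.
    static boolean partialMatch(String s) {
        return P.matcher(s).find();
    }

    public static void main(String[] args) {
        String ok = "foo/bar/Samsung-Galaxy/c-d/1#P2";
        String trailing = "foo/bar/Samsung-Galaxy/9090/a-b/69ace"; // letters after the digits
        System.out.println(wholeMatch(ok));          // true
        System.out.println(wholeMatch(trailing));    // false
        System.out.println(partialMatch(trailing));  // true
    }
}
```

When debugging a pattern that unexpectedly returns false from matches(), trying find() first can show whether the problem is the pattern itself or only the requirement to consume the whole input.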
Java demo:
List<String> strs = Arrays.asList("foo/bar/Samsung-Galaxy/a-b/1","foo/bar/Samsung-Galaxy/c-d/1#P2","foo.com/Samsung-Galaxy/9090/c-d/69");
Pattern pattern = Pattern.compile(".*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)");
for (String s : strs) {
Matcher m = pattern.matcher(s);
if (m.matches()) {
System.out.println(s + ": \"" + m.group(1) + "\"");
}
}
A replacing approach using the same regex with anchors added:
List<String> strs = Arrays.asList("foo/bar/Samsung-Galaxy/a-b/1","foo/bar/Samsung-Galaxy/c-d/1#P2","foo.com/Samsung-Galaxy/9090/c-d/69");
String pattern = "^.*/(?:a-b|c-d|e-f)/[^/]*?([0-9]+)$";
for (String s : strs) {
System.out.println(s + ": \"" + s.replaceFirst(pattern, "$1") + "\"");
}
Output:
foo/bar/Samsung-Galaxy/a-b/1: "1"
foo/bar/Samsung-Galaxy/c-d/1#P2: "2"
foo.com/Samsung-Galaxy/9090/c-d/69: "69"
Because you always want the last number in the string, I would just use replaceAll with the regex .*?(\d+)$ :
String regex = ".*?(\\d+)$";
String strResult1 = str.replaceAll(regex, "$1");
System.out.println(strResult1.matches("\\d+") ? "result " + strResult1 : "no result");
String strResult2 = str1.replaceAll(regex, "$1");
System.out.println(strResult2.matches("\\d+") ? "result " + strResult2 : "no result");
String strResult3 = str2.replaceAll(regex, "$1");
System.out.println(strResult3.matches("\\d+") ? "result " + strResult3 : "no result");
If the input does not end with a number, replaceAll returns it unchanged, so a result that is not all digits means no number was found.
Outputs
result 1
result 2
result 69
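One detail worth checking with this approach is what replaceAll returns when the regex does not match at all. A minimal sketch follows; the class and method names and the last input are made up for illustration.

```java
class ReplaceAllNoMatch {
    // Replaces the whole input by its trailing number, if there is one.
    static String lastNumber(String s) {
        return s.replaceAll(".*?(\\d+)$", "$1");
    }

    public static void main(String[] args) {
        System.out.println(lastNumber("foo/bar/Samsung-Galaxy/a-b/1"));        // 1
        System.out.println(lastNumber("foo.com/Samsung-Galaxy/9090/c-d/69"));  // 69
        // No trailing digits: nothing is replaced, the input comes back as-is.
        System.out.println(lastNumber("foo/bar/no-number"));                   // foo/bar/no-number
    }
}
```

So a non-matching input is returned unchanged rather than as an empty string, which is why comparing against the original string or testing for digits is the more reliable check.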
Here is a one-liner using String#replaceAll:
public String getDigits(String input) {
String number = input.replaceAll(".*/(?:a-b|c-d|e-f)/[^/]*?(\\d+)$", "$1");
return number.matches("\\d+") ? number : "no match";
}
System.out.println(getDigits("foo.com/Samsung-Galaxy/9090/c-d/69"));
System.out.println(getDigits("foo/bar/Samsung-Galaxy/a-b/some other text/1"));
System.out.println(getDigits("foo/bar/Samsung-Galaxy/9090/a-b/69ace"));
Output:
69
no match
no match
This works on the sample inputs you provided. Note that I added logic which will display no match for the case where ending digits could not be matched fitting your pattern. In the case of a non-match, we would typically be left with the original input string, which would not be all digits.
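If the sentinel string feels fragile, one possible alternative is to use a Matcher directly and return an Optional. This is only a sketch under the same pattern; the class name DigitExtractor and the method extractDigits are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitExtractor {
    private static final Pattern P =
            Pattern.compile(".*/(?:a-b|c-d|e-f)/[^/]*?(\\d+)");

    // Returns the trailing digits if the whole input fits the pattern, otherwise empty.
    static Optional<String> extractDigits(String input) {
        Matcher m = P.matcher(input);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extractDigits("foo.com/Samsung-Galaxy/9090/c-d/69").orElse("no match"));
        System.out.println(extractDigits("foo/bar/Samsung-Galaxy/a-b/some other text/1").orElse("no match"));
        System.out.println(extractDigits("foo/bar/Samsung-Galaxy/9090/a-b/69ace").orElse("no match"));
    }
}
```

The caller then decides what to do with a missing value, and no second regex check on the result is needed.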
Here is my requirement:
Input1: adasd|adsasd\|adsadsadad|asdsad
output1: Array(adasd,adsasd\|adsadsadad,asdsad)
Input2: adasd|adsasd\\|adsadsadad|asdsad
output2: Array(adasd,adsasd\\,adsadsadad,asdsad)
Input3: adasd|adsasd\\\|adsadsadad|asdsad
output3: Array(adasd,adsasd\\\|adsadsadad,asdsad)
I was using this code:
val delimiter =Pattern.quote("|")
val esc = "\\"
val regex = "(?<!" + Pattern.quote(esc) + ")" + delimiter
But this does not work in all cases.
What is the best way to handle this?
Instead of splitting, use this regex for a match:
(?<=[|]|^)[^|\\]*(?:\\.[^|\\]*)*
Java code:
final String[] input = {"adasd|adsasd\\|adsadsadad|asdsad",
"adasd|adsasd\\\\|adsadsadad|asdsad",
"adasd|adsasd\\\\\\|adsadsadad|asdsad"};
final String regex = "(?<=[|]|^)[^|\\\\]*(?:\\\\.[^|\\\\]*)*";
final Pattern pattern = Pattern.compile(regex);
Matcher matcher;
for (String string: input) {
matcher = pattern.matcher(string);
System.out.println("\n*** Input: " + string);
while (matcher.find()) {
System.out.println(matcher.group(0));
}
}
Output:
*** Input: adasd|adsasd\|adsadsadad|asdsad
adasd
adsasd\|adsadsadad
asdsad
*** Input: adasd|adsasd\\|adsadsadad|asdsad
adasd
adsasd\\
adsadsadad
asdsad
*** Input: adasd|adsasd\\\|adsadsadad|asdsad
adasd
adsasd\\\|adsadsadad
asdsad
For the sake of simplicity, let's use ";" (semicolon) as the escape character instead of "\" (backslash), to avoid too many escape sequences here.
We can do this split with a look-behind as below:
String[] input = { "adasd|zook;|adsadsadad|asdsad", "adasd|zook;;|adsadsadad|asdsad",
"adasd|zook;;;|adsadsadad|asdsad", "blah;|blah;;;;|blah|blahblah;|blahbloooh;;|" };
String regex = "(?<!;)(;;)+\\||(?<!;)\\|";
for(String str : input) {
System.out.println("Input : "+ str);
System.out.println("Output: ");
String[] astr = str.split(regex);
for(String nres : astr)
System.out.print(nres+", ");
System.out.println("\n");
}
Let's have a deeper look at the regex. I will split it into 2 parts:
Split on an even number of semicolons (;) followed by a pipe ("|"):
(?<!;)(;;)+\\| :
Here (;;)+ matches only an even number of semicolons, and the look-behind makes sure there is no additional ";" directly before that run. Note that the matched semicolons are part of the delimiter, so split removes them from the result.
Split on a pipe without a preceding semicolon:
(?<!;)\\| :
Here we match lone pipe symbols; the look-behind makes sure there is no ";" before the "|".
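Output for the above snippet:
Input : adasd|zook;|adsadsadad|asdsad
Output: 
adasd, zook;|adsadsadad, asdsad, 
Input : adasd|zook;;|adsadsadad|asdsad
Output: 
adasd, zook, adsadsadad, asdsad, 
Input : adasd|zook;;;|adsadsadad|asdsad
Output: 
adasd, zook;;;|adsadsadad, asdsad, 
Input : blah;|blah;;;;|blah|blahblah;|blahbloooh;;|
Output: 
blah;|blah, blah, blahblah;|blahbloooh, 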
Hope this helps! :)
I want to replace all :variable (word starting with :) with ${variable}$.
For example,
:aks_num with ${aks_num}$
:brn_num with ${brn_num}$
Following is my code, which does not work:
public static void main(String[] argv) throws Exception
{
CharSequence chSeq = "AND ((:aks_num = -1) OR (aks_num = :aks_num AND ((:brn_num = -1) OR (brn_num = :brn_num))))";
// replaceAll also not working
//String s = chSeq.replaceAll(":\\([a-z_]*\\)","\\${ $1 \\}$");
Pattern p = Pattern.compile(":\\([a-z_]*\\)");
Matcher m = p.matcher(chSeq);
if (m.find()) {
System.out.println("Found value: " + m.group(0) );
System.out.println("Found value: " + m.group(1) );
System.out.println("Found value: " + m.group(2) );
} else {
System.out.println("NO MATCH");
}
}
While in shell script the following regex works perfectly:
s/:\([a-z_]*\)/${\1}$/g
:\\([a-z_]*\\) (with escaped parentheses) means that you want to match expressions like :(aks_num). There is no such expression in the input string, which explains why there are no matches.
Instead, if you want to use parentheses to capture the variable names, you should not escape them.
Example :
CharSequence chSeq = "AND ((:aks_num = -1) OR (aks_num = :aks_num AND ((:brn_num = -1) OR (brn_num = :brn_num))))";
Pattern p = Pattern.compile(":([a-z_]*)");
Matcher m = p.matcher(chSeq);
while (m.find()) {
System.out.println("Found value: " + m.group(0)+". Captured : "+m.group(1));
}
Output:
Found value: :aks_num. Captured : aks_num
Found value: :aks_num. Captured : aks_num
Found value: :brn_num. Captured : brn_num
Found value: :brn_num. Captured : brn_num
CharSequence chSeq = "AND ((:aks_num = -1) OR (aks_num = :aks_num AND ((:brn_num = -1) OR (brn_num = :brn_num))))";
// replaceAll also not working
//String s = chSeq.replaceAll(":\\([a-z_]*\\)","\\${ $1 \\}$");
Pattern p = Pattern.compile(":(\\w+)");
Matcher m = p.matcher(chSeq);
while (m.find()) {
System.out.println("Found value: " + m.group(1) );
}
This works fine with replaceAll:
Pattern p = Pattern.compile(":(\\w+)"); // group 1 is the name without the leading colon
Matcher m = p.matcher(x); // x is the input string
x = m.replaceAll("\\${$1}\\$");
You don't need to escape the parentheses, so
Pattern.compile(":([a-z_]*)");
should work.
I believe you got confused because Java's regex syntax differs from the basic regex syntax used by sed. In Java you do not need to escape parentheses to make them grouping operators. On the contrary, escaped parentheses match literal ( and ) symbols.
In the replacement pattern, $ must be escaped for the regex engine to replace with literal $ symbols, but you do not need to escape braces there.
So, just use
.replaceAll(":([a-z_]+)", "\\${$1}\\$")
I suggest the + quantifier because I doubt you want to match a : that is followed by a space, a digit, or any other character that is not a letter or underscore.
BTW, you do not need any /g flag in Java since replaceAll will replace all matches with the provided replacement pattern.
NOTE: you can further adjust the pattern to match all letters/digits/underscores with ":(\\w+)". Or just alphanumerics/underscore: ":([\\p{Alnum}_]+)".
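For completeness, here is a small self-contained sketch of the replacement, with a made-up class and method name. In the replacement string, "\\$" in Java source is the two characters \$, which replaceAll treats as a literal dollar sign, while $1 refers to the captured name.

```java
class PlaceholderRewrite {
    // Rewrites every :name into ${name}$ (lowercase letters and underscores only).
    static String rewrite(String sql) {
        return sql.replaceAll(":([a-z_]+)", "\\${$1}\\$");
    }

    public static void main(String[] args) {
        String in = "AND ((:aks_num = -1) OR (aks_num = :aks_num AND ((:brn_num = -1) OR (brn_num = :brn_num))))";
        System.out.println(rewrite(in));
    }
}
```

Column names without a leading colon, such as the bare aks_num, are left untouched because the pattern requires the colon.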
I'd like to extract 2 arguments from given string using regex. For example:
C:\Users "C:\Program files"
C:\mytext.txt mytext2.txt
Output would be C:\Users and C:\Program files
C:\mytext.txt and mytext2.txt
If string is between " " it can contain white spaces, otherwise it has to be without them. So far I managed to extract arguments between " ", but can't figure out how to extract them when one argument has " " and the other one doesn't (like in example above).
Pattern p = Pattern.compile("\"(.*?)\"");
Matcher m = p.matcher(string);
while(m.find()){
System.out.println(m.group(1));
}
You can use this regex for matching:
Pattern p = Pattern.compile("\"[^\"]*\"|\\S+");
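A possible way to use this, sketched with made-up names (ArgSplitter, split): the pattern below is a slight variation that adds two capturing groups so the surrounding quotes can be dropped from quoted arguments. That variation is my assumption about what the output should look like, based on the example in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArgSplitter {
    // Group 1: content of a quoted argument; group 2: an unquoted argument.
    private static final Pattern ARG = Pattern.compile("\"([^\"]*)\"|(\\S+)");

    static List<String> split(String line) {
        List<String> args = new ArrayList<>();
        Matcher m = ARG.matcher(line);
        while (m.find()) {
            // Exactly one of the two groups participates in each match.
            args.add(m.group(1) != null ? m.group(1) : m.group(2));
        }
        return args;
    }

    public static void main(String[] args) {
        System.out.println(split("C:\\Users \"C:\\Program files\""));  // [C:\Users, C:\Program files]
        System.out.println(split("C:\\mytext.txt mytext2.txt"));        // [C:\mytext.txt, mytext2.txt]
    }
}
```

The quoted alternative comes first so that an argument starting with a quote is read up to the closing quote instead of being cut at the first space.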
I have a set of strings, which I cycle through, checking those against the following set of regex, to try and separate the first small section from the rest of the string. The regex works in almost all cases, but unfortunately I have no idea why it fails occasionally. I’ve been using Pattern Matcher to print out the string, if the pattern is found.
Two example working strings:
98. SORGHUM Moench - Millets Annuals or rhizomatous perennials; inflorescence …
99. MISCANTHUS Andersson - Silver-grasses Rhizomatous perennials; inflorescence …
Two example failed strings:
100. ZEA L. - Maize Annuals; male and female inflorescences separate, the …
26. POA L. (Parodiochloa C.E. Hubb.) - Meadow-grasses Annuals or perennials with or without stolons or rhizomes; sheaths overlapping or some …
Regexes used so far:
Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
Pattern endOfGenusTwo = Pattern.compile("(?<=(^\\d+" + genusNames[l].toUpperCase() + "))");
Pattern endOfGenusThree = Pattern.compile("(?<=(\\d+\\. " + genusNames[l] + "))");
Pattern endOfGenusFour = Pattern.compile("(?<=(\\d+" + genusNames[l] + "))");
Pattern endOfGenusFive = Pattern.compile("(?<=(\\. " + genusNames[l] + "))");
The first of these is the one that has produced the most reliable results so far.
Example Code
Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
Matcher endOfGenusFinder = endOfGenus.matcher(descriptionPartBits[b]);
if (endOfGenusFinder.find()) {
System.out.print(descriptionPartBits[b] + ":- ");
System.out.print(genusNames[l] + "\n");
String[] genusNameBits = descriptionPartBits[b].split("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
}
Desired Output. This is what is produced by strings that work. Strings that don't work simply don't appear in the output:
98. SORGHUM Moench - Millets Annuals or rhizomatous perennials:- Sorghum
99. MISCANTHUS Andersson - Silver-grasses Rhizomatous perennials:- Miscanthus
From regex tutorial:
Lookahead and lookbehind, collectively called "lookaround", are
zero-length assertions just like the start and end of line, and start
and end of word anchors explained earlier in this tutorial.
Lookahead and lookbehind only return true or false.
So I changed your code example:
Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. ZEA L))(.+)$");
// Matcher matcher = endOfGenus.matcher("98. SORGHUM Moench - Millets Annuals or rhizomatous perennials; inflorescence …");
Matcher matcher = endOfGenus.matcher("100. ZEA L. - Maize Annuals; male and female inflorescences separate, the …");
while (matcher.find()) {
String group1 = matcher.group(1);
String group2 = matcher.group(2);
System.out.println("group1=" + group1);
System.out.println("group2=" + group2);
}
Group 1 is matched by (^\\d+\\. ZEA L). Group 2 is matched by (.+).
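Since only the two captured pieces are needed, the same split can probably be done without any lookbehind. The sketch below is one way to do it under some assumptions: the class GenusSplitter and the method splitAfterGenus are made up, the genus name is passed in as in the question's genusNames array, and Pattern.quote is used so that characters in the name are taken literally. It does not try to diagnose why the original strings failed.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GenusSplitter {
    // Returns {numbered genus prefix, rest of the description}, or null if the
    // description does not start with "<number>. <GENUS NAME IN UPPER CASE>".
    static String[] splitAfterGenus(String description, String genusName) {
        String name = Pattern.quote(genusName.toUpperCase(Locale.ROOT));
        // DOTALL lets (.*) also cover descriptions that contain line breaks.
        Pattern p = Pattern.compile("(\\d+\\. " + name + ")(.*)", Pattern.DOTALL);
        Matcher m = p.matcher(description);
        return m.matches() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        String[] parts = splitAfterGenus("100. ZEA L. - Maize Annuals; male and female inflorescences", "Zea");
        System.out.println("group1=" + parts[0]);
        System.out.println("group2=" + parts[1]);
    }
}
```

A null result makes the non-matching descriptions visible, which may help when tracking down the strings that are currently dropped from the output.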