Java last occurrence of regex till end of string


I need a regex that gets the text between the last occurrence of .java;[number] or .java;NONE and the end of the string.
Here's an example of the text I have as input:
user: ilian
branch: HEAD
changed files:
FlatFilePortfolioImportController.java;1.78
ConvertibleBondParser.java;1.52
OptionKnockedOutException.java;1.1.2.1
RebatePayoff.java;NONE
possible dead-lock. The suggested solution is to first create a TransactionContext and then lock AccountableDataFactory.IMPORT_LOCK and PositionManagerSQL
Basically I need to get the comment at the end of the commit, which is after the last changed file, which could end in something like 1.52, 1.1.2.1 or NONE.

String regex = "\\.java;\\d+\\.\\d+(.+)";
Pattern p = Pattern.compile(regex, Pattern.DOTALL);
Matcher m = p.matcher(input);
if (m.find()) {
    System.out.println(m.group(1));
}

Edited
Solution assuming input is in a single line (lines in original post only for clarity, see OP's comment below).
String input = "user: ilian branch: "
        + "HEAD changed files: "
        + "FlatFilePortfolioImportController.java;1.78 "
        + "ConvertibleBondParser.java;1.52 "
        + "possible dead-lock. The suggested solution is to first create a "
        + "TransactionContext and then lock AccountableDataFactory.IMPORT_LOCK "
        + "and PositionManagerSQL";
// matches up to the last occurrence of java;x.xx, then at least one whitespace
// character, then captures everything until the end of input
Pattern pattern = Pattern.compile(".+java;[\\d\\.]+\\s+?(.+?)$");
Matcher matcher = pattern.matcher(input);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
Output:
possible dead-lock. The suggested solution is to first create a TransactionContext and then lock AccountableDataFactory.IMPORT_LOCK and PositionManagerSQL

String comment = mydata.replaceAll("(?s).*java;[0-9,.]+|.*java;NONE", "");
System.out.println(comment);
Works for all the file endings and prints correctly.
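If the comment is wanted as a captured group rather than through a replacement, a single greedy match can cover both the numeric revisions and NONE. This is only a sketch: the names CommitComment and extract are made up for illustration, and it assumes the comment itself never contains text that looks like .java;<revision>.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommitComment {
    // (?s) lets '.' match line breaks, so the leading greedy .* runs to the end
    // of the input and backtracks to the LAST ".java;<revision>" or ".java;NONE".
    private static final Pattern TAIL = Pattern.compile(
            "(?s).*\\.java;(?:\\d+(?:\\.\\d+)*|NONE)\\s*(.*)");

    // Returns the text after the last changed-file entry, or "" if there is none.
    static String extract(String input) {
        Matcher m = TAIL.matcher(input);
        return m.matches() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String input = "changed files:\n"
                + "ConvertibleBondParser.java;1.52\n"
                + "OptionKnockedOutException.java;1.1.2.1\n"
                + "RebatePayoff.java;NONE\n"
                + "possible dead-lock.";
        System.out.println(extract(input)); // prints "possible dead-lock."
    }
}
```

Because the leading .* is greedy, the engine settles on the last file entry first, which is what makes this behave as "last occurrence till end of string" for both multi-line and single-line input.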

Related

How can I get non-matching groups using a Matcher in Java?

I'm trying to write a Java regex to capture some groups of words from a String using a Matcher.
Say I have this string: "Hello, we are #happy# to see you today".
I would like to get 2 groups of matches, one having
Hello, we are
to see you today
and the other
happy
So far, I was only able to match the word between the #s using this Pattern:
Pattern p = Pattern.compile("#(.+?)#");
I've read about negative lookahead and lookaround, played a bit with it but without success.
I assume I should do some sort of negation of the regex so far, but I couldn't come up with anything.
Any help would be really appreciated, thank you.

From comment:
I may run into a string that has more than one instance of words wrapped in #, such as "#Hello# kind #stranger#"
From comment:
I need to apply some different style format to both the text inside and outside.
Since you need to apply different styling, the code needs to process each block of text separately, and needs to know whether the text is inside or outside a #..# section.
Note, in the following code, it will silently skip the last #, if there is an odd number of them.
String input = ...
for (Matcher m = Pattern.compile("([^#]+)|#([^#]+)#").matcher(input); m.find(); ) {
    if (m.start(1) != -1) {
        String outsideText = m.group(1);
        System.out.println("Outside: \"" + outsideText + "\"");
    } else {
        String insideText = m.group(2);
        System.out.println("Inside: \"" + insideText + "\"");
    }
}
Output for input = "Hello, we are #happy# to see you today"
Outside: "Hello, we are "
Inside: "happy"
Outside: " to see you today"
Output for input = "#Hello# kind #stranger#"
Inside: "Hello"
Outside: " kind "
Inside: "stranger"
Output for input = "This #text# has unpaired # characters"
Outside: "This "
Inside: "text"
Outside: " has unpaired "
Outside: " characters"

The best I could do is to split the string into three parts (four capturing groups), then merge groups 1 and 4:
(^.*)(\#(.+?)\#)(.*)
EDIT: Taking into account the remarks from the comments:
(^[^\#]*)(?:\#(.+?)\#)([^\#]*)
Thanks to @Lino, we no longer capture the useless group that includes the # characters, and the 1st and 3rd groups now capture anything except #, instead of any non-whitespace character.

Is this solution fine?
Pattern pattern = Pattern.compile("([^#]+)|#([^#]*)#");
Matcher matcher = pattern.matcher("Hello, we are #happy# to see you today");
List<String> notBetween = new ArrayList<>(); // not surrounded by #
List<String> between = new ArrayList<>(); // surrounded by #
while (matcher.find()) {
    if (Objects.nonNull(matcher.group(1))) notBetween.add(matcher.group(1));
    if (Objects.nonNull(matcher.group(2))) between.add(matcher.group(2));
}
System.out.println("Printing group 1");
for (String string : notBetween) {
    System.out.println(string);
}
System.out.println("Printing group 2");
for (String string : between) {
    System.out.println(string);
}

RegEx to extract text between tags in Java

I need to extract the values after :70: in the following text file using RegEx. Value may contain line breaks as well.
My current solution is to extract the string between :70: and : but this always returns only one match, the whole text between the first :70: and last :.
:32B:xxx,
:59:yyy
something
:70:ACK1
ACK2
:21:something
:71A:something
:23E:something
value
:70:ACK2
ACK3
:71A:something
How can I achieve this using Java? Ideally I want to iterate through all values, i.e.
ACK1\nACK2,
ACK2\nACK3
Thanks :)
Edit: What I'm doing right now,
Pattern pattern = Pattern.compile("(?<=:70:)(.*)(?=\n)", Pattern.DOTALL);
Matcher matcher = pattern.matcher(data);
while (matcher.find()) {
    System.out.println(matcher.group());
}

Try this.
String data = ""
        + ":32B:xxx,\n"
        + ":59:yyy\n"
        + "something\n"
        + ":70:ACK1\n"
        + "ACK2\n"
        + ":21:something\n"
        + ":71A:something\n"
        + ":23E:something\n"
        + "value\n"
        + ":70:ACK2\n"
        + "ACK3\n"
        + ":71A:something\n";
Pattern pattern = Pattern.compile(":70:(.*?)\\s*:", Pattern.DOTALL);
Matcher matcher = pattern.matcher(data);
while (matcher.find())
    System.out.println("found=" + matcher.group(1));
result:
found=ACK1
ACK2
found=ACK2
ACK3
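One thing to watch with a pattern that stops at the next ":" is that a value containing a colon would be cut short. A possible variation, sketched below, stops only at a line break followed by another tag. The class name TagValues, the method valuesOf, and the assumed tag shape (a colon, digits or capital letters, a colon) are illustrative guesses and do not come from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagValues {
    // Lazily capture after ":70:" until a line break followed by another tag
    // such as ":21:" or ":71A:", or until trailing whitespace at end of input.
    private static final Pattern FIELD_70 = Pattern.compile(
            ":70:(.*?)(?=\\R:[0-9A-Z]+:|\\s*\\z)", Pattern.DOTALL);

    // Collects every :70: value, line breaks inside a value included.
    static List<String> valuesOf(String data) {
        List<String> values = new ArrayList<>();
        Matcher m = FIELD_70.matcher(data);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String data = ":59:yyy\n:70:ACK1\nACK2\n:21:something\n"
                + ":70:ref: 42\n:71A:something\n";
        for (String value : valuesOf(data)) {
            System.out.println("found=" + value);
        }
    }
}
```

The lookahead does not consume the following tag, so consecutive :70: fields are still found one after another.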

You need a loop to do this.
Pattern p = Pattern.compile(regexPattern);
List<String> list = new ArrayList<String>();
Matcher m = p.matcher(input);
while (m.find()) {
    list.add(m.group());
}
As seen in "Create array of regex matches".

Regex expression to get the file name

I want to extract only the file name from the complete file name plus timestamp. Below is the input.
String filePath = "fileName1_20150108.csv";
The expected output is: "fileName1"
String filePath2 = "fileName1_filedesc1_20150108_002_20150109013841.csv";
The expected output is: "fileName1_filedesc1"
I wrote the code below in Java to get the file name. It works for the first input (filePath) but not for filePath2.
Pattern pattern = Pattern.compile(".*.(?=_)");
String filePath = "fileName1_20150108.csv";
String filePath2 = "fileName1_filedesc1_20150108_002_20150109013841.csv";
Matcher matcher = pattern.matcher(filePath);
while (matcher.find()) {
    System.out.print("Start index: " + matcher.start());
    System.out.print(" End index: " + matcher.end() + " ");
    System.out.println(matcher.group());
}
Can somebody please help me correct the regex so I can parse both file paths with the same regex?
Thanks

Anchor the start, and make the .* non-greedy:
^.*?(_\D.*?)?(?=[_.])
Update: change the second group (for fileDesc) to optional, and enforce that it starts with a non-digit character. This will work as long as your fileDesc strings never start with numbers.

You can get the characters before the first underscore, the first underscore, and then the characters until the next underscore:
^[^_]*_[^_]*

This should work: "^(.*?)_([0-9_]*)\\.([^.]*)$"
It will return you 3 groups:
the base name (assuming not a single part will be all numbers)
the timestamp info
the extension.
You can test here: http://fiddle.re/v0hne6 (RegexPlanet)
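To try the three-group pattern above against both inputs from the question, something like the following could be used. It is a minimal sketch: the class name FileNameParts and the method baseName are hypothetical, and returning the input unchanged when nothing matches is an arbitrary choice made for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FileNameParts {
    // group 1: base name, group 2: digits and underscores (timestamp info),
    // group 3: extension
    private static final Pattern NAME =
            Pattern.compile("^(.*?)_([0-9_]*)\\.([^.]*)$");

    // Returns the base name, or the input unchanged if the pattern does not match.
    static String baseName(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.matches() ? m.group(1) : fileName;
    }

    public static void main(String[] args) {
        // prints "fileName1"
        System.out.println(baseName("fileName1_20150108.csv"));
        // prints "fileName1_filedesc1"
        System.out.println(baseName("fileName1_filedesc1_20150108_002_20150109013841.csv"));
    }
}
```

The lazy first group grows only as far as needed for the rest of the name to consist of digits and underscores, which is why a description part such as filedesc1 ends up in the base name.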

Certain strings that should be found by a working Regex are missed, and I need help identifying why

I have a set of strings, which I cycle through, checking those against the following set of regex, to try and separate the first small section from the rest of the string. The regex works in almost all cases, but unfortunately I have no idea why it fails occasionally. I’ve been using Pattern Matcher to print out the string, if the pattern is found.
Two example working strings:
98. SORGHUM Moench - Millets Annuals or rhizomatous perennials; inflorescence …
99. MISCANTHUS Andersson - Silver-grasses Rhizomatous perennials; inflorescence …
Two example failed strings:
100. ZEA L. - Maize Annuals; male and female inflorescences separate, the …
26. POA L. (Parodiochloa C.E. Hubb.) - Meadow-grasses Annuals or perennials with or without stolons or rhizomes; sheaths overlapping or some …
Regexes used so far:
Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
Pattern endOfGenusTwo = Pattern.compile("(?<=(^\\d+" + genusNames[l].toUpperCase() + "))");
Pattern endOfGenusThree = Pattern.compile("(?<=(\\d+\\. " + genusNames[l] + "))");
Pattern endOfGenusFour = Pattern.compile("(?<=(\\d+" + genusNames[l] + "))");
Pattern endOfGenusFive = Pattern.compile("(?<=(\\. " + genusNames[l] + "))");
The first of these is the one that's producing the most reliable results so far.
Example Code
Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
Matcher endOfGenusFinder = endOfGenus.matcher(descriptionPartBits[b]);
if (endOfGenusFinder.find()) {
    System.out.print(descriptionPartBits[b] + ":- ");
    System.out.print(genusNames[l] + "\n");
    String[] genusNameBits = descriptionPartBits[b].split("(?<=(^\\d+\\. " + genusNames[l].toUpperCase() + "))");
}
Desired Output. This is what is produced by strings that work. Strings that don't work simply don't appear in the output:
98. SORGHUM Moench - Millets Annuals or rhizomatous perennials:- Sorghum
99. MISCANTHUS Andersson - Silver-grasses Rhizomatous perennials:- Miscanthus

From regex tutorial:
Lookahead and lookbehind, collectively called "lookaround", are
zero-length assertions just like the start and end of line, and start
and end of word anchors explained earlier in this tutorial.
Lookahead and lookbehind only return true or false.
So I changed your code example:
Pattern endOfGenus = Pattern.compile("(?<=(^\\d+\\. ZEA L))(.+)$");
// Matcher matcher = endOfGenus.matcher("98. SORGHUM Moench - Millets Annuals or rhizomatous perennials; inflorescence …");
Matcher matcher = endOfGenus.matcher("100. ZEA L. - Maize Annuals; male and female inflorescences separate, the …");
while (matcher.find()) {
    String group1 = matcher.group(1);
    String group2 = matcher.group(2);
    System.out.println("group1=" + group1);
    System.out.println("group2=" + group2);
}
Group 1 is matched by (^\\d+\\. ZEA L). Group 2 is matched by (.+).

RegEX in trimming first two character

I'm trying to extract two words from a line with a regex, using Matcher in Java.
My line looks like this: BROWSER=Firefox
I'm using the code below:
currentLine = currentLine.trim();
System.out.println("Current Line: "+ currentLine);
Pattern p = Pattern.compile("(.*?)=(.*)");
Matcher m = p1.matcher(currentLine);
if(m.find(1) && m.find(2)){
    System.out.println("Key: "+m.group(1)+" Value: "+m.group(2));
}
The output I get is
Key: OWSER Value: FireFox
The leading BR is cut off. This seems odd to me, and I'd like to know why it behaves this way, since the same regex works perfectly in Perl. Can someone help me?

When you call m.find(2), the matcher is reset and the search starts at index 2, so the first two characters are skipped. From the JavaDocs:
public boolean find(int start)
Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.
So, use just m.find():
String currentLine = "BROWSER=FireFox";
System.out.println("Current Line: "+ currentLine);
Pattern p = Pattern.compile("(.*?)=(.*)");
Matcher m = p.matcher(currentLine);
if (m.find()) {
    System.out.println("Key: "+m.group(1)+" Value: "+m.group(2));
}
Output:
Current Line: BROWSER=FireFox
Key: BROWSER Value: FireFox
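The effect of the start index can be seen by running the same pattern from two different positions. This is a small illustrative sketch; FindFromIndex and keyFrom are names invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FindFromIndex {
    private static final Pattern KEY_VALUE = Pattern.compile("(.*?)=(.*)");

    // Returns the key captured when the search starts at the given index,
    // or null if nothing matches from there.
    static String keyFrom(String line, int start) {
        Matcher m = KEY_VALUE.matcher(line);
        return m.find(start) ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "BROWSER=FireFox";
        System.out.println(keyFrom(line, 0)); // prints "BROWSER"
        System.out.println(keyFrom(line, 2)); // prints "OWSER"
    }
}
```

In the question's code, m.find(1) and m.find(2) each reset the matcher, so the groups that get printed come from the second call, which started at index 2.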

You can use String.indexOf to find the location of the = and then String.substring to get your two values:
String currentLine = "BROWSER=Firefox";
int indexOfEq = currentLine.indexOf('=');
String myKey = currentLine.substring(0, indexOfEq);
String myVal = currentLine.substring(indexOfEq + 1);
System.out.println(myKey + ":" + myVal);

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.
