Java: how to retrieve a string based on another

I have a list that contains some tags:
[[food_1]], [[drink_2]], [[food_1]]
I want to retrieve all tags that match an input.
Example:
input: [[food_*]]
result: [[food_1]], [[food_2]]
The input and the tags always have the same format.
This is my code snippet:
// tag is one element from the tags list
Matcher m = Pattern.compile(input.replace("*", "\\d")).matcher(tag);
while (m.find()) {
    ...
}

If you are using collections, you can use streams. Check this construction:
yourList
    .stream()
    .filter(line -> line.matches("YOUR REGEX"))
    .collect(Collectors.toList());
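To make the construction above concrete, here is a minimal sketch. The class name StreamTagFilter, the helper filter, and the sample tags are illustrative assumptions, not part of the question. The regex is escaped by hand, and the pattern is compiled once instead of calling String.matches for every element.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class StreamTagFilter {
    // Returns the tags that match the regex as a whole (hypothetical helper).
    static List<String> filter(List<String> tags, String regex) {
        Pattern p = Pattern.compile(regex); // compile once, reuse for every element
        return tags.stream()
                .filter(tag -> p.matcher(tag).matches())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("[[food_1]]", "[[drink_2]]", "[[food_3]]");
        // The brackets are escaped by hand; \d stands in for the * wildcard.
        System.out.println(filter(tags, "\\[\\[food_\\d\\]\\]"));
        // prints [[[food_1]], [[food_3]]]
    }
}
```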

You cannot use the input as is. If your tag format is [[character_*]], then:
Pattern.compile(input.replace("*","\\d"))
turns into:
Pattern.compile("[[character_*]]".replace("*","\\d"))
and then:
Pattern.compile("[[character_\\d]]")
Since Java character classes can be nested (a union such as [a-d[m-p]], or an intersection such as [[a-z]&&[^b]]), I think that pattern is parsed as a single character class and will only ever match one character at a time.
You can test that here, or from your code: https://www.regexplanet.com/share/index.html?share=yyyyd5puaar
It finds 'd' for your example.
TL;DR: you must escape your pattern or fix it:
Pattern.compile(Pattern.quote("[[character_*]]").replace("*","\\E\\d\\Q"))
// turns into: \Q[[character_\E\d\Q]]\E
\Q and \E are what Pattern.quote uses to mark a literal section of a regexp.
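The quoting approach can be sketched as a small runnable program. The class name WildcardTags, the helpers toPattern and matching, and the sample tags are assumptions made for illustration. As in the question, * is taken to stand for exactly one digit.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class WildcardTags {
    // Quote the whole input, then reopen the quoted section around each *
    // so that it becomes a single-digit wildcard (\d).
    static Pattern toPattern(String input) {
        return Pattern.compile(Pattern.quote(input).replace("*", "\\E\\d\\Q"));
    }

    // Collects every tag that matches the wildcard input as a whole.
    static List<String> matching(List<String> tags, String input) {
        Pattern p = toPattern(input);
        List<String> result = new ArrayList<>();
        for (String tag : tags) {
            if (p.matcher(tag).matches()) {
                result.add(tag);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tags = Arrays.asList("[[food_1]]", "[[drink_2]]", "[[food_3]]");
        System.out.println(toPattern("[[food_*]]").pattern()); // \Q[[food_\E\d\Q]]\E
        System.out.println(matching(tags, "[[food_*]]"));      // [[[food_1]], [[food_3]]]
    }
}
```

If a tag number can have several digits, replacing \\d with \\d+ in toPattern would be the natural adjustment.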

Square brackets [] are regex metacharacters, so they need to be escaped.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class TestRegex {
    public static void main(String[] args) {
        // The opening brackets are escaped; [*] is the placeholder for the wildcard
        String pattern = "(\\[\\[food_[*]]])";
        // Turn the placeholder into [\d], i.e. a single digit
        pattern = pattern.replace("*", "\\d");
        Matcher m = Pattern.compile(pattern).matcher(
                "[[food_1]], [[drink_2]], [[food_3]]");
        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
}
Output:
[[food_1]]
[[food_3]]

Related

Java Regex capture nested matches

I am having trouble with a regex here.
Say I have this input:
608094.21.1.2014.TELE.&BIG00Z.1.1.GBP
My regex looks like this:
(\d\d\d\d\.\d?\d\.\d?\d)|(\d?\d\.\d?\d\.\d?\d?\d\d)
I want to extract the date 21.1.2014 from the string, but all I get is
8094.21.1
I think my problem is that 21.1.2014 starts inside the previous (wrong) match. Is there a simple way to make the matcher look for the next match one character after the beginning of the previous match, rather than after its end?
You could use a regex like this:
\d{1,2}\.\d{1,2}\.\d{4}
Or shorten it and use:
(\d{1,2}\.){2}\d{4}
If the date is always surrounded by dots:
\.(\d\d\d\d\.\d?\d\.\d?\d|\d?\d\.\d?\d\.\d?\d?\d\d)\.
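A minimal sketch of the dot-delimited idea, assuming the date is always day.month.year with a four-digit year. It combines the shorter \d{1,2}\.\d{1,2}\.\d{4} pattern with the surrounding dots and reads the date from group 1. The class name DateBetweenDots and the helper extract are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateBetweenDots {
    // Returns the first d.m.yyyy field that is delimited by dots, or null if there is none.
    static String extract(String s) {
        Matcher m = Pattern.compile("\\.(\\d{1,2}\\.\\d{1,2}\\.\\d{4})\\.").matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("608094.21.1.2014.TELE.&BIG00Z.1.1.GBP")); // 21.1.2014
    }
}
```

Requiring the leading dot is what stops the match from starting inside 608094.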
I hope this will help you.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
    public static void main(String[] args) {
        String x = "608094.21.1.2014.TELE.&BIG00Z.1.1.GBP";
        // Escape the dots so they match a literal '.', and allow 1-2 digits for day and month
        String pattern = "[0-9]{1,2}\\.[0-9]{1,2}\\.[0-9]{4}";
        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);
        // Now create the matcher object
        Matcher m = r.matcher(x);
        if (m.find()) {
            System.out.println("Found value: " + m.group());
        } else {
            System.out.println("NO MATCH");
        }
    }
}

java regex find match between commas

I am trying to find a match between commas if it contains a specific string.
So far I have ,(.*?myString.?*),
Obviously this finds everything between the first comma in the entire input and the first comma after the string I want. How do I reference the comma immediately before the string that I want?
Edit: I also want to find the match that occurs after a specific set of characters,
i.e. one that occurs after (fooo). For example,
dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa,dsfoooafd,safdsa,gfhhhgdtheMatchfhhhfd,dsafdsa
should return gfhhhgdtheMatchfhhhfd, not gfdsgdtheMatchfdsgfd
The following regex should do it:
[^,]+theMatch.*?(?=,)
In Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class RegEx {
    public static void main(String[] args) {
        String s = "dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa";
        String r = "[^,]+theMatch.*?(?=,)";
        Pattern p = Pattern.compile(r);
        Matcher m = p.matcher(s);
        while (m.find()) {
            System.out.println(m.group()); // gfdsgdtheMatchfdsgfd
        }
    }
}
Edit:
Use this regex: fooo.*?([^,]+theMatch.*?)(?=,)
You are finding too much because .* will include the comma.
You need the following regular expression: ,([^,]*myinput[^,]*),
[^,]* basically says find all non-comma characters.
I would suggest the following code:
import java.util.regex.*;
public class Main {
    public static void main(String[] args) {
        String str = "dsfdsdafd,safdsa,myinput,dsafdsa";
        Pattern p = Pattern.compile(",([^,]*myinput[^,]*),");
        Matcher m = p.matcher(str);
        if (m.find()) {
            System.out.println(m.group(0));
            // prints out ",myinput,"
            System.out.println(m.group(1));
            // prints out "myinput"
        }
    }
}
Here is a StackOverflow question that is basically the same with some very good answers associated:
Regex to find internal match between two characters
For more on regular expressions in Java look here: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
If you want the position of the comma preceding your input string, use the following code:
import java.util.regex.*;
public class Main {
    public static void main(String[] args) {
        String str = "dsfdsdafd,safdsa,myinput,dsafdsa";
        Pattern p = Pattern.compile(",([^,]*myinput[^,]*),");
        Matcher m = p.matcher(str);
        if (m.find()) {
            System.out.println(str.indexOf(m.group(0)));
            // prints out "16"
        }
    }
}
By feeding the match of the regular expression into the String method indexOf(), you are able to locate the position where the match starts.
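As an alternative sketch, Matcher.start() reports the index of the current match directly, which avoids searching the string a second time with indexOf. The class name CommaPosition and the helper commaIndex are hypothetical. Quoting the search string with Pattern.quote is my addition, so that metacharacters in it are treated literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommaPosition {
    // Returns the index of the comma that precedes the first comma-delimited
    // field containing needle, or -1 if there is no such field.
    static int commaIndex(String str, String needle) {
        Pattern p = Pattern.compile(",([^,]*" + Pattern.quote(needle) + "[^,]*),");
        Matcher m = p.matcher(str);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(commaIndex("dsfdsdafd,safdsa,myinput,dsafdsa", "myinput")); // 16
    }
}
```

m.start(1) would give the start of the captured field itself, one position after the comma.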
Edit:
To find the occurrence of a string following another string, simply modify the regex to: fooo.*,([^,]*theMatch[^,]*),
fooo.* will greedily consume all characters between fooo and the start of your match.
Example code:
import java.util.regex.*;
public class Main {
    public static void main(String[] args) {
        String str = "dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa,dsfoooafd,safdsa,gfhhhgdtheMatchfhhhfd,dsafdsa";
        Pattern p = Pattern.compile("fooo.*,([^,]*theMatch[^,]*),");
        Matcher m = p.matcher(str);
        if (m.find()) {
            System.out.println(m.group(1));
            // prints out: gfhhhgdtheMatchfhhhfd
        }
    }
}
The usual approach is to use a pattern that cannot match your delimiter in place of the dot (.). In this case, you need that only at the front of the pattern; you can keep a reluctant quantifier at the back as you already do (though you've misspelled it: .?* should be .*?). For example:
,([^,]*myString.*?),

split a string based on parentheses and next character

I am having a problem trying to split a string based on parentheses.
I have a String like this
Fe(C5H5)2FeO3 and I need to split the string into an array so the array reads
Fe
(C5H5)2
FeO3
I am using this code:
String form = "Fe(C5H5)2FeO3";
form.split("[()]+");
I am having trouble getting the character after the ")" to split out.
This also has to work for multiple sets of () in the same string.
Thanks
Positive lookahead and lookbehind can do some of this:
String formula = "Fe(C5H5)2FeO3";
// Split before each "(" and after each ")" that is followed by a single digit
String regex = "(?=\\()|(?<=\\)\\d)";
String[] tokens = formula.split(regex);
System.out.println(Arrays.toString(tokens));
For more on this, check out the regular expressions tutorial
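The lookaround split can be wrapped into a small runnable sketch. The class name FormulaSplit and the helper split are hypothetical. The sketch assumes that the count after a closing parenthesis is a single digit, as in the example formula.

```java
import java.util.Arrays;

class FormulaSplit {
    // Splits before every "(" and after every ")" followed by one digit.
    // Both alternatives are zero-width, so no characters are dropped.
    static String[] split(String formula) {
        return formula.split("(?=\\()|(?<=\\)\\d)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("Fe(C5H5)2FeO3")));
        // prints [Fe, (C5H5)2, FeO3]
    }
}
```

A multi-digit count such as (C5H5)12 would be cut after the first digit with this regex, so the matching approach may be the safer choice for such inputs.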
You can use a simple regex to match parts of the sequence instead of splitting on a regex:
import java.util.regex.*;
class Main {
    public static void main(String[] args) {
        String formula = "Fe(C5H5)2FeO3";
        // Either a run of characters other than "(", or a parenthesized group followed by digits
        Pattern p = Pattern.compile("[^(]+|[(][^)]*[)]\\d+");
        Matcher m = p.matcher(formula);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}
This program produces the output below:
Fe
(C5H5)2
FeO3

Java regex question

I have a text something like
ab1ab2ab3ab4cd
Can one create a Java regular expression to obtain all substrings that start with "ab" and end with "cd"? e.g.:
ab1ab2ab3ab4cd
ab2ab3ab4cd
ab3ab4cd
ab4cd
Thanks
The regex (?=(ab.*cd)) will group such matches in group 1 as you can see:
import java.util.regex.*;
public class Main {
    public static void main(String[] args) throws Exception {
        Matcher m = Pattern.compile("(?=(ab.*cd))").matcher("ab1ab2ab3ab4cd");
        while (m.find()) {
            System.out.println(m.group(1));
        }
    }
}
which produces:
ab1ab2ab3ab4cd
ab2ab3ab4cd
ab3ab4cd
ab4cd
You need the lookahead, (?= ... ), otherwise you'll just get one match. Note that this regex will fail to produce the desired results if there is more than one cd in your string, because the greedy .* always runs to the last cd. In that case, you'll have to resort to some manual string algorithm.
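One possible manual algorithm, sketched under the assumption that every substring from any "ab" to any later "cd" is wanted: pair each occurrence of the start marker with each occurrence of the end marker that follows it. The class name Substrings and the helper between are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class Substrings {
    // Returns every substring of s that begins with start and ends with end.
    static List<String> between(String s, String start, String end) {
        List<String> out = new ArrayList<>();
        // Outer loop: every occurrence of the start marker.
        for (int i = s.indexOf(start); i >= 0; i = s.indexOf(start, i + 1)) {
            // Inner loop: every occurrence of the end marker after that start marker.
            for (int j = s.indexOf(end, i + start.length()); j >= 0; j = s.indexOf(end, j + 1)) {
                out.add(s.substring(i, j + end.length()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(between("ab1ab2ab3ab4cd", "ab", "cd"));
        // prints [ab1ab2ab3ab4cd, ab2ab3ab4cd, ab3ab4cd, ab4cd]
        System.out.println(between("ab1cdab2cd", "ab", "cd"));
        // prints [ab1cd, ab1cdab2cd, ab2cd]
    }
}
```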
Looks like you want either ab\w+?cd or \bab\w+?cd\b
/^ab[a-z0-9]+cd$/gm
If only a b c and digits 0-9 can appear in the middle as in the examples:
/^ab[a-c\d]+cd$/gm
See it in action: http://regexr.com?2tpdu

Using Java's Regex to extract a word from a path name

I have a directory like this and I am trying to extract the word "photon" from just before "photon.exe".
C:\workspace\photon\output\i686\diagnostic\photon.exe(Suspended) Thread(Running)
My code looks like this:
String path = "C:\\workspace\\photon\\output\\i686\\diagnostic\\photon.exe(Suspended) Thread(Running)";
Pattern pattern = Pattern.compile(".+\\\\(.+).exe");
Matcher matcher = pattern.matcher(path);
System.out.println(matcher.group(1));
No matter what permutations I try I keep getting IllegalStateExceptions etc, despite this regular expression working on http://www.regexplanet.com/simple/index.html.
Thanks in advance for any help. I am super frustrated at this point >.<
You need to actually run the matcher:
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
Note that I use matcher.find() above instead of matcher.matches() because your regex is not set up to match the entire string (it won't match the (Suspended... part). Be careful about dropping the preamble before the slash, though: \\\\(.+).exe on its own would start matching at the first backslash and capture the whole path up to photon. If you do want to drop it, keep backslashes out of the group, e.g. \\\\([^\\\\]+)\\.exe.
Of course, this is mentioned in the documentation for group(int):
Throws:
IllegalStateException - If no match has yet been attempted, or if the previous match operation failed
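A complete sketch of the fix: call find() before group(1). The pattern used here is an assumption of mine rather than the asker's original. It keeps backslashes out of the group so that only the last path component is captured, and it escapes the dot before exe. The class name ExeName and the helper baseName are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExeName {
    // Returns the file name (without ".exe") of the last path component, or null if absent.
    static String baseName(String path) {
        // Regex: \\([^\\]+)\.exe  (a backslash, then non-backslash characters, then ".exe")
        Matcher m = Pattern.compile("\\\\([^\\\\]+)\\.exe").matcher(path);
        // group(1) is only valid after a successful find() or matches().
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String path = "C:\\workspace\\photon\\output\\i686\\diagnostic\\photon.exe(Suspended) Thread(Running)";
        System.out.println(baseName(path)); // photon
    }
}
```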
You can use the following regular expression: ^.*\\(.*)\.exe.*$ and the file name will be in the first match group. Here is an example.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main
{
    public static void main(final String[] args)
    {
        final String input = args[0];
        final Pattern pattern = Pattern.compile("^.*\\\\(.*)\\.exe.*$");
        final Matcher matcher = pattern.matcher(input);
        if (matcher.find())
        {
            System.out.println("matcher.group(1) = " + matcher.group(1));
        }
        else
        {
            System.out.format("%s does not match %s\n", input, pattern.pattern());
        }
    }
}
Run it with C:\workspace\photon\output\i686\diagnostic\photon.exe(Suspended) Thread(Running) as the input (quoted, so that it arrives as a single argument) and here is the expected output:
matcher.group(1) = photon
new java.io.File("C:\\workspace\\photon\\output\\i686\\diagnostic\\photon.exe(Suspended) Thread(Running)").getName().split("\\.")[0];
(This relies on \ being treated as a path separator, which is the case on Windows.)
Try this regex: [\\d\\w]+\\.exe
It assumes the executable name only has digits and letters.
Another option is to use .+\\.exe to get the full file name and use substring and lastIndexOf('\\') to get the file name.
You can also use new File(fullFilePath).getName(), which is a more correct way to do it as it will save you the substring, but I don't know if it has better performance.
