Java Regex capture nested matches - java

I am having trouble with regex here.
Say I have this input:
608094.21.1.2014.TELE.&BIG00Z.1.1.GBP
My regex looks like this:
(\d\d\d\d\.\d?\d\.\d?\d)|(\d?\d\.\d?\d\.\d?\d?\d\d)
I want to extract the date 21.1.2014 from the string, but all I get is
8094.21.1
I think the problem is that 21.1.2014 starts inside the previous (wrong) match. Is there a simple way to make the matcher look for the next match starting one character after the beginning of the previous match, rather than after its end?

You could use a regex like this:
\d{1,2}\.\d{1,2}\.\d{4}
Or shorten it and use:
(\d{1,2}\.){2}\d{4}
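As a minimal sketch of how the shortened pattern could be used from Java (the class and method names are made up for illustration), note that every backslash has to be doubled inside a Java string literal:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateFinder {
    // Shortened pattern from above: two "1-2 digits + dot" groups, then a four-digit year.
    private static final Pattern DATE = Pattern.compile("(\\d{1,2}\\.){2}\\d{4}");

    // Returns the first date-like substring, or null if there is none.
    static String firstDate(String input) {
        Matcher m = DATE.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDate("608094.21.1.2014.TELE.&BIG00Z.1.1.GBP")); // 21.1.2014
    }
}
```

Because the year must have exactly four digits, the earlier run 8094.21.1 can no longer be picked up as a match.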

If the date is always surrounded by dots (the date is then in capture group 1):
\.(\d\d\d\d\.\d?\d\.\d?\d|\d?\d\.\d?\d\.\d?\d?\d\d)\.

I hope this will help you.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        String x = "608094.21.1.2014.TELE.&BIG00Z.1.1.GBP";
        // Escape the dots so they match a literal '.', and allow a one- or two-digit day and month
        String pattern = "[0-9]{1,2}\\.[0-9]{1,2}\\.[0-9]{4}";
        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);
        // Now create a Matcher object
        Matcher m = r.matcher(x);
        if (m.find()) {
            System.out.println("Found value: " + m.group());
        } else {
            System.out.println("NO MATCH");
        }
    }
}
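On the question as literally asked: Matcher has a find(int start) overload that resets the matcher and searches from the given index, so a loop that restarts at start() + 1 of the previous match also reports overlapping matches. The following is only a sketch (the class and method names are hypothetical), and it keeps the pattern from the question unchanged:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OverlappingMatches {
    // Collects every match, restarting one character after the start of the previous match.
    static List<String> findOverlapping(Pattern pattern, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        int from = 0;
        // find(int) resets the matcher and searches from the given index
        while (from <= input.length() && m.find(from)) {
            results.add(m.group());
            from = m.start() + 1;
        }
        return results;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(
                "(\\d\\d\\d\\d\\.\\d?\\d\\.\\d?\\d)|(\\d?\\d\\.\\d?\\d\\.\\d?\\d?\\d\\d)");
        // The result starts with 8094.21.1 and also contains 21.1.2014
        System.out.println(findOverlapping(p, "608094.21.1.2014.TELE.&BIG00Z.1.1.GBP"));
    }
}
```

This approach returns the wanted date together with other overlapping hits, which then have to be filtered, so tightening the pattern as in the answers above is usually the simpler fix.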

java regex find match between commas

I am trying to find a match between commas if it contains a specific string.
So far I have ,(.*?myString.?*),
Obviously this finds all the input between the first comma in the entire input and the first comma after the string I want. How do I reference the comma immediately before the string that I want?
Edit: I also want to find the match that occurs after a specific set of characters,
i.e. one that occurs after (fooo):
dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa,dsfoooafd,safdsa,gfhhhgdtheMatchfhhhfd,dsafdsa
should return gfhhhgdtheMatchfhhhfd, not gfdsgdtheMatchfdsgfd
The following regex should do it:
[^,]+theMatch.*?(?=,)
Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegEx {
    public static void main(String[] args) {
        String s = "dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa";
        String r = "[^,]+theMatch.*?(?=,)";
        Pattern p = Pattern.compile(r);
        Matcher m = p.matcher(s);
        while (m.find()) {
            System.out.println(m.group()); // gfdsgdtheMatchfdsgfd
        }
    }
}
Edit:
To match only after fooo, use the regex fooo.*?([^,]+theMatch.*?)(?=,) and read capture group 1.
You are finding too much because .* will include the comma.
You need the following regular expression: ,([^,]*myinput[^,]*),
[^,]* basically says find all non-comma characters.
I would suggest the following code:
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        String str = "dsfdsdafd,safdsa,myinput,dsafdsa";
        Pattern p = Pattern.compile(",([^,]*myinput[^,]*),");
        Matcher m = p.matcher(str);
        if (m.find()) {
            System.out.println(m.group(0));
            // prints out ",myinput,"
            System.out.println(m.group(1));
            // prints out "myinput"
        }
    }
}
Here is a StackOverflow question that is basically the same with some very good answers associated:
Regex to find internal match between two characters
For more on regular expressions in Java look here: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
If you want the position of the comma preceding your input string, use the following code:
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        String str = "dsfdsdafd,safdsa,myinput,dsafdsa";
        Pattern p = Pattern.compile(",([^,]*myinput[^,]*),");
        Matcher m = p.matcher(str);
        if (m.find()) {
            System.out.println(str.indexOf(m.group(0)));
            // prints out "16"
        }
    }
}
By feeding the match of the regular expression into the String method indexOf(), you are able to locate the position where the match starts.
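A possible alternative, sketched here with a hypothetical class and method name: the Matcher already records where the match began, so start() (or start(1) for the capture group) gives the position directly. That avoids a second search with indexOf, which can point at an earlier identical occurrence when looping over several matches.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchPosition {
    // Returns the index of the comma that opens the matching field, or -1 if nothing matches.
    static int commaIndex(String input) {
        Matcher m = Pattern.compile(",([^,]*myinput[^,]*),").matcher(input);
        return m.find() ? m.start() : -1;
    }

    public static void main(String[] args) {
        String str = "dsfdsdafd,safdsa,myinput,dsafdsa";
        System.out.println(commaIndex(str)); // 16, the same value as the indexOf version
    }
}
```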
Edit:
To find the occurrence of a string following another string, simply modify the regex to: fooo.*,([^,]*theMatch[^,]*),
fooo.* will greedily consume all characters between fooo and the start of your match, so if several fields after fooo contain theMatch, the last one is returned.
Example code:
import java.util.regex.*;

public class Main {
    public static void main(String[] args) {
        String str = "dsfdsdafd,safdsa,gfdsgdtheMatchfdsgfd,dsafdsa,dsfoooafd,safdsa,gfhhhgdtheMatchfhhhfd,dsafdsa";
        Pattern p = Pattern.compile("fooo.*,([^,]*theMatch[^,]*),");
        Matcher m = p.matcher(str);
        if (m.find()) {
            System.out.println(m.group(1));
            // prints out: gfhhhgdtheMatchfhhhfd
        }
    }
}
The usual approach is to use a pattern that cannot match your delimiter in place of the dot (.). In this case, you need that only at the front of the pattern; you can use a reluctant quantifier at the back as you already do (though you've misspelled it: it is .*?, not .?*). For example:
,([^,]*myString.*?),
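A small sketch of that pattern in use. The sample input and the class and method names are invented for illustration, and the pattern still requires a comma on both sides, so a first or last field would not match:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FieldFinder {
    // Returns the comma-delimited field that contains "myString", or null if there is none.
    static String fieldContaining(String input) {
        Matcher m = Pattern.compile(",([^,]*myString.*?),").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(fieldContaining("a,b,xxmyStringyy,c")); // xxmyStringyy
    }
}
```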

java regex matching &[text]

I am currently working on creating a regex to split out all occurrences of Strings that match the following format: &[text] and need to get at the text. Strings could look like: something &[text] &[text] anything &[text] etc.
I have tried the following regex but I cannot seem to get it to work: &\[(.*)\]
Any help would be greatly appreciated.
Brackets are a bit tricky regarding escaping. Try this:
Pattern r = Pattern.compile("&\\[([^\\]]*)\\]");
Matcher m = r.matcher("foo &[bla] [foo] &[blub]&[blab]");
while (m.find()) {
    System.out.println("Found value: " + m.group(1));
}
I replaced your dot with a character class that matches any character except a closing bracket. The star operator would otherwise greedily match until the very end of the string. You could also suppress the greedy matching with a question mark, which reads even better: "&\\[(.*?)\\]"
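To make the difference concrete, here is a short sketch (the helper class and method are hypothetical) that runs the greedy pattern from the question and the reluctant variant against the same input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketCapture {
    // Returns capture group 1 of every match of the given regex.
    static List<String> captures(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "foo &[bla] [foo] &[blub]&[blab]";
        // Greedy: a single capture, bla] [foo] &[blub]&[blab
        System.out.println(captures("&\\[(.*)\\]", s));
        // Reluctant: three captures, bla, blub and blab
        System.out.println(captures("&\\[(.*?)\\]", s));
    }
}
```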
Two things you need to do:
1. Double escape your square brackets
2. Prevent the capture group from matching other occurrences of the pattern, by preventing it from matching an opening or a closing bracket
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        String s = "&[test] something ] something &[test2]";
        Pattern pattern = Pattern.compile("&\\[([^\\[\\]]*)\\]");
        Matcher matcher = pattern.matcher(s);
        while (matcher.find()) {
            System.out.println("capture group: " + matcher.group(1));
        }
    }
}

My regex search only prints out the last match

I wrote a regular expression to search for web URLs in a text (full code below), but when I run the code, the console prints only the last URL in the text. I don't know what's wrong, since I do use a while loop. See the code below and kindly help me correct it. Thanks
import java.util.*;
import java.util.regex.*;

public class Main
{
    static String query = "This is a URL http://facebook.com"
            + " and this is another, http://twitter.com "
            + "this is the last URL http://instagram.com"
            + " all these URLs should be printed after the code execution";

    public static void main(String args[])
    {
        String pattern = "([\\w \\W]*)((http://)([\\w \\W]+)(.com))";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(query);
        while(m.find())
        {
            System.out.println(m.group(2));
        }
    }
}
On running the above code, only http://instagram.com gets printed to the console output
I found another RegEx here
https?:\/\/(www\.)?[-a-zA-Z0-9#:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9#:%_\+.~#?&//=]*)
It also accepts https, and it works in your case.
I get all 3 URLs printed with this code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    static String query = "This is a URL http://facebook.com"
            + " and this is another, http://twitter.com "
            + "this is the last URL http://instagram.com"
            + " all these URLs should be printed after the code execution";

    public static void main(String[] args) {
        String pattern = "https?:\\/\\/(www\\.)?[-a-zA-Z0-9#:%._\\+~#=]{2,256}\\.[a-z]{2,6}\\b([-a-zA-Z0-9#:%_\\+.~#?&//=]*)";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(query);
        while (m.find()) {
            System.out.println(m.group());
        }
    }
}
I hope this clears it up for you: you are matching too many characters. Your pattern should be as restrictive as possible, because regex quantifiers are greedy by default and will try to match as much as possible.
Here is my take on your code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    static String query = "This is a URL http://facebook.com"
            + " and this is another, http://twitter.com "
            + "this is the last URL http://instagram.com"
            + " all these URLs should be printed after the code execution";

    public static void main(String args[]) {
        // The dot before "com" is escaped so that it matches only a literal '.'
        String pattern = "(http:[/][/][Ww.]*[a-zA-Z]+\\.com)";
        Pattern p = Pattern.compile(pattern);
        Matcher m = p.matcher(query);
        while(m.find())
        {
            System.out.println(m.group(1));
        }
    }
}
The above code will match only your examples; if you wish to match more, you need to tweak it to your needs.
A great way to live-test patterns is http://www.regexpal.com/. You can tweak your pattern there to match exactly what you want; just remember to replace each \ with a double \\ in Java for escaped characters.
I'm not sure how reliable this pattern is, but it prints out all the URLs when I run your example.
(http://[A-Za-z0-9]+\\.[a-zA-Z]{2,3})
You will have to modify it if you encounter a URL that looks like this:
http://www.instagram.com
For such a URL the pattern stops after the first dot and captures only http://www.ins, so it works correctly only for URLs without the 'www'.
Perhaps you're looking for this regex:
http://(\w+(?:\.\w+)+)
For example, from this string:
http://ww1.amazon.com and http://npr.org
it extracts
"ww1.amazon.com"
"npr.org"
To break down how it works:
http:// is literal
( ... ) is the main capture group
\w+ finds one or more word characters (letters, digits or underscore)
(?: ... ) ...followed by a non-capturing group
\.\w+ ...that contains a literal period followed by at least one word character
+ ...repeated one or more times
Hope this helps.
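In Java, that pattern could be applied roughly as follows. This is a sketch with an invented class name; the input is the example string from this answer:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostExtractor {
    // Returns the host part (capture group 1) of every http:// URL found in the text.
    static List<String> hosts(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("http://(\\w+(?:\\.\\w+)+)").matcher(text);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(hosts("http://ww1.amazon.com and http://npr.org")); // [ww1.amazon.com, npr.org]
    }
}
```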
Your problem is that your regex quantifiers (i.e. the * and + characters) are greedy, meaning that they match as much as possible. You need to use reluctant quantifiers. See the corrected code pattern below - just two extra characters - a ? character after the * and + to match as little as possible.
String pattern = "([\\w \\W]*?)((http://)([\\w \\W]+?)(.com))";
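The effect can be checked side by side. The sketch below (helper names are hypothetical) runs the original greedy pattern and the reluctant one against the query string from the question and collects group 2 of each match:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsReluctant {
    static final String QUERY = "This is a URL http://facebook.com"
            + " and this is another, http://twitter.com "
            + "this is the last URL http://instagram.com"
            + " all these URLs should be printed after the code execution";

    // Returns capture group 2 (the URL) of every match.
    static List<String> urls(String regex) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(QUERY);
        while (m.find()) {
            out.add(m.group(2));
        }
        return out;
    }

    public static void main(String[] args) {
        // Greedy: the first group swallows everything up to the last URL, so there is one match
        System.out.println(urls("([\\w \\W]*)((http://)([\\w \\W]+)(.com))"));
        // Reluctant: each find() stops at the nearest URL, so all three are reported
        System.out.println(urls("([\\w \\W]*?)((http://)([\\w \\W]+?)(.com))"));
    }
}
```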

Extract two substrings from String surrounded by special string (Java)

I have a String like this..
I am a !!!guy!!! but I like !!!cats!!! better than dogs.
I need the strings within the exclamation Strings (!!!), a collection of Strings or array will do.
I can probably do this a dirty way with String's substring and indexOf, but if you can suggest a better way with regular expressions or just cleaner code that would be much appreciated.
Thanks.
You can use a simple regex like this:
!!!(.*?)!!!
And then grab the capturing group content
Match information
MATCH 1
1. [10-13] `guy`
MATCH 2
1. [31-35] `cats`
You can use something like this java code:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main( String args[] ){
        // String to be scanned to find the pattern.
        String line = "I am a !!!guy!!! but I like !!!cats!!! better than dogs.";
        String pattern = "!!!(.*?)!!!";
        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);
        // Now create a Matcher object.
        Matcher m = r.matcher(line);
        while (m.find()) {
            // If you want an array or a collection, store m.group(1) here
            System.out.println("Found value: " + m.group(1));
        }
    }
}
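Since the question asks for a collection or an array, one way to gather the captured values could look like this (a sketch; the class and method names are invented for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BangExtractor {
    private static final Pattern BETWEEN_BANGS = Pattern.compile("!!!(.*?)!!!");

    // Returns every substring enclosed in a pair of "!!!" markers.
    static List<String> between(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = BETWEEN_BANGS.matcher(line);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = between("I am a !!!guy!!! but I like !!!cats!!! better than dogs.");
        String[] asArray = words.toArray(new String[0]); // if an array is preferred
        System.out.println(words);          // [guy, cats]
        System.out.println(asArray.length); // 2
    }
}
```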

Using Java's Regex to extract a word from a path name

I have a directory like this and I am trying to extract the word "photon" from just before "photon.exe".
C:\workspace\photon\output\i686\diagnostic\photon.exe(Suspended) Thread(Running)
My code looks like this:
String path = "C:\\workspace\\photon\\output\\i686\\diagnostic\\photon.exe(Suspended) Thread(Running)";
Pattern pattern = Pattern.compile(".+\\\\(.+).exe");
Matcher matcher = pattern.matcher(path);
System.out.println(matcher.group(1));
No matter what permutations I try I keep getting IllegalStateExceptions etc, despite this regular expression working on http://www.regexplanet.com/simple/index.html.
Thanks in advance for any help. I am super frustrated at this point >.<
You need to actually run the matcher:
if ( matcher.find() ) {
    System.out.println(matcher.group(1));
}
Note that I use matcher.find() above instead of matcher.matches() because your regex is not set up to match the entire string (it won't match the (Suspended... part). Since that's the case, you don't strictly need the preamble before the slash, but if you drop it the group must exclude backslashes so that find() cannot start at an earlier one: \\\\([^\\\\]+)\\.exe should work.
Of course, this is mentioned in the documentation for group(int):
Throws:
IllegalStateException - If no match has yet been attempted, or if the previous match operation failed
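The behaviour described in that excerpt can be reproduced with a few lines. This is a sketch with hypothetical class and method names, using the pattern and path from the question:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupBeforeFind {
    static final String REGEX = ".+\\\\(.+).exe";
    static final String PATH =
            "C:\\workspace\\photon\\output\\i686\\diagnostic\\photon.exe(Suspended) Thread(Running)";

    // Calls group(1) without a prior find() and reports whether that throws.
    static boolean throwsWithoutFind() {
        Matcher m = Pattern.compile(REGEX).matcher(PATH);
        try {
            m.group(1);
            return false;
        } catch (IllegalStateException e) {
            return true; // no match has been attempted yet
        }
    }

    // Calls find() first, then reads the group.
    static String groupAfterFind() {
        Matcher m = Pattern.compile(REGEX).matcher(PATH);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(throwsWithoutFind()); // true
        System.out.println(groupAfterFind());    // photon
    }
}
```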
You can use the following regular expression: ^.*\\(.*)\.exe.*$ and the file name will be in the first capture group. Here is an example.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main
{
    public static void main(final String[] args)
    {
        final String input = args[0];
        final Pattern pattern = Pattern.compile("^.*\\\\(.*)\\.exe.*$");
        final Matcher matcher = pattern.matcher(input);
        if (matcher.find())
        {
            System.out.println("matcher.group(1) = " + matcher.group(1));
        }
        else
        {
            System.out.format("%s does not match %s\n", input, pattern.pattern());
        }
    }
}
Run it with C:\workspace\photon\output\i686\diagnostic\photon.exe(Suspended) Thread(Running) as the input and here is the expected output:
matcher.group(1) = photon
(new java.io.File("C:\\workspace\\photon\\output\\i686\\diagnostic\\photon.exe(Suspended) Thread(Running)")).getName().split("\\.")[0]; // relies on \ being the path separator, i.e. Windows
Try this regex: [\\d\\w]+\\.exe
It assumes the executable only has digits and letters.
Another option is to use .+\\.exe to get the full file name and use substring and lastIndexOf('\\') to get the file name.
You can also use new File(fullFilePath).getName(), which is a more correct way to do it as it will save you the substring, but I don't know if it has better performance.
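The substring and lastIndexOf variant mentioned above might look like the following sketch. The class and method names are invented, and it works on the string alone, so it does not depend on the platform's path separator:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExeName {
    // Returns the executable name without directory and without ".exe", or null if none is found.
    static String baseName(String text) {
        Matcher m = Pattern.compile(".+\\.exe").matcher(text);
        if (!m.find()) {
            return null;
        }
        String fullPath = m.group();                      // everything up to and including ".exe"
        int slash = fullPath.lastIndexOf('\\');           // -1 if there is no backslash
        String fileName = fullPath.substring(slash + 1);  // e.g. photon.exe
        return fileName.substring(0, fileName.length() - ".exe".length());
    }

    public static void main(String[] args) {
        String path = "C:\\workspace\\photon\\output\\i686\\diagnostic\\photon.exe(Suspended) Thread(Running)";
        System.out.println(baseName(path)); // photon
    }
}
```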
