I'm new to using patterns and looked everywhere on the internet for an explanation to this problem.
Say I have a string: String info = "Data I need to extract is 'here' and 'also here'";
How would I extract the words:
here
also here
without the single quotes using a pattern?
This is what I have so far...
Pattern p = Pattern.compile("(?<=\').*(?=\')");
But it returns here' and 'also here as a single match. It skips over the second piece of data and goes straight to the last quote...
Thank you!
EDIT:
Thank you for your replies everyone! How would it be possible to alter the pattern so that here is stored in matcher.group(1) and also here is stored in matcher.group(2)? I need these values for different reasons, and splitting them from 1 group seems inefficient...
Try making your regex non-greedy:
Pattern p = Pattern.compile("(?<=')(.*?)(?=')");
EDIT:
This does not work. It gives the following matches:
here
and
also here
This is because the lookahead/lookbehind do not consume the '.
To fix this use the regex:
Pattern p = Pattern.compile("'(.*?)'");
or even better (& faster):
Pattern p = Pattern.compile("'([^']*)'");
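To make the difference between these patterns concrete, here is a minimal sketch that runs the greedy lookaround version, the lazy lookaround version, and the negated character class version against the sample string. The class name QuoteRegexComparison and the helper findAll are made up for illustration; only java.util.regex is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original answers.
class QuoteRegexComparison {

    // Collects the requested group from every match of regex in input.
    static List<String> findAll(String regex, int group, String input) {
        List<String> results = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            results.add(m.group(group));
        }
        return results;
    }

    public static void main(String[] args) {
        String info = "Data I need to extract is 'here' and 'also here'";

        // Greedy with lookarounds: a single match from the first quote to the last.
        System.out.println(findAll("(?<=').*(?=')", 0, info));

        // Lazy with lookarounds: the quotes are not consumed, so the text
        // between the two quoted parts (" and ") is matched as well.
        System.out.println(findAll("(?<=')(.*?)(?=')", 1, info));

        // Quotes consumed by the pattern itself: only the quoted parts are captured.
        System.out.println(findAll("'([^']*)'", 1, info));
    }
}
```

The last pattern avoids backtracking over the closing quote because [^'] can never run past it, which is probably why it is described as faster.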
I think you're making it too complicated. Try
Pattern.compile("'([^']+)'");
or
Pattern.compile("'(.*?)'");
They will both work. Then you can extract the result from the first group matcher.group(1) after performing a matcher.find().
This should work for you:
Pattern p = Pattern.compile("'([\\w\\s]+)'");
String info = "Data I need to extract is 'here' and 'also here'";
Matcher m = p.matcher(info);
while (m.find()) {
System.out.println(m.group(1));
}
Here's the printout:-
here
also here
If you want the data in 2 separate groups, you could do something like this:-
Pattern p = Pattern.compile("^[\\w\\s]*?'([\\w\\s]+)'[\\w\\s]*?'([\\w\\s]+)'$");
String info = "Data I need to extract is 'here' and 'also here'";
Matcher m = p.matcher(info);
while (m.find()) {
System.out.println("Group 1: " + m.group(1));
System.out.println("Group 2: " + m.group(2));
}
Here's the printout:
Group 1: here
Group 2: also here
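The two-group pattern above only matches when the input contains exactly two quoted parts and nothing but word characters and whitespace around them. As an alternative sketch, assuming it is acceptable to read the values from a list rather than from group(1) and group(2), the single-group pattern can be applied repeatedly and the results stored by position. The class name QuotedValues and the method extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, shown only as an illustration.
class QuotedValues {

    private static final Pattern QUOTED = Pattern.compile("'([^']*)'");

    // Returns the contents of every single-quoted section, in order of appearance.
    static List<String> extract(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            values.add(m.group(1)); // group(1) is the text between the quotes
        }
        return values;
    }

    public static void main(String[] args) {
        List<String> values = extract("Data I need to extract is 'here' and 'also here'");
        String first = values.get(0);  // plays the role of group(1) in the two-group pattern
        String second = values.get(1); // plays the role of group(2)
        System.out.println("First: " + first);
        System.out.println("Second: " + second);
    }
}
```

This keeps working if a third quoted value is added later, at the cost of checking values.size() before indexing.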
Why not simply use the following?
'.*?'
I have a bunch of strings that I'm looking to parse in the following format and extract just the email and string which is followed by a delimiter
email[delimiter]string
In other words
[email with any ascii characters][delimiter][string with any ascii characters]
The delimiters can be ,;:| or ||
e.g.
abc#xyz.com,blah
abc#xyz.au;blah1
abc#xyz.ru:blah2
abc#xyz.ru|blah,2
abc#xyz.ru||blah2
My progress so far is the following regex, which matches the strings above. How can I modify it to form groups that extract only the email and the string that follows the delimiter, in Java/Scala?
.+#.+([:;,|])+.+$
The java code would look something like this:
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
if (m.find()) {
System.out.println("Email: " + m.group(0));
System.out.println("Value: " + m.group(1));
} else {
System.out.println("NO MATCH");
}
You seem to have worked out the regex part for yourself. I have a suggestion for result extraction: use kantan.regex.
This allows you to write:
import kantan.regex.implicits._
// Declare your regular expression, validated at compile time.
val regex = rx"(.+#[A-Za-z0-9.]+)(?:[:;,|]+)(.*)"
// Sample input
val input = "abc#xyz.com,blah"
// Returns an Iterator[(String, String)] on all matches, where
// ._1 is the email and ._2 the string
input.evalRegex[(String, String)](regex)
Note that you might want to use better typed values for this - a case class rather than a (String, String), say. This is also possible - you can either provide decoders yourself, or let shapeless derive them:
import kantan.regex.generic._
// Case class in which to store results.
case class MailMatch(mail: String, value: String)
// Returns an Iterator[MailMatch]
input.evalRegex[MailMatch](regex)
Full disclosure: I'm the author.
So, answering my own question with what I got working. Regex experts - any holes you can find here, please?
Pattern COMPILE = Pattern.compile("(.+#[A-Za-z0-9.\"]+)(?:[:;,|]+)(.*)");
Matcher m = COMPILE.matcher(next);
if (m.find()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
} else {
System.out.println("NO MATCH");
}
EDIT : Edited to use non capturing group as per MYGz's answer
(\\w+#[\\w.]+)[:;,|]+(.+)$
Then use Java to extract the groups from the Match. Group 1 is the email and group 2 is the string after the delimiter.
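For reference, here is a small self-contained sketch of the group extraction in plain Java, run over the sample lines from the question. It assumes the part after # contains only letters, digits and dots, and that the value never starts with one of the delimiter characters (a leading delimiter would be swallowed by the delimiter run). The class name EmailValueParser and the method parse are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser, a sketch under the assumptions stated above.
class EmailValueParser {

    // Group 1: everything up to and including the host part after '#'.
    // Group 2: everything after the run of delimiter characters.
    private static final Pattern LINE =
            Pattern.compile("^(.+?#[A-Za-z0-9.]+)[:;,|]+(.*)$");

    // Returns {email, value}, or null when the line does not fit the format.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "abc#xyz.com,blah", "abc#xyz.au;blah1", "abc#xyz.ru:blah2",
            "abc#xyz.ru|blah,2", "abc#xyz.ru||blah2"
        };
        for (String sample : samples) {
            String[] parts = parse(sample);
            System.out.println(parts == null
                    ? "NO MATCH: " + sample
                    : "Email: " + parts[0] + ", Value: " + parts[1]);
        }
    }
}
```

Note that group(0) is always the entire match, so the email and the value come from group(1) and group(2).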
I am trying to use Regex to extract the values from a string and use them for the further processing.
The string I have is :
String tring = "Format_FRMT: <<<$gen>>>(((valu e))) <<<$gen>>>(((value 13231)))\n<<<$gen>>>(((value 13231)))";
Regex pattern I have made is :
Pattern p = Pattern.compile("\\<{3}\\$([\\w ]+)\\>{3}\\s?\\({3}([\\w ]+)\\){3}");
When I am running the whole program
Matcher m = p.matcher(tring);
String[] try1 = new String[m.groupCount()];
for(int i = 1 ; i<= m.groupCount();i++)
{
try1[i] = m.group(i);
//System.out.println("group - i" +try1[i]+"\n");
}
I am getting
No match found
Can anybody help me with this? Where exactly is this going wrong?
My first aim is just to see whether I can get the values into the corresponding groups. If that works, I would like to use them for further processing.
Thanks
Here is an example of how to get all the values you need with find():
String tring = "CHARDATA_FRMT: <<<$gen>>>(((valu e))) <<<$gen>>>(((value 13231)))\n<<<$gen>>>(((value 13231)))";
Pattern p = Pattern.compile("<{3}\\$([\\w ]+)>{3}\\s?\\({3}([\\w ]+)\\){3}");
Matcher m = p.matcher(tring);
while (m.find()){
System.out.println("Gen: " + m.group(1) + ", and value: " + m.group(2));
}
Note that you do not have to escape < and > in Java regex.
After you create the Matcher and before you reference its groups, you must call one of the methods that attempts the actual match, like find, matches, or lookingAt. For example:
Matcher m = p.matcher(tring);
if (!m.find()) return; // <---- Add something like this
String[] try1 = new String[m.groupCount()];
You should read the javadocs on the Matcher class to decide which of the above methods makes sense for your data and application. http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html
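The point about calling find() first can be seen in a short sketch. It shows that group() throws an IllegalStateException on a fresh Matcher, and it copies the groups into an array only after a successful find(). The class and method names are hypothetical. The array index is also shifted by one, because groups are numbered from 1 while array slots start at 0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class illustrating Matcher state.
class MatcherStateDemo {

    private static final Pattern P =
            Pattern.compile("<{3}\\$([\\w ]+)>{3}\\s?\\({3}([\\w ]+)\\){3}");

    // Returns the capturing groups of the first match, or an empty array if there is none.
    static String[] firstMatchGroups(String input) {
        Matcher m = P.matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] groups = new String[m.groupCount()];
        for (int i = 1; i <= m.groupCount(); i++) {
            groups[i - 1] = m.group(i); // group i goes into slot i - 1
        }
        return groups;
    }

    // Returns true if group() throws because no match has been attempted yet.
    static boolean groupBeforeFindThrows(String input) {
        try {
            P.matcher(input).group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String input = "<<<$gen>>>(((valu e)))";
        System.out.println("group() before find() throws: " + groupBeforeFindThrows(input));
        for (String g : firstMatchGroups(input)) {
            System.out.println("group: " + g);
        }
    }
}
```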
I want to find every instance of a number, followed by a comma (no space), followed by any number of characters in a string. I was able to get a regex to find all the instances of what I was looking for, but I want to print them individually rather than all together. I'm new to regex in general, so maybe my pattern is wrong?
This is my code:
String test = "1 2,A 3,B 4,23";
Pattern p = Pattern.compile("\\d+,.+");
Matcher m = p.matcher(test);
while(m.find()) {
System.out.println("found: " + m.group());
}
This is what it prints:
found: 2,A 3,B 4,23
This is what I want it to print:
found: 2,A
found: 3,B
found: 4,23
Thanks in advance!
try this regex
Pattern p = Pattern.compile("\\d+,.+?(?= |$)");
You could take an easier route and split by space, then ignore anything without a comma:
String[] values = test.split(" ");
for (String value : values) {
if (value.contains(",")) {
System.out.println("found: " + value);
}
}
What you apparently left out of your requirements statement is where "any number of characters" is supposed to end. As it stands, it ends at the end of the string; from your sample output, it seems you want it to end at the first space.
Try this pattern: "\\d+,[^\\s]*"
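A quick sketch comparing the original greedy pattern with one that stops at whitespace, using the test string from the question. The class name NumberCommaDemo and the helper findAll are assumptions for illustration. \\S* is used here as an equivalent shorthand for [^\\s]*.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class.
class NumberCommaDemo {

    // Returns every full match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String test = "1 2,A 3,B 4,23";

        // Greedy .+ runs to the end of the string, so there is only one match.
        System.out.println(findAll("\\d+,.+", test));

        // \S* stops at the first whitespace character, giving one match per item.
        for (String match : findAll("\\d+,\\S*", test)) {
            System.out.println("found: " + match);
        }
    }
}
```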
UPDATE: Thanks for all the great responses! I tried many different regex patterns but didn't understand why m.matches() was not doing what I think it should be doing. When I switched to m.find() instead, as well as adjusting the regex pattern, I was able to get somewhere.
I'd like to match a pattern in a Java string and then extract the portion matched using a regex (like Perl's $& operator).
This is my source string "s": DTSTART;TZID=America/Mexico_City:20121125T153000
I want to extract the portion "America/Mexico_City".
I thought I could use Pattern and Matcher and then extract using m.group() but it's not working as I expected. I've tried monkeying with different regex strings and the only thing that seems to hit on m.matches() is ".*TZID.*" which is pointless as it just returns the whole string. Could someone enlighten me?
Pattern p = Pattern.compile ("TZID*:"); // <- change to "TZID=([^:]*):"
Matcher m = p.matcher (s);
if (m.matches ()) // <- change to m.find()
Log.d (TAG, "looking at " + m.group ()); // <- change to m.group(1)
You use m.matches(), which tries to match the whole string. If you use m.find() instead, it searches for a match anywhere inside the string. I also improved your regexp a bit to exclude the TZID= prefix using a zero-width lookbehind:
Pattern p = Pattern.compile("(?<=TZID=)[^:]+");
Matcher m = p.matcher ("DTSTART;TZID=America/Mexico_City:20121125T153000");
if (m.find()) {
System.out.println(m.group());
}
This should work nicely:
Pattern p = Pattern.compile("TZID=(.*?):");
Matcher m = p.matcher(s);
if (m.find()) {
String zone = m.group(1); // capturing groups are numbered from 1; group(0) is the whole match
. . .
}
An alternative regex is "TZID=([^:]*)". I'm not sure which is faster.
You are using the wrong pattern, try this:
Pattern p = Pattern.compile(".*?TZID=([^:]+):.*");
Matcher m = p.matcher (s);
if (m.matches ())
Log.d (TAG, "looking at " + m.group(1));
.*? matches anything at the beginning up to TZID=. Then TZID= matches literally, the group captures everything up to the next :, the : itself is matched, and .* matches the rest of the string. Because the pattern now covers the whole input, matches() succeeds and the value you need is in group(1).
You are missing a dot before the asterisk. Your expression matches TZI followed by any number of uppercase Ds and then a colon.
Pattern p = Pattern.compile ("TZID[^:]*:");
You should also add a capturing group unless you want to capture everything, including the "TZID" and the ":"
Pattern p = Pattern.compile ("TZID=([^:]*):");
Finally, you should use the right API to search the string, rather than attempting to match the string in its entirety.
Pattern p = Pattern.compile("TZID=([^:]*):");
Matcher m = p.matcher("DTSTART;TZID=America/Mexico_City:20121125T153000");
if (m.find()) {
System.out.println(m.group(1));
}
This prints
America/Mexico_City
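To show why the choice of Matcher method matters here, the following sketch applies matches(), lookingAt() and find() to the same pattern and input. The class name MatchesVersusFind and the method extractZone are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the three matching methods.
class MatchesVersusFind {

    static final Pattern TZID = Pattern.compile("TZID=([^:]*):");

    // Returns the time zone id, or null if the input has no TZID parameter.
    static String extractZone(String s) {
        Matcher m = TZID.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "DTSTART;TZID=America/Mexico_City:20121125T153000";

        // matches() requires the pattern to cover the entire input.
        System.out.println("matches():   " + TZID.matcher(s).matches());

        // lookingAt() requires the match to start at the beginning of the input.
        System.out.println("lookingAt(): " + TZID.matcher(s).lookingAt());

        // find() searches for the pattern anywhere in the input.
        System.out.println("find():      " + extractZone(s));
    }
}
```

With this input, only find() succeeds, because the pattern describes a fragment in the middle of the string.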
Why not simply use split as:
String origStr = "DTSTART;TZID=America/Mexico_City:20121125T153000";
String str = origStr.split(":")[0].split("=")[1];
I'm a Java user but I'm new to regular expressions.
I just want to have a tiny expression that, given a word (we assume that the string is only one word), answers with a boolean, telling if the word is valid or not.
An example: I want to catch all words that could plausibly be in a dictionary. So I just want words made of the characters a-z and A-Z, a hyphen (for example: man-in-the-middle) and an apostrophe (like I'll or Tiffany's).
Valid words:
"food"
"RocKet"
"man-in-the-middle"
"kahsdkjhsakdhakjsd"
"JESUS", etc.
Non-valid words:
"gipsy76"
"www.google.com"
"me#gmail.com"
"745474"
"+-x/", etc.
I use this code, but it doesn't give the correct answer:
Pattern p = Pattern.compile("[A-Za-z&-&']");
Matcher m = p.matcher(s);
System.out.println(m.matches());
What's wrong with my regex?
Add a + after the expression to say "one or more of those characters":
Escape the hyphen with \ (or put it last).
Remove those & characters:
Here's the code:
Pattern p = Pattern.compile("[A-Za-z'-]+");
Matcher m = p.matcher(s);
System.out.println(m.matches());
Complete test:
String[] ok = {"food","RocKet","man-in-the-middle","kahsdkjhsakdhakjsd","JESUS"};
String[] notOk = {"gipsy76", "www.google.com", "me#gmail.com", "745474","+-x/" };
Pattern p = Pattern.compile("[A-Za-z'-]+");
for (String shouldMatch : ok)
if (!p.matcher(shouldMatch).matches())
System.out.println("Error on: " + shouldMatch);
for (String shouldNotMatch : notOk)
if (p.matcher(shouldNotMatch).matches())
System.out.println("Error on: " + shouldNotMatch);
(Produces no output.)
This should work:
"[A-Za-z'-]+"
But "-word" and "word-" are not valid, so you can use this pattern instead (note that it allows hyphens only, so words with an apostrophe such as "I'll" are rejected):
WORD_EXP = "^[A-Za-z]+(-[A-Za-z]+)*$"
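As a sketch that combines the ideas in this thread, the pattern below accepts letters separated by single hyphens or apostrophes and rejects a separator at the start or end, or two in a row. The class name WordValidator and the method isValid are made up for illustration, and this is one possible reading of "plausible dictionary word", not the only one.

```java
import java.util.regex.Pattern;

// Hypothetical validator, a sketch under the assumptions stated above.
class WordValidator {

    // One or more letters, then any number of (separator + letters) groups.
    private static final Pattern WORD =
            Pattern.compile("[A-Za-z]+(?:['-][A-Za-z]+)*");

    // matches() requires the whole word to fit the pattern, so no ^ or $ is needed.
    static boolean isValid(String word) {
        return WORD.matcher(word).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "food", "man-in-the-middle", "I'll", "Tiffany's",
            "-word", "word-", "man--in", "gipsy76", "www.google.com"
        };
        for (String sample : samples) {
            System.out.println(sample + " -> " + isValid(sample));
        }
    }
}
```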
Regex - /^([a-zA-Z]*('|-)?[a-zA-Z]+)*$/
You can use the regex above if you don't want consecutive "'" or "-" characters.
The trailing $ makes it check the whole text rather than just a prefix.
It accepts
man-in-the-middle
asd'asdasd'asd
It rejects following string
man--in--midle
asdasd''asd
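Hi Aloob, please check this. It is a bit lengthy and there may be a shorter version, but still...
[A-z]*||[[A-z]*[-]*]*||[[A-z]*[-]*[']*]*
Note that [A-z] also matches the characters [, \, ], ^, _ and ` that sit between Z and a in ASCII, so [A-Za-z] is the safer range.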