Java regex - get specific part of string

I'm trying to access a certain part of multiple strings that follow a pattern.
Here's an example of what I'm trying to do.
String s = "Hello my name is Joe";
if(Pattern.matches(s,"Hello my name is ([\\w]*)"))
{
System.out.println("Name entered: $1");
}
However, my code never enters the body of the if statement.

Swap the parameters to the matches method, and your if will work (regex is the 1st parameter, not the second).
However, you still won't print the first capturing group with $1. To do so:
String s = "Hello my name is Joe";
Matcher m = Pattern.compile("Hello my name is ([\\w]*)").matcher(s);
if(m.matches())
{
System.out.println("Name entered: " + m.group(1));
}
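As a minimal sketch of the two points above (the class name ArgumentOrderDemo and the helper extractName are made up for illustration): Pattern.matches takes the regex first and the input second, and the captured text is read from the Matcher with group(1).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ArgumentOrderDemo {
    // Hypothetical helper: returns the captured name, or null when the input does not match.
    static String extractName(String input) {
        Matcher m = Pattern.compile("Hello my name is (\\w*)").matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "Hello my name is Joe";
        String regex = "Hello my name is (\\w*)";
        // Correct order: regex first, input second.
        System.out.println(Pattern.matches(regex, s));
        // Swapped order: the sentence is compiled as the regex and tested against the pattern text.
        System.out.println(Pattern.matches(s, regex));
        System.out.println("Name entered: " + extractName(s));
    }
}
```

With the arguments swapped, the call still compiles and runs, which is why the mistake is easy to miss.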

I think that you are looking for this:
final String s = "Hello my name is Joe";
final Pattern p = Pattern.compile("Hello my name is (\\w++)");
final Matcher m = p.matcher(s);
if (m.matches()) {
System.out.printf("Name entered: %s\n", m.group(1));
}
This captures the \w++ group value only if p matches the entire content of the String. I've replaced \w* with the possessive \w++ to exclude zero-length matches and eliminate backtracking.
For further reference take a look at The Java Tutorial > Essential Classes - Lesson: Regular Expressions.
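To illustrate the difference between \w* and \w++ described above, here is a small sketch (the class QuantifierDemo and the helper matchesWith are hypothetical names). It feeds both patterns an input where the name is missing.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper: true when the regex matches the whole input.
    static boolean matchesWith(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String noName = "Hello my name is ";
        // \w* accepts an empty name, so the whole string still matches.
        System.out.println(matchesWith("Hello my name is (\\w*)", noName));
        // \w++ needs at least one word character, so the match fails.
        System.out.println(matchesWith("Hello my name is (\\w++)", noName));
        // With a name present, the possessive version matches as expected.
        System.out.println(matchesWith("Hello my name is (\\w++)", "Hello my name is Joe"));
    }
}
```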

Try using the Matcher class, see this page for more information about Regular Expressions in Java http://java.sun.com/developer/technicalArticles/releases/1.4regex/

Because you have the parameters to Pattern.matches() backwards.
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
Also, you don't need a character class (the square brackets) around \w. In addition, you are going to need to use the Matcher class and get the backreference via the group() method.

Related

How to parse string using regex

I'm pretty new to java, trying to find a way to do this better. Potentially using a regex.
String text = test.get(i).toString()
// text looks like this in string form:
// EnumOption[enumId=test,id=machine]
String checker = text.replace("[","").replace("]","").split(",")[1].split("=")[1];
// checker becomes machine
My goal is to parse that text string and just return back machine. Which is what I did in the code above.
But that looks ugly. I was wondering what kinda regex can be used here to make this a little better? Or maybe another suggestion?

Use a regex lookbehind:
(?<=\bid=)[^],]*
(?<= ) // Start matching only after what matches inside
\bid= // Match "\bid=" (= word boundary then "id="),
[^],]* // Match and keep the longest sequence without any ']' or ','
In Java, use it like this:
import java.util.regex.*;
class Main {
public static void main(String[] args) {
Pattern pattern = Pattern.compile("(?<=\\bid=)[^],]*");
Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");
if (matcher.find()) {
System.out.println(matcher.group(0));
}
}
}
This results in
machine

Assuming you’re using the Polarion ALM API, you should use the EnumOption’s getId method instead of deparsing and re-parsing the value via a string:
String id = test.get(i).getId();

Using the replace and split functions doesn't take the structure of the data into account.
If you want to use a regex, you can use a capturing group without any lookarounds, where the enumId value can be any characters except =, a comma and ], and the id value can be any characters except ].
The value of id will be in capture group 1.
\bEnumOption\[enumId=[^=,\]]+,id=([^\]]+)\]
Explanation
\bEnumOption Match EnumOption preceded by a word boundary
\[enumId= Match [enumId=
[^=,\]]+, Match 1+ times any char except = , and ]
id= Match literally
( Capture group 1
[^\]]+ Match 1+ times any char except ]
)\] Close group 1 and match ]
Pattern pattern = Pattern.compile("\\bEnumOption\\[enumId=[^=,\\]]+,id=([^\\]]+)\\]");
Matcher matcher = pattern.matcher("EnumOption[enumId=test,id=machine]");
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Output
machine
If there can be more comma separated values, you could also only match id making use of negated character classes [^][]* before and after matching id to stay inside the square bracket boundaries.
\bEnumOption\[[^][]*\bid=([^,\]]+)[^][]*\]
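In Java, an unescaped [ inside a character class starts a nested class, so escape both brackets there:
String regex = "\\bEnumOption\\[[^\\]\\[]*\\bid=([^,\\]]+)[^\\]\\[]*\\]";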
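A minimal sketch of the "more comma-separated values" case, assuming a made-up extra field name=foo after id and a hypothetical helper extractId. Both brackets are escaped inside the negated character classes, which is always valid in Java.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EnumOptionIdDemo {
    // Stay inside the square brackets and capture the value of the id field in group 1.
    private static final Pattern ID = Pattern.compile(
            "\\bEnumOption\\[[^\\]\\[]*\\bid=([^,\\]]+)[^\\]\\[]*\\]");

    // Hypothetical helper: returns the id value, or null when no id field is found.
    static String extractId(String text) {
        Matcher m = ID.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractId("EnumOption[enumId=test,id=machine]"));
        // The trailing name field is an assumption made for this example.
        System.out.println(extractId("EnumOption[enumId=test,id=machine,name=foo]"));
    }
}
```

The word boundary before id keeps the pattern from treating the tail of enumId as the id field.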

A regex can of course be used, but it is sometimes less performant, less readable and more bug-prone.
I would advise you not to use any regex that you did not come up with yourself, or at least understand completely.
PS: I think your solution is actually quite readable.
Here's another non-regex version:
String text = "EnumOption[enumId=test,id=machine]";
text = text.substring(text.lastIndexOf('=') + 1);
text = text.substring(0, text.length() - 1);
Not doing you a favor, but the downvote hurt, so here you go:
String input = "EnumOption[enumId=test,id=machine]";
Matcher matcher = Pattern.compile("EnumOption\\[enumId=(.+),id=(.+)\\]").matcher(input);
if(!matcher.matches()) {
throw new RuntimeException("unexpected input: " + input);
}
System.out.println("enumId: " + matcher.group(1));
System.out.println("id: " + matcher.group(2));

Java/Scala Extract email and a string with format email[delimiter]string

I have a bunch of strings that I'm looking to parse in the following format and extract just the email and string which is followed by a delimiter
email[delimiter]string
In other words
[email with any ascii characters][delimiter][string with any ascii characters]
The delimiters can be ,;:| or ||
e.g.
abc#xyz.com,blah
abc#xyz.au;blah1
abc#xyz.ru:blah2
abc#xyz.ru|blah,2
abc#xyz.ru||blah2
So far I have the following regex, which matches the strings above. How can I modify it to form appropriate groups, so that I extract only the email and the string that follows the delimiter in Java/Scala?
.+#.+([:;,|])+.+$
The java code would look something like this:
// Create a Pattern object
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(line);
if (m.find()) {
System.out.println("Email: " + m.group(0));
System.out.println("Value: " + m.group(1));
} else {
System.out.println("NO MATCH");
}

You seem to have worked out the regex part for yourself. I have a suggestion for result extraction: use kantan.regex.
This allows you to write:
import kantan.regex.implicits._
// Declare your regular expression, validated at compile time.
val regex = rx"(.+#[A-Za-z0-9.]+)(?:[:;,|]+)(.*)"
// Sample input
val input = "abc#xyz.com,blah"
// Returns an Iterator[(String, String)] on all matches, where
// ._1 is the email and ._2 the string
input.evalRegex[(String, String)](regex)
Note that you might want to use better typed values for this - a case class rather than a (String, String), say. This is also possible - you can either provide decoders yourself, or let shapeless derive them:
import kantan.regex.generic._
// Case class in which to store results.
case class MailMatch(mail: String, value: String)
// Returns an Iterator[MailMatch]
input.evalRegex[MailMatch](regex)
Full disclosure: I'm the author.

So, answering my own question with what I got working. Regex experts - any holes you can find here, please?
Pattern COMPILE = Pattern.compile("(.+#[A-Za-z0-9.\"]+)(?:[:;,|]+)(.*)");
Matcher m = COMPILE.matcher(next);
if (m.find()) {
System.out.println(m.group(1));
System.out.println(m.group(2));
} else {
System.out.println("NO MATCH");
}
EDIT : Edited to use non capturing group as per MYGz's answer

(\\w+#[\\w.]+)[:;,|]+(.+)$
Then use Java to extract the groups from the Match. Group 1 is the email and group 2 is the string after the delimiter.
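The group extraction can be sketched as follows. This is one possible pattern under stated assumptions, not a full email validator: the class EmailDelimiterDemo and the helper split are hypothetical, and the # separator is kept exactly as it appears in the sample lines.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailDelimiterDemo {
    // Group 1: the address (text, '#', then letters, digits or dots).
    // Then one or more delimiter characters, which are not captured.
    // Group 2: everything after the delimiter run.
    private static final Pattern LINE =
            Pattern.compile("([^#]+#[A-Za-z0-9.]+)[:;,|]+(.*)");

    // Hypothetical helper: returns {email, value}, or null when the line does not match.
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] lines = {
            "abc#xyz.com,blah", "abc#xyz.au;blah1", "abc#xyz.ru:blah2",
            "abc#xyz.ru|blah,2", "abc#xyz.ru||blah2"
        };
        for (String line : lines) {
            String[] parts = split(line);
            System.out.println(parts == null ? "NO MATCH" : parts[0] + " -> " + parts[1]);
        }
    }
}
```

Because the domain part excludes the delimiter characters, a comma inside the value (as in blah,2) stays in group 2.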

Regex matcher - No match found

I am trying to use Regex to extract the values from a string and use them for the further processing.
The string I have is :
String tring =Format_FRMT: <<<$gen>>>(((valu e))) <<<$gen>>>(((value 13231)))
<<<$gen>>>(((value 13231)))
Regex pattern I have made is :
Pattern p = Pattern.compile("\\<{3}\\$([\\w ]+)\\>{3}\\s?\\({3}([\\w ]+)\\){3}");
When I am running the whole program
Matcher m = p.matcher(tring);
String[] try1 = new String[m.groupCount()];
for(int i = 1 ; i<= m.groupCount();i++)
{
try1[i] = m.group(i);
//System.out.println("group - i" +try1[i]+"\n");
}
I am getting
No match found
Can anybody help me with this? Where exactly is this going wrong?
My first aim is just to see whether I am able to get the values in the corresponding groups. If that works, I would like to use them for further processing.
Thanks

Here is an example of how to get all the values you need with find():
String tring = "CHARDATA_FRMT: <<<$gen>>>(((valu e))) <<<$gen>>>(((value 13231)))\n<<<$gen>>>(((value 13231)))";
Pattern p = Pattern.compile("<{3}\\$([\\w ]+)>{3}\\s?\\({3}([\\w ]+)\\){3}");
Matcher m = p.matcher(tring);
while (m.find()){
System.out.println("Gen: " + m.group(1) + ", and value: " + m.group(2));
}
Note that you do not have to escape < and > in Java regex.

After you create the Matcher and before you reference its groups, you must call one of the methods that attempts the actual match, like find, matches, or lookingAt. For example:
Matcher m = p.matcher(tring);
if (!m.find()) return; // <---- Add something like this
String[] try1 = new String[m.groupCount()];
You should read the javadocs on the Matcher class to decide which of the above methods makes sense for your data and application. http://docs.oracle.com/javase/7/docs/api/java/util/regex/Matcher.html
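A short sketch of that rule, using the pattern from the question (the class MatchBeforeGroupDemo and its helper names are made up for illustration): calling group before any match attempt throws IllegalStateException, while looping on find visits every match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchBeforeGroupDemo {
    static final Pattern P =
            Pattern.compile("<{3}\\$([\\w ]+)>{3}\\s?\\({3}([\\w ]+)\\){3}");

    // Returns true when group(1) fails because no match has been attempted yet.
    static boolean groupWithoutMatchThrows(String input) {
        Matcher m = P.matcher(input);
        try {
            m.group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // Counts the matches by calling find() until it returns false.
    static int countMatches(String input) {
        Matcher m = P.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "<<<$gen>>>(((valu e))) <<<$gen>>>(((value 13231)))";
        System.out.println(groupWithoutMatchThrows(input));
        System.out.println(countMatches(input));
    }
}
```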

forming correct regular expression in dynamic string

I have a FileInputStream that reads a file which somewhere contains a substring looking like:
...
OperatorSpecific(XXX)
{
Customer(someContent)
SaveImage()
{
...
I would like to identify the Customer(someContent) part of the string and switch the someContent inside the parenthesis for something else.
someContent will be a dynamic parameter and will contain a string of maybe 5-10 chars.
I have used regEx before, like once or twice, but I feel that in a context such as this where I don't know what value will be inside the parenthesis I'm at a loss of how I should express it...
In summary I want to have a string returned to me which has my someContent value inside the Customer-parenthesis.
Does anyone have any bright ideas of how to get this done?

Try this one (double the escaping backslashes when you use it in a Java string literal):
(?<=Customer\()[^\)]*
And replace with your content.
(?<=Customer\() is a lookbehind assertion. At every position it checks whether there is a "Customer(" on the left. If so, [^\)]* matches all following characters that are not a ")", and this is the part that will be replaced.
Some working java code
Pattern p = Pattern.compile("(?<=Customer\\()[^\\)]*");
String original = "Customer(someContent)";
String replacement = "NewContent";
Matcher m = p.matcher(original);
String result = m.replaceAll(replacement);
System.out.println(result);
This will print
Customer(NewContent)
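Since the new content is dynamic, it may contain $ or \, which replaceAll treats as group references and escapes. A hedged sketch (the class CustomerReplaceDemo and the helper replaceCustomer are hypothetical) that passes the replacement through Matcher.quoteReplacement first:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CustomerReplaceDemo {
    // Matches whatever sits between "Customer(" and the next ")".
    private static final Pattern CUSTOMER = Pattern.compile("(?<=Customer\\()[^)]*");

    // Hypothetical helper: swaps the content of Customer(...) for newContent, taken literally.
    static String replaceCustomer(String text, String newContent) {
        return CUSTOMER.matcher(text).replaceAll(Matcher.quoteReplacement(newContent));
    }

    public static void main(String[] args) {
        String text = "OperatorSpecific(XXX)\n{\nCustomer(someContent)\nSaveImage()\n{";
        System.out.println(replaceCustomer(text, "NewContent"));
        // Without quoteReplacement, "$1" would be read as a reference to group 1.
        System.out.println(replaceCustomer("Customer(someContent)", "US$1"));
    }
}
```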

Using a capturing group with a non-greedy quantifier works:
String s =
"OperatorSpecific(XXX)\n {\n" +
" Customer(someContent)\n" +
" SaveImage() {";
Pattern p = Pattern.compile("Customer\\((.*?)\\)");
Matcher matcher = p.matcher(s);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
will print
someContent
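Why the non-greedy quantifier matters can be sketched with a single-line input where another ")" follows on the same line (the class GreedyVsLazyDemo and the helper firstGroup are made-up names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazyDemo {
    // Hypothetical helper: returns group 1 of the first match, or null when there is none.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "Customer(someContent) SaveImage()";
        // Greedy: .* runs to the last ")" on the line.
        System.out.println(firstGroup("Customer\\((.*)\\)", line));
        // Non-greedy: .*? stops at the first ")".
        System.out.println(firstGroup("Customer\\((.*?)\\)", line));
    }
}
```

In the multi-line example above the greedy form would also give someContent, because . does not match line terminators by default; the difference only shows when the closing parentheses share a line.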

Untested, but something like the following should work:
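Pattern pattern = Pattern.compile("\\s+Customer\\(\\s*(\\w+)\\s*\\)\\s*");
Matcher matcher = pattern.matcher(input);
// find() looks for the pattern anywhere in the input; group(1) is only valid after a successful match
if (matcher.find()) {
System.out.println(matcher.group(1));
}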
EDIT
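This of course won't cover every possible case. \w matches letters, digits and underscores but not $, so of these two legal variable names the first value is captured and the second is not:
Customer(_someContent)
Customer($some_Content)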

Regular expression matching "dictionary words"

I'm a Java user but I'm new to regular expressions.
I just want to have a tiny expression that, given a word (we assume that the string is only one word), answers with a boolean, telling if the word is valid or not.
An example: I want to catch all words that could plausibly be in a dictionary. So I just want words with characters from a-z and A-Z, a hyphen (for example: man-in-the-middle) and an apostrophe (like I'll or Tiffany's).
Valid words:
"food"
"RocKet"
"man-in-the-middle"
"kahsdkjhsakdhakjsd"
"JESUS", etc.
Non-valid words:
"gipsy76"
"www.google.com"
"me#gmail.com"
"745474"
"+-x/", etc.
I use this code, but it doesn't give the correct answer:
Pattern p = Pattern.compile("[A-Za-z&-&']");
Matcher m = p.matcher(s);
System.out.println(m.matches());
What's wrong with my regex?

Three changes are needed:
Add a + after the character class to say "one or more of those characters".
Escape the hyphen with \ (or put it last in the class).
Remove the & characters.
Here's the code:
Pattern p = Pattern.compile("[A-Za-z'-]+");
Matcher m = p.matcher(s);
System.out.println(m.matches());
Complete test:
String[] ok = {"food","RocKet","man-in-the-middle","kahsdkjhsakdhakjsd","JESUS"};
String[] notOk = {"gipsy76", "www.google.com", "me#gmail.com", "745474","+-x/" };
Pattern p = Pattern.compile("[A-Za-z'-]+");
for (String shouldMatch : ok)
if (!p.matcher(shouldMatch).matches())
System.out.println("Error on: " + shouldMatch);
for (String shouldNotMatch : notOk)
if (p.matcher(shouldNotMatch).matches())
System.out.println("Error on: " + shouldNotMatch);
(Produces no output.)

This should work:
"[A-Za-z'-]+"

But "-word" and "word-" are not valid. So you can use this pattern:
WORD_EXP = "^[A-Za-z]+(-[A-Za-z]+)*$"
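The same idea can be extended to apostrophes. The sketch below is an assumption layered on top of the pattern above, not part of the original answer: it allows a single hyphen or apostrophe only between runs of letters. The class DictionaryWordDemo and the helper isWord are hypothetical names.

```java
import java.util.regex.Pattern;

class DictionaryWordDemo {
    // Letters, optionally followed by groups of one ' or - and more letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+(?:['-][A-Za-z]+)*");

    // Hypothetical helper: true when the whole string is a plausible dictionary word.
    static boolean isWord(String s) {
        return WORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "food", "man-in-the-middle", "Tiffany's", "I'll",
            "-word", "word-", "gipsy76", "www.google.com"
        };
        for (String s : samples) {
            System.out.println(s + " : " + isWord(s));
        }
    }
}
```

Because matches() tests the whole input, the ^ and $ anchors are not needed here.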

Regex - /^([a-zA-Z]*('|-)?[a-zA-Z]+)*$/
You can use the regex above if you don't want successive "'" or "-" characters.
The trailing $ makes it test the whole text rather than just a prefix.
It accepts
man-in-the-middle
asd'asdasd'asd
It rejects following string
man--in--midle
asdasd''asd

Hi Aloob, please try this. It is a bit lengthy and there may be a shorter version, but still:
[A-z]*||[[A-z]*[-]*]*||[[A-z]*[-]*[']*]*
