What would be the best way to parse the following string in Java using a single regex?
String:
someprefix foo=someval baz=anotherval baz=somethingelse
I need to extract someprefix, someval, anotherval and somethingelse. The string always contains a prefix value (someprefix in the example) and can have from 0 to 4 key-value pairs (foo=someval baz=anotherval baz=somethingelse in the example).
You can use this regex to capture the text you need:
(?<==|^)\w+
It matches a word that is either preceded by an = character or located at the start of the string (^).
Sample Java code:
Pattern p = Pattern.compile("(?<==|^)\\w+");
String s = "someprefix foo=someval baz=anotherval baz=somethingelse";
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group());
}
Prints,
someprefix
someval
anotherval
somethingelse
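As a minimal sketch of the same idea, the matches can be collected into a list instead of printed, which also makes the zero-pair case easy to check. The class name PrefixAndValues and the helper extract are hypothetical names chosen for this illustration; the pattern is the one from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixAndValues {
    // A word that either starts the string or directly follows '='.
    private static final Pattern VALUE = Pattern.compile("(?<==|^)\\w+");

    // Hypothetical helper: returns the prefix followed by every value, in order.
    static List<String> extract(String input) {
        List<String> values = new ArrayList<>();
        Matcher m = VALUE.matcher(input);
        while (m.find()) {
            values.add(m.group());
        }
        return values;
    }

    public static void main(String[] args) {
        // Prefix plus three key-value pairs.
        System.out.println(extract("someprefix foo=someval baz=anotherval baz=somethingelse"));
        // Prefix only, zero key-value pairs.
        System.out.println(extract("someprefix"));
    }
}
```

The keys (foo, baz) are skipped because they are neither at the start of the string nor preceded by =.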
The string is "LinksImagesListCodeHt1233ddmlImagesConsider112dd2Download". I want to get "ImagesConsider112dd2Download", so I used the expression "Images.*?Download", but it matches "ImagesListCodeHt1233ddmlImagesConsider112dd2Download". What should the correct expression be?
For now, there is an ugly way to solve this problem:
Pattern p = Pattern.compile(StringUtils.reverse("Download")+ ".*?" + StringUtils.reverse("Images") );
String s = "LinksImagesListCodeHt1233ddmlImagesConsider112dd2Download";
s = StringUtils.reverse(s);
Matcher m = p.matcher(s);
while (m.find()){
m.end();
System.out.println(StringUtils.reverse(m.group()));
}
To match the text from Images to Download that does not contain the word Images in between, you can use a negative lookahead like this:
Images((?!Images).)*Download
Explanation
Images -- Match literal string Images
(?!Images). -- Match any single character, provided the word Images does not start at that position
((?!Images).)* -- Repeat that zero or more times
Download -- Match literal string Download
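The pattern above can be tried in Java roughly as follows. This is a sketch only: the class name LastImagesToDownload and the helper between are hypothetical, and the helper returns just the first match, or null when there is none.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LastImagesToDownload {
    // "Images", then characters that never begin another "Images", then "Download".
    private static final Pattern P = Pattern.compile("Images((?!Images).)*Download");

    // Hypothetical helper: returns the first match, or null if there is none.
    static String between(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String s = "LinksImagesListCodeHt1233ddmlImagesConsider112dd2Download";
        System.out.println(between(s)); // ImagesConsider112dd2Download
    }
}
```

The attempt that starts at the first Images fails because the repeated group cannot step over the second Images, so the engine moves on and succeeds from the second one. No string reversal is needed.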
I have the below java string in the below format.
String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
Using the Matcher and Pattern classes from the java.util.regex package, I have to get the output string in the following format:
Output: [NYK:1100][CLT:2300][KTY:3540]
Can you suggest a RegEx pattern which can help me get the above output format?
You can use the regex \[name:([A-Z]+)\]\[distance:(\d+)\] with Pattern like this:
String regex = "\\[name:([A-Z]+)\\]\\[distance:(\\d+)\\]";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
StringBuilder result = new StringBuilder();
while (matcher.find()) {
result.append("[");
result.append(matcher.group(1));
result.append(":");
result.append(matcher.group(2));
result.append("]");
}
System.out.println(result.toString());
Output
[NYK:1100][CLT:2300][KTY:3540]
\[name:([A-Z]+)\]\[distance:(\d+)\] captures two groups: the first, ([A-Z]+), captures the uppercase letters after name:, and the second, (\d+), captures the number after distance:.
Another solution, from #tradeJmark, is to use named groups in the regex:
String regex = "\\[name:(?<name>[A-Z]+)\\]\\[distance:(?<distance>\\d+)\\]";
This way you can get the result of each group by its name instead of its index, like this:
while (matcher.find()) {
result.append("[");
result.append(matcher.group("name"));
//----------------------------^^
result.append(":");
result.append(matcher.group("distance"));
//------------------------------^^
result.append("]");
}
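Put together, the named-group variant might look like the sketch below. The class name NamedGroupFormatter and the method reformat are hypothetical names for this illustration; the regex and the loop body are the ones shown above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupFormatter {
    // Same regex as above, with the two groups named "name" and "distance".
    private static final Pattern ENTRY =
            Pattern.compile("\\[name:(?<name>[A-Z]+)\\]\\[distance:(?<distance>\\d+)\\]");

    // Hypothetical helper: rewrites every [name:X][distance:N] pair as [X:N].
    static String reformat(String s) {
        Matcher matcher = ENTRY.matcher(s);
        StringBuilder result = new StringBuilder();
        while (matcher.find()) {
            result.append('[')
                  .append(matcher.group("name"))
                  .append(':')
                  .append(matcher.group("distance"))
                  .append(']');
        }
        return result.toString();
    }

    public static void main(String[] args) {
        String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
        System.out.println(reformat(s)); // [NYK:1100][CLT:2300][KTY:3540]
    }
}
```

Because the loop runs once per match, this version does not depend on the number of [name:...][distance:...] pairs in the input.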
If the format of the string is fixed and you always have exactly 3 [...] groups to deal with, you may define a block that matches [name:...][distance:...], captures the 2 parts into separate groups, and use fairly simple code with .replaceAll:
String s = "City: [name:NYK][distance:1100] [name:CLT][distance:2300] [name:KTY][distance:3540] Price:";
String matchingBlock = "\\s*\\[name:([A-Z]+)]\\[distance:(\\d+)]";
String res = s.replaceAll(String.format(".*%1$s%1$s%1$s.*", matchingBlock),
"[$1:$2][$3:$4][$5:$6]");
System.out.println(res); // [NYK:1100][CLT:2300][KTY:3540]
The block pattern matches:
\\s* - 0+ whitespaces
\\[name: - a literal [name: substring
([A-Z]+) - Group n capturing 1 or more uppercase ASCII chars (\\w+ can also be used)
]\\[distance: - a literal ][distance: substring
(\\d+) - Group m capturing 1 or more digits
] - a ] symbol.
In the .*%1$s%1$s%1$s.* pattern, the capturing groups are numbered 1 to 6 (referenced as $1 - $6 in the replacement pattern), and the leading and trailing .* consume the start and end of the string so that they are removed (add (?s) at the start of the pattern if the string can contain line breaks).
I have a string say :
test=t1,test2=1,test3=t4
I want to find the value of test2 only when it is not equal to 1.
I know I can find its value easily with a regex like .+,test2=(.+?),.+, but that also matches when test2=1. How can I get the test2 value only if it is not equal to 1?
You can use negative lookahead assertion:
"test2=(?!1\\b)([^,]*)"
The pattern above matches test2= only if it is not followed by 1 (the word boundary \b ensures that only the exact value 1 is rejected, so numbers such as 17 still match).
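A short sketch of how that lookahead behaves on a few inputs. The class name Test2Value and the helper find are hypothetical names used only for this illustration, and the helper returns an Optional so that "no match" is explicit.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Test2Value {
    // test2= followed by anything except the exact value 1.
    private static final Pattern P = Pattern.compile("test2=(?!1\\b)([^,]*)");

    // Hypothetical helper: the test2 value, or empty when it is missing or equals 1.
    static Optional<String> find(String input) {
        Matcher m = P.matcher(input);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(find("test=t1,test2=2,test3=t4"));  // Optional[2]
        System.out.println(find("test=t1,test2=17,test3=t4")); // Optional[17]
        System.out.println(find("test=t1,test2=1,test3=t4"));  // Optional.empty
        System.out.println(find("test=t1,test2=1"));           // Optional.empty
    }
}
```

The last input shows that \b also rejects a 1 at the very end of the string, where no comma follows.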
This will work for you:
String s = "test=t1,test2=2,test3=t4";
Pattern p = Pattern.compile("test2=(?!1,)(\\d+)");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group(1));
}
I/O :
"test=t1,test2=2,test3=t4" 2
"test=t1,test2=11,test3=t4" 11
"test=t1,test2=1,test3=t4" no result
What would be the correct regular expression (that I can use in Java) if I want to extract a value from the string below?
<Name_id = bob>
I know that \<(.*?)\> will extract everything between the angle brackets but I only need to extract "bob".
The only part of the string that will change will be "bob". I also want to make sure that if someone enters =bob as the Name_id, the string that is pulled out will be just that and it doesn't mess up the regular expression.
Use capturing groups to capture the characters you want.
"<Name_id\\s+=\\s+([^>]+)>"
OR
"<Name_id\\s+=\\s+(\\w+)>"
Then print group 1. \s+ matches one or more whitespace characters and \w+ matches one or more word characters.
String i = "<Name_id = bob>";
Matcher m = Pattern.compile("<Name_id\\s+=\\s+([^>]+)>").matcher(i);
while(m.find())
{
System.out.println(m.group(1));
}
Output:
bob
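The two patterns differ on the =bob case from the question. The sketch below assumes that such an input looks like <Name_id = =bob>, which is a guess at what the asker means; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameIdExtractor {
    // Accepts any characters up to the closing '>'.
    private static final Pattern ANY = Pattern.compile("<Name_id\\s+=\\s+([^>]+)>");
    // Accepts word characters only (letters, digits, underscore).
    private static final Pattern WORD = Pattern.compile("<Name_id\\s+=\\s+(\\w+)>");

    // Hypothetical helpers: return group 1 of the first match, or null if none.
    static String extractAny(String input) {
        Matcher m = ANY.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    static String extractWord(String input) {
        Matcher m = WORD.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractAny("<Name_id = bob>"));   // bob
        System.out.println(extractWord("<Name_id = bob>"));  // bob
        System.out.println(extractAny("<Name_id = =bob>"));  // =bob
        System.out.println(extractWord("<Name_id = =bob>")); // null
    }
}
```

If values such as =bob must be returned as-is, the [^>]+ variant is the one that fits; the \w+ variant rejects them instead.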
I am trying to read a line and parse a value using regular expression in java. The line that contains the value looks something like this,
...... TESTYY912345 .......
...... TESTXX967890 ........
Basically, it contains 4 letters, then any two ASCII characters, followed by the digit 9 and then (any) digits. I want to get the values 912345 and 967890.
This is what I have so far in regular expression,
... TEST[\x00-\xff]{2}[9]{1} ...
But this skips the 9 and parses 12345 and 67890 (I want to include the 9 as well).
Thanks for your help.
You are pretty close. Match TEST\\p{ASCII}{2} first, then capture the whole group (9\\d+). This way, you'll capture the 9 and the digits that follow it:
String s = "...... TESTYY912345 ......";
Pattern p = Pattern.compile("TEST\\p{ASCII}{2}(9\\d+)");
Matcher m = p.matcher(s);
if (m.find()) {
System.out.println(m.group(1)); // 912345
}
A working expression is "TEST.{2}(9\\d*)".
final Pattern pattern = Pattern.compile("TEST.{2}(9\\d*)");
for (final String str : Arrays.asList("...... TESTYY912345 .......",
"...... TESTXX967890 ........")) {
final Matcher matcher = pattern.matcher(str);
if (matcher.find()) {
final int value = Integer.parseInt(matcher.group(1));
System.out.println(value);
}
}
Output:
912345
967890
This will match any two characters (except a line terminator) for what is XX and YY in your example, and will take any digits after the 9.
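The two answers differ only in how they match the two characters after TEST. A rough sketch of that difference follows, using a made-up input with two non-ASCII characters (written as \u00e9 escapes); the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TestValueParser {
    // Two characters restricted to ASCII, then 9 and the following digits.
    private static final Pattern ASCII_ONLY = Pattern.compile("TEST\\p{ASCII}{2}(9\\d+)");
    // Any two characters except line terminators, then 9 and any digits.
    private static final Pattern ANY_CHAR = Pattern.compile("TEST.{2}(9\\d*)");

    // Hypothetical helper: group 1 of the first match, or null if there is none.
    static String parse(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String ascii = "...... TESTYY912345 .......";
        // Made-up line: the two characters after TEST are not ASCII.
        String nonAscii = "...... TEST\u00e9\u00e9912345 .......";

        System.out.println(parse(ASCII_ONLY, ascii));    // 912345
        System.out.println(parse(ANY_CHAR, ascii));      // 912345
        System.out.println(parse(ASCII_ONLY, nonAscii)); // null
        System.out.println(parse(ANY_CHAR, nonAscii));   // 912345
    }
}
```

Which one fits depends on whether the "any two ASCII values" requirement in the question has to be enforced or is only a description of the data.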