What would be the correct regular expression (that I can use in Java) if I want to extract a value from the string below?
<Name_id = bob>
I know that \<(.*?)\> will extract everything between the angle brackets but I only need to extract "bob".
The only part of the string that will change is "bob". I also want to make sure that if someone enters =bob as the Name_id, the extracted string is exactly that and it doesn't break the regular expression.
Use capturing groups to capture the characters you want.
"<Name_id\\s+=\\s+([^>]+)>"
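OR
"<Name_id\\s+=\\s+(\\w+)>"
Then print group 1. \s+ matches one or more whitespace characters and \w+ matches one or more word characters. Note that the \w+ variant will not match a value such as =bob, because = is not a word character; use the [^>]+ variant for that.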
String i = "<Name_id = bob>";
Matcher m = Pattern.compile("<Name_id\\s+=\\s+([^>]+)>").matcher(i);
while(m.find())
{
System.out.println(m.group(1));
}
Output:
bob
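A minimal sketch of how the first pattern handles the =bob case from the question. The class and method names (NameIdExtractor, extract) are made up for illustration, and returning null for non-matching input is an assumption, not something the answer specifies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NameIdExtractor {
    // Same pattern as in the answer; group 1 is everything up to the closing '>'.
    private static final Pattern NAME_ID =
            Pattern.compile("<Name_id\\s+=\\s+([^>]+)>");

    // Returns the captured value, or null when the input does not match
    // (the null return is an assumption made for this sketch).
    static String extract(String input) {
        Matcher m = NAME_ID.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("<Name_id = bob>"));  // bob
        System.out.println(extract("<Name_id = =bob>")); // =bob
    }
}
```

Because [^>]+ accepts any character except >, a leading = in the value is captured like any other character.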
I have these Strings:
"Turtle123456_fly.me"
"birdy_12345678_prd.tr"
I want the first words of each, ie:
Turtle
birdy
I tried this:
Pattern p = Pattern.compile("//d");
String[] items = p.split(String);
but of course it's wrong. I am not familiar with using Pattern.
Replace the stuff you don't want with nothing:
String firstWord = str.replaceAll("[^a-zA-Z].*", "");
to leave only the part you want.
The regex [^a-zA-Z] means "not a letter", so everything from (and including) the first non-letter to the end of the string is removed.
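As a quick check, here is a sketch that applies the replaceAll call to both strings from the question. The FirstWord class and firstWord method are hypothetical names used only for this example.

```java
class FirstWord {
    // Removes everything from the first non-letter to the end of the string.
    static String firstWord(String str) {
        return str.replaceAll("[^a-zA-Z].*", "");
    }

    public static void main(String[] args) {
        System.out.println(firstWord("Turtle123456_fly.me"));   // Turtle
        System.out.println(firstWord("birdy_12345678_prd.tr")); // birdy
    }
}
```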
String s1 ="Turtle123456_fly.me";
String s2 ="birdy_12345678_prd.tr";
Pattern p = Pattern.compile("^([A-Za-z]+)[^A-Za-z]");
Matcher matcher = p.matcher(s1);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Explanation:
The first part ^([A-Za-z]+) is a group that captures all the letters anchored to the beginning of the input (using the ^ anchor).
The second part [^A-Za-z] matches (but does not capture) the first non-letter, and serves as a terminator for the sequence of letters.
Then all we have left to do is to fetch the group with index 1 (group 1 is what we have in the first parenthesis).
maybe you should try this \d+\w+.*
My Java program, at a certain point, receives a string containing a couple of key-value properties like this example:
param1=value Param2=values can have spaces PARAM3=values cant have equal characters
Each parameter's name/key is a single word (a-z, A-Z, _ and 0-9) followed by an = character (not separated by spaces) and its value. The value is text that can contain spaces and lasts until the end of the string or the beginning of another parameter (which is a word followed by an equals sign and its value, etc.).
I need to extract a Properties object (string-to-string map) from this string. I was trying to use regex to find each key-value set. The code is like this:
public static Properties createProperties(String str) {
Properties prop = new Properties();
Matcher matcher = Pattern.compile(regex).matcher(str); // regex: the pattern in question
while (matcher.find()) {
String match = matcher.group();
String param = ...; // What comes before '='
String value = ...; // What comes after '='
prop.setProperty(param, value);
}
return prop;
}
But the regex I wrote is not working correctly.
String regex = "(\\w+=.*)+";
Since .* tells the regex to match "anything", it matches the entire string. I want to tell the regex to search until it finds another \\w+=.* (a word followed by an equals sign and something after).
How could I write this regex? Or what would be another solution for the problem using regex?
You can use a Negative Lookahead here.
(\\w+)=((?:(?!\\s*\\w+=).)*)
The key is placed inside capturing group #1 and the value is in capturing group #2. Note that I used \s inside the lookaround in order to prevent the value from having trailing whitespace.
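A sketch of how the negative-lookahead pattern could be plugged into the createProperties method from the question. The PropertyParser class name is made up for this example; the input is the sample string from the question.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PropertyParser {
    // Group 1 is the key; group 2 is the value, which stops before
    // optional whitespace followed by the next "word=" sequence.
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=((?:(?!\\s*\\w+=).)*)");

    static Properties createProperties(String str) {
        Properties prop = new Properties();
        Matcher matcher = PAIR.matcher(str);
        while (matcher.find()) {
            prop.setProperty(matcher.group(1), matcher.group(2));
        }
        return prop;
    }

    public static void main(String[] args) {
        Properties p = createProperties(
                "param1=value Param2=values can have spaces "
                + "PARAM3=values cant have equal characters");
        System.out.println(p.getProperty("param1")); // value
        System.out.println(p.getProperty("Param2")); // values can have spaces
        System.out.println(p.getProperty("PARAM3")); // values cant have equal characters
    }
}
```

Since . does not match line terminators by default, this sketch assumes the whole input is on a single line.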
One way among several:
List<String> paramNames = new ArrayList<String>();
List<String> paramValues = new ArrayList<String>();
Pattern regex = Pattern.compile("([^\\s=]+)=([^\\s=]+)");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
paramNames.add(regexMatcher.group(1));
paramValues.add(regexMatcher.group(2));
}
The regex:
([^\\s=]+)=([^\\s=]+)
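The code retrieves keys as Group 1 and values as Group 2. Because the value pattern excludes whitespace, a value stops at the first space: for the sample input, Param2 yields only "values".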
Explanation
([^\\s=]+) captures into Group 1 one or more characters that are neither whitespace nor an equals sign
= matches the literal =
([^\\s=]+) captures into Group 2 one or more characters that are neither whitespace nor an equals sign
Your regex would be,
(\\w+=(?:(?!\\w+=).)*)
It captures a param=value pair up to the next param=. Each call to find() puts one param=value pair in group 1, so the sample input yields three matches.
Explanation:
\\w+= matches one or more word characters followed by an = symbol.
(?:(?!\\w+=).)* is a non-capturing group with a negative lookahead. It matches any character that does not start another \w+= sequence, so the match extends up to the next param=.
I need some help to save my day (or my night). I would like to match:
Any number of digits
Enclosed by round brackets "()" [The brackets contain nothing but digits]
If the closing bracket ")" is the last character in the String.
Here's the code I have come up with:
// this is how the text looks; the part I want to match is the digits in the brackets at the end of it
String text = "Some text 45 Some text, text and text (1234)";
String regex = "[no idea how to express this.....]"; // this is where the regex should be
Pattern regPat = Pattern.compile(regex);
Matcher matcher = regPat.matcher(text);
String matchedText = "";
if (matcher.find()) {
matchedText = matcher.group();
}
Please help me out with the magic expression. I have only managed to match any number of digits, but not only when they are enclosed in brackets and at the end of the line...
Thanks!
You can try this regex:
String regex = "\\(\\d+\\)$";
If you need to extract just the digits, you can use this regex:
String regex = "\\((\\d+)\\)$";
and get the value of matcher.group(1). (Explanation: The ( and ) characters preceded by backslashes match the round brackets literally; the ( and ) characters not preceded by backslashes tell the matcher that the part inside, i.e. just the digits, forms a capture group, and the part matching the group can be obtained by matcher.group(1), since this is the first, and only, capture group in the regex.)
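Put together with the code from the question, the capture-group version might look like the sketch below. The class and method names are illustrative only, and returning an empty string when nothing matches simply mirrors the matchedText = "" default in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrailingBracketDigits {
    // Literal "(", one or more digits captured in group 1, literal ")",
    // then the end of the input.
    private static final Pattern TRAILING =
            Pattern.compile("\\((\\d+)\\)$");

    // Returns the digits, or "" when the text does not end with "(digits)".
    static String extract(String text) {
        Matcher matcher = TRAILING.matcher(text);
        return matcher.find() ? matcher.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(extract("Some text 45 Some text, text and text (1234)")); // 1234
        System.out.println(extract("Some (12) text").isEmpty());                     // true
    }
}
```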
This is the required regex for your condition
\\(\\d+\\)$
Given String
// 1 2 3
String a = "letters.1223434.more_letters";
I'd like to recognize that the numbers come in the 2nd position, after the first dot.
I would then like to use this knowledge to replace the 2nd position of
// 1 2 3
String b = "someWords.otherwords.morewords";
with "hello" to effectively make
// 1 2 3
String b = "someWords.hello.morewords";
Substitution would have to be done based on the original position of matched element in String a
How can this be done using regex please?
To find those numbers you can use the group mechanism (round brackets in regular expressions):
import java.util.regex.*;
...
String data = "letters.1223434.more_letters";
String pattern="(.+?)\\.(.+?)\\.(.+)";
Matcher m = Pattern.compile(pattern).matcher(data);
if (m.find()) //or while if needed
for (int i = 1; i <= m.groupCount(); i++)
//group 0 == whole String, so I ignore it and start from i=1
System.out.println(i+") [" + m.group(i) + "] start="+m.start(i));
// OUT:
//1) [letters] start=0
//2) [1223434] start=8
//3) [more_letters] start=16
But if your goal is just to replace the text between two dots, try the replaceFirst(String regex, String replacement) method of the String object:
//find ALL characters between 2 dots once and replace them
String a = "letters.1223434abc.more_letters";
a=a.replaceFirst("\\.(.+)\\.", ".hello.");
System.out.println(a);// OUT => letters.hello.more_letters
The regex searches for all characters between two dots (including the dots themselves), so the replacement should be ".hello." (with dots).
If your String has more dots, it will replace ALL characters between the first and the last dot. If you want the regex to match the minimum number of characters necessary to satisfy the pattern, you need to use a reluctant quantifier (+? instead of +), like:
String b = "letters.1223434abc.more_letters.another.dots";
b=b.replaceFirst("\\.(.+?)\\.", ".hello.");//there is "+?" instead of "+"
System.out.println(b);// OUT => letters.hello.more_letters.another.dots
What you want to do is not directly possible in RegExp, because you cannot get access to the number of the capture group and use this in the replacement operation.
Two alternatives:
If you can use any programming language: split a into groups using a regexp. Check each group to see whether it matches your numeric identifier condition. Split the b string into groups in the same way. Replace the corresponding group.
If you want to use only regular expressions, you can concatenate a and b using a unique separator (let's say |). Then match .*?\.\d+?\..*?\|.*?\.(.*?)\..*? (note that the separator | must be escaped) and replace the text captured by group 1. You need to apply this regexp in three variations, one each for the first, second, and third position.
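The first alternative could be sketched in Java roughly as follows. The class and method names are hypothetical, and the sketch assumes the parts are separated by single dots and that the "numeric identifier condition" means a part made up only of digits.

```java
class PositionalReplace {
    // Returns the index of the first dot-separated part of `template`
    // that consists only of digits, or -1 if there is none.
    static int numericPosition(String template) {
        String[] parts = template.split("\\.", -1);
        for (int i = 0; i < parts.length; i++) {
            if (parts[i].matches("\\d+")) {
                return i;
            }
        }
        return -1;
    }

    // Replaces the part of `target` at `position`; returns `target`
    // unchanged when the position is out of range.
    static String replaceAt(String target, int position, String replacement) {
        String[] parts = target.split("\\.", -1);
        if (position < 0 || position >= parts.length) {
            return target;
        }
        parts[position] = replacement;
        return String.join(".", parts);
    }

    public static void main(String[] args) {
        int pos = numericPosition("letters.1223434.more_letters");
        System.out.println(replaceAt("someWords.otherwords.morewords", pos, "hello"));
        // someWords.hello.morewords
    }
}
```

The limit argument -1 in split keeps trailing empty parts, so joining the parts again reproduces the original string apart from the replaced part.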
The regex for string a would be
\w+\.(\d+)\.\w+
using the match group to grab the number.
The regex for the second string would be
\w+\.(\w+)\.\w+
to grab the match group for the second string.
Then use code like this to do what you please with the matches:
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(inputStr);
boolean matchFound = matcher.find();
where patternStr is the pattern I mentioned above and inputStr is the input string.
You can use variations of this to try each combination you want. So you can move the match group to the first position, try that. If it returns a match, then do the replacement in the second string at the first position. If not, go to position 2 and so on...
Greetings All;
I am a beginner in using regex. What I want to do is extract 2 or 3 Arabic words after a certain pattern.
For example:
inputtext = "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية "
I need to extract the names after
الدكتور
and
والدكتورة
so the output shall be:
احمد زويل
سميرة موسى
What I have done so far is the following:
String inputtext = "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية ";
Pattern pattern = Pattern.compile("(?<=الدكتور).*");
Matcher matcher = pattern.matcher(inputtext);
boolean found = false;
while (matcher.find()) {
// Get the matching string
String match = matcher.group();
System.out.println("the match is: "+match);
found = true;
}
if (!found)
{
System.out.println("I didn't found the text");
}
but it returns:
احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية
I don't know how to add another pattern and how to stop after 2 words?
Would you please help me with any ideas?
To match only the following two words try this one:
(?<=الدكتور)\s[^\s]+\s[^\s]+
.* matches everything up to the end of the string, which is not what you want.
\s is a whitespace character.
[^\s] is a negated character class that matches anything but whitespace.
So my solution will match a whitespace, then at least one non whitespace (the first word), then again a whitespace and once more at least one non whitespace (the second word).
To match your second pattern I would just do a second regex (just exchange the part inside the lookbehind) and match this pattern in a second step. The regular expression is easier to read that way.
Or you can try this
(?<=الدكتور)\s[^\s]+\s[^\s]+|(?<=والدكتورة)\s[^\s]+\s[^\s]+
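A sketch of the combined pattern in Java, using the input from the question. The class and method names are made up for this example, trimming the leading whitespace of each match is an added step, and the source file is assumed to be compiled as UTF-8. Note that the backslashes are doubled inside the Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TitleNameExtractor {
    // Two words following either title; each match starts with the
    // whitespace that separates the title from the name.
    private static final Pattern NAMES = Pattern.compile(
            "(?<=الدكتور)\\s[^\\s]+\\s[^\\s]+|(?<=والدكتورة)\\s[^\\s]+\\s[^\\s]+");

    static List<String> extractNames(String text) {
        List<String> names = new ArrayList<>();
        Matcher m = NAMES.matcher(text);
        while (m.find()) {
            names.add(m.group().trim()); // drop the leading whitespace
        }
        return names;
    }

    public static void main(String[] args) {
        String inputtext = "تكريم الدكتور احمد زويل والدكتورة سميرة موسي عن ابحاثهم العلمية ";
        for (String name : extractNames(inputtext)) {
            System.out.println(name);
        }
    }
}
```

The first alternative does not fire inside والدكتورة, because there الدكتور is followed by ة rather than by whitespace.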