Extract letter from String characters and numbers - java

I have these Strings:
"Turtle123456_fly.me"
"birdy_12345678_prd.tr"
I want the first words of each, ie:
Turtle
birdy
I tried this:
Pattern p = Pattern.compile("//d");
String[] items = p.split(String);
but of course it's wrong. I am not familiar with using Pattern.

Replace the stuff you don't want with nothing:
String firstWord = str.replaceAll("[^a-zA-Z].*", "");
to leave only the part you want.
The regex [^a-zA-Z] means "not a letter", so everything from (and including) the first non-letter to the end is "removed".
See live demo.
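For reference, here is a minimal sketch of the replaceAll approach applied to both strings from the question. The class and method names (FirstWordByReplace, firstWord) are made up for illustration and are not part of the answer above.

```java
final class FirstWordByReplace {
    // Hypothetical helper: removes everything from the first non-letter
    // to the end of the input, leaving only the leading letters.
    static String firstWord(String input) {
        return input.replaceAll("[^a-zA-Z].*", "");
    }

    public static void main(String[] args) {
        System.out.println(firstWord("Turtle123456_fly.me"));   // Turtle
        System.out.println(firstWord("birdy_12345678_prd.tr")); // birdy
    }
}
```

Note that an input starting with a non-letter comes back as an empty string with this approach.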

String s1 ="Turtle123456_fly.me";
String s2 ="birdy_12345678_prd.tr";
Pattern p = Pattern.compile("^([A-Za-z]+)[^A-Za-z]");
Matcher matcher = p.matcher(s1);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}
Explanation:
The first part ^([A-Za-z]+) is a group that captures all the letters anchored to the beginning of the input (using the ^ anchor).
The second part [^A-Za-z] matches the first non-letter, and serves as a terminator for the letters sequence.
Then all we have left to do is to fetch the group with index 1 (group 1 is what we have in the first parenthesis).
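One thing to be aware of: because the pattern above requires a non-letter after the letters, an input made of letters only (for example just "Turtle") would not match. A possible variant, sketched below under the assumption that such inputs should be accepted too, drops the terminator and relies on the greedy + to stop at the first non-letter. The names LeadingLetters and leadingLetters are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LeadingLetters {
    // Anchored at the start; the greedy + consumes letters until the first non-letter.
    private static final Pattern LEADING = Pattern.compile("^[A-Za-z]+");

    // Returns the leading run of letters, or an empty Optional if the input
    // does not start with a letter.
    static Optional<String> leadingLetters(String input) {
        Matcher m = LEADING.matcher(input);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(leadingLetters("Turtle123456_fly.me").orElse(""));   // Turtle
        System.out.println(leadingLetters("birdy_12345678_prd.tr").orElse("")); // birdy
    }
}
```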

maybe you should try this \d+\w+.*

Related

Split String at different lengths in Java

I want to split a string after a certain length.
Let's say we have a string of "message"
123456789
Split like this :
"12" "34" "567" "89"
I thought of splitting them into 2 first using
"(?<=\\G.{2})"
regexp, then joining the last two and splitting again into 3, but is there any way to do it in a single go using a regexp? Please help me out.
Use ^(.{2})(.{2})(.{3})(.{2}).* (See it in action in regex101) to group the String to the specified length and grab the groups as separate Strings
String input = "123456789";
List<String> output = new ArrayList<>();
Pattern pattern = Pattern.compile("^(.{2})(.{2})(.{3})(.{2}).*");
Matcher matcher = pattern.matcher(input);
if (matcher.matches()) {
    for (int i = 1; i <= matcher.groupCount(); i++) {
        output.add(matcher.group(i));
    }
}
System.out.println(output);
NOTE: Group capturing starts from 1 as the group 0 matches the whole String
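If the chunk lengths are not fixed in advance, the same idea can be generalized by building the pattern from a list of lengths. This is only a sketch: the class FixedWidthSplitter and its split method are invented for this example, and it assumes every length is non-negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FixedWidthSplitter {
    // Builds ^(.{n1})(.{n2})....* from the given lengths and returns the groups.
    // Returns an empty list when the input is shorter than the sum of the lengths.
    static List<String> split(String input, int... lengths) {
        StringBuilder regex = new StringBuilder("^");
        for (int length : lengths) {
            regex.append("(.{").append(length).append("})");
        }
        regex.append(".*");

        List<String> parts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex.toString()).matcher(input);
        if (matcher.matches()) {
            // Group 0 is the whole match, so start from 1.
            for (int i = 1; i <= matcher.groupCount(); i++) {
                parts.add(matcher.group(i));
            }
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(split("123456789", 2, 2, 3, 2)); // [12, 34, 567, 89]
    }
}
```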
And a Magnificent Sorcery from @YCF_L in the comments:
String pattern = "^(.{2})(.{2})(.{3})(.{2}).*";
String[] vals = "123456789".replaceAll(pattern, "$1-$2-$3-$4").split("-");
The magic here is that replaceAll() lets you refer to the captured groups in the replacement string. Use $n (where n is a digit) to refer to captured subsequences. See this stackoverflow question for better explanation.
NOTE: here it's assumed that no input string contains - in it.
If one does, pick any other character that will not be in any of
your input strings so that it can be used as a delimiter.
test this regex in regex101 with 123456789 test string.
^(\d{2})(\d{2})(\d{3})(\d{2})$
output :
Match 1
Full match 0-9 `123456789`
Group 1. 0-2 `12`
Group 2. 2-4 `34`
Group 3. 4-7 `567`
Group 4. 7-9 `89`
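The same offsets can be read in Java through Matcher.start(int) and Matcher.end(int). A small sketch follows (the class name GroupOffsets and the describe method are hypothetical). Unlike the .{n} pattern, this stricter one only matches exactly nine digits.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupOffsets {
    private static final Pattern PARTS =
            Pattern.compile("^(\\d{2})(\\d{2})(\\d{3})(\\d{2})$");

    // Lists each group with its start and end offsets, one group per line.
    static String describe(String input) {
        Matcher m = PARTS.matcher(input);
        if (!m.matches()) {
            return "no match";
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i <= m.groupCount(); i++) {
            sb.append("Group ").append(i).append(". ")
              .append(m.start(i)).append('-').append(m.end(i))
              .append(' ').append(m.group(i)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe("123456789"));
    }
}
```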

Regex: extracting a value in a string <Name_id = bob>?

What would be the correct regular expression (that I can use in Java) if I want to extract a value from the string below?
<Name_id = bob>
I know that \<(.*?)\> will extract everything between the angle brackets but I only need to extract "bob".
The only part of the string that will change will be "bob". I also want to make sure that if someone enters =bob as the Name_id, the string that is pulled out will be just that and it doesn't mess up the regular expression.
Use capturing groups to capture the characters you want.
"<Name_id\\s+=\\s+([^>]+)>"
OR
"<Name_id\\s+=\\s+(\\w+)>"
Then print group 1. \s+ matches one or more whitespace characters and \w+ matches one or more word characters.
String i = "<Name_id = bob>";
Matcher m = Pattern.compile("<Name_id\\s+=\\s+([^>]+)>").matcher(i);
while (m.find()) {
    System.out.println(m.group(1));
}
Output:
bob
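Regarding the =bob case from the question: the [^>]+ variant accepts it, while the \w+ variant would not match at all, because = is not a word character. A minimal sketch to check this (NameIdExtractor and extract are names made up for the example; returning null on no match is just a choice for this sketch):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NameIdExtractor {
    private static final Pattern NAME_ID =
            Pattern.compile("<Name_id\\s+=\\s+([^>]+)>");

    // Returns the value after "Name_id = ", or null if the input has no such tag.
    static String extract(String input) {
        Matcher m = NAME_ID.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extract("<Name_id = bob>"));  // bob
        System.out.println(extract("<Name_id = =bob>")); // =bob
    }
}
```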

What would be the regex for this pattern?

My Java program, at a certain point, receives a string containing a couple of key-value properties like this example:
param1=value Param2=values can have spaces PARAM3=values cant have equal characters
Each parameter's name/key is a single word (a-z, A-Z, _ and 0-9) and is followed by an = character (not separated by spaces) and its value. The value is text that can contain spaces and lasts until the end of the string or the beginning of another parameter (which is a word followed by equals and its value, etc.).
I need to extract a Properties object (string-to-string map) from this string. I was trying to use regex to find each key-value set. The code is like this:
public static Properties createProperties(String str) {
    Properties prop = new Properties();
    Matcher matcher = Pattern.compile(some regex).matcher(str);
    while (matcher.find()) {
        String match = matcher.group();
        String param = ...; // What comes before '='
        String value = ...; // What comes after '='
        prop.setProperty(param, value);
    }
    return prop;
}
But the regex I wrote is not working correctly.
String regex = "(\\w+=.*)+";
Since .* tells the regex to get "anything" it finds, it will match the entire string. I want to tell the regex to search until it finds another \\w+=.* (a word followed by equals and something after).
How could I write this regex? Or what would be another solution for the problem using regex?
You can use a Negative Lookahead here.
(\\w+)=((?:(?!\\s*\\w+=).)*)
The key is placed inside capturing group #1 and the value is in capturing group #2. Note that I used \s inside the lookaround in order to prevent the value from having trailing whitespace.
Live Demo
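Plugged into the method from the question, the negative lookahead regex could be used roughly like this. It is a sketch only: the class name KeyValueParser is invented here, and the method is assumed to return a Properties object as the question describes.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeyValueParser {
    // Group 1 is the key, group 2 the value up to (but excluding) the
    // whitespace that precedes the next key=.
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=((?:(?!\\s*\\w+=).)*)");

    static Properties createProperties(String str) {
        Properties prop = new Properties();
        Matcher matcher = PAIR.matcher(str);
        while (matcher.find()) {
            prop.setProperty(matcher.group(1), matcher.group(2));
        }
        return prop;
    }

    public static void main(String[] args) {
        Properties p = createProperties(
                "param1=value Param2=values can have spaces PARAM3=values cant have equal characters");
        System.out.println(p.getProperty("Param2")); // values can have spaces
    }
}
```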
One way among several:
List<String> paramNames = new ArrayList<String>();
List<String> paramValues = new ArrayList<String>();
Pattern regex = Pattern.compile("([^\\s=]+)=([^\\s=]+)");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    paramNames.add(regexMatcher.group(1));
    paramValues.add(regexMatcher.group(2));
}
The regex:
([^\\s=]+)=([^\\s=]+)
The code retrieves keys as Group 1, values as Group 2.
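Explanation
([^\\s=]+) captures into Group 1 any characters that are neither whitespace nor an equals sign
= matches the literal =
([^\\s=]+) captures into Group 2 any characters that are neither whitespace nor an equals sign
Note that because Group 2 excludes whitespace, a value that contains spaces is captured only up to its first space.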
Your regex would be,
(\\w+=(?:(?!\\w+=).)*)
DEMO
It captures each param=value pair up to the next param=. For the sample input it finds three matches, each holding one param=value pair in group 1.
Explanation:
\\w+= Matches one or more word characters followed by an = symbol.
(?:(?!\\w+=).)* A non-capturing group with a negative lookahead matches any character that does not begin a \\w+= sequence, so the match runs up to the next param=.
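Since this regex captures the whole pair in one group, the key and value still have to be separated afterwards. One way to do that, sketched under the assumption that splitting at the first = is acceptable, is shown below. The pair may carry a trailing space (the lookahead here has no \s*), so the value is trimmed. PairSplitter and parse are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PairSplitter {
    private static final Pattern PAIR =
            Pattern.compile("(\\w+=(?:(?!\\w+=).)*)");

    // Finds each param=value pair, then splits it at the first '='.
    static Map<String, String> parse(String str) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(str);
        while (m.find()) {
            String pair = m.group(1);
            int eq = pair.indexOf('=');
            // trim() drops the whitespace left before the next param=
            result.put(pair.substring(0, eq), pair.substring(eq + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse(
                "param1=value Param2=values can have spaces PARAM3=values cant have equal characters"));
    }
}
```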

Java Regex for changing every ith index in every word of a string

I've written a regex \b\S\w(\S(?=.)) to find every third symbol in a word and replace it with '1'. Now I'm trying to use this expression but really don't know how to do it right.
Pattern pattern = Pattern.compile("\\b\\S\\w(\\S(?=.))");
Matcher matcher = pattern.matcher("lemon apple strawberry pumpkin");
while (matcher.find()) {
    System.out.print(matcher.group(1) + " ");
}
So result is:
m p r m
And how can I use this to make a string like this
le1on ap1le st1awberry pu1pkin
You could use something like this:
"lemon apple strawberry pumpkin".replaceAll("(?<=\\b\\S{2})\\S", "1")
This produces your example output. The regex replaces any non-space character that is preceded by a word boundary followed by exactly two non-space characters.
Note that "words" like 12345 would be changed into 12145, since 3 is matched by \\S (non-space).
Edit:
Updated the regex to better cater to the revised question title; change 2 to i-1 to replace the ith letter of the word.
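The "change 2 to i-1" remark can be turned into a small helper that takes i as a parameter. This is a sketch, not the answerer's code: IthCharReplacer and replaceIth are invented names, and i is assumed to be 1-based.

```java
import java.util.regex.Matcher;

final class IthCharReplacer {
    // Replaces the i-th (1-based) non-space character of every word.
    static String replaceIth(String text, int i, char replacement) {
        if (i < 1) {
            throw new IllegalArgumentException("i must be at least 1");
        }
        // Lookbehind: a word boundary followed by exactly i-1 non-space characters.
        String regex = "(?<=\\b\\S{" + (i - 1) + "})\\S";
        // quoteReplacement keeps characters such as $ or \ literal in the replacement.
        return text.replaceAll(regex, Matcher.quoteReplacement(String.valueOf(replacement)));
    }

    public static void main(String[] args) {
        // le1on ap1le st1awberry pu1pkin
        System.out.println(replaceIth("lemon apple strawberry pumpkin", 3, '1'));
    }
}
```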
There is another way: use the match index reported by the matcher, like this:
Pattern pattern = Pattern.compile("\\b\\S\\w(\\S(?=.))");
String string = "lemon apple strawberry pumpkin";
char[] c = string.toCharArray();
Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
    // end() - 1 is the index of the last matched character, i.e. the third symbol of the word
    c[matcher.end() - 1] = '1';
}
System.out.println(c);
Maybe it's not perfect, but it works when you need the index at which the string matches the pattern.
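A closely related option, if it helps, is to ask the matcher for the start index of capturing group 1 directly with Matcher.start(int), rather than deriving it from end(). A sketch follows, with hypothetical names ThirdCharByIndex and replaceThird. As in the original pattern, the (?=.) lookahead means the third character is only matched when another character follows it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ThirdCharByIndex {
    private static final Pattern THIRD = Pattern.compile("\\b\\S\\w(\\S(?=.))");

    // Overwrites the character captured by group 1 in a copy of the text.
    static String replaceThird(String text, char replacement) {
        StringBuilder sb = new StringBuilder(text);
        Matcher m = THIRD.matcher(text);
        while (m.find()) {
            // start(1) is the index of group 1 within the original text
            sb.setCharAt(m.start(1), replacement);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // le1on ap1le st1awberry pu1pkin
        System.out.println(replaceThird("lemon apple strawberry pumpkin", '1'));
    }
}
```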

How to count a position of element, relative to another element using regex?

Given String
// 1 2 3
String a = "letters.1223434.more_letters";
I'd like to recognize that numbers come in a 2nd position after the first dot
I then would like to use this knowledge to replace "2nd position of"
// 1 2 3
String b = "someWords.otherwords.morewords";
with "hello" to effectively make
// 1 2 3
String b = "someWords.hello.morewords";
Substitution would have to be done based on the original position of matched element in String a
How can this be done using regex please?
For finding those numbers you can use the group mechanism (round brackets in regular expressions):
import java.util.regex.*;
...
String data = "letters.1223434.more_letters";
String pattern="(.+?)\\.(.+?)\\.(.+)";
Matcher m = Pattern.compile(pattern).matcher(data);
if (m.find()) { // or while if needed
    // group 0 == whole String, so I ignore it and start from i=1
    for (int i = 1; i <= m.groupCount(); i++) {
        System.out.println(i + ") [" + m.group(i) + "] start=" + m.start(i));
    }
}
// OUT:
//1) [letters] start=0
//2) [1223434] start=8
//3) [more_letters] start=16
BUT if your goal is just replacing text between two dots try maybe replaceFirst(String regex, String replacement) method on String object:
//find ALL characters between 2 dots once and replace them
String a = "letters.1223434abc.more_letters";
a=a.replaceFirst("\\.(.+)\\.", ".hello.");
System.out.println(a);// OUT => letters.hello.more_letters
The regex searches for all characters between two dots (including the dots themselves), so the replacement should be ".hello." (with dots).
If your String has more dots, it will replace ALL characters between the first and the last dot. If you want the regex to match the minimum number of characters necessary to satisfy the pattern, use a reluctant quantifier (+? instead of +), like:
String b = "letters.1223434abc.more_letters.another.dots";
b=b.replaceFirst("\\.(.+?)\\.", ".hello.");//there is "+?" instead of "+"
System.out.println(b);// OUT => letters.hello.more_letters.another.dots
What you want to do is not directly possible in RegExp, because you cannot get access to the number of the capture group and use this in the replacement operation.
Two alternatives:
If you can use any programming language: Split a using regexp into groups. Check each group if it matches your numeric identifier condition. Split the b string into groups. Replace the corresponding match.
If you want to use only regular expressions, you can concatenate a and b using a unique separator (let's say |). Then match .*?\.\d+?\..*?\|.*?\.(.*?)\..*? (the separator is escaped as \| so that it is not read as alternation) and replace the text captured as $1. You need to apply this regexp in three variations: one each for the first, second and third position.
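The first alternative above (split both strings, find the numeric group in a, replace the group at the same position in b) could look roughly like this in Java. It is a sketch under a few assumptions: segments are separated by dots, "numeric" means digits only, and only the first numeric segment matters. PositionalReplace and replaceAtNumericPosition are made-up names.

```java
final class PositionalReplace {
    // Finds the position of the first all-digit segment in a and replaces
    // the segment at the same position in b. Returns b unchanged if a has
    // no numeric segment at a position that also exists in b.
    static String replaceAtNumericPosition(String a, String b, String replacement) {
        String[] partsA = a.split("\\.");
        String[] partsB = b.split("\\.");
        for (int i = 0; i < partsA.length && i < partsB.length; i++) {
            if (partsA[i].matches("\\d+")) {
                partsB[i] = replacement;
                break;
            }
        }
        return String.join(".", partsB);
    }

    public static void main(String[] args) {
        String a = "letters.1223434.more_letters";
        String b = "someWords.otherwords.morewords";
        System.out.println(replaceAtNumericPosition(a, b, "hello")); // someWords.hello.morewords
    }
}
```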
The regex for string a would be
\w+\.(\d+)\.\w+
using the match group to grab the number.
The regex for the second string would be
\w+\.(\w+)\.\w+
to grab the match group for the second string.
Then use code like this to do what you please with the matches:
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(inputStr);
boolean matchFound = matcher.find();
where patternStr is the pattern I mentioned above and inputStr is the input string.
You can use variations of this to try each combination you want. So you can move the match group to the first position, try that. If it returns a match, then do the replacement in the second string at the first position. If not, go to position 2 and so on...
