capture expected result with regex - java

I am looking for a regex with a capture group where a question mark (?) may be present in my input string. If it is not present, the result should be the input string as is; if ? is present, the result should be the string before the first occurrence of ?.
My input can be in the following formats:
Pattern 1
abc.txt // result should be abc.txt
Pattern 2
abc.txt?param1=qwerty.controller&param2=xsd.txt // result should be abc.txt
I tried below
Matcher matcher = Pattern.compile("(.*?)\\?").matcher(str1);
String group1 = "";
if (matcher.find()) {
group1 = matcher.group();
}
With this I am able to capture the expected result for pattern 2, but I am not sure how to modify it so that I can capture the expected result for both pattern 1 and pattern 2.
Update: I know that if group1 is an empty string, the input string does not contain any ? and the input string itself is the expected output. But I would like to know whether I can handle both patterns with a single regex.

You could make use of a negated class like this:
^[^?]+
regex101 demo
^ first makes sure the match starts at the beginning.
[^?]+ matches one or more non-? characters and stops at the first ? (if the string contains no ?, it matches to the end).
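A minimal Java sketch of this approach, assuming a hypothetical helper named beforeQuestionMark. Because + requires at least one character, an input that starts with ? produces no match, and the sketch returns an empty string in that case.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BeforeQuestionMark {
    // Compiled once; matches the leading run of characters that are not '?'.
    private static final Pattern PREFIX = Pattern.compile("^[^?]+");

    // Hypothetical helper: returns the text before the first '?',
    // or the whole input when it contains no '?'.
    // Returns "" when the input is empty or starts with '?'.
    static String beforeQuestionMark(String input) {
        Matcher matcher = PREFIX.matcher(input);
        return matcher.find() ? matcher.group() : "";
    }

    public static void main(String[] args) {
        System.out.println(beforeQuestionMark("abc.txt"));
        System.out.println(beforeQuestionMark("abc.txt?param1=qwerty.controller&param2=xsd.txt"));
    }
}
```

Both calls in main print abc.txt.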

Replace the first ? and everything after it (if it exists):
str = str.replaceAll("\\?.*", "");

One way is to remove everything from your string starting with the first question mark, like this:
String res = orig.replaceAll("[?].*$", "");
If there's no question mark, the expression will match nothing, so you would get the original string. Otherwise, the expression would match everything starting from the question mark, so replaceAll will delete it, because the replacement string is empty.
String orig = "abc.txt?param1=qwerty.controller&param2=xs?d.txt";
String res = orig.replaceAll("[?].*$", "");
System.out.println(res);
orig = "hello world";
res = orig.replaceAll("[?].*$", "");
System.out.println(res);
This prints
abc.txt
hello world
Link to a demo on ideone.
EDIT : I would like to capture both with a single regex
You can use "^[^?]*" for your regex. ^ anchors to the beginning, while [^?]* matches everything either up to the end of the string or up to the first question mark. Either way, the question mark is left out.
Here is the code:
String[] strings = new String[] {"abc.txt?param1=qwerty.controller&param2=xs?d.txt", "Hello, world!", "a?b"};
for (String str1 : strings) {
Matcher matcher = Pattern.compile("^[^?]*").matcher(str1);
String group1 = "";
if (matcher.find()) {
group1 = matcher.group();
}
System.out.println(group1);
}
Second demo on ideone.

Related

How to match a string between two same delimiters?

some-string-test-moretext.csv
I want to extract the string test, which is always found between the 2nd and 3rd - delimiters.
The expression [-](.*?)[-] would match -string-. So it's probably close, but how can I move on to the next match?
If that matters, I'm using java.
If you know the number of delimiters in advance, you can just split the String.
String[] test = {
"some-string-test-moretext.csv",
"another-string-test-andthensome.csv"
};
for (String s: test) {
System.out.println(s.split("-")[2]);
}
Output
test
test
This should give you quite a good head start:
[^-]+-[^-]+-(.*?)-[^-]+\.csv
https://regex101.com/r/YjWDkv/1
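In Java, that expression could be used roughly as follows. This is a sketch only: the class and the helper name thirdSegment are hypothetical, and the backslash before .csv has to be doubled inside a Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThirdSegment {
    // Same expression as above, with the backslash escaped for a Java string.
    private static final Pattern NAME =
            Pattern.compile("[^-]+-[^-]+-(.*?)-[^-]+\\.csv");

    // Hypothetical helper: returns the captured segment,
    // or "" when the whole file name does not fit the expression.
    static String thirdSegment(String fileName) {
        Matcher m = NAME.matcher(fileName);
        return m.matches() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(thirdSegment("some-string-test-moretext.csv"));
        System.out.println(thirdSegment("another-string-test-andthensome.csv"));
    }
}
```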
I would propose this, using regex, and very short :
String str = "some-string-test-moretext.csv\n";
Matcher m = Pattern.compile("\\w+-\\w+-(\\w+).*").matcher(str);
String res = m.find() ? m.group(1) : "";
System.out.println(res);
Of course, String.split() is another way:
String res = str.split("-")[2];
In sed:
$ echo 'some-string-test-moretext.csv' | sed 's/[^-]*-[^-]*-\([^-]*\)-.*/\1/'
test
[^-]* means "zero or more occurrences of any char except -". Let's call that "notHyphen". So we're matching on notHyphen-notHyphen-\(notHyphen\)-.* and replacing the whole match with \1, that is, whatever is captured by the \(\).
In Java, you won't need to escape ( to \(, and the technique for extracting from capturing groups is different:
Pattern patt = Pattern.compile("[^-]*-[^-]*-([^-]*)-.*");
Matcher m = patt.matcher(filename);
String extracted = null;
if (m.matches()) {
extracted = m.group(1);
}

Extract letter from String characters and numbers

I have these Strings:
"Turtle123456_fly.me"
"birdy_12345678_prd.tr"
I want the first words of each, ie:
Turtle
birdy
I tried this:
Pattern p = Pattern.compile("//d");
String[] items = p.split(String);
but of course it's wrong. I am not familiar with using Pattern.
Replace the stuff you don't want with nothing:
String firstWord = str.replaceAll("[^a-zA-Z].*", "");
to leave only the part you want.
The regex [^a-zA-Z] means "not a letter", so everything from (and including) the first non-letter to the end is "removed".
See live demo.
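Applied to both strings from the question, the replacement might look like this. The class and the helper name firstWord are made up for the example.

```java
class FirstWord {
    // Hypothetical helper: drops everything from the first non-letter onward.
    static String firstWord(String str) {
        return str.replaceAll("[^a-zA-Z].*", "");
    }

    public static void main(String[] args) {
        System.out.println(firstWord("Turtle123456_fly.me"));   // Turtle
        System.out.println(firstWord("birdy_12345678_prd.tr")); // birdy
    }
}
```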
String s1 ="Turtle123456_fly.me";
String s2 ="birdy_12345678_prd.tr";
Pattern p = Pattern.compile("^([A-Za-z]+)[^A-Za-z]");
Matcher matcher = p.matcher(s1);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Explanation:
The first part ^([A-Za-z]+) is a group that captures all the letters anchored to the beginning of the input (using the ^ anchor).
The second part [^A-Za-z] matches the first non-letter, and serves as a terminator for the letter sequence.
Then all we have left to do is to fetch the group with index 1 (group 1 is what we have in the first parenthesis).
Maybe you should try this: \d+\w+.*

Regex including date string, email, number

I have this regex expression:
String patt = "(\\w+?)(:|<|>)(\\w+?),";
Pattern pattern = Pattern.compile(patt);
Matcher matcher = pattern.matcher(search + ",");
I am able to match a string like
search = "firstName:Giorgio"
But I'm not able to match string like
search = "email:giorgio.rossi#libero.it"
or
search = "dataregistrazione:27/10/2016"
How I should modify the regex expression in order to match these strings?
You may use
String pat = "(\\w+)[:<>]([^,]+)"; // Add a , at the end if it is necessary
See the regex demo
Details:
(\w+) - Group 1 capturing 1 or more word chars
[:<>] - one of the chars inside the character class, :, <, or >
([^,]+) - Group 2 capturing 1 or more chars other than , (in the demo, I added \n as the demo input text contains newlines).
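A short sketch of how the two groups could be read in Java, using the three search strings from the question. The class and the helper name parse are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SearchCriteria {
    // Group 1: the key (word chars); group 2: everything up to the next comma.
    private static final Pattern CRITERIA = Pattern.compile("(\\w+)[:<>]([^,]+)");

    // Hypothetical helper: returns {key, value}, or null when nothing matches.
    static String[] parse(String search) {
        Matcher matcher = CRITERIA.matcher(search);
        if (!matcher.find()) {
            return null;
        }
        return new String[] {matcher.group(1), matcher.group(2)};
    }

    public static void main(String[] args) {
        String[] inputs = {
            "firstName:Giorgio",
            "email:giorgio.rossi#libero.it",
            "dataregistrazione:27/10/2016"
        };
        for (String input : inputs) {
            String[] kv = parse(input);
            System.out.println(kv[0] + " -> " + kv[1]);
        }
    }
}
```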
You can use regex like this:
public static void main(String[] args) {
String[] arr = new String[]{"firstName:Giorgio", "email:giorgio.rossi#libero.it", "dataregistrazione:27/10/2016"};
String pattern = "(\\w+[:<>]\\w+)|(\\w+:\\w+\\.\\w+#\\w+\\.\\w+)|(\\w+:\\d{1,2}/\\d{1,2}/\\d{4})";
for(String str : arr){
if(str.matches(pattern))
System.out.println(str);
}
}
output is:
firstName:Giorgio
email:giorgio.rossi#libero.it
dataregistrazione:27/10/2016
But remember that this regex works only for your data format. To build a general-purpose regex, consult the RFC documents and articles about email format.
Hope it helps.
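The character class \w matches [A-Za-z0-9_], so it cannot match characters such as ., #, or /. Change the regex to (\\w+?)(:|<|>)(.*), so that the last group matches any characters between the operator and the trailing comma.
Alternatively, list all the characters you expect in the value, i.e. (\\w+?)(:|<|>)[#.\\w\\/]*,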

What is wrong in regexp in Java

I want to get the word text2, but it returns null. Could you please correct it ?
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR\\w+&&(\\w+)'\\)\\)");
Matcher matcher = patter1.matcher(str);
String result = null;
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);
One way to do it is to match all possible pattern in parentheses:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR[(]{2}&&\\w+\\s*'&&(\\w+)'[)]{2}");
Matcher matcher = patter1.matcher(str);
String result = "";
if (matcher.find()) {
result = matcher.group(1);
}
System.out.println(result);
See IDEONE demo
You can also use [^()]* inside the parentheses to just get to the value inside single apostrophes:
Pattern patter1 = Pattern.compile("SETVAR[(]{2}[^()]*'&&(\\w+)'[)]{2}");
See another demo
Let me break down the regex for you:
SETVAR - match SETVAR literally, then...
[(]{2} - match 2 ( literally, then...
[^()]* - match 0 or more characters other than ( or ) up to...
'&& - match a single apostrophe and two & symbols, then...
(\\w+) - match and capture into Group 1 one or more word characters
'[)]{2} - match a single apostrophe and then 2 ) symbols literally.
Your regex doesn't match your string because you didn't specify the opening parentheses. Also, \\w+ matches only sequences of word characters, so it won't match the space or the & characters.
Instead, you can use a negated character class [^']+, which matches one or more characters other than a single quote:
String str = "Text SETVAR((&&text1 '&&text2'))";
Pattern patter1 = Pattern.compile("SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)");
Debuggex Demo
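To see the difference, the original and the corrected expression can be run side by side. This is only a sketch; the class and the helper name firstGroup are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SetVarDemo {
    // Hypothetical helper: returns group 1 of the first match, or null if none.
    static String firstGroup(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        return matcher.find() ? matcher.group(1) : null;
    }

    public static void main(String[] args) {
        String str = "Text SETVAR((&&text1 '&&text2'))";
        // Original: \w+ cannot match "((" after SETVAR, so there is no match.
        System.out.println(firstGroup("SETVAR\\w+&&(\\w+)'\\)\\)", str));
        // Corrected: literal "((", then anything up to the first quote.
        System.out.println(firstGroup("SETVAR\\(\\([^']+'&&(\\w+)'\\)\\)", str));
    }
}
```

The first call prints null and the second prints text2.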

Why the string does not split?

While trying to split a string xyz213123kop234430099kpf4532 into tokens :
xyz213123
kop234430099
kpf4532
I wrote the following code
String s = "xyz213123kop234430099kpf4532";
String regex = "/^[a-zA-z]+[0-9]+$/";
String tokens[] = s.split(regex);
for(String t : tokens) {
System.out.println(t);
}
but instead of tokens, I get the whole string as one output. What is wrong with the regular expression I used ?
You can do that:
String s = "xyz213123kop234430099kpf4532";
String[] result = s.split("(?<=[0-9])(?=[a-z])");
The idea is to use zero width assertions to find the place where to cut the string, then I use a lookbehind (preceded by a digit [0-9]) and a lookahead (followed by a letter [a-z]).
These lookarounds are just checks and match nothing, thus the delimiter of the split is an empty string and no characters are removed from the result.
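A runnable sketch of the lookaround split; the class and the helper name splitTokens are made up for the example.

```java
class LookaroundSplit {
    // Hypothetical helper: cuts wherever a digit is directly followed by
    // a lowercase letter. The match is zero-width, so nothing is removed.
    static String[] splitTokens(String s) {
        return s.split("(?<=[0-9])(?=[a-z])");
    }

    public static void main(String[] args) {
        for (String token : splitTokens("xyz213123kop234430099kpf4532")) {
            System.out.println(token);
        }
    }
}
```

This prints xyz213123, kop234430099 and kpf4532, one per line.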
You could split at the boundary between a digit and a non-digit.
String s = "xyz213123kop234430099kpf4532";
String[] parts = s.split("(?<=\\d)(?=\\D)");
for (String p : parts) {
System.out.println(p);
}
Output
xyz213123
kop234430099
kpf4532
Nothing in your string matches the regular expression. First, Java regexes have no / delimiters, so the two slashes are treated as literal characters. Second, the expression starts with ^ (beginning of string) and ends with $ (end of string), so it could only match the whole string or nothing at all. Because it matches nowhere, split finds no delimiter, and you get one big token.
You don't want to use split for that. The argument to split is the delimiter between tokens. You don't have that. Instead, you have a pattern that repeats and you want each match to the pattern. Try this instead:
String s = "xyz213123kop234430099kpf4532";
Pattern p = Pattern.compile("([a-zA-Z]+[0-9]+)");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group());
}
Output:
xyz213123
kop234430099
kpf4532
(I don't know by what logic you would have the second token be "3kop234430099" as in your posted question. I assume that the leading "3" is a typo.)
