Java - Regular Expressions matching one to another - java

I am trying to retrieve bits of data using RE. Problem is I'm not very fluent with RE. Consider the code.
import java.util.regex.Pattern;
import java.util.regex.Matcher;
class HTTP{
private static String getServer(httpresp){
Pattern p = Pattern.compile("(\bServer)(.*[Server:-\r\n]"); //What RE syntax do I use here?
Matcher m = p.matcher(httpresp);
if (m.find()){
return m.group(2);
public static void main(String[] args){
String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n"; //Test data
System.out.println(getServer(testdata));
How would I get "Server:" to the next "\r\n" out which would output "Apache"? I googled around and tried myself, but have failed.

It's a one liner:
private static String getServer(httpresp) {
return httpresp.replaceAll(".*Server: (.*?)\r\n.*", "$1");
}
The trick here is two-part:
use .*?, which is a reluctant match (consumes as little as possible and still match)
regex matches whole input, but desired target captured and returned using a back reference

You could use capturing groups or positive lookbehind.
Pattern.compile("(?:\\bServer:\\s*)(.*?)(?=[\r\n]+)");
Then print the group index 1.
Example:
String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n";
Matcher matcher = Pattern.compile("(?:\\bServer:\\s*)(.*?)(?=[\r\n]+)").matcher(testdata);
if (matcher.find())
{
System.out.println(matcher.group(1));
}
OR
Matcher matcher = Pattern.compile("(?:\\bServer\\b\\S*\\s+)(.*?)(?=[\r\n]+)").matcher(testdata);
if (matcher.find())
{
System.out.println(matcher.group(1));
}
Output:
Apache
Explanation:
(?:\\bServer:\\s*) In regex, non-capturing group would be represented as (?:...), which will do matching only. \b called word boundary which matches between a word character and a non-word character. Server: matches the string Server: and the following zero or more spaces would be matched by \s*
(.*?) In regex (..) called capturing group which captures those characters which are matched by the pattern present inside the capturing group. In our case (.*?) will capture all the characters non-greedily upto,
(?=[\r\n]+) one or more line breaks are detected. (?=...) called positive lookahead which asserts that the match must be followed by the characters which are matched by the pattern present inside the lookahead.

Related

Java Regex to extract substring with optional trailing slash

Regex:
\/test\/(.*|\/?)
Input
/something/test/{abc}/listed
/something/test/{abc}
Expected
{abc} for both the inputs
You need to capture all characters other than / after /test/:
String s = "/something/test/{abc}/listed";
Pattern pattern = Pattern.compile("/test/([^/]+)"); // or "/test/\\{([^/}]+)"
Matcher matcher = pattern.matcher(s);
if (matcher.find()){
System.out.println(matcher.group(1));
}
See the online demo
Details:
/test/ - matches /test/
([^/]+) - matches and captures into Group 1 one or more (+) (but as many as possible, since + is greedy) characters other than / (due to the negated character class [^/]).
Note that in Java regex patterns you do not need to escape / since it is not a special character and one needs no regex delimiters.
This should work for you :
public static void main(String[] args) {
String s1 = "/something/test/{abc}/listed";
String s2 = "/something/test/{abc}";
System.out.println(s1.replaceAll("[^{]+(\\{\\w+\\}).*", "$1"));
System.out.println(s2.replaceAll("[^{]+(\\{\\w+\\}).*", "$1"));
}
O/P :
{abc}
{abc}
Regex (as Java string, that is with doubled backslashes):
".*\\/test\\/([^/]*).*"

Android Java regexp pattern

I ping a host. In result a standard output. Below a REGEXP but it do not work correct. Where I did a mistake?
String REGEXP ="time=(\\\\d+)ms";
Pattern pattern = Pattern.compile(REGEXP);
Matcher matcher = pattern.matcher(result);
if (matcher.find()) {
result = matcher.group(1);
}
You only need \\d+ in your regex because
Matcher looks for the pattern (using which it is created) and then tries to find every occurance of the pattern in the string being matched.
Use while(matcher.group(1) in case of multiple occurances.
each () represents a captured group.
You have too many backslashes. Assuming you want to get the number from a string like "time=32ms", then you need:
String REGEXP ="time=(\\d+)ms";
Pattern pattern = Pattern.compile(REGEXP);
Matcher matcher = pattern.matcher(result);
if (matcher.find()) {
result = matcher.group(1);
}
Explanation: The search pattern you are looking for is "\d", meaning a decimal number, the "+" means 1 or more occurrences.
To get the "\" to the matcher, it needs to be escaped, and the escape character is also "\".
The brackets define the matching group that you want to pick out.
With "\\\\d+", the matcher sees this as "\\d+", which would match a backslash followed by one or more "d"s. The first backslash protects the second backslash, and the third protects the fourth.

Java regexto match tuples

I need to extract tuples out of string
e.g. (1,1,A)(2,1,B)(1,1,C)(1,1,D)
and thought some regex like:
String tupleRegex = "(\\(\\d,\\d,\\w\\))*";
would work but it just gives me the first tuple. What would be proper regex to match all the tuples in the strings.
Remove the * from the regex and iterate over the matches using a java.util.regex.Matcher:
String input = "(1,1,A)(2,1,B)(1,1,C)(1,1,D)";
String tupleRegex = "(\\(\\d,\\d,\\w\\))";
Pattern pattern = Pattern.compile(tupleRegex);
Matcher matcher = pattern.matcher(input);
while(matcher.find()) {
System.out.println(matcher.group());
}
The * character is a quantifier that matches zero or more tuples. Hence your original regex would match the entire input string.
One line solution using String.split() method and here is the pattern (?!^\\()(?=\\()
Arrays.toString("(1,1,A)(2,1,B)(1,1,C)(1,1,D)".split("(?!^\\()(?=\\()"))
output:
[(1,1,A), (2,1,B), (1,1,C), (1,1,D)]
Here is DEMO as well.
Pattern explanation:
(?! look ahead to see if there is not:
^ the beginning of the string
\( '('
) end of look-ahead
(?= look ahead to see if there is:
\( '('
) end of look-ahead

Regex match repeatation punctuation in java

I have some punctuation [] punctuation = {'.', ',' , '!', '?'};. And I want create a regex that can match the word that was combined from those punctuations.
For example some string I want to find: "....???", "!!!!!......", "??.....!", so on.
Thanks for any advice.
Use String.matches() with the posix regex for "punctuation":
str.matches("\\p{Punct}+");
FYI according to the Pattern javadoc, \p{Punct} is one of
!"#$%&'()*+,-./:;<=>?#[\]^_`{|}~
Also, The ^ and $ aren't needed in the expression either, because matches() must matche the whole input to return true, so start and end are implied.
Try this, it should match and group all the symbols written between []:
([.,!?]+)
Tested it with
??..,..!fsdgsdfgsdfgsdfg
And output was
??..,..!
Also tested with this:
String s = "??.....!fsdgsdfgsdfgsdfg?.,!0000a";
Pattern p = Pattern.compile("([.,!?]+)");
Matcher m = p.matcher(s);
while(m.find()) {
System.out.println(m.group(1));
}
And output was
??.....!
?.,!
You can try with a Unicode category for punctuation and a while loop to match your input, as such:
String test = "!...abcd??...!!efgh....!!??abc!";
Pattern pattern = Pattern.compile("\\p{Punct}{2,}");
Matcher matcher = pattern.matcher(test);
while (matcher.find()) {
System.out.println(matcher.group());
}
Output:
!...
??...!!
....!!??
Note: this has the advantage of matching any punctuation character sequence larger than 1 character (hence, the last "!" is not matched by design). To decide the minimum length of the punctuation sequence, just play with the {2,} part of the Pattern.

Java regex patterns

I need help with this matter. Look at the following regex:
Pattern pattern = Pattern.compile("[A-Za-z]+(\\-[A-Za-z]+)");
Matcher matcher = pattern.matcher(s1);
I want to look for words like this: "home-made", "aaaa-bbb" and not "aaa - bbb", but not
"aaa--aa--aaa". Basically, I want the following:
word - hyphen - word.
It is working for everything, except this pattern will pass: "aaa--aaa--aaa" and shouldn't. What regex will work for this pattern?
Can can remove the backslash from your expression:
"[A-Za-z]+-[A-Za-z]+"
The following code should work then
Pattern pattern = Pattern.compile("[A-Za-z]+-[A-Za-z]+");
Matcher matcher = pattern.matcher("aaa-bbb");
match = matcher.matches();
Note that you can use Matcher.matches() instead of Matcher.find() in order to check the complete string for a match.
If instead you want to look inside a string using Matcher.find() you can use the expression
"(^|\\s)[A-Za-z]+-[A-Za-z]+(\\s|$)"
but note that then only words separated by whitespace will be found (i.e. no words like aaa-bbb.). To capture also this case you can then use lookbehinds and lookaheads:
"(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])"
which will read
(?<![A-Za-z-]) // before the match there must not be and A-Z or -
[A-Za-z]+ // the match itself consists of one or more A-Z
- // followed by a -
[A-Za-z]+ // followed by one or more A-Z
(?![A-Za-z-]) // but afterwards not by any A-Z or -
An example:
Pattern pattern = Pattern.compile("(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])");
Matcher matcher = pattern.matcher("It is home-made.");
if (matcher.find()) {
System.out.println(matcher.group()); // => home-made
}
Actually I can't reproduce the problem mentioned with your expression, if I use single words in the String. As cleared up with the discussion in the comments though, the String s contains a whole sentence to be first tokenised in words and then matched or not.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegExp {
private static void match(String s) {
Pattern pattern = Pattern.compile("[A-Za-z]+(\\-[A-Za-z]+)");
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println("'" + s + "' match");
} else {
System.out.println("'" + s + "' doesn't match");
}
}
/**
* #param args
*/
public static void main(String[] args) {
match(" -home-made");
match("home-made");
match("aaaa-bbb");
match("aaa - bbb");
match("aaa--aa--aaa");
match("home--home-home");
}
}
The output is:
' -home-made' doesn't match
'home-made' match
'aaaa-bbb' match
'aaa - bbb' doesn't match
'aaa--aa--aaa' doesn't match
'home--home-home' doesn't match

Categories

Resources