Java regex patterns


I need help with this matter. Look at the following regex:
Pattern pattern = Pattern.compile("[A-Za-z]+(\\-[A-Za-z]+)");
Matcher matcher = pattern.matcher(s1);
I want to match words like "home-made" and "aaaa-bbb", but not "aaa - bbb" and not
"aaa--aa--aaa". Basically, I want the following:
word - hyphen - word.
It works for everything, except that "aaa--aaa--aaa" passes and shouldn't. What regex will work for this pattern?

You can remove the backslash from your expression:
"[A-Za-z]+-[A-Za-z]+"
The following code should then work:
Pattern pattern = Pattern.compile("[A-Za-z]+-[A-Za-z]+");
Matcher matcher = pattern.matcher("aaa-bbb");
boolean match = matcher.matches();
Note that you can use Matcher.matches() instead of Matcher.find() in order to check the complete string for a match.
If instead you want to look inside a string using Matcher.find() you can use the expression
"(^|\\s)[A-Za-z]+-[A-Za-z]+(\\s|$)"
but note that this only finds words delimited by whitespace (so aaa-bbb followed by a period is missed). To cover that case as well, you can use lookbehinds and lookaheads:
"(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])"
which reads as follows:
(?<![A-Za-z-]) // before the match there must not be an A-Z or -
[A-Za-z]+ // the match itself consists of one or more A-Z
- // followed by a -
[A-Za-z]+ // followed by one or more A-Z
(?![A-Za-z-]) // but afterwards not by any A-Z or -
An example:
Pattern pattern = Pattern.compile("(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])");
Matcher matcher = pattern.matcher("It is home-made.");
if (matcher.find()) {
System.out.println(matcher.group()); // => home-made
}
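To check the lookaround version against the strings from the question, a minimal sketch could collect every match in a sentence. The class name HyphenWords and the helper findAll are made up for this illustration; only the pattern comes from the answer above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HyphenWords {
    // Pattern from the answer: letters, one hyphen, letters,
    // with no letter or hyphen directly before or after the match.
    private static final Pattern WORD =
            Pattern.compile("(?<![A-Za-z-])[A-Za-z]+-[A-Za-z]+(?![A-Za-z-])");

    // Hypothetical helper: returns every hyphenated word found in the text.
    static List<String> findAll(String text) {
        List<String> words = new ArrayList<>();
        Matcher matcher = WORD.matcher(text);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints [home-made]; the double-hyphen and spaced variants are skipped.
        System.out.println(findAll("It is home-made, not aaa--aa--aaa or aaa - bbb."));
    }
}
```

Because find() scans the whole sentence, no separate tokenising step is needed with this pattern.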

Actually, I can't reproduce the problem you mention with your expression if I use single words in the String. As the discussion in the comments cleared up, though, the String s contains a whole sentence that is first tokenised into words, each of which is then matched or not.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegExp {
private static void match(String s) {
Pattern pattern = Pattern.compile("[A-Za-z]+(\\-[A-Za-z]+)");
Matcher matcher = pattern.matcher(s);
if (matcher.matches()) {
System.out.println("'" + s + "' match");
} else {
System.out.println("'" + s + "' doesn't match");
}
}
/**
* @param args
*/
public static void main(String[] args) {
match(" -home-made");
match("home-made");
match("aaaa-bbb");
match("aaa - bbb");
match("aaa--aa--aaa");
match("home--home-home");
}
}
The output is:
' -home-made' doesn't match
'home-made' match
'aaaa-bbb' match
'aaa - bbb' doesn't match
'aaa--aa--aaa' doesn't match
'home--home-home' doesn't match

Related

How to check if specific pattern precedes some character?

I am new to Java regex and I couldn't find an answer.
This is my regex: -?\\d*\\.?\\d+(?!i)
and I want it not to match e.g. the String 551i
This is my method:
private static double regexMatcher(String s, String regex) {
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s.replaceAll("\\s+", ""));
if (!matcher.find()) {
return 0;
}
String found = matcher.group();
return Double.parseDouble(found);
}
I want this method to return 0.0 but it keeps returning 55.0.
What am I doing wrong?

Use an atomic group to avoid backtracking into the whole digit dot digit matching pattern:
"-?(?>\\d*\\.?\\d+)(?!i)"
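To see the difference, here is a small sketch that runs the question's method with both expressions. The class name AtomicGroupDemo is made up for illustration; the method body is a condensed version of the one in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AtomicGroupDemo {
    // Same logic as the question's method: parse the first match, or return 0 if there is none.
    static double regexMatcher(String s, String regex) {
        Matcher matcher = Pattern.compile(regex).matcher(s.replaceAll("\\s+", ""));
        return matcher.find() ? Double.parseDouble(matcher.group()) : 0;
    }

    public static void main(String[] args) {
        String original = "-?\\d*\\.?\\d+(?!i)";
        String atomic = "-?(?>\\d*\\.?\\d+)(?!i)";

        // Without the atomic group the engine backtracks to "55", which is followed by "1", not "i".
        System.out.println(regexMatcher("551i", original)); // 55.0
        // The atomic group keeps "551" as one unit, so the lookahead fails and nothing matches.
        System.out.println(regexMatcher("551i", atomic));   // 0.0
        System.out.println(regexMatcher("551", atomic));    // 551.0
    }
}
```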

java regex matching &[text]

I am currently working on creating a regex to split out all occurrences of Strings that match the following format: &[text] and need to get at the text. Strings could look like: something &[text] &[text] anything &[text] etc.
I have tried the following regex but I cannot seem to get it to work: &\[(.*)\]
Any help would be greatly appreciated.

Brackets are a bit tricky regarding escaping. Try this:
Pattern r = Pattern.compile("&\\[([^\\]]*)\\]");
Matcher m = r.matcher("foo &[bla] [foo] &[blub]&[blab]");
while (m.find()) {
System.out.println("Found value: " + m.group(1));
}
I replaced your dot with a character class that matches any character that is not a closing bracket. The star quantifier would otherwise greedily match up to the last closing bracket in the string. You could also make the quantifier reluctant with a question mark, which reads even better: "&\\[(.*?)\\]"
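A short sketch comparing the greedy and the reluctant form on the same input may make the difference concrete. The class name BracketDemo and the helper capture are hypothetical names used only here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketDemo {
    // Hypothetical helper: collects group 1 of every match of regex in input.
    static List<String> capture(String regex, String input) {
        List<String> values = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String input = "foo &[bla] [foo] &[blub]&[blab]";
        // Greedy: a single match that runs to the last closing bracket.
        System.out.println(capture("&\\[(.*)\\]", input));  // [bla] [foo] &[blub]&[blab]
        // Reluctant: each match stops at the nearest closing bracket.
        System.out.println(capture("&\\[(.*?)\\]", input)); // [bla, blub, blab]
    }
}
```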

Two things you need to do:
Double escape your square brackets
Prevent the capture group from matching other occurrences of the pattern, by preventing it from matching an opening or a closing bracket
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Test {
public static void main(String[] args) {
String s = "&[test] something ] something &[test2]";
Pattern pattern = Pattern.compile("&\\[([^\\[\\]]*)\\]");
Matcher matcher = pattern.matcher(s);
while (matcher.find()) {
System.out.println("capture group: " + matcher.group(1));
}
}
}

Java - Regular Expressions matching one to another

I am trying to retrieve bits of data using RE. Problem is I'm not very fluent with RE. Consider the code.
import java.util.regex.Pattern;
import java.util.regex.Matcher;
class HTTP{
private static String getServer(String httpresp){
Pattern p = Pattern.compile("(\bServer)(.*[Server:-\r\n]"); //What RE syntax do I use here?
Matcher m = p.matcher(httpresp);
if (m.find()){
return m.group(2);
}
return null;
}
public static void main(String[] args){
String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n"; //Test data
System.out.println(getServer(testdata));
}
}
How would I extract the text between "Server:" and the next "\r\n", which here would output "Apache"? I googled around and tried myself, but have failed.

It's a one liner:
private static String getServer(String httpresp) {
// (?s) lets . match line terminators, so the regex can span the whole multi-line response
return httpresp.replaceAll("(?s).*Server: (.*?)\r\n.*", "$1");
}
The trick here is two-part:
use .*?, which is a reluctant match (it consumes as little as possible while still matching)
the regex matches the whole input, but the desired target is captured and returned using a back reference

You could use capturing groups and a positive lookahead.
Pattern.compile("(?:\\bServer:\\s*)(.*?)(?=[\r\n]+)");
Then print group 1.
Example:
String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n";
Matcher matcher = Pattern.compile("(?:\\bServer:\\s*)(.*?)(?=[\r\n]+)").matcher(testdata);
if (matcher.find())
{
System.out.println(matcher.group(1));
}
OR
Matcher matcher = Pattern.compile("(?:\\bServer\\b\\S*\\s+)(.*?)(?=[\r\n]+)").matcher(testdata);
if (matcher.find())
{
System.out.println(matcher.group(1));
}
Output:
Apache
Explanation:
(?:\\bServer:\\s*) A non-capturing group, written (?:...), matches without capturing. \b is a word boundary, which matches between a word character and a non-word character. Server: matches the literal text Server:, and \s* matches the zero or more whitespace characters that follow.
(.*?) A capturing group, written (...), captures the characters matched by the pattern inside it. Here (.*?) captures characters non-greedily up to the point where
(?=[\r\n]+) one or more line-break characters are detected. (?=...) is a positive lookahead, which asserts that the match must be followed by the characters matched by the pattern inside the lookahead.
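For comparison, a lookbehind variant can return the value as the whole match, so no group index is needed. This is only a sketch: the class name ServerHeader and the method serverHeader are invented for illustration, and it assumes exactly one space after the colon, because the lookbehind here is written with a fixed length.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ServerHeader {
    // (?<=\bServer: ) asserts that "Server: " precedes the match without consuming it;
    // [^\r\n]* then takes everything up to the end of that line.
    private static final Pattern SERVER = Pattern.compile("(?<=\\bServer: )[^\r\n]*");

    // Hypothetical helper: returns the header value, or null if the header is absent.
    static String serverHeader(String response) {
        Matcher m = SERVER.matcher(response);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String testdata = "HTTP/1.1 302 Found\r\nServer: Apache\r\n\r\n";
        System.out.println(serverHeader(testdata)); // Apache
    }
}
```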

Extracting both matching and not matching regex

I have a String like this one: abc3a de'f gHi?jk. I want to split it into the substrings abc3a, de'f, gHi, ? and jk. In other words, I want to return the Strings that match the regular expression [a-zA-Z0-9'] and the Strings that do not match it. If there is a way to tell whether each resulting substring is a match or not, that would be a plus.
Thanks!

import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class HelloWorld{
public static void main(String []args){
Pattern pattern = Pattern.compile("([a-zA-Z0-9']*)?([^a-zA-Z0-9']*)?");
String str = "abc3a de'f gHi?jk";
Matcher matcher = pattern.matcher(str);
while(matcher.find()){
if(matcher.group(1).length() > 0)
System.out.println("Match:" + matcher.group(1));
if(matcher.group(2).length() > 0)
System.out.println("Miss: `" + matcher.group(2) + "`");
}
}
}
Output:
Match:abc3a
Miss: ` `
Match:de'f
Miss: ` `
Match:gHi
Miss: `?`
Match:jk
If you don't want the white space, exclude it from the second group:
Pattern pattern = Pattern.compile("([a-zA-Z0-9']*)?([^a-zA-Z0-9'\\s]*)?");
Output:
Match:abc3a
Match:de'f
Match:gHi
Miss: `?`
Match:jk

You can use this regex:
"[a-zA-Z0-9']+|[^a-zA-Z0-9' ]+"
Will give:
["abc3a", "de'f", "gHi", "?", "jk"]
Online Demo: http://regex101.com/r/xS0qG4
Java code:
Pattern p = Pattern.compile("[a-zA-Z0-9']+|[^a-zA-Z0-9' ]+");
Matcher m = p.matcher("abc3a de'f gHi?jk");
while (m.find())
System.out.println(m.group());
OUTPUT
abc3a
de'f
gHi
?
jk

myString.split("\\s+|(?<=[a-zA-Z0-9'])(?=[^a-zA-Z0-9'\\s])|(?<=[^a-zA-Z0-9'\\s])(?=[a-zA-Z0-9'])")
splits at all the boundaries between runs of characters inside and outside that character set.
The lookbehind (?<=...) matches just after a character of one run, while the lookahead (?=...) matches just before a character of the other kind of run.
The \\s+ is not a boundary match; it matches a run of whitespace characters. This has the effect of removing white-space from the result entirely.
The | allows splitting to happen either at a boundary or at a run of white-space.
Since the lookbehind and lookahead are both positive, the boundaries will not match at the start or end of the string, so there's no need to ignore empty strings in the output unless there is white-space there.
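As a rough sketch, the split above can be combined with a plain matches() check to tell which tokens belong to the character set, which the question asked for as a plus. The names SplitDemo, tokens and isMatch are assumptions made for this example.

```java
import java.util.Arrays;

class SplitDemo {
    // The split expression from above: whitespace runs, or a zero-width boundary between the two kinds of run.
    private static final String BOUNDARY =
            "\\s+|(?<=[a-zA-Z0-9'])(?=[^a-zA-Z0-9'\\s])|(?<=[^a-zA-Z0-9'\\s])(?=[a-zA-Z0-9'])";

    static String[] tokens(String s) {
        return s.split(BOUNDARY);
    }

    // True if the whole token consists of characters from the set.
    static boolean isMatch(String token) {
        return token.matches("[a-zA-Z0-9']+");
    }

    public static void main(String[] args) {
        for (String token : tokens("abc3a de'f gHi?jk")) {
            System.out.println((isMatch(token) ? "Match: " : "Miss: ") + token);
        }
        System.out.println(Arrays.toString(tokens("abc3a de'f gHi?jk"))); // [abc3a, de'f, gHi, ?, jk]
    }
}
```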

You can use lookarounds to split (the method below needs import java.util.ArrayList):
private static String[] splitString(final String s) {
final String [] arr = s.split("(?=[^a-zA-Z0-9'])|(?<=[^a-zA-Z0-9'])");
final ArrayList<String> strings = new ArrayList<String>(arr.length);
for (final String str : arr) {
if(!"".equals(str.trim())) {
strings.add(str);
}
}
return strings.toArray(new String[strings.size()]);
}
(?=xxx) means that xxx follows this position, and (?<=xxx) means that xxx precedes this position.
As you did not want all-whitespace matches in the result, you need to filter the array returned by split.

Returning the String found with a regular expression

If I have a regular expression, how do I return the substring that it has found?
I'm sure I must be missing something obvious, but I've found various methods to confirm that that substring is contained in the string I'm searching, or to replace it with something else, but not to return what I've found.

Matcher matcher = Pattern.compile("a+").matcher("bbbbaaaaabbbb");
if(matcher.find())
System.out.println(matcher.group(0)); //aaaaa
If you want specific parts:
Matcher matcher = Pattern.compile("(a+)b*(c+)").matcher("bbbbaaaaabbbbccccbbb");
if(matcher.find()){
System.out.println(matcher.group(1)); //aaaaa
System.out.println(matcher.group(2)); //cccc
System.out.println(matcher.group(0)); //aaaaabbbbcccc
}
Group 0 is the complete match. The other groups are delimited by parentheses in the regex (a+)b*(c+) and can be retrieved individually.
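If the number of groups is not known in advance, Matcher.groupCount() can drive a loop over them. The following is a minimal sketch; the class name GroupDemo and the helper groupsOfFirstMatch are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Hypothetical helper: returns group 0 followed by every capturing group of the first match,
    // or an empty list if the regex does not match anywhere in the input.
    static List<String> groupsOfFirstMatch(String regex, String input) {
        List<String> groups = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        if (matcher.find()) {
            // groupCount() does not count group 0, hence the <= in the loop condition.
            for (int i = 0; i <= matcher.groupCount(); i++) {
                groups.add(matcher.group(i));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Prints [aaaaabbbbcccc, aaaaa, cccc]
        System.out.println(groupsOfFirstMatch("(a+)b*(c+)", "bbbbaaaaabbbbccccbbb"));
    }
}
```

Note that matcher.group(i) returns null for a group that did not take part in the match, so callers may want to check for that.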

CharSequence inputStr = "abbabcd";
String patternStr = "(a(b*))+(c*)";
// Compile and use regular expression
Pattern pattern = Pattern.compile(patternStr);
Matcher matcher = pattern.matcher(inputStr);
boolean matchFound = matcher.find();
if (matchFound)
{
// Get all groups for this match
for (int i=0; i<=matcher.groupCount(); i++)
{
String groupStr = matcher.group(i);
}
}
A CharSequence is a readable sequence of char values. This interface provides uniform, read-only access to many different kinds of char sequences. A char value represents a character in the Basic Multilingual Plane (BMP) or a surrogate. Refer to Unicode Character Representation for details.
CharSequence is an interface
public interface CharSequence

import java.util.regex.*;
class Reg
{
public static void main(String [] args)
{
Pattern p = Pattern.compile("ab");
Matcher m = p.matcher("abcabd");
System.out.println("Pattern is " + m.pattern());
while(m.find())
{
System.out.println(m.start() + " " + m.group());
// m.start() will give the index and m.group() will give the substring
}
}
}
