Java regex matches more than needed

I have the following regexp
http://[a-z./].*(js)
and the string
efwefewfhttp://assets.main.com/zepto-1.1.3.min.js fffhttp://assets.main.com/zepto-1.1.3.min.js
Code:
List<String> kk = new ArrayList<String>();
while (urlMatcher.find()){
kk.add(urlMatcher.group());
}
This regexp outputs
http://assets.main.com/zepto-1.1.3.min.js fffhttp://assets.main.com/zepto-1.1.3.min.js
but there should be 2 strings in the result.
How do I change the regexp to get two strings as the result?

Use the following regex with lazy dot matching pattern:
http://[a-z./].*?js
The ? after .* is what makes the quantifier lazy.
With this, you will match http://assets.main.com/zepto-1.1.3.min.js and http://assets.main.com/zepto-1.1.3.min.js.
The thing is that .* first matches the whole line and then backtracks until the rest of the pattern can match. Thus it matches the longest possible substring (from the leftmost http:// up to the rightmost js). Lazy matching expands from the leftmost position only as far as the first occurrence of the next subpattern, yielding 2 matches.
See Watch Out for The Greediness! section.
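To see the difference side by side, here is a minimal sketch that runs the greedy and the lazy variant against the string from the question. The class name GreedyVsLazy and the helper findAll are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Hypothetical helper: collects every match of the regex in the input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "efwefewfhttp://assets.main.com/zepto-1.1.3.min.js "
                     + "fffhttp://assets.main.com/zepto-1.1.3.min.js";
        // Greedy: .* runs to the end of the line, then backtracks to the last "js".
        System.out.println(findAll("http://[a-z./].*js", input).size());  // 1
        // Lazy: .*? stops at the first "js" that lets the pattern succeed.
        System.out.println(findAll("http://[a-z./].*?js", input).size()); // 2
    }
}
```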
Also, since these are links, and there should be no spaces, you can use \S (non-whitespace) shorthand char class:
http://[a-z./]\S*\.js
Note that the literal dot is matched with \. in this pattern.
Lazy/greedy dot matching should be avoided where possible because of the heavy backtracking it may involve!
Sample code:
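import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
String str = "efwefewfhttp://assets.main.com/zepto-1.1.3.min.js fffhttp://assets.main.com/zepto-1.1.3.min.js";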
Pattern ptrn = Pattern.compile("http://[a-z./]\\S*\\.js");
Matcher urlMatcher = ptrn.matcher(str);
List<String> kk = new ArrayList<String>();
while (urlMatcher.find()){
kk.add(urlMatcher.group());
}
System.out.println(kk);
// [http://assets.main.com/zepto-1.1.3.min.js, http://assets.main.com/zepto-1.1.3.min.js]

Related

What is the Regex for decimal numbers in Java?

I am not quite sure what the correct regex for the period is in Java. Here are some of my attempts. Sadly, they all seem to match any character.
String regex = "[0-9]*[.]?[0-9]*";
String regex = "[0-9]*['.']?[0-9]*";
String regex = "[0-9]*["."]?[0-9]*";
String regex = "[0-9]*[\.]?[0-9]*";
String regex = "[0-9]*[\\.]?[0-9]*";
String regex = "[0-9]*.?[0-9]*";
String regex = "[0-9]*\.?[0-9]*";
String regex = "[0-9]*\\.?[0-9]*";
But what I want is the actual "." character itself. Anyone have an idea?
What I'm trying to do actually is to write out the regex for a non-negative real number (decimals allowed). So the possibilities are: 12.2, 3.7, 2., 0.3, .89, 19
String regex = "[0-9]*['.']?[0-9]*";
Pattern pattern = Pattern.compile(regex);
String x = "5p4";
Matcher matcher = pattern.matcher(x);
System.out.println(matcher.find());
The last line is supposed to print false but prints true anyway. I think my regex is wrong though.
Update
To match a non-negative decimal number that contains a dot, you need this regex:
^\d*\.\d+|\d+\.\d*$
or in Java syntax: "^\\d*\\.\\d+|\\d+\\.\\d*$". Note that it requires a dot, so a plain integer such as 19 does not match.
String regex = "^\\d*\\.\\d+|\\d+\\.\\d*$";
String string = "123.43253";
if(string.matches(regex))
System.out.println("true");
else
System.out.println("false");
Explanation for your original regex attempts:
[0-9]*\.?[0-9]*
With Java string escaping it becomes:
"[0-9]*\\.?[0-9]*";
If you need to make the dot mandatory, remove the ? mark:
[0-9]*\.[0-9]*
But this will accept just a dot without any digits as well. So, if you want the digits to be mandatory, use + (which means one or more) instead of * (which means zero or more). In that case it becomes:
[0-9]+\.[0-9]+
If you are on Kotlin, you can use an extension function:
fun String.findDecimalDigits() =
Pattern.compile("^[0-9]*\\.?[0-9]*").matcher(this).run { if (find()) group() else "" }!!
Your initial understanding was probably right, but you were being thrown off because matcher.find() looks for the first valid match anywhere within the string, and every part of your patterns is optional, so each of them can match a zero-length string.
I would suggest "^([0-9]+\\.?[0-9]*|\\.[0-9]+)$"
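Here is a small sketch of both points, assuming the pattern suggested above. The class name FindVersusMatches and the helper isDecimal are invented for the example. It first shows find() succeeding with an empty match, then checks the sample values from the question with matches().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindVersusMatches {
    // Pattern suggested above: digits with an optional fraction, or a leading dot.
    static final Pattern DECIMAL = Pattern.compile("^([0-9]+\\.?[0-9]*|\\.[0-9]+)$");

    // Hypothetical helper: true only if the whole input is a decimal number.
    static boolean isDecimal(String s) {
        return DECIMAL.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Every part of this pattern is optional, so find() succeeds on any input.
        Matcher loose = Pattern.compile("[0-9]*\\.?[0-9]*").matcher("p4");
        System.out.println(loose.find());            // true
        System.out.println(loose.group().isEmpty()); // true: zero-length match before 'p'

        for (String s : new String[] {"12.2", "3.7", "2.", "0.3", ".89", "19", "5p4", "."}) {
            System.out.println(s + " -> " + isDecimal(s));
        }
    }
}
```

With this pattern the six values from the question are accepted, while 5p4 and a lone dot are rejected.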
There are actually 2 ways to match a literal dot. One is backslash-escaping, as you do with \\., and the other is to enclose it in a character class (square brackets), like [.]. Most special characters, including the dot, become literal inside square brackets. Using \\. shows your intention more clearly than [.] if all you want is to match a literal dot. Use [] when you need to match one of several things; for example, the regex [\\d.] matches a single digit or a literal dot.
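A quick way to convince yourself is to compare the three spellings on the 5p4 input from the question. This is only an illustrative sketch; the class name LiteralDot is made up.

```java
class LiteralDot {
    public static void main(String[] args) {
        // An unescaped dot matches any character, including 'p'.
        System.out.println("5p4".matches("5.4"));     // true
        // Both ways of writing a literal dot reject 'p'...
        System.out.println("5p4".matches("5\\.4"));   // false
        System.out.println("5p4".matches("5[.]4"));   // false
        // ...and accept a real dot.
        System.out.println("5.4".matches("5\\.4"));   // true
        System.out.println("5.4".matches("5[.]4"));   // true
        // Inside a character class the dot is literal: digit or dot, repeated.
        System.out.println("5.4".matches("[\\d.]+")); // true
    }
}
```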
I have tested all the cases.
public static boolean isDecimal(String input) {
return Pattern.matches("^[-+]?\\d*[.]?\\d+|^[-+]?\\d+[.]?\\d*", input);
}

Using java regex how to find particular word any where in the string?

Using java regex how to find particular word anywhere in the string. My need is to check whether the string "Google" contains the word "gooe" or not.
For example:-
String: Google
word to find : gooe
The string "Google" contains all the characters g, o, o, e, so it should return true.
If the string is "wikipedia" and my word to find is "gooe", then it should return false.
How do I form a regex for this scenario?
I've just tested a regex that makes use of lookaheads:
(?=^.*g)(?=^.*o)(?=^.*e)
It should return true for all strings that contain g, o and e, while returning false if any of these characters is missing.
If you want to find word in whole string you can use:
"^(?=.*e)(?=.*o.*o)(?=.*g).*"
You have to build a positive lookahead for each letter. In case of having gooe as search term our RegEx would be:
(?i)(?=.*g)(?=.*o)(?=.*o)(?=.*e)
It's obvious that we have two identical lookaheads. Both are satisfied by the same o, so one of them is redundant. You can remove duplicate letters from the search term before building the final pattern. (?i) turns the case-insensitivity flag on.
String term = "Gooe"; // Search term
String word = "google"; // Against word `Google`
String pattern = "(?i)(?=.*" + String.join(")(?=.*", term.split("(?!^)")) + ")";
Pattern regex = Pattern.compile(pattern);
Matcher match = regex.matcher(word);
if (match.find()) {
// Matched
}
If order is important and both os must actually be present when the term contains two of them, our regex would be:
(?i).*?g.*?o.*?o.*?e
Java:
String pattern = "(?i).*?" + String.join(".*?", term.split("(?!^)"));
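If order does not matter but repeated letters should still be counted, one possible variation (not taken from the answers above) is to build one lookahead per distinct letter and require it as many times as it occurs in the term. The class LetterLookaheads and its methods are hypothetical names for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

class LetterLookaheads {
    // Builds e.g. (?i)(?=(?:.*g){1})(?=(?:.*o){2})(?=(?:.*e){1}) for "gooe".
    static Pattern build(String term) {
        Map<Character, Integer> counts = new LinkedHashMap<>();
        for (char c : term.toLowerCase(Locale.ROOT).toCharArray()) {
            counts.merge(c, 1, Integer::sum); // count how often each letter occurs
        }
        StringBuilder sb = new StringBuilder("(?i)");
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            // Quote the letter so that regex metacharacters in the term stay literal.
            String letter = Pattern.quote(String.valueOf(e.getKey()));
            sb.append("(?=(?:.*").append(letter).append("){")
              .append(e.getValue()).append("})");
        }
        return Pattern.compile(sb.toString());
    }

    // True if word contains every letter of term at least as often as term does.
    static boolean containsLetters(String word, String term) {
        return build(term).matcher(word).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLetters("Google", "gooe"));    // true
        System.out.println(containsLetters("wikipedia", "gooe")); // false
        System.out.println(containsLetters("Gogle", "gooe"));     // false: only one o
    }
}
```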

Java regex to match the start of the word?

Objective: for a given term, I want to check if that term exists at the start of a word. For example, if the term is 't', then in the sentence:
"This is the difficult one Thats it"
I want it to return "true" because of :
This, the, Thats
so consider:
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "/\\b"+term+"[^\\b]*?\\b/gi";
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
}
}
I am getting following Exception:
Exception in thread "main" java.util.regex.PatternSyntaxException:
Illegal/unsupported escape sequence near index 7
/\bt[^\b]*?\b/gi
^
at java.util.regex.Pattern.error(Pattern.java:1924)
at java.util.regex.Pattern.escape(Pattern.java:2416)
at java.util.regex.Pattern.range(Pattern.java:2577)
at java.util.regex.Pattern.clazz(Pattern.java:2507)
at java.util.regex.Pattern.sequence(Pattern.java:2030)
at java.util.regex.Pattern.expr(Pattern.java:1964)
at java.util.regex.Pattern.compile(Pattern.java:1665)
at java.util.regex.Pattern.<init>(Pattern.java:1337)
at java.util.regex.Pattern.compile(Pattern.java:1022)
at java.util.regex.Pattern.matches(Pattern.java:1128)
at java.lang.String.matches(String.java:2063)
at HelloWorld.main(HelloWorld.java:8)
Also the following does not work:
import java.util.regex.*;
public class HelloWorld{
public static void main(String []args){
String term = "t";
String regex = "\\b"+term+"gi";
//String regex = ".";
System.out.println(regex);
String str = "This is the difficult one Thats it";
System.out.println(str.matches(regex));
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
System.out.println(m.find());
}
}
Example:
{ This , one, Two, Those, Thanks }
for words This Two Those Thanks; result should be true.
Thanks
Since you're using the Java regex engine, you need to write the expressions in a way Java understands. That means removing the leading and trailing slashes and adding flags as (?flags), for example (?i), at the beginning of the expression.
Thus you'd need this instead:
String regex = "(?i)\\b"+term+".*?\\b";
Have a look at regular-expressions.info/java.html for more information. A comparison of supported features can be found here (just as an entry point): regular-expressions.info/refbasic.html
In Java we don't surround a regex with /, so instead of "/regex/flags" we just write regex. If you want to add flags, you can do it with the (?flags) syntax, placed in the regex at the position from which the flag should apply. For instance, a(?i)a will be able to find aa and aA but not Aa, because the flag was added after the first a.
You can also compile your regex into a Pattern like this
Pattern pattern = Pattern.compile(regex, flags);
where regex is a String (again not enclosed in /) and flags is an integer built from constants in Pattern, such as Pattern.DOTALL; when you need more than one flag, combine them, for example Pattern.CASE_INSENSITIVE|Pattern.MULTILINE.
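Both ways of setting flags can be tried out with a short sketch like the one below. The class name FlagPlacement and the helper found are invented for the example, and the sample strings are my own.

```java
import java.util.regex.Pattern;

class FlagPlacement {
    // Hypothetical helper: true if the pattern occurs anywhere in the string.
    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        // Inline flag: case-insensitivity applies only after the first 'a'.
        Pattern inline = Pattern.compile("a(?i)a");
        System.out.println(found(inline, "aa")); // true
        System.out.println(found(inline, "aA")); // true
        System.out.println(found(inline, "Aa")); // false

        // Flags argument: several constants combined with |.
        Pattern combined = Pattern.compile("^this",
                Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
        // MULTILINE lets ^ match after the line break, CASE_INSENSITIVE accepts 'T'.
        System.out.println(found(combined, "one\nThis")); // true
        System.out.println(found(Pattern.compile("^this"), "one\nThis")); // false
    }
}
```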
The next thing which may confuse you is the matches method. Most people are misled by its name, because they assume it checks whether the string contains some element that the regex can match, but in reality it checks whether the entire string can be matched by the regex.
What you seem to want is a mechanism to test whether some regex can be found at least once in the string. In that case you may either
add .* at the start and end of your regex to let the regex engine match the other characters which are not part of the element you want to find, but this way matches must iterate over the entire string
use a Matcher object built from a Pattern (representing your regex) and call its find() method, which iterates until it finds a match for the regex or reaches the end of the string. I prefer this approach because it does not need to iterate over the entire string, but stops as soon as a match is found.
So your code could look like
String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(str);
System.out.println(matcher.find());
In case your term could contain regex special characters that you want the regex engine to treat as normal characters, you need to make sure they are escaped. To do this you can use the Pattern.quote method, which wraps the term so that it is matched literally, so instead of
Pattern pattern = Pattern.compile("\\b"+term, Pattern.CASE_INSENSITIVE);
for safety you should use
Pattern pattern = Pattern.compile("\\b"+Pattern.quote(term), Pattern.CASE_INSENSITIVE);
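To illustrate why the quoting matters, here is a rough sketch that compares the quoted and the unquoted variant for a term containing a dot. The class QuotedTerm, both method names and the term "t." are assumptions made for the example.

```java
import java.util.regex.Pattern;

class QuotedTerm {
    // Term is treated literally thanks to Pattern.quote.
    static boolean startsAWord(String text, String term) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(term), Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    // Term is interpreted as regex syntax; shown only for comparison.
    static boolean startsAWordUnquoted(String text, String term) {
        Pattern p = Pattern.compile("\\b" + term, Pattern.CASE_INSENSITIVE);
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String str = "This is the difficult one Thats it";
        System.out.println(startsAWord(str, "t"));          // true
        // Unquoted, "t." means 't' followed by any character, so "Th" matches.
        System.out.println(startsAWordUnquoted(str, "t.")); // true
        // Quoted, "t." must appear literally, and it does not.
        System.out.println(startsAWord(str, "t."));         // false
    }
}
```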
String regex = "(?i)\\b"+term;
In Java, the modifiers must be inserted between "(?" and ")" and there is a variant for turning them off again: "(?-" and ")".
For finding all words beginning with "T" or "t", you may want to use Matcher's find method repeatedly. If you just need the offset, Matcher's start method returns the offset.
If you need to match the full word, use
String regex = "(?i)\\b"+term + "\\w*";
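Calling find repeatedly could look roughly like this. It is a sketch only: the class WordsStartingWith and its method are hypothetical, and Pattern.quote is added on the assumption that the term should be taken literally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordsStartingWith {
    // Returns every word in text that begins with term, ignoring case.
    static List<String> wordsStartingWith(String text, String term) {
        Pattern p = Pattern.compile("(?i)\\b" + Pattern.quote(term) + "\\w*");
        Matcher m = p.matcher(text);
        List<String> words = new ArrayList<>();
        while (m.find()) {
            // m.start() is the offset of the current match within text.
            System.out.println(m.start() + ": " + m.group());
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String str = "This is the difficult one Thats it";
        System.out.println(wordsStartingWith(str, "t")); // [This, the, Thats]
    }
}
```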
String str = "This is the difficult one Thats it";
String term = "t";
Pattern pattern = Pattern.compile(Pattern.quote(term)+".*",Pattern.CASE_INSENSITIVE);
String[] strings = str.split(" ");
for (String s : strings) {
if (pattern.matcher(s).matches()) {
System.out.println(s+"-->"+true);
} else {
System.out.println(s+"-->"+false);
}
}

What would be the regex for this pattern?

My Java program, at a certain point, receives a string containing a couple of key-value properties like this example:
param1=value Param2=values can have spaces PARAM3=values cant have equal characters
Each parameter's name/key is a single word (a-z, A-Z, _ and 0-9) followed by an = character (not separated by spaces) and its value. The value is text that can contain spaces and lasts until the end of the string or the beginning of another parameter (which is again a word followed by an equals sign and its value, etc.).
I need to extract a Properties object (string-to-string map) from this string. I was trying to use regex to find each key-value set. The code is like this:
public static Properties createProperties(String str) {
Properties prop = new Properties();
Matcher matcher = Pattern.compile(some regex).matcher(str);
while (matcher.find()) {
String match = matcher.group();
String param = ...; // What comes before '='
String value = ...; // What comes after '='
prop.setProperty(param, value);
}
return prop;
}
But the regex I wrote is not working correctly.
String regex = "(\\w+=.*)+";
Since .* tells the regex to match "anything", it will match the entire string. I want to tell the regex to search until it finds another \\w+=.* (a word followed by an equals sign and something after it).
How could I write this regex? Or what would be another solution for the problem using regex?
You can use a Negative Lookahead here.
(\\w+)=((?:(?!\\s*\\w+=).)*)
The key is placed inside capturing group #1 and the value is in capturing group #2. Note that I used \s inside the lookaround in order to prevent the value from having trailing whitespace.
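Plugged into the method from the question, this pattern could be used roughly as follows. The class name KeyValueParser is made up; the method name createProperties is taken from the question.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class KeyValueParser {
    // Group 1 is the key, group 2 the value up to (not including) the next key.
    private static final Pattern PAIR =
            Pattern.compile("(\\w+)=((?:(?!\\s*\\w+=).)*)");

    static Properties createProperties(String str) {
        Properties prop = new Properties();
        Matcher matcher = PAIR.matcher(str);
        while (matcher.find()) {
            prop.setProperty(matcher.group(1), matcher.group(2));
        }
        return prop;
    }

    public static void main(String[] args) {
        Properties p = createProperties(
                "param1=value Param2=values can have spaces "
              + "PARAM3=values cant have equal characters");
        System.out.println(p.getProperty("param1")); // value
        System.out.println(p.getProperty("Param2")); // values can have spaces
        System.out.println(p.getProperty("PARAM3")); // values cant have equal characters
    }
}
```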
One way among several:
List<String> paramNames = new ArrayList<String>();
List<String> paramValues = new ArrayList<String>();
Pattern regex = Pattern.compile("([^\\s=]+)=([^\\s=]+)");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
paramNames.add(regexMatcher.group(1));
paramValues.add(regexMatcher.group(2));
}
The regex:
([^\\s=]+)=([^\\s=]+)
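The code retrieves keys as Group 1 and values as Group 2. Note that [^\\s=]+ stops at the first whitespace, so a value containing spaces is cut off after its first word.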
Explanation
([^\\s=]+) captures one or more characters that are neither whitespace nor = into Group 1
= matches the literal =
([^\\s=]+) captures one or more characters that are neither whitespace nor = into Group 2
Your regex would be,
(\\w+=(?:(?!\\w+=).)*)
It matches each param=value pair up to the next param=. For the example string this gives three separate matches, each available as group 1. The value keeps any whitespace that precedes the next key, so you may want to trim it.
Explanation:
\\w+= Matches one or more word characters followed by an = symbol.
(?:(?!\\w+=).)* A non-capturing group with a negative lookahead matches one character at a time as long as no \w+= sequence starts at that position, so the match extends up to the next param=

Author and time matching regex

I would like to use a regex in my Java program to recognize some features of my strings.
I have this type of string:
-Author- has wrote (-hh-:-mm-)
So, for example, I have a string with:
Cecco has wrote (15:12)
and I have to extract the author, hh and mm fields. Obviously I have some restrictions to consider:
hh and mm must be numbers
author hasn't any restrictions
I have to consider the space between "has wrote" and (
How can I use a regex for this?
EDIT: I attach my snippet:
String mRegex = "(\\s)+ has wrote \\((\\d\\d):(\\d\\d)\\)";
Pattern mPattern = Pattern.compile(mRegex);
String[] str = {
"Cecco CQ has wrote (14:55)", //OK (matched)
"yesterday you has wrote that I'm crazy", //NO (different text)
"Simon has wrote (yesterday)", // NO (yesterday isn't numbers)
"John has wrote (22:32)", //OK
"James has wrote(22:11)", //NO (missed space between has wrote and ()
"Tommy has wrote (xx:ss)" //NO (xx and ss aren't numbers)
};
for(String s : str) {
Matcher mMatcher = mPattern.matcher(s);
while (mMatcher.find()) {
System.out.println(mMatcher.group());
}
}
homework?
Something like:
(.+) has wrote \((\d\d):(\d\d)\)
Should do the trick
() - mark groups to capture (there are three in the above)
.+ - any chars (you said no restrictions)
\d - any digit
\(\) escape the parens as literals instead of a capturing group
use:
Pattern p = Pattern.compile("(.+) has wrote \\((\\d\\d):(\\d\\d)\\)");
Matcher m = p.matcher("Gareth has wrote (12:00)");
if( m.matches()){
System.out.println(m.group(1));
System.out.println(m.group(2));
System.out.println(m.group(3));
}
To cope with an optional (HH:mm) at the end you need to start to use some dark regex voodoo:
Pattern p = Pattern.compile("(.+) has wrote\\s?(?:\\((\\d\\d):(\\d\\d)\\))?");
Matcher m = p.matcher("Gareth has wrote (12:00)");
if( m.matches()){
System.out.println(m.group(1));
System.out.println(m.group(2));
System.out.println(m.group(3));
}
m = p.matcher("Gareth has wrote");
if( m.matches()){
System.out.println(m.group(1));
// m.group(2) == null since it didn't match anything
}
The new unescaped pattern:
(.+) has wrote\s?(?:\((\d\d):(\d\d)\))?
\s? optionally matches a space (there might not be a space at the end if there isn't a (HH:mm) group)
(?: ... ) is a non-capturing group, which allows us to put ? after it to make it optional
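For the stricter variant with a mandatory time, here is a minimal sketch that runs the first pattern against the sample strings from the question. The class AuthorTime and the method parse are names invented for this example.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AuthorTime {
    private static final Pattern LINE =
            Pattern.compile("(.+) has wrote \\((\\d\\d):(\\d\\d)\\)");

    // Returns {author, hh, mm}, or null when the whole line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] samples = {
            "Cecco CQ has wrote (14:55)",
            "yesterday you has wrote that I'm crazy",
            "Simon has wrote (yesterday)",
            "John has wrote (22:32)",
            "James has wrote(22:11)",
            "Tommy has wrote (xx:ss)"
        };
        for (String s : samples) {
            String[] parts = parse(s);
            System.out.println(s + " -> " + (parts == null ? "no match" : Arrays.toString(parts)));
        }
    }
}
```

The hh and mm strings can then be converted with Integer.parseInt if numeric values are needed.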
I think #codinghorror has something to say about regex
The easiest way to figure out regular expressions is to use a testing tool before coding.
I use an eclipse plugin from http://www.brosinski.com/regex/
Using this I came up with the following result:
([a-zA-Z]*) has wrote \((\d\d):(\d\d)\)
Cecco has wrote (15:12)
Found 1 match(es):
start=0, end=23
Group(0) = Cecco has wrote (15:12)
Group(1) = Cecco
Group(2) = 15
Group(3) = 12
An excellent tutorial on regular expression syntax can be found at http://www.regular-expressions.info/tutorial.html
Well, just in case you didn't know, Matcher has a nice method, Matcher.group(int), that can pull out specific groups, i.e. the parts of the pattern enclosed by (). For example, if I wanted to match a number between two colons like:
:22:
I could use the regex ":(\\d+):" to match one or more digits between two colons, and then fetch just the digits with:
Matcher.group(1)
And then it's just a matter of parsing the String into an int. As a note, group numbering starts at 1; group(0) is the whole match, so Matcher.group(0) for the previous example would return :22:
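Put together, the colon example might look like the sketch below. The class ColonNumber, the method extract and the -1 fallback are all assumptions made for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ColonNumber {
    private static final Pattern BETWEEN_COLONS = Pattern.compile(":(\\d+):");

    // Returns the number between two colons, or -1 if none is found
    // (the -1 fallback is just a convention chosen for this sketch).
    static int extract(String input) {
        Matcher m = BETWEEN_COLONS.matcher(input);
        if (!m.find()) {
            return -1;
        }
        // group(0) would be ":22:", group(1) is only the digits.
        return Integer.parseInt(m.group(1));
    }

    public static void main(String[] args) {
        System.out.println(extract(":22:"));  // 22
        System.out.println(extract("a:7:b")); // 7
        System.out.println(extract("none"));  // -1
    }
}
```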
For your case, I think the regex bits you need to consider are
"[A-Za-z]" for alphabet characters (you could probably also safely use "\\w", which matches alphabet characters, as well as numbers and _).
"\\d" for digits (1,2,3...)
"+" for indicating you want one or more of the previous character or group.
