String regex = "^;[A-Z0-9]{5};[\\d]{1,};[\\d]{1,}.[\\d]{1,}";
String str = ";ABC12;10;250.3";
System.out.println(str.matches(regex));
The above regex works fine.
Consider the following strings
str1=";ABC12;10;250.3"
str2=;ABB62;5;2.3
str3=;ABF02;8;25120.3
str4=;AKC12;11;2504.303
Now I have a string of the form String strToMatch = str1,str2,str3,str4, that is, the strings above joined with commas.
How do I change the regex above so that it matches this string?
Note: There can be any number of comma-separated values in the string. I also need to make sure that strToMatch does not end with a comma.
You can capture the regex with round brackets and repeat one or more times:
String regex = "^(;[A-Z0-9]{5};\\d+;\\d+\\.\\d+){1,}";
Try this pattern instead: (;[A-Z0-9]{5};[\\d]{1,};[\\d]{1,}\\.[\\d]{1,},?)+
This has two differences to your pattern: first I use \\. to denote that this has to be a . because a single dot means "any character" in regex.
Then I used the grouping brackets (...) and the + at the end to say: "Look for this once or more". As the , is optional at the end, I added a ?
If you want to get single matches to process using a Matcher later on, a simple modification should do the trick: (;[A-Z0-9]{5};[\\d]{1,};[\\d]{1,}\\.[\\d]{1,}),?
The + is gone and the ,? is outside the grouping brackets, because those are now capturing brackets (as well).
Example:
final Pattern pattern = Pattern.compile("(;[A-Z0-9]{5};[\\d]{1,};[\\d]{1,}\\.[\\d]{1,}),?");
final Matcher matcher = pattern.matcher(";ABC12;10;250.3,;ABB62;5;2.3,;ABF02;8;25120.3,;AKC12;11;2504.303");
while (matcher.find()) {
System.out.println("Whole match: " + matcher.group());
for (int i = 1; i <= matcher.groupCount(); ++i) {
System.out.println("Group #" + i + ": " + matcher.group(i));
}
}
I found the following way of solving the problem.
String strToMatch = ";ABC12;10;250.3,;ABB62;5;2.3,;ABF02;8;25120.3,;AKC12;11;2504.303";
if (strToMatch.endsWith(",") || strToMatch.startsWith(","))
    return false;
else {
    String[] str = strToMatch.split(",");
    String regex = ";[A-Z0-9]{5};\\d+;\\d+\\.\\d+";
    for (String s : str) {
        // Reject the whole input as soon as one record fails to match.
        if (!s.matches(regex))
            return false;
    }
    return true;
}
Any simpler way than this?
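One possibly simpler alternative is a single pattern: one record followed by zero or more comma-prefixed records, so a leading or trailing comma cannot match. This is only a minimal sketch, assuming every record has the format shown in the question; the class name RecordListValidator and the method isValid are illustrative names, not something from the posts above.

```java
import java.util.regex.Pattern;

class RecordListValidator {
    // One record such as ";ABC12;10;250.3" (format taken from the question).
    private static final String RECORD = ";[A-Z0-9]{5};\\d+;\\d+\\.\\d+";

    // A record followed by zero or more ",record" repetitions.
    private static final Pattern RECORD_LIST =
            Pattern.compile(RECORD + "(?:," + RECORD + ")*");

    // Hypothetical helper: true only if the whole input is a valid list.
    public static boolean isValid(String input) {
        // matches() must consume the entire input, so no ^ or $ is needed.
        return RECORD_LIST.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid(";ABC12;10;250.3,;ABB62;5;2.3,;ABF02;8;25120.3,;AKC12;11;2504.303")); // true
        System.out.println(isValid(";ABC12;10;250.3,"));  // false: trailing comma
        System.out.println(isValid(",;ABC12;10;250.3"));  // false: leading comma
    }
}
```

Because the comma is only allowed in front of a further record, the separate startsWith/endsWith checks are not needed in this variant.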
Basically, my desired outcome is to split a string on known keywords regardless of whether whitespace separates the keywords. Below is my current implementation; assume the parameter is String line = "sum:=5;":
private static String[] nextLineAsToken(String line) {
return line.split("\\s+(?=(:=|<|>|=))");
}
Expected:
String[] {"sum", ":=", "5;"};
Actual:
String[] {"sum:=5;"};
I have a feeling this isn't possible, but it would be great to hear from you guys.
Thanks.
Here is example code that you can use to split your input into groups. Whitespace characters such as regular spaces are ignored. The groups are then printed in the for loop:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Example {
public static void main(String[] args) {
final String regex = "(\\w*)\\s*(:=)\\s*(\\d*;)";
final String string = "sum:=5;";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
System.out.println("Full match: " + matcher.group(0));
for (int i = 1; i <= matcher.groupCount(); i++) {
System.out.println("Group " + i + ": " + matcher.group(i));
}
}
}
}
And this is the output:
Full match: sum:=5;
Group 1: sum
Group 2: :=
Group 3: 5;
Your main problem is you coded \s+ instead of \s*, which required there to be spaces to split, instead of spaces being optional. The other problem is your regex only splits before operators.
Use this regex:
\s*(?=(:=|<|>|(?<!:)=))|(?<=(=|<|>))\s*
Or as Java:
return line.split("\\s*(?=(:=|<|>|(?<!:)=))|(?<=(=|<|>))\\s*");
This uses a lookahead to split before operators and a lookbehind to split after operators.
\s* has been added to consume any spaces between terms.
Note also the negative look behind (?<!:) within the look ahead to prevent splitting between : and =.
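As a quick check, here is a small sketch that applies the split regex above to the input from the question. The class name OperatorSplitDemo is made up for illustration; only the no-whitespace input from the question is exercised here.

```java
import java.util.Arrays;

class OperatorSplitDemo {
    // The regex from the answer above: split before operators (lookahead)
    // and after operators (lookbehind), with optional whitespace consumed.
    private static final String SPLIT_REGEX =
            "\\s*(?=(:=|<|>|(?<!:)=))|(?<=(=|<|>))\\s*";

    public static String[] nextLineAsToken(String line) {
        return line.split(SPLIT_REGEX);
    }

    public static void main(String[] args) {
        // Input taken from the question.
        System.out.println(Arrays.toString(nextLineAsToken("sum:=5;"))); // [sum, :=, 5;]
    }
}
```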
String regex = "(some|text|)";
Pattern pattern = Pattern.compile(regex, Pattern.CASE_INSENSITIVE);
Matcher matcher = pattern.matcher(input);
while (matcher.find())
{
int start = matcher.start();
int end = matcher.end();
System.out.print("Start index: " + start);
System.out.print(" End index: " + end + " ");
System.out.println(matcher.group());
}
Hi, I would like to return the full substring, up to the end of the word. For example,
if the input is:
String input = "I am a texte";
I would expect it to return 7 11; that is, I want it to return the text up to the final "e" rather than stopping at the "t". Is this possible? If so, how can it be implemented?
Why use a regexp for this? String has lastIndexOf to find the last index of a delimiter (like space), and it looks like you're not trying to find a "word" but "the substring after the last space" (which are not the same thing in many, many languages), so given that, just use:
String last = input.substring(input.lastIndexOf(' ') + 1);
(optionally as two lines with a check to see if lastIndexOf returned a sensible position), and done?
you can use the following regex instead:
String regex = "(.[^\\s+].*some*.[^\\s]+|.[^\\s+].*text*.[^\\s]+)";
This will take all the words that start with some or text. For example: someone.
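If the goal is simply to extend a match of some or text to the end of the word it starts, one option is to append \w* to the alternation from the question. This is a sketch under that assumption about the intent, not the regex from the answer above; the class name WordTailDemo and the method findToWordEnd are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordTailDemo {
    // "some" or "text", followed by any remaining word characters.
    private static final Pattern KEYWORD_TO_WORD_END =
            Pattern.compile("(?:some|text)\\w*", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: returns "start end match" for every hit.
    public static List<String> findToWordEnd(String input) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = KEYWORD_TO_WORD_END.matcher(input);
        while (matcher.find()) {
            // end() is exclusive: the index just past the last matched char.
            hits.add(matcher.start() + " " + matcher.end() + " " + matcher.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findToWordEnd("I am a texte")); // [7 12 texte]
    }
}
```

Note that Matcher.end() reports 12 rather than 11 because it is exclusive; subtract 1 if you need the index of the last matched character.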
I'm currently trying to solve a problem from codingbat.com with regular expressions.
I'm new to this, so step-by-step explanations would be appreciated. I could solve this with String methods relatively easily, but I am trying to use regular expressions.
Here is the prompt:
Given a string and a non-empty word string, return a string made of each char just before and just after every appearance of the word in the string. Ignore cases where there is no char before or after the word, and a char may be included twice if it is between two words.
wordEnds("abcXY123XYijk", "XY") → "c13i"
wordEnds("XY123XY", "XY") → "13"
wordEnds("XY1XY", "XY") → "11"
etc
My code thus far:
String regex = ".?" + word+ ".?";
Pattern p = Pattern.compile(regex);
Matcher m = p.matcher(str);
String newStr = "";
while(m.find())
newStr += m.group().replace(word, "");
return newStr;
The problem is that when there are multiple instances of word in a row, the program misses the character preceding the word because m.find() progresses beyond it.
For example: wordEnds("abc1xyz1i1j", "1") should return "cxziij", but my method returns "cxzij", not repeating the "i"
I would appreciate a non-messy solution with an explanation I can apply to other general regex problems.
This is a one-liner solution:
String wordEnds = input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
This matches your edge case as a look ahead within a non-capturing group, then matches the usual (consuming) case.
Note that your requirements don't require iteration, only your question title assumes it's necessary, which it isn't.
Note also that to be absolutely safe, you should escape all characters in word in case any of them are special "regex" characters, so if you can't guarantee that, you need to use Pattern.quote(word) instead of word.
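To illustrate why quoting matters, here is a minimal sketch comparing a raw word with a quoted one. The word "a.c" and the class name QuoteDemo are made-up examples, not taken from the question.

```java
import java.util.regex.Pattern;

class QuoteDemo {
    // Treats word as a regex: '.' means "any character".
    public static boolean matchesRaw(String word, String input) {
        return Pattern.matches(word, input);
    }

    // Treats word as literal text via Pattern.quote.
    public static boolean matchesQuoted(String word, String input) {
        return Pattern.matches(Pattern.quote(word), input);
    }

    public static void main(String[] args) {
        System.out.println(matchesRaw("a.c", "abc"));    // true: '.' matched 'b'
        System.out.println(matchesQuoted("a.c", "abc")); // false: literal dot required
        System.out.println(matchesQuoted("a.c", "a.c")); // true
    }
}
```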
Here's a test of the usual case and the edge case, showing it works:
public static String wordEnds(String input, String word) {
word = Pattern.quote(word); // add this line to be 100% safe
return input.replaceAll(".*?(.)" + word + "(?:(?=(.)" + word + ")|(.).*?(?=$|." + word + "))", "$1$2$3");
}
public static void main(String[] args) {
System.out.println(wordEnds("abcXY123XYijk", "XY"));
System.out.println(wordEnds("abc1xyz1i1j", "1"));
}
Output:
c13i
cxziij
Use a positive lookbehind and a positive lookahead, which are zero-width assertions:
(?<=(.)|^)1(?=(.)|$)
(?<=(.)|^) looks for a character just before 1 and captures it in group 1. This is a zero-width assertion that doesn't move forward to match; it is just a test, which allows us to capture the value.
1 matches 1; you can replace it with any word.
(?=(.)|$) looks for a character after 1 and captures it in group 2.
Groups 1 and 2 ($1 and $2) contain your values. Keep finding until the end of the string.
So this should be like
String s1 = "abcXY123XYiXYjk";
String s2 = java.util.regex.Pattern.quote("XY");
String s3 = "";
String r = "(?<=(.)|^)"+s2+"(?=(.)|$)";
Pattern p = Pattern.compile(r);
Matcher m = p.matcher(s1);
while(m.find()) s3 += m.group(1)+m.group(2);
//s3 now contains c13iij
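When the word sits at the very start or end of the string, group 1 or group 2 is null, and plain string concatenation would append the text "null". Here is a hedged sketch of the same lookaround approach wrapped in a method with null checks; the class name WordEndsLookaround is made up, and the inputs are the examples from the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordEndsLookaround {
    public static String wordEnds(String str, String word) {
        // Lookbehind captures the char before the word (group 1),
        // lookahead captures the char after it (group 2); neither consumes input.
        Pattern p = Pattern.compile("(?<=(.)|^)" + Pattern.quote(word) + "(?=(.)|$)");
        Matcher m = p.matcher(str);
        StringBuilder result = new StringBuilder();
        while (m.find()) {
            // A group is null when the word touches the start or end of the string.
            if (m.group(1) != null) result.append(m.group(1));
            if (m.group(2) != null) result.append(m.group(2));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(wordEnds("abcXY123XYijk", "XY")); // c13i
        System.out.println(wordEnds("XY1XY", "XY"));         // 11
        System.out.println(wordEnds("abc1xyz1i1j", "1"));    // cxziij
    }
}
```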
Use regex as follows:
Matcher m = Pattern.compile("(.|)" + Pattern.quote(b) + "(?=(.?))").matcher(a);
for (int i = 1; m.find(); c += m.group(1) + m.group(2), i++);
I have a sort of a problem with this code:
String[] paragraph;
if(paragraph[searchKeyword_counter].matches("(.*)(\\b)"+"is"+"(\\b)(.*)")){
If I am not mistaken, to use .matches() to search for particular characters in a string I need .*, but what I want is to find the keyword without matching it inside another word.
For example, if is is the keyword I am searching for, I do not want it to match words that contain is, like ship, his, or this. So I used \b for a word boundary, but the code above is not working for me.
Example:
String[] Content= {"is,","his","fish","ish","its","is"};
String keyword = "is";
for(int i=0;i<Content.length;i++){
if(content[i].matches("(.*)(\\b)"+keyword+"(\\b)(.*)")){
System.out.println("There are "+i+" is.");
}
}
What I want is for the keyword to match only is and is, (with the comma), but not his or fish. In other words, is should match even when it is next to a non-alphanumeric character or a space.
What is the problem with the code above?
Also, what if one of the elements contains uppercase characters, for example IS, and it is compared with is? It will not match. Correct me if I am wrong. How can I match lowercase against uppercase characters without changing the source content?
String string = "...";
String word = "is";
Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b");
Matcher m = p.matcher(string);
if (m.find()) {
...
}
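For the uppercase part of the question, one option is to compile the same pattern with Pattern.CASE_INSENSITIVE, which leaves the source strings untouched. This is a minimal sketch; the class name WholeWordDemo and the method containsWord are illustrative, and the array is the one from the question with an extra "IS" element added for the demonstration.

```java
import java.util.regex.Pattern;

class WholeWordDemo {
    // True if word occurs in text as a whole word, ignoring case.
    public static boolean containsWord(String text, String word) {
        Pattern p = Pattern.compile("\\b" + Pattern.quote(word) + "\\b",
                Pattern.CASE_INSENSITIVE);
        // find() looks for the pattern anywhere, so no leading/trailing .* is needed.
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        String[] content = {"is,", "his", "fish", "ish", "its", "is", "IS"};
        for (String element : content) {
            System.out.println(element + " -> " + containsWord(element, "is"));
        }
    }
}
```

With this sketch, "is,", "is" and "IS" report true and the other elements report false.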
Just add spaces, like this:
Suppose message is your content string and pattern is your keyword:
if ((message).matches(".* " + pattern + " .*")||(message).matches("^" + pattern + " .*")
||(message).matches(".* " + pattern + "$")) {
I am trying to use a simple split to break up the following string: 00-00000
My expression is: ^([0-9][0-9])(-)([0-9])([0-9])([0-9])([0-9])([0-9])
And my usage is:
String s = "00-00000";
String pattern = "^([0-9][0-9])(-)([0-9])([0-9])([0-9])([0-9])([0-9])";
String[] parts = s.split(pattern);
If I play around with the Pattern and Matcher classes I can see that my pattern does match and the matcher tells me my groupCount is 7 which is correct. But when I try and split them I have no luck.
String.split does not use capturing groups as its result. It finds whatever matches and uses that as the delimiter, so the resulting String[] contains the substrings in between what the regex matches. As it is, the regex matches the whole string, and with the whole string as the delimiter there is nothing else left, so it returns an empty array.
If you want to use regex capturing groups, you will have to use Matcher.group(); String.split() will not do.
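To make the delimiter behaviour concrete, here is a small sketch using the pattern and input from the question. The class name SplitDelimiterDemo is made up; the second call passes a negative limit to split, which keeps the empty strings that the default call discards.

```java
import java.util.Arrays;

class SplitDelimiterDemo {
    // Pattern from the question; it matches the whole input "00-00000".
    static final String PATTERN = "^([0-9][0-9])(-)([0-9])([0-9])([0-9])([0-9])([0-9])";

    // Default split: trailing empty strings are removed.
    public static String[] splitDefault(String s) {
        return s.split(PATTERN);
    }

    // Negative limit: all pieces around the delimiter are kept.
    public static String[] splitKeepEmpty(String s) {
        return s.split(PATTERN, -1);
    }

    public static void main(String[] args) {
        System.out.println(splitDefault("00-00000").length);                  // 0
        System.out.println(Arrays.toString(splitKeepEmpty("00-00000")));      // [, ]
    }
}
```

The two empty strings in the second result are what lies before and after the delimiter, which here is the entire input.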
For your example, you could simply do this:
String s = "00-00000";
String pattern = "-";
String[] parts = s.split(pattern);
I cannot be sure, but I think what you are trying to do is to get each matched group into an array.
Matcher matcher = Pattern.compile(pattern).matcher(s);
if (matcher.matches()) {
    String[] groups = new String[matcher.groupCount()];
    for (int i = 0; i < matcher.groupCount(); i++) {
        // Group 0 is the whole match, so capturing groups start at 1.
        groups[i] = matcher.group(i + 1);
    }
}
From the documentation:
String[] split(String regex) -- Returns: the array of strings computed by splitting this string around matches of the given regular expression
Essentially the regular expression is used to define delimiters in the input string. You can use capturing groups and backreferences in your pattern (e.g. for lookarounds), but ultimately what matters is what and where the pattern matches, because that defines what goes into the returned array.
If you want to split your original string into 7 parts using regular expression, then you can do something like this:
String s = "12-3456";
String[] parts = s.split("(?!^)");
System.out.println(parts.length); // prints "7"
for (String part : parts) {
System.out.print("[" + part + "] ");
} // prints "[1] [2] [-] [3] [4] [5] [6] "
This splits on the zero-length assertion (?!^), which matches anywhere except before the first character of the string. This prevents the empty string from being the first element of the array, and the trailing empty string is already discarded because we use the default limit parameter of split.
Using a regular expression to get the individual characters of a string like this is overkill, though. If you have only a few characters, the most concise option is to use foreach on toCharArray():
for (char ch : "12-3456".toCharArray()) {
System.out.print("[" + ch + "] ");
}
This is not the most efficient option if you have a longer string.
Splitting on -
This may also be what you're looking for:
String s = "12-3456";
String[] parts = s.split("-");
System.out.println(parts.length); // prints "2"
for (String part : parts) {
System.out.print("[" + part + "] ");
} // prints "[12] [3456] "