Regex pattern matcher - java

I have a string :
154545K->12345K(524288K)
Suppose I want to extract numbers from this string.
The string contains the group 154545 at position 0, 12345 at position 1 and 524288 at position 2.
Using regex \\d+, I need to extract 12345 which is at position 1.
I am getting the desired result using this :
String lString = "154545K->12345K(524288K)";
Pattern lPattern = Pattern.compile("\\d+");
Matcher lMatcher = lPattern.matcher(lString);
String lOutput = "";
int lPosition = 1;
int lGroupCount = 0;
while(lMatcher.find()) {
if(lGroupCount == lPosition) {
lOutput = lMatcher.group();
break;
}
else {
lGroupCount++;
}
}
System.out.println(lOutput);
But is there a simpler, more direct way to achieve this while keeping the same regex \\d+ (without using the group counter)?

Try this (the second capture group, $2, holds the number at position 1):
String d1 = "154545K->12345K(524288K)".replaceAll("(\\d+)\\D+(\\d+).*", "$2");

If you expect the number to start at character index 1, you can use the find(int start) method like this:
if (lMatcher.find(1) && lMatcher.start() == 1) {
// Found lMatcher.group()
}
You can also convert your loop into a for loop to get rid of some boilerplate code:
String lString = "154540K->12341K(524288K)";
Pattern lPattern = Pattern.compile("\\d+");
Matcher lMatcher = lPattern.matcher(lString);
int lPosition = 2; // 1-based: the second match
int i = 0;
for (; i < lPosition && lMatcher.find(); i++) {}
// group() is only valid if every find() call succeeded
if (i == lPosition) {
System.out.println(lMatcher.group());
}
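On Java 9 or later, the counter can be avoided entirely by streaming the matches. The following is a minimal sketch, not part of the original answers; the class name NthMatch and the helper nthMatch are made up for illustration, and the index is zero-based as in the question.

```java
import java.util.Optional;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class NthMatch {
    // Hypothetical helper: returns the match at the given zero-based index, if there is one.
    static Optional<String> nthMatch(String input, String regex, int index) {
        return Pattern.compile(regex)
                .matcher(input)
                .results()          // Stream<MatchResult>, available since Java 9
                .skip(index)        // drop the matches before the wanted one
                .findFirst()
                .map(MatchResult::group);
    }

    public static void main(String[] args) {
        String lString = "154545K->12345K(524288K)";
        System.out.println(nthMatch(lString, "\\d+", 1).orElse("")); // prints 12345
    }
}
```

Returning an Optional makes the "fewer matches than expected" case explicit instead of relying on an empty string.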

Related

How to find the words starting after line breaks, using regex, Java?

I have an input string, consisting of several lines, e.g.:
When I was younger
I never needed
And I was always OK
but it was a long Time Ago
The problem is to invert the case of the first letter of every word that is 3 or more characters long. That is, the output must be the following:
when I Was Younger
I Never Needed
and I Was Always OK
But it Was a Long time ago
Here is my code:
import java.util.regex.*;
public class Part3_1 {
public static void main(String[] args) {
String str = "When I was younger\r\nI never needed\r\nAnd I was always OK\r\nbut it was a long Time Ago";
System.out.println(convert(str));
}
public static String convert(String str) {
String result = "";
String[] strings = str.split(" ");
String regexLowerCase = "\\b[a-z]{3,}\\b";
String regexLowerCaseInitial = "(\\r\\n)[a-z]{3,}\\b";
String regexUpperCase = "\\b([A-Z][a-z]{2,})+\\b";
String regexUpperCaseInitial = "(\\r\\n)([A-Z][a-z]{2,})\\b";
Pattern patternLowerCase = Pattern.compile(regexLowerCase, Pattern.MULTILINE);
Pattern patternUpperCase = Pattern.compile(regexUpperCase, Pattern.MULTILINE);
Pattern patternLowerCaseInitial = Pattern.compile(regexLowerCaseInitial, Pattern.MULTILINE);
Pattern patternUpperCaseInitial = Pattern.compile(regexUpperCaseInitial, Pattern.MULTILINE);
for (int i = 0; i < strings.length; i++) {
Matcher matcherLowerCase = patternLowerCase.matcher(strings[i]);
Matcher matcherUpperCase = patternUpperCase.matcher(strings[i]);
Matcher matcherLowerCaseInitial = patternLowerCaseInitial.matcher(strings[i]);
Matcher matcherUpperCaseInitial = patternUpperCaseInitial.matcher(strings[i]);
char[] words = strings[i].toCharArray();
if (matcherLowerCase.find() || matcherLowerCaseInitial.find()) {
char temp = Character.toUpperCase(words[0]);
words[0] = temp;
result += new String(words);
} else if (matcherUpperCase.find() || matcherUpperCaseInitial.find()) {
char temp = Character.toLowerCase(words[0]);
words[0] = temp;
result += new String(words);
} else {
result += new String(words);
}
if (i < strings.length - 1) {
result += " ";
}
}
return result;
}
}
Here:
"\\b[a-z]{3,}\\b" is a regular expression that selects all lowercase words of 3 or more characters,
"\\b([A-Z][a-z]{2,})+\\b" is a regular expression that selects all words of 3 or more characters that start with a capital letter.
Both regular expressions work properly, except when the input contains line breaks. The output of my program is the following:
when I Was Younger
I Never Needed
And I Was Always OK
but it Was a Long Time ago
As I understand it, these regular expressions cannot select the words And and but from needed\r\nAnd and OK\r\nbut respectively.
To fix this bug I tried adding the new regular expressions "(\\r\\n)[a-z]{3,}\\b" and "(\\r\\n)([A-Z][a-z]{2,})\\b", but they do not work.
How do I compose regular expressions that select words after line breaks?
One option would be to split the string on a word break (\b) instead, and then pass the white space through to the final string in the strings array. This removes the need to have separate regex for the different situations, and also the need to add back space characters. This will give you the results you want:
public static String convert(String str) {
String result = "";
String[] strings = str.split("\\b");
String regexLowerCase = "^[a-z]{3,}";
String regexUpperCase = "^[A-Z][a-z]{2,}+";
Pattern patternLowerCase = Pattern.compile(regexLowerCase, Pattern.MULTILINE);
Pattern patternUpperCase = Pattern.compile(regexUpperCase, Pattern.MULTILINE);
for (int i = 0; i < strings.length; i++) {
Matcher matcherLowerCase = patternLowerCase.matcher(strings[i]);
Matcher matcherUpperCase = patternUpperCase.matcher(strings[i]);
char[] words = strings[i].toCharArray();
if (matcherLowerCase.find()) {
char temp = Character.toUpperCase(words[0]);
words[0] = temp;
result += new String(words);
} else if (matcherUpperCase.find()) {
char temp = Character.toLowerCase(words[0]);
words[0] = temp;
result += new String(words);
} else {
result += new String(words);
}
}
return result;
}
Output:
when I Was Younger
I Never Needed
and I Was Always OK
But it Was a Long time ago
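As an alternative to splitting, the whole input can be processed with one pattern and a replacement callback. This is only a sketch under the assumption that Java 9+ is available (Matcher.replaceAll with a function); the class name ToggleInitials is made up, and the pattern is chosen to mirror the two regexes in the question (a first letter of either case followed by at least two lowercase letters).

```java
import java.util.regex.Pattern;

public class ToggleInitials {
    // One letter of either case followed by 2+ lowercase letters, as a whole word.
    private static final Pattern WORD = Pattern.compile("\\b[A-Za-z][a-z]{2,}\\b");

    static String convert(String str) {
        // The callback receives each match and returns its replacement text.
        return WORD.matcher(str).replaceAll(match -> {
            String word = match.group();
            char first = word.charAt(0);
            char toggled = Character.isUpperCase(first)
                    ? Character.toLowerCase(first)
                    : Character.toUpperCase(first);
            return toggled + word.substring(1);
        });
    }

    public static void main(String[] args) {
        String str = "When I was younger\r\nI never needed\r\nAnd I was always OK\r\nbut it was a long Time Ago";
        System.out.println(convert(str));
    }
}
```

Because \b also matches between a line break and a letter, words at the start of a line need no special handling here.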

Getting substrings that start with "abc" and end with "def"

I am using StringUtils (import org.apache.commons.lang3.StringUtils;) library to split string like:
String str = "ZXCVFMS2ZZ1012ZZ1012ZZ1000ZZ0923ZZ0990ZZ0990ZZ0990ZZ1020DEFZXCVFMS3ZZ1012ZZ1012ZZ1000ZZ0923ZZ0990ZZ0990ZZ0990ZZ1020DEFZXCVFMERRORDEF";
I need to extract every substring that starts with ZXCV and ends with DEF, for example:
String tmp1 = "ZXCVFMS2ZZ1012ZZ1012ZZ1000ZZ0923ZZ0990ZZ0990ZZ0990ZZ1020DEF";
String tmp2 = "ZXCVFMS3ZZ1012ZZ1012ZZ1000ZZ0923ZZ0990ZZ0990ZZ0990ZZ1020DEF";
any help?
Solution, thanks to @assylias:
Pattern p = Pattern.compile("ZXCV.*?DEF");
Matcher m = p.matcher(str);
List<String> result = new ArrayList<> ();
while (m.find()) {
result.add(m.group());
}
How about using replaceAll?
String tmp = str.replaceAll(".*(zxcv.*def).*", "$1"); //zxcvVariableCanChancedef
UPDATE following your edit
if you have a repeating pattern, you could use a Matcher - to avoid matching the whole string use the ? quantifier to make the match lazy.
Pattern p = Pattern.compile("zxcv.*?def");
String input = "15684zxcvVariableCanChancedefABCDEND15684zxcvVariableCanChancedefABCDEND";
Matcher m = p.matcher(input);
List<String> result = new ArrayList<> ();
while (m.find()) {
result.add(m.group());
}
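To make the difference between the greedy and the lazy quantifier concrete, here is a small sketch that runs both patterns over the same input. The class name LazyVsGreedy, the helper findAll, and the shortened input string are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LazyVsGreedy {
    // Hypothetical helper: collects every match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "1zxcvAdefX2zxcvBdefY";
        // Greedy: .* runs to the last "def", so both blocks end up in one match.
        System.out.println(findAll("zxcv.*def", input));  // [zxcvAdefX2zxcvBdef]
        // Lazy: .*? stops at the first "def" after each "zxcv".
        System.out.println(findAll("zxcv.*?def", input)); // [zxcvAdef, zxcvBdef]
    }
}
```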
This can be done without any additional libraries using core java.util.regex functionality. For example:
String str = "15684zxcvVariableCanChancedefABCDEND";
Pattern pattern = Pattern.compile(".*(zxcv.*def).*");
Matcher matcher = pattern.matcher(str);
if (matcher.matches()) {
System.out.println(matcher.group(1)); // ==> zxcvVariableCanChancedef
}
String line = "15684zxcvAAAAAAAncedefABCDEND15684zxcvBBBBBBBBBBdefABCDEND";
Last occurrence :
Matcher matcher = Pattern.compile(".*(zxcv.*def).*").matcher(line);
String tmp = matcher.find() ? matcher.group(1) : null;
System.out.println(tmp);
First occurrence :
Matcher matcher = Pattern.compile(".*?(zxcv.*?def).*").matcher(line);
Biggest occurrence (from first zxcv to last def) :
Matcher matcher = Pattern.compile(".*?(zxcv.*def).*").matcher(line);
All occurrences
Matcher matcher = Pattern.compile(".*?(zxcv.*?def)").matcher(line);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
I am not sure about this because I wrote it in a text editor; I don't have a Java IDE on this computer. I hope it helps.
public String XXX(String tmp)
{
int firstStorage = 0;
int secondStorage = 0;
// Find the index where the first "zxcv" starts.
for (int i = 0; i + 4 <= tmp.length(); i++)
{
if (tmp.substring(i, i + 4).equals("zxcv"))
{
firstStorage = i;
break;
}
}
// Find the index of the last character of the first "def" after it.
for (int i = firstStorage; i + 3 <= tmp.length(); i++)
{
if (tmp.substring(i, i + 3).equals("def"))
{
secondStorage = i + 2;
break;
}
}
return tmp.substring(firstStorage, secondStorage + 1);
}
Let me know if it is working or not. Have a nice day !!
String str = "15684zxcvVariableCanChancedefABCDEND15684zxcvVariableCanChancedefABCDEND";
List<string> strList = new List<string>();
while (str.IndexOf("zxc") >= 0 && str.IndexOf("def") >= 0)
{
var startIndex = str.IndexOf("zxc");
var stopIndex = str.IndexOf("def");
var item = str.Substring(startIndex, stopIndex - startIndex + 3);
strList.Add(item);
str = str.Substring(0, startIndex) + str.Substring(stopIndex+3);
}
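The snippet above uses C#-style calls (IndexOf, Substring). A rough Java sketch of the same index-based idea, using String.indexOf(String, int) to continue the search after each hit, could look like this. The class name IndexOfExtractor and the method extractAll are made up for illustration, and unlike the snippet above it searches for the full "zxcv" marker and leaves the input string untouched.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfExtractor {
    // Hypothetical helper: collects every substring running from a start marker
    // to the next end marker, scanning left to right.
    static List<String> extractAll(String str, String start, String end) {
        List<String> result = new ArrayList<>();
        int from = 0;
        while (true) {
            int startIndex = str.indexOf(start, from);
            if (startIndex < 0) {
                break; // no further start marker
            }
            int endIndex = str.indexOf(end, startIndex + start.length());
            if (endIndex < 0) {
                break; // start marker without a closing end marker
            }
            from = endIndex + end.length();
            result.add(str.substring(startIndex, from));
        }
        return result;
    }

    public static void main(String[] args) {
        String str = "15684zxcvVariableCanChancedefABCDEND15684zxcvVariableCanChancedefABCDEND";
        System.out.println(extractAll(str, "zxcv", "def"));
    }
}
```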

Finding the longest "number sequence" in a string using only a single regex

I want to find a single regex which matches the longest numerical string in a URL.
I.e for the URL: http://stackoverflow.com/1234/questions/123456789/ask, I would like it to return : 123456789
I thought I could use : ([\d]+)
However this returns the first match from the left, not the longest.
Any ideas :) ?
This regex will be used as an input to a strategy pattern, which extracts certain characteristics from urls:
public static String parse(String url, String regex) {
Pattern pattern = Pattern.compile(regex);
Matcher m = pattern.matcher(url);
if (m.find()) {
return m.group(1);
}
return null;
}
So it would be much tidier if I could use a single regex. :(
Don't use regex. Just iterate the characters:
String longest = "";
int i = 0;
while (i < str.length()) {
while (i < str.length() && !Character.isDigit(str.charAt(i))) {
++i;
}
int start = i;
while (i < str.length() && Character.isDigit(str.charAt(i))) {
++i;
}
if (i - start > longest.length()) {
longest = str.substring(start, i);
}
}
@Andy already gave a non-regex answer, which is probably faster, but if you want to use regex, you must, as @Jan points out, add logic, e.g.:
public String findLongestNumber(String input) {
String longestMatch = "";
int maxLength = 0;
Matcher m = Pattern.compile("([\\d]+)").matcher(input);
while (m.find()) {
String currentMatch = m.group();
int currentLength = currentMatch.length();
if (currentLength > maxLength) {
maxLength = currentLength;
longestMatch = currentMatch;
}
}
return longestMatch;
}
Not possible with pure Regex, however I would do it this way (using Stream Max and Regex) :
String url = "http://stackoverflow.com/1234/questions/123456789/ask";
Pattern biggest = Pattern.compile("/(\\d+)/");
Matcher m = biggest.matcher(url);
List<String> matches = new ArrayList<>();
while(m.find()){
matches.add(m.group(1));
}
System.out.println(matches.parallelStream().max((String a, String b) -> Integer.compare(a.length(), b.length())).get());
Will print : 123456789
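On Java 9 or later, the intermediate list can be skipped by streaming the matches directly. This is a sketch rather than a drop-in replacement for the code above: the class name LongestNumber and the method longestNumber are made up, and the pattern is plain \d+ without the surrounding slashes, so it also finds digit runs that are not enclosed in "/".

```java
import java.util.Comparator;
import java.util.Optional;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

public class LongestNumber {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: returns the longest run of digits, or empty if there is none.
    static Optional<String> longestNumber(String input) {
        return DIGITS.matcher(input)
                .results()                                   // Stream<MatchResult>, Java 9+
                .map(MatchResult::group)
                .max(Comparator.comparingInt(String::length));
    }

    public static void main(String[] args) {
        String url = "http://stackoverflow.com/1234/questions/123456789/ask";
        System.out.println(longestNumber(url).orElse("")); // prints 123456789
    }
}
```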

Extract an int from a large string

I'm trying to get an int from a String. The String will always come as:
"mombojumbomombojumbomombojumbomombojumbomombojumbomombojumbohello=1?fdjaslkd;fdsjaflkdjfdklsa;fjdklsa;djsfklsa;dfjklds;afj=124214fdsamf=352"
The only constant in all of this, is that I will have a "hello=" followed by a number. With just that, I can't figure out how to pull out the number after the "hello=". This is what I have tried so far with no luck.
EDIT: The number will always be followed by a "?"
String[] tokens = s.split("hello=");
for (String t : tokens)
System.out.println(t);
I can't figure out how to isolate it from both sides of the int.
Pattern p = Pattern.compile("hello=(\\d+)");
Matcher m = p.matcher (s);
while (m.find())
System.out.println(m.group(1));
This sets up a search for anywhere in s that contains hello= followed by one or more digits (\\d+ means one or more digits). The loop looks for each occurrence of this pattern, and then whenever it finds a match, m.group(1) extracts the digits (since those are grouped in the pattern).
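Since the question asks for an int and says the number is always followed by a "?", a possible variation is to include the "?" in the pattern and parse the captured digits. This is a sketch only; the class name HelloNumber and the method parseHello are made up for illustration.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloNumber {
    // "hello=", then one or more digits (captured), then a literal '?'.
    private static final Pattern HELLO = Pattern.compile("hello=(\\d+)\\?");

    // Hypothetical helper: returns the number after "hello=", or empty if absent.
    static OptionalInt parseHello(String s) {
        Matcher m = HELLO.matcher(s);
        if (!m.find()) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(m.group(1)));
        } catch (NumberFormatException e) {
            return OptionalInt.empty(); // digits found, but too large for an int
        }
    }

    public static void main(String[] args) {
        String s = "mombojumbohello=1?fdjaslkd;afj=124214fdsamf=352";
        System.out.println(parseHello(s).getAsInt()); // prints 1
    }
}
```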
You should use a regular expression for this:
String str = "mombojumbomombojumbomombojumbomombojumbomombojumbomombojumbohello=1fdjaslkd;fdsjaflkdjfdklsa;fjdklsa;djsfklsa;dfjklds;afj=124214fdsamf=352";
Pattern p = Pattern.compile("hello=(\\d+)");
Matcher m = p.matcher(str);
if (m.find()) {
System.out.println(m.group(1)); // prints 1
}
Try this:
String r = null;
int col = s.indexOf("hello="); // find the starting column of the marker string
if (col >= 0) {
String s2 = s.substring(col + 6); // get digits and the rest (add length of marker)
col = 0;
// now find the end of the digits (assume no plus or comma or dot chars)
while (col < s2.length() && Character.isDigit(s2.charAt(col))) {
col++;
}
if (col > 0) {
r = s2.substring(0, col); // get the digits off the front
}
}
r will be the string you want or it will be null if no number was found.
Here is another non-regex performance approach. Wrapped in a method for your convenience
Helper method
public static Integer getIntegerForKey(String key, String s)
{
int startIndex = s.indexOf(key);
if (startIndex == -1)
return null;
startIndex += key.length();
int endIndex = startIndex;
int len = s.length();
while(endIndex < len && Character.isDigit(s.charAt(endIndex))) {
++endIndex;
}
if (endIndex > startIndex)
return Integer.valueOf(s.substring(startIndex, endIndex));
return null;
}
Usage
Integer result = getIntegerForKey("hello=", yourInputString);
if (result != null)
System.out.println(result);
else
System.out.println("Key-integer pair not found.");
Yet another non regex solution:
String str = "mombojumbomombojumbomombojumbomombojumbomombojumbomombojumbohello=142?fdjaslkd;fdsjaflkdjfdklsa;fjdklsa;djsfklsa;dfjklds;afj=124214fdsamf=352";
char arr[] = str.substring(str.indexOf("hello=")+6).toCharArray();
String buff ="";
int i=0;
while(i < arr.length && Character.isDigit(arr[i])){
buff += arr[i++];
}
int result = Integer.parseInt(buff);
System.out.println(result);

Java: regular expressions to parse the "edges" of a string

Java novice here.
Say I'm given a string:
===This 銳is a= stri = ng身===
How would I use pattern-matching to efficiently figure out how many "=" signs there are at the edges of "This 銳is a= stri = ng身"?
Also, I'm trying to use Java escape sequences such as \G, but apparently they don't compile.
I personally probably wouldn't use a regex for this, but ... this is what works:
Matcher m = Pattern.compile("^(=+).+[^=](=+)$").matcher("===Som=e=Text====");
m.find();
int count = m.group(1).length() + m.group(2).length();
System.out.println(count);
(Note this isn't doing error checking and assume there are = on both ends)
Edit to Add: And here's one that works regardless if there's = on either end:
public static int equalsCount(String source)
{
int count = 0;
Matcher m = Pattern.compile("^(=+)?.+[^=](=+)?$").matcher(source);
if (m.find())
{
count += m.group(1) == null ? 0 : m.group(1).length();
count += m.group(2) == null ? 0 : m.group(2).length();
}
return count;
}
public static void main(String[] args)
{
System.out.println(equalsCount("===Some=tex=t="));
System.out.println(equalsCount("===Some=tex=t"));
System.out.println(equalsCount("Some=tex=t="));
System.out.println(equalsCount("Some=tex=t"));
}
On the other hand ... you could avoid the regex and do:
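String myString = "==blah=";
int count = 0;
int i = 0;
// Count leading '=' characters.
while (i < myString.length() && myString.charAt(i) == '=')
{
count++;
i++;
}
// Count trailing '=' characters, stopping before the leading run.
int j = myString.length() - 1;
while (j >= i && myString.charAt(j) == '=')
{
count++;
j--;
}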
If you want to count the number of occurrences of "=" at the edges, then try this:
int count = str.length() - str.replaceAll("^=+|=+$", "").length();
This can be one probable answer:
public static void main(String[] args) {
int count = 0;
String str = "===This is a= stri = ng===";
Pattern edgeEq = Pattern.compile("=");
Pattern wordEq = Pattern.compile("(?<=[^=])=+(?=[^=])");
Matcher edgeMatch = edgeEq.matcher(str);
while (edgeMatch.find()) {
count++;
}
Matcher wordMatch = wordEq.matcher(str);
while (wordMatch.find()) {
count -= wordMatch.group().length();
}
System.out.println(count);
}
This will help you find the number of = on the edges of the string.
Assuming there are always the same number of = at the start as at the end:
import java.util.regex.*;
Matcher m = Pattern.compile("^=*").matcher(s);
int count = m.find()? m.group(0).length(): 0;
Use the following code
String s1 = "===This 銳is a= stri = ng身===";
System.out.println("Length : "+s1.length());
Pattern p = Pattern.compile("^=+");
Matcher m = p.matcher(s1);
int count = 0;
while (m.find())
{
count = m.group().length();
System.out.println("Group : "+m.group());
}
p = Pattern.compile("(=+)$");
m = p.matcher(s1);
while (m.find())
{
count += m.group().length();
System.out.println("End Group : "+m.group());
}
System.out.println("Total : " + count);
If = at the edges are balanced you can use
^(=+).*\1$
Group1's length is the length of = at the edges
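The backreference pattern above can be wrapped in a few lines of Java. This is a sketch under the same assumption (balanced "=" on both edges); the class name BalancedEdges and the method edgeLength are made up, and Pattern.DOTALL is an addition so that "." also matches line breaks inside the string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BalancedEdges {
    // Group 1 captures the leading run of '='; \1 requires the same run at the end.
    private static final Pattern EDGES = Pattern.compile("^(=+).*\\1$", Pattern.DOTALL);

    // Hypothetical helper: length of the '=' run on each edge, or 0 if there is none.
    static int edgeLength(String s) {
        Matcher m = EDGES.matcher(s);
        return m.matches() ? m.group(1).length() : 0;
    }

    public static void main(String[] args) {
        String s = "===This 銳is a= stri = ng身===";
        System.out.println(edgeLength(s)); // prints 3
    }
}
```

The total number of "=" on both edges is then twice the returned value. If the edges are not balanced, the backreference can only match the shorter of the two runs.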
