Extract strings from a pattern in java using Matcher and pattern - java

If I have a String like this "Error. LineNumber = 2, originalLine = 'ABC', lineErrors = [Special chars found]", I would like to extract
the line number as '2',
originalLine as 'ABC' and
error as 'Special chars found'
I am very new to regex, so any pointers would be very helpful. I browsed through a few past questions but did not find what I wanted.

You can use capturing groups to capture the values. This is the sample code in Java. This works for the specified string but you can tweak and change it accordingly.
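import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Main {
public static void main(String[] args) {
String s = "Error. LineNumber = 2, originalLine = 'ABC', lineErrors = [Special chars found]";
// Each parenthesised group captures one value; [\\S ]+ matches any run of non-whitespace characters and spaces.
String patternStr = "Error. LineNumber = ([\\S ]+), originalLine = ([\\S ]+), lineErrors = ([\\S ]+)";
Pattern p = Pattern.compile(patternStr);
Matcher m = p.matcher(s);
if (m.find()) {
int count = m.groupCount();
System.out.println("group count is " + count);
// Groups are numbered from 1; group 0 is the whole match.
// Prints 2, 'ABC' and [Special chars found]; the quotes and brackets are part of the captured text.
for (int i = 1; i <= count; i++) {
System.out.println(m.group(i));
}
}
}
}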
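If the quotes and brackets should be left out of the captured values, the delimiters can be matched outside the groups. The following is a minimal sketch, assuming the message always has the shape shown in the question; the class name ErrorLineParser and the method parse are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ErrorLineParser {
    // The quotes and square brackets sit outside the capturing groups,
    // so only the values themselves are captured.
    private static final Pattern ERROR_LINE = Pattern.compile(
            "LineNumber = (\\d+), originalLine = '([^']*)', lineErrors = \\[([^\\]]*)\\]");

    // Returns {lineNumber, originalLine, lineErrors}, or null when the input does not match.
    static String[] parse(String input) {
        Matcher m = ERROR_LINE.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String s = "Error. LineNumber = 2, originalLine = 'ABC', lineErrors = [Special chars found]";
        String[] parts = parse(s);
        System.out.println(parts[0]); // 2
        System.out.println(parts[1]); // ABC
        System.out.println(parts[2]); // Special chars found
    }
}
```

Returning null for non-matching input is only one option; throwing an exception would work just as well if a malformed line should be treated as an error.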

Related

Count & Split by regex pattern in java

I have a string in the format below.
-52/ABC/35/BY/200/L/DEF/307/C/110/L
I need to perform the following.
1. Find the number of occurrences of 3-letter words like ABC and DEF in the above text.
2. Split the above string by ABC and DEF as shown below.
ABC/35/BY/200/L
DEF/307/C/110/L
I have tried using regex with the code below, but the match count is always zero. How can I approach this easily?
static String DEST_STRING = "^[A-Z]{3}$";
static Pattern DEST_PATTERN = Pattern.compile(DEST_STRING,
Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
public static void main(String[] args) {
String test = "-52/ABC/35/BY/200/L/DEF/307/C/110/L";
Matcher destMatcher = DEST_PATTERN.matcher(test);
int destCount = 0;
while (destMatcher.find()) {
destCount++;
}
System.out.println(destCount);
}
Please note that I need to use JDK 6 for this.
You can use this code:
public static void main(String[] args) throws Exception {
String s = "-52/ABC/35/BY/200/L/DEF/307/C/110/L";
// Pattern to find all 3-letter words. \\b means "word boundary", which ensures that each word is exactly 3 letters long.
Pattern p = Pattern.compile("(\\b[a-zA-Z]{3}\\b)");
Matcher m = p.matcher(s);
// Explicit type arguments, because the diamond operator (<>) requires Java 7 and the question targets JDK 6.
Map<String, Integer> countMap = new HashMap<String, Integer>();
// Count how many times each 3-letter word is used.
// Find each 3-letter word.
while (m.find()) {
// Get the 3 letter word.
String val = m.group();
// If the word is present in the map, get old count and add 1, else add new entry in map and set count to 1
if (countMap.containsKey(val)) {
countMap.put(val, countMap.get(val) + 1);
} else {
countMap.put(val, 1);
}
}
System.out.println(countMap);
// Get ABC.. and DEF.. using positive lookahead for a 3 letter word or end of String
// Finds and selects everything starting from a 3 letter word until another 3 letter word is found or until string end is found.
p = Pattern.compile("(\\b[a-zA-Z]{3}\\b.*?)(?=/[A-Za-z]{3}|$)");
m = p.matcher(s);
while (m.find()) {
String val = m.group();
System.out.println(val);
}
}
Output:
{ABC=1, DEF=1}
ABC/35/BY/200/L
DEF/307/C/110/L
Check this one:
String stringToSearch = "-52/ABC/35/BY/200/L/DEF/307/C/110/L";
Pattern p1 = Pattern.compile("\\b[a-zA-Z]{3}\\b");
Matcher m = p1.matcher(stringToSearch);
int startIndex = -1;
while (m.find())
{
// Uses StringUtils from Apache Commons Lang (org.apache.commons.lang3.StringUtils) to count the occurrences
int count = StringUtils.countMatches(stringToSearch, m.group());
System.out.println(m.group() + ":" + count);
if(startIndex != -1){
System.out.println(stringToSearch.substring(startIndex,m.start()-1));
}
startIndex = m.start();
}
if(startIndex != -1){
System.out.println(stringToSearch.substring(startIndex));
}
Output:
ABC:1
DEF:1
ABC/35/BY/200/L
DEF/307/C/110/L
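The pattern in the question, ^[A-Z]{3}$, finds nothing because ^ and $ anchor the match to the start and end of the whole input, so it can only match when the entire string is exactly three letters. Replacing the anchors with word boundaries lets find() locate each three-letter token. Here is a minimal sketch that sticks to language features available in JDK 6 and needs no extra library; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeLetterTokens {
    // Counts how many times the regex is found in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Collects each segment that starts at a 3-letter word and runs lazily
    // up to the next "/" + 3-letter word, or to the end of the input.
    static List<String> segments(String input) {
        List<String> result = new ArrayList<String>();
        Pattern p = Pattern.compile("\\b[A-Za-z]{3}\\b.*?(?=/[A-Za-z]{3}\\b|$)");
        Matcher m = p.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String test = "-52/ABC/35/BY/200/L/DEF/307/C/110/L";
        System.out.println(countMatches("^[A-Z]{3}$", test));     // 0: anchored to the whole input
        System.out.println(countMatches("\\b[A-Z]{3}\\b", test)); // 2: ABC and DEF
        System.out.println(segments(test)); // [ABC/35/BY/200/L, DEF/307/C/110/L]
    }
}
```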

Getting a string from a string that starts with "abc" and ends with "def"

I am using the StringUtils library (import org.apache.commons.lang3.StringUtils;) to split a string like:
String str = "ZXCVFMS2ZZ1012ZZ1012ZZ1000ZZ0923ZZ0990ZZ0990ZZ0990ZZ1020DEFZXCVFMS3ZZ1012ZZ1012ZZ1000ZZ0923ZZ0990ZZ0990ZZ0990ZZ1020DEFZXCVFMERRORDEF";
I need to extract the substrings that start with zxcv* and end with *def, like this:
String tmp1 = "ZXCVFMS2ZZ1012ZZ1012ZZ1000ZZ0923ZZ0990ZZ0990ZZ0990ZZ1020DEF";
String tmp2 = "ZXCVFMS3ZZ1012ZZ1012ZZ1000ZZ0923ZZ0990ZZ0990ZZ0990ZZ1020DEF";
any help?
Solution thanks to #assylias :
Pattern p = Pattern.compile("ZXCV.*?DEF");
Matcher m = p.matcher(str);
List<String> result = new ArrayList<> ();
while (m.find()) {
result.add(m.group());
}
How about using replaceAll?
String tmp = str.replaceAll(".*(zxcv.*def).*", "$1"); //zxcvVariableCanChancedef
UPDATE following your edit
If you have a repeating pattern, you could use a Matcher. To avoid matching the whole string, put ? after the quantifier to make the match lazy.
Pattern p = Pattern.compile("zxcv.*?def");
String input = "15684zxcvVariableCanChancedefABCDEND15684zxcvVariableCanChancedefABCDEND";
Matcher m = p.matcher(input);
List<String> result = new ArrayList<> ();
while (m.find()) {
result.add(m.group());
}
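To see the difference the lazy quantifier makes, here is a small sketch that runs a greedy and a lazy version of the pattern over the same input. The input string and the names GreedyVsLazy and findAll are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Collects every match of the regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> result = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "15684zxcvAAAdefABCDEND15684zxcvBBBdefABCDEND";
        // Greedy .* runs on to the last "def", so both occurrences collapse into one match.
        System.out.println(findAll("zxcv.*def", input));
        // [zxcvAAAdefABCDEND15684zxcvBBBdef]
        // Lazy .*? stops at the first "def" after each "zxcv".
        System.out.println(findAll("zxcv.*?def", input));
        // [zxcvAAAdef, zxcvBBBdef]
    }
}
```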
This can be done without any additional libraries using core java.util.regex functionality. For example:
String str = "15684zxcvVariableCanChancedefABCDEND";
Pattern pattern = Pattern.compile(".*(zxcv.*def).*");
Matcher matcher = pattern.matcher(str);
if (matcher.matches()) {
System.out.println(matcher.group(1)); // ==> zxcvVariableCanChancedef
}
String line = "15684zxcvAAAAAAAncedefABCDEND15684zxcvBBBBBBBBBBdefABCDEND";
Last occurrence :
Matcher matcher = Pattern.compile(".*(zxcv.*def).*").matcher(line);
String tmp = matcher.find() ? matcher.group(1) : null;
System.out.println(tmp);
First occurrence :
Matcher matcher = Pattern.compile(".*?(zxcv.*?def).*").matcher(line);
Biggest occurrence (from first zxcv to last def) :
Matcher matcher = Pattern.compile(".*?(zxcv.*def).*").matcher(line);
All occurrences :
Matcher matcher = Pattern.compile(".*?(zxcv.*?def)").matcher(line);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
I am not sure about this because I wrote it in a text editor; I don't have a Java IDE on this computer. I hope it helps.
public String XXX(String tmp)
{
int firstStorage = 0;
int secondStorage = 0;
// Find the index where the first "zxcv" starts.
for (int i = 0; i + 4 <= tmp.length(); i++)
{
if (tmp.substring(i, i + 4).equals("zxcv"))
{
firstStorage = i;
break;
}
}
// From there, find the first "def" and remember the index of its last character.
for (int i = firstStorage; i + 3 <= tmp.length(); i++)
{
if (tmp.substring(i, i + 3).equals("def"))
{
secondStorage = i + 2;
break;
}
}
// Note: no check is made that both markers were actually found.
return tmp.substring(firstStorage, secondStorage + 1);
}
Let me know if it is working or not. Have a nice day !!
String str = "15684zxcvVariableCanChancedefABCDEND15684zxcvVariableCanChancedefABCDEND";
List<string> strList = new List<string>();
while (str.IndexOf("zxc") >= 0 && str.IndexOf("def") >= 0)
{
var startIndex = str.IndexOf("zxc");
var stopIndex = str.IndexOf("def");
var item = str.Substring(startIndex, stopIndex - startIndex + 3);
strList.Add(item);
str = str.Substring(0, startIndex) + str.Substring(stopIndex+3);
}

Regex pattern matcher

I have a string :
154545K->12345K(524288K)
Suppose I want to extract numbers from this string.
The string contains the group 154545 at position 0, 12345 at position 1 and 524288 at position 2.
Using regex \\d+, I need to extract 12345 which is at position 1.
I am getting the desired result using this :
String lString = "154545K->12345K(524288K)";
Pattern lPattern = Pattern.compile("\\d+");
Matcher lMatcher = lPattern.matcher(lString);
String lOutput = "";
int lPosition = 1;
int lGroupCount = 0;
while(lMatcher.find()) {
if(lGroupCount == lPosition) {
lOutput = lMatcher.group();
break;
}
else {
lGroupCount++;
}
}
System.out.println(lOutput);
But is there any other simple and direct way to achieve this while keeping the same regex \\d+ (without using the group counter)?
Try this:
// $2 refers to the second captured number, which is the one at position 1 (12345)
String d1 = "154545K->12345K(524288K)".replaceAll("(\\d+)\\D+(\\d+).*", "$2");
If you expect your number to start at character index 1 of the string, you can use the find(int start) method like this:
if (lMatcher.find(1) && lMatcher.start() == 1) {
// Found lMatcher.group()
}
You can also convert your loop into a for loop to get rid of some boilerplate code:
String lString = "154540K->12341K(524288K)";
Pattern lPattern = Pattern.compile("\\d+");
Matcher lMatcher = lPattern.matcher(lString);
int lPosition = 2;
for (int i = 0; i < lPosition && lMatcher.find(); i++) {}
if (!lMatcher.hitEnd()) {
System.out.println(lMatcher.group());
}
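Another way to drop the counter variable from the calling code is to wrap the loop in a small helper. This is only a sketch: the class name NthMatch and the method nthMatch are hypothetical, the index is zero-based as in the question, and the helper checks the result of find() directly instead of relying on hitEnd(), which can also be true after a successful match that reaches the end of the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NthMatch {
    // Returns the match at the given zero-based position,
    // or null when the input contains fewer matches than that.
    static String nthMatch(Pattern pattern, String input, int index) {
        Matcher m = pattern.matcher(input);
        for (int i = 0; m.find(); i++) {
            if (i == index) {
                return m.group();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(nthMatch(digits, "154545K->12345K(524288K)", 1)); // 12345
    }
}
```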

Splitting strings by {} & []

I'm sort of new to Java.
I would like to know if there's an easier yet efficient way to implement the following string splitting. I've tried with Pattern and Matcher, but the result doesn't come out the way I want.
"{1,24,5,[8,5,9],7,[0,1]}"
to be split into:
1
24
5
[8,5,9]
7
[0,1]
This is a completely wrong code but I'm posting it anyway:
String str = "{1,24,5,[8,5,9],7,[0,1]}";
str= str.replaceAll("\\{", "");
str= str.replaceAll("}", "");
Pattern pattern = Pattern.compile("\\[(.*?)\\]");
Matcher matcher = pattern.matcher(str);
String[] test = new String[10];
// String[] _test = new String[10];
int i = 0;
String[] split = str.split(",");
while (matcher.find()) {
test[i] = matcher.group(0);
String[] split1 = matcher.group(0).split(",");
// System.out.println(split1[i]);
for (int j = 0; j < split.length; j++) {
if(!split[j].equals(test[j])&&((!split[j].contains("\\["))||!split[j].contains("\\]"))){
System.out.println(split[j]);
}
}
i++;
}
}
Given a String in a format like {a,b,[c,d,e],...}, I want to list all the contents, but the ones in square brackets should be treated as one element (like an array).
This works:
public static void main(String[] args)
{
customSplit("{1,24,5,[8,5,9],7,[0,1]}");
}
static void customSplit(String str){
Pattern pattern = Pattern.compile("[0-9]+|\\[.*?\\]");
Matcher matcher = pattern.matcher(str);
while (matcher.find()) {
System.out.println(matcher.group());
}
}
Yields the output
1
24
5
[8,5,9]
7
[0,1]
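If the pieces are needed as values rather than printed, the same idea can collect them into a list. The sketch below assumes the elements are non-negative integers and that brackets are never nested; the names BracketAwareSplit and tokens are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketAwareSplit {
    // Either a whole bracketed group such as [8,5,9], or a bare run of digits.
    // A match that starts at '[' consumes the whole group, so the numbers
    // inside the brackets are never reported on their own.
    private static final Pattern TOKEN = Pattern.compile("\\[[^\\]]*\\]|\\d+");

    static List<String> tokens(String input) {
        List<String> result = new ArrayList<String>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("{1,24,5,[8,5,9],7,[0,1]}"));
        // [1, 24, 5, [8,5,9], 7, [0,1]]
    }
}
```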

Java: regular expressions to parse the "edges" of a string

Java novice here.
Say I'm given a string:
===This 銳is a= stri = ng身===
How would I use pattern-matching to efficiently figure out how many "=" signs there are at the edges of "This 銳is a= stri = ng身"?
Also, I'm trying to use Java escape sequences such as \G, but apparently they don't compile.
I personally probably wouldn't use a regex for this, but ... this is what works:
Matcher m = Pattern.compile("^(=+).+[^=](=+)$").matcher("===Som=e=Text====");
m.find();
int count = m.group(1).length() + m.group(2).length();
System.out.println(count);
(Note that this does no error checking and assumes there are = on both ends.)
Edit to add: here's one that works regardless of whether there's an = on either end:
public static int equalsCount(String source)
{
int count = 0;
Matcher m = Pattern.compile("^(=+)?.+[^=](=+)?$").matcher(source);
if (m.find())
{
count += m.group(1) == null ? 0 : m.group(1).length();
count += m.group(2) == null ? 0 : m.group(2).length();
}
return count;
}
public static void main(String[] args)
{
System.out.println(equalsCount("===Some=tex=t="));
System.out.println(equalsCount("===Some=tex=t"));
System.out.println(equalsCount("Some=tex=t="));
System.out.println(equalsCount("Some=tex=t"));
}
On the other hand ... you could avoid the regex and do:
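String myString = "==blah=";
int count = 0;
int i = 0;
// Count the leading '=' characters, stopping at the end of the string.
while (i < myString.length() && myString.charAt(i) == '=')
{
count++;
i++;
}
// Count the trailing '=' characters, stopping before the leading run so nothing is counted twice.
int j = myString.length() - 1;
while (j >= i && myString.charAt(j) == '=')
{
count++;
j--;
}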
If you want to count the number of occurrences of "=" at the edges, then try this:
// Strip the leading and trailing runs of '=' and measure how much shorter the string becomes.
int count = str.length() - str.replaceAll("^=+|=+$", "").length();
This can be one probable answer:
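public static void main(String[] args) {
int count = 0;
String str = "===This is a= stri = ng===";
Pattern edgeEq = Pattern.compile("=");
// Runs of '=' with a non-'=' character on both sides, i.e. runs inside the text.
// Lookarounds keep the neighbouring characters out of the match, so adjacent runs such as "a=b=c" are all found.
Pattern wordEq = Pattern.compile("(?<=[^=])=+(?=[^=])");
// Count every '=' in the string...
Matcher edgeMatch = edgeEq.matcher(str);
while (edgeMatch.find()) {
count++;
}
// ...then subtract the length of every inner run.
Matcher wordMatch = wordEq.matcher(str);
while (wordMatch.find()) {
count -= wordMatch.group().length();
}
System.out.println(count);
}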
This will help you find the number of = on the edges of the string.
Assuming there are always the same number of = at the start as at the end:
import java.util.regex.*;
Matcher m = Pattern.compile("^=*").matcher(s);
int count = m.find()? m.group(0).length(): 0;
Use the following code
String s1 = "===This 銳is a= stri = ng身===";
System.out.println("Length : "+s1.length());
Pattern p = Pattern.compile("^=+");
Matcher m = p.matcher(s1);
int count = 0;
while (m.find())
{
count = m.group().length();
System.out.println("Group : "+m.group());
}
p = Pattern.compile("(=+)$");
m = p.matcher(s1);
while (m.find())
{
count += m.group().length();
System.out.println("End Group : "+m.group());
}
System.out.println("Total : " + count);
If = at the edges are balanced you can use
^(=+).*\1$
Group1's length is the length of = at the edges
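In Java source, that pattern has to be written with a doubled backslash ("\\1"), because a single backslash followed by a character such as 1 or G is not a valid escape in a string literal; this is likely also why \G did not compile in the question. A minimal sketch of the balanced-edges idea follows; the names BalancedEdges and edgeRunLength are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BalancedEdges {
    // The regex ^(=+).*\1$ as a Java string literal: \1 becomes "\\1".
    private static final Pattern BALANCED = Pattern.compile("^(=+).*\\1$");

    // Returns the length of the run of '=' found at both ends of the string,
    // or 0 when the string does not start and end with '='.
    static int edgeRunLength(String s) {
        Matcher m = BALANCED.matcher(s);
        return m.find() ? m.group(1).length() : 0;
    }

    public static void main(String[] args) {
        System.out.println(edgeRunLength("===This is a= stri = ng===")); // 3
        System.out.println(edgeRunLength("no edges"));                   // 0
    }
}
```

When the two ends are not balanced, backtracking makes group 1 shrink to the longest run that fits at both ends, so "==abc===" yields 2 rather than failing.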
