Get substring in a string with multiple occurring string

I have a string like this:
(D#01)5(D#02)14100319530033M(D#03)1336009-A-A(D#04)141002A171(D#05)1(D#06)
I want to get the substring between (D#01) and the marker that follows it, which is 5 in (D#01)5(D#02).
If the string always looked like
(D#01)5(D#02)
I could get the value with
quantity = content.substring(content.indexOf("(D#01)") + 6, content.indexOf("(D#02)"));
But sometimes the marker after the value is different, for example (D#05) instead of (D#02). How can I use just the prefix (D# to find the end of the value, given that (D# occurs many times in the string?
Basically, this is what I want to do (nextOccurringIndexOf is a method I wish existed):
content.substring(content.indexOf("(D#01)") + 6, content.nextOccurringIndexOf("(D#"));

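I suppose you can do
int markerIndex = content.indexOf("(D#01)");
int fromIndex = markerIndex + 6; // skip past the 6-character marker
int toIndex = content.indexOf("(D#", fromIndex); // next occurrence after the marker
// Test markerIndex rather than fromIndex: because of the + 6, fromIndex is never -1
if (markerIndex != -1 && toIndex != -1)
str = content.substring(fromIndex, toIndex);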
Output
5
See http://ideone.com/RrUtBy demo.
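
The same idea can be wrapped in a small helper that checks for a missing marker before the offset is added. This is only a minimal sketch: the class and method names (MarkerSubstring, valueAfter) are made up for illustration, and returning null when a marker is absent is an assumption about the desired behaviour.

```java
final class MarkerSubstring {

    /**
     * Returns the text between startMarker and the next occurrence of
     * nextPrefix, or null if either one cannot be found.
     * Hypothetical helper, not part of the JDK.
     */
    static String valueAfter(String content, String startMarker, String nextPrefix) {
        int markerAt = content.indexOf(startMarker);
        if (markerAt == -1) {
            return null; // start marker is missing
        }
        int fromIndex = markerAt + startMarker.length();
        // Search for the next marker prefix, starting right after the start marker
        int toIndex = content.indexOf(nextPrefix, fromIndex);
        if (toIndex == -1) {
            return null; // nothing follows the start marker
        }
        return content.substring(fromIndex, toIndex);
    }

    public static void main(String[] args) {
        String content = "(D#01)5(D#02)14100319530033M(D#03)1336009-A-A(D#04)141002A171(D#05)1(D#06)";
        System.out.println(valueAfter(content, "(D#01)", "(D#")); // prints 5
        System.out.println(valueAfter(content, "(D#03)", "(D#")); // prints 1336009-A-A
    }
}
```

Using startMarker.length() instead of the literal 6 keeps the helper working if the marker width ever changes.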

Assuming that the marker and value are somehow linked and you want to know the value for each marker (for example, (D#01) == 5), you can make use of the Pattern/Matcher API, for example:
String text = "(D#01)5(D#02)14100319530033M(D#03)1336009-A-A(D#04)141002A171(D#05)1(D#06)";
Pattern p = Pattern.compile("\\(D#[0-9]+\\)");
Matcher m = p.matcher(text);
while (m.find()) {
String name = m.group();
if (m.end() < text.length()) {
String content = text.substring(m.end()); // everything after this marker
content = content.substring(0, content.indexOf("("));
System.out.println(name + " = " + content);
}
}
Which outputs
(D#01) = 5
(D#02) = 14100319530033M
(D#03) = 1336009-A-A
(D#04) = 141002A171
(D#05) = 1
Now, this is a little heavy-handed. I'd create some kind of "marker" object that contains the key (for example (D#01)) and its start and end indices. I'd then keep these objects in a List and cut out each value based on the end of one key and the start of the next key... but that's just me ;)
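
The "marker object" idea could look roughly like the sketch below. The names Marker, findMarkers and describe are invented for this illustration and are not from the answer; the last marker is assumed to have no value after it, as in the sample input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MarkerScan {

    /** One key such as "(D#01)" plus where it starts and ends in the text. */
    static final class Marker {
        final String key;
        final int start;
        final int end;

        Marker(String key, int start, int end) {
            this.key = key;
            this.start = start;
            this.end = end;
        }
    }

    /** Collects every marker in order of appearance. */
    static List<Marker> findMarkers(String text) {
        List<Marker> markers = new ArrayList<Marker>();
        Matcher m = Pattern.compile("\\(D#[0-9]+\\)").matcher(text);
        while (m.find()) {
            markers.add(new Marker(m.group(), m.start(), m.end()));
        }
        return markers;
    }

    /** Builds "key = value" lines; a value runs from the end of one key to the start of the next. */
    static List<String> describe(String text) {
        List<Marker> markers = findMarkers(text);
        List<String> lines = new ArrayList<String>();
        for (int i = 0; i + 1 < markers.size(); i++) {
            Marker current = markers.get(i);
            Marker next = markers.get(i + 1);
            lines.add(current.key + " = " + text.substring(current.end, next.start));
        }
        return lines;
    }

    public static void main(String[] args) {
        String text = "(D#01)5(D#02)14100319530033M(D#03)1336009-A-A(D#04)141002A171(D#05)1(D#06)";
        for (String line : describe(text)) {
            System.out.println(line); // first line printed is "(D#01) = 5"
        }
    }
}
```

Because the values are cut using the stored indices, a value that itself contains a "(" character does not confuse the extraction.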

You can use regex capture groups if you want the content between the (D#nn) markers:
Pattern p = Pattern.compile("(\\(D#\\d+\\))(.*?)(?=\\(D#\\d+\\))");
Matcher matcher = p.matcher("(D#01)5(D#02)14100319530033M(D#03)1336009-A-A(D#04)141002A171(D#05)1(D#06)");
while(matcher.find()) {
System.out.println(String.format("%s start: %2s end: %2s matched: %s ",
matcher.group(1), matcher.start(2), matcher.end(2), matcher.group(2)));
}
(D#01) start: 6 end: 7 matched: 5
(D#02) start: 13 end: 28 matched: 14100319530033M
(D#03) start: 34 end: 45 matched: 1336009-A-A
(D#04) start: 51 end: 61 matched: 141002A171
(D#05) start: 67 end: 68 matched: 1

You can use a regex to split the input, as suggested by @MadProgrammer. The split() method returns an array of Strings, so the order of the values in the array is exactly the order in which they occur in the input. Note that the backslashes must be doubled inside a Java string literal. For example:
String input = "(D#01)5(D#02)14100319530033M(D#03)1336009-A-A(D#04)141002A171(D#05)1(D#06)";
String[] table = input.split("\\(D#[0-9]+\\)");
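
To make the result of split() concrete, here is a short sketch of what the array contains for this input. The class and method names are illustrative only. Because the input starts with a marker, element 0 is an empty string, and the trailing empty string after (D#06) is dropped by split().

```java
import java.util.Arrays;

final class SplitDemo {

    /** Splits on every "(D#nn)" marker; element 0 is the text before the first marker. */
    static String[] splitOnMarkers(String input) {
        return input.split("\\(D#[0-9]+\\)");
    }

    public static void main(String[] args) {
        String input = "(D#01)5(D#02)14100319530033M(D#03)1336009-A-A(D#04)141002A171(D#05)1(D#06)";
        String[] table = splitOnMarkers(input);
        // prints [, 5, 14100319530033M, 1336009-A-A, 141002A171, 1]
        System.out.println(Arrays.toString(table));
    }
}
```

With this layout, the value that follows the k-th marker (counting from 1) is found at table[k].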

Try this:
public static void main(String[] args) {
String input = "(D#01)5(D#02)14100319530033M(D#03)1336009-A-A(D#04)141002A171(D#05)1(D#06)";
Pattern p = Pattern.compile("\\(D#\\d+\\)(.*?)(?=\\(D#\\d+\\))");
Matcher matches = p.matcher(input);
while(matches.find()) {
int number = getNum(matches.group(0)); // parses the number
System.out.printf("%d. %s\n", number, matches.group(1)); // print the string
}
}
public static int getNum(String str) {
int start = str.indexOf('#') + 1;
int end = str.indexOf(')', start);
return Integer.parseInt(str.substring(start,end));
}
Result:
1. 5
2. 14100319530033M
3. 1336009-A-A
4. 141002A171
5. 1

Related

Count & Split by regex pattern in java

I have a string in below format.
-52/ABC/35/BY/200/L/DEF/307/C/110/L
I need to do the following:
1. Find the number of occurrences of three-letter words such as ABC and DEF in the above text.
2. Split the above string at ABC and DEF, as shown below.
ABC/35/BY/200/L
DEF/307/C/110/L
I have tried a regex with the code below, but the match count is always zero. What is an easy way to approach this?
static String DEST_STRING = "^[A-Z]{3}$";
static Pattern DEST_PATTERN = Pattern.compile(DEST_STRING,
Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
public static void main(String[] args) {
String test = "-52/ABC/35/BY/200/L/DEF/307/C/110/L";
Matcher destMatcher = DEST_PATTERN.matcher(test);
int destCount = 0;
while (destMatcher.find()) {
destCount++;
}
System.out.println(destCount);
}
Please note that I need to use JDK 6 for this.

You can use this code :
public static void main(String[] args) throws Exception {
String s = "-52/ABC/35/BY/200/L/DEF/307/C/110/L";
// Pattern to find all 3-letter words. \\b means "word boundary", which ensures that each match is exactly 3 letters long.
Pattern p = Pattern.compile("(\\b[a-zA-Z]{3}\\b)");
Matcher m = p.matcher(s);
// Explicit type arguments: the diamond operator (<>) is not available on JDK 6.
Map<String, Integer> countMap = new HashMap<String, Integer>();
// Count how many times each 3-letter word is used.
// Find each 3-letter word.
while (m.find()) {
// Get the 3 letter word.
String val = m.group();
// If the word is present in the map, get old count and add 1, else add new entry in map and set count to 1
if (countMap.containsKey(val)) {
countMap.put(val, countMap.get(val) + 1);
} else {
countMap.put(val, 1);
}
}
System.out.println(countMap);
// Get ABC.. and DEF.. using positive lookahead for a 3 letter word or end of String
// Finds and selects everything starting from a 3 letter word until another 3 letter word is found or until string end is found.
p = Pattern.compile("(\\b[a-zA-Z]{3}\\b.*?)(?=/[A-Za-z]{3}|$)");
m = p.matcher(s);
while (m.find()) {
String val = m.group();
System.out.println(val);
}
}
Output:
{ABC=1, DEF=1}
ABC/35/BY/200/L
DEF/307/C/110/L

Check this one:
String stringToSearch = "-52/ABC/35/BY/200/L/DEF/307/C/110/L";
Pattern p1 = Pattern.compile("\\b[a-zA-Z]{3}\\b");
Matcher m = p1.matcher(stringToSearch);
int startIndex = -1;
while (m.find())
{
// Uses StringUtils.countMatches from Apache Commons Lang (the library must be on the classpath)
int count = StringUtils.countMatches(stringToSearch, m.group());
System.out.println(m.group() + ":" + count);
if(startIndex != -1){
System.out.println(stringToSearch.substring(startIndex,m.start()-1));
}
startIndex = m.start();
}
if(startIndex != -1){
System.out.println(stringToSearch.substring(startIndex));
}
Output:
ABC:1
DEF:1
ABC/35/BY/200/L
DEF/307/C/110/L
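
If adding Apache Commons just for StringUtils.countMatches is not an option, a plain-JDK stand-in is easy to write. The following is a sketch under that assumption; the class and method names are invented, and it uses only calls that exist on JDK 6.

```java
final class CountOccurrences {

    /** Counts non-overlapping occurrences of word in text using only String.indexOf. */
    static int count(String text, String word) {
        if (text == null || word == null || word.isEmpty()) {
            return 0; // nothing to search for
        }
        int count = 0;
        int from = 0;
        while ((from = text.indexOf(word, from)) != -1) {
            count++;
            from += word.length(); // continue after this occurrence
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "-52/ABC/35/BY/200/L/DEF/307/C/110/L";
        System.out.println("ABC:" + count(s, "ABC")); // prints ABC:1
        System.out.println("DEF:" + count(s, "DEF")); // prints DEF:1
    }
}
```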

Regular Expression to check if a String contains '1-n' Integers and then '0-m' Alphabets

I have a peculiar situation where I need to validate a String.
The String has to satisfy the following criteria before processing can continue:
The String should start with an integer value whose length is > 1 and < n,
followed by letters whose length is from 0 to m (which means the letters may or may not be present).
myString.charAt(0) tells me whether the string starts with a digit.
How do I validate that it contains fewer than n digits?
How do I validate that it consists of > 0 and < n digits followed by 0 to < m letters?
Can I get a regular expression to solve this?

This should work
^\d{1,n - 1}[A-Za-z]{0,m - 1}$
Since you want fewer than n digits, the upper bound is n - 1 (and likewise m - 1 for the letters).
Code in JAVA
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class RegexMatches
{
static boolean isValid(String x, int n, int m)
{
String pattern = "^\\d{1," + (n - 1) + "}[A-Za-z]{0," + (m - 1) + "}$";
Pattern r = Pattern.compile(pattern);
Matcher t = r.matcher(x);
return t.find();
}
public static void main( String args[] ){
// String to be scanned to find the pattern.
String line = "123abcdef";
int n = 4, m = 4;
if (isValid(line, n, m)) {
System.out.println("FOUND");
} else {
System.out.println("NOT FOUND");
}
}
}
For the quantifiers to be valid, n must be greater than or equal to 2 and m must be greater than or equal to 1.

You can match this with a very simple regex:
^(\d+)([A-Za-z]*)$
That is 1 or more digits followed by 0 or more letters. (Use [A-Za-z] rather than [A-z]: the latter also matches the characters [ \ ] ^ _ and `, which sit between Z and a in ASCII.) You can easily read the capture groups to find out exactly how many digits and how many letters are in the string. If you know m and n ahead of time as specific numbers, insert them into the regex like so:
For n = 4 and m = 3,
^(\d{1,4})([A-Za-z]{0,3})$
This will match 0000aaa, but not aaa or 000aaaa.
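
Reading the capture groups could look like the sketch below. The class and method names are illustrative, and the bounds are treated as inclusive (at most n digits, at most m letters), which follows this answer rather than the strict "< n" wording of the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DigitLetterCounts {

    // Group 1: the leading digits, group 2: the trailing ASCII letters
    private static final Pattern SHAPE = Pattern.compile("^(\\d+)([A-Za-z]*)$");

    /** Returns {digitCount, letterCount}, or null if the input is not digits followed by letters. */
    static int[] counts(String s) {
        Matcher matcher = SHAPE.matcher(s);
        if (!matcher.matches()) {
            return null;
        }
        return new int[] { matcher.group(1).length(), matcher.group(2).length() };
    }

    /** True when the input has 1..n digits followed by 0..m letters (inclusive bounds, an assumption). */
    static boolean isValid(String s, int n, int m) {
        int[] c = counts(s);
        return c != null && c[0] <= n && c[1] <= m;
    }

    public static void main(String[] args) {
        System.out.println(isValid("0000aaa", 4, 3)); // prints true
        System.out.println(isValid("aaa", 4, 3));     // prints false
        System.out.println(isValid("000aaaa", 4, 3)); // prints false
    }
}
```

Checking the lengths in code avoids rebuilding the pattern string whenever n or m changes.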

Here is another variant of the solutions seen so far; all the answers are really good. This one should also match Unicode numbers and letters.
Pattern
\b\p{N}{1,n}\p{L}{0,m}\W
Source Code
public static void matchNumeroAlphaString(){
int n = 3;
int m = 3;
String text =
"John32 54writes about this, 444 and 456Joh writes about that," +
" and John writes #about 9EveryThing. ";
String patternString = "\\b\\p{N}{1," + n + "}\\p{L}{0," + m + "}\\W";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
while(matcher.find()) {
System.out.println("Found: " + matcher.group());
}
}
Output
Found: 444
Found: 456Joh

How to extract a multiple quoted substrings in Java

I have a string containing multiple substrings that have to be extracted. The substrings to extract are enclosed in ' characters.
I could only extract the first or the last one when I used indexOf or a regex.
How can I extract all of them and put them into an array or list without parsing the same string more than once?
resultData = "Error 205: 'x' data is not crawled yet. Check 'y' and 'z' data and update dataset 't'";
I have tried the code below:
protected static String errorsTPrinted(String errStr, int errCode) {
if (errCode== 202 ) {
ArrayList<String> ar = new ArrayList<String>();
Pattern p = Pattern.compile("'(.*?)'");
Matcher m = p.matcher(errStr);
String text;
for (int i = 0; i < errStr.length(); i++) {
m.find();
text = m.group(1);
ar.add(text);
}
return errStr = "Err 202: " + ar.get(0) + " ... " + ar.get(1) + " ..." + ar.get(2) + " ... " + ar.get(3);
}
Edit
I used @MinecraftShamrock's approach.
if (errCode== 202 ) {
List<String> getQuotet = getQuotet(errStr, '\'');
return errStr = "Err 202: " + getQuotet.get(0) + " ... " + getQuotet.get(1) + " ..." + getQuotet.get(2) + " ... " + getQuotet.get(3);
}

You could use this very straightforward algorithm and avoid regex (whose complexity one can't be 100% sure about):
public List<String> getQuotet(final String input, final char quote) {
final ArrayList<String> result = new ArrayList<>();
int n = -1;
for(int i = 0; i < input.length(); i++) {
if(input.charAt(i) == quote) {
if(n == -1) { //not currently inside quote -> start new quote
n = i + 1;
} else { //close current quote
result.add(input.substring(n, i));
n = -1;
}
}
}
return result;
}
This works with any quote character and has a runtime complexity of O(n). If the string ends with an unclosed quote, that trailing part is not included; however, this can be added quite easily.
I think this is preferable to a regex because you can be absolutely sure about its complexity. It also needs very few library classes. If you care about efficiency for big inputs, use this.
Last but not least, it does not care what is between two quote characters, so it works with any input string.
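
As for the remark about a string that ends with an open quote, one possible way to include that trailing part is sketched below. The keepUnterminated flag and the names QuoteScanner and getQuoted are assumptions made for this example, not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;

final class QuoteScanner {

    /**
     * Collects the text between pairs of quote characters in a single pass.
     * If keepUnterminated is true, text after a final unclosed quote is added too.
     */
    static List<String> getQuoted(String input, char quote, boolean keepUnterminated) {
        List<String> result = new ArrayList<String>();
        int start = -1; // index just after the opening quote, or -1 when outside a quote
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == quote) {
                if (start == -1) {
                    start = i + 1;                          // opening quote
                } else {
                    result.add(input.substring(start, i)); // closing quote
                    start = -1;
                }
            }
        }
        if (keepUnterminated && start != -1) {
            result.add(input.substring(start)); // rest of the string after the open quote
        }
        return result;
    }

    public static void main(String[] args) {
        String data = "Error 205: 'x' data is not crawled yet. Check 'y' and 'z' data and update dataset 't'";
        System.out.println(getQuoted(data, '\'', false)); // prints [x, y, z, t]
        System.out.println(getQuoted("a 'b' and 'c", '\'', true)); // prints [b, c]
    }
}
```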

Simply use the pattern:
'([^']++)'
And a Matcher like so:
final Pattern pattern = Pattern.compile("'([^']++)'");
final Matcher matcher = pattern.matcher(resultData);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
This loops through each match in the String and prints it.
Output:
x
y
z
t

Here is a simple approach (assuming there are no escaping characters etc.):
// Compile a pattern to find the wanted strings
Pattern p = Pattern.compile("'([^']+)'");
// Create a matcher for given input
Matcher m = p.matcher(resultData);
// A list to put the found strings into
List<String> list = new ArrayList<String>();
// Loop over all occurrences
while(m.find()) {
// Retrieve the matched text
String text = m.group(1);
// Do something with the text, e.g. add it to a List
list.add(text);
}

String - Search Index of All Specified Strings

I have a String, for example
String value1 = "12345 abc 123 def 123";
and another one to search for in it, for example
String value2 ="123";
How can I return all the indices at which the second string appears in the first? From what I've found, indexOf and lastIndexOf only return the first and the last index of the specified string.

Use the overload of indexOf that takes a start index:
List<Integer> indexes = new ArrayList<>();
for (int idx = haystack.indexOf(needle);
idx != -1;
idx = haystack.indexOf(needle, idx + 1)) {
indexes.add(idx);
}

Use indexOf(String str, int fromIndex) in a loop.

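Use a regex; it is better. This example counts the matches:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
class Test {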
public static void main(String[] args) {
String value1= "12345 abc 123 def 123";
Pattern pattern = Pattern.compile("123");
Matcher matcher = pattern.matcher(value1);
int count = 0;
while (matcher.find()){
count++;
}
System.out.println(count);
}
}

Try regular expressions with Pattern and Matcher.
You can then get the start indices with matcher.start() and the end indices (exclusive) with matcher.end().
String value1 = "12345 abc 123 def 123";
String value2 = "123";
Pattern pattern = Pattern.compile(value2);
Matcher matcher = pattern.matcher(value1);
while (matcher.find()){
System.out.println("Start: " + matcher.start() + " -- End: " + matcher.end());
}
This will give you as output:
Start: 0 -- End: 3
Start: 10 -- End: 13
Start: 18 -- End: 21
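
One caveat with this approach is that Pattern.compile treats the search string as a regular expression, so a search string containing characters such as . or ( is not matched literally. A minimal sketch that quotes the search string first and collects the start indices in a list follows; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LiteralIndexes {

    /** Returns the start index of every non-overlapping literal occurrence of needle in haystack. */
    static List<Integer> startIndexes(String haystack, String needle) {
        List<Integer> starts = new ArrayList<Integer>();
        // Pattern.quote makes regex metacharacters in needle match themselves
        Matcher matcher = Pattern.compile(Pattern.quote(needle)).matcher(haystack);
        while (matcher.find()) {
            starts.add(matcher.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(startIndexes("12345 abc 123 def 123", "123")); // prints [0, 10, 18]
        System.out.println(startIndexes("a.b axb a.b", "."));             // prints [1, 9]
    }
}
```

Matcher.find() resumes after the end of the previous match, so overlapping occurrences are skipped: searching "aaa" for "aa" yields only index 0, whereas an indexOf loop that advances by one position would also report index 1.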

You will need to implement this function yourself. You could use something like:
package com.example.stringutils;
import java.util.ArrayList;
public class Util {
/** Returns a list of all the indexes at which the 'key' string occurs in the
* 'arbitrary' string. If there are none, an empty list is returned.
*
* @param key the string to search for
* @param arbitrary the string to search in
* @return the start index of every occurrence, in ascending order
*/
private static ArrayList<Integer> allIndexesOf(String key, String arbitrary) {
ArrayList<Integer> result = new ArrayList<Integer>();
// Short-circuit || so that key.length() is never called on a null key
if (key == null || key.length() == 0 || arbitrary == null || arbitrary.length() < key.length()) {
return result;
}
int loc = -1;
while ((loc = arbitrary.indexOf(key, loc+1)) > -1) {
result.add(loc);
}
return result;
}
}
You may want to check whether regular expressions actually perform faster (fewer lines of code are not always faster, just simpler).

Java regex partial match with period

In Java I am using Pattern and Matcher to find all instances of ".A (a number)" in a set of strings to retrieve the numbers.
I run into problems because one of the words in the file is "P.A.M.X." and the number returns 0. It won't go through the rest of the file. I've tried using many different regular expressions but I can't get past that occurrence of "P.A.M.X." and onto the next ".A (number)"
for (int i = 0; i < input.size(); i++) {
Pattern pattern = Pattern.compile("\\.A\\s\\d+");
Matcher matcher = pattern.matcher(input.get(i));
while (matcher.find())
{
String matchFound = matcher.group().toString();
int numMatch = 0;
String[] tokens = matchFound.split(" ");
numMatch = Integer.parseInt(tokens[1]);
System.out.println("The number is: " + numMatch);
}
}

Short sample for you:
Pattern pattern = Pattern.compile("\\.A\\s(\\d+)"); // grouping number
Matcher matcher = pattern.matcher(".A 1 .A 2 .A 3 .A 4 *text* .A5"); // full input string
while (matcher.find()) {
int n = Integer.valueOf(matcher.group(1)); // getting captured number - group #1
System.out.println(n);
}
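
Applied to a list of lines like the one in the question, the same pattern could be used as in the sketch below. The input lines and the names DotANumbers and extract are invented for illustration. A line containing "P.A.M.X." simply produces no match, because the ".A" in it is followed by "." rather than whitespace, and the loop moves on to the next line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DotANumbers {

    // ".A", one whitespace character, then the number captured as group 1
    private static final Pattern DOT_A = Pattern.compile("\\.A\\s(\\d+)");

    /** Returns every number that follows ".A " in the given lines, in order. */
    static List<Integer> extract(List<String> lines) {
        List<Integer> numbers = new ArrayList<Integer>();
        for (String line : lines) {
            Matcher matcher = DOT_A.matcher(line);
            while (matcher.find()) {
                // Integer.parseInt throws NumberFormatException if the digits exceed the int range
                numbers.add(Integer.parseInt(matcher.group(1)));
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                ".A 1 intro",
                "P.A.M.X. is a word",
                "text .A 22 and .A 3");
        System.out.println(extract(input)); // prints [1, 22, 3]
    }
}
```

Compiling the pattern once outside the loop also avoids recompiling it for every line.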
