Java regex: matching each group starting with a specific string

I have a string like a1wwa1xxa1yya1zz.
I would like to get every group starting with a1, up to but excluding the next a1.
(In my example, that would be: a1ww, a1xx, a1yy and a1zz.)
If I use:
Matcher m = Pattern.compile("(a1.*?)a1").matcher("a1wwa1xxa1yya1zz");
while(m.find()) {
String myGroup = m.group(1);
}
myGroup only captures every other group.
So in my example, I can only capture a1ww and a1yy.
Does anyone have a good idea?

Split is a good solution, but if you want to remain in the regex world, here is a solution:
Matcher m = Pattern.compile("(a1.*?)(?=a1|$)").matcher("a1wwa1xxa1yya1zz");
while (m.find()) {
String myGroup = m.group(1);
System.out.println("> " + myGroup);
}
I used a positive lookahead to ensure the capture is followed by a1 or, alternatively, by the end of the input.
Lookaheads are zero-width assertions, i.e. they verify a condition without advancing the match cursor, so the text they check remains available for the next match.
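To make the difference concrete, here is a minimal sketch that runs both patterns from this thread through the same helper. The class name LookaheadDemo and the method groups are hypothetical names chosen for illustration; only the two regexes come from the posts above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Collects capture group 1 of every match of the given regex in the input.
    static List<String> groups(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "a1wwa1xxa1yya1zz";
        // The trailing a1 is consumed by the match, so the next search starts after it.
        System.out.println(groups("(a1.*?)a1", input));       // [a1ww, a1yy]
        // The lookahead only peeks at a1 and leaves it for the next match.
        System.out.println(groups("(a1.*?)(?=a1|$)", input)); // [a1ww, a1xx, a1yy, a1zz]
    }
}
```

The first pattern skips every second group because the delimiter it consumed is the start of the group it then misses.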

You can use the split() method, then prepend "a1" to each resulting element:
String str = "a1wwa1xxa1yya1zz";
String[] parts = str.split("a1");
String[] output = new String[parts.length - 1];
for (int i = 0; i < output.length; i++)
output[i] = "a1" + parts[i + 1];
for (String p : output)
System.out.println(p);
Output:
a1ww
a1xx
a1yy
a1zz

I would use an approach like this:
String str = "a1wwa1xxa1yya1zz";
String[] parts = str.split("a1");
for (int i = 1; i < parts.length; i++) {
String found = "a1" + parts[i];
}
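As a possible variation on the split() answers above, the lookahead idea can be combined with split() so that the prefix is never removed in the first place. This is only a sketch: the class and method names are hypothetical, and it assumes Java 8 or later, where a zero-width match at the start of the input does not produce an empty leading element.

```java
import java.util.Arrays;

class SplitKeepPrefix {
    // Splits at every position that is followed by "a1"; the lookahead is
    // zero-width, so "a1" stays at the start of each piece.
    static String[] splitBeforePrefix(String input) {
        return input.split("(?=a1)");
    }

    public static void main(String[] args) {
        String[] parts = splitBeforePrefix("a1wwa1xxa1yya1zz");
        System.out.println(Arrays.toString(parts)); // [a1ww, a1xx, a1yy, a1zz]
    }
}
```

On older Java versions the result may start with an empty string, which would have to be skipped as in the loops above.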

Related

Splitting string by square brackets

I have a string that comes out like this: 1.[Aagaard,Lindsay][SeniorPolicyAdvisor][TREASURYBOARDSECRETARIAT][DEPUTYPREMIERANDPRESIDENTOFTHETREASURYBOARD,Toronto][416-327-0948][lindsay.aagaard#ontario.ca]2.[Aalto,Margaret][ProbationOfficer][CHILDRENANDYOUTHSERVICES][THUNDERBAY,ThunderBay][807-475-1310][margaret.aalto#ontario.ca]
I want to split it into an arraylist like this:
1.
Aagaard,Lindsay
SeniorPolicyAdvisor
etc.
Any suggestions?
I read the JavaDoc and used Pattern and Matcher like so:
Pattern p = Pattern.compile("\\[(.*?)\\]");
Matcher m = p.matcher(tableContent);
while(m.find()) {
System.out.println(m.group(1));
}
First delete the first and the last brackets, then split on "][" (both brackets must be escaped, because split() takes a regex):
String arr = "[Aalto,Margaret][ProbationOfficer][CHILDRENANDYOUTHSERVICES]";
String[] items = arr.substring(1, arr.length() - 1).split("\\]\\[");
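Simply this (replace() takes a literal string rather than a regex, so the closing bracket must not be escaped there):
String[] list = str.split("\\[");
for(int i = 0 ; i < list.length ; i++) {
list[i] = list[i].replace("]", "");
}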
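Since the question also wants the "1." and "2." markers between the bracketed fields, one option is a single pattern that matches either a bracketed field or a run of text outside brackets. The following is a sketch under that assumption; BracketTokens and tokenize are hypothetical names, and nested brackets are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketTokens {
    // Alternative 1: a bracketed field, captured without its brackets.
    // Alternative 2: a run of characters that contains no brackets at all.
    private static final Pattern TOKEN = Pattern.compile("\\[(.*?)\\]|[^\\[\\]]+");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            // group(1) is null when the second alternative matched.
            tokens.add(m.group(1) != null ? m.group(1) : m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String s = "1.[Aagaard,Lindsay][SeniorPolicyAdvisor]2.[Aalto,Margaret][ProbationOfficer]";
        for (String token : tokenize(s)) {
            System.out.println(token);
        }
    }
}
```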

How to extract multiple quoted substrings in Java

I have a string containing multiple substrings that have to be extracted. The substrings to extract are enclosed in ' characters.
I could only extract the first or the last one when I used indexOf or regex.
How can I extract all of them and put them into an array or list without parsing the same string repeatedly?
resultData = "Error 205: 'x' data is not crawled yet. Check 'y' and 'z' data and update dataset 't'";
Here is what I have tried:
protected static String errorsTPrinted(String errStr, int errCode) {
if (errCode== 202 ) {
ArrayList<String> ar = new ArrayList<String>();
Pattern p = Pattern.compile("'(.*?)'");
Matcher m = p.matcher(errStr);
String text;
for (int i = 0; i < errStr.length(); i++) {
m.find();
text = m.group(1);
ar.add(text);
}
return errStr = "Err 202: " + ar.get(0) + " ... " + ar.get(1) + " ..." + ar.get(2) + " ... " + ar.get(3);
}
Edit
I used @MinecraftShamrock's approach.
if (errCode== 202 ) {
List<String> getQuotet = getQuotet(errStr, '\'');
return errStr = "Err 202: " + getQuotet.get(0) + " ... " + getQuotet.get(1) + " ..." + getQuotet.get(2) + " ... " + getQuotet.get(3);
}
You could use this very straightforward algorithm and avoid regex (whose runtime complexity is harder to be sure about):
public List<String> getQuotet(final String input, final char quote) {
final ArrayList<String> result = new ArrayList<>();
int n = -1;
for(int i = 0; i < input.length(); i++) {
if(input.charAt(i) == quote) {
if(n == -1) { //not currently inside quote -> start new quote
n = i + 1;
} else { //close current quote
result.add(input.substring(n, i));
n = -1;
}
}
}
return result;
}
This works with any desired quote character and has a runtime complexity of O(n). If the string ends with an open quote, that unterminated part will not be included. However, this can be added quite easily.
I think this is preferable to regex, as you can be absolutely sure about its complexity. Also, it needs only a minimum of library classes. If you care about efficiency for big inputs, use this.
And last but not least, it does not care at all about what is between two quote characters, so it works with any input string.
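The remark about unterminated quotes can be sketched as follows. This is an assumed extension of the scanner above, not the answerer's code: the class name QuoteScanner, the method name getQuoted and the keepUnterminated flag are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteScanner {
    // Single pass over the input; start == -1 means "not inside a quote".
    static List<String> getQuoted(String input, char quote, boolean keepUnterminated) {
        List<String> result = new ArrayList<>();
        int start = -1;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == quote) {
                if (start == -1) {
                    start = i + 1;                          // opening quote
                } else {
                    result.add(input.substring(start, i)); // closing quote
                    start = -1;
                }
            }
        }
        // Optionally keep the text after a quote that was never closed.
        if (keepUnterminated && start != -1) {
            result.add(input.substring(start));
        }
        return result;
    }

    public static void main(String[] args) {
        String s = "Error 205: 'x' data is not crawled yet. Check 'y' and 'z' data and update dataset 't'";
        System.out.println(getQuoted(s, '\'', false)); // [x, y, z, t]
    }
}
```

Whether an unterminated tail should count as a result depends on the data, which is why it is left as a flag here.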
Simply use the pattern:
'([^']++)'
And a Matcher like so:
final Pattern pattern = Pattern.compile("'([^']++)'");
final Matcher matcher = pattern.matcher(resultData);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
This loops through each match in the String and prints it.
Output:
x
y
z
t
Here is a simple approach (assuming there are no escaping characters etc.):
// Compile a pattern to find the wanted strings
Pattern p = Pattern.compile("'([^']+)'");
// Create a matcher for given input
Matcher m = p.matcher(resultData);
// A list to put the found strings into
List<String> list = new ArrayList<String>();
// Loop over all occurrences
while(m.find()) {
// Retrieve the matched text
String text = m.group(1);
// Do something with the text, e.g. add it to a List
list.add(text);
}

How to split the string in java?

String str = "AlwinX-road-9:00pm-kanchana travels-25365445421";
String[] names = str.split("-");
I want output like following:
AlwinX-road
9:00pm
kanchana travels
25365445421
Use pattern matching to meet your requirement:
String str = "AlwinX-road-9:00pm-kanchana travels-25365445421";
String regex = "(^[A-Z-a-z ]+)[-]+(\\d+:\\d+pm)[-]([a-z]+\\s+[a-z]+)[-](\\d+)";
Matcher matcher = Pattern.compile( regex ).matcher( str);
while (matcher.find( ))
{
String roadname = matcher.group(1);
String time = matcher.group(2);
String travels = matcher.group(3);
String digits= matcher.group(4);
System.out.println("time="+time);
System.out.println("travels="+travels);
System.out.println("digits="+digits);
}
Since you want to include the delimiter in your first output line, you can do the split and then merge the first two elements with a "-":
String[] names = str.split("-");
System.out.println(names[0] + "-" + names[1]);
for (int i = 2; i < names.length; i++) {
System.out.println(names[i]);
}
The split() method can't distinguish between the dash in AlwinX-road and the other dashes in the string; it treats all dashes the same. You will need to do some sort of post-processing on the resulting array. If you always need the first two strings in the array joined, you can just do that. If your strings are more complex, you will need additional logic to join the strings in the array.
Here is one way you could do it, assuming the first '-' is always part of a two-part identifier:
String str = "AlwinX-road-9:00pm-kanchana travels-25365445421";
String[] tokens = str.split("-");
String[] output = new String[tokens.length - 1];
output[0] = tokens[0] + '-' + tokens[1];
System.out.println(output[0]);
for(int i = 1; i < output.length; i++){
output[i] = tokens[i+1];
System.out.println(output[i]);
}
Looks like you want to split (with removal of all dashes but the first one).
String str = "AlwinX-road-9:00pm-kanchana travels-25365445421";
String[] names = str.split("-");
for (String value : names)
{
System.out.println(value);
}
This produces:
AlwinX
road
9:00pm
kanchana travels
25365445421
Notice that "AlwinX" and "road" were split as well, since they had a dash in between. So you will need custom logic to handle this case. Here is an example of how to do it (I used StringTokenizer):
StringTokenizer tk = new StringTokenizer(str, "-", true);
String firstString = null;
String secondString = null;
while (tk.hasMoreTokens())
{
final String token = tk.nextToken();
if (firstString == null)
{
firstString = token;
continue;
}
if (secondString == null && firstString != null && !token.equals("-"))
{
secondString = token;
System.out.println(firstString + "-" + secondString);
continue;
}
if (!token.equals("-"))
{
System.out.println(token);
}
}
This will produce:
AlwinX-road
9:00pm
kanchana travels
25365445421
From your format, I think you want the first token to be everything before the time part. You can do it this way:
String str = yourString;
// this is your first token, AlwinX-road in your example
String beforetime = str.split("-\\d+:\\d+[ap]m")[0];
String rest = str.substring(beforetime.length() + 1);
String[] restNames = rest.split("-");
If you really need it all together in one array, see the code below:
String[] allTogether = new String[restNames.length + 1]; // the array with all your tokens
allTogether[0]=beforetime;
System.arraycopy(restNames, 0, allTogether, 1, restNames.length);
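The same idea can be wrapped in one method that also copes with input that has no time part. This is a sketch only: HeadBeforeTime and split are hypothetical names, it uses Matcher.find() to locate the time instead of the split()[0] trick, and the fallback to a plain split on "-" is an assumption about what is wanted in that case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HeadBeforeTime {
    // A dash followed by a time such as 9:00pm or 11:30am.
    private static final Pattern TIME = Pattern.compile("-\\d+:\\d+[ap]m");

    static List<String> split(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = TIME.matcher(input);
        if (!m.find()) {
            // Assumed fallback: no time part, so split on every dash.
            out.addAll(Arrays.asList(input.split("-")));
            return out;
        }
        // Everything before the dash that precedes the time is one token.
        out.add(input.substring(0, m.start()));
        // The remainder (starting at the time itself) is split normally.
        out.addAll(Arrays.asList(input.substring(m.start() + 1).split("-")));
        return out;
    }

    public static void main(String[] args) {
        for (String s : split("AlwinX-road-9:00pm-kanchana travels-25365445421")) {
            System.out.println(s);
        }
    }
}
```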
If you use "_" as the separator instead of "-", the input becomes AlwinX-road_9:00pm_kanchana travels_25365445421 and a plain split is enough.
New code:
String str = "AlwinX-road_9:00pm_kanchana travels_25365445421";
String separator = "_";
String[] names = str.split(separator);
for(int i=0; i<names.length; i++){
System.out.println(names[i]);
}

Java regex partial match with period

In Java I am using Pattern and Matcher to find all instances of ".A (a number)" in a set of strings to retrieve the numbers.
I run into problems because one of the words in the file is "P.A.M.X." and the number returns 0. It won't go through the rest of the file. I've tried using many different regular expressions but I can't get past that occurrence of "P.A.M.X." and onto the next ".A (number)"
for (int i = 0; i < input.size(); i++) {
Pattern pattern = Pattern.compile("\\.A\\s\\d+");
Matcher matcher = pattern.matcher(input.get(i));
while (matcher.find())
{
String matchFound = matcher.group().toString();
int numMatch = 0;
String[] tokens = matchFound.split(" ");
numMatch = Integer.parseInt(tokens[1]);
System.out.println("The number is: " + numMatch);
}
}
Short sample for you:
Pattern pattern = Pattern.compile("\\.A\\s(\\d+)"); // grouping number
Matcher matcher = pattern.matcher(".A 1 .A 2 .A 3 .A 4 *text* .A5"); // full input string
while (matcher.find()) {
int n = Integer.valueOf(matcher.group(1)); // getting captured number - group #1
System.out.println(n);
}
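To show that a word like "P.A.M.X." does not stop the scan, here is a small sketch that applies the same capturing pattern to several lines. The class name DotANumbers, the method extract and the sample lines are made up for illustration; the pattern is the one from the sample above, compiled once instead of inside the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotANumbers {
    // ".A", one whitespace character, then the digits we want (group 1).
    private static final Pattern DOT_A = Pattern.compile("\\.A\\s(\\d+)");

    static List<Integer> extract(List<String> lines) {
        List<Integer> numbers = new ArrayList<>();
        for (String line : lines) {
            Matcher m = DOT_A.matcher(line);
            while (m.find()) {
                numbers.add(Integer.parseInt(m.group(1)));
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        // In "P.A.M.X." the ".A" is followed by ".", not whitespace, so it is skipped.
        List<String> lines = Arrays.asList("P.A.M.X. is a word", ".A 12 and .A 7", "no match here");
        System.out.println(extract(lines)); // [12, 7]
    }
}
```

Reading the number from the capture group avoids the split(" ") step, which would fail if the whitespace were a tab or a newline.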

How to Split a string in java based on limit

I have the following String and I want to split it into a number of substrings (taking ',' as the delimiter) when its length reaches 36. It does not have to split exactly at the 36th position.
String message = "This is some(sampletext), and has to be splited properly";
I want to get the output as two substrings, as follows:
1. 'This is some (sampletext)'
2. 'and has to be splited properly'
Thanks in advance.
A solution based on regex:
String s = "This is some sample text and has to be splited properly";
Pattern splitPattern = Pattern.compile(".{1,15}\\b");
Matcher m = splitPattern.matcher(s);
List<String> stringList = new ArrayList<String>();
while (m.find()) {
stringList.add(m.group(0).trim());
}
Update:
trim() can be dropped by changing the pattern to end in a space or at the end of the string:
String s = "This is some sample text and has to be splited properly";
Pattern splitPattern = Pattern.compile("(.{1,15})\\b( |$)");
Matcher m = splitPattern.matcher(s);
List<String> stringList = new ArrayList<String>();
while (m.find()) {
stringList.add(m.group(1));
}
group(1) means that I only need the first part of the pattern, (.{1,15}), as output.
.{1,15} - a sequence of any characters (".") with a length between 1 and 15 ({1,15})
\b - a word boundary (the position between a word character and a non-word character)
( |$) - a space or the end of the string
In addition, I've added () around .{1,15} so I can use it as a group (m.group(1)).
Depending on the desired result, this expression can be tweaked.
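One way to tweak it is to make the maximum width a parameter. The sketch below builds the same pattern for a given width; RegexWrap and wrap are hypothetical names, and the limitation of \b next to punctuation still applies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexWrap {
    // Builds "(.{1,width})\b( |$)" and collects group 1 of every match.
    static List<String> wrap(String text, int width) {
        Pattern p = Pattern.compile("(.{1," + width + "})\\b( |$)");
        Matcher m = p.matcher(text);
        List<String> lines = new ArrayList<>();
        while (m.find()) {
            lines.add(m.group(1));
        }
        return lines;
    }

    public static void main(String[] args) {
        String s = "This is some sample text and has to be splited properly";
        // [This is some, sample text and, has to be, splited, properly]
        System.out.println(wrap(s, 15));
    }
}
```

With a width of 15, "splited properly" (16 characters) no longer fits on one line, so it ends up as two entries.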
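Update:
If you want to split the message at a comma only when its length would exceed 36, try the following expression:
Pattern splitPattern = Pattern.compile("(.{1,36})\\b(,|$)");
Note that \b requires a word character on one side, so a comma that directly follows a closing parenthesis (as in the question's message) will not match; remove the \b from the pattern for such input.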
The best solution I can think of is to write a function that iterates through the string. In the function you keep track of whitespace characters, and every 16 characters you add a substring to a list, cut at the last whitespace encountered. After each substring has been added, you start anew from that whitespace. At the end you simply return the list of substrings.
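A minimal sketch of that description might look like this. The names WhitespaceChunker and chunk are hypothetical, the limit is passed in rather than fixed at 16, and a word longer than the limit is simply kept as an overlong chunk, which is an assumption the description leaves open.

```java
import java.util.ArrayList;
import java.util.List;

class WhitespaceChunker {
    static List<String> chunk(String text, int limit) {
        List<String> parts = new ArrayList<>();
        int start = 0;       // first index of the current chunk
        int lastSpace = -1;  // last whitespace seen inside the current chunk
        for (int i = 0; i < text.length(); i++) {
            if (Character.isWhitespace(text.charAt(i))) {
                lastSpace = i;
            }
            // The chunk has grown past the limit: cut at the last whitespace.
            if (i - start + 1 > limit && lastSpace > start) {
                parts.add(text.substring(start, lastSpace));
                start = lastSpace + 1; // continue right after that whitespace
                lastSpace = -1;
            }
        }
        if (start < text.length()) {
            parts.add(text.substring(start)); // remaining tail
        }
        return parts;
    }

    public static void main(String[] args) {
        String message = "This is some sample text and has to be splited properly";
        // [This is some, sample text and, has to be, splited properly]
        System.out.println(chunk(message, 16));
    }
}
```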
Here's a tidy answer:
String message = "This is some sample text and has to be splited properly";
String[] temp = message.split("(?<=^.{1,16}) ");
String part1 = message.substring(0, message.length() - temp[temp.length - 1].length() - 1);
String part2 = message.substring(message.length() - temp[temp.length - 1].length());
This should work on all inputs, except when there are sequences of chars without whitespace longer than 16. It also creates the minimum amount of extra Strings by indexing into the original one.
public static void main(String[] args) throws IOException
{
String message = "This is some sample text and has to be splited properly";
List<String> result = new ArrayList<String>();
int start = 0;
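while (start + 16 < message.length())
{
int end = start + 16;
// walk back to the nearest whitespace; the post-decrement leaves end one before it
while (!Character.isWhitespace(message.charAt(end--)));
result.add(message.substring(start, end + 1));
// skip over the whitespace character itself
start = end + 2;
}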
result.add(message.substring(start));
System.out.println(result);
}
If you have a simple text like the one you showed above (words separated by blank spaces), you can always consider StringTokenizer. Here's some simple code that works for your case:
public static void main(String[] args) {
String message = "This is some sample text and has to be splited properly";
while (message.length() > 0) {
String token = "";
StringTokenizer st = new StringTokenizer(message);
while (st.hasMoreTokens()) {
String nt = st.nextToken();
String foo = "";
if (token.length()==0) {
foo = nt;
}
else {
foo = token + " " + nt;
}
if (foo.length() < 16)
token = foo;
else {
System.out.print("'" + token + "' ");
message = message.substring(token.length() + 1, message.length());
break;
}
if (!st.hasMoreTokens()) {
System.out.print("'" + token + "' ");
message = message.substring(token.length(), message.length());
}
}
}
}
