How can I split this string
power:110V;220V;Color:Pink;White;Type:1;2;Condition:New;Used;
into these 4 strings?
power:110V;220V;
Color:Pink;White;
Type:1;2;
Condition:New;Used;
Split your input according to the below regex.
string.split("(?<=;)(?=\\w+:)");
The regex above matches every position that is immediately preceded by a semicolon and immediately followed by one or more word characters and a colon.
OR
string.split("(?<=;)(?=[^;:]*:)");
Example:
String s = "power:110V;220V;Color:Pink;White;Type:1;2;Condition:New;Used;";
String[] parts = s.split("(?<=;)(?=\\w+:)");
for(String i: parts)
{
System.out.println(i);
}
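As a self-contained sketch, the two patterns above can be compared side by side. The class and method names here (KeyGroupSplitDemo, splitWordKeys, splitAnyKeys) are made up for illustration and are not part of the original answer.

```java
import java.util.Arrays;

class KeyGroupSplitDemo {
    // Splits at the zero-width position just after ';' and just before "word:".
    static String[] splitWordKeys(String s) {
        return s.split("(?<=;)(?=\\w+:)");
    }

    // Same idea, but the key may contain any characters except ';' and ':'.
    static String[] splitAnyKeys(String s) {
        return s.split("(?<=;)(?=[^;:]*:)");
    }

    public static void main(String[] args) {
        String s = "power:110V;220V;Color:Pink;White;Type:1;2;Condition:New;Used;";
        System.out.println(Arrays.toString(splitWordKeys(s)));
        System.out.println(Arrays.toString(splitAnyKeys(s)));
    }
}
```

Both methods give the same four pieces for the sample input. They differ only when a key contains a non-word character such as a space: the \w+ version does not treat such a key as a split point, while the [^;:]* version does.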
I have a string "'GLO', FLO". Now I want a regex that checks each word in the string and:
- if a word begins and ends with a single quote, replaces the single quotes with spaces
- if a comma is encountered between words, separates the two words with a space
So, in the end, I should get GLO FLO.
Any help on how to do this using the replaceAll() method on the string?
This regex didn't do it for me: "'([^' ]+)|\\s+'"
public static void displaySplitString(final String str) {
String pattern1 = "^'?(\\w+)'?,\\s+(\\w+)$";
StringTokenizer strTok = new StringTokenizer(str, " , ");
while (strTok.hasMoreTokens()) {
String delim = (strTok.nextToken());
delim.replaceAll(pattern1, "$1$2");
System.out.println(delim);
}
}
// in the main method: displaySplitString("'GLO', FLO");
Here is the snippet that should get you going:
public static void displaySplitString(String str)
{
String pattern1 = "^'?(\\w+)'?(?=\\S)";
str = str.replaceAll(pattern1, " $1 ");
StringTokenizer strTok = new StringTokenizer(str, " , ");
while (strTok.hasMoreTokens())
{
String delim = (strTok.nextToken());
System.out.println(delim);
}
}
Here,
- I changed the str parameter so it is no longer final (so that we can reassign str inside the method).
- I am using the regex ^'?(\\w+)'?(?=\\S) to remove potential single quotes from around the first word.
- Since you use a StringTokenizer, just 2 lines inside the while block are enough.
The regex means:
^ - Start looking for the match at the very start of the string
'? - match 0 or 1 single quote
(\\w+) - match and capture 1 or more word characters (letters, digits, or underscore); we'll refer to them as $1 in the replacement pattern
'? - match 0 or 1 single quote
(?=\\S) - match only if the next character is a non-whitespace character (it is checked but not consumed). Perhaps you can even replace this lookahead with a plain , if a comma always follows the first word.
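Since the question asked for a replaceAll()-only approach, here is a minimal sketch of one, as an alternative to the StringTokenizer version. It assumes quoted words contain only word characters and that the final goal is a single space-separated string; the class and method names (QuoteCleanupDemo, normalize) are hypothetical.

```java
class QuoteCleanupDemo {
    // Removes single quotes that wrap a word, then turns each comma
    // (with any surrounding whitespace) into a single space.
    static String normalize(String input) {
        return input
                .replaceAll("'(\\w+)'", "$1")   // 'GLO' -> GLO
                .replaceAll("\\s*,\\s*", " ");  // ", " -> " "
    }

    public static void main(String[] args) {
        System.out.println(normalize("'GLO', FLO"));
    }
}
```

Unlike the anchored ^ pattern above, this version unquotes every quoted word in the string, not just the first one.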
I asked How to split a string with conditions. Now I know how to ignore the delimiter if it is between two characters.
How can I check multiple groups of two characters instead of one?
I found Regex for splitting a string using space when not surrounded by single or double quotes, but I don't understand where to change '' to []. Also, it works with two groups only.
Is there a regex that will split using , but ignore the delimiter if it is between "" or [] or {}?
For instance:
// Input
"text1":"text2","text3":"text,4","text,5":["text6","text,7"],"text8":"text9","text10":{"text11":"text,12","text13":"text14","text,15":["text,16","text17"],"text,18":"text19"}
// Output
"text1":"text2"
"text3":"text,4"
"text,5":["text6","text,7"]
"text8":"text9"
"text10":{"text11":"text,12","text13":"text14","text,15":["text,16","text17"],"text,18":"text19"}
You can use:
text = "\"text1\":\"text2\",\"text3\":\"text,4\",\"text,5\":[\"text6\",\"text,7\"],\"text8\":\"text9\",\"text10\":{\"text11\":\"text,12\",\"text13\":\"text14\",\"text,15\":[\"text,16\",\"text17\"],\"text,18\":\"text19\"}";
String[] toks = text.split("(?=(?:(?:[^\"]*\"){2})*[^\"]*$)(?![^{]*})(?![^\\[]*\\]),+");
for (String tok: toks)
System.out.printf("%s%n", tok);
OUTPUT:
"text1":"text2"
"text3":"text,4"
"text,5":["text6","text,7"]
"text8":"text9"
"text10":{"text11":"text,12","text13":"text14","text,15":["text,16","text17"],"text,18":"text19"}
I have following string
String str="aaaaaaaaa\n\n\nbbbbbbbbbbb\n \n";
I want to break it on \n, so that at the end I get two strings, aaaaaaaaa and bbbbbbbbbbb. I don't want the last one, as it contains only whitespace. So if I split on the newline character using str.split(), the final array should have only two entries.
I tried the following:
String str="aaaaaaaaa\n\n\nbbbbbbbbbbb\n \n".replaceAll("\\s+", " ");
String[] split = str.split("\n+");
It ignores all the \n characters and gives a single string, aaaaaaaaa bbbbbbbbbbb.
Delete the call to replaceAll(), which is removing the newlines too. Just this will do:
String[] split = str.split("\n\\s*");
This will not split on just spaces - the split must start at a newline (followed by optional further whitespace).
Here's some test code based on your sample input, extended with an edge case (a space inside a line):
String str = "aaaaaaaaa\nbbbbbb bbbbb\n \n";
String[] split = str.split("\n\\s*");
System.out.println(Arrays.toString(split));
Output:
[aaaaaaaaa, bbbbbb bbbbb]
This should do the trick:
String str="aaaaaaaaa\n\n\nbbbbbbbbbbb\n \n";
String[] lines = str.split("\\s*\n\\s*");
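It also strips the whitespace on both sides of every newline, so trailing and leading spaces around line breaks are removed. Whitespace at the very start of the string is left alone.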
The \n characters are removed by your first statement, because \s also matches \n.
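A small sketch that combines the ideas above: trim the whole string first, then split on newlines together with the whitespace around them. The helper name nonBlankLines is made up for illustration, and returning an empty array for all-whitespace input is an assumption about the desired behaviour.

```java
import java.util.Arrays;

class LineSplitDemo {
    // Returns the non-blank lines of text, each without leading or trailing whitespace.
    static String[] nonBlankLines(String text) {
        String trimmed = text.trim();          // drops whitespace (including \n) at both ends
        if (trimmed.isEmpty()) {
            return new String[0];              // "".split(...) would return one empty string
        }
        return trimmed.split("\\s*\n\\s*");    // a newline plus any whitespace around it
    }

    public static void main(String[] args) {
        String str = "aaaaaaaaa\n\n\nbbbbbbbbbbb\n \n";
        System.out.println(Arrays.toString(nonBlankLines(str)));
    }
}
```

Spaces inside a line are kept, because the pattern only matches whitespace that touches a newline.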
I have following code in my program. It splits a line when a hyphen is encountered and stores each word in the String Array 'tokens'. But I want the hyphen also to be stored in the String Array 'tokens' when it is encountered in a sentence.
String[] tokens = line.split("-");
The above code splits the sentence but also totally ignores the hyphen in the resulting array.
What can I do to store hyphen also in the resulting array?
Edit : -
It seems you want to split on both whitespace and hyphens, but keep only the hyphens in the array (as I infer from your line "stores each word in the String Array"). In that case, you can use this: -
String[] tokens = "abc this is-a hyphen def".split("((?<=-)|(?=-))|\\s+");
System.out.println(Arrays.toString(tokens));
Output: -
[abc, this, is, -, a, hyphen, def]
To handle spaces before and after the hyphen, you can first remove those spaces using the replaceAll method, and then split: -
"abc this is - a hyphen def".replaceAll("[ ]*-[ ]*", "-")
.split("((?<=-)|(?=-))|\\s+");
Previous answer : -
You can use this: -
String[] tokens = "abc-efg".split("((?<=-)|(?=-))");
System.out.println(Arrays.toString(tokens));
OUTPUT : -
[abc, -, efg]
It splits at the zero-width positions immediately before and after the hyphen (-).
I suggest using a regular expression together with Java's Pattern and Matcher classes. Example:
String line = "a-b-c-d-e-f-";
Pattern p = Pattern.compile("[^-]+|-");
Matcher m = p.matcher(line);
while (m.find())
{
String match = m.group();
System.out.println("match:" + match);
}
To test your regular expression, you could use an online regex tester.
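Building on the Pattern/Matcher idea, here is a hedged sketch that also skips whitespace, so words and hyphens come back as separate tokens. It assumes whitespace should be dropped rather than kept; the class name HyphenTokenizer and the method tokenize are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HyphenTokenizer {
    // A token is either a run of characters that are neither '-' nor whitespace,
    // or a single hyphen.
    private static final Pattern TOKEN = Pattern.compile("[^-\\s]+|-");

    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(line);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("abc this is - a hyphen def"));
    }
}
```

Because whitespace never matches either alternative, spaces around the hyphen need no separate replaceAll step.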
I have a string that needs to be split based on the occurrence of a ","(comma), but need to ignore any occurrence of it that comes within a pair of parentheses.
For example, B2B,(A2C,AMM),(BNC,1NF),(106,A01),AAA,AX3
Should be split into
B2B,
(A2C,AMM),
(BNC,1NF),
(106,A01),
AAA,
AX3
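For non-nested parentheses:
,(?![^\(]*\))
For nested parentheses (parentheses inside parentheses):
(?<!\([^\)]*),(?![^\(]*\))
Note that the second pattern relies on a variable-length lookbehind, which not every regex engine accepts.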
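As a sketch of how the non-nested pattern could be used from Java, here it is passed to String.split. The class and method names are made up for illustration, and the pattern is written with a character class, [^(], which is equivalent to the escaped form above.

```java
import java.util.Arrays;

class ParenAwareSplit {
    // Splits on a comma unless a ')' follows before any '(' does,
    // i.e. unless the comma sits inside a (non-nested) pair of parentheses.
    static String[] split(String s) {
        return s.split(",(?![^(]*\\))");
    }

    public static void main(String[] args) {
        String str = "B2B,(A2C,AMM),(BNC,1NF),(106,A01),AAA,AX3";
        System.out.println(Arrays.toString(split(str)));
    }
}
```

This only covers one level of parentheses. A comma that is followed by a nested opening parenthesis is still treated as a split point.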
Try below:
var str = 'B2B,(A2C,AMM),(BNC,1NF),(106,A01),AAA,AX3';
console.log(str.match(/\([^)]*\)|[A-Z\d]+/g));
// gives you ["B2B", "(A2C,AMM)", "(BNC,1NF)", "(106,A01)", "AAA", "AX3"]
Java edition:
String str = "B2B,(A2C,AMM),(BNC,1NF),(106,A01),AAA,AX3";
Pattern p = Pattern.compile("\\([^)]*\\)|[A-Z\\d]+");
Matcher m = p.matcher(str);
List<String> matches = new ArrayList<String>();
while(m.find()){
matches.add(m.group());
}
for (String val : matches) {
System.out.println(val);
}
A simple iteration will probably be a better option than any regex, especially if your data can have parentheses inside parentheses. For example:
String data = "Some,(data,(that),needs),to (be, splited) by, comma";
StringBuilder buffer = new StringBuilder();
int parenthesesCounter = 0;
for (char c : data.toCharArray()) {
    if (c == '(') parenthesesCounter++;
    if (c == ')') parenthesesCounter--;
    if (c == ',' && parenthesesCounter == 0) {
        // top-level comma: the buffer holds a complete token, so use it
        System.out.println(buffer);
        // clear the buffer for the next token
        buffer.setLength(0);
    } else {
        buffer.append(c);
    }
}
// don't forget the part after the last comma
System.out.println(buffer);
output
Some
(data,(that),needs)
to (be, splited) by
comma
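The same loop can be wrapped in a reusable method that returns the tokens instead of printing them. This is only a sketch: the names DepthCounterSplitter and splitTopLevel are hypothetical, and trimming each token is an added assumption that the original loop does not make.

```java
import java.util.ArrayList;
import java.util.List;

class DepthCounterSplitter {
    // Splits on commas that are outside all parentheses, at any nesting depth.
    static List<String> splitTopLevel(String data) {
        List<String> tokens = new ArrayList<>();
        StringBuilder buffer = new StringBuilder();
        int depth = 0; // number of currently open parentheses
        for (char c : data.toCharArray()) {
            if (c == '(') depth++;
            if (c == ')') depth--;
            if (c == ',' && depth == 0) {
                tokens.add(buffer.toString().trim()); // assumption: surrounding spaces are not wanted
                buffer.setLength(0);
            } else {
                buffer.append(c);
            }
        }
        tokens.add(buffer.toString().trim()); // the part after the last comma
        return tokens;
    }

    public static void main(String[] args) {
        String data = "Some,(data,(that),needs),to (be, splited) by, comma";
        splitTopLevel(data).forEach(System.out::println);
    }
}
```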
Try this
\w{3}(?=,)|(?<=,)\(\w{3},\w{3}\)(?=,)|(?<=,)\w{3}
Explanation: There are three parts separated by OR (|):
\w{3}(?=,) - matches any 3 word characters (alphanumeric or underscore) and uses a positive lookahead to require a following comma
(?<=,)\(\w{3},\w{3}\)(?=,) - matches a group of the form (ABC,E4R) and uses a positive lookbehind and lookahead to require a comma on both sides
(?<=,)\w{3} - matches any 3 word characters (alphanumeric or underscore) and uses a positive lookbehind to require a preceding comma
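To show the three alternatives working together, here is a minimal Java sketch that applies the pattern with Pattern and Matcher. It assumes, as the regex does, that every item is exactly three word characters; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FixedWidthTokenMatcher {
    // Alternative 1: a 3-character item before a comma.
    // Alternative 2: a parenthesised pair with a comma on both sides.
    // Alternative 3: a 3-character item after a comma.
    private static final Pattern TOKEN = Pattern.compile(
            "\\w{3}(?=,)|(?<=,)\\(\\w{3},\\w{3}\\)(?=,)|(?<=,)\\w{3}");

    static List<String> findTokens(String s) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(s);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        findTokens("B2B,(A2C,AMM),(BNC,1NF),(106,A01),AAA,AX3").forEach(System.out::println);
    }
}
```

Because the widths are fixed, items of any other length are not matched as a whole, so this approach fits only data shaped like the sample.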