Why the string does not split? - java

While trying to split a string xyz213123kop234430099kpf4532 into tokens :
xyz213123
kop234430099
kpf4532
I wrote the following code
String s = "xyz213123kop234430099kpf4532";
String regex = "/^[a-zA-z]+[0-9]+$/";
String tokens[] = s.split(regex);
for(String t : tokens) {
System.out.println(t);
}
but instead of tokens, I get the whole string as one output. What is wrong with the regular expression I used ?

You can do that:
String s = "xyz213123kop234430099kpf4532";
String[] result = s.split("(?<=[0-9])(?=[a-z])");
The idea is to use zero width assertions to find the place where to cut the string, then I use a lookbehind (preceded by a digit [0-9]) and a lookahead (followed by a letter [a-z]).
These lookarounds are just checks and match nothing, thus the delimiter of the split is an empty string and no characters are removed from the result.

You could split on this matching between a number and not-a-number.
String s = "xyz213123kop234430099kpf4532";
String[] parts = s.split("(?<![^\\d])(?=\\D)");
for (String p : parts) {
System.out.println(p);
}
Output
xyz213123
kop234430099
kpf4532

There's nothing in your string that matches the regular expression, because your expression starts with ^ (beginning of string) and ends with $ (end of string). So it would either match the whole string, or nothing at all. But because it doesn't match the string, it is not found when you split the string into tokens. That's why you get just one big token.

You don't want to use split for that. The argument to split is the delimiter between tokens. You don't have that. Instead, you have a pattern that repeats and you want each match to the pattern. Try this instead:
String s = "xyz213123kop234430099kpf4532";
Pattern p = Pattern.compile("([a-zA-z]+[0-9]+)");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group());
}
Output:
xyz213123
kop234430099
kpf4532
(I don't know by what logic you would have the second token be "3kop234430099" as in your posted question. I assume that the leading "3" is a typo.)

Related

How to match a string between two same delimiters?

some-string-test-moretext.csv
I want to extract the string test, which is always found after the 2nd and 3rd - delimiter.
The expression [-](.*?)[-] would match -string-. So it's probably close, but how can I move on to the next match?
If that matters, I'm using java.
If you know the number of delimiters in advance, you can just split the String.
String[] test = {
"some-string-test-moretext.csv",
"another-string-test-andthensome.csv"
};
for (String s: test) {
System.out.println(s.split("-")[2]);
}
Output
test
test
This should give you quite a good head start:
[^-]+-[^-]+-(.*?)-[^-]+\.csv
https://regex101.com/r/YjWDkv/1
I would propose this, using regex, and very short :
String str = "some-string-test-moretext.csv\n";
Matcher m = Pattern.compile("\\w+-\\w+-(\\w+).*").matcher(str);
String res = m.find() ? m.group(1) : "";
System.out.println(res);
For sureString.split() is another way :
String res = str.split("-")[2];
In sed:
$ echo 'some-string-test-moretext.csv' | sed 's/[^-]*-[^-]*-\([^-]*\)-.*/\1/'
test
[^-]* means "zero or more occurrences of any char except "-". Let's call that "notHyphen". So we're matching on notHyphen-notHyphen-\(notHyphen\)-.* and replacing the whole match with \1, that is, whatever is captured by the \(\).
In Java, you won't need to escape ( to \(, and the technique for extracting from capturing groups is different:
Pattern patt = Pattern.compile("[^-]*-[^-]*-([^-]*)-.*");
Matcher m = patt.matcher(filename);
String extracted = null;
if (m.matches()) {
extracted = m.group(1);
}

Regex including date string, email, number

I have this regex expression:
String patt = "(\\w+?)(:|<|>)(\\w+?),";
Pattern pattern = Pattern.compile(patt);
Matcher matcher = pattern.matcher(search + ",");
I am able to match a string like
search = "firstName:Giorgio"
But I'm not able to match string like
search = "email:giorgio.rossi#libero.it"
or
search = "dataregistrazione:27/10/2016"
How I should modify the regex expression in order to match these strings?
You may use
String pat = "(\\w+)[:<>]([^,]+)"; // Add a , at the end if it is necessary
See the regex demo
Details:
(\w+) - Group 1 capturing 1 or more word chars
[:<>] - one of the chars inside the character class, :, <, or >
([^,]+) - Group 2 capturing 1 or more chars other than , (in the demo, I added \n as the demo input text contains newlines).
You can use regex like this:
public static void main(String[] args) {
String[] arr = new String[]{"firstName:Giorgio", "email:giorgio.rossi#libero.it", "dataregistrazione:27/10/2016"};
String pattern = "(\\w+[:|<|>]\\w+)|(\\w+:\\w+\\.\\w+#\\w+\\.\\w+)|(\\w+:\\d{1,2}/\\d{1,2}/\\d{4})";
for(String str : arr){
if(str.matches(pattern))
System.out.println(str);
}
}
output is:
firstName:Giorgio
email:giorgio.rossi#libero.it
dataregistrazione:27/10/2016
But you have to remember that this regex will work only for your format of data. To make up the universal regex you should use RFC documents and articles (i.e here) about email format. Also this question can be useful.
Hope it helps.
The Character class \w matches [A-Za-z0-9_]. So kindly change the regex as (\\w+?)(:|<|>)(.*), to match any character from : to ,.
Or mention all characters that you can expect i.e. (\\w+?)(:|<|>)[#.\\w\\/]*, .

Split a String based on regex

I have a string that needs to be split based on the occurrence of a ","(comma), but need to ignore any occurrence of it that comes within a pair of parentheses.
For example, B2B,(A2C,AMM),(BNC,1NF),(106,A01),AAA,AX3
Should be split into
B2B,
(A2C,AMM),
(BNC,1NF),
(106,A01),
AAA,
AX3
FOR NON NESTED
,(?![^\(]*\))
FOR NESTED(parenthesis inside parenthesis)
(?<!\([^\)]*),(?![^\(]*\))
Try below:
var str = 'B2B,(A2C,AMM),(BNC,1NF),(106,A01),AAA,AX3';
console.log(str.match(/\([^)]*\)|[A-Z\d]+/g));
// gives you ["B2B", "(A2C,AMM)", "(BNC,1NF)", "(106,A01)", "AAA", "AX3"]
Java edition:
String str = "B2B,(A2C,AMM),(BNC,1NF),(106,A01),AAA,AX3";
Pattern p = Pattern.compile("\\([^)]*\\)|[A-Z\\d]+");
Matcher m = p.matcher(str);
List<String> matches = new ArrayList<String>();
while(m.find()){
matches.add(m.group());
}
for (String val : matches) {
System.out.println(val);
}
One simple iteration will be probably better option then any regex, especially if your data can have parentheses inside parentheses. For example:
String data="Some,(data,(that),needs),to (be, splited) by, comma";
StringBuilder buffer=new StringBuilder();
int parenthesesCounter=0;
for (char c:data.toCharArray()){
if (c=='(') parenthesesCounter++;
if (c==')') parenthesesCounter--;
if (c==',' && parenthesesCounter==0){
//lets do something with this token inside buffer
System.out.println(buffer);
//now we need to clear buffer
buffer.delete(0, buffer.length());
}
else
buffer.append(c);
}
//lets not forget about part after last comma
System.out.println(buffer);
output
Some
(data,(that),needs)
to (be, splited) by
comma
Try this
\w{3}(?=,)|(?<=,)\(\w{3},\w{3}\)(?=,)|(?<=,)\w{3}
Explanation: There are three parts separated by OR (|)
\w{3}(?=,) - matches the 3 any alphanumeric character (including underscore) and does the positive look ahead for comma
(?<=,)\(\w{3},\w{3}\)(?=,) - matches this pattern (ABC,E4R) and also does a positive lookahead and look behind for the comma
(?<=,)\w{3} - matches the 3 any alphanumeric character (including underscore) and does the positive look behind for comma

How to convert a String to String array in Java ( Ignore whitespace and parentheses )

The String will looks like this:
String temp = "IF (COND_ITION) (ACT_ION)";
// Only has one whitespace in either side of the parentheses
or
String temp = " IF (COND_ITION) (ACT_ION) ";
// Have more irrelevant whitespace in the String
// But no whitespace in condition or action
I hope to get a new String array which contains three elemets, ignore the parentheses:
String[] tempArray;
tempArray[0] = IF;
tempArray[1] = COND_ITION;
tempArray[2] = ACT_ION;
I tried to use String.split(regex) method but I don't know how to implement the regex.
If your input string will always be in the format you described, it is better to parse it based on the whole pattern instead of just the delimiter, as this code does:
Pattern pattern = Pattern.compile("(.*?)[/s]\\((.*?)\\)[/s]\\((.*?)\\)");
Matcher matcher = pattern.matcher(inputString);
String tempArray[3];
if(matcher.find()) {
tempArray[0] name = matcher.group(1);
tempArray[1] name = matcher.group(2);
tempArray[2] name = matcher.group(3);
}
Pattern breakdown:
(.*?) IF
[/s] white space
\\((.*?)\\) (COND_ITION)
[/s] white space
\\((.*?)\\) (ACT_ION)
You can use StringTokenizer to split into strings delimited by whitespace. From Java documentation:
The following is one example of the use of the tokenizer. The code:
StringTokenizer st = new StringTokenizer("this is a test");
while (st.hasMoreTokens()) {
System.out.println(st.nextToken());
}
prints the following output:
this
is
a
test
Then write a loop to process the strings to replace the parentheses.
I think you want a regular expression like "\\)? *\\(?", assuming any whitespace inside the parentheses is not to be removed. Note that this doesn't validate that the parentheses match properly. Hope this helps.

Java regex validating special chars

This seems like a well known title, but I am really facing a problem in this.
Here is what I have and what I've done so far.
I have validate input string, these chars are not allowed :
&%$###!~
So I coded it like this:
String REGEX = "^[&%$###!~]";
String username= "jhgjhgjh.#";
Pattern pattern = Pattern.compile(REGEX);
Matcher matcher = pattern.matcher(username);
if (matcher.matches()) {
System.out.println("matched");
}
Change your first line of code like this
String REGEX = "[^&%$##!~]*";
And it should work fine. ^ outside the character class denotes start of line. ^ inside a character class [] means a negation of the characters inside the character class. And, if you don't want to match empty usernames, then use this regex
String REGEX = "[^&%$##!~]+";
i think you want this:
[^&%$###!~]*
To match a valid input:
String REGEX = "[^&%$##!~]*";
To match an invalid input:
String REGEX = ".*[&%$##!~]+.*";

Categories

Resources