Scanner.useDelimiter() regex in Java

I have a string in an inconvenient format. Here is an example:
(Air Fresheners,17)->(Chocolate Chips,14)->(Juice-Frozen,24)
I need to go through this string and extract only the first item in each pair of parentheses. So using the snippet above as input, I would like my code to return:
Air Fresheners
Chocolate Chips
Juice-Frozen
Note that some of the items have - in the name of the item. These should be kept and included in the final output. I was trying to use:
Scanner.useDelimiter(insert regex here)
...but I am not having any luck. Other methods of accomplishing the task are fine, but please keep it relatively simple.

I know this is old and I'm no expert but can't you use replaceAll? As below:
String s = "(Air Fresheners,17)->(Chocolate Chips,14)->(Juice-Frozen,24)".replaceAll("(->)|[\\(\\)]|\\d+","");
for (String str : s.split(","))
{
System.out.println(str);
}

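Try this one:
Use a regex to find each )->( separator, then print the part of each tuple before its comma.
String s = "(Air Fresheners,17)->(Chocolate Chips,14)->(Juice-Frozen,24)";
Pattern regex = Pattern.compile("\\)->\\(");
Matcher regexMatcher = regex.matcher(s);
int i = 0;
while (regexMatcher.find()) {
    // text between the previous "(" and this ")->(", e.g. "Air Fresheners,17"
    String tuple = s.substring(i + 1, regexMatcher.start());
    System.out.println(tuple.substring(0, tuple.lastIndexOf(',')));
    i = regexMatcher.end() - 1;
}
// last tuple: drop the trailing ")"
String last = s.substring(i + 1, s.length() - 1);
System.out.println(last.substring(0, last.lastIndexOf(',')));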
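Or try the String.split() method:
String s = "(Air Fresheners,17)->(Chocolate Chips,14)->(Juice-Frozen,24)";
for (String str : s.substring(1, s.length() - 1).split("\\)->\\(")) {
    // each str looks like "Air Fresheners,17"; keep the part before the last comma
    System.out.println(str.substring(0, str.lastIndexOf(',')));
}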

This can be done with regular expressions. In the pattern below, ([^,)(]*) captures a name that contains no parentheses or commas, ,\\d+\\) matches the ,14) part, and (?:->)? matches an optional -> after the tuple. We use group(1) to get the name (group(0) returns the whole tuple, e.g. (Air Fresheners,17)->).
String str = "(Air Fresheners,17)->(Chocolate Chips,14)->(Juice-Frozen,24)";
List<String> ans = new ArrayList<>();
Matcher m = Pattern.compile("\\(([^,)(]*),\\d+\\)(?:->)?").matcher(str);
while (m.find()) {
    ans.add(m.group(1)); // group 1 is the name only
}
Given (Air Fresheners,17)->(Chocolate Chips,14)->(Juice-Frozen,24), this program returns [Air Fresheners, Chocolate Chips, Juice-Frozen]

(Air Fresheners,17)->(Chocolate Chips,14)->(Juice-Frozen,24)
You could think of it as everything between the ( and the ,
So,
\(.*?\,
would match "(Air Fresheners," (the ? is to make it non-greedy, and stop when it sees a comma)
So if you're keen to use regex, then just match these, and take a substring to get rid of the ( and ,
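A minimal sketch of that idea, assuming the input always has the (name,number) shape from the question. Instead of taking a substring afterwards, it wraps the lazy part in a capture group so the ( and , are excluded directly. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstItems {
    // "(" followed by as few characters as possible (captured), up to the next ","
    private static final Pattern ITEM = Pattern.compile("\\((.*?),");

    static List<String> firstItems(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = ITEM.matcher(input);
        while (m.find()) {
            names.add(m.group(1)); // group 1 excludes the "(" and the ","
        }
        return names;
    }

    public static void main(String[] args) {
        String s = "(Air Fresheners,17)->(Chocolate Chips,14)->(Juice-Frozen,24)";
        for (String name : firstItems(s)) {
            System.out.println(name);
        }
    }
}
```

Names that themselves contain a comma would be cut short by this pattern, so it only fits data like the sample.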

I would first go through the string using )-> as the delimiter. On each scanner.next(), place a second scanner on that token with , as the delimiter, and get rid of the first character (the opening parenthesis) of its first token using substring. In code this would look something like:
Scanner s1 = new Scanner(string).useDelimiter("\\s*\\)->\\s*"); // ")" must be escaped in the regex
while (s1.hasNext())
{
    Scanner s2 = new Scanner(s1.next()).useDelimiter("\\s*,\\s*");
    System.out.println(s2.next().substring(1)); // drop the leading "("
}
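For comparison, here is a sketch that does the whole job with a single Scanner.useDelimiter() call, which is what the question originally asked for. It assumes every tuple ends in ,digits) and that names contain no such sequence. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerItems {
    static List<String> names(String input) {
        List<String> out = new ArrayList<>();
        // Drop the leading "(" so that every separator looks like ",17)->(" or ",24)".
        String body = input.startsWith("(") ? input.substring(1) : input;
        try (Scanner sc = new Scanner(body)) {
            // Delimiter: a comma, the number, ")", and optionally the "->(" opening the next tuple.
            sc.useDelimiter(",\\d+\\)(?:->\\()?");
            while (sc.hasNext()) {
                out.add(sc.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "(Air Fresheners,17)->(Chocolate Chips,14)->(Juice-Frozen,24)";
        for (String name : names(s)) {
            System.out.println(name);
        }
    }
}
```

Because the hyphen in Juice-Frozen is never followed by >, it is not part of any delimiter and stays in the name.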

Related

Java extract only first letters/characters from String

Hello guys I want to extract only first letters from this String:
String str = "使 徒 行 傳 16:31 ERV-ZH";
I only want to get these characters:
使 徒 行 傳
and not include
ERV-ZH
Only the letters or characters before the numbers plus the colon.
Note that the leading characters are not always Chinese; they can also be English or other letters.
this is what I've tried:
str.split(" ")[0];
But I'm only getting the first letter. Do you have an idea how to achieve my requirement? Any help will be appreciated. Thanks.
NOTE:
Also, strings are dynamic so I only presented sample characters.
This should give you the desired output
String str = "使 徒 行 傳 16:31 ERV-ZH";
String[] test = str.split("\\d\\d:\\d\\d");
for (String s : test) {
System.out.println(s);
}
The first element will be the part before the time, the second the part after it.
Edit: if you need to handle times like 6:31 or 16:6 as well, you could use the regex "\\d{1,2}:\\d{1,2}" instead.
You can use the following regex ^([\\D\\s]+), this is what you need:
String str = "使 徒 行 傳 16:31 ERV-ZH";
String pattern = "^([\\D\\s]+)";
Pattern r = Pattern.compile(pattern);
Matcher m = r.matcher(str);
if (m.find()) {
    System.out.println("Found value: " + m.group(0));
} else {
    System.out.println("NO MATCH");
}
This is a live DEMO here.
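In the regex ^([\\D\\s]+):
^ matches only at the beginning of the string.
\\D matches any character that is not a digit, so the match stops at the first number (\\s is redundant here, since whitespace is already covered by \\D).
Note that this holds for any input string, not just the sample.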
If you don't always have a date pattern that can be used as a delimiter in the middle, and are looking for a more generic solution, you could go with this: str.replaceAll("[^\\p{L}\\s]+.*", "")
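A small sketch of that generic approach, wrapped in a helper so it can be tried on several inputs. The helper name is made up, the trailing trim() is an addition to drop the space left before the number, and the sample inputs are ASCII stand-ins for the strings in the question. It assumes single-line input, since . does not match line terminators by default.

```java
public class LeadingText {
    // Removes everything from the first run of characters that are neither letters nor whitespace.
    static String leadingText(String s) {
        return s.replaceAll("[^\\p{L}\\s]+.*", "").trim();
    }

    public static void main(String[] args) {
        System.out.println(leadingText("Acts 16:31 ERV-ZH"));
        System.out.println(leadingText("First Kings 6:3 KJV"));
    }
}
```

Since \p{L} covers letters of any script, the same helper should apply to the Chinese sample as well.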

Regex to split around a word

I am struggling to get the String.split() to do what I would like it to do.
I have an Input of a string of words separated by spaces. Some words have a special function. They look something like this: "special:word".
The input string I am using to test my regex looks like this:
String str = "Hello wonderful special:world what a great special:day";
The result I would like to get from str.split(regex) is an array with the words "world" and "day";
I tried doing it with lookahead (?<=special\:)(\w+) but this splits the string at the words I am looking for. How do I inverse this expression to get the result I am looking for and what exactly do lookaheads and reverse lookaheads do?
Using split in this case would create a few problems:
you would need an overcomplicated regex to match the parts to split on
Hello wonderful special:world what a great special:day
^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^
after the split, the first element would be an empty string "", because split removes trailing empty elements but not leading ones, so your result would be
["", "world", "day"]
To avoid this, use a more intuitive approach: instead of finding everything that is NOT the part you want, find only the part you are interested in. To do this, use the Pattern and Matcher classes. Here is an example of how you can find all your special words:
String str = "Hello wonderful special:world what a great special:day";
Pattern p = Pattern.compile("\\b\\w+:(\\w+)\\b");//word after : will be in group 1
Matcher m = p.matcher(str);
while(m.find()){//this will iterate over all found substrings
//here we can use found substrings
System.out.println(m.group(1));
}
Output:
world
day
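The leading-empty-string behaviour of split mentioned above is easy to check in isolation. This sketch uses a plain comma-separated string rather than the special:word input, and the helper name is made up for illustration.

```java
import java.util.Arrays;

public class SplitEdges {
    // limit 0 behaves like split(regex); a negative limit keeps trailing empty strings.
    static String show(String s, int limit) {
        return Arrays.toString(s.split(",", limit));
    }

    public static void main(String[] args) {
        String csv = ",a,b,,";
        System.out.println(show(csv, 0));   // leading "" kept, trailing "" dropped
        System.out.println(show(csv, -1));  // all empty strings kept
    }
}
```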
Use Pattern and Matcher. A simple example:
public static ArrayList<String> parseOut(String s)
{
ArrayList<String> list = new ArrayList<String>();
String regex = "([:])(\\w+)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
list.add(matcher.group().substring(1));
}
return list;
}
Try Pattern and Matcher
String searchPattern = "Hello wonderful special:world what a great special:day";
Pattern pa = Pattern.compile(":[a-zA-Z0-9]+");
Matcher ma = pa.matcher(searchPattern);
while(ma.find()){
System.out.println(ma.group().replaceFirst(":",""));
}
output:
world
day
By using split() we can do as:
String searchPattern1 = "Hello wonderful special:world what a great special:day";
for (String i : searchPattern1.split("\\s")) {
    if (i.contains(":")) {
        System.out.println(i.split(":")[1]); // the part after the colon
    }
}
Here also we get the same output as above.
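On the lookaround part of the question: (?<=special:) is a lookbehind, which only asserts that special: sits immediately before the current position without consuming it; a lookahead (?=...) does the same for the text that follows. That is why passing it to split() cuts the string at the wanted words. The same pattern works as intended with Matcher.find(), as in this sketch (class and method names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpecialWords {
    // Lookbehind: the match must be preceded by "special:", but the prefix is not part of the match.
    private static final Pattern SPECIAL = Pattern.compile("(?<=special:)\\w+");

    static List<String> specialWords(String s) {
        List<String> words = new ArrayList<>();
        Matcher m = SPECIAL.matcher(s);
        while (m.find()) {
            words.add(m.group()); // whole match is just the word after the colon
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(specialWords("Hello wonderful special:world what a great special:day"));
    }
}
```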

Why the string does not split?

While trying to split a string xyz213123kop234430099kpf4532 into tokens :
xyz213123
kop234430099
kpf4532
I wrote the following code
String s = "xyz213123kop234430099kpf4532";
String regex = "/^[a-zA-z]+[0-9]+$/";
String tokens[] = s.split(regex);
for(String t : tokens) {
System.out.println(t);
}
but instead of tokens, I get the whole string as one output. What is wrong with the regular expression I used ?
You can do that:
String s = "xyz213123kop234430099kpf4532";
String[] result = s.split("(?<=[0-9])(?=[a-z])");
The idea is to use zero width assertions to find the place where to cut the string, then I use a lookbehind (preceded by a digit [0-9]) and a lookahead (followed by a letter [a-z]).
These lookarounds are just checks and match nothing, thus the delimiter of the split is an empty string and no characters are removed from the result.
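You could split at every position that is preceded by a digit (or by the start of the string) and followed by a non-digit. On Java 8 and later, the zero-width match at the very start does not produce a leading empty string.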
String s = "xyz213123kop234430099kpf4532";
String[] parts = s.split("(?<![^\\d])(?=\\D)");
for (String p : parts) {
System.out.println(p);
}
Output
xyz213123
kop234430099
kpf4532
There's nothing in your string that matches the regular expression, because your expression starts with ^ (beginning of string) and ends with $ (end of string), so it could only ever match the whole string or nothing at all. (In Java the surrounding / characters are also taken literally; they are not regex delimiters as in JavaScript or Perl.) Because the expression never matches, split finds no delimiter, and that's why you get just one big token.
You don't want to use split for that. The argument to split is the delimiter between tokens. You don't have that. Instead, you have a pattern that repeats and you want each match to the pattern. Try this instead:
String s = "xyz213123kop234430099kpf4532";
Pattern p = Pattern.compile("([a-zA-Z]+[0-9]+)"); // A-Z, not A-z: the range A-z also covers [ \ ] ^ _ and `
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group());
}
Output:
xyz213123
kop234430099
kpf4532
(I don't know by what logic you would have the second token be "3kop234430099" as in your posted question. I assume that the leading "3" is a typo.)

Problems with building this regex [1,2,3]

I have a problem building a regex for the following string:
[1,2,3,4]
I found a workaround, but I think it's ugly:
String stringIds = "[1,2,3,4]";
stringIds = stringIds.replaceAll("\\[", "");
stringIds = stringIds.replaceAll("\\]", "");
String[] ids = stringIds.split("\\,");
Can someone please help me build one regex that I can use in the split function?
Thanks for the help.
Edit:
I want to get from the string "[1,2,3,4]" to an array with 4 entries. The entries are the 4 numbers in the string, so I need to eliminate "[", "]" and ",". The "," isn't the problem.
The first and last numbers still contain [ or ], which is why I needed the fix with replaceAll. But I think that if I can pass split a regex for ",", I can also pass a regex that eliminates "[" and "]" too. I just can't figure out what this regex should look like.
This is almost what you're looking for:
String q = "[1,2,3,4]";
String[] x = q.split("\\[|\\]|,");
The problem is that it produces an extra element at the beginning of the array due to the leading open bracket. You may not be able to do what you want with a single regex sans shenanigans. If you know the string always begins with an open bracket, you can remove it first.
The regex itself means "(split on) any open bracket, OR any closed bracket, OR any comma."
Punctuation characters frequently have additional meanings in regular expressions. The double leading backslashes... ugh, the first backslash tells the Java String parser that the next backslash is not a special character (example: \n is a newline...) so \\ means "I want an honest to God backslash". The next backslash tells the regexp engine that the next character ([ for example) is not a special regexp character. That makes me lol.
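The two layers of escaping can be seen by looking at the string the regex engine actually receives. A short sketch (the class and helper names are made up; Pattern.quote is the standard-library way to avoid hand-escaping a literal):

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class EscapeLayers {
    // The Java literal "\\[" is the two-character string \[ which the regex engine reads as a literal "[".
    static String[] splitOnBracket(String s) {
        return s.split("\\[");
    }

    // Pattern.quote builds a regex that matches its argument literally.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("["));
    }

    public static void main(String[] args) {
        String regex = "\\[";
        System.out.println(regex.length()); // number of characters after the Java compiler's unescaping
        System.out.println(regex);          // what the regex engine sees
        System.out.println(Arrays.toString(splitOnBracket("a[b")));
        System.out.println(Arrays.toString(splitQuoted("a[b")));
    }
}
```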
Maybe substring [ and ] from beginning and end, then split the rest by ,
String stringIds = "[1,2,3,4]";
String[] ids = stringIds.substring(1,stringIds.length()-1).split(",");
Looks to me like you're trying to make an array (not sure where you got 'regex' from; that means something different). In this case, you want:
String[] ids = {"1","2","3","4"};
If it's specifically an array of integer numbers you want, then instead use:
int[] ids = {1,2,3,4};
Your problem is not amenable to splitting by delimiter. It is much safer and more general to split by matching the integers themselves:
static String[] nums(String in) {
final Matcher m = Pattern.compile("\\d+").matcher(in);
final List<String> l = new ArrayList<String>();
while (m.find()) l.add(m.group());
return l.toArray(new String[l.size()]);
}
public static void main(String args[]) {
System.out.println(Arrays.toString(nums("[1, 2, 3, 4]")));
}
If the first line of your code is the following:
String stringIds = "[1,2,3,4]";
and you're trying to iterate over all the number items, then a code fragment like the following should work:
try {
    Pattern regex = Pattern.compile("\\b(\\d+)\\b", Pattern.MULTILINE);
    Matcher regexMatcher = regex.matcher(stringIds);
    while (regexMatcher.find()) {
        for (int i = 1; i <= regexMatcher.groupCount(); i++) {
            // matched text: regexMatcher.group(i)
            // match start: regexMatcher.start(i)
            // match end: regexMatcher.end(i)
        }
    }
} catch (PatternSyntaxException ex) {
    // Syntax error in the regular expression
}

Escape comma when using String.split

I'm trying to perform some super simple parsing of log files, so I'm using the String.split method like this:
String [] parts = input.split(",");
And it works great for input like:
a,b,c
Or
type=simple, output=Hello, repeat=true
Just to say something.
How can I escape the comma, so it doesn't match intermediate commas?
For instance, if I want to include a comma in one of the parts:
type=simple, output=Hello, world, repeate=true
I was thinking of something like:
type=simple, output=Hello\, world, repeate=true
But I don't know how to create the split to avoid matching the comma.
I've tried:
String [] parts = input.split("[^\,],");
But, well, it is not working.
You can solve it using a negative look behind.
String[] parts = str.split("(?<!\\\\), ");
Basically it says: split on each ", " that is not preceded by a backslash.
String str = "type=simple, output=Hello\\, world, repeate=true";
String[] parts = str.split("(?<!\\\\), ");
for (String s : parts)
System.out.println(s);
Output:
type=simple
output=Hello\, world
repeate=true
(ideone.com link)
If you happen to be stuck with the non-escaped comma-separated values, you could do the following (similar) hack:
String[] parts = str.split(", (?=\\w+=)");
Which says split on each ", " which is followed by some word-characters and an =
(ideone.com link)
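Building on the lookahead split above, the parts can be turned into a key/value map. This is only a sketch under the assumption that every field has the key=value shape and that keys consist of word characters; the class and method names are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyValueLine {
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>(); // keeps the fields in input order
        // Split only on ", " that is followed by word characters and "=".
        for (String part : line.split(", (?=\\w+=)")) {
            String[] kv = part.split("=", 2); // limit 2 keeps any "=" inside the value
            fields.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = parse("type=simple, output=Hello, world, repeate=true");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

A value that itself contains something like ", word=" would still be split, so this remains a heuristic rather than real escaping.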
I'm afraid, there's no perfect solution for String.split. Using a matcher for the three parts would work. In case the number of parts is not constant, I'd recommend a loop with matcher.find. Something like this maybe
final String s = "type=simple, output=Hello\\, world, repeat=true";
final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(?:,|$)");
final Matcher m = p.matcher(s);
while (m.find()) System.out.println(m.group(1));
You'll probably want to skip the spaces after the comma as well:
final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(?:,\\s*|$)");
It's not really complicated, just note that you need four backslashes in order to match one.
Escaping works with the opposite of aioobe's answer (updated: aioobe now uses the same construct but I didn't know that when I wrote this), negative lookbehind
final String s = "type=simple, output=Hello\\, world, repeate=true";
final String[] tokens = s.split("(?<!\\\\),\\s*");
for(final String item : tokens){
System.out.println("'" + item.replace("\\,", ",") + "'");
}
Output:
'type=simple'
'output=Hello, world'
'repeate=true'
Reference:
Pattern: Special Constructs
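I think
input.split("(?<!\\\\),");
should work. It will split at all commas that are not preceded by a backslash. (A character class such as [^\\\\], is not enough, because it also consumes the character before each comma and so cuts the last character off every part.)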
BTW if you are working with Eclipse, I can recommend the QuickRex Plugin to test and debug Regexes.
