I am struggling to get the String.split() to do what I would like it to do.
I have an Input of a string of words separated by spaces. Some words have a special function. They look something like this: "special:word".
The input string I am using to test my regex looks like this:
String str = "Hello wonderful special:world what a great special:day";
The result I would like to get from str.split(regex) is an array with the words "world" and "day";
I tried doing it with lookahead (?<=special\:)(\w+) but this splits the string at the words I am looking for. How do I inverse this expression to get the result I am looking for and what exactly do lookaheads and reverse lookaheads do?
Using split in this case would create a few problems:
you would need an overcomplicated regex to match the parts that you want to split on
Hello wonderful special:world what a great special:day
^^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^
after the split, your first element would be the empty string "" because split removes trailing empty elements but not leading ones, so your result would be
["", "world", "day"]
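To see the leading empty string for yourself, here is a minimal sketch. The delimiter regex `(^|\\s)[^:]*:` is only my assumption of what such a "split on everything else" pattern could look like for this particular input; it is not a general solution.

```java
import java.util.Arrays;

final class SplitLeadingEmptyDemo {
    // Hypothetical delimiter: from the start of the input (or a whitespace
    // character) up to and including the next colon.
    static String[] splitOnEverythingElse(String input) {
        return input.split("(^|\\s)[^:]*:");
    }

    public static void main(String[] args) {
        String str = "Hello wonderful special:world what a great special:day";
        // The first delimiter match starts at index 0, so an empty
        // string ends up in front of "world".
        System.out.println(Arrays.toString(splitOnEverythingElse(str)));
    }
}
```

The array has three elements, and the first one is empty, which is exactly the nuisance described above.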
To avoid this, use a more intuitive approach: instead of finding everything that is NOT the part you want, find only the part you are interested in. To do this, use the Pattern and Matcher classes. Here is an example of how you can find all your special words:
String str = "Hello wonderful special:world what a great special:day";
Pattern p = Pattern.compile("\\b\\w+:(\\w+)\\b");//word after : will be in group 1
Matcher m = p.matcher(str);
while(m.find()){//this will iterate over all found substrings
//here we can use found substrings
System.out.println(m.group(1));
}
Output:
world
day
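As for what the look-around in the question does: `(?<=special:)` is a look-behind. It only checks that "special:" sits directly before the current position and does not consume it. Combined with Matcher.find() instead of split, it gives the wanted words directly. A small sketch (the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SpecialWords {
    // Look-behind: "special:" must precede the match but is not part of it.
    private static final Pattern AFTER_PREFIX = Pattern.compile("(?<=special:)\\w+");

    static List<String> find(String input) {
        List<String> words = new ArrayList<>();
        Matcher m = AFTER_PREFIX.matcher(input);
        while (m.find()) {
            words.add(m.group()); // the whole match is already just the word
        }
        return words;
    }
}
```

A look-ahead `(?=...)` works the same way, except that it checks the text following the current position.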
Use Pattern and Matcher. A simple example:
public static ArrayList<String> parseOut(String s)
{
ArrayList<String> list = new ArrayList<String>();
String regex = "([:])(\\w+)";
Pattern pattern = Pattern.compile(regex);
Matcher matcher = pattern.matcher(s);
while (matcher.find()){
list.add(matcher.group().substring(1));
}
return list;
}
Try Pattern and Matcher
String searchPattern = "Hello wonderful special:world what a great special:day";
Pattern pa = Pattern.compile(":[a-zA-Z0-9]+");
Matcher ma = pa.matcher(searchPattern);
while(ma.find()){
System.out.println(ma.group().replaceFirst(":",""));
}
output:
world
day
By using split() we can do as:
String searchPattern1 = "Hello wonderful special:world what a great special:day";
for(String i:searchPattern1.split("\\s")){
if(i.contains(":")){
System.out.println(i.split(":")[1]);
}
}
Here also we get the same output as above.
I am far from mastering regular expressions but I would like to split a string on first and last underscore e.g.
split the string on first and last underscore with regular expression
"hello_5_9_2018_world"
to
"hello"
"5_9_2018"
"world"
I can split it on the last underscore with
String[] splitArray = subjectString.split("_(?=[^_]*$)");
but I am not able to figure out how to split on first underscore.
Could anyone show me how I can do this?
Thanks
David
You can achieve this without regex by finding the first and last index of _ and taking substrings based on them.
String s = "hello_5_9_2018_world";
int firstIndex = s.indexOf("_");
int lastIndex = s.lastIndexOf("_");
System.out.println(s.substring(0, firstIndex));
System.out.println(s.substring(firstIndex + 1, lastIndex));
System.out.println(s.substring(lastIndex + 1));
The above prints
hello
5_9_2018
world
Note:
If the string does not have two _ you will get a StringIndexOutOfBoundsException.
To safeguard against it, you can check if the extracted indices are valid.
If firstIndex == lastIndex == -1 then it means the string does
not have any underscores.
If firstIndex == lastIndex then the string has just one underscore.
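The index checks from the note can be folded into a small helper. This is only a sketch: the class name and the choice to return null for invalid input are my assumptions, and you may prefer to throw an exception instead.

```java
final class UnderscoreSplit {
    // Returns {first, middle, last}, or null when the string
    // contains fewer than two underscores.
    static String[] splitFirstLast(String s) {
        int firstIndex = s.indexOf('_');
        int lastIndex = s.lastIndexOf('_');
        // -1 means no underscore at all; equal indices mean exactly one.
        if (firstIndex == -1 || firstIndex == lastIndex) {
            return null;
        }
        return new String[] {
            s.substring(0, firstIndex),
            s.substring(firstIndex + 1, lastIndex),
            s.substring(lastIndex + 1)
        };
    }
}
```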
If you have always three parts as above, you can use
([^_]*)_(.*)_([^_]*)
and get the single elements as groups.
Regular Expression
(?<first>[^_]+)_(?<middle>.+)_(?<last>[^_]+)
Demo
Java Code
final String str = "hello_5_9_2018_world";
Pattern pattern = Pattern.compile("(?<first>[^_]+)_(?<middle>.+)_(?<last>[^_]+)");
Matcher matcher = pattern.matcher(str);
if(matcher.matches()) {
String first = matcher.group("first");
String middle = matcher.group("middle");
String last = matcher.group("last");
}
I see that a lot of guys provided their solution, but I have another regex pattern for your question
You can achieve your goal with this pattern:
"([a-zA-Z]+)_(.*)_([a-zA-Z]+)"
The whole code looks like this:
String subjectString= "hello_5_9_2018_world";
Pattern pattern = Pattern.compile("([a-zA-Z]+)_(.*)_([a-zA-Z]+)");
Matcher matcher = pattern.matcher(subjectString);
if(matcher.matches()){
System.out.println(matcher.group(1));
System.out.println(matcher.group(2));
System.out.println(matcher.group(3));
}
It outputs:
hello
5_9_2018
world
While the other answers are actually nicer and better, if you really want to use split, this is the way to go:
"hello_5_9_2018_world".split("((?<=^[^_]*)_)|(_(?=[^_]*$))")
==> String[3] { "hello", "5_9_2018", "world" }
This is a combination of your lookahead pattern (_(?=[^_]*$))
and the symmetrical look-behind pattern ((?<=^[^_]*)_),
which matches the _ preceded by ^ (start of the string) and [^_]* (0..n non-underscore chars).
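If the look-behind gives you trouble (as far as I know, older Java versions reject look-behind groups without an obvious maximum length), a two-step split avoids it: split once on the first underscore using a limit of 2, then split the remainder on the last underscore with the lookahead from the question. A sketch with hypothetical names:

```java
final class TwoStepSplit {
    // Splits on the first and on the last underscore only.
    // Falls back to fewer parts when there are fewer than two underscores.
    static String[] split(String s) {
        String[] head = s.split("_", 2);                 // at most 2 pieces
        if (head.length < 2) {
            return new String[] { s };                    // no underscore
        }
        String[] tail = head[1].split("_(?=[^_]*$)", -1); // last underscore
        if (tail.length < 2) {
            return head;                                  // only one underscore
        }
        return new String[] { head[0], tail[0], tail[1] };
    }
}
```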
I'm trying to come up with a regex pattern for this but to no avail. Here are some examples of what I need. [] represents an array as output.
Input
Hello $World
Output
[$World]
Input
My name is $John Smith and I like $pancakes
Output
[$John, $pancakes]
I managed to come up with this, it matches the pattern but doesn't keep the words it finds.
String test = "My name $is John $Smith";
String[] testSplit = test.split("(\\$\\S+)");
System.out.println(testSplit);
Output
[My name , John ]
As you can see, it's completely swallowing the words I need, more specifically, the words that match the pattern. How can I have it return an array with only the words I need? (as shown in the examples)
split takes a regex, and specifically splits the string around that regex, so that what it splits on is not retained in the output. If you want what it found to split around, you should use the Matcher class, for example:
String line = "My name $is John $Smith";
Pattern pattern = Pattern.compile("(\\$\\S+)");
Matcher matcher = pattern.matcher(line);
while (matcher.find()) {
System.out.println(matcher.group(1));
}
This will find all the matches of a pattern in a String and print them out. These are the same strings that split will use to divide up a string.
split just uses your pattern to separate strings. If you want to return the matched string, try something like this:
String test = "My name $is John $Smith";
Pattern patt = Pattern.compile("(\\$\\S+)");
Matcher matcher = patt.matcher(test);
while (matcher.find()) {
System.out.println(matcher.group());
}
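If you need the matches as an array rather than printed, you can collect them in a list first. A minimal sketch (class and method names are made up); note that \S+ also picks up punctuation directly attached to a word, such as a trailing comma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DollarWords {
    private static final Pattern DOLLAR_WORD = Pattern.compile("\\$\\S+");

    // Returns every "$word" token in the order it appears in the text.
    static String[] extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = DOLLAR_WORD.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found.toArray(new String[0]);
    }
}
```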
I have a string in an inconvenient format. Here is an example:
(Air Fresheners,17)->(Chocolate Chips,14)->(Juice-Frozen,24)
I need to go through this string and extract only the first items in the parenthesis. So using the snippet from above as input, I would like my code to return:
Air Fresheners
Chocolate Chips
Juice-Frozen
Note that some of the items have - in the name of the item. These should be kept and included in the final output. I was trying to use:
Scanner.useDelimiter(insert regex here)
...but I am not having any luck. Other methods of accomplishing the task are fine, but please keep it relatively simple.
I know this is old and I'm no expert but can't you use replaceAll? As below:
String s = "(Air Fresheners,17)->(Chocolate Chips,14)->(Juice-Frozen,24)".replaceAll("(->)|[\\(\\)]|\\d+","");
for (String str : s.split(","))
{
System.out.println(str);
}
Try this one
Use regex to split on the basis of )->(
String s="(Air Fresheners,17)->(Chocolate Chips,14)->(Juice-Frozen,24)";
Pattern regex = Pattern.compile("\\)->\\(");
Matcher regexMatcher = regex.matcher(s);
int i=0;
while (regexMatcher.find()) {
System.out.println(s.substring(i+1,regexMatcher.start()).split(",")[0]); // keep only the name before the comma
i=regexMatcher.end()-1;
}
System.out.println(s.substring(i+1,s.length()-1).split(",")[0]);
Try String.split() method
String s = "(Air Fresheners,17)->(Chocolate Chips,14)->(Juice-Frozen,24)";
for (String str : s.substring(1, s.length() - 1).split("\\)->\\(")) {
System.out.println(str.split(",")[0]); // keep only the name before the comma
}
This can be done with regular expressions. ([^,)(]*) matches any name that does not contain parentheses or commas, ,\\d+\\) matches the ,14) part, and (?:->)? matches a possible -> after the tuple. We use group(1) to get the name (group(0) returns the whole tuple, e.g. (Air Fresheners,17)->).
String str = "(Air Fresheners,17)->(Chocolate Chips,14)->(Juice-Frozen,24)";
List<String> ans = new ArrayList<>();
Matcher m = Pattern.compile("\\(([^,)(]*),\\d+\\)(?:->)?").matcher(str);
while (m.find()) {
    ans.add(m.group(1)); // group 1 is the item name
}
Given (Air Fresheners,17)->(Chocolate Chips,14)->(Juice-Frozen,24), this program returns [Air Fresheners, Chocolate Chips, Juice-Frozen]
(Air Fresheners,17)->(Chocolate Chips,14)->(Juice-Frozen,24)
You could think of it as everything between the ( and the ,
So,
\(.*?\,
would match "(Air Fresheners," (the ? is to make it non-greedy, and stop when it sees a comma)
So if you're keen to use regex, then just match these, and take a substring to get rid of the ( and ,
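In Java that idea could look roughly like the sketch below. The class and method names are invented for the example, and it assumes item names never contain a comma.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ItemNames {
    // Lazily matches from "(" up to the first following ",".
    private static final Pattern OPEN_TO_COMMA = Pattern.compile("\\(.*?,");

    static List<String> names(String input) {
        List<String> names = new ArrayList<>();
        Matcher m = OPEN_TO_COMMA.matcher(input);
        while (m.find()) {
            String hit = m.group();                          // e.g. "(Air Fresheners,"
            names.add(hit.substring(1, hit.length() - 1));   // drop "(" and ","
        }
        return names;
    }
}
```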
I would first go through using )-> as the delimiter. On each scanner.next() get rid of the first character (the parenthesis) using substring, and then place a second scanner on that string that uses , as the delimiter. In code this would look something like:
Scanner s1 = new Scanner(string).useDelimiter("\\s*\\)->\\s*");
while (s1.hasNext())
{
    Scanner s2 = new Scanner(s1.next()).useDelimiter("\\s*,\\s*");
    System.out.println(s2.next().substring(1)); // drop the leading "("
}
I know that this question can be stupid but I am trying to get some information from text and you are my last hope after last three hours of trying..
DIC: C/40764176 IC: 407641'6
Dekujerne a t8ime se na shledanou
I need to get for example this 40764176
I need to get string with 8-10 length, sometimes there can be some special chars like I,i,G,S,O,ó,l) but I have tried a lot of patterns for this and no one works...
I tried:
String generalDicFormatPattern = "([0-9IiGSOól]{8,10})";
String generalDicFormatPattern = ".*([0-9IiGSOól]{8,10}).*";
String generalDicFormatPattern = "\\b([0-9IiGSOól]{8,10})\\b";
nothing works... do you know where is the problem?
edit:
I use regex in this way:
private List<String> getGeneralDicFromLine(String concreteLine) {
List<String> allMatches = new ArrayList<String>();
Pattern pattern = Pattern.compile(generalDicFormatPattern);
Matcher matcher = pattern.matcher(concreteLine);
while (matcher.find()) {
allMatches.add(matcher.group(1));
}
return allMatches;
}
If your string's pattern is fixed you can use the regex
C/([^\s]{8,10})\sIC:
Sample code:
String s = "DIC: C/40764176 IC: 407641'6";
Pattern p = Pattern.compile("C/([^\\s]{8,10})\\sIC:");
Matcher m = p.matcher(s);
if (m.find()) {
System.out.println(m.group(1)); // 40764176
}
The group accepts any character except whitespace, which includes the special ones you've shown in your examples.
Maybe you can split your string on whitespace (string.split("\\s")); then you should have an array like this:
DIC:
C/40764176
IC:
407641'6
...
shledanou
Get the second string, split it on "/", and take the second element.
I hope this helps.
Tip: you can then validate the result with a regex such as ([0-9IiGSOól]{8,10}).
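Put together, the split-then-validate idea might look like this sketch. The class name, the method name, and the null return for "nothing found" are my assumptions; the character class is the one from the question, with ó written as \u00f3 so the source file encoding does not matter.

```java
import java.util.regex.Pattern;

final class DicExtractor {
    // 8 to 10 digits or digit look-alikes, as listed in the question.
    private static final Pattern DIC = Pattern.compile("[0-9IiGSO\u00f3l]{8,10}");

    // Returns the text after '/' in the first whitespace-separated token
    // that contains a slash and passes validation, or null if there is none.
    static String extract(String line) {
        for (String token : line.split("\\s+")) {
            int slash = token.indexOf('/');
            if (slash >= 0) {
                String candidate = token.substring(slash + 1);
                if (DIC.matcher(candidate).matches()) {
                    return candidate;
                }
            }
        }
        return null;
    }
}
```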
I am trying to get an array of strings, from a lengthy string. Array consist of strings matching between two other strings (??? and ??? in my case). I tried the following code and it's not giving me the expected results
Pattern pattern = Pattern.compile("\\?\\?\\?(.*?)\\?\\?\\?");
String[] arrayOfKeys = pattern.split("???label.missing???sdfjkhsjkdf sjkdghfjksdg ???some.label???sdjkhsdj");
for (String key : arrayOfKeys) {
System.out.println(key);
}
My expected result is:
["label.missing", "some.label"]
Use Pattern.matcher() to obtain a Matcher for the input string, then use Matcher.find() to find the pattern you want. Matcher.find() will find substring(s) that matches the Pattern provided.
Pattern pattern = Pattern.compile("\\?{3}(.*?)\\?{3}");
Matcher m = pattern.matcher(inputString);
while (m.find()) {
System.out.println(m.group(1));
}
Pattern.split() will use your pattern as delimiter to split the string (then the delimiter part is discarded), which is obviously not what you want in this case. Your regex is designed to match the text that you want to extract.
I shortened the pattern by using the quantifier {3} (repeat exactly 3 times) instead of writing \? three times.
I would create a string input with what you're trying to split, and call input.split() on it.
String input = "???label.missing???sdfjkhsjkdf sjkdghfjksdg ???some.label???sdjkhsdj";
String[] split = input.split("\\?\\?\\?");
Try it here:
http://ideone.com/VAmCyu
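Keep in mind that a plain split on ??? returns the text outside the markers as well (plus a leading empty string, since the input starts with ???). If the markers always come in pairs, the wanted keys sit at the odd indices. A sketch under that assumption, with made-up names:

```java
import java.util.ArrayList;
import java.util.List;

final class BetweenMarkers {
    // Assumes every opening ??? has a matching closing ???.
    static List<String> keys(String input) {
        String[] parts = input.split("\\?\\?\\?");
        List<String> keys = new ArrayList<>();
        // Even indices hold text outside the markers, odd indices text inside.
        for (int i = 1; i < parts.length; i += 2) {
            keys.add(parts[i]);
        }
        return keys;
    }
}
```

With unbalanced markers this picks up the wrong pieces, so the Matcher-based approach is the safer choice.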
Pattern pattern = Pattern.compile("\\?{3}(.+?)\\?{3}");
Matcher matcher= pattern.matcher("???label.missing???sdfjkhsjkdf sjkdghfjksdg ???some.label???sdjkhsdj");
List<String> aList = new ArrayList<String>();
while(matcher.find()) {
aList.add(matcher.group(1));
}
for (String key : aList) {
System.out.println(key);
}