I need to use a regular expression to get some values from a String, but I find it quite complicated.
For example, I have a string like this:
oneWord [first, second, third]
My output should be:
first
second
third
So I need the words that are between [ and ]. The number of words between the brackets can vary.
I tried using a regex generator, but the result wasn't very accurate:
String re1=".*?"; // Non-greedy match on filler
String re2="(?:[a-z][a-z]+)"; // Uninteresting: word
String re3=".*?"; // Non-greedy match on filler
String re4="((?:[a-z][a-z]+))"; // Word 1
String re5=".*?"; // Non-greedy match on filler
String re6="((?:[a-z][a-z]+))"; // Word 2
String re7=".*?"; // Non-greedy match on filler
String re8="((?:[a-z][a-z]+))"; // Word 3
I would do it like this, in just one line:
String[] words = str.replaceAll(".*\\[|\\].*", "").split(", ");
The replaceAll() call strips off the leading and trailing wrapper, and the split() call breaks up what's left into separate words.
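If the input might have no brackets at all, or uneven spacing around the commas, the same idea can be wrapped in a small helper. This is only a sketch: the class name BracketWords and the method extract are made up for illustration, and it assumes only the first bracketed list in the string matters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketWords {
    // Captures whatever sits between the first '[' and the next ']'.
    private static final Pattern BRACKETS = Pattern.compile("\\[([^\\]]*)\\]");

    // Hypothetical helper: returns the comma-separated words inside the brackets.
    static List<String> extract(String input) {
        Matcher m = BRACKETS.matcher(input);
        if (!m.find()) {
            return new ArrayList<>(); // no bracketed list present
        }
        String inner = m.group(1).trim();
        if (inner.isEmpty()) {
            return new ArrayList<>(); // "[]" holds no words
        }
        // Split on a comma plus any whitespace around it.
        return Arrays.asList(inner.split("\\s*,\\s*"));
    }

    public static void main(String[] args) {
        for (String word : extract("oneWord [first, second, third]")) {
            System.out.println(word);
        }
    }
}
```

Returning an empty list for input without brackets is a design choice of this sketch; the one-liner above would instead return the whole string as a single element.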
You could try the below regex and get the words you want from group index 1.
(?:\[|(?<!^)\G),? *(\w+)(?=[^\[\]]*\])
Java regex would be,
(?:\\[|(?<!^)\\G),? *(\\w+)(?=[^\\[\\]]*\\])
Example:
String s = "oneWord [first, second, third] foo bar [foobar]";
Pattern regex = Pattern.compile("(?:\\[|(?<!^)\\G),? *(\\w+)(?=[^\\[\\]]*\\])");
Matcher matcher = regex.matcher(s);
while(matcher.find()){
System.out.println(matcher.group(1));
}
Output:
first
second
third
foobar
You should use this expression:
String[] words = str.replaceAll(".*\\[|\\].*", "").split(", ");
Hope it helps.
You can do it easily with the split method, once the surrounding brackets have been stripped:
String string = "first, second, third";
String[] parts = string.split(",\\s*");
String part1 = parts[0]; // first
String part2 = parts[1]; // second
String part3 = parts[2]; // third
If it doesn't work for you, please tell me and I will debug your regular expression.
I have these Strings:
"Turtle123456_fly.me"
"birdy_12345678_prd.tr"
I want the first word of each, i.e.:
Turtle
birdy
I tried this:
Pattern p = Pattern.compile("//d");
String[] items = p.split(String);
but of course it's wrong. I am not familiar with using Pattern.
Replace the stuff you don't want with nothing:
String firstWord = str.replaceAll("[^a-zA-Z].*", "");
to leave only the part you want.
The regex [^a-zA-Z].* means "the first non-letter and whatever follows it", so everything from (and including) the first non-letter to the end is removed.
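A minimal runnable sketch of this replacement applied to both sample strings. The class name FirstWord is made up for illustration, and the sketch assumes single-line input, since . does not match line terminators by default.

```java
class FirstWord {
    // Deletes everything from the first non-letter to the end of the string.
    static String firstWord(String s) {
        return s.replaceAll("[^a-zA-Z].*", "");
    }

    public static void main(String[] args) {
        System.out.println(firstWord("Turtle123456_fly.me"));   // prints Turtle
        System.out.println(firstWord("birdy_12345678_prd.tr")); // prints birdy
    }
}
```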
String s1 ="Turtle123456_fly.me";
String s2 ="birdy_12345678_prd.tr";
Pattern p = Pattern.compile("^([A-Za-z]+)[^A-Za-z]");
Matcher matcher = p.matcher(s1);
if (matcher.find()) {
System.out.println(matcher.group(1));
}
Explanation:
The first part ^([A-Za-z]+) is a group that captures all the letters anchored to the beginning of the input (using the ^ anchor).
The second part [^A-Za-z] matches (without capturing) the first non-letter, and serves as a terminator for the letter sequence.
Then all we have left to do is to fetch the group with index 1 (group 1 is what we have in the first parenthesis).
Maybe you should try this: \d+\w+.*
I have a string "'GLO', FLO". I want a regular expression that checks each word in the string and:
- if a word begins and ends with a single quote, replaces the single quotes with spaces
- if a comma is encountered between words, separates the two words with a space.
So in the end I should get GLO FLO.
How can I do this using the replaceAll() method on the string?
This regex didn't do it for me : "'([^' ]+)|\\s+'"
public static void displaySplitString(final String str) {
String pattern1 = "^'?(\\w+)'?,\\s+(\\w+)$";
StringTokenizer strTok = new StringTokenizer(str, " , ");
while (strTok.hasMoreTokens()) {
String delim = (strTok.nextToken());
delim.replaceAll(pattern1, "$1$2");
System.out.println(delim);
}
}
// In the main method: displaySplitString("'GLO', FLO");
Here is the snippet that should get you going:
public static void displaySplitString(String str)
{
String pattern1 = "^'?(\\w+)'?(?=\\S)";
str = str.replaceAll(pattern1, " $1 ");
StringTokenizer strTok = new StringTokenizer(str, " , ");
while (strTok.hasMoreTokens())
{
String delim = (strTok.nextToken());
System.out.println(delim);
}
}
Here,
I changed the str parameter so that it is no longer final (so that we can reassign str inside the method)
I use the regex ^'?(\\w+)'?(?=\\S) to remove potential single quotes from around the first word
Since you use a StringTokenizer, just 2 lines inside the while block are enough.
The regex means:
^ - Start looking for the match at the very start of the string
'? - match 0 or 1 single quote
(\\w+) - match and capture 1 or more word characters (letters, digits or underscores); we refer to them as $1 in the replacement pattern
'? - match 0 or 1 single quote
(?=\\S) - match only if the next character is not whitespace (here, the comma after the optional single quote). You could even replace this lookahead with a plain , if a comma always follows the first word.
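Since the question asked about replaceAll() specifically, here is a rough alternative that needs no tokenizer. It is a sketch under a loud assumption: single quotes and commas never occur inside a word, so every run of quotes, commas and whitespace can be collapsed into one space. The class name QuoteCleaner is made up for illustration.

```java
class QuoteCleaner {
    // Assumption: quotes and commas are only ever separators, never part of a word.
    static String clean(String s) {
        // Collapse each run of single quotes, commas and whitespace into one space,
        // then drop the space left at either end.
        return s.replaceAll("[',\\s]+", " ").trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("'GLO', FLO")); // prints GLO FLO
    }
}
```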
I am attempting to split a word from its punctuation.
For example, if the word is "Hello?", I want to store "Hello" in one variable and "?" in another.
I tried using the .split method, but it deletes the delimiter (the punctuation), so the punctuation character is lost.
String inWord = "hello?";
String word;
String punctuation = null;
if (inWord.contains(","+"?"+"."+"!"+";")) {
String parts[] = inWord.split("\\," + "\\?" + "\\." + "\\!" + "\\;");
word = parts[0];
punctuation = parts[1];
} else {
word = inWord;
}
System.out.println(word);
System.out.println(punctuation);
I am stuck; I can't see another way of doing it.
Thanks in advance
You could use a positive lookahead to split so you don't actually use the punctuation to split, but the position right before it:
inWord.split("(?=[,?.!;])");
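A small sketch of the lookahead split in use. The class name WordPunctuation is made up for illustration, and the limit argument of 2 is an addition here so that trailing marks such as "?!" stay together in the second element.

```java
import java.util.Arrays;

class WordPunctuation {
    // Splits at the empty position just before the first punctuation mark.
    // The lookahead consumes nothing, so the punctuation is kept.
    static String[] split(String inWord) {
        return inWord.split("(?=[,?.!;])", 2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("hello?"))); // prints [hello, ?]
        System.out.println(Arrays.toString(split("hello")));  // prints [hello]
    }
}
```

When the word has no punctuation the array has a single element, so check its length before reading index 1.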
Further to the other suggestions, you can also use the word-boundary matcher \b. It may not always match what you are looking for, since it detects any boundary between a word character and a non-word character, as documented: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
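In your example it works, but the array layout depends on the Java version: on Java 7 and earlier the first element is an empty string, while from Java 8 onward a zero-width match at the start of the input no longer produces a leading empty element. Indexing from the end of the array works in both cases.
Here is some working code:
String inWord = "hello?";
String word;
String punctuation = null;
if (inWord.matches(".*[,?.!;].*")) {
    String parts[] = inWord.split("\\b");
    // Index from the end so that a leading empty element (Java 7) does not matter
    word = parts[parts.length - 2];
    punctuation = parts[parts.length - 1];
    System.out.println(parts.length);
} else {
    word = inWord;
}
System.out.println(word);
System.out.println(punctuation);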
You can see it running here: http://ideone.com/3GmgqD
I've also fixed your .contains to use .matches instead.
I think you can use the regex below. I have not tested it, so give it a try. Note that a plain split like this drops the punctuation itself.
input.split("[\\p{P}]")
You could use substring here. Something like this:
String inWord = "hello?";
String word = inWord.substring (0, 5);
String punctuation = inWord.substring (5, inWord.length ());
System.out.println (word);
System.out.println (punctuation);
While trying to split the string xyz213123kop234430099kpf4532 into these tokens:
xyz213123
kop234430099
kpf4532
I wrote the following code
String s = "xyz213123kop234430099kpf4532";
String regex = "/^[a-zA-z]+[0-9]+$/";
String tokens[] = s.split(regex);
for(String t : tokens) {
System.out.println(t);
}
but instead of the tokens, I get the whole string back as a single output. What is wrong with the regular expression I used?
You can do it like this:
String s = "xyz213123kop234430099kpf4532";
String[] result = s.split("(?<=[0-9])(?=[a-z])");
The idea is to use zero-width assertions to find the places where the string should be cut: a lookbehind (preceded by a digit, [0-9]) and a lookahead (followed by a letter, [a-z]).
These lookarounds are just checks and consume no characters, so the split delimiter is an empty string and nothing is removed from the result.
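Here is a complete sketch that prints the three tokens. The class name DigitLetterSplit is made up for illustration, and the lookahead is widened to [a-zA-Z] on the assumption that uppercase letters may also start a token.

```java
class DigitLetterSplit {
    // Cuts at every empty position that has a digit before it and a letter after it.
    static String[] tokens(String s) {
        return s.split("(?<=[0-9])(?=[a-zA-Z])");
    }

    public static void main(String[] args) {
        for (String t : tokens("xyz213123kop234430099kpf4532")) {
            System.out.println(t);
        }
    }
}
```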
You could split at the boundary between a digit and a non-digit.
String s = "xyz213123kop234430099kpf4532";
String[] parts = s.split("(?<=\\d)(?=\\D)");
for (String p : parts) {
System.out.println(p);
}
Output
xyz213123
kop234430099
kpf4532
There's nothing in your string that matches the regular expression, because your expression starts with ^ (beginning of string) and ends with $ (end of string), so it could only ever match the whole string or nothing at all. (The surrounding slashes are a further problem: Java regex strings have no /.../ delimiters, so the slashes are treated as literal characters.) Because the expression doesn't match anything in the string, split never finds a delimiter, and you get just one big token.
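A quick way to check this is to ask a Matcher whether the anchored pattern is found anywhere in the input. The sketch below leaves out the slashes and uses a made-up class name, AnchorCheck.

```java
import java.util.regex.Pattern;

class AnchorCheck {
    // True only if the ENTIRE input is letters followed by digits.
    static boolean anchoredPatternFound(String s) {
        return Pattern.compile("^[a-zA-Z]+[0-9]+$").matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "xyz213123kop234430099kpf4532";
        System.out.println(anchoredPatternFound(s));                // prints false
        System.out.println(s.split("^[a-zA-Z]+[0-9]+$").length);    // prints 1
    }
}
```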
You don't want to use split for that. The argument to split is the delimiter between tokens. You don't have that. Instead, you have a pattern that repeats and you want each match to the pattern. Try this instead:
String s = "xyz213123kop234430099kpf4532";
Pattern p = Pattern.compile("([a-zA-Z]+[0-9]+)");
Matcher m = p.matcher(s);
while (m.find()) {
System.out.println(m.group());
}
Output:
xyz213123
kop234430099
kpf4532
(I don't know by what logic you would have the second token be "3kop234430099" as in your posted question. I assume that the leading "3" is a typo.)
The String will look like this:
String temp = "IF (COND_ITION) (ACT_ION)";
// Only has one whitespace in either side of the parentheses
or
String temp = " IF (COND_ITION) (ACT_ION) ";
// Have more irrelevant whitespace in the String
// But no whitespace in condition or action
I hope to get a new String array that contains three elements, ignoring the parentheses:
String[] tempArray;
tempArray[0] = IF;
tempArray[1] = COND_ITION;
tempArray[2] = ACT_ION;
I tried to use String.split(regex) method but I don't know how to implement the regex.
If your input string will always be in the format you described, it is better to parse it based on the whole pattern instead of just the delimiter, as this code does:
Pattern pattern = Pattern.compile("(\\S+)\\s+\\((.*?)\\)\\s+\\((.*?)\\)");
Matcher matcher = pattern.matcher(inputString);
String[] tempArray = new String[3];
if (matcher.find()) {
    tempArray[0] = matcher.group(1);
    tempArray[1] = matcher.group(2);
    tempArray[2] = matcher.group(3);
}
Pattern breakdown:
(\\S+) IF
\\s+ whitespace (one or more characters)
\\((.*?)\\) (COND_ITION)
\\s+ whitespace (one or more characters)
\\((.*?)\\) (ACT_ION)
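The whole-pattern approach can also be packaged as a helper that rejects input in any other format. This is a sketch: the names RuleParser and parse are made up for illustration, and it uses matches() with optional whitespace at both ends so that the padded variant of the input is accepted too.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RuleParser {
    // keyword, then two parenthesised groups, with optional whitespace around each part
    private static final Pattern RULE =
            Pattern.compile("\\s*([^\\s(]+)\\s*\\((.*?)\\)\\s*\\((.*?)\\)\\s*");

    // Hypothetical helper: returns {keyword, condition, action}.
    static String[] parse(String input) {
        Matcher m = RULE.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected format: " + input);
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        for (String part : parse("  IF   (COND_ITION)  (ACT_ION) ")) {
            System.out.println(part);
        }
    }
}
```

Throwing on a mismatch is a choice made in this sketch; returning null or an empty array would work equally well if malformed input is expected.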
You can use StringTokenizer to split into strings delimited by whitespace. From Java documentation:
The following is one example of the use of the tokenizer. The code:
StringTokenizer st = new StringTokenizer("this is a test");
while (st.hasMoreTokens()) {
System.out.println(st.nextToken());
}
prints the following output:
this
is
a
test
Then write a loop that processes the tokens to remove the parentheses.
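One possible shape for that loop, as a sketch. The class name ParenTokens is made up for illustration, and it assumes, as the question states, that the condition and action contain no whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class ParenTokens {
    static List<String> tokens(String input) {
        List<String> result = new ArrayList<>();
        // The one-argument constructor splits on whitespace.
        StringTokenizer st = new StringTokenizer(input);
        while (st.hasMoreTokens()) {
            // Strip the parentheses from each token.
            result.add(st.nextToken().replace("(", "").replace(")", ""));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens(" IF (COND_ITION)   (ACT_ION) ")); // prints [IF, COND_ITION, ACT_ION]
    }
}
```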
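I think you want a regular expression like "\\)? +\\(?", assuming there is no whitespace inside the parentheses. Note the +: with * every part of the pattern is optional, so it would also match the empty string and split between every character. This leaves the final closing parenthesis on the last token, and it doesn't validate that the parentheses match properly. Hope this helps.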