Going by the suggestion provided here
I tried using \\W as a delimiter for non-word characters in Java's String.split.
String str = "id-INT, name-STRING,";
This looks like a really simple string. I wanted to extract just the words from it. The array I get has length 5, whereas it should be 4: there is an empty string right after INT. I don't understand why the space there is not being treated as a non-word character.
The , and the space are being treated as separate delimiters, so the empty string between them becomes a token. Try using \\W+ instead:
String str = "id-INT, name-STRING,";
String[] parts = str.split("\\W+");
System.out.println(parts.length);
System.out.println(Arrays.toString(parts));
Which outputs
4
[id, INT, name, STRING]
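To make the difference concrete, here is a minimal sketch that runs both patterns on the string from the question. The class and method names (NonWordSplit, splitEach, splitRuns) are made up for illustration only.

```java
import java.util.Arrays;

class NonWordSplit {
    // Splits at every single non-word character (no quantifier).
    static String[] splitEach(String s) {
        return s.split("\\W");
    }

    // Splits at runs of one or more non-word characters.
    static String[] splitRuns(String s) {
        return s.split("\\W+");
    }

    public static void main(String[] args) {
        String str = "id-INT, name-STRING,";
        // The "," and the " " after INT are two separate matches,
        // so an empty token appears between them.
        System.out.println(Arrays.toString(splitEach(str))); // [id, INT, , name, STRING]
        // With "+", ", " is a single delimiter.
        System.out.println(Arrays.toString(splitRuns(str))); // [id, INT, name, STRING]
    }
}
```

In both cases the trailing "," produces no extra element, because split(regex) drops trailing empty strings.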
I am trying to split a string that has two numbers and possibly a letter that will look similar to:
(2,3) (2,6) p (8,5) p (5,6)
I am trying:
String[] inputTokens = input.split("[(),\\s]");
but that leaves me with a bunch of empty strings in the tokens array. How do I stop them from appearing in the first place?
For clarification: By empty string I mean a string containing nothing, not even a space
Add the "one or more times" quantifier (+) to your character class:
String[] inputTokens = input.split("[(),\\s]+");
This leaves one leading empty String, because the input begins with a delimiter (the opening parenthesis) and split() cannot avoid that. Otherwise there are no empty Strings.
String inputTokens[] = input.split("[(),\\s]+");
Here + makes a whole run of delimiters, whitespace included, count as a single separator, so there will be no empty entries between tokens in your array.
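If the leading empty String is a nuisance, one possible workaround is to filter empty tokens after splitting. The following is only a sketch: the class CoordinateTokens and the helper tokens() are hypothetical names, and the input is the example from the question.

```java
import java.util.Arrays;

class CoordinateTokens {
    // Hypothetical helper: split on runs of '(', ')', ',' or whitespace,
    // then discard the empty token produced when the input starts with a delimiter.
    static String[] tokens(String input) {
        return Arrays.stream(input.split("[(),\\s]+"))
                     .filter(t -> !t.isEmpty())
                     .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String input = "(2,3) (2,6) p (8,5) p (5,6)";
        // Plain split: the leading "(" yields one empty first element.
        System.out.println(Arrays.toString(input.split("[(),\\s]+"))); // [, 2, 3, 2, 6, p, 8, 5, p, 5, 6]
        // Filtered: only the real tokens remain.
        System.out.println(Arrays.toString(tokens(input)));            // [2, 3, 2, 6, p, 8, 5, p, 5, 6]
    }
}
```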
Given a string S, find the number of words in that string. For this problem a word is defined by a string of one or more English letters.
Note: Space or any of the special characters like ![,?._'#+] will act as a delimiter.
Input Format: The string will only contain lower case English letters, upper case English letters, spaces, and these special characters: ![,?._'#+].
Output Format: On the first line, print the number of words in the string. The words don't need to be unique. Then, print each word in a separate line.
My code:
Scanner sc = new Scanner(System.in);
String str = sc.nextLine();
String regex = "( |!|[|,|?|.|_|'|#|+|]|\\\\)+";
String[] arr = str.split(regex);
System.out.println(arr.length);
for(int i = 0; i < arr.length; i++)
System.out.println(arr[i]);
When I submit the code, it passes just over half of the test cases. I do not know what the test cases are, so I'm asking for help in the spirit of Murphy's law: in which situations will the regex I implemented not work?
You don't escape some special characters in your regex. Let's start with []. Since you don't escape them, the part [|,|?|.|_|'|#|+|] is treated like a set of characters |,?._'#+. This means that your regex doesn't split on [ and ].
For example x..]y+[z is split to x, ]y and [z.
You can fix that by escaping those characters. That will force you to escape more of them and you end up with a proper definition:
String regex = "( |!|\\[|,|\\?|\\.|_|'|#|\\+|\\])+";
Note that instead of defining alternatives, you could use a set which will make your regex easier to read:
String regex = "[ !\\[,?._'#+\\]]+";
In this case you only need to escape [ and ].
UPDATE:
There's also a problem with a leading special character (as in your example ".Hi?there[broski.]#####"). You need to split on it, but that produces an empty string at the start of the result. I don't think split can avoid producing it, but you can work around it by removing a leading run of delimiters before splitting. Anchor the pattern with ^ so that only a match at the very start is removed:
String[] arr = str.replaceFirst("^" + regex, "").split(regex);
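Putting the pieces together, a word counter for this problem might look like the sketch below. It assumes the delimiter set is exactly a space plus ![,?._'#+] as stated in the problem; the class name WordCounter and the helper words() are invented for this example. It also guards against an input with no words at all, since splitting an empty string returns one empty element rather than none.

```java
import java.util.Scanner;

class WordCounter {
    // Character class covering the space and the special characters ![,?._'#+]
    private static final String DELIMS = "[ !\\[,?._'#+\\]]+";

    static String[] words(String s) {
        // Remove a leading run of delimiters so split() yields no empty first token.
        String trimmed = s.replaceFirst("^" + DELIMS, "");
        // "".split(...) would return a single empty string, so handle that case explicitly.
        return trimmed.isEmpty() ? new String[0] : trimmed.split(DELIMS);
    }

    public static void main(String[] args) {
        Scanner sc = new Scanner(System.in);
        String[] arr = words(sc.hasNextLine() ? sc.nextLine() : "");
        System.out.println(arr.length);
        for (String w : arr) {
            System.out.println(w);
        }
    }
}
```

For the input .Hi?there[broski.]##### this returns the three words Hi, there and broski; for an input consisting only of delimiters it returns an empty array.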
I have strings of the following format
(1, 3, value)
I want to split this string using String.split so that I can get the three values without the delimiters. I want the output to look like
1
3
value
The problem is that the value sometimes contains delimiter characters itself, such as the string revolution_(1848)
How can I do this with String.split(), so that I can split the words based on commas and inside the brackets, so I get only the three values.
Thanks.
Assuming the parentheses are always on the outside, just don't consider them and split on the ,s:
String toSplit = "(1, 3, value, revolution_(1848))";
toSplit = toSplit.substring(1,toSplit.length() - 1); //ignore wrapping characters.
String[] splitted = toSplit.split(",\\s*"); // also consume the space after each comma
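If the value itself may also contain commas, the two-argument form split(regex, limit) is one option. The sketch below assumes the input always has the shape "(a, b, value)" with the free-form value in the last position; the class TripleParser and the method parse() are hypothetical names.

```java
import java.util.Arrays;

class TripleParser {
    // Hypothetical helper: assumes wrapping parentheses are always present
    // and that the free-form value is the third and last field.
    static String[] parse(String s) {
        String inner = s.substring(1, s.length() - 1); // strip the outer ( and )
        // limit = 3: at most three parts, so anything after the second comma stays intact
        return inner.split("\\s*,\\s*", 3);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("(1, 3, revolution_(1848))"))); // [1, 3, revolution_(1848)]
        System.out.println(Arrays.toString(parse("(1, 3, a, b)")));              // [1, 3, a, b]
    }
}
```

In the second call the array still has three elements; the last one is the single string "a, b".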
I have a string like
String myString = "hello world~~hello~~world";
I am using the split method like this
String[] temp = myString.split("~|~~|~~~");
I want the array temp to contain only the strings separated by ~, ~~ or ~~~.
However, the temp array thus created has length 5, the 2 additional 'strings' being empty strings.
I want it to ONLY contain my non-empty string. Please help. Thank you!
You should use a quantifier with your character:
String[] temp = myString.split("~+");
String#split() takes a regex. ~+ will match 1 or more ~, so it will split on ~, or ~~, or ~~~, and so on.
Also, if you just want to split on ~, ~~, or ~~~, then you can limit the repetition by using {m,n} quantifier, which matches a pattern from m to n times:
String[] temp = myString.split("~{1,3}");
When you split it the way you are doing, it will split a~~b twice on ~, and thus the middle element will be an empty string.
You could also have solved the problem by reversing the order of your delimiter like this:
String[] temp = myString.split("~~~|~~|~");
That will first try to split on ~~~, then on ~~, and only then on ~, so it will work fine. But you should use the first approach.
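The effect of alternation order can be checked with a short sketch like the following. The class name TildeSplit and the helper count() are made up for the example; the second input, a~~~~b, is an extra case added here to show where the reordered alternation and ~+ differ.

```java
import java.util.Arrays;

class TildeSplit {
    // Number of elements produced by splitting s with the given regex.
    static int count(String s, String regex) {
        return s.split(regex).length;
    }

    public static void main(String[] args) {
        String s = "hello world~~hello~~world";
        // "~" is tried first and always succeeds, so "~~" is split twice.
        System.out.println(Arrays.toString(s.split("~|~~|~~~"))); // [hello world, , hello, , world]
        // Longest alternative first: "~~" is consumed as one delimiter.
        System.out.println(Arrays.toString(s.split("~~~|~~|~"))); // [hello world, hello, world]
        // Quantifier: any run of "~" is one delimiter.
        System.out.println(Arrays.toString(s.split("~+")));       // [hello world, hello, world]

        // With four tildes the alternation matches "~~~" and then "~",
        // leaving an empty string in between; "~+" does not.
        System.out.println(count("a~~~~b", "~~~|~~|~")); // 3
        System.out.println(count("a~~~~b", "~+"));       // 2
    }
}
```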
Just turn the pattern around:
String myString = "hello world~~hello~~world";
String[] temp = myString.split("~~~|~~|~");
Try this:
myString.split("~~~|~~|~");
It will work. In your code, the regex engine tries the alternatives from left to right, so as soon as it sees a ~ it matches the first alternative and splits the string at that point. It therefore never matches ~~ or ~~~, even though they are present in your string. Like:
[hello world]~[]~[hello]~[]~[world]
The square brackets mark the 5 resulting strings, two of which are empty.