Java - Strings - counting sentences, words, etc - java

Here's a problem I run into when I try to break up a string that I read as input.
I don't know why there is an extra 'space' at the beginning of arrayTry.
Why is the output " H i ." and not "H i ."?
Please help me out.
2) Furthermore, how can I take a paragraph as input?
How will the 'new line' show up in the split string array?
Thanks.

There is no extra space. What you see is an empty String followed by the space that your loop appends to every element:
System.out.println(arrayTry[i] + " ");
So arrayTry[0] is an empty String.
When you split on "", you get an array with one character per String. In Java 7 and earlier the array also starts with an empty String; since Java 8 a zero-width match at the start no longer produces one. Trailing empty Strings are always dropped by the one-argument split.
To split on newlines, use the \n escape character:
String[] paragraphs = input.split("\n+");
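A minimal sketch of both points, assuming the input is already held in a String. The class and method names are made up for illustration. Because the leading empty element from split("") depends on the Java version, the sketch reads the characters with charAt() instead (a swapped-in technique, not what the question used) and keeps split("\n+") for the lines.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative, not from the question.
class SplitDemo {

    // Splits a paragraph into lines; "\n+" treats a run of newlines as one separator.
    static String[] lines(String paragraph) {
        return paragraph.split("\n+");
    }

    // Returns each character as its own String without relying on split("").
    static String[] characters(String text) {
        String[] result = new String[text.length()];
        for (int i = 0; i < text.length(); i++) {
            result[i] = String.valueOf(text.charAt(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(characters("Hi.")));                     // [H, i, .]
        System.out.println(Arrays.toString(lines("First line.\n\nSecond line."))); // [First line., Second line.]
    }
}
```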

Trim your string (i.e. remove leading and trailing spaces) before splitting.
String strng = input.nextLine();
strng = strng.trim();
String[] words = strng.split(" ");
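As a rough sketch of how trimming feeds into word counting, assuming words are separated by whitespace only: the helper below (a hypothetical name) trims first and then splits on "\\s+" rather than " ", so that several spaces in a row do not produce empty elements.

```java
// Hypothetical helper for illustration; assumes words are separated by whitespace.
class WordCount {

    static int countWords(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return 0; // "".split(...) would return one empty element, not zero
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("  Hi there.  ")); // 2
        System.out.println(countWords("a  b   c"));      // 3
        System.out.println(countWords("   "));           // 0
    }
}
```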

Related

How to remove white-space at the front of a string using string split in java?

I was solving a problem with String.split(), but I couldn't find a way to remove the white-space at the start. I think the problem is in the regular expression, which needs to be changed. I tried changing the expression several times, but it didn't work out. What should the regular expression be? Here is the code:
String s = " YES leading spaces are valid, problemsetters are evillllll";
String delims = "[\\s._,?!'#\\t]+";
String[] words = s.split(delims);
System.out.println(words.length - 1);
for (String w : words) {
    System.out.println(w);
}
System.out.println(Arrays.toString(words));
output:
8
YES
leading
spaces
are
valid
problemsetters
are
evillllll
[, YES, leading, spaces, are, valid, problemsetters, are, evillllll]
You can use the String class's trim() method to remove the leading space before calling split.
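A small sketch of that suggestion, reusing the delimiter expression from the question unchanged. The class and method names are assumptions for illustration. Note that trim() only removes whitespace, so a string that starts with one of the other delimiters (for example "!") would still yield an empty first element.

```java
import java.util.Arrays;

// Hypothetical demo class; DELIMS is copied from the question.
class LeadingSpaceDemo {

    static final String DELIMS = "[\\s._,?!'#\\t]+";

    static String[] words(String s) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // avoid the single empty element split would return
        }
        return trimmed.split(DELIMS);
    }

    public static void main(String[] args) {
        String s = " YES leading spaces are valid, problemsetters are evillllll";
        String[] words = words(s);
        System.out.println(words.length);           // 8
        System.out.println(Arrays.toString(words)); // no leading empty element
    }
}
```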

How to use split() to remove all delimiters from a sentence in Java?

String text = "Good morning. Have a good class. " +
"Have a good visit. Have fun!";
String[] words = text.split("[ \n\t\r.,;:!?(){");
This split call comes from a textbook and is meant to strip all the delimiters and whitespace from the sentence, but it clearly does not work and, to my disappointment, throws a regex exception. What could we do to make it work? The requirement is that after the split, every element of String[] words is just an English word with no delimiter or whitespace character attached. Thanks a lot!
You are missing the closing ] in your character class:
String[] words = text.split("[ \n\t\r.,;:!?(){]");
By the way, you can also just do this (and it is the better option):
String[] words = text.split("\\W+");
to split on any run of non-word characters.
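A quick sketch comparing the two expressions on the sentence from the question. The class and method names are hypothetical. One caveat: without a quantifier, the corrected character class splits at every single delimiter, so adjacent delimiters such as ". " leave empty strings in the array; the sketch therefore adds a + to it, which is an adjustment beyond what the answer shows. Also note that \\W treats an apostrophe as a separator.

```java
import java.util.Arrays;

// Hypothetical demo class for illustration.
class DelimiterSplit {

    // The textbook character class with the missing ] restored and a + added,
    // so that a run of delimiters counts as a single separator.
    static String[] byCharacterClass(String text) {
        return text.split("[ \n\t\r.,;:!?(){]+");
    }

    // Splits on any run of non-word characters.
    static String[] byNonWord(String text) {
        return text.split("\\W+");
    }

    public static void main(String[] args) {
        String text = "Good morning. Have a good class. "
                + "Have a good visit. Have fun!";
        System.out.println(Arrays.toString(byCharacterClass(text)));
        System.out.println(Arrays.toString(byNonWord(text)));
    }
}
```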
String.split() is NOT for removing characters. It is used to divide the String into smaller substrings.
Example:
String s = "This is a string!";
String[] tokens = s.split(" ");
split uses the String " " (one space character) as the delimiter to, well, split the string. As a result, the tokens array will look like
{"This", "is", "a", "string!"}
If you want to remove characters, take a look at String.replaceAll().
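For contrast with split, here is a minimal sketch of removing characters with replaceAll. It assumes that "delimiters" means ASCII punctuation, and the class and method names are invented for the example.

```java
// Hypothetical demo class; assumes the characters to remove are ASCII punctuation.
class RemoveChars {

    // Returns a copy of s with every ASCII punctuation character removed.
    static String stripPunctuation(String s) {
        return s.replaceAll("\\p{Punct}", "");
    }

    public static void main(String[] args) {
        System.out.println(stripPunctuation("This is a string!")); // This is a string
    }
}
```

Strings are immutable, so the result has to be assigned or returned; calling replaceAll without using its return value changes nothing.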

How to select string until a certain character in java

I have a string of which only a certain part should be selected: everything until I reach a certain character.
Ex. from 5000 - 10000 I want only 5000, i.e. everything up to the - or the white space.
input.replace("","");
What regular expression should I be using?
Something like this:
final String beforeDash = input.split("-")[0].trim();
This should solve your problem:
String[] parts = input.split("-");
The string you are looking for is then in parts[0].
If you want to split on the whitespace instead of the dash, use string.split(" ").
You could try the code below, which matches everything from the first space or - up to the last character. Replacing the matched characters with an empty string gives you the desired output.
String result = input.replaceAll("[\\s-].*", "");
You could also use the split function.
String[] parts = input.split("[\\s-]");
System.out.println(parts[0]);
The split above divides the input at each space or hyphen. Printing index 0 of the resulting parts gives you the desired output.
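Both approaches can be sketched side by side as follows. The class and method names are hypothetical. The length check in the split variant is an addition of this sketch: split drops trailing empty strings, so an input consisting only of separators yields an empty array, and parts[0] would throw.

```java
// Hypothetical demo class for illustration.
class BeforeSeparator {

    // Removes everything from the first whitespace character or hyphen onward.
    static String viaReplace(String input) {
        return input.replaceAll("[\\s-].*", "");
    }

    // Takes the first piece of a split on whitespace or hyphen.
    static String viaSplit(String input) {
        String[] parts = input.split("[\\s-]");
        return parts.length > 0 ? parts[0] : "";
    }

    public static void main(String[] args) {
        System.out.println(viaReplace("5000 - 10000")); // 5000
        System.out.println(viaSplit("5000-10000"));     // 5000
    }
}
```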

What's wrong with my split() and its regex?

I encountered this problem in part of my application. The String variable line contains 12.2 Andrew, and I'm trying to split the two parts, but it doesn't work and fails with a NumberFormatException. Could you help me with that, please?
String line = "12.2 Andrew";
String[] data = line.split("(?<=\\d)(?=[a-zA-Z])");
System.out.println(Double.valueOf(data[0]));
Did you look at your data variable? It didn't split anything at all, since the condition never matches. You are looking for a place in the input immediately after a number and before a letter, and since there is a space in between this doesn't exist.
Try adding a space in the middle, that should fix it:
String[] data = line.split("(?<=\\d) (?=[a-zA-Z])");
Your regex never matches, so the String is not split.
Double.valueOf is therefore parsing the whole input.
Try the following:
String line = "12.2 Andrew";
String[] data = line.split("(?<=\\d)(?=[a-zA-Z])");
System.out.println(Arrays.toString(data));
// System.out.println(Double.valueOf(data[0]));
// fixed
data = line.split("(?<=\\d).(?=[a-zA-Z])");
System.out.println(Arrays.toString(data));
System.out.println(Double.valueOf(data[0]));
Output
[12.2 Andrew]
[12.2, Andrew]
12.2
If you print the content of data[0], you will notice that it still contains 12.2 Andrew, so you didn't actually split anything. That is because your regex says:
split at a place which has a digit before it and a letter after it
which for data like
123foo345bar 123 baz
can effectively only split at the places marked with |
123|foo345|bar 123 baz
It will not split `123 baz` as
`123| baz`, because a space (not a letter) follows the digit, or as
`123 |baz`, because a space (not a digit) precedes the letter,
so the regex can't match there.
What you need is to "split on whitespace which has a digit before it and a letter after it", so use
String[] data = line.split("(?<=\\d)\\s+(?=[a-zA-Z])");
// \\s+ represents one or more whitespace characters
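Put together as a runnable sketch, using the line from the question (the class and method names are made up for illustration):

```java
import java.util.Arrays;

// Hypothetical demo class for illustration.
class NumberAndName {

    // Splits on whitespace that has a digit before it and a letter after it.
    static String[] split(String line) {
        return line.split("(?<=\\d)\\s+(?=[a-zA-Z])");
    }

    public static void main(String[] args) {
        String[] data = split("12.2 Andrew");
        System.out.println(Arrays.toString(data));        // [12.2, Andrew]
        System.out.println(Double.parseDouble(data[0])); // 12.2
    }
}
```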

Split string by punctuation marks in Java

I'm trying to do the following:
String[] Res = Text.split("[\\p{Punct}\\s]+");
But, I always get a few words with space before them.
How can I parse the sentence without getting spaces and other punctuation marks as a part of the word itself?
Since you didn’t provide a sample input which reproduces the problem, I can only guess. I can’t see why the regex you provided should ever leave spaces in the result, unless you are using non-ASCII white-space or punctuation characters. The reason is that both \\p{Punct} and \\s are POSIX character classes limited to ASCII, e.g. \\s will not match \u00a0. Use [\\p{IsPunctuation}\\p{IsWhite_Space}]+ if non-ASCII punctuation and white-space characters are your problem.
Example
String text="Some\u00a0words stick together⁈";
String[] res1 = text.split("[\\p{Punct}\\s]+");
System.out.println(Arrays.toString(res1));
String[] res2 = text.split("[\\p{IsPunctuation}\\p{IsWhite_Space}]+");
System.out.println(Arrays.toString(res2));
will produce:
[Some words, stick, together⁈]
[Some, words, stick, together]
You need to trim() all the Strings in the array before using them. This will eliminate all the leading and trailing white spaces.
str = str.trim();
In your case
for (String str : Res) {
    str = str.trim();
    // use str now, without any white spaces
}
If you also need to keep the punctuation, use StringTokenizer, whose constructor takes a boolean that controls whether the delimiters are returned as tokens.
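A minimal sketch of that StringTokenizer option. The delimiter set " ,.!?" and the class name are assumptions chosen for the example. With returnDelims set to true, every delimiter character comes back as its own one-character token, spaces included.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; the delimiter set is an assumption for this example.
class KeepDelimiters {

    static List<String> tokens(String text) {
        // The third argument (returnDelims = true) keeps the delimiters as tokens.
        StringTokenizer tokenizer = new StringTokenizer(text, " ,.!?", true);
        List<String> result = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            result.add(tokenizer.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Hi, you!")); // [Hi, ,,  , you, !]
    }
}
```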
To remove leading or trailing spaces, use
String str=" java ";
str = str.trim();
