Traversing through a sentence word by word - java

How can I traverse a given sentence word by word? Are there any built-in functions in Java for this? I have no idea how to begin.

Something like this:
String sentence = "Your sentence here.";
String[] words = sentence.split("\\s+"); // splits by whitespace
for (String word : words) {
System.out.println(word);
}

A lot of people are suggesting to split on spaces, but even this very sentence contains commas, etc. You should split on more than just spaces; split on punctuation characters too:
String[] words = sentence.split("([\\s.,;:\"?!…(){}\\[\\]%#/]|(- )|( -))+");
This regex splits on all reasonably expected punctuation characters. Note that the in-word hyphen and the apostrophe are not "punctuation"; they are part of the word.
This approach, or something similar, will also handle non-English character sentences.
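As a complementary sketch, the same idea can be turned around: instead of listing the punctuation to split on, describe what a word looks like and collect the matches. This swaps split() for Pattern/Matcher, which is a different technique from the answer above. The class name WordFinder and the helper words() are made up for illustration, and the definition of a word (letters or digits, with single in-word apostrophes or hyphens) is an assumption you may need to adjust.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordFinder {
    // Assumed definition of a word: a run of letters or digits, optionally
    // joined by single in-word apostrophes or hyphens ("aren't", "well-known").
    private static final Pattern WORD =
            Pattern.compile("[\\p{L}\\p{N}]+(?:['-][\\p{L}\\p{N}]+)*");

    // Hypothetical helper: returns the words of a sentence in order.
    static List<String> words(String sentence) {
        List<String> result = new ArrayList<>();
        Matcher m = WORD.matcher(sentence);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String word : words("Well-known facts, aren't they? Yes - really.")) {
            System.out.println(word);
        }
    }
}
```

Because \p{L} matches letters in any script, this should also cope with non-English sentences.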

String[] array = input.split(" ");
That way the string is converted into an array of the pieces between spaces (you can change the separator in split()'s argument), and then you can loop through the array as you want.

Start with StringTokenizer for example or use String.split(" ")

Try splitting the sentence by whitespace character.
String sentence = "This is a sentence.";
for (String word : sentence.split("\\s+")) {
System.out.println(word);
}

String s = "sfgasdfg jhsadfkjashfd sajdfhjkasdfh hjskafhasj";
String[] wordArray = s.split("\\s+");
for (String st : wordArray)
{
System.out.println(st);
}

Take a look at the String Split function here http://www.tek-tips.com/viewthread.cfm?qid=1167964

Assuming you already have the sentence stored as a string, you could use String.replaceAll("[./,]", " ") to replace the punctuation with spaces and then use String.split("\\s+") to obtain the individual words making up the phrase.
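A minimal sketch of those two steps, assuming only the three punctuation characters named above need handling. The class name PunctuationStripper and the helper words() are illustrative, not part of any library; the trim() call is an addition so that a trailing full stop does not leave whitespace at the end.

```java
import java.util.Arrays;

class PunctuationStripper {
    // Hypothetical helper: replaces '.', '/' and ',' with spaces,
    // trims the ends, then splits on runs of whitespace.
    static String[] words(String sentence) {
        String cleaned = sentence.replaceAll("[./,]", " ").trim();
        return cleaned.split("\\s+");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(words("Hello, world. Read/write access.")));
    }
}
```

Note that an empty or punctuation-only sentence yields a single empty string here, so callers may want to check for that case.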

You can use the StringTokenizer class (in java.util), which divides the string into words.
public static void main(String ae[]){
String st = "This is Java";
StringTokenizer str= new StringTokenizer(st);
while(str.hasMoreTokens()){
System.out.println(str.nextToken());
}
}

I would say StringTokenizer might help you.
String str = "This is String , split by StringTokenizer, created by mkyong";
StringTokenizer st = new StringTokenizer(str);
System.out.println("---- Split by space ------");
while (st.hasMoreElements()) {
System.out.println(st.nextElement());
}
System.out.println("---- Split by comma ',' ------");
StringTokenizer st2 = new StringTokenizer(str, ",");
while (st2.hasMoreElements()) {
System.out.println(st2.nextElement());
}
Also, String.split() may help you:
String[] result = "this is a test".split("\\s");
for (int x=0; x<result.length; x++)
System.out.println(result[x]);
OUTPUT:
this
is
a
test

System.out.println(Arrays.toString(
"Many words//separated.by-different\tcharacters"
.split("\\W+")));
//[Many, words, separated, by, different, characters]

Related

Split sentence into array of words, keeping format

I want to split a sentence into an array of words,
e.g.
Hello this a sentence
this is a new line
I used String[] arr = String.split(str)
but when I combine the array back into a sentence, the result is:
Hello this a sentence this is a new line
So how can I split the string and keep the newline characters?
Thank you!
split takes a regular expression:
split("(?=\n)")
This will preserve the \n, provided your string contains \n.
public static void main(String[] arg){
String str= "Hello this a sentence \n this is a new line";
System.out.println(str);
String[] xyz = str.split("(?=\n)");
System.out.println(Arrays.toString(xyz));
}
this is a new line
Put a special character into the string when you build it, e.g.
String str = "Hello this a sentence" + "\n" + "this is a new line";
Then split the string on that special character:
String[] arr = str.split("\n");
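To make the difference between the two answers concrete, here is a small sketch of the lookahead variant: because "(?=\n)" matches a position rather than a character, each newline stays attached to the piece that follows it, and joining the pieces with an empty separator should give back the original text. The class and method names are made up for illustration.

```java
import java.util.Arrays;

class LineKeepingSplit {
    // Splits at the position just before each newline, so no character is consumed.
    static String[] splitKeepingNewlines(String text) {
        return text.split("(?=\n)");
    }

    // Joins the pieces back together without adding any separator.
    static String rejoin(String[] parts) {
        return String.join("", parts);
    }

    public static void main(String[] args) {
        String text = "Hello this a sentence\nthis is a new line";
        String[] parts = splitKeepingNewlines(text);
        System.out.println(Arrays.toString(parts));
        System.out.println(rejoin(parts).equals(text)); // round trip keeps the newline
    }
}
```

With a plain split("\n") the newline is consumed, so the caller has to put it back when rejoining, for example with String.join("\n", arr).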

How to split a string at a semicolon that appears before a colon

How can I split this string power:110V;220V;Color:Pink;White;Type:1;2;Condition:New;Used;
into these 4 strings
power:110V;220V;
Color:Pink;White;
Type:1;2;
Condition:New;Used;
Split your input according to the below regex.
string.split("(?<=;)(?=\\w+:)");
The above regex matches every boundary that is preceded by a semicolon and followed by one or more word characters and a colon.
OR
string.split("(?<=;)(?=[^;:]*:)");
Example:
String s = "power:110V;220V;Color:Pink;White;Type:1;2;Condition:New;Used;";
String[] parts = s.split("(?<=;)(?=\\w+:)");
for(String i: parts)
{
System.out.println(i);
}

Splitting a string into two

I am attempting to split a word from its punctuation:
So for example if the word is "Hello?". I want to store "Hello" in one variable and the "?" in another variable.
I tried using the .split method, but it deletes the delimiter (the punctuation), so the punctuation character is not preserved.
String inWord = "hello?";
String word;
String punctuation = null;
if (inWord.contains(","+"?"+"."+"!"+";")) {
String parts[] = inWord.split("\\," + "\\?" + "\\." + "\\!" + "\\;");
word = parts[0];
punctuation = parts[1];
} else {
word = inWord;
}
System.out.println(word);
System.out.println(punctuation);
I am stuck; I can't see another way of doing it.
Thanks in advance
You could use a positive lookahead to split so you don't actually use the punctuation to split, but the position right before it:
inWord.split("(?=[,?.!;])");
ideone demo
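A short sketch of how the lookahead split could be wrapped up, assuming the punctuation (if any) sits at the end of the word. The class name WordAndPunctuation and the helper separate() are hypothetical; the limit argument of 2 is an addition so that several trailing marks such as "?!" stay together in one piece.

```java
class WordAndPunctuation {
    // Returns {word, punctuation}; punctuation is "" when none is found.
    static String[] separate(String inWord) {
        // Split at the position just before the first punctuation mark;
        // the limit of 2 keeps everything after that position in one piece.
        String[] parts = inWord.split("(?=[,?.!;])", 2);
        String word = parts[0];
        String punctuation = parts.length > 1 ? parts[1] : "";
        return new String[] { word, punctuation };
    }

    public static void main(String[] args) {
        String[] result = separate("hello?");
        System.out.println(result[0]);
        System.out.println(result[1]);
    }
}
```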
Further to the other suggestions, you can also use the word-boundary matcher '\b'. This may not always match what you are looking for; it detects the boundary between a word character and a non-word character, as documented: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
In your example it works. On Java 7 and earlier the first element of the array is an empty string; since Java 8, a zero-width match at the beginning of the input no longer produces that empty leading element, so the code here indexes from the end of the array.
Here is some working code:
String inWord = "hello?";
String word;
String punctuation = null;
if (inWord.matches(".*[,?.!;].*")) {
String[] parts = inWord.split("\\b");
word = parts[parts.length - 2];
punctuation = parts[parts.length - 1];
System.out.println(parts.length);
} else {
word = inWord;
}
System.out.println(word);
System.out.println(punctuation);
You can see it running here: http://ideone.com/3GmgqD
I've also fixed your .contains to use .matches instead.
I think you can use the regex below, though I have not tried it:
input.split("[\\p{P}]")
You could use substring here. Something like this:
String inWord = "hello?";
String word = inWord.substring (0, 5);
String punctuation = inWord.substring (5, inWord.length ());
System.out.println (word);
System.out.println (punctuation);

Split Strings in java by words

How can I split the following sentence into an array?
That's the code
into
array
0 That
1 s
2 the
3 code
I tried something like this
String str = "That's the code";
String[] strs = str.split("\\'");
for (String sstr : strs) {
System.out.println(sstr);
}
But the output is
That
s the code
To specifically split on white space and the apostrophe:
public class Split {
public static void main(String[] args) {
String [] tokens = "That's the code".split("[\\s']");
for(String s:tokens){
System.out.println(s);
}
}
}
or to split on any non word character:
public class Split {
public static void main(String[] args) {
String [] tokens = "That's the code".split("[\\W]");
for(String s:tokens){
System.out.println(s);
}
}
}
The best solution I've found for splitting into words when your string contains accented letters is:
String[] listeMots = phrase.split("\\P{L}+");
For instance, if your String is
String phrase = "Salut mon homme, comment ça va aujourd'hui? Ce sera Noël puis Pâques bientôt.";
Then you will get the following words (enclosed within quotes and comma separated for clarity) :
"Salut", "mon", "homme", "comment", "ça", "va", "aujourd", "hui", "Ce",
"sera", "Noël", "puis", "Pâques", "bientôt".
Hope this helps!
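To show why \P{L} matters here, the following sketch compares it with \W on two accented words. By default (without the UNICODE_CHARACTER_CLASS flag) \W treats anything outside [a-zA-Z_0-9] as a non-word character, so accented letters act as separators. The class and method names are illustrative, and the accented letters are written as Unicode escapes only to keep the source ASCII-safe.

```java
import java.util.Arrays;

class UnicodeWordSplit {
    // Splits on runs of characters outside [a-zA-Z_0-9] (the default meaning of \W).
    static String[] byNonWord(String phrase) {
        return phrase.split("\\W+");
    }

    // Splits on runs of characters that are not Unicode letters.
    static String[] byNonLetter(String phrase) {
        return phrase.split("\\P{L}+");
    }

    public static void main(String[] args) {
        // The French words for Christmas and Easter, accents written as escapes.
        String phrase = "No\u00ebl P\u00e2ques";
        System.out.println(Arrays.toString(byNonWord(phrase)));   // words broken at the accents
        System.out.println(Arrays.toString(byNonLetter(phrase))); // words kept whole
    }
}
```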
You can split on non-word characters:
String str = "That's the code";
String[] splitted = str.split("[\\W]");
For your input, output will be:
That
s
the
code
You can split by a regex that would be one of the two characters - quote or space:
String[] strs = str.split("['\\s]");
You could first replace the ' with " " (a blank space), using str.replaceAll("'", " "), and then split the string on the blank space separator, using str.split(" "). Alternatively, you could use a regular expression to split on ' OR space.
If you want to split on non alphabetic chars
String str = "That's the code";
String[] strs = str.split("\\P{Alpha}+");
for (String sstr : strs) {
System.out.println(sstr);
}
\P{Alpha} matches any non-alphabetic character; it is one of the POSIX character classes described in the java.util.regex.Pattern documentation. The + means we split on any continuous run of such characters.
and the output will be
That
s
the
code
You can use OR in regular expression
public static void main(String[] args) {
String str = "That's the code";
String[] strs = str.split("'|\\s");
for (String sstr : strs) {
System.out.println(sstr);
}
}
The string will be split by single quote (') or space. The single quote doesn't need to be escaped. The output would be
run:
That
s
the
code
BUILD SUCCESSFUL (total time: 0 seconds)
split uses a regex, and in a regex ' is not a special character, so you don't need to escape it with \. To represent whitespace you can use \s (which in a String literal needs to be written as "\\s"). To match one of several characters you can use the "OR" operator |, as in a|b|c|d, or just use a character class [abcd], which matches the same characters as (a|b|c|d).
To make things simple you can use
String[] strs = str.split("'| ");
or
String[] strs = str.split("'|\\s");//to include all whitespaces
or
String[] strs = str.split("['\\s]");//equivalent of "'|\\s"
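A quick way to convince yourself that the alternation and the character class behave the same for this input is to run both and compare the arrays. This is only a sketch; the class and method names are made up.

```java
import java.util.Arrays;

class ApostropheSplit {
    // Alternation: apostrophe OR any whitespace character.
    static String[] withAlternation(String s) {
        return s.split("'|\\s");
    }

    // Character class: the same set of characters, written as one class.
    static String[] withCharacterClass(String s) {
        return s.split("['\\s]");
    }

    public static void main(String[] args) {
        String str = "That's the code";
        String[] a = withAlternation(str);
        String[] b = withCharacterClass(str);
        System.out.println(Arrays.toString(a));
        System.out.println(Arrays.equals(a, b)); // both forms give the same pieces
    }
}
```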

Break String into Sub-Strings, Android

I am making a program which would have the user enter a sentence and following that, the app would break the String into sub-strings where spaces are what break the original string up.
import java.util.StringTokenizer;
public class whitespace {
public static void main(String[] args) {
String text = "supervisors signature tom hanks";
int tokenCount; //number of words
int idx=0; // index
String words[]=new String [500]; // space for words
StringTokenizer st=new StringTokenizer(text); // split text into segments
tokenCount=st.countTokens();
while (st.hasMoreTokens()) // is there stuff to get?
{
words[idx]=st.nextToken();
idx++;
}
}
}
I have this code thus far and while it works fine as a regular Java program, the while loop seems to cause the app to go into an infinite loop. Any ideas?
I think that you can use the String.split method for this:
String text = "supervisors signature tom hanks";
String[] tokens = text.split("\\s+");
for (String str : tokens)
{
//Do what you need with your tokens here.
}
The regex splits the text into words wherever it encounters one or more whitespace characters.
According to its Javadoc, StringTokenizer is a legacy class whose use is discouraged in new code; String.split or the java.util.regex package is recommended instead.
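One practical difference from the tokenizer loop in the question is that split returns an array sized to the number of tokens, so no fixed-size buffer such as new String[500] is needed. The sketch below wraps that in a small helper; the class name TokenList and the method tokens() are hypothetical, and the handling of blank input is an assumption about what the app would want.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class TokenList {
    // Hypothetical helper: returns the whitespace-separated tokens of text,
    // or an empty list when the text is blank.
    static List<String> tokens(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        // The array returned by split is exactly as long as the token count.
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> words = tokens("supervisors signature tom hanks");
        System.out.println(words.size());
        for (String word : words) {
            System.out.println(word);
        }
    }
}
```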
Use this:
words = text.split(" ");
String[] words = text.split(" ");
Use Apache StrTokenizer
StrTokenizer strTok = new StrTokenizer(text);
String[] strList = strTok.getTokenArray();
http://commons.apache.org/lang/api-2.6/org/apache/commons/lang/text/StrTokenizer.html
StringTokenizer sta=new StringTokenizer(text); // split text into segments
String[] words= new String[100];int idx=0;
while (sta.hasMoreTokens()) // is there stuff to get?
{
words[idx]=sta.nextToken();
System.out.println(words[idx]);
idx++;
}
I copied your code, changed it slightly, and ran it; it worked fine.
