I have been trying to split a string that contains text in Vietnamese into individual words. For example:
s = "Chào bạn, mình tên Đạt."
should be split into an array:
arr = {"Chào", "bạn", "mình", "tên", "Đạt"}
Normally in English, this would be easily solved in one line:
arr = s.split("\\W+");
But since Vietnamese has many letters outside the plain ASCII alphabet, it can't be solved by just that one line. So the question is: is there a regular expression that can replace this "\W+" (I'm not very good with regular expressions)? If not, is there any other way around it?
Split the String by spaces and punctuation; you can add your own punctuation marks. As some characters are reserved in regex, I prefer to put each of them in a character class [].
arr = s.split("([ ]|[.]|[,]|[:]|[?])+"); //You can customize punctuation.
This is a working example.
public static void main(String[] args) {
    String inputStr = "Chào bạn, mình tên Đạt.";
    String[] splitArray = inputStr.split("([ ]|[.]|[,]|[:]|[?])+");
    for (String s : splitArray) {
        System.out.println(s);
    }
}
Prints:
Chào
bạn
mình
tên
Đạt
Update
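With the simple space character class [ ], the example above works well. However, it does not treat the newline and the tab in this String as delimiters, so they are left in the output: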
String inputStr = "Chào bạn,\n mình tên\t Đạt.";
Result
Chào
bạn
mình
tên
Đạt
To fix it, use the whitespace character class \s.
String [] splitArray = inputStr.split("(\\s|[.]|[,]|[:]|[?])+");
Or loop through the array of Strings, and trim them.
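The underlying reason "\\W+" misbehaves on Vietnamese text is that Java's \w and \W are ASCII-only by default, so letters such as à, ạ or Đ count as non-word characters. Below is a minimal sketch of a Unicode-aware split, assuming the goal is to keep runs of letters (plus any combining marks) and drop everything else. The class and method names are made up for illustration, and the sample sentence is written with Unicode escapes only to keep the source file ASCII.

```java
import java.util.Arrays;

class VietnameseWords {
    // Splits on runs of characters that are neither letters (\p{L})
    // nor combining marks (\p{M}); the marks matter for decomposed text.
    static String[] words(String text) {
        return text.split("[^\\p{L}\\p{M}]+");
    }

    public static void main(String[] args) {
        // The sample sentence from the question, with a newline and a tab
        // added, spelled with Unicode escapes for its accented letters.
        String s = "Ch\u00e0o b\u1ea1n,\n m\u00ecnh t\u00ean\t \u0110\u1ea1t.";
        System.out.println(Arrays.toString(words(s)));
    }
}
```

If the input can start with punctuation or whitespace, the first element will be an empty string and may need to be skipped. On Java 7 and later, prefixing the original pattern with the (?U) flag, as in "(?U)\\W+", is another option worth trying.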
Related
String Str = new String("(300+23)*(43-21)/(84+7)");
System.out.println("Return Value :" );
String[] a=Str.split(Str);
String a="("
String b="300"
String c="+"
I want to convert this single string into an array, giving output like the above through to the end of the equation, using the split method. Any suggestions?
The above code doesn't work for it.
When you write Str.split(Str);, the parameter of the split function should be the delimiter (a regular expression) by which you want to break the bigger string into an array of smaller strings.
For example,
String s = "this is a string";
String [] array = s.split(" ");
The parameter for the split function here is just a space, so the split function will break the s string into parts delimited by the " " string, which will result in array having the following values: {"this", "is", "a", "string"}.
I think this example is conclusive. What you are doing in your code is basically trying to break your string into parts using the string itself, which of course makes no sense.
You won't find an answer to what you want to achieve using just the split function, because there is no good string to act like a token by which to delimit the bigger string.
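One detail worth illustrating here: the parameter of split is a regular expression, so a delimiter such as "+" cannot be passed as is (on its own it is not a valid pattern). A small sketch, assuming you want to split on a delimiter taken literally; the helper name splitLiteral is hypothetical:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LiteralSplit {
    // Treats the delimiter as plain text rather than as a regex.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        // prints [this, is, a, string]
        System.out.println(Arrays.toString(splitLiteral("this is a string", " ")));
        // "+" alone is not a valid regex, so it has to be quoted; prints [300, 23]
        System.out.println(Arrays.toString(splitLiteral("300+23", "+")));
    }
}
```

Note that the delimiters themselves are discarded, which is why a plain split cannot return the operators of the expression as separate elements.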
You could use a simple regular expression to achieve what you want, e.g.
public static void main(final String[] args) {
    final String string = "(300+23)*(43-21)/(84+7)";
    // Split at every position that is not between two number characters.
    final String[] arr = string.split("(?<![\\d.])|(?![\\d.])");
    for (final String s : arr) {
        System.out.println(s);
    }
}
This has to be modified a bit if whitespace could be present, etc., but it works for your example input string.
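If whitespace may be present, one alternative to adjusting the split pattern is to match the tokens instead of the gaps between them. This is only a sketch under assumptions: tokens are unsigned numbers, the four basic operators and parentheses, and the class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExpressionTokens {
    // A number (optionally with a fraction) or a single operator/parenthesis.
    private static final Pattern TOKEN =
            Pattern.compile("\\d+(?:\\.\\d+)?|[-+*/()]");

    static List<String> tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expression);
        while (m.find()) {          // anything that is not a token is skipped
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [(, 300, +, 23, ), *, (, 43, -, 21, ), /, (, 84, +, 7, )]
        System.out.println(tokenize("(300 + 23) * (43-21)/(84+7)"));
    }
}
```

Be aware that this version silently ignores characters it does not recognise; a stricter tokenizer would report them as errors.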
I have the following regex:
String regx = "\\d{2}\\w{3}";
It matches two digits followed by three word characters (letters, digits or underscore). I want to split a String using the above regex.
Example:
String stringToSplit = "99E0L";
Output will be
99
E0L
Is it possible to split above String using above regex in Java? What API should I use to do it?
Capturing instead of splitting should be your obvious choice. But if you want to split, then you can use zero-width assertions.
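public static void main(String[] args) {
    String stringToSplit = "99E0L";
    // Zero-width split right after the two leading digits
    // (\G anchors the lookbehind to the start of the input here).
    String[] arr = stringToSplit.split("(?<=\\G\\d{2})");
    for (String s : arr) {
        System.out.println(s);
    }
}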
Output:
99
E0L
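For comparison, the capturing approach mentioned at the top of this answer could look roughly like the following. It is a sketch that assumes the whole input must match the pattern; the class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaptureParts {
    // Group 1: two digits; group 2: three word characters.
    private static final Pattern PARTS = Pattern.compile("(\\d{2})(\\w{3})");

    // Returns the two captured parts, or null if the input does not
    // match the pattern as a whole.
    static String[] parts(String input) {
        Matcher m = PARTS.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        for (String s : parts("99E0L")) {
            System.out.println(s);   // prints 99, then E0L
        }
    }
}
```

Unlike the split version, this also validates the input: a string such as "9E0L" is rejected instead of being returned unsplit.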
Basically there are some images in my folder called Patterns. All images are in png file format.
Below is the code I'm using:
import java.io.File;
public class IMG_List {
    public static void main(String[] args) {
        File file = new File("C:/images/Patterns");
        String[] str = file.list();
        for (String f_name : str) {
            String[] str_name = f_name.split(".");
            System.out.println(str_name[0]);
        }
    }
}
When I use the above code I get:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 0
at IMG_List.main(IMG_List.java:11)
However, when I use the following code I get no error:
import java.io.File;
public class IMG_List {
    public static void main(String[] args) {
        File file = new File("C:/images/Patterns");
        String[] str = file.list();
        for (String f_name : str) {
            String[] str_name = f_name.split("png");
            System.out.println(str_name[0]);
        }
    }
}
Why am I not able to split the string with the dot?
Thank you,
MMK.
The '.' character in regular expressions means any character, according to the Pattern javadocs.
. Any character (may or may not match line terminators)
So every character is treated as a delimiter, and you get a bunch of empty strings in between the characters. But the one-argument split method discards trailing empty strings, and they're all empty, so you get a 0-length array, which explains the exception you received.
You must escape the '.' character with a backslash. To create a backslash character, you must escape the backslash itself for Java. Try
String[] str_name = f_name.split("\\.");
Then you'll get 2 elements in your array, e.g. "example" and "png" for a file named example.png (File.list() returns bare file names, not full paths).
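The behaviour described above is easy to check for yourself. The snippet below is a small demonstration, using a hypothetical file name and a made-up helper that simply counts the pieces split returns:

```java
class DotSplitDemo {
    // Number of pieces returned by split for the given regex and limit
    // (a limit of 0 behaves like the one-argument split).
    static int pieces(String name, String regex, int limit) {
        return name.split(regex, limit).length;
    }

    public static void main(String[] args) {
        String name = "example.png";   // hypothetical file name

        // Unescaped dot: every character is a delimiter, all pieces are
        // empty, and trailing empty strings are dropped.
        System.out.println(pieces(name, ".", 0));     // 0

        // A negative limit keeps the empty pieces.
        System.out.println(pieces(name, ".", -1));    // 12

        // Escaped dot: splits on the literal '.' only.
        System.out.println(pieces(name, "\\.", 0));   // 2
    }
}
```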
You have to escape the dot so that it is treated as a literal character, since the split function accepts a regexp:
public String[] split(String regex)
Use \\. in the regexp to represent a dot, because an unescaped . means any character.
You have to escape the dot:
String[] str_name = f_name.split("\\.");
If all the images are in PNG format, then you can also use String.substring()
String st_name = f_name.substring(0,f_name.length()-4);
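If the extensions are not all the same length, a variant of the substring idea is to cut at the last dot instead of a fixed offset. This is a sketch with an invented helper name, assuming that a name without a dot (or with only a leading dot) should be returned unchanged:

```java
class FileNames {
    // Returns the name without its last extension; names with no dot,
    // or only a leading dot such as ".gitignore", are returned unchanged.
    static String baseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    public static void main(String[] args) {
        System.out.println(baseName("example.png"));     // example
        System.out.println(baseName("archive.tar.gz"));  // archive.tar
        System.out.println(baseName("README"));          // README
    }
}
```

No regular expression is involved here, so there is nothing to escape.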
How can I split the following sentence into an array?
That's the code
into
array
0 That
1 s
2 the
3 code
I tried something like this
String str = "That's the code";
String[] strs = str.split("\\'");
for (String sstr : strs) {
System.out.println(sstr);
}
But the output is
That
s the code
To specifically split on white space and the apostrophe:
public class Split {
    public static void main(String[] args) {
        String[] tokens = "That's the code".split("[\\s']");
        for (String s : tokens) {
            System.out.println(s);
        }
    }
}
or to split on any non-word character:
public class Split {
    public static void main(String[] args) {
        String[] tokens = "That's the code".split("[\\W]");
        for (String s : tokens) {
            System.out.println(s);
        }
    }
}
The best solution I've found to split by words if your string contains accented letters is:
String[] listeMots = phrase.split("\\P{L}+");
For instance, if your String is
String phrase = "Salut mon homme, comment ça va aujourd'hui? Ce sera Noël puis Pâques bientôt.";
Then you will get the following words (enclosed within quotes and comma separated for clarity) :
"Salut", "mon", "homme", "comment", "ça", "va", "aujourd", "hui", "Ce",
"sera", "Noël", "puis", "Pâques", "bientôt".
Hope this helps!
You can split on non-word characters:
String str = "That's the code";
String[] splitted = str.split("[\\W]");
For your input, output will be:
That
s
the
code
You can split by a regex that would be one of the two characters - quote or space:
String[] strs = str.split("['\\s]");
You could first replace the ' with " " (blank space), using str.replaceAll("'", " "), and then split the string on the blank space separator, using str.split(" "). Alternatively, you could use a regular expression to split on ' OR space.
If you want to split on non-alphabetic characters:
String str = "That's the code";
String[] strs = str.split("\\P{Alpha}+");
for (String sstr : strs) {
System.out.println(sstr);
}
\P{Alpha} matches any non-alphabetic character; \p{Alpha} is one of the POSIX character classes, which in Java cover only ASCII letters by default. The + indicates that we should split on any continuous run of such characters.
and the output will be
That
s
the
code
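Because the POSIX classes are ASCII-only by default, \P{Alpha} and \P{L} behave differently once accented letters appear. The comparison below is a small sketch; the method names are invented, and the accented letter is written as a Unicode escape only to keep the source ASCII.

```java
import java.util.Arrays;

class AlphaVersusLetter {
    // ASCII-only notion of "alphabetic".
    static String[] splitAscii(String s) {
        return s.split("\\P{Alpha}+");
    }

    // Unicode notion of "letter".
    static String[] splitUnicode(String s) {
        return s.split("\\P{L}+");
    }

    public static void main(String[] args) {
        // "Joyeux Noel", with the e of the second word carrying a diaeresis.
        String s = "Joyeux No\u00ebl";
        // Three pieces: the accented letter acts as a delimiter.
        System.out.println(Arrays.toString(splitAscii(s)));
        // Two pieces: both words stay intact.
        System.out.println(Arrays.toString(splitUnicode(s)));
    }
}
```

For plain English input such as "That's the code" the two patterns give the same result.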
You can use OR in regular expression
public static void main(String[] args) {
    String str = "That's the code";
    String[] strs = str.split("'|\\s");
    for (String sstr : strs) {
        System.out.println(sstr);
    }
}
The string will be split by single quote (') or space. The single quote doesn't need to be escaped. The output would be
run:
That
s
the
code
BUILD SUCCESSFUL (total time: 0 seconds)
split uses regex, and in regex ' is not a special character, so you don't need to escape it with \. To represent whitespace you can use \s (which in a String literal needs to be written as "\\s"). Also, to match any one of a set of characters you can use the "OR" operator |, like a|b|c|d, or just a character class [abcd], which matches the same thing as (a|b|c|d).
To make things simple you can use
String[] strs = str.split("'| ");
or
String[] strs = str.split("'|\\s");//to include all whitespaces
or
String[] strs = str.split("['\\s]");//equivalent of "'|\\s"
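One thing to watch with these single-character patterns: when two delimiters are adjacent (for example a comma followed by a space), an empty string appears between them. Adding + to the character class treats a run of delimiters as one. A short sketch with hypothetical method names and a made-up input:

```java
import java.util.Arrays;

class DelimiterRuns {
    // Every single delimiter character splits on its own.
    static String[] splitEach(String s) {
        return s.split("['\\s,]");
    }

    // A run of delimiter characters counts as one split point.
    static String[] splitRuns(String s) {
        return s.split("['\\s,]+");
    }

    public static void main(String[] args) {
        String s = "That's the, code";
        System.out.println(Arrays.toString(splitEach(s)));  // [That, s, the, , code]
        System.out.println(Arrays.toString(splitRuns(s)));  // [That, s, the, code]
    }
}
```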
I've got a string '123' (yes, it's a string in my program). Could anyone explain, when I use this method:
String[] str1Array = str2.split(" ");
Why do I get str1Array[0]='123' rather than str1Array[0]=1?
str2 does not contain any spaces, therefore split copies the entire contents of str2 to the first index of str1Array.
You would have to do:
String str2 = "1 2 3";
String[] str1Array = str2.split(" ");
Alternatively, to find every character in str2 you could do:
for (char ch : str2.toCharArray()){
System.out.println(ch);
}
You could also assign it to the array in the loop.
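Try this to split a string into its individual characters:
str2.split("");
Output (Java 7 and earlier):
[, 1, 2, 3]
On those versions it returns an empty first value; since Java 8 a zero-width match at the start no longer produces it. To avoid the empty value on any version: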
str2.split("(?!^)");
Output :
[1, 2, 3]
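Put together as a runnable sketch (the class and method names are invented for the example), next to the toCharArray() alternative for when a char[] is good enough:

```java
import java.util.Arrays;

class SingleCharacters {
    // One-character strings; (?!^) matches at every position
    // except the very start of the input.
    static String[] chars(String s) {
        return s.split("(?!^)");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(chars("123")));          // [1, 2, 3]
        System.out.println(Arrays.toString("123".toCharArray()));   // [1, 2, 3]
    }
}
```

The first call yields a String[] and the second a char[], so which one fits depends on what the rest of the program expects.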
The regular expression that you pass to split() must have a match in the string; the string is split at the places where a match is found. Here you are passing " ", which is not found in '123', hence no split happens.
Because there's no space in your String.
If you want single chars, try char[] characters = str2.toCharArray()
Simple: you are trying to split the string by space, and in your string "123" there is no space.
This is because the split() method splits the string wherever the pattern given as its parameter matches.
The matched characters are removed, and a new String is formed each time a match is found.
String[] strs = "123".split(" ");
The String "123" does not contain the character " " (space) and therefore cannot be split apart. So what is returned is an array with a single item: { "123" }.
To do the split you need a delimiter; in this case, insert a "," between the digits:
public static void main(String[] args) {
    // Put a comma before every digit, drop the leading comma, then split on commas.
    String[] list = "123456".replaceAll("(\\d)", ",$1").substring(1)
            .split(",");
    for (String string : list) {
        System.out.println(string);
    }
}
Try this:
String str = "123";
String[] res = str.split("");
will return an array with the following elements:
1,2,3