How to use split function when input is new line? - java

The task is to split a string and report how many words it contains.
Scanner in = new Scanner(System.in);
String st = in.nextLine();
String[] tokens = st.split("[\\W]+");
When I give just a newline (an empty line) as input and print the number of tokens, I get 1, but I want 0. What should I do? Here every symbol is a delimiter.

Short answer: To get the tokens in str (determined by whitespace separators), you can do the following:
String str = ... //some string
str = str.trim() + " "; //modify the string for the reasons described below
String[] tokens = str.split("\\s+");
Longer answer:
First of all, the argument to split() is the delimiter - in this case one or more whitespace characters, which is "\\s+".
If you look carefully at the Javadoc of String#split(String, int) (which is what String#split(String) calls), you will see why it behaves like this.
If the expression does not match any part of the input then the resulting array has just one element, namely this string.
This is why "".split("\\s+") would return an array with one empty string [""], so you need to append the space to avoid this. " ".split("\\s+") returns an empty array with 0 elements, as you want.
When there is a positive-width match at the beginning of this string then an empty leading substring is included at the beginning of the resulting array.
This is why " a".split("\\s+") would return ["", "a"], so you need to trim() the string first to remove whitespace from the beginning.
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Since String#split(String) calls String#split(String, int) with the limit argument of zero, you can add whitespace to the end of the string without changing the number of words (because trailing empty strings will be discarded).
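The three documented rules quoted above can be checked directly. The following is a minimal sketch; the class name `SplitRules` and the helper `count` are made up for illustration and are not part of the original answer.

```java
class SplitRules {
    // Hypothetical helper: number of elements returned when splitting on whitespace runs.
    static int count(String s) {
        return s.split("\\s+").length;
    }

    public static void main(String[] args) {
        System.out.println(count(""));   // 1: no match, so the array holds the input itself
        System.out.println(count(" "));  // 0: only empty strings remain, and all are discarded
        System.out.println(count(" a")); // 2: the leading empty string is kept
        System.out.println(count("a ")); // 1: the trailing empty string is discarded
    }
}
```

The second and fourth cases are why appending a space is harmless, and the third is why the leading whitespace has to be removed first.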
UPDATE:
If the delimiter is "\\W+", it's slightly different because you can't use trim() for that:
String str = ...
str = str.replaceAll("^\\W+", "") + " ";
String[] tokens = str.split("\\W+");
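Wrapped into a method, the `\\W+` variant might look like the sketch below. The class name `WordCount` and the method name `wordCount` are assumptions chosen for the example; a "word" here means a run of the characters that `\w` matches (letters, digits and underscore).

```java
class WordCount {
    // Hypothetical helper: counts runs of word characters in a line.
    static int wordCount(String line) {
        // Strip leading delimiters so split() does not produce an empty first element,
        // then append a space so an empty line becomes " " and yields zero tokens.
        String cleaned = line.replaceAll("^\\W+", "") + " ";
        return cleaned.split("\\W+").length;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(""));              // 0
        System.out.println(wordCount("  ,;"));          // 0
        System.out.println(wordCount("hello, world!")); // 2
    }
}
```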

public static void main(String[] args) {
Scanner in = new Scanner(System.in);
String line = null;
while (!(line = in.nextLine()).isEmpty()) {
//logic
}
System.out.print("Empty Line");
}
output
Empty Line

Related

How do I escape parentheses in java 7?

I'm trying to split some input from BufferedReader.readLine()
String delimiters = " ,()";
String[] s = in.readLine().split(delimiters);
This gives me a runtime error.
Things I have tried that don't work:
String delimiters = " ,\\(\\)";
String delimiters = " ,[()]";
String[] s = in.readLine().split(Pattern.quote("() ,"));
I tried replacing the () using .replaceAll, didn't work
I tried this:
input = input.replaceAll(Pattern.quote("("), " ");
input = input.replaceAll(Pattern.quote(")"), " ");
input = input.replaceAll(Pattern.quote(","), " ");
String[] s = input.split(" ");
but s[] ends up with blank slots that look like this -> "", and I have no clue why it's doing that.
Mine works with:
String delimiters = "[ \\(\\)]";
Edit:
You forgot the square brackets, which mean "any of the characters inside the brackets will be used as a delimiter". It's a regex character class.
Edit:
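To remove the empty elements: the idea is to replace any combination of the delimiters with just one delimiter.
Like this:
// regex intended to match any combination of the given set of delimiters in square brackets
String r = "(?!.*(.).*\\1)[ \\(\\)]"; // the backreference must be written \\1 in a Java string literal; "\1" is an octal escape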
input = input.replaceAll(r, "(");
// this will result in having double or more combinations of a single delimiter, so replace them with just one
input = input.replaceAll("[(]+", "(");
Then you will have the input, with any single delimiter. Then use the split, it will not have any blank words.
From your comment:
but I am only input 1 line: (1,3), (6,5), (2,3), (9,1) and I need 13652391 so s[0] = 1, s[1]=3, ... but I get s[0] = "" s[1] = "" s[2] = 1
You get that because your delimiters are " ", ",", "(" and ")", so the string is split at every single delimiter, even when there are no other characters between two of them, in which case the piece between them is an empty string.
There is an easy fix to this problem, just remove the empty elements!
List<String> list = Arrays.stream(
"(1,3), (6,5), (2,3), (9,1)".split("[(), ]")).filter(x -> !x.isEmpty())
.collect(Collectors.toList());
But then you get a List as the result instead of an array.
Another way to do this, is to replace "[(), ]" with "":
String result = "(1,3), (6,5), (2,3), (9,1)".replaceAll("[(), ]", "");
This will give you a string as a result. But from the comment I'm not sure whether you wanted a string or not. If you want an array, just call .split("") and it will be split into individual characters.
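If an array is wanted without going through a stream, one possible alternative is to split on runs of delimiters instead of single delimiters. This is only a sketch: the class name `PairDigits` and the method `tokens` are invented for the example, and it assumes the delimiter set is exactly space, comma and the two parentheses.

```java
import java.util.Arrays;

class PairDigits {
    // Hypothetical helper: returns the non-delimiter pieces of the input.
    static String[] tokens(String input) {
        // Remove leading delimiters so there is no empty first element,
        // then treat each run of delimiters as a single separator.
        return input.replaceAll("^[(), ]+", "").split("[(), ]+");
    }

    public static void main(String[] args) {
        // Prints [1, 3, 6, 5, 2, 3, 9, 1]
        System.out.println(Arrays.toString(tokens("(1,3), (6,5), (2,3), (9,1)")));
    }
}
```

The trailing ")" needs no special handling because split() with the default limit discards trailing empty strings.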

Why does split() produce an extra empty element when the limit is set to -1?

I want to split the area code and the number that follows it out of a telephone number, without the brackets, so I did this:
String pattern = "[\\(?=\\)]";
String b = "(079)25894029".trim();
String c[] = b.split(pattern,-1);
for (int a = 0; a < c.length; a++)
System.out.println("c[" + a + "]::->" + c[a] + "\nLength::->"+ c[a].length());
Output:
c[0]::->
Length::->0
c[1]::->079
Length::->3
c[2]::->25894029
Length::->8
Expected Output:
c[0]::->079
Length::->3
c[1]::->25894029
Length::->8
So my question is: why does split() produce an extra blank at the start, e.g.
[, 079, 25894029]? Is this its normal behavior, or did I do something wrong here?
How can I get my expected outcome?
First you have unnecessary escaping inside your character class. Your regex is same as:
String pattern = "[(?=)]";
Now, you are getting an empty first element because ( is the very first character in the string, and a split at position 0 leaves an empty string in front of the delimiter.
To avoid that result use this code:
String str = "(079)25894029";
String[] toks = (Character.isDigit(str.charAt(0)) ? str : str.substring(1)).split("[(?=)]");
for (String tok: toks)
System.out.printf("<<%s>>%n", tok);
Output:
<<079>>
<<25894029>>
From the Java8 Oracle docs:
When there is a positive-width match at the beginning of this string
then an empty leading substring is included at the beginning of the
resulting array. A zero-width match at the beginning however never
produces such empty leading substring.
You can check whether the first element of the result is an empty string and, if it is, drop that element.
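One way to do that check is sketched below. The class name `DropLeadingEmpty` and the method `splitPhone` are hypothetical, and the delimiter pattern is simplified to `[()]` on the assumption that only the parentheses matter.

```java
import java.util.Arrays;

class DropLeadingEmpty {
    // Hypothetical helper: splits on parentheses and removes an empty first element, if any.
    static String[] splitPhone(String s) {
        String[] parts = s.split("[()]");
        if (parts.length > 0 && parts[0].isEmpty()) {
            // Copy everything except index 0 into a new, shorter array.
            return Arrays.copyOfRange(parts, 1, parts.length);
        }
        return parts;
    }

    public static void main(String[] args) {
        // Prints [079, 25894029]
        System.out.println(Arrays.toString(splitPhone("(079)25894029")));
    }
}
```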
Your regex has problems, as does your approach - you can't solve it using your approach with any regex. The magic one-liner you seek is:
String[] c = b.replaceAll("^\\D+|\\D+$", "").split("\\D+");
This removes all leading/trailing non-digits, then splits on non-digits. This will handle many different formats and separators (try a few yourself).
See live demo of this:
String b = "(079)25894029".trim();
String[] c = b.replaceAll("^\\D+|\\D+$", "").split("\\D+");
System.out.println(Arrays.toString(c));
Producing this:
[079, 25894029]

Add brackets to sequence of chars in string

I need to put a sequence of characters in a String in brackets in such way that it would choose the longest substring as the optimal to put in brackets. To make it clear because it is too complicated to explain with words:
If my input is:
'these are some chars *£&$'
'these are some chars *£&$^%(((£'
the output in both inputs respectively should be:
'these are some chars (*£&$)'
'these are some chars (*£&$^%)(((£'
so I would like to put in brackets the sequence *£&$^% IF it exists otherwise put in brackets just *£&$
I hope it makes sense!
In the general case, this method works. It surrounds the longest substring of the keyword that occurs in the given String, at its first occurrence:
public String bracketize() {
String chars = ...; // you can put whatever input (such as 'these are some chars *£&$')
String keyword = ...; // you can put whatever keyword (such as *£&$^%)
String longest = "";
for(int i=0;i<keyword.length();i++) { // run i up to the last index so the final one-character substring is checked too
for(int j=keyword.length(); j>i; j--) {
String tempString = keyword.substring(i,j);
if(chars.indexOf(tempString) != -1 && tempString.length()>longest.length()) {
longest = tempString;
}
}
}
if(longest.length() == 0)
return chars; // no possible substring of keyword exists in chars, so just return chars
String bracketized = chars.substring(0,chars.indexOf(longest))+"("+longest+")"+chars.substring(chars.indexOf(longest)+longest.length());
return bracketized;
}
The nested for loops check every possible substring of keyword and select the longest one that is contained in the bigger String, chars. For example, if the keyword is Dog, it will check the substrings "Dog", "Do", "D", "og", "o", and "g". It stores this longest possible substring in longest (which is initialized to the empty String). If the length of longest is still 0 after checking every substring, then no such substring of keyword can be found in chars, so the original String, chars, is returned. Otherwise, a new string is returned which is chars with the substring longest surrounded by brackets (parentheses).
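For reference, here is the same idea as a self-contained sketch that takes both strings as parameters. The class name `Bracketizer` and the two-argument signature are assumptions for the example, and the sample inputs use ASCII-only symbols rather than the ones from the question.

```java
class Bracketizer {
    // Wraps the longest substring of keyword that occurs in chars in parentheses.
    static String bracketize(String chars, String keyword) {
        String longest = "";
        for (int i = 0; i < keyword.length(); i++) {
            for (int j = keyword.length(); j > i; j--) {
                String candidate = keyword.substring(i, j);
                // Keep the candidate only if it is longer than the best so far and present in chars.
                if (candidate.length() > longest.length() && chars.contains(candidate)) {
                    longest = candidate;
                }
            }
        }
        if (longest.isEmpty()) {
            return chars; // no substring of keyword occurs in chars
        }
        int at = chars.indexOf(longest);
        return chars.substring(0, at) + "(" + longest + ")" + chars.substring(at + longest.length());
    }

    public static void main(String[] args) {
        System.out.println(bracketize("some chars *&$", "*&$^%"));      // some chars (*&$)
        System.out.println(bracketize("some chars *&$^%(((", "*&$^%")); // some chars (*&$^%)(((
    }
}
```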
Hope this helps, let me know if it works.
Try something like this (assuming target string only occurs once).
String input = "these are some chars *£&$";
String output = "";
String[] split;
if(input.indexOf("*£&$^%")!=(-1)){
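split = input.split(Pattern.quote("*£&$^%"), -1); // split() takes a regex, so quote the metacharacters; -1 keeps an empty tail (needs java.util.regex.Pattern)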
output = split[0]+"(*£&$^%)";
if(split.length>1){
output = output+split[1];
}
}else if(input.indexOf("*£&$")!=(-1)){
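split = input.split(Pattern.quote("*£&$"), -1); // split() takes a regex, so quote the metacharacters; -1 keeps an empty tail (needs java.util.regex.Pattern)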
output = split[0]+"(*£&$)";
if(split.length>1){
output = output+split[1];
}
}else{
System.out.println("does not contain either string");
}

Convert a string to an array of strings

If I have:
Scanner input = new Scanner(System.in);
System.out.println("Enter an infixed expression:");
String expression = input.nextLine();
String[] tokens;
How do I scan the infix expression around spaces one token at a time, from left to right and put in into an array of strings? Here a token is defined as an operand, operator, or parentheses symbol.
Example: "3 + (9-2)" ==> tokens = [3][+][(][9][-][2][)]
String test = "13 + (9-2)";
List<String> allMatches = new ArrayList<String>();
Matcher m = Pattern.compile("\\d+|\\(|\\)|\\+|\\*|-|/")
.matcher(test);
while (m.find()) {
allMatches.add(m.group());
}
Can someone test this please?
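A self-contained version of the Matcher approach that returns a String[] could look like this sketch. The class name `InfixTokenizer` and the method `tokenize` are made up for the example; the pattern is the same set of tokens written with a character class, and anything it does not recognise (including spaces) is skipped silently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InfixTokenizer {
    // One token is either a run of digits or a single operator or parenthesis.
    private static final Pattern TOKEN = Pattern.compile("\\d+|[-+*/()]");

    static String[] tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expression);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Prints [13, +, (, 9, -, 2, )]
        System.out.println(Arrays.toString(tokenize("13 + (9-2)")));
    }
}
```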
I think it would be easiest to read the line into one string and then split it on spaces. There is a handy String method, split, that does this for you.
String[] tokens = expression.split(" ");
It's probably overkill for your example, but in case it gets more complex, take a look at JavaCC, the Java Compiler Compiler. JavaCC allows you to create a parser in Java based on a grammar definition.
Be aware that it is not an easy tool to get started with. However, the grammar definition will be much easier to read than the corresponding regular expressions.
If tokens[] must be a String array, you can use this:
String ex="3 + (9-2)";
String tokens[];
StringTokenizer tok=new StringTokenizer(ex);
String line="";
while(tok.hasMoreTokens())line+=tok.nextToken();
tokens=new String[line.length()];
for(int i=1;i<line.length()+1;i++)tokens[i-1]=line.substring(i-1,i);
tokens can also be a char array:
String ex="3 + (9-2)";
char tokens[];
StringTokenizer tok=new StringTokenizer(ex);
String line="";
while(tok.hasMoreTokens())line+=tok.nextToken();
tokens=line.toCharArray();
This (IMHO elegant) single line of code works (tested):
String[] tokens = input.split("(?<=[^ ])(?<!\\B) *");
This regex also caters for input containing multi-character numbers (e.g. 123), which would be split into separate characters but for the negative look-behind for a non-word boundary (?<!\\B).
The first look-behind (?<=[^ ]) prevents a split at the start of the input, which would produce an initial blank string.
The final part of the regex, " *", ensures that spaces are consumed.

Java. How to remove white space on array

For example
I split the string "+name" by +. I got a white space " " and "name" in the array (this doesn't happen if my string is "name+").
t="+name";
String[] temp=t.split("\\+");
the above code produces
temp[0]=" "
temp[1]=name
I only want to get "name", without the whitespace.
Also, if t="name+" then temp[0]=name. What is the difference between name+ and +name? Why do I get different output?
Simply loop through the items in the array, as below, and trim the whitespace from each one:
for (int i = 0; i < temp.length; i++) {
if (temp[i] != null) {
temp[i] = temp[i].trim();
}
}
The value of the first array item is not a space (" ") but an empty string (""). The following snippet demonstrates the behaviour and provides a workaround: I simply strip leading delimiters from the input. Note, that this should never be used for processing csv files, because a leading delimiter will create an empty column value which is usually wanted.
for (String s : "+name".split("\\+")) {
System.out.printf("'%s'%n", s);
}
System.out.println();
for (String s : "name+".split("\\+")) {
System.out.printf("'%s'%n", s);
}
System.out.println();
for (String s : "+name".replaceAll("^\\+", "").split("\\+")) {
System.out.printf("'%s'%n", s);
}
You get the extra element in the "+name" case because there is a non-empty value, "name", after the delimiter.
The split() function only "trims" the trailing delimiters that result in empty elements at the end of the array. See the Java SE documentation.
Examples of .split("\\+") output:
"+++++" = { } // zero length array because all are trailing delimiters
"+name+" = { "", "name" } // trailing delimiter removed
"name+++++" = { "name" } // trailing delimiter removed
"name+" = { "name" } // trailing delimiter removed
"++name+" = { "", "", "name" } // trailing delimiter removed
I would suggest preventing those extra delimiters at both ends in the first place, rather than cleaning up afterwards.
To remove whitespace, use
str.replaceAll("\\s", ""). Note that "\\W" would also remove the + itself, leaving nothing to split on.
String yourString = "name +";
yourString = yourString.replaceAll("\\s", "");
String[] yourArray = yourString.split("\\+");
For a one liner :
String temp[] = t.replaceAll("(^\\++)?(\\+)?(\\+*)?", "$2").split("\\+");
This will replace all multiple plus signs by one, or a plus sign at the start by empty String, and then split on plus signs.
Which will basically eliminate empty Strings in the result.
split(String regex) is equivalent to split(String regex, int limit) with limit = 0. And the documentation of the latter states :
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Which is why a '+' at the start works differently than a '+' at the end
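The effect of the limit argument can be seen by counting the elements for both inputs. This is a small sketch; the class name `SplitLimit` and the helper `parts` are invented for the example.

```java
class SplitLimit {
    // Hypothetical helper: number of elements when splitting on '+' with the given limit.
    static int parts(String s, int limit) {
        return s.split("\\+", limit).length;
    }

    public static void main(String[] args) {
        System.out.println(parts("+name", 0));  // 2: the leading empty string is kept
        System.out.println(parts("name+", 0));  // 1: the trailing empty string is discarded
        System.out.println(parts("name+", -1)); // 2: a negative limit keeps trailing empty strings
    }
}
```

With a negative limit both inputs behave symmetrically, which shows that the asymmetry comes only from the trailing-empty-string rule.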
You might want to give Guava's Splitter a try. It has a nice fluent API to deal with empty strings, trim(), etc.
@Test
public void test() {
final String t1 = "+name";
final String t2 = "name+";
assertThat(split(t1), hasSize(1));
assertThat(split(t1).get(0), is("name"));
assertThat(split(t2), hasSize(1));
assertThat(split(t2).get(0), is("name"));
}
private List<String> split(final String sequence) {
final Splitter splitter = Splitter.on("+").omitEmptyStrings().trimResults();
return Lists.newArrayList(splitter.split(sequence));
}
