Convert a string to an array of strings - java

If I have:
Scanner input = new Scanner(System.in);
System.out.println("Enter an infixed expression:");
String expression = input.nextLine();
String[] tokens;
How do I scan the infix expression around spaces, one token at a time, from left to right, and put it into an array of strings? Here a token is defined as an operand, operator, or parenthesis symbol.
Example: "3 + (9-2)" ==> tokens = [3][+][(][9][-][2][)]

String test = "13 + (9-2)";
List<String> allMatches = new ArrayList<String>();
Matcher m = Pattern.compile("\\d+|\\(|\\)|\\+|\\*|-|/")
.matcher(test);
while (m.find()) {
allMatches.add(m.group());
}
Can someone test this please?
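For anyone who wants to try the matcher approach above, here is a minimal runnable sketch. The class name InfixTokenizer and the method tokenize are made up for illustration, and the pattern is a slightly condensed version of the one in the answer (it uses a character class for the single-character tokens). It assumes operands are non-negative integers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InfixTokenizer {
    // A token is a run of digits, a parenthesis, or one of the operators + - * /
    private static final Pattern TOKEN = Pattern.compile("\\d+|[-+*/()]");

    // Scans the expression left to right and returns the tokens in order.
    // Whitespace and any unrecognized characters are skipped.
    static String[] tokenize(String expression) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(expression);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // prints: 13 + ( 9 - 2 )
        System.out.println(String.join(" ", tokenize("13 + (9-2)")));
    }
}
```

Collecting into a List first and converting at the end avoids having to know the number of tokens up front.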

I think it would be easiest to read the line into one string and then split it on spaces. There is a handy String method, split, that does this for you. Note that this only separates tokens that have a space between them.
String[] tokens = expression.split(" ");

It's probably overkill for your example, but in case it gets more complex, take a look at JavaCC, the Java Compiler Compiler. JavaCC allows you to create a parser in Java based on a grammar definition.
Be aware that it is not an easy tool to get started with. However, the grammar definition will be much easier to read than the corresponding regular expressions.

If tokens[] must be a String array, you can use this:
String ex="3 + (9-2)";
String tokens[];
StringTokenizer tok=new StringTokenizer(ex);
String line="";
while(tok.hasMoreTokens())line+=tok.nextToken();
tokens=new String[line.length()];
for(int i=1;i<line.length()+1;i++)tokens[i-1]=line.substring(i-1,i);
If tokens can be a char array instead:
String ex="3 + (9-2)";
char tokens[];
StringTokenizer tok=new StringTokenizer(ex);
String line="";
while(tok.hasMoreTokens())line+=tok.nextToken();
tokens=line.toCharArray();

This (IMHO elegant) single line of code works (tested):
String[] tokens = input.split("(?<=[^ ])(?<!\\B) *");
This regex also caters for input containing multi-character numbers (e.g. 123), which would be split into separate characters but for the negative look-behind for a non-word boundary (?<!\\B).
The first look-behind (?<=[^ ]) prevents an initial blank string split at the start of the input.
The final part of the regex " *" ensures spaces are consumed.

Related

How to use split function when input is new line?

The task is to split the string and report how many words it contains.
Scanner in = new Scanner(System.in);
String st = in.nextLine();
String[] tokens = st.split("[\\W]+");
When I gave an empty line as input and printed the number of tokens, I got one, but I want zero. What should I do? Here the delimiters are all the symbols.
Short answer: To get the tokens in str (determined by whitespace separators), you can do the following:
String str = ... //some string
str = str.trim() + " "; //modify the string for the reasons described below
String[] tokens = str.split("\\s+");
Longer answer:
First of all, the argument to split() is the delimiter - in this case one or more whitespace characters, which is "\\s+".
If you look carefully at the Javadoc of String#split(String, int) (which is what String#split(String) calls), you will see why it behaves like this.
If the expression does not match any part of the input then the resulting array has just one element, namely this string.
This is why "".split("\\s+") would return an array with one empty string [""], so you need to append the space to avoid this. " ".split("\\s+") returns an empty array with 0 elements, as you want.
When there is a positive-width match at the beginning of this string then an empty leading substring is included at the beginning of the resulting array.
This is why " a".split("\\s+") would return ["", "a"], so you need to trim() the string first to remove whitespace from the beginning.
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded.
Since String#split(String) calls String#split(String, int) with the limit argument of zero, you can add whitespace to the end of the string without changing the number of words (because trailing empty strings will be discarded).
UPDATE:
If the delimiter is "\\W+", it's slightly different because you can't use trim() for that:
String str = ...
str = str.replaceAll("^\\W+", "") + " ";
String[] tokens = str.split("\\W+");
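The split() edge cases described above can be checked with a short sketch. The class name SplitEdgeCases and the helper countWords are made up for illustration, and countWords uses an explicit emptiness check after trim() instead of the append-a-space trick, which is a different route to the same result.

```java
class SplitEdgeCases {
    // Counts whitespace-separated words; returns 0 for empty or blank input.
    static int countWords(String str) {
        String trimmed = str.trim();
        if (trimmed.isEmpty()) {
            return 0; // avoid "".split(...), which yields one empty token
        }
        return trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        // No match at all: the result is the original (empty) string, so length 1
        System.out.println("".split("\\s+").length);
        // One match, only empty pieces: trailing empty strings are discarded, so length 0
        System.out.println(" ".split("\\s+").length);
        // Match at the beginning: a leading empty string is kept, so length 2
        System.out.println(" a".split("\\s+").length);
        System.out.println(countWords(""));
        System.out.println(countWords("  two   words "));
    }
}
```

The explicit check reads a little more directly than modifying the input, at the cost of one extra branch.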
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
String line = null;
while (!(line = in.nextLine()).isEmpty()) {
//logic
}
System.out.print("Empty Line");
}
output
Empty Line

Replacing substrings in a string Java

I'm trying to replace multiple substrings in a string, for example I have the following string wordlist
one two three
Where I want to replace \t tab characters with \r\n new line characters.
I define the separator variable as \n and replacement variable as \r\n.
Then I use wordlist = wordlist.replaceAll(separator, replacement); to replace all the characters, but when I display the wordlist again, it gives me the following result
onerntwornthree
I also tried splitting the wordlist by the substring separator into an array and then joining it again word by word into a new string separated by the replacement, but then it just gave me a result as
one\r\ntwo\r\nthree
Does anybody know how to solve this problem? In case you need it, here's the whole code:
System.out.print("Separator to replace: ");
separator = scanner.next( );
System.out.print("Replacement for separator: ");
replacement = scanner.next( );
wordlist = wordlist.replaceAll(separator, replacement);
Your input character for tab seems to be incorrect.
This code:
String wordlist = "one\ttwo\tthree";
wordlist = wordlist.replaceAll("\t", "\r\n");
System.out.println(wordlist);
gives this output:
one
two
three
What you probably want to do is split the string and then write the different lines one at a time to a PrintStream. That way you can use println.
Java is a platform independent language, and new lines are platform dependent. Making use of PrintStream.println will make sure your code is portable.
Why do you set the separator to \n? It should be \t, I assume.
The following code works fine on JDoodle:
String s = "one\ttwo\tthree";
s = s.replaceAll("\t","\r\n");
System.out.println(s);
EDIT
The reason why this doesn't work is because you query the user for the separator and when he enters \t, this is a string with the first character \ and the second t and not an escape character.
You should use StringEscapeUtils.unescapeJava first.
Thus:
Scanner sc = new Scanner(System.in);
String separator = sc.nextLine();
separator = StringEscapeUtils.unescapeJava(separator);
String s = "one\ttwo\tthree";
s = s.replaceAll(separator,"\r\n");
System.out.println(s);
If org.apache.commons.lang.StringEscapeUtils is not available, you can do this explicitly:
Scanner sc = new Scanner(System.in);
String separator = sc.nextLine();
separator = separator.replace("\\t", "\t"); // literal replace: a typed backslash + t becomes a real tab
String s = "one\ttwo\tthree";
s = s.replaceAll(separator,"\r\n");
System.out.println(s);
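The same idea can be applied to both the separator and the replacement. A typed replacement such as \r\n also needs unescaping, because replaceAll treats a backslash in the replacement string as an escape for the next character, which is how \r\n ends up as rn. The sketch below is only an illustration: the class SeparatorReplace and its methods are hypothetical names, only \t, \n and \r are handled, and an escaped backslash is not supported. It uses String.replace, which does a literal (non-regex) replacement.

```java
class SeparatorReplace {
    // Turns the two-character sequences \t, \n and \r, as typed by a user,
    // into the control characters they stand for. Other text is left alone.
    static String unescape(String typed) {
        return typed.replace("\\t", "\t")
                    .replace("\\n", "\n")
                    .replace("\\r", "\r");
    }

    // Literal (non-regex) replacement of every separator with the replacement.
    static String replaceSeparator(String text, String typedSeparator, String typedReplacement) {
        return text.replace(unescape(typedSeparator), unescape(typedReplacement));
    }

    public static void main(String[] args) {
        String wordlist = "one\ttwo\tthree";
        // "\\t" and "\\r\\n" stand for what a user would type at the prompts
        System.out.println(replaceSeparator(wordlist, "\\t", "\\r\\n"));
    }
}
```

Because no regex is involved, separators such as | or . typed by the user are also treated as plain text.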

StringTokenizer delimiters for each Character

I've got a string that I'm supposed to use StringTokenizer on for a course. I've got my plan on how to implement the project, but I cannot find any reference as to how I will make the delimiter each character.
Basically, I need to divide a String such as "Hippo Campus is a party place" into one token per character, then compare the tokens to a set of values and swap out a particular one with another. I know how to do everything else, but what would the delimiter be for separating each character?
If you really want to use StringTokenizer, you could do it like this:
String myStr = "Hippo Campus is a party place".replaceAll("", " ");
StringTokenizer tokens = new StringTokenizer(myStr," ");
Or you can even use split for this. The result will be a String array with one element per character.
String myStr = "Hippo Campus is a party place";
String [] chars = myStr.split("");
for(String str:chars ){
System.out.println(str);
}
Convert the String to an array. There is no delimiter for separating every single character, and it wouldn't make sense to use StringTokenizer to do that even if there was.
You can do something like:
char[] individualChars = someString.toCharArray();
Then iterate through that array like so:
for (char c : individualChars){
//do something with the chars.
}
You can convert the string into a char array:
char[] simpleArray = sampleString.toCharArray();
This will split the String to a set of characters. So you can do the operations which you have stated above.
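As a rough sketch of the compare-and-swap step on top of toCharArray(), assuming a single character is being swapped for another (the class CharSwap, the method swap, and the choice of 'p' and 'b' are made up for illustration):

```java
class CharSwap {
    // Replaces every occurrence of 'from' with 'to', one character at a time.
    static String swap(String text, char from, char to) {
        char[] chars = text.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == from) {
                chars[i] = to;
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        // prints: Hibbo Cambus is a barty blace
        System.out.println(swap("Hippo Campus is a party place", 'p', 'b'));
    }
}
```

Working on the char array in place avoids creating one String per character.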

Splitting strings based on a delimiter

I am trying to break apart a very simple collection of strings that come in the forms of
0|0
10|15
30|55
and so on. Essentially, numbers that are separated by pipes.
When I use Java's String split function with .split("|"), I get somewhat unpredictable results: white space in the first slot, and sometimes the number itself isn't where I thought it should be.
Can anybody please help and give me advice on how I can use a reg exp to keep ONLY the integers?
I was asked to give the code trying to do the actual split. So allow me to do that in hopes to clarify further my problem :)
String temp = "0|0";
String[] splitString = temp.split("|");
results
\n
0
|
0
I am trying to get
0
0
only. Forever grateful for any help ahead of time :)
I still suggest using split(); it discards trailing empty strings by default. If you get rid of the non-numeric characters in the string and keep only pipes and numbers, you can easily use split() to get what you want. Or you can pass multiple delimiters to split (in the form of a regex), and this should work:
String[] splited = yourString.split("[\\|\\s]+");
Alternatively, with a regex matcher (the lookahead also accepts the end of the string, so the last number is not missed):
import java.util.regex.*;
Pattern pattern = Pattern.compile("\\d+(?=[|\\s]|$)");
Matcher matcher = pattern.matcher(yourString);
while (matcher.find()) {
System.out.println(matcher.group());
}
The pipe symbol is special in a regexp (it marks alternatives), you need to escape it. Depending on the java version you are using this could well explain your unpredictable results.
class t {
    public static void main(String[] args) {
        String temp = "0|0";
        // "\\|" is the regex \| which matches a literal pipe
        String[] splitString = temp.split("\\|");
        for (int i = 0; i < splitString.length; i++)
            System.out.println("splitString[" + i + "] is " + splitString[i]);
    }
}
outputs
splitString[0] is 0
splitString[1] is 0
Note that one backslash is the regexp escape character, but because a backslash is also the escape character in java source you need two of them to push the backslash into the regexp.
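If the double backslash is hard to remember, Pattern.quote can build the escaped pattern instead. This is an alternative to writing "\\|" by hand, not what the answer above uses; the class PipeSplit and the method splitLiteral are hypothetical names for this sketch.

```java
import java.util.regex.Pattern;

class PipeSplit {
    // Splits on a literal delimiter by quoting it, so regex metacharacters
    // such as | . * are treated as plain text.
    static String[] splitLiteral(String text, String delimiter) {
        return text.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        for (String line : new String[] {"0|0", "10|15", "30|55"}) {
            String[] parts = splitLiteral(line, "|");
            int first = Integer.parseInt(parts[0]);
            int second = Integer.parseInt(parts[1]);
            System.out.println(first + " and " + second);
        }
    }
}
```

Quoting is also the safer choice when the delimiter comes from a variable rather than a literal in the source.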
You can replace the white space with pipes and then split on the (escaped) pipe.
String test = "0|0 10|15 30|55";
test = test.replace(" ", "|");
String[] result = test.split("\\|");
Hope this helps.
You can use StringTokenizer, passing the pipe as the delimiter.
String test = "0|0";
StringTokenizer st = new StringTokenizer(test, "|");
int firstNumber = Integer.parseInt(st.nextToken()); //will parse out the first number
int secondNumber = Integer.parseInt(st.nextToken()); //will parse out the second number
Of course you can always nest this inside of a while loop if you have multiple strings.
Also, you need to import java.util.StringTokenizer for this to work.
The pipe ('|') is a special character in regular expressions. It needs to be "escaped" with a '\' character if you want to use it as a regular character, unfortunately '\' is a special character in Java so you need to do a kind of double escape maneuver e.g.
String temp = "0|0";
String[] splitStrings = temp.split("\\|");
The Guava library has a nice class, Splitter, which is a much more convenient alternative to String.split(). The advantages are that you can choose to split the string on specific characters (like '|'), on specific strings, or with regexps, and you can choose what to do with the resulting parts (trim them, throw away empty parts, etc.).
For example you can call
Iterable<String> parts = Splitter.on('|').trimResults().omitEmptyStrings().split("0|0");
This should work for you:
([0-9]+)
Consider a scenario where we have read a line from a CSV or XLS file as a string and need to separate the columns into an array of strings, depending on the delimiter.
Below is a code snippet that does this:
{ ...
....
BufferedReader reader = new BufferedReader(new FileReader("your file"));
String stringLine = reader.readLine();
String[] splittedString = StringSplitToArray(stringLine, "\"");
...
....
}
public static String[] StringSplitToArray(String stringToSplit, String delimiter)
{
    StringBuffer token = new StringBuffer();
    Vector<String> tokens = new Vector<String>();
    char[] chars = stringToSplit.toCharArray();
    for (int i = 0; i < chars.length; i++) {
        // Assumed check: any character contained in 'delimiter' ends the current token
        if (delimiter.indexOf(chars[i]) != -1) {
            if (token.length() > 0) {
                tokens.addElement(token.toString());
                token.setLength(0);
                i++; // also skips the character that follows the delimiter
            }
        } else {
            token.append(chars[i]);
        }
    }
    if (token.length() > 0) {
        tokens.addElement(token.toString());
    }
    // convert the vector into an array
    String[] preparedArray = new String[tokens.size()];
    for (int i = 0; i < preparedArray.length; i++) {
        preparedArray[i] = tokens.elementAt(i);
    }
    return preparedArray;
}
The snippet above calls StringSplitToArray, which converts the line into a String array by splitting it on the delimiter passed to the method. The delimiter can be a comma (,) or a double quote (").
For more on this, follow this link : http://scrapillars.blogspot.in

Escape comma when using String.split

I'm trying to perform some super simple parsing of log files, so I'm using the String.split method like this:
String [] parts = input.split(",");
And it works great for input like:
a,b,c
Or
type=simple, output=Hello, repeat=true
Just to say something.
How can I escape the comma, so it doesn't match intermediate commas?
For instance, if I want to include a comma in one of the parts:
type=simple, output=Hello, world, repeate=true
I was thinking of something like:
type=simple, output=Hello\, world, repeate=true
But I don't know how to create the split to avoid matching the comma.
I've tried:
String [] parts = input.split("[^\,],");
But, well, it's not working.
You can solve it using a negative look behind.
String[] parts = str.split("(?<!\\\\), ");
Basically it says: split on each ", " that is not preceded by a backslash.
String str = "type=simple, output=Hello\\, world, repeate=true";
String[] parts = str.split("(?<!\\\\), ");
for (String s : parts)
System.out.println(s);
Output:
type=simple
output=Hello\, world
repeate=true
If you happen to be stuck with the non-escaped comma-separated values, you could do the following (similar) hack:
String[] parts = str.split(", (?=\\w+=)");
Which says split on each ", " which is followed by some word-characters and an =
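Building on the lookahead split, the parts can be turned into key/value pairs. This is only a sketch: the class KeyValueLine and the method parse are made-up names, and it assumes that no value itself contains ", " followed by a word and an equals sign.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyValueLine {
    // Splits on ", " only where a "key=" follows, then splits each part
    // at its first '='. Parts without a key before '=' are ignored.
    static Map<String, String> parse(String line) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String part : line.split(", (?=\\w+=)")) {
            int eq = part.indexOf('=');
            if (eq > 0) {
                fields.put(part.substring(0, eq), part.substring(eq + 1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        Map<String, String> fields = parse("type=simple, output=Hello, world, repeate=true");
        System.out.println(fields.get("output")); // prints: Hello, world
    }
}
```

A LinkedHashMap keeps the fields in the order they appear in the log line.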
I'm afraid, there's no perfect solution for String.split. Using a matcher for the three parts would work. In case the number of parts is not constant, I'd recommend a loop with matcher.find. Something like this maybe
final String s = "type=simple, output=Hello, world, repeat=true";
final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(?:,|$)");
final Matcher m = p.matcher(s);
while (m.find()) System.out.println(m.group(1));
You'll probably want to skip the spaces after the comma as well:
final Pattern p = Pattern.compile("((?:[^\\\\,]|\\\\.)*)(?:,\\s*|$)");
It's not really complicated, just note that you need four backslashes in order to match one.
Escaping works with the opposite of aioobe's answer (updated: aioobe now uses the same construct but I didn't know that when I wrote this), negative lookbehind
final String s = "type=simple, output=Hello\\, world, repeate=true";
final String[] tokens = s.split("(?<!\\\\),\\s*");
for(final String item : tokens){
System.out.println("'" + item.replace("\\,", ",") + "'");
}
Output:
'type=simple'
'output=Hello, world'
'repeate=true'
Reference:
Pattern: Special Constructs
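I think
input.split("[^\\\\],");
comes close. It splits at all commas that are not preceded by a backslash. Note, however, that the character class also consumes the character before each comma, so that character is missing from the results; a negative lookbehind, (?<!\\\\), avoids this.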
BTW if you are working with Eclipse, I can recommend the QuickRex Plugin to test and debug Regexes.
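The difference between matching the preceding character with a character class and merely asserting it with a lookbehind shows up in a small side-by-side comparison. The class and method names here are hypothetical, and the input string is made up for the demonstration.

```java
import java.util.Arrays;

class CommaSplitComparison {
    // Character class: the non-backslash character before each comma
    // is part of the match and is therefore removed along with the comma.
    static String[] splitConsuming(String input) {
        return input.split("[^\\\\],");
    }

    // Negative lookbehind: only the comma itself is matched.
    static String[] splitLookbehind(String input) {
        return input.split("(?<!\\\\),");
    }

    public static void main(String[] args) {
        String input = "ab,cd\\,ef,gh"; // the text ab,cd\,ef,gh
        System.out.println(Arrays.toString(splitConsuming(input)));  // [a, cd\,e, gh]
        System.out.println(Arrays.toString(splitLookbehind(input))); // [ab, cd\,ef, gh]
    }
}
```

Both versions leave the escaped comma alone, but only the lookbehind keeps the last character of each part.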
