StringTokenizer Not Viewing Tab ("\t") as Whitespace ("\\s+")

Given a string in the form of:
String myStr = "5.1\t3.5\t1.4\t0.2\t0.0";
If I call:
StringTokenizer token = new StringTokenizer(myStr, "\\s+");
String firstElement = token.nextToken();
firstElement then equals the whole string. In contrast, if I call:
StringTokenizer token2 = new StringTokenizer(myStr);
String firstElement = token2.nextToken();
firstElement equals "5.1". Similarly if I use String split as below:
String[] splitArray = myStr.split("\\s+");
String firstElement = splitArray[0];
then, firstElement is "5.1".
I understand that StringTokenizer is discouraged and is classified as a "legacy class". My intent here is to understand why the same delimiter works differently between split and StringTokenizer. I would have expected the first example to work like the latter two, but for some reason it is skipping the tabs. Any guidance on what I am missing would be much appreciated.
Note I am running 1.7.0_19 on OSX in Eclipse, but I would not expect those variables to have an effect here.

StringTokenizer doesn't treat its delimiter argument as a regular expression. The parameter is a string containing a list of delimiter characters, so "\\s+" means the three individual characters \, s and +.
The constructor StringTokenizer(String) is the same as StringTokenizer(String, " \t\n\r\f"), which includes the tab character; hence it works for your string.

StringTokenizer: treats the delimiter string as a list of individual delimiter characters, not as a regex.
Split: treats the delimiter string as a regex.
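The difference described above can be checked with a small program. This is only a sketch: the class name DelimiterDemo and the helper firstToken are made up for illustration, while StringTokenizer and String.split are used as documented.

```java
import java.util.Arrays;
import java.util.StringTokenizer;

public class DelimiterDemo {
    // Hypothetical helper: returns the first token produced by StringTokenizer
    // when every character of delimiterChars is treated as a delimiter.
    static String firstToken(String text, String delimiterChars) {
        return new StringTokenizer(text, delimiterChars).nextToken();
    }

    public static void main(String[] args) {
        String myStr = "5.1\t3.5\t1.4\t0.2\t0.0";

        // "\\s+" is read as the three delimiter characters '\', 's' and '+'.
        // None of them occurs in myStr, so the whole string is one token.
        System.out.println(firstToken(myStr, "\\s+").equals(myStr));

        // Listing the tab character itself as a delimiter splits as expected.
        System.out.println(firstToken(myStr, "\t"));

        // String.split interprets its argument as a regex, so \s+ matches tabs.
        System.out.println(Arrays.toString(myStr.split("\\s+")));
    }
}
```

If a regex delimiter is really needed, split (or java.util.Scanner with useDelimiter) is the tool to reach for; StringTokenizer only ever sees a set of characters.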

Related

Deleting content of every string after first empty space

How can I delete everything after first empty space in a string which user selects? I was reading this how to remove some words from a string in java. Can this help me in my case?

You can use replaceAll with the regex \s.* which matches everything from the first whitespace character onward:
String str = "Hello java word!";
str = str.replaceAll("\\s.*", "");
output
Hello
As @Coffeehouse Coder mentioned in a comment, this solution removes everything if the input starts with a space. To avoid that case, trim your input first with string.trim(), which removes whitespace at the start and at the end.
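A minimal sketch of the trim-then-replace idea follows. The class name FirstWordDemo and the method firstWord are hypothetical names chosen for this example.

```java
public class FirstWordDemo {
    // Hypothetical helper: keeps only the text before the first whitespace character.
    static String firstWord(String input) {
        // trim() strips leading and trailing whitespace, so the regex can no
        // longer match at index 0 and wipe out the whole string.
        return input.trim().replaceAll("\\s.*", "");
    }

    public static void main(String[] args) {
        System.out.println(firstWord("Hello java word!"));
        System.out.println(firstWord("   Hello java word!"));
    }
}
```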

Assuming that there is no space at the beginning of the string, follow these steps:
Split the string at whitespace. This creates an array.
Get the first element of that array.
Hope this helps.
String str = "Example string";
String[] _arr = str.split("\\s");
String word = _arr[0];
You need to handle multiple whitespace characters and a leading space before relying on the above code.
I am not a native Java programmer, but I know it has a split function for strings.
The reference you cited in the question is a bit complex, while you can achieve what you want very easily.
P.S. If in the future you decide to get two or three words, the splitting method is better (assuming you have already dealt with multiple whitespace characters); otherwise substring is better.

A simple way to do it can be:
System.out.println("Hello world!".split(" ")[0]);

// Taking 'str' as your string
// Remove leading and trailing whitespace first
str = str.trim();
int index = str.indexOf(" ");
// indexOf returns -1 when there is no space; keep the whole string in that case
String word = (index == -1) ? str : str.substring(0, index);
This is just one method of many. Another one:
str = str.replaceAll("\\s+", " "); // This replaces each run of whitespace with one space
String[] words = str.split("\\s");
String first = words[0];

The simplest solution, in my opinion, is to locate the index at which the user wants the string cut off and then call the substring() method from 0 to that index. Assign the result to a new string and you have the string they want.
If you want to replace the original string, just assign the result of the substring() method back to it.
Link to substring() method: https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#substring(int,%20int)

There are already 5 perfectly good answers, so let me add a sixth one. Variety is the spice of life!
private static final Pattern FIRST_WORD = Pattern.compile("\\S+");
public static String firstWord(CharSequence text) {
Matcher m = FIRST_WORD.matcher(text);
return m.find() ? m.group() : "";
}
Advantages over the .split(...)[0]-type answers:
It directly does exactly what is being asked, i.e. "Find the first sequence of non-space characters." So the self-documentation is more explicit.
It is more efficient when called on multiple strings (e.g. for batch processing a large list of strings) because the regular expression is compiled only once.
It is more space-efficient because it avoids unnecessarily creating a whole array with references to each word when we only need the first.
It works without having to trim the string.
(I know this is probably too late to be of any use to the OP but I'm leaving it here as an alternative solution for future readers.)

This would be more efficient
String str = "Hello world!";
int spaceInd = str.indexOf(' ');
if(spaceInd != -1) {
str = str.substring(0, spaceInd);
}
System.out.println(String.format("[%s]", str));

Replace multiple characters in a string in Java

I have some strings with equations in the following format ((a+b)/(c+(d*e))).
I also have a text file that contains the names of each variable, e.g.:
a velocity
b distance
c time
etc...
What would be the best way for me to write code so that it plugs in velocity everywhere a occurs, and distance for b, and so on?

Don't use String#replaceAll here if there is even a slight chance that a replacement contains a substring you will want to replace later. For example, "distance" contains a, so if you later replace a with "velocity" you will end up with "disvelocityance".
It is the same problem as wanting to replace A with B and B with A. For this kind of text manipulation you can use appendReplacement and appendTail from the Matcher class. Here is an example:
String input = "((a+b)/(c+(d*e)))";
Map<String, String> replacementsMap = new HashMap<>();
replacementsMap.put("a", "velocity");
replacementsMap.put("b", "distance");
replacementsMap.put("c", "time");
StringBuffer sb = new StringBuffer();
Pattern p = Pattern.compile("\\b(a|b|c)\\b");
Matcher m = p.matcher(input);
while (m.find())
m.appendReplacement(sb, replacementsMap.get(m.group()));
m.appendTail(sb);
System.out.println(sb);
Output:
((velocity+distance)/(time+(d*e)))
This code finds each occurrence of a, b or c that isn't part of a longer word (it has no word character directly before or after it, which is what \b, the word boundary, checks). appendReplacement appends to the StringBuffer the text since the last match (or from the beginning, for the first match), followed by the replacement for the match instead of the matched text (I get the replacement from the Map). appendTail appends the text after the last match to the StringBuffer.
Also to make this code more dynamic, regex should be generated automatically based on keys used in Map. You can use this code to do it
StringBuilder regexBuilder = new StringBuilder("\\b(");
for (String word:replacementsMap.keySet())
regexBuilder.append(Pattern.quote(word)).append('|');
regexBuilder.deleteCharAt(regexBuilder.length()-1);//lets remove last "|"
regexBuilder.append(")\\b");
String regex = regexBuilder.toString();

I'd make a HashMap mapping the variable names to the descriptions, then iterate through all the characters in the string and replace each occurrence of a recognised key with its mapping.
I would use a StringBuilder to build up the new string.
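A rough sketch of that idea, assuming every variable name is a single character as in the question's example. The class name VariableExpander and the method expand are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class VariableExpander {
    // Hypothetical helper: copies the equation character by character and
    // substitutes the mapped name wherever a character is a known key.
    // Assumption: variable names are exactly one character long.
    static String expand(String equation, Map<Character, String> names) {
        StringBuilder out = new StringBuilder();
        for (char c : equation.toCharArray()) {
            String name = names.get(c);
            out.append(name != null ? name : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Character, String> names = new HashMap<>();
        names.put('a', "velocity");
        names.put('b', "distance");
        names.put('c', "time");
        System.out.println(expand("((a+b)/(c+(d*e)))", names));
    }
}
```

Because each input character is examined exactly once and the replacements are never rescanned, "distance" cannot be turned into "disvelocityance" here. Multi-character variable names would need a tokenizing step instead.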

Using a hashmap and iterating over the string as A Boschman suggested is one good solution.
Another solution would be to do what others have suggested and do a .replaceAll(); however, you would want to use a regular expression to specify that only the words matching the whole variable name and not a substring are replaced. A regex using word boundary '\b' matching will provide this solution.
String variable = "a";
String newVariable = "velocity";
str = str.replaceAll("\\b" + variable + "\\b", newVariable); // Strings are immutable, so keep the result
See http://docs.oracle.com/javase/tutorial/essential/regex/bounds.html

For string str, use the replaceAll() function:
str = str.toUpperCase(); //Prevent substitutions of characters in the middle of a word
str = str.replaceAll("A", "velocity");
str = str.replaceAll("B", "distance");
//etc.

use split() to split string like "004*034556"

In my project I used the code below to split a string like "004*034556":
String string = "004*034556";
String[] parts = string.split("*");
but it threw an error and the app force closed!
Eventually I found that it works if I use "#" or some other character instead:
String string = "004#034556";
String[] parts = string.split("#");
How can this be explained?

You're forgetting something very trivial.
String string = "004*034556";
String[] parts = string.split("\\*");
I recommend you check out Escape Characters.

Use Pattern.quote to treat the * as the literal String * and not the regex * (which has a special meaning):
String[] parts = string.split(Pattern.quote("*"));
See String#split:
public String[] split(String regex)
↑

Refer to the JavaDoc:
String[] split(String regex)
Splits this string around matches of the given regular expression.
The symbol "*" is a quantifier in a Java regex (zero or more of the preceding element), so on its own it is not a valid pattern.
Thus you have to escape it:
String[] parts = string.split("\\*");
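The two fixes suggested in these answers can be compared side by side. This is a small sketch; the class name SplitOnStar and its helper methods are hypothetical, and isValidRegex is only there to show why the unescaped "*" fails.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class SplitOnStar {
    // Escapes the star by hand so the regex engine sees a literal '*'.
    static String[] splitEscaped(String s) {
        return s.split("\\*");
    }

    // Lets Pattern.quote build the literal pattern instead.
    static String[] splitQuoted(String s) {
        return s.split(Pattern.quote("*"));
    }

    // Hypothetical helper: reports whether a string compiles as a regex.
    static boolean isValidRegex(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String string = "004*034556";
        System.out.println(Arrays.toString(splitEscaped(string)));
        System.out.println(Arrays.toString(splitQuoted(string)));
        // A bare "*" has nothing to repeat, so it is rejected as a pattern.
        System.out.println(isValidRegex("*"));
    }
}
```

Pattern.quote is usually the safer choice when the delimiter comes from user input, since it handles every metacharacter rather than just the one you remembered to escape.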

Convert a string to an array of strings

If I have:
Scanner input = new Scanner(System.in);
System.out.println("Enter an infixed expression:");
String expression = input.nextLine();
String[] tokens;
How do I scan the infix expression one token at a time, from left to right, ignoring spaces, and put the tokens into an array of strings? Here a token is defined as an operand, operator, or parenthesis symbol.
Example: "3 + (9-2)" ==> tokens = [3][+][(][9][-][2][)]

String test = "13 + (9-2)";
List<String> allMatches = new ArrayList<String>();
Matcher m = Pattern.compile("\\d+|\\(|\\)|\\+|\\*|-|/")
.matcher(test);
while (m.find()) {
allMatches.add(m.group());
}
Can someone test this please?
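One way to try the snippet above is to wrap it in a small program and convert the list to the String[] the question asks for. The class name InfixTokenizer and the method tokenize are made-up names for this sketch; the pattern is the one from the answer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InfixTokenizer {
    // A token is a run of digits, a parenthesis, or one of + * - /
    private static final Pattern TOKEN =
            Pattern.compile("\\d+|\\(|\\)|\\+|\\*|-|/");

    // Hypothetical helper: collects every match, left to right, into an array.
    // Anything the pattern does not match (such as spaces) is simply skipped.
    static String[] tokenize(String expression) {
        List<String> allMatches = new ArrayList<>();
        Matcher m = TOKEN.matcher(expression);
        while (m.find()) {
            allMatches.add(m.group());
        }
        return allMatches.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tokenize("13 + (9-2)")));
    }
}
```

Note that this approach silently drops characters it does not recognise, such as letters or decimal points, so a real parser would probably want to report them instead.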

I think it would be easiest to read the line into one string, and then split on spaces. There is a handy String method, split, that does this for you. Note that this only separates tokens that have a space between them, so "(9-2)" stays one token.
String[] tokens = expression.split(" ");

It's probably overkill for your example, but in case it gets more complex, take a look at JavaCC, the Java Compiler Compiler. JavaCC allows you to create a parser in Java based on a grammar definition.
Be aware that it is not an easy tool to get started with. However, the grammar definition will be much easier to read than the corresponding regular expressions.

if tokens[] must be String you can use this
String ex="3 + (9-2)";
String tokens[];
StringTokenizer tok=new StringTokenizer(ex);
String line="";
while(tok.hasMoreTokens())line+=tok.nextToken();
tokens=new String[line.length()];
for(int i=1;i<line.length()+1;i++)tokens[i-1]=line.substring(i-1,i);
tokens can be a charArray so:
String ex="3 + (9-2)";
char tokens[];
StringTokenizer tok=new StringTokenizer(ex);
String line="";
while(tok.hasMoreTokens())line+=tok.nextToken();
tokens=line.toCharArray();

This (IMHO elegant) single line of code works (tested):
String[] tokens = input.split("(?<=[^ ])(?<!\\B) *");
This regex also caters for input containing multi-character numbers (e.g. 123): without the negative look-behind for a non-word boundary, (?<!\\B), they would be split into separate characters.
The first look-behind, (?<=[^ ]), prevents an initial blank string from being split off at the start of the input.
The final part of the regex, " *", ensures spaces are consumed.

StringTokenizer delimiters for each Character

I've got a string that I'm supposed to use StringTokenizer on for a course. I've got my plan on how to implement the project, but I cannot find any reference as to how I will make the delimiter each character.
Basically, a String such as "Hippo Campus is a party place" I need to divide into tokens for each character and then compare them to a set of values and swap out a particular one with another. I know how to do everything else, but what the delimiter would be for separating each character?

If you really want to use StringTokenizer, you could do it like this:
String myStr = "Hippo Campus is a party place".replaceAll("", " ");
StringTokenizer tokens = new StringTokenizer(myStr," ");
Or you can use split for this. The result will be a String array with one element per character.
String myStr = "Hippo Campus is a party place";
String [] chars = myStr.split("");
for(String str:chars ){
System.out.println(str);
}

Convert the String to an array. There is no delimiter for separating every single character, and it wouldn't make sense to use StringTokenizer to do that even if there were.
You can do something like:
char[] individualChars = someString.toCharArray();
Then iterate through that array like so:
for (char c : individualChars){
//do something with the chars.
}

You can convert the string into a char array.
char[] simpleArray = sampleString.toCharArray();
This will split the String to a set of characters. So you can do the operations which you have stated above.
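As a sketch of the "compare and swap" step the question describes, the char array can be modified in place and turned back into a String. The class name CharSwapDemo, the method swap, and the choice of swapping 'p' for 'b' are all assumptions made for this example.

```java
public class CharSwapDemo {
    // Hypothetical helper: replaces every occurrence of 'from' with 'to'
    // by walking the char array produced by toCharArray().
    static String swap(String text, char from, char to) {
        char[] chars = text.toCharArray();
        for (int i = 0; i < chars.length; i++) {
            if (chars[i] == from) {
                chars[i] = to;
            }
        }
        return new String(chars);
    }

    public static void main(String[] args) {
        System.out.println(swap("Hippo Campus is a party place", 'p', 'b'));
    }
}
```

For a single fixed pair of characters, String.replace(char, char) does the same thing in one call; the explicit loop is mainly useful when each character has to be compared against a larger set of values.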
