string tokenizer in Java

I have a text file which contains data separated by '|'. I need to get each field (separated by '|') and process it. A line of the text file looks like this:
ABC|DEF||FGHT
I am using StringTokenizer (JDK 1.4) to get each field value. The problem is that I should get an empty string after DEF, but I am not getting the empty field between DEF and FGHT.
My result should be ABC,DEF,"",FGHT but I am getting ABC,DEF,FGHT

From the StringTokenizer documentation:
StringTokenizer is a legacy class that
is retained for compatibility reasons
although its use is discouraged in new
code. It is recommended that anyone
seeking this functionality use the
split method of String or the
java.util.regex package instead.
The following code should work:
String s = "ABC|DEF||FGHT";
String[] r = s.split("\\|");
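A slightly fuller sketch of the split approach, in case it helps. The class and method names (PipeSplitDemo, fields) are made up for illustration. One detail worth knowing: by default split discards trailing empty strings, so a line ending in "||" would lose its last fields. Passing a negative limit keeps them.

```java
import java.util.Arrays;

class PipeSplitDemo {
    // Splits a line on a literal '|' and keeps empty fields,
    // including trailing ones.
    static String[] fields(String line) {
        // '|' is a regex metacharacter, so it has to be escaped.
        // A negative limit tells split not to drop trailing empty strings.
        return line.split("\\|", -1);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fields("ABC|DEF||FGHT"))); // [ABC, DEF, , FGHT]
        System.out.println(fields("ABC|DEF||").length);               // 4
    }
}
```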

Use the returnDelims flag and check two subsequent occurrences of the delimiter:
String str = "ABC|DEF||FGHT";
String delim = "|";
StringTokenizer tok = new StringTokenizer(str, delim, true);
boolean expectDelim = false;
while (tok.hasMoreTokens()) {
String token = tok.nextToken();
if (delim.equals(token)) {
if (expectDelim) {
expectDelim = false;
continue;
} else {
// unexpected delim means empty token
token = null;
}
}
System.out.println(token);
expectDelim = true;
}
This prints:
ABC
DEF
null
FGHT
The API isn't pretty and is therefore considered legacy (i.e. "almost obsolete"). Use it only where pattern matching is too expensive (which should only be the case for extremely long strings) or where an API expects an Enumeration.
If you switch to String.split(String), make sure to quote the delimiter, either manually ("\\|") or automatically using string.split(Pattern.quote(delim));
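As a rough illustration of the automatic quoting, here is a small helper. The names QuotedDelimiterSplit and splitLiteral are hypothetical; Pattern.quote and the two-argument split are standard JDK calls.

```java
import java.util.regex.Pattern;

class QuotedDelimiterSplit {
    // Splits on the delimiter taken literally, whatever characters it contains.
    static String[] splitLiteral(String text, String delim) {
        // Pattern.quote turns the delimiter into a literal pattern,
        // so "|" or "." no longer act as regex operators.
        // The limit of -1 keeps trailing empty fields.
        return text.split(Pattern.quote(delim), -1);
    }

    public static void main(String[] args) {
        for (String field : splitLiteral("ABC|DEF||FGHT", "|")) {
            System.out.println("[" + field + "]");
        }
    }
}
```

The same helper works unchanged for delimiters such as "." or "+", which would otherwise need manual escaping.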

StringTokenizer ignores empty elements. Consider using String.split, which is also available in 1.4.
From the javadocs:
StringTokenizer is a legacy class that
is retained for compatibility reasons
although its use is discouraged in new
code. It is recommended that anyone
seeking this functionality use the
split method of String or the
java.util.regex package instead.

You can use the constructor that takes an extra 'returnDelims' boolean, and pass true to it.
This way you will receive the delimiters as tokens, which will allow you to detect two delimiters in a row.
Alternatively, you can just implement your own string tokenizer that does what you need; it's not that hard.
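A hand-written tokenizer along those lines could look roughly like this. It is only a sketch: the class name SimpleFieldSplitter is invented, it handles a single-character delimiter, and it uses generics, so it needs Java 5 or later rather than JDK 1.4.

```java
import java.util.ArrayList;
import java.util.List;

class SimpleFieldSplitter {
    // Returns every field between delimiters, including empty ones.
    static List<String> split(String text, char delim) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        int pos;
        // Walk from delimiter to delimiter; two adjacent delimiters
        // produce an empty substring, which is added like any other field.
        while ((pos = text.indexOf(delim, start)) >= 0) {
            fields.add(text.substring(start, pos));
            start = pos + 1;
        }
        fields.add(text.substring(start)); // the last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("ABC|DEF||FGHT", '|')); // [ABC, DEF, , FGHT]
    }
}
```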

Here is another way to solve this problem
String str = "ABC|DEF||FGHT";
StringTokenizer s = new StringTokenizer(str,"|",true);
String currentToken="",previousToken="";
while(s.hasMoreTokens())
{
//Get the current token from the tokenize strings
currentToken = s.nextToken();
//Check for the empty token in between ||
if(currentToken.equals("|") && previousToken.equals("|"))
{
//We denote the empty token so we print null on the screen
System.out.println("null");
}
else
{
//We only print the tokens except delimiters
if(!currentToken.equals("|"))
System.out.println(currentToken);
}
previousToken = currentToken;
}

Here is a way to split a string into tokens (a token is one or more letters)
public static void main(String[] args) {
Scanner scan = new Scanner(System.in);
String s = scan.nextLine();
s = s.replaceAll("[^A-Za-z]", " ");
StringTokenizer arr = new StringTokenizer(s, " ");
int n = arr.countTokens();
System.out.println(n);
while(arr.hasMoreTokens()){
System.out.println(arr.nextToken());
}
scan.close();
}

package com.java.String;
import java.util.StringTokenizer;
public class StringWordReverse {
public static void main(String[] kam) {
String s;
String sReversed = "";
System.out.println("Enter a string to reverse");
s = "THIS IS ASHIK SKLAB";
StringTokenizer st = new StringTokenizer(s);
while (st.hasMoreTokens()) {
sReversed = st.nextToken() + " " + sReversed;
}
System.out.println("Original string is : " + s);
System.out.println("Reversed string is : " + sReversed);
}
}
Output:
Enter a string to reverse
Original string is : THIS IS ASHIK SKLAB
Reversed string is : SKLAB ASHIK IS THIS

Related

Splitting string on spaces unless in double quotes but double quotes can have a preceding string attached

I need to split a string in Java (first remove whitespaces between quotes and then split at whitespaces.)
"abc test=\"x y z\" magic=\" hello \" hola"
becomes:
firstly:
"abc test=\"xyz\" magic=\"hello\" hola"
and then:
abc
test="xyz"
magic="hello"
hola
Scenario:
I am getting a string like the one above from input and I want to break it into parts as shown. One approach is to first remove the spaces between quotes and then split at spaces; the string attached before the quotes complicates this. A second approach is to split at spaces, but not inside quotes, and then remove the spaces from each individual piece. I tried capturing quotes with "\"([^\"]+)\"" but I'm not able to capture just the spaces inside quotes. I tried some more but had no luck.

We can do this using a formal pattern matcher. The secret sauce of the answer below is to use the not-much-used Matcher#appendReplacement method. We pause at each match, and then append a custom replacement of anything appearing inside two pairs of quotes. The custom method removeSpaces() strips all whitespace from each quoted term.
public static String removeSpaces(String input) {
return input.replaceAll("\\s+", "");
}
String input = "abc test=\"x y z\" magic=\" hello \" hola";
Pattern p = Pattern.compile("\"(.*?)\"");
Matcher m = p.matcher(input);
StringBuffer sb = new StringBuffer("");
while (m.find()) {
m.appendReplacement(sb, "\"" + removeSpaces(m.group(1)) + "\"");
}
m.appendTail(sb);
String[] parts = sb.toString().split("\\s+");
for (String part : parts) {
System.out.println(part);
}
abc
test="xyz"
magic="hello"
hola
The big caveat here is that we are really using a regex engine as a rudimentary parser. To see where this solution fails fast, just remove one of the quotes from a quoted term. But if you are sure your input is well formed, as in the example you showed us, this answer might work for you.
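One cheap guard against that failure mode is to check that the quotes pair up before running the regex. The sketch below is an assumption-laden illustration, not part of the answer: the name QuoteCheck is made up, and it assumes the input never contains escaped quotes.

```java
class QuoteCheck {
    // True when the input contains an even number of double quotes,
    // i.e. every opening quote has a closing partner.
    static boolean hasBalancedQuotes(String input) {
        int count = 0;
        for (int i = 0; i < input.length(); i++) {
            if (input.charAt(i) == '"') {
                count++;
            }
        }
        return count % 2 == 0;
    }

    public static void main(String[] args) {
        System.out.println(hasBalancedQuotes("abc test=\"x y z\" hola")); // true
        System.out.println(hasBalancedQuotes("abc test=\"x y z hola"));   // false
    }
}
```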

I wanted to mention Java 9's Matcher.replaceAll lambda extension:
// Find quoted strings and remove their whitespace:
s = Pattern.compile("\"[^\"]*\"").matcher(s)
.replaceAll(mr -> mr.group().replaceAll("\\s", ""));
// Turn the remaining whitespace into commas and wrap everything in braces.
s = '{' + s.trim().replaceAll("\\s+", ", ") + '}';
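For completeness, here is a self-contained variant of the same idea that returns the pieces as an array instead of a braced string. It requires Java 9 or later. The class and method names are hypothetical, and the call to Matcher.quoteReplacement is an addition: the string returned by the lambda is treated as a replacement template, so a '$' or backslash inside the quotes would otherwise be misread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedWhitespace {
    private static final Pattern QUOTED = Pattern.compile("\"[^\"]*\"");

    // Removes whitespace inside each quoted section, then splits on
    // the whitespace that remains outside the quotes.
    static String[] tokens(String input) {
        String compact = QUOTED.matcher(input).replaceAll(
                mr -> Matcher.quoteReplacement(mr.group().replaceAll("\\s", "")));
        return compact.trim().split("\\s+");
    }

    public static void main(String[] args) {
        for (String t : tokens("abc test=\"x y z\" magic=\" hello \" hola")) {
            System.out.println(t);
        }
    }
}
```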

Probably the other answer is better but still I have written it so I will post it here ;) It takes a different approach
public static void main(String[] args) {
String test="abc test=\"x y z\" magic=\" hello \" hola";
Pattern pattern = Pattern.compile("([^\\\"]+=\\\"[^\\\"]+\\\" )");
Matcher matcher = pattern.matcher(test);
int lastIndex=0;
while(matcher.find()) {
String[] parts=matcher.group(0).trim().split("=");
boolean newLine=false;
for (String string : parts[0].split("\\s+")) {
if(newLine)
System.out.println();
newLine=true;
System.out.print(string);
}
System.out.println("="+parts[1].replaceAll("\\s",""));
lastIndex=matcher.end();
}
System.out.println(test.substring(lastIndex).trim());
}
Result is
abc
test="xyz"
magic="hello"
hola

It sounds like you want to write a basic parser/tokenizer. My bet is that after you make something that can deal with pretty printing in this structure, you will soon want to start validating that there aren't any mismatched quotes.
But in essence, you have a few stages for this particular problem, and Java has a built-in tokenizer that can prove useful.
import java.util.LinkedList;
import java.util.List;
import java.util.StringTokenizer;
import java.util.stream.Collectors;
public class Q50151376{
private static class Whitespace{
Whitespace(){ }
@Override
public String toString() {
return "\n";
}
}
private static class QuotedString {
public final String string;
QuotedString(String string) {
this.string = "\"" + string.trim() + "\"";
}
@Override
public String toString() {
return string;
}
}
public static void main(String[] args) {
String test = "abc test=\"x y z\" magic=\" hello \" hola";
StringTokenizer tokenizer = new StringTokenizer(test, "\"");
boolean inQuotes = false;
List<Object> out = new LinkedList<>();
while (tokenizer.hasMoreTokens()) {
final String token = tokenizer.nextToken();
if (inQuotes) {
out.add(new QuotedString(token));
} else {
out.addAll(TokenizeWhitespace(token));
}
inQuotes = !inQuotes;
}
System.out.println(joinAsStrings(out));
}
private static String joinAsStrings(List<Object> out) {
return out.stream()
.map(Object::toString)
.collect(Collectors.joining());
}
public static List<Object> TokenizeWhitespace(String in){
List<Object> out = new LinkedList<>();
StringTokenizer tokenizer = new StringTokenizer(in, " ", true);
boolean ignoreWhitespace = false;
while (tokenizer.hasMoreTokens()){
String token = tokenizer.nextToken();
boolean whitespace = token.equals(" ");
if(!whitespace){
out.add(token);
ignoreWhitespace = false;
} else if(!ignoreWhitespace) {
out.add(new Whitespace());
ignoreWhitespace = true;
}
}
return out;
}
}

How do you rebuild a string using StringBuilder?

I'm trying to rebuild a string using StringBuilder. I'm a little unsure which method to use to get the "," inserted back into the same place. In the code below I'm using the
"insert(int dstOffset, CharSequence s, int start, int end)" method. My code compiles without errors, but it doesn't run properly.
Please note I will also be escaping characters (e.g. =) in the string, but I haven't written that part of the code yet. Currently I'm trying to learn how to split the string and then rebuild it.
Thanks
public class StringTestProgram
{
public static void main(String[] args)
{
String relativeDN = "cn=abc,dn=xyz,ou=abc/def";
String[] stringData = relativeDN.split(",");
for (String stringoutput : stringData)
{
System.out.print(stringoutput);
StringBuilder sb = new StringBuilder(stringoutput);
CharSequence charAdded = ",";
sb.insert(6,charAdded,0,12);
System.out.print(sb.toString());
}
}
}
Revised code
public class StringTestProgram {
public static void main(String[] args) {
String relativeDN = "cn=abc,dn=xyz,ou=abc/def";
System.out.println(relativeDN);
//Split String
String[] stringData = relativeDN.split(",");
{
StringBuilder sb = new StringBuilder();
CharSequence charAdded = ",";
// loop thru each element of the array
for (int place = 0; place < stringData.length; place++) {
System.out.println(stringData[place]);
{
int eq = relativeDN.indexOf('=');
String sub = relativeDN.substring(0, eq);
System.out.println(sub);
}
// append element to the StringBuilder
sb.append(stringData[place]);
// avoids adding an extra ',' at the end
if (place < stringData.length - 1)
// if not at the last element, add the ',' character
sb.append(charAdded);
}
System.out.print(sb.toString());
}
}
}
I'm new to Stack Overflow and I'm not sure if it's OK to ask this question in this thread or if I should create a separate thread for it. If possible, please advise.
The code above now splits the string at the "," character. It also rebuilds the
string back to its original state. I would also like to use the indexOf and substring
methods to get the string value after the "=" sign. Currently my program only outputs
the first two characters of the initial string, i.e. the part before the first "=" sign. I'm not sure where
in my code I'm making an error. Any help would be appreciated.
My Current Output
cn=abc,dn=xyz,ou=abc/def
cn=abc
cn
dn=xyz
cn
ou=abc/def
cn
cn=abc,dn=xyz,ou=abc/def
Desired Output
cn=abc,dn=xyz,ou=abc/def
cn=abc
abc
dn=xyz
xyz
ou=abc/def
abc/def
cn=abc,dn=xyz,ou=abc/def
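A possible explanation for the repeated "cn": the revised code calls indexOf and substring on relativeDN (the whole string) instead of on the current array element, and it takes substring(0, eq), which is the part before the "=". A minimal sketch of the fix follows; the class name RdnValues and the helper valuePart are invented for illustration.

```java
class RdnValues {
    // Returns the text after the first '=' in one "key=value" element,
    // or the whole element when it contains no '='.
    static String valuePart(String element) {
        int eq = element.indexOf('=');
        return eq < 0 ? element : element.substring(eq + 1);
    }

    public static void main(String[] args) {
        String relativeDN = "cn=abc,dn=xyz,ou=abc/def";
        for (String element : relativeDN.split(",")) {
            System.out.println(element);            // e.g. cn=abc
            System.out.println(valuePart(element)); // e.g. abc
        }
    }
}
```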

The easiest way to do this pre Java 8 is to use 1 StringBuilder for all the elements and add Strings to the builder by using the append() method
StringBuilder builder = new StringBuilder();
for (String stringoutput : stringData) {
builder.append(stringoutput).append(',');
}
//have an extra trailing comma so remove it
//use length -1 as end coord because it's exclusive
String result = builder.substring(0, builder.length() -1);
If you are using Java 8 you can use the new Stream API and Collectors.joining()
String result = Arrays.stream(relativeDN.split(","))
.collect(Collectors.joining(","));
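Java 8 also added String.join, which handles the separator placement without a stream or a manual loop. A small sketch, with a made-up class name:

```java
class JoinDemo {
    // Splits on ',' and puts the pieces back together with the same separator.
    static String rebuild(String relativeDN) {
        String[] parts = relativeDN.split(",");
        // String.join only inserts the separator between elements,
        // so there is no trailing comma to remove.
        return String.join(",", parts);
    }

    public static void main(String[] args) {
        System.out.println(rebuild("cn=abc,dn=xyz,ou=abc/def")); // cn=abc,dn=xyz,ou=abc/def
    }
}
```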

You're creating a new StringBuilder on every iteration of the loop, so each pass discards the previous builder and recreates it with only the next substring.
Fixed:
String relativeDN = "cn=abc,dn=xyz,ou=abc/def";
String[] stringData = relativeDN.split(",");
StringBuilder sb = new StringBuilder();
CharSequence charAdded = ",";
for (String stringoutput : stringData) {
System.out.print(stringoutput);
sb.append(stringoutput).append(charAdded);
}
sb.setLength(sb.length() - 1);
System.out.print(sb.toString());

Try out this code
public class StringTestProgram {
public static void main(String[] args) {
String relativeDN = "cn=abc,dn=xyz,ou=abc/def";
String[] stringData = relativeDN.split(",");
StringBuilder sb = new StringBuilder();
CharSequence charAdded = ",";
for (int i = 0; i < stringData .length; i++) { //walk over each element of the array
System.out.println(stringData[i]);
sb.append(stringData[i]); // append element to the StringBuilder
if (i < stringData.length - 1) //avoids adding an extra ',' at the end
sb.append(charAdded); // if not at the last element, add the ',' character
}
System.out.print(sb.toString());
}
}
Here you will reconstruct the original string exactly as it was (i.e. without adding a trailing ','):
cn=abc,dn=xyz,ou=abc/def
UPDATE: In the for loop I just walk over every element of the array that stores the split String and append the elements to the StringBuilder instance one by one. After appending each element I check whether we are currently at the last element of the array. If not, I append the ',' character.

Like this:
for (String stringoutput : stringData)
sb.append(stringoutput).append(',');
Fixed: Using this approach, you would have to remove the last ,
String result = sb.toString().substring(0,sb.toString().length()-1);
System.out.println(result);

I noticed in the other answers that there would be an extra comma at the end. You have to use a prefix variable and then change it in the loop so that there won't be an extra comma.
String relativeDN = "cn=abc,dn=xyz,ou=abc/def";
String[] stringData = relativeDN.split(",");
StringBuilder sb = new StringBuilder();
String prefix = "";
for (String element : stringData) {
sb.append(prefix);
prefix=",";
sb.append(element);
}
String output = sb.toString();
Inside the loop the prefix is appended, but on the first time through the loop the prefix is set to empty quotes so that there won't be a comma before the first element. Next prefix is changed to a comma so that in the next turn through the loop a comma will be added after the first element. Lastly, the element is added. This results in the correct output because the comma is added before the element, but only after the first iteration.

Checking whether the String contains multiple words

I am getting the names as a String. How can I display them in the following format: if it's a single word, I need to display its first character alone. If it's two words, I need to display the first character of each word.
John : J
Peter: P
Mathew Rails : MR
Sergy Bein : SB
I cannot use an enum as I am not sure that the list would return the same values all the time. Though they said, it's never going to change.
String name = myString.split('');
topTitle = name[0].subString(0,1);
subTitle = name[1].subString(0,1);
String finalName = topTitle + finalName;
The above code looks fine to me, but it's not working. I am not getting any exception either.

There are a few mistakes in your attempted code.
String#split takes a String that is interpreted as a regex.
The return value of String#split is an array of String.
So it should be:
String[] name = myString.split(" ");
or
String[] name = myString.split("\\s+");
You also need to check the number of elements in the array first, like this, to avoid an exception:
String topTitle, subTitle;
if (name.length == 2) {
topTitle = name[0].substring(0,1);
subTitle = name[1].substring(0,1);
}
else
topTitle = name[0].substring(0,1);

The String.split method splits a string into an array of strings, based on your regular expression.
This should work for a two-word name:
String[] names = myString.split("\\s+");
String topTitle = names[0].substring(0,1);
String subTitle = names[1].substring(0,1);
String finalName = topTitle + subTitle;

First: "name" should be an array.
String[] names = myString.split(" ");
Second: You should use an if statement and the array's length field to determine how many words there are.
String initial = "";
if(names.length > 1){
initial = names[0].substring(0,1) + names[1].substring(0,1);
}else{
initial = names[0].substring(0,1);
}
Alternatively you could use a for loop
String initial = "";
for(int i = 0; i < names.length; i++){
initial += names[i].substring(0,1);
}

You were close..
String[] name = myString.split(" ");
String finalName = name[0].charAt(0)+""+(name.length==1?"":name[1].charAt(0));
(name.length==1?"":name[1].charAt(0)) is a ternary operator which would return empty string if length of name array is 1 else it would return 1st character

This will work for you
public static void getString(String str) throws IOException {
String[] strr=str.split(" ");
StringBuilder sb=new StringBuilder();
for(int i=0;i<strr.length;i++){
sb.append(strr[i].charAt(0));
}
System.out.println(sb);
}
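The loop-based answers above can throw on an empty string or on leading or doubled spaces, because split(" ") then yields empty elements. A slightly more defensive sketch is shown here; the class name Initials and the method name of are invented for this example.

```java
class Initials {
    // Builds the initials from the first character of every word,
    // tolerating leading, trailing and repeated whitespace.
    static String of(String name) {
        StringBuilder sb = new StringBuilder();
        for (String word : name.trim().split("\\s+")) {
            // An empty input still produces one empty element, so skip it.
            if (!word.isEmpty()) {
                sb.append(word.charAt(0));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(of("John"));         // J
        System.out.println(of("Mathew Rails")); // MR
    }
}
```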

How to remove matched words from end of String

I want to remove the following words from the end of a String: 'PTE', 'LTD', 'PRIVATE' and 'LIMITED'.
I tried the code below, but then I got stuck:
String[] str = {"PTE", "LTD", "PRIVATE", "LIMITED"};
String company = "Basit LTD";
for(int i=0;i<str.length;i++) {
if (company.endsWith(str[i])) {
int position = company.lastIndexOf(str[i]);
company = company.substring(0, position);
}
}
System.out.println(company.replaceAll("\\s",""));
It worked. But suppose the company is Basit LIMITED PRIVATE LTD PTE or Basit LIMITED PRIVATE PTE LTD or any combination of the four words at the end. Then the above code removes only the last word, i.e., PTE or PRIVATE and so on, and the output is BasitLIMITEDPRIVATELTD.
I want the output to be just Basit.
How can I do it?
Thanks
---------------Edit---
Please note the company name here is just an example; it is not always the same. I may have a name like
String company = "Masood LIMITED LTD PTE PRIVATE";
or any name that can have the above-mentioned words at the end.
Thanks

You can do this in a single line, with no need for a loop. Just use String#replaceAll(regex, str) with a pattern anchored to the end of the string:
company = company.replaceAll("(\\s+(PTE|LTD|PRIVATE|LIMITED))+$", "");

If you place the unwanted words in a map, they will be omitted from the resulting string:
HashMap map = new HashMap();
map.put("PTE", "");
map.put("LTD", "");
map.put("PRIVATE", "");
map.put("LIMITED", "");
String company = "Basit LTD PRIVATE PTE";
String words[] = company.split(" ");
String resultantStr = "";
for(int k = 0; k < words.length; k++){
if(map.get(words[k]) == null) {
resultantStr += words[k] + " ";
}
}
resultantStr = resultantStr.trim();
System.out.println(" Trimmed String: "+ resultantStr);

If you want to remove these suffixes only at the end of the string, then you could introduce a while loop:
String[] str = {"PTE", "LTD", "PRIVATE", "LIMITED"};
boolean foundSuffix = true;
String company = "Basit LTD";
while (foundSuffix) {
foundSuffix = false;
for(int i=0;i<str.length;i++) {
if (company.endsWith(str[i])) {
foundSuffix = true;
int position = company.lastIndexOf(str[i]);
company = company.substring(0, position);
}
}
}
System.out.println(company.replaceAll("\\s",""));
If you don't mind transforming PTE Basit LIMITED INC to Basit (and also remove the first PTE), then replaceAll should work, as explained by others.

I was trying to do exactly same thing for one of my projects. I wrote this code few days earlier. Now I was exactly trying to find a much better way to do it, that's how I found this Question. But after seeing other answers I decided to share my version of the code.
Collection<String> stopWordSet = Arrays.asList("PTE", "LTD", "PRIVATE", "LIMITED");
String company = "Basit LTD"; //Or Anything
String[] tokens = company.split("[\\p{Punct} ]+"); // split on runs of ASCII punctuation and spaces
Stack<String> tokenStack = new Stack<>();
tokenStack.addAll(Arrays.asList(tokens));
while (!tokenStack.isEmpty()) {
String token = tokenStack.peek();
if (stopWordSet.contains(token))
tokenStack.pop();
else
break;
}
String formattedCompanyName = StringUtils.join(tokenStack.toArray());

Try this :
public static void main(String a[]) {
String[] str = {"PTE", "LTD", "PRIVATE", "LIMITED"};
String company = "Basit LIMITED PRIVATE LTD PTE";
for(int i=0;i<str.length;i++) {
company = company.replaceAll(str[i], "");
}
System.out.println(company.replaceAll("\\s",""));
}

All you need is to use trim() and call your function recursively, Or each time you remove a sub string from the end, reset your i to 0.
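A sketch of that idea, using a restart loop rather than recursion. The class name SuffixStripper is hypothetical. Requiring a space before the suffix is an added assumption, so that a name such as "SALTD" is not treated as ending in "LTD".

```java
import java.util.Arrays;
import java.util.List;

class SuffixStripper {
    private static final List<String> SUFFIXES =
            Arrays.asList("PTE", "LTD", "PRIVATE", "LIMITED");

    // Removes any run of the listed words from the end of the name only.
    static String strip(String company) {
        String result = company.trim();
        boolean removed = true;
        // Keep scanning the whole list until a full pass removes nothing.
        while (removed) {
            removed = false;
            for (String suffix : SUFFIXES) {
                if (result.endsWith(" " + suffix)) {
                    // Cut the suffix, then trim the space left in front of it.
                    result = result.substring(0, result.length() - suffix.length()).trim();
                    removed = true;
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(strip("Basit LIMITED PRIVATE LTD PTE")); // Basit
        System.out.println(strip("PTE Basit LTD"));                 // PTE Basit
    }
}
```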

import java.util.StringTokenizer;
public class StringMatchRemove {
public static void main(String[] args) {
String str="my name is noorus khan";
String search="noorus";
String newString="";
String word=str.replace(search," ");
StringTokenizer st = new StringTokenizer(word," ");
while(st.hasMoreTokens())
{
newString = newString + st.nextToken() + " ";
}
System.out.println(newString);
}
}
First, using the replace method, we get word = "my name is   khan", with extra spaces where the search word used to be. To remove these extra spaces, we build a new string by appending every token in turn, separated by a single space.
Output: my name is khan

how to process string in java

I want to make strings like "a b c" to "prefix_a prefix_b prefix_c"
how to do that in java?

You can use the String method: replaceAll(String regex,String replacement)
String s = "a xyz c";
s = s.replaceAll("(\\w+)", "prefix_$1");
System.out.println(s);
You may need to tweak the regexp to meet your exact requirements.

Assuming a split character of a space (" "), the String can be split using the split method, then each new String can have the prefix_ appended, then concatenated back to a String:
String[] tokens = "a b c".split(" ");
String result = "";
for (String token : tokens) {
result += ("prefix_" + token + " ");
}
System.out.println(result);
Output:
prefix_a prefix_b prefix_c
Using a StringBuilder would improve performance if necessary:
String[] tokens = "a b c".split(" ");
StringBuilder result = new StringBuilder();
for (String token : tokens) {
result.append("prefix_");
result.append(token);
result.append(" ");
}
result.deleteCharAt(result.length() - 1);
System.out.println(result.toString());
The only catch with the first sample is that there will be an extraneous space at the end of the last token.

Hope I'm not misreading the question. Are you just looking for straight-up concatenation?
String someString = "a";
String yourPrefix = "prefix_"; // or whatever
String result = yourPrefix + someString;
System.out.println(result);
would show you
prefix_a

You can use StringTokenizer to enumerate over your string, with a "space" delimiter, and in your loop you can add your prefix onto the current element in your enumeration. Bottom line: See StringTokenizer in the javadocs.
You could also do it with regex and a word boundary ("\b"), but this seems brittle.
Another possibility is using String.split to convert your string into an array of strings, and then loop over your array of "a", "b", and "c" and prefix your array elements with the prefix of your choice.
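On Java 8 or later, the split-and-prefix variant can also be written as a short stream pipeline. This is only one way to do it; the class and method names below are made up.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class PrefixWords {
    // Puts the prefix in front of every whitespace-separated word.
    static String addPrefix(String input, String prefix) {
        return Arrays.stream(input.trim().split("\\s+"))
                .map(word -> prefix + word)
                // joining(" ") places the space between words only,
                // so no trailing space has to be removed afterwards.
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(addPrefix("a b c", "prefix_")); // prefix_a prefix_b prefix_c
    }
}
```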

You can split a string using regular expressions and put it back together with a loop over the resulting array:
public class Test {
public static void main (String args[]) {
String s = "a b c";
String[] s2 = s.split("\\s+");
String s3 = "";
if (s2.length > 0)
s3 = "pattern_" + s2[0];
for (int i = 1; i < s2.length; i++) {
s3 = s3 + " pattern_" + s2[i];
}
System.out.println (s3);
}
}

This is C# but should easily translate to Java (but it's not a very smart solution).
String input = "a b c";
String output = (" " + input).Replace(" ", "prefix_");
UPDATE
The first solution has no spaces in the output. This solution requires a placeholder symbol (#) not occurring in the input.
String output = ("#" + input.Replace(" ", " #")).Replace("#", "prefix_");
It's probably more efficient to use a StringBuilder.
String input = "a b c";
String[] items = input.Split(new[] {' '}, StringSplitOptions.RemoveEmptyEntries);
StringBuilder sb = new StringBuilder();
foreach (String item in items)
{
sb.Append("prefix_");
sb.Append(item);
sb.Append(" ");
}
sb.Length--;
String output = sb.ToString();
