Split words and capitalize first character - java

I'm getting user input that I need to format. I need to remove all leading/trailing spaces and I need to capitalize the first letter of each word.
Here is what I'm trying, however... if you input something with 2 spaces between words, it crashes. How might I solve this?
String formattedInput = "";
String inputLineArray[] = inputLine.getText().toString().trim().split("\\s");
for (int d=0; d<inputLineArray.length; d++) {
formattedInput = formattedInput.trim() + " " +
inputLineArray[d].trim().substring(0,1).toUpperCase() +
inputLineArray[d].trim().substring(1).toLowerCase();
}

Your code blows up on multiple spaces because split("\\s") produces an empty string for every extra space. For example, "hello  there" (two spaces) becomes array[0] = "hello", array[1] = "", array[2] = "there".
So when you call substring(0,1) on the empty element, you get a StringIndexOutOfBoundsException.
Try changing split("\\s") to split("\\s+"); that way each run of spaces is matched by the regex as a single delimiter and thrown out.
Edit:
This also will let you get rid of the .trim() inside your loop since all of the spaces will be taken care of by the split.
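Putting the suggestion above together, a minimal sketch might look like this. The class and method names (WordCapitalizer, capitalizeWords) are made up for illustration, and the empty-input guard is an addition, needed because "".split("\\s+") returns one empty element.

```java
class WordCapitalizer {
    // Trims the input, splits on runs of whitespace, and capitalizes each word.
    static String capitalizeWords(String input) {
        String trimmed = input.trim();
        if (trimmed.isEmpty()) {
            // "".split("\\s+") would return [""], so handle blank input up front.
            return "";
        }
        StringBuilder result = new StringBuilder();
        for (String word : trimmed.split("\\s+")) {
            if (result.length() > 0) {
                result.append(' ');
            }
            // word is never empty here because \\s+ swallows whole runs of whitespace.
            result.append(Character.toUpperCase(word.charAt(0)))
                  .append(word.substring(1).toLowerCase());
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("  hello   tHERE world "));
    }
}
```

Using a StringBuilder instead of repeated string concatenation also removes the need to trim the result on every iteration.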

Tokenize the string by splitting on spaces, then capitalize each token and put the pieces back together.
Read all about string tokenization here.

Related

Extracting signs/symbols from a string in Java

I have a string with signs and I want to get only the signs and put them in a string array. Here is what I've done:
String str = "155+40-5+6";
// replace the numbers with spaces, leaving the signs and spaces
String signString = str.replaceAll("[0-9]", " ");
// then get an array that contains the signs without spaces
String[] signsArray = signString.trim().split(" ");
However, the 2nd element of signsArray is blank: [+, , -, +]
Thank you for your time.
You could do this a couple of ways. Either replace multiple adjacent digits with a single space:
// replace the numbers with spaces, leaving the signs and spaces
String signString = str.replaceAll("[0-9]+", " ");
Or alternatively in the last step, split on multiple spaces:
// then get an array that contains the signs without spaces
String[] signsArray = signString.trim().split(" +");
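Both variants can be checked side by side. This is only a sketch; the class and method names are invented for the comparison.

```java
import java.util.Arrays;

class SignExtractor {
    // Variant 1: collapse each run of digits into ONE space, then split on a single space.
    static String[] signsByCollapsingDigits(String expr) {
        return expr.replaceAll("[0-9]+", " ").trim().split(" ");
    }

    // Variant 2: one space per digit, then split on runs of spaces.
    static String[] signsBySplittingOnRuns(String expr) {
        return expr.replaceAll("[0-9]", " ").trim().split(" +");
    }

    public static void main(String[] args) {
        String str = "155+40-5+6";
        System.out.println(Arrays.toString(signsByCollapsingDigits(str))); // [+, -, +]
        System.out.println(Arrays.toString(signsBySplittingOnRuns(str)));  // [+, -, +]
    }
}
```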
Just replace " " with "" in your code:
String str = "155+40-5+6";
// replace the numbers with spaces, leaving the signs and spaces
String signString = str.replaceAll("[0-9]","");
// then get an array that contains the signs without spaces
String[] signsArray = signString.split("");
This should work for you. Cheers

Why does System.out.println print a new line while System.out.print prints nothing?

I am having trouble with split function. I do not know how it works.
Here is my source code:
// Using println to print out the result
String str = " Welcome to Java Tutorial ";
str = str.trim();
String[] arr = str.split(" ");
for (String c : arr) {
System.out.println(c);
}
//Using print to print out the result
String str = " Welcome to Java Tutorial ";
str = str.trim();
String[] arr = str.split(" ");
for (String c : arr) {
System.out.print(c);
}
and the results are:
The first is the result when using println, the second is the result when using print,
I do not understand why space appears in println, while it does not appear in print. Can anyone explain it for me?
Since there are many spaces in your string, the array returned by split looks like
[Welcome, , , , , , , to, , , , Java, , , , , Tutorial]
If you look closely, the blank elements are empty strings "".
When you call println(""), it prints an empty line.
When you call print(""), however, nothing visible is written, so it looks as if nothing was printed.
If you want to separate the words regardless of how many spaces are between them, split on runs of whitespace.
Lastly, trim() won't remove the spaces within the String. It only trims spaces at the start and end.
What you are doing is splitting on every individual space, so each extra space in your string produces a separate empty string. If you want to split just the words, you can use the whitespace regex:
String[] arr = str.split("\\s+");
This will fix your problem with the consecutive whitespace you are printing.
Also, when you use print instead of println, the output does not move on to the next line. So when you call println("") you are just moving to a new line.
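To see the difference directly, it can help to print the arrays themselves instead of their elements. The sketch below uses an input with a few extra spaces; the class and method names are illustrative only.

```java
import java.util.Arrays;

class SplitComparison {
    // Splits on every individual space, so extra spaces yield empty strings.
    static String[] splitOnSingleSpace(String s) {
        return s.trim().split(" ");
    }

    // Splits on runs of whitespace, so only the words remain.
    static String[] splitOnWhitespaceRuns(String s) {
        return s.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String str = "  Welcome   to  Java Tutorial  ";
        System.out.println(Arrays.toString(splitOnSingleSpace(str)));
        // [Welcome, , , to, , Java, Tutorial]
        System.out.println(Arrays.toString(splitOnWhitespaceRuns(str)));
        // [Welcome, to, Java, Tutorial]
    }
}
```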
The println() method ends its output with a line break, while print() keeps everything on the same line.
So, when you split your string on a single space (" "):
In the first case, the string is split at every space, so each empty string is printed as a blank line. You see nothing but the new line, because the call is effectively:
System.out.println("");
In the second case, the string is split the same way, but print() does not move to a new line, so the empty strings are invisible.
You can overcome this with a regex.
You can use this regex: \\s+
So, you have to write:
String[] arr = str.split("\\s+");

why split() produces extra , after sets limit -1

I want to split the area code and the number that follows it out of a telephone number, without the brackets, so I did this.
String pattern = "[\\(?=\\)]";
String b = "(079)25894029".trim();
String c[] = b.split(pattern,-1);
for (int a = 0; a < c.length; a++)
System.out.println("c[" + a + "]::->" + c[a] + "\nLength::->"+ c[a].length());
Output:
c[0]::-> Length::->0
c[1]::->079 Length::->3
c[2]::->25894029 Length::->8
Expected Output:
c[0]::->079 Length::->3
c[1]::->25894029 Length::->8
So my question is why split() produces an extra blank at the start, e.g.
[, 079, 25894029]. Is this its normal behavior, or did I do something wrong here?
How can I get my expected outcome?
First you have unnecessary escaping inside your character class. Your regex is same as:
String pattern = "[(?=)]";
Now, you are getting an empty result because ( is the very first character in the string and split at 0th position will indeed cause an empty string.
To avoid that result use this code:
String str = "(079)25894029";
String[] toks = (Character.isDigit(str.charAt(0)) ? str : str.substring(1)).split("[(?=)]");
for (String tok: toks)
System.out.printf("<<%s>>%n", tok);
Output:
<<079>>
<<25894029>>
From the Java8 Oracle docs:
When there is a positive-width match at the beginning of this string
then an empty leading substring is included at the beginning of the
resulting array. A zero-width match at the beginning however never
produces such empty leading substring.
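The quoted rule, and the effect of the limit argument, can be checked with a small sketch. It assumes Java 8 or later, and the class and method names are hypothetical. Note that the limit only controls trailing empty strings; the leading one appears either way.

```java
import java.util.Arrays;

class SplitEdgeCases {
    // Splits on either bracket. A "(" at index 0 is a positive-width match,
    // so the result starts with an empty string regardless of the limit.
    static String[] splitOnBrackets(String s, int limit) {
        return s.split("[()]", limit);
    }

    public static void main(String[] args) {
        // Leading empty string is present with limit 0 and with limit -1.
        System.out.println(Arrays.toString(splitOnBrackets("(079)25894029", 0)));
        System.out.println(Arrays.toString(splitOnBrackets("(079)25894029", -1)));

        // Trailing empty strings are dropped with limit 0 but kept with limit -1.
        System.out.println(splitOnBrackets("(079)", 0).length);  // 2
        System.out.println(splitOnBrackets("(079)", -1).length); // 3

        // A zero-width match at the beginning produces no leading empty string.
        System.out.println(Arrays.toString("abc".split("")));    // [a, b, c]
    }
}
```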
You can check whether the first element of the result is an empty string and, if so, drop that element.
Your regex has problems, as does your approach - you can't solve it using your approach with any regex. The magic one-liner you seek is:
String[] c = b.replaceAll("^\\D+|\\D+$", "").split("\\D+");
This removes all leading/trailing non-digits, then splits on non-digits. This will handle many different formats and separators (try a few yourself).
See live demo of this:
String b = "(079)25894029".trim();
String[] c = b.replaceAll("^\\D+|\\D+$", "").split("\\D+");
System.out.println(Arrays.toString(c));
Producing this:
[079, 25894029]

Error when splitting a string in java

I am trying to split a string according to a certain set of delimiters.
My delimiters are: ,"():;.!? single spaces or multiple spaces.
This is the code I'm currently using,
String[] arrayOfWords= inputString.split("[\\s{2,}\\,\"\\(\\)\\:\\;\\.\\!\\?-]+");
which works fine for most cases, but I have a problem when the first word is surrounded by quotation marks. For example
String inputString = "\"Word\" some more text.";
Is giving me this output
arrayOfWords[0] = ""
arrayOfWords[1] = "Word"
arrayOfWords[2] = "some"
arrayOfWords[3] = "more"
arrayOfWords[4] = "text"
I want the output to give me an array with
arrayOfWords[0] = "Word"
arrayOfWords[1] = "some"
arrayOfWords[2] = "more"
arrayOfWords[3] = "text"
This code has been working fine when quotation marks are used in the middle of the sentence, I'm not sure what the trouble is when it's at the beginning.
EDIT: I just realized I have same problem when any of the delimiters are used as the first character of the string
Unfortunately, you won't be able to remove this empty first element using only split. You should first remove any leading characters that match your delimiters and then split. Your regex also seems to be incorrect, because
by adding {2,} inside [...] you are making the characters {, 2, the comma and } delimiters,
you don't need to escape the rest of your delimiters (note that you don't have to escape - only because it is at the end of the character class [], so it can't be used as a range operator).
Try it this way, maybe:
String regexDelimiters = "[\\s,\"():;.!?\\-]+";
String inputString = "\"Word\" some more text.";
String[] arrayOfWords = inputString.replaceAll(
"^" + regexDelimiters,"").split(regexDelimiters);
for (String s : arrayOfWords)
System.out.println("'" + s + "'");
output:
'Word'
'some'
'more'
'text'
A delimiter is interpreted as separating the strings on either side of it, thus the empty string on its left is added to the result as well as the string to its right ("Word"). To prevent this, you should first strip any leading delimiters, as described here:
How to prevent java.lang.String.split() from creating a leading empty string?
So in short form you would have:
String delim = "[\\s,\"():;.!?\\-]+";
String[] arrayOfWords = inputString.replaceFirst("^" + delim, "").split(delim);
Edit: Looking at Pshemo's answer, I realize he is correct regarding your regex. Inside the brackets it's unnecessary to specify the number of space characters, as they will be caught by the + operator.
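The strip-then-split idea from both answers can be wrapped in a small helper. This is a sketch only: the names DelimiterSplitter, DELIMS and splitWords are invented here, and the empty-input guard is an addition, because splitting an empty string returns a single empty element.

```java
import java.util.Arrays;

class DelimiterSplitter {
    // Delimiter class taken from the answers above: whitespace, punctuation and hyphen.
    private static final String DELIMS = "[\\s,\"():;.!?\\-]+";

    // Strips leading delimiters first, so split() cannot produce a leading empty string.
    static String[] splitWords(String input) {
        String stripped = input.replaceFirst("^" + DELIMS, "");
        if (stripped.isEmpty()) {
            // Input was empty or consisted only of delimiters.
            return new String[0];
        }
        return stripped.split(DELIMS);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWords("\"Word\" some more text.")));
        // [Word, some, more, text]
    }
}
```

Trailing delimiters need no special handling, since split() with the default limit already discards trailing empty strings.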

Replace new line/return with space using regex

Pretty basic question for someone who knows.
Instead of getting from
"This is my text.
And here is a new line"
To:
"This is my text. And here is a new line"
I get:
"This is my text.And here is a new line"
Any idea why?
L.replaceAll("[\\\t|\\\n|\\\r]","\\\s");
I think I found the culprit.
On the next line I do the following:
L.replaceAll( "[^a-zA-Z0-9|^!|^?|^.|^\\s]", "");
And this seems to be causing my issue.
Any idea why?
I am obviously trying to do the following: remove all non-chars, and remove all new lines.
\s is a shortcut for whitespace characters in regex. It has no meaning in a string. ==> You can't use it in your replacement string. There you need to put exactly the character(s) that you want to insert. If this is a space just use " " as replacement.
The other thing is: Why do you use 3 backslashes as escape sequence? Two are enough in Java. And you don't need a | (alternation operator) in a character class.
L.replaceAll("[\\t\\n\\r]+"," ");
Remark
L is not changed. If you want to have a result you need to do
String result = L.replaceAll("[\\t\\n\\r]+"," ");
Test code:
String in = "This is my text.\n\nAnd here is a new line";
System.out.println(in);
String out = in.replaceAll("[\\t\\n\\r]+"," ");
System.out.println(out);
The new line separator is different for different OS-es - '\r\n' for Windows and '\n' for Linux.
To be safe, you can use regex pattern \R - the linebreak matcher introduced with Java 8:
String inlinedText = text.replaceAll("\\R", " ");
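As a rough illustration of the \R approach (Java 8 or later), the sketch below adds a + so that a run of consecutive line breaks collapses into a single space. The class and method names are made up for the example.

```java
class LineBreakJoiner {
    // Replaces each run of line breaks (\n, \r\n, \r, ...) with one space.
    // Strings are immutable, so the result must be used; the input is unchanged.
    static String joinLines(String text) {
        return text.replaceAll("\\R+", " ");
    }

    public static void main(String[] args) {
        String in = "This is my text.\r\n\r\nAnd here is a new line";
        System.out.println(joinLines(in));
        // This is my text. And here is a new line
    }
}
```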
Try
L.replaceAll("(\\t|\\r?\\n)+", " ");
Depending on the system a linefeed is either \r\n or just \n.
I found this.
String newString = string.replaceAll("\n", " ");
Although, as you have a double line, you will get a double space. I guess you could then do another replace all to replace double spaces with a single one.
If that doesn't work try doing:
string.replaceAll(System.getProperty("line.separator"), " ");
If I create lines in "string" by using "\n" I had to use "\n" in the regex. If I used System.getProperty() I had to use that.
Your regex is good, although I would replace the match with the empty string:
String resultString = subjectString.replaceAll("[\t\n\r]", "");
You expect a space between "text." and "And" right?
I get that space when I try the regex by copying your sample
"This is my text. "
So all is well here. Maybe if you just replace it with the empty string it will work. I don't know why you replace it with \s. And the alternation | is not necessary in a character class.
You may first split the text and then rejoin it using spaces.
That will work for sure.
String[] Larray = L.split("[\\n]+");
L = "";
for (int i = 0; i < Larray.length; i++) {
L = L + " " + Larray[i];
}
L = L.trim(); // drop the leading space added on the first iteration
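This should take care of space, tab and newline (use + rather than *, since * also matches the empty string and would insert a space between every pair of characters):
data = data.replaceAll("[ \t\n\r]+", " ");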
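The choice of quantifier matters for a replacement like this one: * also matches the empty string at every position, so a space gets inserted between all characters, while + only matches actual runs of whitespace. A small sketch to compare the two (class and method names are illustrative):

```java
class QuantifierDemo {
    // '*' matches zero or more characters, including the empty string at every position.
    static String replaceWithStar(String data) {
        return data.replaceAll("[ \t\n\r]*", " ");
    }

    // '+' matches one or more characters, so only real whitespace runs are replaced.
    static String replaceWithPlus(String data) {
        return data.replaceAll("[ \t\n\r]+", " ");
    }

    public static void main(String[] args) {
        System.out.println("[" + replaceWithStar("ab") + "]");         // [ a b ]
        System.out.println("[" + replaceWithPlus("a \t\n b") + "]");   // [a b]
    }
}
```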
