Remove all spaces and punctuation (anything not a letter) from a string?

In Java, how can I take a string as a parameter, remove all punctuation and spaces, and then convert the remaining letters to uppercase?
Example 1:
Input: How's your day going?
Output: HOWSYOURDAYGOING
Example 2:
Input: What's your name again?
Output: WHATSYOURNAMEAGAIN

This should do the trick
String mystr= "How's your day going?";
mystr = mystr.replaceAll("[^A-Za-z]+", "").toUpperCase();
System.out.println(mystr);
Output:
HOWSYOURDAYGOING
The regex [^A-Za-z]+ matches one or more consecutive characters that are not in the ranges A-Z or a-z, and replaceAll replaces each such run with the empty string.
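The same one-liner can be wrapped in a reusable method and checked against both examples from the question. This is a minimal sketch: the class and method names (`LettersOnly`, `toUpperLetters`) are hypothetical, and `[^A-Za-z]` keeps only ASCII letters, so accented letters are removed as well.

```java
import java.util.Locale;

public class LettersOnly {
    // Deletes every run of characters outside A-Z/a-z, then upper-cases the rest.
    // Locale.ROOT keeps toUpperCase() independent of the default locale.
    static String toUpperLetters(String input) {
        return input.replaceAll("[^A-Za-z]+", "").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toUpperLetters("How's your day going?"));   // HOWSYOURDAYGOING
        System.out.println(toUpperLetters("What's your name again?")); // WHATSYOURNAMEAGAIN
    }
}
```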

String yourString = "How's your day going";
yourString = yourString.replaceAll("\\s+", ""); // remove whitespace
yourString = yourString.replaceAll("[^a-zA-Z]", ""); // remove everything that is not an ASCII letter (punctuation, digits, ...)
yourString = yourString.toUpperCase(); // convert to upper case

I did it with:
inputText = inputText.replaceAll("\\s|[^a-zA-Z0-9]", "");
inputText = inputText.toUpperCase(); // and later uppercase the complete string (strings are immutable, so assign the result)
Although @italhourne's answer is correct, you can reduce it to a single step: remove the spaces and everything outside a-z, A-Z and 0-9 in one statement by combining the patterns with "or" (|). Note that this keeps digits, and the \\s alternative is not strictly needed, because whitespace is already outside [a-zA-Z0-9].
Just a help for those who need it!

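public static String repl1(String n) {
    // \p{Punct} is US-ASCII punctuation only, so digits and non-ASCII symbols survive
    n = n.replaceAll("\\p{Punct}|\\s", "");
    return n.toUpperCase(); // the question also asks for upper case
}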
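Both `[^A-Za-z]` and `\p{Punct}` are ASCII-oriented: the first drops letters such as u-umlaut, and the second (US-ASCII punctuation only) keeps digits and symbols such as curly quotes. If "anything not a letter" should mean any Unicode letter, a variant based on the `\p{L}` (letter) category might look like this sketch; the class and method names are hypothetical.

```java
import java.util.Locale;

public class UnicodeLettersOnly {
    // \p{L} is any Unicode letter, so [^\p{L}]+ matches runs of non-letters
    // (spaces, punctuation, digits, symbols), which are deleted.
    static String toUpperLetters(String input) {
        return input.replaceAll("[^\\p{L}]+", "").toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // the escape below is u with umlaut (U+00FC), written this way to avoid source-encoding issues
        System.out.println(toUpperLetters("Wie geht's, J\u00fcrgen?"));
    }
}
```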

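Well, I did it the long way; take a look if you want. I used the ASCII code values (this is my main method; turn it into a function on your own).
String str = "How's your day going?";
char c;
for (int i = 0; i < str.length(); i++) {
    c = str.charAt(i);
    // not an ASCII letter: below 'A' (65), between 'Z' (90) and 'a' (97), or above 'z' (122)
    if (c < 65 || (c > 90 && c < 97) || c > 122) {
        // replace() removes every occurrence of this character...
        str = str.replace(str.substring(i, i + 1), "");
        i--; // ...and the next character has moved into index i, so check it again
    }
}
str = str.toUpperCase();
System.out.println(str);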
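Calling `replace` inside the loop rebuilds the whole string for every removed character. A single-pass alternative keeps the same ASCII ranges but appends the letters to a `StringBuilder`; this is a sketch, and the class and method names are made up for the example.

```java
public class AsciiLetterFilter {
    // Copies only ASCII letters into a StringBuilder, upper-casing them on the way.
    static String lettersToUpper(String str) {
        StringBuilder sb = new StringBuilder(str.length());
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
                sb.append(Character.toUpperCase(c));
            }
            // anything else (spaces, punctuation, digits, non-ASCII) is skipped
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(lettersToUpper("How's your day going?")); // HOWSYOURDAYGOING
    }
}
```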

Related

Deleting content of every string after first empty space

How can I delete everything after the first space in a string that the user selects? I was reading "how to remove some words from a string in java". Can this help in my case?

You can use replaceAll with the regex \s.*, which matches a whitespace character and everything after it:
String str = "Hello java word!";
str = str.replaceAll("\\s.*", "");
output
Hello
As @Coffeehouse Coder mentioned in a comment, this solution removes everything if the input starts with a space. To avoid that, trim the input first with String.trim(), which removes leading and trailing whitespace.
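Putting the `trim()` advice and the regex together gives a small helper; this is a sketch, and `FirstWordRegex.firstWord` is a hypothetical name.

```java
public class FirstWordRegex {
    // trim() first, so input that starts with whitespace is not wiped out;
    // then "\\s.*" deletes everything from the first remaining whitespace onward.
    static String firstWord(String str) {
        return str.trim().replaceAll("\\s.*", "");
    }

    public static void main(String[] args) {
        System.out.println(firstWord("   Hello java word!")); // Hello
    }
}
```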

Assuming that there is no space at the beginning of the string, follow these steps:
1. Split the string at spaces. This creates an array.
2. Get the first element of that array.
Hope this helps.
String str = "Example string";
String[] parts = str.split("\\s");
String word = parts[0];
Before using the code above, you need to deal with leading spaces and multiple consecutive spaces.
I am not a native Java programmer, but I know it has a split function for strings.
The reference you cited in the question is a bit complex, while you can achieve what you want very easily.
P.S. If you later want the first two or three words, splitting is better (assuming you have already dealt with multiple whitespace characters); otherwise substring is better.
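Following the P.S., a sketch that deals with leading spaces and runs of whitespace before splitting, and returns the first `n` words; `FirstWords.firstWords` is a hypothetical helper.

```java
import java.util.Arrays;

public class FirstWords {
    // Returns up to n leading words (n is assumed to be >= 0).
    // trim() removes leading/trailing spaces, and "\\s+" treats any run of
    // whitespace as a single separator, so no empty "words" appear.
    static String[] firstWords(String s, int n) {
        String trimmed = s.trim();
        if (trimmed.isEmpty()) {
            return new String[0]; // "".split(...) would give [""], not []
        }
        String[] words = trimmed.split("\\s+");
        return Arrays.copyOf(words, Math.min(n, words.length));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(firstWords("  Example   string with words ", 2))); // [Example, string]
    }
}
```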

A simple way to do it can be:
System.out.println("Hello world!".split(" ")[0]);

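// Taking 'str' as your string.
// trim() removes the leading (and trailing) space(s) of the string.
str = str.trim();
int index = str.indexOf(" ");
// indexOf returns -1 when there is no space; keep the whole string in that case
String word = (index == -1) ? str : str.substring(0, index);
This is just one method of many. Here is another: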
str = str.replaceAll("\\s+", " "); // This replaces one or more spaces with one space
String[] words = str.split("\\s");
String first = words[0];

The simplest solution, in my opinion, is to locate the index where the user wants the string cut off and call substring() from 0 to that index. Assign the result to a new string, and you have the string you want.
If you want to replace the original string, just assign the result of substring() back to it.
Link to substring() method: https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#substring(int,%20int)

There are already 5 perfectly good answers, so let me add a sixth one. Variety is the spice of life!
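import java.util.regex.Matcher;
import java.util.regex.Pattern;
// compiled once and reused: one or more non-whitespace characters
private static final Pattern FIRST_WORD = Pattern.compile("\\S+");
public static String firstWord(CharSequence text) {
    Matcher m = FIRST_WORD.matcher(text);
    // the first match is the first word; "" if the text is empty or all whitespace
    return m.find() ? m.group() : "";
}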
Advantages over the .split(...)[0]-type answers:
It directly does exactly what is being asked, i.e. "Find the first sequence of non-space characters." So the self-documentation is more explicit.
It is more efficient when called on multiple strings (e.g. for batch processing a large list of strings) because the regular expression is compiled only once.
It is more space-efficient because it avoids unnecessarily creating a whole array with references to each word when we only need the first.
It works without having to trim the string.
(I know this is probably too late to be of any use to the OP but I'm leaving it here as an alternative solution for future readers.)

This avoids regular expressions altogether, so it should be more efficient:
String str = "Hello world!";
int spaceInd = str.indexOf(' ');
if(spaceInd != -1) {
str = str.substring(0, spaceInd);
}
System.out.println(String.format("[%s]", str));
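The `indexOf(' ')` version only looks for a plain space and returns an empty string when the input starts with one. A regex-free variant that skips leading whitespace and stops at the first whitespace character (tabs and newlines included, via `Character.isWhitespace`) could look like this; the names are illustrative.

```java
public class FirstWordScan {
    // Returns the first run of non-whitespace characters, or "" if there is none.
    static String firstWord(String s) {
        int start = 0;
        while (start < s.length() && Character.isWhitespace(s.charAt(start))) {
            start++; // skip leading whitespace
        }
        int end = start;
        while (end < s.length() && !Character.isWhitespace(s.charAt(end))) {
            end++; // extend to the end of the first word
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println("[" + firstWord("  Hello world!") + "]"); // [Hello]
    }
}
```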

JAVA: Replacing words in string

I want to replace words in a string, but I am having a little difficulty. Here is what I want to do. I have this string:
String a = "I want to replace some words in this string";
It should work like a kind of translator. I am doing this with String.replaceAll(), but it doesn't work completely. Let's say I am translating from English to German ("Ich" means "I" in German):
String toTranslate = "I";
String translated = "Ich";
a = a.replaceAll(toTranslate.toLowerCase(), translated.toLowerCase());
Now the output of the String a will be this:
"ich want to replace some words ichn thichs strichng"
How can I replace only whole words, not occurrences inside other words?

replaceAll uses regex, so you can add word boundaries, or look-arounds that check that the word you want to replace is not surrounded by non-space characters.
String toTranslate = "I";
String translated = "Ich";
a = a.replaceAll("(?<!\\S)"+toTranslate.toLowerCase()+"(?!\\S)", translated.toLowerCase());
You can also use Pattern.quote to escape any regex metacharacters like + * ( inside the word you want to replace. By the way, you don't need to change your string to lower case; simply add the case-insensitive flag (?i) to the regex.
a = a.replaceAll("(?i)(?<!\\S)"+Pattern.quote(toTranslate)+"(?!\\S)", translated.toLowerCase());
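The same look-around pattern as a self-contained helper. This sketch also passes the replacement through `Matcher.quoteReplacement`, since `$` or `\` in the translated word would otherwise be interpreted by replaceAll; `WholeWordReplace.replaceWord` is a hypothetical name, and unlike the line above it keeps the replacement's case as given.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeWordReplace {
    // Replaces `word` only where it does not touch another non-space character.
    // (?i) makes the match case-insensitive; Pattern.quote escapes regex
    // metacharacters in the word, and Matcher.quoteReplacement escapes the
    // special characters of the replacement text.
    static String replaceWord(String text, String word, String replacement) {
        String regex = "(?i)(?<!\\S)" + Pattern.quote(word) + "(?!\\S)";
        return text.replaceAll(regex, Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String a = "I want to replace some words in this string";
        System.out.println(replaceWord(a, "I", "Ich")); // Ich want to replace some words in this string
    }
}
```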

Use split(" ") to get each word in the sentence, then compare each word with the word to translate. (Calling replaceAll on each word would still turn "in" into "ichn", because it replaces substrings inside the word.)
String a = "I want to replace some words in this string";
String toTranslate = "I";
String translated = "Ich";
String[] words = a.split(" ");
for (int i = 0; i < words.length; i++) {
    // compare the whole word, ignoring case, so "in" and "this" are left alone
    if (words[i].equalsIgnoreCase(toTranslate)) {
        words[i] = translated.toLowerCase();
    }
}
System.out.println("after translation = " + String.join(" ", words));

String toTranslate = "I ";
String translated = "Ich ";
a = a.replaceAll(toTranslate.toLowerCase(), translated.toLowerCase());
If you add a space after the "I", it is only replaced when it is followed by a space, so words like "in" are left alone; but if a word ends in "i", that's another problem.

If you assume that "I" will always be capitalized in English, as it should be, then
a = a.replaceAll(toTranslate, translated);
will work; otherwise you need to replace both cases:
a = a.replaceAll(toTranslate, translated);
a = a.replaceAll("([^a-zA-Z])("+toTranslate.toLowerCase()+")([^a-zA-Z])", "$1"+translated.toLowerCase()+"$3");

Yes, word boundaries are the solution. I just did this in the regex:
text = text.replaceAll("\\b" + parts1[i] + "\\b", map.element.value);
Don't be confused by the second argument; it's just a string (taken from a hash table). Note that replaceAll returns a new string, so the result has to be assigned back.

You can use the regex word boundary, \b:
String toTranslate = "\\bI\\b";
String translated = "Ich";
a = a.replaceAll("(?i)" + toTranslate, translated.toLowerCase()); // (?i) matches both "I" and "i"
This ensures that "I" is matched only as a separate word.
Edit: I had misread the question; you want whole words, and the code above accounts for that.
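Since the goal is a translator with words looked up in a table, a single pass with `Matcher.find()` and `appendReplacement` can replace every whole word by its dictionary entry instead of calling replaceAll once per word. This is a sketch: the class name and dictionary contents are hypothetical, the lookup is made case-insensitive by lower-casing each word, and `\w` only covers ASCII letters, digits and underscore by default.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordTranslator {
    // \b marks word boundaries, so each match is one complete word.
    private static final Pattern WORD = Pattern.compile("\\b\\w+\\b");

    static String translate(String text, Map<String, String> dictionary) {
        Matcher m = WORD.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String word = m.group();
            // keep the word unchanged when it is not in the dictionary
            String replacement = dictionary.getOrDefault(word.toLowerCase(Locale.ROOT), word);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out); // copy whatever follows the last word
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> dict = new HashMap<>(); // hypothetical example entries
        dict.put("i", "ich");
        dict.put("want", "will");
        System.out.println(translate("I want to replace some words in this string", dict));
        // ich will to replace some words in this string
    }
}
```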

Why does split() produce an extra empty string even after setting limit -1?

I want to split a telephone number into the area code and the rest of the number, without the brackets, so I did this:
String pattern = "[\\(?=\\)]";
String b = "(079)25894029".trim();
String c[] = b.split(pattern,-1);
for (int a = 0; a < c.length; a++)
System.out.println("c[" + a + "]::->" + c[a] + "\nLength::->"+ c[a].length());
Output:
c[0]::-> Length::->0
c[1]::->079 Length::->3
c[2]::->25894029 Length::->8
Expected Output:
c[0]::->079 Length::->3
c[1]::->25894029 Length::->8
So my question is: why does split() produce an extra empty string at the start, i.e. [, 079, 25894029]? Is this its normal behavior, or did I do something wrong here?
How can I get my expected outcome?

First, you have unnecessary escaping inside your character class. Your regex is the same as:
String pattern = "[(?=)]";
Inside [...], ?= is not a lookahead: the class simply matches any one of the four characters (, ?, = and ).
You are getting an empty first element because ( is the very first character in the string, and splitting at position 0 produces an empty leading string.
To avoid that, use this code:
String str = "(079)25894029";
String[] toks = (Character.isDigit(str.charAt(0)) ? str : str.substring(1)).split("[(?=)]");
for (String tok: toks)
System.out.printf("<<%s>>%n", tok);
Output:
<<079>>
<<25894029>>

From the Java8 Oracle docs:
When there is a positive-width match at the beginning of this string
then an empty leading substring is included at the beginning of the
resulting array. A zero-width match at the beginning however never
produces such empty leading substring.
You can check whether the first element of the resulting array is an empty string and, if so, drop it.
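A sketch of that check, assuming the simpler character class [()] since the brackets are the only separators needed here; the class and method names are hypothetical.

```java
import java.util.Arrays;

public class LeadingEmptySplit {
    // Splits with limit -1 (trailing empty strings are kept) and then drops the
    // empty first element produced by a positive-width match at index 0.
    static String[] splitSkippingLeadingEmpty(String input, String regex) {
        String[] parts = input.split(regex, -1);
        if (parts.length > 0 && parts[0].isEmpty()) {
            return Arrays.copyOfRange(parts, 1, parts.length);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] c = splitSkippingLeadingEmpty("(079)25894029", "[()]");
        System.out.println(Arrays.toString(c)); // [079, 25894029]
    }
}
```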

Your regex has problems, as does your approach: you can't solve it with your approach, whatever regex you use. The magic one-liner you seek is:
String[] c = b.replaceAll("^\\D+|\\D+$", "").split("\\D+");
This removes all leading/trailing non-digits, then splits on non-digits. This will handle many different formats and separators (try a few yourself).
For example:
String b = "(079)25894029".trim();
String[] c = b.replaceAll("^\\D+|\\D+$", "").split("\\D+");
System.out.println(Arrays.toString(c));
Producing this:
[079, 25894029]
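If the input always has the form (area code) followed by the number, capturing groups can extract both parts directly, without split() at all. This is a sketch under that format assumption; `PhoneParts.parse` is a hypothetical name, and other formats would need a different pattern.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PhoneParts {
    // Assumed format: "(" area-code digits ")", optional spaces, then the number digits.
    private static final Pattern PHONE = Pattern.compile("\\((\\d+)\\)\\s*(\\d+)");

    static String[] parse(String phone) {
        Matcher m = PHONE.matcher(phone.trim());
        if (!m.matches()) { // matches() requires the whole string to fit the pattern
            throw new IllegalArgumentException("Unexpected format: " + phone);
        }
        return new String[] { m.group(1), m.group(2) }; // area code, number
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("(079)25894029"))); // [079, 25894029]
    }
}
```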

Why does this for-loop word count method not work in Java?

Can anyone tell me why this word count method doesn't work? The returned value of count is 0 every time I run it.
public int wordcount(){
String spaceString = " ";
int count = 0;
for(int i = 0; i < this.getString().length(); i++){
if (this.getString().substring(i).equals(spaceString)){
count++;
}
}
return count;
}
The value of getString() is my search string.
Any help is much appreciated; I'm probably doing something dumb.
Dylan

Read the docs:
The substring begins with the character at the specified index and extends to the end of this string.
Your if condition can be true at most once: at the last index, if the last character of the string is a space. Perhaps you wanted charAt? (And even this won't properly handle double spaces; splitting on whitespace might be a better option.)

Because substring with only one argument returns the substring starting from that index to the end of the string, you're not comparing just one character.
Instead of substring, define spaceString as a char and compare it with charAt(i).

this.getString().substring(i) -> this returns a sub string from the index i to the end of the String
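So, for example, if your string was Test, the above would return Test, est, st and finally t.
For what you're trying to do there are alternative methods, but you could simply replace
this.getString().substring(i).equals(spaceString)
with
spaceString.equals(String.valueOf(this.getString().charAt(i)))
(charAt returns a char, so it has to be turned into a String first; spaceString.equals(this.getString().charAt(i)) would compare a String with a Character and always be false.)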
An alternative way of doing what you're trying to do is:
this.getString().split(spaceString)
This would return an array of Strings - the original string broken up by spaces.

Read the documentation of the method you are using:
http://docs.oracle.com/javase/7/docs/api/java/lang/String.html#substring(int)
That is, the count will be non-zero only if your string ends with a space.

Using substring as you are will not work. If the value of getString() is "my search string", each iteration through the loop will have substring(i) return:
"my search string"
"y search string"
" search string"
"search string"
"earch string"
"arch string"
"rch string"
"ch string"
"h string"
" string"
"string"
"tring"
"ring"
"ing"
"ng"
"g"
Notice none of those equals " ".
Try using split.
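public int countWords(String s) {
    String trimmed = s.trim();
    // without trim(), leading whitespace yields an extra empty first element,
    // and "".split(...) returns [""], so an empty string would count as 1 word
    return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
}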

Change
if (this.getString().substring(i).equals(spaceString))
to
if (this.getString().charAt(i) == ' ')

this.getString().substring(i) returns a string from the index of (i) to the end of the string.
Example: for i=5, it will return "rown cow" from the string "the brown cow". This functionality isn't what you need.
If you pepper System.out.println() throughout your code (or use the debugger), you will see this.
I think it would be better to use something like String.split() or charAt(i).
By the way, even if you fix your code to count spaces, it will not return the correct value for inputs like "my dog" (word count = 2) and "cow" (word count = 1): counting spaces gives one less than the number of words. There is also a problem if there is more than one space between words. Also, counting spaces will produce a word count of three for:
" the cow ".
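The fix hinted at above is to count where words start rather than counting spaces. A sketch of the original loop reworked that way: a word starts at each non-whitespace character that is the first character or follows whitespace. Here `wordCount` is a static method taking the string as a parameter (an assumption; the question's version reads it from `this.getString()`).

```java
public class WordCountLoop {
    // Counts the positions where a word starts, so leading, trailing and
    // repeated whitespace do not affect the result.
    static int wordCount(String s) {
        int count = 0;
        boolean inWord = false;
        for (int i = 0; i < s.length(); i++) {
            boolean isSpace = Character.isWhitespace(s.charAt(i));
            if (!isSpace && !inWord) {
                count++; // a new word begins here
            }
            inWord = !isSpace;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(" the cow ")); // 2
    }
}
```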

Insert special sign between characters

I have a String of digits, and I want to insert the sign ":" after every two digits: if the string is 0123456789, I want it to become 01:23:45:67:89.
Is there any way to insert it? I read about replace(), but it does not help in my case.

You could use this magic piece of regex:
System.out.println("0123456789".replaceAll(".{2}(?!$)", "$0:"));
.{2} matches any 2 characters
(?!$) only if they are not at the end of the string
$0: the whole match (group 0), followed by :
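The same regex can be wrapped in a helper whose group size and separator are parameters. A minimal sketch; `DigitPairs.insertEvery` is a hypothetical name, and because the pattern is built by string concatenation, `size` must be a positive integer.

```java
import java.util.regex.Matcher;

public class DigitPairs {
    // Inserts sep after every `size` characters, except after the group that ends
    // the string: ".{n}(?!$)" matches n characters not at the end, "$0" puts the
    // whole match back, and Matcher.quoteReplacement escapes special characters in sep.
    static String insertEvery(String s, int size, String sep) {
        return s.replaceAll(".{" + size + "}(?!$)", "$0" + Matcher.quoteReplacement(sep));
    }

    public static void main(String[] args) {
        System.out.println(insertEvery("0123456789", 2, ":")); // 01:23:45:67:89
    }
}
```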

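String x = "0123456789";
String result = "";
for (int i = 0; i < x.length(); i++) {
    result += x.charAt(i);
    // after every second character (odd index), add ':' unless it is the last one
    if (i % 2 == 1 && i + 1 < x.length()) {
        result += ":";
    }
}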

