Replace tab with blank space

final String remove = " " // tab is 3 spaces
while (lineOfText != null)
{
if (lineOfText.contains(remove))
{
lineOfText = " ";
}
outputFile.println(lineOfText);
lineOfText = inputFile.readLine();
}
I tried running this but it doesn't replace the tabs with one blank space. Any solutions?

A tab is not three spaces. It is a single special character that you write with an escape sequence, specifically final String remove = "\t"; and
if (lineOfText.contains(remove)) {
lineOfText = lineOfText.replaceAll(remove, " ");
}
or drop the if (replaceAll doesn't need it), like
lineOfText = lineOfText.replaceAll(remove, " ");
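As a rough sketch of the whole loop: the class name TabReplacer and the helper replaceTabs are made up for illustration, and a StringReader stands in for the question's input file. Since a tab is a single character, String.replace(char, char) is enough and no regex is needed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

class TabReplacer {
    // Replaces every tab character in one line with a single space.
    static String replaceTabs(String line) {
        return line.replace('\t', ' ');
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a StringReader stands in for the input file of the question.
        BufferedReader inputFile = new BufferedReader(new StringReader("a\tb\nno tabs here\n\tc"));
        String lineOfText = inputFile.readLine();
        while (lineOfText != null) {
            System.out.println(replaceTabs(lineOfText));
            lineOfText = inputFile.readLine();
        }
    }
}
```

Unlike the code in the question, this keeps the rest of the line and only swaps the tab characters.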

You can use this regular expression to replace any whitespace character (tabs, newlines, spaces, etc.) within a String with the desired one:
lineOfText = lineOfText.replaceAll("\\s", " ");
In this example every whitespace character in lineOfText is replaced with a single space. Strings are immutable, so the result has to be assigned back.
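A small sketch of the difference between "\\s" and "\\s+" may help here (WhitespaceDemo and its method names are hypothetical): the first replaces each whitespace character separately, the second replaces a whole run with one space.

```java
class WhitespaceDemo {
    // Each whitespace character becomes its own space.
    static String eachToSpace(String s) {
        return s.replaceAll("\\s", " ");
    }

    // A whole run of whitespace characters becomes one space.
    static String runsToSpace(String s) {
        return s.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String input = "a\t\nb";
        System.out.println("[" + eachToSpace(input) + "]"); // two spaces between a and b
        System.out.println("[" + runsToSpace(input) + "]"); // one space between a and b
    }
}
```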

Related

How to detect the letters in a String and switch them?

I thought about something like this...but is this possible?
//For example:
String main = "hello/bye";
if(main.contains("/")){
//Then switch the letters before "/" with the letters after "/"
}else{
//nothing
}

Well, if you are interested in a cheeky regex :P
public static void main(String[] args) {
String s = "hello/bye";
//if(s.contains("/")){ No need to check this
System.out.println(s.replaceAll("(.*?)/(.*)", "$2/$1")); // () is a capturing group: it captures everything inside the parentheses. $1 and $2 are the captured values. You capture the values and then swap them. :P
//}
}
Output:
bye/hello --> This is what you want, right?

Use String.substring:
main = main.substring(main.indexOf("/") + 1)
+ "/"
+ main.substring(0, main.indexOf("/")) ;

You can use String.split e.g.
String main = "hello/bye";
String[] splitUp = main.split("/"); // Now you have two strings in the array.
String newString = splitUp[1] + "/" + splitUp[0];
Of course you have to also implement some error handling when there is no slash etc..
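One possible way to add that error handling, as a sketch only: the class SlashSwapper and the method swapAroundSlash are invented names, and returning the input unchanged when there is no slash is just one reasonable choice.

```java
class SlashSwapper {
    // Swaps the text before the first '/' with the text after it.
    // Returns the input unchanged when it contains no '/'.
    static String swapAroundSlash(String s) {
        int slash = s.indexOf('/');
        if (slash < 0) {
            return s;
        }
        return s.substring(slash + 1) + "/" + s.substring(0, slash);
    }

    public static void main(String[] args) {
        System.out.println(swapAroundSlash("hello/bye")); // bye/hello
        System.out.println(swapAroundSlash("hello"));     // hello
    }
}
```

Using indexOf instead of split also sidesteps the question of what to do when the string contains more than one slash: everything after the first slash is treated as the second part.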

You can use String.split(regex, limit).
limit: with a positive limit n, the pattern is applied at most n - 1 times, so the array has at most n entries and the last entry holds the rest of the input.
String main = "hello/bye";
if (main.contains("/")) {
// Then switch the letters before "/" with the letters after "/"
String[] parts = main.split("/", 2);
main = parts[1] + "/" + parts[0]; // main becomes "bye/hello"
} else {
// nothing
}
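To make the limit semantics concrete, here is a short sketch (SplitLimitDemo and splitOnce are illustrative names). In Java a limit of 1 returns the whole string as a single element, so a limit of 2 is what splits at the first separator only.

```java
import java.util.Arrays;

class SplitLimitDemo {
    // Splits at the first '/' only; the rest of the input stays in the last element.
    static String[] splitOnce(String s) {
        return s.split("/", 2);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("a/b/c".split("/")));    // [a, b, c]
        System.out.println(Arrays.toString("a/b/c".split("/", 1))); // [a/b/c]
        System.out.println(Arrays.toString(splitOnce("a/b/c")));    // [a, b/c]
    }
}
```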

You can also use StringTokenizer to split the string.
String main = "hello/bye";
StringTokenizer st = new StringTokenizer(main, "/");
String part1 = st.nextToken();
String part2 = st.nextToken();
String newMain = part2 + "/" + part1;

More efficient way to make a string in a string of just words

I am making an application where I will be fetching tweets and storing them in a database. I will have a column for the complete text of the tweet and another where only the words of the tweet will remain (I need the words to calculate which words were most used later).
How I currently do it is by using 6 different .replaceAll() calls, some of which might be triggered more than once. For example I will have a for loop to remove every "hashtag" using replaceAll().
The problem is that I will be editing as many as thousands of tweets that I fetch every few minutes, and I think that the way I am doing it will not be too efficient.
What my requirements are in this order (also written in comments down below):
Delete all usernames mentioned
Delete all RT (retweets flags)
Delete all hashtags mentioned
Replace all break lines with spaces
Replace all double spaces with single spaces
Delete all special characters except spaces
Here is a Short and Compilable Example:
public class StringTest {
public static void main(String args[]) {
String text = "RT @AshStewart09: Vote for Lady Gaga for \"Best Fans\""
+ " at iHeart Awards\n"
+ "\n"
+ "RT!!\n"
+ "\n"
+ "My vote for #FanArmy goes to #LittleMonsters #iHeartAwards"
+ " htt…";
String[] hashtags = {"#FanArmy", "#LittleMonsters", "#iHeartAwards"};
System.out.println("Before: " + text + "\n");
// Delete all usernames mentioned (may run multiple times)
text = text.replaceAll("@AshStewart09", "");
System.out.println("First Phase: " + text + "\n");
// Delete all RT (retweets flags)
text = text.replaceAll("RT", "");
System.out.println("Second Phase: " + text + "\n");
// Delete all hashtags mentioned
for (String hashtag : hashtags) {
text = text.replaceAll(hashtag, "");
}
System.out.println("Third Phase: " + text + "\n");
// Replace all break lines with spaces
text = text.replaceAll("\n", " ");
System.out.println("Fourth Phase: " + text + "\n");
// Replace all double spaces with single spaces
text = text.replaceAll(" +", " ");
System.out.println("Fifth Phase: " + text + "\n");
// Delete all special characters except spaces
text = text.replaceAll("[^a-zA-Z0-9 ]+", "").trim();
System.out.println("Finally: " + text);
}
}

Relying on replaceAll is probably the biggest performance killer, as it compiles the regex again on every call. Using regexes for everything is probably the second most significant problem.
Assuming all usernames start with @, I'd replace
// Delete all usernames mentioned (may run multiple times)
text = text.replaceAll("@AshStewart09", "");
by a loop that copies everything until it finds an @, then checks whether the following chars match any of the listed usernames and skips them if so. For this lookup you could use a trie. A simpler method would be a replaceAll-like loop for the regex @\w+ together with a HashMap lookup.
// Delete all RT (retweets flags)
text = text.replaceAll("RT", "");
Here,
private static final Pattern RT_PATTERN = Pattern.compile("RT");
is a sure win. All the following parts could be handled similarly. Instead of
// Delete all special characters except spaces
text = text.replaceAll("[^a-zA-Z0-9 ]+", "").trim();
you could use Guava's CharMatcher. The method removeFrom does exactly what you did, but collapseFrom or trimAndCollapseFrom might be better.
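The "replaceAll-like loop" with a hash lookup could look roughly like this. It is only a sketch: it assumes usernames are written with a leading @ as on Twitter, uses a Set rather than a map because only membership matters, and the names MentionStripper and stripKnownMentions are made up.

```java
import java.util.Collections;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MentionStripper {
    // Compiled once and reused for every tweet.
    private static final Pattern MENTION = Pattern.compile("@\\w+");

    // Removes only those mentions that appear in the given set of usernames.
    static String stripKnownMentions(String text, Set<String> usernames) {
        Matcher m = MENTION.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = usernames.contains(m.group()) ? "" : m.group();
            // quoteReplacement keeps '$' and '\' in the match from being interpreted.
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Set<String> known = Collections.singleton("@AshStewart09");
        System.out.println(stripKnownMentions("RT @AshStewart09: hi @other", known));
    }
}
```

The text is scanned once regardless of how many usernames are in the set, which is the point of replacing the per-username replaceAll calls.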

According to the now closed question, it all boils down to
tweet = tweet.replaceAll("@\\w+|#\\w+|\\bRT\\b", "")
.replaceAll("\n", " ")
.replaceAll("[^\\p{L}\\p{N} ]+", " ")
.replaceAll(" +", " ")
.trim();
The second line seems to be redundant, as the third one removes \n too. Changing the first line's replacement to " " doesn't change the outcome and allows the replacements to be merged:
tweet = tweet.replaceAll("@\\w*|#\\w*|\\bRT\\b|[^@#\\p{L}\\p{N} ]+", " ")
.replaceAll(" +", " ")
.trim();
I've changed the usernames and hashtags part to also eat a lone @ or #, so that it doesn't need to be consumed by the special chars part. This is necessary for correct processing of strings like !@AshStewart09.
For maximum performance, you surely need a precompiled pattern. I'd also suggest again using Guava's CharMatcher for the second part. Guava is huge (2 MB I guess), but you will surely find more useful things there. So in the end you can get
private static final Pattern PATTERN =
Pattern.compile("@\\w*|#\\w*|\\bRT\\b|[^@#\\p{L}\\p{N} ]+");
private static final CharMatcher CHAR_MATCHER = CharMatcher.is(' ');
tweet = PATTERN.matcher(tweet).replaceAll(" ");
tweet = CHAR_MATCHER.trimAndCollapseFrom(tweet, ' ');
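If you would rather not pull in Guava, a plain-JDK variant is possible. The sketch below swaps the CharMatcher step for a second precompiled Pattern plus trim(); it assumes mentions start with @ and hashtags with #, and TweetCleaner and clean are illustrative names, not part of any library.

```java
import java.util.regex.Pattern;

class TweetCleaner {
    // Mentions, hashtags, the RT flag, and any run of characters
    // that are not letters, digits or spaces.
    private static final Pattern NOISE =
            Pattern.compile("@\\w*|#\\w*|\\bRT\\b|[^@#\\p{L}\\p{N} ]+");
    private static final Pattern SPACES = Pattern.compile(" +");

    static String clean(String tweet) {
        // Replace every piece of noise with a space, then collapse and trim.
        String spaced = NOISE.matcher(tweet).replaceAll(" ");
        return SPACES.matcher(spaced).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(clean("RT @AshStewart09: Vote for Lady Gaga!\n\n#FanArmy rocks"));
    }
}
```

Both patterns are compiled once, so each tweet costs two passes over its text and no regex compilation.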

You can fold everything that is replaced with nothing into one call to replaceAll, and everything that is replaced with a space into another, like so (also using a regex to find the hashtags and usernames, as this seems easier):
text = text.replaceAll("@\\w+|#\\w+|RT", "");
text = text.replaceAll("\n| +", " ");
text = text.replaceAll("[^a-zA-Z0-9 ]+", "").trim();

How to remove special characters in String

I want to remove "[", "]", "," from my string
for example,
[569.24, 569.24, 568.10, 566.00, 566.01, 566.00, 567.98, 565.14]
to
569.24 569.24 568.10 566.00 566.01 566.00 567.98 565.14
However, I can remove "," but not "[" and "]".
My code is as follows.
String content = price_result.toString();
//remove special characters
String content_modified = content.replaceAll("[ \t\"',;]+", " ");
System.out.println(content_modified);
The above results in [569.24, 569.24, 568.10, 566.00, 566.01, 566.00, 567.98, 565.14].
How can I remove "[" and "]"?

just use this
String content = price_result.toString();
//remove special characters
String content_modified = content.replace("[","").replace("]","").replace(",","");
System.out.println(content_modified);

You can try the following:
// Characters you want to remove
String unwanted = "[],";
// It will be used frequently? Use a constant.
Pattern pattern = Pattern.compile("[" + Pattern.quote(unwanted) + "]");
String content = price_result.toString();
String content_modified = pattern.matcher(content).replaceAll("");
System.out.println(content_modified);

Put them in a character class [] and escape them with \
String content_modified = content.replaceAll("[\\[\t\"',;\\]]+", " ");
Or list them one by one, separated by pipes (add other characters yourself :)
String content_modified = content.replaceAll("\\[|\\]|,|;", " ");
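For the exact input in the question, a minimal sketch could be the following (BracketStripper and strip are made-up names). It deletes the brackets and commas and leaves the existing spaces alone, which produces the space-separated list the question asks for.

```java
class BracketStripper {
    // Deletes '[', ']' and ',' and keeps everything else, including spaces.
    static String strip(String s) {
        return s.replaceAll("[\\[\\],]", "");
    }

    public static void main(String[] args) {
        System.out.println(strip("[569.24, 569.24, 568.10]")); // 569.24 569.24 568.10
    }
}
```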

JAVA - Ignore part of strings containing "#"

I'm having some difficulties in excluding part of strings after the "#" symbol.
I explain myself better:
This is a sample input text a user could insert in a textbox:
Some Text
Some Text again #A comment
#A comment line
Another Text
Another Text again#Comment
I need to read this text and ignore all text after "#" symbol.
This should be the expected output:
Some Text;Some Text again;Another Text;Another Text again
As for now here's the code:
This replaces all newlines with ";"
readText = userInputTextArea.getText();
readTextAllInALine = readText.replaceAll("\\n", ";");
so the output after this is:
Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment
This code is to ignore all characters after the first "#" but works fine just for the first line if we read it all sequentially.
int startIndex = inputCommandText.indexOf("#");
int endIndex = inputCommandText.indexOf(";");
String toBeReplaced = inputCommandText.substring(startIndex, endIndex);
readTextAllInALine.replace(toBeReplaced, "");
I'm stuck in finding a way for having the expected output. I was thinking of using a StringTokenizer, processing every line, removing text after "#" or ignoring the whole line if it starts with "#", and then printing all tokens (i.e. all lines) separating them with ";" but I cannot make it work.
Any help will be appreciated.
Thank you very much in advance.
Regards.

Just call this replace command on your pure string, retrieved from the text input. The regex #[^;]* grabs everything, starting at the hash until it reads a semicolon. Afterwards it replaces it with an empty string.
public static void main(String[] args) {
String text = "Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment";
System.out.println(text);
text = text.replaceAll("#[^;]*", "");
System.out.println(text);
}

A regex is useful here but it's tricky because your pattern is moderately complex. The comments are end line so they can appear in more than one arrangement.
I came up with the following which is a two-pass:
replaceAll(" *(#.*(?=\\n|$))", "").replaceAll("\\n+", ";");
The two-pass circumvents the fact that sometimes you get a duplicate line break. The first expression replaces comments but not new line characters and the second expression replaces multiple new line characters with a single semicolon.
The individual parts of the expression in the first pass are the following:
" *"
This includes zero or more leading spaces in the comment match, i.e. in "...again #A..." we want to remove that space between n and #.
"(#.* )"
The start of the comment match: matches a # followed by zero or more characters. (Typically the . matches any character except a new line.)
"(?= )"
This is a positive lookahead and where the regex starts to get tricky. It looks for whatever is inside this expression but doesn't include it in the text that's matched. It asserts that the #.* is followed by a certain string but doesn't replace that certain string.
"\\n|$"
The lookahead finds a new line or the end anchor. This will find a comment ended with a new line character or a comment that is at the end of the String. But again, since it's inside the lookahead, the new line doesn't get replaced.
So given the input:
String text = (
"Some Text" + '\n' +
"Some Text again #A comment" + '\n' +
"#A comment line" + '\n' +
"Another Text" + '\n' +
"Another Text again#Comment"
);
System.out.println(
text.replaceAll(" *(#.*(?=\\n|$))", "").replaceAll("\\n+", ";")
);
The output is:
Some Text;Some Text again;Another Text;Another Text again

readText = userInputTextArea.getText();
readText = readText.replaceAll("\\s*#[^\n]*", "");
readText = readText.replaceAll("\n+", ";");
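If you prefer the line-by-line idea from the question over a regex, something along these lines should work. It is a sketch under the assumption that lines are separated by \n; CommentStripper and stripComments are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class CommentStripper {
    // Cuts each line at the first '#', drops lines that end up empty,
    // and joins the remaining lines with ';'.
    static String stripComments(String text) {
        List<String> kept = new ArrayList<>();
        for (String line : text.split("\n")) {
            int hash = line.indexOf('#');
            String content = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!content.isEmpty()) {
                kept.add(content);
            }
        }
        return String.join(";", kept);
    }

    public static void main(String[] args) {
        String text = "Some Text\nSome Text again #A comment\n#A comment line\n"
                + "Another Text\nAnother Text again#Comment";
        System.out.println(stripComments(text));
    }
}
```

Because each line is trimmed after the cut, the space before an inline comment disappears as well.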

Just to make it clear, Coxer's reply is the way to go. Far more precise and clean. But in any case, if you fancy experimenting here is a recursive solution that will work:
public class IgnoreHash {
@Test
public void test() {
String readTextAllInALine = "Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment;";
String actualResult = removeHashComments(readTextAllInALine);
Assert.assertEquals(actualResult, "Some Text;Some Text again ;Another Text;Another Text again");
}
private String removeHashComments(String input) {
StringBuffer result = new StringBuffer();
int hashIndex = input.indexOf("#");
int endIndex = input.indexOf(";");
if(hashIndex != -1){
result.append(input.substring(0, hashIndex));
//first line
if(hashIndex < endIndex ) {
result.append(removeHashComments(input.substring(endIndex)));
} // the case of ;#
else if (endIndex == hashIndex-1) {
int endIndex2 = input.indexOf(";", hashIndex+1);
result.append(removeHashComments(input.substring(endIndex2+1)));
}
else {
result.append(removeHashComments(input.substring(hashIndex)));
}
}
return result.toString();
}
}

Java how to replace 2 or more spaces with single space in string and delete leading and trailing spaces

Looking for quick, simple way in Java to change this string
" hello there "
to something that looks like this
"hello there"
where I replace all those multiple spaces with a single space, except I also want the one or more spaces at the beginning of string to be gone.
Something like this gets me partly there
String mytext = " hello there ";
mytext = mytext.replaceAll("( )+", " ");
but not quite.

Try this:
String after = before.trim().replaceAll(" +", " ");
See also
String.trim()
Returns a copy of the string, with leading and trailing whitespace omitted.
regular-expressions.info/Repetition
No trim() regex
It's also possible to do this with just one replaceAll, but this is much less readable than the trim() solution. Nonetheless, it's provided here just to show what regex can do:
String[] tests = {
" x ", // [x]
" 1 2 3 ", // [1 2 3]
"", // []
" ", // []
};
for (String test : tests) {
System.out.format("[%s]%n",
test.replaceAll("^ +| +$|( )+", "$1")
);
}
There are 3 alternates:
^_+ : any sequence of spaces at the beginning of the string
Match and replace with $1, which captures the empty string
_+$ : any sequence of spaces at the end of the string
Match and replace with $1, which captures the empty string
(_)+ : any sequence of spaces that matches none of the above, meaning it's in the middle
Match and replace with $1, which captures a single space
See also
regular-expressions.info/Anchors

You just need a:
replaceAll("\\s{2,}", " ").trim();
where you match two or more whitespace characters and replace them with a single space, and then trim whitespace at the beginning and end (you could also trim first and match afterwards, which gives the regex less to scan, as someone pointed out).
To test this out quickly try:
System.out.println(new String(" hello there ").trim().replaceAll("\\s{2,}", " "));
and it will return:
"hello there"

Use the Apache Commons Lang StringUtils.normalizeSpace(String str) method.

This worked perfectly for me : sValue = sValue.trim().replaceAll("\\s+", " ");

The trim() method removes the leading and trailing spaces, and replaceAll("regex", "string to replace") with the regex "\\s+" matches one or more whitespace characters and replaces each run with a single space:
myText = myText.trim().replaceAll("\\s+"," ");

The following code will compact any whitespace between words and remove any at the string's beginning and end
String input = "\n\n\n a string with many spaces, \n"+
" a \t tab and a newline\n\n";
String output = input.trim().replaceAll("\\s+", " ");
System.out.println(output);
This will output: a string with many spaces, a tab and a newline
Note that all whitespace characters, including spaces, tabs and newlines, will be compacted or removed.
For more information see the respective documentation:
String#trim() method
String#replaceAll(String regex, String replacement) method
For information about Java's regular expression implementation see the documentation of the Pattern class
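When this runs often, the pattern can be compiled once instead of on every replaceAll call. A minimal sketch, with SpaceNormalizer and normalize as illustrative names:

```java
import java.util.regex.Pattern;

class SpaceNormalizer {
    // Compiled once; String.replaceAll would recompile it on every call.
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");

    // Trims the ends and collapses every run of whitespace to one space.
    static String normalize(String s) {
        return WHITESPACE_RUN.matcher(s.trim()).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println("[" + normalize("  hello     there   ") + "]"); // [hello there]
    }
}
```

Trimming first means the regex never sees the leading and trailing whitespace, so there is slightly less for it to scan.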

"[ ]{2,}"
This will match more than one space.
String mytext = " hello there ";
//without trim -> " hello there"
//with trim -> "hello there"
mytext = mytext.trim().replaceAll("[ ]{2,}", " ");
System.out.println(mytext);
OUTPUT:
hello there

To eliminate spaces at the beginning and at the end of the String, use String#trim() method. And then use your mytext.replaceAll("( )+", " ").

You can first use String.trim(), and then apply the regex replace command on the result.

Try this one.
Sample Code
String str = " hello there ";
System.out.println(str.replaceAll("( +)"," ").trim());
OUTPUT
hello there
First it replaces every run of spaces with a single space. A run at the start or end of the String is also reduced to a single space, so we then trim the String to remove it. Then you get your desired String.

String blogName = "how to do in java . com";
String nameWithProperSpacing = blogName.replaceAll("\\s+", " ");

trim()
Removes only the leading & trailing spaces.
From Java Doc,
"Returns a string whose value is this string, with any leading and trailing whitespace removed."
System.out.println(" D ev Dum my ".trim());
"D ev Dum my"
replace(), replaceAll()
Removes all the spaces in the string:
System.out.println(" D ev Dum my ".replace(" ",""));
System.out.println(" D ev Dum my ".replaceAll(" ",""));
System.out.println(" D ev Dum my ".replaceAll("\\s+",""));
Output:
"DevDummy"
"DevDummy"
"DevDummy"
Note: \s+ is the regular expression that matches one or more whitespace characters (written "\\s+" in Java source).
Reference : https://www.codedjava.com/2018/06/replace-all-spaces-in-string-trim.html

In Kotlin it would look like this
val input = "\n\n\n a string with many spaces, \n"
val cleanedInput = input.trim().replace(Regex("(\\s)+"), " ")

A lot of correct answers have been provided so far and I see a lot of upvotes. The approaches mentioned will work, but they are not really optimized or not really readable.
I recently came across a solution that every developer will like.
String nameWithProperSpacing = StringUtils.normalizeSpace( stringWithLotOfSpaces );
You are done.
This is a readable solution.

You could use lookarounds also.
test.replaceAll("^ +| +$|(?<= ) ", "");
OR
test.replaceAll("^ +| +$| (?= )", "")
<space>(?= ) matches a space character which is followed by another space character. So in a run of consecutive spaces, it matches all the spaces except the last, because the last one isn't followed by a space character. This leaves you with a single space for each run of consecutive spaces after the removal operation.
Example:
String[] tests = {
" x ", // [x]
" 1 2 3 ", // [1 2 3]
"", // []
" ", // []
};
for (String test : tests) {
System.out.format("[%s]%n",
test.replaceAll("^ +| +$| (?= )", "")
);
}

See String.replaceAll.
Use the regex "\\s+" and replace with " ".
Then use String.trim.

String str = " hello world";
reduce spaces first
str = str.trim().replaceAll(" +", " ");
capitalize the first letter and lowercase everything else
str = str.substring(0,1).toUpperCase() +str.substring(1,str.length()).toLowerCase();

you should do it like this
String mytext = " hello there ";
mytext = mytext.replaceAll("( +)", " ");
put + inside round brackets.

String str = " this is string ";
str = str.replaceAll("\\s+", " ").trim();

This worked for me
scan= filter(scan, " [\\s]+", " ");
scan = scan.trim();
where filter is following function and scan is the input string:
public String filter(String scan, String regex, String replace) {
StringBuffer sb = new StringBuffer();
Pattern pt = Pattern.compile(regex);
Matcher m = pt.matcher(scan);
while (m.find()) {
m.appendReplacement(sb, replace);
}
m.appendTail(sb);
return sb.toString();
}

A simple method for collapsing repeated spaces and trimming the ends of the string.
public String removeWhiteSpaces(String returnString){
returnString = returnString.trim().replaceAll("^ +| +$|( )+", " ");
return returnString;
}

check this...
public static void main(String[] args) {
String s = "A B C D E F G\tH I\rJ\nK\tL";
System.out.println("Current : "+s);
System.out.println("Single Space : "+singleSpace(s));
System.out.println("Space count : "+spaceCount(s));
System.out.format("Replace all = %s", s.replaceAll("\\s+", ""));
// Example where it uses the most.
String s1 = "My name is yashwanth . M";
String s2 = "My nameis yashwanth.M";
System.out.println("Normal : "+s1.equals(s2));
System.out.println("Replace : "+s1.replaceAll("\\s+", "").equals(s2.replaceAll("\\s+", "")));
}
If the String contains only single spaces, singleSpace() is meant to leave them alone.
Runs of more than one space, as well as tabs, carriage returns and newlines, are removed.
public static String singleSpace(String str){
return str.replaceAll(" +| +|\t|\r|\n","");
}
To count the number of spaces in a String.
public static String spaceCount(String str){
int i = 0;
while(str.indexOf(" ") > -1){
//str = str.replaceFirst(" ", ""+(i++));
str = str.replaceFirst(Pattern.quote(" "), ""+(i++));
}
return str;
}
Pattern.quote("?") returns literal pattern String.

My method before I found the second answer using regex as a better solution. Maybe someone needs this code.
private String replaceMultipleSpacesFromString(String s){
if(s.length() == 0 ) return "";
int timesSpace = 0;
String res = "";
for (int i = 0; i < s.length(); i++) {
char c = s.charAt(i);
if(c == ' '){
timesSpace++;
if(timesSpace < 2)
res += c;
}else{
res += c;
timesSpace = 0;
}
}
return res.trim();
}
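The same single-pass idea can be written with a StringBuilder, which avoids creating a new String on every += in the loop. This is only a sketch with invented names (SpaceCollapser, collapse), and unlike the version above it strips only the space character at the ends, not everything trim() would remove.

```java
class SpaceCollapser {
    // Collapses runs of spaces to one space and drops leading and trailing spaces.
    static String collapse(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        // Starting as if a space was just seen drops any leading spaces.
        boolean previousWasSpace = true;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == ' ') {
                if (!previousWasSpace) {
                    sb.append(c);
                }
                previousWasSpace = true;
            } else {
                sb.append(c);
                previousWasSpace = false;
            }
        }
        // At most one trailing space can be left at this point.
        int len = sb.length();
        if (len > 0 && sb.charAt(len - 1) == ' ') {
            sb.setLength(len - 1);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("  hello   there  ") + "]"); // [hello there]
    }
}
```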

Stream version, filters spaces and tabs.
Stream.of(str.split("[ \\t]")).filter(s -> s.length() > 0).collect(Collectors.joining(" "))

I know replaceAll method is much easier but I wanted to post this as well.
public static String removeExtraSpace(String input) {
input= input.trim();
ArrayList <String> x= new ArrayList<>(Arrays.asList(input.split("")));
for(int i=0; i<x.size()-1;i++) {
if(x.get(i).equals(" ") && x.get(i+1).equals(" ")) {
x.remove(i);
i--;
}
}
String word="";
for(String each: x)
word+=each;
return word;
}

String myText = " Hello World ";
myText = myText.trim().replaceAll(" +(?= )", "");
// Output: "Hello World"

string = string.replaceAll("\\s+", " ");

If you already use Guava (v. 19+) in your project you may want to use this:
CharMatcher.whitespace().trimAndCollapseFrom(input, ' ');
or, if you need to remove exactly the SPACE character (U+0020), use:
CharMatcher.anyOf(" ").trimAndCollapseFrom(input, ' ');

public class RemoveExtraSpacesEfficient {
public static void main(String[] args) {
String s = "my name is mr space ";
char[] charArray = s.toCharArray();
char prev = s.charAt(0);
for (int i = 0; i < charArray.length; i++) {
char cur = charArray[i];
if (cur == ' ' && prev == ' ') {
} else {
System.out.print(cur);
}
prev = cur;
}
}
}
The solution above runs in O(n) time without using regular expressions.

Please use the code below
package com.myjava.string;
import java.util.StringTokenizer;
public class MyStrRemoveMultSpaces {
public static void main(String a[]){
String str = "String With Multiple Spaces";
StringTokenizer st = new StringTokenizer(str, " ");
StringBuffer sb = new StringBuffer();
while(st.hasMoreElements()){
sb.append(st.nextElement()).append(" ");
}
System.out.println(sb.toString().trim());
}
}
