I want to escape the spaces in a path string. I tried the code below, but it doesn't seem to work:
String path = "/Users/TD/San Diego";
path=path.replaceAll(" ","\\ ");
System.out.println(path);
The goal is to convert
"/Users/TD/San Diego" to "/Users/TD/San\ Diego"
Any further spaces in the string also need to be replaced with "\ ".
You could change
path = path.replaceAll(" ", "\\ ");
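to the following, which also escapes the backslash for replaceAll's replacement string (where a backslash is itself an escape character):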
path = path.replaceAll(" ", "\\\\ ");
When I do that, I get (the requested)
/Users/TD/San\ Diego
Another option is String.replace, which treats both arguments as literal text:
path = path.replace(" ", "\\ ");
which outputs the same.
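To make the difference between the two calls concrete, here is a minimal sketch. The class and method names (PathEscaper, escapeWithReplaceAll and so on) are made up for illustration; Matcher.quoteReplacement is the standard JDK helper that turns a literal string into a safe replaceAll replacement.

```java
import java.util.regex.Matcher;

class PathEscaper {
    // replaceAll parses its replacement, so a literal backslash must be
    // written as \\ there, which is "\\\\" in Java source.
    static String escapeWithReplaceAll(String path) {
        return path.replaceAll(" ", "\\\\ ");
    }

    // replace treats both arguments literally, so one escaped backslash is enough.
    static String escapeWithReplace(String path) {
        return path.replace(" ", "\\ ");
    }

    // quoteReplacement adds the extra escaping for replaceAll automatically.
    static String escapeWithQuoteReplacement(String path) {
        return path.replaceAll(" ", Matcher.quoteReplacement("\\ "));
    }

    public static void main(String[] args) {
        String path = "/Users/TD/San Diego";
        System.out.println(escapeWithReplaceAll(path));       // /Users/TD/San\ Diego
        System.out.println(escapeWithReplace(path));          // /Users/TD/San\ Diego
        System.out.println(escapeWithQuoteReplacement(path)); // /Users/TD/San\ Diego
    }
}
```

If no regex is needed, as here, String.replace is probably the simplest choice.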
The suggested solution did not work for me (in Android Java).
So this is what I came up with, after quite a few attempts:
path = path.replace(" ", (char) 92 + " ");
My app has a feature to filter content based on some keywords.
The filter is case-insensitive, so I first call String.toLowerCase() on the source content.
The issue arises when the source is in upper case and contains accented characters, as in the French word "INVITÉ".
When this word is lowercased using the device's default locale, it returns "invité".
The problem is that the last character is not the same as the lowercase character "é".
Instead it's the combination of 2 chars:
"e" (code point 101) and
the combining acute accent (code point 769)
Because of this, "invité" does not match "invité".
How can I solve this? I would prefer not to remove accented characters altogether.
You should normalize the string with java.text.Normalizer, like this:
String upper = "INVITÉ";
System.out.println(upper + " length=" + upper.length());
String lower = upper.toLowerCase();
System.out.println(lower + " length=" + lower.length());
String normalized = Normalizer.normalize(lower, Normalizer.Form.NFC);
System.out.println(normalized + " length=" + normalized.length());
output:
INVITÉ length=7
invité length=7
invité length=6
It also works for Japanese.
String japanese = "が";
System.out.println(japanese + " length=" + japanese.length());
String normalized = Normalizer.normalize(japanese, Normalizer.Form.NFC);
System.out.println(normalized + " length=" + normalized.length());
output:
が length=2
が length=1
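Applied to the keyword filter from the question, the idea could look like the sketch below. The class and method names are hypothetical, and Locale.ROOT is an assumption on my part (the question uses the device default locale). The combining accent is written as a \u0301 escape so the decomposed form is visible in the source.

```java
import java.text.Normalizer;
import java.util.Locale;

class AccentAwareFilter {
    // Lowercase first, then compose "e" + combining accent into a single "é" (NFC).
    static String canonical(String s) {
        String lower = s.toLowerCase(Locale.ROOT);
        return Normalizer.normalize(lower, Normalizer.Form.NFC);
    }

    // Compare both sides in the same canonical form.
    static boolean containsKeyword(String content, String keyword) {
        return canonical(content).contains(canonical(keyword));
    }

    public static void main(String[] args) {
        String decomposed = "INVITE\u0301";   // E followed by U+0301 (combining acute)
        String keyword = "invit\u00e9";       // precomposed é (U+00E9)
        System.out.println(canonical(decomposed).length());        // 6
        System.out.println(containsKeyword(decomposed, keyword));  // true
    }
}
```

Normalizing both the content and the keyword means it does not matter which form either one arrives in.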
I am trying to get my output to display double quotation marks around the abbreviations and around their expansions. However, we have not covered escape sequences in my current class, so I was wondering if there is another way to accomplish this. The workbook does not accept my attempt with an escape sequence.
I have tried an escape sequence and two single quotes ('' '') but neither worked. Perhaps I am missing something, as I am fairly new to the Java language. I am just trying to learn the most efficient way from a fundamental standpoint.
import java.util.Scanner;
public class TextMsgExpander {
    public static void main(String[] args) {
        Scanner scnr = new Scanner(System.in);
        String txtMsg;
        String BFF = "best friend forever";
        String IDK = "I don't know";
        String JK = "just kidding";
        String TMI = "too much information";
        String TTYL = "talk to you later";
        System.out.println("Enter text: ");
        txtMsg = scnr.nextLine();
        System.out.println("You entered: " + txtMsg);
        System.out.println();
        if (txtMsg.contains("BFF")) {
            txtMsg = txtMsg.replace("BFF", BFF);
            System.out.println("Replaced BFF with " + BFF);
        } // above line is where I tried escape sequence
        if (txtMsg.contains("IDK")) {
            txtMsg = txtMsg.replace("IDK", IDK);
            System.out.println("Replaced IDK with " + IDK);
        }
        if (txtMsg.contains("JK")) {
            txtMsg = txtMsg.replace("JK", JK);
            System.out.println("Replaced JK with " + JK);
        }
        System.out.println();
        System.out.println("Expanded: " + txtMsg);
        return;
    }
}
Your output
Enter text:
You entered: IDK how that happened. TTYL.
Replaced IDK with I don't know
Replaced TTYL with talk to you later
Expanded: I don't know how that happened. talk to you later.
Expected output
Enter text:
You entered: IDK how that happened. TTYL.
Replaced "IDK" with "I don't know".
Replaced "TTYL" with "talk to you later".
Expanded: I don't know how that happened. talk to you later.
Have you tried this:
\"example text\"
So you would have something like this:
System.out.println("Replaced \"BFF\" with " + "\"" + BFF + "\"");
or
System.out.println("Replaced \"BFF\" with \"" + BFF + "\"");
Normally it should work with escape characters.
Have you tried something like this:
System.out.println("\"The two backslashes are removed when I am printed\"");
I tested it and it worked for me.
If you cannot use \ escape sequences, for whatever reason, you can use the fact that an ' apostrophe doesn't need to be escaped in a "xx" string literal, and that a " double-quote doesn't need to be escaped in a 'x' character literal.
E.g. to print Replacing "foo" with 'bar' was easy, where foo and bar come from variables, you can do this:
String s = "Replacing " + '"' + foo + '"' + " with '" + bar + "' was easy";
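As a minimal sketch of that idea applied to the expander program, the helper below wraps a string in double quotes using only a character literal, with no backslash escapes. The names QuoteDemo, quoted and replacedMessage are hypothetical.

```java
class QuoteDemo {
    // A double quote needs no escaping inside a char literal.
    private static final char DQ = '"';

    // Wraps s in double quotes; char + String concatenates to a String.
    static String quoted(String s) {
        return DQ + s + DQ;
    }

    // Builds one "Replaced ..." line in the expected output format.
    static String replacedMessage(String abbreviation, String expansion) {
        return "Replaced " + quoted(abbreviation) + " with " + quoted(expansion) + ".";
    }

    public static void main(String[] args) {
        // prints: Replaced "IDK" with "I don't know".
        System.out.println(replacedMessage("IDK", "I don't know"));
    }
}
```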
I have a string from which I need to remove all of the punctuation characters listed below, as well as spaces. My code looks as follows:
String s = "s[film] fever(normal) curse;";
String[] spart = s.split("[,/?:;\\[\\]\"{}()\\-_+*=|<>!`~@#$%^&\\s+]");
System.out.println("spart[0]: " + spart[0]);
System.out.println("spart[1]: " + spart[1]);
System.out.println("spart[2]: " + spart[2]);
System.out.println("spart[3]: " + spart[3]);
But, I am getting some elements which are blank. The output is:
spart[0]: s
spart[1]: film
spart[2]:
spart[3]: normal
The hyphen (-) is a special character in regex character classes. For instance, [a-z] matches all chars from a to z inclusive. Note that you've got )-_ in your regex.
A hyphen defines a range inside a character class in the regular expression passed to String.split, so it needs to be escaped:
String[] part = line.toLowerCase().split("[,/?:;\"{}()\\-_+*=|<>!`~@#$%^&]");
To avoid the blank elements, move the + outside the character class, so that a run of consecutive delimiters counts as a single separator:
String[] spart = s.split("[,/?:;\\[\\]\"{}()\\-_+*=|<>!`~@#$%^&\\s]+");
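The effect of the quantifier's position can be checked with a small sketch. To keep it short it uses the predefined class \p{Punct} instead of the hand-written list; that is my substitution, and it is slightly broader (it also matches the apostrophe and the period). The class and method names are made up.

```java
import java.util.Arrays;

class PunctuationSplitter {
    // One delimiter per match: adjacent delimiters such as "] " leave empty strings.
    static String[] splitLoose(String s) {
        return s.split("[\\p{Punct}\\s]");
    }

    // Quantifier outside the class: a whole run of delimiters is one separator.
    static String[] splitWords(String s) {
        return s.split("[\\p{Punct}\\s]+");
    }

    public static void main(String[] args) {
        String s = "s[film] fever(normal) curse;";
        System.out.println(Arrays.toString(splitLoose(s))); // [s, film, , fever, normal, , curse]
        System.out.println(Arrays.toString(splitWords(s))); // [s, film, fever, normal, curse]
    }
}
```

Note that split drops trailing empty strings but not a leading one, so an input that starts with a delimiter would still yield an empty first element.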
I am making an application that fetches tweets and stores them in a database. I will have one column for the complete text of the tweet and another where only the words of the tweet remain (I need the words later, to calculate which ones were used most).
Currently I do this with 6 different .replaceAll() calls, some of which may run more than once. For example, I have a for loop that removes every hashtag using replaceAll().
The problem is that I will be processing thousands of tweets, fetched every few minutes, and I don't think my approach is very efficient.
My requirements, in this order (also written in the comments below):
Delete all usernames mentioned
Delete all RT (retweets flags)
Delete all hashtags mentioned
Replace all line breaks with spaces
Replace all double spaces with single spaces
Delete all special characters except spaces
Here is a short, compilable example:
public class StringTest {
    public static void main(String args[]) {
        String text = "RT @AshStewart09: Vote for Lady Gaga for \"Best Fans\""
                + " at iHeart Awards\n"
                + "\n"
                + "RT!!\n"
                + "\n"
                + "My vote for #FanArmy goes to #LittleMonsters #iHeartAwards"
                + " htt…";
        String[] hashtags = {"#FanArmy", "#LittleMonsters", "#iHeartAwards"};
        System.out.println("Before: " + text + "\n");
        // Delete all usernames mentioned (may run multiple times)
        text = text.replaceAll("@AshStewart09", "");
        System.out.println("First Phase: " + text + "\n");
        // Delete all RT (retweets flags)
        text = text.replaceAll("RT", "");
        System.out.println("Second Phase: " + text + "\n");
        // Delete all hashtags mentioned
        for (String hashtag : hashtags) {
            text = text.replaceAll(hashtag, "");
        }
        System.out.println("Third Phase: " + text + "\n");
        // Replace all line breaks with spaces
        text = text.replaceAll("\n", " ");
        System.out.println("Fourth Phase: " + text + "\n");
        // Replace all double spaces with single spaces
        text = text.replaceAll(" +", " ");
        System.out.println("Fifth Phase: " + text + "\n");
        // Delete all special characters except spaces
        text = text.replaceAll("[^a-zA-Z0-9 ]+", "").trim();
        System.out.println("Finally: " + text);
    }
}
Relying on replaceAll is probably the biggest performance killer as it compiles the regex again and again. The use of regexes for everything is probably the second most significant problem.
Assuming all usernames start with @, I'd replace
// Delete all usernames mentioned (may run multiple times)
text = text.replaceAll("@AshStewart09", "");
by a loop that copies everything until it finds an @, then checks whether the following chars match any of the listed usernames and, if so, skips them. For this lookup you could use a trie. A simpler method would be a replaceAll-like loop for the regex @\w+ together with a HashMap lookup.
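The simpler variant could be sketched as follows, assuming usernames start with @ as they do on Twitter. It uses a Set rather than a HashMap for the lookup, since only membership matters; the class and method names are hypothetical. Matcher.appendReplacement and appendTail are the JDK building blocks that replaceAll itself is based on.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MentionRemover {
    // Compiled once and reused for every tweet.
    private static final Pattern MENTION = Pattern.compile("@\\w+");

    // Removes only those mentions that appear in knownUsers; others are kept.
    static String removeKnownUsers(String text, Set<String> knownUsers) {
        Matcher m = MENTION.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String mention = m.group();
            String replacement = knownUsers.contains(mention)
                    ? ""
                    : Matcher.quoteReplacement(mention); // keep unknown mentions verbatim
            m.appendReplacement(out, replacement);
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Set<String> known = new HashSet<>(Arrays.asList("@AshStewart09"));
        // prints: RT : thanks @someoneElse
        System.out.println(removeKnownUsers("RT @AshStewart09: thanks @someoneElse", known));
    }
}
```

This makes a single pass over the text no matter how many usernames are in the set.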
// Delete all RT (retweets flags)
text = text.replaceAll("RT", "");
Here,
private static final Pattern RT_PATTERN = Pattern.compile("RT");
is a sure win. All the following parts could be handled similarly. Instead of
// Delete all special characters except spaces
text = text.replaceAll("[^a-zA-Z0-9 ]+", "").trim();
you could use Guava's CharMatcher. The method removeFrom does exactly what you did, but collapseFrom or trimAndCollapseFrom might be better.
According to the now closed question, it all boils down to
tweet = tweet.replaceAll("@\\w+|#\\w+|\\bRT\\b", "")
        .replaceAll("\n", " ")
        .replaceAll("[^\\p{L}\\p{N} ]+", " ")
        .replaceAll(" +", " ")
        .trim();
The second line seems to be redundant, as the third one removes \n too. Changing the first line's replacement to " " doesn't change the outcome and allows the replacements to be merged:
tweet = tweet.replaceAll("@\\w*|#\\w*|\\bRT\\b|[^@#\\p{L}\\p{N} ]+", " ")
        .replaceAll(" +", " ")
        .trim();
I've changed the usernames and hashtags part to also consume a lone @ or #, so that it doesn't need to be consumed by the special chars part. This is necessary for correct processing of strings like !@AshStewart09.
For maximum performance, you surely need a precompiled pattern. I'd also again suggest using Guava's CharMatcher for the second part. Guava is huge (2 MB I guess), but you will surely find more useful things there. So in the end you can get
private static final Pattern PATTERN =
        Pattern.compile("@\\w*|#\\w*|\\bRT\\b|[^@#\\p{L}\\p{N} ]+");
private static final CharMatcher CHAR_MATCHER = CharMatcher.is(' ');
tweet = PATTERN.matcher(tweet).replaceAll(" ");
tweet = CHAR_MATCHER.trimAndCollapseFrom(tweet, ' ');
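For readers who would rather not add Guava, here is a JDK-only sketch of the same two-step cleanup. It swaps CharMatcher for a second precompiled pattern, which is my substitution and not part of the answer above, and it assumes usernames start with @. The class name TweetCleaner is hypothetical.

```java
import java.util.regex.Pattern;

class TweetCleaner {
    // Mentions, hashtags, the RT flag and any run of other special characters.
    private static final Pattern NOISE =
            Pattern.compile("@\\w*|#\\w*|\\bRT\\b|[^@#\\p{L}\\p{N} ]+");
    // Runs of spaces left behind by the first step.
    private static final Pattern SPACES = Pattern.compile(" +");

    static String clean(String tweet) {
        String spaced = NOISE.matcher(tweet).replaceAll(" ");
        return SPACES.matcher(spaced).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String tweet = "RT @AshStewart09: Vote for \"Best Fans\"\n\n"
                + "My vote for #FanArmy goes to #LittleMonsters";
        // prints: Vote for Best Fans My vote for goes to
        System.out.println(clean(tweet));
    }
}
```

Both patterns are compiled once as static constants, so thousands of tweets can be processed without recompiling any regex.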
You can combine everything that is replaced with nothing into one call to replaceAll, and everything that is replaced with a space into another, like so (also using a regex to find the hashtags and usernames, as this seems easier):
text = text.replaceAll("@\\w+|#\\w+|RT", "");
text = text.replaceAll("\n| +", " ");
text = text.replaceAll("[^a-zA-Z0-9 ]+", "").trim();
My code doesn't properly do what I intend to achieve:
It produces one long string instead of a well-broken, separated string.
It does not handle the 'separator' appropriately (it produces , instead of ",").
It does not handle the 'optional' value either (it produces ' instead of " '").
Current result:
LOAD DATA INFILE 'max.csv'BADFILE 'max.bad'DISCARDFILE
'max.dis' APPEND INTO TABLEADDRESSfields terminated by,optionally enclosed
by'(ID,Name,sex)
The intended result should look like this
Is there a better way of doing this, or of improving the code above?
Yeah. Use the character \n to start a new line in the file, and escape " characters as \". Also, you'll want to add a space after each variable.
content = " LOAD DATA\nINFILE "+ fileName + " BADFILE "+ badName + " DISCARDFILE " +
discardName + "\n\nAPPEND\nINTO TABLE "+ table + "\n fields terminated by \"" + separator
+ "\" optionally enclosed by '" + optional + "'\n (" + column + ")";
This is assuming fileName, badName, and discardName include the quotes around the names.
Don't reinvent the wheel... the Apache commons-io library writes the file in one line:
FileUtils.write(new File(controlName), content);
Here's the javadoc for FileUtils.write(File, CharSequence):
Writes a CharSequence to a file creating the file if it does not exist
To insert a new line, use \n (or \r\n on Windows).
For example:
discardName + "\n" // new line here
+ "APPEND INTO TABLE"
For the double quote symbol, on the other hand, you need to type \" explicitly on each side of the comma:
"fields terminated by \"" + separator + "\""
which will produce ","
The optional variable needs to be handled in a similar way.
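Putting the pieces together, the control file text could be built as in the sketch below. The intended layout is not shown in the question, so the line breaks here are an assumption, as are the class name, the parameter names, and the choice to add the single quotes around the file names inside the helper.

```java
class ControlFileBuilder {
    // Joins the clauses with \n so that each one ends up on its own line.
    static String build(String dataFile, String badFile, String discardFile,
                        String table, String separator, String enclosure,
                        String columns) {
        return String.join("\n",
                "LOAD DATA",
                "INFILE '" + dataFile + "'",
                "BADFILE '" + badFile + "'",
                "DISCARDFILE '" + discardFile + "'",
                "",
                "APPEND",
                "INTO TABLE " + table,
                // \" inside a string literal produces a literal double quote
                "fields terminated by \"" + separator + "\"",
                "optionally enclosed by '" + enclosure + "'",
                "(" + columns + ")");
    }

    public static void main(String[] args) {
        System.out.println(build("max.csv", "max.bad", "max.dis",
                "ADDRESS", ",", "\"", "ID,Name,sex"));
    }
}
```

String.join keeps the separators out of the individual clauses, which makes a missing space or line break easier to spot than in one long concatenation.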