I know both NetBeans and Eclipse has options where if you paste multi-line, un-escaped string into a string variable, it will automatically add escape characters and add line breaks in. Is there a way to reverse the process?
For example:
function ShowHideOptions(trigger, element) {
if( trigger ) {
document.getElementById( element ).style.display = "";
} else {
document.getElementById( element ).style.display = "none";
}
}
if pasted in to string becomes:
private static final String LABEL_JAVASCRIPT = "function ShowHideOptions(trigger, element) {\n"
+ " if( trigger ) {\n"
+ " document.getElementById( element ).style.display = \"\";\n"
+ " } else {\n"
+ " document.getElementById( element ).style.display = \"none\";\n"
+ " }\n"
+ "}";
I want reverse this process.
I believe your question warrants another question. Why?
If you reverse this, the quotes would not be escaped and therefore you would get errors. Example:
System.out.println("System.out.println("Test");");
^
Error, everything after this quote is
considered code
Notice the quotes aren't escaped. This code would generate an error where I marked it because the quotes apparently mean the string should end.
Also, if the newlines are reversed, this example:
System.out.println("test");
System.out.println("test2");
Would become:
System.out.println("test");System.out.println("test2");
The following code works fine. Please clarify the problem.
System.out.println(LABEL_JAVASCRIPT);
Related
I have a big text files and I want to remove everything that is between
double curly brackets.
So given the text below:
String text = "This is {{\n" +
"{{the multiline\n" +
"text}} file }}\n" +
"what I\n" +
"{{ to {{be\n" +
"changed}}\n" +
"}} want.";
String cleanedText = Pattern.compile("(?<=\\{\\{).*?\\}\\}", Pattern.DOTALL).matcher(text).replaceAll("");
System.out.println(cleanedText);
I want the output to be:
This is what I want.
I have googled around and tried many different things but I couldn't find anything close to my case and as soon as I change it a little bit everything gets worse.
Thanks in advance
You can use this :
public static void main(String[] args) {
String text = "This is {{\n" +
"{{the multiline\n" +
"text}} file }}\n" +
"what I\n" +
"{{ to {{be\n" +
"changed}}\n" +
"}} want.";
String cleanedText = text.replaceAll("\\n", "");
while (cleanedText.contains("{{") && cleanedText.contains("}}")) {
cleanedText = cleanedText.replaceAll("\\{\\{[a-zA-Z\\s]*\\}\\}", "");
}
System.out.println(cleanedText);
}
A regular expression cannot express arbitrarily nested structures; i.e. any syntax that requires a recursive grammar to describe.
If you want to solve this using Java Pattern, you need to do it by repeated pattern matching. Here is one solution:
String res = input;
while (true) {
String tmp = res.replaceAll("\\{\\{[^}]*\\}\\}", "");
if (tmp.equals(res)) {
break;
}
res = tmp;
}
This is not very efficient ...
That can be transformed into an equivalent, but more concise form:
String res = input;
String tmp;
while (!(tmp = res.replaceAll("\\{\\{[^}]*\\}\\}", "")).equals(res)) {
res = tmp;
}
... but I prefer the first version because it is (IMO) a lot more readable.
I am not an expert in regular expression, so I just write a loop which does this for you. If you don't have/want to use a regEx, then it could be helpful for you;)
public static void main(String args[]) {
String text = "This is {{\n" +
"{{the multiline\n" +
"text}} file }}\n" +
"what I\n" +
"{{ to {{be\n" +
"changed}}\n" +
"}} want.";
int openBrackets = 0;
String output = "";
char[] input = text.toCharArray();
for(int i=0;i<input.length;i++){
if(input[i] == '{'){
openBrackets++;
continue;
}
if(input[i] == '}'){
openBrackets--;
continue;
}
if(openBrackets==0){
output += input[i];
}
}
System.out.println(output);
}
My suggestion is to remove anything between curly brackets, starting at the innermost pair:
String text = "This is {{\n" +
"{{the multiline\n" +
"text}} file }}\n" +
"what I\n" +
"{{ to {{be\n" +
"changed}}\n" +
"}} want.";
Pattern p = Pattern.compile("\\{\\{[^{}]+?}}", Pattern.MULTILINE);
while (p.matcher(text).find()) {
text = p.matcher(text).replaceAll("");
}
resulting in the output
This is
what I
want.
This might fail when having single curly brackets or unpaired pair of brackets, but could be good enough for your case.
I want to initialize a String in Java, but that string needs to include quotes; for example: "ROM". I tried doing:
String value = " "ROM" ";
but that doesn't work. How can I include "s within a string?
In Java, you can escape quotes with \:
String value = " \"ROM\" ";
In reference to your comment after Ian Henry's answer, I'm not quite 100% sure I understand what you are asking.
If it is about getting double quote marks added into a string, you can concatenate the double quotes into your string, for example:
String theFirst = "Java Programming";
String ROM = "\"" + theFirst + "\"";
Or, if you want to do it with one String variable, it would be:
String ROM = "Java Programming";
ROM = "\"" + ROM + "\"";
Of course, this actually replaces the original ROM, since Java Strings are immutable.
If you are wanting to do something like turn the variable name into a String, you can't do that in Java, AFAIK.
Not sure what language you're using (you didn't specify), but you should be able to "escape" the quotation mark character with a backslash: "\"ROM\""
\ = \\
" = \"
new line = \r\n OR \n\r OR \n (depends on OS) bun usualy \n enough.
taabulator = \t
Just escape the quotes:
String value = "\"ROM\"";
In Java, you can use char value with ":
char quotes ='"';
String strVar=quotes+"ROM"+quotes;
Here is full java example:-
public class QuoteInJava {
public static void main (String args[])
{
System.out.println ("If you need to 'quote' in Java");
System.out.println ("you can use single \' or double \" quote");
}
}
Here is Out PUT:-
If you need to 'quote' in Java
you can use single ' or double " quote
Look into this one ... call from anywhere you want.
public String setdoubleQuote(String myText) {
String quoteText = "";
if (!myText.isEmpty()) {
quoteText = "\"" + myText + "\"";
}
return quoteText;
}
apply double quotes to non empty dynamic string. Hope this is helpful.
This tiny java method will help you produce standard CSV text of a specific column.
public static String getStandardizedCsv(String columnText){
//contains line feed ?
boolean containsLineFeed = false;
if(columnText.contains("\n")){
containsLineFeed = true;
}
boolean containsCommas = false;
if(columnText.contains(",")){
containsCommas = true;
}
boolean containsDoubleQuotes = false;
if(columnText.contains("\"")){
containsDoubleQuotes = true;
}
columnText.replaceAll("\"", "\"\"");
if(containsLineFeed || containsCommas || containsDoubleQuotes){
columnText = "\"" + columnText + "\"";
}
return columnText;
}
suppose ROM is string variable which equals "strval"
you can simply do
String value= " \" "+ROM+" \" ";
it will be stored as
value= " "strval" ";
I am making an application where I will be fetching tweets and storing them in a database. I will have a column for the complete text of the tweet and another where only the words of the tweet will remain (I need the words to calculate which words were most used later).
How I currently do it is by using 6 different .replaceAll() functions which some of them might be triggered twice. For example I will have a for loop to remove every "hashtag" using replaceAll().
The problem is that I will be editing as many as thousands of tweets that I fetch every few minutes and I think that the way I am doing it will not be too efficient.
What my requirements are in this order (also written in comments down bellow):
Delete all usernames mentioned
Delete all RT (retweets flags)
Delete all hashtags mentioned
Replace all break lines with spaces
Replace all double spaces with single spaces
Delete all special characters except spaces
Here is a Short and Compilable Example:
public class StringTest {
public static void main(String args[]) {
String text = "RT #AshStewart09: Vote for Lady Gaga for \"Best Fans\""
+ " at iHeart Awards\n"
+ "\n"
+ "RT!!\n"
+ "\n"
+ "My vote for #FanArmy goes to #LittleMonsters #iHeartAwards"
+ " htt…";
String[] hashtags = {"#FanArmy", "#LittleMonsters", "#iHeartAwards"};
System.out.println("Before: " + text + "\n");
// Delete all usernames mentioned (may run multiple times)
text = text.replaceAll("#AshStewart09", "");
System.out.println("First Phase: " + text + "\n");
// Delete all RT (retweets flags)
text = text.replaceAll("RT", "");
System.out.println("Second Phase: " + text + "\n");
// Delete all hashtags mentioned
for (String hashtag : hashtags) {
text = text.replaceAll(hashtag, "");
}
System.out.println("Third Phase: " + text + "\n");
// Replace all break lines with spaces
text = text.replaceAll("\n", " ");
System.out.println("Fourth Phase: " + text + "\n");
// Replace all double spaces with single spaces
text = text.replaceAll(" +", " ");
System.out.println("Fifth Phase: " + text + "\n");
// Delete all special characters except spaces
text = text.replaceAll("[^a-zA-Z0-9 ]+", "").trim();
System.out.println("Finaly: " + text);
}
}
Relying on replaceAll is probably the biggest performance killer as it compiles the regex again and again. The use of regexes for everything is probably the second most significant problem.
Assuming all usernames start with #, I'd replace
// Delete all usernames mentioned (may run multiple times)
text = text.replaceAll("#AshStewart09", "");
by a loop copying everything until it founds a #, then checking if the following chars match any of the listed usernames and possibly skipping them. For this lookup you could use a trie. A simpler method would be a replaceAll-like loop for the regex #\w+ together with a HashMap lookup.
// Delete all RT (retweets flags)
text = text.replaceAll("RT", "");
Here,
private static final Pattern RT_PATTERN = Pattern.compile("RT");
is a sure win. All the following parts could be handled similarly. Instead of
// Delete all special characters except spaces
text = text.replaceAll("[^a-zA-Z0-9 ]+", "").trim();
you could use Guava's CharMatcher. The method removeFrom does exactly what you did, but collapseFrom or trimAndCollapseFrom might be better.
According to the now closed question, it all boils down to
tweet = tweet.replaceAll("#\\w+|#\\w+|\\bRT\\b", "")
.replaceAll("\n", " ")
.replaceAll("[^\\p{L}\\p{N} ]+", " ")
.replaceAll(" +", " ")
.trim();
The second line seems to be redundant as the third one does remove \n too. Changing the first line's replacement to " " doesn't change the outcome an allows to aggregate the replacements.
tweet = tweet.replaceAll("#\\w*|#\\w*|\\bRT\\b|[^##\\p{L}\\p{N} ]+", " ")
.replaceAll(" +", " ")
.trim();
I've changed the usernames and hashtags part to eating also lone # or #, so that it doesn't need to be consumed by the special chars part. This is necessary for corrent processing of strings like !#AshStewart09.
For maximum performance, you surely need a precompiled pattern. I'd also re-suggest to use Guava's CharMatcher for the second part. Guava is huge (2 MB I guess), but you surely find more useful things there. So in the end you can get
private static final Pattern PATTERN =
Pattern.compile("#\\w*|#\\w*|\\bRT\\b|[^##\\p{L}\\p{N} ]+");
private static final CharMatcher CHAR_MATCHER = CharMacher.is(" ");
tweet = PATTERN.matcher(tweet).replaceAll(" ");
tweet = CHAR_MATCHER.trimAndCollapseFrom(tweet, " ");
You can inline all of the things that are being replaced with nothing into one call to replace all and everything that is replaced with a space into one call like so (also using a regex to find the hashtags and usernames as this seems easier):
text = text.replaceAll("#\w+|#\w+|RT", "");
text = text.replaceAll("\n| +", " ");
text = text.replaceAll("[^a-zA-Z0-9 ]+", "").trim();
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Query about the trim() method in Java
I am parsing a site's usernames and other information, and each one has a bunch of spaces after it (but spaces in between the words).
For example: "Bob the Builder " or "Sam the welder ". The numbers of spaces vary from name to name. I figured I'd just use .trim(), since I've used this before.
However, it's giving me trouble. My code looks like this:
for (int i = 0; i < splitSource3.size(); i++) {
splitSource3.set(i, splitSource3.get(i).trim());
}
The result is just the same; no spaces are removed at the end.
Thank you in advance for your excellent answers!
UPDATE:
The full code is a bit more complicated, since there are HTML tags that are parsed out first. It goes exactly like this:
for (String s : splitSource2) {
if (s.length() > "<td class=\"dddefault\">".length() && s.substring(0, "<td class=\"dddefault\">".length()).equals("<td class=\"dddefault\">")) {
splitSource3.add(s.substring("<td class=\"dddefault\">".length()));
}
}
System.out.println("\n");
for (int i = 0; i < splitSource3.size(); i++) {
splitSource3.set(i, splitSource3.get(i).substring(0, splitSource3.get(i).length() - 5));
splitSource3.set(i, splitSource3.get(i).trim());
System.out.println(i + ": " + splitSource3.get(i));
}
}
UPDATE:
Calm down. I never said the fault lay with Java, and I never said it was a bug or broken or anything. I simply said I was having trouble with it and posted my code for you to collaborate on and help solve my issue. Note the phrase "my issue" and not "java's issue". I have actually had the code printing out
System.out.println(i + ": " + splitSource3.get(i) + "*");
in a for each loop afterward.
This is how I knew I had a problem.
By the way, the problem has still not been fixed.
UPDATE:
Sample output (minus single quotes):
'0: Olin D. Kirkland '
'1: Sophomore '
'2: Someplace, Virginia 12345<br />VA SomeCity<br />'
'3: Undergraduate '
EDIT the OP rephrased his question at Query about the trim() method in Java, where the issue was found to be Unicode whitespace characters which are not matched by String.trim().
It just occurred to me that I used to have this sort of issue when I worked on a screen-scraping project. The key is that sometimes the downloaded HTML sources contain non-printable characters which are non-whitespace characters too. These are very difficult to copy-paste to a browser. I assume that this could happened to you.
If my assumption is correct then you've got two choices:
Use a binary reader and figure out what those characters are - and delete them with String.replace(); E.g.:
private static void cutCharacters(String fromHtml) {
String result = fromHtml;
char[] problematicCharacters = {'\000', '\001', '\003'}; //this could be a private static final constant too
for (char ch : problematicCharacters) {
result = result.replace(ch, ""); //I know, it's dirty to modify an input parameter. But it will do as an example
}
return result;
}
If you find some sort of reoccurring pattern in the HTML to be parsed then you can use regexes and substrings to cut the unwanted parts. E.g.:
private String getImportantParts(String fromHtml) {
Pattern p = Pattern.compile("(\\w*\\s*)"); //this could be a private static final constant as well.
Matcher m = p.matcher(fromHtml);
StringBuilder buff = new StringBuilder();
while (m.find()) {
buff.append(m.group(1));
}
return buff.toString().trim();
}
Works without a problem for me.
Here your code a bit refactored and (maybe) better readable:
final String openingTag = "<td class=\"dddefault\">";
final String closingTag = "</td>";
List<String> splitSource2 = new ArrayList<String>();
splitSource2.add(openingTag + "Bob the Builder " + closingTag);
splitSource2.add(openingTag + "Sam the welder " + closingTag);
for (String string : splitSource2) {
System.out.println("|" + string + "|");
}
List<String> splitSource3 = new ArrayList<String>();
for (String s : splitSource2) {
if (s.length() > openingTag.length() && s.startsWith(openingTag)) {
String nameWithoutOpeningTag = s.substring(openingTag.length());
splitSource3.add(nameWithoutOpeningTag);
}
}
System.out.println("\n");
for (int i = 0; i < splitSource3.size(); i++) {
String name = splitSource3.get(i);
int closingTagBegin = splitSource3.get(i).length() - closingTag.length();
String nameWithoutClosingTag = name.substring(0, closingTagBegin);
String nameTrimmed = nameWithoutClosingTag.trim();
splitSource3.set(i, nameTrimmed);
System.out.println("|" + splitSource3.get(i) + "|");
}
I know that's not a real answer, but i cannot post comments and this code as a comment wouldn't fit, so I made it an answer, so that Olin Kirkland can check his code.
I am trying to strip and replace a text string that looks as follows in the most elegant way possible:
element {"item"} {text {
} {$i/child::itemno}
To look like:
<item> {$i/child::itemno}
Hence removing the element text substituting its braces and removing text and its accompanying braces.
I believe the appropriate regex to do this is:
/element\s*\{"([^"]+)"\}\s*{text\s*{\s*}\s*({[^}]*})/
but I am unsure as to the number of backslashes to use in java and also how to complete the final substitution which makes use of my group(1) and replaces it with < at its start and > at its end:
So far I have this (although perhaps I might be better off with a full rewrite ?)
Pattern p = Pattern.compile("/element\\s*\\{\"([^\"]+)\"\\}\\s*{text\\s*{\\s*}\\s*({[^}]*})/ ");
// Split input with the pattern
Matcher m = p.matcher("element {\"item\"} {text {\n" +
" } {$i/child::itemno} text { \n" +
" } {$i/child::description} text {\n" +
" } element {\"high_bid\"} {{max($b/child::bid)}} text {\n" +
" }} ");
// Next for each instance of group 1, replace it with < > at the start
I think I've stumbled across a problem. What I am trying to do is somewhat harder than I previously stated. With the solution I have below:
element {"item"} {text { } {$i/child::itemno} text { } {$i/child::description} text { } element {"high_bid"} {{max($b/child::bid)}} text { }}
GIVES:
<item> {$i/child::itemno} text { } {$i/child::description} text { } element {"high_bid"} {{max($b/child::bid)}} text { }}
When I expected:
<item>{$i/child::itemno}{$i/child::description}<high_bid>{fn:max($b/child::bid)}</high_bid></item>
Java regex-es are written without delimiters. So lose the forward slashes;
every single backslash needs one extra, so \s becomes \\s;
all { need to be escaped: \\{, and } need no escape (although it doesn't hurt if you do escape them).
Try:
String text = "element {\"item\"} {text { } {$i/child::itemno}";
System.out.println(text.replaceAll("element\\s*\\{\"([^\"]+)\"}\\s*\\{text\\s*\\{\\s*}\\s*(\\{[^}]*})", "<$1> $2"));
Output:
<item> {$i/child::itemno}