Remove comments between quotes in String - java

I need to remove comments from a String. Comments are specified using quotes. For example:
removeComments("1+4 'sum'").equals("1+4 ");
removeComments("this 'is not a comment").equals("this 'is not a comment");
removeComments("1+4 'sum' 'unclosed comment").equals("1+4 'unclosed comment");
I could iterate through the characters of the String, keeping track of the indexes of the quotes, but I would like to know if there is a simpler solution (maybe a regex?)

You can use replaceAll:
str = str.replaceAll("\\'.*?\\'", "");
This will replace the first and the second ' and everything between them with "" (Thus, will remove them).
Edit: As stated in the comments, there is no need to escape the single quote with a backslash.

If you don't need to be able to have quotes inside the comment, this will do it:
input.replaceAll("'[^']+'", "");
It matches a quote, at least one of anything that isn't a quote, then a quote.
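The regex answers above can be wrapped in the removeComments method the question asks for. The following is only a minimal sketch: the class name CommentStripper is illustrative, and it assumes a comment never contains a quote character. It uses * instead of + so that an empty pair '' is removed as well.

```java
class CommentStripper {
    // Removes every '...' pair; a lone, unmatched quote is left untouched.
    static String removeComments(String input) {
        return input.replaceAll("'[^']*'", "");
    }

    public static void main(String[] args) {
        System.out.println(removeComments("1+4 'sum'"));
        System.out.println(removeComments("this 'is not a comment"));
    }
}
```

Only the quoted text is removed. Whitespace around a comment stays, so "1+4 'sum' 'unclosed comment" keeps both spaces that surrounded 'sum'.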

Related

remove bracket, double quote, and possibly add space using string.replaceall in java

I have seen multiple threads about how to replace or remove single quotes ('), double quotes ("), brackets ([, {), commas, etc. I was able to remove them successfully, but I would like to understand more. For example, string.replaceAll("\\p{P}", ""); removes punctuation. I am confused by this syntax: how does "\\p{P}" match punctuation?
I have a string from which I would like to remove the brackets and double quotes, and possibly add spaces. As shown below, I would like to use replaceAll to change my string from category to updatedCategory.
String category = "[\"restaurant\",\"bar\",\"burger joint\"]";
String updatedCategory = "restaurant, bar, burger joint";
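You need to learn about regex. For your problem you can use replaceAll with the character class ["\[ \]], like this:
category.replaceAll("[\"\\[ \\]]", "")
The output will be:
restaurant,bar,burgerjoint
Note that the space inside the character class also removes the space in "burger joint". To get exactly updatedCategory, leave the space out of the class and then add a space after each comma:
category.replaceAll("[\"\\[\\]]", "")
.replace(",", ", ")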
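As a hedged sketch of the approach above, the snippet below strips brackets and double quotes with a character class that deliberately contains no space, so "burger joint" keeps its space, and then spaces out the commas. The class and method names are illustrative. The second method shows \p{P}, the Unicode punctuation category; in Java source the backslash has to be doubled.

```java
class CategoryCleaner {
    // Removes [, ] and " and then puts a space after every comma.
    static String clean(String category) {
        return category
                .replaceAll("[\"\\[\\]]", "")
                .replace(",", ", ");
    }

    // \p{P} matches any character in the Unicode "punctuation" category.
    static String stripPunctuation(String text) {
        return text.replaceAll("\\p{P}", "");
    }

    public static void main(String[] args) {
        System.out.println(clean("[\"restaurant\",\"bar\",\"burger joint\"]"));
        System.out.println(stripPunctuation("a,b.c!"));
    }
}
```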

Regex replace a character or only replace one if repeating

I'm trying to write a regex that will remove the backslash (\) character.
I could replace "\" with "", but using replace would remove every backslash.
However, I do not want to remove all the backslashes.
For example,
\" TO "
\\\" TO \"
\\n TO \n
Here's sample data
{\"data\":\"text\\\"textInsideQuote\\\"\"}
What I expect
{"data":"text\"textInsideQuote\"\"}
The one that doesn't have any repeat should be replaced first, and then the one with repeat should be reduced to one.
Any idea on how I should achieve this?
Thanks
The one that doesn't have any repeat should be replaced first, and then the one with repeat should be reduced to one.
It's not necessary to use two passes. It can be done with a single regex like so:
input.replaceAll("(\\\\)*\\\\", "$1")
Any solitary backslash will be replaced by empty string
Groups of repeating backslashes will be reduced to one single backslash
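To make the behavior concrete, here is a small sketch built around the replaceAll call above. The class and method names are illustrative. The greedy group (\\)* takes all but the last backslash of a run, and $1 holds the last backslash the group matched; for a solitary backslash the group matches nothing, so the replacement is empty.

```java
class BackslashCollapser {
    // A run of n backslashes becomes one backslash when n > 1 and nothing when n == 1.
    static String collapse(String input) {
        return input.replaceAll("(\\\\)*\\\\", "$1");
    }

    public static void main(String[] args) {
        // The Java literal "\\\"" is the two characters \" at runtime.
        System.out.println(collapse("\\\""));
        // The Java literal "\\\\\\\"" is the four characters \\\" at runtime.
        System.out.println(collapse("\\\\\\\""));
    }
}
```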
I hope I am interpreting your words correctly.
Actually, the problem was in my own code, where I double-escaped the JSON data.
For anyone interested in a similar problem, Patrick Parker's answer should work.
Thanks

How to split a string in java using (,) with certain conditions [duplicate]

I would like to find a regex that will pick out all commas that fall outside quote sets.
For example:
'foo' => 'bar',
'foofoo' => 'bar,bar'
This would pick out the single comma on line 1, after 'bar',
I don't really care about single vs double quotes.
Has anyone got any thoughts? I feel like this should be possible with lookaheads, but my regex fu is too weak.
This will match any string up to and including the first non-quoted ",". Is that what you are wanting?
/^([^"]|"[^"]*")*?(,)/
If you want all of them (and as a counter-example to the guy who said it wasn't possible) you could write:
/(,)(?=(?:[^"]|"[^"]*")*$)/
which will match all of them. Thus
'test, a "comma,", bob, ",sam,",here'.gsub(/(,)(?=(?:[^"]|"[^"]*")*$)/,';')
replaces all the commas not inside quotes with semicolons, and produces:
'test; a "comma,"; bob; ",sam,";here'
If you need it to work across line breaks just add the m (multiline) flag.
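The gsub call above is Ruby. Since the question is about Java, here is a rough Java equivalent, offered as a sketch: the class and method names are illustrative, it only understands double quotes, and it assumes the quotes in the input are balanced.

```java
import java.util.Arrays;

class UnquotedCommas {
    // A comma that is followed by an even number of double quotes up to the end of the input.
    static final String OUTSIDE_QUOTES = ",(?=(?:[^\"]|\"[^\"]*\")*$)";

    static String toSemicolons(String input) {
        return input.replaceAll(OUTSIDE_QUOTES, ";");
    }

    static String[] splitCells(String input) {
        // The -1 limit keeps trailing empty cells.
        return input.split(OUTSIDE_QUOTES, -1);
    }

    public static void main(String[] args) {
        System.out.println(toSemicolons("test, a \"comma,\", bob, \",sam,\",here"));
        System.out.println(Arrays.toString(splitCells("a,\"b,c\",d")));
    }
}
```

Because the lookahead rescans the rest of the input for every comma, this is best kept to short lines.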
The regexes below match all the commas that are outside the double quotes:
,(?=(?:[^"]*"[^"]*")*[^"]*$)
OR (PCRE only)
"[^"]*"(*SKIP)(*F)|,
"[^"]*" matches an entire double-quoted block. For the input buz,"bar,foo" it matches only "bar,foo". The following (*SKIP)(*F) then forces that match to fail and stops the engine from retrying at any position inside the skipped block. Everywhere else the engine falls through to the pattern after the | symbol, so , matches only the comma just after buz. It never matches the comma inside the double quotes, because the quoted part has already been skipped.
The regex below matches all the commas that are inside the double quotes:
,(?!(?:[^"]*"[^"]*")*[^"]*$)
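Java's java.util.regex does not support the PCRE verbs (*SKIP)(*F), but the same idea, consuming each quoted block first and acting only on the commas that remain, can be approximated with an alternation and a replacement loop. This is a sketch under that assumption; the class and method names are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SkipQuoted {
    // Either a whole double-quoted block or a single comma.
    private static final Pattern TOKEN = Pattern.compile("\"[^\"]*\"|,");

    static String replaceUnquotedCommas(String input, String replacement) {
        Matcher m = TOKEN.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Quoted blocks are copied back unchanged; bare commas are replaced.
            String piece = m.group().equals(",") ? replacement : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(piece));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceUnquotedCommas("buz,\"bar,foo\"", ";"));
    }
}
```

Each character is visited once, so there is no lookahead to the end of the input for every comma.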
While it's possible to hack it with a regex (and I enjoy abusing regexes as much as the next guy), you'll get in trouble sooner or later trying to handle substrings without a more advanced parser. Possible ways to get in trouble include mixed quotes, and escaped quotes.
This function will split a string on commas, but not those commas that are within a single- or double-quoted string. It can be easily extended with additional characters to use as quotes (though character pairs like « » would need a few more lines of code) and will even tell you if you forgot to close a quote in your data:
function splitNotStrings(str){
  var parse=[], inString=false, escape=0, end=0
  for(var i=0, c; c=str[i]; i++){ // looping over the characters in str
    if(c==='\\'){ escape^=1; continue} // 1 when odd number of consecutive \
    if(c===','){
      if(!inString){
        parse.push(str.slice(end, i))
        end=i+1
      }
    }
    else if(splitNotStrings.quotes.indexOf(c)>-1 && !escape){
      if(c===inString) inString=false
      else if(!inString) inString=c
    }
    escape=0
  }
  // now we finished parsing, strings should be closed
  if(inString) throw SyntaxError('expected matching '+inString)
  if(end<i) parse.push(str.slice(end, i))
  return parse
}
splitNotStrings.quotes="'\"" // add other (symmetrical) quotes here
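For readers working in Java, the same character-scanning idea might look like the sketch below. It is a loose port rather than a line-by-line translation: the class name is illustrative, the quote characters are hard-coded to ' and ", and unlike the JavaScript version it always emits the final piece, even when that piece is empty.

```java
import java.util.ArrayList;
import java.util.List;

class QuoteAwareSplitter {
    static List<String> split(String s) {
        List<String> parts = new ArrayList<>();
        char quote = 0;          // 0 means "not inside a quoted string"
        boolean escaped = false; // true right after an unescaped backslash
        int start = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (escaped) { escaped = false; continue; } // skip the escaped character
            if (c == '\\') { escaped = true; continue; }
            if (quote != 0) {
                if (c == quote) quote = 0;              // closing quote
            } else if (c == '"' || c == '\'') {
                quote = c;                              // opening quote
            } else if (c == ',') {
                parts.add(s.substring(start, i));
                start = i + 1;
            }
        }
        if (quote != 0) {
            throw new IllegalArgumentException("expected matching " + quote);
        }
        parts.add(s.substring(start));
        return parts;
    }
}
```

For example, split("'foo' => 'bar','foofoo' => 'bar,bar'") yields two pieces, and an input with an unclosed quote throws IllegalArgumentException.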
Try this regular expression:
(?:"(?:[^\\"]+|\\(?:\\\\)*[\\"])*"|'(?:[^\\']+|\\(?:\\\\)*[\\'])*')\s*=>\s*(?:"(?:[^\\"]+|\\(?:\\\\)*[\\"])*"|'(?:[^\\']+|\\(?:\\\\)*[\\'])*')\s*,
This does also allow strings like “'foo\'bar' => 'bar\\',”.
MarkusQ's answer worked great for me for about a year, until it didn't. I just got a stack overflow error on a line with about 120 commas and 3682 characters total. In Java, like this:
String[] cells = line.split("[\t,](?=(?:[^\"]|\"[^\"]*\")*$)", -1);
Here's my extremely inelegant replacement that doesn't stack overflow:
// Requires java.util.ArrayList and java.util.List.
private String[] extractCellsFromLine(String line) {
    List<String> cellList = new ArrayList<String>();
    while (true) {
        String[] firstCellAndRest;
        if (line.startsWith("\"")) {
            // Quoted cell: split at the first separator that lies outside quotes.
            firstCellAndRest = line.split("([\t,])(?=(?:[^\"]|\"[^\"]*\")*$)", 2);
        }
        else {
            // Unquoted cell: a plain split on the first separator is enough.
            firstCellAndRest = line.split("[\t,]", 2);
        }
        cellList.add(firstCellAndRest[0]);
        if (firstCellAndRest.length == 1) {
            break;
        }
        line = firstCellAndRest[1];
    }
    return cellList.toArray(new String[cellList.size()]);
}
@SocialCensus, the example you gave in the comment to MarkusQ, where you throw in ' alongside the ", doesn't work with the example MarkusQ gave right above it if we change sam to sam's: (test, a "comma,", bob, ",sam's,",here) has no match against (,)(?=(?:[^"']|["|'][^"']*")*$).
In fact, the problem itself, "I don't really care about single vs double quotes", is ambiguous. You have to be clear about what you mean by quoting with either " or '. For example, is nesting allowed or not? If so, to how many levels? If only one nested level is allowed, what happens to a comma outside the inner quotation but inside the outer one? You should also consider that single quotes occur by themselves as apostrophes (as in the sam's counter-example above). Finally, the regex you made doesn't really treat single quotes on par with double quotes, since it assumes the last quotation mark is necessarily a double quote. Replacing that last double quote with ['|"] also has a problem if the text isn't correctly quoted (or if apostrophes are used), though I suppose we could probably assume all quotes are correctly delimited.
MarkusQ's regexp answers the question: find all commas that have an even number of double quotes after them (i.e., are outside double quotes) and disregard all commas that have an odd number of double quotes after them (i.e., are inside double quotes). This is generally the solution you probably want, but let's look at a few anomalies. First, if someone leaves off a quotation mark at the end, then this regexp finds all the wrong commas rather than finding the desired ones or failing to match any. Of course, if a double quote is missing, all bets are off, since it might not be clear whether the missing one belongs at the end or at the beginning. However, there is a legitimate case where the regex could conceivably fail (this is the second anomaly). If you adjust the regexp to go across text lines, then you should be aware that quoting multiple consecutive paragraphs requires a double quote at the beginning of each paragraph and no closing quote at the end of any paragraph except the very last one. This means that over the span of those paragraphs, the regex will fail in some places and succeed in others.
Examples and brief discussions of paragraph quoting and of nested quoting can be found here http://en.wikipedia.org/wiki/Quotation_mark .

nextLink1.replace("""), Making " act differently

nextLink1.replace(""",()), so basically I want to replace " with a blank. Any help would be greatly appreciated.
Thanks
You need to escape the " sign. Like this:
nextLink1.replace("\"","");
The compiler will recognize the first two quote marks, but the third one will produce a syntax error.
Using an escape sequence will place a double quote as such:
nextLink1.replace("\"","");
You can find more escape sequences here http://docs.oracle.com/javase/tutorial/java/data/characters.html
" is the Java metacharacter used to start or end String literals. If you want to use it inside a String literal, you need to escape it first with \, as in \" (\ is another Java metacharacter, used for example to write the newline character "\n").
Also blank String is not () but "". So try this way
nextLink1.replace("\"","");
BTW Strings are immutable which means this method will not affect original String, but create new one with replaced character. If you want nextLink1 to contain String with replaced characters you will need to use
nextLink1 = nextLink1.replace("\"","");
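A short sketch of the immutability point, with an illustrative class name and sample value: replace returns a new String, and the original is unchanged unless the result is assigned back.

```java
class ImmutableReplaceDemo {
    // Returns a copy of s with every double quote removed.
    static String stripDoubleQuotes(String s) {
        return s.replace("\"", "");
    }

    public static void main(String[] args) {
        String nextLink1 = "say \"hi\"";
        nextLink1.replace("\"", "");             // result discarded, nextLink1 is unchanged
        System.out.println(nextLink1);
        nextLink1 = stripDoubleQuotes(nextLink1); // result kept
        System.out.println(nextLink1);
    }
}
```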

String replace a Backslash

How can I do a string replace of a back slash.
Input Source String:
sSource = "http://www.example.com\/value";
In the above String I want to replace "\/" with a "/";
Expected output after replace:
sSource = "http://www.example.com/value";
I get the source String from a third party, therefore I have no control over the format of the String.
This is what I have tried
Trial 1:
sSource.replaceAll("\\", "/");
Exception
Unexpected internal error near index 1
\
Trial 2:
sSource.replaceAll("\\/", "/");
No Exception, but does not do the required replace. Does not do anything.
Trial 3:
sVideoURL.replace("\\", "/");
No Exception, but does not do the required replace. Does not do anything.
sSource = sSource.replace("\\/", "/");
String is immutable - each method you invoke on it does not change its state. It returns a new instance holding the new state instead. So you have to assign the new value to a variable (it can be the same variable)
replaceAll(..) uses regex. You don't need that.
Try replaceAll("\\\\", "") or replaceAll("\\\\/", "/").
The problem here is that a backslash is (1) an escape character in Java string literals, and (2) an escape character in regular expressions. Each of these uses requires doubling the character, so you end up needing four \ in a row.
Of course, as Bozho said, you need to do something with the result (assign it to some variable) and not throw it away. And in this case the non-regex variant is better.
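The two levels of escaping can be compared side by side. This is a sketch with illustrative class and method names; all three methods turn the two characters \/ into /. Pattern.quote is a standard way to have the regex engine treat a string literally, which avoids the second level of doubling.

```java
import java.util.regex.Pattern;

class SlashUnescaper {
    // Literal replacement: only Java's string-literal escaping applies (\\ is one backslash).
    static String literal(String s) {
        return s.replace("\\/", "/");
    }

    // Regex replacement: the backslash is escaped for Java and again for the regex engine.
    static String regex(String s) {
        return s.replaceAll("\\\\/", "/");
    }

    // Regex replacement with the search text quoted, so no regex-level escaping is needed.
    static String quoted(String s) {
        return s.replaceAll(Pattern.quote("\\/"), "/");
    }

    public static void main(String[] args) {
        String sSource = "http://www.example.com\\/value";
        System.out.println(literal(sSource));
    }
}
```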
Try
sSource = sSource.replaceAll("\\\\", "");
Edit: OK, even on Stack Overflow the backslash is an escape character... You need four backslashes in the first String argument of replaceAll.
The reason is that the backslash is an escape character in Java string literals (as in \n, for instance).
Moreover, the first argument of replaceAll is a regular expression, which also uses the backslash as an escape character.
So the regular expression needs two backslashes. To pass those two backslashes in a Java string literal, you need to escape each of them.
That leaves you with four backslashes in your expression! That's the beauty of regex in Java ;)
s.replaceAll ("\\\\", "");
You need to mask a backslash in your source, and for regex, you need to mask it again, so for every backslash you need two, which ends in 4.
But
s = "http://www.example.com\\/value";
needs two backslashes in source as well.
This will replace backslashes with forward slashes in the string:
source = source.replace('\\','/');
You have to do
sSource = sSource.replaceAll("\\\\/", "/");
because the backslash has to be escaped twice: once for the string literal in the source and once for the regular expression.
To drop a backslash at a particular location (note that this also discards everything after it):
if ((stringValue.contains("\\")) && (stringValue.indexOf("\\", location-1) == (location-1))) {
    stringValue = stringValue.substring(0, location-1);
}
sSource = StringUtils.replace(sSource, "\\/", "/");
