private static String filterString(String code) {
String partialFiltered = code.replaceAll("/\\*.*\\*/", "");
String fullFiltered = partialFiltered.replaceAll("//.*(?=\\n)", "");
return fullFiltered;
}
I tried the code above to remove all comments from a string, but it isn't working. Please help.
Works with both // single and multi-line /* comments */.
String sourceCode =
"/*\n"
+ " * Multi-line comment\n"
+ " * Creates a new Object.\n"
+ " */\n"
+ "public Object someFunction() {\n"
+ " // single line comment\n"
+ " Object obj = new Object();\n"
+ " return obj; /* single-line comment */\n"
+ "}";
System.out.println(sourceCode.replaceAll(
"//.*|/\\*((.|\\n)(?!=*/))+\\*/", ""));
Input :
/*
* Multi-line comment
* Creates a new Object.
*/
public Object someFunction() {
// single line comment
Object obj = new Object();
return obj; /* single-line comment */
}
Output :
public Object someFunction() {
Object obj = new Object();
return obj;
}
How about....
private static String filterString(String code) {
return code.replace("//", "").replace("/*", "").replace("*/", "");
}
Replace the code below:
partialFiltered.replaceAll("//.*(?=\\n)", "");
with:
partialFiltered.replaceAll("//.*?\n","\n");
You need to use (?s) at the start of your partialFiltered regex to allow for comments spanning multiple lines (e.g. see Pattern.DOTALL with String.replaceAll).
But then the .* in the middle of /\\*.*\\*/ uses a greedy match so I'd expect it to replace the whole lot between two separate comment blocks. E.g., given the following:
/* Comment #1 */
for (i = 0; i < 10; i++)
{
i++
}
/* Comment #2 */
Haven't tested this so am risking egg on my face but would expect it to remove the whole lot including the code in the middle rather than just the two comments. One way to prevent would be to use .*? to make the inner matching non-greedy, i.e. to match as little as possible:
String partialFiltered = code.replaceAll("(?s)/\\*.*?\\*/", "");
Since the fullFiltered regex doesn't begin with (?s), it should work without the (?=\\n) (since the replaceAll regex doesn't span multiple lines by default) - so you should be able to change it to:
String fullFiltered = partialFiltered.replaceAll("//.*", "");
There are also possible issues with looking for the characters denoting a comment, e.g. if they appear within a string or regular expression pattern but I'm assuming these aren't important for your application - if they are it's probably the end of the road for using simple regular expressions and you may need a parser instead...
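To make the greedy versus non-greedy difference concrete, here is a minimal sketch. The class and method names (GreedyDemo, stripGreedy, stripBlockComments) are made up for illustration, and the sample input is an assumption modelled on the two-comment example above.

```java
final class GreedyDemo {
    // Greedy: ".*" runs to the LAST "*/" in the input, so code between
    // two separate comments is swallowed as well.
    static String stripGreedy(String code) {
        return code.replaceAll("(?s)/\\*.*\\*/", "");
    }

    // Reluctant: ".*?" stops at the FIRST "*/" after each "/*".
    // (?s) lets '.' match line terminators so multi-line comments are covered.
    static String stripBlockComments(String code) {
        return code.replaceAll("(?s)/\\*.*?\\*/", "");
    }

    public static void main(String[] args) {
        String code = "/* Comment #1 */\nint i = 0;\n/* Comment #2 */\n";
        System.out.println("[" + stripGreedy(code) + "]");
        System.out.println("[" + stripBlockComments(code) + "]");
    }
}
```

With this input the greedy version leaves only the trailing newline, while the reluctant version keeps the int i = 0; line. Neither version knows about string literals, so the caveat above still applies.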
Maybe this can help someone:
return code.replaceAll(
"((['\"])(?:(?!\\2|\\\\).|\\\\.)*\\2)|\\/\\/[^\\n]*|\\/\\*(?:[^*]|\\*(?!\\/))*\\*\\/", "$1");
To try it in an online regex tester, use the unescaped form: ((['"])(?:(?!\2|\\).|\\.)*\2)|\/\/[^\n]*|\/\*(?:[^*]|\*(?!\/))*\*\/
Related
I need to remove all types of comments from my string without affecting the URL defined in that string. When I tried removing the comments with a regular expression, part of the URL was removed as well.
I tried the following regex, but the same issue occurs.
String sourceCode= "/*\n"
+ " * Multi-line comment\n"
+ " * Creates a new Object.\n"
+ " */\n"
+ "public Object someFunction() {\n"
+ " // single line comment\n"
+ " Object obj = new Object();\n"
+ " return obj; /* single-line comment */\n"
+ "}"
+ "\n"
+ "https://stackoverflow.com/questions/18040431/remove-comments-in-a-string";
sourceCode=sourceCode.replaceAll("//.*|/\\*((.|\\n)(?!=*/))+\\*/", "");
System.out.println(sourceCode);
The comments are removed, but the output looks like this:
public Object someFunction() {
Object obj = new Object();
return obj;
}
https:
Please help me find a solution for this.
[^:]//.*|/\\*((.|\\n)(?!=*/))+\\*/
The only change is the [^:] at the start: the character immediately before // must not be a colon, so the // in https:// no longer starts a match.
I usually use regex101.com to work with regular expressions. Select the Python flavor for your case (escaping differs slightly between languages).
This regular expression is quite hard for a human to read, so another solution is to use several simpler expressions and process the incoming text in multiple passes, for example:
Remove one-line comments
Remove multiline comments
Process some special cases
Note: Regular expression processing can be slow. If performance matters, consider another solution, such as your own processor or a third-party library.
EDITED
As @Wiktor suggested, the expression [^:]//.*|/\\*((?!=*/)(?s:.))+\\*/ is a faster solution, at least 2-3 times faster.
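One caveat with the [^:] prefix is that it consumes the character before //, so that character is deleted too, and a comment at the very start of the input has no preceding character to match. A possible variation, not taken from the answer above, is a negative lookbehind, which checks the preceding character without consuming it. The class and method names below are hypothetical.

```java
final class UrlSafeComments {
    // Pattern from the answer: "[^:]" matches (and therefore removes)
    // whatever single character precedes the "//".
    static String withCharClass(String code) {
        return code.replaceAll("[^:]//.*", "");
    }

    // Variation: "(?<!:)" only asserts that the previous character is not ':'.
    // Nothing before the "//" is consumed, and it also matches at index 0.
    static String withLookbehind(String code) {
        return code.replaceAll("(?<!:)//.*", "");
    }

    public static void main(String[] args) {
        System.out.println(withCharClass("x = 1;// note"));  // the ';' is lost
        System.out.println(withLookbehind("x = 1;// note")); // the ';' is kept
        System.out.println(withLookbehind("https://stackoverflow.com/questions"));
    }
}
```

This is still only a heuristic: a // inside a string literal that is not preceded by a colon would be treated as a comment.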
You can split your String by "\n" and check each line. Here is the tested code:
String sourceCode= "/*\n"
+ " * Multi-line comment\n"
+ " * Creates a new Object.\n"
+ " */\n"
+ "public Object someFunction() {\n"
+ " // single line comment\n"
+ " Object obj = new Object();\n"
+ " return obj; /* single-line comment */\n"
+ "}"
+ "\n"
+ "https://stackoverflow.com/questions/18040431/remove-comments-in-a-string";
String [] parts = sourceCode.split("\n");
System.out.println(getUrlFromText(parts));
Here is the fetching method:
private static String getUrlFromText(String []parts) {
for (String part : parts) {
if(part.startsWith("http")) {
return part;
}
}
return null;
}
To be more specific, this expression should be used:
.*[^:]//.*|/\\*((.|\\n)(?!=*/))*\\*/
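The pattern you provided cannot remove an empty /**/ comment if one is present. (If that is a deliberate requirement, then it's fine.)
Your expression uses + after the comment-body group, so it needs at least one character between /* and */; the expression above uses * instead.
For a better understanding, paste the expression .*[^:]\/\/.*|\/\*((.|\n)(?!=*\/))*\*\/ into a regex visualizer, which will show you a graph of it.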
I have an ArrayMap, of which the keys are something like tag - randomWord. I want to check if the tag part of the key matches a certain variable.
I have tried messing around with Patterns, but to no success. The only way I can get this working at this moment, is iterating through all the keys in a for loop, then splitting the key on ' - ', and getting the first value from that, to compare to my variable.
for (String s : testArray) {
if ((s.split("(\\s)(-)(\\s)(.*)")[0]).equals(variableA)) {
// Do stuff
}
}
This seems very roundabout to me, especially since I only need to know whether the keySet contains the variable; that's all I'm interested in. I was thinking about using the contains() method and passing in (variableA + "(\\s)(-)(\\s)(.*)"), but that doesn't seem to work.
Is there a way to use the .contains() method for this case, or do I have to loop the keys manually?
You should split these tasks into two steps - first extract the tag, then compare it. Your code should look something like this:
for (String s : testArray) {
if (arrayMap.keySet().contains(extractTag(s))) {
// Do stuff
}
}
Notice that we've separated our concerns into two steps, making it easier to verify each step behaves correctly individually. So now the question is "How do we implement extractTag()?"
The ( ) symbols in a regular expression create a group match, which you can retrieve via Matcher.group() - if you only care about tag you could use a Pattern like so:
"(\\S+)\\s-\\s.*"
In which case your extractTag() method would look like:
private static final Pattern TAG_PATTERN = Pattern.compile("(\\S+)\\s-\\s.*");
private static String extractTag(String s) {
Matcher m = TAG_PATTERN.matcher(s);
if (m.matches()) {
return m.group(1);
}
throw new IllegalArgumentException(
"'" + s + "' didn't match " + TAG_PATTERN.pattern());
}
If you'd rather use String.split() you just need to define a regular expression that matches the delimiter, in this case -; you could use the following regular expression in a split() call:
"\\s-\\s"
It's often a good idea to use + after \\s to support one or more spaces, but it depends on what inputs you need to process. If you know it's always exactly one-space-followed-by-one-dash-followed-by-one-space, you could just split on:
" - "
In which case your extractTag() method would look like:
private static String extractTag(String s) {
String[] parts = s.split(" - ");
if (parts.length > 1) {
return parts[0];
}
throw new IllegalArgumentException("Could not extract tag from '" + s + "'");
}
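On the original question of whether contains() can do this: Set.contains() only tests for an exact, whole-key match, so some loop over the keys is needed. If the separator is always exactly " - ", a prefix check avoids both the regex and the split. This is a minimal sketch; TagLookup and containsTag are hypothetical names, and the fixed " - " separator is an assumption.

```java
import java.util.Arrays;
import java.util.Collection;

final class TagLookup {
    // Returns true if any key has the form "<tag> - <anything>".
    // Including the separator in the prefix stops "re" from matching "red - apple".
    static boolean containsTag(Collection<String> keys, String tag) {
        String prefix = tag + " - ";
        for (String key : keys) {
            if (key.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Collection<String> keys = Arrays.asList("red - apple", "blue - sky");
        System.out.println(containsTag(keys, "blue"));
        System.out.println(containsTag(keys, "re"));
    }
}
```

With a map, the call would be containsTag(arrayMap.keySet(), variableA).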
A little fun with Java this time. I want to write a program that reads a code from standard input (line by line, for example), like:
// some comment
class Main {
/* blah */
// /* foo
foo();
// foo */
foo2();
/* // foo2 */
}
finds all comments in it and removes them. I'm trying to use regular expressions, and for now I've done something like this:
private static String ParseCode(String pCode)
{
String MyCommentsRegex = "(?://.*)|(/\\*(?:.|[\\n\\r])*?\\*/)";
return pCode.replaceAll(MyCommentsRegex, " ");
}
but it seems not to work for all the cases, e.g.:
System.out.print("We can use /* comments */ inside a string of course, but it shouldn't start a comment");
Any advice or ideas different from regex?
Thanks in advance.
You may have already given up on this by now but I was intrigued by the problem.
I believe this is a partial solution...
Native regex:
//.*|("(?:\\[^"]|\\"|.)*?")|(?s)/\*.*?\*/
In Java:
String clean = original.replaceAll( "//.*|(\"(?:\\\\[^\"]|\\\\\"|.)*?\")|(?s)/\\*.*?\\*/", "$1 " );
This appears to properly handle comments embedded in strings as well as properly escaped quotes inside strings. I threw a few things at it to check but not exhaustively.
There is one compromise: every string literal ("...") in the code ends up with a space after it. Keeping this simple while also solving that problem would be very difficult, given the need to cleanly handle:
int/* some comment */foo = 5;
A simple Matcher.find/appendReplacement loop could conditionally check for group(1) before replacing with a space and would only be a handful of lines of code. Still simpler than a full up parser maybe. (I could add the matcher loop too if anyone is interested.)
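For anyone interested, the matcher loop could look roughly like the sketch below. It is an illustration rather than the exact code behind this answer: the class name CommentStripper is made up, and the pattern is a slightly simplified variant (it uses [\s\S] instead of an inline (?s) flag). String literals are captured in group 1 and copied back unchanged; anything else that matches is a comment and becomes a single space.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommentStripper {
    // Alternatives: line comment | string literal (group 1) | block comment.
    // A string literal is matched as a whole, so comment markers inside it
    // are never seen as the start of a comment.
    private static final Pattern TOKEN = Pattern.compile(
            "//[^\\n]*|(\"(?:\\\\.|[^\"\\\\])*\")|/\\*[\\s\\S]*?\\*/");

    static String strip(String source) {
        Matcher m = TOKEN.matcher(source);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // group(1) is non-null only when the string alternative matched.
            String replacement = (m.group(1) != null) ? m.group(1) : " ";
            // quoteReplacement keeps '$' and '\' in literals from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("int/* some comment */foo = 5;"));
        System.out.println(strip("print(\"/* not a comment */\"); // real comment"));
    }
}
```

This removes the extra space after string literals while still turning int/* some comment */foo into int foo. Character literals such as '"' are not handled and would confuse the string alternative.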
The last example is no problem I think:
/* we comment out some code
System.out.print("We can use */ inside a string of course");
we end the comment */
... because the comment actually ends with "We can use */. This code does not compile.
But I have another problematic case:
int/*comment*/foo=3;
Your pattern will transform this into:
intfoo=3;
...which is invalid code. So it is better to replace your comments with " " instead of "".
I think a 100% correct solution using regular expressions is either inhuman or impossible (taking into account escapes, etc.).
I believe the best option would be to use ANTLR; I believe they even provide a Java grammar you can use.
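Short of a full grammar, a regex-free middle ground is a small hand-written scanner that tracks whether it is in code, a string, a character literal, or a comment. The sketch below is illustrative only: CommentScanner and its State enum are hypothetical names, and it deliberately ignores Java text blocks and unicode escapes.

```java
final class CommentScanner {
    private enum State { CODE, STRING, CHAR, LINE_COMMENT, BLOCK_COMMENT }

    static String strip(String src) {
        StringBuilder out = new StringBuilder(src.length());
        State state = State.CODE;
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            char next = (i + 1 < src.length()) ? src.charAt(i + 1) : '\0';
            switch (state) {
                case CODE:
                    if (c == '/' && next == '/') { state = State.LINE_COMMENT; i += 2; continue; }
                    if (c == '/' && next == '*') {
                        out.append(' ');               // keeps "int/*c*/foo" as two tokens
                        state = State.BLOCK_COMMENT; i += 2; continue;
                    }
                    if (c == '"') state = State.STRING;
                    else if (c == '\'') state = State.CHAR;
                    out.append(c);
                    break;
                case STRING:
                case CHAR:
                    out.append(c);
                    if (c == '\\' && next != '\0') {   // copy the escaped character verbatim
                        out.append(next); i += 2; continue;
                    }
                    if ((state == State.STRING && c == '"') || (state == State.CHAR && c == '\'')) {
                        state = State.CODE;
                    }
                    break;
                case LINE_COMMENT:
                    if (c == '\n') { out.append(c); state = State.CODE; } // keep the line break
                    break;
                case BLOCK_COMMENT:
                    if (c == '*' && next == '/') { state = State.CODE; i += 2; continue; }
                    break;
            }
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("int/*comment*/foo=3;"));
        System.out.println(strip("s = \"/* no */\"; // yes"));
    }
}
```

Because the scanner consumes the input once from left to right, a // inside a block comment or a /* inside a line comment is just skipped text, which covers the nested-looking cases from the question.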
I ended up with this solution.
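import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class CommentsFun {
static List<Match> commentMatches = new ArrayList<Match>();
public static void main(String[] args) throws IOException {
Pattern commentsPattern = Pattern.compile("(//.*?$)|(/\\*.*?\\*/)", Pattern.MULTILINE | Pattern.DOTALL);
Pattern stringsPattern = Pattern.compile("(\".*?(?<!\\\\)\")");
// Read this source file itself as the test input.
String text = new String(Files.readAllBytes(Paths.get("src/my/test/CommentsFun.java")), StandardCharsets.UTF_8);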
Matcher commentsMatcher = commentsPattern.matcher(text);
while (commentsMatcher.find()) {
Match match = new Match();
match.start = commentsMatcher.start();
match.text = commentsMatcher.group();
commentMatches.add(match);
}
List<Match> commentsToRemove = new ArrayList<Match>();
Matcher stringsMatcher = stringsPattern.matcher(text);
while (stringsMatcher.find()) {
for (Match comment : commentMatches) {
if (comment.start > stringsMatcher.start() && comment.start < stringsMatcher.end())
commentsToRemove.add(comment);
}
}
for (Match comment : commentsToRemove)
commentMatches.remove(comment);
for (Match comment : commentMatches)
text = text.replace(comment.text, " ");
System.out.println(text);
}
//Single-line
// "String? Nope"
/*
* "This is not String either"
*/
//Complex */
///*More complex*/
/*Single line, but */
String moreFun = " /* comment? doubt that */";
String evenMoreFun = " // comment? doubt that ";
static class Match {
int start;
String text;
}
}
Another alternative is to use a library that supports AST parsing, e.g. org.eclipse.jdt.core, which has all the APIs you need to do this and more. But then that's just one alternative :)
I am trying to replace ". " with "\n\n" within a string, but it doesn't work. I use the following code:
text=text.replace(". ","\n\n");
The result is that every word ends up on its own line with its last letter missing. I read that the point means "any character" in this case, but how can I actually refer to a literal point?
Input Example: "Hello world"
Example of the output:
Hell
world
Thank you
There is something fishy here: either text is not a String, or you aren't using .replace() but something else (.replaceAll()?), or Android's .replace() is buggy.
And I frankly doubt that the Android devs would have made such an oversight.
The Javadoc for String#replace() says:
Replaces each substring of this string that matches the literal target sequence with the specified literal replacement sequence. [emphasis mine]
Unlike its sibling methods (.replaceFirst() and .replaceAll()) which do use regexes, .replace() doesn't (and the fact that internally it does use Pattern, at least in Oracle's JDK [*], is not the problem).
Therefore, if you actually use .replace() and get the result you describe, this is a bug in Android. If this is the case, try an alternative, like so (UNTESTED):
public static String realStringReplace(final String victim, final String target,
final String replacement)
{
final int skip = target.length();
final StringBuilder sb = new StringBuilder(victim.length());
String tmp = victim;
int index;
while (!tmp.isEmpty()) {
index = tmp.indexOf(target);
if (index == -1)
break;
sb.append(tmp.substring(0, index)).append(replacement);
tmp = tmp.substring(index + skip);
}
return sb.append(tmp).toString();
}
The point means "any character" if you use:
text=text.replaceAll(". ", "\n\n");
Perhaps you posted the wrong code. In that case, try this one:
text=text.replaceAll("\\. ", "\n\n");
The strange thing is that this line is equivalent to the line you posted.
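A quick way to see the difference between the three calls is the following sketch. The class and method names are made up for illustration; the calls themselves are the standard String.replace and String.replaceAll.

```java
final class ReplaceDemo {
    // replace() treats ". " as literal text: a dot followed by a space.
    static String literal(String text) {
        return text.replace(". ", "\n\n");
    }

    // replaceAll() treats ". " as a regex: ANY character followed by a space.
    static String regex(String text) {
        return text.replaceAll(". ", "\n\n");
    }

    // Escaping the dot makes the regex match a literal dot again.
    static String escapedRegex(String text) {
        return text.replaceAll("\\. ", "\n\n");
    }

    public static void main(String[] args) {
        System.out.println(literal("Hello world")); // unchanged, there is no ". " in it
        System.out.println(regex("Hello world"));   // the "o " is replaced, as in the question
        System.out.println(escapedRegex("One. Two"));
    }
}
```

The output described in the question ("Hell", blank line, "world") is what regex() produces, which supports the guess that replaceAll() was actually called.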