how to suppress single and multi-line comments using a regexp?

I need to find all multi-line comments in a string and replace each one with a space (if the comment is on one line) or with a \n (if the comment spans more than one line).
for example:
int/* one line comment */a;
should be changed to:
int a;
and this:
int/*
more
than one
line comment*/a;
should be changed to:
int
a;
I have one String with all the text and I used this command:
file = file.replaceAll("(/\\*([^*]|(\\*+[^*/]))*\\*+/)"," ");
where file is the string.
The problem is that this replaces every comment with a space, whether it spans one line or several, and I want to treat the two cases separately.
How can I do that?

This can be solved using Matcher.appendReplacement and Matcher.appendTail.
// Requires java.util.regex.Matcher and java.util.regex.Pattern
String file = "hello /* line 1 \n line 2 \n line 3 */"
        + "there /* line 4 */ world";
StringBuffer sb = new StringBuffer();
Matcher m = Pattern.compile("(?m)/\\*([^*]|(\\*+[^*/]))*\\*+/").matcher(file);
while (m.find()) {
    // Find a comment
    String toReplace = m.group();
    // Figure out what to replace it with: a newline for a multi-line
    // comment, a single space for a one-line comment
    String replacement = toReplace.contains("\n") ? "\n" : " ";
    // Perform the replacement.
    m.appendReplacement(sb, replacement);
}
m.appendTail(sb);
System.out.println(sb);
Output:
hello 
there   world
Note: If you want to preserve correct line and column numbers for all text that is not inside comments (useful if you want to refer back to the source code in error messages, etc.), I would recommend doing
String replacement = toReplace.replaceAll("\\S", " ");
which replaces every non-whitespace character with a space. This way \n is preserved, and
"/* abc */"
is replaced by
"         "
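Both variants of the answer above can be sketched as one small, self-contained class. The class name CommentStripper and the method names strip and blank are made up for this illustration. Matcher.quoteReplacement is an added precaution, since appendReplacement gives $ and \ a special meaning in the replacement string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommentStripper {
    // Same block-comment pattern as in the question and the answer.
    private static final Pattern BLOCK_COMMENT =
            Pattern.compile("/\\*([^*]|(\\*+[^*/]))*\\*+/");

    // One-line comments become a space, multi-line comments become a newline.
    static String strip(String source) {
        Matcher m = BLOCK_COMMENT.matcher(source);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = m.group().contains("\n") ? "\n" : " ";
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Every non-whitespace character of a comment becomes a space, so the
    // line and column positions of the remaining code do not change.
    static String blank(String source) {
        Matcher m = BLOCK_COMMENT.matcher(source);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = m.group().replaceAll("\\S", " ");
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(strip("int/* one line comment */a;"));
        System.out.println(strip("int/*\nmore\nthan one\nline comment*/a;"));
    }
}
```

Note that this pattern knows nothing about string literals, so a /* inside a Java string would also be treated as the start of a comment.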

Related

How to split records on white space when different lines have spaces at different positions

I have a file with records as below, and I am trying to split each record on white space and join the fields with commas.
file:
a 3w 12 98 header P6124
e 4t 2 100 header I803
c 12L 11 437 M12
BufferedReader reader = new BufferedReader(new FileReader("/myfile.txt"));
String line = reader.readLine();
while (line != null) {
    System.out.println(line);
    String[] splitLine = line.split("\\s+");
    line = reader.readLine();
}
If the data is separated by multiple white spaces, I usually go for a regex split: split("\\s+") or split(" +").
But in the above case, record c doesn't have the header value. The regex "\\s+" or " +" treats the whole gap as one separator, so the empty field is lost and I get c,12L,11,437,M12 instead of c,12L,11,437,,M12
How do I properly split the lines based on any delimiter in this case so that I get data in the below format:
a,3w,12,98,header,P6124
e,4t,2,100,header,I803
c,12L,11,437,,M12
Could anyone let me know how I can achieve this ?

Maybe you can try a more elaborate approach: use a regex that matches exactly six fields on each line and explicitly handles a missing value for the fifth one.
I rewrote your example and added some console logging to clarify my suggestion:
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTest {
    private static final String Input = "a 3w 12 98 header P6124\n" +
            "e 4t 2 100 header I803\n" +
            "c 12L 11 437 M12";

    public static void main(String[] args) throws Exception {
        BufferedReader reader = new BufferedReader(new StringReader(Input));
        // Six fields separated by spaces; the fifth one is optional
        Pattern pattern = Pattern.compile("^([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+)? +([^ ]+)$");
        String line;
        while ((line = reader.readLine()) != null) {
            String[] splitLine = line.split("\\s+");
            System.out.println(splitLine.length);
            System.out.println("Line: " + line);
            Matcher matcher = pattern.matcher(line);
            boolean matches = matcher.matches();
            System.out.println("matches: " + matches);
            System.out.println("groups: " + matcher.groupCount());
            // group(i) throws IllegalStateException when the line did not match
            if (matches) {
                for (int i = 1; i <= matcher.groupCount(); i++) {
                    System.out.printf(" Group %d has value '%s'\n", i, matcher.group(i));
                }
            }
        }
    }
}
The key is that the pattern used to match each line requires a sequence of six fields:
for each field, the value is described as [^ ]+ (one or more non-space characters)
separators between fields are described as " +" (one or more spaces)
the fifth (nullable) field is described as ([^ ]+)?, an optional group
each value is captured as a group using parentheses: ( ... )
start (^) and end ($) of each line are marked explicitly
Then, each line is matched against the given pattern, obtaining six groups: you can access each group using matcher.group(index), where index is 1-based because group(0) returns the full match. When the fifth field is missing, group(5) returns null.
This is a more complex approach but I think it can help you to solve your problem.
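To get from the captured groups to the comma-separated output the question asks for, the groups can be joined with a null group written as an empty field. This is only a sketch: the class name RecordSplitter and the method toCsv are hypothetical, and it assumes a missing fifth field leaves at least two spaces between the fourth and sixth values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RecordSplitter {
    // The six-field pattern from the answer above; the fifth group is optional.
    private static final Pattern SIX_FIELDS = Pattern.compile(
            "^([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+) +([^ ]+)? +([^ ]+)$");

    static String toCsv(String line) {
        Matcher m = SIX_FIELDS.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected record layout: " + line);
        }
        List<String> fields = new ArrayList<>();
        for (int i = 1; i <= m.groupCount(); i++) {
            String value = m.group(i);
            // An optional group that did not participate in the match is null.
            fields.add(value == null ? "" : value);
        }
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        System.out.println(toCsv("a 3w 12 98 header P6124"));
        // Two spaces between 437 and M12 stand in for the missing field.
        System.out.println(toCsv("c 12L 11 437  M12"));
    }
}
```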

Put a limit on the number of whitespace chars that may be used to split the input.
In the case of your example data, a maximum of 5 works:
String[] splitLine = line.split("\\s{1,5}");
See live demo (of this code working as desired).
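How the bounded quantifier produces the empty field can be sketched as follows. The spacing here is an assumption, because the exact column widths of the original file are not shown: the gap left by the missing value is taken to be seven spaces, which the regex consumes as two separators (five spaces, then two) with an empty string between them. The class name BoundedSplit and the parameter maxGap are made up for the example.

```java
final class BoundedSplit {
    // Splits on runs of 1..maxGap whitespace characters and joins with commas.
    static String toCsv(String line, int maxGap) {
        String[] fields = line.split("\\s{1," + maxGap + "}");
        return String.join(",", fields);
    }

    public static void main(String[] args) {
        System.out.println(toCsv("a 3w 12 98 header P6124", 5));
        // Seven spaces: wider than the limit, so an empty field appears.
        System.out.println(toCsv("c 12L 11 437       M12", 5));
        // Two spaces: within the limit, so the missing field is not detected.
        System.out.println(toCsv("c 12L 11 437  M12", 5));
    }
}
```

The limit therefore has to be tuned to the data: a gap no wider than the limit is still read as a single separator.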

Are you just trying to switch your delimiters from spaces to commas?
In that case:
cat myFile.txt | sed 's/   */  /g' | sed 's/ /,/g'
*edit: added a stage that shrinks runs of more than two spaces to just the two spaces needed to retain the double comma.

Separate string by whitespace, but keep newlines in split array

I'm trying to split a string in Java, but keep the newline characters as elements in the array.
For example, with input: "Hello \n\n\nworld!"
I want the output to be: ["Hello", "\n", "\n", "\n", "world", "!"]
The regex I have in place right now is this:
String[] parsed = input.split(" +|(?=\\p{Punct})|(?<=\\p{Punct})");
This gets me the punctuation separation I want, but its output looks like this: ["Hello", "\n\n\nworld", "!"]
Is there a way to unclump the newlines in Java?

You could first replace every \n with "\n " (a newline followed by a space) and then do a simple split on the space character.
String input = "Hello \n\n\nworld!";
String replacement = input.replace("\n", "\n ");
String[] result = replacement.split(" ");
input: "Hello \n\n\nworld!"
replacement: "Hello \n \n \n world!"
result: ["Hello", "\n", "\n", "\n", "world!"]
Note: my example does not handle the final exclamation mark - but it seems you already know how to handle that.

The trick is to add whitespace after each "\n" and then apply your regex.
String line = "Hello \n\n\nworld!";
line = line.replaceAll("\n", "\n "); // here we replace all "\n" to "\n "
String[] items = line.split(" +|(?=\\p{Punct})|(?<=\\p{Punct})");
or shorter version:
String line = "Hello \n\n\nworld!";
String[] items = line.replaceAll("\n", "\n ").split(" +|(?=\\p{Punct})|(?<=\\p{Punct})");
So, in this context the result is: ["Hello", "\n", "\n", "\n", "world", "!"]

Using the find method makes things easier:
String str = "Hello \n\n\nworld!";
List<String> myList = new ArrayList<String>();
Pattern pat = Pattern.compile("\\w+|\\H");
Matcher m = pat.matcher(str);
while (m.find()) {
myList.add(m.group(0));
}
If you use Java 7, change \\H to [\\S\\n].
Note that using this approach, you obtain a pattern easier to write and to edit since you don't need to use lookarounds.
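A self-contained version of the find-based approach might look like this. The class name NewlineTokenizer and the method tokenize are illustrative, and \H (any character that is not horizontal whitespace) assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NewlineTokenizer {
    // Either a run of word characters, or one character that is not
    // horizontal whitespace (punctuation marks and newlines qualify).
    private static final Pattern TOKEN = Pattern.compile("\\w+|\\H");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Print each token with its newline made visible.
        for (String token : tokenize("Hello \n\n\nworld!")) {
            System.out.println("[" + token.replace("\n", "\\n") + "]");
        }
    }
}
```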

JAVA - Ignore part of strings containing "#"

I'm having some difficulties in excluding part of strings after the "#" symbol.
I explain myself better:
This is a sample input text a user could insert in a textbox:
Some Text
Some Text again #A comment
#A comment line
Another Text
Another Text again#Comment
I need to read this text and ignore all text after "#" symbol.
This should be the expected output:
Some Text;Some Text again;Another Text;Another Text again
As for now here's the code:
This replaces all newlines with ";"
readText = userInputTextArea.getText();
readTextAllInALine = readText.replaceAll("\\n", ";");
so the output after this is:
Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment
This code is to ignore all characters after the first "#" but works fine just for the first line if we read it all sequentially.
int startIndex = inputCommandText.indexOf("#");
int endIndex = inputCommandText.indexOf(";");
String toBeReplaced = inputCommandText.substring(startIndex, endIndex);
readTextAllInALine.replace(toBeReplaced, "");
I'm stuck in finding a way for having the expected output. I was thinking of using a StringTokenizer, processing every line, removing text after "#" or ignoring the whole line if it starts with "#", and then printing all tokens (i.e. all lines) separating them with ";" but I cannot make it work.
Any help will be appreciated.
Thank you very much in advance.
Regards.

Just call this replace command on your pure string, retrieved from the text input. The regex #[^;]* grabs everything, starting at the hash until it reads a semicolon. Afterwards it replaces it with an empty string.
public static void main(String[] args) {
String text = "Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment";
System.out.println(text);
text = text.replaceAll("#[^;]*", "");
System.out.println(text);
}
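On the joined string, removing the comments alone leaves a space before a removed trailing comment and an empty segment (";;") where a whole line was a comment. A possible follow-up, sketched here with a hypothetical class JoinedCommentFilter, is to include the leading whitespace in the match and then collapse the leftover separators. It assumes that comments never contain a semicolon.

```java
final class JoinedCommentFilter {
    static String strip(String joined) {
        return joined
                // Drop each comment together with the whitespace before it.
                .replaceAll("\\s*#[^;]*", "")
                // Collapse the separators left behind by comment-only lines.
                .replaceAll(";{2,}", ";")
                // Drop a separator left at the very start or end.
                .replaceAll("^;|;$", "");
    }

    public static void main(String[] args) {
        String text = "Some Text;Some Text again #A comment;#A comment line;"
                + "Another Text;Another Text again#Comment";
        System.out.println(strip(text));
    }
}
```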

A regex is useful here, but it's tricky because your pattern is moderately complex. The comments are end-of-line comments, so they can appear in more than one arrangement.
I came up with the following which is a two-pass:
replaceAll(" *(#.*(?=\\n|$))", "").replaceAll("\\n+", ";");
The two-pass circumvents the fact that sometimes you get a duplicate line break. The first expression replaces comments but not new line characters and the second expression replaces multiple new line characters with a single semicolon.
The individual parts of the expression in the first pass are the following:
" *"
This includes zero or more leading spaces in the comment match. I.e., in "...again #A...", we want to remove the space between n and #.
"(#.* )"
The start of the comment match: matches a # followed by zero or more characters. (Typically the . matches any character except a new line.)
"(?= )"
This is a positive lookahead and where the regex starts to get tricky. It looks for whatever is inside this expression but doesn't include it in the text that's matched. It asserts that the #.* is followed by a certain string but doesn't replace that certain string.
"\\n|$"
The lookahead finds a new line or the end anchor. This will find a comment ended with a new line character or a comment that is at the end of the String. But again, since it's inside the lookahead, the new line doesn't get replaced.
So given the input:
String text = (
"Some Text" + '\n' +
"Some Text again #A comment" + '\n' +
"#A comment line" + '\n' +
"Another Text" + '\n' +
"Another Text again#Comment"
);
System.out.println(
text.replaceAll(" *(#.*(?=\\n|$))", "").replaceAll("\\n+", ";")
);
The output is:
Some Text;Some Text again;Another Text;Another Text again

readText = userInputTextArea.getText();
readText = readText.replaceAll("\\s*#[^\n]*", "");
readText = readText.replaceAll("\n+", ";");
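The line-by-line idea mentioned in the question (process every line, cut it at the first "#", skip lines that end up empty, join the rest with ";") can also be sketched without any comment regex. This is an alternative to the replaceAll answers, not a restatement of them; the class name HashCommentFilter is made up, and the \R line-break matcher assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

final class HashCommentFilter {
    static String stripComments(String text) {
        return Arrays.stream(text.split("\\R"))
                // Keep only the part of the line before the first '#'.
                .map(line -> {
                    int hash = line.indexOf('#');
                    return hash >= 0 ? line.substring(0, hash) : line;
                })
                .map(String::trim)
                // Lines that were empty or held only a comment are dropped.
                .filter(line -> !line.isEmpty())
                .collect(Collectors.joining(";"));
    }

    public static void main(String[] args) {
        String text = "Some Text\nSome Text again #A comment\n#A comment line\n"
                + "Another Text\nAnother Text again#Comment";
        System.out.println(stripComments(text));
    }
}
```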

Just to make it clear, Coxer's reply is the way to go. Far more precise and clean. But in any case, if you fancy experimenting here is a recursive solution that will work:
public class IgnoreHash {
@Test
public void test() {
String readTextAllInALine = "Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment;";
String actualResult = removeHashComments(readTextAllInALine);
Assert.assertEquals(actualResult, "Some Text;Some Text again ;Another Text;Another Text again");
}
private String removeHashComments(String input) {
StringBuffer result = new StringBuffer();
int hashIndex = input.indexOf("#");
int endIndex = input.indexOf(";");
if(hashIndex != -1){
result.append(input.substring(0, hashIndex));
//first line
if(hashIndex < endIndex ) {
result.append(removeHashComments(input.substring(endIndex)));
} // the case of ;#
else if (endIndex == hashIndex-1) {
int endIndex2 = input.indexOf(";", hashIndex+1);
result.append(removeHashComments(input.substring(endIndex2+1)));
}
else {
result.append(removeHashComments(input.substring(hashIndex)));
}
}
return result.toString();
}
}

How to remove spaces in between the String

I have the String below:
string = "Book Your Domain And Get\n \n\n \n \n \n Online Today."
string = str.replace("\\s","").trim();
which returns
str = "Book Your Domain And Get Online Today."
But what I want is
str = "Book Your Domain And Get Online Today."
I have tried many regular expressions and also googled, but had no luck, and I didn't find a related question. Please help. Many thanks in advance.

Use \\s+ instead of \\s as there are two or more consecutive whitespaces in your input.
string = str.replaceAll("\\s+"," ");
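One detail worth spelling out: String.replace treats its first argument as literal text, so replace("\\s", "") looks for a backslash followed by the letter s and leaves the question's string unchanged, whereas replaceAll interprets the argument as a regex. A minimal sketch (the class name WhitespaceCollapse and the added trim() are illustrative choices):

```java
final class WhitespaceCollapse {
    // Collapses every run of whitespace, newlines included, to one space.
    static String collapse(String text) {
        return text.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String str = "Book Your Domain And Get\n \n\n \n \n \n Online Today.";
        // Literal replacement: no backslash-s sequence exists, nothing changes.
        System.out.println(str.replace("\\s", "").equals(str));
        // Regex replacement: the whole gap becomes a single space.
        System.out.println(collapse(str));
    }
}
```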

You can use replaceAll which takes a regex as parameter. And it seems like you want to replace multiple spaces with a single space. You can do it like this:
string = str.replaceAll("\\s{2,}"," ");
It will replace 2 or more consecutive whitespaces with a single whitespace.

First get rid of multiple spaces:
String after = before.trim().replaceAll(" +", " ");

If you want to remove only the whitespace between two words or characters, and not at the ends of the string, here is the regex that I have used:
String s = " N OR 15 2 ";
Pattern pattern = Pattern.compile("[a-zA-Z0-9]\\s+[a-zA-Z0-9]");
Matcher m = pattern.matcher(s);
while (m.find()) {
    // The matched text: two characters with whitespace between them
    String matched = m.group();
    // Remove the whitespace inside the match and rescan the updated string
    s = s.replace(matched, matched.replaceAll("\\s+", ""));
    m = pattern.matcher(s);
}
System.out.println(s);
It removes only the spaces between characters or words, not the spaces at the ends, and the output is
" NOR152 " (the quotes only mark the spaces kept at both ends)
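The same effect can be sketched in a single replaceAll call with lookarounds, which match whitespace only when a non-whitespace character sits on both sides. This is a different technique from the loop above; the class name InnerWhitespace is hypothetical.

```java
final class InnerWhitespace {
    // Removes whitespace that has a non-whitespace character on both sides;
    // leading and trailing whitespace is left alone.
    static String removeInner(String text) {
        return text.replaceAll("(?<=\\S)\\s+(?=\\S)", "");
    }

    public static void main(String[] args) {
        // Brackets make the preserved outer spaces visible.
        System.out.println("[" + removeInner(" N OR 15 2 ") + "]");
    }
}
```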

Eg. to remove space between words in a string:
String example = "Interactive Resource";
System.out.println("Without space string: "+ example.replaceAll("\\s",""));
Output:
Without space string: InteractiveResource

If you want to print a String without space, just add the argument sep='' to the print function, since this argument's default value is " ". (This applies to Python's print function; Java's System.out.println has no sep parameter.)

// Use this to remove all the whitespace from a given string, for example a = " 1 2 3 4"
// output: 1234
a = a.replaceAll("\\s", "");

String s2=" 1 2 3 4 5 ";
String after=s2.replace(" ", "");
This works for me.

String string_a = "AAAA BBB";
String actualTooltip_3 = string_a.replaceAll("\\s{2,}"," ");
System.out.println(actualTooltip_3);
The output will be: AAAA BBB

Java - New Line Character Issue

I have a code like,
String str = " " ;
while( cond ) {
str = str + "\n" ;
}
Now, I don't know why the output string does not show the newline character when it is printed. However, when I add any other character, like ( str = str + "c"), it prints properly. Can anybody help me understand why this is happening and how to solve it?

The newline character is considered a control character, which doesn't print a special character to the screen by default.
As an example, try this:
String str = "Hi";
while (cond) {
str += "\n"; // Equivalent to str = str + "\n" in your code
}
str += "Bye";
System.out.println(str);

It looks like you are trying to run the above code on Windows. The line separator is different on Windows ("\r\n") and on Unix flavors ("\n").
So, instead of hard-coding "\n" as the new line, try getting the line separator from the system like this:
String newLine = System.getProperty("line.separator");
String str = " " ;
while( cond ) {
str = str + newLine ;
}
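To check that the line breaks really are in the string, it can help to print an escaped view of it. The sketch below uses System.lineSeparator() (available since Java 7), which returns the same value as the line.separator property; the class NewlineDemo and its two helper methods are made up for this illustration.

```java
final class NewlineDemo {
    // Appends the platform line separator 'count' times.
    static String appendNewlines(String base, int count) {
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < count; i++) {
            sb.append(System.lineSeparator());
        }
        return sb.toString();
    }

    // Makes carriage returns and line feeds visible as \r and \n.
    static String visualize(String text) {
        return text.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String str = appendNewlines("Hi", 3) + "Bye";
        System.out.println(str);            // "Hi" and "Bye" appear on separate lines
        System.out.println(visualize(str)); // the line breaks shown as escape sequences
    }
}
```

A StringBuilder is used here only because repeated string concatenation in a loop creates a new String on every pass.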

If you really want \n to get printed literally, do it like this:
String first = "C:/Mine/Java" + "\\n";
System.out.println(first);
The output is as follows:
C:/Mine/Java\n
For a good reference as to why this is happening, see the Java Tutorials.
As that tutorial puts it: a character preceded by a backslash is an escape sequence and has a special meaning to the compiler. When an escape sequence is encountered in a print statement, the compiler interprets it accordingly.
Hope this helps.
Regards

Based on your sample, the only reason it would not show a new line character is that cond is never true and thus the while loop never runs...
