I am trying to delete lines that start with . in my parsed text document. With the current code, no line that starts with a dot is deleted. How can I fix that? I have tried it without \\ but the result is the same.
Sample:
... ...
Code:
if (line.startsWith("\\.")) {
outputLine = line.replace(".", " ");
}
startsWith doesn't evaluate a regex, it just takes the string as is. As such, there's no need to escape the .:
if(line.startsWith(".")){
outputLine = line.replace(".", " ");
}
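Note that the question asks to delete such lines, while the snippet above only replaces the dots inside them. A minimal sketch of dropping every line that starts with a dot, assuming the parsed document is available as a list of lines (the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DotLineFilter {

    // Returns a new list without the lines that begin with a literal '.'.
    // startsWith compares plain characters, so the dot needs no escaping.
    static List<String> dropDotLines(List<String> lines) {
        List<String> kept = new ArrayList<>();
        for (String line : lines) {
            if (!line.startsWith(".")) {
                kept.add(line);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("first line", "... ...", ".hidden", "a.b");
        System.out.println(dropDotLines(input)); // [first line, a.b]
    }
}
```

A line such as a.b is kept because only the first character is tested.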
You can also use the indexOf function, like this:
if(line.indexOf(".") == 0)
{
outputLine = line.replace("."," ");
}
You could use String.charAt(0) to get the first character, and String.substring(1) to get the String minus its first character (this assumes the line is not empty, since charAt(0) throws on an empty String). Something like
String line = ".Hello";
if (line.charAt(0) == '.') {
line = line.substring(1);
}
System.out.println(line);
Output is
Hello
You might also omit the test and use a regular expression with String.replaceAll(String, String) like
String line = ".Hello";
line = line.replaceAll("^\\.", ""); // <-- starts with .
System.out.println(line);
for the same output.
I have a string with a space and I want that space replaced by "\_". For example, here is my code:
String example = "Bill Gates";
example = example.replaceAll(" ","\\_");
The result of example is "Bill_Gates", not "Bill\_Gates". When I try this instead:
String example = "Bill Gates";
example = example.replaceAll(" ","\\\\_");
The result of example is: "Bill\\_Gates" not "Bill\_Gates"
You need to use replaceAll(" ","\\\\_") instead of replaceAll(" ","\\_"). In Java source, "\\" is a string literal that compiles to a single backslash. When that single backslash reaches replaceAll as part of the replacement string, it is treated as an escape character for the following "_". You can see this if you look inside the replaceAll implementation:
while (cursor < replacement.length()) {
char nextChar = replacement.charAt(cursor);
if (nextChar == '\\') {
cursor++;
if (cursor == replacement.length())
throw new IllegalArgumentException(
"character to be escaped is missing");
nextChar = replacement.charAt(cursor);
result.append(nextChar);
cursor++;
When it finds a single backslash, it drops that backslash and appends the character that follows it literally. So you have to pass "\\\\_" to replaceAll, which reaches the method as \\_. The method sees the first backslash, appends the second backslash literally, and then appends the underscore.
Try:
String example = "Bill Gates";
example = example.replaceAll(" ","\\\\_");
System.out.println(example);
public static void main(String[] args) {
String example = "Bill Gates";
example = example.replaceAll(" ", "\\\\_");
System.out.println(example);
}
output
Bill\_Gates
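If no regular expression is needed, the escaping problem can be sidestepped altogether. A short sketch of two alternatives that should give the same result: String.replace, which treats both arguments literally, and Matcher.quoteReplacement, which escapes the replacement string for replaceAll. The class and method names are only for illustration.

```java
import java.util.regex.Matcher;

final class LiteralReplacementDemo {

    // String.replace does no regex or replacement-escape processing.
    static String withReplace(String s) {
        return s.replace(" ", "\\_");
    }

    // quoteReplacement escapes '\' and '$' so replaceAll inserts them verbatim.
    static String withQuoteReplacement(String s) {
        return s.replaceAll(" ", Matcher.quoteReplacement("\\_"));
    }

    public static void main(String[] args) {
        System.out.println(withReplace("Bill Gates"));          // Bill\_Gates
        System.out.println(withQuoteReplacement("Bill Gates")); // Bill\_Gates
    }
}
```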
I am facing a little difficulty with a Syntax highlighter that I've made and is 90% complete. What it does is that it reads in the text from the source of a .java file, detects keywords, comments, etc and writes a (colorful) output in an HTML file. Sample output from it is:
(I couldn't upload a whole html page, so this is a screenshot.) As (I hope) you can see, my program seems to work correctly with keywords, literals and comments (see below) and hence can normally document almost all programs. But it seems to break apart when I store the escape sequence for " i.e. \" inside a String. An error case is shown below:
The string literal highlighting doesn't stop at the end of the literal, but continues until it finds another cue, like a keyword or another literal.
So, the question is how do I disguise/hide/remove this \" from within a String?
The stringFilter method of my program is:
public String stringFilter(String line) {
if (line == null || line.equals("")) {
return "";
}
StringBuffer buf = new StringBuffer();
if (line.indexOf("\"") <= -1) {
return keywordFilter(line);
}
int start = 0;
int startStringIndex = -1;
int endStringIndex = -1;
int tempIndex;
//Keep moving through String characters until we want to stop...
while ((tempIndex = line.indexOf("\"")) > -1 && !isInsideString(line, tempIndex)) {
//We found the beginning of a string
if (startStringIndex == -1) {
startStringIndex = 0;
buf.append( stringFilter(line.substring(start,tempIndex)) );
buf.append("</font>");
buf.append(literal).append("\"");
line = line.substring(tempIndex+1);
}
//Must be at the end
else {
startStringIndex = -1;
endStringIndex = tempIndex;
buf.append(line.substring(0,endStringIndex+1));
buf.append("</font>");
buf.append(normal);
line = line.substring(endStringIndex+1);
}
}
buf.append( keywordFilter(line) );
return buf.toString();
}
EDIT
in response to the first few comments and answers, here's what I tried:
A snippet from htmlFilter(String), but it doesn't work :(
//replace '&' i.e. ampersands with HTML escape sequence for ampersand.
line = line.replaceAll("&", "&amp;");
//line = line.replaceAll(" ", " ");
line = line.replaceAll("" + (char)35, "&#35;");
// replace less-than signs which might be confused
// by HTML as tag angle-brackets;
line = line.replaceAll("<", "&lt;");
// replace greater-than signs which might be confused
// by HTML as tag angle-brackets;
line = line.replaceAll(">", "&gt;");
line = multiLineCommentFilter(line);
//replace the '\\' i.e. escape for backslash with HTML escape sequences.
//fixes a problem when backslashes preceed quotes.
//line = line.replaceAll("\\\"", "\"");
//line = line.replaceAll("" + (char)92 + (char)92, "\\");
return line;
My idea is that when a backslash is met, ignore the next character.
String str = "blah\"blah\\blah\n";
int index = 0;
while (true) {
// find the beginning
while (index < str.length() && str.charAt(index) != '\"')
index++;
int beginIndex = index;
if (index == str.length()) // no string found
break;
index++;
// find the ending
while (index < str.length()) {
if (str.charAt(index) == '\\') {
// escape, ignore the next character
index += 2;
} else if (str.charAt(index) == '\"') {
// end of string found
System.out.println(beginIndex + " " + index);
break;
} else {
// plain content
index++;
}
}
if (index >= str.length())
throw new IllegalArgumentException(
"String literal is not properly closed by a double-quote");
index++;
}
Check the character at tempIndex-1: if it is \ then don't treat the quote as the beginning or ending of a string.
String originalLine=line;
if ((tempIndex = originalLine.indexOf("\"", tempIndex + 1)) > -1) {
if (tempIndex==0 || originalLine.charAt(tempIndex - 1) != '\\') {
...
Steps to follow:
First replace all \" with some temp string such as
String tempStr="forward_slash_followed_by_double_quote";
line = line.replaceAll("\\\\\"", tempStr);
//line = line.replaceAll("\\\"", tempStr);
Then do whatever processing you are already doing.
Finally, replace that temp string with \"
line = line.replaceAll(tempStr, "\\\\\"");
//line = line.replaceAll(tempStr, "\\\"");
The trouble with finding a quote and then trying to work out whether it's escaped is that it's not enough to simply look at the previous character to see if it's a backslash - consider
String basedir = "C:\\Users\\";
where the \" isn't an escaped quote, but is actually an escaped backslash followed by an unescaped quote. In general a quote preceded by an odd number of backslashes is escaped, one preceded by an even number of backslashes isn't.
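The parity rule above can be sketched as a small helper that counts the backslashes immediately before a quote. The name isEscapedQuote is hypothetical, and the sketch deliberately ignores unicode escapes.

```java
final class QuoteEscapeCheck {

    // True if the character at quoteIndex is preceded by an odd number
    // of consecutive backslashes, i.e. the quote itself is escaped.
    static boolean isEscapedQuote(String source, int quoteIndex) {
        int backslashes = 0;
        for (int i = quoteIndex - 1; i >= 0 && source.charAt(i) == '\\'; i--) {
            backslashes++;
        }
        return backslashes % 2 == 1;
    }

    public static void main(String[] args) {
        String escaped = "say \\\"hi";        // one backslash before the quote
        String closing = "C:\\\\Users\\\\\""; // two backslashes before the quote
        System.out.println(isEscapedQuote(escaped, escaped.indexOf('"')));     // true
        System.out.println(isEscapedQuote(closing, closing.lastIndexOf('"'))); // false
    }
}
```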
A more sensible approach would be to parse through the string one character at a time from left to right rather than trying to jump ahead to quote characters. If you don't want to have to learn a proper parser generator like JavaCC or ANTLR, you can tackle this case with regular expressions using the \G anchor, which forces each subsequent match to start at the end of the previous one with no gaps. If we assume that str is a substring of your input starting with the character following the opening quote of a string literal, then
Pattern p = Pattern.compile("\\G(?:\\\\u[0-9A-Fa-f]{4}|\\\\.|[^\"\\\\])");
StringBuilder buf = new StringBuilder();
Matcher m = p.matcher(str);
while(m.find()) buf.append(m.group());
will leave buf containing the content of the string literal up to but not including the closing quote, and will handle escapes like \", \\ and unicode escapes \uNNNN.
Use double slash "\\"" instead of "\""... Maybe it works...
I'm having some difficulty excluding the part of each string after the "#" symbol.
Let me explain:
This is a sample input text a user could insert in a textbox:
Some Text
Some Text again #A comment
#A comment line
Another Text
Another Text again#Comment
I need to read this text and ignore all text after "#" symbol.
This should be the expected output:
Some Text;Some Text again;Another Text;Another Text again
As for now here's the code:
This replaces all newlines with ";"
readText = userInputTextArea.getText();
readTextAllInALine = readText.replaceAll("\\n", ";");
so the output after this is:
Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment
This code is meant to ignore all characters after a "#", but it only works for the first comment when the text is read sequentially.
int startIndex = inputCommandText.indexOf("#");
int endIndex = inputCommandText.indexOf(";");
String toBeReplaced = inputCommandText.substring(startIndex, endIndex);
readTextAllInALine.replace(toBeReplaced, "");
I'm stuck in finding a way for having the expected output. I was thinking of using a StringTokenizer, processing every line, removing text after "#" or ignoring the whole line if it starts with "#", and then printing all tokens (i.e. all lines) separating them with ";" but I cannot make it work.
Any help will be appreciated.
Thank you very much in advance.
Regards.
Just call this replace command on the plain string retrieved from the text input. The regex #[^;]* matches everything from the hash up to, but not including, the next semicolon, and replaceAll then replaces it with an empty string.
public static void main(String[] args) {
String text = "Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment";
System.out.println(text);
text = text.replaceAll("#[^;]*", "");
System.out.println(text);
}
A regex is useful here but it's tricky because your pattern is moderately complex. The comments are end-of-line comments, so they can appear in more than one arrangement.
I came up with the following which is a two-pass:
replaceAll(" *(#.*(?=\\n|$))", "").replaceAll("\\n+", ";");
The two-pass circumvents the fact that sometimes you get a duplicate line break. The first expression replaces comments but not new line characters and the second expression replaces multiple new line characters with a single semicolon.
The individual parts of the expression in the first pass are the following:
" *"
This includes zero or more leading spaces in the comment match, i.e. in "...again #A...", we want to remove the space between n and #.
"(#.* )"
The start of the comment match: matches a # followed by zero or more characters. (Typically the . matches any character except a new line.)
"(?= )"
This is a positive lookahead and where the regex starts to get tricky. It looks for whatever is inside this expression but doesn't include it in the text that's matched. It asserts that the #.* is followed by a certain string but doesn't replace that certain string.
"\\n|$"
The lookahead finds a new line or the end anchor. This will find a comment ended with a new line character or a comment that is at the end of the String. But again, since it's inside the lookahead, the new line doesn't get replaced.
So given the input:
String text = (
"Some Text" + '\n' +
"Some Text again #A comment" + '\n' +
"#A comment line" + '\n' +
"Another Text" + '\n' +
"Another Text again#Comment"
);
System.out.println(
text.replaceAll(" *(#.*(?=\\n|$))", "").replaceAll("\\n+", ";")
);
The output is:
Some Text;Some Text again;Another Text;Another Text again
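Since . does not match line terminators by default, the lookahead may be redundant for input that uses only \n as its line separator. A quick check, under that assumption, that a shorter first pass gives the same result on the sample (the class and method names are illustrative):

```java
final class CommentStripComparison {

    // Two-pass version from the answer above, with the lookahead.
    static String withLookahead(String text) {
        return text.replaceAll(" *(#.*(?=\\n|$))", "").replaceAll("\\n+", ";");
    }

    // Same idea without the lookahead: ".*" already stops at the line end.
    static String withoutLookahead(String text) {
        return text.replaceAll(" *#.*", "").replaceAll("\\n+", ";");
    }

    public static void main(String[] args) {
        String text = "Some Text\nSome Text again #A comment\n#A comment line\n"
                + "Another Text\nAnother Text again#Comment";
        System.out.println(withLookahead(text));
        System.out.println(withoutLookahead(text));
    }
}
```

Both calls should print the expected line for this input. Text with \r\n separators is outside what this sketch checks.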
readText = userInputTextArea.getText();
readText = readText.replaceAll("\\s*#[^\n]*", "");
readText = readText.replaceAll("\n+", ";");
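The line-by-line approach mentioned in the question can also be written without a regular expression for the comment part. A minimal sketch, assuming lines are separated by \n and that lines left empty after stripping should be skipped (the names are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

final class HashCommentStripper {

    // Cuts each line at the first '#', trims it, drops empty results,
    // and joins the remaining lines with ';'.
    static String strip(String text) {
        List<String> kept = new ArrayList<>();
        for (String line : text.split("\n")) {
            int hash = line.indexOf('#');
            String content = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!content.isEmpty()) {
                kept.add(content);
            }
        }
        return String.join(";", kept);
    }

    public static void main(String[] args) {
        String text = "Some Text\nSome Text again #A comment\n#A comment line\n"
                + "Another Text\nAnother Text again#Comment";
        System.out.println(strip(text));
        // Some Text;Some Text again;Another Text;Another Text again
    }
}
```

Trimming each line is a design choice here: it removes the space left before a trailing comment, at the cost of also removing any intentional leading or trailing whitespace.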
Just to make it clear, Coxer's reply is the way to go. Far more precise and clean. But in any case, if you fancy experimenting here is a recursive solution that will work:
public class IgnoreHash {
@Test
public void test() {
String readTextAllInALine = "Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment;";
String actualResult = removeHashComments(readTextAllInALine);
Assert.assertEquals(actualResult, "Some Text;Some Text again ;Another Text;Another Text again");
}
private String removeHashComments(String input) {
StringBuffer result = new StringBuffer();
int hashIndex = input.indexOf("#");
int endIndex = input.indexOf(";");
if(hashIndex != -1){
result.append(input.substring(0, hashIndex));
//first line
if(hashIndex < endIndex ) {
result.append(removeHashComments(input.substring(endIndex)));
} // the case of ;#
else if (endIndex == hashIndex-1) {
int endIndex2 = input.indexOf(";", hashIndex+1);
result.append(removeHashComments(input.substring(endIndex2+1)));
}
else {
result.append(removeHashComments(input.substring(hashIndex)));
}
}
return result.toString();
}
}
I need to find all multiline comments in a string and replace them with a space (if the comment is on one line) or with a \n (if the comment is on more than one line).
for example:
int/* one line comment */a;
should be changed to:
int a;
and this:
int/*
more
than one
line comment*/a;
should be changed to:
int
a;
I have one String with all the text and I used this command:
file = file.replaceAll("(/\\*([^*]|(\\*+[^*/]))*\\*+/)"," ");
where file is the string.
The problem is that it treats all multiline comments the same way, and I want to handle the two cases separately.
How can I do it?
This can be solved using Matcher.appendReplacement and Matcher.appendTail.
String file = "hello /* line 1 \n line 2 \n line 3 */"
+ "there /* line 4 */ world";
StringBuffer sb = new StringBuffer();
Matcher m = Pattern.compile("(?m)/\\*([^*]|(\\*+[^*/]))*\\*+/").matcher(file);
while (m.find()) {
// Find a comment
String toReplace = m.group();
// Figure out what to replace it with
String replacement = toReplace.contains("\n") ? "\n" : "";
// Perform the replacement.
m.appendReplacement(sb, replacement);
}
m.appendTail(sb);
System.out.println(sb);
Output:
hello
there world
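The question asks for a single space in place of a one-line comment, whereas the snippet above inserts an empty string. A sketch of that variant follows. It swaps in a simpler reluctant pattern in DOTALL mode rather than the expression from the answer, and it makes no attempt to skip comment markers that appear inside string literals. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BlockCommentReplacer {

    // (?s) lets '.' match newlines; ".*?" stops at the first "*/".
    private static final Pattern BLOCK_COMMENT = Pattern.compile("(?s)/\\*.*?\\*/");

    // Replaces a comment spanning several lines with "\n" and a
    // one-line comment with a single space.
    static String replaceComments(String source) {
        Matcher m = BLOCK_COMMENT.matcher(source);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = m.group().contains("\n") ? "\n" : " ";
            m.appendReplacement(sb, replacement);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceComments("int/* one line comment */a;"));
        System.out.println(replaceComments("int/*\nmore\nthan one\nline comment*/a;"));
    }
}
```

For the two inputs from the question this should print int a; on one line, then int and a; on separate lines.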
Note: If you want to preserve correct line number / columns for all text that is not inside comments (good if you want to refer back to the source code in error messages etc) I would recommend doing
String replacement = toReplace.replaceAll("\\S", " ");
which replaces all non-whitespace with white space. This way \n is preserved, and
"/* abc */"
is replaced by
" "
I have a code like,
String str = " " ;
while( cond ) {
str = str + "\n" ;
}
Now, I don't know why the output string does not show the newline character when printed. However, when I add any other character, like ( str = str + "c"), it prints properly. Can anybody help me understand why this is happening and how to solve it?
The newline character is considered a control character: it does not print a visible symbol on the screen, it just moves the output to the next line.
As an example, try this:
String str = "Hi";
while (cond) {
str += "\n"; // Equivalent in effect to your code
}
str += "Bye";
System.out.println(str);
Looks like you are trying to run the above code on Windows. Well the line separator or new line is different on Windows ( '\r\n' ) and Unix flavors ('\n').
So, instead of hard-coding '\n' as the new line, try getting the line separator from the system, like:
String newLine = System.getProperty("line.separator");
String str = " " ;
while( cond ) {
str = str + newLine ;
}
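On Java 7 and later, System.lineSeparator() returns the platform line separator directly, and a StringBuilder avoids creating a new String on every iteration. A small sketch, where the loop count stands in for the unspecified cond and the class name is made up:

```java
final class NewlineBuilder {

    // Appends the platform line separator 'count' times between two words.
    static String build(int count) {
        StringBuilder sb = new StringBuilder("Hi");
        for (int i = 0; i < count; i++) {
            sb.append(System.lineSeparator());
        }
        return sb.append("Bye").toString();
    }

    public static void main(String[] args) {
        // With count = 2, Hi and Bye are separated by one blank line.
        System.out.println(build(2));
    }
}
```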
If you really want the characters \n to be printed literally, escape the backslash like this:
String first = "C:/Mine/Java" + "\\n";
System.out.println(first);
The output is as follows:
C:/Mine/Java\n
For a good reference as to why this is happening, see the Java Tutorials.
As stated in that tutorial: A character preceded by a backslash is an escape sequence, and has a special meaning to the compiler. When an escape sequence is encountered in a print statement, the compiler interprets it accordingly
Hope this might help.
Regards
Based on your sample, the only reason it would not show a new line character is that cond is never true and thus the while loop never runs...