Removing new line from SQL record via Java

I am reading and manipulating an MS SQL table using JPA. Some cells contain text that I want to clean up by removing line breaks. Below is one sample of the text (pasting it into Notepad++ shows **CR LF** at the end of each line):
(
(NVSM in (1,2)) and
(NISFVSM in (1,2)) and
(TRMBVSM = 0)
)
I have tried the following code, but I cannot get rid of the newlines:
flatTextString.trim()
.replace(System.getProperty("line.separator"), " ")
.replaceAll("\t", "")
.replaceAll("(\\r|\\n)", "")
.replaceAll("\\s{2,}", " ")
;
How can I fix this?

I suggest replacing every run of carriage returns and newlines with a single space, and then removing all whitespace before a closing ) and after an opening (:
String flatTextString = "(\r\n(NVSM in (1,2)) and\r\n(NISFVSM in (1,2)) and\r\n(TRMBVSM = 0)\r\n)";
System.out.println(flatTextString.replaceAll("[\r\n]+", " ").replaceAll("\\s+\\)", ")").replaceAll("\\(\\s+", "(")); // My way
// => ((NVSM in (1,2)) and (NISFVSM in (1,2)) and (TRMBVSM = 0))
As Java regex cannot use conditional replacement patterns, you can only chain replaceAll methods as I have shown in the code snippet above.
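If Java 8 or later is available, the linebreak matcher \R covers CR LF, a lone CR, and a lone LF in one token. The following is a minimal sketch of the same chained replacements packaged as a method; the class and method names are made up for illustration. Keep in mind that String is immutable, so the result of the chain has to be assigned or returned, otherwise the original text is left unchanged.

```java
class LineBreakCleaner {

    // Hypothetical helper: collapses line breaks into single spaces and
    // removes the whitespace left next to parentheses.
    static String flatten(String text) {
        return text.trim()
                .replaceAll("\\R+", " ")      // any run of line breaks -> one space
                .replaceAll("\\(\\s+", "(")   // no whitespace after an opening (
                .replaceAll("\\s+\\)", ")");  // no whitespace before a closing )
    }

    public static void main(String[] args) {
        String cell = "(\r\n(NVSM in (1,2)) and\r\n(NISFVSM in (1,2)) and\r\n(TRMBVSM = 0)\r\n)";
        // The returned value must be used; calling flatten(cell) alone changes nothing.
        String cleaned = flatten(cell);
        System.out.println(cleaned);
    }
}
```

With JPA, the cleaned value would then be written back through the entity's setter; that part depends on the entity and is not shown here.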

Related

What's the correct usage of withEscapeChar in open CSV

I am using opencsv 2.3, and it does not appear to handle escape characters as I expect. I need to be able to handle an escaped separator in a CSV file that does not use quoting characters.
Sample test code:
CSVReader reader = new CSVReader(new FileReader("D:/Temp/test.csv"), ',', '"', '\\');
String[] nextLine;
while ((nextLine = reader.readNext()) != null) {
for (String string : nextLine) {
System.out.println("Field [" + string + "].");
}
}
and the csv file:
first field,second\,field
and the output:
Field [first field].
Field [second].
Field [field].
Note that if I change the csv to
first field,"second\,field"
then I get the output I am after:
Field [first field].
Field [second,field].
However, in my case I do not have the option of modifying the source CSV.
Unfortunately it looks like opencsv does not support escaping of separator characters unless they're in quotes. The following method (taken from opencsv's source) is called when an escape character is encountered.
protected boolean isNextCharacterEscapable(String nextLine, boolean inQuotes, int i) {
return inQuotes // we are in quotes, therefore there can be escaped quotes in here.
&& nextLine.length() > (i + 1) // there is indeed another character to check.
&& (nextLine.charAt(i + 1) == quotechar || nextLine.charAt(i + 1) == this.escape);
}
As you can see, this method only returns true when the parser is inside quotes and the character following the escape character is a quote character or another escape character. You could patch the library to do this, but in its current form it won't let you do what you're trying to do.
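If patching the library is not an option, one workaround is to split such lines by hand. The following is a minimal sketch and not part of opencsv; it assumes the file uses no quoting at all, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class EscapedSplitter {

    // Splits a line on the separator, treating escape + any character
    // as that literal character. Assumes the input has no quoted fields.
    static List<String> split(String line, char separator, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == escape && i + 1 < line.length()) {
                current.append(line.charAt(++i)); // keep the escaped character, drop the escape
            } else if (c == separator) {
                fields.add(current.toString());   // unescaped separator ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        for (String field : split("first field,second\\,field", ',', '\\')) {
            System.out.println("Field [" + field + "].");
        }
    }
}
```

Each line read from the file (for example with a BufferedReader) would be passed to split in place of reader.readNext().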

JAVA - Ignore part of strings containing "#"

I'm having some difficulty excluding the part of each string that follows the "#" symbol.
Let me explain:
This is a sample input text a user could insert in a textbox:
Some Text
Some Text again #A comment
#A comment line
Another Text
Another Text again#Comment
I need to read this text and ignore all text after "#" symbol.
This should be the expected output:
Some Text;Some Text again;Another Text;Another Text again
As for now here's the code:
This replaces all newlines with ";"
readText = userInputTextArea.getText();
readTextAllInALine = readText.replaceAll("\\n", ";");
so the output after this is:
Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment
This code is meant to ignore all characters after a "#", but it only works for the first comment when the text is read sequentially:
int startIndex = inputCommandText.indexOf("#");
int endIndex = inputCommandText.indexOf(";");
String toBeReplaced = inputCommandText.substring(startIndex, endIndex);
readTextAllInALine.replace(toBeReplaced, "");
I'm stuck in finding a way for having the expected output. I was thinking of using a StringTokenizer, processing every line, removing text after "#" or ignoring the whole line if it starts with "#", and then printing all tokens (i.e. all lines) separating them with ";" but I cannot make it work.
Any help will be appreciated.
Thank you very much in advance.
Regards.
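Just call this replace command on the single-line string you built from the text input. The regex #[^;]* matches from the hash up to, but not including, the next semicolon (or the end of the string), and the match is replaced with an empty string. Note that a line consisting only of a comment leaves an empty field (two adjacent semicolons) behind.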
public static void main(String[] args) {
String text = "Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment";
System.out.println(text);
text = text.replaceAll("#[^;]*", "");
System.out.println(text);
}
A regex is useful here, but it's tricky because the pattern is moderately complex. The comments are end-of-line comments, so they can appear in more than one arrangement.
I came up with the following which is a two-pass:
replaceAll(" *(#.*(?=\\n|$))", "").replaceAll("\\n+", ";");
The two-pass circumvents the fact that sometimes you get a duplicate line break. The first expression replaces comments but not new line characters and the second expression replaces multiple new line characters with a single semicolon.
The individual parts of the expression in the first pass are the following:
" *"
This includes zero or more leading spaces in the comment match, i.e. in "...again #A..." we want to remove the space between n and #.
"(#.* )"
The start of the comment match: matches a # followed by zero or more characters. (Typically the . matches any character except a new line.)
"(?= )"
This is a positive lookahead and where the regex starts to get tricky. It looks for whatever is inside this expression but doesn't include it in the text that's matched. It asserts that the #.* is followed by a certain string but doesn't replace that certain string.
"\\n|$"
The lookahead finds a new line or the end anchor. This will find a comment ended with a new line character or a comment that is at the end of the String. But again, since it's inside the lookahead, the new line doesn't get replaced.
So given the input:
String text = (
"Some Text" + '\n' +
"Some Text again #A comment" + '\n' +
"#A comment line" + '\n' +
"Another Text" + '\n' +
"Another Text again#Comment"
);
System.out.println(
text.replaceAll(" *(#.*(?=\\n|$))", "").replaceAll("\\n+", ";")
);
The output is:
Some Text;Some Text again;Another Text;Another Text again
readText = userInputTextArea.getText();
readText = readText.replaceAll("\\s*#[^\n]*", "");
readText = readText.replaceAll("\n+", ";");
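The line-by-line idea described in the question can also be written without a tokenizer or a lookahead. This is only a sketch under the assumption that "#" never appears inside the text that should be kept; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

class HashCommentStripper {

    // Cuts each line at the first '#', drops lines that end up empty,
    // and joins what is left with ';'.
    static String strip(String text) {
        List<String> kept = new ArrayList<>();
        for (String line : text.split("\\R")) {          // \R: any line break (Java 8+)
            int hash = line.indexOf('#');
            String content = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!content.isEmpty()) {
                kept.add(content);
            }
        }
        return String.join(";", kept);
    }

    public static void main(String[] args) {
        String text = "Some Text\nSome Text again #A comment\n#A comment line\n"
                + "Another Text\nAnother Text again#Comment";
        System.out.println(strip(text));
    }
}
```

Because empty lines are skipped before joining, no second pass is needed to collapse duplicate separators.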
Just to make it clear, Coxer's reply is the way to go. Far more precise and clean. But in any case, if you fancy experimenting here is a recursive solution that will work:
public class IgnoreHash {
@Test
public void test() {
String readTextAllInALine = "Some Text;Some Text again #A comment;#A comment line;Another Text;Another Text again#Comment;";
String actualResult = removeHashComments(readTextAllInALine);
Assert.assertEquals(actualResult, "Some Text;Some Text again ;Another Text;Another Text again");
}
private String removeHashComments(String input) {
StringBuffer result = new StringBuffer();
int hashIndex = input.indexOf("#");
int endIndex = input.indexOf(";");
if(hashIndex != -1){
result.append(input.substring(0, hashIndex));
//first line
if(hashIndex < endIndex ) {
result.append(removeHashComments(input.substring(endIndex)));
} // the case of ;#
else if (endIndex == hashIndex-1) {
int endIndex2 = input.indexOf(";", hashIndex+1);
result.append(removeHashComments(input.substring(endIndex2+1)));
}
else {
result.append(removeHashComments(input.substring(hashIndex)));
}
}
return result.toString();
}
}

Replace single quote with double quote with Regex

I have an app that received a malformed JSON string like this:
{'username' : 'xirby'}
I need to replace the single quotes ' with double quotes "
With these rules (I think):
A single quote comes after a { with one or more spaces
Comes before one or more spaces and :
Comes after a : with one or more spaces
Comes before one or more spaces and }
So this String {'username' : 'xirby'} or
{ 'username' : 'xirby' }
Would be transformed to:
{"username" : "xirby"}
Update:
Also a possible malformed JSON String:
{ 'message' : 'there's not much to say' }
In this example the single quote inside the message value should not be replaced.
Try this regex:
\s*\'\s*
and replacing each match with " will do the job.
Instead of doing this, you're better off using a JSON parser which can read such malformed JSON and "normalize" it for you. Jackson can do that:
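// Assumes Jackson 2.x package names
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.ObjectReader;
final ObjectReader reader = new ObjectMapper()
.configure(JsonParser.Feature.ALLOW_SINGLE_QUOTES, true)
.reader();
final JsonNode node = reader.readTree(yourMalformedJson);
// node.toString() now yields JSON with double quotes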
This regex will capture all appropriate single quotes and associated white spaces while ignoring single quotes inside a message. One can replace the captured characters with double quotes, while preserving the JSON format. It also generalizes to JSON strings with multiple messages (delimited by commas ,).
((?<={)\s*\'|(?<=,)\s*\'|\'\s*(?=:)|(?<=:)\s*\'|\'\s*(?=,)|\'\s*(?=}))
I know you tagged your question Java, but I'm more familiar with Python. Here's an example of how you can replace the single quotes with double quotes in Python:
import re
regex = re.compile('((?<={)\s*\'|(?<=,)\s*\'|\'\s*(?=:)|(?<=:)\s*\'|\'\s*(?=,)|\'\s*(?=}))')
s = "{ 'first_name' : 'Shaquille' , 'lastname' : 'O'Neal' }"
regex.sub('"', s)
> '{"first_name":"Shaquille","lastname":"O\'Neal"}'
This method looks for single quotes next to the symbols {},: using look-ahead and look-behind operations.
String test = "{'username' : 'xirby'}";
String replaced = test.replaceAll("'", "\"");
Since your question is tagged Java, I answered in Java.
First, import the classes:
import java.util.regex.Matcher;
import java.util.regex.Pattern;
Then:
Pattern p = Pattern.compile("((?<=(\\{|\\[|\\,|:))\\s*')|('\\s*(?=(\\}|(\\])|(\\,|:))))");
String s = "{ 'firstName' : 'Malus' , 'lastName' : ' Ms'Malus' , marks:[ ' A+ ', 'B+']}";
String replace = "\"";
String o;
Matcher m = p.matcher(s);
o = m.replaceAll(replace);
System.out.println(o);
Output:
{"firstName":"Malus","lastName":" Ms'Malus", marks:[" A+ ","B+"]}
If you're looking to exactly satisfy all of those conditions, try this:
'{(\s)?\'(.*)\'(\s)?:(\s)?\'(.*)\'(\s)?}'
as your regex. It uses (\s)? to match one or zero whitespace characters.
I recommend using a JSON parser instead of a regex.
String strJson = "{ 'username' : 'xirby' }";
strJson = new JSONObject(strJson).toString();
System.out.println(strJson);

how to suppress single and multi-line comments using a regexp?

I need to find all multiline comments in a string and replace each with a space (if the comment is on one line) or with a \n (if the comment spans more than one line).
for example:
int/* one line comment */a;
should be changed to:
int a;
and this:
int/*
more
than one
line comment*/a;
should be changed to:
int
a;
I have one String with all the text and I used this command:
file = file.replaceAll("(/\\*([^*]|(\\*+[^*/]))*\\*+/)"," ");
where file is the string.
The problem is that it replaces every comment the same way, and I want to handle the two cases separately.
How can I do it?
This can be solved using Matcher.appendReplacement and Matcher.appendTail.
String file = "hello /* line 1 \n line 2 \n line 3 */"
+ "there /* line 4 */ world";
StringBuffer sb = new StringBuffer();
Matcher m = Pattern.compile("(?m)/\\*([^*]|(\\*+[^*/]))*\\*+/").matcher(file);
while (m.find()) {
// Find a comment
String toReplace = m.group();
// Figure out what to replace it with
String replacement = toReplace.contains("\n") ? "\n" : "";
// Perform the replacement.
m.appendReplacement(sb, replacement);
}
m.appendTail(sb);
System.out.println(sb);
Output:
hello
there world
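The snippet above replaces a one-line comment with an empty string, while the question asks for a space in that case. A sketch of that variant, reusing the same pattern and the same appendReplacement / appendTail loop, might look like this; the class and method names are illustrative only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CommentSuppressor {

    // Same block-comment pattern as in the question.
    private static final Pattern BLOCK_COMMENT =
            Pattern.compile("/\\*([^*]|(\\*+[^*/]))*\\*+/");

    // Replaces a comment that spans several lines with "\n"
    // and a comment on a single line with " ".
    static String suppress(String source) {
        Matcher m = BLOCK_COMMENT.matcher(source);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String replacement = m.group().contains("\n") ? "\n" : " ";
            m.appendReplacement(sb, replacement);
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(suppress("int/* one line comment */a;"));
        System.out.println(suppress("int/*\nmore\nthan one\nline comment*/a;"));
    }
}
```

The replacement strings here contain no $ or backslash, so they can be passed to appendReplacement directly; arbitrary replacement text would need Matcher.quoteReplacement.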
Note: If you want to preserve correct line number / columns for all text that is not inside comments (good if you want to refer back to the source code in error messages etc) I would recommend doing
String replacement = toReplace.replaceAll("\\S", " ");
which replaces all non-whitespace with white space. This way \n is preserved, and
"/* abc */"
is replaced by
" "

Java - New Line Character Issue

I have a code like,
String str = " " ;
while( cond ) {
str = str + "\n" ;
}
Now, I don't know why the newline characters do not appear when I print the string. However, when I append any other character (e.g. str = str + "c"), it prints properly. Can anybody explain why this is happening and how to solve it?
The newline character is a control character: it moves the output to the next line instead of printing a visible character.
As an example, try this:
String str = "Hi";
while (cond) {
str += "\n"; // Semantically equivalent to your code
}
str += "Bye";
System.out.println(str);
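To check that the newlines really are in the string, it can help to make them visible before printing. The following is a small sketch; the helper names and the fixed loop count (standing in for cond) are assumptions made for the example.

```java
class NewlineDemo {

    // Builds "Hi", then n newline characters, then "Bye".
    static String build(int n) {
        StringBuilder sb = new StringBuilder("Hi");
        for (int i = 0; i < n; i++) {
            sb.append('\n');
        }
        return sb.append("Bye").toString();
    }

    // Replaces CR and LF with the two-character sequences \r and \n
    // so that they show up in the printed output.
    static String visible(String s) {
        return s.replace("\r", "\\r").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String str = build(2);
        System.out.println(str);            // printed across several lines
        System.out.println(visible(str));   // printed on one line with visible escapes
        System.out.println(str.length());   // the newlines count toward the length
    }
}
```

If the visible form shows no \n at all, the loop body never ran.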
It looks like you are running the above code on Windows. The line separator differs between Windows ("\r\n") and Unix flavors ("\n").
So, instead of hard-coding "\n" as the newline, try getting the line separator from the system:
String newLine = System.getProperty("line.separator");
String str = " " ;
while( cond ) {
str = str + newLine ;
}
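Since Java 7 the same value is also available through System.lineSeparator(), and the %n format specifier expands to it as well. A short sketch, with illustrative names:

```java
class LineSeparatorDemo {

    // Joins two pieces of text with the platform line separator.
    static String joinLines(String first, String second) {
        return first + System.lineSeparator() + second;
    }

    public static void main(String[] args) {
        System.out.println(joinLines("Hi", "Bye"));
        // %n in a format string is replaced by the platform line separator.
        System.out.printf("Hi%nBye%n");
    }
}
```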
If you really want the characters \n to be printed literally, escape the backslash like this:
String first = "C:/Mine/Java" + "\\n";
System.out.println(first);
The output is as follows:
C:/Mine/Java\n
For a good reference on why this happens, see the Java Tutorials.
As stated in that tutorial: "A character preceded by a backslash is an escape sequence, and has a special meaning to the compiler. When an escape sequence is encountered in a print statement, the compiler interprets it accordingly."
Hope this might help.
Regards
Based on your sample, the only reason it would not show a new line character is that cond is never true and thus the while loop never runs...
