Java: Line breaks not working

I'm writing a simple program that writes data to a selected file.
Everything works except the line breaks (\n): the string is written to the file, but without line breaks.
I've tried \n and \n\r, but nothing changed.
The program:
public void prepare(){
String content = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n\r<data>\n\r<user><username>root</username><password>root</password></user>\n\r</data>";
FileOutputStream fos = null;
try {
fos = new FileOutputStream(file);
} catch (FileNotFoundException ex) {
System.out.println("File Not Found .. prepare()");
}
byte b[] = content.getBytes();
try {
fos.write(b);
fos.close();
} catch (IOException ex) {
System.out.println("IOException .. prepare()");
}
}
public static void main(String args[]){
File f = new File("D:\\test.xml");
Database data = new Database(f);
data.prepare();
}

Line endings for Windows follow the form \r\n, not \n\r. However, you may want to use platform-dependent line endings. To determine the standard line endings for the current platform, you can use:
System.lineSeparator()
...if you are running Java 7 or later. On earlier versions, use:
System.getProperty("line.separator")
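A minimal sketch of that fix, assuming the goal is a file whose lines are joined with the platform's own separator. The class and method names (LineEndingDemo, buildXml) and the temporary file are made up for illustration and are not part of the original program.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of the question's Database class.
final class LineEndingDemo {

    // Joins the XML lines with the given line ending.
    static String buildXml(String eol) {
        return String.join(eol,
                "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>",
                "<data>",
                "<user><username>root</username><password>root</password></user>",
                "</data>");
    }

    public static void main(String[] args) throws IOException {
        // Platform default: "\r\n" on Windows, "\n" on Linux and macOS.
        String content = buildXml(System.lineSeparator());
        // A temporary file stands in for D:\test.xml (assumption).
        Path target = Files.createTempFile("test", ".xml");
        // Encode explicitly as UTF-8 to match the XML declaration.
        Files.write(target, content.getBytes(StandardCharsets.UTF_8));
        System.out.println("Wrote " + target);
    }
}
```

To force Windows line endings on every platform, pass "\r\n" to buildXml instead of System.lineSeparator().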

My guess is that you're using Windows. Write \r\n instead of \n\r - as \r\n is the linebreak on Windows.
I'm sure you'll find that the characters you're writing into the file are there - but you need to understand that different platforms use different default line breaks... and different clients will handle things differently. (Notepad on Windows only understands \r\n, other text editors may be smarter.)

The correct line break sequence on Windows is \r\n, not \n\r.
Also, your viewer may interpret them differently. For example, Notepad will only display CRLF line breaks, but Write or Word have no problem displaying CR or LF alone.
You should use System.lineSeparator() if you want the line break sequence for the current platform. You are right to write them explicitly if you want to force a particular line break format regardless of the current platform.

Related

Java BufferedWriter Creating Null Characters

I've been using Java's BufferedWriter to write to a file to parse out some input. When I open the file afterwards, however, there seem to be added null characters. I tried specifying the encoding as "US-ASCII" and "UTF8" but I get the same result. Here's my code snippet:
Scanner fileScanner = new Scanner(original);
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), "US-ASCII"));
while(fileScanner.hasNextLine())
{
String next = fileScanner.nextLine();
next = next.replaceAll(".*\\x0C", ""); //remove up to ^L
out.write(next);
out.newLine();
}
out.flush();
out.close();
Maybe the issue isn't even with the BufferedWriter?
I've narrowed it down to this code block because if I comment it out, there are no null-characters in the output file. If I do a regex replace in VIM the file is null-character free (:%s/.*^L//g).
Let me know if you need more information.
Thanks!
EDIT:
hexdump of a normal line looks like:
0000000 5349 2a41 3030 202a
But when this code is run the hexdump looks like:
0000000 5330 2a49 4130 202a
I'm not sure why things are getting mixed up.
EDIT:
Also, even if the file doesn't match the regex and runs through that block of code, it comes out with null characters.
EDIT:
Here's a hexdump of the first few lines of a diff:
http://pastie.org/pastes/8964701/text
command was: diff -y testfile.hexdump expectedoutput.hexdump
The rest of the lines are different like the last two.
EDIT: Looking at the hexdump diff you gave, the only difference is that one has LF line endings (0A) and the other has CRLF line endings (0D 0A). All the other data in your diff is shifted ahead to accommodate the extra byte.
The CRLF is the default line ending on the OS you're using. If you want a specific line ending in your output, write the string "\n" or "\r\n".
Previously I noted that the Scanner doesn't specify a charset. It should specify the appropriate one that the input is known to be encoded in. However, this isn't the source of the unexpected output.
Scanner.nextLine() is eating the existing line endings.
The javadoc for nextLine states:
This method returns the rest of the current line, excluding any line separator at the end.
The javadoc for BufferedWriter.newLine explains:
Writes a line separator. The line separator string is defined by the system property line.separator, and is not necessarily a single newline ('\n') character.
In your case, your system's default newline separator is "\n". The EDI file you are parsing uses "\r\n".
Using the system-defined newline separator isn't appropriate in this case. The newline separator is dictated by the file format and should be kept in a format-specific static constant somewhere.
Change "out.newLine();" to "out.write("\r\n");"
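A small sketch of that change, assuming the input uses "\r\n" as the answer states. To keep it self-contained it copies from a String to a String rather than between files; the class name EdiCopyDemo, the constant EDI_EOL and the sample data are invented for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Scanner;

// Hypothetical demo class, not from the question.
final class EdiCopyDemo {

    // Line ending dictated by the file format, not by the platform.
    static final String EDI_EOL = "\r\n";

    static String copyLines(String input) {
        StringWriter sink = new StringWriter();
        try (Scanner in = new Scanner(new StringReader(input));
             BufferedWriter out = new BufferedWriter(sink)) {
            while (in.hasNextLine()) {
                // nextLine() returns the line without its separator.
                String next = in.nextLine().replaceAll(".*\\x0C", ""); // remove up to ^L
                out.write(next);
                out.write(EDI_EOL); // instead of out.newLine()
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }

    public static void main(String[] args) {
        String copied = copyLines("ISA*00*\r\nGS*PO*\r\n");
        System.out.println(copied.equals("ISA*00*\r\nGS*PO*\r\n")); // prints "true"
    }
}
```

Because the separator is written explicitly, the output is the same on every platform.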
I think what is going on is the following.
All lines that contain ^L (ff) get modified to remove everything up to and including the ^L, but in addition you have the side effect described in observation 1 below: every \r (cr) also gets removed. Moreover, if a cr appears before the ^L, nextLine() treats that cr as a line break too. Note how, in the files below, the input has 6 line separators (counting cr nl as one) and the output also has 6, but they are all nl. The c is therefore preserved, because it is treated as being on a different line from the ^L. That is probably not what you want. See below.
Some observations
1. The source file was generated on a system that uses \r\n to mark a new line, and your program runs on a system that does not. Because of this, all occurrences of 0x0d are going to be removed. This makes the two files different sizes even if there are no ^L characters.
2. You probably overlooked #1 because vim operates in DOS mode (recognizing \r\n as a newline separator) or non-DOS mode (only \n) depending on what it reads when it opens the file, and hides that from the user if it can. In fact, to test this I had to force in the \r characters with ^v^m, because I was editing on Linux with vim.
3. You are probably testing with od -x (for hex, right?). But that outputs two-byte shorts, which is not what you want. Consider the following input file, and the output file after your program runs, as viewed in vi.
Input file
a
b^M
c^M^M ^L
d^L
Output file
a
b
c
Well, maybe that's right. Let's see what od has to say.
od -x of input File
0a61 0d62 630a 0d0d 0c20 640a 0a0c
od -x of output File
0a61 0a62 0a63 0a0a 000a
Huh, where did that null come from? But wait, from the man page of od:
-t type Specify the output format. type is a string containing one or more of the following kinds of type specifiers:
a Named characters (ASCII). Control characters are displayed using the following names:
-h, -x Output hexadecimal shorts. Equivalent to -t x2.
-a Output named characters. Equivalent to -t a.
Oh, ok so instead use the -a option
od -a of input
a nl b cr nl c cr cr sp ff nl d ff nl
od -a of output
a nl b nl c nl nl nl nl
Forcing java to ignore \r
And finally, all that being said, you really have to overcome Java's implicit assumption that \r delimits a line, even where that contradicts the documentation. Even when the scanner is explicitly given a pattern that ignores \r, it still behaves contrary to the documentation, and you must override that again by setting the delimiter (see below). I've found that the following will probably do what you want by insisting on Unix line semantics. I also added some logic to avoid writing blank lines.
public static void repl(File original,File file) throws IOException
{
Scanner fileScanner = new Scanner(original);
Pattern pattern1 = Pattern.compile("(?d).*");
fileScanner.useDelimiter("(?d)\\n");
BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file), "UTF8"));
while(fileScanner.hasNext(pattern1))
{
String next = fileScanner.next(pattern1);
next = next.replaceAll("(?d)(.*\\x0C)|(\\x0D)","");
if(next.length() != 0)
{
out.write(next);
out.newLine();
}
}
out.flush();
out.close();
}
With this change, the output above changes to:
od -a of input
a nl b cr nl c cr cr sp ff nl d ff nl
od -a of output
a nl b nl
Stuart Caie provided the answer. This is for anyone looking for code that avoids these characters.
The basic issue is that the original file uses one line separator and the new file uses a different one.
One easy way to fix this is to find the original file's separator and use the same one in the new file.
try(BufferedWriter out = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(file)));
Scanner fileScanner = new Scanner(original);) {
String lineSep = null;
boolean lineSepFound = false;
while(fileScanner.hasNextLine())
{
if (!lineSepFound){
MatchResult matchResult = fileScanner.match();
if (matchResult != null){
lineSep = matchResult.group(1);
if (lineSep != null){
lineSepFound = true;
}
}
}else{
out.write(lineSep);
}
String next = fileScanner.nextLine();
next = next.replaceAll(".*\\x0C", ""); //remove up to ^L
out.write(next);
}
} catch ( IOException e) {
e.printStackTrace();
}
Note: MatchResult matchResult = fileScanner.match(); returns the match result for the last match performed. In our case that was hasNextLine(), for which Scanner uses its linePattern to find the next line. The Scanner.hasNextLine source code finds the line separator,
but unfortunately offers no way to get it back. So I have used their code to get lineSep only once, and used that lineSep when creating the new file.
Also, your code would leave an extra line separator at the end of the file. That is corrected here.
Let me know if that works.

Special characters when run in netbeans are showing correctly, but when running "jar" file strange characters appear

Seems like a simple problem, but even after searching forum and web I could not find an answer.
When I run my program in NetBeans, all the special characters like ä, ö, ü are shown correctly. But when I run the "jar" file of the same project (I did a clean and rebuild), strange characters such as #A &$ and so on appear instead of the correct characters.
Any help would be appreciated.
//edited 22. 08. 2012 00:46
I thought the solution would be easier so I didn't post any code or details. Ok then:
//input file is in UTF-8
try {
BufferedReader in = new BufferedReader(new FileReader("fin.dir"));
String line;
while ((line = in.readLine()) != null) {
processLine(line, 0);
}
in.close();
} catch (FileNotFoundException ex) {
System.out.println(ex.getMessage());
} catch (IOException ex) {
System.out.println(ex.getMessage());
}
I am displaying characters in this way:
JOptionPane.showMessageDialog(rootPane, "Correct!\n\n"
+ testingFin.getWord(), "Congrats", 1);
From the description of FileReader:
Convenience class for reading character files. The constructors of
this class assume that the default character encoding and the default
byte-buffer size are appropriate. To specify these values yourself,
construct an InputStreamReader on a FileInputStream.
If you're on Windows, the default encoding is usually a legacy code page such as windows-1252 rather than UTF-8 (on Java versions before 18), so as Jon commented, the encoding problem is occurring on input. Try this:
in = new BufferedReader(
new InputStreamReader(new FileInputStream("fin.dir"),"UTF-8"));
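To see why the charset matters, here is a rough sketch that decodes the same UTF-8 bytes twice. It reads from an in-memory byte array instead of fin.dir so that it runs anywhere, and it uses ISO-8859-1 as a stand-in for a single-byte platform default. The class name CharsetReadDemo and the sample text are assumptions for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not from the question.
final class CharsetReadDemo {

    // Reads the first line of the given bytes, decoding with an explicit charset.
    static String firstLine(byte[] fileBytes, Charset charset) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(fileBytes), charset))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "äöü" written with Unicode escapes so the source file encoding does not matter.
        byte[] utf8 = "\u00e4\u00f6\u00fc".getBytes(StandardCharsets.UTF_8);
        // Correct charset: three characters come back.
        System.out.println(firstLine(utf8, StandardCharsets.UTF_8).length());      // prints 3
        // Wrong single-byte charset: each two-byte sequence becomes two characters.
        System.out.println(firstLine(utf8, StandardCharsets.ISO_8859_1).length()); // prints 6
    }
}
```

Passing the charset explicitly makes the result independent of the environment the jar is started in.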
Add this setting to YOURNETBEANS/etc/netbeans.conf, like this:
-J-Dfile.encoding=UTF-8

How to preserve correct offset of string which is read from a file

I have a text.txt file which contains the following text.
Kontagent Announces Partnership with Global Latino Social Network Quepasa
Released By Kontagent
I read this text file into a string, documentText.
documentText.substring(0,9) gives Kontagent, which is good.
But documentText.substring(87,96) gives y Kontage on Windows (IntelliJ IDEA) and Kontagent in a Unix environment. I am guessing this happens because of the blank line in the file (after which the offset gets thrown off). But I cannot understand why I get two different results. I need to get the same result in both environments.
To read the file as a string, I tried all the functions discussed here:
How do I create a Java string from the contents of a file? But I still get the same results whichever function I use.
Currently I am using this function to read the file into the documentText string:
public static String readFileAsString(String fileName)
{
File file = new File(fileName);
StringBuilder fileContents = new StringBuilder((int)file.length());
Scanner scanner = null;
try {
scanner = new Scanner(file);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
String lineSeparator = System.getProperty("line.separator");
try {
while(scanner.hasNextLine()) {
fileContents.append(scanner.nextLine() + lineSeparator);
}
return fileContents.toString();
} finally {
scanner.close();
}
}
EDIT: Is there a way to write a general function that works in both Windows and Unix environments, even if the file is copied in text mode?
Unfortunately, I cannot guarantee that everyone working on this project will always copy files in binary mode.
The Unix file probably uses the native Unix EOL char: \n, whereas the Windows file uses the native Windows EOL sequence: \r\n. Since you have two EOLs in your file, there is a difference of 2 chars. Make sure to use a binary file transfer, and all the bytes will be preserved, and everything will run the same way on both OSes.
EDIT: in fact, you are the one who appends an OS-specific EOL (System.getProperty("line.separator")) at the end of each line. Just read the file as a char array using a Reader, and everything will be fine. Or use Guava's method, which does it for you:
String s = CharStreams.toString(new FileReader(fileName));
On Windows, a newline character \n is preceded by \r, a carriage return character. This does not exist on Linux. Transferring the file from one operating system to the other will not strip or append such characters, but occasionally text editors will convert them for you.
Because your file does not include \r characters (presumably it was transferred straight from Linux), System.getProperty("line.separator") returns \r\n on Windows and your code adds \r characters that are not in the file. This is why your output is 2 characters behind.
Good luck!
Based on the input you provided, I wrote something like this:
documentText = CharStreams.toString(new FileReader("text.txt"));
documentText = this.documentText.replaceAll("\\r","");
to strip any extra \r if the file has them.
Now I am getting the expected result on Windows as well as Unix. Problem solved!
It works regardless of the mode in which the file was copied.
:) I wish I could choose both of your answers, but Stack Overflow doesn't allow it.
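The same idea can be sketched without Guava. This is only an illustration under stated assumptions: the file is UTF-8, there is one blank line between the two lines of text (as the question describes), and line endings are converted to \n rather than having every \r stripped. The names OffsetDemo, normalizeEol and readNormalized are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not from the question.
final class OffsetDemo {

    // Converts "\r\n" and any lone "\r" to "\n" so offsets match on every platform.
    static String normalizeEol(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    // Reads the whole file (assumed UTF-8) and normalizes its line endings.
    static String readNormalized(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);
        return normalizeEol(new String(bytes, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String headline =
                "Kontagent Announces Partnership with Global Latino Social Network Quepasa";
        // The same document as it would look after a text-mode copy to each platform.
        String windowsCopy = headline + "\r\n\r\nReleased By Kontagent\r\n";
        String unixCopy = headline + "\n\nReleased By Kontagent\n";

        System.out.println(windowsCopy.substring(87, 96));               // prints "y Kontage"
        System.out.println(normalizeEol(windowsCopy).substring(87, 96)); // prints "Kontagent"
        System.out.println(normalizeEol(unixCopy).substring(87, 96));    // prints "Kontagent"
    }
}
```

After normalization both copies are the same string, so any offset computed on one platform is valid on the other.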

Java: how to write formatted output to plain text file

I am developing a small Java application. At some point I write some data to a plain text file, using the following code:
Writer Candidateoutput = null;
File Candidatefile = new File("Candidates.txt");
Candidateoutput = new BufferedWriter(new FileWriter(Candidatefile));
Candidateoutput.write("\n Write this text on next line");
Candidateoutput.write("\t This is indented text");
Candidateoutput.close();
Everything goes fine and the file is created with the expected text. The only problem is that the text is not formatted: all of it is on a single line. But if I copy and paste the text into MS Word, it is formatted automatically.
Is there any way to preserve text formatting in a plain text file as well?
Note: By text formatting I am referring to \n and \t only.
Use System.getProperty("line.separator") for new lines. This is the platform-independent way of getting the newline separator (on Windows it is \r\n, on Linux it is \n).
Also, if this is going to be run on non-Windows machines, avoid using \t and use four spaces instead.
You can use line.separator system property to solve your issue.
E.g.
String separator = System.getProperty("line.separator");
Writer Candidateoutput = null;
File Candidatefile = new File("Candidates.txt");
Candidateoutput = new BufferedWriter(new FileWriter(Candidatefile));
Candidateoutput.write(separator + " Write this text on next line");
Candidateoutput.write("\t This is indented text");
Candidateoutput.close();
The line.separator system property is a platform-independent way of getting a newline from your environment.
A PrintWriter handles this in a platform-independent way: use its println() methods.
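A brief sketch of the PrintWriter suggestion. It writes into a StringWriter rather than Candidates.txt so the result can be inspected directly; the class name PrintWriterDemo and the sample text are assumptions.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

// Hypothetical demo class, not from the question.
final class PrintWriterDemo {

    // Writes two lines; println() appends the platform line separator after each.
    static String render() {
        StringWriter buffer = new StringWriter();
        try (PrintWriter out = new PrintWriter(buffer)) {
            out.println("Write this text on its own line");
            out.println("\tThis is indented text");
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.print(render());
    }
}
```

To write to a file instead, the PrintWriter could wrap a FileWriter or BufferedWriter in the same way.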
You would have to use the Java utility Formatter which can be found here: java.util.Formatter
Then all you would have to do is create an object of Formatter type such as this:
private Formatter output;
In this case, output will be the output file you are writing to.
Then you have to pass the file name to the output object like this:
output = new Formatter("name.of.your.file.txt");
Once that's done, you can either hard-code the file contents to your output file using the output.format command which is similar to the System.out.println or printf commands.
Or use the Scanner utility to input the data into memory then use output.format to output this data to the output object or file.
This is an example on how to write a record to output:
output.format( "%d %s %s %2f\n" , field1.decimal, field2.string, field3.string, field4.double)
There is a little bit more to it than this, but this sure beats parsing data, or using a bunch of complicated third party plugins.
To read this file you would redirect the Scanner utility to read a file instead of the console:
input = new Scanner(new File("name.of.your.file.txt"));
Windows Notepad needs \r\n to display a new line correctly. A lone \n is ignored by Notepad.
Windows expects a carriage return followed by a newline character to indicate a new line. So you'd want to write \r\n to make it work.

How to print the text file without boxes in java?

I have a text file that I'm trying to print, but it prints boxes between every two characters. My code works fine for all text files except this particular one. I cannot copy and paste this box character, so I cannot check for it with an if condition and skip printing it. Please help. Thanks
Without a sample of the text you are trying to print, I think that it might be an issue with encoding. Here is a list of encodings supported by the java language. You might then want to do something like this:
Charset charset = Charset.forName("US-ASCII");
String s = ...;
try (BufferedWriter writer = Files.newBufferedWriter(file, charset)) {
writer.write(s, 0, s.length());
} catch (IOException x) {
System.err.format("IOException: %s%n", x);
}
(Example taken from here.)
My guess is that your document is UTF-16 encoded. Try to reencode it to UTF-8 or ASCII.
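If the UTF-16 guess is right, the boxes would be the zero bytes that UTF-16 stores alongside each ASCII character. A small sketch of that effect follows, using an in-memory byte array in place of the real file; the class name Utf16Demo and the sample text are assumptions, and little-endian UTF-16 without a byte order mark is assumed.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not from the question.
final class Utf16Demo {

    // Decodes raw file bytes with the given charset.
    static String decode(byte[] fileBytes, Charset charset) {
        return new String(fileBytes, charset);
    }

    public static void main(String[] args) {
        // "Hi" in UTF-16LE is the four bytes 48 00 69 00.
        byte[] fileBytes = "Hi".getBytes(StandardCharsets.UTF_16LE);

        // Decoded with a single-byte charset, every 00 byte becomes a NUL character,
        // which many consoles and editors draw as a box.
        String wrong = decode(fileBytes, StandardCharsets.ISO_8859_1);
        // Decoded with the matching charset, the text comes back intact.
        String right = decode(fileBytes, StandardCharsets.UTF_16LE);

        System.out.println(wrong.length()); // prints 4
        System.out.println(right);          // prints "Hi"
    }
}
```

Reading the real file with a Reader constructed for the matching UTF-16 charset should remove the boxes in the same way.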
