XML FILE LINE BREAKS

I am trying to use an output writer in Java to create an XML file, but the generated XML file is all on one line, which is very difficult to read and debug. I tried inserting \n, but the reader I am using does not recognize \n, so I need some alternative to \n for adding line breaks that makes the file more readable and easier to debug.
What should I do?
Also, in Notepad, is there a way to add line breaks using Replace? E.g., search for > and replace with > + {Enter key}, i.e., a line break after every tag.

In Notepad++, you can search for
>
and replace with
>\r\n
Just select the "Extended" search mode first.

Depending on how you do it, the text might be escaped twice. Try replacing \n with \\n.
As for the Notepad part of the question, I am very sure that it is not possible. I recommend Notepad++ (http://notepad-plus-plus.org). It supports this and many other features, such as macros, converting between encodings, and search/replace by regexp.
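If the file is already well-formed XML, the line breaks can also be added from Java rather than in an editor. Below is a minimal sketch, assuming the one-line XML fits in memory as a String; the class name PrettyPrintSketch and the method indent are made up for illustration. It runs the JDK's identity transform with OutputKeys.INDENT switched on. How many spaces the serializer uses for indentation differs between JDK versions, so only the line breaks should be relied on.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class PrettyPrintSketch {

    // Re-serializes well-formed XML with line breaks between elements.
    // (Hypothetical helper, not part of any library.)
    static String indent(String oneLineXml) {
        try {
            // newTransformer() without a stylesheet is the identity transform.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(oneLineXml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Could not re-indent XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(indent("<a><b>1</b><c>2</c></a>"));
    }
}
```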

Related

SAX Parser, how to dynamically ignore & in input xml file for SAXParser.parse

I have a scenario where I get an XML from another service and I parse this file and render it to another file.
But sometimes we get an unescaped & inside a tag in the input file, and when we try to parse this file we get a SAXException.
Is there a way we can dynamically replace &, or we can ignore the & sign while parsing?

After doing a bit of research I have come up with following points:
The SAX parser needs a clean, well-formed XML file; otherwise it fails, and we cannot change characters dynamically while parsing. So we need to check the input XML file beforehand.
To change characters in the input with ease, use "StringEscapeUtils.escapeXml", provided by Apache in the "org.apache.commons.lang.StringEscapeUtils" class. But this too has its downside, as it will escape all occurrences of the special characters, including the ones that form the markup itself. For reference you can check this blog: "http://javarevisited.blogspot.com/2012/09/how-to-replace-escape-xml-special-characters-java-string.html"
But my use case was different: I needed only one particular character to be deleted from the input file. For that I had to code from scratch: read the file, look for the character to delete, delete it, and write the result back to the file.
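The pre-processing step described above could look roughly like the sketch below. It is not the code I actually used; the class and method names are made up, and it assumes the whole input fits in memory as a String. The regular expression treats an & as "bare" when it is not followed by something that looks like a named or numeric reference ending in a semicolon. It does not know about CDATA sections or comments, where a bare & is legal, so those would be changed too.

```java
import java.util.regex.Pattern;

public class AmpersandSketch {

    // An '&' that does not start a named (&name;) or numeric (&#38; / &#x26;) reference.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Turns each bare '&' into '&amp;' so the parser accepts the input.
    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    // Drops each bare '&' entirely, as in the delete-the-character approach.
    static String deleteBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("");
    }

    public static void main(String[] args) {
        String input = "<n>A & B &amp; C &#38; D</n>";
        System.out.println(escapeBareAmpersands(input));
        System.out.println(deleteBareAmpersands(input));
    }
}
```

The cleaned string can then be handed to SAXParser.parse through an InputSource wrapping a StringReader.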

Passing Unicode line return characters set in Class to client side (DWR/HTML/UTF8) for InDesign Team

I've built a content management tool that allows a product team to create and manage product that gets exported to a website and for a different team of designers to create print ads for newspapers displaying the same product data.
My problem is with the InDesign graphic designers and the macros that they use within InDesign. The macros have the ability to take copy/pasted text/data and auto format the text inside InDesign based on the presence of certain characters. In particular the design team uses tab, "soft line break" (shift return), and regular line breaks (hard returns) in their macros.
Right now I generate a block of text with the records and the desired formatting characters in a Java class, and that is then sent via DWR to the client side. When a tab character is required I send \t, a return is \r, and I was hoping that a soft line break would be \n; however, InDesign seems to regard both \r and \n as a regular line break.
I had given up on being able to pass a soft return until yesterday, when I came across Unicode \u2028 (soft line break) and \u2029 (regular line break). I've tried outputting both of these characters instead of \r and \n in the hope that InDesign would treat them differently. In the box that the designers copy the output from, it looks like there is no character there. There's no line break at all in the places where I've specified \u2028 to appear. When I copy/paste the output into a text editor, it shows me that there is an unrecognized character there (it displays as a box with a question mark in it).
Platform is Java/MySQL running on Tomcat.
To date, I haven't had to deal too much with character encoding in this application. Header has <meta charset="utf-8" /> set but that's about it so far. I've tried setting this to utf-16 but it doesn't change the output. All of the tables in the MySQL database are set to utf8/utf8_general_ci.
Thoughts? How can I force InDesign to take copy/pasted text and recognize all of its macro capable characters? Actually, it's just the soft line breaks that it's not recognizing. HELP! :)
Thank you. Sorry this is so long!
Ryan V

I've been playing around with ID CS6 (OS X) for a while and I can't for the life of me get it to recognize a pasted LF as a forced line break. LF, CR and CRLF all become paragraph breaks. U+2028 and U+2029 are displayed as empty glyphs, not breaks.
I'm a little wary of posting this as an answer, but I'll give it a go:
You might consider providing the text as a downloaded .txt file. CS5 introduced "Tagged Text" (a sort of XML-ish text document with full support for InDesign characters, attributes, etc.), which means your designers will be able to place the text file and InDesign will treat everything as intended.
To turn your existing text into CS5+'s Tagged Text (see the reference here), plop a <ASCII-MAC> or <ASCII-WIN> (as appropriate) as the first line and escape any '<' or '>'s with a backslash, then you're free to use <0x000A> as a forced line break. (literally those 8 characters)
That's probably mega-overkill, but it's certainly the most stupidly reliable way I can think of. I'll edit if I get anything else working.
NB. "Forced line break" is the term InDesign itself uses for the character produced by Shift+Enter (your "soft line break"); contrast it with "paragraph break" for a standard carriage return. InDesign apparently represents forced breaks with LF (U+000A) and paragraph breaks with CR (U+000D).
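On the Java side, the conversion described above might be sketched like this. It is untested against InDesign itself, and the class and method names are invented. It assumes the generated text uses \n where a forced line break is wanted and \r where a paragraph break is wanted, as in the question, and it only applies the three rules mentioned above: a platform tag on the first line, a backslash before each < or >, and <0x000A> for a forced break. Anything else Tagged Text may require, such as escaping backslashes themselves, is not handled.

```java
public class TaggedTextSketch {

    // Converts plain text to a minimal Tagged Text body (hypothetical helper).
    // Assumption: '\n' in the input means forced line break, '\r' means paragraph break.
    static String toTaggedText(String text) {
        StringBuilder out = new StringBuilder("<ASCII-WIN>\r\n");
        for (char c : text.toCharArray()) {
            switch (c) {
                case '<':  out.append("\\<"); break;      // escape literal angle brackets
                case '>':  out.append("\\>"); break;
                case '\n': out.append("<0x000A>"); break; // forced line break
                case '\r': out.append("\r\n"); break;     // new line in the file = new paragraph
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toTaggedText("Price <10>\nSecond line\rNext paragraph"));
    }
}
```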

I'm not sure how you were trying to transfer and print out your characters (if you post your DWR and javascript code I might be able to help more), but one thing I would try is to ensure that your java output is actual UTF-8 using something like:
String yourRecordString = "Some line 1. \u2028Some line 2.";
ByteBuffer bb = Charset.forName("UTF-8").encode(yourRecordString);
Then, you can write out the bytes in bb into an output stream/file and check them. (Make sure to write them as bytes and not as a String nor as chars.) For example, the UTF-8 encoding of \u2028 is E2 80 A8, so you should see that sequence at the appropriate place in your output. (I use hexmode in vim for things like this.)
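If you don't have a hex viewer handy, the same check can be done in Java. This is just a small sketch (the class name Utf8HexDump and the helper toHex are made up) that encodes a string as UTF-8 and prints each byte as two hex digits:

```java
import java.nio.charset.StandardCharsets;

public class Utf8HexDump {

    // Returns the UTF-8 bytes of s as space-separated, two-digit hex values.
    static String toHex(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'A', then U+2028, then 'B'
        System.out.println(toHex("A\u2028B")); // prints 41 E2 80 A8 42
    }
}
```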
Then, make sure that these bytes get received back on the javascript side. (While I'm not an expert with DWR, I might prefer to make your java function return something other than a String.)
This should at least help you diagnose where the problem lies. If you do see that sequence and if InDesign still isn't recognizing the soft line breaks, then you at least know the problem is with InDesign and that you will have to find some other solution (such as modifying the designer's macros to recognize other characters).
(Also, note that you can see the default encoding for your JVM using Charset.defaultCharset(). My guess is that your default is not UTF-8 and that InDesign may also have had a problem with the UTF-16 you tried, due to endianness or something like that.)

How to check which line-ending pattern is used in a file with Java?

I am reading different files from different operating systems.
Each file contains many lines, and I was told that each line may end with "\n", "\r", or something similar.
When I read a file by following the instructions in "Best way to read a text file [closed]", what should I add, using if-else conditions and the approach explained in the Pattern class documentation, to print to the console which one is used?
I need help, please.
I have been trying by reading this, but I still need expert help.
Thanks.

If you're just printing to the console, you can simply use \n. To give you a little background, those separators really matter within the files themselves (and when you must manually detect or write newlines in a file). On a Unix-based system, newlines are represented in files by \n. On Windows, newlines are represented by \r\n. But when you are printing to the console, \n will work correctly.
Also, in the pattern-matching example you gave, you're simply adding a newline manually so you can tell where a newline is. So especially in that case, you can mark the newline however you want as long as it's consistent.
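If you do need to find out which separator a file actually uses, here is one possible sketch. The class and method names are made up, and it assumes the file is small enough to read into memory. Note that BufferedReader.readLine() strips the line terminator, so the content has to be read some other way, for example with Files.readAllBytes as in main below.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

public class LineEndingSketch {

    // Counts each kind of line terminator found in the text.
    static String describe(String content) {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (c == '\r') {
                if (i + 1 < content.length() && content.charAt(i + 1) == '\n') {
                    crlf++;   // Windows style
                    i++;      // skip the '\n' that belongs to this pair
                } else {
                    cr++;     // lone carriage return
                }
            } else if (c == '\n') {
                lf++;         // Unix style
            }
        }
        return "CRLF=" + crlf + " LF=" + lf + " CR=" + cr;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path of the file to inspect.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(describe(new String(bytes, StandardCharsets.UTF_8)));
    }
}
```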

How to write special characters (interpunct) in an XML file in Java?

I have a problem writing an XML file with UTF-8 in Java.
Problem: I have a file whose filename contains an interpunct (middot, ·). When I try to write the filename inside an XML tag using Java code, I get a junk number in the filename instead of ·
OutputStreamWriter osw = new OutputStreamWriter(file_output_stream, "UTF8");
Above is the Java code I used to write the XML file. Can anybody help me understand and fix the problem? Thanks in advance.

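Java strings are UTF-16 internally, but javac reads source files in the platform default encoding unless told otherwise (UTF-8 is the default from JDK 18 on).
If the source encoding cannot be trusted to carry your character, use an escape:
String a = "\u00b7";
Or tell your compiler to use UTF-8 (javac -encoding UTF-8) and simply write the character in the code as-is.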

That character is code point 183 (decimal), U+00B7, which is outside the ASCII range, so you can escape it as the numeric character reference &#183;. Here is a demonstration: if I type "&#183;" into this answer, I get "·"
The browser prints your character because this web page is HTML, which resolves numeric character references just as XML does.
There are utility methods that can do escaping for you, such as the Apache Commons Lang library's StringEscapeUtils.escapeXml() method, which will correctly and safely escape the entire input.
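If you would rather not pull in a library just for this, a hand-rolled version of the numeric escape could look like the sketch below. The names are invented, and it only deals with characters above the ASCII range; it does not escape <, > or &, which a real XML escaper must also handle.

```java
public class NumericRefSketch {

    // Replaces every non-ASCII code point with a decimal character reference.
    // (Hypothetical helper; markup characters such as '<' and '&' are left alone.)
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // \u00b7 is the interpunct, written as an escape so the source stays ASCII.
        System.out.println(escapeNonAscii("report\u00b7final.txt")); // prints report&#183;final.txt
    }
}
```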

In general it is a good idea to use UTF-8 everywhere.
The editor has to know that the source is in UTF-8. You could use the free programmer's editor jEdit, which can deal with many encodings.
The javac compiler has to know that the Java source is in UTF-8. In Java you can use the solution of @OndraŽižka.
This makes for two settings in your IDE.

Don't try to create XML by hand. Use a library for the purpose. You are just scratching the surface of the heap of special cases that will break a hand-made solution.
One way, using core Java classes, is to create a DOM, then serialize it using a no-op XSL transform that writes to a StreamResult. (If your document is large, you can do something similar by driving a SAX event handler.)
There are many third party libraries that will help you do the same thing very easily.
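A rough sketch of the DOM route using only JDK classes follows. The element names "files" and "file" and the class name are made up for illustration. The point is that the serializer, not your code, takes care of escaping special characters in text content and of honoring the declared encoding.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomWriteSketch {

    // Builds <files><file>NAME</file></files> and serializes it to a String.
    static String buildXml(String fileName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element root = doc.createElement("files");   // hypothetical element names
            doc.appendChild(root);
            Element file = doc.createElement("file");
            file.setTextContent(fileName);                // no manual escaping needed
            root.appendChild(file);

            // Identity ("no-op") transform from the DOM tree to text.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("Could not build XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(buildXml("a & b\u00b7c.txt"));
    }
}
```

To write a file instead of a String, pass a StreamResult wrapping an OutputStream so that the serializer applies the chosen encoding itself.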

Writing to text files in Java

I have a method that writes a string to a text file using a DataOutputStream and the .writeBytes(String) method. If I write a string with a newline character, for example "I need \n help!", the new line is not displayed in notepad or other basic text editors. However, it does show up in WordPad, MS Word, etc. Why is this and can I fix it?

Mostly by using real text editors, which Notepad isn't.
You need to write system-specific newlines if you're not going to use a text editor that understands different flavors, or filter the file through something that does the conversion for you.
System.getProperty("line.separator");
This will give you an OS-specific line separator. It's less useful than you think.
System.out.printf("%n");
This does the same (and is available in String.format as well); also less useful than you think. It's more an editor thing, since any file could exist on any system, edited with any editor.

You should use System.getProperty("line.separator"); instead of directly using \n.
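For example, something along these lines (a sketch only; the class, the method names and the file name are made up). System.lineSeparator() returns the same value as the line.separator property, so on Windows each line ends in \r\n and shows up correctly in Notepad.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class NewLineSketch {

    // Joins the lines, ending each one with the platform's line separator.
    static String joinLines(String... lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(line).append(System.lineSeparator());
        }
        return sb.toString();
    }

    // Writes the joined lines to the target file as UTF-8.
    static void writeLines(Path target, String... lines) throws IOException {
        Files.write(target, joinLines(lines).getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // "help.txt" is just an example file name.
        writeLines(Paths.get("help.txt"), "I need", "help!");
    }
}
```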
