For a certain application (serializing and deserializing an object for transport via an XMPP PubSub item payload), I need to create XML fragments; that is, I have to omit the document declaration.
I'm using the org.xmlpull.v1.XmlSerializer class; unfortunately there doesn't seem to be much documentation available on its correct usage. The documentation I've found on its startDocument() method leaves it unclear whether I can skip calling this method, and all the examples I've found call it (but all of them explain how to create complete XML documents, not fragments).
To give a code example:
XmlSerializer xmlSerializer = Xml.newSerializer();
StringWriter xmlStringWriter = new StringWriter();
try {
xmlSerializer.setFeature("http://xmlpull.org/v1/doc/features.html#indent-output", true);
xmlSerializer.setOutput(xmlStringWriter);
// xmlSerializer.startDocument("UTF-8", true);
xmlSerializer.startTag(null, "tag-name");
// ...
xmlSerializer.endTag(null, "tag-name");
// xmlSerializer.endDocument();
xmlSerializer.flush();
} catch (IOException e) {
// Handle exception
}
String xmlOutputString = xmlStringWriter.toString();
Is this allowed? And if not, is there any other way to generate fragments with XMLSerializer without parsing the output string in order to manually remove the document declaration (e.g. calling startDocument only with null parameters)?
Here is the answer in short: no, calling startDocument() is not required, and omitting the call skips generating the document declaration.
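For readers who are not on a platform that ships XmlPull, the same idea can be tried with the JDK's StAX writer instead. This is a swapped-in API, not XmlSerializer: the sketch below simply never calls writeStartDocument(), so no declaration is emitted. The class and method names (FragmentWriterSketch, writeFragment) are made up for illustration.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper: writes a single element as an XML fragment using StAX.
final class FragmentWriterSketch {

    static String writeFragment(String tagName, String text) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            // writeStartDocument() is deliberately not called,
            // so the output starts directly with the element.
            writer.writeStartElement(tagName);
            writer.writeCharacters(text);
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML fragment", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeFragment("tag-name", "payload"));
    }
}
```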
I am trying to parse an XML file using Java that lives on a network drive. I have reviewed lots of XML parsing info here but cannot find the answer I need. The problem is that getDocument() constantly returns null even though the parser gets an accurate location and file name.
Here is the code...
String ThisXMLFile = XMLFileData.getPath();
DOMParser myXMLParser = new DOMParser();
myXMLParser.parse(ThisXMLFile);
Document doc = myXMLParser.getDocument();
Some notes:
I had to use getPath() as the getName() function did not return the fully qualified file name and path - the XML file lives on a network directory and that directory is mapped on my PC to the 'V' drive
I have imported all the required class header files for DOM objects
The variable names given above are real and accurate so if I have inadvertently used a reserved keyword in a variable declaration then please offer correction.
I have extensive programming experience in a few languages but this is my first real Java app.
All the lines of code and the variables above work until I reach the last line; then getDocument() just sets the doc variable to null, which makes the rest of the program break.
I believe you are calling the wrong method. According to your code, you're executing DOMParser.parse(systemId), which treats its argument as a system identifier (a URI) rather than a plain file path, when you need to call DOMParser.parse(InputSource).
To create an InputSource you can do this:
InputSource source = new InputSource(new FileInputStream(ThisXMLFile));
myXMLParser.parse(source);
Document doc = myXMLParser.getDocument();
NOTE: remember to close the opened FileInputStream!!!
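One way to make sure the stream gets closed is try-with-resources. The sketch below is only an illustration: it uses the JDK's DocumentBuilder rather than the Xerces DOMParser from the question (so that it runs without extra libraries), and the names ParseFileSketch and parseFile are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: parses a file through an InputSource and always closes the stream.
final class ParseFileSketch {

    static Document parseFile(String path)
            throws IOException, SAXException, ParserConfigurationException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // try-with-resources closes the FileInputStream even if parsing throws.
        try (InputStream in = new FileInputStream(path)) {
            return builder.parse(new InputSource(in));
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parseFile(args[0]);
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

With the Xerces parser from the question, the same try block could wrap the myXMLParser.parse(source) call instead.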
XMLInputFactory XMLFactory = XMLInputFactory.newInstance();
XMLStreamReader XMLReader = XMLFactory.createXMLStreamReader(myXMLStream);
while(XMLReader.hasNext())
{
if (XMLReader.getEventType() == XMLStreamReader.START_ELEMENT)
{
String XMLTag = XMLReader.getLocalName();
if(XMLTag.equals("value"))
{
String idValue = XMLReader.getAttributeValue(null, "id");
if (idValue != null && idValue.equals(ElementName))
{
System.out.println(idValue);
XMLReader.nextTag();
System.out.println(XMLReader.getElementText());
}
}
}
XMLReader.next();
}
So this is the code I finally got to. It works and solves the issue of retrieving specific XML data from an XML file. I wanted at first to use NodeLists, Elements, Documents, etc., but those functions never did work for me; this one did. Thanks to all for the answers given, as they helped me think this one through.
I'm using the Jackson XmlMapper to map an XML document into a POJO, but I have the following problem:
My XML looks like this (not the original one, only an example):
<?xml version="1.0" encoding="UTF-8"?>
<result>
<pojo>
<name>test</name>
</pojo>
</result>
The problem is, I don't want to parse the "result" object. I want to parse the pojo as its own object. Can I do this with XmlMapper?
thank you!
Artur
You can do it, but you must write some boilerplate code.
You must create an instance of XMLStreamReader to be able to do customized reading of your XML input. The next() method advances the reader to the next parsing event. It's a rather tricky method, tied to the internal rules of the reader, so read the documentation to understand its particularities:
From the Javadoc:
int javax.xml.stream.XMLStreamReader.next() throws XMLStreamException
Get next parsing event - a processor may return all contiguous
character data in a single chunk, or it may split it into several
chunks. If the property javax.xml.stream.isCoalescing is set to true
element content must be coalesced and only one CHARACTERS event must
be returned for contiguous element content or CDATA Sections. By
default entity references must be expanded and reported transparently
to the application. An exception will be thrown if an entity reference
cannot be expanded. If element content is empty (i.e. content is "")
then no CHARACTERS event will be reported.
Given the following XML:
<foo><!--description-->content text<![CDATA[<greeting>Hello</greeting>]]>other content</foo>
The behavior of calling next() when being on foo will be:
1- the comment (COMMENT)
2- then the characters section (CHARACTERS)
3- then the CDATA section (another CHARACTERS)
4- then the next characters section (another CHARACTERS)
5- then the END_ELEMENT
NOTE: empty element (such as <tag/>) will be reported with two
separate events: START_ELEMENT, END_ELEMENT - This preserves parsing
equivalency of empty element to <tag></tag>. This method will throw an
IllegalStateException if it is called after hasNext() returns false.
Returns: the integer code corresponding to the current parse event
Let me illustrate the way to proceed with a unit test:
@Test
public void mapXmlToPojo() throws Exception {
XMLInputFactory factory = XMLInputFactory2.newFactory();
InputStream inputFile = MapXmlToPojo.class.getResourceAsStream("pojo.xml");
XMLStreamReader xmlStreamReader = factory.createXMLStreamReader(inputFile);
XmlMapper xmlMapper = new XmlMapper();
xmlStreamReader.next();
xmlStreamReader.next();
Pojo pojo = xmlMapper.readValue(xmlStreamReader, Pojo.class);
Assert.assertEquals("test", pojo.getName());
}
Just to add more on this (in order to make it generic): I had a scenario where I had to extract a specific element and map it to a Java object. In this case we can add a conditional check so that, whenever that tag is encountered, we read it out and map it.
I have set DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES to false to ignore fields that are not needed, in case our POJO has fewer fields to map than what we are getting from the source.
Below is the tested code -
while (xmlStreamReader.hasNext()) {
// Advance exactly one event per iteration. Calling next() and then nextTag()
// would skip events, and nextTag() throws if it hits non-whitespace text.
if (xmlStreamReader.next() == XMLEvent.START_ELEMENT) {
QName name = xmlStreamReader.getName();
if ("specific_name".equalsIgnoreCase(name.getLocalPart())) {
// Ignore XML fields that have no counterpart in the POJO
objectMapper.configure(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES, false);
result = objectMapper.readValue(xmlStreamReader, Pojo.class);
break;
}
}
}
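The positioning step on its own can be tried without Jackson. The following is a minimal sketch using only the JDK's StAX reader; ElementSeekSketch and skipToElement are hypothetical names, and the sample XML is the one from the question.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: moves a StAX reader onto the first start tag with a given local name.
final class ElementSeekSketch {

    /** Returns true if the reader now sits on START_ELEMENT named localName, false if the document ended first. */
    static boolean skipToElement(XMLStreamReader reader, String localName)
            throws XMLStreamException {
        while (reader.hasNext()) {
            // One next() call per iteration, so no event is skipped.
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && localName.equals(reader.getLocalName())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<result><pojo><name>test</name></pojo></result>";
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        if (skipToElement(reader, "pojo")) {
            System.out.println("positioned on <" + reader.getLocalName() + ">");
        }
        reader.close();
    }
}
```

Once the helper returns true, the reader can be handed to readValue(xmlStreamReader, Pojo.class) as in the code above.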
I'm creating a program which checks the legitimacy of a given URL. I've already created my own algorithm for this, but now I want to add PhishTank's services into my program.
They provide services where you can directly query a URL from their website, but they have set a certain quota on the number of queries you can make per day. The other option, which I'm going with, is to simply download their database and work with it locally, without restrictions.
The file you get is in XML, and I found some code to test with, but it seems their XML contains illegal characters (such as Unicode 0x07, the [BEL] character) inside CDATA, so the parser throws an exception.
<url><![CDATA[http://shaghaf-edu.com/sign-in/??msg=InvalidOnlineIdException&id[BEL]da9ca9b23227a572d1fb5ff4ff91e3&lpOlbResetErrorCounter=0l=&request_locale=en-us]]></url>
I've done a bit of searching, and all I've found are solutions that seem fine for rather small XML files. The one I'm working with is close to 2.7 million lines, so I'm not sure how efficiently a regex or a char-by-char comparison would work in this case.
I should note that their database is updated hourly, and has to be redownloaded. So cleaning the file once manually isn't an option.
So I'm wondering if there is any fast and efficient way of solving this problem?
I don't have the exact code with me, but what I use is a very slight variation of this, which I found here on StackOverflow:
private void start() throws Exception
{
URL url = new URL("http://localhost:8080/AutoLogin/resource/web.xml");
URLConnection connection = url.openConnection();
Document doc = parseXML(connection.getInputStream());
NodeList descNodes = doc.getElementsByTagName("description");
for(int i=0; i<descNodes.getLength();i++)
{
System.out.println(descNodes.item(i).getTextContent());
}
}
private Document parseXML(InputStream stream)
throws Exception
{
DocumentBuilderFactory objDocumentBuilderFactory = null;
DocumentBuilder objDocumentBuilder = null;
Document doc = null;
try
{
objDocumentBuilderFactory = DocumentBuilderFactory.newInstance();
objDocumentBuilder = objDocumentBuilderFactory.newDocumentBuilder();
doc = objDocumentBuilder.parse(stream);
}
catch(Exception ex)
{
throw ex;
}
return doc;
}
Answering by asking a question ...
Why not write a simple pre-processing utility?
It could read the XML file as is (line by line) and do whatever is required to turn that content into "correct" XML.
In other words: you should explicitly distinguish between the task of "preparing your input" and "actually working with that XML input". This will also make it much easier to do fine tuning. If you find that regular expressions are too expensive, then just change the "pre-processor" to not use them, and afterwards easily measure the effects on runtime.
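As a rough illustration of such a pre-processor, here is a sketch of a streaming filter that drops characters XML 1.0 does not allow (such as 0x07) without using regular expressions and without loading the whole file. The class name XmlCharFilterReader and the clean helper are made up for this example; it is one possible approach, not a measured or tuned one.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical pre-processor: a Reader that silently drops characters not legal in XML 1.0.
final class XmlCharFilterReader extends FilterReader {

    XmlCharFilterReader(Reader in) {
        super(in);
    }

    // XML 1.0 allows tab, LF, CR and everything from 0x20 up to 0xFFFD.
    // Surrogate halves fall inside that range and are passed through unchanged.
    static boolean isAllowed(char c) {
        return c == 0x9 || c == 0xA || c == 0xD || (c >= 0x20 && c <= 0xFFFD);
    }

    @Override
    public int read() throws IOException {
        int c;
        do {
            c = super.read();
        } while (c != -1 && !isAllowed((char) c));
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int kept;
        do {
            int n = super.read(buf, off, len);
            if (n == -1) {
                return -1;
            }
            // Compact the allowed characters to the front of the filled region.
            kept = 0;
            for (int i = off; i < off + n; i++) {
                if (isAllowed(buf[i])) {
                    buf[off + kept++] = buf[i];
                }
            }
        } while (kept == 0); // chunk held only illegal characters: read again
        return kept;
    }

    // Convenience wrapper for trying the filter on a small string.
    static String clean(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        try (Reader r = new XmlCharFilterReader(new StringReader(raw))) {
            char[] buf = new char[4096];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                out.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

For the real file, the reader could wrap an InputStreamReader over the download and be passed to the parser through an org.xml.sax.InputSource, so the cleaning happens while parsing rather than in a separate pass.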
We have a large project which uses an immense number of tagx files of our own creation, and we are about to refactor the UI's underlying code. This means that many tagx files will be merged or thrown away, and views (jspx files) rewritten. To be able to split the refactoring into reasonable pieces that don't conflict with each other, we would like to "map" the tagx calls.
Is there an easy way, or a tool maybe, that goes through the jspx/tagx files and lists which tagx they have called (not just the library, but the specific tagx)?
So for example:
create.jspx calls in its body:
c:if
form:create
form:dependency
myowntaglib1:myowntag1
myowntaglibN:myowntagN
etc
and the app lists these out.
The simplest way to do that would be to write a simple Java program that recursively goes through directories searching for jspx files and, using a streaming XML parser (the sample code uses StAX's XMLStreamReader), listens for
XMLStreamConstants.START_ELEMENT
and then displaying
xmlReader.getName().getLocalPart();
Sample code:
XMLInputFactory xmlFactory = XMLInputFactory.newInstance();
try {
XMLStreamReader xmlReader = xmlFactory.createXMLStreamReader(fname, stream);
while (xmlReader.hasNext()) {
// next() advances the reader and returns the type of the new event
int eventType = xmlReader.next();
if (eventType == XMLStreamConstants.START_ELEMENT) {
System.out.println(xmlReader.getName().getLocalPart());
}
}
} catch (XMLStreamException e) {
e.printStackTrace();
}
stream should be a FileInputStream for the file fname.
Instead of displaying tag names, you can put them into a HashMap (or simply a HashSet) and display them once the whole file is parsed; that way you won't get duplicates.
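A minimal sketch of that de-duplicating variant, which also keeps the prefix so that form:create and myowntaglib1:create stay distinct. It assumes the jspx files declare their tag libraries as XML namespaces (as well-formed jspx does); TagUsageSketch and collectTags are hypothetical names, and the namespace URIs in main are only sample values.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: collects the distinct prefix:name tags used in one jspx/tagx source.
final class TagUsageSketch {

    static Set<String> collectTags(Reader source) throws XMLStreamException {
        Set<String> tags = new TreeSet<>(); // sorted, no duplicates
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(source);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    String prefix = reader.getPrefix();
                    String local = reader.getLocalName();
                    // Plain elements without a prefix are recorded by local name only.
                    tags.add(prefix == null || prefix.isEmpty() ? local : prefix + ":" + local);
                }
            }
        } finally {
            reader.close();
        }
        return tags;
    }

    public static void main(String[] args) throws XMLStreamException {
        String jspx = "<jsp:root xmlns:jsp='http://java.sun.com/JSP/Page'"
                + " xmlns:c='http://java.sun.com/jsp/jstl/core'"
                + " xmlns:form='urn:jsptagdir:/WEB-INF/tags/form'>"
                + "<c:if test='x'><form:create/><form:create/></c:if></jsp:root>";
        for (String tag : collectTags(new StringReader(jspx))) {
            System.out.println(tag);
        }
    }
}
```

Running collectTags once per file found by the directory walk, and printing the file name before its set, would give a listing like the one in the question.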
This block of code essentially takes a JAXB Object and turns it into a JSONObject
StringWriter stringWriter = new StringWriter();
marshaller.marshal(jaxbObj, stringWriter);
try {
JSONObject jsonObject = XML.toJSONObject(stringWriter.toString());
resp.getOutputStream().write(jsonObject.toString(2).getBytes());
} catch (JSONException e) {
throw new ServletException("Could not parse JSON",e);
}
Unfortunately, this transformation doesn't turn, say, a String like "true" into a boolean, leaving the poor front end guy to do it.
I would think that I want to somehow map over the values in the JSONObject, calling stringToValue on each. I have a feeling there is a better way. Any ideas?
Well, JAXB itself won't even produce JSON to begin with, so you are using something else in addition (maybe via Jersey). So maybe the package in question has something for this.
But why try to do this with org.json objects? Just use a regular Java bean with the expected types, and create it from whatever input bean you have. No need for magic, just explicit code.
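To make the "explicit code" suggestion a bit more concrete, here is a minimal sketch. Everything in it is hypothetical: RawSettings stands in for an input bean whose fields are all strings, Settings is the typed bean the front end would receive, and the field names are invented.

```java
// Hypothetical example of converting a string-typed input bean into a typed output bean.
final class TypedBeanSketch {

    /** Stand-in for the input object, where every value is a String. */
    static final class RawSettings {
        final String enabled;
        final String retries;

        RawSettings(String enabled, String retries) {
            this.enabled = enabled;
            this.retries = retries;
        }
    }

    /** Output bean with the types the consumer actually expects. */
    static final class Settings {
        final boolean enabled;
        final int retries;

        Settings(boolean enabled, int retries) {
            this.enabled = enabled;
            this.retries = retries;
        }
    }

    // The conversion is spelled out field by field; nothing is guessed from the string contents.
    static Settings convert(RawSettings raw) {
        return new Settings(Boolean.parseBoolean(raw.enabled), Integer.parseInt(raw.retries));
    }

    public static void main(String[] args) {
        Settings s = convert(new RawSettings("true", "3"));
        System.out.println(s.enabled + " " + s.retries);
    }
}
```

Whatever JSON library is in use can then serialize the typed bean, so "true" ends up as a JSON boolean because the field is a boolean, not because a string happened to look like one.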