I want to parse the following type of text. Example 1:
<root>my name is <j> <b> mike</b> </j> </root>
Example 2:
<root> my name is <mytag1 attribute="val" >mike</mytag1> and yours is <mytag2> john</mytag2> </root>
Can I parse it using a DOM parser? I will not have the same format every time. I can have different formats in which the tags are nested, and I don't know the format in advance.
Both of these examples are well-formed XML documents, so there's no reason you can't do this.
If your XML is very simple, especially if it combines text and tags together, you may want to run it through an XSL transformation first, either to get a format that is easier to parse or to convert it to another format, such as HTML.
You can use a DOM parser for the examples you've given - they're valid XML. However, you wouldn't be able to use it for non-XML as per your subject line.
When you say you can have "different formats in which the tags are nested" what exactly do you mean? If it's always simple nesting, e.g.
<root>
<tag1>
<tag2>
<tag3>
Stuff
</tag3>
</tag2>
</tag1>
</root>
Then that will be fine. However, an XML parser won't like markup where an "outer" tag is closed before an "inner" one:
<root>
<tag1>
<tag2>
Stuff
</tag1> <!-- Invalid -->
</tag2>
</root>
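For well-formed input like the two examples in the question, a DOM walk needs no advance knowledge of the tag names. The following is a minimal sketch, assuming the XML arrives as a string; the class name MixedContentWalker and its methods are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: lists every element and non-blank text node of a document.
class MixedContentWalker {

    // Parses the XML string and returns one line per node, indented by depth.
    static String describe(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(doc.getDocumentElement(), 0, out);
            return out.toString();
        } catch (Exception e) {
            // Misnested tags and other well-formedness errors end up here.
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
    }

    // Recursively visits elements and text nodes; whitespace-only text is skipped.
    private static void walk(Node node, int depth, StringBuilder out) {
        String indent = "  ".repeat(depth);
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.append(indent).append("element: ").append(node.getNodeName()).append('\n');
            NodeList children = node.getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                walk(children.item(i), depth + 1, out);
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getTextContent().trim();
            if (!text.isEmpty()) {
                out.append(indent).append("text: ").append(text).append('\n');
            }
        }
    }

    public static void main(String[] args) {
        System.out.print(describe(
                "<root> my name is <mytag1 attribute=\"val\" >mike</mytag1>"
                + " and yours is <mytag2> john</mytag2> </root>"));
    }
}
```

The same method throws for the misnested markup shown above, because the parser rejects it before any tree is built.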
I am using Java to process XPath for my project. I have an XML document similar to this:
'''
<Tag1>
<Tag2>
<Name>ABC\Test</Name>
<value>10</value>
</Tag2>
<Tag2>
<Name>ABC\Test\test1</Name>
<value>112</value>
</Tag2>
<Tag1>
'''
My requirement is to write an XPath expression that works for all similar XML and gets the "value" where the name is Test. In the above example, the ABC in ABC\Test can be anything (i.e. in one set of XML it will be abc, in another xyz), so I cannot use a strict comparison like:
//Tag2[Name[text()="text"]]/value/text()
This will fail because there is no exact match. I also tried the contains function:
//Tag2[contains(Name,"test")]/value/text()
This works, but it returns both value nodes.
Is there any other function with which I can achieve this? Any help will be highly appreciated.
(1) The input XML is not well-formed (its last tag should be the closing </Tag1>), so I fixed it.
(2) XML is case sensitive, so looking for the "test" string will not find "Test".
XML
<Tag1>
<Tag2>
<Name>ABC\Test</Name>
<value>10</value>
</Tag2>
<Tag2>
<Name>ABC\Test\test1</Name>
<value>112</value>
</Tag2>
</Tag1>
XPath
/Tag1/Tag2[contains(Name,"test")]/value/text()
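If "name is Test" is read as "the last backslash-separated segment of Name is exactly Test" (an assumption about the requirement), one way to express that in XPath 1.0, which has no ends-with() function, is to compare the tail of Name using substring() and string-length(). A minimal Java sketch follows; the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: finds the <value> whose <Name> ends with a given suffix.
class NameSuffixQuery {

    // XPath 1.0 emulation of ends-with(Name, $suffix).
    private static final String EXPR =
            "/Tag1/Tag2[substring(Name, string-length(Name) - string-length($suffix) + 1)"
            + " = $suffix]/value";

    // Returns the text of the first matching <value>, or "" if nothing matches.
    static String valueForNameEndingWith(String xml, String suffix) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // Passing the suffix as an XPath variable avoids quoting problems.
            xpath.setXPathVariableResolver(
                    qname -> "suffix".equals(qname.getLocalPart()) ? suffix : null);
            return xpath.evaluate(EXPR, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Tag1>"
                + "<Tag2><Name>ABC\\Test</Name><value>10</value></Tag2>"
                + "<Tag2><Name>ABC\\Test\\test1</Name><value>112</value></Tag2>"
                + "</Tag1>";
        // Including the backslash in the suffix keeps names like "MyTest" from matching.
        System.out.println(valueForNameEndingWith(xml, "\\Test"));
    }
}
```

The comparison remains case-sensitive, so the suffix has to be spelled as it appears in the document.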
I'm unmarshalling a couple of large XML files.
They have a common part, so I decided to put the common parts in a separate XML file and include it using the xi:include tag.
It looks like this:
<tag1>
<tag2>
</tag2>
<tag3>
</tag3>
<xi:include href = "long/common/part/of/partial/xml/file1"/>
<xi:include href = "long/common/part/of/partial/xml/file2"/>
</tag1>
At this point I would like to parametrize the long/common/part.
I tried to define a variable using xsl:variable like this:
<xsl:variable name="test">
"long/common/part/of/partial/xml/"
</xsl:variable>
but assigning the value to href was a problem; neither
<xi:include href = "{$test}"/>
nor
<xi:include href = <xsl:value-of select="test"/>
worked.
Is there a way to assign a value to an XML attribute?
You're mixing XInclude, XSLT, and ad-hoc {$var} syntax (not part of XML) here. What you can do to parametrize a href value in XInclude elements is to use an entity reference (XML's and SGML's mechanism for text substitution variables among other things):
<xi:include href="&href-value;"/>
where href-value must be bound to the string long/common/part/of/partial/xml/file1, either programmatically or (preferably) by declaring it in the prolog, e.g.:
<!DOCTYPE tag1 [
<!ENTITY href-value "long/common/part/of/partial/xml/file1">
]>
<tag1>
<xi:include href = "&href-value;"/>
</tag1>
However, since now you're using entity references anyway, you can achieve just the same with just entities, and without XInclude at all:
<!DOCTYPE tag1 [
<!ENTITY common-part SYSTEM "long/common/part/of/partial/xml/file1">
]>
<tag1>
&common-part;
</tag1>
This pulls the content of long/common/part/of/partial/xml/file1 into the common-part entity, then references that value in content. The XML parser treats the document as if the replacement value for common-part (i.e. whatever is stored in long/common/part/of/partial/xml/file1) had been specified directly in the document.
Hope this isn't too confusing; there's a general explanation how entities in XML and SGML work in this answer
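To see the internal-entity variant at work from Java, the following minimal sketch parses such a document with the JDK's DOM parser and reads back the href attribute after entity expansion. The class name EntityHrefDemo is hypothetical, and the xmlns:xi declaration is added here on the assumption that the real files declare the XInclude namespace.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical demo: shows that &href-value; is expanded inside the href attribute.
class EntityHrefDemo {

    private static final String XINCLUDE_NS = "http://www.w3.org/2001/XInclude";

    // Returns the href of the first xi:include element, with entities resolved.
    static String resolvedHref(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Namespace awareness lets us look the element up by its namespace URI.
            factory.setNamespaceAware(true);
            // XInclude processing is deliberately left off here, so the
            // xi:include element stays in the tree and can be inspected.
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element include = (Element) doc
                    .getElementsByTagNameNS(XINCLUDE_NS, "include").item(0);
            return include.getAttribute("href");
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<!DOCTYPE tag1 [\n"
                + "<!ENTITY href-value \"long/common/part/of/partial/xml/file1\">\n"
                + "]>\n"
                + "<tag1 xmlns:xi=\"" + XINCLUDE_NS + "\">\n"
                + "<xi:include href=\"&href-value;\"/>\n"
                + "</tag1>";
        System.out.println(resolvedHref(xml));
    }
}
```

To have the parser actually perform the inclusion, DocumentBuilderFactory also offers setXIncludeAware(true); whether that fits depends on how the files are unmarshalled in your setup.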
It is easy to parse XML in which the tag names are fixed. In XStream, we can simply use the @XStreamAlias("tagname") annotation. But how do I parse XML in which a tag name is not fixed? Suppose I have the following XML:
<result>
<result1>
<fixed1> ... </fixed1>
<fixed2> ... </fixed2>
</result1>
<result2>
<item>
<America>
<name> America </name>
<language> English </language>
</America>
</item>
<item>
<Spain>
<name> Spain </name>
<language> Spanish </language>
</Spain>
</item>
</result2>
</result>
The tag names America and Spain are not fixed, and sometimes I may get other tag names such as Germany, India, etc.
How do I define a POJO for the result2 tag in such a case? Is there a way to tell XStream to accept anything as the alias name if the tag name is not known beforehand?
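If it is OK for you to get the tag name from inside the element itself (the 'name' field), you can use XPath:
//result2/item/*/name/text()
Another option is to select the whole element:
//result2/item/*
or its tag name (the trailing name() step requires XPath 2.0 or later; in XPath 1.0, use name(//result2/item/*) for the first match):
//result2/item/*/name()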
Some technologies (specifically, data binding approaches) are optimized for handling XML whose structure is known at compile time. Others (like DOM and other DOM-like tree models - JDOM, XOM etc) are designed for handling XML whose structure is not known in advance. Use the tool for the job.
XSLT and XQuery try to blend both. In their schema-aware form, they can take advantage of static structure information when it is available. But more usually they are run in "untyped" mode, where there is no a-priori knowledge of element names or structure, and everything is handled as it comes. The XSLT rule-based processing paradigm is particularly well suited to "semi-structured" XML whose content is unpredictable or variable.
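As an illustration of the tree-model route for the result2 example, here is a minimal sketch that uses the JDK's DOM and XPath APIs rather than XStream (a swapped-in technique, not an XStream feature). It collects each unknown element name together with its language; the class CountryReader and its method are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads items whose wrapper tag name is not known in advance.
class CountryReader {

    // Maps each unknown tag name (e.g. "America") to the text of its <language> child.
    static Map<String, String> languagesByTag(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The * step matches whatever element sits inside each <item>.
            NodeList nodes = (NodeList) xpath.evaluate(
                    "/result/result2/item/*",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element country = (Element) nodes.item(i);
                String language = country.getElementsByTagName("language")
                        .item(0).getTextContent().trim();
                result.put(country.getTagName(), language);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read result2 items", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<result><result2>"
                + "<item><America><name> America </name>"
                + "<language> English </language></America></item>"
                + "<item><Spain><name> Spain </name>"
                + "<language> Spanish </language></Spain></item>"
                + "</result2></result>";
        System.out.println(languagesByTag(xml));
    }
}
```

The resulting map could then be copied into whatever POJO the rest of the application expects.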
I have an XML document like this:
<root>
<name id="1">Abc</name>
<salary>25000</salary>
</root>
I want something like this
<root>
<name id="1,2">Abc</name>
<salary>25000</salary>
</root>
I am able to create the attribute by using DOM parser as:
Document doc = _docBuilder.newDocument();
Attr attr = doc.createAttribute("id");
attr.setValue("1");
name.setAttributeNode(attr);
How can I set multiple values for the same attribute?
XML does not support attributes with multiple values.
You could certainly do: attr.setValue("1,2");
However that really isn't very XML friendly. Also, you probably shouldn't have more than one value for an id. You may wish to consider something like this:
<thing>
<name>Abc</name>
<reference_ids>
<id>1</id>
<id>2</id>
</reference_ids>
</thing>
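The suggested structure can be built with the same DOM API used in the question. This is a minimal sketch, assuming one id child element per reference; the class ReferenceIdsBuilder and its methods are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper: builds <thing> with one <id> element per reference id.
class ReferenceIdsBuilder {

    static Document build(String name, int... ids) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element thing = doc.createElement("thing");
            doc.appendChild(thing);

            Element nameElement = doc.createElement("name");
            nameElement.setTextContent(name);
            thing.appendChild(nameElement);

            // Each id becomes its own element instead of a comma-separated attribute.
            Element references = doc.createElement("reference_ids");
            thing.appendChild(references);
            for (int id : ids) {
                Element idElement = doc.createElement("id");
                idElement.setTextContent(String.valueOf(id));
                references.appendChild(idElement);
            }
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException("Could not build document", e);
        }
    }

    // Serializes the document to a string using the default identity transformer.
    static String toXml(Document doc) {
        try {
            StringWriter writer = new StringWriter();
            TransformerFactory.newInstance().newTransformer()
                    .transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml(build("Abc", 1, 2)));
    }
}
```

Reading the ids back is then a plain getElementsByTagName("id") loop, with no string splitting involved.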
I have an XML, for example
<root>
<config x="xxx" y="yyyy" z="zzz" />
<properties>blah blah blah </properties>
<example>
<name>...</name>
<descr>...</descr>
</example>
<example>
<name>...</name>
<descr>...</descr>
</example>
</root>
and I need to get the config and properties nodes and all the values in them.
Thank you
XPath can fetch the data in the config tag. You first need to create an expression like this:
expression="//root/config/@x" to get the value of x; use @y and @z in the same way for the others.
For properties, follow this thread :
Parsing XML with XPath in Java
Hope this helps
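For reference, a minimal sketch of that approach with the JDK's javax.xml.xpath API might look like the following. It assumes the XML is available as a string; the class ConfigReader and its methods are hypothetical.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: reads <config> attributes and the <properties> text.
class ConfigReader {

    // Returns all attributes of /root/config as a name-to-value map, sorted by name.
    static Map<String, String> configAttributes(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            // @* selects every attribute, so x, y and z need not be listed one by one.
            NodeList attributes = (NodeList) xpath.evaluate(
                    "/root/config/@*",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            Map<String, String> result = new TreeMap<>();
            for (int i = 0; i < attributes.getLength(); i++) {
                Node attribute = attributes.item(i);
                result.put(attribute.getNodeName(), attribute.getNodeValue());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("Could not read config attributes", e);
        }
    }

    // Returns the trimmed text content of /root/properties.
    static String properties(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate("/root/properties",
                    new InputSource(new StringReader(xml))).trim();
        } catch (Exception e) {
            throw new IllegalStateException("Could not read properties", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root><config x=\"xxx\" y=\"yyyy\" z=\"zzz\" />"
                + "<properties>blah blah blah </properties></root>";
        System.out.println(configAttributes(xml));
        System.out.println(properties(xml));
    }
}
```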
DOM, DOM4J, SAX...
If the XML file is small, you can use DOM or DOM4J; if it is big, use SAX.
If you just want to query or fetch data, XPath can help; if you want the data as Java objects for further use, use JAXB.
You can use a SAX parser to read the XML. It is event-based, so it consumes less memory than a tree model, but manipulating the document is harder.
If your XML requires a lot of manipulation and fits in memory, go for DOM or DOM4J; either is good. DOM4J is the newer of the two, and DOM is widely used in industry.
Choose a parser based on your requirements.
Thanks,
Pavan