I'm kind of stuck with XPath and/or Java. I have XML document structures like the following:
<document>
<text>
<headline>This is the text's headline</headline>
This is the text.
</text>
</document>
What I need is this:
<document>
<text>
<headline>This is the text's headline</headline>
<w>This</w>
<w>is</w>
<w>the</w>
<w>text</w>
<w>.</w>
</text>
</document>
How in the world can I change the text content of the text-node while leaving the headline-node untouched?! (I'm using org.jdom2.xpath.)
Bob
Since you need to clear the text content of <text>, and setText() (even with an empty string) removes all other content, you could clone the <headline> element first and add it back after calling setText(). Then loop over the words and create the <w> elements.
As for XPath, you can select the <headline> element with
/document/text/headline
and the text content of the <text> element with
/document/text/text()
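A minimal sketch of the same idea, assuming the JDK's built-in org.w3c.dom API rather than JDOM2 (so no setText() and no cloning): only the direct text children of <text> are replaced by <w> elements, which leaves <headline> in place. The class name WordWrapper, the helper names and the tokenizing regex are illustrative assumptions, not part of the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class WordWrapper {
    // Assumed tokenization: a run of word characters, or one punctuation mark.
    private static final Pattern TOKEN = Pattern.compile("\\w+|[^\\w\\s]");

    /** Replaces each direct text child of the first <text> with <w> elements; returns the tokens. */
    static List<String> wrapWords(Document doc) {
        Element text = (Element) doc.getElementsByTagName("text").item(0);
        List<String> tokens = new ArrayList<>();
        // Snapshot the children first, because the loop below modifies the child list.
        List<Node> children = new ArrayList<>();
        for (Node n = text.getFirstChild(); n != null; n = n.getNextSibling()) {
            children.add(n);
        }
        for (Node child : children) {
            if (child.getNodeType() != Node.TEXT_NODE) {
                continue; // <headline> and any other element stays untouched
            }
            Matcher m = TOKEN.matcher(child.getNodeValue());
            while (m.find()) {
                Element w = doc.createElement("w");
                w.setTextContent(m.group());
                text.insertBefore(w, child);
                tokens.add(m.group());
            }
            text.removeChild(child);
        }
        return tokens;
    }

    /** Parses an XML string into a DOM document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<document><text><headline>This is the text's headline</headline>"
                + "This is the text.</text></document>");
        System.out.println(wrapWords(doc));
    }
}
```

With JDOM2 the shape would be similar: iterate over the content of <text>, skip Element children, and replace Text children with new <w> elements.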
I have XML and want to use an XPath expression to get the text of the Text node only if Text_2 contains elements. Is that possible? I couldn't find a way.
<List>
<Response>
<Node>
<SomeNode>
<Text>text</Text>
<Text_1>text_1</Text_1>
<Text_2 value_1="some value 1" value_2="some value 2" />
</SomeNode>
</Node>
</Response>
</List>
I tried to get the Text_2 elements using //*[@value_1], but I'm stuck and don't have any other ideas.
You say you "want to get, using xpath expression, text from Text node only if Text_2 contains elements". Taken literally, for your sample that would be //SomeNode[Text_2/*]/Text. However, the Text_2 element in your sample doesn't have any child elements, only attributes, so that expression selects nothing there. If you mean attributes instead, test for them with //SomeNode[Text_2/@*]/Text.
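To see the difference between testing for child elements and testing for attributes, here is a small sketch using the JDK's javax.xml.xpath API against the sample from the question. The class name TextIfPopulated and the eval helper are hypothetical names for illustration only.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class TextIfPopulated {
    // The sample document from the question, as a single string.
    static final String XML =
        "<List><Response><Node><SomeNode>"
        + "<Text>text</Text><Text_1>text_1</Text_1>"
        + "<Text_2 value_1=\"some value 1\" value_2=\"some value 2\"/>"
        + "</SomeNode></Node></Response></List>";

    /** Evaluates an XPath expression against XML and returns its string value. */
    static String eval(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Text_2 has no child elements, so this selects nothing (empty string).
        System.out.println("[" + eval("//SomeNode[Text_2/*]/Text") + "]");
        // Text_2 has attributes, so these select the Text element.
        System.out.println("[" + eval("//SomeNode[Text_2/@*]/Text") + "]");
        System.out.println("[" + eval("//SomeNode[Text_2/@value_1]/Text") + "]");
    }
}
```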
I am using Java to extract values using XPath. I was able to extract elements under the element fields but the elements under records are not returned.
XML is as follows:
<?xml version="1.0" ?>
<qdbapi>
<action>****</action>
<errcode>0</errcode>
<errtext>No error</errtext>
<qid>****</qid>
<qname>****</qname>
<table>
<fields>
<field id="19" field_type="text" base_type="text">
</field>
</fields>
<records>
<record>
<f id="6">1</f>
</record>
</records>
</table>
</qdbapi>
Code below:
XMLDOMDocObj.selectNodes("//*[local-name()='fields']") // 21 fields returned
XMLDOMDocObj.selectNodes("//*[local-name()='records']") // no records are returned
XML must have a single root element; yours has two: fields and records.
Wrap them in a single common root to get the results you expect.
Also, if your XML has no namespaces, there's no reason to defeat them. Instead of
//*[local-name()='records']
use
//records
See also
How does XPath deal with XML namespaces?
Why must XML documents have a single root element?
What is the difference between root node, root element and document element in XML?
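As a quick check, the following sketch evaluates both forms of the expression with the JDK's javax.xml.xpath API. It assumes a well-formed, namespace-free document with a single qdbapi root, trimmed from the XML shown in the question; the class name QdbRecords and the eval helper are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class QdbRecords {
    // Trimmed copy of the question's XML: one root element, no namespaces.
    static final String XML =
        "<qdbapi><errcode>0</errcode><table>"
        + "<fields><field id=\"19\" field_type=\"text\" base_type=\"text\"></field></fields>"
        + "<records><record><f id=\"6\">1</f></record></records>"
        + "</table></qdbapi>";

    /** Evaluates an XPath expression against XML and returns its string value. */
    static String eval(String expr) {
        try {
            return XPathFactory.newInstance().newXPath()
                    .evaluate(expr, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expr, e);
        }
    }

    public static void main(String[] args) {
        // Without namespaces, both forms find the same single <records> element.
        System.out.println(eval("count(//*[local-name()='records'])"));
        System.out.println(eval("count(//records)"));
        // Value of the <f id="6"> field inside the record.
        System.out.println(eval("//records/record/f[@id='6']"));
    }
}
```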
I'm unmarshalling a couple of large XML files.
They share a common part, so I decided to put the common parts in separate XML files and include them using the xi:include tag.
It looks like this:
<tag1>
<tag2>
</tag2>
<tag3>
</tag3>
<xi:include href = "long/common/part/of/partial/xml/file1"/>
<xi:include href = "long/common/part/of/partial/xml/file2"/>
</tag1>
Now I would like to parametrize the long/common/part.
I tried to define a variable using xsl:variable like this:
<xsl:variable name="test">
"long/common/part/of/partial/xml/"
</xsl:variable>
but assigning the value to href was a problem. Neither
<xi:include href = "{$test}"/>
nor
<xi:include href = <xsl:value-of select="test"/>
worked.
Is there a way to assign a value to an XML attribute?
You're mixing XInclude, XSLT, and ad-hoc {$var} syntax (not part of XML) here. What you can do to parametrize a href value in XInclude elements is to use an entity reference (XML's and SGML's mechanism for text substitution variables among other things):
<xi:include href="&href-value;"/>
where href-value must be bound to the string long/common/part/of/partial/xml/file1, either programmatically or (preferably) by declaring it in the prolog, e.g.:
<!DOCTYPE tag1 [
<!ENTITY href-value "long/common/part/of/partial/xml/file1">
]>
<tag1>
<xi:include href = "&href-value;"/>
</tag1>
However, since you're now using entity references anyway, you can achieve the same with entities alone, without XInclude at all:
<!DOCTYPE tag1 [
<!ENTITY common-part SYSTEM "long/common/part/of/partial/xml/file1">
]>
<tag1>
&common-part;
</tag1>
This pulls the content of long/common/part/of/partial/xml/file1 into the common-part entity, then references that value in content. The XML parser treats the document as if the replacement value for common-part (i.e. whatever is stored in long/common/part/of/partial/xml/file1) had been specified directly in the document.
Hope this isn't too confusing; there's a general explanation how entities in XML and SGML work in this answer
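To illustrate how a parser expands an entity reference inside an attribute value, here is a small sketch using the JDK's DOM parser. It assumes an internal entity named common-dir (a hypothetical name) that holds only the shared directory prefix, so both href values reuse it; the class name EntityHref is also made up for the example. The default DocumentBuilderFactory is not XInclude-aware, so the xi:include elements are left in place and only the entity expansion is shown.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class EntityHref {
    // "common-dir" is a hypothetical entity name holding the shared path prefix.
    static final String XML =
        "<!DOCTYPE tag1 [\n"
        + "  <!ENTITY common-dir \"long/common/part/of/partial/xml/\">\n"
        + "]>\n"
        + "<tag1 xmlns:xi=\"http://www.w3.org/2001/XInclude\">\n"
        + "  <xi:include href=\"&common-dir;file1\"/>\n"
        + "  <xi:include href=\"&common-dir;file2\"/>\n"
        + "</tag1>";

    /** Returns the href of the n-th xi:include element after entity expansion. */
    static String href(int index) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(XML)));
            // The factory is not namespace-aware by default, so match the qualified name.
            Element include = (Element) doc.getElementsByTagName("xi:include").item(index);
            return include.getAttribute("href");
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(href(0));
        System.out.println(href(1));
    }
}
```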
I need to parse an XML file that looks like this:
1.<?xml version="1.0" encoding="UTF-8"?>
2.<Root>
3.<Record>
4.<in><![CDATA[<?xml version="1.0" encoding="UTF-8"?><XML><Attribute AttrID="A">Test</Attribute>-<Attribute AttrID="B"> <![CDATA[Aap Noot Mies]]> </Attribute>]]></XML></in>
5.<out><![CDATA[]]></out>
6.</Record>
7.</Root>
I am getting an error while parsing line number 4. Is there any way to escape a CDATA end token ( ]]> ) within a CDATA section in an XML document?
Your input is not well formed; there are several errors. I think you need to fix whatever generated it so that it produces something more like
<?xml version="1.0" encoding="UTF-8"?>
<Root>
<Record>
<in><![CDATA[<?xml version="1.0" encoding="UTF-8"?><!-- - --><XML><Attribute AttrID="A">Test</Attribute>-<Attribute AttrID="B"> <![CDATA[Aap Noot Mies]]]]><![CDATA[> </Attribute></XML>]]></in>
<out><![CDATA[]]></out>
</Record>
</Root>
Note that the outer CDATA section needs <![CDATA[, not <!CDATA[, and the first use of ]]> needs to be quoted (for example by stopping and restarting the outer CDATA section, as here). The outer ]]> also needs to move after </XML>, so that the end tag of the element is quoted as well as the start tag.
That makes the file technically well formed, although the element name XML (and, in general, names starting with xml in upper or lower case) is reserved by the W3C for use in XML-related specifications and should not be used in user XML files unless it is a specific element or attribute defined by the W3C (such as xmlns).
In addition, I added a (quoted) comment around the dash after the XML declaration. If that CDATA section were extracted and made into an XML document, the dash would make the resulting document non-well-formed, as only white space, comments and PIs are allowed before the first element.
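The stop-and-restart trick can be automated in whatever generates the file. A minimal sketch, assuming the JDK's DOM parser is used to check the round trip; the class name CdataEscaper and both method names are hypothetical:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CdataEscaper {
    /**
     * Wraps a string in a CDATA section. Every "]]>" in the payload is split
     * across two sections: "]]" ends the current one, ">" starts the next.
     */
    static String toCdata(String payload) {
        return "<![CDATA[" + payload.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    /** Embeds the payload in an <in> element, parses it, and returns the text read back. */
    static String roundTrip(String payload) {
        String xml = "<in>" + toCdata(payload) + "</in>";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // getTextContent() concatenates the adjacent CDATA sections.
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String inner = "<XML><Attribute AttrID=\"B\"> <![CDATA[Aap Noot Mies]]> </Attribute></XML>";
        System.out.println(toCdata(inner));
        System.out.println(roundTrip(inner).equals(inner));
    }
}
```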
I want to parse the following type of text. Example 1:
<root>my name is <j> <b> mike</b> </j> </root>
Example 2:
<root> my name is <mytag1 attribute="val" >mike</mytag1> and yours is <mytag2> john</mytag2> </root>
Can I parse it using a DOM parser? I will not have the same format every time. I can have different formats in which the tags are nested, and I don't know the format in advance.
Both these examples are valid XML documents, so there's no reason you can't do this.
If your XML is very simple, especially if it combines text and tags together, you may want to run it via an XSL transformation first, to have a format easier to parse or to convert it to other format, such as HTML.
You can use a DOM parser for the examples you've given - they're valid XML. However, you wouldn't be able to use it for non-XML as per your subject line.
When you say you can have "different formats in which the tags are nested" what exactly do you mean? If it's always simple nesting, e.g.
<root>
<tag1>
<tag2>
<tag3>
Stuff
</tag3>
</tag2>
</tag1>
</root>
Then that will be fine. However, an XML parser won't like markup where an "outer" tag is closed before an "inner" one:
<root>
<tag1>
<tag2>
Stuff
</tag1> <!-- Invalid -->
</tag2>
</root>
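Because a DOM parser builds a generic tree, you don't need to know the tag names or nesting in advance; a recursive walk handles whatever arrives. Here is a minimal sketch using the JDK's DOM API. The class name MixedContentWalker, the method names and the output format (element names in angle brackets, trimmed text as-is) are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class MixedContentWalker {
    private static Document parse(String xml) throws SAXException {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if the parser accepts the input as well-formed XML. */
    static boolean isWellFormed(String xml) {
        try {
            parse(xml);
            return true;
        } catch (SAXException e) {
            return false; // e.g. an outer tag closed before an inner one
        }
    }

    /** Lists elements as "<name>" and non-blank text as trimmed strings, in document order. */
    static List<String> walk(String xml) {
        List<String> out = new ArrayList<>();
        try {
            visit(parse(xml).getDocumentElement(), out);
        } catch (SAXException e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
        return out;
    }

    private static void visit(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            out.add("<" + node.getNodeName() + ">");
            for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
                visit(c, out); // recurse into whatever nesting the input has
            }
        } else if (node.getNodeType() == Node.TEXT_NODE) {
            String text = node.getNodeValue().trim();
            if (!text.isEmpty()) {
                out.add(text);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(walk("<root>my name is <j> <b> mike</b> </j> </root>"));
        System.out.println(isWellFormed("<root><tag1><tag2>Stuff</tag1></tag2></root>"));
    }
}
```

Note that the JDK's default parser may also print a "[Fatal Error]" line to standard error when it rejects the mis-nested input.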