I'm unmarshaling a couple of large XML files.
they have the common part and I decided to write the common parts in separate XML file and then include it using xi:include tag.
it looks like this:
<tag1>
<tag2>
</tag2>
<tag3>
</tag3>
<xi:include href = "long/common/part/of/partial/xml/file1"/>
<xi:include href = "long/common/part/of/partial/xml/file2"/>
</tag1>
at this moment I would like to parametrize the long/common/part.
I tried to define a variable using xsl:variable like this
<xsl:variable name="test">
"long/common/part/of/partial/xml/"
</xsl:variable>
but the assigning value to href was a problem, neither the
<xi:include href = "{$test}"/>
or
<xi:include href = <xsl:value-of select="test"/>
wasn't working.
Is there a way to assign value to XML attribute?
You're mixing XInclude, XSLT, and ad-hoc {$var} syntax (not part of XML) here. What you can do to parametrize a href value in XInclude elements is to use an entity reference (XML's and SGML's mechanism for text substitution variables among other things):
<xi:include href="&href-value;"/>
where href-value must be bound to the string long/common/part/of/partial/xml/file1 either programmatically, or (preferably) by declaring it in the prolog eg:
<!DOCTYPE tag1 [
<!ENTITY href-value "long/common/part/of/partial/xml/file1">
]>
<tag1>
<xi:include href = "&href-value;"/>
</tag1>
However, since now you're using entity references anyway, you can achieve just the same with just entities, and without XInclude at all:
<!DOCTYPE tag1 [
<!ENTITY common-part SYSTEM "long/common/part/of/partial/xml/file1">
]>
<tag1>
&common-part;
</tag1>
This pulls the content of long/common/part/of/partial/xml/file1 into the common-part entity, then references that value in content, with the XML parser treating the document as if the replacement value for common-part (eg. whatever is stored in long/common/part/of/partial/xml/file1) had been specified directly in the document.
Hope this isn't too confusing; there's a general explanation how entities in XML and SGML work in this answer
Related
It is easy to parse XML in which tags name are fixed. In XStream, we can simply use #XStreamAlias("tagname") annotation. But how to parse XML in which tag name is not fixed. Suppose I have following XML :
<result>
<result1>
<fixed1> ... </fixed1>
<fixed2> ... </fixed2>
</result1>
<result2>
<item>
<America>
<name> America </name>
<language> English </language>
</America>
</item>
<item>
<Spain>
<name> Spain </name>
<language> Spanish </language>
</Spain>
</item>
</result2>
</result>
Tag names America and Spain are not fixed and sometimes I may get other tag names like Germany, India, etc.
How to define pojo for tag result2 in such case? Is there a way to tell XStream to accept anything as alias name if tag name is not known before-hand?
if it is ok for you to get the tag from inside the tag itself (field 'name'), using Xpath, you can do:
//result2/*/name/text()
another option could be to use the whole element, like:
//result2/*
or also:
//result2/*/name()
Some technologies (specifically, data binding approaches) are optimized for handling XML whose structure is known at compile time. Others (like DOM and other DOM-like tree models - JDOM, XOM etc) are designed for handling XML whose structure is not known in advance. Use the tool for the job.
XSLT and XQuery try to blend both. In their schema-aware form, they can take advantage of static structure information when it is available. But more usually they are run in "untyped" mode, where there is no a-priori knowledge of element names or structure, and everything is handled as it comes. The XSLT rule-based processing paradigm is particularly well suited to "semi-structured" XML whose content is unpredictable or variable.
We have a Java-based system that reads data from a database, merges individual data fields with preset XSL-FO tags and converts the result to PDF with Apache FOP.
In XSL-FO format it looks like this:
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE Html [
<!ENTITY nbsp " ">
<!-- all other entities -->
]>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:fo="http://www.w3.org/1999/XSL/Format">
<xsl:output method="xml" indent="yes" />
<xsl:template match="/">
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format" xmlns:svg="http://www.w3.org/2000/svg" font-family="..." font-size="...">
<fo:layout-master-set>
<fo:simple-page-master master-name="Letter Page" page-width="8.500in" page-height="11.000in">
<!-- appropriate settings -->
</fo:simple-page-master>
</fo:layout-master-set>
<fo:page-sequence master-reference="Letter Page">
<!-- some static content -->
<fo:flow flow-name="xsl-region-body">
<fo:block>
<fo:table ...>
<fo:table-column ... />
<fo:table-body>
<fo:table-row>
<fo:table-cell ...>
<fo:block text-align="...">
<fo:inline font-size="..." font-weight="...">
<!-- Header / Title -->
</fo:inline>
</fo:block>
</fo:table-cell>
</fo:table-row>
</fo:table-body>
</fo:table>
</fo:block>
<fo:block>
<fo:table ...>
<fo:table-column ... />
<fo:table-body>
<fo:table-row>
<fo:table-cell>
<fo:block ...>
<!-- Field A -->
</fo:block>
</fo:table-cell>
</fo:table-row>
</fo:table-body>
</fo:table>
<!-- Other fields in a very similar fashion as the above "Field A" -->
</fo:block>
</fo:flow>
</fo:page-sequence>
</fo:root>
</xsl:template>
</xsl:stylesheet>
Now I am looking for a way to allow some of the fields to contain static HTML-formatted content. This content will be generated by our HTML-enabled editor (something along the lines of CLEditor, CKEditor, etc.) or pasted from outside.
My plan is to follow the recipe from this JavaWorld article:
use JTidy to convert HTML-formatted string to proper XHTML
further modify xhtml2fo.xsl from Antenna House to remove all document-wide and page-wide transformations
apply this modified XSLT to my XHTML string (javax.xml.transform)
extract all the nodes under the root with XPath (javax.xml.xpath)
feed the result directly into existing XSL-FO document
I have a bare-bone version of such code and got the following error:
(Location of error unknown)org.apache.fop1.fo.ValidationException:
"{http://www.w3.org/1999/XSL/Format}table-body" is not a valid child
of "fo:block"! (No context info available)
My questions:
What would be the way to troubleshoot this issue?
Can <fo:block> serve as a generic container with other objects (including tables) nested inside?
Is this an overall reasonable approach to solving the task?
If someone already "been there done that", please share your experience.
If you use an XSLT debugger such as in oXygen or XML Spy, then you can step through the transformation. With oXygen -- not sure about XML Spy or other editors -- if you click on the markup in the debugger output, oXygen highlights the markup from both the source and the stylesheet that produced that node.
Once you have the FO, the focheck framework (https://github.com/AntennaHouse/focheck) has the most complete validation of FO currently available.
fo:block can contain tables, etc. In the XSL 1.1 spec, the definition of every FO includes a 'Contents' subsection that lists its allowed content. See, e.g., http://www.w3.org/TR/xsl11/#fo_block. The definitions of the 'parameter entities' in the content models are at http://www.w3.org/TR/xsl11/#d0e6532, but some FOs have additional restrictions in the text of their definitions.
The article that you cite doesn't seem to have the 'extract all the nodes under the root with XPath' step, and I'm not sure why you need it. Other than that, it looks like a reasonable approach for doing the job using Java.
Instead of inserting the FO transformed from your JTidy-ed HTML into the static FO, you could replace your <!-- Field A --> with non-FO markup that provides enough information to make a reference to the field to insert. You can then make an XSLT stylesheet that transforms the template+references document into straight FO by doing an identity transform on the FO parts -- as in the answer from #kevin-brown -- and using the information in the reference markup to construct the URI to use with the document() function (http://www.w3.org/TR/xslt#document) to find the markup to insert.
If the FO for the field content is sitting on the disk, then using document() is straightforward. If it's not, then you'd have to do something like overriding the URIResolver used by the XSLT processor so that, rather than looking on the disk, it does the right thing to retrieve the content. You may even be able to have the JTidying happen as part of the URIResolver retrieving the HTML. You could also do the transformation to FO 'inside' the URIResolver or, also as #kevin-brown suggested, do it as a separate mode. If the transformation is done before or during the URIResolver retrieving the FO, then the 'main' transformation of template+references to FO just needs to extract the right part of the FO sub-document, e.g. document('constructed-URI')/fo:root/fo:page-sequence/*. However, if you're modifying the stylesheet from Antenna House, then you should be able to modify it to not produce an outer fo:root, etc., anyway.
I did something similar years ago with overriding the URI resolver for the libxslt XSLT processor for an XSLT-based server: the context for successive runs of the inner XSLT processor was saved as documents at special URIs and weren't necessarily written to the file system at all.
You could, instead, possibly write an extension function that does the lookup of the references to the fields. The Print and Page Layout Community Group # W3C, for example, has produced extension functions for multiple XSLT processors that runs an FO processor in the middle of the XSLT transformation to get back the XML for an area tree for the formatted result. See http://www.w3.org/community/ppl/wiki/XSLTExtensions
The best way to troubleshoot is to use a validating viewer/editor to examine the XSL FO. Many (such as oXygen) will show you errors in XSL FO structure as you open them and they will describe the issue (just as the error reported).
In your case, you obviously have an fo:table-body as a child of fo:block. It cannot be. An fo:table-body have but one valid parent, fo:table. You are either missing the fo:table tag or you have erroneously inserted an fo:block in this position.
In my opinion, I might do things slightly different. I would put the XHTML content inline into the XSL FO right where you want it. Then I would create an identity transform that copies over all the content that is fo-based, but converts the XHTML parts using XSL. This way, you can actually step that transform in an XSL editor like oXygen and see where errors occur and exactly why. Like any other degugger.
Note: You may wish to look at other XSLs also, especially if your HTML may have any style="" CSS attributes. If this is the case it is not simple HTML, then you will need a better method for processing the HTML with CSS to FO.
http://www.cloudformatter.com/css2pdf is based on this complete transform. That general stylesheet is available here: http://xep.cloudformatter.com/doc/XSL/xeponline-fo-translate-2.xsl
I am the author of that stylesheet. It does much more than you ask, but has a fairly complex parsing recursion for converting CSS styling into XSL FO attributes.
I have this XML:
<Body xmlns:wsu="http://mynamespace">
<Ticket xmlns="http://othernamespace">
<Customer xlmns="">Robert</Customer>
<Products xmlns="">
<Product>a product</>
</Products>
</Ticket>
<Delivered xmlns="" />
<Payment xlmns="">cash</Payment>
</Body>
I am using Java to read it as a DOM document. I want remove the empty namespace attributes (i.e., xmlns=""). Is there any way to do that?
You need to understand that xmlns is a very special attribute. Basically, the xmlns="" is so that your Customer element is in the "unnamed" namespace, rather than the http://othernamespace namespace (and likewise for other elements which would otherwise inherit a default namespace from their ancestors).
If you want to get rid of the xmlns="", you basically need to put the elements into the appropriate namespace - so it's changing the element name. I don't think the W3C API lets you change the name of an element - you may well need to create a new element with the appropriate namespaced-name, and copy the content. Or if you're responsible for creating the document to start with, just use the right namespace.
I have One XML Like
<root>
<name id="1">Abc</name>
<salary>25000</salary>
</root>
I want something like this
<root>
<name id="1,2">Abc</name>
<salary>25000</salary>
</root>
I am able to create the attribute by using DOM parser as:
Document doc = _docBuilder.newDocument();`
Attr attr = doc.createAttribute("id");
attr.setValue("1");
name.setAttributeNode(attr);
How can I get multiple attribute values for the same attribute.
XML does not support attributes with multiple values.
You could certainly do: attr.setValue("1,2");
However that really isn't very XML friendly. Also, you probably shouldn't have more than one value for an id. You may wish to consider something like this:
<thing>
<name>Abc</name>
<reference_ids>
<id>1</id>
<id>2</id>
</reference_ids>
</thing>
i want to parse the following type of text. Example1
<root>my name is <j> <b> mike</b> </j> </root>
example 2
<root> my name is <mytag1 attribute="val" >mike</mytag1> and yours is <mytag2> john</mytag2> </root>
can i parse it using a DOM parser?I will not have the same format evry time .I can have different formats in which the tags are nested.I dont know the format in advance.
Both these examples are valid XML documents so there's no reason you can;t do this.
If your XML is very simple, especially if it combines text and tags together, you may want to run it via an XSL transformation first, to have a format easier to parse or to convert it to other format, such as HTML.
You can use a DOM parser for the examples you've given - they're valid XML. However, you wouldn't be able to use it for non-XML as per your subject line.
When you say you can have "different formats in which the tags are nested" what exactly do you mean? If it's always simple nesting, e.g.
<root>
<tag1>
<tag2>
<tag3>
Stuff
</tag3>
</tag2>
</tag1>
</root>
Then that will be fine. However, an XML parser won't like markup where an "outer" tag is closed before an "inner" one:
<root>
<tag1>
<tag2>
Stuff
</tag1> <!-- Invalid -->
</tag2>
</root>