How to split xml to header and items using smooks? - java

I have an XML file roughly like this:
<batch>
<header>
<headerStuff />
</header>
<contents>
<timestamp />
<invoices>
<invoice>
<invoiceStuff />
</invoice>
<!-- Insert 1000 invoice elements here -->
</invoices>
</contents>
</batch>
I would like to split that file into 1000 files, each with the same headerStuff and only one invoice. The Smooks documentation is very proud of its transformation possibilities, but unfortunately I don't want to transform anything.
The only way I've figured out to do this is to repeat the whole structure in FreeMarker. But that feels like unnecessary duplication. The header has about 30 different tags, so there would be a lot of work involved as well.
What I currently have is this:
<?xml version="1.0" encoding="UTF-8"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
xmlns:calc="http://www.milyn.org/xsd/smooks/calc-1.1.xsd"
xmlns:frag="http://www.milyn.org/xsd/smooks/fragment-routing-1.2.xsd"
xmlns:file="http://www.milyn.org/xsd/smooks/file-routing-1.1.xsd">
<params>
<param name="stream.filter.type">SAX</param>
</params>
<frag:serialize fragment="INVOICE" bindTo="invoiceBean" />
<calc:counter countOnElement="INVOICE" beanId="split_calc" start="1" />
<file:outputStream openOnElement="INVOICE" resourceName="invoiceSplitStream">
<file:fileNamePattern>invoice-${split_calc}.xml</file:fileNamePattern>
<file:destinationDirectoryPattern>target/invoices</file:destinationDirectoryPattern>
<file:highWaterMark mark="10"/>
</file:outputStream>
<resource-config selector="INVOICE">
<resource>org.milyn.routing.io.OutputStreamRouter</resource>
<param name="beanId">invoiceBean</param>
<param name="resourceName">invoiceSplitStream</param>
<param name="visitAfter">true</param>
</resource-config>
</smooks-resource-list>
That creates files for each invoice tag, but I don't know how to continue from there to get the header also in the file.
EDIT:
The solution has to use Smooks. We use it in an application as a generic splitter and just create different smooks configuration files for different types of input files.

I just started with Smooks myself. However... your problem sounds identical to this: http://www.smooks.org/mediawiki/index.php?title=V1.5:Smooks_v1.5_User_Guide#Routing_to_File
You will have to provide the output FTL format in whole; that's the downside of using a general-purpose tool, I guess. Data mapping often includes a lot of what feels like redundancy. One way around this is to leverage convention, but that has to be built into the framework.

I don't know Smooks, but the simplest solution (with poor performance) would be, to create the Nth file:
copy the whole XML structure
delete all the invoice tags but the Nth one
I don't know how to do that in Smooks; that's only an idea. With this approach you don't need to duplicate the structure of the XML in a FreeMarker template.
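The copy-and-delete idea above can be sketched in plain Java with the JDK's DOM API. This is not a Smooks configuration; it only illustrates the two steps. The class and method names are made up for this example, and the element name `invoice` is taken from the sample in the question. The whole batch is held in memory and copied once per invoice, which is the poor performance the answer mentions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Smooks.
class InvoiceSplitSketch {

    /** Returns one XML string per invoice; each keeps the header and exactly one invoice. */
    static List<String> split(String batchXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document source = builder.parse(new InputSource(new StringReader(batchXml)));
        int invoiceCount = source.getElementsByTagName("invoice").getLength();

        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        List<String> files = new ArrayList<>();
        for (int keep = 0; keep < invoiceCount; keep++) {
            // Step 1: copy the whole XML structure into a fresh document.
            Document copy = builder.newDocument();
            copy.appendChild(copy.importNode(source.getDocumentElement(), true));

            // Step 2: delete all invoice elements except the one at index `keep`.
            // getElementsByTagName returns a live list, so take a snapshot before removing.
            NodeList live = copy.getElementsByTagName("invoice");
            List<Node> invoices = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                invoices.add(live.item(i));
            }
            for (int i = 0; i < invoices.size(); i++) {
                if (i != keep) {
                    Node invoice = invoices.get(i);
                    invoice.getParentNode().removeChild(invoice);
                }
            }

            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(copy), new StreamResult(out));
            files.add(out.toString());
        }
        return files;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<batch><header><headerStuff/></header><contents><timestamp/>"
                + "<invoices><invoice><id>1</id></invoice><invoice><id>2</id></invoice></invoices>"
                + "</contents></batch>";
        for (String file : split(xml)) {
            System.out.println(file);
        }
    }
}
```

Each returned string could then be written to its own file, for example with `java.nio.file.Files.write`.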

Related

how can i get a matching string from a xml file using <getxmlproperties> tag using ant and replace the matched string in other xml file

My requirement is to match the following tag in the data.xml file and replace the content in the display.xml file using Ant.
data.xml
--------
<data>123456789</data>
display.xml
-----------
<data>abcdefg</data>
I need to match the content in the data.xml file and replace it in the display.xml file.
My final output should be like:
data.xml
--------
<data>123456789</data>
display.xml
-----------
<data>123456789</data>
How can I solve this issue? Thanks in advance.
I did not find any Ant task named GetXmlProperties, but I think you may have thought about this one, XmlProperty, which composes a sequence of properties from parsing an XML document.
Two ways to achieve what you want come to mind (there may be more):
The most primitive would be to use the XmlProperty task to retrieve the value in question and then run a crude Replace task on the destination file, doing a simple string replace that treats the destination as plain text instead of respecting its XML structure. However, string matching and replacing on XML data is no fun and is error-prone.
Thus I propose a second approach, which is to use the XmlTask, as per the following example. Adding a prefix xml to our newly parsed properties makes it easier to distinguish them from the rest. For demo purposes we also let the build process log all the new properties under the xml prefix to the console by using the EchoProperties task.
<project name="SO63816092" default="default">
<taskdef name="xmltask" classname="com.oopsconsultancy.xmltask.ant.XmlTask" />
<xmlproperty file="data.xml" prefix="xml" />
<echoproperties prefix="xml"/>
<target name="default">
<xmltask source="display.xml" dest="display.xml" failWithoutMatch="true">
<replace path="//data/text()" withText="${xml.data}" />
</xmltask>
</target>
</project>
Using the second approach, while using the following input file data.xml:
<?xml version="1.0" encoding="UTF-8"?>
<data>123456789</data>
and the following destination file display.xml:
<?xml version="1.0" encoding="UTF-8"?>
<data>abcdefg</data>
I can successfully accomplish what you ask for and get display.xml to become:
<?xml version="1.0" encoding="UTF-8"?>
<data>123456789</data>
Just remember to place the XmlTask jar on the classpath of your Ant process. Note that you may need to refine the XPath expression we use, //data/text(), should your document structure demand it: as written, it replaces the value of every data element found anywhere in display.xml.

Parse XML in which tag name is not fixed

It is easy to parse XML in which the tag names are fixed. In XStream, we can simply use the @XStreamAlias("tagname") annotation. But how do we parse XML in which a tag name is not fixed? Suppose I have the following XML:
<result>
<result1>
<fixed1> ... </fixed1>
<fixed2> ... </fixed2>
</result1>
<result2>
<item>
<America>
<name> America </name>
<language> English </language>
</America>
</item>
<item>
<Spain>
<name> Spain </name>
<language> Spanish </language>
</Spain>
</item>
</result2>
</result>
Tag names America and Spain are not fixed and sometimes I may get other tag names like Germany, India, etc.
How to define pojo for tag result2 in such case? Is there a way to tell XStream to accept anything as alias name if tag name is not known before-hand?
If it is OK for you to get the name from inside the element itself (the 'name' field), you can use XPath:
//result2/item/*/name/text()
Another option is to select the whole element:
//result2/item/*
or, with XPath 2.0 or later, just its tag name:
//result2/item/*/name()
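As a rough illustration of the XPath approach in Java, the sketch below uses the JDK's built-in XPath 1.0 engine instead of XStream. It assumes the structure shown in the question, where each unknown element sits under result2/item; the class and method names are invented for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class DynamicTagSketch {

    /** Returns "tagName:language" for every element under result2/item, whatever its name is. */
    static List<String> readCountries(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // "*" matches any element, so America, Spain, Germany, ... are all selected.
        NodeList countries = (NodeList) xpath.evaluate("//result2/item/*", doc, XPathConstants.NODESET);

        List<String> result = new ArrayList<>();
        for (int i = 0; i < countries.getLength(); i++) {
            Element country = (Element) countries.item(i);
            // Relative expression, evaluated against the current country element.
            String language = xpath.evaluate("language", country).trim();
            result.add(country.getTagName() + ":" + language);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<result><result2>"
                + "<item><America><name> America </name><language> English </language></America></item>"
                + "<item><Spain><name> Spain </name><language> Spanish </language></Spain></item>"
                + "</result2></result>";
        System.out.println(readCountries(xml));
    }
}
```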
Some technologies (specifically, data binding approaches) are optimized for handling XML whose structure is known at compile time. Others (like DOM and other DOM-like tree models - JDOM, XOM etc) are designed for handling XML whose structure is not known in advance. Use the tool for the job.
XSLT and XQuery try to blend both. In their schema-aware form, they can take advantage of static structure information when it is available. But more usually they are run in "untyped" mode, where there is no a-priori knowledge of element names or structure, and everything is handled as it comes. The XSLT rule-based processing paradigm is particularly well suited to "semi-structured" XML whose content is unpredictable or variable.

find and replace an int in xml file

I have an issue where I need to find and replace an int within an XML file.
Here is an example file:
<?xml version="1.0" encoding="UTF-8"?>
<Data>
<Element time="0.00000" num="10723465" />
<Element time="7.98000" num="10028736" />
<Element time="8.40000" num="94123576" />
</Data>
I want to find and replace the "num" attribute. I have been able to do this with the DOM factory; however, it doesn't keep the order of the attributes. There must be a simpler way to find and replace the num. Any help would be great :)
Advice 1: use an XML library to parse the file, nothing else, or it will be a pain.
Information 2: attribute order doesn't matter in XML, so you can disregard this problem.
See this question for much more detail:
Order of XML attributes after DOM processing
Alternative: use a regex (it can work for very simple XML). There is an example in the question linked above.
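A minimal sketch of the XML-library route, using only the JDK's DOM and Transformer APIs. The class and method names are made up for this example, and the element and attribute names come from the sample file in the question. The serializer may write the attributes in a different order than the input, which, as noted above, does not change the meaning of the XML.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class NumAttributeSketch {

    /** Replaces num="oldValue" with num="newValue" on every Element node and returns the new XML. */
    static String replaceNum(String xml, String oldValue, String newValue) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        NodeList elements = doc.getElementsByTagName("Element");
        for (int i = 0; i < elements.getLength(); i++) {
            Element element = (Element) elements.item(i);
            // Only touch elements whose num attribute has exactly the old value.
            if (oldValue.equals(element.getAttribute("num"))) {
                element.setAttribute("num", newValue);
            }
        }

        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Data>"
                + "<Element time=\"0.00000\" num=\"10723465\" />"
                + "<Element time=\"7.98000\" num=\"10028736\" />"
                + "</Data>";
        System.out.println(replaceNum(xml, "10028736", "42"));
    }
}
```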

Tool or schema that describes Adobe Premiere Pro

I have some external data that I would like to use to create certain edit effects in Adobe Premiere Pro. Rather than editing by hand and adding keyframes over time from my data, I would like to automate this and write or use a tool that creates an XML fragment and updates the project file.
I have looked at the XML and some properties are evident. However, most data is hidden away as comma-separated values, which of course means there's no self-documenting tag name. I am therefore after a schema or documentation that describes the format of some or all effects.
<VideoComponentParam ObjectID="48" ClassID="fe47129e-6c94-4fc0-95d5-c056a517aaf3" Version="8">
<Node Version="1">
<Properties Version="1">
<ECP.Angle.Expanded>false</ECP.Angle.Expanded>
<ECW.Parameter.VelocityHeight>54</ECW.Parameter.VelocityHeight>
</Properties>
</Node>
<RangeLocked>false</RangeLocked>
<ParameterID>5</ParameterID>
<CurrentValue>0.</CurrentValue>
<UnitsString></UnitsString>
<UpperBound>32767.</UpperBound>
<LowerBound>-32768.</LowerBound>
<Keyframes>913287043468800,270.,0,0,0,0.166667,-32.4615,0.166667;914685944772533,91.230003356934,0,0,-32.4615,0.166667,14.5418,0.166667;916236575654400,180.,0,0,14.5418,0.166667,-11.4292,0.166667;920237090572800,0.,0,0,-11.4292,0.166667,0,0.166667;</Keyframes>
<StartKeyframe>-91445760000000000,0.,0,0,0,0,0,0</StartKeyframe>
<ParameterControlType>3</ParameterControlType>
<DiscontinuousInterpolate>false</DiscontinuousInterpolate>
<IsLocked>false</IsLocked>
<IsTimeVarying>true</IsTimeVarying>
<Name>Rotation</Name>
</VideoComponentParam>
The interesting tag is of course Keyframes, which appears to include the keyframe time, the rotation in degrees and some other numbers. I haven't yet deciphered the first value, which is obviously the timestamp.
Any help in understanding the XML is appreciated.
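As a starting point for experimenting with the Keyframes value, here is a small sketch that only splits the string into records and numeric fields. It assumes nothing about the meaning of the fields beyond what is visible in the sample: records are separated by ";" and fields by ",". The class and method names are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; the format itself is undocumented.
class KeyframeSketch {

    /** Splits a Keyframes string into one double[] per keyframe record. */
    static List<double[]> parse(String keyframes) {
        List<double[]> rows = new ArrayList<>();
        for (String record : keyframes.split(";")) {
            if (record.trim().isEmpty()) {
                continue; // skip empty records
            }
            String[] fields = record.split(",");
            double[] values = new double[fields.length];
            for (int i = 0; i < fields.length; i++) {
                // Values such as "270." and "0." are valid input for parseDouble.
                values[i] = Double.parseDouble(fields[i].trim());
            }
            rows.add(values);
        }
        return rows;
    }

    public static void main(String[] args) {
        String sample = "913287043468800,270.,0,0,0,0.166667,-32.4615,0.166667;"
                + "914685944772533,91.230003356934,0,0,-32.4615,0.166667,14.5418,0.166667;";
        for (double[] row : parse(sample)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The first field in the sample is small enough to be represented exactly as a double, so it can be cast back to a long if integer arithmetic on the timestamp is needed.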
ADOBE FORUMS
http://forums.adobe.com/thread/962485
Todd_Kopriva, 14-Feb-2012 00:18, in reply to br4ime: No, there is not any public documentation about the structure of the Premiere Pro project file format.
FINAL CUT PRO XML
I have exported a simple project to Final Cut Pro XML and it appears to be functional but in the above case about rotation over several keyframes, the FCP file has far fewer values.
<parameter authoringApp="PremierePro">
<parameterid>rotation</parameterid>
<name>Rotation</name>
<valuemin>-8640</valuemin>
<valuemax>8640</valuemax>
<value>0</value>
<keyframe>
<when>107634</when>
<value>123</value>
</keyframe>
<keyframe>
<when>107784</when>
<value>124</value>
</keyframe>
<keyframe>
<when>107934</when>
<value>126</value>
</keyframe>
</parameter>
Here is a full description of the Final Cut XML format. It is the same as the Premiere XML.
Go to developer.apple.com and find the document that describes the Final Cut Pro XML format; it's exactly the same as the Premiere Pro XML. The structure is simple. For example, this is the sequence block format:
<?xml version="1.0" encoding="UTF-8"?>
<xmeml version="3">
<sequence>
<name>Sequence 1</name>
<duration></duration>
<rate>. . .</rate>
<timecode>. . .</timecode>
<media>
<video>
<format></format>
<track></track>
</video>
<audio>
<format></format>
<outputs></outputs>
<track></track>
<track></track>
</audio>
</media>
</sequence>
</xmeml>
The direct link is:
https://developer.apple.com/appleapplications/download/FinalCutPro_XML.pdf
The best solution is to make changes and study the file differences with your favourite diffing tool. It's not terribly difficult to understand small fragments and hand-edit the XML. Naturally it's a pain to make a change, reload the project file and observe the changes, but it's doable.

XML Parsing - DOM or SAX - Complex xml with attributes as conditions to access hierarchy in java

<playingTestCodeDetails classCode="ENT" determinerCode="INSTANCE" >
<realmCode code="QD" />
<id assigningAuthorityName="PRMORDCODE" extension="16494" />
<id assigningAuthorityName="TESTNUMINBOOK" extension="16494" />
<code code="16494" codeSystemName="QTIM" displayName="SureSwab Candidiasis" />
<name use=""></name>
<asSeeAlsoCode classCode="ROL" > <!-- Have repeated Seealsocode section for multiple see also codes and stripped names -->
<realmCode code="QD" />
<code code="7600" displayName="Sample See Also Name" ></code>
</asSeeAlsoCode>
<asSeeAlsoCode classCode="ROL" >
<realmCode code="QD" />
<code code="6496" displayName="Sample See Also Name" ></code>
</asSeeAlsoCode>
</playingTestCodeDetails>
<subjectOf typeCode="SBJ">
<realmCode code="QD" />
<order classCode="OBS" moodCode="EVN" >
<realmCode code="QD" />
<performer nullFlavor="" typeCode="PRF"><!-- Have added this to accomodate the UnitCode-->
<performingLocatedEntity classCode="LOCE" nullFlavor="">
<locatedPerformingSite classCode="ORG" determinerCode="INSTANCE">
<id assigningAuthorityName="ASORDERED" extension="16494" />
</locatedPerformingSite>
</performingLocatedEntity>
</performer>
<origin nullFlavor="" typeCode="ORG"> <!-- Have added this to accomodate the Ordering Lab Code-->
<orderingLocatedEntity classCode="LOCE" >
<locatedOrderingSite classCode="ORG" determinerCode="INSTANCE">
<id assigningAuthorityName="PRMORDCODE" extension="16494"/>
<code code="SJC" codeSystemName="QTIM" codeSystem="ORDERINGLABCODE"/>
</locatedOrderingSite>
</orderingLocatedEntity>
</origin>
<pertinentInformation1 typeCode="PERT">
<realmCode code="QD" />
<clinicalInfo classCode="ACT" moodCode="EVN">
<realmCode code="QD" />
<title>Specialitysample1</title>
<text>Conditionsample1</text>
</clinicalInfo>
</pertinentInformation1>
<subjectOf typeCode="SUBJ">
<realmCode code="QD" />
<annotation classCode="ACT" moodCode="EVN" >
<realmCode code="QD" />
<code code="DOSCATNAME"></code>
<text><![CDATA[SureSwab<sup>®</sup>, <em>Candidiasis</em>, PCR]]></text>
</annotation>
</subjectOf>
</subjectOf>
I have an XML document like the one above and want to parse it. What is the best way to parse it: DOM or SAX? (I have heard of JAXB and XSLT, but I'm not sure about those two.) Can DOM and SAX be combined to parse one XML document?
A simple scenario is to get a tag's value based on the "code" attribute:
for example, when code=DOSCATNAME in a tag, we need to take the data of the corresponding tag.
The other scenario is to walk the hierarchy to a tag and read its extension attribute when its assigningAuthorityName attribute has the value PRMORDCODE.
Can these two scenarios be achieved using a parser?
I am a newbie; please help me understand what I need in order to parse this and suggest an approach. Thanks in advance.
Use JAXB. Create a class model and annotate your classes appropriately. The environment will do the rest.
For example, you should create a class PlayingTestCodeDetails with the properties classCode, determinerCode, etc.
What's more, you can ask JAXB to generate the classes for you. Start learning from this article: http://www.roseindia.net/jaxb/r/jaxb.shtml
It will take a couple of hours to get started, but then you will be done in 15 minutes. If you use DOM, you can start after 15 minutes of learning, but then you will spend a couple of days coding to parse your XML.
Which one to use depends on your needs.
Both SAX and DOM are used to parse XML documents. Both have advantages and disadvantages, and either can be used depending on the situation.
SAX
• Parses node by node
• Doesn’t store the XML in memory
• We can't insert or delete a node
• SAX is an event based parser
• SAX is a Simple API for XML
• doesn’t preserve comments
• SAX generally runs a little faster than DOM
DOM
• Stores the entire XML document into memory before processing
• Occupies more memory
• We can insert or delete nodes
• Traverse in any direction.
• DOM is a tree model parser
• Document Object Model (DOM) API
• Preserves comments
• DOM generally runs a little slower than SAX
If we only need to find a node and don't need to insert or delete anything, we can go with SAX; otherwise use DOM, provided we have enough memory.
These are a few parsers:
woodstox
dom4j
In addition to SAX and DOM, there is StAX parsing, available through XMLStreamReader, which is an XML pull parser.
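For the two scenarios in the question (a lookup by the code attribute and a lookup by assigningAuthorityName), one option is a DOM parse combined with XPath predicates. The sketch below is only an illustration: the class and method names are made up, and it assumes the posted fragment is wrapped in a single root element, since the fragment as shown has two top-level elements and is not a well-formed document on its own. It also assumes the text to fetch is the text child of the annotation that contains the matching code element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only.
class AttributeConditionSketch {

    /** Returns { annotation text for code DOSCATNAME, extension of the first PRMORDCODE id }. */
    static String[] lookup(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();

        // Scenario 1: the text of the annotation whose code element has code="DOSCATNAME".
        String text = xpath.evaluate("//annotation[code/@code='DOSCATNAME']/text", doc);

        // Scenario 2: the extension attribute of the first id (in document order)
        // whose assigningAuthorityName is PRMORDCODE.
        String extension = xpath.evaluate("//id[@assigningAuthorityName='PRMORDCODE']/@extension", doc);

        return new String[] { text, extension };
    }

    public static void main(String[] args) throws Exception {
        String xml = "<root>"
                + "<playingTestCodeDetails><id assigningAuthorityName=\"PRMORDCODE\" extension=\"16494\"/>"
                + "</playingTestCodeDetails>"
                + "<subjectOf><annotation><code code=\"DOSCATNAME\"/>"
                + "<text><![CDATA[SureSwab, <em>Candidiasis</em>, PCR]]></text></annotation></subjectOf>"
                + "</root>";
        String[] result = lookup(xml);
        System.out.println(result[0]);
        System.out.println(result[1]);
    }
}
```

The same conditions could be checked in a SAX or StAX handler by tracking the current element and its attributes, at the cost of more bookkeeping code.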