XLIFF, versioning or translation updates process (Diff Leverage step) - java

I am considering using XLIFF to standardize localization efforts within the enterprise.
I am very new to XLIFF. After some research, I came up with the following general process:
Extract the strings from the development project's resources (.resx for .NET, .properties for Java). A good way to do this, I found, is Rainbow from the Okapi Localization Toolbox, using the "Utilities => Translation Kit Creation" command.
Translate the extracted file, for example with the Virtaal application, as described at
http://www.opentag.com/okapi/wiki/index.php?title=How_to_Translate_XLIFF_Documents
Finally, convert the translated XLIFF back into the original format (resx/properties), which Rainbow can also do via "Utilities => Translation Kit Post-Processing".
So far everything is clear. However, I would like to know the best practices for adding or modifying string resources. I would prefer not to have all resources re-translated every time a new string is added to the resources in the original format (resx/properties).
It would also be great to have versioning support for the translations, so that the translations for multiple languages are consolidated (provide the same set of strings) when they are marked with the same version, and the version is updated when new strings are added or existing strings are modified.
Is there a ready-to-use solution for this, or is it something we will have to build on our own?
EDIT:
I found the Diff Leverage step in Okapi Rainbow's pipeline library, but I am having difficulty getting it to work. Here are two XLIFF files. The first is the first version of the resources, translated into French; the second is a file generated from the new version of the resources, with the following changes:
1 string updated (AdminTitleResource is now Administration Resource)
1 string removed (HomeLinkResource is gone)
2 new strings added (Project and Company)
But running Diff Leverage pipeline doesn't produce a smart merge of the translations. Any ideas why?
The translated xliff for previous version of resources:
<?xml version="1.0" encoding="windows-1252"?><xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:okp="okapi-framework:xliff-extensions" version="1.2">
<file original="/Messages.resx" source-language="en-us" target-language="fr-fr" datatype="xml">
<body>
<trans-unit id="1" resname="AccessDenied" xml:space="preserve" approved="yes">
<source xml:lang="en-us">Access denied</source>
<target xml:lang="fr-fr" state="translated">Accès refusé </target>
<note>Error message</note>
</trans-unit>
<trans-unit id="2" resname="AdminTitleResource" xml:space="preserve" approved="yes">
<source xml:lang="en-us">Administration</source>
<target xml:lang="fr-fr" state="translated">Administration</target>
<note></note>
</trans-unit>
<trans-unit id="3" resname="HomeLinkResource" xml:space="preserve" approved="yes">
<source xml:lang="en-us">Main page</source>
<target xml:lang="fr-fr" state="translated">Page web principale</target>
<note></note>
</trans-unit>
<trans-unit id="4" resname="SelectCategoriesResource" xml:space="preserve" approved="yes">
<source xml:lang="en-us">Categories</source>
<target xml:lang="fr-fr" state="translated">Catégories</target>
<note></note>
</trans-unit>
<trans-unit id="5" resname="SelectConfigResource" xml:space="preserve">
<source xml:lang="en-us">Configuration</source>
<target xml:lang="fr-fr" state="needs-review-translation">Paramètres</target>
<note></note>
</trans-unit>
<trans-unit id="6" resname="SelectGroupsResource" xml:space="preserve">
<source xml:lang="en-us">User groups</source>
<target xml:lang="fr-fr" state="needs-review-translation">Utiliser le groupe</target>
<note></note>
</trans-unit>
</body>
</file>
</xliff>
How do I get an XLIFF file containing only the strings that need to be translated?
The new file with changes listed above:
<?xml version="1.0" encoding="windows-1252"?><xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:okp="okapi-framework:xliff-extensions" version="1.2">
<file original="/Messages_v2.resx" source-language="en-us" target-language="fr-fr" datatype="xml">
<body>
<trans-unit id="1" resname="AccessDenied" xml:space="preserve">
<source xml:lang="en-us">Access denied</source>
<target xml:lang="fr-fr">Access denied</target>
<note>Error message</note>
</trans-unit>
<trans-unit id="2" resname="AdminTitleResource" xml:space="preserve">
<source xml:lang="en-us">Administration Resource</source>
<target xml:lang="fr-fr">Administration Resource</target>
<note></note>
</trans-unit>
<trans-unit id="3" resname="SelectCategoriesResource" xml:space="preserve">
<source xml:lang="en-us">Categories</source>
<target xml:lang="fr-fr">Categories</target>
<note></note>
</trans-unit>
<trans-unit id="4" resname="SelectConfigResource" xml:space="preserve">
<source xml:lang="en-us">Configuration</source>
<target xml:lang="fr-fr">Configuration</target>
<note></note>
</trans-unit>
<trans-unit id="5" resname="SelectGroupsResource" xml:space="preserve">
<source xml:lang="en-us">User groups</source>
<target xml:lang="fr-fr">User groups</target>
<note></note>
</trans-unit>
<trans-unit id="6" resname="Project" xml:space="preserve">
<source xml:lang="en-us">Project</source>
<target xml:lang="fr-fr">Project</target>
<note></note>
</trans-unit>
<trans-unit id="7" resname="Company" xml:space="preserve">
<source xml:lang="en-us">Company</source>
<target xml:lang="fr-fr">Company</target>
<note></note>
</trans-unit>
</body>
</file>
</xliff>

There is an answer for this here:
http://tech.groups.yahoo.com/group/okapitools/message/2494
EDIT: content of linked message
Hi Paul,
I am trying to figure out how to use the diff leverage
to improve the translation experience and get an
update xliff file when merging/leveraging existing
translation with new version of the document with
added/modified/removed strings.
As Jim noted, with XLIFF files you may be able to take advantage of
ID-based steps.
But the Diff leverage step would work too. Here is how to do it:
I've assumed you have the XLIFF files and just want to update them.
You could create pipelines that do additional things like create a
translation kit etc. but this will keep things simple.
First you need to put the new source file in the Input List 1 and the
translated file in the Input List 2.
Then you can create the following pipeline:
Raw document to Filter Events
Diff Leveraging
Filter Events to Raw Document
In the parameters for the Diff Leverage step: make sure the option
"Copy to/over the target" is set.
Then execute the pipeline.
I've attached a comparison (compare_out.html) between the original new
file and the output file. As you can see all the text that could be
leveraged is now in the output. Your entry 'AdminTitleResource' is not
translated because its source in the translated file is
different, and your two new entries are also not translated.
You'll also note the new attribute approved='yes', which is there to
indicate the translation was done. That extra flag can be used to
differentiate entries that need translation from the ones that have
been leveraged.
For some reason two of the leveraged entries do not have it: I'll have
to look at that and report back. It may be a bug or some condition I
don't recall (maybe Jim does).
The Id-Based Copy step could almost be better. It would copy the
translated text by matching on the resname of the entries. I say
almost because currently it does not look at the source texts, so you
get the translation even if the new source is different (it's not a
'leveraging' step). But we could add an option to make that extra
check and that would make the step work like a leveraging step. I'll
try to find the time to do this.
Hope this helps,
-yves
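
The "match on resname, but also check the source text" idea from the message above can be sketched in plain Java with the JDK's DOM parser. This is only an illustration under stated assumptions, not Okapi's Id-Based Copy or Diff Leverage step: the class XliffLeverage and its methods are hypothetical names, and marking unmatched entries with state="needs-translation" (a state value defined by XLIFF 1.2) is my own choice.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper (not an Okapi class): copies a translation only when
// both the resname and the source text are unchanged.
class XliffLeverage {

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // needed for the "*" namespace wildcard below
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse XLIFF", e);
        }
    }

    // First descendant element with the given local name, in any namespace.
    static Element child(Element unit, String localName) {
        NodeList found = unit.getElementsByTagNameNS("*", localName);
        return found.getLength() == 0 ? null : (Element) found.item(0);
    }

    // Copies old translations into newDoc and returns how many units were leveraged.
    static int leverage(Document oldDoc, Document newDoc) {
        Map<String, Element> oldByName = new HashMap<>();
        NodeList oldUnits = oldDoc.getElementsByTagNameNS("*", "trans-unit");
        for (int i = 0; i < oldUnits.getLength(); i++) {
            Element unit = (Element) oldUnits.item(i);
            oldByName.put(unit.getAttribute("resname"), unit);
        }
        int leveraged = 0;
        NodeList newUnits = newDoc.getElementsByTagNameNS("*", "trans-unit");
        for (int i = 0; i < newUnits.getLength(); i++) {
            Element unit = (Element) newUnits.item(i);
            Element source = child(unit, "source");
            Element target = child(unit, "target");
            if (source == null || target == null) {
                continue; // nothing to compare or nowhere to copy to
            }
            Element old = oldByName.get(unit.getAttribute("resname"));
            Element oldSource = old == null ? null : child(old, "source");
            Element oldTarget = old == null ? null : child(old, "target");
            boolean sameSource = oldSource != null && oldTarget != null
                    && oldSource.getTextContent().equals(source.getTextContent());
            if (sameSource) {
                target.setTextContent(oldTarget.getTextContent());
                if (oldTarget.hasAttribute("state")) {
                    target.setAttribute("state", oldTarget.getAttribute("state"));
                }
                leveraged++;
            } else {
                // New or modified string: flag it for the translator.
                target.setAttribute("state", "needs-translation");
            }
        }
        return leveraged;
    }

    public static void main(String[] args) {
        String head = "<xliff xmlns='urn:oasis:names:tc:xliff:document:1.2'><file><body>";
        String tail = "</body></file></xliff>";
        Document oldDoc = parse(head
                + "<trans-unit id='1' resname='SelectConfigResource'><source>Configuration</source>"
                + "<target state='translated'>Param\u00e8tres</target></trans-unit>" + tail);
        Document newDoc = parse(head
                + "<trans-unit id='1' resname='SelectConfigResource'><source>Configuration</source>"
                + "<target>Configuration</target></trans-unit>"
                + "<trans-unit id='2' resname='Project'><source>Project</source>"
                + "<target>Project</target></trans-unit>" + tail);
        System.out.println("Leveraged units: " + leverage(oldDoc, newDoc));
    }
}
```

Applied to the two files from the question, this logic would reuse the translations for AccessDenied, SelectCategoriesResource, SelectConfigResource and SelectGroupsResource, and flag AdminTitleResource, Project and Company as needing translation.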

Related

Include tag in jaxb

I have an xml file of following structure:
<root>
<paramsToInclude>
<params id="id1">
<param11>val1</param11>
<param12>val2</param12>
<param13>val3</param13>
<param14>val4</param14>
</params>
<params id="id3">
<param31>val1</param31>
<param32>val2</param32>
</params>
</paramsToInclude>
<process>
<subprocess1>
<include params="id1"/>
<query>
SELECT *
FROM
table;
</query>
</subprocess1>
<subprocess2>
<rule>rule1</rule>
<rule>rule2</rule>
</subprocess2>
<subprocess3>
<processParam>val1</processParam>
<include params="id2"/>
<include params="id3"/>
</subprocess3>
</process>
</root>
I'm using JAXB to parse this XML into Java classes. Is there a way to replace the includes in the process with the values defined at the beginning of the file? I mean, I want the file to be parsed as if it looked like this:
<root>
<paramsToInclude>
<params id="id1">
<param11>val1</param11>
<param12>val2</param12>
<param13>val3</param13>
<param14>val4</param14>
</params>
<params id="id3">
<param31>val1</param31>
<param32>val2</param32>
</params>
</paramsToInclude>
<process>
<subprocess1>
<param11>val1</param11>
<param12>val2</param12>
<param13>val3</param13>
<param14>val4</param14>
<query>
SELECT *
FROM
table;
</query>
</subprocess1>
<subprocess2>
<rule>rule1</rule>
<rule>rule2</rule>
</subprocess2>
<subprocess3>
<processParam>val1</processParam>
<param11>val1</param11>
<param12>val2</param12>
<param13>val3</param13>
<param14>val4</param14>
<param31>val1</param31>
<param32>val2</param32>
</subprocess3>
</process>
</root>
Is it possible to do this? I found http://thetechietutorials.blogspot.com/2011/08/jaxb-tutorial-part-2-jaxb-with-xinclude.html which shows how to include from another file, but a comment there says it is impossible to do this within the same file (I understand that I could put the includes in another XML file, but I don't think that's the best way). I also don't want to use a HashMap, because then the included params would be stored in the HashMap while processParam (from subprocess3) would be a class variable.
Is there a way to do this somehow?
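
One possible approach, sketched here under assumptions rather than offered as a JAXB feature: parse the file into a DOM first, replace each include element with copies of the children of the params element it refers to, and only then hand the tree to JAXB. The class name IncludeExpander is hypothetical, and silently dropping an include whose id is not defined (such as id2 in the sample) is my own choice.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical pre-processing step: expands <include params="..."/> in place.
class IncludeExpander {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse document", e);
        }
    }

    static void expand(Document doc) {
        // Index every <params id="..."> block by its id.
        Map<String, Element> paramsById = new HashMap<>();
        NodeList params = doc.getElementsByTagName("params");
        for (int i = 0; i < params.getLength(); i++) {
            Element block = (Element) params.item(i);
            paramsById.put(block.getAttribute("id"), block);
        }

        // Snapshot the live NodeList, because the tree is modified while looping.
        NodeList live = doc.getElementsByTagName("include");
        List<Element> includes = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            includes.add((Element) live.item(i));
        }

        for (Element include : includes) {
            Node parent = include.getParentNode();
            Element source = paramsById.get(include.getAttribute("params"));
            if (source != null) {
                // Copy each child element of the params block in front of the include.
                for (Node c = source.getFirstChild(); c != null; c = c.getNextSibling()) {
                    if (c.getNodeType() == Node.ELEMENT_NODE) {
                        parent.insertBefore(c.cloneNode(true), include);
                    }
                }
            }
            // The include itself is removed; unknown ids simply expand to nothing.
            parent.removeChild(include);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<root><paramsToInclude><params id='id1'>"
                + "<param11>val1</param11><param12>val2</param12></params></paramsToInclude>"
                + "<process><subprocess1><include params='id1'/><query>SELECT 1</query>"
                + "</subprocess1></process></root>");
        expand(doc);
        System.out.println("Remaining includes: "
                + doc.getElementsByTagName("include").getLength());
    }
}
```

The expanded Document can then be passed to JAXB through Unmarshaller.unmarshal(Node), so the mapped classes never see the include elements.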

Tool or schema that describes Adobe Premiere Pro

I have some external data that I would like to use to create certain edit effects in Adobe Premiere Pro. Rather than adding keyframes over time by hand from my data, I would like to automate this and write (or use) a tool that creates an XML fragment and updates the project file.
I have looked at the XML, and some properties are evident. However, most of the data is hidden away as comma-separated values, which of course means there's no self-documenting tag name. I am therefore after a schema or documentation that describes the format of some or all effects.
<VideoComponentParam ObjectID="48" ClassID="fe47129e-6c94-4fc0-95d5-c056a517aaf3" Version="8">
<Node Version="1">
<Properties Version="1">
<ECP.Angle.Expanded>false</ECP.Angle.Expanded>
<ECW.Parameter.VelocityHeight>54</ECW.Parameter.VelocityHeight>
</Properties>
</Node>
<RangeLocked>false</RangeLocked>
<ParameterID>5</ParameterID>
<CurrentValue>0.</CurrentValue>
<UnitsString></UnitsString>
<UpperBound>32767.</UpperBound>
<LowerBound>-32768.</LowerBound>
<Keyframes>913287043468800,270.,0,0,0,0.166667,-32.4615,0.166667;914685944772533,91.230003356934,0,0,-32.4615,0.166667,14.5418,0.166667;916236575654400,180.,0,0,14.5418,0.166667,-11.4292,0.166667;920237090572800,0.,0,0,-11.4292,0.166667,0,0.166667;</Keyframes>
<StartKeyframe>-91445760000000000,0.,0,0,0,0,0,0</StartKeyframe>
<ParameterControlType>3</ParameterControlType>
<DiscontinuousInterpolate>false</DiscontinuousInterpolate>
<IsLocked>false</IsLocked>
<IsTimeVarying>true</IsTimeVarying>
<Name>Rotation</Name>
</VideoComponentParam>
The interesting tag is of course Keyframes, which appears to hold, for each keyframe, a timestamp, the rotation in degrees, and some other numbers. I haven't yet deciphered the first value, which is obviously the timestamp.
Any help in understanding the XML is appreciated.
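
While the format remains undocumented, the Keyframes string can at least be split into records for inspection. The sketch below assumes only what is visible in the sample: records are separated by semicolons, fields by commas, the first field is an integer timestamp and the second is the parameter value. The class KeyframeParser is a hypothetical name, and the remaining fields are kept as raw strings because their meaning is unknown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for the <Keyframes> text; field meanings are guesses.
class KeyframeParser {

    static final class Keyframe {
        final long time;      // first field, presumed timestamp (unit not documented)
        final double value;   // second field, presumed parameter value (degrees here)
        final String[] rest;  // remaining fields, meaning unknown, kept verbatim

        Keyframe(long time, double value, String[] rest) {
            this.time = time;
            this.value = value;
            this.rest = rest;
        }
    }

    static List<Keyframe> parse(String text) {
        List<Keyframe> frames = new ArrayList<>();
        for (String record : text.split(";")) {
            String trimmed = record.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray separators
            }
            String[] fields = trimmed.split(",");
            if (fields.length < 2) {
                throw new IllegalArgumentException("Unexpected keyframe record: " + trimmed);
            }
            frames.add(new Keyframe(
                    Long.parseLong(fields[0].trim()),
                    Double.parseDouble(fields[1].trim()), // accepts values such as "270."
                    Arrays.copyOfRange(fields, 2, fields.length)));
        }
        return frames;
    }

    public static void main(String[] args) {
        String sample = "913287043468800,270.,0,0,0,0.166667,-32.4615,0.166667;"
                + "914685944772533,91.230003356934,0,0,-32.4615,0.166667,14.5418,0.166667;";
        for (Keyframe k : parse(sample)) {
            System.out.println(k.time + " -> " + k.value + " (" + k.rest.length + " more fields)");
        }
    }
}
```

Printing the parsed records side by side for a project saved before and after a small change makes it easier to see which field moved.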
ADOBE FORUMS
http://forums.adobe.com/thread/962485
Todd_Kopriva, 14-Feb-2012 00:18, in reply to br4ime: No, there is not any public documentation about the structure of the Premiere Pro project file format.
FINAL CUT PRO XML
I have exported a simple project to Final Cut Pro XML and it appears to be functional but in the above case about rotation over several keyframes, the FCP file has far fewer values.
<parameter authoringApp="PremierePro">
<parameterid>rotation</parameterid>
<name>Rotation</name>
<valuemin>-8640</valuemin>
<valuemax>8640</valuemax>
<value>0</value>
<keyframe>
<when>107634</when>
<value>123</value>
</keyframe>
<keyframe>
<when>107784</when>
<value>124</value>
</keyframe>
<keyframe>
<when>107934</when>
<value>126</value>
</keyframe>
</parameter>
The Final Cut Pro XML format is fully documented, and it is the same as the Premiere Pro XML.
Go to developer.apple.com and find the document that describes the Final Cut Pro XML format. The structure is simple; for example, this is the sequence block format:
<?xml version="1.0" encoding="UTF-8"?>
<xmeml version="3">
<sequence>
<name>Sequence 1</name>
<duration></duration>
<rate>. . .</rate>
<timecode>. . .</timecode>
<media>
<video>
<format></format>
<track></track>
</video>
<audio>
<format></format>
<outputs></outputs>
<track></track>
<track></track>
</audio>
</media>
</sequence>
</xmeml>
direct link is:
https://developer.apple.com/appleapplications/download/FinalCutPro_XML.pdf
The best solution is to make changes and study the file differences with your favourite diffing tool. It's not terribly difficult to understand small fragments and hand-edit the XML. Naturally, it's a pain to make a change, reload the project file, and observe the result, but it's doable.
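
For the Final Cut Pro XML route, generating the keyframe elements from external data is simple string building. The sketch below only mirrors the keyframe, when and value tags seen in the export from the question; the class FcpKeyframes is a hypothetical name, the unit of when is taken as-is from the export, and whether Premiere Pro accepts values written with a decimal point (123.0 rather than 123) is an assumption to verify by importing a test file.

```java
// Hypothetical generator for <keyframe> fragments of a Final Cut Pro XML parameter.
class FcpKeyframes {

    // Builds one <keyframe> element per (when, value) pair, in the given order.
    static String keyframesXml(long[] when, double[] value) {
        if (when.length != value.length) {
            throw new IllegalArgumentException("when and value must have the same length");
        }
        StringBuilder xml = new StringBuilder();
        for (int i = 0; i < when.length; i++) {
            xml.append("<keyframe>\n")
               .append("  <when>").append(when[i]).append("</when>\n")
               .append("  <value>").append(value[i]).append("</value>\n")
               .append("</keyframe>\n");
        }
        return xml.toString();
    }

    public static void main(String[] args) {
        // Values taken from the export shown in the question.
        long[] when = {107634, 107784, 107934};
        double[] value = {123, 124, 126};
        System.out.print(keyframesXml(when, value));
    }
}
```

The generated fragment would replace the existing keyframe elements inside the parameter block before the file is imported again.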

How to split xml to header and items using smooks?

I have an XML file roughly like this:
<batch>
<header>
<headerStuff />
</header>
<contents>
<timestamp />
<invoices>
<invoice>
<invoiceStuff />
</invoice>
<!-- Insert 1000 invoice elements here -->
</invoices>
</contents>
</batch>
I would like to split that file into 1000 files, each with the same headerStuff and only one invoice. The Smooks documentation is very proud of its transformation possibilities, but unfortunately I don't want to do those.
The only way I've figured out to do this is to repeat the whole structure in FreeMarker. But that feels like repeating the structure unnecessarily. The header has about 30 different tags, so there would be a lot of work involved as well.
What I currently have is this:
<?xml version="1.0" encoding="UTF-8"?>
<smooks-resource-list xmlns="http://www.milyn.org/xsd/smooks-1.1.xsd"
xmlns:calc="http://www.milyn.org/xsd/smooks/calc-1.1.xsd"
xmlns:frag="http://www.milyn.org/xsd/smooks/fragment-routing-1.2.xsd"
xmlns:file="http://www.milyn.org/xsd/smooks/file-routing-1.1.xsd">
<params>
<param name="stream.filter.type">SAX</param>
</params>
<frag:serialize fragment="INVOICE" bindTo="invoiceBean" />
<calc:counter countOnElement="INVOICE" beanId="split_calc" start="1" />
<file:outputStream openOnElement="INVOICE" resourceName="invoiceSplitStream">
<file:fileNamePattern>invoice-${split_calc}.xml</file:fileNamePattern>
<file:destinationDirectoryPattern>target/invoices</file:destinationDirectoryPattern>
<file:highWaterMark mark="10"/>
</file:outputStream>
<resource-config selector="INVOICE">
<resource>org.milyn.routing.io.OutputStreamRouter</resource>
<param name="beanId">invoiceBean</param>
<param name="resourceName">invoiceSplitStream</param>
<param name="visitAfter">true</param>
</resource-config>
</smooks-resource-list>
That creates files for each invoice tag, but I don't know how to continue from there to get the header also in the file.
EDIT:
The solution has to use Smooks. We use it in an application as a generic splitter and just create different smooks configuration files for different types of input files.
I just started with Smooks myself. However... your problem sounds identical to this: http://www.smooks.org/mediawiki/index.php?title=V1.5:Smooks_v1.5_User_Guide#Routing_to_File
You will have to provide the output FTL format in whole, that's the downside of using a general purpose tool I guess. Data mapping often includes a lot of what feels like redundancy, one way around this is to leverage convention but that has to be built into the framework.
I don't know Smooks, but the simplest solution (with poor performance) would be, to create the Nth file:
copy the whole XML structure
delete all the invoice tags but the Nth one
I don't know how to do that in Smooks; that's only an idea. In this case you don't need to duplicate the structure of the XML in a FreeMarker template.
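
The copy-and-delete idea above can be sketched with the JDK's DOM API. This is plain Java, not a Smooks configuration, so it only illustrates the algorithm; the class BatchSplitter is a hypothetical name, and holding the whole batch in memory once per invoice is exactly the poor performance mentioned.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical splitter: one output document per invoice, header kept intact.
class BatchSplitter {

    static DocumentBuilder builder() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder();
        } catch (Exception e) {
            throw new IllegalStateException("No XML parser available", e);
        }
    }

    static Document parse(String xml) {
        try {
            return builder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse batch", e);
        }
    }

    static List<Document> split(Document batch) {
        int total = batch.getElementsByTagName("invoice").getLength();
        List<Document> parts = new ArrayList<>();
        for (int keep = 0; keep < total; keep++) {
            // Step 1: copy the whole structure into a fresh document.
            Document copy = builder().newDocument();
            copy.appendChild(copy.importNode(batch.getDocumentElement(), true));

            // Step 2: delete every invoice except the one at index 'keep'.
            NodeList live = copy.getElementsByTagName("invoice");
            List<Node> invoices = new ArrayList<>();
            for (int i = 0; i < live.getLength(); i++) {
                invoices.add(live.item(i)); // snapshot, since removal changes the live list
            }
            for (int i = 0; i < invoices.size(); i++) {
                if (i != keep) {
                    invoices.get(i).getParentNode().removeChild(invoices.get(i));
                }
            }
            parts.add(copy);
        }
        return parts;
    }

    public static void main(String[] args) {
        Document batch = parse("<batch><header><headerStuff/></header><contents><timestamp/>"
                + "<invoices><invoice/><invoice/><invoice/></invoices></contents></batch>");
        System.out.println("Documents produced: " + split(batch).size());
    }
}
```

Each resulting Document can be written to its own file with a javax.xml.transform.Transformer.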

Ant and XML configuration file parsing

I have an XML file of the following form -
<map MAP_XML_VERSION="1.0">
<entry key="database.user" value="user1"/>
...
</map>
Does ant have a native ability to read this and let me perform an xquery to pull back values for keys? Going through the API I did not see such capabilities.
The optional Ant task XMLTask is designed to do this. Give it an XPath expression and you can select the above into (say) a property. Here's an article on how to use it, with examples. It'll do tons of other XML-related manipulations/querying/creation as well.
e.g.
<xmltask source="map.xml">
<!-- copies to a property 'user' -->
<copy path="/map/entry[@key='database.user']/@value" attrValue="true" property="user"/>
</xmltask>
Disclaimer: I'm the author.
You can use the scriptdef tag to create a JavaScript wrapper for your class. Inside JS, you pretty much have the full power of Java and can do any kind of complicated XML parsing you want.
For example:
<project default="build">
<target name="build">
<xpath-query query="//entry[@key='database.user']/@value"
xmlFile="test.xml" addproperty="value"/>
<echo message="Value is ${value}"/>
</target>
<scriptdef name="xpath-query" language="javascript">
<attribute name="query"/>
<attribute name="xmlfile"/>
<attribute name="addproperty"/>
<![CDATA[
importClass(java.io.FileInputStream);
importClass(javax.xml.xpath.XPath);
importClass(javax.xml.xpath.XPathConstants);
importClass(javax.xml.xpath.XPathFactory);
importClass(org.xml.sax.InputSource);
var exp = attributes.get("query");
var filename = attributes.get("xmlfile");
var input = new InputSource(new FileInputStream(filename));
var xpath = XPathFactory.newInstance().newXPath();
var value = xpath.evaluate(exp, input, XPathConstants.STRING);
self.project.setProperty( attributes.get("addproperty"), value );
]]>
</scriptdef>
</project>
Sounds like you want something like ant-xpath-task. I'm not aware of a built-in way to do this with Ant.
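
Whichever task is used, it can help to check the XPath expression outside Ant first. The following is a minimal sketch using only the JDK's javax.xml.xpath API; the class MapLookup is a hypothetical name, and the key is passed as an XPath variable so that it does not need quoting.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper: looks up the value attribute of the entry with the given key.
class MapLookup {

    static String lookup(String xml, String key) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $key to the requested key instead of concatenating it into the expression.
        xpath.setXPathVariableResolver(name -> "key".equals(name.getLocalPart()) ? key : null);
        try {
            // Returns the empty string when no entry matches.
            return xpath.evaluate("/map/entry[@key=$key]/@value",
                    new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Cannot evaluate lookup for key " + key, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<map MAP_XML_VERSION=\"1.0\">"
                + "<entry key=\"database.user\" value=\"user1\"/></map>";
        System.out.println(lookup(xml, "database.user"));
    }
}
```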

How to use size of file inside Ant target

I'm currently in the process of replacing my homebrewed build script with an Ant build script.
Now I need to replace various tokens with the size of a specific file. I know how to get the size in bytes via the <length> task and store it in a property, but I need the size in kilobytes and megabytes too.
How can I access the file size in other representations (KB, MB) or compute these values from within the Ant target and store them in a property?
Edit: After I discovered the <script> task, it was fairly easy to calculate the other values using some JavaScript and add a new property to the project using project.setNewProperty("foo", "bar");.
I found a solution that requires no third-party library or custom task: the <script> task, which allows using JavaScript (or any other language supported by Apache BSF or JSR 223) from within an Ant target.
<target name="insert-filesize">
<length file="${afile}" property="fs.length.bytes" />
<script language="javascript">
<![CDATA[
var length_bytes = project.getProperty("fs.length.bytes");
var length_kbytes = Math.round((length_bytes / 1024) * Math.pow(10,2))
/ Math.pow(10,2);
var length_mbytes = Math.round((length_kbytes / 1024) * Math.pow(10,2))
/ Math.pow(10,2);
project.setNewProperty("fs.length.kb", length_kbytes);
project.setNewProperty("fs.length.mb", length_mbytes);
]]>
</script>
<copy todir="${target.dir}">
<fileset dir="${source.dir}">
<include name="**/*" />
<exclude name="**/*.zip" />
</fileset>
<filterset begintoken="$$$$" endtoken="$$$$">
<filter token="SIZEBYTES" value="${fs.length.bytes}"/>
<filter token="SIZEKILOBYTES" value="${fs.length.kb}"/>
<filter token="SIZEMEGABYTES" value="${fs.length.mb}"/>
</filterset>
</copy>
</target>
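
The arithmetic in the script can be checked in isolation. The sketch below restates it in Java with one deliberate difference, named here as my own choice: megabytes are computed directly from the byte count rather than from the already rounded kilobyte value, which avoids rounding twice. The class FileSizeUnits is a hypothetical name.

```java
// Hypothetical helper: converts a byte count to KB and MB, rounded to two decimals.
class FileSizeUnits {

    // Rounds to two decimal places, like Math.round(x * 100) / 100 in the script.
    static double round2(double value) {
        return Math.round(value * 100) / 100.0;
    }

    static double kilobytes(long bytes) {
        return round2(bytes / 1024.0);
    }

    // Computed from bytes directly, not from the rounded kilobyte value.
    static double megabytes(long bytes) {
        return round2(bytes / (1024.0 * 1024.0));
    }

    public static void main(String[] args) {
        long bytes = 1536000L;
        System.out.println(bytes + " bytes = " + kilobytes(bytes) + " KB = "
                + megabytes(bytes) + " MB");
    }
}
```

For 1536000 bytes this gives 1500.0 KB and 1.46 MB.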
There is a math task at http://ant-contrib.sourceforge.net/ that may be useful
