There are many XML parsing techniques that I am not yet familiar with. I want to parse the XML (form data) and get the form output as key-value pairs. Which XML parsing technique makes it easy to get the values as key-value pairs for the following XML format?
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<control for="9bd2f8fd2421eb0b0a410feaa1f482c50551486a" name="first-name" type="input" datatype="string">
<resources lang="en">
<label>First Name</label>
<help />
<hint>Your first or given name
</hint>
<alert />
</resources>
<resources lang="fr">
<label>Prénom</label>
<help />
<hint>
Votre prénom
</hint>
<alert />
</resources>
<value>Rahul</value>
</control>
<control for="8532f26e19a5b33200f56bb839c5f3aa2fa3a25f" name="last-name" type="input" datatype="string">
<resources lang="en">
<label>Last Name</label>
<help />
<hint>Your last name</hint>
<alert />
</resources>
<resources lang="fr">
<label>Nom de famille</label>
<help />
<hint>Votre nom de famille</hint>
<alert />
</resources>
<value>Sharma
</value>
</control>
</metadata>
Note that I need to capture only the English-language values. For the above XML I need the following output:
First Name - Rahul
Last Name - Sharma
This might push you in the right direction:
Which is the best library for XML parsing in java
To capture only the English values, one option is to use natural language processing to recognize the language of the text you have captured with the XML parser. Luckily, there are libraries for identifying English sentences. Here is a post outlining Java libraries that identify the language of text:
How to detect language of user entered text?
Then, after removing the text that is not English, you can go through the rest and build the dictionary.
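For this particular document, each resources element already carries a lang attribute, so filtering on lang="en" may be enough without any language detection. Here is a minimal sketch using the JDK's built-in DOM parser. The class name FormLabels and the helper firstText are made up for illustration, and the code assumes every control has one value element and one label per language.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class FormLabels {

    // Returns label -> value for each <control>, taking the label from the
    // <resources> block whose lang attribute matches the requested language.
    static Map<String, String> extract(String xml, String lang) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, String> result = new LinkedHashMap<>();
        NodeList controls = doc.getElementsByTagName("control");
        for (int i = 0; i < controls.getLength(); i++) {
            Element control = (Element) controls.item(i);
            String value = firstText(control, "value");
            NodeList resources = control.getElementsByTagName("resources");
            for (int j = 0; j < resources.getLength(); j++) {
                Element res = (Element) resources.item(j);
                if (lang.equals(res.getAttribute("lang"))) {
                    result.put(firstText(res, "label"), value);
                }
            }
        }
        return result;
    }

    // Trimmed text of the first descendant with the given tag, or "" if absent.
    static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<metadata>"
                + "<control name=\"first-name\"><resources lang=\"en\"><label>First Name</label></resources>"
                + "<resources lang=\"fr\"><label>Prenom</label></resources><value>Rahul</value></control>"
                + "<control name=\"last-name\"><resources lang=\"en\"><label>Last Name</label></resources>"
                + "<value>Sharma\n</value></control>"
                + "</metadata>";
        extract(xml, "en").forEach((label, value) -> System.out.println(label + " - " + value));
    }
}
```

The values are trimmed because the sample XML has line breaks inside some value and hint elements.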
I have seen some examples of XML parsing in Java, but I really don't know how to apply them to my code.
Here is my xml file:
<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:georss="http://www.georss.org/georss" xmlns:gml="http://www.opengis.net/gml" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xml:base="https://adomain.com">
<id>https://sharepoint.mydomain/aFile)</id>
<category term="SP.File" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
<title />
<updated>2015-05-18T07:13:18Z</updated>
<author>
Bla Bla<name />
</author>
<content type="application/xml">
<m:properties>
<d:CheckInComment />
<d:CheckOutType m:type="Edm.Int32">2</d:CheckOutType>
<d:ContentTag>{63FD2CFA-D223-405B-86B3-D59B34ECEBBE},3,1</d:ContentTag>
<d:CustomizedPageStatus m:type="Edm.Int32">0</d:CustomizedPageStatus>
<d:ETag>"{63FD2CFA-D223-405B-86B3-D59B34ECEBBE},3"</d:ETag>
<d:Exists m:type="Edm.Boolean">true</d:Exists>
<d:Length m:type="Edm.Int64">638367</d:Length>
<d:Level m:type="Edm.Byte">2</d:Level>
<d:MajorVersion m:type="Edm.Int32">0</d:MajorVersion>
<d:MinorVersion m:type="Edm.Int32">1</d:MinorVersion>
<d:Name>aName.pdf</d:Name>
<d:ServerRelativeUrl>/mydomain.com/afile</d:ServerRelativeUrl>
<d:TimeCreated m:type="Edm.DateTime">2014-09-03T15:30:22Z</d:TimeCreated>
<d:TimeLastModified m:type="Edm.DateTime">2014-09-03T15:30:25Z</d:TimeLastModified>
<d:Title />
<d:UIVersion m:type="Edm.Int32">1</d:UIVersion>
<d:UIVersionLabel>0.1</d:UIVersionLabel>
</m:properties>
</content>
</entry>
I am trying to get the metadata of a file from SharePoint which is displayed in xml format.
How can I get the data which is inside the content and also the title and the author like this:
Author BlaBla
Title Bla
Type application/xml
TimeLastModified xx/xx/xxxx
The easiest way to parse XML files is the DOM parser. You can find the documentation here and a few tutorials here and here.
Also, there is a related question on Stack Overflow here.
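As a rough illustration of the DOM approach for the Atom entry in the question, the sketch below reads the author, title, content type and TimeLastModified. The document uses namespaces, so the factory has to be namespace-aware. The class name AtomEntryReader, the output keys and the dd/MM/yyyy date pattern are assumptions chosen to match the output the question asks for.

```java
import java.io.StringReader;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class AtomEntryReader {
    // Namespace URIs copied from the xmlns declarations in the question's XML.
    static final String ATOM = "http://www.w3.org/2005/Atom";
    static final String D = "http://schemas.microsoft.com/ado/2007/08/dataservices";

    static Map<String, String> read(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        Map<String, String> out = new LinkedHashMap<>();
        out.put("Author", text(doc, ATOM, "author"));
        out.put("Title", text(doc, ATOM, "title")); // the Atom title, not d:Title
        Element content = (Element) doc.getElementsByTagNameNS(ATOM, "content").item(0);
        out.put("Type", content == null ? "" : content.getAttribute("type"));

        // Reformat the ISO-8601 timestamp as day/month/year.
        String raw = text(doc, D, "TimeLastModified");
        out.put("TimeLastModified", raw.isEmpty() ? ""
                : OffsetDateTime.parse(raw).format(DateTimeFormatter.ofPattern("dd/MM/yyyy")));
        return out;
    }

    // Trimmed text of the first element with the given namespace and local name.
    static String text(Document doc, String ns, String localName) {
        NodeList nodes = doc.getElementsByTagNameNS(ns, localName);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }
}
```

Calling AtomEntryReader.read(xml) returns the four entries in insertion order, so printing each key and value gives the layout shown in the question.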
You can use JAXB, which is widely used nowadays and handles this kind of parsing well. Converting XML to Java objects is called unmarshalling: http://www.javatpoint.com/jaxb-tutorial
I'm interested to parse an XML that contains variables (which are defined by me inside the XML).
Here's an example of the XML files:
<parameters>
<parameter name="parent-id" value="1" />
<parameter name="child-id" value="1" />
</parameters>
<Parents>
<Parent id="$(parent-id)">
<Children>
<Child id="$(child-id)">
</Child>
</Children>
</Parent>
</Parents>
Is there a utility or some standard way to do so in Java? (using JAXB possibly)
Or should I implement this "mini" parsing mechanism by myself?
(A mechanism that identifies the variables and plants them inside the XML, and only later calls JAXB flows)
Thanks a lot in advance!
Use an XSLT transformation to convert your XML into an XSLT stylesheet and then execute the XSLT stylesheet. It's simple enough to convert
<parameters>
<parameter name="parent-id" value="1" />
<parameter name="child-id" value="1" />
</parameters>
into
<xsl:param name="parent-id" select="1" />
<xsl:param name="child-id" select="1" />
and
<Parent id="$(parent-id)">
into
<Parent id="{$parent-id}">
and to add a wrapper xsl:stylesheet and xsl:template element, and then you're done.
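To give an idea of the second half of this approach, here is a sketch that executes such a generated stylesheet from Java with the JDK's javax.xml.transform API. The stylesheet is written out by hand here (the step that generates it from the original XML is not shown), and the class name ParamStylesheetDemo and the dummy input document are assumptions for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo class, not part of any library.
class ParamStylesheetDemo {
    // What the generated stylesheet could look like for the question's XML.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:param name=\"parent-id\" select=\"1\"/>"
        + "<xsl:param name=\"child-id\" select=\"1\"/>"
        + "<xsl:template match=\"/\">"
        + "<Parents><Parent id=\"{$parent-id}\"><Children>"
        + "<Child id=\"{$child-id}\"/>"
        + "</Children></Parent></Parents>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet; entries in overrides replace the default parameter values.
    static String run(Map<String, String> overrides) throws Exception {
        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
        overrides.forEach(transformer::setParameter);
        StringWriter out = new StringWriter();
        // The template ignores its input, so any well-formed document will do.
        transformer.transform(new StreamSource(new StringReader("<dummy/>")),
                new StreamResult(out));
        return out.toString();
    }
}
```

The resulting string has the variables filled in and can then be handed to JAXB for unmarshalling as usual.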
As I understood it before taking this job, XML uses a series of key-value pairs, which up until today seemed fairly straightforward given how I was using it. Basically, I need to parse an XML document like this on Android:
<MailingAddress Caption="Mailing Address" PropertyType="CRM.Address" FieldType="4" DisplayType="0" ValueType="0" IsRequired="False">
<CRM.Address>
<Street Caption="Street" PropertyType="System.String" FieldType="1" DisplayType="1" ValueType="1" MaxDataLength="400" IsRequired="False" />
<City Caption="City" PropertyType="System.String" FieldType="1" DisplayType="1" ValueType="1" MaxDataLength="400" IsRequired="False" />
<State Caption="State" PropertyType="System.String" FieldType="1" DisplayType="1" ValueType="1" MaxDataLength="200" IsRequired="False" />
<PostalCode Caption="Postal Code" PropertyType="System.String" FieldType="1" DisplayType="1" ValueType="1" MaxDataLength="100" IsRequired="False" />
<Country Caption="Country" PropertyType="System.String" FieldType="1" DisplayType="1" ValueType="1" MaxDataLength="200" IsRequired="False" />
</CRM.Address>
</MailingAddress>
Does anyone know how I might go about parsing this or know of a parser that would be useful to me? Am I going to have to write my own parser?
This looks like well-formed XML, so get a schema for it, generate POJOs from the XSD with an XSD-to-Java tool, and deserialize the document. The values from the XML are then available as POJOs, and you can do whatever you want with them from Java.
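If generating classes from a schema is more than you need, a plain DOM walk is another option, and you would not have to write your own parser. The sketch below collects the attributes of every field under CRM.Address into a map. It assumes the snippet is a complete document (closed by a MailingAddress end tag), and the class name AddressFields is made up for illustration.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of any library.
class AddressFields {

    // Returns field name (Street, City, ...) -> its attributes as name/value pairs.
    static Map<String, Map<String, String>> read(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Map<String, Map<String, String>> fields = new LinkedHashMap<>();
        Element address = (Element) doc.getElementsByTagName("CRM.Address").item(0);
        if (address == null) {
            return fields; // no address block in this document
        }
        NodeList children = address.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() != Node.ELEMENT_NODE) {
                continue; // skip whitespace text nodes between elements
            }
            Map<String, String> attributes = new LinkedHashMap<>();
            NamedNodeMap attrs = node.getAttributes();
            for (int j = 0; j < attrs.getLength(); j++) {
                attributes.put(attrs.item(j).getNodeName(), attrs.item(j).getNodeValue());
            }
            fields.put(node.getNodeName(), attributes);
        }
        return fields;
    }
}
```

For example, read(xml).get("PostalCode").get("Caption") would give "Postal Code" for the document in the question.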
I am new to Java Web Start and am trying to set up a JNLP download on my server.
Web Start seems to initiate OK, but it gives the following error:
WARNING: <meta> tag is not closed correctly Exception parsing xml at line 3
Line 3 just contains the meta descriptions. Originally I used some characters the XML parser might object to, namely "-", "(" and ")". Suspecting that these characters might not be valid in a meta description, I removed them from the meta tags on my web page and in the JNLP script.
However, when I try to run Web Start, it still has line 3 as:
<head><title>xxxxxx</title><meta name="author" content="xxxxx"><meta name="keywords" content="xxxxxx xxxxxxxx xxxxxxxxx"><meta name="description" content="xxxx (xxx) xxxxx, xxxx, 2-12 players."></head>
In other words it is not showing my updated meta info. Where is it getting this old version from, and how can I update it?
And most importantly, are the characters "- ( ) ," causing my problem anyway?
Here is a link to my site. Be aware it's not quite ready to go live yet!
fantasyhexwars.com/getting_started.html
Quite possibly something is configured wrong!
It seems so.
http://fantasyhexwars.com/include/launch.jnlp
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN" "http://www.w3.org/TR/html4/frameset.dtd">
<html lang="en">
<head><title>Fantasy Hex Wars</title><meta name="author" content="Mark Keen"><meta name="keywords" content="strategy fantasy hex wars multiplayer war game turn based"><meta name="description" content="An online, turn-based, strategy game for up to 12 players."><link rel="shortcut icon" href="http://91.223.16.102/httpdocs/favicon.ico"></head>
<frameset rows="100%,*">
<frame title="http://91.223.16.102/include/launch.jnlp" src="http://91.223.16.102/include/launch.jnlp" name="mainframe" frameborder="0" noresize="noresize" scrolling="auto">
<frame title="empty frame" frameborder="0" scrolling="no" noresize="noresize">
<noframes>Sorry, you don"t appear to have frame support.
Go here instead - Fantasy Hex Wars</noframes>
</frameset>
</html>
http://91.223.16.102/include/launch.jnlp
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<jnlp codebase="http://fantasyhexwars.com/httpdocs/include" href="launch.jnlp" spec="1.0+">
<information>
<title>Fantasy Hex Wars</title>
<vendor>Fysh Games</vendor>
<homepage href="fantasyhexwars.com"/>
<description>A turn-based, online, multiplayer strategy game.</description>
<description kind="short">Fantasy Hex Wars</description>
</information>
<update check="always"/>
<security>
<all-permissions/>
</security>
<resources>
<j2se version="1.5+"/>
<jar href="FantasyHexWar.jar" main="true"/>
<jar href="lib/appframework-1.0.3.jar"/>
<jar href="lib/swing-worker-1.1.jar"/>
<jar href="lib/beansbinding-1.2.1.jar"/>
<jar href="lib/mail.jar"/>
</resources>
<application-desc main-class="fantasyhexwar.FantasyHexWarApp">
</application-desc>
</jnlp>
I have some code to write data into a CSV file, but it writes data into a CSV
without formatting it properly. I want to bold some specific text. Is that possible?
CSV is just a plain text format, so you can't do any formatting.
If you want formatting, consider using an Excel library such as Apache POI or Jasper Reports. (Of course, you then end up with an Excel file rather than a CSV, so depending on your situation that may or may not be appropriate.)
As a side note, there are some strange nuances to writing CSV (such as making sure quotes, commas etc are properly escaped). There's a nice lightweight library I've used called Open CSV that might make your life easier if you choose to just stick with plain old CSV.
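To illustrate the escaping point, here is a minimal hand-rolled sketch that follows the common convention (as in RFC 4180) of quoting fields that contain commas, quotes or line breaks and doubling any embedded quotes. It does not use Open CSV, and the class and method names are made up for illustration. A library will handle more edge cases.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper class, not part of Open CSV or any other library.
class CsvEscape {

    // Quotes a field if it contains a comma, quote or line break,
    // doubling any quotes inside it.
    static String escape(String field) {
        if (field == null) {
            return "";
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        String escaped = field.replace("\"", "\"\"");
        return needsQuotes ? "\"" + escaped + "\"" : escaped;
    }

    // Joins already-escaped fields into one CSV line (without a line terminator).
    static String row(String... fields) {
        return Arrays.stream(fields)
                .map(CsvEscape::escape)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(row("a", "b,c", "say \"hi\""));
    }
}
```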
CSV is a plain text file format, so you cannot apply any text effects.
Not that I am aware of, CSV is a plain text format.
If you are creating a CSV so that you can open it in Excel, then I would suggest taking a look at the MS Excel XML format.
http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats#Excel_XML_Spreadsheet_example
An example would be as follows (taken from the Wikipedia link; it makes some text bold):
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook
xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>Darl McBride</Author>
<LastAuthor>Bill Gates</LastAuthor>
<Created>2007-03-15T23:04:04Z</Created>
<Company>SCO Group, Inc.</Company>
<Version>11.8036</Version>
</DocumentProperties>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>6795</WindowHeight>
<WindowWidth>8460</WindowWidth>
<WindowTopX>120</WindowTopX>
<WindowTopY>15</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom" />
<Borders />
<Font />
<Interior />
<NumberFormat />
<Protection />
</Style>
<Style ss:ID="s21">
<Font x:Family="Swiss" ss:Bold="1" />
</Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="5"
x:FullColumns="1" x:FullRows="1">
<Row>
<Cell>
<Data ss:Type="String">Text in cell A1</Data>
</Cell>
</Row>
<Row>
<Cell ss:StyleID="s21">
<Data ss:Type="String">Bold text in A2</Data>
</Cell>
</Row>
<Row ss:Index="4">
<Cell ss:Index="2">
<Data ss:Type="Number">43</Data>
</Cell>
</Row>
<Row>
<Cell ss:Index="2" ss:Formula="=R[-1]C/2">
<Data ss:Type="Number">21.5</Data>
</Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<Print>
<ValidPrinterInfo />
<HorizontalResolution>600</HorizontalResolution>
<VerticalResolution>600</VerticalResolution>
</Print>
<Selected />
<Panes>
<Pane>
<Number>3</Number>
<ActiveRow>5</ActiveRow>
<ActiveCol>1</ActiveCol>
</Pane>
</Panes>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>
In plain old regular CSV, no. But there is no reason why you could not encode your data before writing it out. For example, the HTML tag for bold is <b></b>. This would signal to you exactly which portions of the text are bold, and since it comes from a well-known standard it is still human-readable too. The main drawback is that you have to parse your data after you read it :(
Something else to consider, since you are writing the data out why not write it out as comma separated values in RTF or some other format that does support bold etc? Normally CSV is plain text but there is no reason why you couldn't write it out another way. Just remember to read it back in the same format...
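A rough sketch of the tag-encoding idea: wrap the bold parts in <b></b> before writing the field, and pull them back out with a regular expression after reading. The class name BoldMarkup and its methods are made up for illustration, and the pattern assumes the tags are never nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class BoldMarkup {
    // Non-greedy match so that two bold spans in one field stay separate.
    private static final Pattern BOLD = Pattern.compile("<b>(.*?)</b>");

    // Marks text as bold before it is written to the CSV field.
    static String bold(String text) {
        return "<b>" + text + "</b>";
    }

    // Returns the bold portions of a field that was read back in.
    static List<String> boldSpans(String field) {
        List<String> spans = new ArrayList<>();
        Matcher matcher = BOLD.matcher(field);
        while (matcher.find()) {
            spans.add(matcher.group(1));
        }
        return spans;
    }

    // Removes the markers, leaving the plain text.
    static String strip(String field) {
        return BOLD.matcher(field).replaceAll("$1");
    }
}
```

Whatever reads the file (your own code, not Excel) then decides how to render the spans it finds.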