Parse an XML File in Java - Metadata Format

I have seen some examples of XML parsing in Java, but I really don't know how to apply them to my code.
Here is my XML file:
<?xml version="1.0" encoding="UTF-8"?>
<entry xmlns="http://www.w3.org/2005/Atom" xmlns:d="http://schemas.microsoft.com/ado/2007/08/dataservices" xmlns:georss="http://www.georss.org/georss" xmlns:gml="http://www.opengis.net/gml" xmlns:m="http://schemas.microsoft.com/ado/2007/08/dataservices/metadata" xml:base="https://adomain.com">
<id>https://sharepoint.mydomain/aFile)</id>
<category term="SP.File" scheme="http://schemas.microsoft.com/ado/2007/08/dataservices/scheme" />
<title />
<updated>2015-05-18T07:13:18Z</updated>
<author>
Bla Bla<name />
</author>
<content type="application/xml">
<m:properties>
<d:CheckInComment />
<d:CheckOutType m:type="Edm.Int32">2</d:CheckOutType>
<d:ContentTag>{63FD2CFA-D223-405B-86B3-D59B34ECEBBE},3,1</d:ContentTag>
<d:CustomizedPageStatus m:type="Edm.Int32">0</d:CustomizedPageStatus>
<d:ETag>"{63FD2CFA-D223-405B-86B3-D59B34ECEBBE},3"</d:ETag>
<d:Exists m:type="Edm.Boolean">true</d:Exists>
<d:Length m:type="Edm.Int64">638367</d:Length>
<d:Level m:type="Edm.Byte">2</d:Level>
<d:MajorVersion m:type="Edm.Int32">0</d:MajorVersion>
<d:MinorVersion m:type="Edm.Int32">1</d:MinorVersion>
<d:Name>aName.pdf</d:Name>
<d:ServerRelativeUrl>/mydomain.com/afile</d:ServerRelativeUrl>
<d:TimeCreated m:type="Edm.DateTime">2014-09-03T15:30:22Z</d:TimeCreated>
<d:TimeLastModified m:type="Edm.DateTime">2014-09-03T15:30:25Z</d:TimeLastModified>
<d:Title />
<d:UIVersion m:type="Edm.Int32">1</d:UIVersion>
<d:UIVersionLabel>0.1</d:UIVersionLabel>
</m:properties>
</content>
</entry>
I am trying to get the metadata of a file from SharePoint, which is returned in XML format.
How can I get the data inside the content element, as well as the title and the author, in a form like this:
Author BlaBla
Title Bla
Type application/xml
TimeLastModified xx/xx/xxxx

The easiest way to parse XML files is the DOM parser that ships with the JDK (javax.xml.parsers.DocumentBuilder). Its Javadoc and the many DOM tutorials available are a good place to start.
There is also a related question on Stack Overflow.
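A minimal DOM sketch for the entry above, assuming the XML is already available as a String (the class name SharePointEntryParser and its helpers are hypothetical). Because the document uses namespaces, the factory is made namespace-aware and elements are looked up by namespace URI plus local name. Note that the entry's <title /> is empty in this sample, so Title comes back empty.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class SharePointEntryParser {
    // Namespace URIs declared on the <entry> element.
    static final String ATOM = "http://www.w3.org/2005/Atom";
    static final String DATA = "http://schemas.microsoft.com/ado/2007/08/dataservices";

    /** Extracts Author, Title, Type and TimeLastModified, in that order. */
    public static Map<String, String> parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // needed for the getElementsByTagNameNS lookups
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        Map<String, String> result = new LinkedHashMap<>();
        // "Bla Bla" sits directly inside <author> (next to an empty <name/>), so take its whole text.
        result.put("Author", text(doc, ATOM, "author"));
        result.put("Title", text(doc, ATOM, "title"));
        Element content = (Element) doc.getElementsByTagNameNS(ATOM, "content").item(0);
        result.put("Type", content == null ? "" : content.getAttribute("type"));
        result.put("TimeLastModified", text(doc, DATA, "TimeLastModified"));
        return result;
    }

    // Trimmed text of the first matching element, or "" when it is missing.
    private static String text(Document doc, String namespace, String localName) {
        Node node = doc.getElementsByTagNameNS(namespace, localName).item(0);
        return node == null ? "" : node.getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Trimmed-down version of the XML from the question.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<entry xmlns=\"http://www.w3.org/2005/Atom\""
            + " xmlns:d=\"http://schemas.microsoft.com/ado/2007/08/dataservices\""
            + " xmlns:m=\"http://schemas.microsoft.com/ado/2007/08/dataservices/metadata\">"
            + "<title /><author>Bla Bla<name /></author>"
            + "<content type=\"application/xml\"><m:properties>"
            + "<d:Name>aName.pdf</d:Name>"
            + "<d:TimeLastModified m:type=\"Edm.DateTime\">2014-09-03T15:30:25Z</d:TimeLastModified>"
            + "</m:properties></content></entry>";
        parse(xml).forEach((key, value) -> System.out.println(key + " " + value));
    }
}
```

The date is printed as-is. To get an xx/xx/xxxx form, the value could be parsed with java.time.OffsetDateTime.parse and formatted with DateTimeFormatter.ofPattern using a pattern such as "dd/MM/yyyy".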

You can also use JAXB, which is used very effectively for this purpose nowadays. Converting XML into Java objects is called unmarshalling: http://www.javatpoint.com/jaxb-tutorial

Related

XML Parsing in Java to Get Key-Value Pairs?

There are many XML parsing techniques that I am not yet familiar with. I want to parse the XML (form data) and get the form output as key-value pairs. Which XML parsing technique makes it easy to get key-value pairs from the following XML format?
<?xml version="1.0" encoding="UTF-8"?>
<metadata>
<control for="9bd2f8fd2421eb0b0a410feaa1f482c50551486a" name="first-name" type="input" datatype="string">
<resources lang="en">
<label>First Name</label>
<help />
<hint>Your first or given name
</hint>
<alert />
</resources>
<resources lang="fr">
<label>Prénom</label>
<help />
<hint>
Votre prénom
</hint>
<alert />
</resources>
<value>Rahul</value>
</control>
<control for="8532f26e19a5b33200f56bb839c5f3aa2fa3a25f" name="last-name" type="input" datatype="string">
<resources lang="en">
<label>Last Name</label>
<help />
<hint>Your last name</hint>
<alert />
</resources>
<resources lang="fr">
<label>Nom de famille</label>
<help />
<hint>Votre nom de famille</hint>
<alert />
</resources>
<value>Sharma
</value>
</control>
</metadata>
Note that I only need to capture the English-language values. For the XML above, I need the following output:
First Name - Rahul
Last Name - Sharma
This might push you in the right direction:
Which is the best library for XML parsing in java
To capture only the English values, you would have to employ natural language processing to recognize which language the text captured by the XML parser is in. Luckily, there are libraries for identifying English sentences. Here is a post outlining Java libraries that identify the language of a text:
How to detect language of user entered text?
After removing the text that is not English, you can go on to build the key-value dictionary.
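Since every resources element in this particular XML already carries a lang attribute, a simpler option for this format may be to filter on lang="en" directly with DOM. The sketch below assumes each control has at most one English resources block and one value (the class name EnglishFormValues is hypothetical):

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EnglishFormValues {
    /** Maps each control's English label to its value, keeping document order. */
    public static Map<String, String> extract(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        Map<String, String> result = new LinkedHashMap<>();
        NodeList controls = doc.getElementsByTagName("control");
        for (int i = 0; i < controls.getLength(); i++) {
            Element control = (Element) controls.item(i);
            String label = englishLabel(control);
            NodeList values = control.getElementsByTagName("value");
            if (label != null && values.getLength() > 0) {
                // trim() drops the line break in values such as "Sharma\n"
                result.put(label, values.item(0).getTextContent().trim());
            }
        }
        return result;
    }

    // Returns the <label> of the <resources lang="en"> block, or null if there is none.
    private static String englishLabel(Element control) {
        NodeList resources = control.getElementsByTagName("resources");
        for (int j = 0; j < resources.getLength(); j++) {
            Element res = (Element) resources.item(j);
            NodeList labels = res.getElementsByTagName("label");
            if ("en".equals(res.getAttribute("lang")) && labels.getLength() > 0) {
                return labels.item(0).getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<metadata>"
            + "<control name=\"first-name\"><resources lang=\"en\"><label>First Name</label></resources>"
            + "<resources lang=\"fr\"><label>Pr\u00e9nom</label></resources><value>Rahul</value></control>"
            + "<control name=\"last-name\"><resources lang=\"en\"><label>Last Name</label></resources>"
            + "<resources lang=\"fr\"><label>Nom de famille</label></resources><value>Sharma\n</value></control>"
            + "</metadata>";
        // prints "First Name - Rahul" and then "Last Name - Sharma"
        extract(xml).forEach((label, value) -> System.out.println(label + " - " + value));
    }
}
```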

JMS Message Parse Exception

My Java code reads XML messages from my local ActiveMQ queue. It can successfully consume messages from the queue, but it seems to fail to parse them. My XML data looks like this:
#---------- #1 : ----------#
<MSG_INFO>
<message type="TextMessage" messageSelector="" originationTimestamp="" receiveTime="" jmsServerTimestamp="" jmsMsgExpiration="">
<header JMSDestination="Asurion.SYD02.Q.Business.NonPersistent.Policy.PublishTelstraAMAEnrollments" JMSDestinationType="Queue" JMSDeliveryMode="1" />
<properties>
<property name="Client" type="String">Telstra</property>
</properties>
</message>
</MSG_INFO>
BodyLength=850
<?xml version="1.0" encoding="UTF-8"?>
<ns0:PublishEnrollmentRequest xmlns:ns0="http://services.asurion.com/schemas/PolicyAdministration/PublishEnrollmentRequest/1.0">
<ns0:Parameters>
<ns0:Enrollments>
<ns0:MDN>9890667692</ns0:MDN>
<ns0:FeatureCode>MBBPHPMPS</ns0:FeatureCode>
<ns0:ProductName>MTS-SA</ns0:ProductName>
<ns0:Status>Active</ns0:Status>
<ns0:Active>Y</ns0:Active>
<ns0:EffectiveDate>2013-07-02T19:36:51-04:00</ns0:EffectiveDate>
<ns0:EnrollmentType>Customer</ns0:EnrollmentType>
<ns0:Make>UnKnown</ns0:Make>
<ns0:Model>UnKnown</ns0:Model>
<ns0:ActivationDate>2013-07-02T19:36:51-04:00</ns0:ActivationDate>
<ns0:ESN />
<ns0:IMEI />
<ns0:SubID>281474977839805</ns0:SubID>
<ns0:Operation>Enrollment Added</ns0:Operation>
</ns0:Enrollments>
</ns0:Parameters>
The exception I am getting now is:
Caused by: org.xml.sax.SAXParseException: Unexpected element: CDATA
I understand that the BodyLength line may be causing this problem, but if I get rid of those parts, my code will not be able to extract the client information from the message.
Is this something I can configure in the code? Thanks.
Your data is not well-formed XML and cannot be parsed with an XML parser as-is. You'll have to find a way to separate the XML data before and after the BodyLength=850 line and parse them separately.
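A minimal sketch of that split, assuming the whole dump arrives as one String and that the literal text "BodyLength=" appears exactly once (the class name JmsDumpSplitter and its methods are hypothetical). Each part is then parsed as its own document. The body must itself be complete; the snippet in the question appears to stop before the closing </ns0:PublishEnrollmentRequest> tag.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class JmsDumpSplitter {
    public static final String NS =
        "http://services.asurion.com/schemas/PolicyAdministration/PublishEnrollmentRequest/1.0";

    /** Splits the dump at the "BodyLength=" line into {headerXml, bodyXml}. */
    public static String[] split(String raw) {
        int marker = raw.indexOf("BodyLength=");
        int headerStart = raw.indexOf('<');           // skips the "#---------- #1 : ----------#" line
        int bodyStart = raw.indexOf('<', marker + 1); // the body's <?xml ...?> declaration
        if (marker < 0 || headerStart < 0 || headerStart > marker || bodyStart < 0) {
            throw new IllegalArgumentException("unexpected message layout");
        }
        // trim() matters: an <?xml ...?> declaration must be the very first thing in a document
        return new String[] {raw.substring(headerStart, marker).trim(), raw.substring(bodyStart).trim()};
    }

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    /** Reads <property name="Client"> from the MSG_INFO header part. */
    public static String client(String headerXml) throws Exception {
        NodeList props = parse(headerXml).getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            if ("Client".equals(prop.getAttribute("name"))) return prop.getTextContent().trim();
        }
        return null;
    }

    /** Reads the first ns0:MDN value from the body part. */
    public static String mdn(String bodyXml) throws Exception {
        NodeList mdns = parse(bodyXml).getElementsByTagNameNS(NS, "MDN");
        return mdns.getLength() == 0 ? null : mdns.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String raw = "#---------- #1 : ----------#\n"
            + "<MSG_INFO><message type=\"TextMessage\"><properties>"
            + "<property name=\"Client\" type=\"String\">Telstra</property>"
            + "</properties></message></MSG_INFO>\n"
            + "BodyLength=850\n"
            + "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
            + "<ns0:PublishEnrollmentRequest xmlns:ns0=\"" + NS + "\"><ns0:Parameters><ns0:Enrollments>"
            + "<ns0:MDN>9890667692</ns0:MDN></ns0:Enrollments></ns0:Parameters>"
            + "</ns0:PublishEnrollmentRequest>";
        String[] parts = split(raw);
        System.out.println("Client " + client(parts[0]) + ", MDN " + mdn(parts[1]));
    }
}
```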
Try changing your XML to the following, if you can:
<?xml version="1.0" encoding="UTF-8"?>
<ns0:PublishEnrollmentRequest xmlns:ns0="http://services.asurion.com/schemas/PolicyAdministration/PublishEnrollmentRequest/1.0">
<ns0:Parameters>
<ns0:Enrollments>
<ns0:MDN>9890667692</ns0:MDN>
<ns0:FeatureCode>MBBPHPMPS</ns0:FeatureCode>
<ns0:ProductName>MTS-SA</ns0:ProductName>
<ns0:Status>Active</ns0:Status>
<ns0:Active>Y</ns0:Active>
<ns0:EffectiveDate>2013-07-02T19:36:51-04:00</ns0:EffectiveDate>
<ns0:EnrollmentType>Customer</ns0:EnrollmentType>
<ns0:Make>UnKnown</ns0:Make>
<ns0:Model>UnKnown</ns0:Model>
<ns0:ActivationDate>2013-07-02T19:36:51-04:00</ns0:ActivationDate>
<ns0:ESN />
<ns0:IMEI />
<ns0:SubID>281474977839805</ns0:SubID>
<ns0:Operation>Enrollment Added</ns0:Operation>
</ns0:Enrollments>
</ns0:Parameters>
<MSG_INFO>
<message type="TextMessage" messageSelector="" originationTimestamp="" receiveTime="" jmsServerTimestamp="" jmsMsgExpiration="">
<header JMSDestination="Asurion.SYD02.Q.Business.NonPersistent.Policy.PublishTelstraAMAEnrollments" JMSDestinationType="Queue" JMSDeliveryMode="1" />
<properties>
<property name="Client" type="String">Telstra</property>
</properties>
</message>
</MSG_INFO>
<body BodyLength="850" />
</ns0:PublishEnrollmentRequest>
If you don't want to change your XML, try splitting it into the part above and the part below BodyLength=850, and use <?xml version="1.0" encoding="UTF-8"?> at the beginning of each resulting file.

Need to parse XML with a strange looking Schema

Before taking this job, I understood XML as a series of key-value pairs, which until today seemed fairly straightforward for the way I was using it. Basically, I need to parse an XML document like this in Android:
<MailingAddress Caption="Mailing Address" PropertyType="CRM.Address" FieldType="4" DisplayType="0" ValueType="0" IsRequired="False">
<CRM.Address>
<Street Caption="Street" PropertyType="System.String" FieldType="1" DisplayType="1" ValueType="1" MaxDataLength="400" IsRequired="False" />
<City Caption="City" PropertyType="System.String" FieldType="1" DisplayType="1" ValueType="1" MaxDataLength="400" IsRequired="False" />
<State Caption="State" PropertyType="System.String" FieldType="1" DisplayType="1" ValueType="1" MaxDataLength="200" IsRequired="False" />
<PostalCode Caption="Postal Code" PropertyType="System.String" FieldType="1" DisplayType="1" ValueType="1" MaxDataLength="100" IsRequired="False" />
<Country Caption="Country" PropertyType="System.String" FieldType="1" DisplayType="1" ValueType="1" MaxDataLength="200" IsRequired="False" />
</CRM.Address>
Does anyone know how I might go about parsing this or know of a parser that would be useful to me? Am I going to have to write my own parser?
This looks like well-formed XML, so you shouldn't need your own parser. Get a schema (XSD) for it, generate POJOs from the schema with an XSD-to-Java tool such as JAXB's xjc, and deserialize the document. The values in the XML then end up in Java POJOs, and you can do whatever you want with them.
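If generating classes from a schema feels like overkill, the standard DOM API (javax.xml.parsers and org.w3c.dom, which Android also provides) can read these attributes directly. This is a swapped-in technique, not the schema approach described above. The sketch assumes the snippet is closed with </MailingAddress>, which the question's excerpt omits, and the class name AddressSchemaReader is hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AddressSchemaReader {
    /** Returns "ElementName: Caption (max N)" for each field element inside <CRM.Address>. */
    public static List<String> fields(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<String> out = new ArrayList<>();
        NodeList containers = doc.getElementsByTagName("CRM.Address"); // '.' is legal in XML names
        if (containers.getLength() == 0) return out;
        NodeList children = containers.item(0).getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) continue; // skip whitespace text nodes
            Element field = (Element) child;
            // The "key-value pairs" here are attributes, read with getAttribute.
            out.add(field.getTagName() + ": " + field.getAttribute("Caption")
                + " (max " + field.getAttribute("MaxDataLength") + ")");
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<MailingAddress Caption=\"Mailing Address\" PropertyType=\"CRM.Address\">"
            + "<CRM.Address>"
            + "<Street Caption=\"Street\" PropertyType=\"System.String\" MaxDataLength=\"400\" />"
            + "<City Caption=\"City\" PropertyType=\"System.String\" MaxDataLength=\"400\" />"
            + "</CRM.Address></MailingAddress>";
        fields(xml).forEach(System.out::println);
    }
}
```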

Can I format data that is written to a CSV file using Java?

I have some code that writes data into a CSV file, but it writes the data without any formatting. I want to make some specific text bold. Is that possible?
CSV is just a plain text format, so you can't do any formatting.
If you want formatting, consider using an Excel library such as Apache POI or Jasper Reports. (Of course, you then end up with an Excel file rather than a CSV, so depending on your situation that may or may not be appropriate.)
As a side note, there are some strange nuances to writing CSV (such as making sure quotes, commas, etc. are properly escaped). There's a nice lightweight library I've used called OpenCSV that might make your life easier if you choose to stick with plain old CSV.
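To illustrate the escaping nuance, here is a hand-rolled sketch of the common RFC 4180 convention: wrap a field in double quotes when it contains a comma, a quote, or a line break, and double any embedded quotes. This is not OpenCSV's API; the class name CsvEscaping is hypothetical, and a library will also handle reading such files back.

```java
import java.util.List;
import java.util.stream.Collectors;

public class CsvEscaping {
    /** Quotes a field only when needed, doubling any embedded quotes (RFC 4180 style). */
    public static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
            || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) return field;
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins one row of fields into a single CSV line. */
    public static String row(List<String> fields) {
        return fields.stream().map(CsvEscaping::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // prints: plain,"has,comma","say ""hi"""
        System.out.println(row(List.of("plain", "has,comma", "say \"hi\"")));
    }
}
```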
CSV is a plain-text file format; you cannot apply any text effects.
Not that I am aware of; CSV is a plain-text format.
If you are creating a CSV so that you can open it in Excel, I would suggest taking a look at the Microsoft Excel XML Spreadsheet format.
http://en.wikipedia.org/wiki/Microsoft_Office_XML_formats#Excel_XML_Spreadsheet_example
Here is an example (taken from the Wikipedia article) that makes some text bold:
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook
xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<DocumentProperties xmlns="urn:schemas-microsoft-com:office:office">
<Author>Darl McBride</Author>
<LastAuthor>Bill Gates</LastAuthor>
<Created>2007-03-15T23:04:04Z</Created>
<Company>SCO Group, Inc.</Company>
<Version>11.8036</Version>
</DocumentProperties>
<ExcelWorkbook xmlns="urn:schemas-microsoft-com:office:excel">
<WindowHeight>6795</WindowHeight>
<WindowWidth>8460</WindowWidth>
<WindowTopX>120</WindowTopX>
<WindowTopY>15</WindowTopY>
<ProtectStructure>False</ProtectStructure>
<ProtectWindows>False</ProtectWindows>
</ExcelWorkbook>
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom" />
<Borders />
<Font />
<Interior />
<NumberFormat />
<Protection />
</Style>
<Style ss:ID="s21">
<Font x:Family="Swiss" ss:Bold="1" />
</Style>
</Styles>
<Worksheet ss:Name="Sheet1">
<Table ss:ExpandedColumnCount="2" ss:ExpandedRowCount="5"
x:FullColumns="1" x:FullRows="1">
<Row>
<Cell>
<Data ss:Type="String">Text in cell A1</Data>
</Cell>
</Row>
<Row>
<Cell ss:StyleID="s21">
<Data ss:Type="String">Bold text in A2</Data>
</Cell>
</Row>
<Row ss:Index="4">
<Cell ss:Index="2">
<Data ss:Type="Number">43</Data>
</Cell>
</Row>
<Row>
<Cell ss:Index="2" ss:Formula="=R[-1]C/2">
<Data ss:Type="Number">21.5</Data>
</Cell>
</Row>
</Table>
<WorksheetOptions xmlns="urn:schemas-microsoft-com:office:excel">
<Print>
<ValidPrinterInfo />
<HorizontalResolution>600</HorizontalResolution>
<VerticalResolution>600</VerticalResolution>
</Print>
<Selected />
<Panes>
<Pane>
<Number>3</Number>
<ActiveRow>5</ActiveRow>
<ActiveCol>1</ActiveCol>
</Pane>
</Panes>
<ProtectObjects>False</ProtectObjects>
<ProtectScenarios>False</ProtectScenarios>
</WorksheetOptions>
</Worksheet>
</Workbook>
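A rough sketch of producing the same kind of file from Java with plain string building. It keeps only the Styles, Worksheet, Table, Row and Cell parts of the example above and assumes Excel accepts that reduced form; the class name SpreadsheetMlWriter and its parameters are hypothetical. Cell text is XML-escaped so characters such as & and < don't break the file.

```java
import java.util.List;
import java.util.Set;

public class SpreadsheetMlWriter {
    // Escapes the characters that cannot appear verbatim in XML text content.
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** One cell per row; rows whose 0-based index is in boldRows use the bold style "s21". */
    public static String workbook(List<String> cells, Set<Integer> boldRows) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n")
          .append("<?mso-application progid=\"Excel.Sheet\"?>\n")
          .append("<Workbook xmlns=\"urn:schemas-microsoft-com:office:spreadsheet\"\n")
          .append(" xmlns:ss=\"urn:schemas-microsoft-com:office:spreadsheet\">\n")
          .append("<Styles><Style ss:ID=\"s21\"><Font ss:Bold=\"1\"/></Style></Styles>\n")
          .append("<Worksheet ss:Name=\"Sheet1\"><Table>\n");
        for (int i = 0; i < cells.size(); i++) {
            String style = boldRows.contains(i) ? " ss:StyleID=\"s21\"" : "";
            sb.append("<Row><Cell").append(style).append("><Data ss:Type=\"String\">")
              .append(xmlEscape(cells.get(i))).append("</Data></Cell></Row>\n");
        }
        sb.append("</Table></Worksheet>\n</Workbook>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(workbook(List.of("Text in cell A1", "Bold text in A2"), Set.of(1)));
    }
}
```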
In plain old regular CSV, no. But there is no reason why you could not encode your data before writing it out; for example, the HTML tag for bold is <b></b>. This would tell you exactly which portions of the text are bold, and because it comes from a well-known standard, it is still human-readable too. The main drawback is that you have to parse your data after you read it :(
Something else to consider: since you are writing the data out yourself, why not write the comma-separated values in RTF or some other format that supports bold, etc.? CSV is normally plain text, but there is no reason why you couldn't write it out another way. Just remember to read it back in the same format.
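A small sketch of that idea: wrap the bold parts in <b></b> when writing, then find them again with a regular expression after reading the field back. The class name BoldMarkup is hypothetical, and it assumes bold sections are never nested and the text itself never contains a literal "<b>".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoldMarkup {
    // Non-greedy so that two bold sections in one field are matched separately.
    private static final Pattern BOLD = Pattern.compile("<b>(.*?)</b>");

    /** Marks text as bold before it is written into a CSV field. */
    public static String bold(String text) {
        return "<b>" + text + "</b>";
    }

    /** Returns the bold fragments found in a field read back from the CSV. */
    public static List<String> boldParts(String field) {
        List<String> parts = new ArrayList<>();
        Matcher m = BOLD.matcher(field);
        while (m.find()) parts.add(m.group(1));
        return parts;
    }

    /** Strips the markup, leaving only the plain text. */
    public static String plain(String field) {
        return BOLD.matcher(field).replaceAll("$1");
    }

    public static void main(String[] args) {
        String field = "Total: " + bold("42");
        System.out.println(boldParts(field) + " / " + plain(field));
    }
}
```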

XPath: Trouble Getting Attributes

I'm having a bit of trouble with some XML in Java. The following is the result of an API call to EVE Online. How can I get the "name" and "characterID" for each row?
Frankly I just have no idea where to start with this one, so please don't ask for extra information. I just gotta know how to get those attributes.
<?xml version='1.0' encoding='UTF-8'?>
<eveapi version="1">
<currentTime>2007-12-12 11:48:50</currentTime>
<result>
<rowset name="characters" key="characterID" columns="name,characterID,corporationName,corporationID">
<row name="Mary" characterID="150267069"
corporationName="Starbase Anchoring Corp" corporationID="150279367" />
<row name="Marcus" characterID="150302299"
corporationName="Marcus Corp" corporationID="150333466" />
<row name="Dieinafire" characterID="150340823"
corporationName="Center for Advanced Studies" corporationID="1000169" />
</rowset>
</result>
<cachedUntil>2007-12-12 12:48:50</cachedUntil>
</eveapi>
Try
/eveapi/result/rowset/row/@name
and
/eveapi/result/rowset/row/@characterID
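In Java, one way to use these with the built-in javax.xml.xpath API is to select the row elements first and then evaluate relative @name and @characterID expressions against each row, so the two values stay paired. A minimal sketch; the class name EveCharacters is hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class EveCharacters {
    /** Returns "name=characterID" for every <row> under /eveapi/result/rowset. */
    public static List<String> characters(String xml) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Select the <row> elements themselves...
        NodeList rows = (NodeList) xpath.evaluate("/eveapi/result/rowset/row",
            new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < rows.getLength(); i++) {
            Node row = rows.item(i);
            // ...then read each attribute with a relative "@" expression evaluated on that row.
            String name = xpath.evaluate("@name", row);
            String id = xpath.evaluate("@characterID", row);
            out.add(name + "=" + id);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<?xml version='1.0' encoding='UTF-8'?><eveapi version=\"1\"><result>"
            + "<rowset name=\"characters\" key=\"characterID\">"
            + "<row name=\"Mary\" characterID=\"150267069\" />"
            + "<row name=\"Marcus\" characterID=\"150302299\" />"
            + "</rowset></result></eveapi>";
        // prints [Mary=150267069, Marcus=150302299]
        System.out.println(characters(xml));
    }
}
```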
