How to count the elements in XML through DOM parser - java

I am trying to parse an XML file using a DOM parser. I would like to count the records whose type is "Senior Software Developer". Below is my XML file. Could someone suggest how I can get the count?
<class>
<employee type="Senior Software Developer">
<empid>A433568</empid>
<empname>John Mathews</empname>
<address>6th Avenue Street</address>
</employee>
<employee type="Junior Software Developer">
<empid>A433678</empid>
<empname>Sunny Mathews</empname>
<address>5th Avenue Street</address>
</employee>
<employee type="Trainee">
<empid>A434567</empid>
<empname>Brad Hodge</empname>
<address>4th Avenue Street</address>
</employee>
<employee type="Senior Software Developer">
<empid>A433599</empid>
<empname>Glenn Powell</empname>
<address>6th Avenue Street</address>
</employee>
<employee type="Senior Software Developer">
<empid>A433588</empid>
<empname>Recordo Mathews</empname>
<address>6th Avenue Street</address>
</employee>
</class>

Please refer to a similar question
How to get specific XML elements with specific attribute value?
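The following Python script can also be used to get the count for each type.
import collections
import xml.dom.minidom
# Parse the file and collect the "type" attribute of every <employee>.
xml_data = xml.dom.minidom.parse("test.xml")
emps = xml_data.getElementsByTagName("employee")
types = [emp.getAttribute("type") for emp in emps]
# Counter maps each type to the number of times it occurs.
counts = collections.Counter(types)
print(counts)
print(counts["Senior Software Developer"])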
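Since the question asks for Java, the same count can be sketched with the JDK's built-in DOM parser. This is a minimal sketch, not a definitive implementation: the class name EmployeeTypeCounter and the method countByType are illustrative, and the XML is passed in as a string to keep the example self-contained.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class EmployeeTypeCounter {

    /** Counts <employee> elements whose "type" attribute equals the given value. */
    static int countByType(String xml, String type) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList employees = doc.getElementsByTagName("employee");
        int count = 0;
        for (int i = 0; i < employees.getLength(); i++) {
            Element employee = (Element) employees.item(i);
            // getAttribute returns "" when the attribute is absent, so this never throws.
            if (type.equals(employee.getAttribute("type"))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<class>"
                + "<employee type=\"Senior Software Developer\"><empid>A433568</empid></employee>"
                + "<employee type=\"Trainee\"><empid>A434567</empid></employee>"
                + "<employee type=\"Senior Software Developer\"><empid>A433599</empid></employee>"
                + "</class>";
        System.out.println(countByType(xml, "Senior Software Developer")); // prints 2
    }
}
```

To read from a file instead, the DocumentBuilder also accepts a java.io.File in its parse method.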

Related

How can an xml string containing an embedded CDATA xml string be formatted as 'pretty' xml

I have an XML string containing an embedded CDATA XML string. I need to format it as 'pretty' XML.
Example string:
<?xml version="1.0" encoding=\"UTF-8\" standalone=\"no\"?>
<catalog>
<book id=\"b1\">
<title>XML Developer's Guide</title>
<description>An in-depth look at creating applications with XML.</description>
<data>
<![CDATA[<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?><details><author>Gambardella, Matthew</author><genre>Computer</genre><price>44.95</price><publish_date>2000-10-01</publish_date></details>]]>
</data>
</book>
</catalog>
What is the easiest way in Java or React to create a pretty string like this:
<?xml version="1.0" encoding=\"UTF-8\" standalone=\"no\"?>
<catalog>
<book id=\"b1\">
<title>XML Developer's Guide</title>
<description>An in-depth look at creating applications with XML.</description>
<data>
<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>
<details>
<author>Gambardella, Matthew</author>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
</details>
</data>
</book>
</catalog>
I suspect this cannot be done in React, and that I would need to use Java to extract the inner XML and create two XML documents:
<?xml version="1.0" encoding=\"UTF-8\" standalone=\"no\"?>
<catalog>
<book id=\"b1\">
<title>XML Developer's Guide</title>
<description>An in-depth look at creating applications with XML.</description>
</book>
</catalog>
<?xml version="1.0" encoding=\"UTF-8\" standalone=\"yes\"?>
<data>
<details>
<author>Gambardella, Matthew</author>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
</details>
</data>
I would appreciate any ideas or alternatives using React or Java.
Obviously the XML parser doesn't know that the CDATA section contains XML (CDATA means "character data"; that is, it explicitly tells the parser to treat the content as plain text, not as markup). So the only way to handle this is to put it through two stages of parsing: the second stage reads the text node representing the CDATA section and parses it again, grafting the resulting node tree into the original tree.
Note that the output you want isn't well-formed XML: you can't have an XML declaration in the middle of the document. With that qualification, you can achieve the desired effect in XSLT 3.0 as:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="3.0">
<xsl:mode on-no-match="shallow-copy"/>
<xsl:output method="xml" indent="yes"/>
<xsl:template match="data">
<xsl:copy-of select="parse-xml(.)"/>
</xsl:template>
</xsl:transform>
which you can run from Java with the help of the Saxon library. (Disclaimer: my company's product.) The free open-source version will do the job.
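If adding a library is not an option, the two-stage parse described above can also be sketched with the DOM and Transformer classes that ship with the JDK. This is only a sketch under assumptions: the names CdataExpander and expand are illustrative, the wrapper element is assumed to contain nothing but the embedded document, and the embedded document is assumed not to contain elements with the wrapper's own name. The inner XML declaration is dropped during grafting, which keeps the result well-formed.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CdataExpander {

    /** Re-parses the text inside every wrapperTag element and grafts it in as real nodes. */
    static String expand(String xml, String wrapperTag) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Stage 1: parse the outer document; the CDATA content arrives as plain text.
        Document outer = builder.parse(new InputSource(new StringReader(xml)));

        NodeList wrappers = outer.getElementsByTagName(wrapperTag);
        for (int i = 0; i < wrappers.getLength(); i++) {
            Element wrapper = (Element) wrappers.item(i);
            // Stage 2: parse the text again. trim() matters because an XML
            // declaration is only legal at the very start of the input.
            String innerXml = wrapper.getTextContent().trim();
            Document inner = builder.parse(new InputSource(new StringReader(innerXml)));
            // Remove the old text/CDATA children, then graft the parsed tree in.
            while (wrapper.hasChildNodes()) {
                wrapper.removeChild(wrapper.getFirstChild());
            }
            wrapper.appendChild(outer.importNode(inner.getDocumentElement(), true));
        }

        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(outer), new StreamResult(out));
        return out.toString();
    }
}
```

Calling CdataExpander.expand(xml, "data") on the catalog example would return the merged document as a string. The exact indentation produced by the JDK serializer varies between Java versions, so treat the layout as approximate.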

How can I synchronize data between XML Files and SQLite?

I have an XML file. I read its data into Java objects and then insert those objects into a SQLite database.
When I later modify the XML file and add more data to it, how can I keep the XML file and the SQLite database synchronized?
My XML File
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<Employees xmlns="https://www.journaldev.com/employee">
<Employee id="1">
<name>Pankaj</name>
<age>29</age>
<role>Java Developer</role>
<gender>Male</gender>
</Employee>
<Employee id="2">
<name>Lisa</name>
<age>35</age>
<role>Manager</role>
<gender>Female</gender>
</Employee>
</Employees>
Code to read XML File Employee in Java
https://anotepad.com/notes/qnn7nb2k
What you are looking for is a file watcher. Since JDK 7 you can do this with WatchService.
A working example can be found in the docs.
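As a rough illustration of the WatchService approach, the sketch below registers a directory and reports which .xml files were created or modified. The class and method names (XmlDirectoryWatcher, watch, nextChangedXmlFiles) are made up for this example, and the step that re-reads the file and updates SQLite is left as a hypothetical hook because it depends on your own loader code.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

final class XmlDirectoryWatcher {

    /** Registers dir for create and modify events on a new WatchService. */
    static WatchService watch(Path dir) throws IOException {
        WatchService watcher = dir.getFileSystem().newWatchService();
        dir.register(watcher,
                StandardWatchEventKinds.ENTRY_CREATE,
                StandardWatchEventKinds.ENTRY_MODIFY);
        return watcher;
    }

    /** Waits up to timeoutSeconds and returns the names of changed .xml files (possibly empty). */
    static List<String> nextChangedXmlFiles(WatchService watcher, long timeoutSeconds)
            throws InterruptedException {
        List<String> changed = new ArrayList<>();
        WatchKey key = watcher.poll(timeoutSeconds, TimeUnit.SECONDS);
        if (key == null) {
            return changed; // nothing happened within the timeout
        }
        for (WatchEvent<?> event : key.pollEvents()) {
            if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                continue; // events were lost; a real sync would rescan the directory
            }
            String name = event.context().toString();
            if (name.endsWith(".xml") && !changed.contains(name)) {
                changed.add(name);
            }
        }
        key.reset(); // re-arm the key so later events are delivered
        return changed;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        try (WatchService watcher = watch(dir)) {
            while (true) {
                for (String name : nextChangedXmlFiles(watcher, 60)) {
                    // Hypothetical hook: re-read the file and upsert its rows into SQLite.
                    System.out.println("Changed: " + dir.resolve(name));
                }
            }
        }
    }
}
```

A single save can trigger several events for the same file, which is why the names are de-duplicated. Keying the SQLite rows on the Employee id attribute would let the reload step update existing rows instead of inserting duplicates.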

Loading large xml files into Snowflake and flattening by tag

I have some extremely large XML files that I need to process. I used to process them using Spark, but I am moving away from SQLDW and onto Snowflake, so I can no longer use Spark. In Spark, there was a concept of flattening XML files by providing a "rowTag" to a spark function. Let us say we have this persons.xml file:
<persons>
<person id="1">
<firstname>James</firstname>
<lastname>Smith</lastname>
<middlename></middlename>
<dob_year>1980</dob_year>
<dob_month>1</dob_month>
<gender>M</gender>
<salary currency="Euro">10000</salary>
<addresses>
<address>
<street>123 ABC street</street>
<city>NewJersy</city>
<state>NJ</state>
</address>
<address>
<street>456 apple street</street>
<city>newark</city>
<state>DE</state>
</address>
</addresses>
</person>
<person id="2">
<firstname>Michael</firstname>
<lastname></lastname>
<middlename>Rose</middlename>
<dob_year>1990</dob_year>
<dob_month>6</dob_month>
<gender>M</gender>
<salary currency="Dollor">10000</salary>
<addresses>
<address>
<street>4512 main st</street>
<city>new york</city>
<state>NY</state>
</address>
<address>
<street>4367 orange st</street>
<city>sandiago</city>
<state>CA</state>
</address>
</addresses>
</person>
</persons>
If I want to flatten this XML file to look like a CSV with headers firstname, lastname, middlename, dob_year, dob_month... etc, I would run a function that looks like this:
val df = spark.read
.format("com.databricks.spark.xml")
.option("rowTag", "person")
.load("persons.xml");
display(df);
By providing spark the rowTag person in the .option() function, we get a dataframe that looks like this:
_id | addresses | dob_month | dob_year | firstname | gender | lastname | middlename | salary
1 | {"address":[{"city":"NewJersy","state":"NJ","street":"123 ABC street"},{"city":"newark","state":"DE","street":"456 apple street"}]} | 1 | 1980 | James | M | Smith | | {"_VALUE":10000,"_currency":"Euro"}
2 | {"address":[{"city":"new york","state":"NY","street":"4512 main st"},{"city":"sandiago","state":"CA","street":"4367 orange st"}]} | 6 | 1990 | Michael | M | | Rose | {"_VALUE":10000,"_currency":"Dollor"}
It's a little difficult to read, so here is an image to help...
Anyways, I was wondering how I could do this with Snowflake, if it is possible? I would like to avoid pre-processing my xml file if possible.
Remember, these files are large: 1 GB or more. There is also no guarantee that the rowTag element appears at or near the beginning of a file; it could be several hundred lines down.
Some ideas for you:
On load, use STRIP_OUTER_ELEMENT = TRUE to eliminate the PERSONS tag and have each PERSON object land in its own row. This simplifies the data and allows you to load larger files.
Flatten the XML to find all the paths. For example:
select *
from my_table a, lateral flatten(input=>a.data, recursive=>true) b;
Translate the paths from the flatten notation into the field notation and build your query.
For example (assuming the PERSONS outer tag has been removed):
select
data:"@id"::number id,
data:"$"[0]."$"::text first_name,
data:"$"[1]."$"::text last_name
from my_table;
Where data is your XML column.
Hope that helps.
UPDATE -- Sample XML to use with query above:
create or replace table my_table as
select parse_xml($1) as data
from values ('
<person id="1">
<firstname>James</firstname>
<lastname>Smith</lastname>
<middlename></middlename>
<dob_year>1980</dob_year>
<dob_month>1</dob_month>
<gender>M</gender>
<salary currency="Euro">10000</salary>
<addresses>
<address>
<street>123 ABC street</street>
<city>NewJersy</city>
<state>NJ</state>
</address>
<address>
<street>456 apple street</street>
<city>newark</city>
<state>DE</state>
</address>
</addresses>
</person>'),('
<person id="2">
<firstname>Michael</firstname>
<lastname></lastname>
<middlename>Rose</middlename>
<dob_year>1990</dob_year>
<dob_month>6</dob_month>
<gender>M</gender>
<salary currency="Dollor">10000</salary>
<addresses>
<address>
<street>4512 main st</street>
<city>new york</city>
<state>NY</state>
</address>
<address>
<street>4367 orange st</street>
<city>sandiago</city>
<state>CA</state>
</address>
</addresses>
</person>
');

JAXB unmarshall XML to Hashmap containing another Hashmap

I have the following XML which I want to unmarshall using JAXB:
<?xml version="1.0" encoding="UTF-8"?>
<questions>
<question id="1">
<text>What is the capital of Germany?</text>
<correctAnswer>2</correctAnswer>
<answers>
<answer id="1">
<text>Munich</text>
</answer>
<answer id="2">
<text>Berlin</text>
</answer>
</answers>
</question>
<question id="2">
<text>What is the capital of France?</text>
<correctAnswer>1</correctAnswer>
<answers>
<answer id="1">
<text>Paris</text>
</answer>
<answer id="2">
<text>Marseille</text>
</answer>
</answers>
</question>
</questions>
I want to create a HashMap:
HashMap<String, Question>
from which I can get the questions by their ID.
Each question object should also have a HashMap:
HashMap<String, Answer>
from which I can get the answers by their ID.
I started off using the code from this thread: JAXB unmarshal XML elements to HashMap
Now I have two problems / questions:
The accepted answer in that thread suggests using "@XmlPath(".")", but for that I'm supposed to import an eclipse plugin:
import org.eclipse.persistence.oxm.annotations.XmlPath;
But I'm using IntelliJ, and I don't know which alternative import to use to get this working.
Even if I get the proposed solution "@XmlPath(".")" working, I still don't know how to implement the HashMap property within the question object (containing the answers).

Best way to create huge xml file by duplicating the element(including children) with dom or sax using Java

I have an XML file (ABC.xml) and I need to duplicate only the
<Transaction>...</Transaction>
element multiple times (more than 100,000 times), keeping the Header and Trailer intact, to create NEW.xml, whose final size may reach 1 GB. I also have to increment the Uniqueid for every transaction in sequence.
As I am new to XML, I have been searching for the best way to do this, and I am confused.
Can anyone please help me with the best way to do it (using DOM or SAX), along with some sample code?
Could you also give me some links about it?
ABC.xml
========
<?xml version="1.0" encoding="UTF-8"?>
<Header><Datetime><date>20130209</date><Time>01:12</Time></Datetime></Header>
<Transaction>
<Uniqueid>1230001</Uniqueid>
<Affiliate>
<Name>abc</Name>
<Address>
<line1>aaaa</line1>
<line2>bbbb</line2>
<line3>cccc</line3>
</Address>
<Amount>123.00</Amount>
<Currency>USD</Currency>
<Purpose>
<line1>aaaa</line1>
<line2>bbbb</line2>
<line3>cccc</line3>
</Purpose>
</Affiliate>
</Transaction>
<Trailer><TotalTransactions>1</TotalTransactions><TotalAmount>123</TotalAmount></Trailer>
NEW.xml
=======
<?xml version="1.0" encoding="UTF-8"?>
<Header><Datetime><date>20130209</date><Time>01:12</Time></Datetime></Header>
<Transaction>
<Uniqueid>1230001</Uniqueid>
<Affiliate>
<Name>abc</Name>
<Address>
<line1>aaaa</line1>
<line2>bbbb</line2>
<line3>cccc</line3>
</Address>
<Amount>123.00</Amount>
<Currency>USD</Currency>
<Purpose>
<line1>aaaa</line1>
<line2>bbbb</line2>
<line3>cccc</line3>
</Purpose>
</Affiliate>
</Transaction>
<Transaction>
<Uniqueid>1230002</Uniqueid>
<Affiliate>
<Name>abc</Name>
<Address>
<line1>aaaa</line1>
<line2>bbbb</line2>
<line3>cccc</line3>
</Address>
<Amount>123.00</Amount>
<Currency>USD</Currency>
<Purpose>
<line1>aaaa</line1>
<line2>bbbb</line2>
<line3>cccc</line3>
</Purpose>
</Affiliate>
</Transaction>
<Trailer><TotalTransactions>2</TotalTransactions><TotalAmount>246</TotalAmount></Trailer>
It would help if your source XML were well-formed - it needs an outer wrapper element.
There are a number of XQuery processors available in Java, for example Saxon. Just execute the query
<doc>{doc/Header, for $i in 1 to 100000 return doc/Transaction, doc/Trailer}</doc>
on the supplied input document, assuming <doc> as the outer wrapper element.
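For a plain Java alternative, one possible sketch is to parse the small template with DOM and then stream each copy straight to a Writer, so the 1 GB result is never held in memory. It makes the same assumption as above, namely a <doc> wrapper element around the input. The names TransactionDuplicator and duplicate are illustrative. The sketch increments Uniqueid and updates TotalTransactions, but it does not recompute TotalAmount, and it would drop leading zeros from an id.

```java
import java.io.StringReader;
import java.io.Writer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class TransactionDuplicator {

    /** Writes Header, `copies` Transactions with consecutive Uniqueids, then Trailer. */
    static void duplicate(String templateXml, int copies, Writer out) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(templateXml)));
        Element header = (Element) doc.getElementsByTagName("Header").item(0);
        Element transaction = (Element) doc.getElementsByTagName("Transaction").item(0);
        Element trailer = (Element) doc.getElementsByTagName("Trailer").item(0);
        Element uniqueId = (Element) transaction.getElementsByTagName("Uniqueid").item(0);
        long firstId = Long.parseLong(uniqueId.getTextContent().trim());

        // Identity transformer used to serialize one subtree at a time.
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

        out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<doc>\n");
        serializer.transform(new DOMSource(header), new StreamResult(out));
        out.write('\n');
        for (int i = 0; i < copies; i++) {
            // Reuse the single in-memory Transaction and only change its id.
            uniqueId.setTextContent(Long.toString(firstId + i));
            serializer.transform(new DOMSource(transaction), new StreamResult(out));
            out.write('\n');
        }
        trailer.getElementsByTagName("TotalTransactions").item(0)
                .setTextContent(Integer.toString(copies));
        serializer.transform(new DOMSource(trailer), new StreamResult(out));
        out.write("\n</doc>\n");
        out.flush();
    }
}
```

For real use, the Writer would typically be a BufferedWriter from Files.newBufferedWriter(Paths.get("NEW.xml"), StandardCharsets.UTF_8), so that the bytes written match the declared encoding.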
