In my case I am able to read the XML file and parse it to get its content, but the metadata only provides the file type, which is "application/xml".
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.xml.XMLParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;
public class XmlParserExample {
    public static void main(String[] args) throws IOException, SAXException, TikaException {
        BodyContentHandler handler = new BodyContentHandler();
        XMLParser parser = new XMLParser();
        Metadata metadata = new Metadata();
        ParseContext pcontext = new ParseContext();
        // try-with-resources closes the stream even if parsing fails
        try (FileInputStream inputstream = new FileInputStream(new File("example.xml"))) {
            parser.parse(inputstream, handler, metadata, pcontext);
        }
        System.out.println("Contents of the document:" + handler.toString());
        System.out.println("Metadata of the document:");
        String[] metadataNames = metadata.names();
        for (String name : metadataNames) {
            System.out.println(name + ": " + metadata.get(name));
        }
    }
}
The code above prints the whole XML content and the content type (as metadata). But I also want to fetch the XML tags so that I can build a HashMap, which is a requirement in my case.
Below is my dummy example.xml:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE PubmedArticleSet SYSTEM "http://dtd.nlm.nih.gov/ncbi/pubmed/out/pubmed_190101.dtd">
<PubmedArticleSet>
<PubmedArticle>
<MedlineCitation Status="MEDLINE" Owner="NLM">
<PMID Version="1">27483086</PMID>
<DateCompleted>
<Year>2018</Year>
<Month>05</Month>
<Day>02</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>05</Month>
<Day>02</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">1532-849X</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>26</Volume>
<Issue>4</Issue>
<PubDate>
<Year>2017</Year>
<Month>Jun</Month>
</PubDate>
</JournalIssue>
<Title>Journal of prosthodontics : official journal of the American College of Prosthodontists</Title>
<ISOAbbreviation>J Prosthodont</ISOAbbreviation>
</Journal>
<ArticleTitle>The Use of CADCAM Technology for Fabricating Cast Gold Survey Crowns under Existing Partial Removable Dental Prosthesis. A Clinical Report.</ArticleTitle>
<Pagination>
<MedlinePgn>321-326</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10.1111jopr.12525</ELocationID>
<Abstract>
<AbstractText>The fabrication of a survey crown under an existing partial removable dental prosthesis (PRDP) has always been a challenge to many dental practitioners. This clinical report presents a technique for fabricating accurate cast gold survey crowns to fit existing PRDPs using CAD/CAM technology. The report describes a technique that would digitally scan the coronal anatomy of a cast gold survey crown and an abutment tooth under existing PRDPs planned for restoration, prior to any preparation. The information is stored in the digital software where all the coronal anatomical details are preserved without any modifications. The scanned designs are then applied to the scanned teeth preparations, sent to the milling machine and milled into full-contour clear acrylic resin burn-out patterns. The acrylic resin patterns are tried in the patient's mouth the same day to verify the full seating of the PRDP components. The patterns are then invested and cast into gold crowns and cemented in the conventional manner.</AbstractText>
<CopyrightInformation>© 2016 by the American College of Prosthodontists.</CopyrightInformation>
</Abstract>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>El Kerdani</LastName>
<ForeName>Tarek</ForeName>
<Initials>T</Initials>
<AffiliationInfo>
<Affiliation>Department of Restorative Dental Sciences, Division of Prosthodontics, University of Florida College of Dentistry, Gainesville, FL.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Roushdy</LastName>
<ForeName>Sally</ForeName>
<Initials>S</Initials>
<AffiliationInfo>
<Affiliation>Department of Restorative Dental Sciences, Division of Prosthodontics, University of Florida College of Dentistry, Gainesville, FL.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<PublicationTypeList>
<PublicationType UI="D002363">Case Reports</PublicationType>
<PublicationType UI="D016428">Journal Article</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2016</Year>
<Month>08</Month>
<Day>02</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>J Prosthodont</MedlineTA>
<NlmUniqueID>9301275</NlmUniqueID>
<ISSNLinking>1059-941X</ISSNLinking>
</MedlineJournalInfo>
<ChemicalList>
<Chemical>
<RegistryNumber>7440-57-5</RegistryNumber>
<NameOfSubstance UI="D006046">Gold</NameOfSubstance>
</Chemical>
</ChemicalList>
<CitationSubset>D</CitationSubset>
<MeshHeadingList>
<MeshHeading>
<DescriptorName UI="D000368" MajorTopicYN="N">Aged</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017076" MajorTopicYN="Y">Computer-Aided Design</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D003442" MajorTopicYN="Y">Crowns</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D000044" MajorTopicYN="N">Dental Abutments</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D017267" MajorTopicYN="Y">Dental Prosthesis Design</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D003832" MajorTopicYN="Y">Denture, Partial, Removable</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006046" MajorTopicYN="N">Gold</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D006801" MajorTopicYN="N">Humans</DescriptorName>
</MeshHeading>
<MeshHeading>
<DescriptorName UI="D008297" MajorTopicYN="N">Male</DescriptorName>
</MeshHeading>
</MeshHeadingList>
<KeywordList Owner="NOTNLM">
<Keyword MajorTopicYN="N">CADM</Keyword>
<Keyword MajorTopicYN="N">cast gold</Keyword>
<Keyword MajorTopicYN="N">milled acrylic resin patterns</Keyword>
</KeywordList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="accepted">
<Year>2016</Year>
<Month>06</Month>
<Day>13</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2016</Year>
<Month>8</Month>
<Day>3</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2018</Year>
<Month>5</Month>
<Day>3</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2016</Year>
<Month>8</Month>
<Day>3</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">27483086</ArticleId>
<ArticleId IdType="doi">10.111pr.12525</ArticleId>
</ArticleIdList>
</PubmedData>
</PubmedArticle>
<PubmedArticle>
<MedlineCitation Status="PubMed-not-MEDLINE" Owner="NLM">
<PMID Version="1">27483087</PMID>
<DateCompleted>
<Year>2018</Year>
<Month>08</Month>
<Day>07</Day>
</DateCompleted>
<DateRevised>
<Year>2018</Year>
<Month>08</Month>
<Day>07</Day>
</DateRevised>
<Article PubModel="Print-Electronic">
<Journal>
<ISSN IssnType="Electronic">2326-5205</ISSN>
<JournalIssue CitedMedium="Internet">
<Volume>68</Volume>
<Issue>11</Issue>
<PubDate>
<Year>2016</Year>
<Month>11</Month>
</PubDate>
</JournalIssue>
<Title>Arthritis &amp; rheumatology (Hoboken, N.J.)</Title>
</Journal>
<ArticleTitle>Reply.</ArticleTitle>
<Pagination>
<MedlinePgn>2826-2827</MedlinePgn>
</Pagination>
<ELocationID EIdType="doi" ValidYN="Y">10t.39831</ELocationID>
<AuthorList CompleteYN="Y">
<Author ValidYN="Y">
<LastName>Hitchon</LastName>
<ForeName>Carol Ann</ForeName>
<Initials>CA</Initials>
<AffiliationInfo>
<Affiliation>University of Manitoba, Winnipeg, Manitoba, Canada.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Koppejan</LastName>
<ForeName>Hester</ForeName>
<Initials>H</Initials>
<AffiliationInfo>
<Affiliation>Leiden University Medical Center, Leiden, The Netherlands.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Trouw</LastName>
<ForeName>Leendert A</ForeName>
<Initials>LA</Initials>
<AffiliationInfo>
<Affiliation>Leiden University Medical Center, Leiden, The Netherlands.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Huizinga</LastName>
<ForeName>Tom J W</ForeName>
<Initials>TJ</Initials>
<AffiliationInfo>
<Affiliation>Leiden University Medical Center, Leiden, The Netherlands.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>Toes</LastName>
<ForeName>René E M</ForeName>
<Initials>RE</Initials>
<AffiliationInfo>
<Affiliation>Leiden University Medical Center, Leiden, The Netherlands.</Affiliation>
</AffiliationInfo>
</Author>
<Author ValidYN="Y">
<LastName>El-Gabalawy</LastName>
<ForeName>Hani S</ForeName>
<Initials>HS</Initials>
<AffiliationInfo>
<Affiliation>University of Manitoba, Winnipeg, Manitoba, Canada.</Affiliation>
</AffiliationInfo>
</Author>
</AuthorList>
<Language>eng</Language>
<GrantList CompleteYN="Y">
<Grant>
<GrantID>MOP‐77700</GrantID>
<Agency>CIHR</Agency>
<Country>Canada</Country>
</Grant>
</GrantList>
<PublicationTypeList>
<PublicationType UI="D016422">Letter</PublicationType>
<PublicationType UI="D013485">Research Sup</PublicationType>
<PublicationType UI="D016420">Comment</PublicationType>
</PublicationTypeList>
<ArticleDate DateType="Electronic">
<Year>2016</Year>
<Month>10</Month>
<Day>09</Day>
</ArticleDate>
</Article>
<MedlineJournalInfo>
<Country>United States</Country>
<MedlineTA>Arthritis Rheumatol</MedlineTA>
<NlmUniqueID>101623795</NlmUniqueID>
<ISSNLinking>2326-5191</ISSNLinking>
</MedlineJournalInfo>
<CommentsCorrectionsList>
<CommentsCorrections RefType="CommentOn">
<RefSource>dff</RefSource>
<PMID Version="1">27483211</PMID>
</CommentsCorrections>
<CommentsCorrections RefType="CommentOn">
<RefSource>Arthritis Rheumato</RefSource>
<PMID Version="1">26946484</PMID>
</CommentsCorrections>
</CommentsCorrectionsList>
</MedlineCitation>
<PubmedData>
<History>
<PubMedPubDate PubStatus="received">
<Year>2016</Year>
<Month>07</Month>
<Day>26</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="accepted">
<Year>2016</Year>
<Month>07</Month>
<Day>28</Day>
</PubMedPubDate>
<PubMedPubDate PubStatus="pubmed">
<Year>2016</Year>
<Month>10</Month>
<Day>28</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="medline">
<Year>2016</Year>
<Month>10</Month>
<Day>28</Day>
<Hour>6</Hour>
<Minute>1</Minute>
</PubMedPubDate>
<PubMedPubDate PubStatus="entrez">
<Year>2016</Year>
<Month>8</Month>
<Day>3</Day>
<Hour>6</Hour>
<Minute>0</Minute>
</PubMedPubDate>
</History>
<PublicationStatus>ppublish</PublicationStatus>
<ArticleIdList>
<ArticleId IdType="pubmed">27483087</ArticleId>
<ArticleId IdType="doi">efre</ArticleId>
</ArticleIdList>
</PubmedData>
</PubmedArticle>
</PubmedArticleSet>
Kindly help me out on this.
Thanks
My suggestion: If you want to read an XML file, and then parse its contents, you are probably better off using a purpose-built XML parser, rather than Tika.
There are various options - each with its own pros and cons (for example speed, memory consumption).
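As an illustration of that trade-off, a streaming StAX parser (`javax.xml.stream`, bundled with the JDK) reads the document event by event instead of building a tree, so memory use stays low even for large files. A minimal sketch, assuming a file named `pubmed.xml` in the working directory; the class name `StaxTagCollector` is hypothetical, and the sketch simply maps each element name to the text values that appear directly inside that tag. It also assumes the external DTD named in the DOCTYPE is not needed.

```java
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StaxTagCollector {

    // Streams through the XML once and maps each element name to the trimmed text of
    // every occurrence that directly contains text (e.g. "LastName" -> [El Kerdani, ...]).
    // Only the current text buffer is held in memory, not the whole document tree.
    public static Map<String, List<String>> collect(InputStream input) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Assumption: the DTD referenced by the DOCTYPE is not needed, so do not process it.
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
        XMLStreamReader reader = factory.createXMLStreamReader(input);
        Map<String, List<String>> values = new LinkedHashMap<>();
        StringBuilder text = new StringBuilder();
        try {
            while (reader.hasNext()) {
                switch (reader.next()) {
                    case XMLStreamConstants.START_ELEMENT:
                        text.setLength(0); // start collecting text for this element
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        text.append(reader.getText());
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        String value = text.toString().trim();
                        if (!value.isEmpty()) {
                            values.computeIfAbsent(reader.getLocalName(), k -> new ArrayList<>()).add(value);
                        }
                        text.setLength(0); // whitespace between tags is not credited to the parent
                        break;
                    default:
                        break;
                }
            }
        } finally {
            reader.close();
        }
        return values;
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        // An InputStream lets the parser honour the encoding declared in the XML prolog.
        try (InputStream in = new FileInputStream("pubmed.xml")) {
            Map<String, List<String>> values = collect(in);
            System.out.println("ArticleTitle: " + values.get("ArticleTitle"));
            System.out.println("LastName: " + values.get("LastName"));
        }
    }
}
```

Because the map is keyed by tag name, repeated tags such as `Year` collect values from every context in which they appear; if the parent matters, a stack of open element names would be needed.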
Here is one approach - it reads the entire file into memory, but you already do that with your Tika approach, so I assume file size is not a problem.
The code assumes there is a file called pubmed.xml which contains the XML presented in the question.
It reads the XML from file, and handles each element as a DOM node.
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.DocumentBuilder;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
import org.w3c.dom.Element;
import java.io.File;
...
public void parseUsingDom() {
    try {
        File xmlFile = new File("C:/tmp/pubmed.xml");
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(xmlFile);
        doc.getDocumentElement().normalize();
        NodeList articles = doc.getElementsByTagName("Article");
        for (int i = 0; i < articles.getLength(); i++) {
            Node article = articles.item(i);
            if (article.getNodeType() == Node.ELEMENT_NODE) {
                Element articleElement = (Element) article;
                String title = articleElement
                        .getElementsByTagName("ArticleTitle")
                        .item(0).getTextContent();
                System.out.println("");
                System.out.println("Title : " + title);
                NodeList authors = articleElement.getElementsByTagName("Author");
                for (int j = 0; j < authors.getLength(); j++) {
                    Node author = authors.item(j);
                    if (author.getNodeType() == Node.ELEMENT_NODE) {
                        Element authorElement = (Element) author;
                        String foreName = authorElement
                                .getElementsByTagName("ForeName")
                                .item(0).getTextContent();
                        String lastName = authorElement
                                .getElementsByTagName("LastName")
                                .item(0).getTextContent();
                        System.out.println("Author : " + lastName + ", " + foreName);
                    }
                }
            }
        }
    } catch (Exception e) {
        // print the full stack trace rather than just the exception's toString()
        e.printStackTrace();
    }
}
The program prints the following output, just as a demo of what is possible:
Title : The Use of CADCAM Technology for Fabricating Cast Gold Survey Crowns under Existing Partial Removable Dental Prosthesis. A Clinical Report.
Author : El Kerdani, Tarek
Author : Roushdy, Sally
Title : Reply.
Author : Hitchon, Carol Ann
Author : Koppejan, Hester
Author : Trouw, Leendert A
Author : Huizinga, Tom J W
Author : Toes, René E M
Author : El-Gabalawy, Hani S
In your case, you would capture the relevant values in a hash map, of course.
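A minimal sketch of that step, using the same DOM API as above and putting a few chosen fields of each `PubmedArticle` into a `HashMap`. The class name `PubmedToMap`, the chosen keys, and the file name are illustrative assumptions. The `load-external-dtd` feature is understood by the Xerces-based parser that ships with the JDK (another parser may reject it); it is used here on the assumption that the PubMed DTD does not need to be downloaded.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PubmedToMap {

    // Text of the first descendant element with the given tag name, or null when the
    // tag is absent (item(0).getTextContent() would throw a NullPointerException instead).
    public static String firstText(Element parent, String tagName) {
        NodeList nodes = parent.getElementsByTagName(tagName);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : null;
    }

    // Builds one HashMap per PubmedArticle; the keys chosen here are only an example.
    public static List<Map<String, String>> toMaps(Document doc) {
        List<Map<String, String>> result = new ArrayList<>();
        NodeList articles = doc.getElementsByTagName("PubmedArticle");
        for (int i = 0; i < articles.getLength(); i++) {
            Element article = (Element) articles.item(i); // getElementsByTagName yields elements only
            Map<String, String> fields = new HashMap<>();
            // The article's own PMID precedes any PMID inside CommentsCorrections.
            fields.put("PMID", firstText(article, "PMID"));
            fields.put("ArticleTitle", firstText(article, "ArticleTitle"));
            fields.put("Language", firstText(article, "Language"));
            List<String> authors = new ArrayList<>();
            NodeList authorNodes = article.getElementsByTagName("Author");
            for (int j = 0; j < authorNodes.getLength(); j++) {
                Element author = (Element) authorNodes.item(j);
                authors.add(firstText(author, "LastName") + ", " + firstText(author, "ForeName"));
            }
            fields.put("Authors", String.join("; ", authors));
            result.add(fields);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Assumption: skip fetching the external DTD named in the DOCTYPE.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new File("pubmed.xml"));
        for (Map<String, String> fields : toMaps(doc)) {
            System.out.println(fields);
        }
    }
}
```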
I have been trying to validate XML through XSLT for a couple of hours, using the following XSL for validation. Every time I validate the XML, I get the warnings below, and empty currencyID attributes in the XML are ignored, so the XML validates even though it is not valid. Does anyone have an idea why they are ignored and the document passes validation?
<xsl:variable name="CurrencyCodeList"
select="',AED,AFN,ALL,AMD,ANG,AOA,ARS,AUD,AWG,AZN,BAM,BBD,BDT,BGN,BHD,BIF,BMD,BND,BOB,BOV,BRL,BSD,BTN,BWP,BYR,BZD,CAD,CDF,CHE,CHF,CHW,CLF,CLP,CNY,COP,COU,CRC,CUC,CUP,CVE,CZK,DJF,DKK,DOP,DZD,EEK,EGP,ERN,ETB,EUR,FJD,FKP,GBP,GEL,GHS,GIP,GMD,GNF,GTQ,GWP,GYD,HKD,HNL,HRK,HTG,HUF,IDR,ILS,INR,IQD,IRR,ISK,JMD,JOD,JPY,KES,KGS,KHR,KMF,KPW,KRW,KWD,KYD,KZT,LAK,LBP,LKR,LRD,LSL,LTL,LVL,LYD,MAD,MAD,MDL,MGA,MKD,MMK,MNT,MOP,MRO,MUR,MVR,MWK,MXN,MXV,MYR,MZN,NAD,NGN,NIO,NOK,NPR,NZD,OMR,PAB,PEN,PGK,PHP,PKR,PLN,PYG,QAR,RON,RSD,RUB,RWF,SAR,SBD,SCR,SDG,SEK,SGD,SHP,SLL,SOS,SRD,STD,SVC,SYP,SZL,THB,TJS,TMT,TND,TOP,TRY,TTD,TWD,TZS,UAH,UGX,USD,USN,USS,UYI,UYU,UZS,VEF,VND,VUV,WST,XAF,XAG,XAU,XBA,XBB,XBC,XBD,XCD,XDR,XFU,XOF,XPD,XPF,XPF,XPF,XPT,XTS,XXX,YER,ZAR,ZMK,ZWL,'"/>
<xsl:template match="//@currencyID" priority="1008" mode="M0">
<svrl:fired-rule xmlns:svrl="http://purl.oclc.org/dsdl/svrl" context="//@currencyID"/>
<!--ASSERT -->
<xsl:choose>
<xsl:when test="contains($CurrencyCodeList, concat(',',.,','))"/>
<xsl:otherwise>
<svrl:failed-assert xmlns:svrl="http://purl.oclc.org/dsdl/svrl"
test="contains($CurrencyCodeList, concat(',',.,','))">
<xsl:attribute name="location">
<xsl:apply-templates select="." mode="schematron-select-full-path"/>
</xsl:attribute>
<svrl:text>Invalid currencyID attribute: '<xsl:text/>
<xsl:value-of select="."/>
<xsl:text/>'. See the code list for valid values.</svrl:text>
</svrl:failed-assert>
</xsl:otherwise>
</xsl:choose>
<xsl:apply-templates select="*|comment()|processing-instruction()" mode="M0"/>
</xsl:template>
Warning: on line 286
The preceding-sibling axis starting at a namespace node will never select anything
Warning: on line 311
The preceding-sibling axis starting at a namespace node will never select anything
Warning: on line 407
The child axis starting at an attribute node will never select anything
Warning: on line 407
The child axis starting at an attribute node will never select anything
Warning: on line 407
The child axis starting at an attribute node will never select anything
Warning: on line 436
The child axis starting at an attribute node will never select anything
Warning: on line 436
The child axis starting at an attribute node will never select anything
Warning: on line 436
EDIT:
Actually, I converted the Schematron to XSLT so that I could also test it as XSLT. What I was given is a set of Schematron files for validation, so I have to validate with the given Schematron files, a sample XML, and Java code. The main Schematron and the other files contain more rules and patterns, but I removed most of them to make the sample easier to test. Everything validates correctly except attributes on elements (e.g. the currencyID attribute). I'm using UgliSch (Ugli Schematron Validator) for Schematron validation.
MainSchematron.xml:
<?xml version="1.0" encoding="UTF-8"?>
<sch:schema xmlns="http://purl.oclc.org/dsdl/schematron"
xmlns:sch="http://purl.oclc.org/dsdl/schematron"
xmlns:sh="http://www.unece.org/cefact/namespaces/StandardBusinessDocumentHeader"
xmlns:ef="http://www.efatura.gov.tr/envelope-namespace">
<sch:include href="UBL-TR_Codelist.xml#codes"/>
<sch:include href="UBL-TR_Common_Schematron.xml#abstracts"/>
<sch:ns prefix="cbc" uri="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" />
<sch:rule context="//cbc:CurrencyCode">
<sch:extends rule="GeneralCurrencyCodeCheck"/>
</sch:rule>
<sch:rule context="//@currencyID">
<sch:extends rule="GeneralCurrencyIDCheck"/>
</sch:rule>
</sch:schema>
UBL-TR_Common_Schematron.xml:
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
xmlns="http://purl.oclc.org/dsdl/schematron">
<sch:pattern name="AbstractRules" id="abstracts">
<sch:p>Pattern for storing abstract rules</sch:p>
<!-- Rule to validate currencyID (general) -->
<sch:rule abstract="true" id="GeneralCurrencyIDCheck">
<sch:assert test="contains($CurrencyCodeList, concat(',',.,','))">Invalid currencyID attribute: '<sch:value-of select="."/>'. See the code list for valid values.</sch:assert>
</sch:rule>
</sch:pattern>
</sch:schema>
UBL-TR_Codelist.xml:
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
xmlns="http://purl.oclc.org/dsdl/schematron">
<sch:pattern name="CodeList" id="codes">
<sch:let name="CurrencyCodeList" value="',AED,AFN,ALL,AMD,ANG,AOA,ARS,AUD,AWG,AZN,BAM,BBD,BDT,BGN,BHD,BIF,BMD,BND,BOB,BOV,BRL,BSD,BTN,BWP,BYR,BZD,CAD,CDF,CHE,CHF,CHW,CLF,CLP,CNY,COP,COU,CRC,CUC,CUP,CVE,CZK,DJF,DKK,DOP,DZD,EEK,EGP,ERN,ETB,EUR,FJD,FKP,GBP,GEL,GHS,GIP,GMD,GNF,GTQ,GWP,GYD,HKD,HNL,HRK,HTG,HUF,IDR,ILS,INR,IQD,IRR,ISK,JMD,JOD,JPY,KES,KGS,KHR,KMF,KPW,KRW,KWD,KYD,KZT,LAK,LBP,LKR,LRD,LSL,LTL,LVL,LYD,MAD,MAD,MDL,MGA,MKD,MMK,MNT,MOP,MRO,MUR,MVR,MWK,MXN,MXV,MYR,MZN,NAD,NGN,NIO,NOK,NPR,NZD,OMR,PAB,PEN,PGK,PHP,PKR,PLN,PYG,QAR,RON,RSD,RUB,RWF,SAR,SBD,SCR,SDG,SEK,SGD,SHP,SLL,SOS,SRD,STD,SVC,SYP,SZL,THB,TJS,TMT,TND,TOP,TRY,TTD,TWD,TZS,UAH,UGX,USD,USN,USS,UYI,UYU,UZS,VEF,VND,VUV,WST,XAF,XAG,XAU,XBA,XBB,XBC,XBD,XCD,XDR,XFU,XOF,XPD,XPF,XPF,XPF,XPT,XTS,XXX,YER,ZAR,ZMK,ZWL,'"/>
</sch:pattern>
</sch:schema>
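The assert in `GeneralCurrencyIDCheck` relies on the commas around each code: the value is wrapped in commas and looked up in a comma-delimited list, so a partial code such as `US` cannot match inside `,USD,`. A minimal Java mirror of that XPath test, assuming an abbreviated list with only a few codes (the class name `CurrencyListCheck` is hypothetical). An empty value becomes `,,`, which never occurs in the list, so the expression by itself rejects the empty attributes.

```java
public class CurrencyListCheck {

    // Abbreviated stand-in for CurrencyCodeList (assumption: only a few codes kept);
    // like the original, it starts and ends with a comma.
    static final String CURRENCY_CODE_LIST = ",EUR,GBP,TRY,USD,";

    // Java equivalent of the XPath test contains($CurrencyCodeList, concat(',', ., ',')).
    public static boolean isValid(String value) {
        return CURRENCY_CODE_LIST.contains("," + value + ",");
    }

    public static void main(String[] args) {
        for (String value : new String[] {"TRY", "asdasdasdasd", "US", ""}) {
            System.out.println("'" + value + "' -> " + isValid(value));
        }
    }
}
```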
sample.xml:
<sh:StandardBusinessDocument xsi:schemaLocation="http://www.unece.org/cefact/namespaces/StandardBusinessDocumentHeader PackageProxy_1_2.xsd" xmlns:ext="urn:oasis:names:specification:ubl:schema:xsd:CommonExtensionComponents-2" xmlns:ef="http://www.efatura.gov.tr/package-namespace" xmlns:oa="http://www.openapplications.org/oagis/9" xmlns:cac="urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2" xmlns:sh="http://www.unece.org/cefact/namespaces/StandardBusinessDocumentHeader" xmlns:cbc="urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2" xmlns:ns9="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" xmlns:ns11="urn:oasis:names:specification:ubl:schema:xsd:ApplicationResponse-2" xmlns:ns3="http://www.hr-xml.org/3" xmlns:ds="http://www.w3.org/2000/09/xmldsig#" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<sh:StandardBusinessDocumentHeader>
<sh:HeaderVersion>1.0</sh:HeaderVersion>
<sh:Sender>
<sh:Identifier>urn:mail:defaultgb@xxx.com.tr</sh:Identifier>
<sh:ContactInformation>
<sh:Contact>xxx Kurumsal Bilgi Sistemleri A.Ş</sh:Contact>
<sh:ContactTypeIdentifier>UNVAN</sh:ContactTypeIdentifier>
</sh:ContactInformation>
<sh:ContactInformation>
<sh:Contact>8110120507</sh:Contact>
<sh:ContactTypeIdentifier>VKN_TCKN</sh:ContactTypeIdentifier>
</sh:ContactInformation>
</sh:Sender>
<sh:Receiver>
<sh:Identifier>urn:mail:defaultpk@xxx.com.tr</sh:Identifier>
<sh:ContactInformation>
<sh:Contact>KAKAR KURUMSAL BİLGİSİSTEMLERİ LTD.ŞTİ. Test Kullanıcısı</sh:Contact>
<sh:ContactTypeIdentifier>UNVAN</sh:ContactTypeIdentifier>
</sh:ContactInformation>
<sh:ContactInformation>
<sh:Contact>4545552073</sh:Contact>
<sh:ContactTypeIdentifier>VKN_TCKN</sh:ContactTypeIdentifier>
</sh:ContactInformation>
</sh:Receiver>
<sh:DocumentIdentification>
<sh:Standard>UBL-TR</sh:Standard>
<sh:TypeVersion>1.2</sh:TypeVersion>
<sh:InstanceIdentifier>bb583542-a81a-4b45-87d6-e90596101a41</sh:InstanceIdentifier>
<sh:Type>SENDERENVELOPE</sh:Type>
<sh:MultipleType>false</sh:MultipleType>
<sh:CreationDateAndTime>2016-01-06T16:27:25.759+02:00</sh:CreationDateAndTime>
</sh:DocumentIdentification>
</sh:StandardBusinessDocumentHeader>
<ef:Package>
<Elements>
<ElementType>INVOICE</ElementType>
<ElementCount>1</ElementCount>
<ElementList>
<ns9:Invoice>
<ext:UBLExtensions>
<ext:UBLExtension>
<ext:ExtensionContent>
...
</ext:ExtensionContent>
</ext:UBLExtension>
</ext:UBLExtensions>
<cbc:UBLVersionID>2.1</cbc:UBLVersionID>
<cbc:CustomizationID>TR1.2</cbc:CustomizationID>
<cbc:ProfileID>TICARIFATURA</cbc:ProfileID>
<cbc:ID>PAZ2015000000012</cbc:ID>
<cbc:CopyIndicator>false</cbc:CopyIndicator>
<cbc:UUID>54b0dad2-e3a7-44ee-848a-cf7977000020</cbc:UUID>
<cbc:IssueDate>2016-01-06</cbc:IssueDate>
<cbc:InvoiceTypeCode>SATIS</cbc:InvoiceTypeCode>
<cbc:DocumentCurrencyCode>TRY</cbc:DocumentCurrencyCode>
<cbc:LineCountNumeric>0</cbc:LineCountNumeric>
<cac:Signature>
<cbc:ID schemeID="VKN_TCKN">8110120507</cbc:ID>
<cac:SignatoryParty>
<cbc:WebsiteURI>http://www.xxx.com.tr/</cbc:WebsiteURI>
<cac:PartyIdentification>
<cbc:ID schemeID="VKN">8110120507</cbc:ID>
</cac:PartyIdentification>
<cac:PartyName>
<cbc:Name>xxx Kurumsal Bilgi Sistemleri A.Ş</cbc:Name>
</cac:PartyName>
<cac:PostalAddress>
<cbc:StreetName>Besiktas Teknik Universitesi</cbc:StreetName>
<cbc:BuildingNumber>150/1G</cbc:BuildingNumber>
<cbc:CitySubdivisionName>Besıktas</cbc:CitySubdivisionName>
<cbc:CityName>Istanbul</cbc:CityName>
<cbc:PostalZone>06100</cbc:PostalZone>
<cac:Country>
<cbc:Name>Turkiye</cbc:Name>
</cac:Country>
</cac:PostalAddress>
</cac:SignatoryParty>
<cac:DigitalSignatureAttachment>
<cac:ExternalReference>
<cbc:URI>#Signature</cbc:URI>
</cac:ExternalReference>
</cac:DigitalSignatureAttachment>
</cac:Signature>
<cac:AccountingSupplierParty>
<cac:Party>
<cac:PartyIdentification>
<cbc:ID schemeID="VKN">7221130507</cbc:ID>
</cac:PartyIdentification>
<cac:PartyName>
<cbc:Name>KAKAR KURUMSAL LTD.ŞTİ.</cbc:Name>
</cac:PartyName>
<cac:PostalAddress>
<cbc:Room/>
<cbc:BuildingName/>
<cbc:BuildingNumber/>
<cbc:CitySubdivisionName>besiktas</cbc:CitySubdivisionName>
<cbc:CityName>istanbul</cbc:CityName>
<cbc:PostalZone/>
<cac:Country>
<cbc:Name>ALMANYA</cbc:Name>
</cac:Country>
</cac:PostalAddress>
<cac:Contact>
<cbc:Telephone/>
<cbc:Telefax/>
<cbc:ElectronicMail/>
</cac:Contact>
</cac:Party>
</cac:AccountingSupplierParty>
<cac:AccountingCustomerParty>
<cac:Party>
<cbc:WebsiteURI/>
<cac:PartyIdentification>
<cbc:ID schemeID="VKN">2535552073</cbc:ID>
</cac:PartyIdentification>
<cac:PartyName>
<cbc:Name>KAKAR LTD.ŞTİ. Test Kullanıcısı</cbc:Name>
</cac:PartyName>
<cac:PostalAddress>
<cbc:ID/>
<cbc:Postbox/>
<cbc:Room/>
<cbc:StreetName/>
<cbc:BlockName/>
<cbc:BuildingName/>
<cbc:BuildingNumber/>
<cbc:CitySubdivisionName>besiktas</cbc:CitySubdivisionName>
<cbc:CityName>istanbul</cbc:CityName>
<cbc:PostalZone/>
<cbc:Region/>
<cbc:District/>
<cac:Country>
<cbc:Name>TÜRKİYE</cbc:Name>
</cac:Country>
</cac:PostalAddress>
<cac:Contact>
<cbc:Telephone/>
<cbc:Telefax/>
<cbc:ElectronicMail/>
</cac:Contact>
<cac:Person>
<cbc:FirstName/>
<cbc:FamilyName/>
</cac:Person>
</cac:Party>
</cac:AccountingCustomerParty>
<cac:TaxTotal>
<cbc:TaxAmount currencyID="TRY">2.16</cbc:TaxAmount>
<cac:TaxSubtotal>
<cbc:TaxableAmount currencyID="asdasdasdasd">0</cbc:TaxableAmount>
<cbc:TaxAmount currencyID="TRY">2.16</cbc:TaxAmount>
<cbc:CalculationSequenceNumeric>0</cbc:CalculationSequenceNumeric>
<cbc:Percent>18</cbc:Percent>
<cac:TaxCategory>
<cac:TaxScheme>
<cbc:Name>KDV</cbc:Name>
<cbc:TaxTypeCode>0015</cbc:TaxTypeCode>
</cac:TaxScheme>
</cac:TaxCategory>
</cac:TaxSubtotal>
</cac:TaxTotal>
<cac:LegalMonetaryTotal>
<cbc:LineExtensionAmount currencyID="TRY">12</cbc:LineExtensionAmount>
<cbc:TaxExclusiveAmount currencyID="TRY">12</cbc:TaxExclusiveAmount>
<cbc:TaxInclusiveAmount currencyID="TRY">14.16</cbc:TaxInclusiveAmount>
<cbc:AllowanceTotalAmount currencyID="TRY">0</cbc:AllowanceTotalAmount>
<cbc:PayableAmount currencyID="TRY">14.16</cbc:PayableAmount>
</cac:LegalMonetaryTotal>
<cac:InvoiceLine>
<cbc:ID>1</cbc:ID>
<cbc:InvoicedQuantity unitCode="NIU">1</cbc:InvoicedQuantity>
<cbc:LineExtensionAmount currencyID="">12</cbc:LineExtensionAmount>
<cac:AllowanceCharge>
<cbc:ChargeIndicator>false</cbc:ChargeIndicator>
<cbc:MultiplierFactorNumeric>0</cbc:MultiplierFactorNumeric>
<cbc:Amount currencyID="">0</cbc:Amount>
<cbc:BaseAmount currencyID="">0</cbc:BaseAmount>
</cac:AllowanceCharge>
<cac:TaxTotal>
<cbc:TaxAmount currencyID="">2.16</cbc:TaxAmount>
<cac:TaxSubtotal>
<cbc:TaxableAmount currencyID="">0</cbc:TaxableAmount>
<cbc:TaxAmount currencyID="">2.16</cbc:TaxAmount>
<cbc:Percent>18</cbc:Percent>
<cac:TaxCategory>
<cac:TaxScheme>
<cbc:Name>KDV</cbc:Name>
<cbc:TaxTypeCode>0015</cbc:TaxTypeCode>
</cac:TaxScheme>
</cac:TaxCategory>
</cac:TaxSubtotal>
</cac:TaxTotal>
<cac:Item>
<cbc:Name>asdasd</cbc:Name>
</cac:Item>
<cac:Price>
<cbc:PriceAmount currencyID="TRY">12</cbc:PriceAmount>
</cac:Price>
</cac:InvoiceLine>
</ns9:Invoice>
</ElementList>
</Elements>
</ef:Package>
</sh:StandardBusinessDocument>
Java:
try (InputStream ubl = getClass().getResourceAsStream("/schematrons/UBL-TR_Main_Schematron.xml")) {
    SchemaFactory schemaFactory = SchemaFactory.newInstance(XmlSchemaNsUris.SCHEMATRON_NS_URI);
    Schema schema = schemaFactory.newSchema(new StreamSource(ubl));
    Validator validator = schema.newValidator();
    validator.setErrorHandler(validationErrorHandler);
    validator.validate(new StringSource(new String(binary, "UTF-8")));
} catch (Exception e) {
    e.printStackTrace();
}
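To check independently which attributes the `//@currencyID` context should reach, the same expression can be evaluated with the JDK's JAXP XPath API. A minimal sketch, assuming the sample document is saved as `sample.xml` (a hypothetical path) and using a hypothetical class name `CurrencyIdScan`; it lists the elements whose currencyID attribute is empty. On the sample above these should be the ones inside `cac:InvoiceLine`.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class CurrencyIdScan {

    // Evaluates the rule-context expression //@currencyID and returns the qualified
    // names of the elements whose currencyID attribute has an empty value.
    public static List<String> emptyCurrencyIds(Document doc) throws XPathExpressionException {
        NodeList attrs = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("//@currencyID", doc, XPathConstants.NODESET);
        List<String> owners = new ArrayList<>();
        for (int i = 0; i < attrs.getLength(); i++) {
            Attr attr = (Attr) attrs.item(i);
            if (attr.getValue().isEmpty()) {
                owners.add(attr.getOwnerElement().getNodeName());
            }
        }
        return owners;
    }

    public static void main(String[] args) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // parse with namespaces, as a Schematron validator would
        // Hypothetical path: point this at the sample document from the question.
        Document doc = factory.newDocumentBuilder().parse(new File("sample.xml"));
        List<String> owners = emptyCurrencyIds(doc);
        System.out.println(owners.size() + " empty currencyID attribute(s): " + owners);
    }
}
```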