Java stax: The reference to entity "R" must end with the ';' delimiter - java

I am trying to parse an XML file using StAX, but the error I get is:
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[414,47]
Message: The reference to entity "R" must end with the ';' delimiter.
The parser gets stuck on line 414, which contains P&R in the XML file. The code I have to parse it is:
public List<Vild> getVildData(File file){
XMLInputFactory factory = XMLInputFactory.newFactory();
try {
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(Files.readAllBytes(file.toPath()));
XMLStreamReader reader = factory.createXMLStreamReader(byteArrayInputStream, "iso8859-1");
List<Vild> vild = saveVild(reader);
reader.close();
return vild;
} catch (IOException e) {
e.printStackTrace();
} catch (XMLStreamException e) {
e.printStackTrace();
}
return Collections.emptyList();
}
private List<Vild> saveVild(XMLStreamReader streamReader) {
List<Vild> vildList = new ArrayList<>();
try{
Vild vild = new Vild();
while (streamReader.hasNext()) {
streamReader.next();
//Creating list with data
}
}catch(XMLStreamException | IllegalStateException ex) {
ex.printStackTrace();
}
return vildList;
}
I read online that a bare & is invalid in XML, but I don't know how to change it before the error is thrown inside the saveVild method. Does anyone know how to do this efficiently?

Change the question: you're not trying to parse an XML file, you're trying to parse a non-XML file. For that, you need a non-XML parser, and to write such a parser you need to start with a specification of the language you are trying to parse, and you'll need to agree the specification of this language with the other partners to the data interchange.
How much work you could all save by conforming to standards!
Treat broken XML arriving in your shop the way you would treat any other broken goods coming from a supplier: return it to sender marked "unfit for purpose".

The problem here, as you mention, is that the parser finds the & and then expects a ; to close the entity reference.
This is fixed by escaping the character, so that the parser finds &amp; instead.
Take a look here for further reference
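If the file cannot be fixed at the source, one possible workaround is to escape the bare ampersands in memory before handing the text to StAX. The following is a minimal sketch, not a general repair tool: the class and method names (AmpersandRepair, escapeBareAmpersands, allText) are made up for illustration, and the regular expression assumes that anything shaped like &name; or &#123; is an intentional reference. It will also rewrite ampersands inside CDATA sections and comments, which may or may not be acceptable for your data.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class AmpersandRepair {

    // Matches an '&' that is NOT the start of an entity or character reference.
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:[A-Za-z_][A-Za-z0-9_.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    /** Replaces every bare '&' with "&amp;" and leaves existing references alone. */
    public static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    /** Parses the XML with StAX and returns all character data concatenated. */
    public static String allText(String xml) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        // Coalescing merges adjacent text chunks into a single CHARACTERS event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);
        StringBuilder text = new StringBuilder();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.CHARACTERS) {
                        text.append(reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Still not well-formed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String broken = "<name>P&R</name>";
        String repaired = escapeBareAmpersands(broken);
        System.out.println(repaired);
        System.out.println(allText(repaired));
    }
}
```

In the code from the question, this would mean decoding the bytes first (for example with new String(bytes, StandardCharsets.ISO_8859_1)), repairing the string, and then passing a StringReader to createXMLStreamReader instead of the ByteArrayInputStream.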

Related

Using java exception for condition checking

I'm creating a single XML file uploader in my Grails application. There are two types of files, Ap and ApWithVendor. I would like to auto-detect the file type and convert the XML to the correct object using SAXParser.
What I've been doing is throwing an exception when the SAX parser is unable to find a qName match within the first Ap object using the endElement method. I then catch the exception and try the ApWithVendor object.
My question is: is there a better way to do this without doing my condition checking with exceptions?
Code example
try {
System.out.println("ApBatch");
Batch<ApBatchEntry> batch = new ApBatchConverter().convertFromXML(new String(xmlDocument, StandardCharsets.UTF_8));
byte[] xml = new ApBatchConverter().convertToXML(batch, true);
String xmlString = new String(xml, StandardCharsets.UTF_8);
System.out.println(xmlString);
errors = client.validateApBatch(batch);
if (!errors.isEmpty()) {
throw new BatchValidationException(errors);
}
return;
} catch (BatchConverterException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
try {
System.out.println("ApVendorBatch");
Batch<ApWithVendorBatchEntry> batch = new ApWithVendorBatchConverter().convertFromXML(new String(xmlDocument, StandardCharsets.UTF_8));
byte[] xml = new ApWithVendorBatchConverter().convertToXML(batch, true);
String xmlString = new String(xml, StandardCharsets.UTF_8);
System.out.println(xmlString);
errors = client.validateApWithVendorBatch(batch);
if (!errors.isEmpty()) {
throw new BatchValidationException(errors);
}
return;
} catch (BatchConverterException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
You can always iterate over the nodes in the XML and base the decision on whether a specific node is missing, is present, or has a specific value (see the DocumentBuilder and Document classes).
Using exceptions for decision-making or flow control is considered bad practice in 99% of situations.
Try converting the XML string to an XML tree object first and use XPath to decide if it's an ApWithVendor structure. I.e. check if there is an element like "/application/foo/vendor" path in the structure.
Once you have decided, convert the XML tree object to an object.
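As a rough illustration of the XPath idea, here is a sketch that parses the string into a DOM tree and checks for a vendor element before choosing a converter. The element name vendor, the sample documents, and the names BatchTypeDetector and BatchType are assumptions made for this example; the real batch format is not shown in the question, so the XPath expression has to be adapted to it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class BatchTypeDetector {

    public enum BatchType { AP, AP_WITH_VENDOR }

    /** Decides the batch type by looking for a (hypothetical) vendor element. */
    public static BatchType detect(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // boolean(...) is true when the node set is non-empty.
            Boolean hasVendor = (Boolean) xpath.evaluate(
                    "boolean(//vendor)", doc, XPathConstants.BOOLEAN);
            return hasVendor ? BatchType.AP_WITH_VENDOR : BatchType.AP;
        } catch (Exception e) {
            // A document that cannot be parsed at all is a genuine error.
            throw new IllegalArgumentException("Input is not parseable XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(detect("<batch><entry><vendor>ACME</vendor></entry></batch>"));
        System.out.println(detect("<batch><entry/></batch>"));
    }
}
```

With the type known up front, only the matching converter needs to be called, and an exception from it then signals a real problem rather than "wrong guess".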

How to use OpenNLP parser models in an Android app?

I went through this tutorial on NLP in Java: https://www.tutorialspoint.com/opennlp/index.htm
I tried the code below on Android:
try {
File file = copyAssets();
// InputStream inputStream = new FileInputStream(file);
ParserModel model = new ParserModel(file);
// Creating a parser
Parser parser = ParserFactory.create(model);
// Parsing the sentence
String sentence = "Tutorialspoint is the largest tutorial library.";
Parse topParses[] = ParserTool.parseLine(sentence, parser,1);
for (Parse p : topParses) {
p.show();
}
} catch (Exception e) {
}
I downloaded the file **en-parser-chunking.bin** from the internet and placed it in the assets of the Android project, but the code stops on the third line, i.e. ParserModel model = new ParserModel(file);, without giving any exception. How can I make this work on Android? If it cannot work, is there any other NLP support on Android that does not rely on external services?
The reason the code stalls/breaks at runtime is that you need to use an InputStream instead of a File to load the binary file resource. Most likely, the File instance is null when you "load" it the way shown in the second line of your snippet. In theory, the ParserModel constructor should detect this and throw an IOException. Yet, sadly, the JavaDoc of OpenNLP is not precise about this kind of situation, and your empty catch block silently swallows whatever exception is thrown.
Moreover, the code snippet you presented should be improved, so that you know what actually went wrong.
Therefore, loading the ParserModel from within an Activity should be done differently. Here is a variant that takes care of both aspects:
AssetManager assetManager = getAssets();
InputStream in = null;
try {
    // open() throws an IOException if the asset cannot be found,
    // so no null check is needed here
    in = assetManager.open("en-parser-chunking.bin");
    ParserModel model = new ParserModel(in);
    // From here, <model> is initialized and you can start playing with it...
    // Creating a parser
    Parser parser = ParserFactory.create(model);
    // Parsing the sentence
    String sentence = "Tutorialspoint is the largest tutorial library.";
    Parse[] topParses = ParserTool.parseLine(sentence, parser, 1);
    for (Parse p : topParses) {
        p.show();
    }
}
catch (Exception ex) {
    // covers a missing asset as well as a model that could not be initialized
    Log.e("NLP", "message: " + ex.getMessage(), ex);
    // proper exception handling here...
}
finally {
    if (in != null) {
        try {
            in.close();
        } catch (IOException ex) {
            Log.w("NLP", "Could not close the model stream.", ex);
        }
    }
}
This way, you're using an InputStream approach and at the same time you take care of proper exception and resource handling. Moreover, you can now use a debugger in case something remains unclear with the resource path references of your model files. For reference, see the official JavaDoc of AssetManager#open(String).
Note well:
Loading OpenNLP's binary resources can consume quite a lot of memory. For this reason, your Android app's request to allocate the needed memory for this operation might not be granted by the actual runtime environment (i.e., the smartphone).
Therefore, carefully monitor the amount of requested/required RAM while new ParserModel(in) is invoked.
Hope it helps.

Parse some elements from a xml

I want to know whether it is possible to parse only some of the data from an XML file into a Java object.
I don't want to create all the fields that are in the XML.
So, how can I do this?
For example, below is an XML file, and I want only the data inside the tag .
<emit>
<CNPJ>1109</CNPJ>
<xNome>OESTE</xNome>
<xFant>ABATEDOURO</xFant>
<enderEmit>
<xLgr>RODOVIA</xLgr>
<nro>S/N</nro>
<xCpl>402</xCpl>
<xBairro>GOMES</xBairro>
<cMun>314</cMun>
<xMun>MINAS</xMun>
<UF>MG</UF>
<CEP>35661470</CEP>
<cPais>58</cPais>
<xPais>Brasil</xPais>
<fone>03</fone>
</enderEmit>
<IE>20659</IE>
<CRT>3</CRT>
For Java XML parsing where you don't have the XSD and don't want to create a complete object graph to represent the XML, JDOM is a great tool. It allows you to easily walk the XML tree and pick the elements you are interested in.
Here's some sample code that uses JDOM to pick arbitrary values from the XML doc:
// reading can be done using any of the two 'DOM' or 'SAX' parser
// we have used saxBuilder object here
// please note that this saxBuilder is not internal sax from jdk
SAXBuilder saxBuilder = new SAXBuilder();
// obtain file object
File file = new File("/tmp/emit.xml");
try {
// converted file to document object
Document document = saxBuilder.build(file);
//You don't need this or the ns parameters in getChild()
//if your XML document has no namespace
Namespace ns = Namespace.getNamespace("http://www.example.com/namespace");
// get root node from xml. emit in your sample doc?
Element rootNode = document.getRootElement();
//getChild() assumes there is exactly one enderEmit element and returns null
//if there is none, so add error checking as needed for your document
Element enderEmitElement = rootNode.getChild("enderEmit", ns);
//now we read two of its children
Element xCplElement = enderEmitElement.getChild("xCpl", ns);
//should be 402 in your example
String xCplValue = xCplElement.getText();
System.out.println("xCpl: " + xCplValue);
Element cMunElement = enderEmitElement.getChild("cMun", ns);
//should be 314 in your example
String cMunValue = cMunElement.getText();
System.out.println("cMun: " + cMunValue);
} catch (JDOMException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
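If adding a library is not an option, a similar selective read can be sketched with the StAX API that ships with the JDK. This is only an illustration under some assumptions: the class and method names (SelectiveReader, pick) are invented here, the sample document is a shortened version of the one in the question, and every requested element is assumed to be a leaf that contains only text, because getElementText() fails on elements with child elements.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class SelectiveReader {

    /** Returns the text of every element whose local name is in 'wanted'. */
    public static Map<String, String> pick(String xml, Set<String> wanted) {
        Map<String, String> values = new LinkedHashMap<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && wanted.contains(reader.getLocalName())) {
                        String name = reader.getLocalName();
                        // Reads up to the matching end tag; text-only elements only.
                        values.put(name, reader.getElementText());
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return values;
    }

    public static void main(String[] args) {
        String xml = "<emit><CNPJ>1109</CNPJ><enderEmit>"
                + "<xLgr>RODOVIA</xLgr><xCpl>402</xCpl><cMun>314</cMun>"
                + "</enderEmit><IE>20659</IE></emit>";
        System.out.println(pick(xml, Set.of("xCpl", "cMun")));
    }
}
```

Everything that is not in the requested set is skipped without being stored, so no class needs a field for it.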
You can use JAXB to unmarshal the xml into Java object, with which you can read selective elements easily. With JAXB, the given XML can be represented in Java as follows :
enderEmit element :
@XmlRootElement
public class EnderEmit{
private String xLgr;
//Other elements.Here you can define properties for only those elements that you want to load
}
emit element (This represents your XML file):
@XmlRootElement
public class Emit{
private String cnpj;
private String xnom;
private EnderEmit enderEmit;
..
//Add elements that you want to load
}
Now by using the below lines of code, you can read your xml to an object :
String filePath="filePath";
File file = new File(filePath);
JAXBContext jaxbContext = JAXBContext.newInstance(Emit.class);
Unmarshaller jaxbUnmarshaller = jaxbContext.createUnmarshaller();
Emit emit = (Emit) jaxbUnmarshaller.unmarshal(file);
The last line gives you an Emit object for the given XML.
Try using StringUtils.substringBetween (from Apache Commons Lang):
BufferedReader br = null;
try
{
String input = "";
br = new BufferedReader(new FileReader(FILEPATH));
String result = null;
while ((input = br.readLine()) != null) // read the file line by line
{
result = StringUtils.substringBetween(input, ">", "<"); // text between the opening tag and the closing tag on this line
if(result != null) // null means the line has no text between tags
{
System.out.println(""+result);
}
}
}
catch (IOException e)
{
e.printStackTrace();
}
finally
{
try
{
if (br != null)
{
br.close();
}
}
catch (IOException ex)
{
ex.printStackTrace();
}
}

Is encoding Cp1252 invalid in an XML file?

Some XML file I ran across is failing a well-formed XML check, even though it looks well-formed to me (I might be wrong.)
I have reduced it to a trivial example:
<?xml version="1.0" encoding="Cp1252"?>
<jnlp/>
The method being used to do the check works like this:
public static boolean isWellFormedXml(InputStream inputStream) {
try {
XMLInputFactory inputFactory = XMLInputFactory.newInstance();
inputFactory.setProperty(XMLInputFactory.IS_COALESCING, false);
inputFactory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
XMLStreamReader reader = inputFactory.createXMLStreamReader(inputStream);
try {
// Scan through all the reader tokens to ensure everything is well formed
while (reader.hasNext()) {
reader.next();
}
} finally {
reader.close();
}
} catch (XMLStreamException e) {
// Ignore the exception
return false;
}
return true;
}
The error I'm seeing is:
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1,40]
Message: Invalid encoding name "Cp1252".
Only problem is - I can breakpoint at the catch and confirm that this encoding name does resolve. So what's the deal here? Does XML also restrict which encodings you're allowed to use in the prologue?
Check the IANA character set registry:
http://www.iana.org/assignments/character-sets/character-sets.xml
The encoding you're looking for is probably windows-1252. Cp1252 might be a valid charset name in Java, but in an XML declaration you're not supposed to use it (by that name).
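To see the difference between the two names, the following sketch asks Java for the canonical name behind the Cp1252 alias and runs the same kind of well-formedness scan over a document that declares windows-1252 instead. The class and method names (EncodingNameCheck, canonicalName, isWellFormed) are invented for the example, and it assumes the JDK's built-in StAX implementation; other parsers may be more or less lenient about Java-specific encoding names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class EncodingNameCheck {

    /** Resolves a Java charset alias to the canonical java.nio charset name. */
    public static String canonicalName(String javaName) {
        return Charset.forName(javaName).name();
    }

    /** Scans all StAX events; returns false if the parser reports an error. */
    public static boolean isWellFormed(String asciiXml) {
        try {
            // The sample documents contain only ASCII characters.
            byte[] bytes = asciiXml.getBytes(StandardCharsets.US_ASCII);
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new ByteArrayInputStream(bytes));
            try {
                while (reader.hasNext()) {
                    reader.next();
                }
            } finally {
                reader.close();
            }
            return true;
        } catch (XMLStreamException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Cp1252 resolves to: " + canonicalName("Cp1252"));
        System.out.println("Cp1252 declaration accepted: "
                + isWellFormed("<?xml version=\"1.0\" encoding=\"Cp1252\"?>\n<jnlp/>"));
        System.out.println("windows-1252 declaration accepted: "
                + isWellFormed("<?xml version=\"1.0\" encoding=\"windows-1252\"?>\n<jnlp/>"));
    }
}
```

This would explain why the name resolves at the breakpoint: Java knows the alias, while the XML parser only accepts the registered name in the declaration.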

SAXException: Content is not allowed in trailing section

This is driving me crazy. I have used this bit of code for lots of different projects but this is the first time it's given me this type of error. This is the whole XML file:
<layers>
<layer name="Layer 1" h="400" w="272" z="0" y="98" x="268"/>
<layer name="Layer 0" h="355" w="600" z="0" y="287" x="631"/>
</layers>
Here is the operative bit of code in my homebrew Xml class which uses the DocumentBuilderFactory to parse the Xml fed into it:
public static Xml parse(String xmlString)
{
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
Document doc = null;
//System.out.print(xmlString);
try
{
doc = dbf.newDocumentBuilder().parse(
new InputSource(new StringReader(xmlString)));
// Get root element...
Node rootNode = (Element) doc.getDocumentElement();
return getXmlFromNode(rootNode);
} catch (ParserConfigurationException e)
{
System.out.println("ParserConfigurationException in Xml.parse");
e.printStackTrace();
} catch (SAXException e)
{
System.out.println("SAXException in Xml.parse ");
e.printStackTrace();
} catch (IOException e)
{
System.out.println("IOException in Xml.parse");
e.printStackTrace();
}
return null;
}
The context that I am using it is: school project to produce a Photoshop type image manipulation application. The file is being saved with the layers as .png and this xml file for the position, etc. of the layers in a .zip file. I don't know if the zipping is adding some mysterious extra characters or not.
I appreciate your feedback.
If you look at that file in an editor, you'll see content (perhaps whitespace) following the end element e.g.
</layers> <-- after here
It's worth dumping this out using a tool that will highlight whitespace chars e.g.
$ cat -v -e my.xml
will dump 'unprintable' characters.
Hopefully this can be helpful to someone at some point. The fix that worked was just to use lastIndexOf() with substring. Here's the code in situ:
public void loadFile(File m_imageFile)
{
try
{
ZipFile zipFile = new ZipFile(m_imageFile);
ZipEntry xmlZipFile = zipFile.getEntry("xml");
byte[] buffer = new byte[10000];
zipFile.getInputStream(xmlZipFile).read(buffer);
String xmlString = new String(buffer);
Xml xmlRoot = Xml.parse(xmlString.substring(0, xmlString.lastIndexOf('>')+1));
for(List<Xml> iter = xmlRoot.getNestedXml(); iter != null; iter = iter.next())
{
String layerName = iter.element().getAttributes().getValueByName("name");
m_view.getCanvasPanel().getLayers().add(
new Layer(ImageIO.read(zipFile.getInputStream(zipFile.getEntry(layerName))),
Integer.valueOf(iter.element().getAttributes().getValueByName("x")),
Integer.valueOf(iter.element().getAttributes().getValueByName("y")),
Integer.valueOf(iter.element().getAttributes().getValueByName("w")),
Integer.valueOf(iter.element().getAttributes().getValueByName("h")),
Integer.valueOf(iter.element().getAttributes().getValueByName("z")),
iter.element().getAttributes().getValueByName("name"))
);
}
zipFile.close();
} catch (FileNotFoundException e)
{
System.out.println("FileNotFoundException in MainController.loadFile()");
e.printStackTrace();
} catch (IOException e)
{
System.out.println("IOException in MainController.loadFile()");
e.printStackTrace();
}
}
Thanks to everyone who contributed. I suspect the error was introduced either by the zip process or by using the byte[] buffer. Any further feedback is appreciated.
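The byte[] buffer is a plausible culprit: a 10000-byte array that is only partly filled by read() still becomes a 10000-character string, and the unused tail consists of NUL characters that the parser sees as content after the root element. The sketch below contrasts that with reading exactly the bytes the stream provides. The class and method names are made up for illustration, UTF-8 is an assumption (the original code used the platform default charset), and InputStream.readAllBytes() requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class EntryReader {

    /** Mirrors the original approach: one read() into an oversized buffer. */
    public static String readWithFixedBuffer(InputStream in, int bufferSize) {
        try {
            byte[] buffer = new byte[bufferSize];
            in.read(buffer); // may fill only part of the buffer
            // The unused tail of the buffer is decoded as NUL characters.
            return new String(buffer, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Reads the stream to its end and decodes exactly those bytes. */
    public static String readExactly(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] xml = "<layers/>".getBytes(StandardCharsets.UTF_8);
        String padded = readWithFixedBuffer(new ByteArrayInputStream(xml), 32);
        String exact = readExactly(new ByteArrayInputStream(xml));
        System.out.println("padded length: " + padded.length());
        System.out.println("exact length:  " + exact.length());
    }
}
```

In loadFile, the stream from zipFile.getInputStream(xmlZipFile) could be passed to a method like readExactly, which would make the lastIndexOf('>') trimming unnecessary and would also handle entries larger than 10000 bytes.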
I had some extra characters at the end of the XML. Check the XML carefully, or run it through an online XML formatter, which will report an error if the XML is not well-formed.
I just had this error in Sterling Integrator. When I looked at the file in a hex editor, it had about 5 extra lines of char(0), not spaces. I have no idea where they came from, but this was precisely the issue, especially as I was basically doing an identity transform, so the XSL engine (Xalan in this case) was obviously passing them through into the result. Removing the extra rows after the last closing angle bracket solved the problem.
