SAXException: Content is not allowed in trailing section - java

This is driving me crazy. I have used this bit of code for lots of different projects but this is the first time it's given me this type of error. This is the whole XML file:
<layers>
<layer name="Layer 1" h="400" w="272" z="0" y="98" x="268"/>
<layer name="Layer 0" h="355" w="600" z="0" y="287" x="631"/>
</layers>
Here is the operative bit of code in my homebrew Xml class which uses the DocumentBuilderFactory to parse the Xml fed into it:
public static Xml parse(String xmlString)
{
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
dbf.setNamespaceAware(true);
Document doc = null;
//System.out.print(xmlString);
try
{
doc = dbf.newDocumentBuilder().parse(
new InputSource(new StringReader(xmlString)));
// Get root element...
Node rootNode = (Element) doc.getDocumentElement();
return getXmlFromNode(rootNode);
} catch (ParserConfigurationException e)
{
System.out.println("ParserConfigurationException in Xml.parse");
e.printStackTrace();
} catch (SAXException e)
{
System.out.println("SAXException in Xml.parse ");
e.printStackTrace();
} catch (IOException e)
{
System.out.println("IOException in Xml.parse");
e.printStackTrace();
}
return null;
}
The context: this is a school project to produce a Photoshop-style image manipulation application. The file is saved as a .zip containing the layers as .png files plus this XML file, which holds the position, etc. of each layer. I don't know whether the zipping is adding some mysterious extra characters.
I appreciate your feedback.

If you look at that file in an editor, you'll see content (perhaps whitespace) following the end element e.g.
</layers> <-- after here
It's worth dumping this out using a tool that will highlight whitespace chars e.g.
$ cat -v -e my.xml
will dump 'unprintable' characters.
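The same inspection can be done from inside the Java program, which helps when the XML never exists as a standalone file. This is only a sketch: the class and method names (TrailingContent, describeTrailing) are made up for illustration and are not part of the original code.

```java
final class TrailingContent {
    /** Lists the code points that follow the last '>' in the text, e.g. "U+0000 U+000A". */
    static String describeTrailing(String xml) {
        int end = xml.lastIndexOf('>');          // -1 if there is no '>' at all
        StringBuilder out = new StringBuilder();
        xml.substring(end + 1).codePoints().forEach(cp -> {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", cp));
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Two NUL characters and a newline after the closing tag.
        System.out.println(describeTrailing("<layers></layers>\u0000\u0000\n"));
    }
}
```

For the sample in main this prints U+0000 U+0000 U+000A. An empty result means nothing follows the last closing angle bracket.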

Hopefully this can be helpful to someone at some point. The fix that worked was just to use lastIndexOf() with substring. Here's the code in situ:
public void loadFile(File m_imageFile)
{
try
{
ZipFile zipFile = new ZipFile(m_imageFile);
ZipEntry xmlZipFile = zipFile.getEntry("xml");
byte[] buffer = new byte[10000];
zipFile.getInputStream(xmlZipFile).read(buffer);
String xmlString = new String(buffer);
Xml xmlRoot = Xml.parse(xmlString.substring(0, xmlString.lastIndexOf('>')+1));
for(List<Xml> iter = xmlRoot.getNestedXml(); iter != null; iter = iter.next())
{
String layerName = iter.element().getAttributes().getValueByName("name");
m_view.getCanvasPanel().getLayers().add(
new Layer(ImageIO.read(zipFile.getInputStream(zipFile.getEntry(layerName))),
Integer.valueOf(iter.element().getAttributes().getValueByName("x")),
Integer.valueOf(iter.element().getAttributes().getValueByName("y")),
Integer.valueOf(iter.element().getAttributes().getValueByName("w")),
Integer.valueOf(iter.element().getAttributes().getValueByName("h")),
Integer.valueOf(iter.element().getAttributes().getValueByName("z")),
iter.element().getAttributes().getValueByName("name"))
);
}
zipFile.close();
} catch (FileNotFoundException e)
{
System.out.println("FileNotFoundException in MainController.loadFile()");
e.printStackTrace();
} catch (IOException e)
{
System.out.println("IOException in MainController.loadFile()");
e.printStackTrace();
}
}
Thanks to everyone who contributed. I suspect the error was introduced either by the zip process or by using the byte[] buffer. Any further feedback is appreciated.
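Of the two suspects, the byte[] buffer is the more likely one. A single read(buffer) call fills only part of the 10000-byte array (and is not guaranteed to return the whole entry), and new String(buffer) then turns the unused zero bytes into NUL characters after the closing tag. A minimal sketch of reading the entry to its end instead follows. The names StreamText and readFully are hypothetical, and UTF-8 is an assumption about how the XML was written.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class StreamText {
    /** Reads the stream to its end and decodes exactly the bytes received (UTF-8 assumed). */
    static String readFully(InputStream in) {
        try {
            ByteArrayOutputStream collected = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            // read() reports how many bytes it actually delivered; keep only those.
            while ((n = in.read(chunk)) != -1) {
                collected.write(chunk, 0, n);
            }
            return new String(collected.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In loadFile this would replace the buffer lines with something like String xmlString = StreamText.readFully(zipFile.getInputStream(xmlZipFile)); after which the substring workaround should no longer be needed.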

I had an extra character at the end of my XML. Check the XML carefully, or run it through an online XML formatter, which will report an error if the XML is not well-formed.

I just had this error in Sterling Integrator. When I looked at the file in a hex editor, it had about 5 extra lines of char(0), not spaces. No idea where they came from, but this was precisely the issue, especially as I was basically doing an identity transform, so the XSL engine (Xalan in this case) was obviously passing them through into the result. I removed the extra rows after the last closing angle bracket, and the problem was solved.

Related

Java stax: The reference to entity "R" must end with the ';' delimiter

I am trying to parse a xml using stax but the error I get is:
javax.xml.stream.XMLStreamException: ParseError at [row,col]:[414,47]
Message: The reference to entity "R" must end with the ';' delimiter.
It fails on line 414 of the XML file, which contains P&R. The code I use to parse it is:
public List<Vild> getVildData(File file){
XMLInputFactory factory = XMLInputFactory.newFactory();
try {
ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(Files.readAllBytes(file.toPath()));
XMLStreamReader reader = factory.createXMLStreamReader(byteArrayInputStream, "iso8859-1");
List<Vild> vild = saveVild(reader);
reader.close();
return vild;
} catch (IOException e) {
e.printStackTrace();
} catch (XMLStreamException e) {
e.printStackTrace();
}
return Collections.emptyList();
}
private List<Vild> saveVild(XMLStreamReader streamReader) {
List<Vild> vildList = new ArrayList<>();
try{
Vild vild = new Vild();
while (streamReader.hasNext()) {
streamReader.next();
//Creating list with data
}
}catch(XMLStreamException | IllegalStateException ex) {
ex.printStackTrace();
}
return vildList; // return the list that was filled, not an empty one
}
I read online that the & is invalid xml code but I don't know how to change it before it throws this error inside the saveVild method. Does someone know how to do this efficiently?
Change the question: you're not trying to parse an XML file, you're trying to parse a non-XML file. For that, you need a non-XML parser, and to write such a parser you need to start with a specification of the language you are trying to parse, and you'll need to agree the specification of this language with the other partners to the data interchange.
How much work you could all save by conforming to standards!
Treat broken XML arriving in your shop the way you would treat any other broken goods coming from a supplier: return it to sender marked "unfit for purpose".
The problem here, as you mention, is that the parser finds the & and expects an entity reference terminated by ;
This is fixed by escaping the character, so that the parser finds &amp; instead.
Take a look here for further reference
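If the sender cannot be made to fix the file, the escaping can be applied to the text before it reaches the parser. The following is a rough sketch, not a general repair tool: the class and method names are invented for illustration, the name pattern is a simplification of XML's rules, and it would also rewrite ampersands inside CDATA sections, where a bare & is legal.

```java
import java.util.regex.Pattern;

final class AmpersandRepair {
    // An '&' that does not start a named, decimal, or hexadecimal reference.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#x[0-9A-Fa-f]+;)");

    /** Replaces every bare '&' with "&amp;" and leaves existing references alone. */
    static String escapeBareAmpersands(String text) {
        return BARE_AMP.matcher(text).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        System.out.println(escapeBareAmpersands("<name>P&R</name>"));
    }
}
```

The repaired string can then be passed to factory.createXMLStreamReader(new StringReader(repaired)). The file has to be decoded with its real encoding first (the question uses ISO-8859-1).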

Wrong characters in Java XML?

I have the following problem:
For a project I created my own logger, which produces an xml file with custom tags.
The problem is that creating the XML with either DOM or JAXB seems to run into encoding problems: the "content" field always contains incorrect characters.
I have already tried changing the encoding to UTF-8 and windows-1252.
I then found that the project I run the logger on actually uses ISO-8859-1, so I tried that too, but nothing changed. The content field always comes out as these incomprehensible characters.
Can anyone help me?
My Code:
if (OS.contains("Window")) {
try {
fh = new FileHandler(userDir+s+logF+s+jade+s+nameAgent+"-receive(Logger Java).xml" );
logger.addHandler(fh);
XMLFormatter formatter = new XMLFormatter();
fh.setFormatter(formatter);
logger.info(" ");
}
catch (SecurityException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
XmlCreator xmlcreator = new XmlCreator();
xmlcreator.setOntology(onto);
xmlcreator.setPerformative(perf);
xmlcreator.settimeStamp(ts);
xmlcreator.setProtocol(pro);
xmlcreator.setReceiver(rec);
xmlcreator.setContent(con);
try {
File file = new File("C:\\Users\\Francesco\\Desktop\\writereceiver.xml");
JAXBContext jaxbContext = JAXBContext.newInstance(XmlCreator.class);
Marshaller jaxbMarshaller = jaxbContext.createMarshaller();
// output pretty printed
jaxbMarshaller.setProperty(Marshaller.JAXB_ENCODING, "UTF-8");
jaxbMarshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
jaxbMarshaller.marshal(xmlcreator, file);
jaxbMarshaller.marshal(xmlcreator, System.out);
} catch (JAXBException e) {
e.printStackTrace();
}
Output XML (problem in content tag) :
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xmlCreator>
<content>’ sr pojo.SongRequestInfoÃÃcÀWCë</content>
<performative>ACCEPT-PROPOSAL</performative>
<receiver>jade.util.leap.ArrayList$1@445c4a59</receiver>
<timeStamp>1583849551513</timeStamp>
</xmlCreator>
I agree with @VCR. In all likelihood the output XML is a correctly encoded UTF-8 XML document, and it only looks odd because you are looking at it using some piece of software that doesn't know how to display UTF-8.
The prevalence of character pairs starting with Ã is symptomatic of what happens when you display UTF-8 data using software that thinks it is displaying ISO-8859-1.
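The effect is easy to reproduce. The sketch below (class and method names are illustrative only) encodes a string as UTF-8 and then deliberately decodes the bytes as ISO-8859-1, which is what a viewer with the wrong encoding setting does.

```java
import java.nio.charset.StandardCharsets;

final class MojibakeDemo {
    /** Encodes text as UTF-8, then (wrongly) decodes those bytes as ISO-8859-1. */
    static String misreadUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "\u00e9" is the letter e with an acute accent; its UTF-8 form is the bytes C3 A9.
        String misread = misreadUtf8AsLatin1("\u00e9");
        // The single character has become two: U+00C3 (A with tilde) and U+00A9.
        misread.chars().forEach(c -> System.out.println(String.format("U+%04X", c)));
    }
}
```

One accented character turns into a two-character pair whose first character is Ã (U+00C3), which matches the pattern described above.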

OpenNLP-Document Categorizer- how to classify documents based on status; language of docs not English, also default features?

I want to classify my documents using OpenNLP's Document Categorizer, based on their status: pre-opened, opened, locked, closed etc.
I have 5 classes and 60 documents in my training set. I'm using the Naive Bayes algorithm and trained with 1000 iterations and a cutoff of 1.
But I don't get good results when I test. I was thinking it might be because of the language of the documents (they are not in English), or that I should somehow add the statuses as features. I'm using the categorizer's default features, which I'm not very familiar with.
For example, a document that should be classified as locked is categorized as opened.
InputStreamFactory in=null;
try {
in= new MarkableFileInputStreamFactory(new
File("D:\\JavaNlp\\doccategorizer\\doccategorizer.txt"));
}
catch (FileNotFoundException e2) {
System.out.println("Creating new input stream");
e2.printStackTrace();
}
ObjectStream<String> lineStream=null;
ObjectStream<DocumentSample> sampleStream=null;
try {
lineStream = new PlainTextByLineStream(in, "UTF-8");
sampleStream = new DocumentSampleStream(lineStream);
}
catch (IOException e1) {
System.out.println("Document Sample Stream");
e1.printStackTrace();
}
TrainingParameters params = new TrainingParameters();
params.put(TrainingParameters.ITERATIONS_PARAM, 1000+"");
params.put(TrainingParameters.CUTOFF_PARAM, 1+"");
params.put(AbstractTrainer.ALGORITHM_PARAM,
NaiveBayesTrainer.NAIVE_BAYES_VALUE);
DoccatModel model=null;
try {
model = DocumentCategorizerME.train("en", sampleStream, params, new
DoccatFactory());
}
catch (IOException e)
{
System.out.println("Training...");
e.printStackTrace();
}
System.out.println("\nModel is successfully trained.");
BufferedOutputStream modelOut=null;
try {
modelOut = new BufferedOutputStream(new
FileOutputStream("D:\\JavaNlp\\doccategorizer\\classifier-maxent.bin"));
}
catch (FileNotFoundException e) {
System.out.println("Creating output stream");
e.printStackTrace();
}
try {
model.serialize(modelOut);
}
catch (IOException e) {
System.out.println("Serialize...");
e.printStackTrace();
}
System.out.println("\nTrained model is kept in: "+"model"+File.separator+"en-cases-classifier-maxent.bin");
DocumentCategorizer doccat = new DocumentCategorizerME(model);
String[] docWords = "Some text here...".replaceAll("[^A-Za-z]", " ").split(" ");
double[] aProbs = doccat.categorize(docWords);
System.out.println("\n---------------------------------\nCategory : Probability\n---------------------------------");
for(int i=0;i<doccat.getNumberOfCategories();i++){
System.out.println(doccat.getCategory(i)+" : "+aProbs[i]);
}
System.out.println("---------------------------------");
System.out.println("\n"+doccat.getBestCategory(aProbs)+" : is the category for the given sentence");
Can someone suggest how to categorize my documents better? For example, should I add a language detector first, or add new features?
Thanks in advance
By default, the document classifier takes the document text and forms a bag of words. Each word in the bag becomes a feature. As long as the language can be tokenized by an English tokenizer (again by default a white space tokenizer), I would guess that the language is not your problem. I would check the format of the data you are using for the training data. It should be formatted like this:
category<tab>document text
The text should fit on one line. The OpenNLP documentation for the document classifier can be found at http://opennlp.apache.org/docs/1.9.0/manual/opennlp.html#tools.doccat.training.tool
It would be helpful if you could provide a line or two of training data to help examine the format.
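A quick way to check the format before training is to run the file's lines through a small validator. This sketch does not use OpenNLP. It only assumes the layout described above (a category token, whitespace, then the document text on the same line), and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TrainingDataCheck {
    /** Counts samples per category; rejects lines that lack a category or text. */
    static Map<String, Integer> countByCategory(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < lines.size(); i++) {
            // Split into at most two parts: the category and the rest of the line.
            String[] parts = lines.get(i).trim().split("\\s+", 2);
            if (parts.length < 2 || parts[0].isEmpty()) {
                throw new IllegalArgumentException(
                        "Line " + (i + 1) + " has no category or no text");
            }
            counts.merge(parts[0], 1, Integer::sum);
        }
        return counts;
    }
}
```

Feeding it Files.readAllLines(path, StandardCharsets.UTF_8) shows at a glance how many samples each status has. With 60 documents over 5 classes, a very uneven split would be worth knowing about.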
Edit: Another potential issue: 60 documents may not be enough to train a good classifier, particularly if you have a large vocabulary. Also, even though this is not English, please tell me it is not multiple languages. Finally, is the document text the best way to classify the document? Would metadata from the document itself produce better features?
Hope it helps.

Using java exception for condition checking

I'm creating a single XML file uploader in my Grails application. There are two types of files, Ap and ApWithVendor. I would like to auto-detect the file type and convert the XML to the correct object using SAXParser.
What I've been doing is throwing an exception when the SAX parser is unable to find a qName match within the first Ap object using the endElement method. I then catch the exception and try the ApWithVendor object.
My question: is there a better way to do this without using exceptions for my condition checking?
Code example
try {
System.out.println("ApBatch");
Batch<ApBatchEntry> batch = new ApBatchConverter().convertFromXML(new String(xmlDocument, StandardCharsets.UTF_8));
byte[] xml = new ApBatchConverter().convertToXML(batch, true);
String xmlString = new String(xml, StandardCharsets.UTF_8);
System.out.println(xmlString);
errors = client.validateApBatch(batch);
if (!errors.isEmpty()) {
throw new BatchValidationException(errors);
}
return;
} catch (BatchConverterException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
try {
System.out.println("ApVendorBatch");
Batch<ApWithVendorBatchEntry> batch = new ApWithVendorBatchConverter().convertFromXML(new String(xmlDocument, StandardCharsets.UTF_8));
byte[] xml = new ApWithVendorBatchConverter().convertToXML(batch, true);
String xmlString = new String(xml, StandardCharsets.UTF_8);
System.out.println(xmlString);
errors = client.validateApWithVendorBatch(batch);
if (!errors.isEmpty()) {
throw new BatchValidationException(errors);
}
return;
} catch (BatchConverterException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
You can always iterate over the nodes in the XML and base the decision on whether a specific node is missing, is present, or has a specific value (see the DocumentBuilder and Document classes).
Using exceptions for decision-making or flow control is considered bad practice in 99% of situations.
Try converting the XML string to an XML tree object first and use XPath to decide whether it's an ApWithVendor structure, i.e. check whether a path such as "/application/foo/vendor" exists in the structure.
Once you have decided, convert the XML tree object to the corresponding object.
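The XPath check could look roughly like this with the JDK's built-in DOM and XPath classes. The class name BatchTypeDetector and the element names in the usage line are placeholders, since the real Ap and ApWithVendor schemas are not shown in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class BatchTypeDetector {
    /** Returns true when the document contains at least one node matching the XPath. */
    static boolean matches(String xml, String xpathExpression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // boolean(...) is true when the node set selected by the path is non-empty.
            return (Boolean) XPathFactory.newInstance().newXPath()
                    .evaluate("boolean(" + xpathExpression + ")", doc, XPathConstants.BOOLEAN);
        } catch (Exception e) {
            // Reached only for XML that cannot be parsed, not for the Ap/ApWithVendor decision.
            throw new IllegalArgumentException("Could not inspect XML", e);
        }
    }
}
```

A call such as BatchTypeDetector.matches(xmlString, "//vendor") would then pick the converter with a plain if/else, and exceptions stay reserved for input that is actually malformed.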

Specific characters not rendering properly in Java

I have an issue when displaying strings received from a server in a JTable. Some specific characters appear as little white squares instead of "é" or "à" etc. I tried a lot of things but none of them fixed my problem. I'm working with Eclipse under Windows. The server was developed using Visual Studio 2010.
The server sends an XML file using tinyXML2, the client uses JDom to read it. The font used is "Dialog". The server takes the strings from an Oracle database.
I assume this is an encoding problem, but I haven't been able to fix it yet.
Does anyone have an idea?
Thx
Arnaud
EDIT : As requested, this is how I use JDom
public static Player fromXML(Element e)
{
Player result = new Player();
String e_text = null;
try
{
e_text = e.getChildText(XMLTags.XML_Player_playerId);
if (e_text != null) result.setID(Integer.parseInt(e_text));
e_text = e.getChildText(XMLTags.XML_Player_lastName);
if (e_text != null) result.setName(e_text);
e_text = e.getChildText(XMLTags.XML_Player_point_scored);
if (e_text != null) result.addSpecial(STAT_SCORED, Double.parseDouble(e_text));
e_text = e.getChildText(XMLTags.XML_Player_point_scored_last);
if (e_text != null) result.addSpecial(STAT_SCORED_LAST, Double.parseDouble(e_text));
}
catch (Exception ex) {
ex.printStackTrace();
}
return result;
}
public static Document load(String filename) {
File XMLFile = new File(CLIENT_TO_SERVER, filename);
SAXBuilder sxb = new SAXBuilder();
Document document = new Document();
try
{
document = sxb.build(new File(XMLFile.getPath()));
} catch(Exception e){e.printStackTrace();}
return document;
}
Read the file using the correct encoding, something like:
document = sxb.build(new BufferedReader(new InputStreamReader(new FileInputStream(XMLFile.getPath()), "UTF8")));
Notes:
1. First determine which character encoding the file uses, and specify that charset instead of UTF8 above.
2. In case the encoding is not known, or the file is generated by various systems with different encodings, you may use Mozilla's encoding detector library; see https://code.google.com/p/juniversalchardet/
3. You need to handle UnsupportedEncodingException.
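As a small variation on the snippet above, passing a Charset object instead of a charset name uses an InputStreamReader constructor that does not declare UnsupportedEncodingException. The helper below is a sketch with made-up names (CharsetReading, decode) and is independent of JDOM. It only shows that the same bytes decode differently depending on the charset chosen.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.stream.Collectors;

final class CharsetReading {
    /** Decodes the whole stream with the given charset, joining lines with '\n'. */
    static String decode(InputStream in, Charset charset) {
        // The Charset overload of InputStreamReader throws no UnsupportedEncodingException.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.lines().collect(Collectors.joining("\n"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With JDOM the same idea would be sxb.build(new InputStreamReader(new FileInputStream(XMLFile), StandardCharsets.UTF_8)), again replacing UTF_8 with whatever encoding the server really writes.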
