Parse error while converting EDI to Java? - java

I am trying to convert data in EDI format to Java objects.
The EDI data is as follows:
HDR*1*0*59.97*64.92*4.95*Wed Nov 15 13:45:28 EST 2006
CUS*user1*Harry^Fletcher*SD
ORD*1*1*364*The 40-Year-Old Virgin*29.98
ORD*2*1*299*Pulp Fiction*29.99
I have referred to the following link while implementing this.
I tried executing the mentioned project, expecting the data to be converted into a Java object, but while running it I get the error below:
Caused by: org.smooks.api.SmooksException: Parse Error: Failed to populate order-item[2]. Cause: Parse Error: Terminator '%NL;' not found

You are missing a newline at the end of the EDI document, hence the error. For some reason, the newline isn't rendered when viewing the example file on GitHub, but it is present when you view the file locally.
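If you cannot edit the file itself, you can also guarantee the terminator programmatically before handing the content to Smooks. A minimal sketch in plain Java (the file-reading helper is an assumption, not part of the original project):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class EdiInput {

    // Read the EDI file and make sure it ends with a newline, since the
    // segment terminator ('%NL;') must also close the last ORD segment.
    public static String readWithTrailingNewline(Path ediFile) throws IOException {
        String edi = new String(Files.readAllBytes(ediFile), StandardCharsets.UTF_8);
        return edi.endsWith("\n") ? edi : edi + "\n";
    }
}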

Related

Issue loading .XML files with docx4j in Java

I am facing an issue where I cannot load even the sample word2003xml.xml which is provided by docx4j for tests in docx4j-samples-docx4j-8.3.1.zip, found here: https://www.docx4java.org/downloads.html
I tried loading the file using two different load overloads, but the result is the same.
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new FileInputStream(new File("C:\\Mine\\project4tests\\word2003xml.xml")));
WordprocessingMLPackage wordMLPackage2 = WordprocessingMLPackage.load(new java.io.File("C:\\Mine\\project4tests\\word2003xml.xml"));
Here is the exception that I am getting:
Exception in thread "main" org.docx4j.openpackaging.exceptions.Docx4JException: Couldn't load xml from stream
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:641)
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:418)
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:376)
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:341)
at org.docx4j.openpackaging.packages.WordprocessingMLPackage.load(WordprocessingMLPackage.java:182)
at Main.main(Main.java:13)
Caused by: javax.xml.bind.UnmarshalException
with linked exception:
[com.sun.istack.internal.SAXParseException2; lineNumber: 3; columnNumber: 827; unexpected element (uri:"http://schemas.microsoft.com/office/word/2003/wordml", local:"wordDocument"). Expected elements are <{http://schemas.microsoft.com/office/2006/xmlPackage}package>,<{http://schemas.microsoft.com/office/2006/xmlPackage}part>,<{http://schemas.microsoft.com/office/2006/xmlPackage}xmlData>]
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.handleStreamException(UnmarshallerImpl.java:468)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:402)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal(UnmarshallerImpl.java:371)
at org.docx4j.convert.in.FlatOpcXmlImporter.<init>(FlatOpcXmlImporter.java:132)
at org.docx4j.openpackaging.packages.OpcPackage.load(OpcPackage.java:638)
... 5 more
Caused by: com.sun.istack.internal.SAXParseException2; lineNumber: 3; columnNumber: 827; unexpected element (uri:"http://schemas.microsoft.com/office/word/2003/wordml", local:"wordDocument"). Expected elements are <{http://schemas.microsoft.com/office/2006/xmlPackage}package>,<{http://schemas.microsoft.com/office/2006/xmlPackage}part>,<{http://schemas.microsoft.com/office/2006/xmlPackage}xmlData>
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext.handleEvent(UnmarshallingContext.java:726)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.Loader.reportError(Loader.java:247)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.Loader.reportError(Loader.java:242)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.Loader.reportUnexpectedChildElement(Loader.java:109)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext$DefaultRootLoader.childElement(UnmarshallingContext.java:1131)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext._startElement(UnmarshallingContext.java:556)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallingContext.startElement(UnmarshallingContext.java:538)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.InterningXmlVisitor.startElement(InterningXmlVisitor.java:60)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.handleStartElement(StAXStreamConnector.java:231)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.StAXStreamConnector.bridge(StAXStreamConnector.java:165)
at com.sun.xml.internal.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:400)
... 8 more
Caused by: javax.xml.bind.UnmarshalException: unexpected element (uri:"http://schemas.microsoft.com/office/word/2003/wordml", local:"wordDocument"). Expected elements are <{http://schemas.microsoft.com/office/2006/xmlPackage}package>,<{http://schemas.microsoft.com/office/2006/xmlPackage}part>,<{http://schemas.microsoft.com/office/2006/xmlPackage}xmlData>
... 19 more
There is no issue loading a .DOCX file; however, what I need the docx4j library for is to convert an old .DOC-style file (WordprocessingML, really more of an .XML) into a .DOCX.
Similar to what is done here: https://coderanch.com/t/721499/java/Word-XML-DOCX
Does anybody know why I cannot load the file properly?
See https://github.com/plutext/docx4j/blob/master/docx4j-core/src/main/java/org/docx4j/convert/in/word2003xml/Word2003XmlConverter.java for 2003 XML files.
Note that .doc is the old binary format; it's not XML, it is something different again.
This functionality was added by JasonPlutext and was later fixed with this commit: https://github.com/plutext/docx4j/commit/2c846e7c633d0264757521d15a3f5f37b037b815
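For illustration only, a rough sketch of how the converter might be wired up; the constructor and accessor used below are assumptions based on the class name, so verify them against the linked source before relying on this:

import java.io.File;
import javax.xml.transform.stream.StreamSource;
import org.docx4j.convert.in.word2003xml.Word2003XmlConverter;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;

public class Word2003XmlToDocx {
    public static void main(String[] args) throws Exception {
        // Hypothetical usage; check Word2003XmlConverter's real API in the linked file.
        Word2003XmlConverter converter = new Word2003XmlConverter(
                new StreamSource(new File("C:\\Mine\\project4tests\\word2003xml.xml")));
        WordprocessingMLPackage pkg = converter.getWordprocessingMLPackage();
        pkg.save(new File("C:\\Mine\\project4tests\\word2003xml.docx"));
    }
}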

java.io.IOException Not a data file after converting JSON to Avro with Avro Tools

I have a JSON file and an Avro schema file which correctly describes its structure.
I then convert the JSON file into an Avro file with the Avro tools, without getting an error, like this:
java -jar .\avro-tools-1.7.7.jar fromjson --schema-file .\data.avsc .\data.json > .\data.avro
I then convert the generated Avro file back to JSON to verify that I got a valid Avro file like this:
java -jar .\avro-tools-1.7.7.jar tojson .\data.avro > .\data.json
This throws the error:
Exception in thread "main" java.io.IOException: Not a data file.
at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
at org.apache.avro.file.DataFileReader.<init>(DataFileReader.java:97)
at org.apache.avro.tool.DataFileGetMetaTool.run(DataFileGetMetaTool.java:64)
at org.apache.avro.tool.Main.run(Main.java:84)
at org.apache.avro.tool.Main.main(Main.java:73)
I get the same exception when doing 'getschema' or 'getmeta' and also if I use avro-tools-1.8.2 or avro-tools-1.7.4.
I also tried it with multiple different pairs of JSON and schema files, all of which I checked for validity.
The error is thrown here (in the Avro tools):
if (!Arrays.equals(DataFileConstants.MAGIC, magic)) {
    throw new IOException("Not a data file.");
}
It seems the (binary) Avro file does not match what is expected, differing in a few characters at the beginning.
I have checked all of the other Stack Overflow questions regarding this error, but none of them helped. I ran the commands in Windows 10 PowerShell.
See https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/#json-to-binary-avro
Anyone got an idea what the heck is going on here?
UPDATE:
The conversion works if I do it on a Cloudera VM instead of on Windows. Only a few bytes at the beginning of the generated Avro files differ.
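For reference, a valid Avro container file starts with the magic bytes 'O', 'b', 'j', 0x01; dumping the first four bytes of the generated file makes the corruption visible (the file name is an assumption):

import java.io.FileInputStream;

public class AvroMagicCheck {
    public static void main(String[] args) throws Exception {
        try (FileInputStream in = new FileInputStream("data.avro")) {
            byte[] magic = new byte[4];
            int read = in.read(magic);
            for (int i = 0; i < read; i++) {
                System.out.printf("0x%02X ", magic[i]);
            }
            System.out.println(); // expected: 0x4F 0x62 0x6A 0x01
        }
    }
}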
Found the cause:
Windows 10 PowerShell transforms the binary stream into a UTF-8 stream. Changing the encoding alters the magic bytes, which (correctly) causes the exception to be thrown.
It works perfectly in another shell, such as a Linux terminal or cmd.exe.
Side note: it is PowerShell's > redirection operator that re-encodes the output, so the redirect itself corrupts the binary. Running the exact same command in cmd.exe, where > redirects raw bytes, produces a valid file:
java -jar .\avro-tools-1.7.7.jar fromjson --schema-file .\data.avsc .\data.json > .\data.avro
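Alternatively, the shell can be bypassed entirely by doing the conversion from Java with the Avro API: decode the JSON with a JsonDecoder and write a proper container file, so no console re-encoding can touch the magic bytes. A minimal sketch (the file names are assumptions):

import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.JsonDecoder;

public class JsonToAvro {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(new File("data.avsc"));
        GenericDatumReader<GenericRecord> reader = new GenericDatumReader<>(schema);
        JsonDecoder decoder = DecoderFactory.get()
                .jsonDecoder(schema, new FileInputStream("data.json"));
        try (DataFileWriter<GenericRecord> writer =
                new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, new File("data.avro"));
            GenericRecord record = null;
            while (true) {
                try {
                    record = reader.read(record, decoder); // one JSON record at a time
                } catch (EOFException endOfInput) {
                    break; // no more JSON records
                }
                writer.append(record);
            }
        }
    }
}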

Storing special characters in a Neo4j database from Java

I'm trying to store text that contains special characters like (ç, é, è, à ...) in Neo4j from Java using the REST API, but I'm getting the following error:
{"results":[],
"errors":[{"code":"Neo.ClientError.Request.InvalidFormat","message":"Unable to deserialize request: Invalid UTF-8 start byte 0xfd\n at [Source: HttpInputOverHTTP#b15eccb; line: 1, column: 1237]"}]}
I have already set the charset to UTF-8 in the header of my request.
I also added the following line to the neo4j.conf file, in the JVM parameters section:
dbms.jvm.additional=-Dfile.encoding=UTF8
but I am still getting the same error.
Does anyone have a solution for this?
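One common cause of this error is serializing the request body with the platform default charset (e.g. plain String.getBytes() on Windows) even though the header says UTF-8. A minimal sketch that encodes the body explicitly; the endpoint and Cypher statement are assumptions matching the transactional REST API of that era:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class Neo4jUtf8Post {
    public static void main(String[] args) throws Exception {
        String json = "{\"statements\":[{\"statement\":"
                + "\"CREATE (n:Note {text: 'ç é è à'})\"}]}";
        HttpURLConnection conn = (HttpURLConnection)
                new URL("http://localhost:7474/db/data/transaction/commit").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json; charset=UTF-8");
        conn.setDoOutput(true);
        // String.getBytes() without a charset uses the platform default,
        // which on Windows is usually not UTF-8; encode explicitly instead.
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("HTTP " + conn.getResponseCode());
    }
}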

Tika unable to parse after detecting mime-type

I have previously succeeded in parsing all kinds of files with Tika by calling tika.parseToString() without setting any custom configuration or metadata. Now I need to filter which files to parse based on their MIME type.
I can find the MIME type with tika.detect(new BufferedInputStream(inputStream), new Metadata());, but when calling tika.parseToString() afterwards, Tika uses EmptyParser and the content type detected is "application/octet-stream". This is the default, meaning that Tika is unable to determine what type of file it is. I have tried to set the content type in Metadata before parsing the file, but this leads to org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException. From what I've read, this means that the file is malformed, but the same files get parsed successfully without the MIME type check beforehand.
Does detect() do something to the InputStream that makes the parser unable to parse the files?
I'm using the same Tika instance, version 1.13, for both checking the MIME type and parsing.
My issue was caused by passing the InputStream to the parse method directly. detect() marks and resets the stream it is given, which a plain InputStream does not support. Wrapping the InputStream in a TikaInputStream (TikaInputStream stream = TikaInputStream.get(new BufferedInputStream(inputStream));) solved the issue.
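A minimal sketch of that fix, with detection and parsing sharing one rewindable stream (the application/pdf filter is just an example):

import java.io.BufferedInputStream;
import java.io.InputStream;
import org.apache.tika.Tika;
import org.apache.tika.io.TikaInputStream;
import org.apache.tika.metadata.Metadata;

public class DetectThenParse {
    public static String parseIfPdf(InputStream inputStream) throws Exception {
        Tika tika = new Tika();
        try (TikaInputStream stream =
                TikaInputStream.get(new BufferedInputStream(inputStream))) {
            // detect() consumes bytes but resets the TikaInputStream afterwards,
            // so parseToString() still sees the file from the beginning.
            String mimeType = tika.detect(stream, new Metadata());
            if (!"application/pdf".equals(mimeType)) {
                return null; // skip files we don't want to parse
            }
            return tika.parseToString(stream);
        }
    }
}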

After using POI with Excel, the .xlsx file gets corrupted and the code complains about InvalidFormat

I've been working with an Excel document, reading/writing data to it using the POI API with Java. I get an invalid format exception, and the file size becomes 0 (zero). The format of the file looks the same when I inspect it manually, so I am not sure why, after running exactly the same code, at a certain point of execution the file gets corrupted. I always make a backup of the document manually after changes go in; otherwise, if the file is corrupted, I have no way of repairing or restoring it.
Here is an exception:
org.apache.poi.POIXMLException: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:41)
at org.apache.poi.xssf.usermodel.XSSFWorkbook.<init>(XSSFWorkbook.java:218)
Caused by: org.apache.poi.openxml4j.exceptions.InvalidFormatException: Package should contain a content type part [M1.13]
at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:199)
at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:665)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:274)
at org.apache.poi.util.PackageHelper.open(PackageHelper.java:39)
... 4 more
Exception in thread "main" java.lang.NullPointerException
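One common way to end up with a 0-byte, corrupted .xlsx is writing back to the same file that is still open for reading, or crashing mid-write. A minimal sketch of a read-modify-write cycle that only replaces the original after the new file is complete (file names and the cell update are assumptions):

import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

public class SafeXlsxUpdate {
    public static void main(String[] args) throws Exception {
        File source = new File("data.xlsx");   // existing workbook with at least one sheet
        File temp = new File("data.tmp.xlsx"); // write target, swapped in afterwards

        try (FileInputStream in = new FileInputStream(source);
             XSSFWorkbook workbook = new XSSFWorkbook(in)) {
            workbook.getSheetAt(0).createRow(0).createCell(0).setCellValue("updated");
            try (FileOutputStream out = new FileOutputStream(temp)) {
                workbook.write(out);
            }
        }
        // Replace the original only after the new file was written completely.
        if (!source.delete() || !temp.renameTo(source)) {
            throw new IllegalStateException("Could not swap in the updated workbook");
        }
    }
}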
