Avro - java.io.IOException: Not a data file


I am using https://github.com/allegro/json-avro-converter to convert my JSON message into an Avro file. Calling the convertToAvro method gives me a byte array, byte[] byteArrayJson, which I write to disk with Apache Commons IO:
FileUtils.writeByteArrayToFile(new File("myFile.avro"), byteArrayJson);
The file is created. When I try to convert it back to JSON with the command below, I get this exception:
java -jar avro-tools-1.8.1.jar tojson myFile.avro > testCheck.json
Exception in thread "main" java.io.IOException: Not a data file.
at org.apache.avro.file.DataFileStream.initialize(DataFileStream.java:105)
at org.apache.avro.file.DataFileStream.<init>(DataFileStream.java:84)
at org.apache.avro.tool.DataFileReadTool.run(DataFileReadTool.java:71)
at org.apache.avro.tool.Main.run(Main.java:87)
at org.apache.avro.tool.Main.main(Main.java:76)
I have created a JUnit test that uses the convertToJson method (from the same library) and asserts on the resulting strings, and everything is fine there. With the jar, however, it does not work. Am I doing something wrong? I am using cmd rather than PowerShell, because I saw in an SO post that PowerShell can change the encoding. I think the problem is related to encoding, but I have no idea where to look.
(I am using Windows.)

The reason is that the Avro file does not contain the same data when it is produced in these two different ways, and this is expected behavior.
As a test, use this command to generate an Avro file with avro-tools:
java -jar avro-tools-1.8.2.jar fromjson --schema-file avroschema.json testCheck.json > myFile2.avro
Now read this file in Java and print it. Notice that it does not contain only the Avro record: it contains at least the schema as well (see the string-converted data below).
This means the data in an Avro file generated by avro-tools differs from the data produced by the converter.
bjavro.schemaœ{"type":"record","name":"Acme","fields":[{"name":"username","type":"string"}]}avro.c
The validation inside the tools API fails when you use the tojson command to read a file generated by the converter.
The correct avro-tools command for reading a file generated by the converter is fragtojson. With it we really are reading only a fragment (here, a single Avro record) rather than a full data file:
java -jar avro-tools-1.8.2.jar fragtojson --schema-file avroschema.json myFile.avro > myFile21.json
Another option is to avoid avro-tools altogether: create your own executable jar with the converter as a dependency and use it to read the Avro records.
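To see quickly which kind of file you have, you can look at its first bytes. An Avro object container file starts with the four bytes 'O', 'b', 'j', 1, followed by metadata such as avro.schema, which matches the printed data above. The sketch below uses only the JDK; the class and method names are made up for illustration, and it only checks the header, so it is presumably similar to, but not the same as, the check that makes tojson fail.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper, not part of avro-tools or json-avro-converter.
final class AvroMagicCheck {
    // An Avro object container file begins with 'O', 'b', 'j', 1.
    private static final byte[] MAGIC = {'O', 'b', 'j', 1};

    /** Returns true if the given bytes start with the container-file magic. */
    public static boolean hasContainerHeader(byte[] head) {
        return head.length >= MAGIC.length
                && Arrays.equals(Arrays.copyOf(head, MAGIC.length), MAGIC);
    }

    /** Reads up to four bytes from the start of the file and checks them. */
    public static boolean isContainerFile(Path path) throws IOException {
        byte[] head = new byte[MAGIC.length];
        int filled = 0;
        try (InputStream in = Files.newInputStream(path)) {
            // read() may return fewer bytes than requested, so loop until full or EOF.
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
        }
        return hasContainerHeader(Arrays.copyOf(head, filled));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: AvroMagicCheck <file>");
            return;
        }
        boolean container = isContainerFile(Paths.get(args[0]));
        System.out.println(container
                ? "container file: try tojson"
                : "no container header: try fragtojson with --schema-file");
    }
}
```

Run against the two files from this question, it should report a container header for myFile2.avro and none for myFile.avro, assuming the converter output is a bare binary-encoded record as described above.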

After looking at the Avro files, I saw that the one created by avro-tools has the schema in it and the one created by the library from GitHub does not. So I no longer use the library's convertToAvro method. Instead I call convertToGenericDataRecord, create my own DataFileWriter, and write the resulting record with it.

Related

Using OpenCSV I can't read first column

I have a problem reading a CSV file.
The file is on an SFTP server and has the following format:
Date;Risk 11/01/2020;C4 12/01/2020;C4
I am using IntelliJ, but when I download this file with Java code I receive the following content instead, and I don't know why:
Date;Risk 11/01/2020;C4;0,22;N;O;178 12/01/2020;C4;0,22;N;O;178
I think the problem is related to IntelliJ when I download the file, but I have no idea why it happens.
I use OpenCSV version 5.3:
List<FieldsCSV> listFieldsCSV = new CsvToBeanBuilder<FieldsCSV>(inStrReader)
        .withType(FieldsCSV.class)
        .withSeparator(Constants.SEPARATOR_FIELDS)
        .withIgnoreQuotations(true)
        .withSkipLines(1)
        .build()
        .parse();

Could using Java.IO.File cause data to scramble on an AS400 IFS?

I am building a small Java application that takes in XML files and converts them to text files. The end result is a jar that sits on the IFS and converts files when called from a shell script run by an RPGLE program. (Not my idea, I am just charged with making it happen.)
To do so I use JAXB to unmarshal the XML files into JAXB-annotated POJOs, then write them to a text file using a new IFSFile and IFSFileOutputStream. This works well, except that the output data is scrambled or out of its proper order.
I created a second version of the jar that replaces the IBM Toolbox classes with standard java.io classes. When run from Windows, this version outputs the results in the proper order. The same jar (the java.io version) run on the 400 itself returns scrambled data too.
Both jars, however, use a standard java.io.File for the XML inputs, because JAXB will not accept an IFSFile as an input. See the code below:
File inputFile = new File(source);
JAXBContext context = JAXBContext.newInstance(PriceRecords.class);
Unmarshaller unmarshaller = context.createUnmarshaller();
PriceRecords priceRecords = (PriceRecords) unmarshaller.unmarshal(inputFile);
List<PriceRecord> data = priceRecords.getPriceRecords();
I read somewhere that the AS400 saves and reads files differently than a standard Linux or Windows OS would, so I am wondering if someone can shed some light on this. If the scrambled data isn't caused by using java.io.File for the input on an AS400, what else could cause this difference in behavior?

How to take notice of filenames in different locale in Java

My application is sifting through a M3U playlist and generating a Windows batch file (copy_files.bat) using the following command:
printWriter = new PrintWriter("copy_files.bat", "UTF-8");
It generates a batch file that mostly works, but it fails to copy some files that have foreign characters in their filenames. The same behavior occurs when using Java's built-in function for copying files (a few files cannot be found on the system because of the filename character encoding). Please advise!

To wrap it up: I cannot import some M3U playlists, but an M3U8 playlist contains data that can be used. Here's a small project to test this.
http://m3uexporttool.sourceforge.net/
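As a rough illustration of the M3U8 route, the sketch below reads the playlist explicitly as UTF-8 and copies the files from Java directly, without going through a batch file. It assumes the failures came from decoding the playlist in the wrong charset, which the question does not confirm. The class name, method names and target directory are hypothetical, and relative entries are resolved against the playlist's own folder.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
final class PlaylistCopier {

    /** Returns the media entries of a playlist, skipping blank lines and '#' directives. */
    public static List<String> entries(List<String> lines) {
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            String entry = line.trim();
            // A UTF-8 byte order mark, if present, survives trim(); drop it.
            if (entry.startsWith("\uFEFF")) {
                entry = entry.substring(1);
            }
            if (!entry.isEmpty() && !entry.startsWith("#")) {
                result.add(entry);
            }
        }
        return result;
    }

    /** Copies every existing file named in the playlist into targetDir; returns the count. */
    public static int copyAll(Path playlist, Path targetDir) throws IOException {
        Files.createDirectories(targetDir);
        Path base = playlist.toAbsolutePath().getParent();
        int copied = 0;
        // Decode explicitly as UTF-8 instead of relying on the platform default charset.
        for (String entry : entries(Files.readAllLines(playlist, StandardCharsets.UTF_8))) {
            Path source = base.resolve(entry); // absolute entries are kept as they are
            if (Files.isRegularFile(source)) {
                Files.copy(source, targetDir.resolve(source.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }
}
```

Entries that still cannot be found are skipped rather than failing the whole run, so comparing the returned count with the number of entries shows how many files are missing.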

How to use model class of a classifier prepared using weka in eclipse

I created a classifier in Weka using the GUI and saved it, which produced a .model file. I want to use this classifier in my Java project in Eclipse.
I read this blog, which says that I need a model.dat file to do this. How can I get the model.dat file from the .model file?

The .dat in model.dat is just an extension that the author used for his files. Here's a reference for how to load your model. Here's another article for loading a model in Java.

run pentaho kettle from cmd. how to send the source files as a param?

I'm using Pentaho Kettle 4.0.1. Now I run a transformation from Java by providing a transformation file (XML type not KTR) and give some other XML files (in a src folder) that are meant to be inserted or updated in DB.
What I want is to do these things from a bat or shell file and not from Java. I'm not completely familiarized with kettle... I've seen some example regarding running a kettle transformation from a .bat file but there only is a file parameter that receives the transformation file (ktr). How do I pass as a param to pan.bat/pan.sh the src dir where my xml data files are (the data that is about to be inserted in DB)?

I think you need to read this:
http://wiki.pentaho.com/display/EAI/Pan+User+Documentation

You can use the following syntax for sh file
sh pan.sh -file:Transformation1.ktr -param:parameter1=myname -param:parameter2=30

You can use the transformation's named parameters to capture the path of the XML files.
Here you can see an example,
This is other way
