Read and process large one line JSON file - java

I have a large JSON file (200 MB), and all of it is on a single line.
I need to do some processing on the data in the file and write the data into a relational database.
What is the best way to do this using Java?
Note: Most of the available methods read line by line. We could also use something like MappedByteBuffer to read character by character, but that is not an efficient solution.
Non-Java solutions are also welcome.

I recommend the library from Douglas Crockford, https://github.com/douglascrockford/JSON-java. Use the following statement to load a JSON array:
org.json.JSONArray mediaArray = new org.json.JSONArray(filecontent);
See the following article for how to read a file's content:
http://www.javapractices.com/topic/TopicAction.do?Id=42
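For reference, here is a minimal sketch of how the file content could be read into a String before it is handed to the JSONArray constructor. It assumes Java 7 or later and a UTF-8 encoded file; the class and method names are made up for illustration. Note that this holds the whole 200 MB in memory (the bytes, then the String, then the parsed objects), so the JVM will likely need a generous heap setting.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
final class FileContentReader {

    // Reads the entire file into memory and decodes it as UTF-8 (assumed encoding).
    static String readAll(Path file) throws IOException {
        byte[] raw = Files.readAllBytes(file);
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

The returned String would play the role of filecontent in the statement above.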

Related

Simplest format to read/write huge files

I need to write huge files (more than 1 million lines) and send them to a different machine, where I need to read them with a Java BufferedReader, one line at a time.
I was using indented JSON, but it turned out not to be very handy:
it requires too much coding, and that consumes extra RAM/CPU.
I'm looking for something that looks like this:
client:id="1" name="jack" adress="House N°1\nCity N°3 \n Country 1" age="20"
client:id="2" name="alice" adress="House N°2\nCity N°5 \n Country 2" age="30"
vihecul:id="1" model="ford" hp="250" fuel="diesel"
vihecul:id="2" model="nisan" hp="190" fuel="diesel"
This way I can read the objects one at a time.
I know about url.encode & base64, but I'm trying to keep shorter readable lines.
So any suggestions please!
With huge files, textual data formats, especially markup-style ones like JSON, YAML or XML, are not a very good solution.
I suggest using a universal binary format, like Google Protocol Buffers or ASN.1.
Google Protocol Buffers is much easier to get started with.
Of course, if you only need Java-to-Java data transfer, you can use Java's built-in serialization.
What about reading/writing files in binary format using DataInputStream and DataOutputStream?
Of course, your data must have a fixed structure, but as a benefit you'll get smaller file sizes and faster reading/writing.
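A minimal sketch of the DataOutputStream/DataInputStream idea, assuming a fixed "client" record with the four fields from the question (id, name, address, age). The class and method names are hypothetical. Because writeUTF stores a length prefix, a value containing newlines (like the address) needs no escaping; the trade-off is that writeUTF is limited to 65535 encoded bytes per string.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical record layout: int id, UTF name, UTF address, int age.
final class ClientRecordIo {

    static final class Client {
        final int id;
        final String name;
        final String address;
        final int age;

        Client(int id, String name, String address, int age) {
            this.id = id;
            this.name = name;
            this.address = address;
            this.age = age;
        }
    }

    // Writes one record; fields must be read back in the same order.
    static void writeClient(DataOutputStream out, Client c) throws IOException {
        out.writeInt(c.id);
        out.writeUTF(c.name);
        out.writeUTF(c.address);
        out.writeInt(c.age);
    }

    // Reads one record; throws EOFException when no more data is available.
    static Client readClient(DataInputStream in) throws IOException {
        int id = in.readInt();
        String name = in.readUTF();
        String address = in.readUTF();
        int age = in.readInt();
        return new Client(id, name, address, age);
    }
}
```

Wrapping the file streams in BufferedOutputStream and BufferedInputStream would usually be worthwhile for files of this size.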

Java: Telling whether a byte array is a zip file

I have a server written in Java that, in a single request, gets a whole file from the client. The file is passed to the server as a list of bytes and is finally represented in the Java server as a byte array.
Is there some standard way / standard library that could tell whether a file represented by a byte array is a valid zip file?
Files are typically identified using magic numbers in the beginning of the file.
To make an educated guess about a given file, Java has a built-in method for detecting some file types: Files.probeContentType. There are also various third-party libraries: simplemagic or Apache Tika (which supports more than just magic numbers).
But content detection alone won't tell you whether the file is valid. For that, you'd need something that actually knows how to read Zip files, such as Java's ZipFile.
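Since the data here is a byte array rather than a file on disk, one way to approximate that check with the standard library is to stream the bytes through ZipInputStream and read every entry to the end. This is only a sketch under assumptions: the class and method names are made up, and an archive with zero entries is treated as invalid, which may not suit every use case. ZipInputStream reads entries sequentially and does not consult the central directory, so it is a weaker check than opening a real file with ZipFile.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not part of any library.
final class ZipBytes {

    // Returns true if at least one entry can be read completely without an error.
    static boolean looksLikeReadableZip(byte[] data) {
        try (ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[8192];
            int entries = 0;
            while (zin.getNextEntry() != null) {
                // Drain the entry so that corrupt compressed data surfaces as an exception.
                while (zin.read(buffer) != -1) {
                    // nothing to do, the content is discarded
                }
                entries++;
            }
            return entries > 0;
        } catch (IOException e) {
            // ZipException is a subclass of IOException.
            return false;
        }
    }
}
```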
If you want a standard way to implement this, you can use the serialization API. The following articles, which I found while searching on this topic, cover it.
Article 1 - javaworld
Article 2 - developer.com
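Check the Zip4j library. It is really easy to use, and its ZipFile class has an isValidZipFile() method.
The easiest way is to check for the "PK" magic at the beginning of the byte array. This only shows that the data starts with a zip signature, not that the archive is intact.
Something like this:
array.length >= 2 && array[0] == 'P' && array[1] == 'K'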

Read .csv file in GB's using Java

I have the following two requirements:
To read a CSV file and put its rows line by line into the database (RDBMS) without any data manipulation.
To read a CSV file and put its data into the database (RDBMS). In this case, row Z might depend on row B, so I need a staging DB (in-memory, or another staging RDBMS).
I am analyzing multiple ways to accomplish this:
Using core Java and reading the file in a producer-consumer way.
Using Apache Camel and BeanIO to read the CSV file.
Using SQL to read the file.
I wanted to know whether there is an established, industry-preferred way to do this kind of task.
I found a few links on Stack Overflow, but I am looking for more options:
How to read a large text file line by line using Java?
Read a huge file of numbers in Java in a memory-efficient way?
Read large CSV in java
I am using Java 6 for the implementation.
You should use the NIO package for files in the GB range; it is generally fast and reliable for this. You can simply read the file in chunks via NIO and then insert the rows into the database using bulk (batch) commands rather than single inserts. Single-row inserts take a lot of CPU cycles, and holding too much of the file in memory at once may cause OOM errors.
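The chunk-then-batch idea can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: CsvBatchReader and BatchSink are hypothetical names, the file is assumed to be UTF-8, and the code uses Java 7+ features (java.nio.file.Files, try-with-resources), whereas the question targets Java 6, where a BufferedReader over a FileInputStream would take their place. The sink is where JDBC batching would go, for example calling PreparedStatement.addBatch() per row and executeBatch() once per chunk.

```java
import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads a CSV file and hands its lines over in fixed-size batches.
final class CsvBatchReader {

    // Receives one batch of raw CSV lines, e.g. to run a JDBC batch insert.
    interface BatchSink {
        void accept(List<String> lines) throws Exception;
    }

    // Returns the total number of lines read. Only one batch is held in memory at a time.
    static long forEachBatch(Path csv, int batchSize, BatchSink sink) throws Exception {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        long total = 0;
        List<String> batch = new ArrayList<>(batchSize);
        try (BufferedReader reader = Files.newBufferedReader(csv, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                batch.add(line);
                total++;
                if (batch.size() == batchSize) {
                    sink.accept(batch);
                    batch = new ArrayList<>(batchSize); // start a fresh batch
                }
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // flush the final, partially filled batch
        }
        return total;
    }
}
```

A batch size in the hundreds or low thousands is a common starting point, but the right value depends on the database and driver.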
We are using Apache Camel's "File:" protocol to read the file and process the data.
You can use RandomAccessFile to read the CSV file. It gives you fast enough read speed and does not require any extra jar file. Here is the code:
import java.io.File;
import java.io.RandomAccessFile;

File f = new File(System.getProperty("user.home") + "/Desktop/CSVDOC1.csv");
// Open read-only; "rw" is not needed just to read the file
RandomAccessFile ra = new RandomAccessFile(f, "r");
try {
    String d;
    // readLine() starts at offset 0 and returns null at end of file.
    // Note: it maps each byte to one char, so non-ASCII text is not decoded correctly.
    while ((d = ra.readLine()) != null) {
        // d holds one line, e.g. "col1","col2","col3"
        // Split the line on the "," separator
        // Insert the row values into the database
    }
} finally {
    // Release the file handle
    ra.close();
}

Parse through text files and write into CSV

I have about 100 different text files in the same format. Each file has observations about certain events at certain time periods for a particular person. I am trying to parse out the relevant information for a few individuals from each of the text files. Ideally, I want to parse through all this information and create a CSV file (eventually to be imported into Excel) with all the information I need from ALL the text files. Any help or ideas would be greatly appreciated. I would prefer to use Java, but any simpler methods are welcome.
The log files are structured as below (data changed to preserve private information):
|||ID||NAME||TIME||FIRSTMEASURE^DATA||SECONDMEASURE^DATA|| etc...
TIME appears like 20110825111501 for 2011, 08/25 11:15:01 AM
Here are the steps in Java:
Open the file using FileReader.
You could also wrap the FileReader in a BufferedReader and use its readLine() method to read the file line by line.
Parse each line. You know the data definition of each line best; various String functions or Java regex may help.
You could do the same thing for the date. Check whether you can use DateFormat.
Once you have parsed the data, you can start building your CSV file using a CSV library such as the ones mentioned in the other answers, or write it yourself using FileOutputStream.
When you are ready to convert to Excel, you could use Apache POI.
Let me know if you need further clarification
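The parsing steps above can be sketched with the standard library alone. The sketch assumes the layout shown in the question: leading pipes, fields separated by "||", the first three fields being ID, NAME and TIME, and each measure written as LABEL^DATA. The class name, the output timestamp format, and the choice to keep only the DATA part of each measure are illustrative assumptions.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical converter from one log line to one CSV row.
final class LogLineToCsv {

    static String toCsvRow(String logLine) throws ParseException {
        // Drop the leading pipes, then split on the "||" separator.
        String trimmed = logLine.trim().replaceFirst("^\\|+", "");
        String[] fields = trimmed.split("\\|\\|");
        if (fields.length < 3) {
            throw new ParseException("Expected at least ID, NAME and TIME: " + logLine, 0);
        }

        // TIME looks like 20110825111501 (yyyyMMddHHmmss).
        SimpleDateFormat in = new SimpleDateFormat("yyyyMMddHHmmss", Locale.ROOT);
        in.setLenient(false);
        SimpleDateFormat out = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        Date when = in.parse(fields[2]);

        StringBuilder row = new StringBuilder();
        row.append(escape(fields[0])).append(',')
           .append(escape(fields[1])).append(',')
           .append(escape(out.format(when)));
        for (int i = 3; i < fields.length; i++) {
            // Each measure is LABEL^DATA; keep the DATA part if a label is present.
            String[] parts = fields[i].split("\\^", 2);
            row.append(',').append(escape(parts.length == 2 ? parts[1] : parts[0]));
        }
        return row.toString();
    }

    // Quotes a value if it contains a comma, a quote or a line break.
    static String escape(String value) {
        if (value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r")) {
            return "\"" + value.replace("\"", "\"\"") + "\"";
        }
        return value;
    }
}
```

Each returned row can be written with a BufferedWriter, one row per line, and Excel can open the resulting .csv file directly.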
Just parse the text file and use Apache Commons CSV to write a CSV file (in that library, CSVParser reads CSV and CSVPrinter writes it). Additionally, if you want to write to Excel, you can simply use Apache POI or JXL for that.
You can use SuperCSV to parse the file into a bean and also to create a CSV file.

Java Parsing Framework for complex CSV files

I need to parse complex (non-fixed-length) CSV files into Java objects in order to compare their values.
I first tried the Flatform Parsing Framework. I liked the approach of describing the values in a separate (XML) document. Maybe it's the right tool for simple CSV (and also flat) files. Nevertheless, my CSV files contain lines that vary in the number of fields, and sometimes a record spans multiple lines. There are also dependencies among those fields.
Here's a little sample (each type has a certain number of extra parameters):
; <COMMENTS (to be ignored)>
<NAME>,<TYPE_A>,<DESCRIPTION>,<PARAMETER>
<NAME>,<TYPE_B>,<DESCRIPTION>,<PARAMETER>,<PARAMETER>
<NAME>,<TYPE_C>,<DESCRIPTION>,<PARAMETER>,<PARAMETER>,<PARAMETER>,<PARAMETER>
<NAME>,<TYPE_D>,<DESCRIPTION>,<PARAMETER>,<PARAMETER>,<PARAMETER>,<PARAMETER>, -
<PARAMETER>,<PARAMETER>, -
<PARAMETER>,<PARAMETER>
<NAME>,<TYPE_B>,<DESCRIPTION>,<PARAMETER>,<PARAMETER>
<NAME>,<TYPE_A>,<DESCRIPTION>,<PARAMETER>
So I need something to describe and parse the CSV file in a more complex manner. I'm new to this. I've heard about parser generators. Is that what I need?
Try OpenCSV (see http://opencsv.sourceforge.net/#what-features). It handles embedded carriage returns just fine.
One option is to use the Scanner class, or you might want to check out Spring Batch. I've never actually used Spring Batch, but given that batch jobs often read from simple text files, I believe it caters for this, including all sorts of object mapping.
You may also try japaki
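Before reaching for a parser generator, a small hand-written pre-processing step may be enough for the sample in the question. The sketch below is based on assumptions read off that sample: lines starting with ";" are comments, and a line ending in ", -" continues on the next line. It does not handle quoted fields, and the class name is made up. The per-type parameter counts could then be validated on the resulting field lists.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reader that joins continuation lines into logical records.
final class ContinuationCsv {

    static List<List<String>> readRecords(BufferedReader reader) throws IOException {
        List<List<String>> records = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            String text = line.trim();
            if (text.isEmpty() || text.startsWith(";")) {
                continue; // skip blank lines and comments
            }
            if (text.matches(".*,\\s*-")) {
                // Continuation marker: drop the trailing "-" but keep the comma.
                pending.append(text.replaceFirst("\\s*-$", ""));
                continue;
            }
            pending.append(text);
            records.add(split(pending.toString()));
            pending.setLength(0);
        }
        if (pending.length() > 0) {
            // The file ended in the middle of a continued record; keep what was read.
            records.add(split(pending.toString()));
        }
        return records;
    }

    private static List<String> split(String record) {
        return Arrays.asList(record.split("\\s*,\\s*"));
    }
}
```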
