Incremental streaming JSON library for Java

Can anyone recommend a JSON library for Java which allows me to give it chunks of data as they come in, in a non-blocking fashion? I have read through A better Java JSON library and similar questions, and haven't found precisely what I'd like.
Essentially, what I'd like is a library which allows me to do something like the following:
String jsonString1 = "{ \"A broken";
String jsonString2 = " json object\" : true }";
JSONParser p = new JSONParser(...);
p.parse(jsonString1);
p.isComplete(); // returns false
p.parse(jsonString2);
p.isComplete(); // returns true
Object o = p.getResult();
Notice the actual key name ("A broken json object") is split between pieces.
The closest I've found is this async-json-library which does almost exactly what I'd like, except it cannot recover objects where actual strings or other data values are split between pieces.
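To make the requirement concrete, here is a minimal sketch of the state that has to survive between chunks. ChunkedJsonBuffer and its methods are hypothetical names, not part of any existing library. It does not parse or validate anything; it only tracks nesting depth and string state across chunk boundaries, so that the buffered text can be handed to an ordinary blocking parser once isComplete() returns true.

```java
// Hypothetical helper (not from any library): buffers incoming chunks and
// tracks just enough lexical state to know when one top-level JSON object
// or array has been fully received. It does not validate the JSON.
final class ChunkedJsonBuffer {
    private final StringBuilder buffer = new StringBuilder();
    private int depth = 0;            // current nesting of { } and [ ]
    private boolean inString = false; // currently inside a "..." literal
    private boolean escaped = false;  // previous char was a backslash in a string
    private boolean started = false;  // has the opening { or [ been seen

    /** Feeds the next chunk; a chunk may end anywhere, even inside a string. */
    void feed(String chunk) {
        for (int i = 0; i < chunk.length(); i++) {
            char c = chunk.charAt(i);
            buffer.append(c);
            if (inString) {
                if (escaped) {
                    escaped = false;      // this char is consumed by the escape
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{' || c == '[') {
                depth++;
                started = true;
            } else if (c == '}' || c == ']') {
                depth--;
            }
        }
    }

    /** True once every opened object/array has been closed again. */
    boolean isComplete() {
        return started && depth == 0 && !inString;
    }

    /** Everything received so far, ready for a regular blocking parser. */
    String getText() {
        return buffer.toString();
    }
}
```

Because inString and escaped are fields rather than local variables, a key or value that is split between two chunks is handled the same way as one that arrives whole.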

There are a few blocking streaming/incremental JSON parsers (as per Is there a streaming API for JSON?), but I am not aware of any asynchronous ones yet.
The library you refer to seems badly named: it does not appear to do real asynchronous processing, but merely allows one to parse a sequence of JSON documents (which several other libraries allow as well).
If there were people who really wanted this, writing one is not impossible: for XML there is Aalto, and handling JSON is quite a bit simpler than XML.
For what it is worth, there is actually this feature request to add a non-blocking parsing mode to Jackson, but very few users have expressed interest in getting that done (via voting for the feature request).
EDIT: (2016-01) while not async, Jackson ObjectMapper allows for convenient sub-tree by sub-tree binding of parts of the stream as well -- see ObjectReader.readValues() (ObjectReader created from ObjectMapper), or short-cut versions of ObjectMapper.readValues(...). Note the trailing s in there, which implies a stream of Objects, not just a single one.

Google Gson can incrementally parse JSON from an InputStream:
https://sites.google.com/site/gson/streaming

I wrote such a parser: JsonParser.java. See JsonParserTest.java for examples of how to use it.

Related

Should I parse json string to json object or manipulate the string directly

Normally I parse a JSON string into a JSON object instead of manipulating the string directly. For example, given a JSON string like
{"number": "1234567"}
if I have to add 000 at the end of the number:
...
{...,"number" : "1234567000",...}
....
I would use Jackson to parse it into either a JSON object or a POJO.
I understand that, from a readability perspective, parsing into a JSON object or POJO is much better, but I'm curious about the performance. If I manipulate the JSON string directly, I have to use a regex to extract the number attribute and append 000 to it. With lots of data, is that much more expensive than parsing into a JSON object, because each string operation basically creates a new String object?
EDIT:
Based on @Itai Steinherz's link I also made a benchmark in JS, and it shows that JSON parsing is better:
https://jsbench.me/93jr1w6k5b/1

Since I'm not very familiar with JSON parsing/manipulation in Java, I'll compare the same operations in JavaScript (which I am more experienced in).
Comparing a basic regex with .replace against JSON.parse & JSON.stringify, the result is that JSON.parse is slower by a small percentage (4.37%, to be precise).
However, I don't think the perf gain is worth it, and I would always go with more readable and maintainable code (the JSON.parse approach) rather than the more performant (the .replace approach).
See the complete benchmark I used here.
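For reference, the string-manipulation side of that comparison might look roughly like this in Java. This is only a sketch: NumberSuffixer and appendZeros are made-up names, and the pattern assumes the value of "number" is a quoted run of digits, as in the question. It will happily match the same text inside a nested object or another string value, which is part of why the parsing approach is easier to trust.

```java
import java.util.regex.Pattern;

// Hypothetical helper illustrating the regex approach from the question.
final class NumberSuffixer {
    // Captures everything from "number" up to the last digit of its value,
    // leaving only the closing quote outside the named group.
    private static final Pattern NUMBER_FIELD =
            Pattern.compile("(?<head>\"number\"\\s*:\\s*\"\\d+)\"");

    /** Appends 000 to the first "number" value; returns the input unchanged if absent. */
    static String appendZeros(String json) {
        // ${head} re-inserts the captured prefix, then 000 and the closing quote follow.
        return NUMBER_FIELD.matcher(json).replaceFirst("${head}000\"");
    }
}
```

Note that this also builds a new String for the result, just as serializing a parsed object back to JSON does, so neither approach avoids the allocation.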

Load a Perl Hash into Java

I have a big .pm file which consists only of a very big Perl hash with lots of sub-hashes. I have to load this hash into a Java program, do some work and make changes to the underlying data, and save it back into a .pm file that should look similar to the one I started with.
So far, I have tried converting it line by line with regexes and string matching into an XML document, and later parsing it element by element back into a Perl hash.
This somehow works, but seems quite dodgy. Is there a more reliable way to parse the Perl hash without having a Perl runtime installed?

You're quite right, it's utterly filthy. Regex and string for XML in the first place is a horrible idea, and honestly XML is probably not a good fit for this anyway.
I would suggest that you consider JSON. I would be stunned to find Java can't handle JSON, and it's inherently a hash-and-array oriented data structure.
So you can quite literally:
use JSON;
print to_json ( $data_structure, { pretty => 1 } );
Note - it won't work for serialising objects, but for perl hash/array/scalar type structures it'll work just fine.
You can then import it back into perl using:
use Data::Dumper;
my $new_data = from_json $string;
print Dumper $new_data;
You could instead Dumper it to a file, but given that your requirement is multi-language going forward, using native JSON as your 'at rest' data format is probably the more sensible choice.
But if you're looking at parsing perl code within java, without a perl interpreter? No, that's just insanity.

How to write/read binary files that represent objects?

I'm new to Java programming, and I ran into this problem:
I'm creating a program that reads a .csv file, converts its lines into objects and then manipulates these objects.
More specifically, the application reads every line, gives it an index, and also reads certain values from those lines and stores them in TRIE trees.
The application then can read indexes from the values stored in the trees and then retrieve the full information of the corresponding line.
My problem is that, even though I've been researching the last couple of days, I don't know how to write these structures in binary files, nor how to read them.
I want to write the lines (with their indexes) in a binary indexed file and read only the exact index that I retrieved from the TRIEs.
For the tree writing, I was looking for something like this (in C)
fwrite(tree, sizeof(struct TrieTree), 1, file)
For the "binary indexed file", I was thinking of writing objects like the TRIEs, and maybe reading each object until I've read enough to reach the corresponding index, but this probably wouldn't be very efficient.
To recap, I need help with writing and reading objects in binary files, and suggestions on how to create an indexed file.

I think that, for starters, you are best off trying to do this with serialization.
Here is just one example from stackoverflow: What is object serialization?
(I think copy&paste of the code does not make sense, please follow the link to read)
Admittedly this does not yet solve your index creation problem.
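One possible way to approach the indexing part, sketched under some assumptions: each record is serialized on its own, written as a length prefix followed by its bytes, and the byte offset where it starts is remembered. CsvLine and IndexedRecordFile are hypothetical names, and the offsets are only kept in memory here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.RandomAccessFile;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record type standing in for one parsed CSV line.
final class CsvLine implements Serializable {
    private static final long serialVersionUID = 1L;
    final int index;
    final String text;
    CsvLine(int index, String text) { this.index = index; this.text = text; }
}

// Hypothetical store: every record is written as [int length][serialized bytes],
// and offsets.get(i) remembers the file position where record i starts.
final class IndexedRecordFile implements AutoCloseable {
    private final RandomAccessFile file;
    private final List<Long> offsets = new ArrayList<>();

    IndexedRecordFile(String path) throws IOException {
        file = new RandomAccessFile(path, "rw");
        file.setLength(0); // this sketch always starts from an empty file
    }

    /** Appends a record and returns its position in the in-memory index. */
    int append(CsvLine line) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(line);
        }
        long offset = file.length();
        file.seek(offset);
        file.writeInt(bytes.size());
        file.write(bytes.toByteArray());
        offsets.add(offset);
        return offsets.size() - 1;
    }

    /** Seeks straight to record i and deserializes only that record. */
    CsvLine read(int i) throws IOException, ClassNotFoundException {
        file.seek(offsets.get(i));
        byte[] data = new byte[file.readInt()];
        file.readFully(data);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (CsvLine) in.readObject();
        }
    }

    @Override
    public void close() throws IOException { file.close(); }
}
```

In the setup described in the question, the tries could store these offsets (or the positions returned by append) as their values, so a lookup leads to a single seek instead of reading the file from the start. To survive a restart, the offsets themselves would also have to be written out somewhere.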

Here is an alternative to Java native serialization, Google Protocol Buffers.
I am mostly going to quote the documentation directly in this answer, so be sure to follow the link at the end of the answer if you are interested in more details.
What is it:
Protocol buffers are Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data – think XML, but smaller, faster, and simpler.
In other words, you can serialize your structures in Java and deserialize them in .NET, Python, etc. You don't get this with Java native serialization.
Performance:
This may vary according to use case, but in principle GPB should be faster, as it's built with performance and interchangeability in mind.
Here is stack overflow link discussing Java native vs GPB:
High performance serialization: Java vs Google Protocol Buffers vs ...?
How does it work:
You specify how you want the information you're serializing to be structured by defining protocol buffer message types in .proto files. Each protocol buffer message is a small logical record of information, containing a series of name-value pairs. Here's a very basic example of a .proto file that defines a message containing information about a person:
message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;

  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }

  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }

  repeated PhoneNumber phone = 4;
}
Once you've defined your messages, you run the protocol buffer compiler for your application's language on your .proto file to generate data access classes. These provide simple accessors for each field (like name() and set_name()) as well as methods to serialize/parse the whole structure to/from raw bytes.
You can then use this class in your application to populate, serialize, and retrieve Person protocol buffer messages. You might then write some code like this:
Person john = Person.newBuilder()
    .setId(1234)
    .setName("John Doe")
    .setEmail("jdoe@example.com")
    .build();
FileOutputStream output = new FileOutputStream(args[0]);
john.writeTo(output);
Read all about it here:
https://developers.google.com/protocol-buffers/
You could look at GPB as an alternative format to XSD describing XML structures, just more compact and with faster serialization.

Java Serialization to transfer data between any language

Question:
Instead of writing my own serialization algorithm, would it be possible to just use the built-in Java serialization, as described below, while still having it work across multiple languages?
Explanation:
How I imagine it working is as follows: I start up a process that will be a language-specific program, written in that language. So I'd have a CppExecutor.exe file, for example. I would write data to a stream to this program. The program would then do what it needs to do, then return a result.
To do this, I would need to serialize the data in some way. The first thing that came to mind was the basic Java serialization with the use of an ObjectInputStream and ObjectOutputStream. Most of what I have read has only stated that Java serialization is for Java-to-Java applications.
None of the data will ever need to be stored in a file. The method of transferring these packets would be through a java.lang.Process, which I have set up already.
The data will be composed of the following:
String - Mostly containing information that is displayed to the user.
Integer - most likely 32-bit. Won't need to deal with times.
Float - just to handle all floating-point values.
Character - to ensure proper types are used.
Array - Composed of any of the elements in this list.
The best way I have worked out to do this is as follows: I would start with a 4-byte magic number, just to ensure we are working with the correct data. Next, I would have an integer specifying how many elements there are. After that, for each element I would have a single byte signifying the data type (one of the above), followed by any crucial information, e.g. the length for a String or Array, and then the data itself.
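The layout described above could be sketched in Java roughly as follows. Everything here is an assumption made for illustration: the class name PacketCodec, the magic value, the type tag numbers, UTF-8 for strings, and the big-endian byte order that DataOutputStream uses (the program on the other side would have to read integers and floats in that same order).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical codec for the packet layout: magic, element count, then
// one [type tag][optional length][data] entry per element.
final class PacketCodec {
    static final int MAGIC = 0x504B5431; // assumed value, "PKT1" in ASCII
    static final byte T_STRING = 1, T_INT = 2, T_FLOAT = 3, T_CHAR = 4, T_ARRAY = 5;

    static byte[] encode(List<Object> elements) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(MAGIC);
        out.writeInt(elements.size());
        for (Object e : elements) writeElement(out, e);
        out.flush();
        return bytes.toByteArray();
    }

    private static void writeElement(DataOutputStream out, Object e) throws IOException {
        if (e instanceof String) {
            byte[] utf8 = ((String) e).getBytes(StandardCharsets.UTF_8);
            out.writeByte(T_STRING);
            out.writeInt(utf8.length); // length in bytes, not in chars
            out.write(utf8);
        } else if (e instanceof Integer) {
            out.writeByte(T_INT);
            out.writeInt((Integer) e);
        } else if (e instanceof Float) {
            out.writeByte(T_FLOAT);
            out.writeFloat((Float) e);
        } else if (e instanceof Character) {
            out.writeByte(T_CHAR);
            out.writeChar((Character) e); // 2 bytes, one UTF-16 code unit
        } else if (e instanceof List) {
            List<?> items = (List<?>) e;
            out.writeByte(T_ARRAY);
            out.writeInt(items.size());
            for (Object item : items) writeElement(out, item); // arrays may nest
        } else {
            throw new IllegalArgumentException("Unsupported type: " + e);
        }
    }

    static List<Object> decode(byte[] packet) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(packet));
        if (in.readInt() != MAGIC) throw new IOException("Bad magic number");
        int count = in.readInt();
        List<Object> result = new ArrayList<>(count);
        for (int i = 0; i < count; i++) result.add(readElement(in));
        return result;
    }

    private static Object readElement(DataInputStream in) throws IOException {
        byte type = in.readByte();
        switch (type) {
            case T_STRING: {
                byte[] utf8 = new byte[in.readInt()];
                in.readFully(utf8);
                return new String(utf8, StandardCharsets.UTF_8);
            }
            case T_INT: return in.readInt();
            case T_FLOAT: return in.readFloat();
            case T_CHAR: return in.readChar();
            case T_ARRAY: {
                int n = in.readInt();
                List<Object> items = new ArrayList<>(n);
                for (int i = 0; i < n; i++) items.add(readElement(in));
                return items;
            }
            default: throw new IOException("Unknown type tag: " + type);
        }
    }
}
```

The encoded bytes can be written to the OutputStream of the java.lang.Process as they are. Unlike ObjectOutputStream, nothing in this layout is specific to Java, which is what makes it readable from a C++ program.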
Side-notes:
I would also like to point out that a lot of these calculations will be taking place, where every millisecond could matter. Because of this, a text-based format (such as JSON) may lead to far longer processing times. Since none of the packets would need to be interpreted by a human, using only bytes wouldn't be an issue.

I'd recommend Google protobuf: it is binary, stable, proven, and has bindings for all languages you've mentioned. Moreover, it also handles structured data nicely.

There is a binary JSON format called BSON.
I would also like to point out that a lot of these calculations will be taking place, so a text-based format (such as JSON) may produce far larger operation times.
Do not optimize before you have measured.
Premature optimization is the root of all evil.
Can you give it a try and benchmark the throughput, to see if it fits your needs?

Thrift, Protobuf, JSON, MessagePack
Complexity of installation: Thrift >> Protobuf > BSON > MessagePack > JSON
Serialized data size: JSON > MessagePack > Binary Thrift > Compact Thrift > Protobuf
Time cost: Compact Thrift > Binary Thrift > Protobuf > JSON > MessagePack

How to parse JSON array with no object name

How would I parse this JSON array in Java? I'm confused because there is no object. Thanks!
EDIT: I'm an idiot! I should have read the documentation... that's probably what it's there for...
[
  {
    "id":"63565",
    "name":"Buca di Beppo",
    "user":null,
    "phone":"(408)377-7722",
    "address":"1875 S Bascom Ave Campbell, California, United States",
    "gps_lat":"37.28967000",
    "gps_long":"-121.93179700",
    "monhh":"",
    "tuehh":"",
    "wedhh":"",
    "thuhh":"",
    "frihh":"",
    "sathh":"",
    "sunhh":"",
    "monhrs":"",
    "tuehrs":"",
    "wedhrs":"",
    "thuhrs":"",
    "frihrs":"",
    "sathrs":"",
    "sunhrs":"",
    "monspecials":"",
    "tuespecials":"",
    "wedspecials":"",
    "thuspecials":"",
    "frispecials":"",
    "satspecials":"",
    "sunspecials":"",
    "description":"",
    "source":"ripper",
    "worldsbarsname":"BucadiBeppo31",
    "url":"www.bucadebeppo.com",
    "maybeDupe":"no",
    "coupontext":"",
    "couponimage":"0",
    "distance":"1.00317",
    "images":[
      0
    ]
  }
]

It is perfectly valid JSON. It is an array containing one object.
In JSON, arrays and objects don't have names. Only attributes of objects have names.
This is all described clearly by the JSON syntax diagrams at http://json.org. (FWIW, the site has translations in a number of languages ...)
How do you parse it? There are many libraries for parsing JSON. Many of them are linked from the site above. I suggest you use one of those rather than writing your own parsing code.
In response to this comment:
OTOH, writing your own parser is a reasonable project, and a good exercise for both learning JSON and learning Java (or whatever language). A reasonable parser can be written in about 500 lines of text.
In my opinion (having written MANY parsers in my time), writing a parser for a language is a very inefficient way to gain a working understanding of the syntax of a language. And depending on how you implement the parser (and the nature of the language syntax specification), you can easily get an incorrect understanding.
A better approach is to read the language's syntax specification, which the OP has now done, and which you would have to do in order to implement a parser.
Writing a parser can be a good learning exercise, but it is really a learning exercise in writing parsers. Even then, you need to pick an appropriate implementation approach, and an appropriate language to be parsed.

It's an array containing one element. That element is an object. The object (dictionary) contains about 20 name/value pairs.
