Generate and parse text files in Java

I'm looking for a library/framework to generate/parse TXT files from/into Java objects.
I'm thinking of something like Castor or JAXB, where the mapping between the file and the objects can be defined programmatically or with XML/annotations. The TXT file is not homogeneous and has no separators (fields sit at fixed positions). The file is not big, so DOM-like handling is fine; no streaming is required.
For instance:
TextWriter.write(Collection objects) -> FileOutputStream
TextReader.read(FileInputStream fis) -> Collection

I suggest you use Google's Protocol Buffers:
Protocol buffers are a flexible, efficient, automated mechanism for
serializing structured data – think XML, but smaller, faster, and
simpler. You define how you want your data to be structured once, then
you can use special generated source code to easily write and read
your structured data to and from a variety of data streams and using a
variety of languages. You can even update your data structure without
breaking deployed programs that are compiled against the "old" format.
Protobuf messages can be exported/read in binary or text format.
Other solutions depend on what you call a text file: if Base64 is texty enough for you, you could simply use standard Java serialization and Base64-encode the binary stream.
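A rough sketch of the Base64 idea above, using only the JDK. The class and method names are made up for illustration, and it assumes every object implements Serializable and that the file only ever needs to be read back by Java:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper, not part of any library.
final class Base64SerializationSketch {

    // Serializes the object with standard Java serialization,
    // then encodes the resulting bytes as Base64 text.
    static String toText(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Reverses toText: decodes the Base64 text and deserializes the object.
    static Object fromText(String text) throws IOException, ClassNotFoundException {
        byte[] raw = Base64.getDecoder().decode(text);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ArrayList<String> names = new ArrayList<>(Arrays.asList("jack", "alice"));
        String text = toText(names);
        System.out.println(text);           // one line of Base64 characters
        System.out.println(fromText(text)); // the restored list
    }
}
```

The result is technically text, but it is not human-readable or hand-editable, so it only fits if "text file" means "safe to store or send as characters".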

You can do this with Jackson: serialize to JSON and back.
http://jackson.codehaus.org/

Just generate and parse it in XML or JSON format; there are plenty of libraries out there that will do all the work for you.

Related

Simplest format to read/write huge files

I need to write huge files (more than 1 million lines) and send each file to a different machine, where I need to read it with a Java BufferedReader, one line at a time.
I was using indented JSON, but it turned out not to be very handy:
it requires too much coding, and that consumes extra RAM/CPU.
I'm looking for something that looks like this:
client:id="1" name="jack" adress="House N°1\nCity N°3 \n Country 1" age="20"
client:id="2" name="alice" adress="House N°2\nCity N°5 \n Country 2" age="30"
vihecul:id="1" model="ford" hp="250" fuel="diesel"
vihecul:id="2" model="nisan" hp="190" fuel="diesel"
This way I can read the objects one at a time.
I know about url.encode & base64, but I'm trying to keep shorter readable lines.
So any suggestions please!
With huge files, any textual data format, especially a markup-style one like JSON, YAML or XML, is not a very good solution.
I would suggest a universal binary format, like Google Protocol Buffers or ASN.1.
Google Protocol Buffers is much easier to get started with.
Of course, if you only need Java-to-Java data transfer, you can use Java's built-in serialization.
What about reading/writing files in binary format using DataInputStream and DataOutputStream?
Of course, your data must have a fixed structure, but in return you get smaller files and faster reading/writing.
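A minimal sketch of the DataOutputStream/DataInputStream approach, assuming a made-up Client record with id, name and age loosely based on the "client" lines in the question. The class and method names are illustrative only, and it writes to a byte array rather than a real file to keep the example self-contained:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical example, not taken from the question's code.
final class BinaryRecordSketch {

    // Assumed record layout: int id, UTF string name, int age.
    static final class Client {
        final int id;
        final String name;
        final int age;

        Client(int id, String name, int age) {
            this.id = id;
            this.name = name;
            this.age = age;
        }
    }

    // Writes a record count followed by each record's fields in a fixed order.
    static byte[] encode(Client... clients) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(clients.length);
            for (Client c : clients) {
                out.writeInt(c.id);
                out.writeUTF(c.name); // writeUTF is limited to 65535 encoded bytes
                out.writeInt(c.age);
            }
        }
        return bytes.toByteArray();
    }

    // Reads the fields back in exactly the order they were written.
    static Client[] decode(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            Client[] clients = new Client[in.readInt()];
            for (int i = 0; i < clients.length; i++) {
                int id = in.readInt();
                String name = in.readUTF();
                int age = in.readInt();
                clients[i] = new Client(id, name, age);
            }
            return clients;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = encode(new Client(1, "jack", 20), new Client(2, "alice", 30));
        for (Client c : decode(data)) {
            System.out.println(c.id + " " + c.name + " " + c.age);
        }
    }
}
```

For a real file, the same calls work on a DataOutputStream wrapped around a BufferedOutputStream and FileOutputStream. The catch is that the reader must know the exact field order and types, and the file is no longer readable line by line with a BufferedReader.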

Thrift - converting from simple JSON

I created the following Thrift Object:
struct Student{
1: string id;
2: string firstName;
3: string lastName
}
Now I would like to read this object from JSON. According to this post, this is possible.
So I wrote the following code:
String json = "{\"id\":\"aaa\",\"firstName\":\"Danny\",\"lastName\":\"Lesnik\"}";
StudentThriftObject s = new StudentThriftObject();
byte[] jsonAsByte = json.getBytes("UTF-8");
TMemoryBuffer memBuffer = new TMemoryBuffer(jsonAsByte.length);
memBuffer.write(jsonAsByte);
TProtocol proto = new TJSONProtocol(memBuffer);
s.read(proto);
What I'm getting is the following exception:
Exception in thread "main" org.apache.thrift.protocol.TProtocolException: Unexpected character:i
at org.apache.thrift.protocol.TJSONProtocol.readJSONSyntaxChar(TJSONProtocol.java:322)
at org.apache.thrift.protocol.TJSONProtocol.readJSONInteger(TJSONProtocol.java:698)
at org.apache.thrift.protocol.TJSONProtocol.readFieldBegin(TJSONProtocol.java:837)
at com.vanilla.thrift.example.entities.StudentThriftObject$StudentThriftObjectStandardScheme.read(StudentThriftObject.java:486)
at com.vanilla.thrift.example.entities.StudentThriftObject$StudentThriftObjectStandardScheme.read(StudentThriftObject.java:479)
at com.vanilla.thrift.example.entities.StudentThriftObject.read(StudentThriftObject.java:413)
at com.vanilla.thrift.controller.Main.main(Main.java:24)
Am I missing something?
You are missing the fact that Thrift's JSON is different from yours. The field names are not written; instead, the assigned field ID numbers are written (and expected). Here's an example of Thrift's JSON protocol:
[1,"MyService",2,1,{"1":{"rec":{"1":{"str":"Error: Process() failed"}}}}]
In other words, Thrift is not intended to parse any kind of JSON. It supports a very specific JSON format as one of the possible transports.
However, depending on where your JSON data comes from, Thrift may still be able to help you if you can use it on both sides. In that case, write an IDL to describe the data structures, feed it to the Thrift compiler, and integrate both the generated code and the necessary parts of the library into your projects.
If the origin of the JSON lies outside of your reach, or if the JSON format cannot be changed for some reason, you need to find another way.
Format and semantics are different beasts
To some extent, the whole issue can be compared with XML: there is one general XML syntax, which tells us how we have to format things so that any standards-conformant XML processor can read them.
But knowing the rules of XML is only half the answer, if we get a certain XML file from someone. Even if our XML parser can read the file successfully, because it is well-formed XML, we need to know the semantics of the data to really make use of what's within that file: Is it a customer data record? Or is it a SOAP envelope? Maybe a configuration file?
That is where DTDs or XML Schema come into play, they exist to describe the contents of the XML data. Without knowing the logical structure you are lost, because there are myriads of possible ways to express things in XML. And exactly the same is true with JSON, except that JSON schema descriptions are less commonly used.
"So you mean, we need just a way to tell Thrift how the JSON is organized?"
No, because the purpose and idea behind Thrift is to have a framework to de/serialize things and/or implement RPC servers and clients as efficiently as possible. It is not intended to have a general purpose file parser. Instead, Thrift reads and speaks only its own set of formats, which are plugged into the architecture as protocols: Thrift Binary, Thrift JSON, Thrift Compact, and a few more.
What you could do: in addition to what I said in the first section of my answer, you may consider writing your own custom Thrift protocol implementation to support your particular JSON format of choice. It is not that hard, and worth a try.

Best file format regarding standard string and integer data?

For my project, I need to store info about protocols (the data sent (most likely integers) and in the order it's sent) and info that might be formatted something like this:
'ID' 'STRING' 'ADDITIONAL INTEGER DATA'
This info will be read by a Java program and stored in memory for processing, but I don't know what would be the most sensible format to store this data in?
EDIT: Here's some extra information:
1)I will be using this data in a game server.
2)Since it is a game server, speed is not the primary concern, since this data will primarily be read and utilized during startup, which shouldn't occur very often.
3)Memory consumption I would like to keep at a minimum, however.
4)The second data "example" will be used as a "dictionary" to look up names of specific in-game items, their stats and other integer data (and therefore might become very large, unlike the first data containing the protocol information, where each file will only note small protocol bites, like a login protocol for instance).
5)And yes, I would like the data to be "human-editable".
EDIT 2: Here are the choices that I've made:
JSON - For the protocol descriptions
CSV - For the dictionaries
There are many factors that could weigh in. Here are some things that might help you figure this out:
1) Speed/memory usage: If the data needs to load very quickly or is very large, you'll probably want to consider rolling your own binary format.
2) Portability/compatibility: Balanced against #1 is the consideration that you might want to use the data elsewhere, with programs that won't read a custom binary format. In this case, your heavy hitters are probably going to be CSV, dBase, XML, and my personal favorite, JSON.
3) Simplicity: Delimited formats like CSV are easy to read, write, and edit by hand. Either use double-quoting with proper escaping or choose a delimiter that will not appear in the data.
If you could post more info about your situation and how important these factors are, we might be able to guide you further.
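To make the "double-quoting with proper escaping" point from 3) concrete, here is a small sketch for a single CSV row. The class and method names are invented for this example, it assumes a comma delimiter, and the parser handles one row at a time (it does not split a file with quoted line breaks into rows); for production use a CSV library is the safer choice:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only.
final class CsvQuotingSketch {

    // Quotes a field only when it contains a comma, quote or line break;
    // embedded double quotes are escaped by doubling them.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins the escaped fields with commas.
    static String toRow(List<String> fields) {
        StringBuilder row = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                row.append(',');
            }
            row.append(escape(fields.get(i)));
        }
        return row.toString();
    }

    // Splits one row back into fields, honoring quotes and doubled quotes.
    static List<String> parseRow(String row) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < row.length(); i++) {
            char ch = row.charAt(i);
            if (inQuotes) {
                if (ch == '"' && i + 1 < row.length() && row.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote inside a quoted field
                    i++;
                } else if (ch == '"') {
                    inQuotes = false;    // closing quote
                } else {
                    current.append(ch);
                }
            } else if (ch == '"') {
                inQuotes = true;         // opening quote
            } else if (ch == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Assumed layout: ID, STRING, ADDITIONAL INTEGER DATA
        List<String> item = Arrays.asList("42", "Sword, \"Rusty\"", "7");
        String row = toRow(item);
        System.out.println(row);
        System.out.println(parseRow(row));
    }
}
```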
How about XML, JSON or CSV?
I've written a similar protocol-specification using XML. (Available here.)
I think it is a good match, since it captures the hierarchical nature of specifying messages / network packages / fields etc. The order of fields is well defined, and so on.
I even wrote a code-generator that generated the message sending / receiving classes with methods for each message type in XSLT.
The only drawback as I see it is the verbosity. If you have a really simple structure of the specification, I would suggest you use some simple home-brewed format and write a parser for it using a parser-generator of your choice.
In addition to the formats suggested by others here (CSV, XML, JSON, etc.) you might consider storing the info in a Java properties file. (See the java.util.Properties class.) The code is already there for you, so all you have to figure out is the properties names (or name prefixes) you want to use.
The Properties class also provides for storing/loading properties in a simple XML format.
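A short sketch of both Properties formats, using only calls from java.util.Properties. The class name, the key naming scheme ("item.1.name" and so on) and the comment text are assumptions made up for this example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical helper for illustration only.
final class PropertiesSketch {

    // Stores the properties in the plain key=value text format.
    static String toText(Properties props) throws IOException {
        StringWriter out = new StringWriter();
        props.store(out, "item dictionary");
        return out.toString();
    }

    // Loads properties from the plain key=value text format.
    static Properties fromText(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Stores the properties in the simple XML format.
    static byte[] toXml(Properties props) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        props.storeToXML(out, "item dictionary");
        return out.toByteArray();
    }

    // Loads properties from the simple XML format.
    static Properties fromXml(byte[] xml) throws IOException {
        Properties props = new Properties();
        props.loadFromXML(new ByteArrayInputStream(xml));
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties items = new Properties();
        // Assumed key scheme: item.<id>.<attribute>
        items.setProperty("item.1.name", "Iron Sword");
        items.setProperty("item.1.damage", "7");

        System.out.println(toText(items));
        Properties loaded = fromXml(toXml(items));
        int damage = Integer.parseInt(loaded.getProperty("item.1.damage"));
        System.out.println(loaded.getProperty("item.1.name") + " deals " + damage);
    }
}
```

All values are strings, so integer data has to be parsed after loading, and every entry is held in memory as two String objects, which may matter for a very large dictionary.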

How to deserialize in PHP an object serialized in Java

Is there any way to deserialize in PHP an object serialized in Java? I.e., if I have a Java class that implements Serializable and I use an ObjectOutputStream to write the object, and convert the result to a string, is there a way in PHP to take that string and create a similar object representation from it?
What does the Java Serialized data look like?
Response:
���sr�com.site.entity.SessionV3Data���������xpsr�java.util.HashMap���`��F�
loadFactorI� thresholdxp?#�����w������t� sessionIdt�0NmViMzUxYWItZDRmZC00MWY4LWFlMmUtZjg2YmZjZGUxNjg5xx
:)
I would heavily recommend you don't do this. Java serialization is meant for a Java instance to both save and load the data (for either transmission to another Java application or persistence between invocations of the same application). It was not at all meant to be a cross-platform protocol.
I would advise you to make an API adapter layer between the two. Output the contents of your Java object to a format you can work with in PHP, be it XML, YAML, or even a binary format (where you could use DataOutputStream).
What is the easiest way to eat soup with chopsticks when the soup was put in a bowl with a ladle? Put the soup in a cup and discard your chopsticks, because chopsticks are a poor choice for aiding in the consumption of soup. A cup (ubiquitous) eliminates external dependencies except for "mouth" and "opposable thumbs", both of which come with the standard library of humans.
A more elegant solution would be to encode that Java object with a JSON or XML serializer. Protocol Buffers or any other intentionally cross-language serialization technique would work fine too, and Protocol Buffers can also encode binary data efficiently.
Some time ago I did something similar. However, I didn't make PHP read the "Java serialize" format. I did the opposite: I made Java serialize itself to the "PHP serialize" format. This is actually quite easy. Have a look at the PHPSerializedResponseWriter class that is part of the Solr package:
https://github.com/terrancesnyder/solr-analytics/blob/master/solr/core/src/java/org/apache/solr/response/PHPSerializedResponseWriter.java
...then all you have to do is just read the string and call:
$result = unserialize($string);
From comments in the online PHP manual, there is a Java class that serializes to the PHP serialization format that you can look into. Then you can unserialize the data using the standard PHP functionality.
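To give an idea of what "serializing to the PHP format" involves, here is a deliberately tiny sketch that covers only strings, integers and flat string-keyed arrays. It is not the Solr class or the class from the PHP manual comments; the class and method names are invented, and it assumes the output is sent to PHP as UTF-8:

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, minimal writer for a subset of PHP's serialize() format.
final class PhpSerializeSketch {

    // PHP string: s:<length in bytes>:"<value>";
    // The length counts bytes, not characters, so UTF-8 is assumed here.
    static String string(String value) {
        int byteLength = value.getBytes(StandardCharsets.UTF_8).length;
        return "s:" + byteLength + ":\"" + value + "\";";
    }

    // PHP integer: i:<value>;
    static String integer(long value) {
        return "i:" + value + ";";
    }

    // PHP array: a:<entry count>:{<key><value><key><value>...}
    // Integer and Long values become PHP integers; everything else is
    // written as a string via String.valueOf.
    static String array(Map<String, Object> map) {
        StringBuilder out = new StringBuilder("a:" + map.size() + ":{");
        for (Map.Entry<String, Object> entry : map.entrySet()) {
            out.append(string(entry.getKey()));
            Object value = entry.getValue();
            if (value instanceof Integer || value instanceof Long) {
                out.append(integer(((Number) value).longValue()));
            } else {
                out.append(string(String.valueOf(value)));
            }
        }
        return out.append('}').toString();
    }

    public static void main(String[] args) {
        Map<String, Object> session = new LinkedHashMap<>();
        session.put("sessionId", "abc");
        session.put("count", 3);
        System.out.println(array(session));
    }
}
```

On the PHP side the resulting string would go through unserialize() as shown above. Nested arrays, floats, booleans, null and objects are left out of this sketch.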
Is it possible to use one of the more common cross platform data formats like JSON to communicate between your Java app and PHP? PHP has plenty of parsers for those formats. Check out json_decode for an example.
Is there any way to deserialize in PHP
an object serialized in Java?
Yes. The question is, should you? Exporting the Java object as XML or JSON probably makes more sense.
The following SO question might also help.
Dynamically create PHP object based on string

Reading a Java Object in PHP from a file created with ObjectOutputStream

I'm trying to read a file that was created in a Java-based game using ObjectOutputStream in PHP. The data is a serialized object written in a binary format.
I've been using fopen and fread to get the binary data, but I have absolutely no idea what to do with it.
PHP doesn't understand Java serialization. Both, however, understand common formats like JSON, XML, CSV, etc. I'd suggest switching to one of them and using that as the data transfer format instead.
In the case of JSON, in Java you can use Google Gson to convert (encode) full-fledged JavaBeans to JSON, and in PHP you can use json_decode() to convert (decode) it into an associative PHP array.
It doesn't seem easy to reimplement http://download.oracle.com/javase/6/docs/platform/serialization/spec/protocol.html
You can't do it easily (unless an existing framework is available). This is because the binary format used by Java serialization is highly specific to the JVM; compatibility is not guaranteed even between different JVM versions.
You should use a different approach, for example XML, YAML or JSON.
