In the profiler I am writing, which is a JVMTI agent for Java programs, I need a format for logging the collected events. These logs also have to be sent over a socket and read by a GUI somewhere else, so I need serialization that works between two languages.
I already implemented my own protocol in XML and it worked very well. However, I was told to consider another format, because building XML might be very slow and every additional piece of code executed in the profiler heavily influences the profiled program. This is true, but does building an XML DOM really take that long?
I have used TinyXML so far. I hope no one points to RapidXML, as I hope they are not that different on a non-embedded machine.
What do you think? Currently I am trying to reimplement it with protobuf, which claims to be n times faster than XML.
I have a design I am working on for all log files in my remit. I record data in JSON, but the JSON data is nested in a very simple XML format.
e.g.
<entry ts="2011-02-23T17:18:19.202" level="trc_1" typ="trace">New Message Received</entry>
<entry ts="2011-02-23T17:18:19.202" level="trace" typ="msg"><data>{"Name":"AgtConf","AgtId":1111,...}</data></entry>
That way I can easily separate out data and logging, but keep the logging directory from being complicated. It also saves having to write my own parser for a custom format. However, given your situation, I recommend using JSON only, since you are basically using it to serialise. JSON is very human-readable when it is formatted correctly, it can be very concise, and there are stable parsers for it.
My first choice is always the traditional text file.
You can append new entries at the end of the file (bottom).
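Appending to a plain text log could look roughly like the sketch below. The class name `TextLog` and the one-entry-per-line layout are assumptions made for illustration, not something from the answer above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper: one log entry per line, always added at the end of the file.
final class TextLog {
    static void append(Path logFile, String entry) throws IOException {
        byte[] line = (entry + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        // CREATE makes the file on first use; APPEND keeps the existing entries.
        Files.write(logFile, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }
}
```

Each call opens and closes the file, which is simple but probably too slow for a hot path such as a profiler; keeping a buffered writer open would be the usual alternative.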
We are using REST with Jersey. There are a few scenarios where the server (WAS 8.5) sends a large amount of data to the client, which is an RCP application. In some cases the data is so large (150 MB in XML format) that the client gets an OutOfMemoryError.
I have the following questions:
How much does the size increase when a Java object is converted to XML?
How can we send a large Java object to the client and still use REST calls?
1) Tough question to answer without seeing the XML schema. I've seen well-designed schemas that result in tight, lean XML, and others that are a mess and very bloated. To test it, write some test code that serializes your Java objects to a byte[] and compare its size to the XML payload you currently produce.
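A minimal sketch of such a measurement, assuming the objects implement `Serializable` and that the XML payload is available as a string. `PayloadSize` and its method names are made up for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Hypothetical measuring helper, not part of Jersey or any other library.
final class PayloadSize {
    // Size in bytes of the object graph under Java's built-in serialization.
    static int serializedSize(Serializable value) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        }
        return bytes.size();
    }

    // Size in bytes of an XML document once it is encoded as UTF-8.
    static int xmlSize(String xml) {
        return xml.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

Dividing `xmlSize(currentPayload)` by `serializedSize(sameObjects)` gives a rough overhead factor for your particular schema.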
2) Might be worth looking into a chunking process, 150MB is pretty large for a single payload. Also are you using GZIP compression for this already? Also may be worth looking at Fast Infoset. Basically it's a binary encoding for XML that generally helps reduce the size of an XML Document.
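To get a feel for what GZIP does to a repetitive XML payload, a stand-alone sketch like the following can help. In a real Jersey setup the compression would normally be handled at the HTTP layer rather than by hand; `GzipPayload` is a name invented for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper that compresses an XML string in memory (needs Java 9+ for readAllBytes).
final class GzipPayload {
    static byte[] compress(String xml) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so read the bytes only afterwards.
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(xml.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    static String decompress(byte[] compressed) throws IOException {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

Note that compression only shrinks the bytes on the wire. The client still has to hold the decompressed document unless it also parses it as a stream.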
I'm currently working on a project that requires me to split an XML. For example here is a sample:
<Lakes>
<Lake>
<id>1</id>
<Name>Caspian</Name>
<Type>Natyral</Type>
</Lake>
<Lake>
<id>2</id>
<Name>Moreo</Name>
<Type>Glacial</Type>
</Lake>
<Lake>
<id>3</id>
<Name>Sina</Name>
<Type>Artificial</Type>
</Lake>
</Lakes>
Now in my java code ideally what would happen is it will split the XML into 3 small ones for this example and send each of them out using a messenger service. The code for the messenger service is not important. I have that done already.
So for example the code would run, split the first part into this:
<Lakes>
<Lake>
<id>1</id>
<Name>Caspian</Name>
<Type>Natyral</Type>
</Lake>
</Lakes>
and then the java code would send this out in a message. It would then move on to the next part, send that out etc etc until it reaches the end of the big XML. This can be done through an XSLT or through java it doesn't matter. Any ideas?
To make it clear, I pretty much know how to break up a file using XSLT but I don't know how to break it up and send each part individually one at a time. I also don't want to store anything locally so they would ideally all get transferred into strings and sent out.
If the way you have to chunk your files is fixed and known, the easiest solution is to use SAX or StAX to do it programmatically. I personally prefer StAX for this kind of task as the code is generally cleaner and easier to understand but SAX will do the job equally well.
XSLT is a great tool but its main drawback is that it can only produce one output. And apart from a few exceptions XSLT engines don't support streaming processing, so if the initial file is too big to fit in memory, you can't use them.
Update: In XSLT 2.0 <xsl:result-document> can be used to produce multiple output files, but if you want to get your chunks one by one and not store them in files, it's not ideal.
I would stream the XML (instead of building a DOM tree in memory) and cut the chunks out on the go. Whenever you meet a Lake tag, start copying the content into a buffer which you will send and reset when the final tag </Lake> is met.
EDIT Have a look at this link to know more about XML streaming in Java
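The streaming approach described in these answers could be sketched with StAX as follows. This is only an illustration under a few assumptions: the chunk element is always named `Lake`, `Lake` elements are never nested, and the `sender` callback stands in for the messenger service from the question. `LakeSplitter` is a made-up name.

```java
import java.io.Reader;
import java.io.StringWriter;
import java.util.function.Consumer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical splitter: emits one small <Lakes> document per <Lake> element.
final class LakeSplitter {
    static void split(Reader source, Consumer<String> sender) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(source);
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        StringWriter buffer = null;
        XMLEventWriter writer = null; // non-null only while inside a <Lake> element
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && "Lake".equals(event.asStartElement().getName().getLocalPart())) {
                buffer = new StringWriter();
                writer = outputFactory.createXMLEventWriter(buffer);
            }
            if (writer != null) {
                writer.add(event); // copy everything between <Lake> and </Lake>
                if (event.isEndElement()
                        && "Lake".equals(event.asEndElement().getName().getLocalPart())) {
                    writer.flush();
                    writer.close();
                    // Re-wrap the fragment in the root element and hand it off immediately.
                    sender.accept("<Lakes>" + buffer + "</Lakes>");
                    writer = null;
                }
            }
        }
        reader.close();
    }
}
```

Because each chunk is passed to `sender` as soon as its closing tag is read, nothing is stored locally and only one `Lake` is held in memory at a time.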
For my project, I need to store info about protocols (the data sent (most likely integers) and in the order it's sent) and info that might be formatted something like this:
'ID' 'STRING' 'ADDITIONAL INTEGER DATA'
This info will be read by a Java program and stored in memory for processing, but I don't know what would be the most sensible format to store this data in?
EDIT: Here's some extra information:
1) I will be using this data in a game server.
2) Since it is a game server, speed is not the primary concern, since this data will primarily be read and utilized during startup, which shouldn't occur very often.
3) Memory consumption I would like to keep at a minimum, however.
4) The second data "example" will be used as a "dictionary" to look up names of specific in-game items, their stats and other integer data (and therefore might become very large, unlike the first data containing the protocol information, where each file will only note small protocol bites, like a login protocol for instance).
5) And yes, I would like the data to be "human-editable".
EDIT 2: Here are the choices that I've made:
JSON - For the protocol descriptions
CSV - For the dictionaries
There are many factors that could come to weigh--here are things that might help you figure this out:
1) Speed/memory usage: If the data needs to load very quickly or is very large, you'll probably want to consider rolling your own binary format.
2) Portability/compatibility: Balanced against #1 is the consideration that you might want to use the data elsewhere, with programs that won't read a custom binary format. In this case, your heavy hitters are probably going to be CSV, dBase, XML, and my personal favorite, JSON.
3) Simplicity: Delimited formats like CSV are easy to read, write, and edit by hand. Either use double-quoting with proper escaping or choose a delimiter that will not appear in the data.
If you could post more info about your situation and how important these factors are, we might be able to guide you further.
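For the dictionary data in the question ('ID' 'STRING' 'ADDITIONAL INTEGER DATA'), loading a delimited file might look like the sketch below. It assumes a semicolon delimiter that never appears in the names, so no quoting is handled. `ItemDictionary`, `Item` and the column layout are invented for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical loader for rows shaped like "ID;NAME;INT;INT;...".
final class ItemDictionary {
    static final class Item {
        final String name;
        final int[] stats; // primitive array keeps the memory footprint small

        Item(String name, int[] stats) {
            this.name = name;
            this.stats = stats;
        }
    }

    static Map<Integer, Item> load(Reader source) throws IOException {
        Map<Integer, Item> items = new LinkedHashMap<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // tolerate blank lines in hand-edited files
            }
            String[] fields = line.split(";");
            if (fields.length < 2) {
                throw new IllegalArgumentException("Expected at least ID;NAME but got: " + line);
            }
            int[] stats = new int[fields.length - 2];
            for (int i = 2; i < fields.length; i++) {
                stats[i - 2] = Integer.parseInt(fields[i].trim());
            }
            items.put(Integer.parseInt(fields[0].trim()), new Item(fields[1].trim(), stats));
        }
        return items;
    }
}
```

If names can contain the delimiter, a proper CSV library with quoting support would be a safer choice than `String.split`.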
How about XML, JSON or CSV ?
I've written a similar protocol-specification using XML. (Available here.)
I think it is a good match, since it captures the hierarchical nature of specifying messages / network packages / fields etc. The order of fields is well defined and so on.
I even wrote a code-generator that generated the message sending / receiving classes with methods for each message type in XSLT.
The only drawback as I see it is the verbosity. If you have a really simple structure of the specification, I would suggest you use some simple home-brewed format and write a parser for it using a parser-generator of your choice.
In addition to the formats suggested by others here (CSV, XML, JSON, etc.) you might consider storing the info in a Java properties file. (See the java.util.Properties class.) The code is already there for you, so all you have to figure out is the properties names (or name prefixes) you want to use.
The Properties class also provides for storing/loading properties in a simple XML format.
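A small sketch of the properties approach, assuming a key scheme such as `item.<id>.<field>` (the prefix convention and the `ItemProperties` class are illustrative choices, not a standard):

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Properties;

// Hypothetical wrapper around java.util.Properties for text-based storage.
final class ItemProperties {
    // Parses "key=value" lines, e.g. "item.1001.name=Iron Sword".
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Writes the properties back out in the same line-based format.
    static String format(Properties props) throws IOException {
        StringWriter out = new StringWriter();
        props.store(out, "item dictionary");
        return out.toString();
    }
}
```

All values come back as strings, so integer data needs `Integer.parseInt` on lookup. The XML variant uses `storeToXML` and `loadFromXML` on the same class.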
I am writing a server in java that allows clients to play a game similar to 20 questions. The game itself is basically a binary tree with nodes that are questions about an object and leaves that are guesses at the object's identity. When the game guesses wrong it needs to be able to get the right answer from the player and add it to the tree. This data is then saved to a random access file.
The question is: How do you go about representing a tree within a file so that the data can be reaccessed as a tree at a later time.
If you know where I can find information on keeping data structures like trees organized as such when writing/reading to files then please link it. Thanks a lot.
Thanks for the quick answers everyone. This is a school project so it has some odd requirements like using random access files and telnet.
This data is then saved to a random access file.
That's the hard way to solve your problem (the "random access" bit, I mean).
The problem you are really trying to solve is how to persist a "complicated" data structure. In fact, there are a number of ways that this can be done. Here are some of them ...
Use Java persistence. This is simple to implement; make sure that your data structure is serializable, and then it's just a few lines of code to serialize and a few more lines to deserialize. The downsides are:
Serialized objects can be fragile in the face of code changes.
Serialization is not incremental. You write/read the whole graph each time.
If you have multiple separate serialized graphs, you need some scheme to name and manage them.
Use XML. This is more work to implement than Java persistence, but it has the advantage of being less fragile. And if something does go wrong, there's a chance you can fix it with XSLT or a text editor. (There are XML "binding" libraries that eliminate a lot of the glue coding.)
Use an SQL database. This addresses all of the downsides of Java persistence, but involves more coding ... and using a different computational model to access the persistent data (query versus graph navigation).
Use a database and an Object Relational Mapping technology; e.g. a JPA or JDO implementation. (Hibernate is a popular choice). These bridge between the database and in-memory views of data in a more or less transparent fashion, and avoids a lot of the glue code that you need to write in the SQL database and XML cases.
I think you're looking for serialization. Try this:
http://java.sun.com/developer/technicalArticles/Programming/serialization/
As mentioned, serialization is what you are looking for. It allows you to write an object to a file, and read it back later with minimal effort. The file will automatically be read back in as your object type. This makes things much easier than trying to store the object yourself using XML.
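For the question tree, built-in serialization could be sketched like this. `QuestionTree` and `Node` are hypothetical names, and the node layout (one text plus a yes and a no branch) is an assumption based on the description in the question.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical 20-questions tree persisted with Java's built-in serialization.
final class QuestionTree {
    static final class Node implements Serializable {
        private static final long serialVersionUID = 1L;
        final String text; // a question for inner nodes, a guess for leaves
        Node yes;
        Node no;

        Node(String text) {
            this.text = text;
        }

        boolean isLeaf() {
            return yes == null && no == null;
        }
    }

    // Writes the root and, through its references, the whole tree.
    static void save(Node root, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(root);
        }
    }

    static Node load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (Node) in.readObject();
        }
    }
}
```

This rewrites the whole tree on every save, so it does not satisfy a requirement to use a random access file; it only shows how little code the serialization route needs.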
Java serialization has some pitfalls (like when you update your class). I would serialize in a text format. JSON is my first choice here, but XML and YAML would work as well.
This way you would have a file that doesn't rely on the binary version of your class.
There are several java libraries: http://www.json.org
Some examples:
http://code.google.com/p/json-simple/wiki/DecodingExamples
http://code.google.com/p/json-simple/wiki/EncodingExamples
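And to save and read from the file you can use Commons IO:
import org.apache.commons.io.FileUtils;
import java.io.File;
import java.nio.charset.StandardCharsets;
...
File dataFile = new File("yourfile.json");
// Read the whole file into a string.
String data = FileUtils.readFileToString(dataFile, StandardCharsets.UTF_8);
// Write a string back; content is the JSON text you want to save.
FileUtils.writeStringToFile(dataFile, content, StandardCharsets.UTF_8);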
I'm trying to find the best way to save the state of a simple application.
From a DB point of view there are 4 or 5 tables with date fields and, of course, relationships.
Because the app is simple, and I want the user to have the option of moving the data around (usb pen, dropbox, etc), I wanted to put all data in a single file.
What is the best way/lib to do this?
XML usually is the best format for this (readability & openness), but I haven't found any great lib for this without doing SAX/DOM.
If you want to use XML, take a look at XStream for simple serialization of Java objects into XML. Here is the "Two Minute Tutorial".
If you want something simple, the standard Java Properties format can also be a way to store and load small amounts of data.
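Consider using plain JAXB annotations that come with the JDK:
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlRootElement;
...
@XmlRootElement
private class Foo {
@XmlAttribute
private String text = "bar";
}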
Here's a blog post of mine that gives more details on this simple usage of JAXB (it also mentions a more "classy" JAXB-based approach, in case you need better control over your XML schema, e.g. to guarantee backwards compatibility).
Two other options you might consider:
Hsqldb is a small SQL database written in Java. More relevant for your purposes, it can be configured to simply write to a CSV file as its data store, so you could conceivably use its text output as a portable datastore and still use SQL, if that's what you prefer.
A second option might be to write the datastore directly to a serialized file, either directly or through a library like Prevayler. This gives very good performance and is simple to implement; the cons are the fragility and opacity of the format.
But if the data is small enough, XML is probably much less bother.
If you don't need to provide semantic meaning to your data then XML is probably a wrong choice. I would recommend using the fat-free alternative JSON, which is much more naturally built for data structures.