Best way to save data in a Java application?


I'm trying to find the best way to save the state of a simple application.
From a DB point of view there are 4 or 5 tables with date fields and relationships, of course.
Because the app is simple, and I want the user to have the option of moving the data around (usb pen, dropbox, etc), I wanted to put all data in a single file.
What is the best way/lib to do this?
XML usually is the best format for this (readability & openness), but I haven't found any great lib for this without doing SAX/DOM.

If you want to use XML, take a look at XStream for simple serialization of Java objects into XML. Here is "Two minute tutorial".
If you want something simple, standard Java Properties format can be also a way to store/load some small data.

Consider using plain JAXB annotations that come with the JDK:
@XmlRootElement
private class Foo {
    @XmlAttribute
    private String text = "bar";
}
Here's a blog post of mine that gives more details on this simple usage of JAXB (it also mentions a more "classy" JAXB-based approach, in case you need better control over your XML schema, e.g. to guarantee backwards compatibility).

Two other options you might consider:
HSQLDB is a small SQL database written in Java. More relevant for your purposes, it can be configured to simply write to a CSV file as its data store, so you could conceivably use its text output as a portable datastore and still use SQL, if that's what you prefer.
A second option might be to write the datastore to a serialized file, either directly or through a library like Prevayler. Very good performance and simple to implement; the cons are the fragility and opacity of the format.
But if the data is small enough, xml is probably much less bother.

If you don't need to provide semantic meaning to your data then XML is probably a wrong choice. I would recommend using the fat-free alternative JSON, which is much more naturally built for data structures.

Related

Is it bad practice to create XML files directly without using a class to store the structure? [duplicate]

In the thread What’s your favorite “programmer ignorance” pet peeve?, the following answer appears, with a large amount of upvotes:
Programmers who build XML using string concatenation.
My question is, why is building XML via string concatenation (such as a StringBuilder in C#) bad?
I've done this several times in the past, as it's sometimes the quickest way for me to get from point A to point B when it comes to the data structures/objects I'm working with. So far, I have come up with a few reasons why this isn't the greatest approach, but is there something I'm overlooking? Why should this be avoided?
Probably the biggest reason I can think of is you need to escape your strings manually, and most new programmers (and even some experienced programmers) will forget this. It will work great for them when they test it, but then "randomly" their apps will fail when someone throws an & symbol in their input somewhere. Ok, I'll buy this, but it's really easy to prevent the problem (SecurityElement.Escape to name one).
When I do this, I usually omit the XML declaration (i.e. <?xml version="1.0"?>). Is this harmful?
Performance penalties? If you stick with proper string concatenation (i.e. StringBuilder), is this anything to be concerned about? Presumably, a class like XmlWriter will also need to do a bit of string manipulation...
There are more elegant ways of generating XML, such as using XmlSerializer to automatically serialize/deserialize your classes. Ok sure, I agree. C# has a ton of useful classes for this, but sometimes I don't want to make a class for something really quick, like writing out a log file or something. Is this just me being lazy? If I am doing something "real" this is my preferred approach for dealing w/ XML.

You can end up with invalid XML, but you will not find out until you parse it again - and then it is too late. I learned this the hard way.

I think readability, flexibility and scalability are important factors. Consider the following piece of Linq-to-Xml:
XDocument doc = new XDocument(new XDeclaration("1.0", "UTF-8", "yes"),
    new XElement("products", from p in collection
        select new XElement("product",
            new XAttribute("guid", p.ProductId),
            new XAttribute("title", p.Title),
            new XAttribute("version", p.Version))));
Can you find an easier way to do it than this? I can output it to a browser, save it to a document, add attributes/elements in seconds and so on, just by adding a couple of lines of code. I can do practically everything with it without much effort.

Actually, I find the biggest problem with string concatenation is not getting it right the first time, but rather keeping it right during code maintenance. All too often, a perfectly-written piece of XML using string concat is updated to meet a new requirement, and string concat code is just too brittle.
As long as the alternatives were XML serialization and XmlDocument, I could see the simplicity argument in favor of string concat. However, ever since XDocument et al., there is just no reason to use string concat to build XML anymore. See Sander's answer for the best way to write XML.
Another benefit of XDocument is that XML is actually a rather complex standard, and most programmers simply do not understand it. I'm currently dealing with a person who sends me "XML", complete with unquoted attribute values, missing end tags, improper case sensitivity, and incorrect escaping. But because IE accepts it (as HTML), it must be right! Sigh... Anyway, the point is that string concatenation lets you write anything, but XDocument will force standards-compliant XML.

I wrote a blog entry back in 2006 moaning about XML generated by string concatenation; the simple point is that if an XML document fails to validate (encoding issues, namespace issues and so on) it is not XML and cannot be treated as such.
I have seen multiple problems with XML documents that can be directly attributed to generating XML documents by hand using string concatenation, and nearly always around the correct use of encoding.
Ask yourself this; what character set am I currently encoding my document with ('ascii7', 'ibm850', 'iso-8859-1' etc)? What will happen if I write a UTF-16 string value into an XML document that has been manually declared as 'ibm850'?
Given the richness of the XML support in .NET with XmlDocument and now especially with XDocument, there would have to be a seriously compelling argument for not using these libraries over basic string concatenation IMHO.

I think the problem is that you aren't looking at the XML file as a logical data store, but as a simple text file where you write strings.
Obviously those libraries do string manipulation for you, but reading/writing XML should be treated like saving data into a database or something logically similar.

If you need trivial XML then it's fine. It's just that the maintainability of string concatenation breaks down when the XML becomes larger or more complex. You pay either at development or at maintenance time. The choice is always yours, but history suggests that maintenance is usually more costly, and thus anything that makes it easier is generally worthwhile.

You need to escape your strings manually. That's right. But is that all? Sure, you can put the XML spec on your desk and double-check every time that you've considered every possible corner-case when you're building an XML string. Or you can use a library that encapsulates this knowledge...
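The thread is about C#, but the same point can be sketched in Java with the JDK's StAX writer. This is a minimal illustration, not anyone's production code: the class name ProductXml and the "product"/"title" names are made up for the example (borrowed loosely from the Linq-to-XML answer). The concatenating version passes the value through untouched, while the writer escapes it.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical helper class, named only for this example.
final class ProductXml {

    // Naive concatenation: the caller has to remember to escape every value.
    static String byConcatenation(String title) {
        return "<product title=\"" + title + "\"/>";
    }

    // StAX writer from the JDK: attribute values are escaped by the library.
    static String byWriter(String title) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            xml.writeEmptyElement("product");
            xml.writeAttribute("title", title);
            xml.writeEndDocument(); // closes the still-open start tag
            xml.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String title = "Fish & Chips";
        System.out.println(byConcatenation(title)); // raw '&' makes this ill-formed XML
        System.out.println(byWriter(title));        // '&' is written as an entity
    }
}
```

With the concatenating version, a single unescaped & is enough for a parser to reject the output later; the writer version keeps that knowledge in one place.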

Another point against using string concatenation is that the hierarchical structure of the data is not clear when reading the code. In @Sander's example of Linq-to-XML, for example, it's clear to what parent element the "product" element belongs, to what element the "title" attribute applies, etc.

As you said, it's just awkward to build XML correctly using string concatenation, especially now that you have Linq-to-XML, which allows for simple construction of an XML graph and will get namespaces, etc. correct.
Obviously context and how it is being used matters, such as in the logging example string.Format can be perfectly acceptable.
But too often people ignore these alternatives when working with complex XML graphs and just use a StringBuilder.

The main reason is DRY: Don't Repeat Yourself.
If you use string concat to do XML, you will constantly be repeating the functions that keep your string as a valid XML document. All the validation would be repeated, or not present. Better to rely on a class that is written with XML validation included.

I've always found creating an XML to be more of a chore than reading in one. I've never gotten the hang of serialization - it never seems to work for my classes - and instead of spending a week trying to get it to work, I can create an XML file using strings in a mere fraction of the time and write it out.
And then I load it in using an XMLReader tree. And if the XML file doesn't read as valid, I go back and find the problem within my saving routines and correct it. I refuse to perform mission-critical work until I have a working save/load system and know my tools are solid.
I guess it comes down to programmer preference. Sure, there are different ways of doing things, for sure, but for developing/testing/researching/debugging, this would be fine. However I would also clean up my code and comment it before handing it off to another programmer.
Because regardless of the fact you're using StringBuilder or XMLNodes to save/read your file, if it is all gibberish mess, nobody is going to understand how it works.

Maybe it won't ever happen, but what if your environment switches to XML 2.0 someday? Your string-concatenated XML may or may not be valid in the new environment, but XDocument will almost certainly do the right thing.
Okay, that's a reach, but especially if your not-quite-standards-compliant XML doesn't specify an XML version declaration... just saying.

Best file format regarding standard string and integer data?

For my project, I need to store info about protocols (the data sent (most likely integers) and in the order it's sent) and info that might be formatted something like this:
'ID' 'STRING' 'ADDITIONAL INTEGER DATA'
This info will be read by a Java program and stored in memory for processing, but I don't know what would be the most sensible format to store this data in?
EDIT: Here's some extra information:
1)I will be using this data in a game server.
2)Since it is a game server, speed is not the primary concern, since this data will primarily be read and utilized during startup, which shouldn't occur very often.
3)Memory consumption I would like to keep at a minimum, however.
4)The second data "example" will be used as a "dictionary" to look up names of specific in-game items, their stats and other integer data (and therefore might become very large, unlike the first data containing the protocol information, where each file will only note small protocol bites, like a login protocol for instance).
5)And yes, I would like the data to be "human-editable".
EDIT 2: Here's the choices that I've made:
JSON - For the protocol descriptions
CSV - For the dictionaries

There are many factors that could come to weigh--here are things that might help you figure this out:
1) Speed/memory usage: If the data needs to load very quickly or is very large, you'll probably want to consider rolling your own binary format.
2) Portability/compatibility: Balanced against #1 is the consideration that you might want to use the data elsewhere, with programs that won't read a custom binary format. In this case, your heavy hitters are probably going to be CSV, dBase, XML, and my personal favorite, JSON.
3) Simplicity: Delimited formats like CSV are easy to read, write, and edit by hand. Either use double-quoting with proper escaping or choose a delimiter that will not appear in the data.
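As a rough sketch of what "double-quoting with proper escaping" can look like in Java: the helper below follows the common CSV convention (as in RFC 4180) of wrapping a field in double quotes when it contains a comma, a quote, or a line break, and doubling any embedded quotes. The class name Csv and the sample row are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class, named only for this example.
final class Csv {

    // Quote a single field only when it needs it; embedded quotes are doubled.
    static String field(String value) {
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    // Join already-escaped fields into one CSV line.
    static String row(List<String> values) {
        return values.stream().map(Csv::field).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // ID, name, extra integer data; the name contains both a comma and quotes.
        System.out.println(row(Arrays.asList("42", "Sword, \"rusty\"", "7")));
    }
}
```

Reading such a file back needs a matching parser; for anything beyond a small dictionary file, an existing CSV library is probably less work than extending this by hand.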
If you could post more info about your situation and how important these factors are, we might be able to guide you further.

How about XML, JSON or CSV ?

I've written a similar protocol-specification using XML. (Available here.)
I think it is a good match, since it captures the hierarchical nature of specifying messages / network packages / fields etc. The order of fields is well defined and so on.
I even wrote a code-generator that generated the message sending / receiving classes with methods for each message type in XSLT.
The only drawback as I see it is the verbosity. If you have a really simple structure of the specification, I would suggest you use some simple home-brewed format and write a parser for it using a parser-generator of your choice.

In addition to the formats suggested by others here (CSV, XML, JSON, etc.) you might consider storing the info in a Java properties file. (See the java.util.Properties class.) The code is already there for you, so all you have to figure out is the properties names (or name prefixes) you want to use.
The Properties class also provides for storing/loading properties in a simple XML format.
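A minimal sketch of that XML round trip, using only java.util.Properties from the JDK. The class name ItemDictionary and the property names (item.1001.name and so on) are invented for the example; in-memory byte arrays stand in for the file streams you would normally use.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper class, named only for this example.
final class ItemDictionary {

    // Serialize the properties to the JDK's simple XML properties format (UTF-8).
    static byte[] toXml(Properties props) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            props.storeToXML(out, "item dictionary");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    // Parse the XML back into a Properties instance.
    static Properties fromXml(byte[] xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(xml));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties items = new Properties();
        items.setProperty("item.1001.name", "Iron Sword"); // made-up keys and values
        items.setProperty("item.1001.attack", "12");
        Properties reloaded = fromXml(toXml(items));
        System.out.println(reloaded.getProperty("item.1001.name"));
    }
}
```

Values come back as strings, so integer data such as item.1001.attack still has to be parsed with Integer.parseInt after loading.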

writing data in to files with java

I am writing a server in java that allows clients to play a game similar to 20 questions. The game itself is basically a binary tree with nodes that are questions about an object and leaves that are guesses at the object's identity. When the game guesses wrong it needs to be able to get the right answer from the player and add it to the tree. This data is then saved to a random access file.
The question is: How do you go about representing a tree within a file so that the data can be reaccessed as a tree at a later time.
If you know where I can find information on keeping data structures like trees organized as such when writing/reading to files then please link it. Thanks a lot.
Thanks for the quick answers everyone. This is a school project so it has some odd requirements like using random access files and telnet.

This data is then saved to a random access file.
That's the hard way to solve your problem (the "random access" bit, I mean).
The problem you are really trying to solve is how to persist a "complicated" data structure. In fact, there are a number of ways that this can be done. Here are some of them ...
Use Java persistence. This is simple to implement; make sure that your data structure is serializable, and then it's just a few lines of code to serialize and a few more lines to deserialize. The downsides are:
Serialized objects can be fragile in the face of code changes.
Serialization is not incremental. You write/read the whole graph each time.
If you have multiple separate serialized graphs, you need some scheme to name and manage them.
Use XML. This is more work to implement than Java persistence, but it has the advantage of being less fragile. And if something does go wrong, there's a chance you can fix it with XSLT or a text editor. (There are XML "binding" libraries that eliminate a lot of the glue coding.)
Use an SQL database. This addresses all of the downsides of Java persistence, but involves more coding ... and using a different computational model to access the persistent data (query versus graph navigation).
Use a database and an Object Relational Mapping technology; e.g. a JPA or JDO implementation. (Hibernate is a popular choice). These bridge between the database and in-memory views of data in a more or less transparent fashion, and avoids a lot of the glue code that you need to write in the SQL database and XML cases.

I think you're looking for serialization. Try this:
http://java.sun.com/developer/technicalArticles/Programming/serialization/

As mentioned, serialization is what you are looking for. It allows you to write an object to a file, and read it back later with minimal effort. The file will automatically be read back in as your object type. This makes things much easier than trying to store the object yourself using XML.
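For illustration, here is roughly what that looks like for a question tree. This is a sketch under assumptions: QuestionNode and TreeStore are hypothetical names, the node layout (a text plus yes/no children) is guessed from the question, and it writes through plain object streams rather than the random access file the assignment asks for.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical tree node: inner nodes hold questions, leaves hold guesses.
final class QuestionNode implements Serializable {
    private static final long serialVersionUID = 1L;

    final String text;
    QuestionNode yes; // subtree followed when the player answers "yes"
    QuestionNode no;  // subtree followed when the player answers "no"

    QuestionNode(String text) {
        this.text = text;
    }
}

// Hypothetical helper that saves and loads the whole tree in one go.
final class TreeStore {

    static void save(QuestionNode root, File file) {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(root); // writes the root and everything reachable from it
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static QuestionNode load(File file) {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (QuestionNode) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException("QuestionNode class not found", e);
        }
    }
}
```

Each save rewrites the whole tree, which matches the "not incremental" downside mentioned in the earlier answer; for a small game tree that is usually acceptable.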

Java serialization has some pitfalls (like when you update your class). I would serialize in a text format. Json is my first choice here but xml and yaml would work as well.
This way you would have a file that doesn't rely on the binary version of your class.
There are several java libraries: http://www.json.org
Some examples:
http://code.google.com/p/json-simple/wiki/DecodingExamples
http://code.google.com/p/json-simple/wiki/EncodingExamples
And to save and read from the file you can use the Commons Io:
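import org.apache.commons.io.FileUtils;
import java.io.File;
import java.nio.charset.StandardCharsets;
...
File dataFile = new File("yourfile.json");
// Read the whole file into a String
String data = FileUtils.readFileToString(dataFile, StandardCharsets.UTF_8);
// Write a String (here called content, your JSON text) back to the file
FileUtils.writeStringToFile(dataFile, content, StandardCharsets.UTF_8);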

Document Management System - Database Design

I'm writing my own Document Management System (DMS) in Java (the ones available don't satisfy my needs).
The documents shall be described by the Qualified Dublin Core metadata standard. The easiest way to do this, in my opinion, is to pack the key-value pairs in an RDF model with an XML representation.
To store the metadata for all documents I have two ideas (the document files will be stored in the filesystem):
Store all metadata of all documents in a single XML file
Make an XML file for each document and store it either in the filesystem or in an RDBMS (like the H2 database engine for Java). A key-value database won't solve this because the keys for one document are not unique.
Since (many) documents are linked to each other, the first approach might be better for analysing the data, but the second approach may be much faster.
Which solution would you recommend? Or are there any better solutions?
Stefan

I don't know how your analysis works, but if you need the complete graph in memory to do your analysis, then use variant 1 (store all metadata of all documents in a single XML file), because you will get no gain (only extra work) from variant 2 in this scenario.
Added:
If the extra work for variant 2 is not too much, then I recommend variant 2, because it can be more scalable:
You could update or add document metadata by writing only a small XML file instead of a huge one.
Depending on which XML parser you use, in some cases it is faster to parse several smaller XML files than one huge one (but this strongly depends on the amount of data).

Have you considered using MongoDB and GridFS? http://www.mongodb.org/display/DOCS/GridFS+Specification
You can store your documents directly in MongoDB as binary and even store the associated metadata for that particular file in any format you want. It can store documents even if they have the same name, and it will generate its own unique IDs.

BTW: even if it does not belong to your question: have a look at a JCR (Java Content Repository) implementation like JackRabbit. You could use it to store your documents and maybe your meta data too.

I'd look into a NO SQL document solution like Couch DB to see if it could help you.
I don't like the file system solution; there's no abstraction whatsoever to help you there.

If you are always accessing all documents, neither of your approaches would be slower than the other. But I would recommend the second approach. When it comes to analyzing the data, you'll need to read all documents, so there is no difference whether they are in different files or in one file...

Simple properties to string conversion in Java

Using Java, I need to encode a Map<String, String> of name value pairs to store into a String, and be able to decode it again. These will be stored in a database column, and will probably usually be short and simple, so the common case should produce a simple nice looking line, but shouldn't corrupt the data, even if it contains unexpected characters, etc.
How would you choose to do it such that:
The encoded form is a single, human readable line
It doesn't require a big library or much context to encode / decode
Any delimiters are properly escaped
Url encoding? JSON? Do it yourself? Please specify any helper libraries or methods you'd use.
(Edited to specify more context and requirements as requested.)

As @Uri says, additional context would be good. I think your primary concerns are less about the particular encoding scheme, as rolling your own for most encodings is pretty easy for a simple Map<String, String>.
An interesting question is: what will this intermediate string encoding be used for?
if it's purely internal, an ad-hoc format is fine eg simple concatenation:
key1|value1|key2|value2
if humans might read it, a format like Ruby's map declaration is nice:
{ first_key => first_value,
second_key => second_value }
if the encoding is to send a serialised map over the wire to another application, the XML suggestion makes a lot of sense as it's standard-ish and reasonably self-documenting, at the cost of XML's verbosity.
<map>
<entry key='foo' value='bar'/>
<entry key='this' value='that'/>
</map>
if the map is going to be flushed to file and read back later by another Java application, @Cletus' suggestion of the Properties class is a good one, and has the additional benefit of being easy to open and inspect by human beings.
Edit: you've added the information that this is to store in a database column - is there a reason to use a single column, rather than three columns like so:
CREATE TABLE StringMaps
(
    map_id NUMBER NOT NULL, -- ditch this if you only store one map...
    key VARCHAR2(255) NOT NULL, -- VARCHAR2 needs a length; 255 is an arbitrary choice
    value VARCHAR2(255)
);
As well as letting you store more semantically meaningful data, this moves the encoding/decoding into your data access layer more formally, and allows other database readers to easily see the data without having to understand any custom encoding scheme you might use. You can also easily query by key or value if you want to.
Edit again: you've said that it really does need to fit into a single column, in which case I'd either:
use the first pipe-separated encoding (or whatever exotic character you like, maybe some unprintable-in-English unicode character). Simplest thing that works. Or...
if you're using a database like Oracle that recognises XML as a real type (and so can give you XPath evaluations against it and so on) and need to be able to read the data well from the database layer, go with XML. Writing XML parsers for decoding is never fun, but shouldn't be too painful with such a simple schema.
Even if your database doesn't support XML natively, you can just throw it into any old character-like column-type...
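The pipe-separated option can be sketched in a few lines of Java if escaping is added so that a stray | in a key or value cannot corrupt the data. This is one possible scheme, not a standard: PipeCodec is a hypothetical name, backslash is chosen here as the escape character, and null values and embedded line breaks are not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical codec for key1|value1|key2|value2 with backslash escaping.
final class PipeCodec {

    // Escape the escape character first, then the delimiter.
    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("|", "\\|");
    }

    static String encode(Map<String, String> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : map.entrySet()) {
            if (sb.length() > 0) {
                sb.append('|');
            }
            sb.append(escape(e.getKey())).append('|').append(escape(e.getValue()));
        }
        return sb.toString();
    }

    static Map<String, String> decode(String encoded) {
        Map<String, String> map = new LinkedHashMap<>();
        if (encoded.isEmpty()) {
            return map; // an empty map encodes to the empty string
        }
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '\\' && i + 1 < encoded.length()) {
                current.append(encoded.charAt(++i)); // escaped character, taken literally
            } else if (c == '|') {
                tokens.add(current.toString());      // unescaped delimiter ends a token
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        tokens.add(current.toString());
        if (tokens.size() % 2 != 0) {
            throw new IllegalArgumentException("odd number of fields: " + encoded);
        }
        for (int i = 0; i < tokens.size(); i += 2) {
            map.put(tokens.get(i), tokens.get(i + 1));
        }
        return map;
    }
}
```

In the common case of simple keys and values the encoded form stays as plain as key1|value1|key2|value2; the backslashes only show up when the data contains | or \ itself.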

Why not just use the Properties class? That does exactly what you want.
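A minimal sketch of that suggestion, using java.util.Properties with a StringWriter and StringReader so the result can go into a database column. PropertiesCodec is a hypothetical name. Note that the output is not a single line: store writes a date comment followed by one line per entry, so this only partly meets the "single line" requirement in the question.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical codec built on the JDK's Properties text format.
final class PropertiesCodec {

    // Keys and values must be non-null; Properties rejects nulls.
    static String encode(Map<String, String> map) {
        Properties props = new Properties();
        props.putAll(map);
        StringWriter out = new StringWriter();
        try {
            props.store(out, null); // special characters are escaped by store
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    static Map<String, String> decode(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> map = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            map.put(name, props.getProperty(name));
        }
        return map;
    }
}
```

The appeal is that escaping of =, :, backslashes and line breaks is already handled by the class, with no extra library needed.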

I have been contemplating a similar need of choosing a common representation for the conversations (transport content) between my clients and servers via a facade pattern. I want a representation that is standardized, human-readable (brief), robust, fast. I want it to be lightweight to implement and run, easy to test, and easy to "wrap". Note that I have already eliminated XML by my definition, and by explicit intent.
By "wrap", I mean that I want to support other transport content representations such as XML, SOAP, possibly Java properties or Windows INI formats, comma-separated values (CSV) and that ilk, Google protocol buffers, custom binary formats, proprietary binary formats like Microsoft Excel workbooks, and whatever else may come along. I would implement these secondary representations using wrappers/decorators around the primary facade. Each of these secondary representations is desirable, especially to integrate with other systems in certain circumstances, but none of them is desirable as a primary representation due to various shortcomings (failure to meet one or more of my criteria listed above).
Therefore, so far, I am opting for the JSON format as my primary transport content representation. I intend to explore that option in detail in the near future.
Only in cases of extreme performance considerations would I skip translating the underlying conventional format. The advantages of a clean design include good performance (no wasted effort, ease of maintainability) for which a decent hardware selection should be the only necessary complement. When performance needs become extreme (e.g., processing forty thousand incoming data files totaling forty million transactions per day), then EVERYTHING has to be revisited anyway.
As a developer, DBA, architect, and more, I have built systems of practically every size and description. I am confident in my selection of criteria, and eagerly await confirmation of its suitability. Indeed, I hope to publish an implementation as open-source (but don't hold your breath quite yet).
Note that this design discussion ignores the transport medium (HTTP, SMTP, RMI, .Net Remoting, etc.), which is intentional. I find that it is much more effective to treat the transport medium and the transport content as completely separate design considerations, from each other and from the system in question. Indeed, my intent is to make these practically "pluggable".
Therefore, I encourage you to strongly consider JSON. Best wishes.

Some additional context for the question would help.
If you're going to be encoding and decoding at the entire-map granularity, why not just use XML?

As @DanVinton says, if you need this for internal use (by "internal use" I mean it's used only by my components, not components written by others), you can concatenate key and value.
I prefer to use one separator between a key and its value and a different one between one pair and the next:
Instead of
key1+SEPARATOR+value1+SEPARATOR+key2 etc
I code
key1+SEPARATOR_KEY_AND_VALUE+value1+SEPARATOR_KEY(n)_AND_KEY(N+1)+key2 etc
if you must debug, this way is clearer (by design too)

Check out the apache commons configuration package. This will allow you to read/save a file as XML or properties format. It also gives you an option of automatically saving the property changes to a file.
Apache Configuration

I realise this is an old "deadish" thread, but I've got a solution not posited previously which I think is worth throwing in the ring.
We store "arbitrary" attributes (i.e. created by the user at runtime) of geographic features in a single CLOB column in the DB in the standard XML attributes format. That is:
name="value" name="value" name="value"
To create an XML element you just "wrap up" the attributes in an xml element. That is:
String xmlString = "<arbitraryAttributes " + arbitraryAttributesString + " />";
"Serialising" a Properties instance to an xml-attributes-string is a no-brainer... it's like ten lines of code. We're lucky in that we can impose on the users the rule that all attribute names must be valid xml-element-names; and we xml-escape (i.e. &quot; etc) each "value" to avoid problems from double-quotes and whatever in the value strings.
It's effective, flexible, fast (enough) and simple.
Now, having said all that... if we had the time again, we'd just totally divorce ourselves from the whole "metadata problem" by storing the complete unadulterated uninterpreted metadata xml-document in a CLOB and use one of the open-source metadata editors to handle the whole mess.
Cheers. Keith.
