I'm working on a project that saves/retrieves JSON strings to and from a database.
Everything works OK, i.e., save and update are safe for the types String, Number and Boolean, but for List and Map I want to know the safe way to manipulate them as the data goes back and forth to the database, especially when the lists become large, i.e. thousands of items, say for lists of "friends" and "followers".
I am also concerned with potential data corruption when processing a JSON List or Map in Java.
What is the safe way to update a List or Map using the JSON.Simple library without loading everything (every item) into memory?
For example, I just need to insert one (1) item into a JSON list string that is stored in a database.
JSON isn't suitable for ORM (object-relational mapping). This is why NoSQL databases store JSON as a document (i.e. the whole thing). Consequently, JSON.Simple has no support for lazily loading parts of a JSON structure.
Relational databases don't map well to JSON (except for primitives), as you noticed, because the natural data structure for a List is a 1:N mapping where the list table has an index column (i.e. the position of the element in the list), while a Map needs an N:M mapping.
So a better solution might be to store the whole JSON string in a CLOB (instead of trying to break it apart), or to store the whole JSON string in a CLOB plus extract a few key fields so you can index the data properly.
But when you work with the JSON, you will either have to write your own OR mapper that supports lazy loading of JSON arrays and maps, or you will have to read the whole structure into RAM every time.
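So for the one-item insert from the question, the best JSON.Simple can do is a read-modify-write of the whole list. A minimal sketch, where loadClobFromDatabase and saveClobToDatabase stand in for your own (hypothetical) data-access code:

    import org.json.simple.JSONArray;
    import org.json.simple.parser.JSONParser;

    // The whole array is parsed into RAM, changed, and written back;
    // json-simple cannot append to the stored text in place.
    // (JSONParser.parse throws ParseException, handled by the caller here.)
    String stored = loadClobFromDatabase("friends");
    JSONArray friends = (JSONArray) new JSONParser().parse(stored);
    friends.add("new-friend-id");                 // the one new item
    saveClobToDatabase("friends", friends.toJSONString());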
I am working on a project using JPA (EclipseLink 2.5.2) and Jersey 2.27 running on Tomcat 8 under Java 8. Currently I retrieve results using mapped entities and also some simple lists (distinct values, key-value pairs). I have implemented server-side paging, filtering and sorting. Now I'm trying to add the ability to dynamically aggregate data sets.
I've searched on SO and other sites for the best way to serialize a list of Tuples returned from my criteria builder query. So far I haven't found anything that answers my question. I can think of three ways to do it barring some good pointers from here:
Use reflection to create the appropriate object and supply it in my query
Run through the results one at a time and write out my own JSON for each element to a StringBuffer
ArrayList<HashMap<String, Object>>, where each HashMap is built from the aliases and the matching result values, then added to the list; the list is serialized as normal
So if there is no "easy" way to serialize a list of tuples, which method above makes the most sense? Option 3 definitely works, but feels like a kludge to me.
Generally, using reflection in business logic does not sound good, so you should eliminate the first option (1). Serialising objects one by one (2) sounds good when you want to use a streaming API; in your case it is not an option. The third one (3) looks pretty normal and is a good solution because it is generic, and List<Map<String, Object>> maps extremely well onto the JSON specification:
JSON array - List
JSON object - Map
property name - String
any JSON value - Object
You should also try converting the query result at the EclipseLink level. See the "JPA 2.0 native query results as map" question, where it is suggested to convert the query result to ResultType.Map.
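For what it's worth, option 3 is only a few lines if you do it in the application. A sketch, assuming tuples is the List<Tuple> returned by your criteria query and every selection was given an alias:

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    import javax.persistence.Tuple;
    import javax.persistence.TupleElement;

    // Turn each Tuple into a Map keyed by selection alias; the resulting
    // List<Map<String, Object>> serializes as a plain JSON array of objects.
    List<Map<String, Object>> rows = new ArrayList<>();
    for (Tuple tuple : tuples) {
        Map<String, Object> row = new LinkedHashMap<>();
        for (TupleElement<?> element : tuple.getElements()) {
            row.put(element.getAlias(), tuple.get(element));
        }
        rows.add(row);
    }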
Performance-wise, is it smart to do this?
An example would be with Gson:
    import com.google.gson.Gson;

    Gson gson = new Gson();
    String json = gson.toJson(myObject);
    // store the JSON string in SQL under a primary key
    // later: retrieve the JSON string by primary key and rebuild the object
    MyObject restored = gson.fromJson(json, MyObject.class);
I want to simplify the way I store and retrieve objects, instead of building them up and breaking them apart into individual columns every time I store to or retrieve from a database.
But my concern with using a JSON string is that its length may impact performance as the database fills up. I'm not sure; this is why I'm asking.
There is no issue with the space used or the performance of this approach: MySQL will deal with that just fine.
That is, while the entire JSON chunk must be pulled/pushed for processing and changes, MySQL will continue to handle it as well as it ever did, even as the database fills up.
However, there are problems with normalization and opaqueness of information when following this design. Databases are about information, but a blob of JSON is just... a blob of JSON to an SQL database.
Because of this, none of the data in the JSON can be used for relationships or queries, nor can it participate in indices or constraints. So much for the "relational" part of the database...
Unless the JSON truly is opaque data, like the contents of a binary file, consider working on relevant normalization... or switch to a (JSON) document-oriented database (e.g. Raven, Redis, Couch, Mongo).
There are no space or performance issues with storing JSON strings. MySQL is capable of handling large blobs, if you need them.
The decision whether or not to store your data serialized into JSON should be based on how you need to process this data. Relational databases such as MySQL assume that you normalize the data and establish relationships between records. That said, in many cases it can be practical to do otherwise (i.e. store as JSON).
If you store your data as JSON strings, you will not be able to process it effectively using MySQL features, e.g. you cannot filter or sort records by values stored inside the JSON. However, if you only need to store this data, while the processing is done by the application code, it can be reasonable to use JSON.
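A sketch of that store-only pattern with plain JDBC, assuming a table like documents (id BIGINT PRIMARY KEY, body TEXT), an open connection, and the myObject/MyObject names from the question:

    import com.google.gson.Gson;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    // write: the Gson output is an opaque string as far as MySQL is concerned
    try (PreparedStatement ps = connection.prepareStatement(
            "INSERT INTO documents (id, body) VALUES (?, ?)")) {
        ps.setLong(1, 42L);
        ps.setString(2, new Gson().toJson(myObject));
        ps.executeUpdate();
    }

    // read: fetch the string back and let Gson rebuild the object
    try (PreparedStatement ps = connection.prepareStatement(
            "SELECT body FROM documents WHERE id = ?")) {
        ps.setLong(1, 42L);
        try (ResultSet rs = ps.executeQuery()) {
            if (rs.next()) {
                MyObject restored = new Gson().fromJson(rs.getString(1), MyObject.class);
            }
        }
    }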
As document-oriented databases like MongoDB become more popular, some traditional relational databases, such as PostgreSQL and MariaDB, have also recently implemented native JSON support.
I'm trying to understand how MongoDB works and I have some questions.
I understand how to delete, insert, update and select, but I have some "best practice" questions:
1) Do we have to create an index, or can we just use the _id which is auto-generated?
2) If I have, for example, 2 kinds of objects (cars and drivers) with an n-n relation between them, do I have to have 3 collections (car, driver and a collection which links the other two)?
3) To rebuild my objects, do I have to parse my JSON with the JSON object?
Thanks for your help
Three good questions. I'll answer each in turn.
1) Do we have to create an index, or can we just use the _id which is auto-generated?
You should definitely try to (re)use the _id index. This usually means mapping one of the unique fields (or the primary key, in RDBMS speak) of your domain object to the _id field. You do have to be careful that the field does not get too large if you will be sharding, but that is a separate question to answer.
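For example, with the legacy 10gen Java driver (assumed here), reusing a natural unique key as _id is just a matter of setting the field yourself:

    import com.mongodb.BasicDBObject;

    // Use the domain's unique field (an email address, in this made-up
    // example) as _id instead of the auto-generated ObjectId; the built-in
    // _id index then covers lookups by that field for free.
    BasicDBObject driver = new BasicDBObject("_id", "fred@example.com")
            .append("name", "fred");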
2) If I have, for example, 2 kinds of objects (cars and drivers) with an n-n relation between them, do I have to have 3 collections (car, driver and a collection which links the other two)?
No! You only need two collections: one for the cars and one for the drivers. The "join table" is pulled into each of the collections as DBRefs.
Each car document will contain an array of DBRefs, or document references. Each reference contains the database name (optional), the collection name and the _id of a driver document. You will have a similar set of DBRefs in each driver document pointing to the cars they drive.
Most of the drivers have support for creating and de-referencing these document references.
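A sketch of the two-collection layout with the legacy 10gen Java driver (the driver choice and all database/collection names are assumptions):

    import com.mongodb.*;
    import java.util.Arrays;

    // note: in old driver versions MongoClient() declares UnknownHostException
    DB db = new MongoClient().getDB("garage");
    DBCollection cars = db.getCollection("cars");
    DBCollection drivers = db.getCollection("drivers");

    BasicDBObject fred = new BasicDBObject("name", "fred");
    drivers.insert(fred);                        // fred is assigned an _id on insert

    // the "join" lives inside the car document as an array of DBRefs
    BasicDBObject car = new BasicDBObject("model", "roadster").append("drivers",
            Arrays.asList(new DBRef(db, "drivers", fred.get("_id"))));
    cars.insert(car);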
3) To rebuild my objects, do I have to parse my JSON with the JSON object?
MongoDB's lingua franca is actually BSON. You can think of BSON as a typed, binary, easily parsed version of JSON. Most of the drivers have some capability to convert from JSON to their representation of BSON and back. If you are developing in Java, then the 10gen driver's utility class is JSON. For the Asynchronous driver it is called Json.
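With the 10gen driver that conversion is one call in each direction:

    import com.mongodb.DBObject;
    import com.mongodb.util.JSON;

    // JSON text -> the driver's BSON-backed DBObject, and back again
    DBObject doc = (DBObject) JSON.parse("{ \"name\" : \"fred\", \"age\" : 42 }");
    String roundTripped = JSON.serialize(doc);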
Having said that, unless your data is already JSON, I would not convert your data to JSON just to convert it to BSON and back again. Instead, either look for an ODM (Object-Document-Mapper) for your language of choice, or perform the translation from your domain object to the driver's BSON representation directly.
HTH-
Rob.
I am using Cassandra 1.2.2. I am finding it very easy to use Jackson to map my objects to and from JSON for storing in the database. I am actually tempted to do this with all of my data. My question is: is this a good idea? What are the disadvantages for my application? My first guess is probably more processing overhead, but is the juice worth the squeeze? And are there any other disadvantages that I need to know about?
One disadvantage is that to modify the data you have to read in the original, deserialize it, make your change, serialize it and write out the whole object. In Cassandra, writes are much more efficient than reads, so it is beneficial to avoid reads before writes where possible.
The alternative is to use separate columns for each field in your JSON. You can use composite columns for multi-dimensional data.
So if you had the data:
    {
        "name": "fred",
        "address": "some town",
        "age": 42
    }
and you wanted to change the address: if these were separate Cassandra columns, you'd just insert a column called address; if you had the JSON serialized, you'd have to do much more work. (This doesn't apply if your data is write-once.)
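A sketch of that single-column update using the DataStax Java driver (the driver choice, contact point, keyspace, table layout and userId are all assumptions here), for a table like users (id uuid PRIMARY KEY, name text, address text, age int):

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.PreparedStatement;
    import com.datastax.driver.core.Session;

    Session session = Cluster.builder()
            .addContactPoint("127.0.0.1")
            .build()
            .connect("demo");

    // In Cassandra an UPDATE is just a write of the changed column;
    // nothing is read first, so only the address cell is touched.
    PreparedStatement ps = session.prepare(
            "UPDATE users SET address = ? WHERE id = ?");
    session.execute(ps.bind("another town", userId));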
Even if your data is write-once, if you just want to read one field of the data, you can read just that column when it is stored separately, rather than reading the whole thing and deserializing it. This only applies if you want to read parts of your data.
In conclusion, there can be significant performance advantages to using separate columns if you have to update your data or if you only want to read parts of it at a time.
I have Java Map objects (of Strings and Ints) that I want to save to a database.
Is there a standard way to go about this task?
Is there a way to compress the Map so it takes less space on the hard drive?
You are actually asking two different questions:
How to save a Map object to a database?
You need to create a database and an appropriate table. You could of course serialize the Map into a binary object and store it in the database as a BLOB. It would be better, however, to have a table row for every entry in the map. You need to use the JDBC API to communicate with the database.
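A sketch of the row-per-entry approach, assuming a table map_entries (map_key VARCHAR PRIMARY KEY, map_value INT) and an open JDBC connection:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.util.Map;

    // one row per map entry, batched into a single round trip
    try (PreparedStatement ps = connection.prepareStatement(
            "INSERT INTO map_entries (map_key, map_value) VALUES (?, ?)")) {
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            ps.setString(1, entry.getKey());
            ps.setInt(2, entry.getValue());
            ps.addBatch();
        }
        ps.executeBatch();
    }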
How to compress the Map to take less space on the hard drive?
You need to serialize the Map to a file. The map will be saved as a binary file, which you can then compress.
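A minimal sketch of both steps in one pass, writing the serialized map straight through a GZIP stream (the file name is arbitrary):

    import java.io.FileOutputStream;
    import java.io.ObjectOutputStream;
    import java.util.Map;
    import java.util.zip.GZIPOutputStream;

    // HashMap and its String/Integer contents are all Serializable, so the
    // map can be written as one object; GZIP compresses it on the way out.
    try (ObjectOutputStream out = new ObjectOutputStream(
            new GZIPOutputStream(new FileOutputStream("map.ser.gz")))) {
        out.writeObject(map);
    }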