I am going to be doing some heavy work serializing objects into strings and saving them in a DB. Each object is around 100 characters, and all the objects are saved in a single row field. The thing is, sometimes I want to take substrings, split the data to get a piece at a certain location, or do other string manipulations. Should this be done in MySQL or by the Java program?
Also, is there a way to optimize my queries, given that the fields are expected to be about 20k characters (VARCHAR)?
If you use Java serialization, the resulting output is opaque; that is, you cannot look inside the serialized data and extract bits and pieces without loading it back into a JVM.
If you want to do anything in SQL with the data, then it has to be stored in the database in its primitive (i.e. unserialized) form.
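For illustration, here is a minimal sketch (assuming Java 8's java.util.Base64 and a made-up Record class) of why the serialized form is opaque: the stored string is just an encoded byte stream, so MySQL's SUBSTRING() or LIKE cannot pick individual fields out of it.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

public class SerializeDemo {

    // A small serializable object standing in for the real ~100-character objects.
    static class Record implements Serializable {
        private static final long serialVersionUID = 1L;
        String key;
        String value;
        Record(String key, String value) { this.key = key; this.value = value; }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        ObjectOutputStream out = new ObjectOutputStream(bytes);
        out.writeObject(new Record("k1", "v1"));
        out.close();

        // Base64 text fits in a VARCHAR column, but it is opaque: string
        // functions on the MySQL side cannot extract "k1" or "v1" from it.
        String opaque = Base64.getEncoder().encodeToString(bytes.toByteArray());
        System.out.println(opaque);
    }
}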
I am reading values as key-value pairs, but I am doing it from a JSON file, and now another approach has been suggested: do it from DB tables, because if a value changes in the future, only the DB needs to be updated.
I think using the JSON file is better, since the values are hardly ever going to change (rarest of rare cases), although the advantage of the DB approach is that you just change the value in the DB and you are done.
So my point is that JSON will be faster than the DB, and using JSON will reduce load on the DB, since otherwise every click in the UI invokes an extra DB call.
What do you think? Please let me know.
This very much depends on how you are going to use these data.
Do you need to update it often?
Do you need to update by just one specific field?
Do you need to fetch records based on some specific field?
Do you need to fetch whole json or just some specific fields?
Do some parts of json reference any other tables?
Also, consider the size of the data: if the JSON documents together become larger than all the other tables combined, you may break the DB cache. On the other hand, you can always create a separate database for your JSON data if you still need some relational database features.
So, I would in any case start by answering the first five questions.
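If the values really are a flat, rarely-changing set of key-value pairs, a minimal sketch of the file approach (assuming a Java client with Jackson and a hypothetical config.json containing something like {"k1":"v1","k2":"v2"}) could be:
import com.fasterxml.jackson.core.type.TypeReference;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.io.File;
import java.io.IOException;
import java.util.Map;

public class ConfigLoader {

    public static void main(String[] args) throws IOException {
        ObjectMapper mapper = new ObjectMapper();
        // Read the whole file once into a map; cache it if it is consulted per UI click.
        Map<String, String> values = mapper.readValue(
                new File("config.json"),
                new TypeReference<Map<String, String>>() {});
        System.out.println(values.get("k1"));
    }
}
Loading it once and caching it in memory avoids both the extra DB call and repeated file reads.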
I have a scenario where the user will enter data as key-value pairs, and I have to insert that data into a MySQL column of BLOB type; later I have to do a few operations on this data. For instance, I have to store the data below in a BLOB column. I am using Java, Spring and JDBC on the back end.
k1:v1,k2:v2,k3:v3,k4:v4 etc.....
I have to insert this data into the MySQL table as a BLOB, and later traverse it and append changes. How do I achieve this?
For example, later I may change the value of k1 to m1, or even append a new key-value pair such as "x1:v1".
A BLOB type is meant to represent a large object (the L in bLob). Because of its size it is usually not meant to be edited in parts, but to be streamed. You don't normally insert it (or retrieve it) as a simple array of bytes, but by opening input and output streams to non-database sources/destinations that manage them.
You could easily exhaust the whole heap of your application by loading a single BLOB in memory.
In simple words, the editing of such a (big) value is usually handled outside the database, probably in your Java application.
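As a rough illustration, here is a plain-JDBC sketch of streaming a BLOB in and out instead of materializing it as a byte array; the table kv_store(id, payload BLOB) and the connection details are made up:
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class BlobStreaming {

    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password")) {

            // Write: stream a file into the BLOB column without loading it all into memory.
            try (InputStream in = Files.newInputStream(Paths.get("payload.bin"));
                 PreparedStatement insert = con.prepareStatement(
                         "INSERT INTO kv_store (id, payload) VALUES (?, ?)")) {
                insert.setLong(1, 1L);
                insert.setBinaryStream(2, in);
                insert.executeUpdate();
            }

            // Read: stream the BLOB back out, again without a full in-memory copy.
            try (PreparedStatement select = con.prepareStatement(
                    "SELECT payload FROM kv_store WHERE id = ?")) {
                select.setLong(1, 1L);
                try (ResultSet rs = select.executeQuery()) {
                    if (rs.next()) {
                        try (InputStream payload = rs.getBinaryStream("payload")) {
                            Files.copy(payload, Paths.get("payload-copy.bin"));
                        }
                    }
                }
            }
        }
    }
}
If the data really is a small k1:v1,k2:v2 string, a TEXT/VARCHAR column (or one column per key) is usually easier to modify than a BLOB.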
Performance-wise, is it smart to do this?
An example would be with Gson:
Gson gson = new Gson();
String json = gson.toJson(myObject);
// store json string in sql with primary key
// retrieve json string in sql with primary key
I want to simplify the way I store and retrieve objects, instead of splitting them into pieces and individual columns each time I store to or retrieve from a database.
My concern with using a JSON string is that its length may impact performance as the database fills up. I'm not sure, which is why I'm asking.
There is not an issue with 'space' used or performance of such: MySQL will deal with that just fine.
That is, while the entire JSON chunk must be pulled and pushed for any processing or change, MySQL will continue to handle it as well as it ever did, even 'as the database fills up'.
However, there are problems with normalization and opaqueness of information when following this design. Databases are about information, but a blob of JSON is just .. a blob of JSON to an SQL database.
Because of this, none of the data in the JSON can be used for relationships or queries, nor can it participate in indices or constraints. So much for the "relational" part of the database...
Unless the JSON truly is opaque data, like the contents of a binary file, consider working on relevant normalization .. or switch to a (JSON) document-oriented database (eg. Raven, Redis, Couch, Mongo).
There are no space or performance issues with storing JSON strings; MySQL is capable of handling large blobs if you need it to.
The decision whether or not to store your data serialized as JSON should be based on how you need to process this data. Relational databases such as MySQL assume that you normalize the data and establish relationships between records. That said, in many cases it can be practical to do otherwise (i.e. store as JSON).
If you store your data as JSON strings, you will not be able to process this data effectively using MySQL features, e.g. you cannot filter or sort records by values stored inside the JSON. However, if you only need to store this data, while the processing is going to be done by the application code, it can be reasonable to use JSON.
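For the "store and fetch by primary key" case from the question, a minimal JDBC sketch could look like the following; the table objects(id BIGINT PRIMARY KEY, body TEXT) and the connection details are made up:
import com.google.gson.Gson;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.HashMap;
import java.util.Map;

public class JsonRowExample {

    public static void main(String[] args) throws Exception {
        Gson gson = new Gson();
        Map<String, Object> myObject = new HashMap<String, Object>();
        myObject.put("name", "fred");
        myObject.put("age", 42);
        String json = gson.toJson(myObject);

        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/test", "user", "password")) {

            // Store the serialized object under its primary key.
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO objects (id, body) VALUES (?, ?)")) {
                ps.setLong(1, 1L);
                ps.setString(2, json);
                ps.executeUpdate();
            }

            // Retrieve it by primary key; any filtering or sorting on fields
            // inside the JSON has to happen in Java after this point.
            try (PreparedStatement ps = con.prepareStatement(
                    "SELECT body FROM objects WHERE id = ?")) {
                ps.setLong(1, 1L);
                try (ResultSet rs = ps.executeQuery()) {
                    if (rs.next()) {
                        System.out.println(rs.getString("body"));
                    }
                }
            }
        }
    }
}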
As document-oriented databases like MongoDB become more popular, some of the traditional relational databases, such as PostgreSQL and MariaDB, recently also implemented native JSON support.
We are storing data in serialized form in Oracle tables. The column which holds the serialized data is of type BLOB. We have a stored procedure that queries the table and returns the results as a sys_refcursor. The stored procedure accepts a list of keys in comma-separated form for which the blobs are to be fetched.
Environment: JRE 1.6, Oracle 11g
Problem: when the application requests the data for, say, 8 or 10 rows, we want the stored procedure to return all the result rows, including the binary data, in a single DB round trip. Is this possible? Or what is the best way to fetch the results with a minimal number of database round trips?
(We have set the fetch row size to 10 so that multiple rows are fetched, and we have also set setLOBPrefetchSize to a sufficiently big value (3 MB). However, these do not seem to make a difference: resultSet.getBytes() takes a significant amount of time, indicating that the data is fetched on request.)
What am I missing? Is there any trace we can enable to check what is going on underneath?
BLOBs intentionally trigger some specific behaviour, since they can be extremely large (depending on block size, the maximum size is at least 8 TB). The driver does not always know in advance how much data will be transferred, so it takes the safe route by default.
I understand that you want to get it over with in just one round trip, probably because of your network latency? If not, please say so.
We've had the same problem with a standard software package we develop, in which the client retrieves a lot of data across the globe during connection setup. We've solved it as follows:
Initially we used a stored procedure that takes a parameter holding the hash value of the last retrieved data rows. The stored procedure checks the hash of the current data rows: if it is not different, it returns just a header; if it is different, it puts everything together in one large piece and sends that back. In that way it avoids transferring lots of information across the globe, because high network latency often comes with low bandwidth, and the number of round trips is reduced (a rough client-side sketch follows below).
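The client side of that first option could look roughly like the sketch below; the procedure name get_rows_if_changed, its signature and the column layout are made up for illustration (written for plain JDBC against the Oracle thin driver, with explicit close() calls since JRE 1.6 has no try-with-resources):
import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;

import oracle.jdbc.OracleTypes;

public class HashedFetch {

    public static void main(String[] args) throws Exception {
        // Hash of the rows from the previous fetch, kept in a local cache.
        String lastKnownHash = "previously-computed-hash";

        Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "password");
        // Hypothetical procedure: get_rows_if_changed(p_last_hash IN VARCHAR2,
        //                                             p_result OUT SYS_REFCURSOR)
        CallableStatement call = con.prepareCall("{ call get_rows_if_changed(?, ?) }");
        try {
            call.setString(1, lastKnownHash);
            call.registerOutParameter(2, OracleTypes.CURSOR);
            call.execute();

            ResultSet rs = (ResultSet) call.getObject(2);
            // If the server-side hash matched, the cursor holds only a header row and
            // the client keeps its cached copy; otherwise it reads the full payload.
            while (rs.next()) {
                byte[] payload = rs.getBytes(2); // or getBinaryStream(...)
                // process payload ...
            }
            rs.close();
        } finally {
            call.close();
            con.close();
        }
    }
}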
For the last two years we have inserted a different product of our company in between (a web service provider). It does the compression and deduplication by itself and further reduces the bandwidth needed. Plus it only uses SQL*Net between the web service and the Oracle RDBMS. The web service communicates using BSON (JSON in binary format).
For a low labour effort, I would recommend returning just one LOB from the stored procedure with everything in it, maybe even compressing it or making use of a client-side cache.
The second option described above takes a lot more labour to introduce.
If you happen to use Oracle's thin JDBC driver, you might want to try setting the connection property useFetchSizeWithLongColumn to true, see http://download.oracle.com/otn_hosted_doc/jdeveloper/905/jdbc-javadoc/index.html?oracle/jdbc/OracleDriver.html. However, as the property name indicates, it is meant for LONG columns and it may not have any effect on BLOB columns. (Unfortunately, I am not able to try it myself here. However, the source code under http://ora-jdbc-source.googlecode.com/svn/trunk/OracleJDBC/src/oracle/jdbc/driver/OracleResultSetImpl.java does not seem to make a LONG/LOB distinction with useFetchSizeWithLongColumn, it just "re-opens streams".)
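Setting such a thin-driver connection property is normally done through the Properties object passed to DriverManager; a sketch with made-up connection details follows, and whether it changes anything for BLOB columns is exactly what would have to be tested:
import java.sql.Connection;
import java.sql.DriverManager;
import java.util.Properties;

public class ConnectWithLongFetch {

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "user");
        props.setProperty("password", "password");
        // Thin-driver connection property documented for LONG columns; its
        // effect (if any) on BLOB columns needs to be verified.
        props.setProperty("useFetchSizeWithLongColumn", "true");

        Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/ORCL", props);
        try {
            // call the stored procedure / run the queries with this connection
        } finally {
            con.close();
        }
    }
}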
For verification/tracing see the example code in: http://steveracanovic.blogspot.de/2009/08/using-usefetchsizewithlongcolumn-and.html
If the connection property does not work for BLOBs, you might try returning LONGs. However, there does not seem to be a nice way to convert BLOBs to LONGs. The following discussion (see the answers) mentions a way using RAW variables in PL/SQL: http://oracle.mscer.com/q_oracle_41102.html. As you mentioned using a stored procedure, you might want to try this.
(Note that setting the connection property implies that prefetched LONG data may consume considerable amounts of memory.)
(Again, I'm sorry I cannot try this approach myself and cannot assist with details, as I do not have an Oracle DB here. But as nobody has come up with an answer during the past two hours ...)
I am using Cassandra 1.2.2. I am finding it so easy to use Jackson to map my objects to and from JSON for storing in the database that I am actually tempted to do this for all of my data. My question is: is this a good idea? What are the disadvantages of doing this to my application? My first guess is more processing overhead, but is the juice worth the squeeze? And are there any other disadvantages I need to know about?
One disadvantage is that to modify the data you have to read in the original, deserialize, make your change, serialize and write out the whole object. In Cassandra, writes are much more efficient than reads so it is beneficial to avoid reads before writes if possible.
The alternative is to use separate columns for each field in your JSON. You can use composite columns for multi-dimensional data.
So if you had the data:
{
  "name": "fred",
  "address": "some town",
  "age": 42
}
and you wanted to change the address: if you had these as separate Cassandra columns, you'd just insert a column called address (see the sketch below). If you had the data serialized as JSON, you'd have to do much more work. This doesn't apply if your data is write-once.
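A sketch of that, assuming CQL3 and a recent DataStax Java driver (2.0 or later) with a made-up keyspace and table:
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

public class SeparateColumnsExample {

    public static void main(String[] args) {
        // Assumes a CQL3 table such as:
        //   CREATE TABLE users (name text PRIMARY KEY, address text, age int);
        Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
        Session session = cluster.connect("demo");

        // Changing one field is just an upsert of that column; there is no
        // read-deserialize-reserialize-write cycle as with a JSON blob.
        session.execute("UPDATE users SET address = 'another town' WHERE name = 'fred'");

        // Reading one field pulls back only that column.
        Row row = session.execute("SELECT address FROM users WHERE name = 'fred'").one();
        System.out.println(row == null ? "not found" : row.getString("address"));

        cluster.close();
    }
}
With the JSON blob layout, the same address change would mean reading the whole value, deserializing it, editing it and writing it all back.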
Even if your data is write-once, if you just want to read one field you can read just that column when the fields are stored separately, rather than reading the whole thing and deserializing it. This only applies if you want to read parts of your data.
In conclusion, there can be significant performance advantages to using separate columns if you have to update your data or if you only want to read parts of it at a time.