I have Java Map objects (of Strings and Ints) that I want to save to a database.
Is there a standard way to go about this task?
Is there a way to compress the Map so it takes less space on the hard drive?
You are actually asking two different questions:
How to save a Map object to a database
You need to create a database and an appropriate table. You can of course serialize the Map into a binary object and store that in the database as a BLOB. It would be better, however, to have a table row for every entry in the map. You need to use the JDBC API to communicate with the database.
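For illustration, here is a minimal JDBC sketch that writes each map entry as a row; the table name `map_entries` and its two columns are assumptions, not an existing schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Map;

public class MapSaver {
    // Saves each map entry as a row in a table such as:
    //   CREATE TABLE map_entries (map_key VARCHAR(255) PRIMARY KEY, map_value INT)
    public static void save(Map<String, Integer> map, String jdbcUrl) throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO map_entries (map_key, map_value) VALUES (?, ?)")) {
            for (Map.Entry<String, Integer> e : map.entrySet()) {
                ps.setString(1, e.getKey());
                ps.setInt(2, e.getValue());
                ps.addBatch();          // batch the inserts for fewer round trips
            }
            ps.executeBatch();
        }
    }
}
```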
How to compress the Map so it takes less space on the hard drive?
You need to serialize the Map to a file. The map will be saved in a binary file, which you can try to compress.
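As a rough sketch, you can wrap the object stream in a `GZIPOutputStream` so the serialized map is compressed as it is written (the file name is just an example):

```java
import java.io.FileOutputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.zip.GZIPOutputStream;

public class MapCompressor {
    // Serializes the map and compresses it on the fly with GZIP.
    public static void writeCompressed(HashMap<String, Integer> map, String path) throws Exception {
        try (ObjectOutputStream out = new ObjectOutputStream(
                new GZIPOutputStream(new FileOutputStream(path)))) {
            out.writeObject(map);   // HashMap and its String/Integer contents are Serializable
        }
    }
}
```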
Related
I have a scenario where the user will insert data as key-value pairs, and I have to insert that data into a MySQL DB, but in a column of BLOB type; later I have to do a few operations with this data. For instance, I have to store the data below in a BLOB column. I am using Java Spring and JDBC on the back end.
k1:v1,k2:v2,k3:v3,k4:v4 etc.....
I have to insert this data into a MySQL table as a BLOB, and later I have to traverse it and append changes. How do I achieve this?
For example, later I may change the value of k1 to m1, or even append a new key-value pair such as "x1:v1".
A BLOB type is meant to represent a large object (the L in BLOB). Because of its size it is usually not meant to be edited in parts, but to be streamed. You don't normally insert it (or retrieve it) as a simple array of bytes, but by opening input and output streams to non-database sources/destinations that manage them.
You could easily exhaust the whole heap of your application by loading a single BLOB in memory.
In simple words, the editing of such a (big) value is usually handled outside the database, probably in your Java application.
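A rough sketch of that pattern with plain JDBC might look like the following; the table `kv_store` and its columns are purely illustrative, and parsing and rewriting the `k1:v1,...` payload is left to the application:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class BlobEditor {
    // Reads the whole BLOB, lets the application rewrite it, then stores it back.
    // Table and column names (kv_store, payload, id) are illustrative only.
    public static void rewrite(Connection conn, long id, byte[] newPayload) throws Exception {
        try (PreparedStatement read = conn.prepareStatement(
                "SELECT payload FROM kv_store WHERE id = ?")) {
            read.setLong(1, id);
            try (ResultSet rs = read.executeQuery()) {
                if (rs.next()) {
                    InputStream current = rs.getBinaryStream("payload");
                    // ... parse "k1:v1,k2:v2,..." from the stream and apply changes in memory ...
                }
            }
        }
        try (PreparedStatement write = conn.prepareStatement(
                "UPDATE kv_store SET payload = ? WHERE id = ?")) {
            write.setBinaryStream(1, new ByteArrayInputStream(newPayload), newPayload.length);
            write.setLong(2, id);
            write.executeUpdate();
        }
    }
}
```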
I have here a bunch of XML files which I would like to store in a Cassandra database. Is there any possibility to manage that, or do I have to parse and restructure the XML files?
You can certainly store them as a blob or text but you will not be able to query the individual fields within the XML files. One other thing you'd want to be cautious of is payload size and partition size. Cassandra in general isn't really designed as an object store but depending on payload size and desired query functionality, you may either have to parse/chunk them out or look for an alternative solution.
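If you do decide to store them as opaque text, a minimal sketch with the DataStax Java driver could look like this (keyspace, table, and column names are assumptions):

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.PreparedStatement;

public class XmlStore {
    // Stores each XML document as a text value keyed by an id.
    // Keyspace/table (docs.xml_files) are illustrative; the XML is opaque to Cassandra here.
    public static void save(CqlSession session, String docId, String xml) {
        PreparedStatement ps = session.prepare(
                "INSERT INTO docs.xml_files (doc_id, content) VALUES (?, ?)");
        session.execute(ps.bind(docId, xml));
    }
}
```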
I just designed a PostgreSQL database and need to choose a way of populating my DB with data. The data consists of txt and csv files, but can generally be any type of file containing characters with delimiters. I'm programming in Java so that the data ends up with the same structure (there are lots of different kinds of files, and I need to find out what each column of a file represents so I can associate it with a column of my DB). I thought of two ways:
Convert the files into one same type of file (JSON) and then get the DB to regularly check the JSON file and import its content.
Directly connect to the database via JDBC and send the strings to the DB (I still need to create a backup file containing what was inserted into the DB, so in both cases a file is created and written to).
Which would you go with, time-efficiency-wise? I'm somewhat tempted to use the first one, as it would be easier to handle a JSON file in the DB.
If you have any other suggestions, those would also be welcome!
JSON or CSV
If you have the liberty of converting your data either to CSV or JSON format, CSV is the one to choose. This is because you will then be able to use COPY FROM to bulk-load large amounts of data at once into PostgreSQL.
CSV is supported by COPY but JSON is not.
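As a sketch, the PostgreSQL JDBC driver exposes COPY through `CopyManager`; the table name and the assumption of a plain (non-pooled) pgjdbc connection are illustrative:

```java
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

public class CsvLoader {
    // Bulk-loads a CSV file into a table using PostgreSQL's COPY protocol.
    // Table name (my_table) and file path are placeholders.
    public static long load(String jdbcUrl, String csvPath) throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             FileReader reader = new FileReader(csvPath)) {
            // Cast assumes a plain pgjdbc connection, not one wrapped by a pool.
            CopyManager copy = new CopyManager((BaseConnection) conn);
            return copy.copyIn("COPY my_table FROM STDIN WITH (FORMAT csv)", reader);
        }
    }
}
```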
Directly inserting values.
This is the approach to take if you only need to insert a few (or maybe even a few thousand) records, but it is not suited to a large number of records because it will be slow.
If you choose this approach, you can create the backup using COPY TO. However, if you feel that you need to create the backup file with your Java code, choosing CSV as the format means you would be able to bulk-load it as discussed above.
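Here is a hedged sketch of such a CSV backup using `CopyManager.copyOut`, again with a placeholder table name:

```java
import java.io.FileWriter;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

public class CsvBackup {
    // Dumps a table to a CSV file with COPY TO, so the backup can later be
    // re-imported with COPY FROM. Table name (my_table) is a placeholder.
    public static long dump(String jdbcUrl, String csvPath) throws Exception {
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             FileWriter writer = new FileWriter(csvPath)) {
            CopyManager copy = new CopyManager((BaseConnection) conn);
            return copy.copyOut("COPY my_table TO STDOUT WITH (FORMAT csv)", writer);
        }
    }
}
```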
I need to permanently save a big vocabulary and associate some information with each word (and use it to search words efficiently).
Is it better to store it in a DB (in a simple table and let the DBMS do the work of structuring the data based on the key), or is it better to create a trie data structure and then serialize it to a file and deserialize it once the program is started, or maybe use an XML file instead of serialization?
Edit: the vocabulary would be on the order of 5 to 10 thousand words in size, and for each word the metadata is structured as an array of 10 Integers. Access to words is very frequent (this is why I thought of a trie data structure, which has a search time of ~O(1), instead of a DB using a B-tree or something similar where the search is ~O(log n)).
P.S. I'm using Java.
Thanks!
Using a DB is better.
Many companies have migrated to a DB; for example, the ERP Divalto used to rely on serialization and has since moved to a DB to get better performance.
You have many DBMS choices. If you want to keep all the data in one file, the simple way is to use SQLite; its advantage is that it does not need any DBMS server running.
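For example, a minimal sketch of a single-file vocabulary store with the Xerial sqlite-jdbc driver might look like this; the schema and the comma-separated metadata encoding are assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class VocabularyStore {
    // Single-file vocabulary store using the Xerial sqlite-jdbc driver.
    // Schema (word TEXT PRIMARY KEY, metadata TEXT) is just one possible layout.
    public static int[] lookup(String word) throws Exception {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:vocabulary.db");
             PreparedStatement ps = conn.prepareStatement(
                     "SELECT metadata FROM words WHERE word = ?")) {
            ps.setString(1, word);
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) {
                    return null;    // word not in the vocabulary
                }
                // Metadata stored as 10 comma-separated integers, e.g. "1,2,3,...,10".
                String[] parts = rs.getString(1).split(",");
                int[] meta = new int[parts.length];
                for (int i = 0; i < parts.length; i++) {
                    meta[i] = Integer.parseInt(parts[i].trim());
                }
                return meta;
            }
        }
    }
}
```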
Here I want to save one object to MongoDB using Java. I found that Morphia, Jongo, and Spring provide frameworks to achieve this.
To store the images in MongoDB I found GridFS.
Here my problem is:
1. I have one object that contains both data and an image. I have to store it and do a lot of mathematical calculations on its fields. I also want to search for a particular image if a certain condition is satisfied.
2. If I separate the image from the object, store the image using GridFS and the data as BSON, then how can I link that document with the image?
3. While separating the data from the object, if that data itself exceeds 16 MB, how do I handle it? If I go for GridFS here too, it gets split into chunks, but I want to analyse it field by field.
4. At a given point in time, can I find the size of the object in Java before writing it into MongoDB?
Can anyone please suggest how to overcome this problem: any link, or any idea about which Java framework with MongoDB would be efficient for this real-time scenario?
More information about the data structure:
I want to store a complex business object. For example, if I want to store one classroom object, it contains many students, and each student contains many photos. The classroom object has its own data, and each student has its own data with a list of photos. I have to query and analyse the data here efficiently, whether classroom-wise or student-wise.
You can save the metadata for the image in a normal document which also includes the GridFS filename under which the binary data can be found.
Putting the metadata on GridFS would mean that it becomes a binary lump of data. You then no longer have any way to query it except by its filename. So if your image metadata also risks exceeding the 16 MB limit, you should reconsider your database schema and separate it into multiple documents.
When you want to do data analysis on both the classroom and the student level, you should put each student in a document of its own and then either have the classrooms reference the students or the students reference the classrooms (or both).
I assume that your students will add more and more images during the lifetime of the application. MongoDB does not like documents which grow over time, because growing objects mean that MongoDB needs to constantly reallocate their storage space, which is a performance killer on write operations. If that's the case, you should also have a separate document per image which references the student it belongs to. But when this is not the case (the list of images is created once and then rarely changed), you should rather embed the images as an array in the student object.
Regardless of whether you embed or reference the images, the image documents/objects should only contain the metadata, while the binary image data itself should be stored in GridFS and referenced from the image document.
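As a rough sketch with the MongoDB Java (sync) driver, assuming illustrative database, collection, and field names, the GridFS upload and the queryable metadata document could be tied together like this:

```java
import java.io.FileInputStream;
import java.io.InputStream;

import org.bson.Document;
import org.bson.types.ObjectId;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.gridfs.GridFSBucket;
import com.mongodb.client.gridfs.GridFSBuckets;

public class ImageStore {
    // Uploads the binary image to GridFS and stores a small, queryable metadata
    // document that references it. Database/collection/field names are illustrative.
    public static void saveImage(MongoClient client, ObjectId studentId, String path) throws Exception {
        MongoDatabase db = client.getDatabase("school");
        GridFSBucket bucket = GridFSBuckets.create(db);

        ObjectId fileId;
        try (InputStream in = new FileInputStream(path)) {
            fileId = bucket.uploadFromStream(path, in);   // binary data goes to GridFS chunks
        }

        // The metadata document stays well under the 16 MB limit and can be queried normally.
        db.getCollection("images").insertOne(new Document()
                .append("studentId", studentId)           // reference to the student document
                .append("gridFsId", fileId)               // reference to the GridFS file
                .append("filename", path));
    }

    public static void main(String[] args) throws Exception {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            saveImage(client, new ObjectId(), "photo1.jpg");
        }
    }
}
```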