File upload into a Postgres DB - Java

I'm new to Vaadin and I'm developing my first application with Spring and Vaadin.
Now I'm trying to save an image in my database. I followed the description of the upload component in the Vaadin book (Upload Component).
What do I have to change if I want to store it in the database?
Can you give me an example?

The Upload component writes the received data to a java.io.OutputStream, so you have plenty of freedom in how you process the uploaded content.
If you want to store it as a large object, you can write directly as the stream comes in. See large object support.
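For illustration, here is a minimal sketch of the large-object route using pgJDBC's LargeObjectManager. The images(name, image_oid) table and the class and method names are invented for the example, and the code assumes the Connection can be unwrapped to a pgJDBC PGConnection.

import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;

import org.postgresql.PGConnection;
import org.postgresql.largeobject.LargeObject;
import org.postgresql.largeobject.LargeObjectManager;

public class LargeObjectImageStore {

    // Streams the uploaded bytes into a new large object and records its OID in a row.
    public void store(Connection conn, String fileName, InputStream in) throws Exception {
        conn.setAutoCommit(false); // the large object API must be used inside a transaction

        LargeObjectManager loApi = conn.unwrap(PGConnection.class).getLargeObjectAPI();
        long oid = loApi.createLO(LargeObjectManager.READ | LargeObjectManager.WRITE);

        LargeObject lo = loApi.open(oid, LargeObjectManager.WRITE);
        try {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) > 0) {
                lo.write(buf, 0, n); // written as the stream comes in, nothing accumulated in memory
            }
        } finally {
            lo.close();
        }

        // images(name text, image_oid oid) is a hypothetical table for this sketch
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO images (name, image_oid) VALUES (?, ?)")) {
            ps.setString(1, fileName);
            ps.setLong(2, oid);
            ps.executeUpdate();
        }
        conn.commit();
    }
}
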
If you want to store it as bytea in a row, you must accumulate it in memory and then pass it to a parameterized query with setObject(parameterIndex, myDataBuffer, Types.BLOB). This will consume several times the object's size in memory, so bytea is really only well suited to smaller data.
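And here is a sketch of the in-memory bytea variant wired to Vaadin's Upload component (Vaadin 7-style Receiver/SucceededListener). The class, the images(name, data) table and the way the Connection is obtained are assumptions; this example uses setBytes, which pgJDBC maps to bytea, as an alternative to the setObject call mentioned above.

import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;

import com.vaadin.ui.Upload;

// Buffers the whole upload in memory, then stores it in a bytea column.
public class ByteaUploadReceiver implements Upload.Receiver, Upload.SucceededListener {

    private final Connection connection;          // obtained elsewhere, e.g. from a DataSource
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

    public ByteaUploadReceiver(Connection connection) {
        this.connection = connection;
    }

    @Override
    public OutputStream receiveUpload(String filename, String mimeType) {
        buffer.reset();
        return buffer;                            // Vaadin writes the uploaded bytes here
    }

    @Override
    public void uploadSucceeded(Upload.SucceededEvent event) {
        // images(name text, data bytea) is a hypothetical table for this sketch
        String sql = "INSERT INTO images (name, data) VALUES (?, ?)";
        try (PreparedStatement ps = connection.prepareStatement(sql)) {
            ps.setString(1, event.getFilename());
            ps.setBytes(2, buffer.toByteArray()); // pgJDBC sends byte[] as bytea
            ps.executeUpdate();
        } catch (Exception e) {
            throw new RuntimeException("Storing upload failed", e);
        }
    }
}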

Related

Efficient data import PostgreSQL DB

I just designed a Pg database and need to choose a way of populating it with data. The data consists of txt and csv files, but can generally be any type of file containing characters with delimiters. I'm programming in Java to bring the data into a common structure (there are lots of different kinds of files and I need to work out what each column of a file represents so I can associate it with a column of my DB). I thought of two ways:
Convert the files into one common format (JSON) and then have the DB regularly check the JSON file and import its content.
Directly connect to the database via JDBC and send the strings to the DB (I still need to create a backup file containing what was inserted into the DB, so in both cases a file is created and written to).
Which would you go with, time-efficiency-wise? I'm somewhat tempted by the first one, as it would be easier to handle a JSON file in the DB.
If you have any other suggestion that would also be welcome!
JSON or CSV
If you have the liberty of converting your data to either CSV or JSON, CSV is the one to choose. This is because you will then be able to use COPY FROM to bulk load large amounts of data at once into PostgreSQL.
CSV is supported by COPY, but JSON is not.
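If the load is driven from Java rather than from psql, pgJDBC also exposes COPY programmatically. A minimal sketch, assuming a pgJDBC connection, placeholder credentials, and an invented measurements table whose columns match the CSV:

import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import java.sql.DriverManager;

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class CsvBulkLoad {

    public static void main(String[] args) throws Exception {
        // URL, credentials and file name are placeholders
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost:5432/mydb", "user", "secret");
             Reader csv = new FileReader("data.csv")) {

            CopyManager copyApi = conn.unwrap(PGConnection.class).getCopyAPI();
            // measurements is a hypothetical table; HEADER skips the CSV header line
            long rows = copyApi.copyIn(
                "COPY measurements FROM STDIN WITH (FORMAT csv, HEADER true)", csv);
            System.out.println("Loaded " + rows + " rows");
        }
    }
}
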
Directly inserting values.
This is the approach to take if you only need to insert a few (or maybe even a few thousand) records, but it is not suited to a large number of records because it will be slow.
If you choose this approach you can create the backup using COPY TO. However, if you feel that you need to create the backup file with your Java code, choosing CSV as the format means you will be able to bulk load it as discussed above.
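For the direct-JDBC route, batching the inserts and committing once keeps it reasonably fast for modest volumes. A rough sketch, with the file layout, delimiter and target table invented for the example:

import java.io.BufferedReader;
import java.io.FileReader;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class DirectInsertLoader {

    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:postgresql://localhost:5432/mydb", "user", "secret");
             BufferedReader in = new BufferedReader(new FileReader("data.txt"));
             PreparedStatement ps = conn.prepareStatement(
                 "INSERT INTO measurements (col_a, col_b) VALUES (?, ?)")) {

            conn.setAutoCommit(false);
            int pending = 0;
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split(";");    // delimiter depends on the source file
                ps.setString(1, fields[0]);
                ps.setString(2, fields[1]);
                ps.addBatch();
                if (++pending % 1000 == 0) {
                    ps.executeBatch();                // flush the inserts in chunks of 1000
                }
            }
            ps.executeBatch();
            conn.commit();
        }
    }
}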

Efficient handling of saving Object data with image in MongoDB

I want to save an object to MongoDB using Java. I found that Morphia, Jongo and Spring provide frameworks to achieve this.
To store images in MongoDB I found GridFS.
My problems are:
1. I have an object that contains both data and an image. I have to store it and do a lot of mathematical calculations on its fields. I also want to search for a particular image when certain conditions are satisfied.
2. If I separate the image from the object, storing the image with GridFS and the data as BSON, how can I link the document with the image?
3. When I separate the data from the object, how do I handle the case where that data itself reaches 16 MB? If I use GridFS for that too, it gets converted into chunks, but I want to analyse it field by field.
4. Can I find the size of the object in Java before writing it to MongoDB?
Can anyone suggest how to overcome these problems, or point me to a link or an idea of which Java framework with MongoDB would be efficient for this real-world scenario?
More information about the data structure:
I want to store a complex business object. For example, a classroom object contains many students, and each student contains many photos. The classroom object has its own data, and each student has its own data with a list of photos. I have to query and analyse the data efficiently, either classroom-wise or student-wise.
You can save the metadata for the image in a normal document which also includes the GridFS filename under which the binary data can be found.
Putting the metadata on GridFS would mean that it becomes a binary lump of data. You would then no longer have any way to query it except by its filename. So if your image metadata also risks exceeding the 16 MB limit, you should reconsider your database schema and split it into multiple documents.
When you want to do data analysis at both the classroom and the student level, you should put each student in its own document and then either have the classrooms reference the students or the students reference the classrooms (or both).
I assume that your students will add more and more images during the lifetime of the application. MongoDB does not like documents which grow over time, because growing documents mean that MongoDB needs to constantly reallocate their storage space, which is a performance killer on write operations. When that's the case, you should also have a separate document per image which references the student it belongs to. But when this is not the case (the list of images is created once and then rarely changed), you should rather embed the images as an array in the student object.
Regardless of whether you embed or reference the images, the image documents/objects should only contain the metadata, while the binary image data itself should be stored in GridFS and referenced by its filename from the image document.
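As a rough illustration with the plain MongoDB Java driver (3.7+ sync API; the database, bucket, collection and field names are all invented), the binary goes to GridFS and the student document keeps only queryable metadata plus the returned file id:

import java.io.FileInputStream;
import java.io.InputStream;
import java.util.Arrays;

import org.bson.Document;
import org.bson.types.ObjectId;

import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoDatabase;
import com.mongodb.client.gridfs.GridFSBucket;
import com.mongodb.client.gridfs.GridFSBuckets;

public class StudentImageExample {

    public static void main(String[] args) throws Exception {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoDatabase db = client.getDatabase("school");          // placeholder database
            GridFSBucket images = GridFSBuckets.create(db, "studentImages");

            // 1. The binary image data goes into GridFS, which chunks it transparently.
            ObjectId imageId;
            try (InputStream in = new FileInputStream("photo.jpg")) {
                imageId = images.uploadFromStream("photo.jpg", in);
            }

            // 2. The student document holds only queryable metadata plus the GridFS file id.
            Document student = new Document("name", "Alice")
                    .append("classroomId", new ObjectId())            // reference to a classroom document
                    .append("photos", Arrays.asList(
                            new Document("fileId", imageId)
                                    .append("takenAt", "2014-05-01")
                                    .append("tags", Arrays.asList("portrait"))));
            db.getCollection("students").insertOne(student);
        }
    }
}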

lightweight data structure for java google app engine

I have a google app engine based app which stores data in the datastore. I want to implement a cron that will read around 20k rows of data each day and summarize the data into a much smaller data set and store it in a lightweight, easy to access data structure that I will use later to serve google charts to users.
I think it will be much too costly to read all the instance-level data every time a user needs the chart, so I want to compile the data "ahead of time" once per day.
I'm thinking of the following options and I'm interested in any feedback or approaches that would optimize performance and minimize GAE overhead.
Options:
1) Create a small csv or xml file and keep it locally on the server, then read the data from there
2) Persist another "summary level" object in the data store and read that (still might be costly?)
3) Create the google chart SVG and store it locally then re-serve it to users (not sure if this is possible)
Thanks!
Double check, but I think datastore + memcache may end up being the cheapest option.
In your cronjob you precompute the data you need to return for each graph and store it in both datastore and memcache.
For each graph request you get the data from memcache.
Memcache data can, however, be evicted at any time, so if it is not available there you read it from the datastore and put it back into memcache.
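A minimal read-through sketch with the GAE low-level datastore and memcache APIs; the ChartSummary kind, its json property and the cache-key format are invented, and the daily cron job is assumed to have written the summary entity beforehand.

import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.EntityNotFoundException;
import com.google.appengine.api.datastore.KeyFactory;
import com.google.appengine.api.datastore.Text;
import com.google.appengine.api.memcache.MemcacheService;
import com.google.appengine.api.memcache.MemcacheServiceFactory;

public class ChartSummaryRepository {

    private final DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
    private final MemcacheService memcache = MemcacheServiceFactory.getMemcacheService();

    // Returns the precomputed summary JSON for one chart: memcache first, datastore as fallback.
    public String loadSummary(String chartId) throws EntityNotFoundException {
        String cacheKey = "chart-summary:" + chartId;

        String cached = (String) memcache.get(cacheKey);
        if (cached != null) {
            return cached;                                   // fast path: still cached
        }

        // "ChartSummary" is a hypothetical kind written by the daily cron job
        Entity summary = datastore.get(KeyFactory.createKey("ChartSummary", chartId));
        String json = ((Text) summary.getProperty("json")).getValue();

        memcache.put(cacheKey, json);                        // repopulate the cache for later requests
        return json;
    }
}
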
Why not generate the "expensive" data for the first request, then store those results in memcache? Depending on your particular implementation, even the first, expensive request might be slightly cheaper than reading & parsing local files. Subsequent reads will hit your memcache and be much cheaper all around.

Merging a large table with a large text file using JPA?

We have a large table of approximately 1 million rows, and a data file with millions of rows. We need to regularly merge a subset of the data in the text file into a database table.
The main reason for it being slow is that the data in the file has references to other JPA objects, meaning the other JPA objects need to be read back for each row in the file. I.e., imagine we have 100,000 people and 1,000,000 asset objects:
Person object --> Asset list
Our application currently uses pure JPA for all of its data manipulation requirements. Is there an efficient way to do this using JPA/ORM methodologies or am I going to need to revert back to pure SQL and vendor specific commands?
Why not use the age-old technique of divide and conquer? Split the file into small chunks and then have parallel processes work on these smaller files concurrently.
And use the batch inserts/updates offered by JPA and Hibernate (more details here).
The ideal way in my opinion, though, is to use the batch support provided by plain JDBC and then commit at regular intervals.
You might also want to look at Spring Batch, as it provides splitting/parallelization/iterating through files etc. out of the box. I have used all of these successfully for an application of considerable size.
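For reference, a rough sketch of the JPA batching pattern (flush and clear at a fixed interval, with hibernate.jdbc.batch_size configured to match). Person and Asset are minimal stand-ins for the entities from the question, and the file format, persistence unit name and field names are invented; getReference is used so the referenced Person is not read back for every line.

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Paths;

import javax.persistence.Entity;
import javax.persistence.EntityManager;
import javax.persistence.EntityManagerFactory;
import javax.persistence.Id;
import javax.persistence.ManyToOne;
import javax.persistence.Persistence;

public class AssetFileImporter {

    // Minimal stand-ins for the entities from the question; the real mappings will differ.
    @Entity
    public static class Person {
        @Id Long id;
    }

    @Entity
    public static class Asset {
        @Id Long id;
        String name;
        @ManyToOne Person owner;
    }

    private static final int BATCH_SIZE = 50;        // should match hibernate.jdbc.batch_size

    public void importFile(String path) throws Exception {
        EntityManagerFactory emf = Persistence.createEntityManagerFactory("my-unit"); // placeholder unit name
        EntityManager em = emf.createEntityManager();
        em.getTransaction().begin();

        try (BufferedReader in = Files.newBufferedReader(Paths.get(path))) {
            String line;
            int count = 0;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split(",");   // assumed format: personId,assetId,assetName
                Asset asset = new Asset();
                asset.id = Long.valueOf(fields[1]);
                asset.name = fields[2];
                // getReference returns a lazy proxy, so the Person row is not loaded per line
                asset.owner = em.getReference(Person.class, Long.valueOf(fields[0]));
                em.persist(asset);

                if (++count % BATCH_SIZE == 0) {
                    em.flush();                      // push the current batch of inserts
                    em.clear();                      // detach entities so the persistence context stays small
                }
            }
        }
        em.getTransaction().commit();
        em.close();
        emf.close();
    }
}
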
One possible answer, which is painfully slow, is to do the following:
For each line in the file:
Read data line
fetch reference object
check if data is attached to reference object
if not add data to reference object and persist
So slow it is not worth considering.

Server side caching for Java/Java EE application

Here is my situation: I have a Java EE single-page application. All client-server communication is AJAX-based, with JSON used as the format to exchange data. One of my requests takes around 1 min to calculate the data required by the client. This data is also huge (it could be > 20 MB), so it is not possible to pass the entire data set to JavaScript in one go. For this reason I am only passing a few records to the client and using a grid with a paging option to display the data.
Now, when the user clicks the next-page button, I need to get more data. My question is: how do I cache the data on the server side? I need this data for only one user at a time. Would you recommend caching all the data on the first request, using the session id as the key?
Any other suggestions ?
I am assuming you are using a DB backend for that. I'd use limits to return small chunks of data; most DB vendors have a solution for this. That would make your queries faster, and most JS frameworks with grid-type components support paginated results (ExtJS, for example).
If you are fetching data from a 3rd party and passing it on (with or without modifications), I'd still stick to the database and use this workflow: pull the data from the 3rd party, save it in the DB, and request from your widget the small chunks required by customers.
Hope this helps.
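A minimal sketch of the DB-side limiting with plain JDBC; the report_rows table and columns are invented, and LIMIT/OFFSET is the PostgreSQL/MySQL syntax (other vendors have equivalents such as FETCH FIRST or ROWNUM):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

public class ReportPageDao {

    // Fetches one page of the report; only pageSize rows ever cross the wire.
    public List<String> fetchPage(Connection conn, int page, int pageSize) throws Exception {
        String sql = "SELECT label, value FROM report_rows ORDER BY id LIMIT ? OFFSET ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, pageSize);
            ps.setInt(2, page * pageSize);
            try (ResultSet rs = ps.executeQuery()) {
                List<String> rows = new ArrayList<>();
                while (rs.next()) {
                    rows.add(rs.getString("label") + "=" + rs.getString("value"));
                }
                return rows;
            }
        }
    }
}
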
The cheapest (though not the most effective) way of caching data in a Java EE web application is to use the Session object, as you intend to do. It is fragile because it requires the developer to ensure that the cache does not leak memory; it is up to the developer to nullify the reference to the object once it is no longer needed.
However, even if you wish to implement this poor man's cache, caching 20 MB of data is not advisable, as it does not scale well. The scalability question arises when multiple users use the same functionality of the application, in which case 20 MB per user is a lot of data.
You're better off returning paginated "datasets" in the form of JSON, based on the ValueList design pattern. Each request for the data results in a partial retrieval, which is then sent down the wire to the client. That way, you never have to cache the complete results of the query execution, and you can return partial datasets. Whether you cache at all is entirely up to you; usually caching is done for large datasets that are used time and again.
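As a sketch of the paginated-JSON approach (the servlet path, JNDI name and page size are hypothetical, and ReportPageDao is the invented DAO from the sketch above), each request fetches only the page it needs and streams it with the standard javax.json API:

import java.io.IOException;
import java.sql.Connection;
import java.util.List;

import javax.annotation.Resource;
import javax.json.Json;
import javax.json.JsonArrayBuilder;
import javax.json.JsonWriter;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.sql.DataSource;

@WebServlet("/report-page")
public class ReportPageServlet extends HttpServlet {

    @Resource(name = "jdbc/mydb")               // placeholder JNDI name
    private DataSource dataSource;

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        int page = Integer.parseInt(req.getParameter("page"));
        int pageSize = 50;                      // arbitrary page size for the sketch

        try (Connection conn = dataSource.getConnection()) {
            // ReportPageDao is the hypothetical DAO from the sketch above
            List<String> rows = new ReportPageDao().fetchPage(conn, page, pageSize);

            JsonArrayBuilder array = Json.createArrayBuilder();
            for (String row : rows) {
                array.add(row);
            }

            resp.setContentType("application/json");
            try (JsonWriter out = Json.createWriter(resp.getWriter())) {
                out.writeArray(array.build());  // only this page goes down the wire, nothing is cached
            }
        } catch (Exception e) {
            throw new IOException(e);
        }
    }
}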
