I'm designing a database and a Java application to do the following:
1. Allow a user to query the database via an API.
2. Allow a user to save a query and identify it via a 'query-id'. The user can then pass the 'query-id' on the next call to the API, which will execute the query associated with that id, but only retrieve data added since the last time that specific query was requested.
- Along with this, I would also need to save the query-id information for each UserID.
Information regarding the Database
The database of choice is PostgreSQL and the information to be requested by user will be stored in various tables.
My question: any suggestions/advice/tips on how to go about implementing requirement No. 2?
Is there an existing design pattern, SQL query, or built-in DB function for saving a query and fetching information from multiple tables starting from the last returned results?
Note:
My initial thought so far is to store the last row read from each table (each row in all the tables will have a primary key) in a data structure, then save this data structure for each saved query and use it when retrieving data again.
For storing the user and query-id information, I was thinking of creating a separate table with the columns UserName, UserUUID, SavedQuery, and LastInfoRetrieved.
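That bookkeeping idea could be sketched in Java roughly as follows (class and field names are hypothetical): each saved query remembers the last primary key seen per table, which the next execution can use as a lower bound in its WHERE clause.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical bookkeeping for one saved query: the last primary key
// seen per table, which the next execution uses as a lower bound.
public class SavedQuery {
    private final String queryId;
    private final String sqlTemplate;
    private final Map<String, Long> lastKeyByTable = new HashMap<>();

    public SavedQuery(String queryId, String sqlTemplate) {
        this.queryId = queryId;
        this.sqlTemplate = sqlTemplate;
    }

    // Record the highest key returned from a table on this run.
    public void recordLastKey(String table, long primaryKey) {
        lastKeyByTable.merge(table, primaryKey, Math::max);
    }

    // Lower bound for the next run; 0 means "never fetched".
    public long lastKey(String table) {
        return lastKeyByTable.getOrDefault(table, 0L);
    }

    public String queryId() { return queryId; }
}
```

The saved-query table would then persist this map (e.g. one row per query-id and table) alongside the UserUUID.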
Thanks.
This is quite a question. The obvious tool to use here would be prepared statements, but since these are planned on first run, they can run into problems when run multiple times with different parameters. Assuming that id ranges from 1 to 1000000, consider the difference between:
SELECT * FROM mytable WHERE id > 999900;
and
SELECT * FROM mytable WHERE id > 10;
The first should use an index while the second should do a physical-order scan of the table.
A second possibility would be to have functions which return refcursors. This would mean the query is actually run when the refcursor is returned.
A third possibility would be to have a schema of tables that could be used for this, per session, holding results. Ideally these would be temporary tables in pg_temp, but if you have to preserve across sessions, that may be less desirable. Building such a solution is a lot more work and adds a lot of complexity (read: things that can go wrong) so it is really a last choice.
From what you say, refcursors sound like the way to do this, but keep in mind that PostgreSQL needs to know what data types to return, so you can run into some difficulties in this regard (read the documentation thoroughly before proceeding). If prepared statements get you where you need to go, that might be simpler.
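For reference, a sketch of the JDBC side of the refcursor approach (the server-side function name `fetch_saved_query` is hypothetical): the function is called inside a transaction, it returns the cursor's name, and the rows are then FETCHed from that cursor.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class RefCursorExample {
    // SQL calling a hypothetical server-side function that opens and
    // returns a refcursor for the given saved-query id.
    static final String CALL_SQL = "SELECT fetch_saved_query(?)";

    // Sketch only (not invoked here): refcursors are only valid inside
    // the transaction that created them, so auto-commit must be off.
    static void readCursor(Connection conn, int queryId) throws SQLException {
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(CALL_SQL)) {
            ps.setInt(1, queryId);
            try (ResultSet rs = ps.executeQuery()) {
                rs.next();
                // The function returns the cursor's name; FETCH from it.
                String cursorName = rs.getString(1);
                try (Statement st = conn.createStatement();
                     ResultSet rows = st.executeQuery(
                             "FETCH ALL IN \"" + cursorName + "\"")) {
                    while (rows.next()) {
                        // process each row here
                    }
                }
            }
        }
        conn.commit();
    }
}
```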
I'm facing a task to clone a PostgreSQL database, keeping all constraints, indexes, etc., and including only the records related to a specific column value.
In other words, it splits one big database into multiple smaller databases.
For example, my original database has numerous schemas, each schema has numerous tables, and each table has records about multiple people. I want to clone it to a new database, but clone only the records related to a specific person id (clone all records in all tables that have person_id = xxx).
Is there a tool for this task or any suggestions? (I'm familiar with Java and Python)
The best way I have found to do this is to first export the complete schema using the pg_dump tool with the -s flag (to dump schema only, and not data), and then export data separately.
To load your schema into a fresh, empty database, replay the dump: a plain-format dump (the default) is loaded with psql, while a custom-format dump (pg_dump -Fc) is loaded with pg_restore. Either way, it will read the output from pg_dump and use it to build the database.
When exporting the data, you'll need to classify each table (you can write a helper script to make this easier, or use Excel, etc.):
1. Tables you want a subset of data from, based on some condition (i.e. a certain value in person_id)
2. Tables you want to copy in their entirety (like dimension tables, such as calendar and company_locations)
3. Tables you don't need any data from
For (1), you will need to write an appropriate SELECT query that returns the subset of data that you want to copy. Put those queries in a script and have each one write the result to a separate file, named <schema>.<table>. Lastly, use the psql utility to load the data to the test database. psql has a special \copy command that makes this easy. It can be used from a terminal like this:
psql -c "\copy schema.table FROM '~/dump_data/schema.table' WITH DELIMITER ',' CSV;"
Use pg_dump again to take care of all of those falling under (2), using the -t <table name> flag to dump only the named tables, and -a to dump only data (no schema). This could also be added to the script for (1) by just adding an unqualified SELECT * for each table in (2) and loading the data the same way.
Tables falling under (3) were already handled by the initial export.
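As a sketch of the helper-script idea, the per-category commands could be generated programmatically (the database name, dump directory, and table names below are hypothetical; category (1) exports with \copy ... TO, and the resulting files are later loaded back with \copy ... FROM):

```java
import java.util.List;

// Sketch: generate the export commands for the table categories.
// Paths, database name, and table names are all hypothetical.
public class ExportPlan {
    // Category 1: subset queries exported via psql's \copy ... TO
    static String copyCommand(String schema, String table, String whereClause) {
        String select = "SELECT * FROM " + schema + "." + table + " WHERE " + whereClause;
        return "psql -c \"\\copy (" + select + ") TO '~/dump_data/"
                + schema + "." + table + "' WITH DELIMITER ',' CSV;\"";
    }

    // Category 2: whole tables exported with pg_dump (-a = data only)
    static String dumpCommand(String dbName, List<String> tables) {
        StringBuilder cmd = new StringBuilder("pg_dump -a");
        for (String t : tables) {
            cmd.append(" -t ").append(t);
        }
        return cmd.append(" ").append(dbName).toString();
    }
}
```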
I'm writing a Java program right now which reads files and writes the content of these files (after some modifications) into a relational database.
My problem is that the program should support a wide range of databases, not just one.
So in my program I create SQL statements and commit them to the DB (SAP HANA) - no problem.
Now I want to add another DB (MySQL) and have to slightly change the SQL syntax of the query before committing.
My solution right now is to copy the code block that creates the statements and make the DB-specific changes to it. But that obviously can't be it (too many databases -> 80% of the code never used). I probably need some kind of mapper that converts my SQL to a dialect that the chosen DB understands.
Now, I found out about Hibernate and other mappers, but I don't think they fit my needs. The problem is that they expect a Java object (POJO) and convert it. But since I don't know what kind of data my program is going to load, I cannot create static objects for each column, for example.
Sometimes I need to create 4 columns, sometimes 10. Sometimes they are Integers, sometimes Strings/varchar. And they have different names every time. So all the tutorials I found on Hibernate start from a point where the program knows what kind of data is going to be inserted into the DB, which mine does not.
Moreover, I need to insert a large number of rows per table (a billion+), and I think it might be slow to create an object for each insert.
I hope someone understands my problem and can give me some hints - maybe a mapper that just converts SQL without the need to create an object first.
Thank you very much! :)
Edit: to make it clearer, the purpose of the program is to fill up a relational DB with data that is stored/described in files like CSV and XML. So the DB is not used as a tool to store the data; storing the data there is the main aim. I need a relational DB filled with data that the user provides - and not only one DB, but different kinds of RDBMSs.
I think you are describing a perfect use for a file system. Or, if you want to go with a filesystem abstraction, have a look at the Apache Jackrabbit project.
So basically you want to write a tool that writes an arbitrary text file (some kind of CSV, I assume) into an arbitrary database system, creating tables and content on the fly depending on the structure of the text file?
Using a high-level abstraction layer like Hibernate is not going to take you anywhere soon. What you want to do is low-level database interaction. As long as you don't need any DBMS-specific features, you should get a long way with ANSI SQL. If that is not enough, I don't see an easy way out of this. Maybe it is an option to write your own abstraction layer that handles the DBMS-specific formatting of the SQL statements. Doesn't sound nice, though.
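Such an abstraction layer could start as small as this sketch (the dialect names and type mappings are illustrative, not complete): since the column layout is only known at runtime, the CREATE TABLE statement is built from a name-to-type map, and only the type names vary per dialect.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of a tiny DBMS-specific formatting layer. The column layout is
// only known at runtime, so CREATE TABLE is built from a name -> type map;
// the per-dialect type names are the only thing that varies here.
public class SqlDialect {
    public enum Dialect { ANSI, MYSQL, HANA }

    // Illustrative generic-type-to-SQL-type mapping, not exhaustive.
    static String typeName(Dialect d, String genericType) {
        switch (genericType) {
            case "string": return d == Dialect.HANA ? "NVARCHAR(255)" : "VARCHAR(255)";
            case "int":    return "INTEGER";
            default:       throw new IllegalArgumentException(genericType);
        }
    }

    static String createTable(Dialect d, String table, Map<String, String> columns) {
        StringBuilder sql = new StringBuilder("CREATE TABLE ").append(table).append(" (");
        boolean first = true;
        for (Map.Entry<String, String> col : columns.entrySet()) {
            if (!first) sql.append(", ");
            sql.append(col.getKey()).append(' ').append(typeName(d, col.getValue()));
            first = false;
        }
        return sql.append(')').toString();
    }
}
```

The same hook point can later absorb other dialect quirks (identifier quoting, auto-increment syntax, etc.) without duplicating the statement-building code.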
A different thing to think about is the large number of rows per table (a billion+). Using single-row INSERT statements is not a good idea. You have to make use of efficient mass-data interfaces, which are strongly DBMS-dependent! Prepared statements are the minimum measure here.
I'm new to MongoDB and trying to figure out solutions for some basic requirements.
Here is the scenario:
I have an application which saves info to MongoDB through a scheduler. Each time the scheduler runs, I need to fetch the last inserted document from the DB to get the last info id, which I then send with my request to get the next set of info. Also, when I save an object, I need to check whether a specific value for a field is already in the DB; if it is, I just need to update a count in that document, otherwise I save a new one.
My Questions are:
I'm using the MongoDB Java driver. Is it better to generate the object id myself, or to use what is generated by the driver or MongoDB itself?
What is the best way to retrieve the last inserted document to find the last processed info id?
If I do a find for the specific value in a field before each insert, I'm worried about the performance of the application, since it is supposed to have thousands of records. What is the best way to do this validation?
I saw in some places where they talk about two writes when doing inserts: one to write the document to the collection, the other to write it to another collection as "last updated", to keep track of the last inserted entry. Is this a good way? We don't normally do this with relational databases.
Can I expect the generated object ids to be unique for every record inserted in the future too? (I have a doubt whether they can repeat.)
Thanks.
Can I expect the generated object ids to be unique for every record inserted in the future too? (I have a doubt whether they can repeat.)
I'm using the MongoDB Java driver. Is it better to generate the object id myself, or to use what is generated by the driver or MongoDB itself?
If you don't provide an object id for the document, MongoDB will automatically generate one for you. All documents must have an _id. Here is the relevant reference.
The relevant part is
ObjectId is a 12-byte BSON type, constructed using:
a 4-byte value representing the seconds since the Unix epoch,
a 3-byte machine identifier,
a 2-byte process id, and
a 3-byte counter, starting with a random value.
http://docs.mongodb.org/manual/reference/object-id/
I guess this is more than enough randomization (although a poor hash choice, since dates/times increase monotonically).
If I do a find for the specific value in a field before each insert, I'm worried about the performance of the application, since it is supposed to have thousands of records. What is the best way to do this validation?
Index the field first (there are no columns in MongoDB; two documents can have different fields):
db.collection_name.ensureIndex({fieldName : 1})
What is the best way to retrieve the last inserted document to find the last processed info id?
Fun fact: we don't need the info id field if we use it once and then delete the document, because _id values sort by creation time. But if the document is regularly updated, then we need to modify it atomically with the findAndModify operation.
http://api.mongodb.org/java/2.6/com/mongodb/DBCollection.html#findAndModify%28com.mongodb.DBObject,%20com.mongodb.DBObject,%20com.mongodb.DBObject,%20boolean,%20com.mongodb.DBObject,%20boolean,%20boolean%29
Now you have the timestamp of the last inserted/modified document. Make sure this field is indexed. Then, in the link above, look for the sort parameter and populate it with:
new BasicDBObject("infoId", -1) // -1 is for descending order
I saw in some places where they talk about two writes when doing inserts: one to write the document to the collection, the other to write it to another collection as "last updated", to keep track of the last inserted entry. Is this a good way? We don't normally do this with relational databases.
Terrible idea! Welcome to Mongo - you can do better than this.
I am developing an application using a plain JDBC connection. The application is built with Java/Java EE Spring MVC 3.0 and SQL Server 2008 as the database. I am required to update a table based on a non-primary-key column.
Before updating the table, we had to decide on an approach, as the table may contain a huge amount of data. The update query will be executed in a batch, and we are required to design the application so that it doesn't hog system resources.
Now, we had to decide between two approaches:
1. SELECT data before you UPDATE, or
2. UPDATE data and then SELECT the missing data.
Selecting data before the update is only beneficial if the chance of failure is high, i.e. if out of a batch of 100 update queries only 20 rows are updated successfully, then this approach should be taken.
Updating data and then checking for missing data is beneficial only when far fewer records fail. With this approach one database select call can be avoided: after a batch update, the count of updated records is taken, and the select query is executed if and only if that count does not match the number of queries.
We are totally unaware of the production environment, but we want to account for all possibilities and want a faster system. I need your input on which is the better approach.
Since there is a 50:50 chance between successful updates and faster selects, it's hard to tell from the scenario described. You probably want a feedback-driven approach: continuously track how many updates succeed over time, and decide on the basis of that data whether to select before updating or update before selecting.
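The feedback idea above could look like this minimal sketch (the 0.5 threshold is an arbitrary assumption, and the counters would be fed from the batch-update return counts):

```java
// Sketch: track the recent batch success rate and pick the strategy from it.
// The 0.5 threshold is arbitrary and should be tuned against real traffic.
public class StrategyPicker {
    private long attempted;
    private long succeeded;

    // Feed in the result of each batch (e.g. from Statement.executeBatch counts).
    public void recordBatch(int rowsAttempted, int rowsUpdated) {
        attempted += rowsAttempted;
        succeeded += rowsUpdated;
    }

    // SELECT first when most updates have been failing; otherwise update
    // first and only SELECT the (few) missing rows afterwards.
    public boolean selectBeforeUpdate() {
        if (attempted == 0) return false;   // no history yet: assume updates succeed
        return (double) succeeded / attempted < 0.5;
    }
}
```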
Let me first brief the scenario. The database is Sybase. There are some 2-3k stored procedures. Stored procedures might return huge data (around a million records). There will be a service (servlet/Spring controller) which will call the required procedure and flush the data back to the client in XML format.
I need to apply filtering (on multiple columns & multiple conditions) and sorting (based on some dynamic criteria); this I have done.
The issue is that, as the data is huge, doing all the filtering/sorting in memory is not good. I have thought of the options below.
Option 1:
Once I get the ResultSet object, read some X number of records, filter them, store them in a file, and repeat this process until all the data is read. Then just read the file and flush the data to the client.
I need to figure out how to sort the data in the file and how to store the objects in the file so that the filtering/sorting is fast.
Option 2:
Look for some Java API which takes the data, filters it & sorts it based on the given criteria, and returns it back as a stream.
Option 3:
Use an in-memory database like HSQLDB or H2. But I think this will add overhead instead of helping: I will need to insert the data first and then query it, and this will also in turn use the file system.
Note: I don't want to modify the stored procedures, so doing the filtering/sorting in the database is not an option, or might be the last option if nothing else works.
Also, if it helps: every record I read from the ResultSet I store in a Map, with the column names as keys, and each Map is stored in a List, on which I apply the filtering & sorting.
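For reference, the in-memory shape described above can be filtered and sorted with plain comparators, as in this sketch (the column names "age" and "name" and the filter condition are hypothetical); the scaling problem is precisely that the whole List must fit in the heap.

```java
import java.util.*;
import java.util.stream.Collectors;

// Sketch: filtering and sorting rows held as List<Map<columnName, value>>.
// Column names ("age", "name") and the condition are hypothetical.
public class RowOps {
    static List<Map<String, Object>> filterAndSort(List<Map<String, Object>> rows) {
        return rows.stream()
                .filter(r -> ((Integer) r.get("age")) >= 18)   // example filter condition
                .sorted(Comparator.comparing((Map<String, Object> r) -> (String) r.get("name")))
                .collect(Collectors.toList());
    }
}
```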
Which option do you think will be good for memory footprint, scalability, and performance - or is there any other option that would be good for this scenario?
Thanks
I would recommend your Option 3 but it doesn't need to be an in-memory database; you could use a proper database instead. Any other option would be just a more specific solution to the general problem of sorting huge amounts of data. That is, after all, exactly what a database is for and it does it very well.
If you really believe your Option 3 is not a good solution, then you could implement a sort/merge solution. Gather your Maps as you already do, but whenever you reach a limit of records (say 10,000), sort them, write them to disk, and clear them from memory.
Once your data is complete, open all the files you wrote and perform a merge on them.
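The merge step above can be sketched as follows, with each in-memory "chunk" standing in for one sorted file on disk (the priority queue holds one head element per chunk, which is exactly how the on-disk file merge would work):

```java
import java.util.*;

// Sketch of the k-way merge: each chunk stands in for one sorted file;
// the queue always holds the current head element of each chunk.
public class SortMerge {
    static List<Integer> mergeSortedChunks(List<List<Integer>> chunks) {
        // queue entries: {value, chunkIndex, positionInChunk}
        PriorityQueue<int[]> heads =
                new PriorityQueue<>(Comparator.comparingInt((int[] a) -> a[0]));
        for (int i = 0; i < chunks.size(); i++) {
            if (!chunks.get(i).isEmpty()) {
                heads.add(new int[] {chunks.get(i).get(0), i, 0});
            }
        }
        List<Integer> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            int[] head = heads.poll();
            merged.add(head[0]);
            List<Integer> chunk = chunks.get(head[1]);
            int next = head[2] + 1;
            if (next < chunk.size()) {
                heads.add(new int[] {chunk.get(next), head[1], next});
            }
        }
        return merged;
    }
}
```

With k files, only k records are in memory at a time, so the memory footprint stays flat regardless of the total row count.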
Is Hadoop applicable to your problem?
You should filter the data in the database itself. You could write an aggregation procedure which executes all the other procedures and combines or filters their data. However, the best option is to modify the 2-3 thousand stored procedures so they return only the needed data.