I'm facing a task where I need to clone a PostgreSQL database, keeping all constraints, indexes, etc., but including only the records related to a specific column value.
In other words, I want to split one big database into multiple smaller databases.
For example, my original database has numerous schemas, each schema has numerous tables, and each table has records about multiple persons. I want to clone it to a new database, but clone only the records related to a specific person, identified by person id (i.e. clone all records in all tables that have person_id = xxx).
Is there a tool for this task or any suggestions? (I'm familiar with Java and Python)
The best way I have found to do this is to first export the complete schema using the pg_dump tool with the -s flag (to dump schema only, and not data), and then export data separately.
To load your schemas starting from a fresh, empty database, use pg_restore. It will read the output from pg_dump and use it to build a database.
When exporting the data, you'll need to classify each table (you can write a helper script to make this easier, or use Excel, etc.):
Tables that you want a subset of data from, based on some condition (i.e. a certain value in person_id)
Tables you want to copy in their entirety (like dimension tables, such as calendar and company_locations)
Tables that you don't need any data from
For (1), you will need to write an appropriate SELECT query that returns the subset of data that you want to copy. Put those queries in a script and have each one write the result to a separate file, named <schema>.<table>. Lastly, use the psql utility to load the data to the test database. psql has a special \copy command that makes this easy. It can be used from a terminal like this:
psql -c "\copy schema.table FROM '~/dump_data/schema.table' WITH DELIMITER ',' CSV;"
Use pg_dump again to take care of all of those falling under (2), using the -t <table name> flag to dump only the named tables, and -a to dump only data (no schema). This could also be added to the script for (1) by just adding an unqualified SELECT * for each table in (2) and loading the data the same way.
Tables falling under (3) were already handled by the initial export.
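Since the asker mentions knowing Java: the load step can also be done from Java with the PostgreSQL JDBC driver's CopyManager instead of psql \copy. A minimal sketch, assuming the dump files were written as CSV into ~/dump_data and named <schema>.<table>; the connection URL and credentials are placeholders:

    import org.postgresql.PGConnection;
    import org.postgresql.copy.CopyManager;

    import java.io.Reader;
    import java.nio.file.DirectoryStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class SubsetLoader {
        public static void main(String[] args) throws Exception {
            // Placeholder connection details for the target (test) database.
            String url = "jdbc:postgresql://localhost:5432/testdb";
            try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
                CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
                Path dumpDir = Paths.get(System.getProperty("user.home"), "dump_data");
                try (DirectoryStream<Path> files = Files.newDirectoryStream(dumpDir)) {
                    for (Path file : files) {
                        // File names are assumed to be <schema>.<table>, e.g. "public.person".
                        String qualifiedTable = file.getFileName().toString();
                        try (Reader reader = Files.newBufferedReader(file)) {
                            long rows = copy.copyIn(
                                    "COPY " + qualifiedTable + " FROM STDIN WITH (FORMAT csv)", reader);
                            System.out.println(qualifiedTable + ": " + rows + " rows loaded");
                        }
                    }
                }
            }
        }
    }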
Related
So, say I have a SQL (MySQL) database containing 4 tables {A, B, C, D},
and I want to create a testing database which contains a subset of data from the first database (both in time and type).
So, for example:
"I want to create a new (identical in structure) database containing all of the data for user "bob" for the last two weeks."
The naive approach is to dump two weeks of data from the first database, use Vagrant/Chef to spin up a new empty database, and import the dumped data.
However, this does not work, as the tables have foreign keys referencing each other.
So, if I have two weeks of data from "A", it might rely on year-old data from "D".
My current solution is to use the data layer of my Java application to load the data into memory and then insert it into the database. However, this is not sustainable/scalable.
So, in a roundabout way, my question is: does anyone know of any tools or tricks to migrate a "complete" set of data from one database to another, constrained by a time period on one table but including all the related data from the other tables as well?
Any suggestions would be fantastic :)
Try your "naive approach", but SET FOREIGN_KEY_CHECKS=0; first, then run your backup queries, then SET FOREIGN_KEY_CHECKS=1;
There is a way to recreate a similar database with part of the data; it is not so simple, but it can be used:
Create a new database (schema only), for example using a backup or a schema comparer tool.
The next step is to copy the table data. You could do it with the help of a data comparer tool (see the schema comparer link). Select the tables you need and check the record data to synchronize.
We have a Linux box into which a third-party tool drops 0.5 MB data files, and we have about 32,000 such files. We need to process those files and insert the data into an Oracle 10g DB. Someone in our organization has already created a Java program that runs as a daemon thread, using static fields to map the data in each file, save the data into the DB, and clear the static fields for the next line.
This processes the files serially and it seems very slow. I'm planning to either make it multithreaded, or run multiple Java processes (the same JAR, each started with java -jar run.jar) for parallel execution. But I'm concerned about data locking and similar issues.
The question is: what is the best way to bulk load the data into the DB using Java? Or any other way.
Update:
The data that we work on is in the following format; we process the lines below to make entries into the DB.
x.y.1.a.2.c.3.b = 12 // ID 1 of table A, one-to-many to table C ID 3, and its property b = 12
x.y.1.a.2.c.3.f = 143 // ID 1 of table A, one-to-many to table C ID 3, and its property f = 143
x.y.2.a.1.c.1.d = 12
Update:
We have about 15 tables that take this data. The data comes in blocks; each block contains related data, and one block's related data is processed at a time. So you are looking at the following figures when inserting one block:
Table 1 | Table 2 | Table 3
---------------------------
5 rows | 8 rows | 12 rows
etc.,
Take a look at Oracle's SQL*Loader tool, which is used to bulk load data into Oracle databases. There is a control file that you can write to describe the data, apply some basic transforms, skip rows, convert types, etc. I've used it before for a similar process and it worked great, and the only things I had to maintain were the driver script and the control files. I realize you asked for a Java solution, but this might also meet your needs.
Ideally, this sounds like a job for SQL Loader rather than Java.
If you do decide to do this job in Java, consider using executeBatch.
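As an illustration (not the example the answer originally linked), a minimal sketch of JDBC batching with addBatch/executeBatch; the table, its columns, and the parsing of the x.y... lines into rows are placeholders:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.util.Collections;
    import java.util.List;

    public class BatchInserter {
        private static final int BATCH_SIZE = 1000;

        // 'rows' stands in for one parsed block of related data destined for one table.
        static void insertBlock(Connection conn, List<Object[]> rows) throws Exception {
            String sql = "INSERT INTO table_a (id, property_name, property_value) VALUES (?, ?, ?)"; // placeholder
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                int count = 0;
                for (Object[] row : rows) {
                    ps.setObject(1, row[0]);
                    ps.setObject(2, row[1]);
                    ps.setObject(3, row[2]);
                    ps.addBatch();
                    if (++count % BATCH_SIZE == 0) {
                        ps.executeBatch(); // one round trip per chunk instead of per row
                    }
                }
                ps.executeBatch(); // flush the remainder
            }
        }

        public static void main(String[] args) throws Exception {
            String url = "jdbc:oracle:thin:@//localhost:1521/ORCL"; // placeholder
            try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
                conn.setAutoCommit(false); // commit per block rather than per statement
                insertBlock(conn, Collections.emptyList()); // placeholder: pass the parsed rows of one block
                conn.commit();
            }
        }
    }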
I'm writing a Java program right now, which reads files and writes the content of these files (after some modifications) into a relational database.
My problem right now is that the program should support a wide range of databases and not only one.
So in my program I create SQL statements and commit them to the DB (SAP HANA) - no problem.
Now I want to add another DB (MySQL) and have to slightly change the SQL syntax of the query before committing.
My solution right now is to copy the code block that creates the statements and make the DB-specific changes to the copy. But that obviously can't be it (too many databases -> 80% of the code never used). I probably need some kind of mapper that converts my SQL to a dialect the chosen DB understands.
Now, I found out about Hibernate and other mappers, but I don't think they fit my needs. The problem is that they expect a Java object (POJO) and convert it. But since I don't know what kind of data my program is going to load, I cannot create static objects for each column, for example.
Sometimes I need to create 4 columns, sometimes 10; sometimes they are integers, sometimes strings/varchar, and they have different names every time. So all the tutorials I found on Hibernate start from a point where the program knows exactly what kind of data is going to be inserted into the DB, which my program does not.
Moreover, I need to insert a large number of rows per table (a billion or more) and I think it might be slow to create an object for each insert.
I hope someone understands my problem and can give me some hints - maybe a mapper that just converts SQL without the need to create an object first.
Thank you very much! :)
Edit: to make it clearer, the purpose of the program is to fill up a relational DB with data that is stored/described in files such as CSV and XML. So the DB is not just a tool for storing the data - storing the data there is the main aim. I need a relational DB filled up with data that the user provides, and not only one DB, but different kinds of RDBMSs.
I think you are describing a perfect use case for a file system. Or, if you want to go with a filesystem abstraction, have a look at the Apache Jackrabbit project.
So basically you want to write a tool that writes an arbitrary text file (some kind of CSV, I assume) into an arbitrary database system, creating tables and content on the fly depending on the structure of the text file?
Using a high-level abstraction layer like Hibernate is not going to take you anywhere soon. What you want to do is low-level database interaction. As long as you don't need any DBMS-specific features, you should get a long way with ANSI SQL. If that is not enough, I don't see an easy way out of this; maybe it is an option to write your own abstraction layer that handles DBMS-specific formatting of the SQL statements. That doesn't sound nice, though.
A different thing to think about is the large number of rows per table (a billion or more). Using single-row INSERT statements is not a good idea; you have to make use of efficient mass-data interfaces, which are strongly DBMS-dependent. Prepared statements are the minimum measure here.
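To illustrate the ANSI-SQL route: the INSERT text can be generated once from whatever column names the file dictates and then reused as a prepared statement with batching, so nothing DBMS-specific is needed for the insert itself (the CREATE TABLE type names are where the dialects will still differ). A minimal sketch; the table and column names are whatever your file parser produces:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.util.Collections;
    import java.util.List;

    public class DynamicInserter {

        // Builds e.g. "INSERT INTO my_table (col_a, col_b) VALUES (?, ?)" - plain ANSI SQL,
        // so the same statement text should work on SAP HANA, MySQL and most other RDBMSs.
        static String buildInsert(String table, List<String> columns) {
            return "INSERT INTO " + table + " ("
                    + String.join(", ", columns) + ") VALUES ("
                    + String.join(", ", Collections.nCopies(columns.size(), "?")) + ")";
        }

        static void insertRows(Connection conn, String table, List<String> columns,
                               List<List<Object>> rows) throws Exception {
            try (PreparedStatement ps = conn.prepareStatement(buildInsert(table, columns))) {
                for (List<Object> row : rows) {
                    for (int i = 0; i < row.size(); i++) {
                        ps.setObject(i + 1, row.get(i)); // setObject lets the driver handle type conversion
                    }
                    ps.addBatch();
                }
                ps.executeBatch(); // in practice, flush every few thousand rows for very large loads
            }
        }
    }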
I have a use case where I need to read rows from a file, transform them using an engine, and then write the output to a database (which can be configured).
While I could write a query builder of my own, I was interested in knowing if there's already an available solution (library).
I searched online and found the jOOQ library, but it looks like it is type-safe and has a code-gen tool, so it is probably suited to static database schemas. In my use case, DBs can be configured dynamically and the metadata is read programmatically and made available for write purposes (so a list of tables would be made available, the user can select the columns to write, and the insert script for these columns needs to be created dynamically).
Is there any library that could help me with the use case?
If I understand correctly, you need to query the database structure, display the result via a GUI, and have the user map data from a file to that structure?
Assuming this is the case, you're not looking for a 'library', you're looking for an ETL tool.
Alternatively, if you're set on writing something yourself, the (very) basic way to do this is:
Read the structure of the database using Connection.getMetaData(). The exact usage can vary between drivers, so you'll need to create an abstraction layer that meets your needs - I'd assume you're just interested in the table structure here (see the sketch after this list).
Map the format of the file to a structure similar to the tables.
Provide a GUI that allows the user to connect elements from the file to columns in the table, including any type mapping that is needed.
Create a parameterized insert statement based on the file-element-to-column mapping - this is just a simple bit of string concatenation.
Loop through the rows in the file, performing a batch insert for each.
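As a concrete starting point for the first step, a minimal sketch that reads tables and columns via Connection.getMetaData(); the schema pattern and the way the result is grouped are placeholders, and the driver quirks mentioned above are ignored:

    import java.sql.Connection;
    import java.sql.DatabaseMetaData;
    import java.sql.ResultSet;
    import java.util.LinkedHashMap;
    import java.util.Map;

    public class SchemaReader {

        // Returns table name -> (column name -> JDBC type name), which is enough
        // to drive a simple "map file field to column" GUI.
        static Map<String, Map<String, String>> readStructure(Connection conn, String schemaPattern)
                throws Exception {
            Map<String, Map<String, String>> structure = new LinkedHashMap<>();
            DatabaseMetaData meta = conn.getMetaData();
            try (ResultSet tables = meta.getTables(null, schemaPattern, "%", new String[] {"TABLE"})) {
                while (tables.next()) {
                    String table = tables.getString("TABLE_NAME");
                    Map<String, String> columns = new LinkedHashMap<>();
                    try (ResultSet cols = meta.getColumns(null, schemaPattern, table, "%")) {
                        while (cols.next()) {
                            columns.put(cols.getString("COLUMN_NAME"), cols.getString("TYPE_NAME"));
                        }
                    }
                    structure.put(table, columns);
                }
            }
            return structure;
        }
    }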
My advice: get an ETL tool. This sounds like a simple problem, but it's full of idiosyncrasies - getting even an 80% solution will be tough and time-consuming.
jOOQ (the library you referenced in your question) can be used without code generation as indicated in the jOOQ manual:
http://www.jooq.org/doc/latest/manual/getting-started/use-cases/jooq-as-a-standalone-sql-builder
http://www.jooq.org/doc/latest/manual/sql-building/plain-sql
When searching through the user group, you'll find other users leveraging jOOQ in the way you intend.
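For example, jOOQ's DSL can be used purely as a runtime SQL builder, constructing table and field references by name instead of through generated classes. A minimal sketch; the schema, table, and column names stand in for whatever your metadata step discovered:

    import static org.jooq.impl.DSL.field;
    import static org.jooq.impl.DSL.name;
    import static org.jooq.impl.DSL.table;

    import org.jooq.DSLContext;
    import org.jooq.Query;
    import org.jooq.SQLDialect;
    import org.jooq.impl.DSL;

    public class JooqStandaloneBuilder {
        public static void main(String[] args) {
            // No code generation and no connection needed: jOOQ only renders the SQL here.
            DSLContext create = DSL.using(SQLDialect.POSTGRES);

            Query insert = create
                    .insertInto(table(name("my_schema", "my_table")),
                                field(name("id")), field(name("payload")))
                    .values(1, "hello");

            System.out.println(insert.getSQL());        // parameterized SQL with bind markers
            System.out.println(insert.getBindValues()); // the bind values, in order
        }
    }

The rendered SQL and bind values can then be handed to an ordinary PreparedStatement, or the query can be executed through jOOQ itself by constructing the DSLContext with a Connection.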
The steps you need to follow are:
read the rows
build each row into an object
transform the above object into the target object
insert the target object into the DB
Among the above 4 steps, the only one you really need to implement yourself is step 3.
And for the above purpose, you can use Transmorph, EZMorph, Commons-BeanUtils, Dozer, etc.
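For step 3, a minimal sketch using Commons BeanUtils to copy matching properties from a source row object into a target object; the FileRow and TargetRecord classes are hypothetical, and note that BeanUtils.copyProperties takes the destination first:

    import org.apache.commons.beanutils.BeanUtils;

    public class RowTransformer {

        // Hypothetical object built from one file row (step 2).
        public static class FileRow {
            private String id;
            private String amount;
            public String getId() { return id; }
            public void setId(String id) { this.id = id; }
            public String getAmount() { return amount; }
            public void setAmount(String amount) { this.amount = amount; }
        }

        // Hypothetical target object matching the destination table (the output of step 3).
        public static class TargetRecord {
            private String id;
            private String amount;
            public String getId() { return id; }
            public void setId(String id) { this.id = id; }
            public String getAmount() { return amount; }
            public void setAmount(String amount) { this.amount = amount; }
        }

        public static TargetRecord transform(FileRow row) throws Exception {
            TargetRecord target = new TargetRecord();
            BeanUtils.copyProperties(target, row); // copies properties with matching names
            return target;
        }
    }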
I'm designing a database and a Java application to do the following:
1. Allow a user to query the database via an API.
2. Allow a user to save a query and identify it via a 'query-id'. The user can then pass in the 'query-id' on the next call to the API, which will execute the query associated with that id, but it will only retrieve data added since the last time that specific query was requested.
- Along with this, I would also need to save the query-id information for each UserID.
Information regarding the Database
The database of choice is PostgreSQL, and the information to be requested by the user will be stored in various tables.
My question: any suggestions/advice/tips on how to go about implementing requirement No. 2?
Is there an existing design pattern, SQL query, or built-in DB function for saving a query and fetching information from multiple tables starting from the last returned results?
Note:
My initial thought so far is to store the last row read from each table (each row in all the tables will have a primary key) in a data structure, then save this data structure for each saved query and use it when retrieving data again.
For storing the user and query-id information, I was thinking of creating a separate table to store the UserName, UserUUID, SavedQuery, LastInfoRetrieved.
Thanks.
This is quite a question. The obvious tool to use here would be prepared statements, but since these are planned on first run, they can run into problems when run multiple times with different parameters. Consider the difference, assuming that id ranges from 1 to 1000000, between:
SELECT * FROM mytable WHERE id > 999900;
and
SELECT * FROM mytable WHERE id > 10;
The first should use an index while the second should do a physical-order scan of the table.
A second possibility would be to have functions which return refcursors. This would mean the query is actually run when the refcursor is returned.
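To make the refcursor option concrete: a server-side function that opens and returns a cursor can be called from JDBC and the cursor fetched as a ResultSet, as long as this happens inside a transaction. A minimal sketch; the function get_saved_query_results and its parameter are hypothetical, as are the connection details:

    import java.sql.CallableStatement;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Types;

    public class RefcursorExample {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:postgresql://localhost:5432/mydb"; // placeholder
            try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
                // A refcursor only lives inside a transaction, so turn autocommit off.
                conn.setAutoCommit(false);

                // Hypothetical server-side function:
                //   CREATE FUNCTION get_saved_query_results(p_query_id int) RETURNS refcursor ...
                try (CallableStatement call =
                             conn.prepareCall("{ ? = call get_saved_query_results(?) }")) {
                    call.registerOutParameter(1, Types.OTHER);
                    call.setInt(2, 42); // hypothetical query-id
                    call.execute();
                    try (ResultSet rs = (ResultSet) call.getObject(1)) {
                        while (rs.next()) {
                            System.out.println(rs.getObject(1));
                        }
                    }
                }
                conn.commit();
            }
        }
    }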
A third possibility would be to have a schema of tables that could be used for this, per session, holding results. Ideally these would be temporary tables in pg_temp, but if you have to preserve across sessions, that may be less desirable. Building such a solution is a lot more work and adds a lot of complexity (read: things that can go wrong) so it is really a last choice.
From what you say, refcursors sound like the way to do this, but keep in mind that PostgreSQL needs to know what data types to return, so you can run into some difficulties in this regard (read the documentation thoroughly before proceeding); if prepared statements get you where you need to go, they might be simpler.
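If prepared statements do get you there, the "only data since the last request" part can be handled on the client with the bookmark idea from the question: remember the last key seen for each (user, query) pair (e.g. in LastInfoRetrieved) and bind it as the parameter on the next run. A minimal sketch; the query and the key column are placeholders:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class IncrementalQuery {

        // Runs the saved query, returning only rows whose key is greater than the stored
        // bookmark, and returns the new bookmark to persist (e.g. in LastInfoRetrieved).
        static long fetchSince(Connection conn, long lastSeenId) throws Exception {
            String sql = "SELECT id, payload FROM mytable WHERE id > ? ORDER BY id"; // placeholder query
            long newBookmark = lastSeenId;
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setLong(1, lastSeenId);
                try (ResultSet rs = ps.executeQuery()) {
                    while (rs.next()) {
                        newBookmark = rs.getLong("id");
                        // process rs.getObject("payload") ...
                    }
                }
            }
            return newBookmark;
        }
    }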