I am trying to create an application in java which pulls out records from the database and maps it to objects. It does that without knowing what the schema of the database looks like. All i want to do is fetch all rows from all tables and store them somewhere. There could be a thousand tables with thousands of records each. The application doesn't know the name of any table or attribute. It should map "on the fly". I looked at hibernate but it doesnt give me what i want for this app. I don't want to create hard-coded xml files and classes for mapping. Any ideas how i can accomplish this ?
Thanks
Oracle has a bunch of data dictionary views for metadata.
ALL_TABLES, ALL_TAB_COLUMNS would be first places to start. Then you'd build ad-hoc queries based on what you get out of there. Not sure whether you have to deal with all data types (dates, blobs, spatial, user-defined....).
Not sure what you mean by "store them somewhere". If you start thinking CSV or XML files, you'll need to escape various characters from VARCHAR2 columns.
If you are looking for some generic extract/unload routines, you should look at what is already available in the database or open-source/commercially.
MyBatis provides a pretty simple way to map data results to objects and back, maybe check that out?
http://code.google.com/p/mybatis/
Not to be flip, but for this task, you might want to check out Ruby on Rails and its ActiveRecord approach
Related
I have a SAAS product, which is build by Spring MVC and Hibernate. Generally SAAS products allow user's to customize the product like adding extra fields to the table. So i want to give the flexibility to users, to create custom fields in the tables for themselves. Please provide all the viable solutions to achieve it. Thank you so much for your help.
I'm guessing your trying to back this to a Relational database. The primary problem is that relational databases store things in tables, and tables don't really handle free form data well.
So one solution is to use a document structure that is flexible, like XML (and perhaps ditch the database) but databases have features which are nice, so let's also consider the database-using approaches.
You could create a "custom field" table which would have columns (composite primary key) for
ExtendedTable
ColumnName
but you'd also have to store the data somewhere
(ExtendedKey)
DataItem
And now we get into the really nasty bits. How would you apply constraints to this data? I mean, what would the type be of a DataItem? A general solution would be quite complex (being a type of free form database). Hopefully you could limit the solution to solve only the problems you require solved.
Another approach is to use a single "extra" column that contains an XML record which embeds it's own "column and value" extensions, but if you wanted to display a table of the efficiently, you'd have to parse out every XML document in every field, which is not ideal.
Neither one of these approaches will work well with the existing SQL query language, so you'll then start building your own query language.
I suggest you go back and look at real data requirements, instead of sweeping them under the table with a "and anything else one might want" set of columns on your table.
Your requirement is best suited use case for NoSQL databases (like MongoDB).
Dynamically creating relational database tables & columns (modifying schemas) upon user requests in an application is not a best practice as these involve DDL operations, which are very powerful and in case if you don't handle them carefully, the whole application's database goes to the inconsistent state.
I have a requirement to store CSV data in an Oracle database for later retrieval by dynamic query scripts. The data needs to be stored such that any column of the CSV data can be queried using SQL and performance is key (some CSV files are 100k+ lines).
The content of the CSV files (number of columns, headings, data types) is not known ahead of time and the system needs to be able to handle multiple file structures (which are added to a config file so the system knows how to read them, by people who don't know SQL).
My current solution, in order to avoid an EAV model, is to have my code create new tables every time a new CSV structure is added to the config file. I'm curious to know if there is a better way to achieve what I'm trying to do. I'm not particularly fond of having my code create new tables in production at run-time.
The system is written in groovy, in case it matters.
I am inclined to go with your current solution, which is a separate table for each type. Somehow, I'm most comfortable with storing data in well-defined tables with well-defined types.
An EAV (entity-attribute-value) solution is also viable. With 100k rows of data, the EAV solution should perform pretty well, unless you have lots of tables. One downside is the types of the columns. Without a lot of extra work, you are pretty much limited to strings for all the values.
Oracle does offer another possibility, which is an XML solution. This can give you the flexibility of dynamic column names along with the "simplicity" of not having to define a separate table for each one. You can read more about it in the documentation here.
It comes down to what you want to model. If you need to handle adhoc queries against any of the columns in the CSV file, then I guess you need to model them all as Oracle columns. If you need to only retrieve a whole line based on a particular key, then you could model as two columns: the key and the line. If you need to model the individual columsn that such a thing would not be in first normal form.
When you create an EAV model, you are making a flexible system that allows for additional columns to be added/removed easily. Oracle is already a flexible system that allows for additional columns to be added/removed easily. They've just put more thought into locking, performance, scalability and tool support that your naive EAV model might have.
Overall, I think what you are probably doing is best. It's not an easy problem and it's not exactly what Oracle was designed for so you might have issues with statistics and which indexes to create and so on.
I have a use case where in I need to read rows from a file, transform them using an engine and then write the output to a database (that can be configured).
While I could write a query builder of my own, I was interested in knowing if there's already an available solution (library).
I searched online and could find jOOQ library but it looks like it is type-safe and has a code-gen tool so is probably suited for static database schema's. In the use case that I have db's can be configured dynamically and the meta-data is programatically read and made available for write-purposes (so a list of tables would be made available, user can select the columns to write and the insert script for these column needs to be dynamically created).
Is there any library that could help me with the use case?
If I understand correctly you need to query the database structure, display the result to via a GUI and have the user map data from a file to that structure?
Assuming this is the case, you're not looking for a 'library', you're looking for an ETL tool.
Alternatively, if you're set on writing something yourself, the (very) basic way to do this is:
the structure of a database using Connection.getMetaData(). The exact usage can vary between drivers so you'll need to create an abstraction layer that meets your needs - I'd assume you're just interested in the table structure here.
the format of the file needs to be mapped to a similar structure to the tables.
provide a GUI that allows the user to connect elements from the file to columns in the table including any type mapping that is needed.
create a parametrized insert statement based on file element to column mapping - this is just a simple bit of string concatenation.
loop throw the rows in the file performing a batch insert for each.
My advice, get an ETL tool, this sounds like a simple problem, but it's full of idiosyncrasies - getting even an 80% solution will be tough and time consuming.
jOOQ (the library you referenced in your question) can be used without code generation as indicated in the jOOQ manual:
http://www.jooq.org/doc/latest/manual/getting-started/use-cases/jooq-as-a-standalone-sql-builder
http://www.jooq.org/doc/latest/manual/sql-building/plain-sql
When searching through the user group, you'll find other users leveraging jOOQ in the way you intend
The setps you need to do is:
read the rows
build each row into an object
transform the above object to target object
insert the target object into the db
Among the above 4 steps, the only thing you need to do is step 3.
And for the above purpose, you can use Transmorph, EZMorph, Commons-BeanUtils, Dozer, etc.
Problem
We have an XML like (its having some non unicode which needs to be filtered of) data,
<row><div>1234</div><dept>ABCD</dept></row>
<row><div>5678</div><dept>EFGH</dept></row>
Just mentioning only 2 column tags for ease of understanding. Actually it has more than 20 column tags in each
XML data is directly inserted as records into an Oracle schema table as,
div_c qdept
1234 ABCD
5678 EFGH
More information
XML file is more than 9 Gigs and available in FTP.
Database table column names might be different from XML column tag names.
Might have to add/define some Rules to filter out the rows.
Question
What would be the appropriate way to parse this huge XML and find out whether that record exists in the database table? Any open source tools available to utilize?
What Am Trying
Wrote StAX parser using default implementation(XMLInputFactory) with Invalid characters fiter (FilterReader)
Planning to split the XML as chunks
Have concurrent threads processing each of the chunks
Each thread will generate a query to check whether that exists in database or not (i know its absurd)
Have a connection pool created and execute those queries by each of the thread
I know this is really worst what I am doing and it will take years to complete, I really need some advice on this like whether to go with any ORM to make the checking easy and make the XML parsing fast.
Some suggestions like that would really help me.
Yeah. I think you were right to use StAX. You definitely want to stream and StAX seems to have the simplest API for streaming XML. I wouldn't go to ORM right away. Most ORM is to round-trip data. It saves you work for mechanical transformations. That makes it good when you have very structured data but the mapping between the two schemas is not very complicated. Here you are trying import data from one format into another. It sounds like your large dataset has a fairly simple schema but the mapping is the more complicated part. Go with custom code. Pawel's suggestion of the temporary table sounds good. Try to do as much processing as you can in stored procedures that operate on the whole dataset at once (old and imported). You don't want to keep transferring those rows back and forth from the database to your app.
I am editing Java code which stores data in a YAML file, but I need to make it use MySQL instead, but I'm not sure how to go about doing this. The code makes request to read and write data such as SQLset("top.middle.nameleaf", "Joe") or SQLget("top.middle.ageleaf"). These functions are defined by me. This would be simple with YAML, but I'm not sure how to implement this with SQL. Thanks in advance. Another thing is that if top.middle was set to null then top.middle.nameleaf would be removed, like it would in YAML.
sql doesn't work in the same way as yaml. you cannot blindly replace a yaml solution with a sql one. you will have to actually think about what you want to do.
get a basic understanding of how sql works, with tables and columns, and relationships between them.
define a set of tables that match the data you have in yaml (it might be one table for each structure, and a foreign key linking tables that are nested in yaml).
work out how best to adapt your code to use sql. one approach might be to work with yaml until the data are "ready" and then translate the final yaml structure to sql. alternatively, you may want to replace all your yaml-routines with sql routines, but without doing the above it is hard to say exactly how that will work.