I have a use case where in I need to read rows from a file, transform them using an engine and then write the output to a database (that can be configured).
While I could write a query builder of my own, I was interested in knowing if there's already an available solution (library).
I searched online and could find jOOQ library but it looks like it is type-safe and has a code-gen tool so is probably suited for static database schema's. In the use case that I have db's can be configured dynamically and the meta-data is programatically read and made available for write-purposes (so a list of tables would be made available, user can select the columns to write and the insert script for these column needs to be dynamically created).
Is there any library that could help me with the use case?
If I understand correctly you need to query the database structure, display the result to via a GUI and have the user map data from a file to that structure?
Assuming this is the case, you're not looking for a 'library', you're looking for an ETL tool.
Alternatively, if you're set on writing something yourself, the (very) basic way to do this is:
the structure of a database using Connection.getMetaData(). The exact usage can vary between drivers so you'll need to create an abstraction layer that meets your needs - I'd assume you're just interested in the table structure here.
the format of the file needs to be mapped to a similar structure to the tables.
provide a GUI that allows the user to connect elements from the file to columns in the table including any type mapping that is needed.
create a parametrized insert statement based on file element to column mapping - this is just a simple bit of string concatenation.
loop throw the rows in the file performing a batch insert for each.
My advice, get an ETL tool, this sounds like a simple problem, but it's full of idiosyncrasies - getting even an 80% solution will be tough and time consuming.
jOOQ (the library you referenced in your question) can be used without code generation as indicated in the jOOQ manual:
http://www.jooq.org/doc/latest/manual/getting-started/use-cases/jooq-as-a-standalone-sql-builder
http://www.jooq.org/doc/latest/manual/sql-building/plain-sql
When searching through the user group, you'll find other users leveraging jOOQ in the way you intend
The setps you need to do is:
read the rows
build each row into an object
transform the above object to target object
insert the target object into the db
Among the above 4 steps, the only thing you need to do is step 3.
And for the above purpose, you can use Transmorph, EZMorph, Commons-BeanUtils, Dozer, etc.
Related
Hello World, Here is the situation,
Basically instead of generating values from a table like like so
34.324, 09/13/2011, thankyou,
I would like to generate the type of that specific value
e.g
VarChar2, Date, varchar2(char 30)
Just to add another layer of difficulty a Java application pulls data from the multiple Oracle database tables using HQL. Thus I would need an equivalent HQL statement to the SQL statement (if it exists)..
I understand the DESC keyword lists the column data types for an entire table however as mentioned above I require specificity.
In Summary
I would like to reverse engineer this application to generate a report of the data_types of the data instead of the actual data itself. I could easily just manually walk through the code however they are over 200 entries and this will be a real pain.
Any help is truly appreciated. If this question is unclear please let me know and I will provide more details and examples.
I have an Hbase table with a couple of million records. Each record has a couple of properties describing the record, stored each in a column qualifier.(Mostly int or string values)
I have a a requirement that I should be able to see the records paginated and sorted based on a column qualifier (or even more than one, in the future). What would be a best approach to do this? I have looked into secondary indexes using coprocessors (mostly hindex from huawei), but it doesn't seem to match my use case exactly. I've also thought about replicating all the data into multiple tables, one for each sort property, which would be included in the rowkey and then redirect queries to those tables. But this seems very tedious as I have a few so called properties already..
Thanks for any suggestions.
You need your NoSQL database to work just like a RDBMS, and given the size of your data your life would be a lot simpler if you stick to it, unless you expect exponential growth :) Also, you don't mention if your data gets updated, this is very important to make a good decision.
Having said that, you have a lot of options, here are some:
If you can wait for the results: Write a MapReduce task to do the scan, sort it and retrieve the top X rows, do you really need more than 1000 pages (20-50k rows) for each sort type?. Another option would be using something like Hive.
If you can aggregate the data and "reduce" the dataset: Write a MapReduce task to periodically export the newest aggregated data to a SQL table (which will handle the queries). I've done this a few times to and it works like a charm, but it depends on your requirements.
If you have plenty of storage: Write a MapReduce task to periodically regenerate (or append the data) a new table for each property (sorting by it in the row-key). You don't need multiple tables, just use a prefix in your rowkeys for each case, or, if you do not want tables and you won't have a lot queries, simply write the sorted data to csv files and store them in the HDFS, they could be easily read by your frontend app.
Manually maintain a secondary index: Which would not very tolerant to schema updates and new properties but would work great for near real-time results. To do it, you have to update your code to also to write to the secondary table with a good buffer to help with performance while avoiding hot regions. Think about this type of rowkeys: [4B SORT FIELD ID (4 chars)] [8B SORT FIELD VALUE] [8B timestamp], with just one column storing the rowkey of the main table. To retrieve the data sorted by any of the fields just perform a SCAN using the SORT FIELD ID as start row + the starting sort field value as pivot for pagination (ignore it to get the first page, then set the last one retrieved), that way you'll have the rowkeys of the main table, and you can just perform a multiget to it to retrieve the full data. Keep in mind that you'll need a small script to scan the main table and write the data to the index table for the existing rows.
Rely on any of the automatic secondary indexing through coprocessors like you mentioned, although I do not like this option at all.
You have mostly enumerated the options. HBase natively does not support secondary indexes as you are aware. In addition to hindex you may consider phoenix
https://github.com/forcedotcom/phoenix
( from SalesForce) which in addition to secondary indexes has jdbc driver and sql support.
I have found the Jquery datatables plug in extremely useful for simple, read only applications where I'd like to give the user pagination, sorting and searching of very large sets of data (millions of rows using server side processing).
I have a system for reusing this code but I end up doing the same thing over and over alot. I'd like to write a very generalized api that I essentially just need to configure the sql needed to retrieve the data used in the table. I am looking for a good design pattern/approach to do this. I've seen articles like this http://www.codeproject.com/Articles/359750/jQuery-DataTables-in-Java-Web-Applications and have a complete understanding of how server side processing works (have done it in java and asp.net many times). For someone to answer you will probably need to have a deep understanding of how server side processing works in java but here are some issues that come up with attempting to do this:
I generally run three separate queries. A count without the search clause, a count with the clause included, the query for the actual data. I haven't found an efficient way to do all 3 at once and doing so requires a lot of extra data to come back from db (ie counts over and over). The api needs to support behavior based on these three different queries and complex queries at that. I generally row number () over an index for the pagination to be relatively speedy with large data.
*where clause changes dynamically (user can search over a variable number of rows).
*order by clause changes for the same reason.
overall, each case is often pretty specific to the data we need. Is there a good way to abstract this so that I can do minimal work when I want to use the plug in server side.
So, the steps are as follows in most projects:
*extract the params the plug on sends to the server (alot of times my own are added, mostly date ranges)
*build the unfiltered count query (this is rarely dynamic).
*build the filtered count query (is dynamic)
*build the data query
*construct a model object of the table and return it as json.
A lot of the issues occur setting the prepared statements with a variable number of parameters. Dynamically generating the sql in a general way (say based on just column names) seems unlikely. I am wondering if someone else has created something they are using for this or if it sounds like a specific pattern is applicable. It has just occurred to me that creating a reusable filter may be helpful in java. Any advice would be greatly appreciated. Feel free to be language agnostic as the architecture is what I'm trying to figure out.
We have base search criteria where all request parameters relevant to DataTables are mapped onto class properties (fields) and custom search criteria class that extends base and contains specific to business logic fields for sutom search. Also on server side we have repository class that takes custom search criteria as an argument and makes queries to database.
If you are familiar with C#, you could check out custom binding code and example of usage.
You could do such custom binding in your Java code as well.
I am editing Java code which stores data in a YAML file, but I need to make it use MySQL instead, but I'm not sure how to go about doing this. The code makes request to read and write data such as SQLset("top.middle.nameleaf", "Joe") or SQLget("top.middle.ageleaf"). These functions are defined by me. This would be simple with YAML, but I'm not sure how to implement this with SQL. Thanks in advance. Another thing is that if top.middle was set to null then top.middle.nameleaf would be removed, like it would in YAML.
sql doesn't work in the same way as yaml. you cannot blindly replace a yaml solution with a sql one. you will have to actually think about what you want to do.
get a basic understanding of how sql works, with tables and columns, and relationships between them.
define a set of tables that match the data you have in yaml (it might be one table for each structure, and a foreign key linking tables that are nested in yaml).
work out how best to adapt your code to use sql. one approach might be to work with yaml until the data are "ready" and then translate the final yaml structure to sql. alternatively, you may want to replace all your yaml-routines with sql routines, but without doing the above it is hard to say exactly how that will work.
I am trying to create an application in java which pulls out records from the database and maps it to objects. It does that without knowing what the schema of the database looks like. All i want to do is fetch all rows from all tables and store them somewhere. There could be a thousand tables with thousands of records each. The application doesn't know the name of any table or attribute. It should map "on the fly". I looked at hibernate but it doesnt give me what i want for this app. I don't want to create hard-coded xml files and classes for mapping. Any ideas how i can accomplish this ?
Thanks
Oracle has a bunch of data dictionary views for metadata.
ALL_TABLES, ALL_TAB_COLUMNS would be first places to start. Then you'd build ad-hoc queries based on what you get out of there. Not sure whether you have to deal with all data types (dates, blobs, spatial, user-defined....).
Not sure what you mean by "store them somewhere". If you start thinking CSV or XML files, you'll need to escape various characters from VARCHAR2 columns.
If you are looking for some generic extract/unload routines, you should look at what is already available in the database or open-source/commercially.
MyBatis provides a pretty simple way to map data results to objects and back, maybe check that out?
http://code.google.com/p/mybatis/
Not to be flip, but for this task, you might want to check out Ruby on Rails and its ActiveRecord approach