I have a very common encapsulation problem. I am querying a table through JDBC and need to hold the records in memory for some time for processing. I don't have any Hibernate POJO for the table that I could use to create and save objects. I am talking about a load of, say, 200 million records in a single query.
The common approach is to create an object array and cast the values when I need to use them. (Assume I can get the table details, like column names and data types, which are saved in some reference tables.) But I suspect this approach will be very expensive in terms of time once the load is taken into consideration.
Any good approach would be appreciated.
Sounds like a CachedRowSet would do the trick here. That's pretty much exactly what you want. It will take a ResultSet and suck the entire thing down, then you can work on it at your leisure.
Addendum:
I am really looking for a robust record holder with easy access to the members.
But that's pretty much exactly what a CachedRowSet is.
It manages a collection of records with named (and numbered) columns, and provides typed access to those columns.
CachedRowSet crs = getACachedRowSet();
crs.absolute(5); // go to the 5th row - shows you have random access to the contents
String name = crs.getString("Name");
int age = crs.getInt("Age");
Date dob = crs.getDate("DateOfBirth");
While I'm sure you can make up something on your own, a CachedRowSet gives you everything you've asked for. If you don't want to actually load the data in to RAM, you could just use a ResultSet.
The only downside is that it's not thread-safe, so you'll need to synchronize around it. But that's life. How exactly does a CachedRowSet not meet your needs?
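For completeness, a minimal sketch of how one might obtain a populated CachedRowSet in the first place (the helper name and SQL are made up; the standard RowSetProvider factory from javax.sql.rowset is assumed):
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.RowSetProvider;

// Hypothetical helper: run a query and detach the results into a CachedRowSet.
static CachedRowSet loadIntoRowSet(Connection con, String sql) throws Exception {
    try (Statement stmt = con.createStatement();
         ResultSet rs = stmt.executeQuery(sql)) {
        CachedRowSet crs = RowSetProvider.newFactory().createCachedRowSet();
        crs.populate(rs); // copies every row into memory; the connection can be closed afterwards
        return crs;
    }
}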
Well, if you need 200m objects in memory then you can initialize each one while iterating through the ResultSet - you don't need to save the metadata:
ResultSet rs = stmt.executeQuery();
while (rs.next()) {
    String col1 = rs.getString("col1");
    Integer col2 = rs.getInt("col2");
    MyClass o = new MyClass(col1, col2);
    add(o);
}
rs.close();
To make it clearer: the table involved is completely configurable.
That is why I can't create POJO classes ahead of this task.
In that case, I'd have thought that the only real way of doing this is to turn each row into a string with delimiters (CSV, XML or something) and then create an array of strings.
You can get a list of column names returned by a JDBC query as in this answer:
Retrieve column names from java.sql.ResultSet
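A rough sketch of that idea, using ResultSetMetaData to discover the column names and count at runtime (the delimiter is arbitrary and escaping is omitted for brevity):
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.util.ArrayList;
import java.util.List;

// Turns each row of an arbitrary, configurable table into one delimited string.
static List<String> rowsAsDelimitedStrings(ResultSet rs) throws Exception {
    ResultSetMetaData meta = rs.getMetaData();
    int columns = meta.getColumnCount();
    List<String> rows = new ArrayList<>();
    while (rs.next()) {
        StringBuilder row = new StringBuilder();
        for (int i = 1; i <= columns; i++) { // JDBC columns are 1-based
            if (i > 1) {
                row.append(',');
            }
            row.append(rs.getString(i)); // naive: no quoting or escaping of delimiters
        }
        rows.add(row.toString());
    }
    return rows;
}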
Related
I'm looking for opinions, so I guess this is a 'which is better' question. I have a webapp built in JavaScript/jQuery and Struts that uses Hibernate to access data in a relational DB (MySQL). When an object/database field has a limited set of strings for values, is it better to use the full string in the object/DB or a 'code' for that string, like a single CHAR instead of the entire string?
class User {
    int id;
    String userName;

    String type;        // Values of 'Administrator', 'Regular'
    // OR
    char type;          // Values of 'A', 'R'
    // OR
    char type;          // Values of 'A', 'R'
    String typeString;  // Can be returned on the fly based on 'type', or by the DB in a SQL CASE statement
}
If the database has the full text string, then it's easy coding all the way around, but it wastes space (in the DB and in data transfer) on something that only has a few values.
If the database has just a 'code', then when presenting this field to a user (like in a grid of existing users, or a dropdown selection list when creating a new user) the char value must be converted to the full string. Then the question is where that conversion should be done. It could be at the DB level, where Hibernate can fill in the full string value from a CASE statement; this saves DB space, but not data transfer or memory. It could be at the object level, where it's done in the getter/setter for the 'type' field. Or it could be all the way in the GUI, where JavaScript converts the 'char' to the appropriate string for the user to see.
Also... if either method is OK to use, what might influence the choice you make? The number of different values? The max length of the strings? How many rows are expected in the table?
I'm sure every DB/programmer has come across this situation many times and probably has a preference.
If you only have a fixed set of user types like Admin and Regular, I think it will be easier to use a static HashMap in your code and store just 'A' and 'R' in the database. Something like:
static HashMap<Character, String> userRoles = new HashMap<>();
static {
    userRoles.put('A', "Admin");
    userRoles.put('R', "Regular");
}
Whenever you get a result from the DB, you can just do userRoles.get(type) to look up the actual type. This saves space and is also readable.
I would put the full name in the database alongside an associated short code or ID in some kind of lookup table. Use the shortcode/ID as the primary key for the lookup table, and as a foreign key from other tables. If someone needs to investigate the database layer, or someone needs to use the database for reporting, data warehousing, or analytics this will simplify things greatly.
It's commonly seen as bad practice to name variables, database tables, database columns, functions, etc. with unclear names or abbreviations that not everyone will understand - short codes like this should be seen the same way.
I think it's better to do the conversion from the type code to the type (and vice versa) as close to the database interaction as possible - in this case, in Hibernate. This is because your application logic becomes more readable and intuitive if it uses the explicit types.
In my opinion, if (BMW.equals(carTypeCode)) {} is a lot more readable than if ("X".equals(carTypeCode)) {}.
I am not very familiar with Hibernate, but it would be great if you could leverage Hibernate for the mapping of the String to its DB representation and vice versa (maybe using CASE as you mentioned). Personally, I would probably have modeled these strings as enums and used something like Hibernate's enum type mapping. Also, you should think about making these type codes a little more readable by making them at least a few characters long, because that comes in handy when you are debugging an issue by looking at a DB dump and don't want to consult a type-code conversion chart.
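As a rough illustration of the enum idea (all names are invented; this uses the standard JPA AttributeConverter, which Hibernate supports, rather than a Hibernate-specific type):
import javax.persistence.AttributeConverter;
import javax.persistence.Converter;

// Hypothetical user type with a short DB code but a readable name in code.
enum UserType {
    ADMINISTRATOR("A"), REGULAR("R");

    private final String code;
    UserType(String code) { this.code = code; }
    String code() { return code; }

    static UserType fromCode(String code) {
        for (UserType t : values()) {
            if (t.code.equals(code)) return t;
        }
        throw new IllegalArgumentException("Unknown type code: " + code);
    }
}

// Converts between the enum and the single-character column automatically.
@Converter(autoApply = true)
class UserTypeConverter implements AttributeConverter<UserType, String> {
    @Override
    public String convertToDatabaseColumn(UserType attribute) {
        return attribute == null ? null : attribute.code();
    }

    @Override
    public UserType convertToEntityAttribute(String dbData) {
        return dbData == null ? null : UserType.fromCode(dbData);
    }
}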
Performance-wise, I don't think either option would have much impact in the average case.
We are examining 2 different methods for doing our entity updates:
"Standard" updates - Fetch from DB, set Fields, persist.
"Templated" updates - we have a query for each entity that looks like this:
String sqlQuery = "update Entity set entity.field1 = entity.field1, entity.field2 = entity.field2, entity.field3 = entity.field3, .... entity.fieldn = entity.fieldn"
We receive the fields that changed (and their new values) and we replace the string fields (only those required) with the new values, i.e. something like:
for (Field field : fields) {
    sqlQuery = sqlQuery.replace(field.fieldName, getNewValue(field));
}
executeQuery(sqlQuery);
Any ideas if these options differ in performance to a large extent? Is the 2nd option even viable? what are the pros / cons?
Please ignore the SQL Injection vulnerability, the example above is only an example, we do use PreparedStatements for the query building.
How about building a mixed solution?
First, create a hashmap of the changed columns and their new values.
Second, build a new query for a PreparedStatement from that hashmap (to avoid SQL injection).
Third, set all the parameters.
The cost is O(n), where n is the number of parameters.
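A hedged sketch of what that mixed approach might look like (the table name, id column and helper are made up; the column names must come from your own trusted metadata, never from user input):
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Builds "UPDATE entity SET col1 = ?, col2 = ? WHERE id = ?" from the changed fields only.
static int updateChangedFields(Connection con, long id, Map<String, Object> changedFields) throws Exception {
    StringJoiner setClause = new StringJoiner(", ");
    List<Object> values = new ArrayList<>();
    for (Map.Entry<String, Object> field : changedFields.entrySet()) {
        setClause.add(field.getKey() + " = ?"); // column names from trusted metadata only
        values.add(field.getValue());
    }
    String sql = "UPDATE entity SET " + setClause + " WHERE id = ?";

    try (PreparedStatement ps = con.prepareStatement(sql)) {
        int index = 1;
        for (Object value : values) {
            ps.setObject(index++, value); // bind each changed value in order
        }
        ps.setObject(index, id); // bind the WHERE parameter last
        return ps.executeUpdate();
    }
}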
The first solution is much more flexible. You can rely on Hibernate's dirty checking mechanism for generating the right updates; that's one good reason why an ORM tool is great when writing data.
The second approach is in no way better, because it might generate different update plans, so you can't reuse the PreparedStatement cache across the various column combinations. Instead of using string-based templates (vulnerable to SQL injection), you could use jOOQ. jOOQ allows you to reference your table columns in Java, so you can build the UPDATE query in a type-safe fashion.
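For illustration, a rough sketch of the jOOQ style (ENTITY and its FIELD1/FIELD2/ID columns stand in for classes that jOOQ's code generator would produce; the dialect and values are placeholders):
import org.jooq.DSLContext;
import org.jooq.SQLDialect;
import org.jooq.impl.DSL;

// Type-safe UPDATE: columns are Java references, values are bound as parameters.
DSLContext ctx = DSL.using(connection, SQLDialect.MYSQL);
ctx.update(ENTITY)
   .set(ENTITY.FIELD1, newValue1)
   .set(ENTITY.FIELD2, newValue2)
   .where(ENTITY.ID.eq(entityId))
   .execute();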
I want to append a new column in the fashion of
public CachedRowSet addColumn(CachedRowSet original, List<Item> column, String columnName);
or
public CachedRowSet addColumn(CachedRowSet original, int column, String columnName);
with the column value repeated if it is a primitive.
What is the best way to do this?
Hmm.. hard to answer without knowing the context. Who is providing that CachedRowSet? They might or might not offer a way to generate a new instance. Are you using CachedRowSetImpl from the RI?
The RowSet is not really intended for that. Can you add it in the generating SQL? SELECT a, b, 'additional' FROM .... Or you can take your CachedRowSet and generate a JoinRowSet with a FULL_JOIN against a single-field result set.
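If adding the value in the generating SQL is an option, a minimal sketch might look like this (table, columns and the constant are placeholders):
import java.sql.ResultSet;
import java.sql.Statement;
import javax.sql.rowset.CachedRowSet;
import javax.sql.rowset.RowSetProvider;

// The extra column is produced by the query itself, so the row set never has to be altered.
try (Statement stmt = connection.createStatement();
     ResultSet rs = stmt.executeQuery("SELECT a, b, 'additional' AS extra_col FROM some_table")) {
    CachedRowSet crs = RowSetProvider.newFactory().createCachedRowSet();
    crs.populate(rs);
    // crs.getString("extra_col") now returns the repeated constant for every row
}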
You can't do that in SQL, let alone CachedRowSet, without executing DDL, and CachedRowSet doesn't support that. The part about a repeating value is an elementary violation of 3NF. You probably don't want to do any of this.
I have an Employee table and an entity class for it.
My task is such that I need the data from the Employee table within the result set (of type scrollable) two times.
In such a case, which of the following would be better for using the data the second time:
1: Create an instance of the entity class and store it in a List while iterating through the result set the first time.
OR
2: After the first iteration, call the first() method of the result set to go back to the first row and use the data a second time.
Which option will consume less time and fewer resources?
If you have better suggestions, please provide them.
Thanks.
Unless this is about very large resultsets, you're probably much better off consuming the whole JDBC ResultSet into memory and operating on a java.util.List rather than on a java.sql.ResultSet for these reasons:
The List is more user-friendly for developers
The database resource can be released immediately (which is less error-prone)
You will probably not run out of memory in Java on 1000 rows.
You can make many many mistakes when operating on a scrollable ResultSet, as various JDBC drivers implement this functionality just subtly differently
You can use tools for consuming JDBC result sets. For instance Apache DbUtils:
QueryRunner run = new QueryRunner(dataSource);
ResultSetHandler<List<Person>> h = new BeanListHandler<Person>(Person.class);
List<Person> persons = run.query("SELECT * FROM Person", h);
jOOQ (3.0 syntax):
List<Person> list =
DSL.using(connection, sqldialect)
   .fetch("SELECT * FROM Person")
   .into(Person.class);
Spring JdbcTemplate:
JdbcTemplate template = // ...
List<Person> result = template.query("SELECT * FROM Person", rowMapper);
Cache the data you retrieve from the database. It's almost always better than querying it again, even if the driver provides caching at its own level. You can always discard it when it's no longer needed.
Maybe using a Map of employees by their primary key would help?
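For example, a hypothetical sketch (the Employee constructor, accessor and column names are assumed):
import java.sql.ResultSet;
import java.util.LinkedHashMap;
import java.util.Map;

// One pass over the scrollable ResultSet; every later lookup is by primary key, in memory.
Map<Integer, Employee> employeesById = new LinkedHashMap<>();
while (rs.next()) {
    Employee e = new Employee(rs.getInt("id"), rs.getString("name")); // column names assumed
    employeesById.put(e.getId(), e);
}
Employee someone = employeesById.get(42); // the "second iteration" becomes cheap lookups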
If you'd describe why you think you need to iterate the list more than once, then we'd see whether there's a better algorithm to get rid of that second iteration in the first place.
I had an issue building a ResultSet in Java.
I am taking a collection object that is organized row-wise from a ResultSet, putting that collection (stored as a Vector/ArrayList) in a cache, and then retrieving the same collection object.
Here I need to build the ResultSet back again from the collection object. My doubt is whether building the ResultSet this way is possible or not.
The best idea, if you are using a collection in place of a cache, is to use a CachedRowSet instead of a ResultSet. CachedRowSet is a subinterface of ResultSet, but the data is already cached. This is far simpler than writing all the data into an ArrayList.
CachedRowSets can also be queried themselves.
CachedRowSet rs;
.......................
.......................
Integer id;
String name;
while (rs.next())
{
    if (rs.getInt("id") == 13)
    {
        id = rs.getInt("id");
        name = rs.getString("name");
    }
}
So you just call the CachedRowSet whenever you need the info. It's almost as good as sliced bread. :)
EDIT:
There are no set methods on ResultSet, though there are update methods. The problem with using the update methods to rebuild a ResultSet is that they require selecting a row to update; once the ResultSet has freed itself, all rows are set to null, and a null reference cannot be called. A List of Lists mimics a ResultSet itself, or more correctly, an array of arrays mimics a ResultSet.
While Vectors are thread-safe, there is a huge overhead attached to them. Use ArrayList instead. As each nested List is created and placed into the outer nest List, insert it in this manner:
nest.add(Collections.unmodifiableList(nested));
After all of the nested Lists are inserted, return the nest List as an unmodifiableList as well. This will give you a thread-safe collection without the overhead of Vectors.
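A small sketch of that structure, taking the column count from the ResultSetMetaData (names are generic):
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Copies a ResultSet into an unmodifiable List of unmodifiable row Lists.
static List<List<Object>> snapshot(ResultSet rs) throws Exception {
    ResultSetMetaData meta = rs.getMetaData();
    int columns = meta.getColumnCount();
    List<List<Object>> nest = new ArrayList<>();
    while (rs.next()) {
        List<Object> nested = new ArrayList<>(columns);
        for (int i = 1; i <= columns; i++) {
            nested.add(rs.getObject(i));
        }
        nest.add(Collections.unmodifiableList(nested));
    }
    return Collections.unmodifiableList(nest);
}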
Take a look at this page. Try to see if the SimpleResultSet class is fine for your needs.
If you combine its source into a standalone set of classes, it should do the trick.
From what I could get, your code may be like this:
List<String> collection = new ArrayList<>();
collection.add("A collection in some order");
List<List<String>> cache = new ArrayList<>();
cache.add(collection); ...
Now when you retrieve it, I think you'll get your collection back in order, since you have used a List.
If this is not what you were expecting, do comment.
I would advise you to use CachedRowSet. Refer to http://www.onjava.com/pub/a/onjava/2004/06/23/cachedrowset.html to learn more about CachedRowSet. Once you create the CachedRowSet, you can disconnect from the database, make changes to the cached data, and later even reopen the DB connection and commit the changes back to the database.
Another option you should consider is just to refactor your code to accept a Collection instead of a ResultSet.
I'm assuming you pass that ResultSet to a method that iterates over it. You might as well change the method to iterate over an ArrayList...
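In other words, something along these lines (the Person type and the method name are only illustrative):
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.List;

// Before: the method is tied to an open database resource.
void process(ResultSet rs) throws SQLException { /* iterate rs */ }

// After: the method works on plain, cacheable data.
void process(List<Person> people) {
    for (Person p : people) {
        // same logic as before, minus the JDBC plumbing
    }
}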