Advice on JDBC ResultSet

I have an Employee table and an entity class for it.
My task requires reading the employee data from a scrollable result set twice.
Which of the following is the better way to use the data the second time?
1: Create an instance of the entity class for each row and store it in a List while iterating through the result set the first time.
OR
2: After the first iteration, call the result set's first() method to go back to the first row and read the data a second time.
Which option will consume less time and fewer resources?
If you have a better suggestion, please share it.
Thanks.

Unless this is about very large resultsets, you're probably much better off consuming the whole JDBC ResultSet into memory and operating on a java.util.List rather than on a java.sql.ResultSet for these reasons:
The List is more user-friendly for developers
The database resource can be released immediately (which is less error-prone)
You will probably not run out of memory in Java on 1000 rows.
You can make many mistakes when operating on a scrollable ResultSet, as various JDBC drivers implement this functionality in subtly different ways
You can use tools for consuming JDBC result sets. For instance Apache DbUtils:
QueryRunner run = new QueryRunner(dataSource);
ResultSetHandler<List<Person>> h
= new BeanListHandler<Person>(Person.class);
List<Person> persons = run.query("SELECT * FROM Person", h);
jOOQ (3.0 syntax):
List<Person> list =
DSL.using(connection, sqldialect)
.fetch("SELECT * FROM Person")
.into(Person.class);
Spring JdbcTemplate:
JdbcTemplate template = // ...
List<Person> result = template.query("SELECT * FROM Person", rowMapper);

Cache the data you retrieve from the database. It's always better than polling it, even if the driver provides caching at its own level. You can always discard it when it's no longer needed.

Maybe using a Map of employees by their primary key would help?
If you describe why you think you need to iterate over the list more than once, then we could see whether there's a better algorithm that gets rid of the second iteration in the first place.
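A minimal sketch of the Map idea, assuming the rows have already been read into entity objects. The Employee class, its fields, and the EmployeeIndex helper are hypothetical stand-ins, not the asker's real code:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity; the real Employee class will have more fields.
final class Employee {
    final int id;
    final String name;

    Employee(int id, String name) {
        this.id = id;
        this.name = name;
    }
}

final class EmployeeIndex {
    // Index employees by primary key; LinkedHashMap keeps the original row order.
    static Map<Integer, Employee> byId(List<Employee> employees) {
        Map<Integer, Employee> index = new LinkedHashMap<>();
        for (Employee e : employees) {
            index.put(e.id, e);
        }
        return index;
    }
}
```

Once the map is built, index.values() can be iterated as many times as needed, and index.get(id) gives direct access to a single employee without a second pass over the result set.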

Related

Update query performance with Hibernate

We are examining 2 different methods for updating our entities:
1. "Standard" updates: fetch from the DB, set the fields, persist.
2. We have a query for each entity that looks like this:
String sqlQuery = "update Entity set entity.field1 = entity.field1, entity.field2 = entity.field2, entity.field3 = entity.field3, .... entity.fieldn = entity.fieldn"
We receive the fields that changed (and their new values) and we replace the string fields (only those required) with the new values. i.e. something like :
for each field : fields {
// String is immutable, so the result of replace() has to be assigned back
sqlQuery = sqlQuery.replace(field.fieldName, getNewValue(field));
}
executeQuery(sqlQuery);
Any ideas if these options differ in performance to a large extent? Is the 2nd option even viable? what are the pros / cons?
Please ignore the SQL Injection vulnerability, the example above is only an example, we do use PreparedStatements for the query building.

How about a mixed solution?
First, create a HashMap of the changed fields and their new values.
Second, build the query for a PreparedStatement from that HashMap (to avoid SQL injection).
Third, set all the parameters.
The cost is O(n), where n is the number of parameters.
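A rough sketch of how the query text for such a mixed solution could be built. The UpdateSqlBuilder class, the id key column, and the whitelist of allowed columns are assumptions made for illustration. Column names cannot be bound as parameters, so they are checked against the whitelist, and the table name is assumed to come from trusted code:

```java
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;

final class UpdateSqlBuilder {
    // Builds "UPDATE <table> SET a = ?, b = ? WHERE id = ?" for the changed columns only.
    // The "id" key column is an assumption; adjust it to the real primary key.
    static String build(String table, Map<String, Object> changes, Set<String> allowedColumns) {
        if (changes.isEmpty()) {
            throw new IllegalArgumentException("no changed fields");
        }
        StringJoiner assignments = new StringJoiner(", ");
        for (String column : changes.keySet()) {
            // Only known column names may end up in the SQL text.
            if (!allowedColumns.contains(column)) {
                throw new IllegalArgumentException("unknown column: " + column);
            }
            assignments.add(column + " = ?");
        }
        return "UPDATE " + table + " SET " + assignments + " WHERE id = ?";
    }
}
```

The values would then be bound with PreparedStatement.setObject in the iteration order of the map (a LinkedHashMap keeps that order stable), followed by the key value as the last parameter.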

The first solution is much more flexible. You can rely on the Hibernate dirty checking mechanism to generate the right updates. That's one good reason why an ORM tool is great when writing data.
The second approach is in no way better, because it might generate different update plans, so you can't reuse the PreparedStatement statement cache across the various column combinations. Instead of using string-based templates (vulnerable to SQL injection), you could use jOOQ. jOOQ allows you to reference your table columns in Java, so you can build the UPDATE query in a type-safe fashion.

Encapsulating JDBC resultset

I have a very common encapsulation problem. I am querying a table through JDBC and need to hold the records in memory for some time for processing. I don't have a Hibernate POJO for this table to create objects from. I am talking about a load of, say, 200 million records in a single query.
The common approach is to create an object array and cast the values when I need to use them. (Assume I can get the table details, such as column names and data types, which will be saved in some reference tables.) But I guess this approach will be very expensive in terms of time when the load is taken into consideration.
Any good approach will be appreciated.

Sounds like a CachedRowSet would do the trick here. That's pretty much exactly what you want. It will take a ResultSet and suck the entire thing down, then you can work on it at your leisure.
Addenda:
I am really looking for a robust record holder with easy access on the members
But that's pretty much exactly what a CachedRowSet is.
It manages a collection of records with named (and numbered) columns, and provides typed access to those columns.
CachedRowSet crs = getACachedRowSet();
crs.absolute(5); // go to the 5th row; shows you have random access to the contents
String name = crs.getString("Name");
int age = crs.getInt("Age");
Date dob = crs.getDate("DateOfBirth"); // java.sql.Date
While I'm sure you can make up something on your own, a CachedRowSet gives you everything you've asked for. If you don't want to actually load the data into RAM, you could just use a ResultSet.
The only downside is that it's not thread-safe, so you'll need to synchronize around it. But that's life. How exactly does a CachedRowSet not meet your needs?

Well, if you need 200m objects in memory then you can initialize each while iterating through the ResultSet - you don't need to save the metadata
ResultSet rs = stmt.executeQuery();
while (rs.next()) {
String col1= rs.getString("col1");
Integer col2= rs.getInt("col2");
MyClass o = new MyClass(col1,col2);
add(o);
}
rs.close();

To make it more clear, the table involved is completely configurable.
That is why I can't create POJO classes prior to this task.
In that case, I'd have thought that the only real way of doing this is to turn each row into a string with delimiters (CSV, XML or something) and then create an array of strings.
You can get a list of column names returned by a JDBC query as in this answer:
Retrieve column names from java.sql.ResultSet
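As an alternative to delimited strings, here is a sketch of a generic row holder that stores the column names once and keeps each row as a plain Object[]. RowTable and all of its methods are hypothetical names. Only the from(ResultSet) method touches JDBC, and it uses just the standard ResultSetMetaData calls. Whether 200 million rows fit in memory this way is a separate question that this sketch does not solve:

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder: column names are stored once, each row is a plain Object[].
final class RowTable {
    private final Map<String, Integer> columnIndex = new HashMap<>();
    private final List<Object[]> rows = new ArrayList<>();

    RowTable(List<String> columnNames) {
        for (int i = 0; i < columnNames.size(); i++) {
            columnIndex.put(columnNames.get(i), i);
        }
    }

    void addRow(Object... values) {
        if (values.length != columnIndex.size()) {
            throw new IllegalArgumentException("wrong column count");
        }
        rows.add(values);
    }

    // Looks a value up by row number (0-based) and column name.
    Object get(int row, String column) {
        Integer idx = columnIndex.get(column);
        if (idx == null) {
            throw new IllegalArgumentException("unknown column: " + column);
        }
        return rows.get(row)[idx];
    }

    int size() {
        return rows.size();
    }

    // Copies the whole ResultSet; column labels come from the metadata, so no POJO is needed.
    static RowTable from(ResultSet rs) throws SQLException {
        ResultSetMetaData md = rs.getMetaData();
        int n = md.getColumnCount();
        List<String> names = new ArrayList<>();
        for (int i = 1; i <= n; i++) {
            names.add(md.getColumnLabel(i)); // JDBC columns are 1-based
        }
        RowTable table = new RowTable(names);
        while (rs.next()) {
            Object[] values = new Object[n];
            for (int i = 1; i <= n; i++) {
                values[i - 1] = rs.getObject(i);
            }
            table.addRow(values);
        }
        return table;
    }
}
```

Values come back as Object and still need a cast at the point of use, but the per-row overhead stays small because the column names are not repeated for every row.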

Inject attribute into JPQL SELECT clause

Let's depict the following use case: I have a JPQL Query which on the fly creates data objects using the new keyword. In the SELECT clause I would like to inject an attribute which is not known to the database but to the layer which queries it.
This could look like
EntityManager em; // Got it from somewhere
boolean editable = false; // Value might change, e.g. depending on current date
Query q = em.createQuery("SELECT new foo.bar.MyDTO(o, :editable) FROM MyObject o")
.setParameter("editable", editable);
List<MyDTO> results = (List<MyDTO>) q.getResultList();
Any ideas how this kind of attribute or parameter injection into the SELECT clause might work in JPQL? Both JPA and JPA 2.0 solutions are applicable.
Edit: Performance does not play a key role, but clarity and cleanness of code.

Have you measured a performance problem when simply iterating over the list of results and calling a setter on each of the elements? I would guess that compared to
the time it takes to execute the query over the database (inter-process call, network communication)
the time it takes to transform each row into a MyObject instance using reflection
the time it takes to transform each MyObject instance into a MyDTO using reflection
your loop will be very fast.
If you're so concerned about performance, you should construct your MyDTO instances manually from the returned MyObject instances instead of relying on Hibernate and reflection to do it.
Keep it simple, safe, readable and maintainable first. Then, if you have a performance problem, measure to detect where it comes from. Then and only then, optimize.
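The simple loop suggested above could look like the following sketch. MyDTO here is a hypothetical stand-in for foo.bar.MyDTO, and the setEditable setter and the DtoPostProcessor helper are assumptions. The query itself would then only select new foo.bar.MyDTO(o) and leave the flag to the calling layer:

```java
import java.util.List;

// Hypothetical stand-in for foo.bar.MyDTO from the question.
final class MyDTO {
    private final String name;
    private boolean editable;

    MyDTO(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    boolean isEditable() {
        return editable;
    }

    void setEditable(boolean editable) {
        this.editable = editable;
    }
}

final class DtoPostProcessor {
    // Applies the layer-specific flag after the query has returned its results.
    static List<MyDTO> markEditable(List<MyDTO> results, boolean editable) {
        for (MyDTO dto : results) {
            dto.setEditable(editable);
        }
        return results;
    }
}
```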

It will not work without possible vendor extensions, because according to the specification:
4.6.4 Input Parameters
...
Input parameters can only be used in the
WHERE clause or HAVING clause of a query.

Building resultset using collection object

I have an issue building a ResultSet in Java.
I am storing a collection object, built row by row from a ResultSet and held as a Vector/ArrayList, in a cache, and later retrieving that same collection object.
I then need to rebuild the ResultSet from the collection object. Is it possible to build a ResultSet this way?

If you are using a collection as a cache, the best idea is to use a CachedRowSet instead of a ResultSet. CachedRowSet is a subinterface of ResultSet, but the data is already cached. This is far simpler than writing all the data into an ArrayList.
CachedRowSets can also be queried themselves.
CachedRowSet rs;
.......................
.......................
Integer id;
String name;
while (rs.next())
{
if (rs.getInt("id") == 13)
{
id = rs.getInt("id");
name = rs.getString("name");
}
}
So you just call the CachedRowSet whenever you need the info. It's almost as good as sliced bread. :)
EDIT:
There are no set methods on ResultSet, although there are update methods. The problem with using the update methods to rebuild a ResultSet is that they require selecting a row to update. Once the ResultSet has been freed, all rows are set to null, and a null reference cannot be called. A List of Lists mimics a ResultSet, or, more correctly, an array of arrays mimics a ResultSet.
While Vectors are thread safe, there is a huge overhead attached to them. Use the ArrayList instead. As each nested List is created and placed into the outer nest List, insert it in this manner.
nest.add(Collections.unmodifiableList(nested));
After all of the nested Lists are inserted, return the nest List as an unmodifiableList as well. This will give you a read-only collection that is safe to share between threads, as long as the backing lists are no longer modified, without the overhead of Vectors.
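A small sketch of the nested-list approach, under the assumption that each row is a List<Object>. RowSnapshot and freeze are hypothetical names. Each row is copied before wrapping, so later changes to the source lists cannot leak into the cached snapshot:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class RowSnapshot {
    // Wraps every row and the outer list in unmodifiable views over private copies.
    static List<List<Object>> freeze(List<List<Object>> rows) {
        List<List<Object>> nest = new ArrayList<>(rows.size());
        for (List<Object> row : rows) {
            // Copy first so the snapshot does not share state with the caller's list.
            nest.add(Collections.unmodifiableList(new ArrayList<>(row)));
        }
        return Collections.unmodifiableList(nest);
    }
}
```

Any attempt to add, set, or remove on the returned lists throws UnsupportedOperationException, which is what makes the snapshot safe to hand out from a cache.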

Take a look at this page. Try to see if the SimpleResultSet class is fine for your needs.
If you combine its source into a standalone set of classes, it should do the trick.

From what I could get, your code may be like this:
List collection = new ArrayList();
collection.add(" A collection in some order");
List cache = new ArrayList();
cache.add(collection); ...
Now when you retrieve I think you'll get your collection in order, since you have used List.
If this is not what you were expecting, do comment.

I would advise you to use CachedRowSet. See this article to learn more about CachedRowSet: http://www.onjava.com/pub/a/onjava/2004/06/23/cachedrowset.html. Once you create the CachedRowSet, you can disconnect from the database, make some changes to the cached data, and later even reopen the DB connection and commit the changes back to the database.

Another option you should consider is just to refactor your code to accept a Collection instead of a ResultSet.
I'm assuming you pass that ResultSet to a method that iterates over it. You might as well change the method to iterate over an ArrayList...

How do you query object collections in Java (Criteria/SQL-like)?

Suppose you have a collection of a few hundred in-memory objects and you need to query this List to return objects matching some SQL or Criteria like query. For example, you might have a List of Car objects and you want to return all cars made during the 1960s, with a license plate that starts with AZ, ordered by the name of the car model.
I know about JoSQL, has anyone used this, or have any experience with other/homegrown solutions?

Filtering is one way to do this, as discussed in other answers.
Filtering is not scalable though. On the surface time complexity would appear to be O(n) (i.e. already not scalable if the number of objects in the collection will grow), but actually because one or more tests need to be applied to each object depending on the query, time complexity more accurately is O(n t) where t is the number of tests to apply to each object.
So performance will degrade as additional objects are added to the collection, and/or as the number of tests in the query increases.
There is another way to do this, using indexing and set theory.
One approach is to build indexes on the fields within the objects stored in your collection and which you will subsequently test in your query.
Say you have a collection of Car objects and every Car object has a field color. Say your query is the equivalent of "SELECT * FROM cars WHERE Car.color = 'blue'". You could build an index on Car.color, which would basically look like this:
'blue' -> {Car{name=blue_car_1, color='blue'}, Car{name=blue_car_2, color='blue'}}
'red' -> {Car{name=red_car_1, color='red'}, Car{name=red_car_2, color='red'}}
Then given a query WHERE Car.color = 'blue', the set of blue cars could be retrieved in O(1) time complexity. If there were additional tests in your query, you could then test each car in that candidate set to check if it matched the remaining tests in your query. Since the candidate set is likely to be significantly smaller than the entire collection, time complexity is less than O(n) (in the engineering sense, see comments below). Performance does not degrade as much, when additional objects are added to the collection. But this is still not perfect, read on.
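To make the index idea concrete, here is a minimal sketch of an index on Car.color backed by a HashMap. This is not CQEngine's API. The Car class and the ColorIndex helper are hypothetical and only illustrate the lookup structure described above:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical Car with just the two fields used in the example above.
final class Car {
    final String name;
    final String color;

    Car(String name, String color) {
        this.name = name;
        this.color = color;
    }
}

final class ColorIndex {
    // color -> all cars with that color
    private final Map<String, Set<Car>> byColor = new HashMap<>();

    void add(Car car) {
        byColor.computeIfAbsent(car.color, k -> new LinkedHashSet<>()).add(car);
    }

    // Returns the candidate set with one hash lookup instead of scanning every car.
    Set<Car> withColor(String color) {
        Set<Car> cars = byColor.get(color);
        return cars == null ? Collections.<Car>emptySet() : Collections.unmodifiableSet(cars);
    }
}
```

Any remaining tests in the query would then be applied only to the set returned by withColor, not to the whole collection.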
Another approach, is what I would refer to as a standing query index. To explain: with conventional iteration and filtering, the collection is iterated and every object is tested to see if it matches the query. So filtering is like running a query over a collection. A standing query index would be the other way around, where the collection is instead run over the query, but only once for each object in the collection, even though the collection could be queried any number of times.
A standing query index would be like registering a query with some sort of intelligent collection, such that as objects are added to and removed from the collection, the collection would automatically test each object against all of the standing queries which have been registered with it. If an object matches a standing query then the collection could add/remove it to/from a set dedicated to storing objects matching that query. Subsequently, objects matching any of the registered queries could be retrieved in O(1) time complexity.
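A simplified sketch of a standing query index, again not CQEngine's actual API. StandingQueryCollection, its method names, and the use of java.util.function.Predicate for queries are assumptions. Each object is tested against the registered queries once, when it is added, and results are then read from a precomputed set:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

final class StandingQueryCollection<T> {
    private final Set<T> all = new LinkedHashSet<>();
    private final Map<String, Predicate<T>> queries = new HashMap<>();
    private final Map<String, Set<T>> matches = new HashMap<>();

    // Registers a query under a name; objects already present are tested once here.
    void register(String name, Predicate<T> query) {
        Set<T> result = new LinkedHashSet<>();
        for (T item : all) {
            if (query.test(item)) {
                result.add(item);
            }
        }
        queries.put(name, query);
        matches.put(name, result);
    }

    // Tests the new object against every standing query exactly once.
    void add(T item) {
        if (!all.add(item)) {
            return; // already present
        }
        for (Map.Entry<String, Predicate<T>> q : queries.entrySet()) {
            if (q.getValue().test(item)) {
                matches.get(q.getKey()).add(item);
            }
        }
    }

    void remove(T item) {
        if (!all.remove(item)) {
            return;
        }
        for (Set<T> result : matches.values()) {
            result.remove(item);
        }
    }

    // Retrieval is a single map lookup, independent of the collection size.
    Set<T> results(String name) {
        Set<T> result = matches.get(name);
        return result == null ? Collections.<T>emptySet() : Collections.unmodifiableSet(result);
    }
}
```

The trade-off is that every add and remove now costs one test per registered query, so this pays off when reads are much more frequent than writes.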
The information above is taken from CQEngine (Collection Query Engine). This basically is a NoSQL query engine for retrieving objects from Java collections using SQL-like queries, without the overhead of iterating through the collection. It is built around the ideas above, plus some more. Disclaimer: I am the author. It's open source and in maven central. If you find it helpful please upvote this answer!

I have used Apache Commons JXPath in a production application. It allows you to apply XPath expressions to graphs of objects in Java.

Yes, I know it's an old post, but new technologies appear every day and the answer will change over time.
I think this is a good problem to solve with LambdaJ. You can find it here:
http://code.google.com/p/lambdaj/
Here you have an example:
LOOK FOR ACTIVE CUSTOMERS // (Iterable version)
List<Customer> activeCustomers = new ArrayList<Customer>();
for (Customer customer : customers) {
if (customer.isActive()) {
activeCustomers.add(customer);
}
}
LambdaJ version
List<Customer> activeCustomers = select(customers,
having(on(Customer.class).isActive()));
Of course, this kind of beauty has a performance cost (a little... about 2 times slower on average), but can you find more readable code?
It has many many features, another example could be sorting:
Sort Iterative
List<Person> sortedByAgePersons = new ArrayList<Person>(persons);
Collections.sort(sortedByAgePersons, new Comparator<Person>() {
public int compare(Person p1, Person p2) {
return Integer.valueOf(p1.getAge()).compareTo(p2.getAge());
}
});
Sort with lambda
List<Person> sortedByAgePersons = sort(persons, on(Person.class).getAge());
Update: since Java 8 you can use lambda expressions out of the box, like:
List<Customer> activeCustomers = customers.stream()
.filter(Customer::isActive)
.collect(Collectors.toList());

Continuing the Comparator theme, you may also want to take a look at the Google Collections API. In particular, they have an interface called Predicate, which serves a similar role to Comparator, in that it is a simple interface that can be used by a filtering method, like Sets.filter. They include a whole bunch of composite predicate implementations, to do ANDs, ORs, etc.
Depending on the size of your data set, it may make more sense to use this approach than a SQL or external relational database approach.

If you need a single concrete match, you can have the class implement Comparator, then create a standalone object with all the hashed fields included and use it to return the index of the match. When you want to find more than one (potentially) object in the collection, you'll have to turn to a library like JoSQL (which has worked well in the trivial cases I've used it for).
In general, I tend to embed Derby into even my small applications, use Hibernate annotations to define my model classes and let Hibernate deal with caching schemes to keep everything fast.

I would use a Comparator that takes a range of years and license plate pattern as input parameters. Then just iterate through your collection and copy the objects that match. You'd likely end up making a whole package of custom Comparators with this approach.

The Comparator option is not bad, especially if you use anonymous classes (so as not to create redundant classes in the project), but eventually when you look at the flow of comparisons, it's pretty much just like looping over the entire collection yourself, specifying exactly the conditions for matching items:
for (Car car : cars) {
if (1959 < car.getYear() && 1970 > car.getYear() &&
car.getLicense().startsWith("AZ")) {
result.add(car);
}
}
Then there's the sorting... that might be a pain in the backside, but luckily there's class Collections and its sort methods, one of which receives a Comparator...
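For comparison, the whole query from the question (cars made during the 1960s, license plate starting with AZ, ordered by model name) can be sketched with Java 8 streams and a Comparator. The RegisteredCar class, its getters, and the CarQueries helper are hypothetical stand-ins for the real Car type:

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical car type with only the fields the query needs.
final class RegisteredCar {
    private final int year;
    private final String license;
    private final String model;

    RegisteredCar(int year, String license, String model) {
        this.year = year;
        this.license = license;
        this.model = model;
    }

    int getYear() { return year; }
    String getLicense() { return license; }
    String getModel() { return model; }
}

final class CarQueries {
    // Filters to 1960-1969 and "AZ" plates, then sorts by model name.
    static List<RegisteredCar> sixtiesWithAzPlate(List<RegisteredCar> cars) {
        return cars.stream()
                .filter(c -> c.getYear() >= 1960 && c.getYear() <= 1969)
                .filter(c -> c.getLicense().startsWith("AZ"))
                .sorted(Comparator.comparing(RegisteredCar::getModel))
                .collect(Collectors.toList());
    }
}
```

This is still a full scan of the collection, like the loop above, but the filtering and sorting conditions sit next to each other and no custom Comparator class is needed.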
