I need to calculate the average occupancy for a selected day of the week (e.g. all Fridays), for each minute. I couldn't find any JPQL/Querydsl solution for this problem because of the lack of date/time functions, so I'm trying to make use of Java Streams. My (simplified) object:
class Occupancy {
    private LocalDateTime timeStamp;
    private int occupied;
}
my repo:
@Query("select o from Occupancy o")
public Stream<Occupancy> streamAllOccupancies();
sample:
try (Stream<Occupancy> stream = repository.streamAllOccupancies()) {
    Function<Occupancy, LocalTime> occupancyMinutesGrouping = (Occupancy o) ->
            o.getTimeStamp().toLocalTime().truncatedTo(ChronoUnit.MINUTES);
    Map<LocalTime, Double> avgMap = stream
        .filter(o -> o.getTimeStamp().getDayOfWeek() == DayOfWeek.MONDAY) // example
        .collect(
            Collectors.groupingBy(
                occupancyMinutesGrouping,
                Collectors.averagingInt(Occupancy::getOccupied)
            )
        );
}
It works, but is it possible to change this map into a list of my occupancy objects:
new Occupancy( localTime, averagedOccupancy );
I'm also worried about stream efficiency, since it has to process all the records from the database. How does the stream work with a JPA repository? Does one SQL query first fetch all the records and then the stream processes them, or are records processed sequentially, one by one? Maybe the best solution is to use a native SQL query instead of a Stream? Any ideas would be very helpful.
As for the conversion to a List&lt;Occupancy&gt;, please note that the occupied field is of int type while the average could be non-integral, and the map keys are LocalTime values. So I assume the result Occupancy class is defined this way:
class Occupancy {
    private LocalTime time;
    private double occupied;

    public Occupancy(LocalTime time, double occupied) {
        this.time = time;
        this.occupied = occupied;
    }
}
Now you can just create one more stream from the resulting map:
List<Occupancy> occupancies = avgMap.entrySet().stream()
.map(e -> new Occupancy(e.getKey(), e.getValue()))
.collect(Collectors.toList());
It seems that intermediate Map is unavoidable (at least if your stream is not already sorted by LocalTime).
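If you would rather not spell out the second stream, the conversion can at least be fused into the collect itself with Collectors.collectingAndThen; the intermediate Map is still built internally, just hidden. A minimal sketch reusing the names from the snippets above:
List&lt;Occupancy&gt; occupancies = stream
    .filter(o -> o.getTimeStamp().getDayOfWeek() == DayOfWeek.MONDAY)
    .collect(Collectors.collectingAndThen(
        Collectors.groupingBy(
            occupancyMinutesGrouping,
            Collectors.averagingInt(Occupancy::getOccupied)),
        // finisher: convert the grouped averages into Occupancy objects
        map -> map.entrySet().stream()
            .map(e -> new Occupancy(e.getKey(), e.getValue()))
            .collect(Collectors.toList())));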
As for memory usage: it depends on the underlying JDBC driver. The resulting stream indeed reads the underlying ResultSet row by row, but it's driver-specific how many rows are prebuffered at once. For example, it's known that the MySQL driver by default retrieves the complete ResultSet into memory, so you may need a query hint like this:
@QueryHints(value = @QueryHint(name = HINT_FETCH_SIZE, value = "" + Integer.MIN_VALUE))
See this blog post for details.
Also note that if your JDBC driver actually fetches the data row-by-row from the server (without buffering), this actually might have worse performance as you may need more round-trips between DBMS and your application (this might be especially crucial if DBMS server is located on different machine). So consult your JDBC driver documentation for additional details.
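For completeness, a hedged sketch of attaching that hint to the streaming repository method from the question (Spring Data JPA, assuming a Long id; HINT_FETCH_SIZE is assumed to be Hibernate's org.hibernate.jpa.QueryHints.HINT_FETCH_SIZE, and Integer.MIN_VALUE is the MySQL-specific "stream row by row" value; other drivers expect a positive fetch size instead):
public interface OccupancyRepository extends Repository&lt;Occupancy, Long&gt; {

    // Hint the driver to stream rows instead of buffering the whole ResultSet.
    @QueryHints(value = @QueryHint(name = HINT_FETCH_SIZE,
                                   value = "" + Integer.MIN_VALUE))
    @Query("select o from Occupancy o")
    Stream&lt;Occupancy&gt; streamAllOccupancies();
}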
I need to iterate 50k objects and change some fields in them.
I'm limited in memory so I don't want to bring all 50k objects into memory at once.
I thought of doing it with the following code, using a cursor, but I was wondering whether all the objects I've processed using the cursor are left in the EntityManager cache.
The reason I don't want to do it with offset and limit is because the database needs to work much harder since each page is a complete new query.
From previous experience once the Entity manager cache gets bigger, updates become real slow.
So usually I call flush and clear after every few hundreds of updates.
The problem here is that flushing / clearing will break the cursor.
I will be happy to learn the best approach of updating a large set of objects without loading them all into memory.
Additional information on how the EclipseLink cursor works in such a scenario would be valuable too.
JpaQuery&lt;T&gt; jQuery = (JpaQuery&lt;T&gt;) query;
jQuery.setHint(QueryHints.RESULT_SET_TYPE, ResultSetType.ForwardOnly)
      .setHint(QueryHints.SCROLLABLE_CURSOR, true);

Cursor cursor = jQuery.getResultCursor();
Iterator&lt;MyObj&gt; cursorIterator = cursor.iterator();
while (cursorIterator.hasNext()) {
    MyObj myObj = cursorIterator.next();
    changeMyObj(myObj);
}
cursor.close();
Use pagination + entityManager.clear() after each page. Also execute every page in a single transaction, or you will have to create/get a new EntityManager after an exception occurs (at least with Hibernate: the EntityManager instance could be in an inconsistent state after an exception).
Try this sample code:
List&lt;MyObj&gt; results;
int index = 0;
int max = 100;
do {
    TypedQuery&lt;MyObj&gt; query = entityManager.createQuery("JPQL QUERY", MyObj.class);
    query.setMaxResults(max)
         .setFirstResult(index);
    results = query.getResultList();
    for (MyObj obj : results) {
        // ...change fields on obj here...
    }
    entityManager.flush(); // push pending updates before detaching
    entityManager.clear(); // detach everything to keep the persistence context small
    index += results.size();
} while (!results.isEmpty());
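And a hedged sketch of the one-transaction-per-page variant mentioned above, assuming a resource-local EntityManager (with container-managed transactions you would demarcate them differently):
EntityTransaction tx = entityManager.getTransaction();
List&lt;MyObj&gt; page;
int index = 0;
do {
    tx.begin();
    page = entityManager.createQuery("JPQL QUERY", MyObj.class)
                        .setFirstResult(index)
                        .setMaxResults(100)
                        .getResultList();
    for (MyObj obj : page) {
        // ...change fields on obj here...
    }
    tx.commit();           // flushes this page's updates
    entityManager.clear(); // detach before loading the next page
    index += page.size();
} while (!page.isEmpty());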
I have the following data access code in my DAO class:
public List&lt;Environment&gt; fetchMiddlewareVersions() throws SQLException {
    System.out.println("reached version");
    Environment environment;
    List&lt;Environment&gt; environments = new ArrayList&lt;Environment&gt;();
    try {
        connection = DBConnectionUtil.getConnection();
        preparedStatement = connection.prepareStatement(
                "select * from middleware_version_details order by application, environment");
        preparedStatement.setFetchSize(100); // must be set on the statement actually executed
        resultSet = preparedStatement.executeQuery();
        while (resultSet.next()) {
            environment = new Environment();
            environment.setAppName(resultSet.getString("APPLICATION"));
            environment.setHostName(resultSet.getString("HOSTNAME"));
            environment.setSoftwareComponent(resultSet.getString("SOFTWARE_COMPONENT"));
            environment.setVersion(resultSet.getString("VERSION"));
            environment.setInstallPath(resultSet.getString("INSTALL_PATH"));
            environment.setRemarks(resultSet.getString("REMARKS"));
            environment.setEnvironmental(resultSet.getString("ENVIRONMENT"));
            environments.add(environment);
        }
    }
By the time I get the entire data into the JSP page, 20 to 30 seconds have already passed. How do I increase the speed of the fetch? I tried DynaCache and it hasn't helped.
So barring any sort of connectivity issues, it almost always comes down to the number of records you're fetching. If you're fetching A TON of records, the method will not return until it has gone through each item and created an array object.
I would try adding a LIMIT and OFFSET clause to your SQL statement to only retrieve records, say, 25 at a time. setFetchSize(int) does not affect the number of overall records, only the number of rows the underlying transport will fetch at a time from your server. Also, move your SQL query into a static final variable:
private static final String SQL_FETCH_MIDDLEWARE_VERSION =
    "SELECT * FROM middleware_version_details ORDER BY application, environment " +
    "LIMIT ? OFFSET ?";
then set the limit and the offset in your prepared statement like so:
preparedStatement.setInt( 1, <RECORD COUNT> );
preparedStatement.setInt( 2, <RECORD START> );
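Putting those pieces together, a minimal sketch (the page size and starting offset are illustrative):
PreparedStatement preparedStatement =
        connection.prepareStatement(SQL_FETCH_MIDDLEWARE_VERSION);
preparedStatement.setInt(1, 25); // LIMIT: number of records per page
preparedStatement.setInt(2, 0);  // OFFSET: index of the first record
ResultSet resultSet = preparedStatement.executeQuery();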
Third, do you have an index on application and environment? If you do not and you will be constantly ordering, filtering and joining on those columns, you should add an index.
Fourth, and it's a minor point but one that I adhere to: calling resultSet.getString("&lt;COLUMN NAME&gt;") causes another lookup of the column index. It's not usually a huge deal, but if you're trying to be as performant as possible, you should use the numeric index. You can do this by creating private static final variables holding the index:
private static final int INDEX_ENVIRONMENT = 6; // JDBC column indexes are 1-based
or you can use a counter and just ensure that the columns are in the correct order, something like this:
while (resultSet.next())
{
    int iC = 1; // JDBC column indexes start at 1, not 0
    environment = new Environment();
    environment.setAppName(resultSet.getString(iC++));
    environment.setHostName(resultSet.getString(iC++));
    environment.setSoftwareComponent(resultSet.getString(iC++));
    environment.setVersion(resultSet.getString(iC++));
    environment.setInstallPath(resultSet.getString(iC++));
    environment.setRemarks(resultSet.getString(iC++));
    environment.setEnvironmental(resultSet.getString(iC++));
    environments.add(environment);
}
Just ensure that you're setting the variables in the correct order and it will be slightly more performant. I like this counter approach as well because it allows me to adapt easily to changing schemas.
I have an application that uses Hibernate. At one point I am trying to retrieve documents. Each document has an account number. The model looks something like this:
private Long _id;
private String _acct;
private String _message;
private String _document;
private String _doctype;
private Date _review_date;
I then retrieve the documents with a document service. A portion of the code is here:
public List&lt;Doc_table&gt; getDocuments(int hours_, int dummyFlag_, List&lt;String&gt; accts) {
    List&lt;Doc_table&gt; documents = new ArrayList&lt;Doc_table&gt;();
    Session session = null;
    Criteria criteria = null;
    try {
        // Create a cutoff Date by adding the (negative) number of
        // hours_ passed in.
        session = HibernateUtil.getSession();
        session.beginTransaction();
        if (accts == null) {
            Calendar cutoffTime = Calendar.getInstance();
            cutoffTime.add(Calendar.HOUR_OF_DAY, hours_);
            criteria = session.createCriteria(Doc_table.class)
                    .add(Restrictions.gt("dbcreate_date", cutoffTime.getTime()))
                    .add(Restrictions.eq("dummyflag", dummyFlag_));
        } else {
            criteria = session.createCriteria(Doc_table.class)
                    .add(Restrictions.in("acct", accts));
        }
        documents = criteria.list();
        for (int x = 0; x &lt; documents.size(); x++) {
            Doc_table document = documents.get(x);
            // ......... more stuff here
        }
This works great if I'm retrieving a small number of documents. But when the number of documents is large, I get a heap space error, probably because the documents take up a lot of space, and when you retrieve several thousand of them, bad things happen.
All I really want to do is retrieve each document that fits my criteria, grab the account number, and return a list of account numbers (a far smaller object than a list of document objects). If this were JDBC, I would know exactly what to do.
But in this case I'm stumped. I guess I'm looking for a way to bring back just the account numbers of the Doc_table objects.
Or alternatively, some way to retrieve documents one at a time from the database using Hibernate, instead of bringing back the whole list of objects, which uses too much memory.
There are several ways to deal with the problem:
loading the docs in batches of a smaller size
(The way you noticed) not querying for the whole Document, but only for the account numbers:
List&lt;String&gt; accts = session.createQuery("SELECT d._acct FROM Doc d WHERE ...").list();
or
List<String> accts = session.createCriteria(Doc.class).
setProjection(Projections.property("_acct")).
list();
When there is a special field in your Document class that contains the huge document byte data, you could map that field as lazily loaded (see the sketch after this list).
Create a second, read-only entity class that contains only the fields you need and map it to the same table.
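A hedged sketch of the lazy-field option from the list above, with illustrative names (note that lazy loading of a basic field is only a hint to the JPA provider; in Hibernate it typically requires bytecode enhancement to actually work):
@Entity
public class Doc {
    @Id
    private Long _id;

    private String _acct;

    @Lob
    @Basic(fetch = FetchType.LAZY) // large content loaded only when accessed
    private String _document;

    // ...other fields, getters and setters...
}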
Instead of fetching all documents, i.e. all records, at once, try to limit the rows being fetched. Also, consider a strategy where you store documents temporarily as flat files, fetch them later, and delete them after usage. Though it's a longer process, it's an efficient way of handling and delivering documents from the database.
I know that the only really correct way to protect SQL queries against SQL injection in Java is to use PreparedStatements.
However, such a statement requires that the basic structure (selected attributes, joined tables, the structure of the WHERE condition) does not vary.
I have a JSP application here that contains a search form with about a dozen fields. But the user does not have to fill in all of them, just the ones he needs. Thus my WHERE condition is different every time.
What should I do to still prevent SQL injection?
Escape the user-supplied values? Write a wrapper class that builds a PreparedStatement each time? Or something else?
The database is PostgreSQL 8.4, but I would prefer a general solution.
Thanks a lot in advance.
Have you seen Spring's NamedParameterJdbcTemplate?
The NamedParameterJdbcTemplate class adds support for programming JDBC statements using named parameters (as opposed to programming JDBC statements using only classic placeholder ('?') arguments).
You can do stuff like:
String sql = "select count(0) from T_ACTOR where first_name = :first_name";
SqlParameterSource namedParameters = new MapSqlParameterSource("first_name", firstName);
return namedParameterJdbcTemplate.queryForInt(sql, namedParameters);
and build your query string dynamically, and then build your SqlParameterSource similarly.
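For instance, a hedged sketch extending the snippet above (the "where 1 = 1" trick just makes appending further "and" clauses uniform; the variable names are illustrative):
StringBuilder sql = new StringBuilder("select count(0) from T_ACTOR where 1 = 1");
MapSqlParameterSource params = new MapSqlParameterSource();
if (firstName != null &amp;&amp; !firstName.isEmpty()) {
    sql.append(" and first_name = :first_name");
    params.addValue("first_name", firstName);
}
if (lastName != null &amp;&amp; !lastName.isEmpty()) {
    sql.append(" and last_name = :last_name");
    params.addValue("last_name", lastName);
}
return namedParameterJdbcTemplate.queryForInt(sql.toString(), params);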
I think that fundamentally, this question is the same as the other questions that I referred to in my comment above, but I do see why you disagree: you're changing what's in your WHERE clause based on what the user supplied.
That still isn't the same as using user-supplied data in the SQL query, though, which you definitely want to use PreparedStatement for. It's actually very similar to the standard problem of needing to use an in statement with PreparedStatement (e.g., where fieldName in (?, ?, ?) but you don't know in advance how many ? you'll need). You just need to build the query dynamically, and add the parameters dynamically, based on information the user supplied (but not directly including that information in the query).
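For reference, a minimal sketch of that IN-clause technique (table and values are illustrative): generate the right number of placeholders, then bind each user-supplied value through the PreparedStatement:
List&lt;String&gt; values = Arrays.asList("a", "b", "c"); // user-supplied
String placeholders = String.join(",", Collections.nCopies(values.size(), "?"));
PreparedStatement ps = con.prepareStatement(
        "select appropriate,fields from mytable where fieldName in (" + placeholders + ")");
for (int i = 0; i &lt; values.size(); i++) {
    ps.setString(i + 1, values.get(i)); // JDBC parameter indexes are 1-based
}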
Here's an example of what I mean:
// You'd have just the one instance of this map somewhere:
Map<String,String> fieldNameToColumnName = new HashMap<String,String>();
// You'd actually load these from configuration somewhere rather than hard-coding them
fieldNameToColumnName.put("title", "TITLE");
fieldNameToColumnName.put("firstname", "FNAME");
fieldNameToColumnName.put("lastname", "LNAME");
// ...etc.
// Then in a class somewhere that's used by the JSP, have the code that
// processes requests from users:
public AppropriateResultBean[] doSearch(Map<String,String> parameters)
throws SQLException, IllegalArgumentException
{
StringBuilder sql;
String columnName;
List<String> paramValues;
AppropriateResultBean[] rv;
// Start the SQL statement; again you'd probably load the prefix SQL
// from configuration somewhere rather than hard-coding it here.
sql = new StringBuilder(2000);
sql.append("select appropriate,fields from mytable where ");
// Loop through the given parameters.
// This loop assumes you don't need to preserve some sort of order
// in the params, but is easily adjusted if you do.
paramValues = new ArrayList<String>(parameters.size());
for (Map.Entry<String,String> entry : parameters.entrySet())
{
// Only process fields that aren't blank.
if (entry.getValue().length() > 0)
{
// Get the DB column name that corresponds to this form
// field name.
columnName = fieldNameToColumnName.get(entry.getKey());
// ^-- You'll probably need to prefix this with something, it's not likely to be part of this instance
if (columnName == null)
{
// Somehow, the user got an unknown field into the request
// and that got past the code calling us (perhaps the code
// calling us just used `request.getParameterMap` directly).
// We don't allow unknown fields.
throw new IllegalArgumentException(/* ... */);
}
if (paramValues.size() > 0)
{
sql.append("and ");
}
sql.append(columnName);
sql.append(" = ? ");
paramValues.add(entry.getValue());
}
}
// I'll assume no parameters is an invalid case, but you can adjust the
// below if that's not correct.
if (paramValues.size() == 0)
{
// My read of the problem being solved suggests this is not an
// exceptional condition (users frequently forget to fill things
// in), and so I'd use a flag value (null) for this case. But you
// might go with an exception (you'd know best), either way.
rv = null;
}
else
{
// Do the DB work (below)
rv = this.buildBeansFor(sql.toString(), paramValues);
}
// Done
return rv;
}
private AppropriateResultBean[] buildBeansFor(
String sql,
List<String> paramValues
)
throws SQLException
{
PreparedStatement ps = null;
Connection con = null;
ResultSet rs = null;
int index;
AppropriateResultBean[] rv;
assert sql != null &amp;&amp; sql.length() > 0;
assert paramValues != null &amp;&amp; paramValues.size() > 0;
try
{
// Get a connection
con = /* ...however you get connections, whether it's JNDI or some conn pool or ... */;
// Prepare the statement
ps = con.prepareStatement(sql);
// Fill in the values
index = 0;
for (String value : paramValues)
{
ps.setString(++index, value);
}
// Execute the query
rs = ps.executeQuery();
/* ...loop through results, creating AppropriateResultBean instances
* and filling in your array/list/whatever...
*/
rv = /* ...convert the result to what we'll return */;
// Close the DB resources (you probably have utility code for this)
rs.close();
rs = null;
ps.close();
ps = null;
con.close(); // ...assuming pool overrides `close` and expects it to mean "release back to pool", most good pools do
con = null;
// Done
return rv;
}
finally
{
/* If `rs`, `ps`, or `con` is !null, we're processing an exception.
* Clean up the DB resources *without* allowing any exception to be
* thrown, as we don't want to hide the original exception.
*/
}
}
Note how we use information the user supplied us (the fields they filled in), but we didn't ever put anything they actually supplied directly in the SQL we executed, we always ran it through PreparedStatement.
The best solution is to use a middle tier that does data validation and binding and acts as an intermediary between the JSP and the database.
There might be a list of column names, but it's finite and countable. Let the JSP worry about making the user's selection known to the middle tier; let the middle tier bind and validate before sending it on to the database.
Here is a useful technique for this particular case, where you have a number of clauses in your WHERE but you don't know in advance which ones you need to apply.
Will your user search by title?
select id, title, author from book where title = :title
Or by author?
select id, title, author from book where author = :author
Or both?
select id, title, author from book where title = :title and author = :author
Bad enough with only 2 fields: the number of combinations (and therefore of distinct PreparedStatements) goes up exponentially with the number of conditions. True, chances are you have enough room in your PreparedStatement pool for all those combinations, and to build the clauses programmatically in Java you just need one if branch per condition. Still, it's not that pretty.
You can fix this in a neat way by simply composing a SELECT that looks the same regardless of whether each individual condition is needed.
I hardly need mention that you use a PreparedStatement as suggested by the other answers, and a NamedParameterJdbcTemplate is nice if you're using Spring.
Here it is:
select id, title, author
from book
where coalesce(:title, title) = title
and coalesce(:author, author) = author
Then you supply NULL for each unused condition. coalesce() is a function that returns its first non-null argument. Thus if you pass NULL for :title, the first clause is where coalesce(NULL, title) = title, which evaluates to where title = title and so has no effect on the results. (One caveat: rows where the column itself is NULL are still filtered out, since NULL = NULL does not evaluate to true.)
Depending on how the optimiser handles such queries, you may take a performance hit. But probably not in a modern database.
(Though similar, this problem is not the same as the IN (?, ?, ?) clause problem, where you don't know the number of values in the list; here you have a fixed number of possible clauses and just need to activate or deactivate them individually.)
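A hedged sketch of wiring this up with NamedParameterJdbcTemplate (bookRowMapper and the filter variables are illustrative; passing an explicit java.sql.Types.VARCHAR helps drivers such as PostgreSQL's infer the type of a NULL parameter):
String sql = "select id, title, author from book"
           + " where coalesce(:title, title) = title"
           + " and coalesce(:author, author) = author";

MapSqlParameterSource params = new MapSqlParameterSource()
        .addValue("title", titleFilter, Types.VARCHAR)    // null disables this clause
        .addValue("author", authorFilter, Types.VARCHAR); // null disables this clause

List&lt;Book&gt; books = namedParameterJdbcTemplate.query(sql, params, bookRowMapper);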
I'm not sure whether Java has a quote() method like the one widely used in PHP's PDO. That would allow a more flexible query-building approach.
Also, one possible idea would be to create a special class that processes the filter criteria and saves all the placeholders and their values onto a stack.