I am writing an integration test between my JPA layer and the database to check that the SQL I've written is correct. The real database is Oracle; unfortunately, for reasons out of my control, my test database has to be Derby, so naturally there are some differences. For example, my JPA class has the following SQL string constant:
private static final String QUERY = "Select * from Users where regexp_like(user_code, '^SS(B)?N')";
Because Derby doesn't support regexp_like, I am using JMockit's Deencapsulation.setField to change the SQL on the fly, e.g.
@Test
public void testMyDaoFind() {
    new Expectations() {
        {
            Deencapsulation.setField(MyClass.class, "QUERY", "Select * from Users");
        }
    };
    dao.findUsers();
}
Now, ignoring the fact that this isn't a good test as it's not testing the actual query that will run on the real database (this is purely to satisfy my curiosity as to what is going on), I am getting a SQL exception from EclipseLink/Derby complaining that regexp_like is not recognized as a function or a procedure.
If I place a break point on the line in the DAO that attempts to get the result list, I can see from a new watch that
JMockit has substituted the query correctly
getResultList() returns the data I'm expecting to see
If, however, I let the test run all the way through, then I get the aforementioned exception?!
Strings in Java are not handled the way you are thinking. The Java source compiler replaces reads from fields holding string literals with the fixed "address" where the string is stored (in the class' constant pool); the field is not read anymore at runtime. So, even if JMockit replaces the string reference stored in the field, it makes no difference as that reference isn't seen by the client code using the field.
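A hypothetical illustration of that inlining (not the asker's actual DAO):

public class MyDao {
    private static final String QUERY = "Select * from Users where regexp_like(user_code, '^SS(B)?N')";

    public String queryInUse() {
        // javac treats QUERY as a compile-time constant and inlines the literal here,
        // so this method never reads the field at runtime. Replacing the field via
        // reflection (which is what Deencapsulation.setField does) changes something
        // nobody looks at anymore.
        return QUERY;
    }
}

A debugger watch reads the field's current value directly, which is why the breakpoint experiment appears to work even though the compiled DAO code still carries the original literal.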
(BTW, why is the test putting the call to Deencapsulation.setField inside an expectation block? Such blocks are only meant for recording expectations...)
Bottom line, there is no way to achieve what you're trying to do. Instead, either use an Oracle database for integration testing, or make all SQL code portable, avoiding RDBMS-specific functions such as regexp_like.
Here's a little bit of background to give you some context to my question.
I am working on a project with a local company through my college, and the program that I am building must interact with a database and perform basic CRUD operations. The company is insisting that my code be unit tested and that I mock the connection to the database when performing unit tests. I have tried testing my code on a separate database but was told that doing this is an implementation test, not a unit test.
I have written a java class that contains methods which simply call and execute other java.sql methods, such as createStatement() and executeUpdate(...), so that instead of writing four or five lines of code to interact with the database, I can just call another piece of code to automate it slightly for me. Here is an example of one of the methods within my class:
public boolean insertIntoTable(Connection connection, IQueryBuilder queryBuilder,
                               String tableName, String[] dbFields, String[] dbFieldsValues) {
    String query = queryBuilder.insertIntoStatement(tableName, dbFields, dbFieldsValues);
    try {
        Statement st = connection.createStatement();
        st.executeUpdate(query);
        st.close();
        connection.close();
        return true;
    } catch (SQLException exception) {
        return false;
    }
}
The above insertIntoTable method only relies on two other pieces of code: Connection and IQueryBuilder. It returns true if all of the lines within the try execute without fail, and false otherwise.
Connection is an interface from the java.sql package, so we know that its method calls and implementations should work properly.
IQueryBuilder is another interface of mine that returns formatted SQL Strings which are intended to be used by the executeUpdate method of the Statement interface. The implementation of IQueryBuilder that I will be using for this method has been unit tested and approved by the company, so I am assuming the passed implementation at this point functions properly as well.
Thus, we reach my question. How do I unit test something like insertIntoTable that doesn't necessarily have any business logic and must use a mock connection?
Furthermore, what exactly am I testing here? That the method returns true or false? I can mock the connection to the database all I want, no problem. However, I feel that if I mock the connection to the database I'm not really testing anything, since there's no way to know whether my code truly worked or not.
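For what it's worth, here is a minimal sketch of what such a unit test could look like, assuming a mocking library such as Mockito and a hypothetical TableHelper class hosting insertIntoTable (neither of which is prescribed in the question). The point of the test is the interaction: it checks that whatever SQL the IQueryBuilder produces is the SQL that actually gets executed, and that success is reported.

import static org.mockito.Mockito.*;
import static org.junit.Assert.assertTrue;

import java.sql.Connection;
import java.sql.Statement;
import org.junit.Test;

public class TableHelperTest {

    @Test
    public void insertIntoTableExecutesBuiltQueryAndReturnsTrue() throws Exception {
        // Mock the collaborators instead of touching a real database.
        Connection connection = mock(Connection.class);
        Statement statement = mock(Statement.class);
        IQueryBuilder queryBuilder = mock(IQueryBuilder.class);

        when(connection.createStatement()).thenReturn(statement);
        when(queryBuilder.insertIntoStatement(anyString(), any(String[].class), any(String[].class)))
                .thenReturn("INSERT INTO users (name) VALUES ('Bob')");

        boolean result = new TableHelper().insertIntoTable(
                connection, queryBuilder, "users",
                new String[] {"name"}, new String[] {"'Bob'"});

        // The unit under test must execute exactly the SQL the builder produced...
        verify(statement).executeUpdate("INSERT INTO users (name) VALUES ('Bob')");
        // ...and report success.
        assertTrue(result);
    }
}

A second test can stub createStatement() to throw SQLException and assert that false comes back.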
Abstract:
An application I work on uses TopLink, and I'm having trouble finding out if and when TopLink automatically uses bind variables.
Problem Description:
Let's say I need to do something akin to validating whether a vehicle full of people can travel somewhere, where each person could invalidate the trip, and provide error messages so the person could get their restrictions removed before the trip starts. A simple way to do that is to validate each member in the list and display a list of errors. Let's say their info is stored in an Oracle database and I query for each rider's info using their unique ID; this query will be executed for each member in the list. A naïve implementation would cause a hard parse (a new execution path) for every execution, despite only the unique ID changing.
I've been reading about bind variables in SQL, and how they allow for reuse of an execution path, avoiding CPU-intensive hard parses.
A couple links on them are:
http://www.akadia.com/services/ora_bind_variables.html
https://oracle-base.com/articles/misc/literals-substitution-variables-and-bind-variables
An application I work on uses TopLink and does something similar to the situation described above. I'm looking to make the validation faster, without changing the implementation much.
If I do something like the following:
Pseudo-code
public class userValidator {
    private static DataReadQuery GET_USER_INFO;
    static {
        GET_USER_INFO = new DataReadQuery("select * from schema.userInfo ui where ui.id = #accountId");
        GET_USER_INFO.bindAllParameters();
        GET_USER_INFO.cacheStatement();
        GET_USER_INFO.addArgument("accountId", String.class);
    }

    void validate() {
        List<String> listOfUserAccountIds = getUserAccountIdList();
        List args;
        for (String userAccountId : listOfUserAccountIds) {
            args = new ArrayList(1);
            args.add(userAccountId);
            doSomethingWithInfo(getUnitOfWork().executeQuery(GET_USER_INFO, args));
        }
    }
}
The Question:
Will a new execution path be parsed for each execution of GET_USER_INFO?
What I have found:
If I understand the bindAllParameters function inside the DatabaseQuery class well enough, it is simply a type validation to stop SQL injection attacks.
There is also a shouldPrepare function inside the same class; however, that seems to have more to do with allowing dynamic SQL usage where the number of arguments is variable. A prepared DatabaseQuery has its SQL written once, with just the values of the variables changing based on the argument list passed in, which sounds like simple substitution and not bind variables.
So I'm at a loss.
This seems to be answered by the TopLink documentation:
By default, TopLink enables parameterized SQL but not prepared
statement caching.
So prepared statements are used by default, just not cached. This means subsequent queries will have the added cost of re-preparing statements if not optimized by the driver. See this for more information on optimizations within TopLink
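If statement caching is wanted on top of that, it can usually be switched on at the session level. Below is a minimal sketch assuming the EclipseLink naming (Oracle TopLink uses the analogous oracle.toplink.* classes, so exact package names and property keys may differ between versions), registered via the eclipselink.session.customizer persistence-unit property:

import org.eclipse.persistence.config.SessionCustomizer;
import org.eclipse.persistence.sessions.Session;

public class StatementCachingCustomizer implements SessionCustomizer {
    @Override
    public void customize(Session session) throws Exception {
        // Keep prepared statements open and reuse them, so repeated executions of the
        // same parameterized query skip the JDBC re-prepare step.
        session.getLogin().setShouldCacheAllStatements(true);
        session.getLogin().setStatementCacheSize(50); // roughly the number of distinct queries
    }
}

When connections come from an application-server data source, the pool's own statement cache is normally the better place to configure this instead.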
I'm completely new to Spring, JPA and Hibernate, so apologies if I do not phrase this question correctly.
I have been given an application using these frameworks that has multiple performance issues.
I have done numerous searches to no avail - probably due to my inability to phrase my question correctly.
The scenario is (in a nutshell) that when one SQL statement is executed via the EntityManager, it in turn invokes a related query per row of the result set.
In my case, the one SQL statement could cause hundreds (and in some cases thousands) of related SQL statements to be invoked - killing our server.
A contrived example, could be:
public class Job
{
    public long jobId;
    public String name;
    public List<Transaction> transactions;
}

public class Transaction
{
    public long transactionId;
    public List<Stock> stocks;
}

public class Stock
{
    public long stockId;
    public String name;
}
The code uses a statement like:
String jpaQuery = "select distinct j from Job j where j.jobId = :jobId order by name asc";
Query query = entityManager.createQuery(jpaQuery);
//set parameters...
//This will return a List<Job>, containing lists of Transaction and Stock objects
return query.getResultList();
The execution of this single sql statement results in multiple sql statements being executed (which can be viewed in the tomcat log / eclipse console), so they are being invoked by the code/framework.
EntityManager has not been extended, so the framework(s) are doing the work.
Now my question - thanks for reading this far...
Within Spring, JPA and Hibernate, what would be the best way to approach this so one call to the database returns the required objects without multiple requests to the database?
My thought of an approach is to have the single SQL statement invoke a stored procedure that could either return multiple result sets, or just one result set containing data for all related objects, and have the framework do the rest (instantiate the relevant objects).
What I have no idea on is how to approach this within the Spring / JPA / Hibernate environment.
Could someone please point me to resources that would give me examples/ideas on how to approach this? Or keywords to search for?
Version:
Spring: 3.0.6.Release
Hibernate: 3.5.0-Beta-2
Thanks very much
Steve
First, this is not an SQL statement, it is a JPQL statement, which is a huge difference (not syntax wise, but logic wise).
Second, the reason for the multiple SQL statements is the one-to-many relationship from job to transaction to stock (it's called the N+1 problem). Depending on your use case, you could change the relationships to lazy loading. This means they are only loaded when you need the objects. However, this can cause other problems if it is not handled correctly. You could also try to use join fetching. This will create a single query with a join, instead of issuing one subselect per element in the collection.
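For the contrived model above, a join fetch could look roughly like the sketch below (assuming Job.transactions and Transaction.stocks are mapped as one-to-many collections). Note that Hibernate refuses to fetch more than one List-typed collection in a single query (MultipleBagFetchException), so fetching the nested stocks collection as well may require Sets or a second query.

// Load the job together with its transactions in a single SQL statement.
String jpaQuery =
        "select distinct j from Job j "
      + "left join fetch j.transactions t "
      + "where j.jobId = :jobId";

Query query = entityManager.createQuery(jpaQuery);
query.setParameter("jobId", jobId);
List<Job> jobs = query.getResultList(); // unchecked, as in the original snippet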
I'm trying to use setDistinct(true) as it is described in the guide: http://mybatis.github.io/generator/generatedobjects/exampleClassUsage.html
I've written in this way:
testExample ae = new testExample();
testExample.Criteria criteriatest = ae.createCriteria();
ae.setDistinct(true);
criteriatest.andIDENTLAVEqualTo(Long.parseLong(cert.getCODINDIVID()));
ae.or(criteriatest);
List<test> listtest = testMapper.selectByExample(ae);
but the setDistinct(true) doesn't affect the results.
Where should I add the setDistinct line?
It looks like the link you referenced is for an extremely old version of MyBatis. On that page, it lists the following:
Version: 1.3.3-SNAPSHOT
The latest version is:
mybatis-3.3.0-SNAPSHOT
Grepping the 3.x code for setDistinct does not return anything:
https://github.com/mybatis/mybatis-3/search?q=setDistinct
I'm surprised you don't get a compile-time error about the method not being found. Are you using version 1.3.3 (or 1.x)?
I would recommend doing the DISTINCT right in the query. Since MyBatis is generally a sort of a close-to-the-SQL-metal type of mapping framework, I think it's best to add it in the mapper file's query itself. Plus that way, you can choose specifically what to DISTINCT by. The setDistinct method does not seem to provide any way to specify the target.
For MyBatis 3, I think the analogous style of query would be this:
http://mybatis.github.io/mybatis-3/statement-builders.html
This seems to be analogous to a jOOQ-style DSL. It has a SELECT_DISTINCT method. I personally find it easier to code/read the pure SQL with some XML markup as needed for dynamic SQL in a mapper file, but this is certainly a viable option in MyBatis 3.
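For illustration, a minimal sketch of that builder style using org.apache.ibatis.jdbc.SQL; the table and column names here are invented, not taken from your generated example classes:

import org.apache.ibatis.jdbc.SQL;

public class TestSqlProvider {
    public String selectDistinctByIdentLav() {
        return new SQL() {{
            // DISTINCT over exactly the columns you care about
            SELECT_DISTINCT("IDENT_LAV, SOME_OTHER_COLUMN");
            FROM("TEST_TABLE");
            WHERE("IDENT_LAV = #{identLav}");
        }}.toString();
    }
}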
Edit:
So, I did some more digging, and the reason I couldn't find the code in the MyBatis3 git repo is because setDistinct is in the mybatis-generator code base.
I think part of the issue here may stem from MyBatis Generator's description on GitHub:
MBG seeks to make a major impact on the large percentage of database
operations that are simple CRUD (Create, Retrieve, Update, Delete).
So, it provides a way to do simple DISTINCTs, but with limited control.
The code resides in the addClassElements method of the ProviderSelectByExampleWithoutBLOBsMethodGenerator class. Searching for setDistinct won't show up on a Github search since it's an automatically generated setter.
This is the relevant code snippet:
boolean distinctCheck = true;
for (IntrospectedColumn introspectedColumn : getColumns()) {
    if (distinctCheck) {
        method.addBodyLine("if (example != null && example.isDistinct()) {"); //$NON-NLS-1$
        method.addBodyLine(String.format("%sSELECT_DISTINCT(\"%s\");", //$NON-NLS-1$
                builderPrefix,
                escapeStringForJava(getSelectListPhrase(introspectedColumn))));
        method.addBodyLine("} else {"); //$NON-NLS-1$
        method.addBodyLine(String.format("%sSELECT(\"%s\");", //$NON-NLS-1$
                builderPrefix,
                escapeStringForJava(getSelectListPhrase(introspectedColumn))));
        method.addBodyLine("}"); //$NON-NLS-1$
    } else {
        method.addBodyLine(String.format("%sSELECT(\"%s\");", //$NON-NLS-1$
                builderPrefix,
                escapeStringForJava(getSelectListPhrase(introspectedColumn))));
    }
    distinctCheck = false;
}
So, essentially, this looks like it's wrapping the SELECT_DISTINCT method I mentioned originally, and it attempts to introspect the columns and apply the DISTINCT to all of the ones it gets back.
Digging a bit deeper, it ultimately calls this code to get the columns:
/**
 * Returns all columns in the table (for use by the select by primary key
 * and select by example with BLOBs methods)
 *
 * @return a List of ColumnDefinition objects for all columns in the table
 */
public List<IntrospectedColumn> getAllColumns() {
    List<IntrospectedColumn> answer = new ArrayList<IntrospectedColumn>();
    answer.addAll(primaryKeyColumns);
    answer.addAll(baseColumns);
    answer.addAll(blobColumns);
    return answer;
}
So, this is essentially an all-or-nothing DISTINCT (whereas Postgres itself allows DISTINCT on just certain columns).
Try moving the setDistinct to the very last line before you actually invoke the ae object. Perhaps subsequent calls are affecting the column set (although from the code, it doesn't seem like it should -- basically once the columns are set, the setDistinct should use them).
The other thing that would be interesting would be to see what SQL it is actually generating with and without setDistinct.
Check this link out for more detail on debug/logging:
http://mybatis.github.io/generator/reference/logging.html
I'd recommend perhaps trying out the XML-based mapper file definitions, which interleave SQL with XML tags for dynamic-ness. IMO, it's much easier to follow than the MyBatis Generator code snippet above. I suppose that's one of the main tradeoffs with a generator -- easier to create initially, but more difficult to read/maintain later.
For super-dynamic queries, I could see some more advantages, but then that sort of goes against their self-description of it being for simple CRUD operations.
Background: I have started a project using JDBC and MySQL to simulate a bookstore, all local. To connect to the database, I started out using a Statement, but I began to read that when using a query multiple times with only its parameters changing, it can be more efficient to use a PreparedStatement for those queries. However, the advantage I read about the most was how PreparedStatements can prevent SQL injection much better.
Sources:
Answers on this thread here
Google
Professors
My Question:
How do PreparedStatements prevent SQL injection better, or even differently for that matter, than Statements when dealing with parametrized queries? I am confused because, if I understand correctly, the values still get passed into the SQL statement that gets executed; it's just up to the programmer to sanitize the inputs.
You're right that you could do all the sanitation yourself, and thus be safe from injection. But this is more error-prone, and thus less safe. In other words, doing it yourself introduces more chances for bugs that could lead to injection vulnerabilities.
One problem is that escaping rules could vary from DB to DB. For instance, standard SQL only allows string literals in single quotes ('foo'), so your sanitation might only escape those; but MySQL allows string literals in double quotes ("foo"), and if you don't sanitize those as well, you'll have an injection attack if you use MySQL.
If you use PreparedStatement, the implementation for that interface is provided by the appropriate JDBC Driver, and that implementation is responsible for escaping your input. This means that the sanitization code is written by the people who wrote the JDBC driver as a whole, and those people presumably know the ins and outs of the DB's specific escaping rules. They've also most likely tested those escaping rules more thoroughly than you'd test your hand-rolled escaping function.
So, if you write preparedStatement.setString(1, name), the implementation for that method (again, written by the JDBC driver folks for the DB you're using) could be roughly like:
public void setString(int idx, String value) {
    String sanitized = ourPrivateSanitizeMethod(value);
    internalSetString(idx, sanitized);
}
(Keep in mind that the above code is an extremely rough sketch; a lot of JDBC drivers actually handle it quite differently, but the principle is basically the same.)
Another problem is that it could be non-obvious whether myUserInputVar has been sanitized or not. Take the following snippet:
private void updateUser(String name, String id) throws SQLException {
    myStat.executeUpdate("UPDATE user SET name='" + name + "' WHERE id=" + id);
}
Is that safe? You don't know, because there's nothing in the code to indicate whether name is sanitized or not. And you can't just re-sanitize "to be on the safe side", because that would change the input (e.g., hello ' world would become hello '' world). On the other hand, a prepared statement of UPDATE user SET name=? WHERE id=? is always safe, because the PreparedStatement's implementation escapes the inputs before it plugs values into the ?.
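For concreteness, a minimal sketch of that parameterized version, assuming a java.sql.Connection named connection is in scope:

private void updateUser(String name, String id) throws SQLException {
    String sql = "UPDATE user SET name = ? WHERE id = ?";
    try (PreparedStatement ps = connection.prepareStatement(sql)) {
        ps.setString(1, name); // the driver binds/escapes the value, so "hello ' world" arrives intact
        ps.setString(2, id);
        ps.executeUpdate();
    }
}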
When using a PreparedStatement the way it is meant to be used - with a fixed query text with parameter placeholders and no concatenation of external values - you are protected against SQL injection.
There are roughly two ways this protection works:
The JDBC driver properly escapes the values and inserts them in the query at the placeholder positions, and sends the finished query to the server (AFAIK only MySQL Connector/J does this, and only with useServerPrepStmts=false which is the default).
The JDBC driver sends the query text (with placeholders) to the server, the server prepares the query and sends back a description of the parameters (eg type and length). The JDBC driver then collects the parameter values and sends these as a block of parameter values to the server. The server then executes the prepared query using those parameter values.
Given the way a query is prepared and executed by the server, SQL injection cannot occur at this point (unless of course you execute a stored procedure, and that stored procedure creates a query dynamically by concatenation).
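As an aside, with MySQL Connector/J the mode can be chosen via the JDBC URL; a small sketch (database name and credentials are placeholders):

// useServerPrepStmts=false (the default): the driver escapes and substitutes client-side (case 1 above).
// useServerPrepStmts=true: genuine server-side prepared statements (case 2 above).
String url = "jdbc:mysql://localhost:3306/bookstore?useServerPrepStmts=true";
Connection connection = DriverManager.getConnection(url, "user", "password");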
The framework/SQL driver makes sure to escape the input. If you use plain Statements and escape properly, you will achieve the same result, but that is not recommended: prepared statements may look like more lines of code, but they lead to more structured code instead of a soup of long SQL strings.
Plus, since we set each parameter separately and explicitly, the underlying driver class can escape it correctly depending on the database in use. That means you could change the database by configuration and the driver would still take care of the escaping; one database might need backslashes escaped while another might want two single quotes...
This also leads to less code, as you do not need to bother with any of this yourself. Simply put, you let the framework / common classes one level below the application code take care of it.