Cassandra Prepared Statements broken after schema migration - java

I'm preparing statements in the constructor of my repository class, like this:
import static com.datastax.oss.driver.api.querybuilder.QueryBuilder.bindMarker;
import static com.datastax.oss.driver.api.querybuilder.QueryBuilder.selectFrom;

PreparedStatement getStatement = cqlSession.prepare(selectFrom("the_table")
        .all()
        .whereColumn("the_key").isEqualTo(bindMarker())
        .build());
I later bind it to a BoundStatement and read the results like this:
String someColumn = row.isNull("some_column") ? null : row.getString("some_column");
At runtime I run an ALTER TABLE on the table, adding a column. I expected this to work, since I only access columns by name. However, it seems that the driver has internally mapped the column names to column indexes, which are now all broken.
Is this expected? It severely limits what can be done at runtime.
Am I missing something that would have forced a re-prepare of the statement on schema change?
Can I intercept it and re-prepare manually?
I'm running Spring Boot 2.5.9 and driver version 4.11.3.
The Cassandra cluster runs version 3.11.10.

Thanks for the question!
Your symptoms sound similar to the behaviour described in CASSANDRA-10786. If you're dealing with a version before Cassandra 4.0, this could very well be what's going on.
Does the description in that issue match up with what you're seeing?

To add to #absurdfarce's answer, Cassandra 3.7 is a very old release. In fact, it was released all the way back in early 2016.
There have been several important fixes to prepared statements since then. Although a quick browse of those fixes doesn't turn up anything directly related to the issue you reported, they are significant nonetheless.
Additionally, there have been several fixes to schema migration/propagation in the last 6 years. Again, they don't directly relate to your problem, and I'm ordinarily loath to push for an upgrade, but I think 6 years' worth of fixes between C* 3.7 and 3.11.latest merits it. Cheers!
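Not part of either answer above, but a workaround that is often used on pre-4.0 clusters hitting CASSANDRA-10786 is to avoid SELECT * in prepared statements, so that adding a column cannot change the shape of an already-prepared result set. A minimal sketch reusing the question's table, columns and cqlSession:

import static com.datastax.oss.driver.api.querybuilder.QueryBuilder.bindMarker;
import static com.datastax.oss.driver.api.querybuilder.QueryBuilder.selectFrom;

import com.datastax.oss.driver.api.core.cql.PreparedStatement;

// List the columns explicitly instead of .all(); the prepared result set then
// cannot drift when a column is added to the_table at runtime.
PreparedStatement getStatement = cqlSession.prepare(
        selectFrom("the_table")
                .column("the_key")
                .column("some_column")
                .whereColumn("the_key").isEqualTo(bindMarker())
                .build());

The trade-off is that every new column you actually want to read requires touching the query, but that is usually acceptable compared to mis-mapped columns after an ALTER.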

Related

Does deleting from a table of an H2 database handled by Hibernate corrupt the table?

Here is a quick description of the system:
A Java 7 REST client receives JSONs and writes their parsed content into an H2 database via Hibernate.
Some Pentaho Kettle Spoon 4 ETLs connect directly to the same database to read and delete a lot of entries at once.
This solution worked fine in our test environment, but in production (where the traffic is, of course, much higher) the ETLs are often failing with the following error:
Error inserting/updating row
General error: "java.lang.ArrayIndexOutOfBoundsException: -1"; SQL statement:
DELETE FROM TABLE_A
WHERE COLUMN_A < ? [50000-131]
and if I navigate the database I can indeed see that that table is not readable (apparently because it thinks its length is -1?). The error code 50000 is "Generic", so it is of no use.
Apart from the trivial "maybe H2 is not good for an event handler", I've been thinking that the corruption could possibly be caused by a conflict between Kettle and Hibernate, or in other words that nothing should delete from a Hibernate-managed database without Hibernate knowing about it.
My questions to those more experienced with Hibernate than me are:
Is my supposition correct?
Should I redesign my solution so that deletes also go through the same RESTful Hibernate layer?
Should I give up on using H2 for such a system?
Thanks for the help!
EDIT:
The database is created by a simple sh script that runs the following command, which basically uses the provided Shell tool to connect to a non-existing DB, which by default creates it.
$JAVA_HOME/bin/java -cp *thisIsAPath*/h2database/h2/main/h2-1.3.168-redhat-2.jar org.h2.tools.Shell -user $DB_USER -password $DB_PASSWORD -url jdbc:h2:$DB_FOLDER/Temp_SD_DS_EventAgent<<END
So all its parameters are set to version 1.3.168's defaults. Unfortunately, while I can find the current URL settings, I can't find where to look up that version's defaults and experimental options.
I also found the following:
According to the tutorial, "When using Hibernate, try to use the H2Dialect if possible", which I didn't.
The tutorial also says "Please note MVCC is enabled in version 1.4.x by default, when using the MVStore." Does that mean concurrency is disabled/unsupported by default in this older version, and that this is the problem?
The database is created with H2 version 1.3.168 but the consumer uses 1.4.197. Is this a big deal?
I cannot comment on the reliability of the H2 database.
But from an application perspective, I think you should use a locking mechanism, either optimistic or pessimistic locking. This will avoid the conflict situations. Hope this answer helps point you in the right direction.
Article on Optimistic and Pessimistic locking
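To make the suggestion concrete on the Hibernate side, here is a minimal sketch of optimistic locking with a version column; the entity below is invented for illustration and is not taken from the question:

import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Table;
import javax.persistence.Version;

@Entity
@Table(name = "TABLE_A")
public class EventEntry {   // hypothetical entity mapped to the table from the error

    @Id
    private Long id;

    // Hibernate bumps this on every update/delete and includes it in the WHERE
    // clause; a stale row then fails with an optimistic-lock exception instead
    // of silently losing the write.
    @Version
    private Long version;

    // getters and setters omitted for brevity
}

Note that this only protects writes that go through Hibernate; the Kettle jobs deleting rows with raw SQL would still bypass it unless they respect the version column as well.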

Can I keep iBatis and MyBatis in the same application while switching to MyBatis?

The question is the one in the title. A brief explanation follows.
I have an application which uses iBatis 2 and I would like to migrate to the latest version of MyBatis (3.2.0 at the moment of writing). Since I don't have enough time to start and finish the work without having to do other tasks on that application, and considering that creating a branch would require a painful merge at the end, I was wondering whether I could introduce MyBatis and then reach my goal gradually. In the end iBatis would be removed entirely.
Could I run into conflicts along the way? In other words, can iBatis 2.3 and MyBatis 3.2 live together? Maybe some of you have faced the same problem.
I think the migration process is not very complicated at all; it is a task you can achieve in a few hours.
Most of the work is in changing package names. Take a look at this doc.
Anyway, since iBatis and MyBatis use different packages, there should not be any problem working with them at the same time.
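To illustrate the point (this is my own sketch, not something from the answer, and the config file names are hypothetical): iBatis 2 lives under com.ibatis.* while MyBatis 3 lives under org.apache.ibatis.*, so both can sit on the classpath and be bootstrapped independently.

import java.io.Reader;

// iBatis 2.x lives in com.ibatis.*
import com.ibatis.common.resources.Resources;
import com.ibatis.sqlmap.client.SqlMapClient;
import com.ibatis.sqlmap.client.SqlMapClientBuilder;

// MyBatis 3.x lives in org.apache.ibatis.*
import org.apache.ibatis.session.SqlSessionFactory;
import org.apache.ibatis.session.SqlSessionFactoryBuilder;

public class DualMapperBootstrap {

    public static void main(String[] args) throws Exception {
        // Existing mappers keep running on iBatis 2 for now...
        Reader ibatisConfig = Resources.getResourceAsReader("sql-map-config.xml");
        SqlMapClient sqlMapClient = SqlMapClientBuilder.buildSqlMapClient(ibatisConfig);

        // ...while mappers you have already migrated are served by MyBatis 3.
        Reader mybatisConfig =
                org.apache.ibatis.io.Resources.getResourceAsReader("mybatis-config.xml");
        SqlSessionFactory sessionFactory = new SqlSessionFactoryBuilder().build(mybatisConfig);
    }
}

Migrating one mapper at a time then just means moving its statements from the iBatis sqlmap config to the MyBatis config and switching its callers over.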

Hibernate. ClassicQueryTranslatorFactory vs ASTQueryTranslatorFactory

What's the difference between those query translators (I mean the differences for me as a Hibernate user)? Some blogs on the internet say that the ANTLR-based translator is faster, but I reckon that if one of them were clearly better, the Hibernate developers would remove the other one. So... what's the difference, and why do we have both of them? In what situations should I choose the first or the second? In what situations shouldn't I choose one of the translators?
It is an internal Hibernate configuration which was introduced when Hibernate was upgraded to version 3. You should not worry about changing it unless there is a strong reason for it. With the latest versions I don't think you need to change its default value, but if you want, you can test it for performance improvements as described below.
From the Hibernate Core Migration Guide: 3.0:
Query Language Changes
New Parser - Hibernate3 comes with a brand-new, ANTLR-based HQL/SQL query translator. However, the Hibernate 2.1 query parser is still available. The query parser may be selected by setting the Hibernate property hibernate.query.factory_class. The possible values are org.hibernate.hql.ast.ASTQueryTranslatorFactory, for the new query parser, and org.hibernate.hql.classic.ClassicQueryTranslatorFactory, for the old parser. We are working hard to make the new query parser support all queries allowed by Hibernate 2.1.
However, we expect that many existing applications will need to use the Hibernate 2.1 parser during the migration phase. The Hibernate 1.x syntax "from f in class bar.Foo" is no longer supported, use "from bar.Foo as f" or "from bar.Foo f". Don't use dots in named HQL parameter names. Note: there is a known bug affecting dialects with theta-style outer joins (eg. OracleDialect for Oracle 8i, TimesTen dialect, Sybase11Dialect). Try to use a dialect which supports ANSI-style joins (eg. Oracle9Dialect), or fall back to the old query parser if you experience problems.
Here are a forum post and a blog post regarding this issue.
Now, coming to your questions:
What's the difference, and why do we have both of them?
As described in the change log, Hibernate 3 replaces the ClassicQueryTranslatorFactory with the ASTQueryTranslatorFactory. It is an internal change, and users need not worry about it unless the change breaks your application.
In what situations should I choose the first or the second? In what situations shouldn't I choose one of the translators?
By default the ASTQueryTranslatorFactory is enabled; you should consider changing it only if any of your queries break while upgrading to version 3.
Once again, it is a story of the past (2006 or so); the latest version of Hibernate is 4.1 and the query translator must be stable by now. So 99% of the time you do not have to change anything.
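For reference, if one of your queries ever does break, the fallback the migration guide describes can be applied programmatically as well as in hibernate.cfg.xml. A rough Hibernate 3.x sketch, assuming you build the SessionFactory yourself (the property name and factory class come from the guide quoted above):

import org.hibernate.SessionFactory;
import org.hibernate.cfg.Configuration;

// Only fall back to the Hibernate 2.1-style parser if a query breaks under the
// default ANTLR-based one.
Configuration cfg = new Configuration()
        .configure() // reads hibernate.cfg.xml
        .setProperty("hibernate.query.factory_class",
                "org.hibernate.hql.classic.ClassicQueryTranslatorFactory");
SessionFactory sessionFactory = cfg.buildSessionFactory();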

How to diagnose performance problems with SQL Server Views and JDBC

I have a view defined in SQL Server 2008 that joins 4 tables together. Executing this view in SQL Server Management Studio takes roughly 3 seconds and returns about 45,000 records. My application is written in Java, using Hibernate to simply do a "from MyViewObject" query in HQL. When this is run, the execution time is consistently around 45 seconds. I have also tried simply using JDBC to run this query and got the same level of performance, so I've assumed it has nothing to do with Hibernate.
My question: what can I do to diagnose this problem? There is obviously something different between how Management Studio runs the query and how my application runs it, but I have not been able to come up with much.
The only thing I've come up with as a potentially viable explanation is an issue with the jTDS library that contains the JDBC driver for SQL Server in Java.
Any guidance here would be greatly appreciated.
UPDATE
I went back to trying pure JDBC and tried adding the selectMethod and responseBuffering attributes to my connection string but didn't get any improvements. I also took my JDBC code from my application and ran it from a test program containing nothing but my JDBC code and it ran in the expected 3 seconds. So to me this seems environmental for the application.
My application is a Google Web Toolkit (GWT) based app, and the JDBC code is being run in my primary RPC servlet. Essentially, the RPC method receives the call and immediately executes the JDBC code. Nothing in this setup gives me much indication of why the performance is terrible, though. I am going to try the JDBC 3.0 driver and see if that works any better, but it doesn't feel like that will fix the issue to me quite yet.
My goal for the moment is to get my query working live with JDBC and then switch it back over to Hibernate so I can keep the testing simple enough. Thanks for the help so far!
UPDATE 2
I'm finally starting to zero in on the source of the problem, though still no idea what the actual issue is. I opened up the view in SQL Server and copied the SQL statement (rather large) exactly into my code and executed it using JDBC instead of pulling the data from the view and most of the performance issues are gone. It seems that some combination of GWT, SQL Server Views and JDBC is not working properly here. I don't see keeping a very large hand-written query in my code as a long term solution, but it does offer a bit more insight.
<property name="hibernate.show_sql">true</property>
Setting this will show you the SQL query generated by Hibernate. Analyze the query and make sure you are not missing a relationship.
Reply to Update 1 and 2:
Like you mentioned, you ran the raw SQL query yourself and it is fast. Another thing to remember about Hibernate is that it instantiates the objects returned by your query (of course this depends on whether you initialize lazy objects; I don't remember what it is called). How many objects does your query return? You can also do a simple benchmark to find where the issue is.
For example, print the current time before running the query and again after it returns, and do this for every place that you suspect is slowing your application down, as sketched below.
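A tiny version of that timing check, just to make it concrete (runMyViewQuery() is a placeholder for whatever step you are measuring):

// Crude but effective: bracket each suspect step with timestamps to see where
// the 45 seconds actually go (the JDBC call, Hibernate's object mapping, the GWT RPC layer...).
long start = System.currentTimeMillis();
List<MyViewObject> results = runMyViewQuery();  // placeholder for the step under suspicion
System.out.println("query + mapping took "
        + (System.currentTimeMillis() - start) + " ms, rows = " + results.size());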
To analyze the problem you should look in your database's manual for tools that display the query plan or execution plan. Maybe you're missing an index on a join column.

Large ResultSet on postgresql query

I'm running a query against a table in a PostgreSQL database. The database is on a remote machine. The table has around 30 sub-tables using PostgreSQL's partitioning capability.
The query will return a large result set, something around 1.8 million rows.
In my code I use Spring JDBC support, specifically the JdbcTemplate.query method, but my RowCallbackHandler is not being called.
My best guess is that the PostgreSQL JDBC driver (I use version 8.3-603.jdbc4) is accumulating the result in memory before calling my code. I thought the fetchSize configuration could control this, but I tried it and nothing changed. I did this as the PostgreSQL manual recommended.
This query worked fine when I used Oracle XE, but I'm trying to migrate to PostgreSQL because of the partitioning feature, which is not available in Oracle XE.
My environment:
Postgresql 8.3
Windows Server 2008 Enterprise 64-bit
JRE 1.6 64-bit
Spring 2.5.6
Postgresql JDBC Driver 8.3-603
In order to use a cursor to retrieve data you have to set the ResultSet type to ResultSet.TYPE_FORWARD_ONLY (the default) and set autocommit to false, in addition to setting a fetch size. That is referenced in the doc you linked to, but you didn't explicitly mention that you did those steps.
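For completeness, here is roughly what that looks like in plain JDBC against the PostgreSQL driver; the URL and query are placeholders, and the three conditions (forward-only result set, autocommit off, non-zero fetch size) are the point:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

Connection conn = DriverManager.getConnection(
        "jdbc:postgresql://dbhost:5432/mydb", "user", "password");  // placeholder URL
conn.setAutoCommit(false);                        // cursors only work inside a transaction

PreparedStatement ps = conn.prepareStatement(
        "SELECT * FROM the_partitioned_table",    // placeholder query
        ResultSet.TYPE_FORWARD_ONLY,              // the default, stated here for clarity
        ResultSet.CONCUR_READ_ONLY);
ps.setFetchSize(1000);                            // stream 1000 rows per round trip

ResultSet rs = ps.executeQuery();
while (rs.next()) {
    // handle one row at a time instead of materializing all 1.8M rows
}
rs.close();
ps.close();
conn.commit();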
Be careful with PostgreSQL's partitioning scheme. It really does very horrible things to the optimizer and can cause massive performance issues where there should not be any (depending on the specifics of your data). In any case, is your table only 1.8M rows? There is no reason it would need to be partitioned based on size alone, given that it is appropriately indexed.
I'm betting that there's not a single client of your app that needs all 1.8M rows at the same time. You should think of a sensible way to chunk the results into smaller pieces and give users the chance to iterate through them.
That's what Google does. When you do a search there might be millions of hits, but they return only a page of results at a time, with the idea that you'll find what you want on the first page.
If it's not a client, and the results are being massaged in some way, I'd recommend letting the database crunch all those rows and simply return the result. It makes no sense to return 1.8M rows just to do a calculation on the middle tier.
If neither of those apply, you've got a real problem. Time to rethink it.
After reading the later responses it sounds to me like this is more of a reporting solution that ought to be crunched in batch or calculated in real time and stored in tables that are not part of your transactional system. There's no way that bringing 1.8M rows to the middle tier for calculating moving averages can scale.
I'd recommend reorienting yourself - start thinking about it as a reporting solution.
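If you do go the chunking route, one hypothetical way to page through the data with plain JDBC is keyset pagination; nothing here comes from the original question, the table and column names are invented, and conn is assumed to be an open Connection:

import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Fetch the table in pages of 1000 rows, keyed on the primary key, instead of
// pulling all 1.8M rows in one go.
long lastSeenId = 0;
boolean more = true;
while (more) {
    PreparedStatement ps = conn.prepareStatement(
            "SELECT id, payload FROM big_partitioned_table "   // hypothetical table
            + "WHERE id > ? ORDER BY id LIMIT 1000");
    ps.setLong(1, lastSeenId);
    ResultSet rs = ps.executeQuery();
    more = false;
    while (rs.next()) {
        lastSeenId = rs.getLong("id");
        more = true;
        // process one row
    }
    rs.close();
    ps.close();
}

Each iteration is a short, independent query, so it also behaves well with connection pools and keeps memory usage flat.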
The fetchSize property worked as described in the PostgreSQL manual.
My mistake was that I was setting autocommit = false on a connection from the connection pool that was not the connection being used by the prepared statement.
Thanks for all the feedback.
I did everything above, but I needed one last piece: be sure the call is wrapped in a transaction and set the transaction to read only, so that no rollback state is required.
I added this: @Transactional(readOnly = true)
Cheers.
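Pulling the whole thread together, a rough sketch of what the working combination could look like with Spring's JdbcTemplate; the query, fetch size and bean wiring are placeholders, and Spring transaction management is assumed to be configured:

import java.sql.ResultSet;
import java.sql.SQLException;

import javax.sql.DataSource;

import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.jdbc.core.RowCallbackHandler;
import org.springframework.transaction.annotation.Transactional;

public class BigTableReader {

    private final JdbcTemplate jdbcTemplate;

    public BigTableReader(DataSource dataSource) {
        this.jdbcTemplate = new JdbcTemplate(dataSource);
        // Ask the driver to stream in chunks instead of buffering the whole result.
        this.jdbcTemplate.setFetchSize(1000);
    }

    // readOnly plus an active transaction keeps autocommit off on the same
    // pooled connection the statement actually runs on, which is what lets
    // the PostgreSQL driver use a cursor.
    @Transactional(readOnly = true)
    public void processAllRows() {
        jdbcTemplate.query("SELECT * FROM big_partitioned_table",  // placeholder query
                new RowCallbackHandler() {
                    public void processRow(ResultSet rs) throws SQLException {
                        // handle one row at a time here
                    }
                });
    }
}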
