JPA Deadlock when delete from a table in several parallel threads

I have several threads, each with its own transaction and EntityManager, that must refresh the events of an object. To refresh the events, I first delete the old ones and then persist the new ones. With one thread this works well, but with several threads a deadlock occurs when deleting an event.
All threads are deleting different objects, and sometimes in different tables, so why is there contention for resources? I'm using the primary key so that JPA locks the right object. I looked for other code that might also be using the resource, but I didn't find any. Is JPA locking the whole table instead of the row?
Exception in thread "Thread-4" javax.persistence.PersistenceException: Exception [EclipseLink-4002] (Eclipse Persistence Services - 2.6.0.v20150309-bf26070): org.eclipse.persistence.exceptions.DatabaseException
Internal Exception: java.sql.SQLException: ORA-00060: deadlock detected while waiting for resource
Error Code: 60
Call: DELETE FROM event WHERE ((id = ?) AND (version = ?))
bind => [426687, 1]
Query: DeleteObjectQuery(Event[id=426687,tipo=BDE,status=1,data=java.util.GregorianCalendar[time=1431489600000,areFieldsSet=true,areAllFieldsSet=true,lenient=true,zone=sun.util.calendar.ZoneInfo[id="America/New_York",offset=-18000000,dstSavings=3600000,useDaylight=true,transitions=235,lastRule=java.util.SimpleTimeZone[id=America/New_York,offset=-18000000,dstSavings=3600000,useDaylight=true,startYear=0,startMode=3,startMonth=2,startDay=8,startDayOfWeek=1,startTime=7200000,startTimeMode=0,endMode=3,endMonth=10,endDay=1,endDayOfWeek=1,endTim
Here is the Oracle trace file:
*** 2015-07-05 15:21:02.351
DEADLOCK DETECTED ( ORA-00060 )
[Transaction Deadlock]
The following deadlock is not an ORACLE error. It is a
deadlock due to user error in the design of an application
or from issuing incorrect ad-hoc SQL. The following
information may aid in determining the deadlock:
Deadlock graph:
---------Blocker(s)-------- ---------Waiter(s)---------
Resource Name process session holds waits process session holds waits
TM-00007874-00000000 44 29 SX 51 185 SX SSX
TX-00020021-00000a5f 51 185 X 44 29 X
session 29: DID 0001-002C-0000000D session 185: DID 0001-0033-00000004
session 185: DID 0001-0033-00000004 session 29: DID 0001-002C-0000000D
Rows waited on:
Session 29: obj - rowid = 00007874 - AAAHh0AABAAAO2fAAX
(dictionary objn - 30836, file - 1, block - 60831, slot - 23)
Session 185: no row
----- Information for the OTHER waiting sessions -----
Session 185:
sid: 185 ser: 3883 audsid: 422763 user: 55/LUPAZUL_DEV
flags: (0x41) USR/- flags_idl: (0x1) BSY/-/-/-/-/-
flags2: (0x40009) -/-
pid: 51 O/S info: user: oracle, term: UNKNOWN, ospid: 9977
image: oracle#sydney-oracle11gexpress
client details:
O/S info: user: Pickler, term: unknown, ospid: 1234
machine: MacBook program: JDBC Thin Client
application name: JDBC Thin Client, hash value=2546894660
current SQL:
DELETE FROM correios_event WHERE ((id = :1 ) AND (version = :2 ))
----- End of information for the OTHER waiting sessions -----
Information for THIS session:
----- Current SQL Statement for this session (sql_id=fcrp8hfyatd79) -----
DELETE FROM correios_destiny WHERE ((id = :1 ) AND (version = :2 ))

I solved the bug.
I had two JPA entities with a bidirectional @OneToOne relationship. When I called EntityManager.remove(), I was not passing the owner of the relationship, and this is what led to Oracle reporting the deadlock.
As a result, EclipseLink generated two DELETE statements for the same entity.
It looks like a bug to me; the JPA implementation should handle this.
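The deadlock graph in the question has the classic shape: sessions 29 and 185 each hold one resource and wait for the one held by the other. As a generic illustration outside JPA, the sketch below shows why taking locks in one fixed order avoids that cycle. The lock names are hypothetical, and plain ReentrantLock objects stand in for Oracle's table and row locks; this is not what EclipseLink does internally.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

class LockOrderSketch {
    // Hypothetical stand-ins for the two resources in the deadlock graph.
    static final ReentrantLock EVENT = new ReentrantLock();
    static final ReentrantLock DESTINY = new ReentrantLock();

    /** Always locks EVENT before DESTINY, so no thread can hold DESTINY while waiting for EVENT. */
    static void deleteBoth(Runnable work) {
        EVENT.lock();
        try {
            DESTINY.lock();
            try {
                work.run();
            } finally {
                DESTINY.unlock();
            }
        } finally {
            EVENT.unlock();
        }
    }

    /** Runs two workers in parallel; returns true if both finished within the timeout. */
    static boolean runTwoWorkers() throws InterruptedException {
        CountDownLatch done = new CountDownLatch(2);
        for (int i = 0; i < 2; i++) {
            Thread worker = new Thread(() -> {
                for (int n = 0; n < 1_000; n++) {
                    deleteBoth(() -> { /* the two DELETE statements would run here */ });
                }
                done.countDown();
            });
            worker.setDaemon(true); // do not keep the JVM alive if something goes wrong
            worker.start();
        }
        return done.await(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("both workers finished: " + runTwoWorkers());
    }
}
```

If one worker took DESTINY first and the other took EVENT first, each could end up holding one lock while waiting for the other, which is the situation the trace file describes.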

Related

SQL Warning with Spring boot and Sybase

I am getting a lot of SQL warnings in the logs with Spring Boot and Sybase:
o.h.engine.jdbc.spi.SqlExceptionHelper : [] SQL Warning Code: 0, SQLState: 010SK
o.h.engine.jdbc.spi.SqlExceptionHelper : [] 010SK: Database cannot set connection option SET_READONLY_TRUE.
o.h.engine.jdbc.spi.SqlExceptionHelper : [] 010SK: Database cannot set connection option SET_READONLY_FALSE.
Could anyone explain the reason behind this?

Solution 1:
java.sql.Connection has a setReadOnly(boolean) method that puts the connection in read-only mode as a hint to the driver, so that the database can apply optimizations. Sybase ASE does not support this option, so setReadOnly() produces a SQLWarning instead.
To suppress the message you need to update the spt_mda table in the master database:
update spt_mda set querytype = 4, query = '0'
where mdinfo = 'SET_READONLY_FALSE'
and
update spt_mda set querytype = 4, query = '0'
where mdinfo = 'SET_READONLY_TRUE'
These two entries (they are the only ones) have a querytype of 3 by default, which means "not supported"; that is what triggers the SQLWarning.
Changing querytype to 4 (meaning a boolean value) with a query of '0' causes the JDBC driver to return false without raising the warning.
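To see where the warning comes from on the Java side, the warning chain on the connection can be read right after calling setReadOnly(). The following is a minimal sketch using only java.sql; fakeConnection() is a hypothetical stand-in that imitates a driver answering with SQLState 010SK, it is not the Sybase driver, and the class and method names are made up for illustration.

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.SQLWarning;
import java.util.ArrayList;
import java.util.List;

class ReadOnlyWarningSketch {

    /** Calls setReadOnly and returns the SQLState of every warning left on the connection. */
    static List<String> setReadOnlyAndCollectWarnings(Connection connection, boolean readOnly)
            throws SQLException {
        connection.clearWarnings();        // start from an empty warning chain
        connection.setReadOnly(readOnly);  // only a hint; a driver may answer with a warning
        List<String> states = new ArrayList<>();
        for (SQLWarning w = connection.getWarnings(); w != null; w = w.getNextWarning()) {
            states.add(w.getSQLState());
        }
        return states;
    }

    /** Hypothetical stand-in for a driver that cannot honor the read-only hint. */
    static Connection fakeConnection() {
        SQLWarning[] chain = new SQLWarning[1];
        return (Connection) Proxy.newProxyInstance(
                ReadOnlyWarningSketch.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, methodArgs) -> {
                    switch (method.getName()) {
                        case "setReadOnly":
                            chain[0] = new SQLWarning("Database cannot set connection option", "010SK");
                            return null;
                        case "getWarnings":
                            return chain[0];
                        case "clearWarnings":
                            chain[0] = null;
                            return null;
                        default:
                            throw new UnsupportedOperationException(method.getName());
                    }
                });
    }

    public static void main(String[] args) throws SQLException {
        System.out.println(setReadOnlyAndCollectWarnings(fakeConnection(), true));
    }
}
```

With a real connection, the same helper would show whether the spt_mda change has taken effect: the returned list should then be empty.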
Solution 2:
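You can also turn logging on or off for specific Hibernate logging categories; the warnings above are written by the org.hibernate.engine.jdbc.spi.SqlExceptionHelper logger. These are the relevant log4j settings:
# Hibernate logging
# Root Hibernate logger; FATAL hides the warnings above (use ALL to log everything, which is very useful for troubleshooting)
log4j.logger.org.hibernate=FATAL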
# Log all SQL DML statements as they are executed
log4j.logger.org.hibernate.SQL=INHERITED
# Log all JDBC parameters
log4j.logger.org.hibernate.type=INHERITED
# Log all SQL DDL statements as they are executed
log4j.logger.org.hibernate.tool.hbm2ddl=INHERITED
# Log the state of all entities (max 20 entities) associated with the session at flush time
log4j.logger.org.hibernate.pretty=INHERITED
# Log all second-level cache activity
log4j.logger.org.hibernate.cache=INHERITED
# Log all OSCache activity - used by Hibernate
log4j.logger.com.opensymphony.oscache=INHERITED
# Log transaction related activity
log4j.logger.org.hibernate.transaction=INHERITED
# Log all JDBC resource acquisition
log4j.logger.org.hibernate.jdbc=INHERITED
# Log all JAAS authorization requests
log4j.logger.org.hibernate.secure=INHERITED
Possible values:
OFF
FATAL
ERROR
WARN
INFO
DEBUG
TRACE
ALL

ibm mq test return MQJE001: Completion Code '2', Reason '2035'

I have a web app that sends messages to a queue. It is deployed on WebSphere Application Server and works very well.
I am trying to build a lightweight environment for automated tests, but when I try to send a message to the queue from a test, it returns MQJE001: Completion Code '2', Reason '2035'.
I thought the problem was in the CHLAUTH rules, but it seems that I have all the rights:
C:/> dspmqaut -m M00.EDOGO -n OEP.FROM.GW_SBAST.DLV -t q -p out-bychek-ao
Entity out-bychek-ao has the following authorizations for object OEP.FROM.GW_SBA
ST.DLV:
get
browse
put
inq
set
crt
dlt
chg
dsp
passid
passall
setid
setall
clr
Error from the logs:
AMQ8075: Authorization failed because the SID for entity 'out-bychek-a' cannot
be obtained.
EXPLANATION:
The Object Authority Manager was unable to obtain a SID for the specified
entity. This could be because the local machine is not in the domain to locate
the entity, or because the entity does not exist.
ACTION:
Ensure that the entity is valid, and that all necessary domain controllers are
available. This might mean creating the entity on the local machine.
----- amqzfubn.c : 2252 -------------------------------------------------------
7/9/2018 15:39:57 - Process(2028.3) User(MUSR_MQADMIN) Program(amqrmppa.exe)
Host(SBT-ORSEDG-204) Installation(Installation1)
VRMF(7.5.0.4) QMgr(M00.EDOGO)
AMQ9557: Queue Manager User ID initialization failed.
EXPLANATION:
The call to initialize the User ID failed with CompCode 2 and Reason 2035.
ACTION:
Correct the error and try again.
----- cmqxrsrv.c : 1975 -------------------------------------------------------
7/9/2018 15:39:57 - Process(2028.3) User(MUSR_MQADMIN) Program(amqrmppa.exe)
Host(SBT-ORSEDG-204) Installation(Installation1)
VRMF(7.5.0.4) QMgr(M00.EDOGO)
AMQ9999: Channel 'SC.EDOGO' to host '10.82.38.188' ended abnormally.
EXPLANATION:
The channel program running under process ID 2028(11564) for channel 'SC.EDOGO'
ended abnormally. The host name is '10.82.38.188'; in some cases the host name
cannot be determined and so is shown as '????'.
ACTION:
Look at previous error messages for the channel program in the error logs to
determine the cause of the failure. Note that this message can be excluded
completely or suppressed by tuning the "ExcludeMessage" or "SuppressMessage"
attributes under the "QMErrorLog" stanza in qm.ini. Further information can be
found in the System Administration Guide.
----- amqrmrsa.c : 909 --------------------------------------------------------
Notice that in the AMQ8075 message ("Authorization failed because the SID for entity 'out-bychek-a' cannot be obtained") my account name has lost its last letter. Is that normal?
Here is the channel authentication check:
DISPLAY CHLAUTH('SYSTEM.DEF.SVRCONN') MATCH(RUNCHECK) ALL ADDRESS('127.0.0.1') CLNTUSER('out-bychek-ao')
7 : DISPLAY CHLAUTH('SYSTEM.DEF.SVRCONN') MATCH(RUNCHECK) ALL ADDRESS('127.0.0.1') CLNTUSER('out-bychek-ao')
AMQ8898: Display channel authentication record details - currently disabled.
CHLAUTH(SYSTEM.*) TYPE(ADDRESSMAP)
DESCR(Default rule to disable all SYSTEM channels)
CUSTOM( ) ADDRESS(*)
USERSRC(NOACCESS) WARN(NO)
ALTDATE(2016-11-14) ALTTIME(17.33.34)
dmpmqaut -m M00.EDOGO -n OEP.FROM.GW_SBAST.DLV -t q -p out-bychek-ao -e
profile : OEP.FROM.GW_SBAST.DLV
object type: queue
entity : out-bychek-ao#alpha
entity tyoe: principal
authority : allmqi dlt chg dsp clr
- - - - - - - - -
profile : CLASS
object type: queue
entity : out-bychek-ao#alpha
entity tyoe: principal
authority : clt
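On the truncated name: 'out-bychek-a' is exactly the first 12 characters of 'out-bychek-ao', which is consistent with the 12-character limit that IBM MQ applies to user identifiers. A small check, assuming that limit is what is at work here (the class and constant below are illustrative, not part of the MQ client API):

```java
class MqUserIdLengthSketch {
    // Assumption: the 12-character limit that IBM MQ applies to user identifiers.
    static final int MQ_USER_ID_LENGTH = 12;

    /** Returns the name as it would look after being cut to the assumed limit. */
    static String truncate(String userId) {
        return userId.length() <= MQ_USER_ID_LENGTH
                ? userId
                : userId.substring(0, MQ_USER_ID_LENGTH);
    }

    public static void main(String[] args) {
        String configured = "out-bychek-ao"; // name used with dspmqaut in the question
        String inErrorLog = "out-bychek-a";  // name reported by AMQ8075
        System.out.println(truncate(configured).equals(inErrorLog));
    }
}
```

If that is the cause, the authorizations granted to 'out-bychek-ao' would not apply to the shortened name that the queue manager tries to resolve, which would fit the 2035 (not authorized) reason code.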

Hazelcast - client mode topology / distributed map lock issue

Below is a description of a problem we faced in production. Please note that I could not reproduce the issue in a test or local environment and therefore cannot provide test code.
We have a Hazelcast cluster with two members, M1 and M2, and three clients, C1, C2 and C3. The Hazelcast version is 3.9.
Clients use the IMap.tryLock() method with a timeout of 10 seconds. After the lock is acquired, critical and long-running operations are performed, and finally the lock is released using the IMap.unlock() method.
The problem that occurred in production is as follows:
At some time instant t, we first saw a heartbeat failure to M2 at client C2. Afterwards there were errors fetching the partition table, caused by com.hazelcast.spi.exception.TargetDisconnectedException:
[hz.client_0.internal-2 ] WARN [] HeartbeatManager - hz.client_0 [mygroup] [3.9] HeartbeatManager failed to connection: .....
[hz.client_0.internal-3 ] WARN [] ClientPartitionService - hz.client_0 [mygroup] [3.9] Error while fetching cluster partition table!
java.util.concurrent.ExecutionException: com.hazelcast.spi.exception.TargetDisconnectedException: Heartbeat timed out to owner connection ClientConnection{alive=true, connectionId=1, ......
Around 250 ms after the initial heartbeat failure, the client gets disconnected and then reconnects within 20 ms.
[hz.client_0.cluster- ] INFO [] LifecycleService - hz.client_0 [mygroup] [3.9] HazelcastClient 3.9 (20171023 - b29f549) is CLIENT_DISCONNETED
[hz.client_0.cluster- ] INFO [] LifecycleService - hz.client_0 [mygroup] [3.9] HazelcastClient 3.9 (20171023 - b29f549) is CLIENT_CONNECTED
The problem is that, for some keys previously locked by C2, C1 and C3 cannot acquire the lock even though it appears to have been released by C2. C2 itself can still get the lock, but this introduces unacceptable delays in the application. All clients should be able to acquire the lock once it has been released.
We were notified of the problem after receiving complaints, and then restarted the client application C2.
As documented in http://docs.hazelcast.org/docs/latest-development/manual/html/Distributed_Data_Structures/Lock.html, locks acquired by the restarted member (C2 in my case) appear to have been removed by the restart.
Currently the issue seems to have gone away, but we are not sure whether it will recur.
Do you have any suggestions about the probable cause and, more importantly, any recommendations?
Would enabling redo-operation in the client help in this case?
As I tried to explain, the client seems to recover from the problem, but the keys remain locked in the cluster, and this is fatal to my application.
Thanks

It looks like the client lost ownership of the lock when it was disconnected from the cluster. In cases like yours you can use the IMap#forceUnlock API. It releases the lock regardless of the lock owner; it always succeeds, never blocks, and returns immediately.

jooq insert throws an exception when another thread is reading from same table

I have a table into which I insert records using the record.insert() method. I believe this method does an insert and then a select, but in different transactions. At the same time, another thread polls this table for records, processes them and then deletes them.
In some cases I am getting the below exception:
org.jooq.exception.NoDataFoundException: Exactly one row expected for refresh. Record does not exist in database.
at org.jooq.impl.UpdatableRecordImpl.refresh(UpdatableRecordImpl.java:345)
at org.jooq.impl.TableRecordImpl.getReturningIfNeeded(TableRecordImpl.java:232)
at org.jooq.impl.TableRecordImpl.storeInsert0(TableRecordImpl.java:208)
at org.jooq.impl.TableRecordImpl$1.operate(TableRecordImpl.java:169)
My solution was to use DSL.using(configuration()).insertInto instead of record.insert().
My question is: shouldn't the insert and the fetch be done in the same transaction?
UPDATE:
This is a Dropwizard app that uses the jOOQ bundle com.bendb.dropwizard:dropwizard-jooq.
The configuration is injected in the DAO, the insert is as follows:
R object = // jooq record
object.attach(configuration);
object.insert();
On the second thread I am just selecting some records from this table, processing them and then deleting them.
The jOOQ logs clearly show that the two queries are not run in the same transaction:
14:07:09.550 [main] DEBUG org.jooq.tools.LoggerListener - -> with bind values : insert into "queue"
....
14:07:09.083', 1)
14:07:09.589 [main] DEBUG org.jooq.tools.LoggerListener - Affected row(s) : 1
14:07:09.590 [main] DEBUG org.jooq.tools.StopWatch - Query executed : Total: 47.603ms
14:07:09.591 [main] DEBUG org.jooq.tools.StopWatch - Finishing : Total: 48.827ms, +1.223ms
14:07:09.632 [main] DEBUG org.jooq.tools.LoggerListener - Executing query : select "queue"."
I do not see the "autocommit off" or "savepoint" statements in the logs that jOOQ generally prints when queries run in a transaction. I hope this helps; let me know if you need more info.
UPDATE 2:
jOOQ version is 3.9.1
MySQL version is 5.6.23
Database and jOOQ entries in the yml file:
database:
  driverClass: com.mysql.jdbc.Driver
  user: ***
  password: ***
  url: jdbc:mysql://localhost:3306/mySchema
  properties:
    charSet: UTF-8
    characterEncoding: UTF-8
  # the maximum amount of time to wait on an empty pool before throwing an exception
  maxWaitForConnection: 1s
  # the SQL query to run when validating a connection's liveness
  validationQuery: "SELECT 1"
  # the timeout before a connection validation queries fail
  validationQueryTimeout: 3s
  # initial number of connections
  initialSize: 25
  # the minimum number of connections to keep open
  minSize: 25
  # the maximum number of connections to keep open
  maxSize: 25
  # whether or not idle connections should be validated
  checkConnectionWhileIdle: true
  # the amount of time to sleep between runs of the idle connection validation, abandoned cleaner and idle pool resizing
  evictionInterval: 10s
  # the minimum amount of time an connection must sit idle in the pool before it is eligible for eviction
  minIdleTime: 1 minute
jooq:
  # The flavor of SQL to generate. If not specified, it will be inferred from the JDBC connection URL. (default: null)
  dialect: MYSQL
  # Whether to write generated SQL to a logger before execution. (default: no)
  logExecutedSql: no
  # Whether to include schema names in generated SQL. (default: yes)
  renderSchema: yes
  # How names should be rendered in generated SQL. One of QUOTED, AS_IS, LOWER, or UPPER. (default: QUOTED)
  renderNameStyle: QUOTED
  # How keywords should be rendered in generated SQL. One of LOWER, UPPER. (default: UPPER)
  renderKeywordStyle: UPPER
  # Whether generated SQL should be pretty-printed. (default: no)
  renderFormatted: no
  # How parameters should be represented. One of INDEXED, NAMED, or INLINE. (default: INDEXED)
  paramType: INDEXED
  # How statements should be generated; one of PREPARED_STATEMENT or STATIC_STATEMENT. (default: PREPARED_STATEMENT)
  statementType: PREPARED_STATEMENT
  # Whether internal jOOQ logging should be enabled. (default: no)
  executeLogging: no
  # Whether optimistic locking should be enabled. (default: no)
  executeWithOptimisticLocking: yes
  # Whether returned records should be 'attached' to the jOOQ context. (default: yes)
  attachRecords: yes
  # Whether primary-key fields should be updatable. (default: no)
  updatablePrimaryKeys: no
I have included the jOOQ bundle in the Application class as described in https://github.com/benjamin-bader/droptools/tree/master/dropwizard-jooq.
I am using https://github.com/xvik/dropwizard-guicey to inject the configuration into each DAO.
The Guice module has the following binding:
bind(Configuration.class).toInstance(jooqBundle.getConfiguration());

Apache Solr - waiting on sql queries when delta-import command is executed

I'm using PostgreSQL 8.2.9, Solr 3.1, Tomcat 5.5
I have the following problem:
When I execute a delta-import (/dataimport?command=delta-import), update queries to the database do not respond for about 30 seconds.
I can easily reproduce this behaviour (using psql or Hibernate):
PSQL:
Execute delta-import
Immediately run this SQL query in psql several times: 'UPDATE table SET ... WHERE id = 1;'
From the second or third run onwards, I must wait ~30 seconds for the query to return
Hibernate:
In the logs, Hibernate waits ~30 seconds in method 'doExecuteBatch(...)' after setting the query parameters
No other queries are executed while I'm testing this. On the other hand, when I execute other commands (like full-import), everything works perfectly fine.
In Solr's dataconfig.xml:
I have the readOnly attribute set to true on the PostgreSQL dataSource.
The deltaImportQuery, deltaQuery, ... on the entity tags don't lock the database (they are simple SELECTs)
Web app (using hibernate) logs:
2012-01-08 18:54:52,403 DEBUG my.package.Class1... Executing method: save
2012-01-08 18:55:26,707 DEBUG my.package.Class1... Executing method: list
Solr logs:
INFO: [] webapp=/search path=/dataimport params={debug=query&command=delta-import&commit=true} status=0 QTime=1
2012-01-08 18:54:50 org.apache.solr.handler.dataimport.DataImporter doDeltaImport
INFO: Starting Delta Import
...
2012-01-08 18:54:50 org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Time taken for getConnection(): 4
...
FINE: Executing SQL: select ... where last_edit_date > '2012-01-08 18:51:43'
2012-01-08 18:54:50 org.apache.solr.core.Config getNode
...
FINEST: Time taken for sql :4
...
INFO: Import completed successfully
2012-01-08 18:54:50 org.apache.solr.update.DirectUpdateHandler2 commit
INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true,expungeDeletes=false)
2012-01-08 18:54:53 org.apache.solr.core.Config getNode
...
2012-01-08 18:54:53 org.apache.solr.update.DirectUpdateHandler2 commit
INFO: end_commit_flush
...
2012-01-08 18:54:53 org.apache.solr.handler.dataimport.DocBuilder execute
INFO: Time taken = 0:0:2.985
There are no 'SELECT ... FOR UPDATE / LOCK / etc.' queries in the logs above.
I have enabled logging for PostgreSQL, and there are no locks. The sessions are even set to:
Jan 11 14:33:07 test postgres[26201]: [3-1] <26201> LOG: execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION READ ONLY
Jan 11 14:33:07 test postgres[26201]: [4-1] <26201> LOG: execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
Why is this happening? It looks like some kind of database lock, but then why are queries still waiting (for 30 seconds) after the import has completed (in 2 seconds)?

The UPDATE is waiting for the SELECT statement to complete before executing. There is not a lot you can do about that, as far as I'm aware. We get around the issue by doing our indexing in batches. Multiple SELECT statements are fine, but UPDATE and DELETE modify the records and won't execute until they can lock the table.

OK. It was hard to find the solution to this problem.
The cause was the underlying platform: disk write saturation. There were too many small disk writes, and they consumed all of the available disk write capacity.
We now have a new agreement with our service layer provider.
Test query:
while true ; do echo "UPDATE table_name SET type='P' WHERE type='P'" | psql -U user -d database_name ; sleep 1 ; done
plus making changes via our other application, plus updating the index simultaneously.
This was before the platform change:
And here is how it works now:
