How to achieve row-level locking in Cassandra - Java

I have a Cassandra cluster with three nodes: N1, N2, and N3.
I have a user table, and I need to implement row-level locking for it across the nodes in the cluster. Could you please guide me by answering the following questions?
1) What is the maximum level of locking possible in Cassandra?
2) What is lightweight transaction? How much it is possible to achieve row level locking?
3) Is there an alternate way to achieve the row level locking in Cassandra?

None, but you might stretch it to say column level.
It uses Paxos for consensus and can perform conditional updates. It doesn't do locking: the update will either succeed or not if another update occurred first, and if it doesn't succeed you can try again. If it does succeed, everything in the "transaction" (poor naming) will apply. However, there is still no isolation within it, so if multiple columns in a row are updated you may read a state between them being applied.
Design your data model so you don't need locking.

There are no transactions in Cassandra, and there is no locking. There are, however, lightweight transactions (LWTs). They're not great: the performance is worse and there are a lot of tradeoffs.
Depending on what the use case is for this lock, you could do:
INSERT INTO User (userID, email)
VALUES ('MyGuid', 'user@example.com')
IF NOT EXISTS;
If the conditional insert is not applied because someone inserted the row before you, that is not an error: the driver hands back a result you have to check and handle yourself. An actual failure, on the other hand, might mean that one of your nodes did get the write but not all of them; LWTs don't roll back.
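From Java, checking that outcome looks roughly like this. A minimal sketch using the DataStax Java driver 3.x API; the contact point and keyspace name are assumptions, not from the question:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Session;

public class LwtInsert {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
             Session session = cluster.connect("mykeyspace")) {
            ResultSet rs = session.execute(
                "INSERT INTO User (userID, email) VALUES ('MyGuid', 'user@example.com') IF NOT EXISTS");
            // wasApplied() is false when the row already existed; there is no lock
            // to release, you simply react to the outcome (retry, report, etc.)
            if (!rs.wasApplied()) {
                System.out.println("Row already exists; conditional insert was not applied");
            }
        }
    }
}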


How to write several items to a table at the same time using ORMLite

I use ORMLite on a solution made by server and clients.
On server side I use PostgreSql, on client side I use SQLite.
In code, I use the same ORMLite methods, regardless of which DB is underneath (PostgreSql or SQLite).
Let's say that:
Table A corresponds to class A
I have an ArrayList of A objects
I want to insert all the items of the ArrayList into the DB.
Today I use a for() cycle and insert them one by one (inside a TransactionManager).
When the items are few there is no problem, but now the items are becoming more numerous and this is probably not the best way, also because I lock the DB for a long time.
I'm searching for a way to insert all the items in one step, so that it goes quickly and doesn't lock the DB for a long time. I understood that it might require some sort of stored procedure (I'm not an expert...).
Note that some items could be new (no item with the same primary key id exists yet), so an INSERT must be performed; other items could already exist, so an UPDATE should be performed.
Thank you
I'm searching for a way to insert all the items in one step, so that it goes quickly and doesn't lock the DB for a long time.
So there are two ways to do this that I know of: transactions and disabling auto-commit. If you are inserting into the database and it needs to all happen "at once" from a consistency standpoint, transactions are the only way to go. If you just want to insert and update a large number of records with higher performance then you can disable auto-commit, do the operations, and then commit. Depending on the database implementation, this is what the TransactionManager is really doing.
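Both options map directly onto ORMLite calls. A hedged sketch, assuming your class A has a Dao and that createOrUpdate() semantics (INSERT if the id is new, UPDATE otherwise) are what you want; the class and method names here are placeholders, not from the question:

import java.util.List;
import java.util.concurrent.Callable;
import com.j256.ormlite.dao.Dao;
import com.j256.ormlite.misc.TransactionManager;
import com.j256.ormlite.support.ConnectionSource;

public class BatchSave {
    // Option 1, all-or-nothing: every createOrUpdate() runs inside one transaction.
    public static <T> void saveAllTransactionally(
            ConnectionSource cs, final Dao<T, ?> dao, final List<T> items) throws Exception {
        TransactionManager.callInTransaction(cs, new Callable<Void>() {
            public Void call() throws Exception {
                for (T item : items) {
                    dao.createOrUpdate(item); // INSERT if the id is new, UPDATE otherwise
                }
                return null;
            }
        });
    }

    // Option 2, performance only: callBatchTasks() disables auto-commit (or falls
    // back to a transaction) so the rows are not committed one by one.
    public static <T> void saveAllBatched(final Dao<T, ?> dao, final List<T> items) throws Exception {
        dao.callBatchTasks(new Callable<Void>() {
            public Void call() throws Exception {
                for (T item : items) {
                    dao.createOrUpdate(item);
                }
                return null;
            }
        });
    }
}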
I understood that it should be a sort of Stored Procedures...
I don't see how stored procedures helps you at all. They aren't magic.
but now the items are becoming more numerous and this is probably not the best way, also because I lock the DB for a long time.
I don't think there is a magic solution to this. If you are pushing a large number of objects to the database and you need the data to be transactional, then locks are going to have to be held during the updates. One thing to realize is that Postgres should handle this a ton better than SQLite. SQLite does not (I don't think) have row-level locking, meaning that the whole DB is paused during transactions. Postgres has a much more mature locking system and should be more performant in this situation. This is also part of why SQLite is so fast in many other operations: it doesn't have to be burdened with the lock complexity.
One thing to consider is rearchitecting your schema. Try to figure out the minimal amount of data that needs to be inserted transactionally. For example, maybe just the object relationships need to be changed transactionally while all of the data itself can be stored later. For example, you could have an AccountOwner object which just holds 2 ids, while all of the information about the Account can be stored outside of the transaction. This makes your schema more complicated but maybe much faster.
Hope something here helps.
You can use entityManager.merge(...) on each item inside a single transaction, and the entityManager will flush the whole list to the DB in one shot. merge() creates the object if it doesn't exist in the database and updates it if it already exists. (Note that this is the JPA EntityManager API rather than ORMLite, and merge() takes a single entity, not a list.)

Strictly sequential auto-increment values in MySQL

I have to create a MySQL InnoDB table that assigns a strictly sequential ID to each element in the table (row). There cannot be any gaps in the IDs: each element has to have a different ID and they HAVE TO be assigned sequentially. Concurrent users create data in this table.
I have experienced MySQL's "auto-increment" behaviour, where if a transaction fails the PK number is not used, leaving a gap. I have read complicated solutions online that did not convince me, and some others that don't really address my problem (Emulate auto-increment in MySQL/InnoDB, Setting manual increment value on synchronized mysql servers).
I want to maximise write concurrency. I can't afford to have users writing to the table and waiting a long time.
I might need to shard the table... but still keep the ID count.
The ordering of the elements in the table is NOT important, but the IDs have to be sequential (i.e., an element created before another does not need to have a lower ID, but gaps between IDs are not allowed).
The only solution I can think of is to use an additional COUNTER table to keep the count. Then create the element in the table with an empty "ID" (not the PK), lock the COUNTER table, get the number, write it on the element, increase the number, and unlock the table. I think this will work fine, but it has an obvious bottleneck: while the table is locked, nobody else can get an ID.
It is also a single point of failure if the node holding the table is not available. I could create "master-master" replication, but I am not sure whether that risks using an out-of-date ID counter (I have never used replication).
Thanks.
I am sorry to say this, but allowing high concurrency to achieve high performance and at the same time asking for a strictly monotone sequence are conflicting requirements.
Either you have a single point of control/failure that issues the IDs and makes sure there are neither duplicates nor skips, or you will have to accept the chance of one or both of these situations.
As you have stated, there are attempts to circumvent this kind of problem, but in the end you will always find that you need to make a tradeoff between speed and correctness, because as soon as you allow concurrency you can run into split-brain situations or race conditions.
Maybe a strictly monotone sequence would be ok for each of possibly many servers/databases/tables?
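For what it's worth, the single-point-of-control variant (the COUNTER table from the question) can be narrowed so that only one row is locked rather than the whole table, using SELECT ... FOR UPDATE. A hedged JDBC sketch; the table and column names (id_counter, element, payload) are invented for illustration:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class SequentialInsert {
    public static long insertWithSequentialId(Connection conn, String payload) throws SQLException {
        conn.setAutoCommit(false);
        try {
            long next;
            // Row lock on the single counter row, held until commit/rollback
            try (PreparedStatement sel = conn.prepareStatement(
                     "SELECT value FROM id_counter WHERE name = 'element_id' FOR UPDATE");
                 ResultSet rs = sel.executeQuery()) {
                if (!rs.next()) throw new SQLException("counter row missing");
                next = rs.getLong(1) + 1;
            }
            try (PreparedStatement upd = conn.prepareStatement(
                     "UPDATE id_counter SET value = ? WHERE name = 'element_id'")) {
                upd.setLong(1, next);
                upd.executeUpdate();
            }
            try (PreparedStatement ins = conn.prepareStatement(
                     "INSERT INTO element (id, payload) VALUES (?, ?)")) {
                ins.setLong(1, next);
                ins.setString(2, payload);
                ins.executeUpdate();
            }
            conn.commit(); // increment and insert succeed together
            return next;
        } catch (SQLException e) {
            conn.rollback(); // a rollback also "returns" the number, so no gap appears
            throw e;
        }
    }
}

The key property (and the difference from AUTO_INCREMENT) is that the counter increment lives in the same transaction as the element insert, so a failed transaction takes its number back with it. The bottleneck the question describes remains: IDs are issued one at a time.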

How to code optimistic and pessimistic locking from Java code

I know what optimistic and pessimistic locking are, but when you write Java code how do you do it? Suppose I am using Oracle with Java: do I have any methods in JDBC that will help me do that? How would I configure this? Any pointers will be appreciated.
You can implement optimistic locks on your DB table this way (this is how optimistic locking is done in Hibernate):
1) Add an integer "version" column to your table.
2) Increase the value of this column with each update of the corresponding row.
3) To obtain a lock, just read the "version" value of the row.
4) Add a "version = obtained_version" condition to the WHERE clause of your update statement and verify the number of affected rows after the update. If no rows were affected, someone has already modified your entry.
Your update should look like
UPDATE mytable SET name = 'Andy', version = 3 WHERE id = 1 and version = 2
Of course, this mechanism works only if all parties follow it, contrary to DBMS-provided locks that require no special handling.
Hope this helps.
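In plain JDBC, the check in step 4 is just the return value of executeUpdate(). A minimal sketch following the table and columns of the example UPDATE above:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class OptimisticUpdate {
    public static void rename(Connection conn, long id, int readVersion, String newName)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "UPDATE mytable SET name = ?, version = ? WHERE id = ? AND version = ?")) {
            ps.setString(1, newName);
            ps.setInt(2, readVersion + 1);
            ps.setLong(3, id);
            ps.setInt(4, readVersion);
            if (ps.executeUpdate() == 0) {
                // zero affected rows: someone changed (or deleted) the row since we read it
                throw new SQLException("Optimistic lock failed for id=" + id);
            }
        }
    }
}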
Suppose I am using Oracle with Java, do I have any methods in JDBC that will help me do that?
This Oracle paper should provide you with some tips on how to do this.
There are no specific JDBC methods. Rather, you achieve optimistic locking by the way that you design your SQL queries / updates and where you put the transaction boundaries.
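Pessimistic locking likewise goes through SQL rather than a dedicated JDBC method: in Oracle, SELECT ... FOR UPDATE inside a transaction holds a row lock until commit or rollback. A hedged sketch reusing the same hypothetical table:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PessimisticUpdate {
    public static void rename(Connection conn, long id, String newName) throws SQLException {
        conn.setAutoCommit(false); // the row lock only lives inside a transaction
        try {
            try (PreparedStatement lock = conn.prepareStatement(
                     "SELECT name FROM mytable WHERE id = ? FOR UPDATE")) {
                lock.setLong(1, id);
                try (ResultSet rs = lock.executeQuery()) {
                    if (!rs.next()) {
                        conn.rollback();
                        return; // no such row
                    }
                }
            }
            try (PreparedStatement upd = conn.prepareStatement(
                     "UPDATE mytable SET name = ? WHERE id = ?")) {
                upd.setString(1, newName);
                upd.setLong(2, id);
                upd.executeUpdate();
            }
            conn.commit(); // releases the row lock
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}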

MySQL InnoDB hangs on waiting for table-level locks

I have a big production web application (Glassfish 3.1 + MySQL 5.5). All tables are InnoDB. Once every few days the application hangs completely.
SHOW FULL PROCESSLIST shows many simple INSERT or UPDATE queries on different tables, all having the status
Waiting for table level lock
Examples:
update user
set user.hasnewmessages = NAME_CONST('in_flag',_binary'\0' COLLATE 'binary')
where user.id = NAME_CONST('in_uid',66381)
insert into exchanges_itempacks
set packid = NAME_CONST('in_packId',332149), type = NAME_CONST('in_type',1), itemid = NAME_CONST('in_itemId',23710872)
Queries with the longest 'Time' are waiting for the table-level lock too.
Please help me figure out why MySQL tries to get a table-level lock and what could be locking all these tables. All the articles about InnoDB locking say this engine uses no table locking unless you force it to.
My my.cnf has this:
innodb_flush_log_at_trx_commit = 0
innodb_support_xa = 0
innodb_locks_unsafe_for_binlog = 1
innodb_autoinc_lock_mode=2
Binary log is off. I have no "LOCK TABLES" or other explicit locking commands at all. Transactions are READ_UNCOMMITTED.
SHOW ENGINE INNODB STATUS output:
http://avatar-studio.ru:8080/ph/imonout.txt
Are you using mysqldump to back up your database while it is still being accessed by your application? This could cause that behaviour.
I think there are some situations in which MySQL takes a full table lock (e.g. when using auto-increment).
I found a link which may help you: http://mysqldatabaseadministration.blogspot.com/2007/06/innodb-table-locks.html
Also review your Java persistence code: make sure every connection is committed/rolled back and closed (closing always in a finally block).
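What that looks like in practice, assuming Java 7+ try-with-resources (on Java 6 you would spell out the finally block); the DataSource is a placeholder for however the app obtains connections:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

public class SafeUpdate {
    public static void run(DataSource ds, String sql) throws SQLException {
        try (Connection conn = ds.getConnection();
             PreparedStatement ps = conn.prepareStatement(sql)) {
            conn.setAutoCommit(false);
            try {
                ps.executeUpdate();
                conn.commit();
            } catch (SQLException e) {
                conn.rollback(); // never leave the transaction (and its locks) open
                throw e;
            }
        } // the connection is closed here no matter what happened
    }
}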
Try setting innodb_table_locks=0 in MySQL configuration.
http://dev.mysql.com/doc/refman/5.0/en/innodb-parameters.html#sysvar_innodb_table_locks
Just a few ideas ...
I see you heavily use NAME_CONST in your code. Just try not to use it. You know, MySQL can sometimes be buggy (I have also found several bugs), so I recommend not relying on features that are not common / well tested. It is related to column names, so maybe it locks something? Well, it shouldn't if it affects only the result, but who knows? It is suspicious. Moreover, it is marked as a function for internal use only.
This may seem simple, but you don't have a long-running select statement that is possibly locking out updates and inserts? There's no query that's actually running and not locked?
Have you considered using MyISAM instead of InnoDB?
If you are not utilizing any transactional features, MyISAM might make more sense.
It's simpler, easier to optimize, and since it doesn't have sophisticated transactional capabilities, easier to configure in your my.cnf.
Also, depending on the type of db load your app creates, MyISAM might be more appropriate. I prefer MyISAM for read-heavy applications, again, it's easier to configure and understand.
Other suggestions:
It might be a good idea to find a way to not use NAME_CONST in your SQL.
"This function was added in MySQL 5.0.12. It is for internal use only."
When the documentation of an open source product says this, it's probably a good idea to heed its advice.
By default, MySQL stores all InnoDB table and schema data in one enormous file, and there could be some kind of OS-level locking on that particular file that propagates to MySQL and prevents all table access. By using the innodb_file_per_table option you may eliminate that potential issue. It also makes MySQL more space efficient.
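Following the my.cnf format quoted in the question, that is a single line (assuming MySQL 5.5 here: the server must be restarted, and pre-existing tables only move out of the shared tablespace after a rebuild such as ALTER TABLE ... ENGINE=InnoDB):
innodb_file_per_table = 1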
In this case you would have to create several database tables with the same columns and insert no more than 3000 rows per table; when you want to enter more data, you would generate another table dynamically (creating the table from code), insert the new data into it, and read the data back from that table. If more and more tables end up being generated, you would have to create a new database.
I think this tip will help you design your database more carefully and avoid the error.

Is there a good patterns for distributed software and one backend database for this problem?

I'm looking for a high-level answer, but here are some specifics in case they help: I'm deploying a J2EE app to a cluster in WebLogic. There's one Oracle database at the backend.
A normal flow of the app is
- users feed data (to be inserted as rows) to the app
- the app waits for the data to reach a certain size and does a batch insert into the database (only 1 commit)
There's a constraint in the database preventing "duplicate" data insertions. If the app gets a constraint violation, it will have to rollback and re-insert one row at a time, so the duplicate rows can be "renamed" and inserted.
Suppose I had 2 running instances of the app. Each of the instances is about to insert 1000 rows. Even if there is only 1 duplicate, one instance will have to rollback and insert rows one by one.
I can easily see that it would be smarter to re-insert the non-conflicting 999 rows as a batch in this instance, but what if I had 3 running apps and the 999 rows also had a chance of duplicates?
So my question is this: is there a design pattern for this kind of situation?
This is a long question, so please let me know where to clarify. Thank you for your time.
EDIT:
The 1000 rows of data are in memory in each instance, but the instances cannot see each other's rows. The only way they know a row is a duplicate is when it is inserted into the database.
And if the current application design doesn't make sense, feel free to suggest better ways of tackling this problem. I would appreciate it very much.
http://www.oracle-developer.net/display.php?id=329
The simplest would be to avoid parallel processing of the same data. For example, your size- or time-based event could run on only one node, or could post a message to a JMS queue so that only one of the nodes processes it (for instance, by using a similar duplicate check, e.g. based on a timestamp of the message/batch).
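A hedged sketch of the JMS variant: each node publishes its batch to a queue, and a single consumer performs the inserts, so duplicates get resolved in one place. The JNDI names are placeholders, not from the question:

import javax.jms.Connection;
import javax.jms.ConnectionFactory;
import javax.jms.MessageProducer;
import javax.jms.ObjectMessage;
import javax.jms.Queue;
import javax.jms.Session;
import javax.naming.InitialContext;

public class BatchPublisher {
    public static void publish(java.io.Serializable batch) throws Exception {
        InitialContext ctx = new InitialContext();
        ConnectionFactory cf = (ConnectionFactory) ctx.lookup("jms/ConnectionFactory");
        Queue queue = (Queue) ctx.lookup("jms/InsertBatchQueue");
        Connection conn = cf.createConnection();
        try {
            Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
            MessageProducer producer = session.createProducer(queue);
            ObjectMessage msg = session.createObjectMessage(batch);
            producer.send(msg); // queue (not topic) semantics: exactly one consumer gets it
        } finally {
            conn.close(); // closes the session and producer too
        }
    }
}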
