Synchronizing websocket connections to the same endpoint - java

In my code, there is a websocket server that persists information to the database on behalf of the connected client.
I am using Jetty 9, Hibernate, and Postgres.
Essentially, when a client posts a data object via the websocket:
The server deserializes the data
Checks to see if the object already exists in the database, based on content
If a match is found
The server will update the database row, and indicate that the item already exists
Else
The server will create a new database row and indicate that the item was added.
In this system:
An endpoint (i.e., URL) corresponds to a single user principal
Multiple connections to the same endpoint are allowed, meaning multiple connections from different clients to a single user.
Data objects in the database are specific to a user
(This means that the server will only check rows belonging to the user for already-existing items)
This is all working well, except when 2 clients post the same data at precisely the same time. Neither server instance knows about the other, and the database ends up with 2 rows that are the same, except for the server-generated ID.
How can I prevent the server from writing the same data at the same time?
I can't use a UNIQUE constraint in the database, because I actually do have to support having multiple rows with the same data, but posted at different times.
I can get all the other sessions attached to the same endpoint, using "session.getOpenSessions()".
Can I:
Synchronize on the endpoint, somehow?
Synchronize on something attached to the session?
Do something with threading configuration?
Thanks for any suggestions.

When you say:
Checks to see if the object already exists in the database, based on content
you have in effect already defined a uniqueness rule: there is some combination of columns that must match for an incoming object to count as "already existing", and when a row matches on those columns you update instead of insert.
The database already offers a centralized concurrency-control mechanism, so use it: put an index on those columns and make the check-and-write atomic on the database side, rather than trying to coordinate websocket sessions in the application.
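Since a plain UNIQUE constraint is ruled out here (identical rows posted at different times must remain legal), one way to let the database serialize the check-then-write is a transaction-scoped advisory lock keyed on the user and a hash of the content. This is a minimal sketch assuming PostgreSQL and JDBC; the items table and its columns are illustrative:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class DedupWriter {

    public void saveOrUpdate(Connection conn, int userId, String content) throws SQLException {
        conn.setAutoCommit(false);
        try {
            // Take a transaction-scoped advisory lock keyed on (user, content hash).
            // Concurrent writers with the same key queue here; hash collisions only
            // cause extra serialization, never wrong results.
            try (PreparedStatement lock = conn.prepareStatement(
                    "SELECT pg_advisory_xact_lock(?, ?)")) {
                lock.setInt(1, userId);
                lock.setInt(2, content.hashCode());
                lock.execute();
            }
            // From here on, at most one transaction per (user, content) runs the
            // check-then-write, so the second writer sees the first writer's row.
            try (PreparedStatement check = conn.prepareStatement(
                    "SELECT id FROM items WHERE user_id = ? AND content = ?")) {
                check.setInt(1, userId);
                check.setString(2, content);
                try (ResultSet rs = check.executeQuery()) {
                    if (rs.next()) {
                        // ... UPDATE the existing row and report "already exists" ...
                    } else {
                        // ... INSERT a new row and report "added" ...
                    }
                }
            }
            conn.commit(); // the advisory lock is released automatically here
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}

Because the lock lives only for the transaction, nothing has to be coordinated between Jetty sessions or JVMs; even multiple server instances sharing the database are serialized correctly.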

Related

Retrieve data and wait time to check if something new appeared in database

I would like to retrieve data from my in-memory H2 database via a REST endpoint using Spring and Java 8. I have two endpoints: one to retrieve data, and a second one to add data to the database.
How can I achieve what is described below in the easiest way? I am not sure which solution is better; I thought about a JMS queue or CompletableFuture (if that is possible). It should work for a few users, who will call to retrieve data saved under their ID number.
Scenario:
The user calls the REST endpoint to retrieve data.
If data is present in the database, it is retrieved and returned to the user.
If data is not present in the database, the connection is held for 60 seconds; if during that time something appears in the database (added via the endpoint for adding new data), that data is returned.
If data is not present in the database and no new data appears within 60 seconds, the endpoint returns no content.
There are multiple ways of doing this; given your requirements, I suggest the two approaches below.
Approach 1:
Find and return the data immediately if it is available, without waiting.
If the data is not available, set a resource id and a retrieveTime in the response headers and respond to the consumer.
Using that resource id, you can have the data ready when the consumer comes back for it.
This way your endpoint's service time stays consistent; ideally it shouldn't be more than 3 seconds.
Approach 2:
If the data is not available, sleep and retry in the same thread for up to 60 seconds (outside any database connection scope), as sketched below.
No queue or async process is needed here.
The downside is that you tie up resources and the service time grows.
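A minimal sketch of Approach 2 as a Spring MVC controller; DataRepository and its findByUserId method are placeholders for your actual repository:

import java.util.Optional;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class DataController {

    /** Placeholder for your repository; findByUserId is an assumed method. */
    public interface DataRepository {
        Optional<String> findByUserId(String userId);
    }

    private final DataRepository repository;

    public DataController(DataRepository repository) {
        this.repository = repository;
    }

    @GetMapping("/data/{userId}")
    public ResponseEntity<String> getData(@PathVariable String userId) throws InterruptedException {
        long deadline = System.currentTimeMillis() + 60_000; // hold for up to 60 seconds
        while (System.currentTimeMillis() < deadline) {
            Optional<String> data = repository.findByUserId(userId); // short query each iteration
            if (data.isPresent()) {
                return ResponseEntity.ok(data.get()); // data appeared, return it
            }
            Thread.sleep(1_000); // sleep between polls, outside any database transaction
        }
        return ResponseEntity.noContent().build(); // nothing after 60 seconds -> 204
    }
}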
Apart from these approaches, if your system uses eventing, take the eventing approach: when a record is persisted, send an event to the consumer (most databases today offer some way to notify interested systems of changes).

Best way to remove tokens from database

I'm developing a token-based API gateway. It basically provides a token to authenticated clients, so I'm not sure how to remove expired tokens. For every request I check whether the token is valid or not.
Option 1:
Mark the token's status as expired in its database table row,
and create a scheduler that runs at midnight to delete expired tokens.
Option 2:
Delete the token's row when it expires.
No scheduler needs to run here.
Normally this API gateway will handle around 1,000 requests per second, and this will increase day by day.
So I'm not sure which option I should use.
The technologies I'm using:
Spring MVC, Spring Data JPA, and PostgreSQL, deployed on a Tomcat server.
Neither of the two options is particularly good, as both will modify a table row and therefore generate I/O. At 1,000 requests per second you need a better solution. On 2ndQuadrant there is a blog post about authenticating users through connection pooling in the context of row-level security. The blog post has some issues IMHO, and some non-relevant material as well, so I'll try to redo it here the right way (or read my comment on the blog post over there).
In Java, as in most other programming languages and frameworks, connection pooling is the preferred way to connect to a database server, for performance reasons. There is an implicit contract that the application requests a Connection instance from the pool, uses it, and then returns the instance to the pool for some other thread. Holding on to a Connection is not an option, as it breaks the pooling logic. So proceed as follows:
Connection pool object
Create a connection pool object with database cluster credentials. That role should be GRANTed all necessary privileges on tables and other objects.
Authentication
In the application, a user authenticates by calling myapp_login(username, password) or something similar, using a Connection from the pool. In the database the credentials are checked against a table users, or whatever it is called in your setup. If a match is found, create a random token and insert it into a table:
CREATE UNLOGGED TABLE sessions (
    token       text DEFAULT uuid_generate_v4()::text,  -- requires the uuid-ossp extension
    login_time  timestamp DEFAULT CURRENT_TIMESTAMP,
    user_name   text,
    ...
);
Add as many fields as you want. I use a uuid here (cast to text, read on), but you could also md5() some data or use a pgcrypto routine.
This table has to be fast, so it is UNLOGGED. That means it is not crash-safe and will be truncated after a server crash, but that is not a problem: all database sessions will have been invalidated anyway. Also, do not put any constraints like NOT NULL on the table, because the only access to this table is through the functions that you as a developer design; no ordinary user ever touches it, and every constraint costs extra CPU cycles.
The myapp_login() function looks somewhat like this:
CREATE FUNCTION myapp_login(uname text, password text) RETURNS text AS $$
DECLARE
    t text;
BEGIN
    PERFORM * FROM app_users WHERE username = uname AND pwd = password;
    IF FOUND THEN
        INSERT INTO sessions(user_name) VALUES (uname) RETURNING token INTO t;
        -- %L quotes the token as a literal, which SET SESSION requires
        EXECUTE format('SET SESSION "my_app.session_user" TO %L', t);
        RETURN t;
    END IF;
    SET SESSION "my_app.session_user" = '';
    RETURN NULL;
END;
$$ LANGUAGE plpgsql STRICT SECURITY DEFINER;
REVOKE EXECUTE ON FUNCTION myapp_login(text, text) FROM PUBLIC;
GRANT EXECUTE ON FUNCTION myapp_login(text, text) TO myapp_role;
As you can see, the token is also set in a session variable with SET SESSION (which needs a literal text value, hence the uuid::text cast and the EXECUTE of a format()-built command) and then returned to the caller. That session token should be stored somewhere in your application code on the Java side.
The function does a lookup on the app_users table and an INSERT on the sessions table. The first is cheap, the second is expensive.
Resume the same session for further queries
If your app user needs further database access after the first queries, get a Connection instance from the connection pool again, but don't call myapp_login(); call myapp_resume(token) instead. This latter function looks up the token in the sessions table (cheap) and, if found, sets the session variable to that token. You can also check that the login_time value is recent, or set it to CURRENT_TIMESTAMP to keep the session "alive" (expensive), or do any other necessary business.
The trick is to keep resuming the session as lean as possible, because it is likely to happen multiple times during a session (from the application's perspective).
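On the Java side, the contract then becomes: borrow a Connection, resume the session, do the work, return the Connection. A minimal sketch, assuming the myapp_resume(token) function described above and any DataSource-backed pool (names are illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.sql.DataSource;

public class PooledSessionHelper {

    @FunctionalInterface
    public interface SQLWork<T> {
        T run(Connection conn) throws SQLException;
    }

    private final DataSource pool;

    public PooledSessionHelper(DataSource pool) {
        this.pool = pool;
    }

    public <T> T withSession(String token, SQLWork<T> work) throws SQLException {
        try (Connection conn = pool.getConnection()) {
            // Re-establish the database session for this token before doing any work;
            // the session variable sticks to the pooled connection until the next
            // login/resume overwrites it, which is why every borrow must do this.
            try (PreparedStatement resume = conn.prepareStatement("SELECT myapp_resume(?)")) {
                resume.setString(1, token);
                resume.execute();
            }
            // Everything in work runs with my_app.session_user set on this connection.
            return work.run(conn);
        } // the Connection goes back to the pool here
    }
}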
Close the session
When your app user is done, call myapp_logout(token), which deletes the row from the sessions table that corresponds to the token.
Sessions that are not properly closed are not deleted from the sessions table, but I would not worry too much about that. You could schedule a job that runs once a week to delete all rows older than, say, 6 hours. That would also allow you to figure out where such leftovers come from, for instance.
A final word on the token. A uuid is just a random number, but you could also make a hash of the application user name with some random data and use that, for instance, in RLS or some other row-based access mechanism; the blog post I linked to above has good info on that. In an application I developed myself, I link the row from the users table to what the user is allowed to see. In either case you should really weigh the pros and cons: a hash that can be used in RLS sounds nice, but it requires the hash to be re-calculated (which tends to be expensive) and compared to the session hash on every query, while a repeated lookup against a users table is also an overhead. Setting another session variable that can be checked at query time with current_setting() might be a good alternative.
I think the easiest way would be this: when you generate a token, store its generation time with it. Then, when a client sends a request, you can check whether the token has expired and delete it at request time, along the lines of the sketch below.
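A minimal sketch of that idea in JDBC, assuming a tokens table with token and created_at columns and a one-hour lifetime (all illustrative):

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TokenValidator {

    public boolean isValid(Connection conn, String token) throws SQLException {
        // Remove the token if it has expired; if the DELETE hits, the token is dead.
        try (PreparedStatement expire = conn.prepareStatement(
                "DELETE FROM tokens WHERE token = ? AND created_at < now() - interval '1 hour'")) {
            expire.setString(1, token);
            if (expire.executeUpdate() > 0) {
                return false; // expired and removed in the same request
            }
        }
        // Otherwise the token is valid only if it is still present.
        try (PreparedStatement check = conn.prepareStatement(
                "SELECT 1 FROM tokens WHERE token = ?")) {
            check.setString(1, token);
            return check.executeQuery().next();
        }
    }
}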

Is checksum a good way to see if table has been modified in MySQL?

I'm currently developing an application in Java that connects to a MySQL database using JDBC and displays records in a JTable. The application is going to be run by more than one user at a time, and I'm trying to implement a way to see if the table has been modified, e.g., if user one modifies a column such as stock level, and user two then tries to change the same record based on the level from before user one's change.
At the moment I'm storing the checksum of the displayed table in a variable, and when a user tries to modify a record, I check whether the stored checksum is the same as one generated just before the edit.
As I'm new to this, I'm not sure whether this is the correct way to do it; I have no experience in this matter.
Calculating the checksum of an entire table is a very heavy-handed solution and definitely something that won't scale in the long term. There are multiple ways of handling this, but the core theme is to do as little work as possible so that you can scale as the number of users increases. Imagine implementing the checksum-based solution on a table with a million rows continuously updated by hundreds of users!
One of the solutions (which requires minimal rework) would be to "check" the stock name against which the value is updated. In the background, you fire off a query to see whether the data for that particular stock has been updated since the table was populated. If it has, you can warn the user, or mark the updated cell as dirty to indicate that the value has changed. The problem here is that the query isn't fired until the user tries to save the updated value. You could poll the database to avoid that, but polling is hardly efficient either.
As a more robust solution, I would recommend using a database that implements native "push notifications" to all connected clients. Redis is a NoSQL store that comes to mind for this.
Another tried and tested technique would be to forgo the direct database connection and use a middleware layer such as a message queue (e.g., RabbitMQ). Message queues enable the design of systems that communicate using messages. For example, every update of a stock value in the JTable would be sent as a message to an "update database" queue. Once the update is done, a message would be sent to an "update notification" queue to which all clients are subscribed, so every client knows that the value of a given stock has changed and can act accordingly. The advantage of this solution is that you keep your existing stack (Java, MySQL) and get notifications without polling the DB and killing it. A sketch of this flow follows.
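A minimal sketch of that flow with the RabbitMQ Java client, using a fanout exchange so every connected client receives each notification (exchange name and message format are illustrative):

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;
import java.nio.charset.StandardCharsets;

public class StockUpdateNotifier {

    // Publisher side: call this after the database update has committed.
    public static void publishUpdate(Channel channel, long stockId, int newLevel) throws Exception {
        channel.exchangeDeclare("stock.updates", "fanout");
        String body = stockId + ":" + newLevel;
        channel.basicPublish("stock.updates", "", null, body.getBytes(StandardCharsets.UTF_8));
    }

    // Client side: each JTable client binds its own queue and refreshes on messages.
    public static void listenForUpdates(Channel channel) throws Exception {
        channel.exchangeDeclare("stock.updates", "fanout");
        String queue = channel.queueDeclare().getQueue(); // exclusive, auto-delete queue per client
        channel.queueBind(queue, "stock.updates", "");
        DeliverCallback onUpdate = (tag, delivery) -> {
            String msg = new String(delivery.getBody(), StandardCharsets.UTF_8);
            System.out.println("Stock changed, refresh the row: " + msg);
        };
        channel.basicConsume(queue, true, onUpdate, tag -> {});
    }

    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost");
        try (Connection conn = factory.newConnection()) {
            Channel channel = conn.createChannel();
            listenForUpdates(channel);
            publishUpdate(channel, 42, 17);
            Thread.sleep(1_000); // let the callback fire before the connection closes
        }
    }
}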
A checksum is one way to see whether data has changed.
Still, I would suggest you store a last_update_date column that is updated on every update of the record.
Then you just have to store that date (with datetime precision) and do the check against it.
You can also add a version-number column: a simple counter incremented by 1 on each update, as in the sketch after this note.
Note:
You can add an ON UPDATE trigger to maintain last_update_date; that makes it 100% reliable, though you may not need a trigger if you control all updates.
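A minimal sketch of the version-counter check in JDBC, assuming a stock table with id, level, and version columns (names illustrative); this is the classic optimistic-locking pattern:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class StockDao {

    public boolean updateLevel(Connection conn, long id, int newLevel, int expectedVersion)
            throws SQLException {
        // The WHERE clause only matches if nobody changed the row since we read it.
        String sql = "UPDATE stock SET level = ?, version = version + 1 "
                   + "WHERE id = ? AND version = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setInt(1, newLevel);
            ps.setLong(2, id);
            ps.setInt(3, expectedVersion);
            // false => someone else updated first: warn the user and reload the row
            return ps.executeUpdate() == 1;
        }
    }
}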
When used in network communication:
A checksum is a count of the number of bits in a transmission unit that is included with the unit so that the receiver can check whether the same number of bits arrived. If the counts match, it's assumed that the complete transmission was received.
The same idea translates to checking whether two objects differ, so in that sense your approach is correct.

Concurrent Web Service Requests using JAX-WS

I need to design a solution for my problem, so please help me.
My problem:
We have one web project in which we use four tables, called A, B, C, and D.
For each table we created a front-end page with a button that saves the record to the respective table.
Now we need to share each record with another application using web-service integration.
I have knowledge of JAX-WS web services.
We identified the required fields and created a single common WSDL for all four tables.
The web service call should be raised only at the moment the user saves a record (event-based).
We are following the synchronous web-service model, i.e., for every request the system waits for the response from the other end.
Suppose I first try to save a record in table A: I fill in all required fields in form A and hit the save button. The record is saved in the database, a web-service request is raised to the other end, the record is sent to the server, and we wait for the response...
If in the meanwhile I try to send another record for the same form A (or a new record for form B), how do I handle this scenario? A thread is already busy waiting on the server for its response. So how can I raise multiple requests concurrently while each request remains synchronous?
Please suggest possible solutions that I can apply. Any suggestion would be greatly helpful.
(Sorry for my bad English.)
Looking at your scenario, I see that you have something like:
Database -> Web JAX-WS Server -> Multiple JAX-WS Clients
When a client calls the WS server, a new thread is created to handle the request and produce the response for that client. Web servers are multithreaded and support multiple clients calling at the same time. Your problem is probably behind the WS service.
If two clients use your WS to read the same table, there is no problem; but if one client tries to save while another reads, or two or more clients are updating the table, a transaction lock is probably the cause.
Depending on the database configuration, you may need to tune your transaction isolation options and handle your database connections carefully, opening and closing your transactions only when absolutely required.
For example, if you are using MySQL with InnoDB (http://dev.mysql.com/doc/refman/5.0/es/innodb-transaction-isolation.html) and your transaction isolation is SERIALIZABLE, then when you perform a query the whole table is locked until the transaction ends, and every other client waits until the transaction is released or a timeout is raised.
With REPEATABLE READ (the default for MySQL InnoDB), only the records read by one transaction are locked against other transactions. This can be "good" in some environments, but two SQL statements that apply to the same row can cause a deadlock.
You could also use READ COMMITTED or READ UNCOMMITTED to allow reading the whole table while modifying different records. To handle the same record with minimal problems, the usual recommendation applies: keep your transactions open only for the minimal required time.
Also check your WS client for singleton patterns that destroy the first request when you create a second one, and check whether you are using a stateful WS, preserving a session or other server-side objects in a user session that are shared between requests. A sketch of a client that dispatches concurrent calls follows.
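On the client side, the blocking you describe can be avoided by dispatching each save's web-service call on a worker pool, so several synchronous calls are in flight at once (JAX-WS can also generate asynchronous client methods from the WSDL via wsimport's async mapping). A minimal sketch; RecordPort and saveRecord stand in for the port generated from your actual WSDL:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RecordPublisher {

    /** Placeholder for the JAX-WS port generated from your WSDL. */
    public interface RecordPort {
        String saveRecord(String recordXml);
    }

    private final ExecutorService pool = Executors.newFixedThreadPool(4);
    private final RecordPort port;

    public RecordPublisher(RecordPort port) {
        this.port = port;
    }

    // Each save is submitted to the pool, so several calls can be in flight at
    // once while each individual call stays synchronous on its own thread.
    // Note: port proxies are not guaranteed thread-safe by the JAX-WS spec;
    // create one port per thread if your stack requires it.
    public Future<String> publishAsync(String recordXml) {
        return pool.submit(() -> port.saveRecord(recordXml));
    }
}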
Related to stateless:
Is it possible to use @WebService, @Stateless and @Singleton altogether in one EJB 3 bean?

Synchronizing 2 databases using Hibernate - use save(), update() or saveOrUpdate()?

I am trying to sync multiple databases whose items have GUIDs for IDs, meaning that one item has the same ID in all databases.
My question is:
If I modify or create an item in one database and want to synchronize this change to the other database, should I:
1.) Check whether the item is new or just modified; if it's new, use the save() function, and if it's modified, use the update() function,
or
2.) Not check whether it's new or modified, and just use the saveOrUpdate() function?
After seeing your use case in the comments, I think the best approach is to track (on both the client and server) when the last updated/last synced time was. In the event that the last sync time is null, or comes before the last updated time, you know that the data needs to be synced.
Now, on to the heart of your question: how to sync it. The client need not know the state of the server when it sends an object to you. In fact, it shouldn't. Consider the case where the client posts an object, your server receives and processes it, but the connection dies before your client receives the response. This is a very real scenario and will result in a mismatch of data. As a result, any way that you try to determine whether or not the server has received an object (from the client) is likely to end up in a bad state.
The best solution is really to create an idempotent endpoint on the server (an upsert method, or saveOrUpdate as you referred to it in your question) which is able to determine what to do with the object. The server can query its database by primary key to determine if it has the object or not. If it does, it can update; if not, it can insert, as in the sketch below.
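A minimal sketch of such an endpoint's core, assuming a Hibernate Session and an entity Item whose id is the assigned GUID (the entity and its names are illustrative):

import org.hibernate.Session;
import org.hibernate.Transaction;

public class ItemSyncService {

    public void upsert(Session session, Item incoming) {
        Transaction tx = session.beginTransaction();
        try {
            // Cheap primary-key lookup decides between insert and update.
            Item existing = session.get(Item.class, incoming.getId());
            if (existing == null) {
                session.save(incoming);   // not seen before: insert
            } else {
                session.merge(incoming);  // already there: copy state onto the managed row
            }
            tx.commit();
        } catch (RuntimeException e) {
            tx.rollback();
            throw e;
        }
    }
}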
Understandably, performance is important as well as the data. But, stick with primary keys in the database and that one additional select query you add should be extremely minimal (sub-10ms). If you really want to squeeze some more performance out, you could always use memcache or redis as a caching layer to determine if you have a certain GUID in your database. This way, you only have to hit memory (not your database) to determine if an object exists or not. The overhead of that would be measured only in the latency between your web server and cache server (since a memory read is incredibly cheap).
tl;dr
Upsert (or saveOrUpdate) is the way to go. Try not to track the state of one machine on another.
