Java in memory data storage thread safety


I'm making a real time multiplayer game server in Java. I'm storing all data for matches in memory in a HashMap with "match" objects. Each match object contains information about the game and game state for all players (anywhere from 2-5 in one match). The server will pass the same match object for each user's connection to the server.
What I'm a little concerned about is making this thread safe. Connections could be made to different threads in the server, all of which need to access the same match.
The problem with that is there would be a lot of variables/lists in the object, all of which would need to be synchronized. Some of them may need to be used to perform calculations that affect each other, meaning I would need nested synchronized blocks, which I don't want.
Are synchronized blocks for every variable in the match object my only solution, or can I do something else?
I know SQLite has an in memory mode, but the problem I found was this:
Quote from their website:
SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.
A few dozen milliseconds? That's a long time. Would that be fast enough, or is there another in memory database that would be suited for real time games?

Your architecture is off in this case. You want a set of data to be modified and updated by several threads at once, which might be possible, but is extremely difficult to get right and fast at the same time.
It would be much easier if you changed the architecture as follows:
There is one thread that has exclusive access to a single match object. A thread could handle multiple match objects, but a single match object is only ever handled/guarded by a single thread. Now if any external effect wants to change any values, it needs to make a "change request" and cannot change them immediately on its own. Once the change has been applied and the values updated, the thread guarding the match object sends out an update to the clients.
So let's say a player scores a goal; then the client thread calls a function:
void clientScoredGoal(Client client) throws InterruptedException {
    // put() waits only if a bounded queue is full
    actionQueue.put(new GoalScoredEvent(client));
}
where actionQueue is, e.g., a BlockingQueue.
The thread handling the match objects listens on this queue via actionQueue.take() and reacts as soon as a new action arrives. It then applies the change, updates internal values if necessary, and distributes an update package (a "change request" to the clients, if you want).
Also, in general, synchronized should be considered bad practice in Java. There are certain situations where it is a good way to handle synchronization, but in something like 99% of all cases the features of the java.util.concurrent package are by far the better solution. Notice the complete lack of synchronized in the example code above, yet it is perfectly thread-safe.
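A minimal, self-contained sketch of that design, under a few assumptions: the class name MatchLoop, the MatchEvent interface, the STOP sentinel, and broadcastUpdate() are hypothetical, and the match state is reduced to a goal count per player id. Client threads only enqueue events; the single match thread applies them, so the plain HashMap needs no locking.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// One thread owns the match state; client threads only enqueue change requests.
public class MatchLoop implements Runnable {
    // Hypothetical change request: knows how to apply itself to the match state.
    interface MatchEvent {
        void apply(Map<String, Integer> goals);
    }

    // Sentinel used by this sketch to stop the loop.
    private static final MatchEvent STOP = g -> { };

    private final BlockingQueue<MatchEvent> actionQueue = new LinkedBlockingQueue<>();
    // Plain HashMap: no locking, because only the match thread reads or writes it.
    private final Map<String, Integer> goals = new HashMap<>();

    // Called from any client connection thread; add() never blocks on an unbounded queue.
    public void clientScoredGoal(String playerId) {
        actionQueue.add(g -> g.merge(playerId, 1, Integer::sum));
    }

    public void stop() {
        actionQueue.add(STOP);
    }

    @Override
    public void run() {
        try {
            while (true) {
                MatchEvent event = actionQueue.take(); // waits until a request arrives
                if (event == STOP) {
                    break;
                }
                event.apply(goals);
                broadcastUpdate();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Hypothetical hook: a real server would send the new state to every client here.
    private void broadcastUpdate() { }

    // Only safe after the match thread has finished (Thread.join() gives happens-before).
    public int goalsFor(String playerId) {
        return goals.getOrDefault(playerId, 0);
    }
}
```

Reading the score from another thread is only safe here after join(); in a real server the match thread would instead put the new state into the update it broadcasts.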

The question is very generic, so it is difficult to give specific advice.
I'm making a real time multiplayer game server in Java. I'm storing all data for matches in memory in a HashMap with "match" objects.
If you want to store "match" objects in a Map and then have multiple threads requesting/adding/removing objects from the map, then you have to use a "ConcurrentHashMap".
What I'm a little concerned about is making this thread safe. Connections could be made to different threads in the server, all of which need to access the same match.
The safest and easiest way to have multithreading is to make each "match" an immutable object, then there is no need to synchronize.
If "match" information is mutable and accessed simultaneously by many threads, then you will have to synchronize. But in this case, the "mutable state" is contained within a "match", so only the class "match" will need to use synchronization.
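To illustrate the immutable approach together with a ConcurrentHashMap, here is a sketch; the Match class with its round and totalScore fields is a hypothetical simplification. Each update builds a new Match, and ConcurrentHashMap.compute() swaps it in atomically for that key, so readers always see a complete snapshot and concurrent updates are not lost.

```java
import java.util.concurrent.ConcurrentHashMap;

public class ImmutableMatches {
    // Hypothetical immutable match: final fields, and every change returns a new instance.
    public static final class Match {
        public final int round;
        public final int totalScore;

        public Match(int round, int totalScore) {
            this.round = round;
            this.totalScore = totalScore;
        }

        public Match withPoints(int points) {
            return new Match(round, totalScore + points);
        }
    }

    private final ConcurrentHashMap<String, Match> matches = new ConcurrentHashMap<>();

    public void createMatch(String id) {
        matches.putIfAbsent(id, new Match(1, 0));
    }

    // compute() runs the read-modify-write atomically for this key,
    // so two threads updating the same match cannot lose each other's points.
    // Returning null for an unknown id leaves the map unchanged.
    public void addPoints(String id, int points) {
        matches.compute(id, (key, old) -> old == null ? null : old.withPoints(points));
    }

    // Any thread may read: it gets a complete snapshot that never changes.
    public Match snapshot(String id) {
        return matches.get(id);
    }
}
```

Copying the object on every change costs an allocation per update; for a small match object that is usually cheap, but it is a trade-off worth measuring.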
I would need nested synchronized blocks, which I don't want.
I have never seen the need for nested synchronized blocks. Perhaps you should refactor your solution before you try to make it thread-safe.
Are synchronized blocks for every variable in the match object my only solution, or can I do something else? I know SQLite has an in memory mode
If you have objects with mutable state that are accessed by multiple threads, then you need to make them thread-safe. There is no other way (notice that I didn't say that "synchronized blocks" are the only option; there are different ways to achieve thread safety). Using an in-memory database is not the solution to your thread-safety problem.
The advantage of using an in-memory database is faster access to information (as you don't have to access a regular database with information stored on a hard disk), but with the penalty that your application now needs more RAM.
By the way, even faster than using an in memory database would be to keep all the information that you need within objects in your program (which has the same limitation of requiring more RAM).

Related

Send Data from multiple threads to a single thread

I'm coding a Java socket server that connects to an Arduino, which in turn sends and receives data. Following the Java socket documentation, I've set up the server to open a new thread for every connection.
My question is, how will I be able to send the data from the socket threads to my main thread? The socket will be constantly open, so the data has to be sent while the thread is running.
Any suggestion?
Update: the goal of the server is to send commands to an Arduino (i.e. turn a light on or off) and receive data from sensors. Therefore I need a way to obtain the data from the sensors, which are connected to individual threads, and send it into a single thread.

Sharing data among threads is always tricky. There is no "correct" answer; it all depends on your use case. I suppose you are not looking for the highest performance, but for ease of use, right?
In that case, I would recommend looking at thread-safe collections: maps, lists, or perhaps queues. One class that seems like a good fit for you is ConcurrentLinkedQueue.
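A sketch of how ConcurrentLinkedQueue could connect the per-connection socket threads to the main thread. The Reading class and the device ids are assumptions for illustration; in the real server, onReading() would be called from each socket thread after it parses a message from the Arduino.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public class SensorInbox {
    // Hypothetical message type for one sensor value from one device.
    public static final class Reading {
        public final String deviceId;
        public final double value;

        public Reading(String deviceId, double value) {
            this.deviceId = deviceId;
            this.value = value;
        }
    }

    // Thread-safe, non-blocking queue: socket threads offer(), the main thread poll()s.
    private final ConcurrentLinkedQueue<Reading> inbox = new ConcurrentLinkedQueue<>();

    // Called from each per-connection socket thread.
    public void onReading(String deviceId, double value) {
        inbox.offer(new Reading(deviceId, value));
    }

    // Called periodically from the main thread; poll() returns null once the queue is empty.
    public List<Reading> drain() {
        List<Reading> batch = new ArrayList<>();
        Reading r;
        while ((r = inbox.poll()) != null) {
            batch.add(r);
        }
        return batch;
    }
}
```

poll() never blocks, so the main thread has to check the queue periodically. If it should instead sleep until data arrives, a LinkedBlockingQueue with take() is the usual alternative.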
You can also create synchronized proxies for all usual collections using the factory methods in Collections class:
Collections.synchronizedList(new ArrayList<String>());
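Individual calls on these wrappers are synchronized for you, but you still have to synchronize on the collection manually when iterating over it or when combining several calls into one operation.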
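The one case where the wrapper does not protect you is iteration: the Collections.synchronizedList Javadoc requires holding the list's own lock for the whole loop. A small sketch; the class name and the "temp:" readings are made up.

```java
import java.util.List;

public class SyncListIteration {
    // Counts entries with a given prefix while holding the wrapper's own lock.
    // Without this block, a concurrent add() from another thread could make
    // the loop throw ConcurrentModificationException.
    public static int countWithPrefix(List<String> syncList, String prefix) {
        int n = 0;
        synchronized (syncList) {
            for (String s : syncList) {
                if (s.startsWith(prefix)) {
                    n++;
                }
            }
        }
        return n;
    }
}
```

Single calls such as add() or size() need no extra locking; only multi-step operations like this loop do.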
Another option, which might be overkill, is using a database. There are some in-memory databases, like H2.
In any case, I suggest reducing the amount of shared information to the lowest possible level. For example, you can keep the "raw" data separate per thread (e.g. in ThreadLocal variables) and synchronize only during aggregation.

You seem to have the right idea - you need a thread to run the connection to the external device and you need a main thread to run your application.
How do you share data between these threads? This isn't a problem in general: different threads can write to the same memory, because threads within the same application share memory space.
What you probably want to avoid is one thread changing the data while another thread reads or changes it. Java provides a very useful keyword, synchronized, to handle this sort of situation; it is straightforward to use and provides the kind of guarantees you need. This is a bit technical but discusses the concurrency features.
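A sketch of what that might look like for sensor data; the SharedSensorState and Snapshot names are hypothetical. Both methods synchronize on the same object, so a reader can never observe the temperature from one update paired with the timestamp of another.

```java
public class SharedSensorState {
    // Immutable pair handed to readers, so both values come from the same update.
    public static final class Snapshot {
        public final double temperature;
        public final long timestampMillis;

        Snapshot(double temperature, long timestampMillis) {
            this.temperature = temperature;
            this.timestampMillis = timestampMillis;
        }
    }

    private double temperature;
    private long timestampMillis;

    // Writer (socket thread): both fields change while holding this object's lock.
    public synchronized void update(double temperature, long timestampMillis) {
        this.temperature = temperature;
        this.timestampMillis = timestampMillis;
    }

    // Reader (main thread): takes the same lock, so it never sees a half-finished update.
    public synchronized Snapshot read() {
        return new Snapshot(temperature, timestampMillis);
    }
}
```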

Here is a tutorial where you might find more information. Note that a quick Google search will bring up lots of answers to your question.
http://tutorials.jenkov.com/java-multithreaded-servers/multithreaded-server.html
In answer to your question, you can send information from one thread to another using a number of options. If it is a simple setup, I would recommend just using static variables/methods to pass the information.
Also, for reference: in large-scale programs it is not recommended to start a thread for every connection. It works fine at a small scale (e.g. a small number of clients), but scales poorly.
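One common alternative is a fixed thread pool from java.util.concurrent. This sketch assumes a simple line-based protocol (reply "ack <line>") purely for illustration; the class name PooledEchoServer and the pool size are arbitrary.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class PooledEchoServer implements AutoCloseable {
    private final ServerSocket server;
    private final ExecutorService pool;

    public PooledEchoServer(int workerThreads) throws IOException {
        server = new ServerSocket(0);                       // 0 = any free port (for the demo)
        pool = Executors.newFixedThreadPool(workerThreads); // bounded number of handler threads
        Thread acceptor = new Thread(this::acceptLoop, "acceptor");
        acceptor.setDaemon(true);
        acceptor.start();
    }

    public int port() {
        return server.getLocalPort();
    }

    private void acceptLoop() {
        try {
            while (true) {
                Socket client = server.accept();      // waits for the next device
                pool.execute(() -> handle(client));   // reuse a pooled thread, no new Thread()
            }
        } catch (IOException closed) {
            // accept() throws once close() has closed the server socket
        }
    }

    // Hypothetical protocol: reply "ack <line>" to each line the device sends.
    private static void handle(Socket client) {
        try (Socket s = client;
             BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
             PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
            String line;
            while ((line = in.readLine()) != null) {
                out.println("ack " + line);
            }
        } catch (IOException e) {
            // connection dropped; the resources above are already closed
        }
    }

    @Override
    public void close() throws IOException {
        server.close();
        pool.shutdownNow();
    }
}
```

One caveat for always-open connections such as the Arduino sockets here: each pooled thread stays busy for the lifetime of its connection, so a pool of N threads serves at most N devices at once and further connections wait in the queue. The pool bounds resource use, but for many long-lived connections non-blocking I/O (java.nio) is the usual next step.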

If this is a web application and you are just going to show the current readout of any of the sensors, then a blocking queue is huge overkill and will cause more problems than it solves. Just use a volatile static field of the required type. The field can be static, it could reside in a singleton object, or it could be part of a context passed to the worker.
In the SharedState class:
static volatile float temperature;
In the sensor thread:
SharedState.temperature = 13.2f;
In the web interface (assuming JSP):
<%= SharedState.temperature %>
btw: if you want to access the last 10 readouts, it's equally easy: just store an array with the last 10 readouts instead of a single value (just don't modify what's inside the array; replace the whole array instead, otherwise synchronization issues might occur).
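A sketch of the "replace the whole array" idea; the record() and recent() methods and the window of 10 are assumptions, and it assumes a single sensor thread does all the writing. Each update builds a new array and publishes it with one volatile write, so readers see either the old array or the new one, never a half-updated one.

```java
// Extends the answer's SharedState with a rolling window of readouts.
public class SharedState {
    public static volatile float temperature;

    // Never modified after publication: record() always swaps in a new array.
    private static volatile float[] lastReadouts = new float[0];

    // Assumes a single writer thread (the sensor thread).
    public static void record(float value) {
        float[] old = lastReadouts;
        int keep = Math.min(old.length, 9);                       // 9 old values + 1 new = 10
        float[] next = new float[keep + 1];
        System.arraycopy(old, old.length - keep, next, 0, keep); // copy the newest `keep` values
        next[keep] = value;
        lastReadouts = next;   // one volatile write publishes the fully built array
        temperature = value;
    }

    // Readers get a stable array; they must not modify it.
    public static float[] recent() {
        return lastReadouts;
    }
}
```

With several writer threads, the read-copy-write in record() could lose updates; in that case wrapping the array in an AtomicReference and retrying with compareAndSet(), or making record() synchronized, would be needed.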

java application multi-threading design and optimization

I designed a Java application. A friend suggested using multi-threading; he claims that running my application as several threads will decrease the run time significantly.
In my main class, I carry out several operations (out of our scope here) that fill global static variables and hash maps used across the whole lifetime of the process. Then I run the core of the application on the entries of an array list.
for(int customerID : customers){
ConsumerPrinter consumerPrinter = new ConsumerPrinter();
consumerPrinter.runPE(docsPath,outputPath,customerID);
System.out.println("Customer with CustomerID:"+customerID+" Done");
}
In each iteration of this loop, the XML files of the given customer are fetched from the machine and parsed, and calculations are performed on the parsed data. Later, the processed results are written to a text file (fetched and written data can reach several gigabytes at most and 50 MB on average). More than one iteration can write to the same file.
Should I make this piece of code multi-threaded so that each group of customers is handled in an independent thread?
How can I know the most optimal number of threads to run?
What are the best practices to take into consideration when implementing multi-threading?

Should I make this piece of code multi-threaded so that each group of customers is handled in an independent thread?
Yes, multi-threading will save processing time. While iterating over your list, you can spawn a new thread in each iteration and do the customer processing in it. But you need proper synchronization: if processing two customers requires operating on the same resource, you must synchronize that operation to avoid race conditions or memory inconsistency issues.
How can I know the most optimal number of threads to run?
You cannot really know without measuring the processing time for n customers with different numbers of threads. It will depend on the number of cores your processor has and on the actual processing that takes place for each customer.
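A sketch of the loop using an ExecutorService instead of one thread per customer. processCustomer() is a hypothetical stand-in for ConsumerPrinter.runPE(), and the in-memory list stands in for the shared output file. The pool size of availableProcessors() is only a starting point: since this workload reads and writes gigabytes of files it may be I/O-bound, where a different number of threads could do better, so measuring is still necessary.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CustomerBatch {
    // Hypothetical stand-in for new ConsumerPrinter().runPE(docsPath, outputPath, customerID).
    static String processCustomer(int customerId) {
        return "customer " + customerId + " processed";
    }

    // Shared sink (stands in for the shared output file); the lock keeps each write whole.
    private final List<String> output = new ArrayList<>();

    private void writeResult(String line) {
        synchronized (output) {
            output.add(line);
        }
    }

    public List<String> runAll(List<Integer> customers, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int customerId : customers) {
                futures.add(pool.submit(() -> writeResult(processCustomer(customerId))));
            }
            for (Future<?> f : futures) {
                f.get(); // waits, and rethrows any exception from that customer's task
            }
        } finally {
            pool.shutdown();
        }
        synchronized (output) {
            return new ArrayList<>(output);
        }
    }

    public static void main(String[] args) throws Exception {
        int threads = Runtime.getRuntime().availableProcessors(); // starting point, then measure
        List<String> results = new CustomerBatch().runAll(List.of(101, 102, 103), threads);
        results.forEach(System.out::println);
    }
}
```

A fixed pool also avoids creating one thread per customer, which could mean thousands of threads competing for the same disk.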
What are the best practices to take into consideration when implementing multi-threading?
First and foremost, you must have multiple cores and your OS must support multi-threading. Almost every system does nowadays, but it is a good criterion to check. Secondly, you must analyze all the possible scenarios that may lead to race conditions. All resources that you know will be shared among multiple threads must be thread-safe. You must also look out for possible memory inconsistency issues (for example, by declaring shared variables as volatile). Finally, there are some things that you cannot predict or analyze until you actually run test cases, like deadlocks (analyze a thread dump) or memory leaks (analyze a heap dump).

The idea of multi-threading is to move some heavy processing into another, let's say, "block of memory".
Any UI updates, like printing messages or inflating a view, have to be done on the main/default thread. You can ask the app to draw a bitmap, download images from the internet, or run a heavy validation/loop block on a separate thread; imagine that you are creating a second, short-lived app to handle those tasks for you.
Remember, you can ask the app to download/draw an image on another thread, but you have to show this image on the screen from the main thread.
This is commonly used to load a large bitmap on a separate thread, do the math to resize this large image, and then, on the main thread, inflate/print/paint/show the smaller version of that image to the user.
In your case, I don't know how heavy the runPE() method is or what it does. You could try to create another thread for it, but the rest should stay on the main thread; it is the main process of your UI.
You could optimize your loop by placing "ConsumerPrinter consumerPrinter = new ConsumerPrinter();" before the "for(...)": since it does not change dynamically (assuming ConsumerPrinter keeps no per-customer state), you can move it out of the loop to avoid creating the same object each time the loop restarts : )

While straight java multi-threading can be used (java.util.concurrent) as other answers have discussed, consider also alternate programming approaches to multi-threading, such as the actor model. The actor model still uses threads underneath, but much complexity is handled by the actor framework rather than directly by you the programmer. In addition, there is less (or no) need to reason about synchronizing on shared state between threads because of the way programs using the actor model are created.
See Which Actor model library/framework for Java? for a discussion of popular actor model libraries.

shared data for multi thread in java

I have a problem writing an Android program for monitoring ECG in real time.
ECG data is transferred to the phone in real time over UDP. On the phone there are 2 threads: one thread receives the ECG data, the other draws it.
A circular buffer is the data shared by the two threads above, and the two threads always conflict when reading from and writing to the buffer. As a result, ECG data is lost or drawn slowly.
Before using a circular buffer, I used 5 LinkedBlockingQueues, but the result was the same.
Can anyone suggest a solution for sharing data between the threads in my program?
Thank you.
Sorry, my English is not good.
Here is the model I used with LinkedBlockingQueue:

You need to synchronize access to your data using a shared lock. I highly recommend Java Concurrency in Practice if you want to truly understand threading and concurrency models in Java.
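A minimal sketch of a circular buffer guarded by one shared lock, assuming float samples and that dropping the oldest samples when the buffer is full is acceptable (both are assumptions, not from the question). The receiving thread does O(1) work under the lock, while the drawing thread copies all available samples in one short critical section and draws them after releasing the lock, which keeps the time each thread waits for the other short.

```java
import java.util.concurrent.locks.ReentrantLock;

public class EcgRingBuffer {
    private final float[] samples;
    private int head = 0;   // index of the oldest unread sample
    private int count = 0;  // number of unread samples
    private final ReentrantLock lock = new ReentrantLock();

    public EcgRingBuffer(int capacity) {
        samples = new float[capacity];
    }

    // Receiver thread: when the buffer is full, the oldest sample is overwritten.
    public void write(float sample) {
        lock.lock();
        try {
            int tail = (head + count) % samples.length;
            samples[tail] = sample;
            if (count == samples.length) {
                head = (head + 1) % samples.length; // dropped the oldest sample
            } else {
                count++;
            }
        } finally {
            lock.unlock();
        }
    }

    // Drawing thread: take everything available at once; draw outside the lock.
    public float[] drainAll() {
        lock.lock();
        try {
            float[] out = new float[count];
            for (int i = 0; i < count; i++) {
                out[i] = samples[(head + i) % samples.length];
            }
            head = (head + count) % samples.length;
            count = 0;
            return out;
        } finally {
            lock.unlock();
        }
    }
}
```

Sizing the buffer to hold a few drawing frames' worth of samples means the drawing thread normally drains it before anything is overwritten.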

I think synchronization is the solution to your problem.
Threads communicate primarily by sharing access to fields and the objects reference fields refer to. This form of communication is extremely efficient, but makes two kinds of errors possible: thread interference and memory consistency errors. The tool needed to prevent these errors is synchronization.
From the BlockingQueue Javadoc:
BlockingQueue implementations are thread-safe. All queuing methods achieve their effects atomically using internal locks or other forms of concurrency control. However, the bulk Collection operations addAll, containsAll, retainAll and removeAll are not necessarily performed atomically unless specified otherwise in an implementation. So it is possible, for example, for addAll(c) to fail (throwing an exception) after adding only some of the elements in c.

My assumption is that you are accessing the collection (any FIFO-based one) directly. Instead, try making a bean that has getters and setters for the data rather than for the collection, with the collection defined inside the bean. You can create the bean object before you create the thread objects and pass it to the threads when constructing them. Hope this helps.

Why does unsynchronization make ArrayList faster and less secure?

I read the following statement:
ArrayLists are unsynchronized and therefore faster than Vector, but less secure in a multithreaded environment.
I would like to know why unsynchronization can improve the speed, and why it will be less secure?

I will try to address both of your questions:
Improve speed
If the ArrayList were synchronized and multiple threads were trying to read data out of the list at the same time, the threads would have to wait to get an exclusive lock on the list. By leaving the list unsynchronized, the threads don't have to wait and the program will run faster.
Unsafe
If multiple threads are reading and writing a list at the same time, the threads can see an inconsistent view of the list, and this can cause instability in multi-threaded programs.
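A small experiment illustrating both points; the class name and element counts are arbitrary. Two threads append to the same list: a Vector always ends up with every element, while an unsynchronized ArrayList can lose elements, or even throw, and the result varies from run to run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Vector;

public class ListRace {
    // Two threads each append `perThread` elements to the same list; returns the final size.
    static int addConcurrently(List<Integer> list, int perThread) throws InterruptedException {
        Runnable task = () -> {
            for (int i = 0; i < perThread; i++) {
                try {
                    list.add(i);
                } catch (RuntimeException e) {
                    // an unsynchronized ArrayList may even throw during a concurrent resize
                }
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
        return list.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Vector:    " + addConcurrently(new Vector<>(), 100_000));    // always 200000
        System.out.println("ArrayList: " + addConcurrently(new ArrayList<>(), 100_000)); // may be less
    }
}
```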

The whole point of synchronization is that only one thread has access to an object at any given time. Take a box of chocolates as an example. If the box is synchronized (Vector) and you get there first, no one else can take any and you get your pick. If the box is NOT synchronized (ArrayList), anyone walking by can snag a chocolate: the box empties faster, but you may not get the ones you want.

ArrayLists are unsynchronized and therefore faster than Vector, but less secure in a multithreaded environment.
I would like to know why unsynchronization can improve the speed, and why it will be less secure?
When multiple threads are reading/writing to a shared memory location, the program might compute incorrect results due to lack of mutual exclusion and proper visibility. Hence lack of synchronization is considered "unsafe". This blog post by Jeremy Manson might provide a good introduction to the topic.
When the JVM executes a synchronized method, it makes sure that the current thread holds an exclusive lock on the object on which the method is invoked. Similarly, when the method finishes execution, the JVM releases the lock held by the executing thread. Synchronized methods provide mutual exclusion and visibility guarantees, which are important for the "safety" (i.e. guaranteed correctness) of the executing code. But if only one thread ever accesses the methods of the object, there are no safety issues to worry about. Although JVM performance has improved over the years, uncontended synchronization (i.e. locking/unlocking of objects accessed by only one thread) still takes a non-zero amount of time. For unsynchronized methods, the JVM does not pay this extra penalty, hence they are faster than their synchronized counterparts.
Vectors force their choice on you. All methods are synchronized, and it is difficult to use them incorrectly. But when Vectors are used in a single-threaded context, you pay the price of the extra synchronization unnecessarily. ArrayLists leave the choice to you. When used in a multi-threaded context, it is up to you (the programmer) to synchronize the code correctly; but when used in a single-threaded context, you are guaranteed not to pay any extra synchronization overhead.
Also, when a collection is populated initially and only read subsequently, ArrayLists perform better even in a multi-threaded context. For example, consider this method:
public synchronized List<String> getList() {
List<String> list = new Vector<String>();
list.add("Foo");
list.add("Bar");
return Collections.unmodifiableList(list);
}
A list is created, populated, and an immutable view of it is safely published. Looking at the code above it is clear that all subsequent uses of this list are reads and won't need any synchronization even when used by multiple threads - the object is effectively immutable. Using a Vector here incurs the synchronization overhead even for reads where it is not needed; using an ArrayList instead would perform better.
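For comparison, a sketch of the ArrayList version. Here the list is built once in a constructor and stored in a final field, which is one way to get safe publication (an alternative to the synchronized method above); the class name PublishOnce is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PublishOnce {
    // Final field: once the constructor finishes, every thread that sees this
    // object also sees the fully populated list behind it.
    private final List<String> names;

    public PublishOnce() {
        List<String> list = new ArrayList<>(); // unsynchronized, only touched here
        list.add("Foo");
        list.add("Bar");
        names = Collections.unmodifiableList(list);
    }

    // No lock needed: the list is never modified after construction.
    public List<String> getList() {
        return names;
    }
}
```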

Data structures that synchronize use locks (or other synchronization constructs) to ensure that their data is always in a consistent state. Oftentimes, this requires that one or more threads wait on another thread to finish updating the structure's state, which will then reduce performance, since a wait has been introduced where before there was none.

Two threads can modify the list at the same time, adding a new item or deleting/modifying the same item simultaneously, because no synchronization (or lock mechanism, if you prefer) exists. So imagine you delete an item from the list while somebody else is trying to work with it, or you modify an item while someone uses it: it's not very secure.
http://download.oracle.com/javase/1.4.2/docs/api/java/util/ArrayList.html
Read the "Note that this implementation is not synchronized." paragraph, it explains a bit better.
And I forgot: regarding speed, it is fairly easy to see that when you control access to data, you add mechanisms that prevent other people from accessing it. You add more computation, so it is slower...

Non-blocking data structures will be faster than ones that block, for the following reason: with blocking data structures, if a resource is acquired by some entity, it takes time for another entity to acquire that same resource once it becomes available.
However, this can be less secure in some instances, depending on the situation. The main points of contention are writes. If it can be guaranteed that the data contained in a data structure will not change once it has been added and will only be read afterwards, then there will not be a problem. The issues arise when there is a conflict between a write and a read, or between two writes.

Should volatile be used for attributes of domain model classes in Java web apps?

Here's my thinking:
Even though a HTTP request cycle is essentially handled by a 'single thread', each time a HTTP request is processed for that same session it is likely to be processed by a different thread from the thread pool.
Without the volatile keyword being used on a domain model object, whose lifecycle extends across multiple HTTP requests for the same session, then, according to my understanding, isn't it possible that the attribute could be thread local cached (an optimization by the compiler) in the thread that serviced the first HTTP request? If the second HTTP request is serviced by another thread then that second thread may not see the changes in that attribute that were made by the first thread.
Does this spell "Danger Will Robinson"? Or am I missing a vital plot point about the use (or not) of the volatile keyword?

I think you are forgetting that the threads handling the HTTP request first need to retrieve the instance of the domain model object from the HttpSession provided by your application server. The thread handling request 2 in the scenario you describe does not already have an instance of this domain model - it has to retrieve it from the session implementation at the start of handling each and every request.
I think it is completely reasonable to assume that the session-handling implementation in your application server is handling session data in such a way that memory model visibility issues are avoided. Apache Tomcat's default (non-clustered) HttpSession implementation, for example, stores the session attributes in a ConcurrentHashMap.
Adding volatile seems completely unnecessary to me. I have never seen this done for domain model objects handled by HTTP requests in a Servlet environment in any project I have worked in.
This would be a different story if thread-1 and thread-2 had references to the same object instance simultaneously while processing two different requests, and you were concerned about changes in one thread being visible to the other as each is processing its request, but this does not sound like what you are asking about.

Yes, if you are sharing an object between different threads, you may have race conditions. Without a happens-before relationship, writes made by one thread may not be seen by a read in another thread.
Doing a volatile write in one thread and a volatile read of the same field in another thread establishes a happens-before relationship between the two threads and ensures visibility of the write.
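A minimal sketch of that happens-before rule; the class and field names are made up. The plain write to payload comes before the volatile write to ready in the writer thread, so any thread whose volatile read sees ready == true is guaranteed to also see payload == 42.

```java
public class VolatilePublish {
    private int payload;              // plain field
    private volatile boolean ready;   // volatile flag

    public void writer() {
        payload = 42;   // (1) plain write
        ready = true;   // (2) volatile write: publishes (1) along with it
    }

    // Returns null until the writer has run.
    public Integer readerOrNull() {
        if (ready) {          // (3) volatile read that sees (2)
            return payload;   // (4) guaranteed to see 42: (1) happens-before (4)
        }
        return null;
    }
}
```

Without volatile on ready, the reader could spin forever or see ready == true with payload still 0.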
This is a complicated problem, simply using a volatile keyword is probably not a good solution.

I think your understanding of it is correct. Given your description, I would say it should be used. If it's something more than a primitive type, I would rather synchronize.
Good information on volatile:
http://www.javamex.com/tutorials/synchronization_volatile_when.shtml

If you have a mutable object in session, that is trouble. But usually the solution is not to guard individual fields; rather the entire object should be swapped.
Say you have the user object in the session. Most requests simply retrieve it, read it and display it.
There is a request that can modify user information. It would be a really bad idea to retrieve the user object and modify it in place. It's better to create a completely new user object and put it into the session.
In that case, fields in User don't need any protection; thread safety is guaranteed by the session's setAttribute()/getAttribute().
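A sketch of the swap-the-whole-object approach. Because the servlet API is not part of the JDK, a ConcurrentHashMap stands in for HttpSession here (put and get play the roles of setAttribute and getAttribute); the User fields are hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SessionSwap {
    // Immutable user: all fields final, and "changes" return a new instance.
    public static final class User {
        public final String name;
        public final String email;

        public User(String name, String email) {
            this.name = name;
            this.email = email;
        }

        public User withEmail(String newEmail) {
            return new User(name, newEmail);
        }
    }

    // Stand-in for the HttpSession attribute store.
    private final Map<String, Object> session = new ConcurrentHashMap<>();

    public void login(User u) {
        session.put("user", u);
    }

    // Read-only request: fetch and display; never mutates the object.
    public User currentUser() {
        return (User) session.get("user");
    }

    // Modifying request: build a complete new User, then replace the attribute in one put.
    public void changeEmail(String newEmail) {
        User updated = currentUser().withEmail(newEmail);
        session.put("user", updated);
    }
}
```

Two modifying requests running at the same time can still overwrite each other's change (last write wins); swapping whole objects prevents torn or half-visible updates, not lost ones.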

If you have concurrency issues, just adding 'volatile' probably won't help you.
As for keeping the object as an attribute of the session, I'd recommend keeping just the object's ID and using it to retrieve a 'live' instance when you need it (if you use Hibernate, successive retrievals will return the same object, so this shouldn't cause performance problems). Encapsulate all modification logic for this specific object in a single façade, and control concurrency there using database locking.
Or, if you really, really, really want to use memory-based locking, and are really sure that you'll never have two instances of the application running in a cluster, make sure that your façade logic is synchronized at the right level. If your synchronization is too fine-grained (low-level operations, such as volatile variables), it probably won't be enough to make your code thread-safe. For example, java.util.Hashtable is fully synchronized, but that doesn't help if you have logic like this:
01 if (!hashtable.containsKey(key)) {
02 hashtable.put(key, calculate(key));
03 }
If two threads, say, t1 and t2, hit this block at the same time, t1 may execute line 01, then t2 may also execute 01, and then 02, and t1 then will execute 02, overwriting what t2 had done. The operations containsKey() and put() are atomic individually, but what should be atomic is the whole block.
Sometimes recalculating a value doesn't matter, but sometimes it does, and it will break.
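Two sketches of making the whole check-then-act block atomic. The calculate() method and the call counter are hypothetical; ConcurrentHashMap documents that computeIfAbsent() applies the mapping function at most once per key, and the Hashtable variant works because Hashtable's own methods lock on the table instance.

```java
import java.util.Hashtable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class CheckThenAct {
    public static final AtomicInteger calculations = new AtomicInteger();

    // Hypothetical expensive calculation; the counter shows how often it really runs.
    static String calculate(String key) {
        calculations.incrementAndGet();
        return key.toUpperCase();
    }

    // Fix 1: check-and-insert becomes one atomic step per key.
    private static final ConcurrentHashMap<String, String> cache = new ConcurrentHashMap<>();

    public static String get(String key) {
        return cache.computeIfAbsent(key, CheckThenAct::calculate);
    }

    // Fix 2: keep the Hashtable, but hold its lock across containsKey() and put().
    private static final Hashtable<String, String> table = new Hashtable<>();

    public static String getFromTable(String key) {
        synchronized (table) {
            if (!table.containsKey(key)) {
                table.put(key, calculate(key));
            }
            return table.get(key);
        }
    }
}
```

The function passed to computeIfAbsent() should be short and must not modify the same map, since it runs while that key's entry is locked.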
When it comes to concurrency, there's no magic. I mean, some crappy frameworks try to sell you the idea that they solve this problem for you. They don't. Even if it works 99% of the time, it will break spectacularly when you go to production and start to get heavy traffic. Or, (much, much) worse, it will silently generate wrong results.
Concurrency is one of the most complex problems in programming, and the only way to handle it is to avoid it. This whole functional programming trend is not about dealing with concurrency; it is about avoiding it altogether.

It turns out that volatile was not needed in the end. The problem that "appeared" to be fixed with volatile was actually a very subtle timing sensitive bug that was fixed in a much more elegant and proper way ;)
So sbrigdes was correct when he said "simply using a volatile keyword is probably not a good solution."
