Game loop using threads and synchronization

I've been muddling through the internet and my own code in an attempt to write a game-loop I'm satisfied with (I'm picky).
I implemented DeWitter's loop but decided I didn't like interpolation for many different reasons. I don't find it a practical solution.
Anyway, I would like to create two threads, one for updating and one for rendering. I would regulate their execution with minimum and maximum looping intervals and a call to sleep. Then all I would have to deal with would be synchronization.
Is this a reasonable loop? Any major problems that would arise?
It seems to be the only implementation I can think of so far that would give me all the things I'm looking for.

The idea of running dual threads like this is supposedly to increase performance but in reality implementing it is extremely difficult.
Obviously you can't have both threads accessing the same objects and variables at the same time, so you set up synchronization to make each thread wait its turn. But then why bother having two threads at all? It's self-defeating.
Unless your game is truly massive and eats CPU like nobody's business, I'd just execute logic and rendering on a single thread, profile its performance and fine-tune it.
Just my 2 cents.
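For what it's worth, a single-threaded loop along those lines might look something like the sketch below. The class name SingleThreadLoop, the run signature and the 16 ms frame budget are all made up for illustration; real update and render code would replace the two Runnables.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical single-threaded loop: update, render, then sleep off the rest of the frame. */
final class SingleThreadLoop {
    /**
     * Runs exactly {@code frames} iterations, each lasting at least {@code frameNanos}.
     * Returns early if the thread is interrupted.
     */
    static void run(int frames, long frameNanos, Runnable update, Runnable render) {
        for (int i = 0; i < frames; i++) {
            long start = System.nanoTime();
            update.run();   // game logic
            render.run();   // drawing, on the same thread, so no locking is needed
            long remaining = frameNanos - (System.nanoTime() - start);
            if (remaining > 0) {
                try {
                    TimeUnit.NANOSECONDS.sleep(remaining);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return;
                }
            }
        }
    }

    public static void main(String[] args) {
        int[] counts = new int[2];
        run(3, TimeUnit.MILLISECONDS.toNanos(16), () -> counts[0]++, () -> counts[1]++);
        System.out.println(counts[0] + " updates, " + counts[1] + " renders");
    }
}
```

Because update and render never overlap, nothing here needs to be synchronized. Profiling this first shows whether a second thread would buy anything.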

Related

Downsides of structuring all multi-threading CSP-like

Disclaimer: I don't know much about the theoretical background of CSP.
Since I read about it, I tend to structure most of my multi-threading "CSP-like", meaning I have threads waiting for jobs on a BlockingQueue.
This works very well and simplified my thinking about threading a lot.
What are the downsides of this approach?
Can you think of situations where I'm performance-wise better off with a synchronized block?
...or Atomics?
If I have many threads mostly sleeping/waiting, is there some kind of performance impact, except the memory they use? For example during scheduling?

This is one possible way to design the architecture of your code so that threading issues never arise. It is not the only one, however, and sometimes not the best one.
First of all, you obviously need a series of tasks that can be split up and put into such a queue. That is not always the case, for example if you have to calculate the result of a single but very demanding formula which simply cannot be taken apart to use multi-threading.
Then there is the issue of tasks so tiny that creating the task and adding it to the queue is already more expensive than the task itself. Example: you need to set a boolean flag to true on many objects. Splittable, but the operation itself is not complex enough to justify a new Runnable for each boolean.
You can of course sometimes come up with solutions to work around this. The second example could be made reasonable for your approach by having each thread set 100 flags per execution, but this is only a workaround.
You should see these threading ideas for what they are: tools to help you solve your problem. The concurrency framework and the patterns built on it are together nothing but a big toolbox, and each time you have a task at hand you need to select one tool out of that box, because in the end putting in a screw with a hammer is possible, but probably not the best solution.
My recommendation for getting more familiar with the tools is that each time you have a problem that involves threading, you go through the tools, select the one you think fits best, then experiment with it until you are satisfied that this specific tool fits the specific task best. Prototyping is, after all, another tool in the box. ;)
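To make the "100 flags per execution" workaround concrete, here is a rough sketch. BatchedFlags, setAll and the chunk size are names and numbers picked for illustration, not anything from the question. It submits one task per chunk instead of one Runnable per flag.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class BatchedFlags {
    /** Sets every flag to true, submitting one task per chunk rather than one per flag. */
    static void setAll(boolean[] flags, int chunkSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Void>> tasks = new ArrayList<>();
            for (int start = 0; start < flags.length; start += chunkSize) {
                final int from = start;
                final int to = Math.min(start + chunkSize, flags.length);
                tasks.add(() -> {
                    for (int i = from; i < to; i++) {
                        flags[i] = true; // chunks never overlap, so no locking is needed
                    }
                    return null;
                });
            }
            pool.invokeAll(tasks); // blocks until every chunk has been processed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
    }
}
```

For work this cheap a plain loop on one thread will usually still win, which is the point the answer is making.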

What are the downsides of this approach?
Not many. A queue may have more overhead than an uncontended lock, since the queue class needs a lock of some sort internally to protect it from concurrent access. Compared with the advantages of thread pooling and queued comms in general, some extra overhead does not bother me much.
better off with a synchronized block?
Well, if you absolutely MUST share mutable data between threads :(
is there some kind of performance impact,
Not so anyone would notice. A not-ready thread is, effectively, an extra pointer entry in some container in the kernel, (eg. a queue belonging to a semaphore). Not worth bothering about.

You need synchronized blocks, Atomics, and volatiles whenever two or more threads access mutable data. Keep this to a minimum and it needn't affect your design. There are lots of Java API classes that can handle this for you, such as BlockingQueue.
However, you could get into trouble if the nature of your problem/solution is perverse enough. If your threads try to read/modify the same data at the same time, you'll find that most of your threads are waiting for locks and most of your cores are doing nothing. To improve response time you'll have to let a lot more threads run, perhaps forgetting about the queue and letting them all go.
It becomes a trade off. More threads chew up a lot of CPU time, which is okay if you've got it, and speed response time. Fewer threads use less CPU time for a given amount of work (but what will you do with the savings?) and slow your response time.
Key point: In this case you need a lot more running threads than you have cores to keep all your cores busy.
This sort of programming (multithreaded as opposed to parallel) is difficult and prone to irreproducible bugs, so you want to avoid it if you can before you even start to think about performance. Plus, it only helps noticeably if you've got more than 2 free cores. And it's only needed for certain sorts of problems. But you did ask for downsides, and it might pay to know this is out there.
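As a small illustration of the first point in this answer (shared mutable data needs an atomic or a lock), here is a sketch with one counter of each kind. SharedCounter and hammer are hypothetical names chosen for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class SharedCounter {
    private final AtomicInteger atomic = new AtomicInteger();
    private int guarded; // only touched while holding the lock on 'this'

    void incrementAtomic() { atomic.incrementAndGet(); }
    synchronized void incrementGuarded() { guarded++; }
    int atomicValue() { return atomic.get(); }
    synchronized int guardedValue() { return guarded; }

    /** Starts 'threads' threads that each increment both counters 'perThread' times. */
    static SharedCounter hammer(int threads, int perThread) {
        SharedCounter counter = new SharedCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.incrementAtomic();
                    counter.incrementGuarded();
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter;
    }
}
```

Both counters end up with the full total. A plain unsynchronized int field incremented the same way could lose updates.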

Simple Multi-Threading in Java

Currently, I'm running on a thread-less model that isn't working simply because I'm running out of memory before I can process the data I'm being handed. I've made all the changes that I can to optimize the code, and it's still just not quite quick enough.
Clearly I should move on to a threaded model. I'm wondering what the simplest, easiest way to do the following is:
The main thread passes some info to the worker
That worker performs some work that I'll refactor out of the main method
The workers will disappear and new ones will be instantiated when needed
I've never worked with java threading and from what I've read up on it seems pretty complicated, even if what I'm looking for seems pretty simple.

If you have multiple independent units of work of equal priority, the best solution is generally some sort of work queue, where a limited number of threads (the number chosen to optimize performance) sit in a while(true) loop dequeuing work units from the queue and executing them.
Generally the optimum number of threads is going to be the number of processors +/- 1, though in some cases a larger number will be optimal if the threads tend to get stalled by disk I/O requests or some such.
But keep in mind that tuning the entire system may be required. Eg, you may need more disk arms, and certainly more RAM may be required.
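A bare-bones version of that work queue could look like the following. Treat it as a sketch: WorkQueueDemo, process and the STOP sentinel are invented for the example, and the "work" is just bumping a counter.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class WorkQueueDemo {
    /** Sentinel ("poison pill") that tells a worker to leave its loop. */
    private static final Runnable STOP = () -> { };

    /** Runs 'jobs' trivial work units on 'threads' workers; returns how many completed. */
    static int process(int jobs, int threads) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        AtomicInteger done = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    while (true) {
                        Runnable job = queue.take(); // blocks while the queue is empty
                        if (job == STOP) {
                            return;
                        }
                        job.run();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[t].start();
        }
        for (int i = 0; i < jobs; i++) {
            queue.add(done::incrementAndGet); // stand-in for a real work unit
        }
        for (int t = 0; t < threads; t++) {
            queue.add(STOP); // one sentinel per worker
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return done.get();
    }
}
```

In practice an ExecutorService gives you the same structure without hand-written worker loops.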

I'd start by having a read through Java Concurrency as a refresher ;)
In particular, I would spend some time getting to know the Executors API, as it will do most of what you've described without a lot of the overhead of dealing with too many locks ;)
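A minimal sketch of what that could look like, assuming the work can be expressed as independent tasks. ExecutorSketch and sumOfSquares are hypothetical names, and squaring a number stands in for whatever gets refactored out of the main method.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

final class ExecutorSketch {
    /** Hands each input to the pool; returns the sum of the squares the workers computed. */
    static long sumOfSquares(int[] inputs) {
        int threads = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong total = new AtomicLong();
        for (int value : inputs) {
            // The main thread passes 'value' to a worker; the pool reuses its threads.
            pool.execute(() -> total.addAndGet((long) value * value));
        }
        pool.shutdown(); // accept no new tasks; tasks already queued still run
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
            }
        } catch (InterruptedException e) {
            pool.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return total.get();
    }
}
```

The pool takes care of creating, reusing and retiring worker threads, which covers the "workers disappear and new ones are instantiated" part of the question.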

Distributing the memory consumption to multiple threads will not change overall memory consumption. From what I read out of your question, I would like to step forward and tell you: Increase the heap of the Java engine, this will help. Looks like you have to optimize the Java startup parameters and not your code. If I am wrong, then you will have to buffer the data. To Disk! Not to a thread in the same memory model.

Java threading objects

I've created an array of 1000 objects, each running in its own thread, so that means 1000 threads are created. Each object holds a socket and 9 more global variables. The whole object consists of 1000 lines of code.
I'm looking for ways to make the program efficient because it lags. CPU use is at 100% every time I start the program.
I understand that I'm going to have to change the way the program works, but I can't find a good way. Can anyone explain how to achieve this?

It depends on what your threads actually do - are the tasks primarily using CPU or other resources? For CPU intensive tasks, the best strategy is to run as many threads as you have cores, or a few more. For threads which are blocking a lot on e.g. reading files, waiting for the net etc. you can have many more threads than CPUs.
It also depends on how many cores the system has. Obviously the answer is very different for a single processor machine than for a 128-way multiprocessor. The above rules of thumb can give you some estimates, but it is best to make experiments yourself based on these, to figure out the ideal number of threads for your specific setup.
Moreover, since Java5, it is always advisable to use e.g. a ThreadPoolExecutor instead of creating your threads manually. This makes your app both more robust and more flexible.

1/ use thread pool
2/ use futures
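Spelled out a little, "thread pool plus futures" might look like this sketch. FutureSketch and lengths are made-up names, and taking the length of a string stands in for the per-socket work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FutureSketch {
    /** Computes the length of each message on a pool of 'poolSize' threads. */
    static List<Integer> lengths(List<String> messages, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (String message : messages) {
                Callable<Integer> task = message::length; // stand-in for real work
                pending.add(pool.submit(task));
            }
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : pending) {
                results.add(future.get()); // blocks until that task has finished
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A handful of pool threads can serve many tasks this way, instead of one thread per object.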

You should consider refactoring your usage of threads.
1000 threads normally make no sense on a normal machine/server, although your problem seems to be I/O-heavy. You should consider the number of CPU threads that are available.
A possible solution would be to use a dispatcher that passes the handling (and possible responding) to a request on the socket into a queue of a ThreadPoolExecutor.

From my experience, 1000 threads are just too many (at least on 8-core/8 GB RAM machines). A common symptom is context-switch thrashing, where your OS is just busy jumping from thread to thread while doing little useful work (and a lot of memory is wasted, etc.).
If you have to maintain 1000 sockets, you probably have to go for NIO. An easier way out would be closing/opening sockets every time (whether you can do this depends on the characteristics of your work).
The way you solve this many-threads problem is to use a thread pool, as others note. Instead of extending Thread, code a Runnable. This is easier said than done, though, because you have to maintain state if you need a conversation. This commonly involves a ConcurrentMap. I personally tend to put a Handler (which implements Runnable) on this map that should run when the counterparty returns a response (the response contains a key every time). In this case you'd be closing the socket every time. If you use NIO, it's more like coding with Threads in the sense that you don't need to identify the counterparty like this, but it has its own complexity.
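Roughly what I mean by the handler map, as a sketch. PendingReplies, expect and onReply are names invented here, and the String key is an assumption about what the response carries.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.Executor;

final class PendingReplies {
    private final ConcurrentMap<String, Runnable> handlers = new ConcurrentHashMap<>();
    private final Executor pool; // in real use, a ThreadPoolExecutor

    PendingReplies(Executor pool) {
        this.pool = pool;
    }

    /** Remember what to run when the reply carrying 'key' arrives. */
    void expect(String key, Runnable handler) {
        handlers.put(key, handler);
    }

    /** Called when a reply arrives; returns false if no handler was waiting for the key. */
    boolean onReply(String key) {
        Runnable handler = handlers.remove(key); // atomic, so at most one caller gets the handler
        if (handler == null) {
            return false;
        }
        pool.execute(handler);
        return true;
    }

    int pending() {
        return handlers.size();
    }
}
```

The conversation state lives in the map rather than in a parked thread, so no thread is tied up while waiting for the other side.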

How can I make sure N threads run at roughly the same speed?

I'm toying with the idea of writing a physics simulation software in which each physical element would be simulated in its own thread.
There would be several advantages to this approach. It would be conceptually very close to how the real world works. It would be much easier to scale the system to multiple machines.
However, for this to work I need to make sure that all threads run at the same speed, with a rather liberal interpretation of 'same'. Say within 1% of each other.
That's why I don't necessarily need a Thread.join()-like solution. I don't want some uber-controlling school mistress that ensures all threads regularly synchronize with each other. I just need to be able to ask the runtime (whichever it is; it could be Java, Erlang, or whatever is most appropriate for this problem) to run the threads at a more or less equal speed.
Any suggestions would be extremely appreciated.
UPDATE 2009-03-16
I wanted to thank everyone who answered this question, in particular all those whose answer was essentially "DON'T DO THIS". I understand my problem much better now thanks to everybody's comments and I am less sure I should continue as I originally planned. Nevertheless I felt that Peter's answer was the best answer to the question itself, which is why I accepted it.

You can't really do this without coordination. What if one element ended up needing cheaper calculations than another (in a potentially non-obvious way)?
You don't necessarily need an uber-controller - you could just keep some sort of step counter per thread, and have a global counter indicating the "slowest" thread. (When each thread has done some work, it would have to check whether it had fallen behind the others, and update the counter if so.) If a thread notices it's a long way ahead of the slowest thread, it could just wait briefly (potentially on a monitor).
Just do this every so often to avoid having too much overhead due to shared data contention and I think it could work reasonably well.
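Something along these lines, as a rough sketch. PaceKeeper and its methods are hypothetical, and for simplicity it finds the slowest thread by scanning the per-thread counters rather than maintaining a separate global counter.

```java
import java.util.concurrent.atomic.AtomicLongArray;
import java.util.concurrent.locks.LockSupport;

final class PaceKeeper {
    private final AtomicLongArray steps; // one step counter per thread
    private final long maxLead;          // how far ahead of the slowest a thread may get

    PaceKeeper(int threads, long maxLead) {
        this.steps = new AtomicLongArray(threads);
        this.maxLead = maxLead;
    }

    /** Step count of the slowest thread. */
    long slowest() {
        long min = Long.MAX_VALUE;
        for (int i = 0; i < steps.length(); i++) {
            min = Math.min(min, steps.get(i));
        }
        return min;
    }

    boolean tooFarAhead(int id) {
        return steps.get(id) - slowest() > maxLead;
    }

    long stepsOf(int id) {
        return steps.get(id);
    }

    /** Record one finished step for thread 'id', then pause briefly while it is too far ahead. */
    void stepDone(int id) {
        steps.incrementAndGet(id);
        while (tooFarAhead(id)) {
            LockSupport.parkNanos(100_000L); // back off for roughly 0.1 ms
        }
    }

    /** Runs 'threads' workers for 'stepsEach' steps each; returns the final step counts. */
    static long[] runAll(int threads, int stepsEach, long maxLead) {
        PaceKeeper pace = new PaceKeeper(threads, maxLead);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int s = 0; s < stepsEach; s++) {
                    pace.stepDone(id); // the real simulation work would go before this call
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        long[] result = new long[threads];
        for (int t = 0; t < threads; t++) {
            result[t] = pace.stepsOf(t);
        }
        return result;
    }
}
```

Calling stepDone only every so many steps, rather than after each one, would cut the contention on the shared counters as suggested above.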

You'll need some kind of synchronization. CyclicBarrier class has what you need:
"A synchronization aid that allows a set of threads to all wait for each other to reach a common barrier point. CyclicBarriers are useful in programs involving a fixed sized party of threads that must occasionally wait for each other. The barrier is called cyclic because it can be re-used after the waiting threads are released."
After each 'tick', you can make all your threads wait for the slower ones. When the remaining threads reach the barrier, they all continue.
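A minimal sketch of the per-tick barrier, assuming a fixed number of worker threads. BarrierTicks and simulate are names made up for the example, and the actual physics step is left as a comment.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.atomic.AtomicInteger;

final class BarrierTicks {
    /** Runs 'threads' workers for 'ticks' ticks; returns how many ticks all of them completed. */
    static int simulate(int threads, int ticks) {
        AtomicInteger completedTicks = new AtomicInteger();
        // The barrier action runs once per tick, after the last worker has arrived.
        CyclicBarrier barrier = new CyclicBarrier(threads, completedTicks::incrementAndGet);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                try {
                    for (int tick = 0; tick < ticks; tick++) {
                        // ... advance this element by one time step here ...
                        barrier.await(); // wait for the slower workers
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } catch (BrokenBarrierException e) {
                    // Another worker failed or was interrupted; give up on this run.
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return completedTicks.get();
    }
}
```

No worker can start tick n+1 before every worker has finished tick n, so the threads never drift apart by more than one tick.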

Threads are meant to run completely independent of each other, which means synchronizing them in any way is always a pain. In your case, you need a central "clock" because there is no way to tell the VM that each thread should get the same amount of ... uh ... what should it get? The same amount of RAM? Probably doesn't matter. The same amount of CPU? Are all your objects so similar that each needs the same number of assembler instructions?
So my suggestion is to use a central clock which broadcasts clock ticks to every process. All threads within each process read the ticks (which should be absolute), calculate the difference to the last tick they saw and then update their internal model accordingly.
When a thread is done updating, it must put itself to sleep; waiting for the next tick. In Java, use wait() on the "tick received" lock and wake all threads with "notifyAll()".

I'd recommend not using threads wherever possible because they just add problems later if you're not careful. When doing physics simulations you could use hundreds of thousands of discrete objects for larger simulations. You can't possibly create this many threads on any OS that I know of, and even if you could it would perform like shit!
In your case you could create a number of threads, and put an event loop in each thread. A 'master' thread could sequence the execution and post a 'process' event to each worker thread to wake it up and make it do some work. In that way the threads will sleep until you tell them to work.
You should be able to get the master thread to tick at a rate that allows all your worker threads to complete before the next tick.
I don't think threads are the answer to your problem, with the exception of parallelising into a small number of worker threads (equal to the number of cores in the machine) which each linearly sequence a series of physical objects. You could still use the master/event-driven approach this way, but you would remove a lot of the overhead.
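Here is a rough sketch of that last variant: a small pool where each worker walks its own slice of the objects in order, and the master drives the ticks. PartitionedSim, the array layout and the trivial position update are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class PartitionedSim {
    /** Advances every position by velocity * dt for 'ticks' ticks, using 'workers' threads. */
    static void run(double[] pos, double[] vel, double dt, int ticks, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            int chunk = (pos.length + workers - 1) / workers; // ceiling division
            List<Callable<Void>> slices = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int from = Math.min(w * chunk, pos.length);
                final int to = Math.min(from + chunk, pos.length);
                slices.add(() -> {
                    for (int i = from; i < to; i++) {
                        pos[i] += vel[i] * dt; // each worker sequences its own slice
                    }
                    return null;
                });
            }
            for (int t = 0; t < ticks; t++) {
                pool.invokeAll(slices); // the master waits here until the tick is complete
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdown();
        }
    }
}
```

A typical call would pass Runtime.getRuntime().availableProcessors() as the worker count. Objects that interact with each other would need more care than this independent update shows.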

Please don't. Threads are an O/S abstraction permitting the appearance of parallel execution. With multiple and multicore CPU's, the O/S can (but need not) distribute threads among the different cores.
The closest thing to your scalability vision which I see as workable is to use worker threads, dimensioned to roughly match the number of cores you have, and distribute work among them. A rough draft: define a class ActionTick which does the updating for one particle, and let the worker thread pick ActionTicks to process from a shared queue. I see several challenges even with such a solution.
Threading overheads: you get context switching overhead among different worker threads. Threads by themselves are expensive (if not actually as ruinous as processes): test performance with different thread pool sizes. Adding more threads beyond the number of cores tends to reduce performance!
Synchronization costs: you get several spots of contention: access to the work queue for one, but worse, access to the simulated world. You need to delimit the effects of each ActionTick or implement a lot of locking/unlocking.
Difficulty of optimizing the physics. You want to delimit the number of objects/particles each ActionTick looks at (distance cut-off? 3D-tree subdivision of the simulation space?). Depending on the simulation domain, you may be able to eliminate a lot of work by examining whether any change is even needed in a subset of items. Doing these kinds of optimizations is easier before queueing work items, rather than as a distributed algorithm. But then that part of your simulation becomes a potential scalability bottleneck.
Complexity. Threading and concurrency introduce several cans of worms to a solution. Always consider other options first -- but if you need them, try threads before creating your own work item scheduling, locking and execution strategies...
Caveat: I haven't worked with any massive simulation software, just some hobbyist code.

As you mention, there are many "DON'T DO THIS" answers. Most seem to read threads as OS threads used by Java. Since you mentioned Erlang in your post, I'd like to post a more Erlang-centered answer.
Modeling this kind of simulation with processes (or actors, micro threads, green threads, as they are sometimes called) doesn't necessarily need any synchronization. In essence, we have a couple of (most likely thousands or hundreds of thousands) physics objects that need to be simulated. We want to simulate these objects as realistically as possible, but there is probably also some kind of real time aspect involved (doesn't have to be though, you don't mention this in your question).
A simple solution would be to spawn an Erlang process for each object, send ticks to all of them and collect the results of the simulation before proceeding with the next tick. This is in practice synchronizing everything. It is of course more of a deterministic solution and does not guarantee any real-time properties. It is also non-trivial how the processes would talk to each other to get the data they need for the calculations. You probably need to group them in clever ways (collision groups etc.), have hibernated processes (which Erlang has neat support for) for sleeping objects, etc. to speed things up.
To get real time properties you probably need to restrain the calculations performed by the processes (trading accuracy for speed). This could perhaps be done by sending out ticks without waiting for answers, and letting the object processes reply back to each tick with their current position and other data you need (even though it might only be approximated at the time). As DJClayworth says, this could lead to errors accumulating in the simulation.
I guess in one sense, the question is really about if it is possible to use the strength of concurrency to gain some kind of advantage here. If you need synchronization, it is a quite strong sign that you do not need concurrency between each physics object. Because you essentially throw away a lot of computation time by waiting for other processes. You might use concurrency during calculation but that is another discussion, I think.
Note: none of these ideas take the actual physics calculations into account. This is not Erlang's strong side and could perhaps be performed in a C library or whatever strikes your fancy, depending on the type of characteristics you want.
Note: I do not know of any case where this has been done (especially not by me), so I cannot guarantee that this is sound advice.

Even with perfect software, hardware will prevent you doing this. Hardware threads typically don't have fair performance. Over a short period, you are lucky if threads run within +-10% performance.
There are, of course, outliers. Some chipsets will run some cores in power-saving mode and others not. I believe one of the Blue Gene research machines had software-controlled scheduling of hardware threads instead of locks.

Erlang will by default try and spread its processes evenly over the available threads. It will also by default try to run threads on all available processors. So if you have enough runnable Erlang processes then you will get a relatively even balance.

I'm not a threading expert, but isn't the whole point of threads that they are independent from each other - and non-deterministic?

I think you have a fundamental misconception in your question where you say:
It would be conceptually very close to how the real world works
The real world does not work in a thread-like way at all. Threads in most machines are not independent and not actually even simultaneous (the OS will use context-switching instead). They provide the most value when there is a lot of IO or waiting occurring.
Most importantly, the real-world does not "consume more resources" as more complex things happen. Think of the difference between two objects falling from a height, one falling smoothly and the other performing some kind of complex tumbling motion...

I would make a kind of "clock generator" - and would register every new object/thread there. The clock will notify all registered objects when the delta-t has passed.
However this does not mean you need a separate thread for every object. Ideally you will have as many threads as processors.
From a design point of view, you could separate the execution of the object tasks through an Executor or a thread pool, e.g. when an object receives the tick event, it goes to a thread pool and schedules itself for execution.

Two things have to happen in order to achieve this. You have to ensure that you have an equal number of threads per CPU core, and you need some kind of synchronization.
That sync can be rather simple, like checking "cycle-done" variable for each thread while performing computation, but you can't avoid it.

Working on motor control, I have used some math to keep velocity at a stable state.
The system has PID control: proportional, integral and derivative. But that is an analog/digital system. Maybe you can use something similar to determine how much time each thread must run, but the biggest tip I can give you is that all the threads will need clock synchronization.

I'm the first to admit I'm not a threading expert, but this sounds like a very wrong way to approach simulation. As others have already commented, having too many threads is computationally expensive. Furthermore, if you are planning to do what I think you are thinking of doing, your simulation may turn out to produce random results (may not matter if you are making a game).
I'd go with a few worker threads used to calculate discrete steps of the simulation.

How expensive is Java Locking?

In general, how expensive is locking in Java?
Specifically in my case: I have a multi-threaded app in which there is one main loop that takes objects off a DelayQueue and processes them (using poll()). At some point a different thread will have to remove errant elements from the queue (using remove()).
Given that the remove() is relatively uncommon, I am worried that locking on each poll() will result in slow code. Are my worries justified?

They are not justified unless you profile your app and find that this is a bottleneck.
Generally speaking, uncontended locks (i.e. locks that most of the time don't have to wait for someone else to release them) have become a lot cheaper with some changes in Java 5 and Java 6.
Implement it safe and simple and profile if it's fast enough.
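For reference, the "safe and simple" version is just the plain DelayQueue calls, which already lock internally. In the sketch below, DelayedJob and DelayQueueSketch are hypothetical names, and the zero delays are only there so the example does not have to wait.

```java
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

final class DelayedJob implements Delayed {
    final String name;
    private final long readyAtNanos;

    DelayedJob(String name, long delayMillis) {
        this.name = name;
        this.readyAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
    }

    @Override
    public long getDelay(TimeUnit unit) {
        return unit.convert(readyAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
    }

    @Override
    public int compareTo(Delayed other) {
        return Long.compare(getDelay(TimeUnit.NANOSECONDS), other.getDelay(TimeUnit.NANOSECONDS));
    }
}

final class DelayQueueSketch {
    public static void main(String[] args) {
        DelayQueue<DelayedJob> queue = new DelayQueue<>();
        DelayedJob errant = new DelayedJob("errant", 0);
        queue.add(new DelayedJob("good", 0));
        queue.add(errant);
        queue.remove(errant);            // the rare removal, normally done by another thread
        DelayedJob next = queue.poll();  // non-blocking; null if nothing has expired yet
        System.out.println(next == null ? "nothing ready" : next.name);
    }
}
```

Both poll() and remove() go through the queue's own lock, so no extra synchronization is needed around them. Whether that lock ever shows up in a profile is the thing to measure.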

Have you taken some measurements and found that locking is too slow? No? Then it isn’t.
Honestly, though: too many people worry about too many irrelevant things. Get your code working before you worry about things like whether “++i” is faster than “i++” or similar stuff.
