Java Unisex bathroom using semaphores & monitors

Java Unisex bathroom using semaphores & monitors - java

I've been set an assignment for concurrent programming, to code a Unisex Toilet. It seems to be a common enough assignment for this subject. For those unfamiliar, the rule are set (in this case at least)
The bathroom can be used by both men and women but not both at the same time.
The most people you can have in the toilet at once are 5.
If you use a semaphore, you must implement it yourself.
I haven’t coded in a long time before this subject, and my knowledge is rusty. I initially coded this up with just counters, and had problems understanding the flow of information though the programme and that version got nowhere!
So I’m starting again, and am looking to know what general way I should go about this. My initial idea is to have 1 toilet, implemented with a binary semaphore, with monitor for the bathroom, limited to 5.
I've also read that the idea behind the problem lends its self best to an implmentation where each person is a thread. If this is the case, it could be messy, as I've tried thread pooling once (few week back) and it never ran for me. :-/
For the moment I’ve no code to show, so an outline as to how things are set up is my biggest concern.

Looks like this problem is already been solved in java by
Cormac Redmond

Related

Saving commonly called properties in variables, in Java?

When I was learning C, I was taught to do stuff, say, if I wanted to loop through a something strlen(string) times, I should save that value in an 'auxiliary' variable, say count and put that in the for condition clause instead of the strlen, as I'd save the need of processing that many times.
Now, when I started learning Java, I noticed this is quite not the norm. I've seen lots of code and programmers doing just what I was told not to do in C.
What's the reason for this? Are they trading efficiency for 'readibility'? Or does the compiler manage to fix that?
E: This is NOT a duplicate to the linked question. I'm not simply asking about string length, mine is a more general question.

In the old times, every function call was expensive, compilers were dumb, usable profilers yet to come, and computers slow. This way C macros and other terrible things were born. Java is not that old.
Efficiency is important, but the impact of most program parts on efficiency is very small. But reading code still needs programmers time and this is much more costly than CPU. So we'd better optimize for readability most of the time and care about speed just in the most important places.
A local variable can make the code simpler, when it avoids repetitions of complicated expressions - this happens sometimes. It can make it faster, when it avoids expensive computation which the compiler can't do - this happens rather rarely. When neither condition is met, it's just a wasted line, so why bother?

Downsides of structuring all multi-threading CSP-like

Disclaimer: I don't know much about the theoretical background of CSP.
Since I read about it, I tend to structure most of my multi-threading "CSP-like", meaning I have threads waiting for jobs on a BlockingQueue.
This works very well and simplified my thinking about threading a lot.
What are the downsides of this approach?
Can you think of situations where I'm performance-wise better off with a synchronized block?
...or Atomics?
If I have many threads mostly sleeping/waiting, is there some kind of performance impact, except the memory they use? For example during scheduling?

This is one possibly way to designing the architecture of your code to prevent thread issues from even happening, this is however not the only one and sometimes not the best one.
First of all you obviously need to have a series of tasks that can be splitted and put into such a queue, which is not always the case if you for example have to calculate the result of a single yet very straining formula, which just cannot be taken apart to utilize multi-threading.
Then there is the issue if the task at hand is so tiny, that creating the task and adding it into the list is already more expensive than the task itself. Example: You need to set a boolean flag on many objects to true. Splittable, but the operation itself is not complex enough to justify a new Runnable for each boolean.
You can of course come up with solutions to work around this sometimes, for example the second example could be made reasonable for your approach by having each thread set 100 flags per execution, but then this is only a workaround.
You should imagine those ideas for threading as what they are: tools to help you solve your problem. So the concurrent framework and patters using those are all together nothing but a big toolbox, but each time you have a task at hand, you need to select one tool out of that box, because in the end putting in a screw with a hammer is possible, but probably not the best solution.
My recommendation to get more familiar with the tools is, that each time you have a problem that involves threading: go through the tools, select the one you think fits best, then experiment with it until you are satisfied that this specific tool fits the specific task best. Prototyping is - after all - another tool in the box. ;)

What are the downsides of this approach?
Not many. A queue may require more overhead than an uncontended lock - a lock of some sort is required internally by the queue classs to protect it from multiple access. Compared with the advantages of thread-pooling and queued comms in general, some extra overhead does not bother me much.
better off with a synchronized block?
Well, if you absolutely MUST share mutable data between threads :(
is there some kind of performance impact,
Not so anyone would notice. A not-ready thread is, effectively, an extra pointer entry in some container in the kernel, (eg. a queue belonging to a semaphore). Not worth bothering about.

You need synchronized blocks, Atomics, and volatiles whenever two or more threads access mutable data. Keep this to a minimum and it needn't affect your design. There are lots of Java API classes that can handle this for you, such as BlockingQueue.
However, you could get into trouble if the nature of your problem/solution is perverse enough. If your threads try to read/modify the same data at the same time, you'll find that most of your threads are waiting for locks and most of your cores are doing nothing. To improve response time you'll have to let a lot more threads run, perhaps forgetting about the queue and letting them all go.
It becomes a trade off. More threads chew up a lot of CPU time, which is okay if you've got it, and speed response time. Fewer threads use less CPU time for a given amount of work (but what will you do with the savings?) and slow your response time.
Key point: In this case you need a lot more running threads than you have cores to keep all your cores busy.
This sort of programming (multithreaded as opposed to parallel) is difficult and (irreproducible) bug prone, so you want to avoid it if you can before you even start to think about performance. Plus, it only helps noticably if you've got more than 2 free cores. And it's only needed for certain sorts of problems. But you did ask for downsides, and it might pay to know this is out there.

Multi-thread state visibility in Java: is there a way to turn the JVM into the worst case scenario?

Suppose our code has 2 threads (A and B) have a reference to the same instance of this class somewhere:
public class MyValueHolder {
private int value = 1;
// ... getter and setter
}
When Thread A does myValueHolder.setValue(7), there is no guarantee that Thread B will ever read that value: myValueHolder.getValue() could - in theory - keep returning 1 forever.
In practice however, the hardware will clear the second level cache sooner or later, so Thread B will read 7 sooner or later (usually sooner).
Is there any way to make the JVM emulate that worst case scenario for which it keeps returning 1 forever for Thread B? That would be very useful to test our multi-threaded code with our existing tests under those circumstances.

jcstress maintainer here. There are multiple ways to answer that question.
The easiest solution would be wrapping the getter in the loop, and let JIT hoist it. This is allowed for non-volatile field reads, and simulates the visibility failure with compiler optimization.
More sophisticated trick involves getting the debug build of OpenJDK, and using -XX:+StressLCM -XX:+StressGCM, effectively doing the instruction scheduling fuzzing. Chances are the load in question will float somewhere you can detect with the regular tests your product has.
I am not sure if there is practical hardware holding the written value long enough opaque to cache coherency, but it is somewhat easy to build the testcase with jcstress. You have to keep in mind that the optimization in (1) can also happen, so we need to employ a trick to prevent that. I think something like this should work.

It would be great to have a Java compiler that would intentionally perform as many weird (but allowed) transfirmations as possible to be able to break thread unsafe code more easily, like Csmith for C. Unfortunately, such a compiler does not exist (as far as I know).
In the meantime, you can try the jcstress library* and exercise your code on several architectures, if possible with weaker memory models (i.e. not x86) to try and break your code:
The Java Concurrency Stress tests (jcstress) is an experimental harness and a suite of tests aid research in the correctness of concurrency support in the JVM, class libraries, and hardware.
But in the end, unfortunately, the only way to prove that a piece of code is 100% correct is code inspection (and I don't know of a static code analysis tool able to detect all race conditions).
*I have not used it and I am unclear which of jcstress and the java-concurrency-torture library is more up to date (I would suspect jcstress).

Not on a real machine, sadly testing multi-threaded code will remain difficult.
As you say, the hardware will clear the second level cache and the JVM has no control over that. The JSL only specifies what must happen and this is a case where B might never see the updated value of value.
The only way to force this to happen on a real machine is to alter the code in such a way to void your testing strategy i.e. you end up testing different code.
However, you might be able to run this on a simulator that simulates hardware that doesn't clear the second level cache. Sounds like a lot of effort though!

I think you are refering to the principle called "false sharing" where different CPUs must synchronize their caches or else face the possibility that data such as you describe could become mismatched. There is a very good article on false sharing on Intel's website. Intel describes some useful tools in their article for diagnosing this problem. This is a relevant quote:
The primary means of avoiding false sharing is through code
inspection. Instances where threads access global or dynamically
allocated shared data structures are potential sources of false
sharing. Note that false sharing can be obscured by the fact that
threads may be accessing completely different global variables that
happen to be relatively close together in memory. Thread-local storage
or local variables can be ruled out as sources of false sharing.
Although methods described in the article are not what you have asked for (forcing worst-case behavior from the JVM), as already stated this isn't really possible. The methods described in this article are the best way I know to try to diagnose and avoid false sharing.
There are other resources addressing this problem around the web. For example, this article has a suggestion for a way to avoid false sharing in Java. I have not tried this method, so I cannot vouch for it, but I think the author's idea is sound. You might consider trying out his suggestion.

I have previously suggested a worst case behaving JVM for test purposes on the memory model list but the idea didn't seem popular.
So how to gain "worst case JVM behaviour" , with existing technology i.e how can I test the scenario in the question and get it to fail EVERY time. You could try to find the setup with the weakest memory model possible but that's unlikely to be perfect.
What I have often considered is using a distributed JVM something similar to how I believe Terracotta works under the cover so your application now runs on multiple JVM's (either remote or local) (threads in the same application run in different instances). In this setup inter JVM thread communication takes place at memory barriers e.g. the synchronized keywords you are missing in bugged code for instance (it conforms to the Java Memory Model) and the application is configured i.e. you say this class thread runs here . No code change required to your tests just configuration, any well ordered java application should run out of the box, however this setup would be very intolerant of a badly ordered application (normally a problem ... now an asset i.e. the Memory model exhibits very weak but legal behavior). In the example above loading the code onto a cluster, if two threads run on different nodes setValue has no effect visible to the other thread unless the code was changed and synchronized, volatile etc etc were used, then the code works as intended.
Now your test for the example above (configured correctly) would fail every time without correct "happens before ordering" which is potentially very useful for tests. The flaw in the plan for complete coverage you would need a potentially a node per application thread (can be same machine or multiple in a cluster) or multiple test runs. If you have 1000's of threads then that could be prohibitive though hopefully they would be pooled and scaled down for E2E test scenarios or run it in a cloud. If nothing else this kind of setup might be useful in demonstrating the issue.
inter thread communication across JVMs

The example you have given is described as Incorrectly Synchronized in http://docs.oracle.com/javase/specs/jls/se7/html/jls-17.html#jls-17.4. I think this is always incorrect and will lead to bugs sooner or later. Most of the times later :-).
To find such incorrectly synchronized code blocks, I use the following algorithm:
Record the threads for all field modifications using instrumentation. If a field is modified by more than one thread without synchronization, I have found a data race.
I implemented this algorithm inside http://vmlens.com, which is a tool to find data races inside java programs.

Here's a simple way: just comment out the code for setValue. You can uncomment it after testing. Since in many cases like this a mechanism is needed to fake failures, it would be a good idea to build a general mechanism for all such cases.

Game loop using threads and synchronization

I've been muddling through the internet and my own code in an attempt to write a game-loop I'm satisfied with (I'm picky).
I implemented DeWitter's loop but decided I didn't like interpolation for many different reasons. I don't find it a practical solution.
Anyway, I would like to create two threads, one for updating and one for rendering. I would regulate their execution with minimum and maximum looping intervals and a call to sleep. Then all I would have to deal with would be synchronization.
Is this a reasonable loop? Any major problems that would arise?
It seems to be the only implementation I can think of so far that would give me all the things I'm looking for.

The idea of running dual threads like this is supposedly to increase performance but in reality implementing it is extremely difficult.
Obviously you can't have both threads accessing the same objects and variables at the same time, so you setup synchronization to make either thread wait it's turn but then why bother having dual threads at all? It's self-defeating.
Unless your game is truly massive and eats cpu like nobody's business, then I'd just execute logic and rendering on a single thread, profile it's performance and fine tune it.
Just my 2 cents.

Debugging visually using >>, >, >|, ||, |<, <, <<

Debugging performance problems using a standard debugger is almost hopeless since the level of detail is too high. Other ways are using a profiler, but they seldom give me good information, especially when there is GUI and background threads involved, as I never know whether the user was actually waiting for the computer, or not. A different way is simply using Control + C and see where in the code it stops.
What I really would like is to have Fast Forward, Play, Pause and Rewind functionality combined with some visual repressentation of the code. This means that I could set the code to run on Fast Forward until I navigate the GUI to the critical spot. Then I set the code to be run in slow mode, while I get some visual repressentation of, which lines of are being executed (possibly some kind of zoomed out view of the code). I could for example set the execution speed to something like 0.0001x. I believe that I would get a very good visualization this way of whether the problem is inside a specific module, or maybe in the communication between modules.
Does this exist? My specific need is in Python, but I would be interested in seeing such functionality in any language.

The "Fast Forward to critical spot" function already exists in any debugger, it's called a "breakpoint". There are indeed debuggers that can slow down execution, but that will not help you debug performance problems, because it doesn't slow down the computer. The processor and disk and memory is still exactly as slow as before, all that happens is that the debugger inserts delays between each line of code. That means that every line of code suddenly take more or less the same time, which means that it hides any trace of where the performance problem is.
The only way to find the performance problems is to record every call done in the application and how long it took. This is what a profiler does. Indeed, using a profiler is tricky, but there probably isn't a better option. In theory you could record every call and the timing of every call, and then play that back and forwards with a rewind, but that would use an astonishing amount of memory, and it wouldn't actually tell you anything more than a profiler does (indeed, it would tell you less, as it would miss certain types of performance problems).
You should be able to, with the profiler, figure out what is taking a long time. Note that this can be both by certain function calls taking a long time because they do a lot of processing, or it can be system calls that take a long time becomes something (network/disk) is slow. Or it can be that a very fast call is called loads and loads of times. A profiler will help you figure this out. But it helps if you can turn the profiler on just at the critical section (reduces noise) and if you can run that critical section many times (improves accuracy).

The methods you're describing, and many of the comments, seem to me to be relatively weak probabilistic attempts to understand the performance impact. Profilers do work perfectly well for GUIs and other idle-thread programs, though it takes a little practice to read them. I think your best bet is there, though -- learn to use the profiler better, that's what it's for.
The specific use you describe would simply be to attach the profiler but don't record yet. Navigate the GUI to the point in question. Hit the profiler record button, do the action, and stop the recording. View the results. Fix. Do it again.

I assume there is a phase in the app's execution that takes too long - i.e. it makes you wait.
I assume what you really want is to see what you could change to make it faster.
A technique that works is random-pausing.
You run the app under the debugger, and in the part of its execution that makes you wait, pause it, and examine the call stack. Do this a few times.
Here are some ways your program could be spending more time than necessary.
I/O that you didn't know about and didn't really need.
Allocating and releasing objects very frequently.
Runaway notifications on data structures.
others too numerous to mention...
No matter what it is, when it is happening, an examination of the call stack will show it.
Once you know what it is, you can find a better way to do it, or maybe not do it at all.
If the program is taking 5 seconds when it could take 1 second, then the probability you will see the problem on each pause is 4/5. In fact, any function call you see on more than one stack sample, if you could avoid doing it, will give you a significant speedup.
AND, nearly every possible bottleneck can be found this way.
Don't think about function timings or how many times they are called. Look for lines of code that show up often on the stack, that you don't need.
Example Added: If you take 5 samples of the stack, and there's a line of code appearing on 2 of them, then it is responsible for about 2/5 = 40% of the time, give or take. You don't know the precise percent, and you don't need to know.
(Technically, on average it is (2+1)/(5+2) = 3/7 = 43%. Not bad, and you know exactly where it is.)

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.