In Java 8, the Collection interface was extended with two methods that return Stream<E>: stream(), which returns a sequential stream, and parallelStream(), which returns a possibly-parallel stream. Stream itself also has a parallel() method that returns an equivalent parallel stream (either mutating the current stream to be parallel or creating a new stream).
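To make the three entry points concrete, here is a minimal sketch that reports the mode of each stream using `isParallel()`. The class name `ParallelFactories` and the sample list are illustrative assumptions, not anything from the JDK or the mailing-list threads.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class ParallelFactories {
    // Reports whether each way of obtaining a stream yields a parallel one.
    static boolean[] modes(List<Integer> source) {
        Stream<Integer> sequential = source.stream();
        Stream<Integer> direct = source.parallelStream();
        Stream<Integer> switched = source.stream().parallel();
        return new boolean[] {
            sequential.isParallel(), direct.isParallel(), switched.isParallel()
        };
    }

    public static void main(String[] args) {
        // For a plain list this prints [false, true, true].
        System.out.println(Arrays.toString(modes(Arrays.asList(1, 2, 3))));
    }
}
```

For the standard list implementations both routes end up parallel; a custom Collection is allowed to return a sequential stream from parallelStream(), which is the source of the confusion described next.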
The duplication has obvious disadvantages:
It's confusing. A question asks whether calling both parallelStream().parallel() is necessary to be sure the stream is parallel, given that parallelStream() may return a sequential stream. Why does parallelStream() exist if it can't make a guarantee? The other way around is also confusing -- if parallelStream() returns a sequential stream, there's probably a reason (e.g., an inherently sequential data structure for which parallel streams are a performance trap); what should Stream.parallel() do for such a stream? (UnsupportedOperationException is not allowed by parallel()'s specification.)
Adding methods to an interface risks conflicts if an existing implementation has a similarly-named method with an incompatible return type. Adding parallelStream() in addition to stream() doubles the risk for little gain. (Note that parallelStream() was at one point just named parallel(), though I don't know if it was renamed to avoid name clashes or for another reason.)
Why does Collection.parallelStream() exist when calling Collection.stream().parallel() does the same thing?
The Javadocs for Collection.(parallelS|s)tream() and Stream itself don't answer the question, so it's off to the mailing lists for the rationale. I went through the lambda-libs-spec-observers archives and found one thread specifically about Collection.parallelStream() and another thread that touched on whether java.util.Arrays should provide parallelStream() to match (or actually, whether it should be removed). There was no once-and-for-all conclusion, so perhaps I've missed something from another list or the matter was settled in private discussion. (Perhaps Brian Goetz, one of the principals of this discussion, can fill in anything missing.)
The participants made their points well, so this answer is mostly just an organization of the relevant quotes, with a few clarifications in [brackets], presented in order of importance (as I interpret it).
parallelStream() covers a very common case
Brian Goetz in the first thread, explaining why Collection.parallelStream() is valuable enough to keep even after other parallel stream factory methods have been removed:
We do not have explicit parallel versions of each of these [stream factories]; we did
originally, and to prune down the API surface area, we cut them on the
theory that dropping 20+ methods from the API was worth the tradeoff of
the surface yuckiness and performance cost of .intRange(...).parallel().
But we did not make that choice with Collection.
We could either remove the Collection.parallelStream(), or we could add
the parallel versions of all the generators, or we could do nothing and
leave it as is. I think all are justifiable on API design grounds.
I kind of like the status quo, despite its inconsistency. Instead of
having 2N stream construction methods, we have N+1 -- but that extra 1
covers a huge number of cases, because it is inherited by every
Collection. So I can justify to myself why having that extra 1 method
is worth it, and why accepting the inconsistency of going no further is
acceptable.
Do others disagree? Is N+1 [Collection.parallelStream() only] the practical choice here? Or should we go
for the purity of N [rely on Stream.parallel()]? Or the convenience and consistency of 2N [parallel versions of all factories]? Or is
there some even better N+3 [Collection.parallelStream() plus other special cases], for some other specially chosen cases we
want to give special support to?
Brian Goetz stands by this position in the later discussion about Arrays.parallelStream():
I still really like Collection.parallelStream; it has huge
discoverability advantages, and offers a pretty big return on API
surface area -- one more method, but provides value in a lot of places,
since Collection will be a really common case of a stream source.
parallelStream() is more performant
Brian Goetz:
Direct version [parallelStream()] is more performant, in that it requires less wrapping (to
turn a stream into a parallel stream, you have to first create the
sequential stream, then transfer ownership of its state into a new
Stream.)
In response to Kevin Bourrillion's skepticism about whether the effect is significant, Brian again:
Depends how seriously you are counting. Doug counts individual object
creations and virtual invocations on the way to a parallel operation,
because until you start forking, you're on the wrong side of Amdahl's
law -- this is all "serial fraction" that happens before you can fork
any work, which pushes your breakeven threshold further out. So getting
the setup path for parallel ops fast is valuable.
Doug Lea follows up, but hedges his position:
People dealing with parallel library support need some attitude
adjustment about such things. On a soon-to-be-typical machine,
every cycle you waste setting up parallelism costs you say 64 cycles.
You would probably have had a different reaction if it required 64
object creations to start a parallel computation.
That said, I'm always completely supportive of forcing implementors
to work harder for the sake of better APIs, so long as the
APIs do not rule out efficient implementation. So if killing
parallelStream is really important, we'll find some way to
turn stream().parallel() into a bit-flip or somesuch.
Indeed, the later discussion about Arrays.parallelStream() takes notice of lower Stream.parallel() cost.
stream().parallel() statefulness complicates the future
At the time of the discussion, switching a stream from sequential to parallel and back could be interleaved with other stream operations. Brian Goetz, on behalf of Doug Lea, explains why sequential/parallel mode switching may complicate future development of the Java platform:
I'll take my best stab at explaining why: because it (like the stateful
methods (sort, distinct, limit)) which you also don't like, move us
incrementally farther from being able to express stream pipelines in
terms of traditional data-parallel constructs, which further constrains
our ability to to map them directly to tomorrow's computing substrate,
whether that be vector processors, FPGAs, GPUs, or whatever we cook up.
Filter-map-reduce map[s] very cleanly to all sorts of parallel computing
substrates; filter-parallel-map-sequential-sorted-limit-parallel-map-uniq-reduce
does not.
So the whole API design here embodies many tensions between making it
easy to express things the user is likely to want to express, and doing
is in a manner that we can predictably make fast with transparent cost
models.
This mode switching was removed after further discussion. In the current version of the library, a stream pipeline is either sequential or parallel; last call to sequential()/parallel() wins. Besides side-stepping the statefulness problem, this change also improved the performance of using parallel() to set up a parallel pipeline from a sequential stream factory.
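The "last call wins" behaviour can be observed with `isParallel()`. This is a small sketch under the assumption of a Java 8 or later runtime; the class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class LastModeWins {
    // parallel() first, sequential() last: the whole pipeline is sequential.
    static boolean parallelThenSequential(List<Integer> source) {
        Stream<Integer> s = source.stream().parallel().map(x -> x * 2).sequential();
        return s.isParallel();
    }

    // sequential() first, parallel() last: the whole pipeline is parallel.
    static boolean sequentialThenParallel(List<Integer> source) {
        Stream<Integer> s = source.stream().sequential().map(x -> x * 2).parallel();
        return s.isParallel();
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(1, 2, 3);
        System.out.println(parallelThenSequential(data)); // false
        System.out.println(sequentialThenParallel(data)); // true
    }
}
```

The mode is a property of the pipeline as a whole, so the map() stage in the first method does not run in parallel even though parallel() appears before it.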
exposing parallelStream() as a first-class citizen improves programmer perception of the library, leading them to write better code
Brian Goetz again, in response to Tim Peierls's argument that Stream.parallel() allows programmers to understand streams sequentially before going parallel:
I have a slightly different viewpoint about the value of this sequential
intuition -- I view the pervasive "sequential expectation" as one if the
biggest challenges of this entire effort; people are constantly
bringing their incorrect sequential bias, which leads them to do stupid
things like using a one-element array as a way to "trick" the "stupid"
compiler into letting them capture a mutable local, or using lambdas as
arguments to map that mutate state that will be used during the
computation (in a non-thread-safe way), and then, when its pointed out
that what they're doing, shrug it off and say "yeah, but I'm not doing
it in parallel."
We've made a lot of design tradeoffs to merge sequential and parallel
streams. The result, I believe, is a clean one and will add to the
library's chances of still being useful in 10+ years, but I don't
particularly like the idea of encouraging people to think this is a
sequential library with some parallel bags nailed on the side.
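The one-element-array trick mentioned in the quote can be contrasted with a side-effect-free reduction. The sketch below is an illustration written for this answer, not code from the mailing list; `SumWithoutSharedState` and its methods are hypothetical names.

```java
import java.util.stream.IntStream;

class SumWithoutSharedState {
    // Anti-pattern from the quote: a one-element array smuggles mutable state
    // into the lambda. The update is not atomic, so a parallel run may lose
    // increments and the result cannot be relied on.
    static int sumWithArrayTrick(int n) {
        int[] total = {0};
        IntStream.rangeClosed(1, n).parallel().forEach(i -> total[0] += i);
        return total[0];
    }

    // Side-effect-free alternative: let the stream perform the reduction.
    static int sumWithReduction(int n) {
        return IntStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        System.out.println(sumWithReduction(1000));  // 500500
        System.out.println(sumWithArrayTrick(1000)); // may differ between runs
    }
}
```

The second method works the same way whether the stream is sequential or parallel, which is the habit the quote argues the API should encourage.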
When objects are locked in languages like C++ and Java, where (on a low-level scale) is this actually performed? I don't think it has anything to do with the CPU/cache or RAM. My best guesstimate is that this occurs somewhere in the OS. Would it be within the same part of the OS that performs context switching?
I am referring to locking objects, synchronizing on method signatures (Java), etc.
Could it be that the answer depends on the particular locking mechanism?
Locking involves a synchronisation primitive, typically a mutex. While naively speaking a mutex is just a boolean flag that says "locked" or "unlocked", the devil is in the detail: The mutex value has to be read, compared and set atomically, so that multiple threads trying for the same mutex don't corrupt its state.
But apart from that, instructions have to be ordered properly so that the effects of a read and write of the mutex variable are visible to the program in the correct order and that no thread inadvertently enters the critical section when it shouldn't because it failed to see the lock update in time.
There are two aspects to memory access ordering: One is done by the compiler, which may choose to reorder statements if that's deemed more efficient. This is relatively trivial to prevent, since the compiler knows when it must be careful. The far more difficult phenomenon is that the CPU itself, internally, may choose to reorder instructions, and it must be prevented from doing so when a mutex variable is being accessed for the purpose of locking. This requires hardware support (e.g. a "lock bit" which causes a pipeline flush and a bus lock).
Finally, if you have multiple physical CPUs, each CPU will have its own cache, and it becomes important that state updates are propagated to all CPU caches before any executing instructions make further progress. This again requires dedicated hardware support.
As you can see, synchronisation is a (potentially) expensive business that really gets in the way of concurrent processing. That, however, is simply the price you pay for having one single block of memory on which multiple independent contexts perform work.
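The "read, compared and set atomically" step described above can be sketched in Java with an AtomicBoolean. This is a toy spin lock for illustration only (the class name `SpinLockSketch` is an assumption); real mutexes also park waiting threads instead of spinning.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.stream.IntStream;

class SpinLockSketch {
    private final AtomicBoolean locked = new AtomicBoolean(false);
    private int counter = 0; // guarded by the spin lock

    void lock() {
        // Atomically flip false -> true; retry while another thread holds it.
        while (!locked.compareAndSet(false, true)) {
            Thread.yield();
        }
    }

    void unlock() {
        // Volatile write: publishes the protected updates to the next owner.
        locked.set(false);
    }

    // Increments the counter from several threads, one lock hold per increment.
    int incrementManyTimes(int times) {
        IntStream.range(0, times).parallel().forEach(i -> {
            lock();
            try {
                counter++;
            } finally {
                unlock();
            }
        });
        return counter;
    }

    public static void main(String[] args) {
        System.out.println(new SpinLockSketch().incrementManyTimes(10_000));
    }
}
```

The compareAndSet call is where the hardware support comes in: the JVM typically compiles it to an atomic compare-and-swap instruction with the ordering guarantees discussed above.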
There is no concept of object locking in C++. You will typically implement your own on top of OS-specific functions or use synchronization primitives provided by libraries (e.g. boost::scoped_lock). If you have access to C++11, you can use the locks provided by the standard threading library, which has an interface similar to Boost's.
In Java the same is done for you by the JVM.
Every java.lang.Object has a monitor built into it. That's what the synchronized keyword locks on. JDK 5 added the java.util.concurrent packages, which give you more fine-grained choices.
This has a nice explanation:
http://www.artima.com/insidejvm/ed2/threadsynch.html
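As a rough side-by-side of the two options mentioned in this answer, the sketch below guards one counter with the object's built-in monitor and another with a ReentrantLock from java.util.concurrent. The class `MonitorVersusLock` is a hypothetical example, not taken from the linked article.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.stream.IntStream;

class MonitorVersusLock {
    private int viaMonitor = 0;
    private int viaLock = 0;
    private final ReentrantLock lock = new ReentrantLock();

    // synchronized acquires the monitor of 'this' on entry and releases it on exit.
    synchronized void incrementWithMonitor() {
        viaMonitor++;
    }

    // Explicit lock: acquire, then always release in finally.
    void incrementWithLock() {
        lock.lock();
        try {
            viaLock++;
        } finally {
            lock.unlock();
        }
    }

    // Drives both counters from several threads and returns the final values.
    int[] run(int times) {
        IntStream.range(0, times).parallel().forEach(i -> {
            incrementWithMonitor();
            incrementWithLock();
        });
        return new int[] { viaMonitor, viaLock };
    }

    public static void main(String[] args) {
        int[] result = new MonitorVersusLock().run(5_000);
        System.out.println(result[0] + " " + result[1]);
    }
}
```

The explicit lock is more verbose, but it offers options the monitor lacks, such as tryLock() and interruptible acquisition.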
I haven't written C++ in a long time, so I can't speak to how to do it in that language. It wasn't supported by the language when I last wrote it. I believe it was all 3rd party libraries or custom code.
It does depend on the particular locking mechanism, typically a semaphore, but you cannot be sure, since it is implementation dependent.
All architectures I know of use an atomic Compare And Swap to implement their synchronization primitives. See, for example, AbstractQueuedSynchronizer, which was used in some JDK versions to implement Semaphore and ReentrantLock.
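A compare-and-swap retry loop looks roughly like the following. This sketch uses AtomicInteger to track a running maximum without any lock; `CasMax` is an illustrative name, and this is not how AbstractQueuedSynchronizer itself is written.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class CasMax {
    private final AtomicInteger max = new AtomicInteger(Integer.MIN_VALUE);

    // Lock-free update: read the current value, and publish the candidate
    // only if no other thread changed the value in the meantime.
    void offer(int candidate) {
        int current;
        do {
            current = max.get();
            if (candidate <= current) {
                return; // nothing to update
            }
        } while (!max.compareAndSet(current, candidate));
    }

    // Offers 0..n-1 from several threads and returns the observed maximum.
    static int maxOf(int n) {
        CasMax tracker = new CasMax();
        IntStream.range(0, n).parallel().forEach(tracker::offer);
        return tracker.max.get();
    }

    public static void main(String[] args) {
        System.out.println(maxOf(10_000)); // 9999
    }
}
```

If the compareAndSet fails, another thread won the race, so the loop re-reads the value and tries again.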
I am working on a scientific application that has readily separable parts that can proceed in parallel. So, I've written those parts to each run as independent threads, though not for what appears to be the standard reason for separating things into threads (i.e., not blocking some quit command or the like).
A few questions:
Does this actually buy me anything on standard multi-core desktops - i.e., will the threads actually run on the separate cores if I have a current JVM, or do I have to do something else?
I have a few objects which are read (though never written) by all the threads. Potential problems with that? Solutions to those problems?
For actual clusters, can you recommend frameworks to distribute the threads to the various nodes so that I don't have to manage that myself (well, if such exist)? CLARIFICATION: by this, I mean either something that automatically converts threads into task for individual nodes or makes the entire cluster look like a single JVM (i.e., so it could send threads to whatever processors it can access) or whatever. Basically, implement the parallelization in a useful way on a cluster, given that I've built it into the algorithm, with the minimal job husbandry on my part.
Bonus: Most of the evaluation consists of set comparisons (e.g., union, intersection, contains) with some mapping from keys to get the pertinent sets. I have some limited experience with FORTRAN, C, and C++ (semester of scientific computing for the first, and HS AP classes 10 years ago for the other two) - what sort of speed/ease of parallelization gains might I find if I tied my Java front-end to an algorithmic back-end in one of those languages, and what sort of headache might my level of experience find implementing those operations in those languages?
Yes, using independent threads will use multiple cores in a normal JVM, without you having to do any work.
If anything is only ever read, it should be fine to be read by multiple threads. If you can make the objects in question immutable (to guarantee they'll never be changed), that's even better.
I'm not sure what sort of clustering you're considering, but you might want to look at Hadoop. Note that distributed computing distributes tasks rather than threads (normally, anyway).
Multi-core Usage
Java runtimes conventionally schedule threads to run concurrently on all available processors and cores. I think it's possible to restrict this, but it would take extra work; by default, there is no restriction.
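As a minimal sketch of separable parts running on all cores, the example below splits a summation into one task per available processor using an ExecutorService. The workload and the name `IndependentParts` are placeholders for the asker's real computation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class IndependentParts {
    // Sums 0..n-1 by splitting the range into one chunk per available core.
    static long parallelSum(int n) {
        int workers = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int from = (int) ((long) n * w / workers);
                final int to = (int) ((long) n * (w + 1) / workers);
                parts.add(pool.submit(() -> {
                    long sum = 0;
                    for (int i = from; i < to; i++) {
                        sum += i;
                    }
                    return sum;
                }));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // waits for that chunk to finish
            }
            return total;
        } catch (Exception e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelSum(1000)); // 499500
    }
}
```

The operating system decides which core each pool thread runs on; nothing in the code pins threads to cores.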
Immutable Objects
For read-only objects, declare their member fields as final, which will ensure that they are assigned when the object is created and never changed. If a field is not final, even if it is never changed after construction, there can be some "visibility" issues in a multi-threaded program. This could result in the assignments made by one thread never becoming visible to another.
Any mutable fields that are accessed by multiple threads should be declared volatile, be protected by synchronization, or use some other concurrency mechanism to ensure that changes are consistent and visible among threads.
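A read-only object of the kind described here might look like the sketch below: a final field, a defensive copy, and set operations that return new sets instead of modifying shared state. `ReadOnlyIndex` is a hypothetical class chosen to match the set comparisons in the question.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class ReadOnlyIndex {
    // final: guaranteed visible to other threads once construction completes
    private final Set<String> keys;

    ReadOnlyIndex(Set<String> keys) {
        // Defensive copy so later changes to the caller's set cannot leak in.
        this.keys = Collections.unmodifiableSet(new HashSet<>(keys));
    }

    boolean contains(String key) {
        return keys.contains(key);
    }

    // Returns a fresh set; the shared state is never modified.
    Set<String> intersection(Set<String> other) {
        Set<String> result = new HashSet<>(keys);
        result.retainAll(other);
        return result;
    }

    public static void main(String[] args) {
        ReadOnlyIndex index = new ReadOnlyIndex(new HashSet<>(Arrays.asList("a", "b", "c")));
        System.out.println(index.intersection(new HashSet<>(Arrays.asList("b", "c", "d"))));
    }
}
```

Any number of threads can call contains() and intersection() on the same instance without synchronization, because nothing reachable from it ever changes.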
Distributed Computing
The most widely used framework for distributed processing of this nature in Java is called Hadoop. It uses a paradigm called map-reduce.
Native Code Integration
Integrating with other languages is unlikely to be worthwhile. Because of its adaptive bytecode-to-native compiler, Java is already extremely fast on a wide range of computing tasks. It would be wrong to assume that another language is faster without actual testing. Also, integrating with "native" code using JNI is extremely tedious, error-prone, and complicated; using simpler interfaces like JNA is very slow and would quickly erase any performance gains.
As some people have said, the answers are:
Threads on cores - Yes. Java has had support for native threads for a long time. Most OSes have provided kernel threads which automagically get scheduled to any CPUs you have (implementation performance may vary by OS).
The simple answer is that it will be safe in general. The more complex answer is that you have to ensure that your Object is actually created & initialized before any threads can access it. This is solved in one of two ways:
Let the class loader solve the problem for you using a Singleton (and lazy class loading):
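public class MyImmutableObject
{
    // The nested holder class is loaded, and the instance created, on the
    // first call to getInstance(); class initialization is thread-safe.
    private static class MyImmutableObjectInstance {
        private static final MyImmutableObject instance = new MyImmutableObject();
    }

    // Must be static so callers can obtain the instance without already having one.
    public static MyImmutableObject getInstance() {
        return MyImmutableObjectInstance.instance;
    }
}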
Explicitly using acquire/release semantics to ensure a consistent memory model:
MyImmutableObject foo = null;
volatile boolean objectReady = false;

// initializer thread:
....
// create & initialize object for use by multiple threads
foo = new MyImmutableObject();
foo.initialize();
// release barrier: the volatile write publishes the writes made above
objectReady = true;
// start worker threads

public void run() {
    // acquire barrier: the volatile read pairs with the write above
    if (!objectReady)
        throw new IllegalStateException("Memory model violation");
    // start using immutable object foo
}
I don't recall off the top of my head how you can exploit the memory model of Java to perform the latter case. I believe, if I remember correctly, that a write to a volatile variable is equivalent to a release barrier, while a read from a volatile variable is equivalent to an acquire barrier. Also, the reason for making the boolean volatile as opposed to the object is that access of a volatile variable is more expensive due to the memory model constraints - thus, the boolean allows you to enforce the memory model & then the object access can be done much faster within the thread.
As mentioned, there's all sorts of RPC mechanisms. There's also RMI which is a native approach for running code on remote targets. There's also frameworks like Hadoop which offer a more complete solution which might be more appropriate.
For calling native code, it's pretty ugly - Sun really discourages use by making JNI an ugly complicated mess, but it is possible. I know that there was at least one commercial Java framework for loading & executing native dynamic libraries without needing to worry about JNI (not sure if there are any free or OSS projects).
Good luck.
If I create classes that are currently used only in a single thread, should I make them thread-safe even if I don't need that at the moment? It could happen that I later use such a class in multiple threads, and then I could get race conditions and have a hard time finding them because I didn't make the class thread-safe in the first place. Or should I leave the class non-thread-safe, for better performance? But premature optimization is evil.
Asked differently: Should I make my classes thread-safe only when needed (if they are used in multiple threads, otherwise not), or should I make them thread-safe up front and optimize only when needed (if I see that synchronization eats up a significant part of the processing time)?
Whichever of the two ways I choose, are there methods to reduce the disadvantages? Or is there a third possibility that I should use?
EDIT: Here is the reason this question came to my mind. At our company we have written a very simple user-management component that writes its data into property files. I used it in a web app, and after some work on it I got strange errors: the user management forgot properties of users (including name and password) and roles. That was very annoying but not consistently reproducible, so I think it was a race condition. After I synchronized all methods reading from and writing to disk, the problem disappeared. So I thought that I could probably have avoided all the hassle if we had written the class with synchronization in the first place.
EDIT 2: Looking over the tips of The Pragmatic Programmer, I saw tip #41: Always Design for Concurrency. This doesn't say that all code should be thread-safe, but it says the design should have concurrency in mind.
I used to try to make everything thread-safe - then I realised that the very meaning of "thread-safe" depends on the usage. You often just can't predict that usage, and the caller will have to take action anyway to use it in a thread-safe way.
These days I write almost everything assuming single threading, and put threading knowledge in the select few places where it matters.
Having said that, I do also (where appropriate) create immutable types, which are naturally amenable to multi-threading - as well as being easier to reason about in general.
Start from the data. Decide which data is explicitly shared and protect it. If at all possible, encapsulate the locking with the data. Use pre-existing thread-safe concurrent collections.
Whenever possible, use immutable objects. Make attributes final, set their values in the constructors. If you need to "change" the data consider returning a new instance. Immutable objects don't need locking.
For objects that are not shared or thread-confined, do not spend time making them thread-safe.
Document the expectations in the code. The JCIP annotations are the best pre-defined choice available.
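The advice above about final attributes and "returning a new instance" could be sketched as follows. `Reading` and its fields are invented for illustration.

```java
final class Reading {
    private final String station;
    private final double celsius;

    Reading(String station, double celsius) {
        this.station = station;
        this.celsius = celsius;
    }

    String station() {
        return station;
    }

    double celsius() {
        return celsius;
    }

    // "Changing" the value produces a new object; the original is untouched,
    // so threads holding the old instance never see a half-updated state.
    Reading withCelsius(double newCelsius) {
        return new Reading(station, newCelsius);
    }

    public static void main(String[] args) {
        Reading before = new Reading("station-1", 20.0);
        Reading after = before.withCelsius(25.0);
        System.out.println(before.celsius() + " -> " + after.celsius());
    }
}
```

Because no instance ever changes after construction, none of its methods need locking.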
Follow the principle of "as simple as possible, but no simpler." Absent a requirement, you should not make them thread-safe. Doing so would be speculative, and likely unnecessary. Thread-safe programming adds much more complexity to your classes, and will likely make them less performant due to synchronization tasks.
Unless explicitly stated that an object is thread-safe, the expectation is that it is not.
I personally would only design classes that are "thread-safe" when needed - on the principle of optimise only when needed. Sun seem to have gone the same way with the example of single threaded collections classes.
However there are some good principles that will help you either way if you decide to change:
Most important: THINK BEFORE YOU SYNCHRONIZE. I had a colleague once who used to synchronize stuff "just in case - after all synchronized must be better, right?" This is WRONG, and was a cause of multiple deadlock bugs.
If your Objects can be immutable, make them immutable. This will not only help with threading, it will also help them be safely used in Sets, as keys for Maps, etc.
Keep your Objects as simple as possible. Each one should ideally only do one job. If you ever find you might want to synchronise access to half the members, then you possibly should split the Object in two.
Learn java.util.concurrent and use it whenever possible. Their code will be better, faster and safer than yours (or mine) in 99% of cases.
Read Concurrent Programming in Java, it's great!
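As one example of leaning on java.util.concurrent instead of hand-written synchronization, the sketch below counts events from several threads with ConcurrentHashMap.merge, which performs the read-modify-write atomically. The class name `EventTally` and the even/odd keys are placeholders.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.stream.IntStream;

class EventTally {
    // Counts even and odd event numbers from several threads at once.
    static ConcurrentMap<String, Integer> tally(int events) {
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        IntStream.range(0, events).parallel().forEach(i -> {
            String key = (i % 2 == 0) ? "even" : "odd";
            // merge() is atomic on ConcurrentHashMap: no explicit lock needed.
            counts.merge(key, 1, Integer::sum);
        });
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(tally(1000));
    }
}
```

Writing the same logic with get() followed by put() would reintroduce a race even on a concurrent map, which is why the single atomic call matters.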
Just as a side remark: Synchronization != Thread-safety. Even if you do not modify data concurrently, you might still read it concurrently. So keep the Java Memory Model in mind: synchronization also means making data reliably visible to all threads, not only protecting it from concurrent modification.
And yes, in my opinion thread-safety has to be built in right from the beginning, and it depends on the application logic whether you need to handle concurrency. Never assume anything; even if your tests seem to be fine, race conditions are sleeping dogs.
I found the JCIP annotations very useful to declare which classes are thread-safe. My team annotates our classes as @ThreadSafe, @NotThreadSafe or @Immutable. This is much clearer than having to read Javadoc, and FindBugs helps us find violations of the @Immutable and @GuardedBy contracts too.
You should absolutely know which segments of your code will be multi-threaded and which won't.
Without being able to concentrate the area of multithreadedness into a small, controllable section, you will not succeed. The parts of your app that are multi-threaded need to be gone over carefully, fully analyzed, understood and adapted for a multi-threaded environment.
The rest does not and therefore making it thread-safe would be a waste.
For instance, with the swing GUI, Sun just decided that none of it would be multi-threaded.
Oh, and if someone else uses your classes in a threaded section, it's up to them to make that use threadsafe.
Sun initially came out with threadsafe collections (only). The problem is that a threadsafe class cannot be made un-threadsafe (for performance purposes). So they later came out with un-threadsafe versions, plus wrappers to make them threadsafe. For most cases the wrappers are unnecessary: assume that, unless you are creating the threads yourself, your class does not have to be threadsafe, but DOCUMENT it in the Javadocs.
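For reference, the wrappers mentioned above live in java.util.Collections. Here is a small sketch (the class name WrapperDemo is invented) of wrapping a plain ArrayList only at the point where it actually gets shared between threads:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class WrapperDemo {
    // Lets `threads` threads append `perThread` elements each to one shared list,
    // then counts the elements by iterating over it.
    static int demo(int threads, int perThread) {
        // ArrayList alone is not thread-safe; the wrapper locks around each call.
        List<Integer> shared = Collections.synchronizedList(new ArrayList<>());

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    shared.add(i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }

        // The wrapper does not lock during iteration; its Javadoc requires the
        // caller to synchronize on the wrapper while iterating.
        int count = 0;
        synchronized (shared) {
            for (Integer ignored : shared) {
                count++;
            }
        }
        return count;
    }
}
```

Code that never leaves a single thread can keep using the bare ArrayList and pay no locking cost.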
If I create classes that are used at the moment only in a single thread, should I make them thread-safe?
It is not necessary for a class used by a thread to be itself thread-safe for the program as a whole to be thread-safe. You can safely share objects of non "thread safe" classes between threads if they are protected by appropriate synchronization. So, there is no need to make a class itself thread-safe until the need becomes apparent.
However, multi-threading is a fundamental (architectural) choice in a program. It is not really something to add as an afterthought. So you should know right from the start which classes need to be thread safe.
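To make the "protected by appropriate synchronization" part concrete, here is a minimal sketch with invented names (Tally, ExternalLockDemo). The Tally class has no locking of its own. The code that shares it owns a lock and holds it around every access.

```java
// Hypothetical class with no synchronization at all; fine for single-threaded use.
final class Tally {
    private int total;

    void add(int n) { total += n; }

    int total() { return total; }
}

final class ExternalLockDemo {
    // Shares one Tally between `threads` threads. Safety comes from the callers
    // always holding `lock` while touching the Tally, not from Tally itself.
    static int demo(int threads, int perThread) {
        Tally tally = new Tally();
        Object lock = new Object();

        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    synchronized (lock) {
                        tally.add(1);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        synchronized (lock) {
            return tally.total();
        }
    }
}
```

The catch is that every piece of code touching the shared object has to agree on the same lock, which is one reason to decide early which parts of the program are multi-threaded.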
Here's my personal approach:
Make objects and data structures immutable wherever you can. That is good practice in general, and is automatically thread safe. Problem solved.
If you have to make an object mutable then normally don't bother trying to make it thread safe. The reasoning for this is simple: when you have mutable state then locking / control cannot be safely handled by a single class. Even if you synchronize all the methods, this doesn't guarantee thread safety. And if you add synchronisation to an object that only ever gets used in a single-threaded context, then you've just added unnecessary overhead. So you might as well leave it up to the caller / user to implement whatever locking system is necessary.
If you provide a higher level public API then implement whatever locking is required to make your API thread safe. For higher level functionality the overhead of thread safety is pretty trivial, and your users will definitely thank you. An API with complicated concurrency semantics that the users need to work around is not a good API!
This approach has served me well over time: you may need to make the occasional exception but on average it's a very good place to start!
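The claim that synchronizing every method doesn't guarantee thread safety is easiest to see with a check-then-act sequence. The sketch below uses invented names (SyncRegistry, CheckThenActDemo). Each method is individually synchronized, but the "add if absent" operation spans two calls, so only the caller can make it atomic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class in which every method is synchronized.
final class SyncRegistry {
    private final List<String> names = new ArrayList<>();

    synchronized boolean contains(String name) { return names.contains(name); }

    synchronized void add(String name) { names.add(name); }

    synchronized int size() { return names.size(); }
}

final class CheckThenActDemo {
    // Still racy: another thread may add the name between contains() and add(),
    // so duplicates are possible even though each call is synchronized.
    static void addIfAbsentUnsafe(SyncRegistry registry, String name) {
        if (!registry.contains(name)) {
            registry.add(name);
        }
    }

    // The caller holds one lock across the whole compound action. Here it is the
    // registry's own monitor, which the synchronized methods re-enter.
    static void addIfAbsentSafe(SyncRegistry registry, String name) {
        synchronized (registry) {
            if (!registry.contains(name)) {
                registry.add(name);
            }
        }
    }

    // All threads try to register the same name via the safe variant.
    static int demo(int threads) {
        SyncRegistry registry = new SyncRegistry();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> addIfAbsentSafe(registry, "alice"));
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return registry.size();
    }
}
```

With the safe variant the registry ends up with exactly one entry however many threads race. With the unsafe variant it usually does too, which is exactly why this kind of bug slips through testing.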
If you want to follow what Sun did in the Java API, you can take a look at the collection classes. Many common collection classes are not thread-safe, but have thread-safe counterparts. According to Jon Skeet (see comments), many of the Java classes were originally thread-safe, but they were not benefiting developers, so some classes now have two versions - one being thread-safe and the other not thread-safe.
My advice is to not make the code thread-safe until you have to, as there is some overhead involved with thread-safety. I guess this falls into the same category as optimization - don't do it before you have to.
Design the classes that are used from multiple threads separately, and document the other ones as being for use from a single thread only.
Single threaded ones are much easier to work with.
Separating the multithreaded logic helps to make the synchronization correct.
"Always" is a very dangerous word in software development... choices like this are "always" situational.
To avoid race conditions, lock on only one object. Read descriptions of race conditions closely and you will discover that cross-locks ("race condition" is a misnomer, since the race comes to a halt there) are always a consequence of two or more threads trying to lock on two or more objects.
Make all methods synchronized and do testing - for any real world app that actually has to deal with the issues sync is a small cost. What they don't tell you is that the whole thing does lockout on 16 bit pointer tables ... at that point you are uh,...
Just keep your burger flippin resume' current.