Understanding Unit of Work


I'm reading J. Bloch's Effective Java and now I'm at the section about executors. He says that we should prefer executors to using Threads directly. As far as I understand, the primary reason for that is:
The key abstraction is no longer Thread , which served as both the
unit of work and the mechanism for executing it. Now the unit of work
and mechanism are separate. The key abstraction is the unit of
work, which is called a task.
It's not quite clear what "unit of work" means here. I searched for it and found that there's a design pattern of that name related to database operations. But how does that tie in with Threads? Or is there another interpretation of this pattern?

It's purposefully nebulous: it's just "a thing you want done," and the more precise meaning is up to you.
If you want to download a file, that's a unit of work. If you want to compute a hash of some big chunk of data, that's a unit of work. If you want to draw something on the screen, that's a unit of work.
What this blurb is getting at is that this unit of work used to be tied directly to a Java thread (via the Thread class), which is in turn tied relatively directly to the OS's threads (some hand-waving there :) ). A more modern approach is to define the work as a task, and then give a bunch of tasks to a Thread whose life cycle is longer than any of those tasks. That thread then executes those tasks. This lets you more explicitly manage Thread resources, which are relatively heavyweight.
A rough analogy would be to hire a new employee for every task you want done (write this spec, or make some coffee, or fix this bug) vs hiring just a few employees and giving them small tasks as needed.
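As a rough illustration of that separation, here is a minimal sketch in which each unit of work is a Callable and a small fixed pool of threads does the executing. The class name UnitOfWorkDemo, the method runAll and the pool size of 2 are assumptions made up for this example, not anything from Bloch's text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class UnitOfWorkDemo {

    // Runs every task (unit of work) on a small pool of reusable threads and
    // returns the results in the order the tasks were given.
    static List<Integer> runAll(List<Callable<Integer>> tasks) {
        ExecutorService pool = Executors.newFixedThreadPool(2); // two long-lived workers
        try {
            List<Integer> results = new ArrayList<>();
            for (Future<Integer> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a task failed", e.getCause());
        } finally {
            pool.shutdown(); // the pool, not the task, owns the thread life cycle
        }
    }

    public static void main(String[] args) {
        // Three independent units of work; none of them mentions Thread.
        List<Callable<Integer>> tasks = new ArrayList<>();
        tasks.add(() -> "download".length());
        tasks.add(() -> 6 * 7);
        tasks.add(() -> Math.max(3, 9));
        System.out.println(runAll(tasks));
    }
}
```

The tasks only describe what should be done; how many threads exist and when they die is decided in one place.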

This question is similar to this SO post (q.v. Jon Skeet's highly upvoted answer), but I will give an answer anyway. There are two ways to create a thread in Java. The first way is to extend the Thread class directly, and the second is to implement the Runnable interface. These two options are what Bloch is generally arguing over.
If you choose to extend the Thread class directly, then you are not free to extend any other class (since Java does not allow multiple inheritance of classes). This is a design limitation. On the other hand, if you implement the Runnable interface (which declares a single method, run()), then you are still free to extend any other class you wish.
Philosophically, using the interface is generally considered better design, because it simply marks a class as something which can be run by a thread, without explicitly saying how it will be run.
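The two options can be put side by side in a small sketch. All names here (RunnableVersusThread, CountingThread, BaseJob, CountingJob) are hypothetical and exist only for illustration; the Runnable version is handed to an executor to show that it does not dictate how it is run.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class RunnableVersusThread {

    // Option 1: the work is baked into a Thread subclass, so the class
    // cannot extend anything else.
    static class CountingThread extends Thread {
        private final AtomicInteger counter;
        CountingThread(AtomicInteger counter) { this.counter = counter; }
        @Override public void run() { counter.incrementAndGet(); }
    }

    // Option 2: the work implements Runnable and is still free to extend
    // another class (BaseJob is a made-up superclass).
    static class BaseJob {
        String name() { return "job"; }
    }

    static class CountingJob extends BaseJob implements Runnable {
        private final AtomicInteger counter;
        CountingJob(AtomicInteger counter) { this.counter = counter; }
        @Override public void run() { counter.incrementAndGet(); }
    }

    // Runs the same increment once via each option and returns the total.
    static int runBothWays() {
        AtomicInteger counter = new AtomicInteger();
        Thread thread = new CountingThread(counter);
        thread.start();

        ExecutorService pool = Executors.newSingleThreadExecutor();
        pool.execute(new CountingJob(counter)); // the job does not care who runs it
        pool.shutdown();
        try {
            thread.join();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runBothWays());
    }
}
```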

The simplest (and so not quite correct) approach is to understand the unit of work as the execution of a method (procedure, function). In any case, a unit of work is always represented as a method.

Related

Is it safe to let "this" escape in the last statement of a constructor if it has happens-before guarantees?

A common piece of advice in Java is not to let the "this" reference escape during construction of an object, and therefore not to start any threads in a constructor. But I find myself writing a lot of classes that should start a thread using an executor. According to the common advice, I should write an extra start() method that submits the task to the executor.
But submitting a task to an executor gives happens-before guarantees as documented here. So would it be fine to submit the task in the last statement of the constructor? Or the more general question: is it safe to let "this" escape in the last statement of a constructor if that statement provides happens-before guarantees?

The Answer by Stefan Feuerhahn is correct.
I’ll add the suggestion that embedding an executor service within the class performing the work can be a “code smell”, an indication of weak design.
Generally we want to follow the single responsibility principle in our designs. A class should have a single purpose, and should try not to stray from that narrow specific purpose.
If, for example, a class were written to create a report, that class should know only about that report. That class should not know about when that report should be run, or how often to run the report, or what other code cares about if the report has been run.
Such scheduling of when to run the report is tied to the lifecycle of the app. For one important thing, the executor service must eventually be shut down when no longer needed or when the app is exiting. Otherwise the backing thread pool may continue indefinitely like a zombie 🧟. Your report-generating class should not know about when it is no longer needed, nor should it know about when or why the app is exiting.
Another aspect of the issue is that configuring an executor service involves knowing about the deployment scenario. How much RAM, how many CPU cores, how much other burden on that host machine, all contribute to decisions about how to set up the executor service(s). Your report-generating code should not have to change because of changes to your deployment situation.
The report-generating class should not know anything about the calling app's lifecycle, nor anything about the executor service. The report-generating class should know nothing more than how to generate that one report. Some other place in your code, perhaps some report manager class or your app's lifecycle orchestration code, should handle how often and when to run the report.
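A minimal sketch of that split, assuming a made-up SalesReport task and a made-up runReport method standing in for the "report manager" or lifecycle code. The report knows only how to produce its text; the caller creates, sizes and shuts down the executor.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ReportDemo {

    // Knows only how to build one report. No executor, no scheduling.
    static final class SalesReport implements Callable<String> {
        private final int unitsSold;

        SalesReport(int unitsSold) { this.unitsSold = unitsSold; }

        @Override
        public String call() { return "Units sold: " + unitsSold; }
    }

    // Hypothetical orchestration code: it decides where the report runs
    // and is responsible for shutting the executor down.
    static String runReport(Callable<String> report) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            return executor.submit(report).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for report", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("report failed", e.getCause());
        } finally {
            executor.shutdown(); // otherwise the pool thread could linger
        }
    }

    public static void main(String[] args) {
        System.out.println(runReport(new SalesReport(3)));
    }
}
```

Changing the deployment (a bigger pool, a scheduled executor) would then touch runReport only, not SalesReport.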

Yes, this is safe, because the statement providing happens-before guarantees will make sure all fields are correctly initialized and visible to other threads. One caveat is that a subclass could ruin this safety, so it's better to make the class final. But, as Holger pointed out, even then an additional constructor delegating to the one that started the thread could harm safety.
The general advice "don't let this escape from the constructor" exists mainly because it is easier, and thus less error-prone, to follow this rule than to keep all the nuances in mind (like subclassing).
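A small sketch of the case described above, assuming a made-up Doubler class: the class is final, it has a single constructor, and handing "this" to the executor is the very last statement, after every field has been assigned. The names Doubler, EscapeDemo and sink are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// final: no subclass constructor can still be running after `this` escapes.
final class Doubler implements Runnable {
    private final int value;
    private final AtomicInteger sink;

    // Only constructor, so no other constructor can delegate to it and
    // continue initializing after the task was submitted.
    Doubler(int value, AtomicInteger sink, ExecutorService executor) {
        this.value = value;
        this.sink = sink;
        // Last statement: everything above happens-before the task runs,
        // as documented for Executor/ExecutorService submission.
        executor.execute(this);
    }

    @Override
    public void run() {
        sink.set(value * 2);
    }
}

class EscapeDemo {
    static int demo() {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicInteger sink = new AtomicInteger();
        new Doubler(21, sink, executor);
        executor.shutdown();
        try {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sink.get();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

A static factory method that constructs the object first and submits it afterwards would avoid the question entirely, at the cost of one extra method.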

Why does Java have no async/await?

Using async/await it is possible to code asynchronous functions in an imperative style. This can greatly facilitate asynchronous programming. After it was first introduced in C#, it was adopted by many languages such as JavaScript, Python, and Kotlin.
EA Async is a library that adds async/await like functionality to Java. The library abstracts away the complexity of working with CompletableFutures.
But why has async/await neither been added to Java SE, nor are there any plans to add it in the future?

The short answer is that the designers of Java try to eliminate the need for asynchronous methods instead of facilitating their use.
According to Ron Pressler's talk asynchronous programming using CompletableFuture causes three main problems.
1. Branching or looping over the results of asynchronous method calls is not possible.
2. Stack traces cannot be used to identify the source of errors, and profiling becomes impossible.
3. It is viral: all methods that do asynchronous calls have to be asynchronous as well, i.e. synchronous and asynchronous worlds don't mix.
While async/await solves the first problem it can only partially solve the second problem and does not solve the third problem at all (e.g. all methods in C# doing an await have to be marked as async).
But why is asynchronous programming needed at all? Only to prevent the blocking of threads, because threads are expensive. Thus, instead of introducing async/await in Java, in Project Loom the Java designers are working on virtual threads (aka fibers/lightweight threads), which aim to significantly reduce the cost of threads and thus eliminate the need for asynchronous programming. This would also make all three problems above obsolete.
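To make the contrast concrete, here is a sketch that sums prices for a list of items in both styles. The priceOf lookup is a made-up stand-in for a slow remote call, and the class and method names are assumptions. In the CompletableFuture version the loop can only build a chain of callbacks and the return type changes; the blocking version is an ordinary loop, which is the style cheap threads are meant to make affordable.

```java
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;

class AsyncVersusBlocking {

    // Hypothetical lookup; imagine a network call here.
    static int priceOf(String item) {
        return item.length();
    }

    static CompletableFuture<Integer> priceOfAsync(String item) {
        return CompletableFuture.supplyAsync(() -> priceOf(item));
    }

    // Asynchronous style: each step is composed onto the previous future,
    // and the caller gets a future back (the "viral" part).
    static CompletableFuture<Integer> totalAsync(List<String> items) {
        CompletableFuture<Integer> total = CompletableFuture.completedFuture(0);
        for (String item : items) {
            total = total.thenCompose(sum -> priceOfAsync(item).thenApply(price -> sum + price));
        }
        return total;
    }

    // Blocking style: plain control flow and a plain int result. It is only
    // cheap if the thread that runs it is cheap to block.
    static int totalBlocking(List<String> items) {
        int sum = 0;
        for (String item : items) {
            sum += priceOf(item);
        }
        return sum;
    }

    public static void main(String[] args) {
        List<String> items = Arrays.asList("tea", "coffee");
        System.out.println(totalAsync(items).join());
        System.out.println(totalBlocking(items));
    }
}
```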

Better late than never!!!
Java is 10+ years late in trying to come up with lighter-weight units of execution which can be executed in parallel. As a side note, Project Loom also aims to expose 'delimited continuations' in Java, which, I believe, is nothing more than the good old 'yield' keyword of C# (again almost 20 years late!!)
Java does recognize the need to solve the bigger problem that async/await addresses (or, really, that Tasks address in C#, which is the big idea; async/await is more of a syntactic sugar: a highly significant improvement, but still not a necessity for solving the actual problem of OS-mapped threads being heavier than desired).
Look at the proposal for project loom here: https://cr.openjdk.java.net/~rpressler/loom/Loom-Proposal.html
and navigate to last section 'Other Approaches'. You will see why Java does not want to introduce async/await.
Having said this, I don't really agree with the reasoning being provided. Neither in this proposal nor in Stephan's answer.
First let us diagnose Stephan's answer
1. async/await solves point 1 mentioned there. (Stephan also acknowledges it further down the answer.)
2. It is extra work for sure on the part of the framework and tools, but not at all on the part of the programmers. Even with async/await, .NET debuggers are pretty good in this aspect.
3. This I only partially agree with. The whole purpose of async/await is to elegantly mix the asynchronous world with synchronous constructs. But yes, you either need to declare the caller as async too or deal directly with Task in the caller routine. However, Project Loom will not solve it in a meaningful way either. To fully benefit from lightweight virtual threads, even the caller routine must be executing on a virtual thread. Otherwise what's the benefit? You will end up blocking an OS-backed thread!!! Hence even virtual threads need to be 'viral' in the code. On the contrary, it will be easier in Java to not notice that the routine you are calling is async and will block the calling thread (which will be concerning if the calling routine is itself not executing on a virtual thread). The async keyword in C# makes the intent very clear and forces you to decide (it is possible in C# to block as well, if you want, by asking for Task.Result. Most of the time the calling routine can just as easily be async itself).
Stephan is right when he says async programming is needed to prevent blocking of (OS) threads, as (OS) threads are expensive. And that's precisely the whole reason why virtual threads (or C# tasks) are needed. You should be able to 'block' on these tasks without losing sleep. Of course, to not lose sleep, either the calling routine itself should be a task or the blocking should be on non-blocking IO, with the framework being smart enough to not block the calling thread in that case (the power of continuations).
C# supports this and proposed Java feature aims to support this.
According to the proposed Java api, blocking on virtual thread will require calling vThread.join() method in Java.
How is it really more beneficial than calling await workDoneByVThread()?
Now let us look at project loom proposal reasoning
Continuations and fibers dominate async/await in the sense that async/await is easily implemented with continuations (in fact, it can be implemented with a weak form of delimited continuations known as stackless continuations, that don't capture an entire call-stack but only the local context of a single subroutine), but not vice-versa
I simply don't understand this statement. If someone does, please let me know in the comments.
For me, async/await is implemented using continuations, and as far as the stack trace is concerned, since the fibers/virtual threads/tasks live within the virtual machine, it must be possible to manage that aspect. In fact, .NET tools do manage that.
While async/await makes code simpler and gives it the appearance of normal, sequential code, like asynchronous code it still requires significant changes to existing code, explicit support in libraries, and does not interoperate well with synchronous code
I have already covered this. Not making significant changes to existing code and no explicit support in libraries will actually mean not using this feature effectively. Until and unless Java is aiming to transparently transform all the threads to virtual threads, which it can't and isn't, this statement does not make sense to me.
As a core idea, I find no real difference between Java virtual threads and C# tasks. To the point that project loom is also aiming for work-stealing scheduler as default, same as the scheduler used by .Net by default (https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.taskscheduler?view=net-5.0, scroll to last remarks section ).
Only debate it seems is on what syntax should be adopted to consume these.
C# adopted
A distinct class and interface as compared to existing threads
Very helpful syntactical sugar for marrying async with sync
Java is aiming for:
Same familiar interface of Java Thread
No special constructs apart from try-with-resources support for ExecutorService so that the result for submitted tasks/virtual threads can be automatically waited for (thus blocking the calling thread, virtual/non-virtual).
IMHO, Java's choices are worse than those of C#. Having a separate interface and class actually makes it very clear that the behavior is a lot different. Retaining same old interface can lead to subtle bugs when a programmer does not realize that she is now dealing with something different or when a library implementation changes to take advantage of the new constructs but ends up blocking the calling (non-virtual) thread.
Also no special language syntax means that reading async code will remain difficult to understand and reason about (I don't know why Java thinks programmers are in love with Java's Thread syntax and they will be thrilled to know that instead of writing sync looking code they will be using the lovely Thread class)
Heck, even Javascript now has async await (with all its 'single-threadedness').

I have released a new project, JAsync, which implements async/await-style programming in Java and uses Reactor as its low-level framework. It is in the alpha stage, and I need more suggestions and test cases.
This project makes the developer's asynchronous programming experience as close as possible to the usual synchronous programming, including both coding and debugging.
I think my project solves point 1 mentioned by Stephan.
Here is an example:
@RestController
@RequestMapping("/employees")
public class MyRestController {
    @Inject
    private EmployeeRepository employeeRepository;
    @Inject
    private SalaryRepository salaryRepository;

    // The standard JAsync async method must be annotated with the Async annotation, and return a JPromise object.
    @Async()
    private JPromise<Double> _getEmployeeTotalSalaryByDepartment(String department) {
        double money = 0.0;
        // A Mono object can be transformed to the JPromise object. So we get a Mono object first.
        Mono<List<Employee>> empsMono = employeeRepository.findEmployeeByDepartment(department);
        // Transform the Mono object to the JPromise object.
        JPromise<List<Employee>> empsPromise = Promises.from(empsMono);
        // Use await just like ES and C# to get the value of the JPromise without blocking the current thread.
        for (Employee employee : empsPromise.await()) {
            // The method findSalaryByEmployee also returns a Mono object. We transform it to the JPromise just like above, and then await to get the result.
            Salary salary = Promises.from(salaryRepository.findSalaryByEmployee(employee.id)).await();
            money += salary.total;
        }
        // The async method must return a JPromise object, so we use the just method to wrap the result in a JPromise.
        return JAsync.just(money);
    }

    // This is a normal WebFlux method.
    @GetMapping("/{department}/salary")
    public Mono<Double> getEmployeeTotalSalaryByDepartment(@PathVariable String department) {
        // Use the unwrap method to transform the JPromise object back to the Mono object.
        return _getEmployeeTotalSalaryByDepartment(department).unwrap(Mono.class);
    }
}
In addition to coding, JAsync also greatly improves the debugging experience of async code.
When debugging, you can see all variables in the monitor window just like when debugging normal code. I will try my best to solve point 2 mentioned by Stephan.
For point 3, I think it is not a big problem. Async/await is popular in C# and ES even though it does not solve that point.

Observer: Implement with pattern (subject & observer) or inter-thread communication (wait & notify)

I usually use the Observer pattern, my colleague at work though has implemented an Observer using Thread intercommunication (using wait and notify/notifyAll).
Should I implement my observers using the pattern or inter-Thread-communication using wait and notify? Are there any good reasons to avoid one approach and always use the other?
I've always gone with the first one, using the pattern, out of convention, and because it seems more expressive (involved identifiers are a good way to express and understand what is communicated and how).
EDIT:
I'm using the pattern in a Swing GUI, he's using the inter-thread solution in an Android application.
In his solution one thread generates data and then calls notify, to wake up another thread that paints the generated data and calls wait after every paint.
His argument for the wait/notify solution is that it creates fewer threads and even several concurrent calls to notify will cause only 1 paint event, whereas an observer-based solution would call a repaint with every call. He says it's just another valid approach, but doesn't claim he's done it for performance reasons.
My argument is that I would express the communication among objects on the OO design level rather than use a language-specific feature that makes the communication almost invisible. Also, low-level thread communication is hard to master, might be hard to understand for other readers, and should rather be implemented on a higher level, e.g. using a CyclicBarrier. I don't have any sound arguments for one or the other solution, but I was wondering if there are any sound arguments for either approach (e.g. "This-and-that can happen if you use this approach, whereas in the other one that's not possible.").

You are comparing apples and oranges. The wait/notify mechanism is used for thread synchronization, and while your colleague may have used it within an Observer/Observable implementation, it is not, in itself the pattern implementation. It simply means it is a multithreaded implementation.
There are many implementations of this pattern, and they are typically tailored to the environment in which you are working. There are event mechanisms built into most UI frameworks/toolkits. JMS for distributed environments, ...
I don't find much use for the generic Observer/Observable classes provided by the JDK, and from experience I haven't found many other developers use them either. Most will use a provided mechanism, if appropriate, or roll their own specific and ultimately more useful implementation if needed.
Since I have done most of my coding in an OSGi environment of late, I have a preference for a variation of observer/observable called the whiteboard pattern. This may or may not be feasible for you, depending on your environment.

You should avoid, or rather refrain from, inter-thread communication in 99.99% of the cases. If there is a real need for a multi threaded solution, you should use a higher level concurrency mechanism such as an ExecutorService or a good concurrency library such as jetlang: http://code.google.com/p/jetlang/.
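As one possible reading of that advice, the sketch below keeps the observer pattern visible in the API while delegating the thread hand-off to a single-threaded ExecutorService instead of hand-written wait/notify. The class name ObservableFeed and its methods are made up for this example, and it does not coalesce several notifications into one, which the wait/notify variant in the question does.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

class ObservableFeed {
    // CopyOnWriteArrayList allows observers to be added while notifying.
    private final List<Consumer<String>> observers = new CopyOnWriteArrayList<>();
    // One notifier thread: observers see events in publication order.
    private final ExecutorService notifier = Executors.newSingleThreadExecutor();

    void addObserver(Consumer<String> observer) {
        observers.add(observer);
    }

    // Called by the producing thread; observers run on the notifier thread.
    void publish(String data) {
        notifier.execute(() -> observers.forEach(observer -> observer.accept(data)));
    }

    // Stops accepting events and waits for pending notifications.
    // Returns true if everything was delivered within the timeout.
    boolean close() {
        notifier.shutdown();
        try {
            return notifier.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ObservableFeed feed = new ObservableFeed();
        feed.addObserver(data -> System.out.println("painting " + data));
        feed.publish("frame-1");
        feed.publish("frame-2");
        feed.close();
    }
}
```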

Difficult. I would normally use Observer / Observable when not explicitly writing a multithreaded application. However, convention in this case might be for you to use his design. Perhaps see if you can abstract it out somehow so that you can replace it with the Observer pattern at a later stage if necessary?
However, I found these two articles which seem to indicate that the Observer/Observable pattern in Java is not ideal and should be avoided.
An inside view of Observer and
The event generator idiom

How to determine what part of Java code needs to be synchronized?

How do I determine which parts of Java code need to be synchronized? Are there any unit testing techniques?
Samples of code are welcome.

Code needs to be synchronized when there might be multiple threads that work on the same data at the same time.
Whether code needs to be synchronized is not something that you can discover by unit testing. You must think and design your program carefully when your program is multi-threaded to avoid issues.
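A minimal sketch of "multiple threads working on the same data": a counter whose increment and read are both synchronized. The names Counters, SafeCounter and countWithThreads are hypothetical. Without the synchronized keyword, count++ (a read, an add and a write) could lose updates, and a unit test might still pass most of the time, which is why testing alone is not a reliable detector.

```java
class Counters {

    // Shared mutable state: every access goes through the same lock.
    static final class SafeCounter {
        private int count;

        synchronized void increment() { count++; }

        synchronized int get() { return count; }
    }

    // Starts the given number of threads, each incrementing the same
    // counter, waits for all of them and returns the final value.
    static int countWithThreads(int threads, int incrementsPerThread) {
        SafeCounter counter = new SafeCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000));
    }
}
```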
A good book on concurrent programming in Java is Java Concurrency in Practice.

If I understand your question correctly, you want to know what you have to synchronise. Unfortunately, there is no boilerplate code that shows you what to synchronise; you should take a look at methods and instance variables that can be accessed by multiple threads at the same time. If there aren't any, you usually don't need to worry about synchronisation too much.

This is a good source for some general information:
http://weblogs.java.net/blog/caroljmcdonald/archive/2009/09/17/some-java-concurrency-tips
When you are in a multithreaded environment in Java and you want to do many things in parallel, I would suggest an approach based on the concurrent queue implementations (like BlockingQueue or ConcurrentLinkedQueue) and a simple Runnable that has a reference to the queue and pulls 'messages' off the queue. Use an ExecutorService to manage the tasks. Sort of a (very simplified) Actor type of model.
So choose not to share state as much as possible, because if you do, you need to synchronize, or use a data structure that supports concurrent access like the ConcurrentHashMap.
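A very simplified sketch of that queue-based model: one worker task, managed by an ExecutorService, takes messages from a BlockingQueue until it sees a stop marker. The class name QueueWorkerDemo, the STOP sentinel and the upper-casing "work" are all assumptions made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

class QueueWorkerDemo {
    // Hypothetical sentinel telling the worker to stop.
    static final String STOP = "__stop__";

    // Sends the messages through a queue to a single worker and returns
    // what the worker produced, in order.
    static List<String> process(List<String> messages) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // The worker's only shared state is the thread-safe queue.
            Future<List<String>> handled = pool.submit(() -> {
                List<String> out = new ArrayList<>();
                for (String msg = queue.take(); !msg.equals(STOP); msg = queue.take()) {
                    out.add(msg.toUpperCase());
                }
                return out;
            });
            for (String message : messages) {
                queue.put(message);
            }
            queue.put(STOP);
            return handled.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(process(Arrays.asList("a", "b")));
    }
}
```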

There's no substitute for thinking about the issues surrounding your code (as the other answers here illustrate). Once you've done that, though, it's worth running FindBugs over your code. It will identify where you've applied synchronisation inconsistently, and is a great help in tracking otherwise hard-to-find bugs.

Lot of nice answers here:
Java synchronization and performance in an aspect
A nice analysis of your problem is available here:
http://portal.acm.org/citation.cfm?id=1370093&dl=GUIDE&coll=GUIDE&CFID=57662261&CFTOKEN=95754288 (require access to ACM portal)

Yes, all these folks are right: there is no alternative to thinking. But here is a rule of thumb:
1. If it's a read, perhaps you do not need synchronization.
2. If it's a write, you should consider it.

Should I make all my java code threadsafe?

I was reading some of the concurrency patterns in Brian Goetz's Java Concurrency in Practice and got confused over when is the right time to make the code thread safe.
I normally write code that's meant to run in a single thread so I do not worry too much about thread safety and synchronization etc. However, there always exists a possibility that the same code may be re-used sometime later in a multi-threaded environment.
So my question is, when should one start thinking about thread safety? Should I assume the worst at the onset and always write thread-safe code from the beginning or should I revisit the code and modify for thread safety if such a need arises later ?
Are there some concurrency patterns/anti-patterns that I must always be aware of even while writing single-threaded applications so that my code doesn't break if it's later used in a multi-threaded environment ?

You should think about thread safety when your code will be used in a multithreaded environment. There is no point in tackling the complexity if it will only be run in a singlethreaded environment.
That being said, there are simple things you can do that are good practices anyway and will help with multithreading:
As Josh Bloch says, favor immutability. Immutable classes are threadsafe almost by definition;
Only use data members or static variables where required rather than as a convenience.
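For instance, a small immutable value class might look like the sketch below. The Money class is made up for illustration: the class is final, all fields are final, and "modifying" operations return a new object, so instances can be shared between threads without synchronization.

```java
final class Money {
    private final String currency;
    private final long cents;

    Money(String currency, long cents) {
        this.currency = currency;
        this.cents = cents;
    }

    String currency() { return currency; }

    long cents() { return cents; }

    // Returns a new instance instead of changing this one.
    Money plus(Money other) {
        if (!currency.equals(other.currency)) {
            throw new IllegalArgumentException("currency mismatch");
        }
        return new Money(currency, cents + other.cents);
    }

    public static void main(String[] args) {
        Money price = new Money("EUR", 150);
        Money total = price.plus(new Money("EUR", 50));
        System.out.println(total.cents() + " " + total.currency());
    }
}
```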

Making your code thread safe can be as simple as adding a comment that says the class was not designed for concurrent use by multiple threads. So, in that sense: yes, all of your classes should be thread safe.
However, in practice, many, many types are likely to be used by only a single thread, often only referenced as local variables. This can be true even if the program as a whole is multi-threaded. It would be a mistake to make every object safe for multi-threaded access. While the penalty may be small, it is pervasive, and can add up to be a significant, hard-to-fix performance problem.

I advise you to obtain a copy of "Effective Java", 2nd Ed. by Joshua Bloch. That book devotes a whole chapter to concurrency, including a solid exploration of the issue of when (and when not) to synchronize. Note, for example, the title of item 67 in "Effective Java": 'Avoid excessive synchronization', which is elaborated over five pages.

As was stated previously, you need thread safety when you think your code will be used in a multithreaded environment.
Consider the approach taken by the Collections classes, where you provide a thread-unsafe class that does all its work without using synchronized, and you also provide another class that wraps the unsynchronized class and provides all of the same public methods, but makes them synchronize on the underlying object.
This gives your clients a choice of using the multi-threaded or the single-threaded version of your code. It may also simplify your coding by isolating all of the threading/locking logic in a separate class.
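The JDK's own version of this idea can be tried with Collections.synchronizedList, which wraps a plain ArrayList. The sketch below (class and method names are made up) fills such a wrapped list from several threads; a single-threaded client would simply use the ArrayList directly and skip the locking.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class WrapperDemo {

    // Adds perThread elements to the target list from each of the given
    // number of threads and returns the final size.
    static int fillConcurrently(List<Integer> target, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    target.add(j);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return target.size();
    }

    static int demo() {
        // The wrapper adds the locking; ArrayList itself stays unsynchronized.
        List<Integer> safe = Collections.synchronizedList(new ArrayList<>());
        return fillConcurrently(safe, 4, 1000);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Note that iterating over such a wrapper still requires the caller to synchronize on the list manually, as the Collections documentation points out.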
