I am writing a program in Java where I have a HashMap<String, Deque<Integer>> info;
My data is a list of Wikipedia pages that were visited within a one-hour time period, along with a count of how many times each was visited:
de Florian_David_Fitz 18
de G%C3%BCnther_Jauch 1
de Gangs_of_New_York 2
de Georg_VI._(Vereinigtes_K%C3%B6nigreich) 7
de Gerry_Rafferty 2
This data gets stored in the HashMap above, with the page name as the key and the Deque updated hourly with the number of visits for that hour.
I want to have one thread, ThreadRead, that reads the input files and stores the info in the HashMap, and then one ThreadCompute thread for each key in the HashMap that consumes the associated Deque.
ThreadRead needs to block all the ThreadComputes while it is active, then wake them up when it finishes so the ThreadComputes can work concurrently.
If I need a different mutex for each ThreadCompute, then how can I keep all of them locked while ThreadRead works? And how can I wake up all the ThreadComputes from ThreadRead when it is done?
I have used info as a lock for ThreadRead, and info.get(key) for each ThreadCompute, but it is not working as I expected.
Edit:
I have added some code to try to make the problem clearer. This is what I have at the moment:
HashMap<String, Deque<Integer>> info;
boolean controlCompute, controlRead;
private static class ThreadRead extends Thread {

    public void run() {
        while (controlRead) {
            try {
                read();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }

    public void read() throws InterruptedException {
        synchronized (info) {
            while (count == numThreads) {
                for (File file : files) {
                    reader.parse(file, info); // Reads the file and stores the data in the HashMap
                    keys = true;
                    while (info.getSizeDeque() > 10) {
                        count = 0;
                        info.wait();
                        info.notifyAll();
                    }
                }
            }
            controlRead = false;
        }
    }
}
private static class ThreadCompute extends Thread {
    public String key;

    public void run() {
        while (controlCompute) {
            try {
                compute();
            } catch (InterruptedException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }

    public void compute() throws InterruptedException {
        synchronized (info.get(key)) {
            if (count != numThreads) {
                algorithms(); // Here I apply the algorithms to the integers in the deque
                if (controlRead) {
                    info.get(key).removeFirst();
                    count++;
                    if (count == numThreads) {
                        info.notify();
                        info.get(key).wait();
                    }
                    info.get(key).wait();
                }
                if (info.isEmptyDeque(key)) {
                    controlCompute = false;
                }
            }
        }
    }
}
Class java.util.concurrent.locks.ReentrantReadWriteLock is good for this kind of problem. There should be exactly one instance to guard the whole HashMap. The file reader needs to acquire the write lock of the ReadWriteLock because it wants to modify the map. The other threads need to each acquire their own read lock from the one ReadWriteLock.
All your threads must be careful to limit as much as possible the scope in which they hold their locks, so in particular, the file-read thread should acquire the write lock immediately before modifying the map, hold it until all modifications for one entry are complete, then release it. The other threads don't block each other, so they could in principle hold their locks longer, but doing so will block the file reader.
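For example, a minimal sketch of that idea might look like the following (addSample and pollSample are hypothetical helpers standing in for the question's reader and compute logic, not the asker's actual code):

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class PageStats {
    private final Map<String, Deque<Integer>> info = new HashMap<>();
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    // Called by ThreadRead: exclusive access while the map is being modified.
    public void addSample(String page, int visits) {
        lock.writeLock().lock();
        try {
            info.computeIfAbsent(page, k -> new ArrayDeque<>()).addLast(visits);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Called by each ThreadCompute: many of these can run concurrently,
    // but none of them runs while the write lock is held.
    public Integer pollSample(String page) {
        lock.readLock().lock();
        try {
            Deque<Integer> deque = info.get(page);
            // Mutating the deque under the read lock is safe here only because,
            // as in the question, exactly one compute thread consumes each key.
            return (deque == null) ? null : deque.pollFirst();
        } finally {
            lock.readLock().unlock();
        }
    }
}

Note that the read lock only guarantees the compute threads never overlap with the file reader; it relies on the question's constraint that each key's deque is consumed by a single compute thread.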
Related
I'm trying to implement a system that satisfies the following constraints:
I have a shared resource, for example an atomic array.
I want to support multiple simultaneous reads from the array.
I want to support multiple simultaneous writes to the array.
I don't want read and write operations to happen simultaneously.
I found [this][1] Stack Overflow post regarding a similar goal, but I think the solution suggested there allows reads to happen simultaneously with writes:
class ReadAndWrite {
    private ReentrantLock readLock;
    private ReentrantLock writeLock;
    private AtomicInteger readers;
    private AtomicInteger writers;
    private File file;

    public void write() {
        if (!writeLock.isLocked()) {
            readLock.tryLock();
            writers.incrementAndGet(); // Increment the number of current writers
            // ***** Write your stuff *****
            writers.decrementAndGet(); // Decrement the number of current writers
            if (readLock.isHeldByCurrentThread()) {
                while (writers.get() != 0); // Wait until all writers are finished to release the lock
                readLock.unlock();
            }
        } else {
            writeLock.lock();
            write();
        }
    }

    public void read() {
        if (!readLock.isLocked()) {
            writeLock.tryLock();
            readers.incrementAndGet();
            // ***** read your stuff *****
            readers.decrementAndGet(); // Decrement the number of current readers
            if (writeLock.isHeldByCurrentThread()) {
                while (readers.get() != 0); // Wait until all readers are finished to release the lock
                writeLock.unlock();
            }
        } else {
            readLock.lock();
            read();
        }
    }
}
As I see it, this code allows reads and writes to happen simultaneously; for example, if two threads try to read and write at the same time, each of them enters the first if in write()/read(). How can I make sure that writes block reads and reads block writes?
[1]: Multiple readers and multiple writers(i mean multiple) synchronization
Rather than checking the lock repeatedly, just attempt using it:
private void writeInternal() {
    // thread-unsafe writing code
}

public void write() {
    if (!writeLock.tryLock()) {
        writeLock.lock();
    }
    try {
        this.writeInternal(); // in try-block to ensure unlock is called
    } finally {
        writeLock.unlock();
    }
}
Using the readLock would be a similar approach. You also want to ensure you're truly using Read/Write locks and not just two separate locks:
private final ReadWriteLock lock;

public ReadAndWrite() {
    this.lock = new ReentrantReadWriteLock();
}
Then you would access read/write locks via this.lock.readLock(), etc.
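For the read side, a sketch under the same assumptions (this.lock being the ReentrantReadWriteLock above, java.util.concurrent.locks.Lock imported, and readInternal() a hypothetical stand-in for the actual reading code) could look like:

private void readInternal() {
    // thread-unsafe reading code
}

public void read() {
    final Lock readLock = this.lock.readLock();
    readLock.lock();        // many readers may hold this concurrently,
    try {                   // but never while the write lock is held
        this.readInternal();
    } finally {
        readLock.unlock();  // always released, even on exception
    }
}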
When I first read about the interface BlockingQueue I read that: the producer blocks any further put() calls on a queue if it has no more space, and conversely it blocks take() if there are no items to take. I thought that internally it works the same way as wait() and notify(). For example, when there are no more elements to read, wait() is called internally until the producer adds one more and calls notify(), or that's what we would do in the 'old' producer/consumer pattern. BUT IT DOESN'T WORK LIKE THAT IN A BLOCKING QUEUE. How? What is the point? I am honestly surprised!
I will demonstrate:
public class Testing {

    BlockingQueue<Integer> blockingQueue = new ArrayBlockingQueue<>(3);

    synchronized void write() throws InterruptedException {
        for (int i = 0; i < 6; i++) {
            blockingQueue.put(i);
            System.out.println("Added " + i);
            Thread.sleep(1000);
        }
    }

    synchronized void read() throws InterruptedException {
        for (int i = 0; i < 6; i++) {
            System.out.println("Took: " + blockingQueue.take());
            Thread.sleep(3000);
        }
    }
}
class Test1 {
    public static void main(String[] args) {
        Testing testing = new Testing();

        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    testing.write();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }).start();

        new Thread(new Runnable() {
            @Override
            public void run() {
                try {
                    testing.read();
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
            }
        }).start();
    }
}
OUTPUT:
Added 0
Added 1
Added 2
'program hangs'.
My question is: how do take() and put() BLOCK if they don't use wait() or notify() internally? Do they have some while loops that burn CPU cycles? I am frankly confused.
Here's the current implementation of ArrayBlockingQueue#put:
/**
 * Inserts the specified element at the tail of this queue, waiting
 * for space to become available if the queue is full.
 *
 * @throws InterruptedException {@inheritDoc}
 * @throws NullPointerException {@inheritDoc}
 */
public void put(E e) throws InterruptedException {
    Objects.requireNonNull(e);
    final ReentrantLock lock = this.lock;
    lock.lockInterruptibly();
    try {
        while (count == items.length)
            notFull.await();
        enqueue(e);
    } finally {
        lock.unlock();
    }
}
You'll see that, instead of using wait() and notify(), it invokes notFull.await(); where notFull is a Condition.
The documentation of Condition states the following:
Condition factors out the Object monitor methods (wait, notify and notifyAll) into distinct objects to give the effect of having multiple wait-sets per object, by combining them with the use of arbitrary Lock implementations. Where a Lock replaces the use of synchronized methods and statements, a Condition replaces the use of the Object monitor methods.
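To make the analogy concrete, here is a minimal bounded-buffer sketch in the spirit of the Condition documentation (not the actual ArrayBlockingQueue source): one Lock with two Conditions plays the role that synchronized/wait()/notify() play in the classic hand-rolled version.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class TinyBoundedBuffer<E> {
    private final Deque<E> items = new ArrayDeque<>();
    private final int capacity;
    private final Lock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();
    private final Condition notEmpty = lock.newCondition();

    TinyBoundedBuffer(int capacity) {
        this.capacity = capacity;
    }

    public void put(E e) throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (items.size() == capacity)
                notFull.await();        // same role as wait(), but on its own wait-set
            items.addLast(e);
            notEmpty.signal();          // same role as notify(), but targeted
        } finally {
            lock.unlock();
        }
    }

    public E take() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (items.isEmpty())
                notEmpty.await();
            E e = items.removeFirst();
            notFull.signal();
            return e;
        } finally {
            lock.unlock();
        }
    }
}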
If you go through the code below, you will get an idea of how the producer/consumer problem can be solved using the BlockingQueue interface.
Here you can see that the same queue is shared by the Producer and the Consumer.
And from the main class you start both the Producer and Consumer threads.
class Producer implements Runnable {

    protected BlockingQueue blockingQueue = null;

    public Producer(BlockingQueue blockingQueue) {
        this.blockingQueue = blockingQueue;
    }

    @Override
    public void run() {
        for (int i = 0; i < 6; i++) {
            try {
                blockingQueue.put(i);
                Thread.sleep(1000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            System.out.println("Added " + i);
        }
    }
}

class Consumer implements Runnable {

    protected BlockingQueue blockingQueue = null;

    public Consumer(BlockingQueue blockingQueue) {
        this.blockingQueue = blockingQueue;
    }

    @Override
    public void run() {
        for (int i = 0; i < 6; i++) {
            try {
                System.out.println("Took: " + blockingQueue.take());
                Thread.sleep(3000);
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
        }
    }
}

class Test1 {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue queue = new ArrayBlockingQueue(3);

        Producer producer = new Producer(queue);
        Consumer consumer = new Consumer(queue);

        new Thread(producer).start();
        new Thread(consumer).start();

        Thread.sleep(4000);
    }
}
This code will print output like
Took: 0
Added 0
Added 1
Added 2
Took: 1
Added 3
Added 4
Took: 2
Added 5
Took: 3
Took: 4
Took: 5
(I'm sure some or all parts of my answer may be things you have already understood; in that case, please just consider this a clarification :)).
1. Why did your code example using BlockingQueue get to ‘program hangs’?
1.1 Conceptually
First of all, if we leave out implementation-level details such as ‘wait()’, ‘notify()’, etc. for a second, then conceptually all Java implementations of BlockingQueue do work to the specification, i.e. like you said:
‘Producer blocks any more put() calls in a queue if it has no more
space. And the opposite, it blocks method take(), if there are no
items to take.’
So, conceptually, the reason that your code example hangs is because
1.1.1.
the thread calling the (synchronized) write() runs first and alone, and not until ‘testing.write()’ returns in this thread will the 2nd thread calling the (synchronized) read() ever have a chance to run — this is the essence of ‘synchronized’ methods on the same object.
1.1.2.
Now, in your example, conceptually, ‘testing.write()’ will never return: in that for loop, it will ‘put’ the first 3 elements onto the queue and then kinda ‘spin wait’ for the 2nd thread to consume/’take’ some of these elements so it can ‘put’ more, but that will never happen, due to the aforementioned reason in 1.1.1.
1.2 Programmatically
1.2.1.
(For the producer) In ArrayBlockingQueue#put, the ‘spin wait’ I mentioned in 1.1.2 takes the form of
while (count == items.length) notFull.await();
1.2.2.
(For the consumer) ArrayBlockingQueue#take calls dequeue(), which in turn calls notFull.signal(), which ends the ‘spin wait’ in 1.2.1.
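Paraphrased (a sketch, not the exact JDK source), the consumer side looks roughly like this, which is what releases a producer parked in put():

public E take() throws InterruptedException {
    final ReentrantLock lock = this.lock;
    lock.lockInterruptibly();
    try {
        while (count == 0)
            notEmpty.await();   // consumer parks here while the queue is empty
        return dequeue();       // dequeue() ends by signalling notFull
    } finally {
        lock.unlock();
    }
}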
2. Now, back to your original post’s title: ‘What is the point of BlockingQueue not being able to work in synchronized Producer/Consumer methods?’.
2.1.
If I take the literal meaning of this question, then an answer could be ‘there are reasons for a convenient BlockingQueue facility to exist in Java other than using it in synchronized methods/blocks’, i.e. the queues can certainly live outside of any ‘synchronized’ structure and facilitate a vanilla producer/consumer implementation.
2.2.
However, if you meant to inquire one step further - why can’t Java BlockingQueue implementations work easily/nicely/smoothly in synchronized methods/blocks?
That will be a different question, a valid and interesting one that I am also incidentally puzzling about.
Specifically, see this post for further information (note that in this post, the consumer thread ‘hangs’ because of an EMPTY queue and its possession of the exclusive lock, as opposed to your case where the producer thread ‘hangs’ because of a FULL queue and its possession of the exclusive lock; but the core of the problem should be the same).
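As a sketch of the usual fix for the example in the question: drop synchronized from write() and read() and let the BlockingQueue do all the coordination, so the producer and consumer no longer exclude each other through the Testing monitor.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class Testing {
    private final BlockingQueue<Integer> blockingQueue = new ArrayBlockingQueue<>(3);

    void write() throws InterruptedException {    // no longer synchronized
        for (int i = 0; i < 6; i++) {
            blockingQueue.put(i);                 // blocks only while the queue is full
            System.out.println("Added " + i);
            Thread.sleep(1000);
        }
    }

    void read() throws InterruptedException {     // no longer synchronized
        for (int i = 0; i < 6; i++) {
            System.out.println("Took: " + blockingQueue.take());  // blocks only while empty
            Thread.sleep(3000);
        }
    }
}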
I made two threads: one gets data and the other saves data.
My problem is that the data read by Thread1 is not handled properly when it is stored.
I want to extract 1,000,000 elements and write them to a file. The number of elements is so big that I divide it into chunks of 100,000, so the loop runs 10 times. One thread reads data from the other server 100,000 elements at a time. Another thread takes the data from the first thread and writes it to a file.
My original scenario is below:
The first thread reads the total key/value count.
It will be 100,000 ~ 1,000,000. Assume I process 1,000,000 entries; then count is set to 1,000,000. The first thread divides this by 100,000 and reads data from the server 100,000 entries at a time. Then the first thread calls setData(key, value map).
It will loop 10 times.
The second thread will also loop 10 times. First it gets data by calling the getMap() method, then it calls the writeSeq(hashmap) method, which writes the data to a writer stream (not yet flushed). There is a problem here: it successfully gets the data size from getMap(), but the writeSeq method cannot process all of the values. When it gets a batch of 100,000, it processes a seemingly random number of them: 100, 1500, 0, 8203 ...
The first thread is below:
public void run() {
    getValueCount(); // initialize value.
    while (this.jobFlag) {
        getSortedMap(this.count); // count starts as the total number of elements.
        // For example, if the total size is 1,000,000, then count is set to 1,000,000 and is decreased by 100,000 each time.
        // Also setMap() is called in this method.
        if (!jobFlag) // If all processing is done, jobFlag is set to false.
            break;
    }
    resetValue();
}
The second thread is below:
public void run() {
    setWriter(); // create the Writer stream
    double count = 10; // the number of loops.
    ConcurrentHashMap<String, String> hash = new ConcurrentHashMap<String, String>();
    for (int i = 0; i <= count - 1; i++) {
        hash = share.getMap();
        writeSeq(hash);
    }
    closeWriter(); // close the Writer stream
}
This is the shared source:
import java.util.HashMap;
import java.util.concurrent.ConcurrentHashMap;

public class ShareData {

    ConcurrentHashMap<String, String> map;

    public synchronized ConcurrentHashMap<String, String> getMap() {
        if (this.map == null) {
            try {
                wait();
            } catch (InterruptedException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
        ConcurrentHashMap<String, String> hashmap = map;
        this.map = null;
        return hashmap;
    }

    public synchronized void setMap(ConcurrentHashMap<String, String> KV) {
        if (this.map != null) {
            try {
                wait();
            } catch (InterruptedException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
        this.map = KV;
        notify();
    }
}
After that, the second thread, which saves the data, is started. The size of KV is fine, but not all of the values are processed in the forEach. Also, each time I create a file, its size is different. Is it a problem with synchronized?
public synchronized void writeSeq(ConcurrentHashMap<String, String> KV) {
    AtomicInteger a = new AtomicInteger(0);
    System.out.println(KV.size()); // ex) 65300
    redisKV.entrySet().parallelStream().forEach(
        entry -> {
            try {
                a.incrementAndGet();
                writer.append(new Text(entry.getKey()), new Text(entry.getValue()));
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        });
    System.out.println(a.get()); // ex) 1300
    i = 0;
    notify();
}
The size of KV is fine, but not all of the values are processed in the forEach. Also, each time I create a file, its size is different. Is it a problem with synchronized?
Unclear. I can see a small issue but it is not likely to cause the problem you describe.
The if (map == null) wait(); code should be a while loop.
The if (map != null) wait(); code should be a while loop.
The issue is that if one thread gets a spurious notify, it may proceed with map in the wrong state. You need to retry the test. (If you read the javadoc for Object, you will see an example that correctly implements a condition variable.)
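For example, a sketch of ShareData with both waits turned into loops might look like this (the extra notifyAll() in getMap() is my addition, so that a producer blocked in setMap() is woken once the batch has been taken):

import java.util.concurrent.ConcurrentHashMap;

public class ShareData {
    private ConcurrentHashMap<String, String> map;

    public synchronized ConcurrentHashMap<String, String> getMap() throws InterruptedException {
        while (this.map == null) {   // re-test the condition after every wakeup
            wait();
        }
        ConcurrentHashMap<String, String> handoff = this.map;
        this.map = null;
        notifyAll();                 // wake a producer waiting in setMap()
        return handoff;
    }

    public synchronized void setMap(ConcurrentHashMap<String, String> kv) throws InterruptedException {
        while (this.map != null) {   // wait until the previous batch has been taken
            wait();
        }
        this.map = kv;
        notifyAll();                 // wake a consumer waiting in getMap()
    }
}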
Apart from that, the root cause of your problem does not appear to be in the code that you have shown us.
However, if I was to take a guess, my guess would be that one thread is adding or removing entries in the ConcurrentHashMap while the second thread is processing it1. The getMap / setMap methods you have shown us have to be used appropriately (i.e. called at appropriate points with appropriate arguments) to avoid the two threads interfering with each other. You haven't shown us that code.
So, if my guess is correct, your problem is a logic error rather than a low level synchronization problem. But if you need a better answer you will need to write and post a proper MCVE.
1 - A ConcurrentHashMap's iterators are weakly consistent. This means that if you update the map while iterating, you may miss entries in the iteration, or possibly see them more than once.
A better way is to use a BlockingQueue: one thread puts into the queue, another thread takes from the queue.
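A minimal sketch of that suggestion, assuming each 100,000-entry batch is handed over as one map (BatchHandoff is an illustrative name, not from the question):

import java.util.Map;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// A one-slot queue as a drop-in replacement for ShareData: put() blocks until
// the previous batch has been taken, and take() blocks until a batch is available.
public class BatchHandoff {
    private final BlockingQueue<Map<String, String>> queue = new ArrayBlockingQueue<>(1);

    public void setMap(Map<String, String> batch) throws InterruptedException {
        queue.put(batch);
    }

    public Map<String, String> getMap() throws InterruptedException {
        return queue.take();
    }
}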
i++; is not thread-safe. You will get a lower count than there are updates. Use AtomicInteger and its incrementAndGet() method instead.
I have a use case with many writer threads and a single reader thread. The data being written is an event counter which is being read by a display thread.
The counter only ever increases and the display is intended for humans, so the exact point-in-time value is not critical. For this purpose, I would consider a solution to be correct as long as:
The value seen by the reader thread never decreases.
Reads are eventually consistent. After a certain amount of time without any writes, all reads will return the exact value.
Assuming writers are properly synchronized with each other, is it necessary to synchronize the reader thread with the writers in order to guarantee correctness, as defined above?
A simplified example. Would this be correct, as defined above?
public class Eventual {

    private static class Counter {
        private int count = 0;
        private Lock writeLock = new ReentrantLock();

        // Unsynchronized reads
        public int getCount() {
            return count;
        }

        // Synchronized writes
        public void increment() {
            writeLock.lock();
            try {
                count++;
            } finally {
                writeLock.unlock();
            }
        }
    }

    public static void main(String[] args) {
        List<Thread> contentiousThreads = new ArrayList<>();
        final Counter sharedCounter = new Counter();

        // 5 synchronized writer threads
        for (int i = 0; i < 5; ++i) {
            contentiousThreads.add(new Thread(new Runnable() {
                @Override
                public void run() {
                    for (int i = 0; i < 20_000; ++i) {
                        sharedCounter.increment();
                        safeSleep(1);
                    }
                }
            }));
        }

        // 1 unsynchronized reader thread
        contentiousThreads.add(new Thread(new Runnable() {
            @Override
            public void run() {
                for (int i = 0; i < 30; ++i) {
                    // This value should:
                    // +Never decrease
                    // +Reach 100,000 if we are eventually consistent.
                    System.out.println("Count: " + sharedCounter.getCount());
                    safeSleep(1000);
                }
            }
        }));

        contentiousThreads.stream().forEach(t -> t.start());

        // Just cleaning up...
        // For the question, assume readers/writers run indefinitely
        try {
            for (Thread t : contentiousThreads) {
                t.join();
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    private static void safeSleep(int ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            // Don't care about error handling for now.
        }
    }
}
There is no guarantee that the readers would ever see an update to the count. A simple fix is to make count volatile.
As noted in another answer, in your current example, the "Final Count" will be correct because the main thread is joining the writer threads (thus establishing a happens-before relationship). However, your reader thread is never guaranteed to see any update to the count.
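A minimal sketch of that fix, reusing the Counter from the question (only the count field changes):

private static class Counter {
    private volatile int count = 0;     // volatile: the reader always sees the latest write
    private Lock writeLock = new ReentrantLock();

    // Unsynchronized reads are now safe to publish to the display thread
    public int getCount() {
        return count;
    }

    // Writers still exclude each other, so count++ is never lost
    public void increment() {
        writeLock.lock();
        try {
            count++;
        } finally {
            writeLock.unlock();
        }
    }
}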
JTahlborn is correct, +1 from me. I was rushing and misread the question, I was assuming wrongly that the reader thread was the main thread.
The main thread can display the final count correctly due to the happens-before relationship:
All actions in a thread happen-before any other thread successfully returns from a join on that thread.
Once the main thread has joined to all the writers then the counter's updated value is visible. However, there is no happens-before relationship forcing the reader's view to get updated, you are at the mercy of the JVM implementation. There is no promise in the JLS about values getting visible if enough time passes, it is left open to the implementation. The counter value could get cached and the reader could possibly not see any updates whatsoever.
Testing this on one platform gives no assurance of what other platforms will do, so don't think this is OK just because the test passes on your PC. How many of us develop on the same platform we deploy to?
Using volatile on the counter or using AtomicInteger would be good fixes. Using AtomicInteger would allow removing the locks from the writer thread. Using volatile without locking would be OK only in a case where there is just one writer, when two or more writers are present then ++ or += not being threadsafe will be an issue. Using an Atomic class is a better choice.
(Btw eating the InterruptedException isn't "safe", it just makes the thread unresponsive to interruption, which happens when your program asks the thread to finish early.)
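A sketch of the AtomicInteger variant of the same nested Counter (assuming java.util.concurrent.atomic.AtomicInteger is imported), which also lets the writers drop the explicit lock:

private static class Counter {
    // incrementAndGet() is an atomic read-modify-write, so no lock is needed,
    // and get() always returns a visible, never-decreasing value.
    private final AtomicInteger count = new AtomicInteger();

    public int getCount() {
        return count.get();
    }

    public void increment() {
        count.incrementAndGet();
    }
}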
I have two threads, both of which access a Vector. t1 adds a random number, while t2 removes and prints the first number. Below is the code and the output. t2 seems to execute only once (before t1 starts) and then terminates for good. Am I missing something here? (PS: Tested with ArrayList as well.)
import java.util.Random;
import java.util.Vector;

public class Main {

    public static Vector<Integer> list1 = new Vector<Integer>();

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Main started!");

        Thread t1 = new Thread(new Runnable() {
            @Override
            public void run() {
                System.out.println("writer started! ");
                Random rand = new Random();
                for (int i = 0; i < 10; i++) {
                    int x = rand.nextInt(100);
                    list1.add(x);
                    System.out.println("writer: " + x);
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                        // TODO Auto-generated catch block
                        e.printStackTrace();
                    }
                }
            }
        });

        Thread t2 = new Thread(new Runnable() {
            @Override
            public void run() {
                System.out.println("reader started! ");
                while (!list1.isEmpty()) {
                    int x = list1.remove(0);
                    System.out.println("reader: " + x);
                    try {
                        Thread.sleep(1000);
                    } catch (InterruptedException e) {
                        // TODO Auto-generated catch block
                        e.printStackTrace();
                    }
                }
            }
        });

        t2.start();
        t1.start();

        t1.join();
        t2.join();
    }
}
Output:
Main started!
reader started!
writer started!
writer: 40
writer: 9
writer: 23
writer: 5
writer: 41
writer: 29
writer: 72
writer: 73
writer: 95
writer: 46
This sounds like a toy to understand concurrency, so I didn't mention it before, but I will now (at the top because it is important).
If this is meant to be production code, don't roll your own. There are plenty of well implemented (debugged) concurrent data structures in java.util.concurrent. Use them.
When consuming, you need to not shut down your consumer based on "all items consumed". This is due to a race condition where the consumer might "race ahead" of the producer and detect an empty list only because the producer hasn't yet written the items for consumption.
There are a number of ways to accomplish a shutdown of the consumer, but none of them can be done by looking at the data to be consumed in isolation.
My recommendation is that the producer "signals" the consumer when the producer is done producing. Then the consumer will stop when it has both the "signal" no more data is being produced AND the list is empty.
Alternative techniques include creating a "shutdown" item. The "producer" adds the shutdown item, and the consumer only shuts down when the "shutdown" item is seen. If you have a group of consumers, keep in mind that you shouldn't remove the shutdown item (or only one consumer would shut down). A sketch of this variant appears after the code below.
Also, the consumer could "monitor" the producer, such that if the producer is "alive / existent" and the list is empty, the consumer assumes that more data will become available. Shutdown occurs when the producer is dead / non-existent AND no data is available.
Which technique you use will depend on the approach you prefer and the problem you're trying to solve.
I know that people like elegant solutions, but if your single producer is aware of the single consumer, the first option looks like this:
public class Producer {
    public void shutdown() {
        addRemainingItems();
        consumer.shutdown();
    }
}
where the Consumer looks like this:
public class Consumer {
    private boolean shuttingDown = false;

    public void shutdown() {
        shuttingDown = true;
    }

    public void run() {
        if (!list.isEmpty() && !shuttingDown) {
            // pull item and process
        }
    }
}
Note that such lack of locking around items on the list is inherently dangerous, but you stated only a single consumer, so there's no contention for reading from the list.
Now if you have multiple consumers, you need to provide protections to assure that a single item isn't pulled by two threads at the same time (and need to communicate in such a manner that all threads shutdown).
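For completeness, a sketch of the "shutdown item" (poison pill) alternative mentioned above, using a BlockingQueue; the POISON sentinel and the class name are illustrative only:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// The producer enqueues a sentinel after its last real item, and the
// consumer stops as soon as it sees that sentinel.
public class PoisonPillDemo {
    private static final Integer POISON = Integer.MIN_VALUE; // assumed unused by real data

    public static void main(String[] args) {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(10);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < 10; i++) {
                    queue.put(i);
                }
                queue.put(POISON);                // signal: nothing more is coming
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread consumer = new Thread(() -> {
            try {
                while (true) {
                    Integer item = queue.take();
                    if (item.equals(POISON)) {
                        break;                    // shutdown item seen: stop consuming
                    }
                    System.out.println("consumed " + item);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
    }
}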
I think this is a typical producer–consumer problem. Have a look at Semaphore.
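A rough sketch of that Semaphore approach (names and capacity are illustrative): an "empty" semaphore gates the producer, a "full" semaphore gates the consumer, and a binary semaphore guards the shared list.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Semaphore;

public class SemaphoreBuffer {
    private final Deque<Integer> buffer = new ArrayDeque<>();
    private final Semaphore empty = new Semaphore(10);   // free slots in the buffer
    private final Semaphore full = new Semaphore(0);     // items currently available
    private final Semaphore mutex = new Semaphore(1);    // mutual exclusion on the deque

    public void produce(int value) throws InterruptedException {
        empty.acquire();           // wait for free space
        mutex.acquire();
        try {
            buffer.addLast(value);
        } finally {
            mutex.release();
        }
        full.release();            // one more item available
    }

    public int consume() throws InterruptedException {
        full.acquire();            // wait for an item
        mutex.acquire();
        int value;
        try {
            value = buffer.removeFirst();
        } finally {
            mutex.release();
        }
        empty.release();           // one more free slot
        return value;
    }
}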
Update: the issue is gone after changing the while loop in the consumer (reader). Instead of exiting the thread when the list is empty, it now stays in the loop but does nothing. Below is the updated reader thread. Of course, a decent shutdown mechanism can also be added to the code, such as Edwin suggested.
public void run() {
    System.out.println("reader started! ");
    while (true) {
        if (!list1.isEmpty()) {
            int x = list1.remove(0);
            System.out.println("reader: " + x);
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
    }
}
Please note, this is not a code snippet taken from a real product, nor will it go into one!