I have a couple of threads that run in the background. They share a common HashMap.
Is it possible to safely store a PipedOutputStream there?
I have the following scenario:
When the first background thread receives a specific event, it should start reading text data from a huge file into a buffer.
Second background thread (they are independent) should be notified somehow and then read data from the buffer (pipe) as it arrives.
Because all threads can access the HashMap, is it OK to store all the streams there?
You can use a ConcurrentHashMap. I don't see much point in using a Pipe here as files will be read by the OS in advance of where you are reading anyway.
HashMap is not synchronized, so you have to add your own locking or use a synchronized collection. As for the streams, since you can only write to an OutputStream and only read from an InputStream, you will have no problem writing from one thread and reading from another.
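To make the two suggestions above concrete, here is a minimal sketch that keeps the read end of a pipe in a ConcurrentHashMap, with one thread writing and another reading. The class name `PipeRegistryDemo`, the method `transfer`, and the map key `"events"` are made up for this illustration, and the producer writes a short string instead of reading a huge file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PipeRegistryDemo {
    // Shared, thread-safe registry; the key "events" is hypothetical.
    static final Map<String, PipedInputStream> PIPES = new ConcurrentHashMap<>();

    static String transfer(String text) {
        try {
            PipedOutputStream out = new PipedOutputStream();
            // Connect both ends first, then publish the read end in the map.
            PIPES.put("events", new PipedInputStream(out));

            // Producer: stands in for the thread that reads the huge file.
            Thread producer = new Thread(() -> {
                try (PipedOutputStream o = out) {
                    o.write(text.getBytes(StandardCharsets.UTF_8));
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });

            StringBuilder received = new StringBuilder();
            // Consumer: looks up the read end and drains it as data arrives.
            Thread consumer = new Thread(() -> {
                try (InputStream in = PIPES.remove("events")) {
                    int b;
                    while ((b = in.read()) != -1) {
                        received.append((char) b); // ASCII-only for brevity
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });

            producer.start();
            consumer.start();
            producer.join();
            consumer.join();
            return received.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Note that the writer closes its end before its thread exits; otherwise the reader can fail with a "write end dead" IOException instead of seeing a clean end of stream.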
Related
I have an input stream that periodically receives data. One of my threads (let's call it threadA) reads every message from the stream and makes sure the data is OK, throwing an error otherwise. My other thread (let's call it threadB) needs to read a few specific messages and then process them. As of now I have threadA just store the important messages in a global variable, and threadB reads the messages from the global variable.
Is there any way to allow for two threads to read from the same source to avoid this?
edit: the data coming in are responses to commands threadB issued. My issue is that threadB needs the replies from certain commands, which are issued in no particular pattern, but it does not need all the replies.
You probably could create a threadsafe inputstream or a wrapper and if the stream supports mark/reset you could also have two streams read the data in parallel. However, you'd have to handle situations where one thread reads faster than the other thus making mark/reset unusable or having to skip data - there's so much involved, I doubt you'll want to bother with all this.
I'd suggest you keep your basic setup but try to get rid of global variables, e.g. by using the observer pattern, passing references to the shared store to the threads etc.
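One way to read "passing references to the shared store" is to hand both threads the same BlockingQueue instead of using a global. The sketch below is only an illustration under some assumptions: the incoming stream is simulated with a List of strings, the `"REPLY:"` prefix and the `END` sentinel are invented, and threadB's work is done on the calling thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ReplyRouter {
    // Hypothetical sentinel that tells the consumer no more replies will come.
    static final String END = "<end>";

    // threadA's job: look at every message, forward only the interesting ones.
    static Runnable validator(List<String> incoming, BlockingQueue<String> wanted) {
        return () -> {
            try {
                for (String msg : incoming) {
                    // Real validation would go here; this sketch only filters.
                    if (msg.startsWith("REPLY:")) {
                        wanted.offer(msg); // unbounded queue, never blocks
                    }
                }
            } finally {
                wanted.offer(END);
            }
        };
    }

    static List<String> run(List<String> incoming) {
        // The shared store is created here and passed in, not kept global.
        BlockingQueue<String> wanted = new LinkedBlockingQueue<>();
        Thread threadA = new Thread(validator(incoming, wanted));
        threadA.start();

        List<String> processed = new ArrayList<>();
        try {
            // threadB's role: block until a reply arrives, then process it.
            for (String m = wanted.take(); !m.equals(END); m = wanted.take()) {
                processed.add(m);
            }
            threadA.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return processed;
    }
}
```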
I'd like my program to get a file, and then create 4 files based on its byte content.
Working with only the main thread, I just create one DataInputStream and do my thing sequentially.
Now, I'm interested in making my program concurrent. Maybe I can have four threads - one for each file to be created.
I don't want to read the file's bytes into memory all at once, so my threads will need to query the DataInputStream constantly to stream the bytes using read().
What is not clear to me is, should my 4 threads call read() on the same DataInputStream, or should each one have their own separate stream to read from?
I don't think this is a good idea. See http://download.java.net/jdk7/archive/b123/docs/api/java/io/DataInputStream.html
DataInputStream is not necessarily safe for multithreaded access. Thread safety is optional and is the responsibility of users of methods in this class.
Assuming you want all of the data in each of your four new files, each thread should create its own DataInputStream.
If the threads share a single DataInputStream, at best each thread will get some random quarter of the data. At worst, you'll get a crash or data corruption due to multithreaded access to code that is not thread safe.
If you want to read data from 1 file into 4 separate ones you will not share DataInputStream. You can however wrap that stream and add functionality that would make it thread safe.
For example you may want to read in a chunk of data from your DataInputStream and cache that small chunk. When all 4 threads have read the chunk you can dispose of it and continue reading. You would never have to load the complete file into memory. You would only have to load a small amount.
If you look at the documentation for DataInputStream, it is a FilterInputStream, which means read operations are delegated to another InputStream. Suppose the underlying stream here is a FileInputStream; on most platforms, reading the same file concurrently through separate streams is supported.
So in your case, you should create four different FileInputStreams, wrap them in four DataInputStreams, and use one in each of the four threads. The reads will not interfere with each other.
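A rough sketch of the "one stream per thread" approach follows. Each task opens its own DataInputStream over the same file and keeps only the bytes it cares about. The class `SeparateStreams` and the rule "byte index modulo 4 selects the lane" are invented for the example; summing bytes stands in for writing an output file.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class SeparateStreams {
    // Reads the whole file through a private stream and sums the bytes
    // whose index falls into the given lane (index % 4 == lane).
    static long sumLane(Path file, int lane) {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(file)))) {
            long sum = 0;
            long index = 0;
            int b;
            while ((b = in.read()) != -1) {
                if (index++ % 4 == lane) {
                    sum += b;
                }
            }
            return sum;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs four independent readers; no stream object is shared.
    static long[] sumAllLanes(Path file) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<Long>> futures = new ArrayList<>();
            for (int lane = 0; lane < 4; lane++) {
                final int l = lane;
                futures.add(pool.submit(() -> sumLane(file, l)));
            }
            long[] sums = new long[4];
            for (int i = 0; i < 4; i++) {
                sums[i] = futures.get(i).get();
            }
            return sums;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The cost is that the file is read four times, although the OS file cache usually absorbs most of that.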
Short answer is no.
Longer answer: have a single thread read the DataInputStream, and put the data into one of four Queues, one per output file. Decide which Queue based upon the byte content.
Have four threads, each one reading from a Queue, that write to the output files.
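Here is a small sketch of that single-reader, four-queue layout. To keep it self-contained, the output "files" are ByteArrayOutputStreams, the routing rule (`byte value % 4`) is invented, and `EOF` is a hypothetical sentinel. It queues one byte at a time only for clarity; a real version would probably queue larger chunks.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class FanOut {
    static final int EOF = -1; // sentinel meaning "no more data for this queue"

    static List<ByteArrayOutputStream> split(InputStream source) {
        List<BlockingQueue<Integer>> queues = new ArrayList<>();
        List<ByteArrayOutputStream> outputs = new ArrayList<>();
        List<Thread> writers = new ArrayList<>();

        // One queue, one output and one writer thread per target.
        for (int i = 0; i < 4; i++) {
            BlockingQueue<Integer> q = new ArrayBlockingQueue<>(1024);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            Thread t = new Thread(() -> {
                try {
                    for (int b = q.take(); b != EOF; b = q.take()) {
                        out.write(b);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            queues.add(q);
            outputs.add(out);
            writers.add(t);
            t.start();
        }

        try {
            // Only this thread ever touches the DataInputStream.
            try (DataInputStream in = new DataInputStream(source)) {
                int b;
                while ((b = in.read()) != -1) {
                    queues.get(b % 4).put(b); // choose the queue from the byte content
                }
            } finally {
                for (BlockingQueue<Integer> q : queues) {
                    q.put(EOF); // always release the writers, even after an error
                }
            }
            for (Thread t : writers) {
                t.join();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return outputs;
    }
}
```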
Is there a way to have one thread in Java make a read call to some FileInputStream or similar and have a second thread process the bytes being loaded at the same time?
I've tried a number of things - my current attempt has one thread running this:
FileChannel inStream;
try {
    inStream = (new FileInputStream(inFile)).getChannel();
} catch (FileNotFoundException e) {
    e.printStackTrace();
}
int result;
try {
    result = inStream.read(inBuffer);
} ...
And a second thread wanting to access the bytes as they are being loaded. Clearly the read call in the first thread blocks until the buffer is full, but I want to be able to access the bytes loaded into the buffer before that point. Currently, everything I try has the buffer and its backing array unchanged until the read completes. This not only defeats the point of the threading but also suggests the data is being loaded into some intermediate buffer somewhere and then copied into my buffer later, which seems daft.
One option would be to do a bunch of smaller reads into the array with offsets on subsequent reads, but that adds extra overhead.
Any ideas?
When you read data sequentially, the OS will read ahead the data before you need it. As the system is doing this for you already, you may not get the benefit you might expect.
Why can't I just make my FileChannel or FileInputStream "flow" into my ByteBuffer or some byte array?
That is sort of what it does already.
If you want a more seamless loading of the data, you can use a memory-mapped file, as it "appears" in the memory of the program immediately and is loaded in the background as you use it.
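A minimal sketch of the memory-mapped approach, assuming the file fits in a single mapping (FileChannel.map is limited to Integer.MAX_VALUE bytes per region). The class `MappedRead` and the newline-counting task are placeholders for whatever processing you need.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

class MappedRead {
    // Counts '\n' bytes by walking a read-only mapping of the whole file.
    static long countNewlines(Path file) {
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // The mapping is available immediately; pages are faulted in by
            // the OS as the loop touches them.
            MappedByteBuffer buffer =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            long count = 0;
            while (buffer.hasRemaining()) {
                if (buffer.get() == '\n') {
                    count++;
                }
            }
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```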
What I usually do with requirements like this is to use multiple buffer class instances, preferably sized to allow efficient loading - a multiple of cluster-size, say. As soon as the first buffer gets loaded up, queue it off, (ie. push its pointer/instance onto a producer-consumer queue), to the thread that will process it and immediately create, (or depool), another buffer instance and start loading that one. To control overall data flow, you can create a suitable number of buffer objects at startup and store them in a 'pool queue', (another producer-consumer queue), and then you can circulate the objects full of data from the pool, to the file-read thread, then to the buffer-processing thread, then back to the pool.
This keeps the file->processing queue 'topped up' with buffer-objects full of data, no bulk copying required, no unavoidable delays, no inefficient inter-thread comms of single bytes, no messy locking of buffer-indexes, no chance that the file-read thread and data-processing thread can ever operate on the same buffer object.
If you want/need to use a threadPool to perform the processing, you can easily do so but you may need a sequence-number in the buffer objects if you need any resulting output from this subsystem to be in the same order as it was read from the file.
The buffer-objects may also contain result data members, exception/errorMessage fields, anything that you might want. The file and/or result data could easily be forwarded on to other thread/s from the data-processing, (eg. a logger or GUI display of progress), before getting repooled. Since it's all just pointer/instance queueing, the huge amount of data will flow around your system quickly and efficiently.
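The circulating-buffer scheme described above could look roughly like this in Java. Everything here is a sketch: the `Buf` class, the 4096-byte buffer size, the `END` marker, and the byte-summing "processing" are assumptions for illustration, and error reporting is reduced to ending the stream early.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class BufferPipeline {
    // Hypothetical buffer object: payload plus the number of valid bytes.
    static final class Buf {
        final byte[] data = new byte[4096];
        int length;
    }

    static final Buf END = new Buf(); // marker: no more filled buffers

    static long process(InputStream source, int poolSize) {
        BlockingQueue<Buf> pool = new LinkedBlockingQueue<>();   // empty buffers
        BlockingQueue<Buf> filled = new LinkedBlockingQueue<>(); // loaded buffers
        for (int i = 0; i < poolSize; i++) {
            pool.add(new Buf());
        }

        // File-read thread: depool a buffer, fill it, queue it off.
        Thread reader = new Thread(() -> {
            try (InputStream in = source) {
                while (true) {
                    Buf buf = pool.take(); // blocks when all buffers are in flight
                    buf.length = in.read(buf.data);
                    if (buf.length == -1) {
                        break;
                    }
                    filled.put(buf);
                }
            } catch (IOException | InterruptedException e) {
                // A fuller version would carry the error in a buffer field.
            } finally {
                filled.add(END);
            }
        });
        reader.start();

        // Processing side (run on the caller's thread in this sketch).
        long total = 0;
        try {
            for (Buf buf = filled.take(); buf != END; buf = filled.take()) {
                for (int i = 0; i < buf.length; i++) {
                    total += buf.data[i] & 0xFF;
                }
                pool.put(buf); // hand the buffer back for reuse
            }
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return total;
    }
}
```

Because only `poolSize` buffers exist, the reader can never run more than that many buffers ahead of the processor, which gives the flow control mentioned above without any explicit locking.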
I would recommend using a SynchronousQueue. The reader will retrieve data from the queue and the writer will "publish" the data from your file.
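A short sketch of the SynchronousQueue hand-off, with the file replaced by a List of lines and a made-up `DONE` sentinel. Each `put` blocks until the other thread calls `take`, so nothing is buffered in between.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.SynchronousQueue;

class Handoff {
    // Hypothetical end-of-data marker; assumed never to appear as a real line.
    static final String DONE = "\u0000done";

    static List<String> relay(List<String> lines) {
        SynchronousQueue<String> queue = new SynchronousQueue<>();

        // "Writer": publishes each line and waits until it has been taken.
        Thread writer = new Thread(() -> {
            try {
                for (String line : lines) {
                    queue.put(line);
                }
                queue.put(DONE);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        // "Reader": retrieves lines one by one as they are handed over.
        List<String> seen = new ArrayList<>();
        try {
            for (String s = queue.take(); !s.equals(DONE); s = queue.take()) {
                seen.add(s);
            }
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen;
    }
}
```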
Use a PipedInputStream/PipedOutputStream pair to create a familiar-looking pipe with a buffer.
Also, use a FileInputStream to read it byte by byte if necessary. For a regular file, fis.read() does not wait for more data to be appended; it returns -1 once the end of the file is reached, and you can always check available() first.
I have a file and I want to do the following task (just to learn more about threads reading and writing files):
When an application starts and the file is read, I want to have information about all the streams which are open and how many threads are reading from the same stream.
Is there a way I can get all the stream information via reflection? Is there another way?
I would suggest some sort of StreamFactory class that will maintain that information for you. Threads can then do an InputStream getStream(File) and closeStream(InputStream) or some such and the factory will maintain the list of which thread has what streams open and provide some statistics functions such as:
public Collection<InputStream> getOpenStreams()
and
public int getNumThreadsWithStream(InputStream stream);
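A possible shape for such a factory is sketched below. Only the four method names come from the suggestion above; the `share` method (for a second thread that starts using an already-open stream) and the internal map are assumptions of this sketch.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class StreamFactory {
    // Open stream -> threads known to be using it.
    private final Map<InputStream, Set<Thread>> users = new ConcurrentHashMap<>();

    // Opens a stream and records the calling thread as its first user.
    public InputStream getStream(File file) throws IOException {
        InputStream in = new FileInputStream(file);
        share(in);
        return in;
    }

    // Hypothetical helper: the calling thread declares that it also uses `in`.
    public void share(InputStream in) {
        users.computeIfAbsent(in, k -> ConcurrentHashMap.newKeySet())
             .add(Thread.currentThread());
    }

    // Closes the stream and forgets its bookkeeping.
    public void closeStream(InputStream in) throws IOException {
        users.remove(in);
        in.close();
    }

    public Collection<InputStream> getOpenStreams() {
        return new ArrayList<>(users.keySet()); // snapshot copy
    }

    public int getNumThreadsWithStream(InputStream in) {
        Set<Thread> threads = users.get(in);
        return threads == null ? 0 : threads.size();
    }
}
```

This only works if every thread goes through the factory; streams opened directly with `new FileInputStream(...)` are invisible to it.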
I believe this is something you have to keep track of yourself. If you are sharing a file stream between threads (and I suggest you don't; use one thread for reading the stream and pass work to a thread pool if you need to), you can keep track of all the threads that have attempted to read the stream.
Is it possible to have one thread write to the OutputStream of a Java Socket, while another reads from the socket's InputStream, without the threads having to synchronize on the socket?
Sure. The exact situation you're describing shouldn't be a problem (reading and writing simultaneously).
Generally, the reading thread will block if there's nothing to read, and might time out on the read operation if you've got a timeout specified.
Since the input stream and the output stream are separate objects within the Socket, the only thing you might concern yourself with is, what happens if you had 2 threads trying to read or write (two threads, same input/output stream) at the same time? The read/write methods of the InputStream/OutputStream classes are not synchronized. It is possible, however, that if you're using a sub-class of InputStream/OutputStream, that the reading/writing methods you're calling are synchronized. You can check the javadoc for whatever class/methods you're calling, and find that out pretty quick.
Yes, that's safe.
If you wanted more than one thread reading from the InputStream you would have to be more careful (assuming you are reading more than one byte at a time).
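For what it's worth, here is a small loopback sketch of one thread writing to a socket while another reads from it, with no synchronization on the Socket itself. The echo server, the class name `DuplexSocketDemo`, and the use of `transferTo`/`readAllBytes` (Java 9 or later) are choices made for this example only.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class DuplexSocketDemo {
    static String roundTrip(String message) {
        // Port 0 asks the OS for any free port on the loopback interface.
        try (ServerSocket server =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            // Echo peer: copies everything it receives straight back.
            Thread echo = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    peer.getInputStream().transferTo(peer.getOutputStream());
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            echo.start();

            try (Socket socket =
                         new Socket(server.getInetAddress(), server.getLocalPort())) {
                // Writer thread uses only the socket's OutputStream.
                Thread writer = new Thread(() -> {
                    try {
                        socket.getOutputStream()
                              .write(message.getBytes(StandardCharsets.UTF_8));
                        socket.shutdownOutput(); // tells the peer we are done sending
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
                writer.start();

                // This thread uses only the InputStream, at the same time.
                byte[] reply = socket.getInputStream().readAllBytes();
                writer.join();
                echo.join();
                return new String(reply, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```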