How to get the thread trace while reading a file - java

I have a file and I want to do the following task (just to gain more knowledge about threads reading and writing files):
When an application starts and the file is read, I want to have information about all the streams which are open and how many threads are reading from the same stream.
Is there a way I can get all the stream information via reflection? Is there another way?

I would suggest some sort of StreamFactory class that maintains that information for you. Threads can then call InputStream getStream(File) and closeStream(InputStream) or some such, and the factory will maintain the list of which thread has which streams open and provide some statistics functions such as:
public Collection<InputStream> getOpenStreams()
and
public int getNumThreadsWithStream(InputStream);
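A minimal sketch of such a factory, assuming the hypothetical getStream/closeStream API above plus a registerReader method (my addition) for threads that read an already-open stream; a ConcurrentHashMap keeps the bookkeeping thread safe:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical StreamFactory: hands out streams and records which
// threads are using which stream.
public class StreamFactory {
    // stream -> set of threads currently reading it
    private final Map<InputStream, Set<Thread>> open = new ConcurrentHashMap<>();

    public InputStream getStream(File file) throws IOException {
        InputStream in = new FileInputStream(file);
        open.computeIfAbsent(in, s -> ConcurrentHashMap.newKeySet())
            .add(Thread.currentThread());
        return in;
    }

    // Call this from any additional thread that starts reading an
    // already-open stream.
    public void registerReader(InputStream in) {
        open.computeIfAbsent(in, s -> ConcurrentHashMap.newKeySet())
            .add(Thread.currentThread());
    }

    public void closeStream(InputStream in) throws IOException {
        open.remove(in);
        in.close();
    }

    public Collection<InputStream> getOpenStreams() {
        return open.keySet();
    }

    public int getNumThreadsWithStream(InputStream in) {
        Set<Thread> threads = open.get(in);
        return threads == null ? 0 : threads.size();
    }
}
```

This only tracks what goes through the factory, of course; streams opened elsewhere remain invisible to it, which is why reflection alone won't get you this information.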

I believe this is something you have to keep track of yourself. If you are sharing a file stream between threads (and I suggest you don't do this; use one thread for reading the stream and pass work to a thread pool if you need to), you can keep track of all the threads which have attempted to read the stream.

Related

How can I get two threads to read from one inputStream?

I have one input stream coming in that is periodically receiving data. One of my threads (let's call it threadA) reads every message from the stream and makes sure the data is OK, and throws an error otherwise. My other thread (let's call it threadB) needs to read a few specific messages and then process them. As of now I have threadA store the important messages in a global variable, and threadB read the messages from the global variable.
Is there any way to allow two threads to read from the same source to avoid this?
Edit: the data coming in are responses to commands threadB issued. My issue is that threadB needs the replies from certain commands, which are issued in no particular pattern, but it does not need all the replies.
You could probably create a thread-safe input stream or a wrapper, and if the stream supports mark/reset you could also have two streams read the data in parallel. However, you'd have to handle situations where one thread reads faster than the other, making mark/reset unusable or forcing you to skip data - there's so much involved, I doubt you'll want to bother with all this.
I'd suggest you keep your basic setup but try to get rid of global variables, e.g. by using the observer pattern, passing references to a shared store to the threads, etc.
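One way to replace the global variable is a BlockingQueue standing in for the shared store: threadA stays the single reader and forwards only the relevant messages. The message format (a "REPLY" prefix) and the names are made up for the sketch:

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class MessageDispatchDemo {
    static List<String> run(String input) throws Exception {
        BufferedReader in = new BufferedReader(new StringReader(input));
        BlockingQueue<String> replies = new LinkedBlockingQueue<>();
        List<String> received = new ArrayList<>();

        // threadA: the single reader; checks every message and forwards
        // only the ones threadB cares about onto the queue.
        Thread threadA = new Thread(() -> {
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    if (line.startsWith("REPLY")) replies.put(line);
                    // else: validate / log the message here
                }
                replies.put("EOF"); // poison pill so threadB can stop
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });

        // threadB: consumes only the messages it needs.
        Thread threadB = new Thread(() -> {
            try {
                String msg;
                while (!(msg = replies.take()).equals("EOF")) received.add(msg);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        threadA.start();
        threadB.start();
        threadA.join();
        threadB.join();
        return received;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("STATUS ok\nREPLY 1\nSTATUS ok\nREPLY 2\n"));
        // [REPLY 1, REPLY 2]
    }
}
```

The queue gives you the blocking and the thread safety for free, so threadB no longer has to poll a shared variable.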

Should multiple threads read from the same DataInputStream?

I'd like my program to get a file, and then create 4 files based on its byte content.
Working with only the main thread, I just create one DataInputStream and do my thing sequentially.
Now, I'm interested in making my program concurrent. Maybe I can have four threads - one for each file to be created.
I don't want to read the file's bytes into memory all at once, so my threads will need to query the DataInputStream constantly to stream the bytes using read().
What is not clear to me is: should my 4 threads call read() on the same DataInputStream, or should each one have its own separate stream to read from?
I don't think this is a good idea. See http://download.java.net/jdk7/archive/b123/docs/api/java/io/DataInputStream.html
DataInputStream is not necessarily safe for multithreaded access. Thread safety is optional and is the responsibility of users of methods in this class.
Assuming you want all of the data in each of your four new files, each thread should create its own DataInputStream.
If the threads share a single DataInputStream, at best each thread will get some random quarter of the data. At worst, you'll get a crash or data corruption due to multithreaded access to code that is not thread safe.
If you want to read data from 1 file into 4 separate ones, you should not share the DataInputStream. You can, however, wrap that stream and add functionality to make it thread safe.
For example, you may want to read a chunk of data from your DataInputStream and cache that small chunk. When all 4 threads have read the chunk, you can dispose of it and continue reading. You would never have to load the complete file into memory, only a small amount at a time.
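That chunk-cache idea could be sketched with a CyclicBarrier whose barrier action loads the next chunk once all four threads have consumed the current one. The chunk size and the sum-of-bytes "processing" are placeholders for the real work:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.concurrent.CyclicBarrier;

public class SharedChunkDemo {
    static long[] sumChunks(byte[] data, int chunkSize, int nThreads) throws Exception {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        byte[][] chunk = { in.readNBytes(chunkSize) }; // first chunk

        // Barrier action runs after all threads have consumed the current
        // chunk; it loads the next one (empty array at EOF).
        CyclicBarrier barrier = new CyclicBarrier(nThreads, () -> {
            try {
                chunk[0] = in.readNBytes(chunkSize);
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });

        long[] sums = new long[nThreads];
        Thread[] threads = new Thread[nThreads];
        for (int i = 0; i < nThreads; i++) {
            final int id = i;
            threads[i] = new Thread(() -> {
                try {
                    while (chunk[0].length > 0) {
                        for (byte b : chunk[0]) sums[id] += b; // "process" chunk
                        barrier.await(); // wait until everyone is done with it
                    }
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        return sums;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(
            sumChunks(new byte[]{1, 2, 3, 4, 5}, 2, 4))); // [15, 15, 15, 15]
    }
}
```

The barrier's happens-before guarantee is what makes the unsynchronized reads of the shared chunk safe here; only one thread (the barrier action) ever touches the DataInputStream.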
If you look at the doc of DataInputStream, it is a FilterInputStream, which means the read operation is delegated to another InputStream. Suppose the underlying stream is a FileInputStream; on most platforms, concurrent reads from separate streams are supported.
So in your case, you should initialize four different FileInputStreams, resulting in four DataInputStreams, used in four threads separately. The read operations will not interfere with each other.
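A sketch of that approach, with each worker opening its own FileInputStream/DataInputStream so the stream positions never interfere; the temp file and the sum-of-bytes work are just for illustration:

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FourReadersDemo {
    static List<Integer> readAll(File file, int nThreads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < nThreads; i++) {
            futures.add(pool.submit(() -> {
                // Each thread gets its own DataInputStream over its own
                // FileInputStream, so every thread sees the whole file.
                try (DataInputStream in = new DataInputStream(
                        new BufferedInputStream(new FileInputStream(file)))) {
                    int sum = 0, b;
                    while ((b = in.read()) != -1) sum += b;
                    return sum;
                }
            }));
        }
        List<Integer> sums = new ArrayList<>();
        for (Future<Integer> f : futures) sums.add(f.get());
        pool.shutdown();
        return sums;
    }

    public static void main(String[] args) throws Exception {
        File file = File.createTempFile("demo", ".bin");
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(new byte[]{1, 2, 3, 4});
        }
        System.out.println(readAll(file, 4)); // [10, 10, 10, 10]
    }
}
```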
Short answer is no.
Longer answer: have a single thread read the DataInputStream, and put the data into one of four Queues, one per output file. Decide which Queue based upon the byte content.
Have four threads, each one reading from a Queue, that write to the output files.
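The single-reader/four-queue design above might look like this. The routing rule (byte value modulo 4) and the in-memory output buffers are placeholders for the real byte-content test and the real output files:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class FanOutDemo {
    static List<byte[]> fanOut(byte[] data) throws Exception {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        List<BlockingQueue<Integer>> queues = new ArrayList<>();
        for (int i = 0; i < 4; i++) queues.add(new LinkedBlockingQueue<>());

        // Single reader thread: the only one touching the stream.
        Thread reader = new Thread(() -> {
            try {
                int b;
                while ((b = in.read()) != -1) {
                    queues.get(b % 4).put(b); // placeholder routing rule
                }
                for (BlockingQueue<Integer> q : queues) q.put(-1); // end marker
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });

        // Four writer threads, one per queue / output "file".
        List<ByteArrayOutputStream> outs = new ArrayList<>();
        List<Thread> writers = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            outs.add(out);
            BlockingQueue<Integer> q = queues.get(i);
            Thread w = new Thread(() -> {
                try {
                    int b;
                    while ((b = q.take()) != -1) out.write(b);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            writers.add(w);
        }

        reader.start();
        for (Thread w : writers) w.start();
        reader.join();
        for (Thread w : writers) w.join();

        List<byte[]> result = new ArrayList<>();
        for (ByteArrayOutputStream out : outs) result.add(out.toByteArray());
        return result;
    }

    public static void main(String[] args) throws Exception {
        for (byte[] part : fanOut(new byte[]{0, 1, 2, 3, 4, 5, 6, 7})) {
            System.out.println(Arrays.toString(part));
        }
    }
}
```

The stream is only ever touched by one thread, so none of the DataInputStream thread-safety caveats apply; the queues carry all the cross-thread traffic.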

using ThreadPoolTaskExecutor while transferring files over the network

I am currently writing a Spring Batch job which is supposed to transfer all the files from my application to a shared location. The batch consists of a single step with a reader that reads byte[], a processor that converts it to PDF, and a writer that creates a new file at the shared location.
1) Since it's an IO-bound operation, should I use ThreadPoolTaskExecutor in my batch? Will using it cause data to be lost?
2) Also, in my ItemWriter I am writing using a FileOutputStream. My server is in Paris and the shared location is in New York. So while writing the file in such a scenario, is there any better or more efficient way to achieve this with the least delay?
Thanks in advance
1) If you can separate the IO-bound operation into its own thread and other parts into their own threads, you could go for a multithreaded approach. Data won't be lost if you write it correctly.
One approach could be as follows:
Have one reader thread read the data into a buffer.
Have a second thread perform the transformation into PDF.
Have a third thread perform the write out (if it's not to the same disk as the reading).
2) So it's a mapped network share? There's probably not much you can do from Java then. But before you get worried about it, you should make sure it's an actual issue and not premature optimization.
I guess you can achieve the above task by partitioning: create a master step which returns the file paths, and a multi-threaded slave step with one of two approaches.
A simple tasklet which reads the file, converts it to PDF and writes to the shared drive.
Rather than using FileOutputStream, use FileChannel with buffered reads/writes - it has much better performance.
A chunk-oriented reader (specifically an ItemStream) passed to a custom ItemWriter to write the file (I never got a chance to write to PDF, but I believe at the end of the day it will be a stream with a different encoding).
I would recommend the second one; it's always better to use the library than custom code.
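On the FileChannel point, a minimal copy sketch using FileChannel.transferTo, which can hand the transfer to the OS (zero-copy on platforms that support it) instead of shuttling bytes through a user-space buffer:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;

public class ChannelCopy {
    // Copies src to dest via FileChannel.transferTo. transferTo may
    // transfer fewer bytes than requested, hence the loop.
    static void copy(File src, File dest) throws IOException {
        try (FileChannel in = new FileInputStream(src).getChannel();
             FileChannel out = new FileOutputStream(dest).getChannel()) {
            long pos = 0, size = in.size();
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("src", ".bin");
        Files.write(src.toPath(), "hello over the wire".getBytes());
        File dest = File.createTempFile("dest", ".bin");
        copy(src, dest);
        System.out.println(Files.readAllBytes(dest.toPath()).length); // 19
    }
}
```

Whether this helps over a Paris-to-New-York network share is worth measuring first; the latency of the link will likely dominate either way.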

PipedOutputStream in HashMap for threads

I have a couple of threads that run in the background. They share a common HashMap.
Is it possible to store a PipedOutputStream there safely?
I have the following scenario:
When the first background thread receives a specific event, it should start reading text data from a huge file into a buffer.
The second background thread (they are independent) should be notified somehow and then read data from the buffer (pipe) as it arrives.
Because all threads can access the HashMap, is it OK to store all the streams there?
You can use a ConcurrentHashMap. I don't see much point in using a pipe here, as the OS will read the file ahead of where you are reading anyway.
HashMap is not synchronized, so you have to add your own synchronization or use a synchronized collection. As for the streams, since you can only write to an OutputStream and only read from an InputStream, you will have no problem writing from one thread and reading from another.
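A small sketch combining both answers: the map is a ConcurrentHashMap, one thread writes the PipedOutputStream end, and the other reads the connected PipedInputStream it looks up in the map. The "huge-file" key and the contents are made up for the example:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PipeDemo {
    static String demo() throws Exception {
        // Thread-safe map; each pipe end still belongs to exactly one
        // thread (one writer, one reader).
        Map<String, PipedInputStream> pipes = new ConcurrentHashMap<>();

        PipedOutputStream out = new PipedOutputStream();
        pipes.put("huge-file", new PipedInputStream(out)); // connected pair

        // Producer: reacts to the event by writing data into the pipe.
        Thread producer = new Thread(() -> {
            try (OutputStream o = out) {
                o.write("file contents".getBytes());
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });

        // Consumer: looks the stream up in the shared map and reads
        // the data as it arrives.
        StringBuilder result = new StringBuilder();
        Thread consumer = new Thread(() -> {
            try (InputStream in = pipes.get("huge-file")) {
                result.append(new String(in.readAllBytes()));
            } catch (IOException e) {
                throw new RuntimeException(e);
            }
        });

        producer.start();
        consumer.start();
        producer.join();
        consumer.join();
        return result.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // file contents
    }
}
```

Note that piped streams block when the internal buffer is full or empty, which is exactly the "notified somehow" behavior the question asks for.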

Do Java sockets support full duplex?

Is it possible to have one thread write to the OutputStream of a Java Socket, while another reads from the socket's InputStream, without the threads having to synchronize on the socket?
Sure. The exact situation you're describing shouldn't be a problem (reading and writing simultaneously).
Generally, the reading thread will block if there's nothing to read, and might timeout on the read operation if you've got a timeout specified.
Since the input stream and the output stream are separate objects within the Socket, the only thing you might concern yourself with is: what happens if you have two threads trying to read, or two trying to write, on the same stream at the same time? The read/write methods of the InputStream/OutputStream classes are not synchronized. It is possible, however, if you're using a subclass of InputStream/OutputStream, that the reading/writing methods you're calling are synchronized. You can check the javadoc for whatever class/methods you're calling and find that out pretty quickly.
Yes, that's safe.
If you wanted more than one thread reading from the InputStream, you would have to be more careful (assuming you are reading more than one byte at a time).
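A self-contained sketch of the full-duplex case: one thread writes the client socket's OutputStream while another reads its InputStream, with no synchronization on the socket, against a trivial echo server. All names and the line-based protocol are illustrative:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

public class DuplexDemo {
    static List<String> roundTrip(String... messages) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // any free port
            // Echo server: sends every line it receives straight back.
            Thread echo = new Thread(() -> {
                try (Socket s = server.accept();
                     BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream()));
                     PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                    String line;
                    while ((line = in.readLine()) != null) out.println(line);
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            });
            echo.start();

            List<String> replies = new ArrayList<>();
            try (Socket client = new Socket("localhost", server.getLocalPort())) {
                // Writer thread: only touches the OutputStream.
                Thread writer = new Thread(() -> {
                    try {
                        PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                        for (String m : messages) out.println(m);
                    } catch (IOException e) {
                        throw new RuntimeException(e);
                    }
                });
                writer.start();
                // Reader (this thread): only touches the InputStream.
                BufferedReader in = new BufferedReader(
                    new InputStreamReader(client.getInputStream()));
                for (int i = 0; i < messages.length; i++) replies.add(in.readLine());
                writer.join();
            }
            echo.join();
            return replies;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello", "world")); // [hello, world]
    }
}
```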
