Java Multithreaded I/O and communication problem - java

I am using Java to create an application for network management. In this application I communicate with network devices using the SNMP4j library (for the SNMP protocol). I'm supposed to scan certain values of the network devices using this protocol and put the results into a file for caching. At some point I decided to make my application multi-threaded and assign one device to each thread. I created a class that implements the Runnable interface and scans for the values that I want for each device.
When I run this class alone, it works fine. But when I run multiple threads at the same time the output gets messed up: additional or out-of-order output is printed into the files. Now I wonder whether this problem is due to the I/O or due to the communication.
Here is some of the code so that you can see what I'm doing and help me figure out what's wrong.
public class DeviceScanner implements Runnable{
private final SNMPCommunicator comm;
private OutputStreamWriter out;
public DeviceScanner(String ip, OutputStream output) throws IOException {
this.device=ip;
this.comm = new SNMPV1Communicator(device);
oids=MIB2.ifTableHeaders;
out = new OutputStreamWriter(output);
}
@Override
public void run(){
//Here I use the communicator to request for desired data goes something like ...
String read="";
for (int j=0; j<num; j++){
read= comm.snmpGetNext(oids);
out.write(read);
this.updateHeaders(read);
}
out.flush();
//...
}
}
Some of the expected output would be something like:
1.3.6.1.2.1.1.1.0 = SmartSTACK ELS100-S24TX2M
1.3.6.1.2.1.1.2.0 = 1.3.6.1.4.1.52.3.9.1.10.7
1.3.6.1.2.1.1.3.0 = 26 days, 22:35:02.31
1.3.6.1.2.1.1.4.0 = admin
1.3.6.1.2.1.1.5.0 = els
1.3.6.1.2.1.1.6.0 = Computer Room
But instead I get something like this (it varies):
1.3.6.1.2.1.1.1.0 = SmartSTACK ELS100-S24TX2M
1.3.6.1.2.1.1.2.0 = 1.3.6.1.4.1.52.3.9.1.10.7
1.3.6.1.2.1.1.4.0 = admin
1.3.6.1.2.1.1.5.0 = els
1.3.6.1.2.1.1.3.0 = 26 days, 22:35:02.31
1.3.6.1.2.1.1.6.0 = Computer Room
1.3.6.1.2.1.1.1.0 = SmartSTACK ELS100-S24TX2M
1.3.6.1.2.1.1.2.0 = 1.3.6.1.4.1.52.3.9.1.10.7
*Currently I have one file per device scanner.
I get them from a list of IPs; it looks like this. I'm also using a small thread pool to keep a limited number of threads running at the same time.
for (String s: ips){
output= new FileOutputStream(new File(path+s));
threadpool.add(new DeviceScanner(s, output));
}

I suspect SNMPV1Communicator(device) is not thread-safe. As far as I can see, it's not part of the SNMP4j library.

Taking a wild guess at what's going on here, try putting everything inside a synchronized() block, like this:
synchronized (DeviceScanner.class)
{
for (int j=0; j<num; j++){
read= comm.snmpGetNext(oids);
out.write(read);
this.updateHeaders(read);
}
out.flush();
}
If this works, my guess is right and the reason for the problems you're seeing is that you have many OutputStreamWriters (one on each thread), all writing to a single OutputStream. Each OutputStreamWriter has its own buffer. When this buffer is full, it passes the data to the OutputStream. It's essentially random when each OutputStreamWriter's buffer is full; it might well be in the middle of a line.
The synchronized block above means that only one thread at a time can be writing to that thread's OutputStreamWriter. The flush() at the end means that before leaving the synchronized block, the OutputStreamWriter's buffer should have been flushed to the underlying OutputStream.
Note that synchronizing in this way on the class object isn't what I'd consider best practice. You should probably be looking at using a single instance of some other kind of stream class - or something like a LinkedBlockingQueue, with all of the SNMP threads passing their data over to a single file-writing thread. I've added the synchronized as above because it was the only thing available to synchronize on within your pasted example code.
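The single file-writing thread mentioned above could look roughly like the sketch below. It is only an illustration under assumptions: the names `SingleWriterSketch`, `POISON` and `runDemo` are made up, and the SNMP calls are replaced by plain strings. Each worker puts one complete line on the queue, so lines can never be split in the middle.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SingleWriterSketch {
    // Hypothetical end-of-input marker; compared by identity and never written out.
    private static final String POISON = new String("<<done>>");

    // Starts `producers` worker threads that each queue `linesEach` whole lines,
    // while exactly one writer thread drains the queue into `out`.
    public static void runDemo(int producers, int linesEach, Writer out) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();

        Thread writer = new Thread(() -> {
            try {
                String line;
                while ((line = queue.take()) != POISON) {
                    out.write(line);
                }
                out.flush();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.start();

        Thread[] workers = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            final int id = p;
            workers[p] = new Thread(() -> {
                for (int j = 0; j < linesEach; j++) {
                    // Stand-in for comm.snmpGetNext(oids): one whole line per element.
                    queue.add("device" + id + " value" + j + "\n");
                }
            });
            workers[p].start();
        }

        try {
            for (Thread w : workers) {
                w.join();
            }
            queue.add(POISON); // all workers are done, let the writer finish
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        StringWriter sink = new StringWriter();
        runDemo(4, 3, sink);
        System.out.print(sink);
    }
}
```

The order of lines from different workers still depends on scheduling, but the lines of any one worker stay in order and are never interleaved character by character.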

You've got multiple threads all using buffered output, all writing to the same file.
There are no guarantees as to when those threads will be scheduled to run, so the order of the output will be fairly random, dictated by the thread scheduling.

Related

Example on how to use TFileTransport in Thrift (Client/Server)

Is there anyone who managed to get TFileTransport as a transport layer, to work? I've tried but since there is no documentation (or have I not found it?) for this, I am not able to make it work.
If anyone have been more successful and could provide some sample code, it would be great.
edit:
What I've tried so far:
public class FileThriftServer {
public static void startThriftServer(
ThriftDataBenchmark.Processor<ThriftDataBenchmarkHandler> processor) {
try {
File input = new File("ThriftFile.in");
if(!input.exists()){
input.createNewFile();
}
File output = new File("ThriftFile.out");
if(!output.exists()){
output.createNewFile();
}
TFileTransport inputFileTransport = new TFileTransport(input.getAbsolutePath(), true);
TFileTransport outputFileTransport = new TFileTransport(output.getAbsolutePath(), false);
inputFileTransport.open();
outputFileTransport.open();
TFileProcessor fProcessor =
new TFileProcessor(processor, new TJSONProtocol.Factory(), inputFileTransport, outputFileTransport);
// this results in error in case I don't call those open methods above
fProcessor.processChunk();
System.out.println("File Thrift service started ...");
} catch (Exception e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
// ThriftDataBenchmarkHandler is an implementation of my test service
startThriftServer(new ThriftDataBenchmark.Processor<ThriftDataBenchmarkHandler>(
new ThriftDataBenchmarkHandler()));
}
}
Now I don't know if I am even on the right track; maybe I misunderstood the concept of this transport (again, it is not documented). I would expect to start the server by some method which will then listen on the input file. When clients put something there, it would process it and write the answer to the output file (I didn't try to write a client yet since this piece of code just executes and exits, so it is obviously not right).
edit 2:
Ok, so if I understand it right, this code is ok and it should process one request of the client, if it's there. So I am moving to the client side, doing something like this:
File input = new File(THRIFT_INPUT_FILE_PATH);
if (!input.exists()) {
input.createNewFile();
}
TTransport transport = new TFileTransport(input.getAbsolutePath(),
false);
TProtocol protocol = new TJSONProtocol(transport);
ThriftDataBenchmark.Client client = new ThriftDataBenchmark.Client(
protocol);
// my testing service, the parameters are not important
SimpleCompany company = client.getSimpleCompanyData("token", 42);
Unfortunately, calling getSimpleCompanyData results in:
org.apache.thrift.transport.TTransportException: Not Supported
at org.apache.thrift.transport.TFileTransport.write(TFileTransport.java:572)
at org.apache.thrift.transport.TTransport.write(TTransport.java:105)
at org.apache.thrift.protocol.TJSONProtocol.writeJSONArrayStart(TJSONProtocol.java:476)
at org.apache.thrift.protocol.TJSONProtocol.writeMessageBegin(TJSONProtocol.java:487)
at org.apache.thrift.TServiceClient.sendBase(TServiceClient.java:62)
It's a bit confusing that server side requires input and output transport but on the client side, it only accepts one. How does it read an answer and from where?
Let's not move into some extra logic of checking the file for changes, if it's not already part of Thrift. I'll be ok at this point by doing it manually in sense of: running the client first, then running the server side.
I would expect to start the server by some method which will then listen on the input file. When clients put something there, it would process it and write the answer to the output file (I didn't try to write a client yet since this piece of code just executes and exits, so it is obviously not right).
That's exactly right. In particular, the fProcessor.processChunk() call you used will process exactly one chunk (the current one). The whole class looks as if it was designed around the assumption that the file size is static and does not change over time. However, the underlying TFileTransport supports what's called a tailPolicy, used when a read call hits EOF:
public class TFileTransport extends TTransport {
public static enum tailPolicy {
NOWAIT(0, 0),
WAIT_FOREVER(500, -1);
/**
* Time in milliseconds to sleep before next read
* If 0, no sleep
*/
public final int timeout_;
/**
* Number of retries before giving up
* if 0, no retries
* if -1, retry forever
*/
public final int retries_;
// ... ctor ...
}
/**
* Current tailing policy
*/
tailPolicy currentPolicy_ = tailPolicy.NOWAIT;
Another option to get it to work could be calling fProcessor.processChunk(int chunkNum), watching the file contents separately and repeating the calls when new data comes in. It's certainly not such a bad idea to use the TFileProcessor as a starting point and improve it as needed.
// this results in error in case I don't call those open methods above
fProcessor.processChunk();
Opening the transports before using is fine. I think that part is ok.
org.apache.thrift.transport.TTransportException: Not Supported
at org.apache.thrift.transport.TFileTransport.write(TFileTransport.java:572)
at org.apache.thrift.transport.TTransport.write(TTransport.java:105)
Unfortunately, that seems to be correct for now. The only place where writing is implemented is the C++ library. Both Java and D only support reading so far.

Named pipes in Java and multithreading

Am I correct to suppose that, within the bounds of the same process, having 2 threads reading from/writing to a named pipe does not block the reader/writer at all? So with wrong timings it's possible to miss some data?
And in the case of several processes, the reader will wait until some data is available, and the writer will be blocked until the reader has read all the data supplied by the writer?
I am planning to use a named pipe to pass several (tens, hundreds) files from an external process and consume them in my Java application. Writing simple unit tests that use one thread for writing to the pipe and another one for reading from it resulted in sporadic test failures because of missing data chunks.
I think it's because of the threading within the same process, so my test is not correct in general. Is this assumption correct?
Here is some sort of example which illustrates the case:
import java.io.{FileOutputStream, FileInputStream, File}
import java.util.concurrent.Executors
import org.apache.commons.io.IOUtils
import org.junit.runner.RunWith
import org.scalatest.FlatSpec
import org.scalatest.junit.JUnitRunner
@RunWith(classOf[JUnitRunner])
class PipeTest extends FlatSpec {
def md5sum(data: Array[Byte]) = {
import java.security.MessageDigest
MessageDigest.getInstance("MD5").digest(data).map("%02x".format(_)).mkString
}
"Pipe" should "block here" in {
val pipe = new File("/tmp/mypipe")
val srcData = new File("/tmp/random.10m")
val md5 = "8e0a24d1d47264919f9d47f5223c913e"
val executor = Executors.newSingleThreadExecutor()
executor.execute(new Runnable {
def run() {
(1 to 10).foreach {
id =>
val fis = new FileInputStream(pipe)
assert(md5 === md5sum(IOUtils.toByteArray(fis)))
fis.close()
}
}
})
(1 to 10).foreach {
id =>
val is = new FileInputStream(srcData)
val os = new FileOutputStream(pipe)
IOUtils.copyLarge(is, os)
os.flush()
os.close()
is.close()
Thread.sleep(200)
}
}
}
Without Thread.sleep(200) the test fails for one of these reasons:
broken pipe exception
incorrect MD5 sum
With this delay set it works just great. I am using a file with 10 megabytes of random data.
This is a very simple race condition in your code: you're writing fixed-size messages to the pipe, and assuming that you can read the same messages back. However, you have no idea how much data is available in the pipe for any given read.
If you prefix your writes with the number of bytes written, and ensure that each read only reads that number of bytes, you'll see that pipes work exactly as advertised.
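The length-prefix idea can be sketched as follows. This is a minimal illustration, not the poster's code: `FramingSketch`, `writeFrame` and `readFrame` are hypothetical names, and a byte array stands in for the named pipe. It also assumes both sides keep the pipe open across messages instead of reopening it per file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class FramingSketch {
    // Writes one message as a 4-byte big-endian length followed by the payload.
    public static void writeFrame(OutputStream rawOut, byte[] payload) {
        try {
            DataOutputStream out = new DataOutputStream(rawOut);
            out.writeInt(payload.length);
            out.write(payload);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads exactly one message. readFully blocks until every announced byte
    // has arrived, so a short read can no longer cut a message in half.
    public static byte[] readFrame(InputStream rawIn) {
        try {
            DataInputStream in = new DataInputStream(rawIn);
            int length = in.readInt();
            byte[] payload = new byte[length];
            in.readFully(payload);
            return payload;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the named pipe: two messages written back to back.
        ByteArrayOutputStream pipe = new ByteArrayOutputStream();
        writeFrame(pipe, "first".getBytes(StandardCharsets.UTF_8));
        writeFrame(pipe, "second message".getBytes(StandardCharsets.UTF_8));

        InputStream in = new ByteArrayInputStream(pipe.toByteArray());
        System.out.println(new String(readFrame(in), StandardCharsets.UTF_8));
        System.out.println(new String(readFrame(in), StandardCharsets.UTF_8));
    }
}
```

With this framing the reader no longer relies on EOF or on timing to find the end of a message.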
If you have a situation with multiple writers and/or multiple readers, I recommend using an actual message queue. Actually, I recommend using a message queue in any case, as it solves the issue of message boundary demarcation; there's little point in reinventing that particular wheel.
Am I correct to suppose that, within the bounds of the same process, having 2 threads reading from/writing to a named pipe does not block the reader/writer at all?
Not unless you are using non-blocking I/O, which you aren't.
So with wrong timings it's possible to miss some data?
Not unless you are using non-blocking I/O, which you aren't.

Sharing a resource among Threads, different behavior in different java versions

This is the first time I've encountered something like below.
Multiple Threads (Inner classes implementing Runnable) sharing a Data Structure (instance variable of the upper class).
Working: took classes from Eclipse project's bin folder, ran on a Unix machine.
NOT WORKING: directly compiled the src on Unix machine and used those class files. Code compiles and then runs with no errors/warnings, but one thread is not able to access shared resource properly.
PROBLEM: One thread adds elements to the above common DS. Second thread does the following...
while(true){
if(myArrayList.size() > 0){
//do stuff
}
}
The Log shows that the size is updated in Thread 1.
For some mysterious reason, the workflow is not entering the if() ...
Same exact code runs perfectly if I directly paste the class files from Eclipse's bin folder.
I apologize if I missed anything obvious.
Code:
ArrayList<CSRequest> newCSRequests = new ArrayList<CSRequest>();
//Thread 1
private class ListeningSocketThread implements Runnable {
ServerSocket listeningSocket;
public void run() {
try {
LogUtil.log("Initiating...");
init(); // creates socket
processIncomongMessages();
listeningSocket.close();
} catch (IOException e) {
e.printStackTrace();
}
}
private void processIncomongMessages() throws IOException {
while (true) {
try {
processMessage(listeningSocket.accept());
} catch (ClassNotFoundException e) {
e.printStackTrace();
}
}
}
private void processMessage(Socket s) throws IOException, ClassNotFoundException {
// read message
ObjectInputStream ois = new ObjectInputStream(s.getInputStream());
Object message = ois.readObject();
LogUtil.log("adding...: before size: " + newCSRequests.size());
synchronized (newCSRequests) {
newCSRequests.add((CSRequest) message);
}
LogUtil.log("adding...: after size: " + newCSRequests.size()); // YES, THE SIZE IS UPDATED TO > 0
//closing....
}
........
}
//Thread 2
private class CSRequestResponder implements Runnable {
public void run() {
LogUtil.log("Initiating..."); // REACHES..
while (true) {
// LogUtil.log("inside while..."); // IF NOT COMMENTED, FLOODS THE CONSOLE WITH THIS MSG...
if (newCSRequests.size() > 0) { // DOES NOT PASS
LogUtil.log("inside if size > 0..."); // NEVER REACHES....
try {
handleNewCSRequests();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
....
}
UPDATE
Solution was to add synchronized(myArrayList) before I check the size in the Thread 2.
To access a shared structure in a multi-threaded environment, you should use implicit or explicit locking to ensure safe publication and access among threads.
Using the code above, it should look like this:
while(true){
synchronized (myArrayList) {
if(myArrayList.size() > 0){
//do stuff
}
}
//sleep(...) // outside the lock!
}
Note: This pattern looks much like a producer-consumer and is better implemented using a queue. LinkedBlockingQueue is a good option for that and provides built-in concurrency control capabilities. It's a good structure for safe publishing of data among threads.
Using a concurrent data structure lets you get rid of the synchronized block:
BlockingQueue<Data> queue = new LinkedBlockingQueue<Data>(...); // take() is declared on BlockingQueue, not on Queue
...
while(true){
Data data = queue.take(); // this will wait until there's data in the queue
doStuff(data);
}
Every time you modify a shared variable inside a parallel region (a region with multiple threads running in parallel) you must ensure mutual exclusion. You can guarantee mutual exclusion in Java by using synchronized or locks; normally you use locks when you want finer-grained synchronization.
If the program only performs reads on a given shared variable, there is no need to synchronize or lock the accesses to this variable.
Since you are new to this subject, I recommend this tutorial.
If I got this right, there are at least 2 threads that work with the same shared data structure, the list you mentioned. One thread adds values to the list and the second thread "does stuff" if the size of the list is > 0.
There is a chance that the thread scheduler ran the second thread (the one that checks whether the size is > 0) before the first thread got a chance to run and add a value.
Running the classes from bin or recompiling them has nothing to do with it. If you were to run the application over again from the bin directory, you might see the issue again. How many times did you run the app?
It might not reproduce consistently, but at some point you might see the issue again.
You could access the data structure in a serial fashion, allowing only one thread at a time to access the list. Still, that does not guarantee that the first thread will run first and only then the second one will check whether the size is > 0.
Depending on what you need to accomplish, there might be better or other ways to achieve that, not necessarily using a list to coordinate the threads.
Check the return of
newCSRequests.add((CSRequest) message);
I am guessing it's possible that it didn't get added for some reason. If it was a HashSet or similar, it could have been because the hashcode for multiple objects returns the same value. What is the equals implementation of the message object?
You could also use
List list = Collections.synchronizedList(new ArrayList(...));
to ensure the arraylist is always synchronised correctly.
HTH
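As a small illustration of the Collections.synchronizedList suggestion, here is a sketch with made-up names (`SyncListSketch`, `fill`). Single calls such as add() and size() are synchronized by the wrapper; iterating is a compound operation, and the Javadoc for synchronizedList requires the caller to lock the list manually while doing so.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SyncListSketch {
    // Adds `perThread` items from each of `threads` threads and returns
    // how many elements ended up in the list.
    public static int fill(int threads, int perThread) {
        List<Integer> list = Collections.synchronizedList(new ArrayList<>());
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    list.add(i); // each single call is synchronized by the wrapper
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Iteration spans many calls, so it needs an explicit lock on the list.
        int count = 0;
        synchronized (list) {
            for (Integer ignored : list) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(fill(4, 1000));
    }
}
```

With a plain ArrayList the same loop could lose elements or throw, because ArrayList.add is not safe for concurrent use.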

java Multithreading (newCachedThreadPool ), then writing result to one file?

I have tried to do this with simple threads and succeeded, but I believe that using a thread pool I could do the same thing more efficiently?
simple threads:
public static class getLogFile implements Runnable {
private String file;
public void setFilename(String namefile){
file=namefile;
}
public int run1(String Filenamet) {
connectToServer(XXX, Filenamet, XXX, XXX, XXX, XXX);//creates a file and downloads it
return 0;
}
public void run() {
run1(file);
}
}
in main:
for(x=0 ; x < 36 ; x++){
String Filename1=Filename+x;
getLogFile n=new getLogFile();
n.setFilename(Filename1);
(new Thread(n)).start();
}
The program connects to the server, executes 36 commands (using a thread pool or simple threads?) at the same time and either downloads 36 result files and then merges them, or maybe it could just write to one file on the server and then download it?
How do I transform this code to use thread pools?
How do I write data to one file from 36 threads?
I can only offer you directions.
In order to use a thread pool, look at how ExecutorService works. Any example from Google will give you enough information. As an example look at:
http://www.deitel.com/articles/java_tutorials/20051126/JavaMultithreading_Tutorial_Part4.html
Concerning whether each of the 36 threads writes to its own file or all of them write into one file: I cannot say anything about several threads writing into the same file, but you may use CyclicBarrier to wait until all threads have finished writing. You can find an example of its use here:
http://download.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/CyclicBarrier.html
It's not clear what you want to do. My thoughts are that creating 36 separate connections to the server will be a sizable load that it could well do without.
Can the server assemble these 36 files itself and look after the threading itself ? That seems a more logical partition of duties. The server would have knowledge of how parallelisable this work is, and there's a substantial impact on the server of servicing multiple connections (including potentially blocking out other clients).
A simple way to do it using java task executors, as follows:
ExecutorService executor = Executors.newFixedThreadPool(100);
for(int i=0;i<100;i++)
{
executor.execute(new MyTask(i)); // MyTask is a hypothetical class of your own that implements Runnable
}
You can also use spring task executors, it will be easier. However, I will also suggest using a single connection, as mentioned above.
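One way to combine a pool with a single output is to let each task return its result and have one thread do the merging. The sketch below assumes that approach; `PoolMergeSketch` and `runAll` are invented names, and the string each task returns is a stand-in for whatever connectToServer(...) would download.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolMergeSketch {
    // Runs `tasks` jobs on a pool of `poolSize` threads and merges their
    // results in submission order on the calling thread.
    public static String runAll(int tasks, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        List<Future<String>> results = new ArrayList<>();
        for (int x = 0; x < tasks; x++) {
            final int id = x;
            // Stand-in for the real work of one command.
            Callable<String> job = () -> "result of command " + id + "\n";
            results.add(pool.submit(job));
        }

        StringBuilder merged = new StringBuilder();
        try {
            for (Future<String> f : results) {
                merged.append(f.get()); // blocks until that task has finished
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return merged.toString();
    }

    public static void main(String[] args) {
        // Only this thread would write the merged text to the single output file.
        System.out.print(runAll(36, 6));
    }
}
```

Because only the calling thread touches the merged output, no locking around the file is needed, and the pool size limits how many connections are open at once.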

Java IO inputstream blocks while reading standard output & standard error of an external C program

I posted the same question here a few days ago (Java reading standard output from an external program using inputstream), and I found some excellent advice on dealing with a block while reading (while (is.read() != -1)), but I still cannot resolve the problem.
After reading the answers to this similar question,
Java InputStream blocking read
(esp, the answer posted by Guss),
I am beginning to believe that looping over an input stream using the is.read() != -1 condition doesn't work if the program is interactive (that is, it takes multiple inputs from the user and presents additional output upon subsequent inputs, and the program exits only when an explicit exit command is given). I admit that I don't know much about multi-threading, but I think what I need is a mechanism to promptly pause the input stream threads (one each for stdout and stderr) when user input is needed, and resume them once the input is provided, to prevent a block. The following is my current code, which is experiencing a block on the line indicated:
EGMProcess egm = new EGMProcess(new String[]{directory + "/egm", "-o",
"CasinoA", "-v", "VendorA", "-s", "localhost:8080/gls/MessageRobot.action ",
"-E", "glss_env_cert.pem", "-S", "glss_sig_cert.pem", "-C", "glsc_sig_cert.pem",
"-d", "config", "-L", "config/log.txt", "-H", "GLSA-SampleHost"}, new String[]{"PATH=${PATH}"}, directory);
egm.execute();
BufferedReader stdout = new BufferedReader(new InputStreamReader(egm.getInputStream()));
BufferedReader stderr = new BufferedReader(new InputStreamReader(egm.getErrorStream()));
EGMStreamGobbler stdoutprocessor = new EGMStreamGobbler(stdout, egm);
EGMStreamGobbler stderrprocessor = new EGMStreamGobbler(stderr, egm);
BufferedWriter stdin = new BufferedWriter(new OutputStreamWriter(egm.getOutputStream()));
stderrprocessor.run(); //<-- the block occurs here!
stdoutprocessor.run();
//EGM/Agent test cases
//check bootstrap menu
if(!checkSimpleResult("******** EGM Bootstrap Menu **********", egm))
{
String stdoutdump = egm.getStdOut();
egm.cleanup();
throw new Exception("can't find '******** EGM Bootstrap Menu **********'" +
"in the stdout" + "\nStandard Output Dump:\n" + stdoutdump);
}
//select bootstrap
stdin.write("1".toCharArray());
stdin.flush();
if(!checkSimpleResult("Enter port to receive msgs pushed from server ('0' for no push support)", egm)){
String stdoutdump = egm.getStdOut();
egm.cleanup();
throw new Exception("can't find 'Enter port to receive msgs pushed from server ('0' for no push support)'" +
"in the stdout" + "\nStandard Output Dump:\n" + stdoutdump);
}
...
public class EGMStreamGobbler implements Runnable{
private BufferedReader instream;
private EGMProcess egm;
public EGMStreamGobbler(BufferedReader isr, EGMProcess aEGM)
{
instream = isr;
egm = aEGM;
}
public void run()
{
try{
int c;
while((c = instream.read()) != -1)
{
egm.processStdOutStream((char)c);
}
}
catch(IOException e)
{
e.printStackTrace();
}
}
}
I apologize for the length of the code, but my questions are,
1) Is there any way to control the process of taking in inputstreams (stdout, stderr) without using read()? Or am I just implementing this badly?
2) Is multi-threading the right strategy for developing the process of taking in inputstreams and writing an output?
PS: if anyone can provide a similar problem with solution, it will help me a lot!
instead of
stderrprocessor.run(); //<-- the block occurs here!
stdoutprocessor.run();
You need to start threads:
Thread errThread = new Thread(stderrprocessor);
errThread.setDaemon( true );
errThread.start();
Thread outThread = new Thread(stdoutprocessor);
outThread.setDaemon( true );
outThread.start();
run() is just a method specified in Runnable. Thread.start() calls run() on the Runnable in a new Thread.
If you just call #run() on a runnable, it will not be executed in parallel. To run it in parallel, you have to spawn a java.lang.Thread, that executes the #run() of your Runnable.
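The difference between calling run() and starting a thread can be seen by recording which thread executes the Runnable. The class and method names below (`RunVsStartSketch`, `executedOn`) are made up for this sketch.

```java
public class RunVsStartSketch {
    // Returns the name of the thread that actually executed the Runnable.
    public static String executedOn(boolean useStart) {
        final String[] seen = new String[1];
        Runnable task = () -> seen[0] = Thread.currentThread().getName();

        if (useStart) {
            Thread t = new Thread(task, "worker");
            t.start(); // run() is invoked on the new thread
            try {
                t.join(); // wait so the result is visible to the caller
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        } else {
            task.run(); // plain method call: executes on the calling thread
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("run():   " + executedOn(false));
        System.out.println("start(): " + executedOn(true));
    }
}
```

In the question's code the direct run() call on the stderr gobbler therefore loops on the caller's thread until the stream ends, and the line after it is not reached while the child process is still running.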
Whether a stream blocks depends on both sides of the stream. If either the sender does not send any data or the receiver does not receive data, you have a block situation. If the processor has to do something, while the stream is blocked, you need to spawn a(nother) thread within the processor to wait for new data and to interrupt the alternate process, when new data is streamed.
First, you need to read up on Thread and Runnable. You do not call Runnable.run() directly, you set up Threads to do that, and start the threads.
But more important, the presence of three independent threads implies the need for some careful design. Why 3 threads? The two you just started, and the main one.
I assume that the general idea of your app is to wait for some output to arrive, interpret it and as a result send a command to the application you are controlling?
So your main thread needs to wait around for one of the reader threads to say "Aha! that's interesting, better ask the user what he wants to do."
In other words you need some communication mechanism between your readers and your writer.
This might be implemented using Java's event mechanism. Yet more reading I'm afraid.
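One possible communication mechanism, swapped in here instead of an event API, is a BlockingQueue that the reader threads fill and the main thread polls. The sketch below is an assumption-laden illustration: `GobblerQueueSketch`, `gobble` and `waitFor` are invented names, and it reads line by line, so a prompt that is printed without a trailing newline would not be seen until the line is completed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class GobblerQueueSketch {
    // Starts a daemon thread that forwards every line read from `in` to `lines`.
    public static Thread gobble(InputStream in, BlockingQueue<String> lines) {
        Thread t = new Thread(() -> {
            try (BufferedReader r = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = r.readLine()) != null) {
                    lines.add(line);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Waits until a line containing `expected` arrives or the timeout elapses.
    public static boolean waitFor(BlockingQueue<String> lines, String expected, long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (true) {
                long remaining = deadline - System.currentTimeMillis();
                if (remaining <= 0) {
                    return false;
                }
                String line = lines.poll(remaining, TimeUnit.MILLISECONDS);
                if (line == null) {
                    return false; // timed out without new output
                }
                if (line.contains(expected)) {
                    return true;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // A byte array stands in for the stdout of the external process.
        InputStream fakeStdout = new ByteArrayInputStream(
                "starting\n******** EGM Bootstrap Menu **********\n".getBytes());
        BlockingQueue<String> lines = new LinkedBlockingQueue<>();
        gobble(fakeStdout, lines);
        System.out.println(waitFor(lines, "EGM Bootstrap Menu", 2000));
    }
}
```

The main thread then alternates between waitFor(...) and writing the next command to the process's stdin, while the reader threads keep draining stdout and stderr.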
Isn't this why the nio was created?
I don't know much about the Channels in nio, but this answer may be helpful. It shows how to read a file using nio. May be useful.
