Tee the InputStream from a launched process in Java/Kotlin

I'm launching a process using ProcessBuilder like so:
val pb = ProcessBuilder("/path/to/process")
pb.redirectErrorStream(true)
val proc = pb.start()
I'd like to do 2 things with the stdout of the process:
Continually monitor its most recent line of output
Log all lines to a file
As far as I can tell, in order to do both of these things I'll need to "split" the InputStream I get from proc.inputStream so that every line is mirrored to 2 other InputStreams: one that can be used to log to a file, and another to parse and monitor the status of the process.
One option would be to have a thread which reads from the InputStream and fires an event to "subscribers" with each line read, and I think this would work fine, but I was hoping to come up with a more generic "Tee" type of functionality that would expose InputStreams to be consumed by whatever wants them. Basically something like this:
val pb = ProcessBuilder("/path/to/process")
pb.redirectErrorStream(true)
val proc = pb.start()
val originalInputStream = proc.inputStream
val tee = Tee(originalInputStream)
// Every line read from originalInputStream would be
// mirrored to all branches (not necessarily every line
// from the beginning of the originalInputStream, but
// since the start of the lifetime of the created branch)
val branchOne: InputStream = tee.addBranch()
val branchTwo: InputStream = tee.addBranch()
I took a shot at a Tee class, but I'm not sure what to do in the addBranch method:
class Tee(inputStream: InputStream) {
    val reader = BufferedReader(InputStreamReader(inputStream))
    val branches = mutableListOf<OutputStream>()

    fun readLine() {
        val line = reader.readLine()
        branches.forEach {
            it.write(line.toByteArray())
        }
    }

    fun addBranch(): InputStream {
        // What to do here? Need to create an OutputStream
        // which readLine can write to, but return an InputStream
        // which will be updated with each future write to that
        // OutputStream
    }
}
EDIT: The implementation of Tee I ended up with was as follows:
/**
 * Reads from the given [InputStream] and mirrors the read
 * data to all of the created 'branches' off of it.
 * All branches will 'receive' all data from the original
 * [InputStream] starting at the point of the branch's creation.
 * NOTE: This class will not read from the given [InputStream]
 * automatically; its [read] must be invoked
 * to read the data from the original stream and write it to
 * the branches.
 */
class Tee(inputStream: InputStream) {
    val reader = BufferedReader(InputStreamReader(inputStream))
    var branches = CopyOnWriteArrayList<OutputStream>()

    fun read() {
        val c = reader.read()
        if (c == -1) return // end of the original stream, nothing to mirror
        branches.forEach {
            // Write every character, including line terminators, so that
            // readLine on the branched InputStreams still works
            it.write(c)
        }
    }

    fun addBranch(): InputStream {
        val outputStream = PipedOutputStream()
        branches.add(outputStream)
        return PipedInputStream(outputStream)
    }
}
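For illustration, here is a minimal usage sketch (the log file path and the monitoring logic are just placeholders): a dedicated thread pumps the Tee, and each branch gets its own consumer.
import java.io.File
import kotlin.concurrent.thread

fun main() {
    val proc = ProcessBuilder("/path/to/process")
        .redirectErrorStream(true)
        .start()

    val tee = Tee(proc.inputStream)
    val logBranch = tee.addBranch()
    val monitorBranch = tee.addBranch()

    // Dedicated thread pumping the original stream into all branches.
    thread { while (proc.isAlive) tee.read() }

    // Branch 1: append every line to a log file (path is a placeholder).
    thread {
        logBranch.bufferedReader().forEachLine { File("process.log").appendText(it + "\n") }
    }

    // Branch 2: monitor the most recent line of output.
    thread {
        monitorBranch.bufferedReader().forEachLine { println("latest: $it") }
    }
}
Note that PipedInputStream has a small internal buffer, so every branch must actually be consumed; otherwise the pumping thread will eventually block.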

Take a look at org.apache.commons.io.input.TeeInputStream from Apache Commons IO; then you don't need to bother writing your own.
val pb = ProcessBuilder("/path/to/process")
pb.redirectErrorStream(true)
val proc = pb.start()
val original = proc.inputStream
val out = PipedOutputStream()
val branch = PipedInputStream()
out.connect(branch)
val tee = TeeInputStream(original, out)
Then just read from tee instead of original, and any bytes read will also be written to out. Because of the piped streams, the data written to out is made available to be read via branch, so you can now have two threads reading from tee and branch independently: one thread writing to logs, and one thread monitoring lines.
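For example, a minimal sketch of the two consumers (the log file path and the monitoring logic are placeholders). Note that data only flows into branch while something is reading from tee, and the pipe has a small internal buffer, so both threads need to keep reading:
import java.io.File
import kotlin.concurrent.thread

// Thread 1 drives the tee and logs everything it reads.
thread {
    tee.bufferedReader().forEachLine { File("process.log").appendText(it + "\n") }
}
// Thread 2 monitors the copy arriving through the pipe.
thread {
    branch.bufferedReader().forEachLine { println("latest: $it") }
}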

Looks like a simple decorator will be enough for you:
class Tee(private vararg val branches: OutputStream) : OutputStream() {
    override fun write(b: Int) {
        for (branch in branches) {
            branch.write(b)
        }
    }

    override fun write(b: ByteArray) {
        for (branch in branches) {
            branch.write(b)
        }
    }

    override fun write(b: ByteArray, off: Int, len: Int) {
        for (branch in branches) {
            branch.write(b, off, len)
        }
    }

    override fun flush() {
        for (branch in branches) {
            branch.flush()
        }
    }

    override fun close() {
        for (branch in branches) {
            branch.close()
        }
    }
}
And then you can just copy your input stream to the Tee, which, underneath, can do anything: write to a file, parse the input and so on.
If I understand correctly, you need to parse the data line by line, so you can add one more OutputStream implementation which, in reality, parses the input data and does whatever you need.
Also, please take a look at this answer. Possibly it's what you need if you don't want to deal with multiple output streams.
Also, I think you can combine both techniques to gain even more power: write to several output streams and parse the data at the same time, for example.
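As a rough sketch of that combination (the LastLineParsingStream class and the process.log path below are made up for illustration), you could copy the process output into a Tee that writes to a log file and to a line-parsing stream at the same time:
import java.io.ByteArrayOutputStream
import java.io.FileOutputStream
import java.io.OutputStream

// OutputStream that assembles bytes into lines and hands each complete
// line to a callback (e.g. to track the latest status line).
class LastLineParsingStream(private val onLine: (String) -> Unit) : OutputStream() {
    private val buffer = ByteArrayOutputStream()
    override fun write(b: Int) {
        if (b == '\n'.code) {
            onLine(buffer.toString(Charsets.UTF_8.name()))
            buffer.reset()
        } else {
            buffer.write(b)
        }
    }
}

fun main() {
    val proc = ProcessBuilder("/path/to/process")
        .redirectErrorStream(true)
        .start()

    val tee = Tee(
        FileOutputStream("process.log"),
        LastLineParsingStream { line -> println("latest: $line") }
    )
    // copyTo pulls from the process stdout and pushes into both branches.
    proc.inputStream.copyTo(tee)
    tee.close()
}
Kotlin's copyTo pulls from the process stdout on the calling thread, so you may want to run it on its own thread if the caller shouldn't block.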

Related

How to implement fault tolerant file upload with Akka remote and stream

I'm an Akka beginner. (I am using Java)
I'm making a file transfer system using Akka.
Currently, I have completed sending a file from Actor1 (Local) -> Actor2 (Remote).
Now I'm thinking about how to handle problems that occur while transferring files.
That led me to the following question:
If I lose my network connection while transferring a file, the transfer fails (say at 90 percent complete),
and I recover my network connection a few minutes later.
Is it possible to transfer the rest of the file data? (the remaining 10%)
If that's possible, please give me some advice.
Here is my simple code.
Thanks :)
Actor1 (Local)
private Behavior<Event> onTick() {
....
String fileName = "test.zip";
Source<ByteString, CompletionStage<IOResult>> logs = FileIO.fromPath(Paths.get(fileName));
logs.runForeach(f -> originalSize += f.size(), mat).thenRun(() -> System.out.println("originalSize : " + originalSize));
SourceRef<ByteString> logsRef = logs.runWith(StreamRefs.sourceRef(), mat);
getContext().ask(
Receiver.FileTransfered.class,
selectedReceiver,
timeout,
responseRef -> new Receiver.TransferFile(logsRef, responseRef, fileName),
(response, failure) -> {
if (response != null) {
return new TransferCompleted(fileName, response.transferedSize);
} else {
return new JobFailed("Processing timed out", fileName);
}
}
);
}
Actor2 (Remote)
public static Behavior<Command> create() {
return Behaviors.setup(context -> {
...
Materializer mat = Materializer.createMaterializer(context);
return Behaviors.receive(Command.class)
.onMessage(TransferFile.class, command -> {
command.sourceRef.getSource().runWith(FileIO.toPath(Paths.get("test.zip")), mat);
command.replyTo.tell(new FileTransfered("filename", 1024));
return Behaviors.same();
}).build();
});
}
You need to think about the following for a proper implementation of file transfer with fault tolerance:
1. How to identify that a transfer has to be resumed for a given file.
2. How to find the point from which to resume the transfer.
The following implementation makes very simple assumptions about 1 and 2.
The file name is unique and thus can be used for such identification. Strictly speaking, this is not true: for example, you can transfer files with the same name from different folders, or from different nodes, etc. You will have to readjust this based on your use case.
It is assumed that the last/all writes on the receiver side wrote all bytes correctly, and that the total number of written bytes indicates the point from which to resume the transfer. If this cannot be guaranteed, you need to logically split the original file into chunks and transfer the hash of each chunk, together with its size and position, to the receiver, which has to validate the chunks on its side and find the correct pointer for resuming the transfer.
(That's a bit more than 2 :) ) This implementation ignores detection of the transfer problem and focuses on 1 and 2 instead.
The code:
object Sender {
sealed trait Command
case class Upload(file: String) extends Command
case class StartWithIndex(file: String, index: Long) extends Sender.Command
def behavior(receiver: ActorRef[Receiver.Command]): Behavior[Sender.Command] = Behaviors.setup[Sender.Command] { ctx =>
implicit val materializer: Materializer = SystemMaterializer(ctx.system).materializer
Behaviors.receiveMessage {
case Upload(file) =>
receiver.tell(Receiver.InitUpload(file, ctx.self.narrow[StartWithIndex]))
ctx.log.info(s"Initiating upload of $file")
Behaviors.same
case StartWithIndex(file, startWith) =>
val source = FileIO.fromPath(Paths.get(file), chunkSize = 8192, startPosition = startWith)
val ref = source.runWith(StreamRefs.sourceRef())
ctx.log.info(s"Starting upload of $file")
receiver.tell(Receiver.Upload(file, ref))
Behaviors.same
}
}
}
object Receiver {
sealed trait Command
case class InitUpload(file: String, replyTo: ActorRef[Sender.StartWithIndex]) extends Command
case class Upload(file: String, fileSource: SourceRef[ByteString]) extends Command
val behavior: Behavior[Receiver.Command] = Behaviors.setup[Receiver.Command] { ctx =>
implicit val materializer: Materializer = SystemMaterializer(ctx.system).materializer
Behaviors.receiveMessage {
case InitUpload(path, replyTo) =>
val file = fileAtDestination(path)
val index = if (file.exists()) file.length else 0
ctx.log.info(s"Got init command for $file at pointer $index")
replyTo.tell(Sender.StartWithIndex(path, index.toLong))
Behaviors.same
case Upload(path, fileSource) =>
val file = fileAtDestination(path)
val sink = if (file.exists()) {
FileIO.toPath(file.toPath, Set(StandardOpenOption.APPEND, StandardOpenOption.WRITE))
} else {
FileIO.toPath(file.toPath, Set(StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE))
}
ctx.log.info(s"Saving file into ${file.toPath}")
fileSource.runWith(sink)
Behaviors.same
}
}
}
Some auxiliary methods
val destination: File = Files.createTempDirectory("destination").toFile
def fileAtDestination(file: String) = {
val name = new File(file).getName
new File(destination, name)
}
def writeRandomToFile(file: File, size: Int): Unit = {
val out = new FileOutputStream(file, true)
(0 until size).foreach { _ =>
out.write(Random.nextPrintableChar())
}
out.close()
}
And finally some test code
// sender and receiver bootstrapping is omitted
//Create some dummy file to upload
val file: Path = Files.createTempFile("test", "test")
writeRandomToFile(file.toFile, 1000)
//Initiate a new upload
sender.tell(Sender.Upload(file.toAbsolutePath.toString))
// Sleep to allow file upload to finish
Thread.sleep(1000)
//Write more data to the file to emulate a failure
writeRandomToFile(file.toFile, 1000)
//Initiate a new upload that will "recover" from the previous upload
sender.tell(Sender.Upload(file.toAbsolutePath.toString))
Finally, the whole process can be defined as

How to read characters from a FIFO file with Kotlin

What I am trying to do:
I am trying to combine Linux udev and Kotlin. More exactly, when I plug a USB device into my PC, one of the udev rules launches a script that appends some text to a FIFO file (like add,003,026, where 003 is the bus number and 026 is the device number).
Now on the Kotlin side, I intend to read this information and show it in the IDE console. All good here.
My problem:
When I receive only one event from a single plug-in, everything is OK. But when I try to plug in multiple devices (by pressing the power button on a hub with 7 devices connected), I usually receive only 3 devices on the Kotlin side, even though the FIFO file has all the values.
Sample code
Here is my latest attempt at getting all the information:
fun main(args: Array<String>) {
println("Hello, World")
while(true) {
println("I had received this: " + readUsbState())
//println("Am primit inapoi: " + ins.read())
TimeUnit.SECONDS.sleep(1L)
}
}
@Throws(FileNotFoundException::class)
private fun readUsbState(): String {
if (!File("/emy/usb_events").exists()) {
throw FileNotFoundException("The file /emy/usb_events doesn't exists!")
}
val bytes = ByteArrayOutputStream()
var byteRead = 0
val bytesArray = ByteArray(1024)
try {
FileInputStream("/emy/usb_events").use { inputStream ->
byteRead = inputStream.read(bytesArray, 0, bytesArray.size)
if (byteRead >= 0) {
bytes.write(bytesArray, 0, byteRead)
}
}
} catch (ex: IOException) {
ex.printStackTrace()
}
return bytes.toString()
}
More instructions:
My FIFO file is "/emy/usb_events". This file was created with mkfifo /emy/usb_events.
For testing, so you don't have to bother with the udev rules, you can simply run echo -e "add,001,001\nadd,001,002\nadd,001,003\n..." >> /emy/usb_events
I have found the correct answer.
The problem was that I was closing the FIFO file after the first newline that was found. The code below works perfectly:
fun main(args: Array<String>) {
println("Hello, World")
while(true) {
println("I had received this: " + readUsbState4())
TimeUnit.SECONDS.sleep(1L)
}
}
private fun readUsbState4(): String {
return File("/certus/usb_events").readLines(Charset.defaultCharset()).toString()
}
In the list that I am receiving, I may have multiple pieces of information like:
Hello, World
I had received this: [add,046,003,4-Port_USB_2.0_Hub,Generic,]
I had received this: [add,048,003,Android_Phone,FA696BN00557,HTC, add,047,003,4-Port_USB_2.0_Hub,Generic,, add,049,003,DataTraveler_2.0,001BFC31A1C7C161D9C75AED,Kingston]
I had received this: [add,050,003,SAMSUNG_Android,06157df6cc9ac70e,SAMSUNG, add,053,003,SAMSUNG_Android,ce0416046d6a9e3f05,SAMSUNG, add,051,003,Acer_S57,0123456789ABCDEF,MediaTek, add,052,003,ACER_Z160,SKU4HI8L4L99N76H,MediaTek]
I had received this: [remove,048,003]
I had received this: [remove,051,003, remove,052,003, remove,049,003, remove,053,003, remove,050,003]
I had received this: [remove,047,003]
I had received this: [remove,046,003]

Spock mocking inputStream causes infinite loop

I have this code:
gridFSFile.inputStream?.bytes
When I try to test it this way:
given:
def inputStream = Mock(InputStream)
def gridFSDBFile = Mock(GridFSDBFile)
List<Byte> byteList = "test data".bytes
...
then:
1 * gridFSDBFile.getInputStream() >> inputStream
1 * inputStream.getBytes() >> byteList
0 * _
The problem is that inputStream.read(_) is called an infinite number of times. When I remove the 0 * _, the test hangs until the garbage collector dies.
Please advise how I can properly mock the InputStream without falling into infinite loops, i.e. how to test the line above with 2 (or thereabouts) interactions.
The following test works:
import spock.lang.Specification
class Spec extends Specification {
def 'it works'() {
given:
def is = GroovyMock(InputStream)
def file = Mock(GridFile)
byte[] bytes = 'test data'.bytes
when:
new FileHolder(file: file).read()
then:
1 * file.getInputStream() >> is
1 * is.getBytes() >> bytes
}
class FileHolder {
GridFile file;
def read() {
file.getInputStream().getBytes()
}
}
class GridFile {
InputStream getInputStream() {
null
}
}
}
Not 100% sure about it, but it seems that you need to use GroovyMock here since getBytes is a method added dynamically by Groovy. Have a look here.

Difference in collecting output of executing external command in Groovy

The following code gets stuck (which I think is blocking I/O) a lot of the time (it works sometimes).
def static executeCurlCommand(URL){
def url = "curl " + URL;
def proc = url.execute();
def output = proc.in.text;
return output;
}
But when I change the code to
def static executeCurlCommand(URL){
def url = "curl " + URL;
def proc = url.execute();
def outputStream = new StringBuffer();
proc.waitForProcessOutput(outputStream, System.err)
return outputStream.toString();
}
it works fine every time. I am not able to understand why the 1st way, i.e. taking the output via proc.in.text, hangs some of the time. It does not look like an environment-specific problem, as I tried it on Windows as well as Cygwin.
To test/run the above method I have tried -
public static void main(def args){
def url = 'http://mail.google.com';
println("Output : " + executeCurlCommand(url));
}
I have seen multiple questions on SO and all provide the 2nd approach. Although it works well, I wish I knew what's wrong with the 1st approach. Has anyone encountered this scenario before?
The first approach only reads the process's stdout; nothing consumes its stderr (curl, for example, writes its progress meter to stderr), so once that pipe's buffer fills up the process blocks waiting for room to write, and proc.in.text never reaches end of stream.
The second approach streams both stdout and stderr via separate threads as the process is running, so neither buffer fills up and the process doesn't block.
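For illustration, a rough Kotlin sketch of the same fix (the curl invocation is only an example): drain stderr on its own thread while reading stdout, which is essentially what waitForProcessOutput does for both streams.
import kotlin.concurrent.thread

fun executeCurlCommand(url: String): String {
    val proc = ProcessBuilder("curl", url).start()
    // Drain stderr (curl's progress meter) on its own thread so its
    // pipe buffer can never fill up and stall the process.
    val err = thread { proc.errorStream.bufferedReader().readText() }
    val output = proc.inputStream.bufferedReader().readText()
    err.join()
    proc.waitFor()
    return output
}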

Java data object for bidirectional I/O

I am developing an interface that takes as input an encrypted byte stream -- probably a very large one -- and generates output of more or less the same format.
The input format is this:
{N byte envelope}
- encryption key IDs &c.
{X byte encrypted body}
The output format is the same.
Here's the usual use case (heavily pseudocoded, of course):
Message incomingMessage = new Message (inputStream);
ProcessingResults results = process (incomingMessage);
MessageEnvelope messageEnvelope = new MessageEnvelope ();
// set message encryption options &c. ...
Message outgoingMessage = new Message ();
outgoingMessage.setEnvelope (messageEnvelope);
writeProcessingResults (results, outgoingMessage);
outgoingMessage.writeToOutput (outputStream);
To me, it seems to make sense to use the same object to encapsulate this behaviour, but I'm at a bit of a loss as to how I should go about this. It isn't practical to load all of the encrypted body in at a time; I need to be able to stream it (so, I'll be using some kind of input stream filter to decrypt it) but at the same time I need to be able to write out new instances of this object. What's a good approach to making this work? What should Message look like internally?
I wouldn't create one class to handle both input and output: one class, one responsibility. I would use two filter streams, one for input/decryption and one for output/encryption:
InputStream decrypted = new DecryptingStream(inputStream, decryptionParameters);
...
OutputStream encrypted = new EncryptingStream(outputStream, encryptionOptions);
They may have something like a lazy-init mechanism, reading the envelope before the first read() call / writing the envelope before the first write() call. You can also use classes like Message or MessageEnvelope in the filter implementations, but they may stay package-protected non-API classes.
The processing will then know nothing about de-/encryption, just working on a stream. You may also use both streams at the same time during processing, streaming the processing input and output.
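For illustration, a rough Kotlin sketch of such a lazy-init filter stream (the fixed envelope size and the identity 'decryption' are placeholders for the real format and cipher):
import java.io.FilterInputStream
import java.io.InputStream

// Parses the envelope lazily from the underlying stream before the first
// read, then runs every following byte through a (placeholder) decryption step.
class DecryptingStream(
    source: InputStream,
    private val envelopeSize: Int          // assumed to be known up front
) : FilterInputStream(source) {
    private var initialized = false

    private fun ensureInit() {
        if (!initialized) {
            val envelope = `in`.readNBytes(envelopeSize)  // key IDs etc. live here
            // ... initialise the cipher from the envelope contents ...
            initialized = true
        }
    }

    override fun read(): Int {
        ensureInit()
        val b = `in`.read()
        return if (b == -1) -1 else decrypt(b)
    }

    override fun read(b: ByteArray, off: Int, len: Int): Int {
        ensureInit()
        val n = `in`.read(b, off, len)
        for (i in off until off + n) b[i] = decrypt(b[i].toInt()).toByte()
        return n
    }

    private fun decrypt(b: Int): Int = b   // stand-in for the real cipher work
}
The EncryptingStream side would mirror this: write the envelope to the underlying stream before the first write() call, then encrypt each byte on the way out.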
Can you split the body at arbitrary locations?
If so, I would have two threads, an input thread and an output thread, with a concurrent queue of strings that the output thread monitors. Something like:
ConcurrentLinkedQueue<String> outputQueue = new ConcurrentLinkedQueue<String>();
...
private void readInput(BufferedReader reader) throws IOException {
    String str;
    while ((str = reader.readLine()) != null) {
        outputQueue.add(processStream(str));
    }
}

private String processStream(String input) {
    // do something
    return output;
}

private void writeOutput(Writer out) throws IOException, InterruptedException {
    while (true) {
        while (outputQueue.peek() == null) {
            Thread.sleep(100);
        }
        String msg = outputQueue.poll();
        out.write(msg);
    }
}
Note: This will definitely not work as-is. Just a suggestion of a design. Someone is welcome to edit this.
If you need to read and write at the same time, you either have to use threads (different threads reading and writing) or asynchronous I/O (the java.nio package). Using input and output streams from different threads is not a problem.
If you want to make a streaming API in Java, you should usually provide an InputStream for reading and an OutputStream for writing. That way these can be passed to other APIs so that you can chain things and keep the data flowing as streams all the way through.
Input example:
Message message = new Message(inputStream);
results = process(message.getInputStream());
Output example:
Message message = new Message(outputStream);
writeContent(message.getOutputStream());
The message needs to wrap the given streams with a classes that do the needed encryption and decryption.
Note that reading multiple messages at the same time or writing multiple messages at the same time would need support from the protocol too. You need to get the synchronization correct.
You should check the Wikipedia article on the different block cipher modes that support encryption of streams. The different encryption algorithms may support a subset of these.
Buffered streams will allow you to read, encrypt/decrypt and write in a loop.
Examples demonstrating ZipInputStream and ZipOutputStream could provide some guidance on how you may solve this. See example.
What you need is cipher streams (CipherInputStream and CipherOutputStream).
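For example, a minimal Kotlin sketch (the AES/CBC algorithm choice and the key/IV handling are assumptions; use whatever your envelope actually specifies):
import java.io.InputStream
import java.io.OutputStream
import javax.crypto.Cipher
import javax.crypto.CipherInputStream
import javax.crypto.CipherOutputStream
import javax.crypto.spec.IvParameterSpec
import javax.crypto.spec.SecretKeySpec

// Wrap the raw body streams so reads decrypt and writes encrypt on the fly.
fun decryptingStream(body: InputStream, key: ByteArray, iv: ByteArray): InputStream {
    val cipher = Cipher.getInstance("AES/CBC/PKCS5Padding")
    cipher.init(Cipher.DECRYPT_MODE, SecretKeySpec(key, "AES"), IvParameterSpec(iv))
    return CipherInputStream(body, cipher)
}

fun encryptingStream(body: OutputStream, key: ByteArray, iv: ByteArray): OutputStream {
    val cipher = Cipher.getInstance("AES/CBC/PKCS5Padding")
    cipher.init(Cipher.ENCRYPT_MODE, SecretKeySpec(key, "AES"), IvParameterSpec(iv))
    return CipherOutputStream(body, cipher)
}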
I agree with Arne: the data processor shouldn't know about encryption; it just needs to read the decrypted body of the message and write out the results, and stream filters should take care of encryption. However, since this is logically operating on the same piece of information (a Message), I think they should be packaged inside one class which handles the message format, although the encryption/decryption streams are indeed independent from this.
Here's my idea for the structure, flipping the architecture around somewhat, and moving the Message class outside the encryption streams:
class Message {
    InputStream input;
    Envelope envelope;

    public Message(InputStream input) {
        assert input != null;
        this.input = input;
    }

    public Message(Envelope envelope) {
        assert envelope != null;
        this.envelope = envelope;
    }

    public Envelope getEnvelope() {
        if (envelope == null && input != null) {
            // Read envelope from beginning of stream
            envelope = new Envelope(input);
        }
        return envelope;
    }

    public InputStream read() {
        assert input != null;
        // Initialise the decryption stream
        return new DecryptingStream(input, getEnvelope().getEncryptionParameters());
    }

    public OutputStream write(OutputStream output) {
        // Write envelope header to output stream
        getEnvelope().write(output);
        // Initialise the encryption
        return new EncryptingStream(output, getEnvelope().getEncryptionParameters());
    }
}
Now you can use it by creating a new message for the input, and one for the output:
OutputStream output; // This is the stream for sending the message
Message inputMessage = new Message(input);
Message outputMessage = new Message(inputMessage.getEnvelope());
process(inputMessage.read(), outputMessage.write(output));
Now the process method just needs to read chunks of data as required from the input, and write results to the output:
public void process(InputStream input, OutputStream output) throws IOException {
    byte[] buffer = new byte[1024];
    int read;
    while ((read = input.read(buffer)) > 0) {
        // Process buffer, writing to output as you go.
    }
}
This all now works in lockstep, and you don't need any extra threads. You can also abort early without having to process the whole message (if the output stream is closed for example).
