How to implement fault-tolerant file upload with Akka remote and stream

I'm an Akka beginner (using Java), and I'm building a file transfer system with Akka.
So far, I can send a file from Actor1 (local) to Actor2 (remote).
Now I'm thinking about how to handle failures during a transfer, and I have a question:
Suppose I lose my network connection while transferring a file, so the transfer fails at 90 percent complete, and the connection comes back a few minutes later.
Is it possible to transfer only the rest of the file data (the remaining 10%)?
If that's possible, please give me some advice.
Here is my simplified code.
Thanks :)
Actor1 (Local)
private Behavior<Event> onTick() {
....
String fileName = "test.zip";
Source<ByteString, CompletionStage<IOResult>> logs = FileIO.fromPath(Paths.get(fileName));
logs.runForeach(f -> originalSize += f.size(), mat).thenRun(() -> System.out.println("originalSize : " + originalSize));
SourceRef<ByteString> logsRef = logs.runWith(StreamRefs.sourceRef(), mat);
getContext().ask(
Receiver.FileTransfered.class,
selectedReceiver,
timeout,
responseRef -> new Receiver.TransferFile(logsRef, responseRef, fileName),
(response, failure) -> {
if (response != null) {
return new TransferCompleted(fileName, response.transferedSize);
} else {
return new JobFailed("Processing timed out", fileName);
}
}
);
}
Actor2 (Remote)
public static Behavior<Command> create() {
return Behaviors.setup(context -> {
...
Materializer mat = Materializer.createMaterializer(context);
return Behaviors.receive(Command.class)
.onMessage(TransferFile.class, command -> {
command.sourceRef.getSource().runWith(FileIO.toPath(Paths.get("test.zip")), mat);
command.replyTo.tell(new FileTransfered("filename", 1024));
return Behaviors.same();
}).build();
});
}

For a proper implementation of fault-tolerant file transfer, you need to think about the following:
1. How to identify that a transfer has to be resumed for a given file.
2. How to find the point from which to resume the transfer.
The following implementation makes very simple assumptions about 1 and 2:
1. The file name is unique and thus can be used for such identification. Strictly speaking, this is not true: for example, you can transfer files with the same name from different folders, or from different nodes. You will have to adjust this to your use case.
2. It is assumed that all writes on the receiver side wrote their bytes correctly, so the total number of written bytes indicates the point from which to resume the transfer. If this cannot be guaranteed, you need to logically split the original file into chunks and send the hash, size and position of each chunk to the receiver, which has to validate the chunks on its side and find the correct pointer for resuming the transfer.
(That's a bit more than 2 :) ) This implementation ignores the detection of transfer problems and focuses on 1 and 2 instead.
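The chunk validation mentioned in assumption 2 could look roughly like the following Java sketch. It is not part of the answer's implementation and does not use Akka; the class name ChunkResume, the choice of SHA-256 and the use of in-memory byte arrays instead of files are all assumptions made to keep the example short.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: finds a safe resume offset by comparing
// per-chunk SHA-256 hashes computed by the sender and the receiver.
final class ChunkResume {

    // Hash each fixed-size chunk of the data (the last chunk may be shorter).
    static List<byte[]> chunkHashes(byte[] data, int chunkSize) {
        List<byte[]> hashes = new ArrayList<>();
        for (int from = 0; from < data.length; from += chunkSize) {
            int to = Math.min(from + chunkSize, data.length);
            hashes.add(sha256(Arrays.copyOfRange(data, from, to)));
        }
        return hashes;
    }

    // Returns the offset just past the last received chunk whose hash
    // matches the sender's hash for the same position.
    static long resumeOffset(byte[] received, List<byte[]> senderHashes, int chunkSize) {
        List<byte[]> receivedHashes = chunkHashes(received, chunkSize);
        long offset = 0;
        for (int i = 0; i < receivedHashes.size() && i < senderHashes.size(); i++) {
            if (!Arrays.equals(receivedHashes.get(i), senderHashes.get(i))) {
                break; // first corrupt or incomplete chunk: resume from here
            }
            offset = Math.min((long) (i + 1) * chunkSize, received.length);
        }
        return offset;
    }

    private static byte[] sha256(byte[] bytes) {
        try {
            return MessageDigest.getInstance("SHA-256").digest(bytes);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

A partially written last chunk does not match the sender's hash for the full chunk, so the receiver would have to truncate its file to the returned offset before appending.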
The code for this simplified implementation:
object Sender {
sealed trait Command
case class Upload(file: String) extends Command
case class StartWithIndex(file: String, index: Long) extends Sender.Command
def behavior(receiver: ActorRef[Receiver.Command]): Behavior[Sender.Command] = Behaviors.setup[Sender.Command] { ctx =>
implicit val materializer: Materializer = SystemMaterializer(ctx.system).materializer
Behaviors.receiveMessage {
case Upload(file) =>
receiver.tell(Receiver.InitUpload(file, ctx.self.narrow[StartWithIndex]))
ctx.log.info(s"Initiating upload of $file")
Behaviors.same
case StartWithIndex(file, startWith) =>
val source = FileIO.fromPath(Paths.get(file), chunkSize = 8192, startWith)
val ref = source.runWith(StreamRefs.sourceRef())
ctx.log.info(s"Starting upload of $file")
receiver.tell(Receiver.Upload(file, ref))
Behaviors.same
}
}
}
object Receiver {
sealed trait Command
case class InitUpload(file: String, replyTo: ActorRef[Sender.StartWithIndex]) extends Command
case class Upload(file: String, fileSource: SourceRef[ByteString]) extends Command
val behavior: Behavior[Receiver.Command] = Behaviors.setup[Receiver.Command] { ctx =>
implicit val materializer: Materializer = SystemMaterializer(ctx.system).materializer
Behaviors.receiveMessage {
case InitUpload(path, replyTo) =>
val file = fileAtDestination(path)
val index = if (file.exists()) file.length else 0
ctx.log.info(s"Got init command for $file at pointer $index")
replyTo.tell(Sender.StartWithIndex(path, index.toLong))
Behaviors.same
case Upload(path, fileSource) =>
val file = fileAtDestination(path)
val sink = if (file.exists()) {
FileIO.toPath(file.toPath, Set(StandardOpenOption.APPEND, StandardOpenOption.WRITE))
} else {
FileIO.toPath(file.toPath, Set(StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE))
}
ctx.log.info(s"Saving file into ${file.toPath}")
fileSource.runWith(sink)
Behaviors.same
}
}
}
Some auxiliary methods
val destination: File = Files.createTempDirectory("destination").toFile
def fileAtDestination(file: String) = {
val name = new File(file).getName
new File(destination, name)
}
def writeRandomToFile(file: File, size: Int): Unit = {
val out = new FileOutputStream(file, true)
(0 until size).foreach { _ =>
out.write(Random.nextPrintableChar())
}
out.close()
}
And finally some test code
// sender and receiver bootstrapping is omitted
//Create some dummy file to upload
val file: Path = Files.createTempFile("test", "test")
writeRandomToFile(file.toFile, 1000)
//Initiate a new upload
sender.tell(Sender.Upload(file.toAbsolutePath.toString))
// Sleep to allow file upload to finish
Thread.sleep(1000)
//Write more data to the file to emulate a failure
writeRandomToFile(file.toFile, 1000)
//Initiate a new upload that will "recover" from the previous upload
sender.tell(Sender.Upload(file.toAbsolutePath.toString))
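Since the question uses Java, here is a minimal sketch of the same resume logic with plain java.nio and no Akka: the destination's current length plays the role of the index sent back in StartWithIndex, the source is read from that position, and the destination is opened in append mode. The class and method names (ResumableCopy, resumeCopy) are hypothetical, and both files are assumed to be local.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper illustrating "resume from the receiver's file length".
final class ResumableCopy {

    // Copies the part of source that dest does not have yet.
    // Returns the offset from which the copy was resumed.
    static long resumeCopy(Path source, Path dest) throws IOException {
        // Receiver side: bytes already written mark the resume point
        long offset = Files.exists(dest) ? Files.size(dest) : 0L;
        try (FileChannel in = FileChannel.open(source, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dest,
                     StandardOpenOption.CREATE,
                     StandardOpenOption.WRITE,
                     StandardOpenOption.APPEND)) {
            // Sender side: skip what the receiver already has
            in.position(offset);
            ByteBuffer buffer = ByteBuffer.allocate(8192);
            while (in.read(buffer) != -1) {
                buffer.flip();
                while (buffer.hasRemaining()) {
                    out.write(buffer); // a single write may be partial
                }
                buffer.clear();
            }
        }
        return offset;
    }
}
```

Like the Scala code above, this trusts that the bytes already at the destination are correct.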

Related

How to combine a WebFlux WebClient DataBuffer download with more actions

I am trying to download a file (or multiple files), based on the result of a previous web request. After downloading the file, I need to send the previous Mono result (dossier and obj) and the file to another system. So far I have been working with flatMaps and Monos. But when reading large files, I cannot use a Mono during the file download, as the buffer is too small.
Simplified, the code looks something like this:
var filePath = Paths.get("test.pdf");
this.dmsService.search()
.flatMap(result -> {
var dossier = result.getObjects().get(0).getProperties();
var objectId = dossier.getReferencedObjectId();
return Mono.zip(this.dmsService.getById(objectId), Mono.just(dossier));
})
.flatMap(tuple -> {
var obj = tuple.getT1();
var dossier = tuple.getT2();
var objectId = dossier.getReferencedObjectId();
var media = this.dmsService.getDocument(objectId);
var writeMono = DataBufferUtils.write(media, filePath);
return Mono.zip(Mono.just(obj), Mono.just(dossier), writeMono);
})
.flatMap(tuple -> {
var obj = tuple.getT1();
var dossier = tuple.getT2();
var objectId = dossier.getReferencedObjectId();
var zip = zipService.createZip(objectId, obj, dossier);
return zipService.uploadZip(Flux.just(zip));
})
.flatMap(newWorkItemId -> {
return updateMetadata(newWorkItemId);
})
.subscribe(() -> {
finishItem();
});
dmsService.search(), this.dmsService.getById(objectId) and zipService.uploadZip() all return a Mono of a specific type.
dmsService.getDocument(objectId) returns a Flux in order to support large files. With a DataBuffer Mono it worked for small files if I simply used Files.copy:
...
var contentMono = this.dmsService.getDocument(objectId);
return contentMono;
})
.flatMap(content -> {
Files.copy(content.asInputStream(), Path.of("test.pdf"));
...
}
I have tried different approaches but always ran into problems.
Based on https://www.amitph.com/spring-webclient-large-file-download/#Downloading_a_Large_File_with_WebClient
DataBufferUtils.write(dataBuffer, destination).share().block();
When I try this, nothing after .block() is ever executed. No download is made.
Without the .share() I get an exception saying that I may not use block():
java.lang.IllegalStateException: block()/blockFirst()/blockLast() are blocking, which is not supported in thread reactor-http-nio-5
Since DataBufferUtils.write returns a Mono, my next assumption was that, instead of calling block(), I could Mono.zip() it together with my other values, but this never returns either.
var media = this.dmsService.getDocument(objectId);
var writeMono = DataBufferUtils.write(media, filePath);
return Mono.zip(Mono.just(obj), Mono.just(dossier), writeMono);
Any input on how to achieve this is greatly appreciated.

I finally figured out that if I use the DataBufferUtils.write overload that takes a WritableByteChannel, which returns a Flux<DataBuffer> instead of a Mono<Void>, I can map over the result to release each DataBuffer, which seems to do the trick. I found the inspiration for this solution here: DataBuffer doesn't write to file
var media = this.dmsService.getDocument(objectId);
var file = Files.createTempFile(objectId, ".tmp");
WritableByteChannel filechannel = Files.newByteChannel(file, StandardOpenOption.WRITE);
var writeMono = DataBufferUtils.write(media, filechannel)
.map(DataBufferUtils::release)
.then(Mono.just(file));
return Mono.zip(Mono.just(obj), Mono.just(dossier), writeMono);

Azure Java SDK v12 is not downloading a file asynchronously

I am writing a quick proof of concept for downloading images from Azure Blob Storage using the Azure Storage SDK v12 for Java. The following code works properly when I convert it to synchronous. However, despite the subscribe() at the bottom of the code, I only see the subscription message; the success and error handlers never fire. I would appreciate any suggestions or ideas.
Thank you for your time and help.
private fun azureReactorDownload() {
var startTime = 0L
var accountName = "abcd"
var key = "09sd0908sd08f0s&&6^%"
var endpoint = "https://${accountName}.blob.core.windows.net/$accountName"
var containerName = "mycontainer"
var blobName = "animage.jpg"
// Get the Blob Service client, so we can use it to access blobs, containers, etc.
BlobServiceClientBuilder()
// Container URL
.endpoint(endpoint)
.credential(
SharedKeyCredential(
accountName,
key
)
)
.buildAsyncClient()
// Get the container client so we can work with our container and its blobs.
.getContainerAsyncClient(containerName)
// Get the block blob client so we can access individual blobs and include the path
// within the container as part of the filename.
.getBlockBlobAsyncClient(blobName)
// Initiate the download of the desired blob.
.download()
.map { response ->
// Drill down to the ByteBuffer.
response.value()
}
.doOnSubscribe {
println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Subscription arrived.")
startTime = System.currentTimeMillis()
}
.doOnSuccess { data ->
data.map { byteBuffer ->
println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> READY TO WRITE TO THE FILE")
byteBuffer.writeToFile("/tmp/azrxblobdownload.jpg")
val elapsedTime = System.currentTimeMillis() - startTime
println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Finished downloading blob in $elapsedTime ms.")
}
}
.doOnError {
println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Failed to download blob: ${it.localizedMessage}")
}
.subscribe()
}
fun ByteBuffer.writeToFile(path: String) {
val fc = FileOutputStream(path).channel
fc.write(this)
fc.close()
}

I see someone asked the same question 4 months ago and got no answer:
Azure Blob Storage Java SDK: Why isn't asynchronous working?
My conjecture is that this part of the SDK just isn't working right now. I wouldn't recommend using Azure's version of Java.
You should be able to accomplish it another way, perhaps using one of these answers:
Downloading Multiple Files Parallelly or Asynchronously in Java

I've worked with Microsoft and have a documented solution at the following link: https://github.com/Azure/azure-sdk-for-java/issues/5071. The person who worked with me provided very good background information, so it is more than just some working code.
I have opened a similar query with Microsoft for the downloadToFile() method in the Azure Java SDK v12, which is throwing an exception when saving to a file.
Here is the working code from that posting:
private fun azureReactorDownloadMS() {
var startTime = 0L
val chunkCounter = AtomicInteger(0)
// Get the Blob Service client, so we can use it to access blobs, containers, etc.
val aa = BlobServiceClientBuilder()
// Container URL
.endpoint(kEndpoint)
.credential(
SharedKeyCredential(
kAccountName,
kAccountKey
)
)
.buildAsyncClient()
// Get the container client so we can work with our container and its blobs.
.getContainerAsyncClient(kContainerName)
// Get the block blob client so we can access individual blobs and include the path
// within the container as part of the filename.
.getBlockBlobAsyncClient(kBlobName)
.download()
// Response<Flux<ByteBuffer>> to Flux<ByteBuffer>
.flatMapMany { response ->
response.value()
}
.doOnSubscribe {
println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Subscription arrived.")
startTime = System.currentTimeMillis()
}
.doOnNext { byteBuffer ->
println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> CHUNK ${chunkCounter.incrementAndGet()} FROM BLOB ARRIVED...")
}
.doOnComplete {
val elapsedTime = System.currentTimeMillis() - startTime
println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Finished downloading ${chunkCounter.get()} chunks of data for the blob in $elapsedTime ms.")
}
.doOnError {
println(">>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Failed to download blob: ${it.localizedMessage}")
}
.blockLast()
}

s3 multipart upload always fails on the second part with timeout

I'm trying to get a simple proof-of-concept multipart upload working in Kotlin using the Amazon S3 client, based on the documentation. The first part uploads successfully and I get a response with an ETag. The second part doesn't upload anything and times out. It always fails after the first part. Is there some connection cleanup that I need to do manually?
Credentials and rights are all fine. The magic numbers below are just to get to the minimum part size of 5MB.
What am I doing wrong here?
fun main() {
val amazonS3 =
AmazonS3ClientBuilder.standard().withRegion(Regions.EU_WEST_1).withCredentials(ProfileCredentialsProvider())
.build()
val bucket = "io.inbot.sandbox"
val key = "test.txt"
val multipartUpload =
amazonS3.initiateMultipartUpload(InitiateMultipartUploadRequest(bucket, key))
var pn=1
var off=0L
val etags = mutableListOf<PartETag>()
for( i in 0.rangeTo(5)) {
val buf = ByteArrayOutputStream()
val writer = buf.writer().buffered()
for(l in 0.rangeTo(100000)) {
writer.write("part $i - Hello world for the $l'th time this part.\n")
}
writer.flush()
writer.close()
val bytes = buf.toByteArray()
val md = MessageDigest.getInstance("MD5")
md.update(bytes)
val md5 = Base64.encodeBytes(md.digest())
println("going to write ${bytes.size}")
bytes.inputStream()
var partRequest = UploadPartRequest().withBucketName(bucket).withKey(key)
.withUploadId(multipartUpload.uploadId)
.withFileOffset(off)
.withPartSize(bytes.size.toLong())
.withPartNumber(pn++)
.withMD5Digest(md5)
.withInputStream(bytes.inputStream())
.withGeneralProgressListener<UploadPartRequest> { it ->
println(it.bytesTransferred)
}
if(i == 5) {
partRequest = partRequest.withLastPart(true)
}
off+=bytes.size
val partResponse = amazonS3.uploadPart(partRequest)
etags.add(partResponse.partETag)
println("part ${partResponse.partNumber} ${partResponse.eTag} ${bytes.size}")
}
val completeMultipartUpload =
amazonS3.completeMultipartUpload(CompleteMultipartUploadRequest(bucket, key, multipartUpload.uploadId, etags))
}
This always fails on the second part with
Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: F419872A24BB5526; S3 Extended Request ID: 48XWljQNuOH6LJG9Z85NJOGVy4iv/ru44Ai8hxEP+P+nqHECXZwWNwBoMyjiQfxKpr6icGFjxYc=), S3 Extended Request ID: 48XWljQNuOH6LJG9Z85NJOGVy4iv/ru44Ai8hxEP+P+nqHECXZwWNwBoMyjiQfxKpr6icGFjxYc=
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1630)
Just to preempt some of the answers I'm not looking for: my intention is NOT to upload files but to eventually be able to stream arbitrary-length streams to S3 by simply uploading parts until done and then combining them. So I can't really use the TransferManager, because that requires me to know the size in advance, which I won't. Also, buffering this as a file is not something I want to do, since this will run in a dockerized server application. So I really want to upload an arbitrary number of parts. I'm happy to do it sequentially, though I wouldn't mind parallelism.
I've also used "com.github.alexmojaki:s3-stream-upload:1.0.1", but that seems to keep a lot of state in memory (I've run out of memory a couple of times), so I'd like to replace it with something simpler.
Update: thanks to Ilya in the comments below. Removing the withFileOffset fixes things.

Removing withFileOffset fixes things. Thanks @Ilya for pointing this out.
Here's a simple OutputStream that I implemented that actually works.
package io.inbot.aws
import com.amazonaws.auth.profile.ProfileCredentialsProvider
import com.amazonaws.regions.Regions
import com.amazonaws.services.s3.AmazonS3
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest
import com.amazonaws.services.s3.model.InitiateMultipartUploadResult
import com.amazonaws.services.s3.model.PartETag
import com.amazonaws.services.s3.model.UploadPartRequest
import mu.KotlinLogging
import java.io.ByteArrayOutputStream
import java.io.OutputStream
import java.security.MessageDigest
import java.util.Base64
private val logger = KotlinLogging.logger { }
class S3Writer(
private val amazonS3: AmazonS3,
private val bucket: String,
private val key: String,
private val threshold: Int = 5*1024*1024
) : OutputStream(), AutoCloseable {
private val etags: MutableList<PartETag> = mutableListOf()
private val multipartUpload: InitiateMultipartUploadResult = this.amazonS3.initiateMultipartUpload(InitiateMultipartUploadRequest(bucket, key))
private val currentPart = ByteArrayOutputStream(threshold)
private var partNumber = 1
override fun write(b: Int) {
currentPart.write(b)
if(currentPart.size() > threshold) {
sendPart()
}
}
private fun sendPart(last: Boolean = false) {
logger.info { "sending part $partNumber" }
currentPart.flush()
val bytes = currentPart.toByteArray()
val md = MessageDigest.getInstance("MD5")
md.update(bytes)
// The MD5 digest has to be passed as a Base64 string
val md5 = Base64.getEncoder().encodeToString(md.digest())
var partRequest = UploadPartRequest().withBucketName(bucket).withKey(key)
.withUploadId(multipartUpload.uploadId)
.withPartSize(currentPart.size().toLong())
.withPartNumber(partNumber++)
.withMD5Digest(md5)
.withInputStream(bytes.inputStream())
if(last) {
logger.info { "final part" }
partRequest = partRequest.withLastPart(true)
}
val partResponse = amazonS3.uploadPart(partRequest)
etags.add(partResponse.partETag)
currentPart.reset()
}
override fun close() {
if(currentPart.size() > 0) {
sendPart(true)
}
logger.info { "completing" }
amazonS3.completeMultipartUpload(CompleteMultipartUploadRequest(bucket, key, multipartUpload.uploadId, etags))
}
}
fun main() {
val amazonS3 =
AmazonS3ClientBuilder.standard().withRegion(Regions.EU_WEST_1).withCredentials(ProfileCredentialsProvider())
.build()
val bucket = "io.inbot.sandbox"
val key = "test.txt"
try {
S3Writer(amazonS3, bucket, key).use {
val w = it.bufferedWriter()
for (i in 0.rangeTo(1000000)) {
w.write("Line $i: hello again ...\n")
}
// Flush the buffered writer so the last lines reach the S3Writer before it is closed
w.flush()
}
} catch (e: Throwable) {
logger.error(e.message,e)
}
}
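The buffering idea behind S3Writer can be tried without AWS. The following Java sketch is an assumption-laden stand-in, not the AWS SDK API: PartBufferingOutputStream and its partSink callback are hypothetical names, and partSink takes the place of the uploadPart call. It collects bytes until a part is full, hands each part to the callback, and emits the remaining bytes as the last part on close. The contentMd5 helper shows the Base64 string form of an MD5 digest.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.function.Consumer;

// Hypothetical stand-in for S3Writer: splits a stream into fixed-size parts.
final class PartBufferingOutputStream extends OutputStream {
    private final int partSize;
    private final Consumer<byte[]> partSink; // stands in for the upload call
    private final ByteArrayOutputStream current;
    private boolean closed;

    PartBufferingOutputStream(int partSize, Consumer<byte[]> partSink) {
        this.partSize = partSize;
        this.partSink = partSink;
        this.current = new ByteArrayOutputStream(partSize);
    }

    @Override
    public void write(int b) {
        current.write(b);
        if (current.size() >= partSize) {
            emitPart(); // a full part is ready
        }
    }

    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        if (current.size() > 0) {
            emitPart(); // the last part may be smaller than partSize
        }
    }

    private void emitPart() {
        partSink.accept(current.toByteArray());
        current.reset();
    }

    // Base64-encoded MD5 digest of a part
    static String contentMd5(byte[] part) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(part);
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

With a real S3 upload, partSize would have to respect the 5 MB minimum part size mentioned in the question for every part except the last.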

How to read characters from fifo file with kotlin

What I am trying to do:
I am trying to combine Linux udev with Kotlin. More precisely, when I plug a USB device into my PC, one of the udev rules launches a script that appends some text to a FIFO file (for example: add,003,026, where 003 is the bus number and 026 is the device number).
On the Kotlin side, I want to read this information and print it to the IDE console. All good so far.
My problem:
When I receive only one event, because only one device was plugged in, everything is OK. But when I plug in multiple devices at once (by pressing the power button on a hub with 7 devices connected), I usually receive only 3 devices on the Kotlin side, even though the FIFO file has all the values.
Sample code
Here is my last try of gaining all the information
fun main(args: Array<String>) {
println("Hello, World")
while(true) {
println("I had received this: " + readUsbState())
//println("I got back: " + ins.read())
TimeUnit.SECONDS.sleep(1L)
}
}
@Throws(FileNotFoundException::class)
private fun readUsbState(): String {
if (!File("/emy/usb_events").exists()) {
throw FileNotFoundException("The file /emy/usb_events doesn't exist!")
}
val bytes = ByteArrayOutputStream()
var byteRead = 0
val bytesArray = ByteArray(1024)
try {
FileInputStream("/emy/usb_events").use { inputStream ->
byteRead = inputStream.read(bytesArray, 0, bytesArray.size)
if (byteRead >= 0) {
bytes.write(bytesArray, 0, byteRead)
}
}
} catch (ex: IOException) {
ex.printStackTrace()
}
return bytes.toString()
}
More instructions:
My FIFO file is "/emy/usb_events". It was created with mkfifo /emy/usb_events
For testing without the udev rules, you can simply run echo -e "add,001,001\nadd,001,002\nadd,001,003\n..." >> /emy/usb_events

I have found the answer.
The problem was that I was closing the FIFO file after the first newline that was found. The code below works perfectly:
fun main(args: Array<String>) {
println("Hello, World")
while(true) {
println("I had received this: " + readUsbState4())
TimeUnit.SECONDS.sleep(1L)
}
}
private fun readUsbState4(): String {
return File("/certus/usb_events").readLines(Charset.defaultCharset()).toString()
}
In the list that I am receiving, I may have multiple pieces of information like:
Hello, World
I had received this: [add,046,003,4-Port_USB_2.0_Hub,Generic,]
I had received this: [add,048,003,Android_Phone,FA696BN00557,HTC, add,047,003,4-Port_USB_2.0_Hub,Generic,, add,049,003,DataTraveler_2.0,001BFC31A1C7C161D9C75AED,Kingston]
I had received this: [add,050,003,SAMSUNG_Android,06157df6cc9ac70e,SAMSUNG, add,053,003,SAMSUNG_Android,ce0416046d6a9e3f05,SAMSUNG, add,051,003,Acer_S57,0123456789ABCDEF,MediaTek, add,052,003,ACER_Z160,SKU4HI8L4L99N76H,MediaTek]
I had received this: [remove,048,003]
I had received this: [remove,051,003, remove,052,003, remove,049,003, remove,053,003, remove,050,003]
I had received this: [remove,047,003]
I had received this: [remove,046,003]
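For comparison, here is a Java sketch of the same idea: read line by line until end of stream instead of issuing a single read, then split each line into its action and fields. The names UsbEventReader and UsbEvent are hypothetical, the field layout is only assumed from the sample output above, and UTF-8 is an assumed encoding.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reader for lines such as "add,003,026" or "remove,048,003".
final class UsbEventReader {

    // One parsed line: the action ("add" or "remove") and the remaining fields.
    static final class UsbEvent {
        final String action;
        final List<String> fields;

        UsbEvent(String action, List<String> fields) {
            this.action = action;
            this.fields = fields;
        }
    }

    static UsbEvent parse(String line) {
        // Note: String.split drops trailing empty fields such as "Generic,"
        String[] parts = line.trim().split(",");
        return new UsbEvent(parts[0], Arrays.asList(parts).subList(1, parts.length));
    }

    // Reads every line until end of stream, skipping blank lines.
    static List<UsbEvent> readAll(Path path) throws IOException {
        List<UsbEvent> events = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    events.add(parse(line));
                }
            }
        }
        return events;
    }
}
```

On a FIFO, end of stream is only reached once the writing side has closed it, so a call such as readAll(Paths.get("/emy/usb_events")) would be expected to return one batch of events per writer session, much like the output shown above.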

Tee the InputStream from a launched process in Java/Kotlin

I'm launching a process using ProcessBuilder like so:
val pb = ProcessBuilder("/path/to/process")
pb.redirectErrorStream(true)
val proc = pb.start()
I'd like to do 2 things with the stdout of the process:
Continually monitor its most recent line of output
Log all lines to a file
As far as I can tell, in order to do both of these things I'll need to "split" the InputStream I get from proc.inputStream so that every line is mirrored to 2 other InputStreams: one that can be used to log to a file, and another to parse and monitor the status of the process.
One option would be to have a thread which reads from the InputStream and fires an event with each line read to "subscribers". I think this should work fine, but I was hoping to come up with a more generic "Tee" type of functionality that would expose InputStreams to be consumed by whatever wanted to. Basically something like this:
val pb = ProcessBuilder("/path/to/process")
pb.redirectErrorStream(true)
val proc = pb.start()
val originalInputStream = proc.inputStream
val tee = Tee(originalInputStream)
// Every line read from originalInputStream would be
// mirrored to all branches (not necessarily every line
// from the beginning of the originalInputStream, but
// since the start of the lifetime of the created branch)
val branchOne: InputStream = tee.addBranch()
val branchTwo: InputStream = tee.addBranch()
I took a shot at a Tee class, but I'm not sure what to do in the addBranch method:
class Tee(inputStream: InputStream) {
val reader = BufferedReader(InputStreamReader(inputStream))
val branches = mutableListOf<OutputStream>()
fun readLine() {
val line = reader.readLine()
branches.forEach {
it.write(line.toByteArray())
}
}
fun addBranch(): InputStream {
// What to do here? Need to create an OutputStream
// which readLine can write to, but return an InputStream
// which will be updated with each future write to that
// OutputStream
}
}
EDIT: The implementation of Tee I ended up with was as follows:
/**
* Reads from the given [InputStream] and mirrors the read
* data to all of the created 'branches' off of it.
* All branches will 'receive' all data from the original
* [InputStream] starting at the point of
* the branch's creation.
* NOTE: This class will not read from the given [InputStream]
* automatically, its [read] must be invoked
* to read the data from the original stream and write it to
* the branches
*/
class Tee(inputStream: InputStream) {
val reader = BufferedReader(InputStreamReader(inputStream))
var branches = CopyOnWriteArrayList<OutputStream>()
fun read() {
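val c = reader.read()
// Stop at end of stream; otherwise -1 would be written to the branches as a byte
if (c == -1) return
branches.forEach {
// Forward every character as-is, line terminators included,
// so that readLine on the branched InputStreams works
it.write(c)
}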
}
fun addBranch(): InputStream {
val outputStream = PipedOutputStream()
branches.add(outputStream)
return PipedInputStream(outputStream)
}
}

Take a look at org.apache.commons.io.input.TeeInputStream from Apache Commons IO; then you don't need to bother writing your own.
val pb = ProcessBuilder("/path/to/process")
pb.redirectErrorStream(true)
val proc = pb.start()
val original = proc.inputStream
val out = PipedOutputStream()
val pipedIn = PipedInputStream()
out.connect(pipedIn)
val tee = TeeInputStream(original, out)
Then just read from tee instead of original, and any bytes read will also be written to out. Because out is connected to pipedIn, the data written to out can then be read via pipedIn, so you can have two threads reading from pipedIn and tee independently: one thread writing to the log, and one thread monitoring lines.

It looks like a simple decorator will be enough for you:
class Tee(private vararg val branches: OutputStream) : OutputStream() {
override fun write(b: Int) {
for (branch in branches) {
branch.write(b)
}
}
override fun write(b: ByteArray?) {
for (branch in branches) {
branch.write(b)
}
}
override fun write(b: ByteArray?, off: Int, len: Int) {
for (branch in branches) {
branch.write(b,off, len)
}
}
override fun flush() {
for (branch in branches) {
branch.flush()
}
}
override fun close() {
for (branch in branches) {
branch.close()
}
}
}
Then you can just copy your input stream to Tee, which underneath can do anything: write to a file, parse the input and so on.
If I understand correctly, you need to parse the data line by line, so you can add one more OutputStream implementation that parses the input data and does what you need.
Also, please take a look at this answer. It may be what you need if you don't want to deal with multiple output streams.
I also think you can combine both techniques to gain even more power, for example writing to several output streams and parsing the data at the same time.
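A line-parsing output branch of the kind described above could be sketched in Java as follows. LineCallbackOutputStream and TeePump are hypothetical names, UTF-8 is an assumed encoding, and the pump method simply copies the process output to a log sink and to the line callback at the same time.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.Consumer;

// Hypothetical branch: collects bytes and calls onLine for every complete line.
final class LineCallbackOutputStream extends OutputStream {
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
    private final Consumer<String> onLine;

    LineCallbackOutputStream(Consumer<String> onLine) {
        this.onLine = onLine;
    }

    @Override
    public void write(int b) {
        if (b == '\n') {
            emit();
        } else if (b != '\r') {
            pending.write(b);
        }
    }

    @Override
    public void close() {
        if (pending.size() > 0) {
            emit(); // last line without a trailing newline
        }
    }

    private void emit() {
        onLine.accept(new String(pending.toByteArray(), StandardCharsets.UTF_8));
        pending.reset();
    }
}

final class TeePump {
    // Copies everything from in to the log sink and to the line callback.
    static void pump(InputStream in, OutputStream log, Consumer<String> onLine) throws IOException {
        try (LineCallbackOutputStream lines = new LineCallbackOutputStream(onLine)) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                log.write(buffer, 0, n);   // branch 1: log all output
                lines.write(buffer, 0, n); // branch 2: monitor line by line
            }
            log.flush();
        }
    }
}
```

For the process in the question, in would be proc.inputStream, log a file output stream, and onLine a callback that remembers the most recent line; pump would run on its own thread because it blocks until the process output ends.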
