S3 multipart upload always fails on the second part with timeout - Java

I'm trying to get a simple proof-of-concept multipart upload working in Kotlin using the Amazon S3 client, based on the documentation. The first part uploads successfully and I get a response with an etag. The second part doesn't upload a single byte and times out. It always fails after the first part. Is there some connection cleanup that I need to do manually somehow?
Credentials and rights are all fine. The magic numbers below are just to get to the minimum part size of 5MB.
What am I doing wrong here?
fun main() {
    val amazonS3 =
        AmazonS3ClientBuilder.standard().withRegion(Regions.EU_WEST_1).withCredentials(ProfileCredentialsProvider())
            .build()
    val bucket = "io.inbot.sandbox"
    val key = "test.txt"
    val multipartUpload =
        amazonS3.initiateMultipartUpload(InitiateMultipartUploadRequest(bucket, key))
    var pn = 1
    var off = 0L
    val etags = mutableListOf<PartETag>()
    for (i in 0.rangeTo(5)) {
        val buf = ByteArrayOutputStream()
        val writer = buf.writer().buffered()
        for (l in 0.rangeTo(100000)) {
            writer.write("part $i - Hello world for the $l'th time this part.\n")
        }
        writer.flush()
        writer.close()
        val bytes = buf.toByteArray()
        val md = MessageDigest.getInstance("MD5")
        md.update(bytes)
        val md5 = Base64.encodeBytes(md.digest())
        println("going to write ${bytes.size}")
        bytes.inputStream()
        var partRequest = UploadPartRequest().withBucketName(bucket).withKey(key)
            .withUploadId(multipartUpload.uploadId)
            .withFileOffset(off)
            .withPartSize(bytes.size.toLong())
            .withPartNumber(pn++)
            .withMD5Digest(md5)
            .withInputStream(bytes.inputStream())
            .withGeneralProgressListener<UploadPartRequest> { it ->
                println(it.bytesTransferred)
            }
        if (i == 5) {
            partRequest = partRequest.withLastPart(true)
        }
        off += bytes.size
        val partResponse = amazonS3.uploadPart(partRequest)
        etags.add(partResponse.partETag)
        println("part ${partResponse.partNumber} ${partResponse.eTag} ${bytes.size}")
    }
    val completeMultipartUpload =
        amazonS3.completeMultipartUpload(CompleteMultipartUploadRequest(bucket, key, multipartUpload.uploadId, etags))
}
This always fails on the second part with
Exception in thread "main" com.amazonaws.services.s3.model.AmazonS3Exception: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: F419872A24BB5526; S3 Extended Request ID: 48XWljQNuOH6LJG9Z85NJOGVy4iv/ru44Ai8hxEP+P+nqHECXZwWNwBoMyjiQfxKpr6icGFjxYc=), S3 Extended Request ID: 48XWljQNuOH6LJG9Z85NJOGVy4iv/ru44Ai8hxEP+P+nqHECXZwWNwBoMyjiQfxKpr6icGFjxYc=
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1630)
Just to preempt some of the answers I'm not looking for: my intention is NOT to upload files but to eventually be able to stream arbitrary-length streams to S3 by simply uploading parts until done and then combining them. So I can't really use the TransferManager, because that requires me to know the size in advance, which I won't. Also, buffering this as a file is not something I want to do, since this will run in a dockerized server application. So I really want to upload an arbitrary number of parts. I'm happy to do it sequentially, though I wouldn't mind parallelism.
I've also used "com.github.alexmojaki:s3-stream-upload:1.0.1", but that seems to keep a lot of state in memory (I've run out of memory a couple of times), so I'd like to replace it with something simpler.
Update: thanks Ilya in the comments below. Removing withFileOffset fixes things.

Removing withFileOffset fixes things. Thanks @Ilya for pointing this out.
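For completeness, the part request from the question then becomes the following; it is identical apart from dropping the withFileOffset call (as far as I can tell, fileOffset only applies when uploading a region of a File, not when each part gets its own InputStream):
// Same request as in the question, minus withFileOffset.
var partRequest = UploadPartRequest().withBucketName(bucket).withKey(key)
    .withUploadId(multipartUpload.uploadId)
    .withPartSize(bytes.size.toLong())
    .withPartNumber(pn++)
    .withMD5Digest(md5)
    .withInputStream(bytes.inputStream())
if (i == 5) {
    partRequest = partRequest.withLastPart(true)
}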
Here's a simple OutputStream that I implemented that actually works.
package io.inbot.aws

import com.amazonaws.auth.profile.ProfileCredentialsProvider
import com.amazonaws.regions.Regions
import com.amazonaws.services.s3.AmazonS3
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.CompleteMultipartUploadRequest
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest
import com.amazonaws.services.s3.model.InitiateMultipartUploadResult
import com.amazonaws.services.s3.model.PartETag
import com.amazonaws.services.s3.model.UploadPartRequest
import mu.KotlinLogging
import java.io.ByteArrayOutputStream
import java.io.OutputStream
import java.security.MessageDigest
import java.util.Base64

private val logger = KotlinLogging.logger { }

class S3Writer(
    private val amazonS3: AmazonS3,
    private val bucket: String,
    private val key: String,
    private val threshold: Int = 5 * 1024 * 1024
) : OutputStream(), AutoCloseable {
    private val etags: MutableList<PartETag> = mutableListOf()
    private val multipartUpload: InitiateMultipartUploadResult =
        this.amazonS3.initiateMultipartUpload(InitiateMultipartUploadRequest(bucket, key))
    private val currentPart = ByteArrayOutputStream(threshold)
    private var partNumber = 1

    override fun write(b: Int) {
        currentPart.write(b)
        if (currentPart.size() > threshold) {
            sendPart()
        }
    }

    private fun sendPart(last: Boolean = false) {
        logger.info { "sending part $partNumber" }
        currentPart.flush()
        val bytes = currentPart.toByteArray()
        val md = MessageDigest.getInstance("MD5")
        md.update(bytes)
        // Content-MD5 must be the base64 string of the digest, not the byte array's toString
        val md5 = Base64.getEncoder().encodeToString(md.digest())
        var partRequest = UploadPartRequest().withBucketName(bucket).withKey(key)
            .withUploadId(multipartUpload.uploadId)
            .withPartSize(currentPart.size().toLong())
            .withPartNumber(partNumber++)
            .withMD5Digest(md5)
            .withInputStream(bytes.inputStream())
        if (last) {
            logger.info { "final part" }
            partRequest = partRequest.withLastPart(true)
        }
        val partResponse = amazonS3.uploadPart(partRequest)
        etags.add(partResponse.partETag)
        currentPart.reset()
    }

    override fun close() {
        if (currentPart.size() > 0) {
            sendPart(true)
        }
        logger.info { "completing" }
        amazonS3.completeMultipartUpload(CompleteMultipartUploadRequest(bucket, key, multipartUpload.uploadId, etags))
    }
}

fun main() {
    val amazonS3 =
        AmazonS3ClientBuilder.standard().withRegion(Regions.EU_WEST_1).withCredentials(ProfileCredentialsProvider())
            .build()
    val bucket = "io.inbot.sandbox"
    val key = "test.txt"
    try {
        S3Writer(amazonS3, bucket, key).use {
            val w = it.bufferedWriter()
            for (i in 0.rangeTo(1000000)) {
                w.write("Line $i: hello again ...\n")
            }
            w.flush() // flush the buffered writer before the S3Writer gets closed by use
        }
    } catch (e: Throwable) {
        logger.error(e.message, e)
    }
}
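One thing the S3Writer above does not handle: if writing fails halfway, the initiated multipart upload and the parts uploaded so far stay around on S3 until the upload is aborted. A minimal cleanup sketch, assuming the standard SDK abort call and the upload id returned by initiateMultipartUpload:
import com.amazonaws.services.s3.AmazonS3
import com.amazonaws.services.s3.model.AbortMultipartUploadRequest

// Sketch: discard an in-progress multipart upload and all parts uploaded so far.
fun abortUpload(amazonS3: AmazonS3, bucket: String, key: String, uploadId: String) {
    amazonS3.abortMultipartUpload(AbortMultipartUploadRequest(bucket, key, uploadId))
}
To wire this into the catch block in main, the upload id would have to be exposed by S3Writer (or the abort done inside its own error handling).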

Related

How to combine a WebFlux WebClient DataBuffer download with more actions

I am trying to download a file (or multiple files) based on the result of a previous web request. After downloading the file I need to send the previous Mono result (dossier and obj) and the file to another system. So far I have been working with flatMaps and Monos. But when reading large files, I cannot use the Mono during the file download, as the buffer is too small.
Simplified, the code looks something like this:
var filePath = Paths.get("test.pdf");
this.dmsService.search()
    .flatMap(result -> {
        var dossier = result.getObjects().get(0).getProperties();
        var objectId = dossier.getReferencedObjectId();
        return Mono.zip(this.dmsService.getById(objectId), Mono.just(dossier));
    })
    .flatMap(tuple -> {
        var obj = tuple.getT1();
        var dossier = tuple.getT2();
        var media = this.dmsService.getDocument(objectId);
        var writeMono = DataBufferUtils.write(media, filePath);
        return Mono.zip(Mono.just(obj), Mono.just(dossier), writeMono);
    })
    .flatMap(tuple -> {
        var obj = tuple.getT1();
        var dossier = tuple.getT2();
        var objectId = dossier.getReferencedObjectId();
        var zip = zipService.createZip(objectId, obj, dossier);
        return zipService.uploadZip(Flux.just(zip));
    })
    .flatMap(newWorkItemId -> {
        return updateMetadata(newWorkItemId);
    })
    .subscribe(() -> {
        finishItem();
    });
dmsService.search(), this.dmsService.getById(objectId) and zipService.uploadZip() all return a Mono of a specific type.
dmsService.getDocument(objectId) returns a Flux because it has to support large files. With a DataBuffer Mono it worked for small files if I simply used Files.copy:
...
var contentMono = this.dmsService.getDocument(objectId);
return contentMono;
})
.flatMap(content -> {
Files.copy(content.asInputStream(), Path.of("test.pdf"));
...
}
I have tried different approaches but always ran into problems.
Based on https://www.amitph.com/spring-webclient-large-file-download/#Downloading_a_Large_File_with_WebClient
DataBufferUtils.write(dataBuffer, destination).share().block();
When I try this, nothing after .block() is ever executed. No download is made.
Without the .share() I get an exception that I may not use block:
java.lang.IllegalStateException: block()/blockFirst()/blockLast() are blocking, which is not supported in thread reactor-http-nio-5
Since DataBufferUtils.write returns a Mono, my next assumption was that, instead of calling block, I could Mono.zip() it together with my other values, but that never returns a value either.
var media = this.dmsService.getDocument(objectId);
var writeMono = DataBufferUtils.write(media, filePath);
return Mono.zip(Mono.just(obj), Mono.just(dossier), writeMono);
Any inputs on how to achieve this are greatly appreciated.
I finally figured out that if I write to a WritableByteChannel, DataBufferUtils.write returns a Flux<DataBuffer> instead of a Mono<Void>, so I can map over the returned buffers to release them and then emit the file path, which does the trick. (The Path variant returns Mono<Void>, which completes empty, so the Mono.zip above completes empty as well; ending with .then(Mono.just(file)) gives zip an actual value to work with.) I found the inspiration for this solution here: DataBuffer doesn't write to file
var media = this.dmsService.getDocument(objectId);
var file = Files.createTempFile(objectId, ".tmp");
WritableByteChannel filechannel = Files.newByteChannel(file, StandardOpenOption.WRITE);
var writeMono = DataBufferUtils.write(media, filechannel)
    .map(DataBufferUtils::release)
    .then(Mono.just(file));
return Mono.zip(Mono.just(obj), Mono.just(dossier), writeMono);
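One detail the snippet above leaves out is closing the channel. A small sketch of the same idea, written in Kotlin here but using the same Spring/Reactor calls, that releases each buffer, closes the channel however the stream terminates, and emits the path as the Mono value:
import org.springframework.core.io.buffer.DataBuffer
import org.springframework.core.io.buffer.DataBufferUtils
import reactor.core.publisher.Flux
import reactor.core.publisher.Mono
import java.nio.channels.WritableByteChannel
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardOpenOption

// Sketch: write the document stream to a temp file, release each buffer after it
// has been written, close the channel on completion/error/cancel, and emit the path.
fun writeToTempFile(media: Flux<DataBuffer>, prefix: String): Mono<Path> {
    val file = Files.createTempFile(prefix, ".tmp")
    val channel: WritableByteChannel = Files.newByteChannel(file, StandardOpenOption.WRITE)
    return DataBufferUtils.write(media, channel)
        .map(DataBufferUtils::release)   // free each buffer once it is written
        .doFinally { channel.close() }   // close the channel however the stream ends
        .then(Mono.just(file))           // give the downstream zip a real value
}
Compared to the snippet above, the only addition is the doFinally that closes the channel; the .then(Mono.just(file)) is what gives the downstream Mono.zip an actual value to emit.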

No configuration setting found for key 'conf' while trying to use ConfigFactory.parseString

I am trying to read my application.conf, which is stored in my S3 bucket. I used a BufferedSource to read from S3, but when I try ConfigFactory.parseString(source.mkString).getConfig("conf") it does not find 'conf' even though it is there. Below is my source code:
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain
import com.amazonaws.services.s3.model.S3Object
import com.amazonaws.services.s3.{AmazonS3Client, AmazonS3ClientBuilder, AmazonS3URI}
import scala.collection.JavaConversions._
import scala.io.{BufferedSource, Source}

object Test {
  def main(args: Array[String]): Unit = {
    import com.amazonaws.auth.BasicAWSCredentials
    val credentials = new BasicAWSCredentials("key", "secertkey")
    // val credentialsProvider = new DefaultAWSCredentialsProviderChain()
    val s3Client = new AmazonS3Client(credentials)
    val uri: AmazonS3URI = new AmazonS3URI("s3://test-buck/conf/application.conf")
    val s3Object: S3Object = s3Client.getObject(uri.getBucket, uri.getKey)
    val source: BufferedSource = Source.fromInputStream(s3Object.getObjectContent)
    try {
      println(source.mkString)
      import com.typesafe.config.{Config, ConfigFactory}
      val rawConfig: Config = ConfigFactory.parseString(source.mkString)
      val rootConfig = rawConfig.getConfig("conf")
      println(rootConfig)
      // println(rotConfig)
    } finally {
      source.close()
    }
  }
}
My application config looks like this:
conf {
  source_data_list = ["OL", "SB", "1CP"]
  //some other value
  OL {
    filename = "receipts_delta_GBR_14_10_2017.csv"
    sftp_conf {
      hostname = "endpoint"
      port = "22"
      username = "ubuntu"
      pem = "pemfile"
      filetype = "csv"
      delimiter = "|"
      directory = "/home/ubuntu/data"
    }
  }
}
Not sure what I am doing wrong here. The same application config works as expected if I put it on the resource path and load it with ConfigFactory.load("application.conf").getConfig("conf").
Any clue on this would help.
Exception I got
Exception in thread "main" com.typesafe.config.ConfigException$Missing: No configuration setting found for key 'conf'
at com.typesafe.config.impl.SimpleConfig.findKey(SimpleConfig.java:124)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:145)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:159)
at com.typesafe.config.impl.SimpleConfig.find(SimpleConfig.java:164)
at com.typesafe.config.impl.SimpleConfig.getObject(SimpleConfig.java:218)
at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:224)
at com.typesafe.config.impl.SimpleConfig.getConfig(SimpleConfig.java:33)
at com.dsm.utils.Test$.main(Test.scala:26)
at com.dsm.utils.Test.main(Test.scala)
Actually, you did succeed in reading the configuration.
The issue you're having is caused by the BufferedSource: it can only be read once. You read it once (to debug, I guess), which leaves the source at its end, so the second read, the one that populates rawConfig, gets an empty string. I solved it by extracting the configuration string into a variable and then using that:
val config = source.mkString
println(s"config is: $config")
val rawConfig: Config = ConfigFactory.parseString(config)
val rootConfig = rawConfig.getConfig("conf")
println(s"rootConfig is: $rootConfig")
The output is:
rootConfig is: Config(SimpleConfigObject({"OL":{"filename":"receipts_delta_GBR_14_10_2017.csv","sftp_conf":{"delimiter":"|","directory":"/home/ubuntu/data","filetype":"csv","hostname":"endpoint","pem":"pemfile","port":"22","username":"ubuntu"}},"source_data_list":["OL","SB","1CP"]}))

MediaManager is already initialized

I want to upload an image to Cloudinary. I have done this with Firebase and it works well, but the client wants Cloudinary.
I am passing a byte array, as seen below:
val outputStream = ByteArrayOutputStream()
capture.compress(Bitmap.CompressFormat.JPEG, 100, outputStream)
val data = outputStream.toByteArray()
val config = HashMap<String, String>()
config.put("cloud_name", "carflux")
MediaManager.init(requireContext(), config)
val uploadRequest = MediaManager.get().upload(data).unsigned("of6bplnq")
.option("resource_type", "image")
.maxFileSize(5 * 1024 * 1024)
uploadRequest.dispatch(requireContext())
I get the error MediaManager is already initialized
I have checked thoroughly and I only initialize it in one place.
Is there something else I'm doing wrong?
I fixed it by making a singleton object with a function that initializes MediaManager only if it has not been initialized yet.
The object:
object ApplicationObject {
    private var mediaManager: Any? = null

    fun startMediaManager(context: Context) {
        if (mediaManager == null) {
            val config = HashMap<String, Any>()
            config["cloud_name"] = "socialseller"
            config["secure"] = false
            mediaManager = MediaManager.init(context, config)
        }
    }
}
Note: Pass Application Context Only
In Activity
ApplicationObject.startMediaManager(this.applicationContext)
In Fragment
ApplicationObject.startMediaManager(requireContext().applicationContext)
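An alternative that avoids the guard object entirely (a sketch, assuming you control the Application class; the class name is a placeholder and the cloud name is the one from the question): initialize MediaManager exactly once in Application.onCreate() and only call MediaManager.get() from activities and fragments.
import android.app.Application
import com.cloudinary.android.MediaManager

// Hypothetical Application subclass; register it via android:name in the manifest.
class MyApp : Application() {
    override fun onCreate() {
        super.onCreate()
        // Runs once per process, so MediaManager.init is never called twice.
        val config = HashMap<String, String>()
        config["cloud_name"] = "carflux"
        MediaManager.init(this, config)
    }
}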

How to implement fault-tolerant file upload with Akka remote and streams

I'm an Akka beginner. (I am using Java)
I'm making a file transfer system using Akka.
Currently I have completed sending a file from Actor1 (local) to Actor2 (remote).
Now I'm thinking about how to handle problems during the transfer, and I have the following question:
If I lose my network connection while transferring a file, the transfer fails (say at 90 percent complete), and I recover my network connection a few minutes later, is it possible to transfer only the rest of the file data (the remaining 10%)?
If that's possible, Please give me some advice.
here is my simple code.
thanks :)
Actor1 (Local)
private Behavior<Event> onTick() {
    ....
    String fileName = "test.zip";
    Source<ByteString, CompletionStage<IOResult>> logs = FileIO.fromPath(Paths.get(fileName));
    logs.runForeach(f -> originalSize += f.size(), mat).thenRun(() -> System.out.println("originalSize : " + originalSize));
    SourceRef<ByteString> logsRef = logs.runWith(StreamRefs.sourceRef(), mat);
    getContext().ask(
        Receiver.FileTransfered.class,
        selectedReceiver,
        timeout,
        responseRef -> new Receiver.TransferFile(logsRef, responseRef, fileName),
        (response, failure) -> {
            if (response != null) {
                return new TransferCompleted(fileName, response.transferedSize);
            } else {
                return new JobFailed("Processing timed out", fileName);
            }
        }
    );
}
Actor2 (Remote)
public static Behavior<Command> create() {
    return Behaviors.setup(context -> {
        ...
        Materializer mat = Materializer.createMaterializer(context);
        return Behaviors.receive(Command.class)
            .onMessage(TransferFile.class, command -> {
                command.sourceRef.getSource().runWith(FileIO.toPath(Paths.get("test.zip")), mat);
                command.replyTo.tell(new FileTransfered("filename", 1024));
                return Behaviors.same();
            }).build();
    });
}
You need to think about the following for a proper implementation of file transfer with fault tolerance:
1. How to identify that a transfer has to be resumed for a given file.
2. How to find the point from which to resume the transfer.
The following implementation makes very simple assumptions about 1 and 2:
1. The file name is unique and thus can be used for such identification. Strictly speaking, this is not true: for example, you can transfer files with the same name from different folders, or from different nodes, etc. You will have to readjust this based on your use case.
2. It is assumed that the last/all writes on the receiver side wrote all bytes correctly and that the total number of written bytes indicates the point from which to resume the transfer. If this cannot be guaranteed, you need to logically split the original file into chunks and transfer hashes of each chunk, its size and position to the receiver, which then has to validate the chunks on its side and find the correct pointer for resuming the transfer.
(That's a bit more than 2 :) ) This implementation ignores detection of the transfer problem itself and focuses on 1 and 2 instead.
The code:
object Sender {
  sealed trait Command
  case class Upload(file: String) extends Command
  case class StartWithIndex(file: String, index: Long) extends Sender.Command

  def behavior(receiver: ActorRef[Receiver.Command]): Behavior[Sender.Command] = Behaviors.setup[Sender.Command] { ctx =>
    implicit val materializer: Materializer = SystemMaterializer(ctx.system).materializer
    Behaviors.receiveMessage {
      case Upload(file) =>
        receiver.tell(Receiver.InitUpload(file, ctx.self.narrow[StartWithIndex]))
        ctx.log.info(s"Initiating upload of $file")
        Behaviors.same
      case StartWithIndex(file, starWith) =>
        val source = FileIO.fromPath(Paths.get(file), chunkSize = 8192, starWith)
        val ref = source.runWith(StreamRefs.sourceRef())
        ctx.log.info(s"Starting upload of $file")
        receiver.tell(Receiver.Upload(file, ref))
        Behaviors.same
    }
  }
}

object Receiver {
  sealed trait Command
  case class InitUpload(file: String, replyTo: ActorRef[Sender.StartWithIndex]) extends Command
  case class Upload(file: String, fileSource: SourceRef[ByteString]) extends Command

  val behavior: Behavior[Receiver.Command] = Behaviors.setup[Receiver.Command] { ctx =>
    implicit val materializer: Materializer = SystemMaterializer(ctx.system).materializer
    Behaviors.receiveMessage {
      case InitUpload(path, replyTo) =>
        val file = fileAtDestination(path)
        val index = if (file.exists()) file.length else 0
        ctx.log.info(s"Got init command for $file at pointer $index")
        replyTo.tell(Sender.StartWithIndex(path, index.toLong))
        Behaviors.same
      case Upload(path, fileSource) =>
        val file = fileAtDestination(path)
        val sink = if (file.exists()) {
          FileIO.toPath(file.toPath, Set(StandardOpenOption.APPEND, StandardOpenOption.WRITE))
        } else {
          FileIO.toPath(file.toPath, Set(StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE))
        }
        ctx.log.info(s"Saving file into ${file.toPath}")
        fileSource.runWith(sink)
        Behaviors.same
    }
  }
}
Some auxiliary methods
val destination: File = Files.createTempDirectory("destination").toFile

def fileAtDestination(file: String) = {
  val name = new File(file).getName
  new File(destination, name)
}

def writeRandomToFile(file: File, size: Int): Unit = {
  val out = new FileOutputStream(file, true)
  (0 until size).foreach { _ =>
    out.write(Random.nextPrintableChar())
  }
  out.close()
}
And finally some test code
// sender and receiver bootstrapping is omitted
//Create some dummy file to upload
val file: Path = Files.createTempFile("test", "test")
writeRandomToFile(file.toFile, 1000)
//Initiate a new upload
sender.tell(Sender.Upload(file.toAbsolutePath.toString))
// Sleep to allow file upload to finish
Thread.sleep(1000)
//Write more data to the file to emulate a failure
writeRandomToFile(file.toFile, 1000)
//Initiate a new upload that will "recover" from the previous upload
sender.tell(Sender.Upload(file.toAbsolutePath.toString))
Finally, the whole process is: the sender asks the receiver where to resume (InitUpload), the receiver replies with the current length of the file at its destination (StartWithIndex), and the sender streams the file from that offset so the receiver appends only the missing bytes.

Sending Several Queries in RxTx

I'm trying to send several queries at once to a device (a car's ECU, specifically the ELM327) via RxTx. Here's the class that utilizes RxTx:
import collection.JavaConversions._
import gnu.io._
import java.io._
import java.util.TooManyListenersException

class Serial private (portName: String,
                      baudRate: Int,
                      dataBits: Int,
                      stopBits: Int,
                      parity: Int,
                      flowControl: Int) {

  private val portId = CommPortIdentifier.getPortIdentifier(portName)
  private val serial = portId.open("Serial Connection from OBDScan",
                                   5000).asInstanceOf[SerialPort]
  setPortParameters(baudRate, dataBits, stopBits, parity, flowControl)
  private val istream = serial.getInputStream
  private val ostream = serial.getOutputStream

  def this(portName: String, baudRate: Int = 115200) = this(portName, baudRate,
    SerialPort.DATABITS_8,
    SerialPort.STOPBITS_1,
    SerialPort.PARITY_NONE,
    SerialPort.FLOWCONTROL_NONE)

  def close = {
    try {
      istream.close
      ostream.close
    } catch {
      case ioe: IOException => // don't care, it's ended already
    }
    serial.close
  }

  def query(command: String) = {
    ostream.write(command.getBytes)
    ostream.write("\r\n".getBytes) // necessary for the serial port.
  }

  def queryResult: String = {
    try {
      val availableBytes = istream.available
      val buffer = new Array[Byte](availableBytes)
      if (availableBytes > 0) {
        istream.read(buffer, 0, availableBytes)
      }
      new String(buffer, 0, availableBytes)
    } catch {
      case ioe: IOException => "Something wrong! Please try again."
    }
  }

  def addListener(listener: SerialPortEventListener) = {
    try {
      serial.addEventListener(listener)
      serial.notifyOnDataAvailable(true)
    } catch {
      case tm: TooManyListenersException => println("Too many listeners")
    }
  }

  def removeListener = {
    serial.removeEventListener
  }

  private def setPortParameters(br: Int,
                                db: Int,
                                sb: Int,
                                p: Int,
                                fc: Int) = {
    serial.setSerialPortParams(baudRate, dataBits, stopBits, parity)
    serial.setFlowControlMode(flowControl)
  }
}

object Serial {
  def connect(portName: String, baudRate: Int = 115200): Serial = {
    try {
      val ret = new Serial(portName, baudRate)
      ret
    } catch {
      // exception handling omitted.
    }
  }
}
Now querying works fine and gives me the correct result. The problem comes when I send several queries at once:
val serial = Serial.connect("COM34")
serial.query("AT Z")
serial.query("10 00")
The device received both queries, but returns only one result. If I want to get the next result, I have to send another query, which results in the program lagging behind by one query. If I call Thread.sleep after each query:
val serial = Serial.connect("COM34")
serial.query("AT Z")
Thread.sleep(500)
serial.query("10 00")
That solves the problem, but of course the entire application stops while Thread.sleep is running. I don't want this, since the application will run these queries all the time (which means it would hang all the time if I did this).
Since I'm on Scala, I'm thinking of using Actors or something similar, but that feels like overkill for a desktop application. Is there any way to do this without Actors? Maybe I'm reading the response from the serial port wrong?
TL;DR: I want to do several queries to serial device via RxTx without locking the whole application (current solution achieved via Thread.sleep, which blocks the whole application). How can I do that?
