Downloading large (1 GB) file from server causes request timeout - Java

So I am developing two services. One service handles files on the filesystem and returns them via a REST interface.
ServiceA:
File zipFile = new File(folderDownload.getZipPath());
try {
    final InputStream inputStream = new BufferedInputStream(new FileInputStream(zipFile));
    return ResponseEntity
            .status(HttpStatus.OK)
            .eTag(folderDownload.getDownloadId())
            .contentLength(zipFile.length())
            .lastModified(Instant.ofEpochMilli(zipFile.lastModified()))
            .body(new InputStreamResource(inputStream));
} catch (Exception e) {
    throw new DownloadException(zipFile.getName(), e.getMessage(), 400);
}
ServiceB is a public-facing API that receives requests, validates them, and if the request is valid retrieves the file from serviceA and returns it to the client. For security reasons serviceA can only interact with serviceB, so there is no other way but to send the file through the two services...
WebClient client = WebClient.create(STORAGE_HOST);
// Request service to get file data
Flux<DataBuffer> fileDataStream = client.get()
        .uri(uriBuilder -> uriBuilder.path("folders/zip/download").queryParam("requestId", requestId).build())
        .accept(MediaType.APPLICATION_OCTET_STREAM)
        .retrieve()
        .bodyToFlux(DataBuffer.class);
// Streams the response body instead of loading it all in memory
DataBufferUtils.write(fileDataStream, outputStream).map(DataBufferUtils::release).then().block();
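For comparison, a rough sketch of what a fully reactive pass-through could look like if serviceB handed the stream straight back instead of blocking on it (a WebFlux-style endpoint; the mapping, method and parameter names are illustrative, not my actual code):
@GetMapping(value = "/folders/zip/download", produces = MediaType.APPLICATION_OCTET_STREAM_VALUE)
public Flux<DataBuffer> downloadZip(@RequestParam String requestId) {
    // WebFlux writes and releases each DataBuffer as it arrives, so nothing is buffered in memory
    // and there is no blocking call left for an async request timeout to hit.
    return client.get()
            .uri(uriBuilder -> uriBuilder.path("folders/zip/download").queryParam("requestId", requestId).build())
            .accept(MediaType.APPLICATION_OCTET_STREAM)
            .retrieve()
            .bodyToFlux(DataBuffer.class);
}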
When downloading smaller files everything is OK.
When downloading larger files I was getting an OutOfMemoryError on serviceB. I started using WebClient instead of RestTemplate and those problems went away. But now I am facing a new problem:
If the file is large, the request times out before completing, so the user receives an incomplete file. What is the solution here?
The exception I see in console on serviceB:
2020-04-09 13:40:24.138 WARN 4106 --- [io-8081-exec-10] s.a.s.e.CustomGlobalExceptionHandler : Async request timed out
2020-04-09 13:40:24.138 WARN 4106 --- [io-8081-exec-10] .m.m.a.ExceptionHandlerExceptionResolver : Resolved [org.springframework.web.context.request.async.AsyncRequestTimeoutException]
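From what I can tell, that timeout is Spring MVC's async request timeout; it seems to be configurable via this Spring Boot property (value in milliseconds, the number below is only an example, not what I have set):
spring.mvc.async.request-timeout=3600000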
Also I find it weird that when downloading directly from serviceA I get no loading indicator...
I have been on this problem for the past week and I am slowly losing my mind. Please help :(
Kind regards

Related

project reactor: my application keeps working after OOM error and I can't intercept and handle such errors

I noticed that Project Reactor doesn't allow handling OOM errors. The Project Reactor developers intercept such errors and log them, but there is no way for framework users to intercept and handle them.
I suppose it's potentially a big problem. If an OOM happens, the HTTP request sent by a client of my application hangs, but my application keeps working and handling other requests, and its healthchecks succeed too.
As I was answered in a comment on GitHub, my application is potentially in an unrecoverable state. Yes, I agree, and I would prefer that healthchecks fail or the application shut down when an OOM happens. I understand that the onError method is not a good place to handle such special errors, but in my opinion it could be done in the Hooks class, e.g. an onJvmError hook that would allow starting a graceful shutdown or signalling the situation through healthchecks.
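To make that concrete, this is roughly the kind of hook I have in mind - purely hypothetical, nothing like it exists in Reactor today, and all names here are made up:
// Hypothetical API - Reactor's Hooks class has no such method; shown only to illustrate the proposal.
Hooks.onJvmError(error -> {
    healthIndicator.markDown();   // make healthchecks fail (illustrative name)
    gracefulShutdown.initiate();  // or start a graceful shutdown (illustrative name)
});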
I implemented the project in Kotlin (the same can be reproduced in Java), with a REST endpoint using Reactor and WebFlux, to reproduce this case.
interface CsvRepository : ReactiveCrudRepository<CsvRow, Long>

@Service
class CsvService(val csvRepository: CsvRepository) {
    fun createByteArray(): Mono<ByteArray> =
        csvRepository.findAll()
            .reduce(ByteArrayOutputStream()) { output, el ->
                output.write(el.toString().toByteArray())
                output.write("\n".toByteArray())
                output
            }
            .map { it.toByteArray() }
}

@SpringBootApplication
class WebfluxMemoryRetroProjectApplication(val csvService: CsvService) {
    @Bean
    fun routing(): RouterFunction<ServerResponse> = router {
        accept(MediaType.ALL).nest {
            GET("/test") {
                csvService.createByteArray()
                    .flatMap { result ->
                        ServerResponse.ok()
                            .headers { httpHeaders ->
                                httpHeaders.contentType = MediaType("application", "force-download")
                                httpHeaders.set(HttpHeaders.CONTENT_DISPOSITION, "attachment; filename=test.txt")
                            }
                            .bodyValue(ByteArrayResource(result))
                    }
            }
        }
    }
}
On an HTTP GET request it reads data from the database and creates a file of about 70 MB.
So if I run it with -Xmx100m, I only get an error in the logs:
2022-02-07 12:25:06.135 ERROR 10288 --- [actor-tcp-nio-1] r.n.channel.ChannelOperationsHandler : [7edd71ea, L:/127.0.0.1:58918 - R:localhost/127.0.0.1:5432] Error was received while reading the incoming data. The connection will be closed.
java.lang.OutOfMemoryError: Java heap space
....
2022-02-07 12:25:06.141 ERROR 10288 --- [actor-tcp-nio-1] i.r.p.client.ReactorNettyClient : Connection Error
reactor.netty.ReactorNetty$InternalNettyException: java.lang.OutOfMemoryError: Java heap space
This error isn't handled and the HTTP client request hangs.
Is this expected behavior?
How can I handle this exception? I've tried onErrorResume and Hooks.onOperatorError but I couldn't intercept this type of exception.
It seems illogical to me that the application keeps working and I can't intercept and handle such errors.
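(For completeness: independent of Reactor, the standard HotSpot flag -XX:+ExitOnOutOfMemoryError makes the JVM exit as soon as an OutOfMemoryError is thrown, which at least turns the hang into a visible crash; the jar name below is illustrative.)
java -Xmx100m -XX:+ExitOnOutOfMemoryError -jar webflux-memory-retro-project.jar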

How to process files in batch with PDF OCR?

I would like to process 20,000 PDFs in batch asynchronously with Google OCR, but I did not find documentation related to it. I already tried using the client.asyncBatchAnnotateFilesAsync function:
// Assumed setup for the 'feature' referenced below (DOCUMENT_TEXT_DETECTION is the OCR feature)
Feature feature = Feature.newBuilder().setType(Feature.Type.DOCUMENT_TEXT_DETECTION).build();

List<AsyncAnnotateFileRequest> requests = new ArrayList<>();
for (MultipartFile file : files) {
    GcsSource gcsSource = GcsSource.newBuilder()
            .setUri(gcsSourcePath + file.getOriginalFilename()).build();
    InputConfig inputConfig = InputConfig.newBuilder()
            .setMimeType("application/pdf").setGcsSource(gcsSource).build();
    GcsDestination gcsDestination = GcsDestination.newBuilder()
            .setUri(gcsDestinationPath + file.getOriginalFilename()).build();
    OutputConfig outputConfig = OutputConfig.newBuilder()
            .setBatchSize(2).setGcsDestination(gcsDestination).build();
    AsyncAnnotateFileRequest request = AsyncAnnotateFileRequest.newBuilder()
            .addFeatures(feature)
            .setInputConfig(inputConfig)
            .setOutputConfig(outputConfig).build();
    requests.add(request);
}
AsyncBatchAnnotateFilesRequest request = AsyncBatchAnnotateFilesRequest.newBuilder()
        .addAllRequests(requests).build();
AsyncBatchAnnotateFilesResponse response = client.asyncBatchAnnotateFilesAsync(request).get();
System.out.println("Waiting for the operation to finish.");
But what I get is an error message
io.grpc.StatusRuntimeException: INVALID_ARGUMENT: At this time, only single requests are supported for asynchronous processing.
If Google does not provide batch processing, why do they provide asyncBatchAnnotateFilesAsync? Maybe I am using an old version? Does the asyncBatchAnnotateFilesAsync function work in another beta version?
Multiple requests on a single call are not supported by the Vision service.
This may be confusing because, according to the RPC API documentation, you can indeed provide multiple requests in a single service call (1 file per request); still, according to this issue tracker, there is a known limitation in the Vision service and currently it can only take one request at a time.
Since they limit it to just 1 file per request, can you just send 20k requests? They are async requests, so it should be pretty fast to send them all in.
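A rough sketch of that idea, reusing the requests list built in the question (error handling, throttling and quota concerns are omitted; whether 20k in-flight operations is acceptable is something to check):
// Send one AsyncAnnotateFileRequest per call and collect the long-running operations.
List<OperationFuture<AsyncBatchAnnotateFilesResponse, OperationMetadata>> operations = new ArrayList<>();
for (AsyncAnnotateFileRequest fileRequest : requests) {
    AsyncBatchAnnotateFilesRequest single = AsyncBatchAnnotateFilesRequest.newBuilder()
            .addRequests(fileRequest)
            .build();
    operations.add(client.asyncBatchAnnotateFilesAsync(single));
}
// Wait for all of them; the JSON results land in the GCS destinations configured above.
for (OperationFuture<AsyncBatchAnnotateFilesResponse, OperationMetadata> operation : operations) {
    operation.get();
}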

Reprocessing a message when an error occurs while processing it in the Kafka Stream

I have a simple Spring application based on Kafka Streams which consumes a message from an incoming topic, does a map transformation and prints the message. The KStream is configured like this:
@Bean
public KStream<?, ?> processingPipeline(StreamsBuilder builder, MyTransformer myTransformer,
        PrintAction printAction, String topicName) {
    KStream<String, JsonNode> source = builder.stream(topicName,
            Consumed.with(Serdes.String(), new JsonSerde<>(JsonNode.class)));
    // @formatter:off
    source
        .map(myTransformer)
        .foreach(printAction);
    // @formatter:on
    return source;
}
Inside MyTransformer I'm calling an external microservice which can be down at that time. If the call fails (typically by throwing a RuntimeException), I can't do my transformation.
The question is: is there any way to reprocess a message in the Streams application if an error happened during previous processing?
Based on my current research there is no way to do so; the only possibility I have is to push the message into a dead-letter topic and try to process it in the future. If it fails again, I push it into the DLT again and do retries this way.
If any uncaught exception happens during Kafka Streams processing, your stream will change status to ERROR and stop consuming incoming messages for the partition on which the error occurred.
You need to catch exceptions yourself. Retries could be achieved either by: 1) using Spring RetryTemplate to invoke the external microservice (but keep in mind that you will delay consuming messages from that specific partition), or 2) pushing the failed message into another topic for later reprocessing (as you suggested). A sketch of option 1 follows.
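A minimal sketch of option 1, assuming a Spring Retry RetryTemplate wrapped around the external call inside the transformer (ExternalService and the retry settings are illustrative):
public class RetryingTransformer implements KeyValueMapper<String, JsonNode, KeyValue<String, JsonNode>> {

    private final ExternalService externalService;        // illustrative dependency
    private final RetryTemplate retryTemplate = RetryTemplate.builder()
            .maxAttempts(5)
            .exponentialBackoff(1000, 2.0, 30000)          // 1s initial delay, doubling, capped at 30s
            .retryOn(RuntimeException.class)
            .build();

    public RetryingTransformer(ExternalService externalService) {
        this.externalService = externalService;
    }

    @Override
    public KeyValue<String, JsonNode> apply(String key, JsonNode value) {
        // Blocks the stream thread while retrying, so messages from this partition are delayed.
        JsonNode enriched = retryTemplate.execute(context -> externalService.call(value));
        return KeyValue.pair(key, enriched);
    }
}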
Update: since kafka-streams 2.8.0
Since kafka-streams 2.8.0 you have the ability to automatically replace a failed stream thread (one killed by an uncaught exception) using the KafkaStreams method void setUncaughtExceptionHandler(StreamsUncaughtExceptionHandler eh); with StreamThreadExceptionResponse.REPLACE_THREAD. For more details please take a look at Kafka Streams Specific Uncaught Exception Handler.
kafkaStreams.setUncaughtExceptionHandler(ex -> {
    log.error("Kafka-Streams uncaught exception occurred. Stream will be replaced with new thread", ex);
    return StreamsUncaughtExceptionHandler.StreamThreadExceptionResponse.REPLACE_THREAD;
});

Handling streams and exceptions in Spring's HttpMessageConverter

I've made a custom Spring HttpMessageConverter in order to create and output CSV from a REST controller.
I.e. the converter is registered to handle the class type that is returned from the controller.
I use the Super CSV library to generate the actual CSV, and therefore have a CsvMapWriter using the stream from the body of the HttpOutputMessage provided to the overridden writeInternal method:
protected void writeInternal(MyType myObject, HttpOutputMessage outputMessage)
        throws IOException, HttpMessageNotWritableException {
    OutputStreamWriter oswriter = new OutputStreamWriter(outputMessage.getBody());
    ICsvMapWriter mapWriter = new CsvMapWriter(oswriter, CsvPreference.STANDARD_PREFERENCE);
I then go on to use the mapWriter to write the csv lines directly to the stream, e.g.
try {
    mapWriter.writeHeader(transactionHeader);
    mapWriter.write(transactionLevelMap, transactionHeader, getTransactionInfoProcessors());
} finally {
    if (mapWriter != null) {
        mapWriter.close();
    }
}
Since the writing - and the CSV map processing - can throw exceptions, I wrap the code in a try-finally, as a standard approach when dealing with resources like streams.
The problem is that when the thrown exception bubbles up through the call chain and is picked up by Spring's error handling mechanism - which wants to return HTTP status 500 to the client, indicating a server error - I get the following dump:
WARN 3208 --- [ qtp32193376-20] org.eclipse.jetty.server.HttpChannel : Could not send response error 500
I assume it is because I've just closed the stream from the HttpOutputMessage, which Spring now wants to use for writing the error status etc.
That makes sense, but my question now is:
Should I NOT close the stream in the finally clause when an exception is being thrown (and only close it when I know that all processing went well)?
I.e. can I assume that Spring will take care of the resource deallocation once it is done with it, when it uses it for the error handling?
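For reference, the variant I'm considering - closing the writer only on the success path (just a sketch, not a confirmed answer):
// Close the writer only when everything was written successfully; if an exception is thrown,
// leave the response stream untouched so Spring can still write the 500 error to it.
mapWriter.writeHeader(transactionHeader);
mapWriter.write(transactionLevelMap, transactionHeader, getTransactionInfoProcessors());
mapWriter.flush();
mapWriter.close();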

Amazon S3 Upload NoHttpResponseException

I have the following code for uploading files to Amazon S3:
AmazonS3Client client = new AmazonS3Client(credentials,
        new ClientConfiguration().withMaxConnections(100)
                                 .withConnectionTimeout(120 * 1000)
                                 .withMaxErrorRetry(15));
TransferManager tm = new TransferManager(client);
TransferManagerConfiguration configuration = new TransferManagerConfiguration();
configuration.setMultipartUploadThreshold(MULTIPART_THRESHOLD);
tm.setConfiguration(configuration);
Upload upload = tm.upload(bucket, key, file);
try {
    upload.waitForCompletion();
} catch (InterruptedException ex) {
    logger.error(ex.getMessage());
} finally {
    tm.shutdownNow(false);
}
It works, but some uploads (1 GB) produce the following log message:
INFO AmazonHttpClient:Unable to execute HTTP request: bucket-name.s3.amazonaws.com failed to respond
org.apache.http.NoHttpResponseException: bucket-name.s3.amazonaws.com failed to respond
I have tried to create the TransferManager without an AmazonS3Client, but it doesn't help.
Is there any way to fix it?
The log message is telling you that there was a transient error sending data to S3. You've configured .withMaxErrorRetry(15), so the AmazonS3Client is transparently retrying the request that failed and the overall upload is succeeding.
There isn't necessarily anything to fix here - sometimes packets get lost on the network, especially if you're trying to push through a lot of packets at once. Waiting a little while and retrying is usually the right way to deal with this, and that's what's already happening.
If you wanted, you could try turning down the MaxConnections setting to limit how many chunks of the file will be uploaded at a time - there's probably a sweet spot where you're still getting reasonable throughput, but not overloading the network.
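For illustration only, the same client setup as in the question with a smaller connection pool (20 is an arbitrary starting point to tune from, not a recommendation):
AmazonS3Client client = new AmazonS3Client(credentials,
        new ClientConfiguration().withMaxConnections(20)   // fewer simultaneous connections to S3
                                 .withConnectionTimeout(120 * 1000)
                                 .withMaxErrorRetry(15));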
