Client/server Java application: sending a large file using SOAP and Axis2

I have to send millions of data records over the network between a Java client and server using SOAP web services (java2wsdl). My first attempt was to serialize the objects into a file and then send that file to the server.
The problem is that serialization produces a very large file, which causes memory problems in the Java application.
Since the file is so big, I tried splitting it into smaller ones, but then I have to send n files between the client and the server, which consumes a lot of time when the goal is to optimize the processing time.
Do you have any suggestions for optimizing the processing time while avoiding "out of memory" errors?

Web services aren't designed primarily as a large file transfer mechanism. Dedicated file transfer protocols will do a better job for that, since they handle things like partial transfers and error recovery.
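If the data does have to travel through Axis2, enabling MTOM and attaching the file as a DataHandler lets the payload be streamed from disk instead of being built up in memory. The following client-side sketch illustrates the idea; the service name, endpoint URL, and element names are placeholders and would have to match your actual WSDL:

import java.io.File;

import javax.activation.DataHandler;
import javax.activation.FileDataSource;

import org.apache.axiom.om.OMAbstractFactory;
import org.apache.axiom.om.OMElement;
import org.apache.axiom.om.OMFactory;
import org.apache.axiom.om.OMNamespace;
import org.apache.axiom.om.OMText;
import org.apache.axis2.Constants;
import org.apache.axis2.addressing.EndpointReference;
import org.apache.axis2.client.Options;
import org.apache.axis2.client.ServiceClient;

public class MtomUploadClient {
    public static void main(String[] args) throws Exception {
        // Build a payload that references the file via a DataHandler; with MTOM
        // enabled, Axis2 sends it as a streamed binary attachment instead of
        // base64-encoding it inside the SOAP body.
        OMFactory factory = OMAbstractFactory.getOMFactory();
        OMNamespace ns = factory.createOMNamespace("http://example.com/upload", "ns"); // placeholder namespace
        OMElement payload = factory.createOMElement("upload", ns);
        OMElement fileElement = factory.createOMElement("file", ns);

        DataHandler handler = new DataHandler(new FileDataSource(new File("serialized-data.bin")));
        OMText binaryNode = factory.createOMText(handler, true); // true = optimize as MTOM attachment
        fileElement.addChild(binaryNode);
        payload.addChild(fileElement);

        ServiceClient client = new ServiceClient();
        Options options = new Options();
        options.setTo(new EndpointReference("http://server:8080/axis2/services/FileService")); // placeholder endpoint
        options.setProperty(Constants.Configuration.ENABLE_MTOM, Constants.VALUE_TRUE);
        client.setOptions(options);

        client.sendRobust(payload); // one-way call; use sendReceive(payload) if a response is expected
    }
}

The same DataHandler/MTOM pattern applies on the server side, so neither end needs to hold the whole file in memory at once.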

Related

How to download a large file over HTTP in Camel without running out of memory

I am building a sort of API gateway using Apache Camel and Spring Boot to define routes to different services on our network. I am having trouble with one route to a service that is used to download large (100 MB ~ 500 MB) files from our server. The file always seems to be loaded completely into memory, causing OutOfMemory exceptions if multiple people call it at the same time.
The route is defined as something like this:
restConfiguration()
    .component("undertow")
    .host("0.0.0.0")
    .port("{{api.port}}")
    .contextPath("/api");

rest("/download")
    .get("/{fileId}")
    .route()
        .setHeader(Exchange.HTTP_PATH, constant("/download"))
        .setHeader(Exchange.HTTP_QUERY, simple("fileid=${header.fileId}"))
        .toD("{{download-service.url}}?bridgeEndpoint=true&throwExceptionOnFailure=false");
I have been trying to find a way to avoid the OutOfMemory exceptions when calling /download/, by trying Netty and Jetty instead of Undertow and by enabling or disabling stream caching, but no matter what, the entire file is loaded into memory instead of being streamed through to the client.
I hope one of you can help me with this.

Java - Calling API for 500k rows of data. Spring batch or Websocket?

I have two CSV files, one of which contains 500k+ customer records. I am attempting to convert each row to a customer object and POST it to an API that I am also responsible for.
This approach has the obvious problem of firing off 500k+ HTTP calls and hitting the maximum number of HTTP connections.
I have had two suggestions thrown at me: opening a WebSocket or using Spring Batch. Is this a good use case for opening a WebSocket and sending messages rather than opening multiple HTTP connections? Or is it better to go the more traditional route of using Spring Batch?
Since it appears to be your own server, you should just make a server route that allows you to send it multiple records at a time and then you can batch things into a lot fewer API calls.
If it's really 500k records you need to send, you will probably still want to batch them into multiple requests, but you could at least do them 10k at a time and manage your connections so you don't have more than 5-10 requests in flight at any given time (since it's unlikely your server could process more than that at once anyway, and this should keep your client from running out of network resources).
Or, if you want to do it more like a file upload, you could send 500k records worth of data, have your server handle it like a file upload and then once it succeeds, have the server process it.
In fact, you may want to just upload the CSV and let the server process it directly.
While a WebSocket connection would let you use the same connection for multiple requests (which is a good thing), you still don't want to be sending 500k individual records. The overhead of sending that many separate requests alone will be inefficient whether each one is a WebSocket message or an HTTP request. Instead, you really want to batch the records and send large chunks of data per request.
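A rough sketch of the batching idea, assuming Java 11+ java.net.http.HttpClient and a hypothetical /customers/batch endpoint that accepts a JSON array of customer objects:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

public class BatchUploader {
    private static final int BATCH_SIZE = 10_000;
    private final HttpClient client = HttpClient.newHttpClient();

    // Sends the customers in batches of BATCH_SIZE, one POST per batch,
    // instead of one POST per record.
    public void upload(List<String> customerJsonRows) throws Exception {
        for (int from = 0; from < customerJsonRows.size(); from += BATCH_SIZE) {
            int to = Math.min(from + BATCH_SIZE, customerJsonRows.size());
            // Join the pre-serialized JSON rows into a single JSON array body.
            String body = "[" + String.join(",", customerJsonRows.subList(from, to)) + "]";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("https://example.com/customers/batch")) // hypothetical endpoint
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() >= 300) {
                throw new IllegalStateException("Batch " + from + "-" + to + " failed: " + response.statusCode());
            }
        }
    }
}

With 10k records per request, 500k rows become 50 calls, which is easy to throttle to a handful of concurrent requests.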

Best way to push lots of realtime data to AWS from Java applications

I am looking to push large amounts of data from my Java web application to AWS. Within my Java application I have some flexibility in the approach/technology to use. Generally I am trying to dump large amounts of system data into an AWS store so that it can eventually be reported on and serve audit/historical purposes.
1) The Java web app (N nodes) will push system-diagnostic information to AWS in near real time.
2) System-diagnostic information will be collected by a custom plugin for the system and pushed to some AWS endpoint for aggregation.
3) New information to push to AWS will be available approximately every second.
4) Multiple Java web apps will be collecting and pushing information to a central server.
I am looking for the best way to transport the data from the Java apps to AWS. Ideally the solution would integrate well on the AWS side and not be overly complex to implement on the Java web app side (e.g. I do not want to have to run some other app/data store as an intermediary). I do not have strong opinions on the AWS storage technology yet either.
Example ideas: batch HTTP POST data from the Java web app to AWS, use a JMS solution to send the data out, or leverage some logger technology to "write" to an AWS datastore.
Assuming that the diagnostic information is not too big, I would consider SQS. If you have different classes of data, you can push the different types to different queues. You can then consume the messages in the queue(s) either from servers running in EC2 or on your own servers.
SQS will deliver each message at least once, but you have to be ready for a given message to be delivered multiple times. Duplicates do happen occasionally.
If your payloads are large, you will want to drop them in S3. If you have to go this route, you might want to use SQS as well: create a file in S3 and push a message to SQS with the S3 filename so you make sure all the payloads get processed.
I would imagine that you will push the data packets into SQS and then have a separate process that will consume the messages and insert into a database or other store in a format that supports whatever reporting/aggregation requirements you might have. The queue provides scalable flow control so you size the message consumption/processing for your average data rate, even though your data production rate will likely vary greatly during the day.
SQS only holds messages for a maximum of 14 days, so you must have some other process that will consume the messages and do some long-term storage.
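As a concrete starting point, sending a diagnostic payload to SQS from the Java app is only a few lines with the AWS SDK for Java (v1 shown here); the queue URL is a placeholder and credentials/region come from the default provider chain:

import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.SendMessageRequest;

public class DiagnosticsPublisher {
    // Placeholder queue URL for illustration.
    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/diagnostics";

    private final AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();

    // Pushes one diagnostic snapshot (already serialized, e.g. as JSON) to the queue.
    public void publish(String diagnosticJson) {
        sqs.sendMessage(new SendMessageRequest(QUEUE_URL, diagnosticJson));
    }
}

If a payload exceeds the SQS message size limit (256 KB), write it to S3 first and send only the S3 key in the message, as described above.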

Sending large data over a java web service

I have a Java web service that returns a large amount of data. Is there a standard way to stream a response rather than trying to return a huge chunk of data at once?
This problem is analogous to the older problem of bringing back large RSS feeds. You can handle it by parameterizing the request, e.g. http://host/myservice?start=0&count=100, or by including next/prev URLs in the response itself.
The latter approach has a lot of advantages. I'll search for a link that describes it and post it here if I find one.
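A minimal sketch of the parameterized approach, assuming a JAX-RS service (the resource path, parameter names, and in-memory data source are placeholders); the client walks through the data page by page instead of receiving it all at once:

import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import javax.ws.rs.DefaultValue;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.QueryParam;
import javax.ws.rs.core.MediaType;

@Path("/myservice")
public class PagedResource {

    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public List<String> getRecords(@QueryParam("start") @DefaultValue("0") int start,
                                   @QueryParam("count") @DefaultValue("100") int count) {
        // Stand-in for a real data source: only 'count' records are produced
        // per request, so neither side holds the whole result set in memory.
        return IntStream.range(start, start + count)
                .mapToObj(i -> "record-" + i)
                .collect(Collectors.toList());
    }
}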
I would look into a Comet-like approach.
From Wikipedia:
Comet is a web application model in which a long-held HTTP request
allows a web server to push data to a browser, without the browser
explicitly requesting it.
Basically, rather than sending the large data all at once, allow your web server to push data at its own pace and according to your needs.
A web service might not be a good method for data transfer.
If I were you, I would set up another service such as FTP or SFTP.
The server puts the data at a specific path on the FTP server and sends the path information to the client in the web service response.
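If you go the SFTP route, a library such as JSch keeps the client side small. A rough sketch of the client fetching the file that the web service response pointed at; host, credentials, and paths are placeholders:

import com.jcraft.jsch.ChannelSftp;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class SftpDownloader {

    // Downloads the remote file announced in the web service response to local disk.
    public void download(String remotePath, String localPath) throws Exception {
        JSch jsch = new JSch();
        Session session = jsch.getSession("user", "ftp.example.com", 22); // placeholder credentials/host
        session.setPassword("secret");
        session.setConfig("StrictHostKeyChecking", "no"); // sketch only; verify host keys in production
        session.connect();

        ChannelSftp sftp = (ChannelSftp) session.openChannel("sftp");
        sftp.connect();
        try {
            sftp.get(remotePath, localPath); // streams the remote file to local disk
        } finally {
            sftp.disconnect();
            session.disconnect();
        }
    }
}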

How to perform file upload monitoring with Play! framework

Is it possible to monitor file uploads, somehow, with the Play! framework? Also, if the file is BIG (i.e. 500+ MB), would it be possible to save the received bytes into a temporary file instead of keeping them in memory? (see update below)
Note: there is no code to show, as I'm just wondering about these questions and cannot seem to find the answers with Google.
Thanks!
** Update **
(I almost forgot about this question.) Well, apparently, uploaded files are stored in temporary files, and they are not passed as a byte array (or something) but as a Java File object to the action controller.
But even in a RESTful environment, file monitoring can be achieved.
** Update 2 **
Is there a way to get early event listeners on incoming HTTP requests? This could allow for monitoring request data transfer.
Large requests and temp files
Play! is already storing large HTTP requests in temp files named after UUIDs (thus reducing the memory footprint of the server). Once the request is done, this file gets deleted.
Monitoring uploads in Play!
Play! is using (the awesome) Netty project for its HTTP server stack (and also on the client stack, if you're considering Async HTTP Client).
Netty is:
asynchronous
event-driven
100% HTTP
Given Play!'s stack, you should be able to implement your "upload progress bar" or something similar. In fact, Async HTTP Client already provides progress listeners for file uploads and resumable downloads (see the quick start guide).
But play.server package doesn't seem to provide such functionality/extension point.
Monitoring uploads anyway
I think Play! is meant to run behind a "real" HTTP server acting as a reverse proxy (like nginx or lighttpd).
So you'd be better off using an upload progress module for one of those servers (like the HttpUploadProgressModule for nginx) than messing with Play!'s HTTP stack.
