Use akka to stream the contents of a file as an HttpResponse - java

I am reading parts of a large file via a Java FileInputStream and would like to stream its contents back to the client (in the form of an akka HttpResponse). Is this possible, and how would I do it?
From my research, EntityStreamingSupport can be used, but it only supports JSON or CSV data. I will be streaming raw data from the file, which will not be in the form of JSON or CSV.

Assuming you use akka-http and Scala, you may use getFromFile to stream the entire binary file from a path into the HttpResponse like this:
path("download") {
get {
entity(as[FileHandle]) { fileHandle: FileHandle =>
println(s"Server received download request for: ${fileHandle.fileName}")
getFromFile(new File(fileHandle.absolutePath), MediaTypes.`application/octet-stream`)
}
}
}
Taken from this file upload/download roundtrip akka-http example:
https://github.com/pbernet/akka_streams_tutorial/blob/f246bc061a8f5a1ed9f79cce3f4c52c3c9e1b57a/src/main/scala/akkahttp/HttpFileEcho.scala#L52
Streaming the entire file eliminates the need for "manual chunking", so the example above runs with a limited heap size.
However, if needed, manual chunking could be done like this:
val fileInputStream = new FileInputStream(fileHandle.absolutePath)
val chunked: Source[ByteString, Future[IOResult]] = akka.stream.scaladsl.StreamConverters
  .fromInputStream(() => fileInputStream, chunkSize = 10 * 1024)
chunked.map(each => println(each)).runWith(Sink.ignore)
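The snippet above only prints the chunks and discards them. Since the question is tagged java, here is a minimal sketch of turning such a chunked source into an HttpResponse using akka-http's Java DSL (filePath is a placeholder; the exact entity factory method may vary between akka-http versions):

import akka.http.javadsl.model.ContentTypes;
import akka.http.javadsl.model.HttpEntities;
import akka.http.javadsl.model.HttpResponse;
import akka.stream.IOResult;
import akka.stream.javadsl.Source;
import akka.stream.javadsl.StreamConverters;
import akka.util.ByteString;
import java.io.FileInputStream;
import java.util.concurrent.CompletionStage;

class ChunkedFileResponse {
  // filePath is a placeholder for wherever your file lives
  static HttpResponse forPath(String filePath) {
    // Read the file lazily in 10 KB chunks; nothing is loaded into memory up front
    Source<ByteString, CompletionStage<IOResult>> chunks =
        StreamConverters.fromInputStream(() -> new FileInputStream(filePath), 10 * 1024);
    // Wrap the source in a chunked entity so akka-http streams it to the client
    return HttpResponse.create()
        .withEntity(HttpEntities.createChunked(ContentTypes.APPLICATION_OCTET_STREAM, chunks));
  }
}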

Related

Cannot unzip nor get blob from HTTP Response

I am trying to unzip a file that is in the "response" of an HTTP request.
The problem is that after receiving the response I can neither unzip it nor turn it into a blob to parse afterwards.
The zip always contains an XML file, and the idea is to transform the XML to JSON once the file is unzipped.
Here is the code I tried:
val client = HttpClient.newBuilder().build();
val request = HttpRequest.newBuilder()
    .uri(URI.create("https://donnees.roulez-eco.fr/opendata/instantane"))
    .build();
val response = client.send(request, HttpResponse.BodyHandlers.ofString());
Then response.body() is just unreadable, and I did not find a proper way to turn it into a blob.
The other code I used for unzipping directly is this one:
val url = URL("https://donnees.roulez-eco.fr/opendata/instantane")
val con = url.openConnection() as HttpURLConnection
con.setRequestProperty("Accept-Encoding", "gzip")
println("Length : " + con.contentLength)
var reader: Reader? = null
reader = InputStreamReader(GZIPInputStream(con.inputStream))
while (true) {
    val ch: Int = reader.read()
    if (ch == -1) {
        break
    }
    print(ch.toChar())
}
But in this case, the content is not accepted as gzip.
Any idea?
It looks like you're confusing zip (an archive format that supports compression) with gzip (a simple compressed format).
Downloading https://donnees.roulez-eco.fr/opendata/instantane (e.g. with curl) and checking the result shows that it's a zip archive (containing a single file, PrixCarburants_instantane.xml).
But you're trying to decode it as a gzip stream (with GZIPInputStream), which it's not — hence your issue.
Reading a zip file is slightly more involved than reading a gzip file, because it can hold multiple compressed files. But ZipInputStream makes it fairly easy: you can read the first zip entry (which has metadata including its uncompressed size), and then go on to read the actual data in that entry.
A further complication is that this particular compressed file seems to use ISO 8859-1 encoding, not the usual UTF-8. So you need to take that into account when converting the byte stream into text.
Here's some example code:
val zipStream = ZipInputStream(con.inputStream)
val entry = zipStream.nextEntry
val reader = InputStreamReader(zipStream, Charset.forName("ISO-8859-1"))
for (i in 1..entry.size)
    print(reader.read().toChar())
Obviously, reading and printing the entire 11MB file one character at a time is not very efficient! And if there's any possibility that the zip archive could have multiple entries, you'd have to read through them all, stopping when you get to the one with the right name. But I hope this is a good illustration.
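For completeness, a rough Java sketch of the same idea that scans the entries by name and reads through a buffer (con stands for the HttpURLConnection opened above; the entry name is the one observed in this particular archive):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

try (ZipInputStream zip = new ZipInputStream(con.getInputStream())) {
    ZipEntry entry;
    while ((entry = zip.getNextEntry()) != null) {
        if (entry.getName().equals("PrixCarburants_instantane.xml")) {
            // Don't close this reader inside the loop: that would also close the ZipInputStream
            BufferedReader reader = new BufferedReader(
                new InputStreamReader(zip, StandardCharsets.ISO_8859_1));
            String line;
            while ((line = reader.readLine()) != null) {
                // feed each decompressed line to an XML parser instead of printing characters
            }
        }
    }
}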

Read AWS S3 GZIP Object using GetObjectRequest with range

I am trying to read a big compressed (gz) AWS S3 object. I don't want to read the whole object; I want to read it in parts so that I can process the uncompressed data in parallel.
I am reading it with a GetObjectRequest with a "Range" header, where I set the byte range.
However, when I give a byte range somewhere in the middle, e.g. (100, 200), it fails with "Not in GZIP format".
The reason for the failure is that the AWS request returns a stream, but when I wrap it in a GZIPInputStream it fails, because GZIPInputStream expects the stream to start with the gzip magic header (GZIP_MAGIC = 0x8b1f), which is not present in a mid-file range.
GetObjectRequest rangeObjectRequest = new GetObjectRequest(<<Bucket>>, <<Key>>).withRange(100, 200);
S3Object object = s3Client.getObject(rangeObjectRequest);
S3ObjectInputStream rawData = object.getObjectContent();
InputStream data = new GZIPInputStream(rawData);
Can anyone guide the right approach?
GZIP is a compression format in which each byte in the file depends on all of the bytes that precede it, which means that you can't pick an arbitrary byte range out of the file and make sense of it.
If you need to read byte ranges, you'll need to store the object uncompressed.
You could also create your own file storage format that stores chunks of the file as separately-compressed blocks. You could do this using the ZIP format, where each file in the archive represents a specific block size. But you'd need to implement your own ZIP directory reader to make that work.
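If the object has to stay gzip-compressed, the workable approach is to drop the range and stream the whole object through a single GZIPInputStream, parallelizing over the decompressed records instead of byte ranges. A rough sketch (bucket and key are placeholders; s3Client is the same client as in the question):

GetObjectRequest fullObjectRequest = new GetObjectRequest(bucket, key); // note: no .withRange(...)
S3Object object = s3Client.getObject(fullObjectRequest);
try (GZIPInputStream gzip = new GZIPInputStream(object.getObjectContent());
     BufferedReader reader = new BufferedReader(new InputStreamReader(gzip, StandardCharsets.UTF_8))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // hand each decompressed line (or batches of lines) to worker threads for parallel processing
    }
}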

custom nifi processor - writing of flow file

I want to create a custom NiFi processor which can read ESRI ASCII grid files and return a CSV-like representation with some metadata per file and geo-referenced user data in WKT format.
Unfortunately, the parsed result is not written back as an updated flow file.
https://github.com/geoHeil/geomesa-nifi/blob/rasterAsciiGridToWKT/geomesa-nifi-processors/src/main/scala/org/geomesa/nifi/geo/AsciiGrid2WKT.scala#L71-L107 is my try at making this happen in NiFi.
Unfortunately, only the original files are returned. The converted output is not persisted.
When trying to adapt it to manually serialize some CSV strings like:
val lineSep = System.getProperty("line.separator")
val csvResult = result.map(p => p.productIterator.map {
  case Some(value) => value
  case None => ""
  case rest => rest
}.mkString(";")).mkString(lineSep)

var output = session.write(flowFile, new OutputStreamCallback() {
  @throws[IOException]
  def process(outputStream: OutputStream): Unit = {
    IOUtils.write(csvResult, outputStream, "UTF-8")
  }
})
still no flow files are written. Either the issue from above persists, or I get "Stream not closed" exceptions for the outputStream.
It must be a tiny bit which is missing, but I can't seem to find the missing bit.
Each session method that changes a flow file, like session.write(), returns a new version of the flow file, and you have to transfer this new version.
If you change your flow file in the converterIngester() function, you have to return this new version to the calling function so it can be transferred to the relationship, as sketched below.
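A minimal Java sketch of that pattern (REL_SUCCESS stands for whatever success relationship your processor defines; csvResult is the string built above):

// session.write() returns a NEW FlowFile revision; keep it and transfer that one
FlowFile updated = session.write(flowFile, new OutputStreamCallback() {
    @Override
    public void process(OutputStream out) throws IOException {
        out.write(csvResult.getBytes(StandardCharsets.UTF_8));
    }
});
session.transfer(updated, REL_SUCCESS);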

Send binary file from Java Server to C# Unity3d Client with Protocol Buffer

I asked this question https://stackoverflow.com/questions/32735189/sending-files-from-java-server-to-unity3d-c-sharp-client, but I realized that sending files between Java and C# via built-in stream operations is not an optimal solution, because I also need other messages, not only the file content.
Therefore, I tried using Protobuf, because it is fast and can serialize/deserialize objects in a platform-independent way. My .proto file is the following:
message File {
  optional int32 fileSize = 1;
  optional string fileName = 2;
  optional bytes fileContent = 3;
}
So, I set the values for each variable in the generated .java file:
file.setFileSize(fileSize);
file.setFileName(fileName);
file.setFileContent(ByteString.copyFrom(fileContent, 0, fileContent.length));
I saw many tutorials about how to write the objects to a file and read them back. However, I can't find any example of how to send a file from a server socket to a client socket.
My intention is to serialize the object (file size, file name and file content) on the Java server and send this information to the C# client, so the file can be deserialized and stored on the client side.
In the example code from my earlier question, the server reads the bytes of the file (an image file) and writes them to the output stream, so that the client can read the bytes through its input stream and write them to disk. I want to achieve the same thing with serialization of my generated .proto messages.
Can anyone provide an example or give me a hint on how to do that?
As described in the documentation, protobuf does not keep track of where a message starts and stops, so when using a stream socket like TCP you'll have to do that yourself.
From the doc:
[...] If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own. The easiest way to solve this problem is to write the size of each message before you write the message itself. When you read the messages back in, you read the size, then read the bytes into a separate buffer, then parse from that buffer. [...]
Length-prefixing is a good candidate. Depending on what language you're writing in, there are libraries that do length-prefixing over e.g. TCP that you can use, or you can define it yourself.
An example representation of the buffer on the wire might be (beginning of the buffer to the left):
[buf_length|serialized_buffer2]
So your code to pack the buffer before sending might look something like this (this is in JavaScript with Node.js):
function pack(message) {
  var packet = new Buffer(message.length + 2);
  packet.writeIntBE(message.length, 0, 2);
  message.copy(packet, 2);
  return packet;
}
To read you would have to do the opposite:
client.on('data', function (data) {
  dataBuffer = Buffer.concat([dataBuffer, data]);
  // Keep extracting messages while the buffer holds at least a complete frame
  while (dataBuffer.length >= 2) {
    // Message length excluding the 2-byte length prefix
    var msgLen = dataBuffer.readIntBE(0, 2);
    if (dataBuffer.length < msgLen + 2) {
      break; // wait for the rest of this message to arrive
    }
    var thisMsg = new Buffer(dataBuffer.slice(2, msgLen + 2));
    // do something with the msg here
    // Remove processed message from buffer
    dataBuffer = dataBuffer.slice(msgLen + 2);
  }
});
You should also be aware that when sending multiple protobufs on a TCP socket, they are likely to be buffered for network optimization (concatenated) and sent together, meaning some sort of delimiter is needed anyway.
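On the Java side you do not have to hand-roll the prefix: protobuf-java generates writeDelimitedTo/parseDelimitedFrom, which frame each message with a varint length. A sketch, assuming the generated class for the File message is called File and that the C# side reads the same varint-prefixed framing:

// Server (Java): frame one File message with a varint length prefix and send it over the socket
File fileMsg = File.newBuilder()
    .setFileSize(fileSize)
    .setFileName(fileName)
    .setFileContent(ByteString.copyFrom(fileContent))
    .build();
fileMsg.writeDelimitedTo(socket.getOutputStream());

// Reading the same framing back (shown in Java; the client must consume the identical prefix)
File received = File.parseDelimitedFrom(socket.getInputStream());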

How to efficiently download large csv file using java

I need to provide a feature where users can download reports in Excel/CSV format in my web application. I once built a module in a web application which created an Excel file and then read it and sent it to the browser; that worked correctly. This time I don't want to generate an Excel file, as I don't have that level of control over the file system. I guess one way is to build the output in a StringBuffer and set the correct content type (I am not sure about this approach). Another team also has this feature, but they are struggling when the data is very large. What is the best way to provide this feature, considering the size of the data could be huge? Is it possible to send the data in chunks without the client noticing (except for a delay in downloading)?
One issue I forgot to add: when the data is very large, it also creates problems on the server side (CPU utilization and memory consumption). Is it possible to read a fixed number of records, say 500, send them to the client, then read another 500, and so on until complete?
You can also generate HTML instead of CSV and still set the content type to Excel. This is nice for colouring and styled text.
You can also use gzip compression when the client accepts that compression. Normally there are standard means, like a servlet filter.
Never build the whole output in a StringBuffer (or the better StringBuilder); stream it out instead. If you do not (or cannot) call setContentLength, the response is sent chunked (without a predictable download progress).
URL url = new URL("http://localhost:8080/Works/images/address.csv");
response.setHeader("Content-Type", "text/csv");
response.setHeader("Content-disposition", "attachment;filename=myFile.csv");
URLConnection connection = url.openConnection();
InputStream stream = connection.getInputStream();
BufferedOutputStream outs = new BufferedOutputStream(response.getOutputStream());
int len;
byte[] buf = new byte[1024];
while ((len = stream.read(buf)) > 0) {
    outs.write(buf, 0, len);
}
outs.close();
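To address the 500-records-at-a-time idea from the question: a rough servlet sketch (fetchRecords and pageSize are hypothetical placeholders for a paged database query) that writes and flushes one page at a time, so neither the full result set nor the full CSV is ever held in memory:

response.setContentType("text/csv");
response.setHeader("Content-Disposition", "attachment;filename=report.csv");
PrintWriter out = response.getWriter();

int pageSize = 500;
int offset = 0;
List<String[]> page;
while (!(page = fetchRecords(offset, pageSize)).isEmpty()) { // hypothetical paged query
    for (String[] row : page) {
        out.println(String.join(";", row)); // one CSV line per record
    }
    out.flush();        // push this chunk to the client before fetching the next page
    offset += pageSize;
}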
