I'm trying to send a JSON object (serialized as a string) into an SQS queue that triggers a Lambda. The message is exceeding the 256 KB maximum size that SQS allows. I was trying to gzip-compress my message before sending it. Here is how I'm trying to do it:
public static String compress(String str) throws Exception {
    System.out.println("Original String Length : " + str.length());
    ByteArrayOutputStream obj = new ByteArrayOutputStream();
    GZIPOutputStream gzip = new GZIPOutputStream(obj);
    gzip.write(str.getBytes("UTF-8"));
    gzip.close();
    String base64Encoded = Base64.getEncoder().encodeToString(obj.toByteArray());
    System.out.println("Compressed String length : " + base64Encoded.length());
    return base64Encoded;
}
The Lambda that this SQS queue triggers is a Node.js-based Lambda where I need to unzip and decode this message. I'm trying to use the zlib library in Node.js to unzip and decode my message like this:
exports.handler = async (event, context) => {
    let msg = null
    event.Records.forEach(record => {
        let { body } = record;
        var buffer = zlib.inflateSync(new Buffer(body, 'base64')).toString();
        msg = JSON.parse(JSON.parse(JSON.stringify(buffer.toString(), undefined, 4)))
    });
}
I'm getting the following error on execution:
{
"errorType": "Error",
"errorMessage": "incorrect header check",
"code": "Z_DATA_ERROR",
"errno": -3,
"stack": [
"Error: incorrect header check",
" at Zlib.zlibOnError [as onerror] (zlib.js:180:17)",
" at processChunkSync (zlib.js:429:12)",
" at zlibBufferSync (zlib.js:166:12)",
" at Object.syncBufferWrapper [as unzipSync] (zlib.js:764:14)",
" at /var/task/index.js:12:19",
" at Array.forEach (<anonymous>)",
" at Runtime.exports.handler (/var/task/index.js:10:17)",
" at Runtime.handleOnce (/var/runtime/Runtime.js:66:25)"
]
}
Can someone tell me how I can approach this problem in a better way? Is there a better way to compress the string in Java? Is there a better way to decompress, decode and parse the JSON in Node.js?
256 KB per message is huge; if you send millions of messages like this, it will be extremely hard to process them all (think about the replication SQS has to do internally).
SQS is not a database and it's not meant to store a lot of text.
I assume that your message contains a lot of business information in addition to some technical message-identification parameters.
Usually this points to a design problem in the system, so you can try the following:
Think about the storage for the content of the business information. It should not be SQS; it can be anything: Mongo, Postgres/MySQL, maybe Elasticsearch or even Redis in some cases. Since the application is in the cloud, AWS has many additional storage engines (S3, DynamoDB, Aurora, etc.), so find the one that suits your use case best. Probably S3 is the way to go if you only need a document by some key (path), but the decision is beyond the scope of this question.
The "sender" of the message will store the business related information in this storage, and will send a short message to SQS that will contain a pointer (url, foreign key, or application specific document id, whatever) on the document so that the receiver will be able to get that document from the storage once it gets the SQS message.
With this approach you don't need to zip anything; the messages stay short.
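For illustration, a minimal sketch of that sender side with S3 as the document store, assuming the AWS SDK for Java v1; the bucket name, key scheme, and queue URL are placeholders:

import java.util.UUID;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClient;

public class LargePayloadSender {
    private final AmazonS3 s3 = new AmazonS3Client();
    private final AmazonSQS sqs = new AmazonSQSClient();

    // Store the large JSON document in S3 and send only a small pointer message to SQS.
    public void send(String bucket, String queueUrl, String largeJson) {
        String key = "payloads/" + UUID.randomUUID();
        s3.putObject(bucket, key, largeJson);               // the full document lives in S3
        String pointer = "{\"bucket\":\"" + bucket + "\",\"key\":\"" + key + "\"}";
        sqs.sendMessage(queueUrl, pointer);                 // SQS carries only the reference
    }
}

(AWS also publishes an SQS Extended Client Library for Java that implements this S3-pointer pattern for you.)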
The problem is that you are sending a gzip stream, and then trying to read a zlib stream. They are two different things. Either send gzip and receive gzip, or send zlib and receive zlib. E.g. zlib.gunzipSync on the receive side.
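The smallest change is therefore on the receiving side: zlib.gunzipSync(Buffer.from(body, 'base64')) will read what GZIPOutputStream produced. If you would rather keep zlib.inflateSync in the Lambda, here is a sketch of the Java side emitting a zlib stream instead of gzip; only the stream class changes compared to the compress method above:

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.DeflaterOutputStream;

public class ZlibCompress {
    // Same shape as the gzip version, but DeflaterOutputStream writes a zlib stream,
    // which zlib.inflateSync on the Node side can decode directly.
    public static String compress(String str) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (DeflaterOutputStream deflater = new DeflaterOutputStream(out)) {
            deflater.write(str.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(out.toByteArray());
    }
}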
I'm trying to download large files (<1GB) in Kotlin. Since I already knew I'd be using okhttp, I pretty much just used the answer from this question, except that I'm using Kotlin instead of Java, so the syntax is slightly different.
val client = OkHttpClient()
val request = Request.Builder().url(urlString).build()
val response = client.newCall(request).execute()
val input = BufferedInputStream(response.body().byteStream())
val output = FileOutputStream(file)

val data = ByteArray(1024)
var total = 0L
var count = input.read(data)
while (count != -1) {
    total += count
    output.write(data, 0, count)
    count = input.read(data)
}
output.flush()
output.close()
input.close()
That works in that it downloads the file without using too much memory, but it seems needlessly inefficient in that it constantly tries to read and write more data without knowing whether any new data has arrived.
That also seems confirmed by my own tests running this on a very resource-limited VM: it uses more CPU while getting a lower download speed than a comparable script in Python, and of course than wget.
What I'm wondering is whether there is a way to give something a callback that gets called when x bytes are available, or when the end of the file is reached, so I don't have to constantly try to get more data without knowing if there is any.
Edit:
If it's not possible with okhttp, I don't have a problem using something else; it's just the HTTP library I'm used to.
As of version 11, Java has a built-in HttpClient which implements "asynchronous streams of data with non-blocking back pressure", and that's what you need if you want your code to run only when there's data to process.
If you can afford to upgrade to Java 11, you'll be able to solve your problem out of the box, using the HttpResponse.BodyHandlers.ofFile body handler. You won't have to implement any data transfer logic on your own.
Kotlin example:
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.nio.file.Paths

fun main(args: Array<String>) {
    val client = HttpClient.newHttpClient()
    val request = HttpRequest.newBuilder()
        .uri(URI.create("https://www.google.com"))
        .GET()
        .build()

    println("Starting download...")
    client.send(request, HttpResponse.BodyHandlers.ofFile(Paths.get("google.html")))
    println("Done with download.")
}
One could do away with the BufferedInputStream, or, since its default buffer size in Oracle's Java is 8192, use a ByteArray of at least that size (say 8192) instead of 1024.
However, best would be to either use java.nio or try Files.copy:
Files.copy(is, file.toPath());
This removes about 12 lines of code.
Another way is to send the request with a header asking for gzip compression, Accept-Encoding: gzip, so the transmission takes less time. In the response, possibly wrap is in a new GZIPInputStream(is) when the response header Content-Encoding: gzip is present. Or, if feasible, store the file compressed with an additional .gz ending; mybiography.md as mybiography.md.gz.
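A rough sketch of both suggestions combined, using plain HttpURLConnection instead of okhttp for brevity (the URL and target file name are placeholders):

import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.GZIPInputStream;

public class GzipDownload {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://example.com/mybiography.md");
        Path target = Paths.get("mybiography.md");

        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Ask the server to compress the response body.
        conn.setRequestProperty("Accept-Encoding", "gzip");

        InputStream body = conn.getInputStream();
        // Only unwrap if the server actually compressed the response.
        if ("gzip".equalsIgnoreCase(conn.getContentEncoding())) {
            body = new GZIPInputStream(body);
        }

        // Files.copy replaces the manual read/write loop entirely.
        Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
        conn.disconnect();
    }
}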
I'm beginning an initial review of vert.x and comparing it to akka-http. One area where akka appears to shine is streaming of response bodies.
In akka-http it is possible to create a streaming entity that utilizes back-pressure which allows the client to decide when it is ready to consume data.
As an example, it is possible to create a response with an entity consisting of 1 billion instances of "42" values:
//Iterator is "lazy", therefore this function returns immediately
val bodyData : () => Iterator[ChunkStreamPart] = () =>
  Iterator
    .continually("42")
    .take(1000000000)
    .map(ChunkStreamPart.apply)

val route =
  get {
    val entity : HttpEntity =
      Chunked(ContentTypes.`text/plain(UTF-8)`, Source fromIterator bodyData)
    complete(HttpResponse(entity=entity))
  }
The above code will not "blow up" the server's memory and will return the response to the client before the billion values have been generated.
The "42" values will get created on-the-fly as the client tries to consume the response body.
Question: is this streaming capability also present in vert.x?
A cursory review of the HttpServerResponse class would indicate that it is not, since the write member function can only take a String or a vert.x Buffer. From my limited understanding, it seems that Buffer is not lazy and holds the data in memory, which means the 1 billion "42" example would crash a server with just a few concurrent requests.
Thank you in advance for your consideration and response.
I need to send large video files (and other files) to a server with base64 encoding.
I get an out-of-memory exception because I store the whole file in memory (in a byte[]) and then encode it to a string with Base64.encodeToString. But how can I encode the file and send it on the fly and/or using less memory? Or how can I do this better?
For the request I'm currently using MultipartEntityBuilder; after I build it, I send it to the server with a POST method. Along with the file I need to send other data too, so I need to send both in one request, and the server only accepts files that are base64 encoded.
OR
Because I'm using Drupal's REST module to create content from posts, another solution for me would be to send a normal POST with a normal form (like the browser does). The problem is that I can only find one solution: you call the <endpoint>/file URL and pass four things:
array("filesize" => 1029, // file size
"filename" => "something.mp4", //file name
"uid" => 1, // user id, who upload the file
"file" => "base64 encoded file string")
After this request I get back an fid, which is the uploaded file's ID; I need to send this with the real content when I create the node. If I could send the file with a normal POST (without encoding), like the browser does on form submit, that would be better.
I need to send large video files (and other files) to a server with base64 encoding.
You should consider getting a better server, one that supports binary uploads.
I get an out-of-memory exception because I store the whole file in memory (in a byte[]) and then encode it to a string with Base64.encodeToString.
That will not work for any significant video. You do not have heap space for this.
But how can I encode the file and send it on the fly and/or using less memory? Or how can I do this better?
You can implement a streaming converter to base64 (read the bytes in from a file and write the bytes out to a base64-encoded file, where you are only processing a small number of bytes at a time in RAM). Then, upload the file along with the rest of your form data.
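A minimal sketch of such a converter using java.util.Base64 (the class and method names here are just for illustration; on Android, java.util.Base64 is available from API 26):

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Base64;

public class StreamingBase64Encoder {
    // Encodes sourcePath into a base64 text file at targetPath,
    // holding only one small buffer in memory at a time.
    public static void encodeFile(String sourcePath, String targetPath) throws Exception {
        try (InputStream in = new FileInputStream(sourcePath);
             OutputStream out = Base64.getEncoder().wrap(new FileOutputStream(targetPath))) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read); // the wrapping encoder emits base64 as data flows through
            }
        } // closing the wrapped stream flushes the final padding characters
    }
}

The encoded file can then be attached to the multipart request (or read back in chunks) without ever holding the whole video in RAM.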
In the GCM documentation (http://developer.android.com/google/gcm/gcm.html) it states that there's a 4 KB payload limit per message. I am struggling to figure out how large my messages are. They're currently simply defined in a String and I use the packages
com.google.android.gcm.server.Message
com.google.android.gcm.server.Sender
to send the messages. The messages sent and received are fine. I'm just wondering if there is a way to see how many bytes each message is at the moment, to see how much more I can add. I tried printing out the default encoding using Charset.defaultCharset(), but I'm not sure if that's the actual encoding; it returned US-ASCII.
Currently the sending of messages goes something like:
Message message = new Message.Builder()
        .addData("MESSAGE_TYPE", "version1")
        .addData("PERSONNAME", "john")
        .addData("PHONENUMBER", "5551234567")
        .build();
Sender sender = new Sender(API_KEY);
try {
    MulticastResult result = sender.send(message, registrationIds, 5);
} catch (IOException e) {
    e.printStackTrace();
}
Is there a way to determine how many bytes the message actually is? Thanks.
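One rough way to get a ballpark figure, assuming the 4 KB limit applies to the serialized key/value data JSON, is to rebuild that JSON yourself and measure its UTF-8 byte length. This is an estimate only (no JSON escaping, and overhead from fields like collapse_key is ignored):

import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class GcmPayloadSize {
    // Rebuild an equivalent "data" JSON object and measure its UTF-8 byte length.
    public static int estimateDataBytes(Map<String, String> data) {
        StringBuilder json = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, String> e : data.entrySet()) {
            if (!first) json.append(',');
            json.append('"').append(e.getKey()).append("\":\"").append(e.getValue()).append('"');
            first = false;
        }
        json.append('}');
        return json.toString().getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        Map<String, String> data = new LinkedHashMap<>();
        data.put("MESSAGE_TYPE", "version1");
        data.put("PERSONNAME", "john");
        data.put("PHONENUMBER", "5551234567");
        System.out.println("Approximate payload bytes: " + estimateDataBytes(data));
    }
}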
I have a JPG file of 800 KB. I try to upload it to S3 and keep getting a timeout error.
Can you please help figure out what is wrong? 800 KB is rather small for an upload.
Error Message: Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed.
HTTP Status Code: 400
AWS Error Code: RequestTimeout
Long contentLength = null;
System.out.println("Uploading a new object to S3 from a file\n");
try {
    byte[] contentBytes = IOUtils.toByteArray(is);
    contentLength = Long.valueOf(contentBytes.length);
} catch (IOException e) {
    System.err.printf("Failed while reading bytes from %s", e.getMessage());
}

ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(contentLength);
s3.putObject(new PutObjectRequest(bucketName, key, is, metadata));
Is it possible that IOUtils.toByteArray is draining your input stream so that there is no more data to be read from it when the service call is made? In that case a stream.reset() would fix the issue.
But if you're just uploading a file (as opposed to an arbitrary InputStream), you can use the simpler form of AmazonS3.putObject() that takes a File, and then you won't need to compute the content length at all.
http://docs.amazonwebservices.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/AmazonS3.html#putObject(java.lang.String, java.lang.String, java.io.File)
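For reference, a hedged sketch of that File-based overload with the v1 SDK (bucket, key, and path are placeholders; credentials come from the default provider chain):

import java.io.File;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;

public class SimpleS3Upload {
    public static void main(String[] args) {
        AmazonS3 s3 = new AmazonS3Client();
        // The File overload lets the SDK determine the content length itself,
        // so nothing reads (and drains) a stream before the upload.
        s3.putObject("my-bucket", "photos/picture.jpg", new File("/path/to/picture.jpg"));
    }
}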
The SDK client will automatically retry such network errors several times. You can tweak how many retries it uses by instantiating it with a ClientConfiguration object.
http://docs.amazonwebservices.com/AWSJavaSDK/latest/javadoc/com/amazonaws/ClientConfiguration.html#setMaxErrorRetry(int)
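A sketch of that configuration, assuming the v1 SDK constructors; the retry count and timeout values are arbitrary examples:

import com.amazonaws.ClientConfiguration;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;

public class RetryingS3Client {
    public static AmazonS3 build() {
        ClientConfiguration config = new ClientConfiguration();
        config.setMaxErrorRetry(10);          // retry transient failures up to 10 times
        config.setSocketTimeout(120 * 1000);  // give slow connections more time (milliseconds)
        return new AmazonS3Client(config);
    }
}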
If your endpoint is behind a VPC, it will also silently error out. You can add a new VPC endpoint for S3 here:
https://aws.amazon.com/blogs/aws/new-vpc-endpoint-for-amazon-s3/