I need to append string content to a Google Cloud Storage file.
I'm doing a string operation whose result is more than 30 MB in size.
I'm running my project on Google App Engine.
When I try to hold the entire content in a StringBuilder, it throws a java heap space error on Google App Engine.
Is there any way to append the content to a Google Cloud Storage file instead of increasing the Java heap space?
Streaming data out in Java is usually accomplished through a Writer pattern. As you build your data, you "write" it out on a stream. Keeping the entire thing in memory leads to the problems you've experienced.
I'm not sure which library you're using to access Google Cloud Storage, but most of them provide a way to write objects a few bytes at a time.
App Engine's legacy Google Cloud Storage API provides an AppEngineFile object with a write() method that you can repeatedly invoke as you create your object. The new Google Cloud Storage Client Library provides a GcsOutputChannel that can do the same.
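For example, here's a minimal sketch of writing content out in pieces through a GcsOutputChannel, assuming the appengine-gcs-client library; the bucket name, object name, and the chunksOfContent iterable are placeholders for however you produce your data:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import com.google.appengine.tools.cloudstorage.GcsFileOptions;
import com.google.appengine.tools.cloudstorage.GcsFilename;
import com.google.appengine.tools.cloudstorage.GcsOutputChannel;
import com.google.appengine.tools.cloudstorage.GcsService;
import com.google.appengine.tools.cloudstorage.GcsServiceFactory;
import com.google.appengine.tools.cloudstorage.RetryParams;

void writeToGcs(Iterable<String> chunksOfContent) throws IOException {
    GcsService gcsService = GcsServiceFactory.createGcsService(RetryParams.getDefaultInstance());
    GcsFilename filename = new GcsFilename("your-bucket", "your-object.txt"); // placeholders
    GcsOutputChannel outputChannel =
        gcsService.createOrReplace(filename, GcsFileOptions.getDefaultInstance());
    for (String chunk : chunksOfContent) {
        // Write each piece as you produce it instead of holding everything in a StringBuilder
        outputChannel.write(ByteBuffer.wrap(chunk.getBytes(StandardCharsets.UTF_8)));
    }
    outputChannel.close();
}

This way only one chunk needs to be in memory at any time.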
I might have misunderstood your question, though. Are you asking about creating an object in Google Cloud Storage and then appending content to it afterwards? GCS does not allow for appending to files once they've been fully created, with the limited exception of the "compose" functionality (which would work, except that you can only compose an object a certain number of times before you reach a maximum composition level).
I am using Mirth Connect 3.0.3 and I have a .xml file which is almost 85 MB in size and contains some device information. I need to read this .xml file and insert that data into the database (SQL Server).
The problem I am facing is that when I try to read the data, it shows a java heap space error.
I increased the server memory to 1024 MB and the client memory to 1024 MB,
but it still shows the same error. If I increase the memory any further, I am not able to start Mirth Connect.
Any suggestion is appreciated.
Thanks.
Is the XML file composed of multiple separate sections/pieces of data that would make sense to split up into multiple channel messages? If so, consider using a Batch Adapter. The XML data type has options to split based on element/tag name, node depth/level, or an XPath query. All of those options currently still require the message to be read into memory in its entirety, but this will still be more memory-efficient than reading the entire XML document in as a single message.
You can also use a JavaScript batch script, in which case you're given a Java BufferedReader, and can use the script to read through the file and return a message at a time. In this case, you will not have to read the entire file into memory.
Are there large blobs of data in the message that don't need to be manipulated in a transformer? Like, embedded images, etc? If so, consider using an Attachment Handler. That way you can extract that data and store it once, rather than having it copied and stored multiple times throughout the message lifecycle (for Raw / Transformed / Encoded / etc.).
So the existing code base where I work uses a regular Java File("a/directory/path") object for a massive amount of logic. Now my team wants me to use a file stored in Azure Blob storage instead. I can get the file from the blob using the CloudBlobItem() Java API. But this object is different from a regular Java File() object, and I would have to change a bunch of stuff in the logic. Is there any blob item which can be cast to a regular File() object?
Short answer: No.
You're comparing two completely different things. Azure blobs are not files. You'd need to stream them down to where your code is running. Maybe to a file stream. Maybe write to disk. And then work with the file. You cannot just use an Azure blob like any other file I/O.
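For example, a minimal sketch of streaming a block blob down to a local file so the existing File-based logic can keep working, assuming the legacy azure-storage Java SDK; the connection string, container, blob name, and local path are placeholders:

import java.io.File;
import com.microsoft.azure.storage.CloudStorageAccount;
import com.microsoft.azure.storage.blob.CloudBlobClient;
import com.microsoft.azure.storage.blob.CloudBlobContainer;
import com.microsoft.azure.storage.blob.CloudBlockBlob;

File downloadBlobToFile() throws Exception {
    CloudStorageAccount account = CloudStorageAccount.parse("your-connection-string"); // placeholder
    CloudBlobClient client = account.createCloudBlobClient();
    CloudBlobContainer container = client.getContainerReference("your-container");     // placeholder
    CloudBlockBlob blob = container.getBlockBlobReference("your-file.dat");             // placeholder
    blob.downloadToFile("/tmp/your-file.dat"); // streams the blob's bytes down to local disk
    return new File("/tmp/your-file.dat");     // the rest of the File-based code can use this
}

Note that this pulls a full local copy down, so factor in disk space and transfer time for large blobs.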
Note: If you're using Azure File Storage (which is an SMB share), then you do treat everything in that file store like you'd treat local storage. But it sounds like you're just using normal block blobs for your storage.
I have a use case where I need to query the data stored in Google Cloud Datastore, display the results, and provide a download link to a CSV file of the same data.
I have gone through different documentation, but it dealt mostly with Python, whereas my implementation is in Java.
Please guide me.
Here is one possible way: you build your CSV file in memory in a Cloud Endpoint by querying Cloud Datastore and printing to a ByteArrayOutputStream, as shown hereafter:
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

// Build the CSV content in memory, one println per row
ByteArrayOutputStream csvOS = new ByteArrayOutputStream();
PrintStream printer = new PrintStream(csvOS);
printer.println("L1C1;L1C2;L1C3");
printer.println("L2C1;L2C2;L2C3");
printer.close();
Then you save the csv file to Cloud Storage and return the URL for downloading it, as I explained in the following answer:
https://stackoverflow.com/a/37603225/3371862
Another possibility would be to stream the result through a Google App Engine servlet (i.e. you don't go through Cloud Endpoints). Have a look at how to write csv file in google app by using java.
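For the servlet route, here is a rough illustration, assuming the standard javax.servlet API; the column names and the commented-out query loop are hypothetical and would be replaced by your Datastore query:

import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class CsvDownloadServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("text/csv");
        resp.setHeader("Content-Disposition", "attachment; filename=\"export.csv\"");
        PrintWriter out = resp.getWriter();
        out.println("col1;col2;col3"); // hypothetical header row
        // Replace this with your Datastore query, writing each entity as one CSV row:
        // for (Entity e : results) { out.println(e.getProperty("col1") + ";" + ...); }
        out.flush();
    }
}

The advantage here is that rows go straight to the response, so the whole CSV never has to sit in memory.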
I'm searching for the best approach to do the following:
A user uploads a large (~500 MB) ZIP file via an App Engine servlet
All the extracted content should be saved to a Cloud Storage bucket
A DB record should be inserted into a table on Cloud SQL with the URL of every stored file.
What will be the best approach to implement such a behavior?
Thanks!
You can easily upload the 500 MB .ZIP directly to GCS, but then you can't unpack it there -- and it's too large to bring into App Engine for unpacking. Rather, for this latter task, I would use Google Compute Engine, which is not subject to the same limitations as Google App Engine.
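As a rough sketch of that Compute Engine step, assuming the google-cloud-storage Java client library (the bucket names, object name, and the Cloud SQL step are placeholders), you can stream the ZIP out of GCS and write each entry back as its own object without ever holding the whole archive in memory:

import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import com.google.cloud.WriteChannel;
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.BlobId;
import com.google.cloud.storage.BlobInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

void unpackZipFromGcs() throws Exception {
    Storage storage = StorageOptions.getDefaultInstance().getService();
    // Placeholders: "uploads-bucket"/"archive.zip" is the uploaded ZIP, "extracted-bucket" the target
    Blob zipBlob = storage.get(BlobId.of("uploads-bucket", "archive.zip"));
    try (InputStream in = Channels.newInputStream(zipBlob.reader());
         ZipInputStream zip = new ZipInputStream(in)) {
        ZipEntry entry;
        byte[] buffer = new byte[64 * 1024];
        while ((entry = zip.getNextEntry()) != null) {
            if (entry.isDirectory()) continue;
            BlobInfo target = BlobInfo.newBuilder(BlobId.of("extracted-bucket", entry.getName())).build();
            try (WriteChannel writer = storage.writer(target)) {
                int read;
                // Copy the entry to GCS in small chunks
                while ((read = zip.read(buffer)) > 0) {
                    writer.write(ByteBuffer.wrap(buffer, 0, read));
                }
            }
            // Here you could also insert a row into Cloud SQL with the stored object's URL
        }
    }
}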
I can easily upload/write or read the contents of files (~80 KB) from Google Cloud Storage.
Now, I have to perform a bigger task while serving big files (~200 MB-300 MB):
1) Need to read the contents of the uploaded file in chunks (~10 KB).
<-- Want to modify the chunked data programmatically -->
2) Repeat step 1 until the stream has read the whole content of the file (from start to end, sequentially).
I tried this program, but in the response I only get some amount of the data. How can I perform the task mentioned above?
You should not use the Files API (which is deprecated - see the comment at the top of the page you mentioned). Instead use the GCS client (mentioned in the deprecation notice). The GCS client allows you to read continuously, and you can serialize the state of the GcsInputChannel between requests until the read is completed (if the read takes longer than the request timeout). You should also consider using the mapreduce library, using GoogleCloudStorageLineInput for reading the file and writing the modified one in your mapper (probably map-only in your case).
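For instance, a rough sketch of chunked reading with the GCS client, assuming the appengine-gcs-client library; the bucket, object name, and chunk sizes are placeholders:

import java.nio.ByteBuffer;
import com.google.appengine.tools.cloudstorage.GcsFilename;
import com.google.appengine.tools.cloudstorage.GcsInputChannel;
import com.google.appengine.tools.cloudstorage.GcsService;
import com.google.appengine.tools.cloudstorage.GcsServiceFactory;
import com.google.appengine.tools.cloudstorage.RetryParams;

void readInChunks() throws Exception {
    GcsService gcsService = GcsServiceFactory.createGcsService(RetryParams.getDefaultInstance());
    GcsFilename file = new GcsFilename("your-bucket", "your-object.dat"); // placeholders
    // Prefetching read channel: start at offset 0, fetch blocks of 1 MB from GCS
    GcsInputChannel channel = gcsService.openPrefetchingReadChannel(file, 0, 1024 * 1024);
    ByteBuffer chunk = ByteBuffer.allocate(10 * 1024); // process ~10 KB at a time
    while (channel.read(chunk) != -1) {
        chunk.flip();
        // ... modify the chunked data here ...
        chunk.clear();
    }
    channel.close();
    // GcsInputChannel is Serializable, so its state can be stored between requests
    // and the read resumed later if it doesn't fit within one request.
}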