Get a Google Cloud Storage file from its BlobKey - java

I wrote a Google App Engine application that makes use of the Blobstore to save programmatically generated data. To do so, I used the Files API, which unfortunately has been deprecated in favor of Google Cloud Storage. So I'm rewriting my helper class to work with GCS.
I'd like to keep the interface as similar as possible to the old one, also because I persist BlobKeys in the Datastore to keep references to the files (and changing the model of a production application is always painful). When I save something to GCS, I retrieve a BlobKey with
BlobKey blobKey = blobstoreService.createGsBlobKey("/gs/" + fileName.getBucketName() + "/" + fileName.getObjectName());
as prescribed here, and I persist it in the Datastore.
So here's the question: the documentation tells me how to serve a GCS file with blobstoreService.serve(blobKey, resp) in a servlet response, BUT how can I retrieve the file content (as an InputStream, a byte array, or whatever) to use it in my code for further processing? In my current implementation I do that with a FileReadChannel reading from an AppEngineFile (both deprecated).

Here is the code to open a Google Cloud Storage object as an InputStream. Unfortunately, you have to use the bucket name and object name, not the BlobKey:
import java.io.InputStream;
import java.nio.channels.Channels;
import com.google.appengine.tools.cloudstorage.*;

GcsFilename gcsFilename = new GcsFilename(bucketName, objectName);
GcsService gcsService = GcsServiceFactory.createGcsService();
// Open a read channel at byte 0 and wrap it as an InputStream
InputStream stream = Channels.newInputStream(gcsService.openReadChannel(gcsFilename, 0));

Given a blobKey, use the BlobstoreInputStream class to read the value from Blobstore, as described in the documentation:
BlobstoreInputStream in = new BlobstoreInputStream(blobKey);
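For example, to read the whole blob into memory for further processing (a short sketch using plain java.io):
import java.io.ByteArrayOutputStream;
import java.io.InputStream;

try (InputStream in = new BlobstoreInputStream(blobKey);
     ByteArrayOutputStream out = new ByteArrayOutputStream()) {
    byte[] buffer = new byte[8192];
    int n;
    while ((n = in.read(buffer)) != -1) {
        out.write(buffer, 0, n);
    }
    byte[] content = out.toByteArray(); // the blob's bytes
}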

You can get the Cloud Storage object name only in the upload handler (fileInfo.gs_object_name), so store it in your database there. After that it is lost, and it does not seem to be preserved in BlobInfo or other metadata structures.
Google says:
Unlike BlobInfo metadata, FileInfo metadata is not persisted to datastore. (There is no blob key either, but you can create one later if needed by calling create_gs_key.) You must save the gs_object_name yourself in your upload handler or this data will be lost.
Sorry, this is a Python link, but it should be easy to find something similar in Java.
https://developers.google.com/appengine/docs/python/blobstore/fileinfoclass
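In Java, for example, the upload handler could persist the object name like this (a minimal sketch; the entity kind "UploadedFile" and the form field name "myFile" are hypothetical):
Map<String, List<FileInfo>> infos = blobstoreService.getFileInfos(req);
FileInfo info = infos.get("myFile").get(0); // "myFile" is the upload form field name (assumed)
Entity entity = new Entity("UploadedFile"); // hypothetical entity kind
entity.setProperty("gsObjectName", info.getGsObjectName()); // e.g. "/gs/bucket/object"
DatastoreServiceFactory.getDatastoreService().put(entity);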

Here is the Blobstore approach (sorry, this is Python, but I am sure you will find it quite similar in Java):
blob_reader = blobstore.BlobReader(blob_key)
if blob_reader:
    file_content = blob_reader.read()

Related

Output data appears in a random order when uploaded to google cloud storage

I've been using the google-dataflow-sdk to upload CSV files to Google Cloud Storage.
When I upload a file to a Google Cloud project, my data ends up in the output file in a random order. Each line of the CSV is correct, but the rows are all over the place.
The header of the CSV (i.e. attribute, attribute, attribute) lands on some other line every time, never at the top where it should be. I stress again, the data in each column is fine; it is just the rows that are randomly positioned.
Here is the code which reads the data initially:
PCollection<String> csvData = pipeline.apply(TextIO.Read.named("ReadItems")
        .from(filename));
And this is the code that writes to the Google Cloud project:
csvData.apply(TextIO.Write.named("WriteToCloud")
        .to("gs://dbm-poc/" + partnerId + "/" + dateOfReport + modifiedFileName)
        .withSuffix(".csv"));
Thanks for any help.
Firstly, to fix your header, use:
public static TextIO.Write.Bound<String> withHeader(@Nullable String header)
https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Write#withHeader-java.lang.String-
For example:
csvData.apply(TextIO.Write.named("WriteToCloud")
        .to("gs://dbm-poc/" + partnerId + "/" + dateOfReport + modifiedFileName)
        .withHeader("attribute,attribute,attribute")
        .withSuffix(".csv"));
Secondly, Dataflow does not currently support ordered/sorted writing to sinks. This is most likely due to its distributed/parallel architecture. You could write your own custom Sink if you really wanted to. See this similar question for more details.
Whilst I agree that the answer provided by Graham Polley is correct, I managed to find a much simpler way to get the data written in an ordered way.
I instead used the google cloud storage library to store the files I would need onto the cloud, like so:
public static String writeFile(byte[] content, String filename, String partnerId, String dateOfReport) {
    Storage storage = StorageOptions.defaultInstance().service();
    BlobId blobId = BlobId.of("dbm-poc", partnerId + "/" + dateOfReport + "-" + filename + ".csv");
    BlobInfo blobInfo = BlobInfo.builder(blobId).contentType("binary/octet-stream").build();
    storage.create(blobInfo, content);
    return filename;
}

public static byte[] readFile(String filename) throws IOException {
    return Files.readAllBytes(Paths.get(filename));
}
Using these two methods in conjunction with each other, I was not only able to upload the files to the bucket I wanted without losing any of the content's ordering, but I was also able to change the format of the uploaded files from text to binary/octet-stream, which means they can be accessed and downloaded.
This method also seems to remove the need to have a pipeline to upload data.
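For example, a hypothetical usage that reads a locally generated CSV and pushes it to the bucket (the path and arguments are placeholders):
byte[] csv = readFile("/tmp/report.csv"); // hypothetical local path
writeFile(csv, "report", partnerId, dateOfReport);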

How to update the content of a file in Google Drive?

I am trying to update the content of a Google Doc file with the content of another Google Doc file. The reason I don't use the copy method of the API is because that creates another file with another ID. My goal is to keep the current ID of the file. This is a code snippet which unfortunately does nothing:
com.google.api.services.drive.Drive.Files.Get getDraft = service.files().get(draftID);
File draft = driveManager.getFileBackoffExponential(getDraft);
com.google.api.services.drive.Drive.Files.Update updatePublished = service.files().update(publishedID, draft);
driveManager.updateFileBackoffExponential(updatePublished);
The two backoffExponential functions just launch the execute method on the object.
Googling around I found out that the update method offers another constructor:
public Update update(java.lang.String fileId, com.google.api.services.drive.model.File content, com.google.api.client.http.AbstractInputStreamContent mediaContent)
Thing is, I have no idea how to retrieve the mediaContent of a Google file such as a Google Doc.
The last resort could be a Google Apps Script but I'd rather avoid that since it's awfully slow and unreliable.
Thank you.
EDIT: I am using Drive API v3.
Try the Google Drive REST update.
Updates a file's metadata and/or content with patch semantics.
This method supports an /upload URI and accepts uploaded media with the following characteristics:
Maximum file size: 5120GB. Accepted Media MIME types: */*
To download a Google file in a format that's usable, you need to specify the MIME type when exporting. Since you're working with Google Docs, try application/vnd.openxmlformats-officedocument.wordprocessingml.document (for Sheets it would be application/vnd.openxmlformats-officedocument.spreadsheetml.sheet). See the Download files documentation for more info.
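A hedged sketch of that flow with the Drive API v3 Java client: export the draft Doc's content, then update the published file in place so it keeps its ID (the variable names follow the question; the export MIME type and the rest are assumptions, not tested):
import com.google.api.client.http.ByteArrayContent;
import com.google.api.services.drive.model.File;
import java.io.ByteArrayOutputStream;

ByteArrayOutputStream out = new ByteArrayOutputStream();
service.files().export(draftID, "application/vnd.openxmlformats-officedocument.wordprocessingml.document")
        .executeMediaAndDownloadTo(out); // download the draft's content
ByteArrayContent mediaContent = new ByteArrayContent(
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document", out.toByteArray());
File metadata = new File(); // leave the metadata untouched; only the content changes
service.files().update(publishedID, metadata, mediaContent).execute(); // same ID, new content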

How to read the contents of an uploaded blob?

I'm using the Blobstore to upload a simple text file following this doc: https://cloud.google.com/appengine/docs/java/blobstore/#Java_Uploading_a_blob. I understand from the docs how to save and serve the blob to users, but I don't understand how my servlet that handles the file upload can actually read the contents of the text file.
I found the answer. This is the code:
Map<String, List<FileInfo>> infos = blobstoreService.getFileInfos(request);
Long fileSize = infos.get("myFile").get(0).getSize();
Map<String, List<BlobKey>> blobKeys = blobstoreService.getUploads(request);
// The end index is inclusive, hence fileSize - 1. Note that fetchData is
// capped at BlobstoreService.MAX_BLOB_FETCH_SIZE bytes per call.
byte[] fileBytes =
        blobstoreService.fetchData(blobKeys.get("myFile").get(0), 0, fileSize - 1);
String input = new String(fileBytes);
In Python there is the BlobReader class to help you do this. (https://cloud.google.com/appengine/docs/python/blobstore/blobreaderclass)
It seems like you are using Java, though, and there does not seem to be an equivalent class. What I would do is use GCS as the backing for your Blobstore (https://cloud.google.com/appengine/docs/java/blobstore/#Java_Using_the_Blobstore_API_with_Google_Cloud_Storage). This way the files uploaded to the Blobstore will be accessible in GCS.
You can then read the file using the GCS client library for Java.
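A minimal sketch of that read, assuming you stored the bucket and object name of the uploaded file (the names below are placeholders) and using the com.google.appengine.tools.cloudstorage classes:
GcsService gcsService = GcsServiceFactory.createGcsService();
GcsInputChannel channel = gcsService.openReadChannel(new GcsFilename("my-bucket", "my-object"), 0);
try (BufferedReader reader = new BufferedReader(Channels.newReader(channel, "UTF-8"))) {
    String line;
    while ((line = reader.readLine()) != null) {
        // process each line of the uploaded text file
    }
}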

Get Google Cloud Storage File from ObjectName

I'm migrating my GAE app from the deprecated Files API to the Google Cloud Storage client library.
I used to persist the BlobKey, but since there is only partial support for it (as specified here), from now on I'll have to persist the object name.
Unfortunately, the object name that comes back from GCS looks more or less like this:
/gs/bucketname/819892hjd81dh19gf872g8211
As you can see, it also contains the bucket name.
Here's the issue: every time I need to get the file for further processing (or to serve it in a servlet), I need to create an instance of GcsFilename(bucketName, objectName), which gives me something like
/bucketName/gs/bucketName/akahsdjahslagfasgfjkasd
which (of course) doesn't work.
So, my question is: how can I generate a GcsFilename from the objectName?
UPDATE
I tried using the objectName as a BlobKey, but it just doesn't work :(
InputStream is = new BlobstoreInputStream(blobstoreService.createGsBlobKey("/gs/bucketName/akahsdjahslagfasgfjkasd"));
I got the usual error:
BlobstoreInputStream received an invalid blob key
How do I get the file using the object name?
If you have persisted and retrieved, for example, a string String objname holding "/gs/bucketname/819892hjd81dh19gf872g8211", you could split it on "/" (String[] pieces = objname.split("/")) and use the pieces appropriately in the call to GcsFilename, as in the sketch below.
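A minimal sketch of that split (using a limit of 4 so an object name that itself contains "/" stays intact):
String objname = "/gs/bucketname/819892hjd81dh19gf872g8211";
String[] pieces = objname.split("/", 4); // ["", "gs", "bucketname", "819892hjd81dh19gf872g8211"]
GcsFilename gcsFilename = new GcsFilename(pieces[2], pieces[3]); // bucket name, object name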

Create Empty CloudBlockBlob in Azure

I'm hoping the answer to this question is quite simple, but I can't get it working after looking at the Azure Java API documentation.
I am trying to create an empty CloudBlockBlob, which will have blocks uploaded to it at a later point. I have successfully uploaded blocks before, when the blob is created as the first block is uploaded, but I get nothing other than "the specified blob does not exist" when I try to create a new blob without any data and then access it. I require this because in my service, a call is first made to create the new blob in Azure, and later calls are used to upload blocks (at which point a check is made to see if the blob exists). Is it possible to create an empty blob in Azure and upload data to it later? What have I missed?
I've not worked with the Java SDK, so I may be wrong, but I tried creating an empty blob using C# code (storage client library 2.0), and uploading an empty input stream creates an empty blob of zero bytes. I did something like the following:
CloudBlockBlob emptyBlob = blobContainer.GetBlockBlobReference("emptyblob.txt");
using (MemoryStream ms = new MemoryStream())
{
    emptyBlob.UploadFromStream(ms); // Empty memory stream. Will create an empty blob.
}
I did look at the Azure SDK for Java source code on GitHub here: https://github.com/WindowsAzure/azure-sdk-for-java/blob/master/microsoft-azure-api/src/main/java/com/microsoft/windowsazure/services/blob/client/CloudBlockBlob.java and found an "upload" function where you can specify an input stream. Try it out and see if it works for you; a sketch follows below.
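A hedged Java equivalent of the C# snippet above, using CloudBlockBlob.upload(InputStream, long) from that SDK (untested; the container reference and blob name are placeholders):
CloudBlockBlob emptyBlob = blobContainer.getBlockBlobReference("emptyblob.txt");
emptyBlob.upload(new ByteArrayInputStream(new byte[0]), 0); // zero-length stream should create an empty blob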
