I'm using Blobstore to upload a simple text file, following this doc: https://cloud.google.com/appengine/docs/java/blobstore/#Java_Uploading_a_blob . I understand from the docs how to save and serve the blob to users, but I don't understand how my servlet that handles the file upload can actually read the contents of the text file.
I found the answer. This is the code:
Map<String, List<FileInfo>> infos = blobstoreService.getFileInfos(request);
long fileSize = infos.get("myFile").get(0).getSize();
Map<String, List<BlobKey>> blobKeys = blobstoreService.getUploads(request);
// fetchData's endIndex is inclusive, so the last byte is at fileSize - 1
byte[] fileBytes =
    blobstoreService.fetchData(blobKeys.get("myFile").get(0), 0, fileSize - 1);
String input = new String(fileBytes);
Note that fetchData returns at most BlobstoreService.MAX_BLOB_FETCH_SIZE bytes per call, so larger files must be read in chunks.
In Python there is the BlobReader class to help you do this (https://cloud.google.com/appengine/docs/python/blobstore/blobreaderclass).
It seems like you are using Java, though, and there does not seem to be an equivalent class in Java. What I would do is use GCS as the backing for your Blobstore (https://cloud.google.com/appengine/docs/java/blobstore/#Java_Using_the_Blobstore_API_with_Google_Cloud_Storage). This way the files uploaded through the Blobstore API will be accessible in GCS.
You can then read the file using the GCS client library for Java, as sketched below.
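A minimal sketch of both steps, assuming a hypothetical bucket name "my-bucket" and an upload path "/upload"; objectName must be captured from the upload (see FileInfo.getGsObjectName() further down):
// 1. Point Blobstore uploads at a GCS bucket instead of Blobstore storage
BlobstoreService blobstoreService = BlobstoreServiceFactory.getBlobstoreService();
String uploadUrl = blobstoreService.createUploadUrl("/upload",
        UploadOptions.Builder.withGoogleStorageBucketName("my-bucket"));

// 2. Later, read the uploaded object back with the GCS client library
GcsService gcsService = GcsServiceFactory.createGcsService();
GcsFilename gcsFilename = new GcsFilename("my-bucket", objectName);
InputStream in = Channels.newInputStream(gcsService.openReadChannel(gcsFilename, 0));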
Related
I've been using the google-dataflow-sdk to upload CSV files to Google Cloud Storage.
When I upload a file to a Google Cloud project, my data appears in the file in a random order on the cloud. Each line of the CSV is correct, but the rows are all over the place.
The header of the CSV (i.e. attribute, attribute, attribute) is on another line every time and never at the top where it should be. I stress again, the data in each column is fine; it is just the rows that are randomly positioned.
Here is the code which reads the data initially:
PCollection<String> csvData = pipeline.apply(TextIO.Read.named("ReadItems")
        .from(filename));
And this is the code that writes to the Google Cloud project:
csvData.apply(TextIO.Write.named("WriteToCloud")
        .to("gs://dbm-poc/" + partnerId + "/" + dateOfReport + modifiedFileName)
        .withSuffix(".csv"));
Thanks for any help.
Firstly, to fix your header, use:
public static TextIO.Write.Bound<String> withHeader(@Nullable String header)
https://cloud.google.com/dataflow/java-sdk/JavaDoc/com/google/cloud/dataflow/sdk/io/TextIO.Write#withHeader-java.lang.String-
For example, chained onto your existing write:
csvData.apply(TextIO.Write.named("WriteToCloud")
        .to("gs://dbm-poc/" + partnerId + "/" + dateOfReport + modifiedFileName)
        .withHeader("<header>")
        .withSuffix(".csv"));
Secondly, Dataflow does not currently support ordered/sorted writing to sinks. This is most likely due to its distributed/parallel architecture. You could write your own custom Sink if you really wanted to. See the similar question here for more details.
Whilst I agree that the answer provided by Graham Polley is correct, I managed to find a much simpler way to get the data written in an ordered way.
I instead used the Google Cloud Storage client library to store the files I needed on the cloud, like so:
// Writes the given bytes to a CSV object in the dbm-poc bucket
public static String writeFile(byte[] content, String filename, String partnerId, String dateOfReport) {
    Storage storage = StorageOptions.defaultInstance().service();
    BlobId blobId = BlobId.of("dbm-poc", partnerId + "/" + dateOfReport + "-" + filename + ".csv");
    BlobInfo blobInfo = BlobInfo.builder(blobId).contentType("binary/octet-stream").build();
    storage.create(blobInfo, content);
    return filename;
}

// Reads a local file fully into memory
public static byte[] readFile(String filename) throws IOException {
    return Files.readAllBytes(Paths.get(filename));
}
Using these two methods in conjunction, I was not only able to upload the files to the bucket I wanted without losing any of the content's ordering, but I was also able to change the format of the uploaded files from text to binary/octet-stream, which means they can be accessed and downloaded.
This method also seems to remove the need for a pipeline to upload data.
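For example, a possible call site (the local path here is hypothetical):
byte[] content = readFile("/tmp/report.csv"); // hypothetical local source file
writeFile(content, "report", partnerId, dateOfReport);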
I wrote a Google App Engine application that makes use of Blobstore to save programmatically-generated data. To do so, I used the Files API, which unfortunately has been deprecated in favor of Google Cloud Storage. So I'm rewriting my helper class to work with GCS.
I'd like to keep the interface as similar as possible to what it was before, also because I persist BlobKeys in the Datastore to keep references to the files (and changing the model of a production application is always painful). When I save something to GCS, I retrieve a BlobKey with
BlobKey blobKey = blobstoreService.createGsBlobKey("/gs/" + fileName.getBucketName() + "/" + fileName.getObjectName());
as prescribed here, and I persist it in the Datastore.
So here's the question: the documentation tells me how to serve a GCS file with blobstoreService.serve(blobKey, resp); in a servlet response, BUT how can I retrieve the file content (as an InputStream, byte array, or whatever) to use it in my code for further processing? In my current implementation I do that with a FileReadChannel reading from an AppEngineFile (both deprecated).
Here is the code to open a Google Cloud Storage object as an InputStream. Unfortunately, you have to use the bucket name and object name, not the blob key:
GcsFilename gcsFilename = new GcsFilename(bucketName, objectName);
GcsService gcsService = GcsServiceFactory.createGcsService();
ReadableByteChannel readChannel = gcsService.openReadChannel(gcsFilename, 0);
InputStream stream = Channels.newInputStream(readChannel);
Given a blobKey, use the BlobstoreInputStream class to read the value from Blobstore, as described in the documentation:
BlobstoreInputStream in = new BlobstoreInputStream(blobKey);
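For example, a minimal sketch that drains the stream into a String (assuming the blob holds UTF-8 text):
BlobstoreInputStream in = new BlobstoreInputStream(blobKey);
ByteArrayOutputStream out = new ByteArrayOutputStream();
byte[] chunk = new byte[8192];
int n;
while ((n = in.read(chunk)) != -1) {
    out.write(chunk, 0, n);
}
in.close();
String content = new String(out.toByteArray(), "UTF-8");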
You can get the Cloud Storage file name only in the upload handler (fileInfo.gs_object_name) and must store it in your database yourself. After that it is lost, and it seems not to be preserved in BlobInfo or other metadata structures.
Google says:
Unlike BlobInfo metadata, FileInfo metadata is not persisted to datastore. (There is no blob key either, but you can create one later if needed by calling create_gs_key.) You must save the gs_object_name yourself in your upload handler or this data will be lost.
Sorry, this is a Python link, but it should be easy to find something similar in Java.
https://developers.google.com/appengine/docs/python/blobstore/fileinfoclass
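In Java the equivalent lives on FileInfo as getGsObjectName(). A rough sketch of saving it in the upload handler (the entity kind and property name are just illustrative):
Map<String, List<FileInfo>> infos = blobstoreService.getFileInfos(request);
FileInfo fileInfo = infos.get("myFile").get(0);
// Only available here in the upload handler; persist it or it is lost
String gsObjectName = fileInfo.getGsObjectName();
Entity meta = new Entity("UploadMeta"); // hypothetical kind
meta.setProperty("gsObjectName", gsObjectName);
DatastoreServiceFactory.getDatastoreService().put(meta);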
Here is the Blobstore approach (sorry, this is for Python, but I am sure you will find it quite similar in Java):
blob_reader = blobstore.BlobReader(blob_key)
if blob_reader:
    file_content = blob_reader.read()
My use case: I don't use the Blobstore for uploading and downloading files. I use the Blobstore to store very large strings my program is creating. By persisting the path where the blob is stored, I can later load the string again (see documentation).
My question: Is there an easier way to access the blob content without the need to store the path? BlobstoreService only allows me to serve it directly to the HttpServletResponse.
You only need to store a BlobKey - there is never a need to store a path (files or not).
To access contents of a blob:
BlobstoreService blobStoreService = BlobstoreServiceFactory.getBlobstoreService();
String myString = new String(
        blobStoreService.fetchData(blobKey, 0, BlobstoreService.MAX_BLOB_FETCH_SIZE - 1));
EDIT:
If you have a very long string, fetch the data from the blob in a loop (each fetchData call returns at most MAX_BLOB_FETCH_SIZE bytes) and assemble the byte arrays into a string in any standard way, as sketched below.
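A minimal sketch of such a loop, assuming the blob holds UTF-8 text; BlobInfoFactory is used here to look up the blob's total size:
BlobstoreService blobstore = BlobstoreServiceFactory.getBlobstoreService();
long size = new BlobInfoFactory().loadBlobInfo(blobKey).getSize();
ByteArrayOutputStream out = new ByteArrayOutputStream();
for (long start = 0; start < size; start += BlobstoreService.MAX_BLOB_FETCH_SIZE) {
    // endIndex is inclusive, so clamp it to the last byte of the blob
    long end = Math.min(start + BlobstoreService.MAX_BLOB_FETCH_SIZE - 1, size - 1);
    byte[] chunk = blobstore.fetchData(blobKey, start, end);
    out.write(chunk, 0, chunk.length);
}
String myString = new String(out.toByteArray(), "UTF-8");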
I guess when you said "persisting the path where the blob is stored", you mean BlobKey?
FileService allows you to directly access blob data:
// Get a file service
FileService fileService = FileServiceFactory.getFileService();
// Get a file backed by the blob
AppEngineFile file = fileService.getBlobFile(blobKey);
// Get a read channel
FileReadChannel readChannel = fileService.openReadChannel(file, false);
// Since you store a String, I guess you want to read it as such
BufferedReader reader = new BufferedReader(Channels.newReader(readChannel, "UTF-8"));
// Do this in a loop until all data is read
String line;
while ((line = reader.readLine()) != null) {
    // process the line
}
readChannel.close();
I'm currently working on a project that is done in Java, on Google App Engine. I have over 2000 records.
App Engine does not allow files to be written to disk, so on-disk representation objects, such as the File class, cannot be used.
I want to write the data out and export it to a few CSV files for the user to download.
How may I do this without using any File classes? I'm not very experienced in file handling, so I hope you guys can advise me.
Thanks.
Just generate the CSV in memory using a StringBuffer and then use StringBuffer.toString().getBytes() to get a byte array, which can then be sent to your output stream.
For instance, if using a servlet in GAE:
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    StringBuffer buffer = new StringBuffer();
    buffer.append("header1, header2, header3\n");
    buffer.append("row1column1, row1column2, row1column3\n");
    buffer.append("row2column1, row2column2, row2column3\n");
    // Add more CSV data to the buffer

    byte[] bytes = buffer.toString().getBytes();

    // Tell the browser what kind of data it is receiving
    resp.setContentType("text/csv");
    // This will suggest a filename for the browser to use
    resp.addHeader("Content-Disposition", "attachment; filename=\"myFile.csv\"");

    resp.getOutputStream().write(bytes, 0, bytes.length);
}
More information about GAE Servlets
More information about Content-Disposition
You can store data in memory using byte arrays, strings, and streams. For example:
ByteArrayOutputStream csv = new ByteArrayOutputStream();
PrintStream printer = new PrintStream(csv);
printer.println("a;b;c");
printer.println("1;2;3");
printer.close();
csv.close();
Then in your servlet you can serve csv.toByteArray() as a stream. An example is given here: Implementing a simple file download servlet.
You can use the OpenCSV library in Google App Engine; a sketch follows.
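A minimal sketch, assuming the classic au.com.bytecode.opencsv.CSVWriter API; it writes into an in-memory StringWriter, so no File is needed:
StringWriter out = new StringWriter();
CSVWriter writer = new CSVWriter(out);
writer.writeNext(new String[] {"header1", "header2", "header3"});
writer.writeNext(new String[] {"row1column1", "row1column2", "row1column3"});
writer.close();
String csv = out.toString(); // serve its bytes as in the servlet example above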
How can I read the content of a blob and write it to the GAE Datastore in Java?
Once you have the BlobKey for the blob you want to read, you can construct a BlobstoreInputStream:
BlobKey blobKey = ...;
InputStream is = new BlobstoreInputStream(blobKey);
You can then read the blob contents using any of the InputStream read methods.
You can use the FileService API to create/write/read files in the Blobstore. When you read the byte array from the file, you can easily add it as a property to a Datastore entity and save it.
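A minimal sketch of that approach (note that FileService is deprecated, and a Datastore entity is capped at 1 MB, so this only suits small blobs; the kind and property names are illustrative):
FileService fileService = FileServiceFactory.getFileService();
AppEngineFile file = fileService.getBlobFile(blobKey);
FileReadChannel channel = fileService.openReadChannel(file, false);
InputStream in = Channels.newInputStream(channel);

ByteArrayOutputStream buf = new ByteArrayOutputStream();
byte[] chunk = new byte[8192];
int n;
while ((n = in.read(chunk)) != -1) {
    buf.write(chunk, 0, n);
}
channel.close();

// Store the bytes on an entity as a datastore Blob property
Entity entity = new Entity("BlobContent"); // hypothetical kind
entity.setProperty("content", new Blob(buf.toByteArray()));
DatastoreServiceFactory.getDatastoreService().put(entity);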