I'm using the AWS SDK 2.0 and trying to putObject into a bucket.
From a different API I'm receiving an InputStream which holds a file (CSV or plain text):
InputStream stream = otherApi.get();
S3Client s3 = S3Client.builder().build();
s3.putObject(PutObjectRequest.builder(), RequestBody ? )
RequestBody has multiple useful methods, including RequestBody.fromInputStream, but that requires providing a contentLength, which I don't know. Files could be 1 MB or even 20 MB.
Has anyone faced this problem while using the new API version?
The old 1.x did not require knowledge of the contentLength.
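One workaround sketch, assuming it is acceptable to buffer a file of this size (up to about 20 MB) in memory: read the stream into a byte array so the exact length is known, then use RequestBody.fromBytes. The bucket and key names below are only placeholders.

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import java.io.IOException;
import java.io.InputStream;

public class StreamUpload {
    public static void upload(S3Client s3, InputStream stream) throws IOException {
        // Buffer the stream so the SDK gets an exact content length (Java 9+)
        byte[] bytes = stream.readAllBytes();

        PutObjectRequest request = PutObjectRequest.builder()
                .bucket("my-bucket")   // placeholder
                .key("my-file.csv")    // placeholder
                .build();

        // fromBytes derives the content length from the array
        s3.putObject(request, RequestBody.fromBytes(bytes));
    }
}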
In the Microsoft Graph REST API beta documentation, in the section Get chatMessageHostedContent, there is a Java example for getting hosted content bytes for an image:
InputStream stream = graphClient
.chats("19:2da4c29f6d7041eca70b638b43d45437#thread.v2")
.messages("1615971548136") .hostedContents("aWQ9eF8wLXd1cy1kOS1lNTRmNjM1NWYxYmJkNGQ3ZTNmNGJhZmU4NTI5MTBmNix0eXBlPTEsdXJsPWh0dHBzOi8vdXMtYXBpLmFzbS5za3lwZS5jb20vdjEvb2JqZWN0cy8wLXd1cy1kOS1lNTRmNjM1NWYxYmJkNGQ3ZTNmNGJhZmU4NTI5MTBmNi92aWV3cy9pbWdv")
.content()
.buildRequest()
.get();
... but using the latest tag of microsoftgraph/msgraph-beta-sdk-java (0.9.0-20210615.3), this example doesn't compile because the content() method in ChatMessageHostedContentRequestBuilder cannot be resolved.
With that in mind, my question is: what is the official way of downloading hosted content bytes?
A related question with some more details is also present on GitHub.
It looks like this will be fixed in the future, but for the time being this workaround should do it:
String valueUrl = graphClient
.chats(chatId)
.messages(messageId)
.hostedContents(hostedContentId)
.getRequestUrlWithAdditionalSegment("$value");
InputStream stream = new CustomRequestBuilder<>(valueUrl, graphClient, null, InputStream.class)
        .buildRequest()
        .get();
This link explains how to use the REST API to upload an attachment.
But I want to upload an attachment with the Java client...
I assume the following classes are relevant (though I may be wrong)...
org.elasticsearch.ingest.IngestService
org.elasticsearch.ingest.PipelineStore
I realize that I can just fall back to the REST interface but I'd rather try and use the native client first...
Just send a Base64-encoded PDF in a field, like:
String base64;
try (InputStream is = YourClass.class.getResourceAsStream(pathToYourFile)) {
byte[] bytes = IOUtils.toByteArray(is);
base64 = Base64.getEncoder().encodeToString(bytes);
}
IndexRequest indexRequest = new IndexRequest("index", "type", "id")
.setPipeline("foo")
.source(
jsonBuilder().startObject()
.field("field", base64)
.endObject()
);
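The foo pipeline referenced above has to exist and must use the attachment processor. Here is a minimal sketch of creating it, assuming a RestHighLevelClient named client is available (the pipeline id foo and the field name field simply match the snippet above):

import org.elasticsearch.action.ingest.PutPipelineRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.bytes.BytesArray;
import org.elasticsearch.common.xcontent.XContentType;

// Pipeline that extracts text and metadata from the Base64-encoded "field"
String pipelineJson =
        "{ \"description\": \"Extract attachment information\","
      + "  \"processors\": [ { \"attachment\": { \"field\": \"field\" } } ] }";

PutPipelineRequest pipelineRequest =
        new PutPipelineRequest("foo", new BytesArray(pipelineJson), XContentType.JSON);

client.ingest().putPipeline(pipelineRequest, RequestOptions.DEFAULT);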
In case you are not aware of it, I'm also linking to the FSCrawler project, in case it already does what you want.
Here are four options that you can use to index PDFs into Elasticsearch:
Ingest Attachment Plugin
Apache Tika
FsCrawler
Ambar
Pros and cons are described in this post.
I have a java application that needs to do fast and reliable downloads from Amazon's S3. Ideally, I'd use something like the AWS SDK's TransferManager ( http://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/services/s3/transfer/TransferManager.html ), except I'd like to process the data in a streaming fashion, without having to stage all the downloaded data on local disk.
Ideally, the library would have an interface similar to AmazonS3#getObject(), but the implementation would be faster and more robust. Even better, the library would support pre-fetching for multiple S3 objects: I could give it a list of objects that I want to download eventually, then consume a sequence of streams for each object quickly. It's ok if the library has to use a lot of RAM to do the pre-fetching.
Does anybody know of a library that has some/all of these features?
I would recommend using minio-java:
Java Library for Amazon S3 Compatible Cloud Storage
io.minio.MinioClient.getObject returns an InputStream (example), and you can make multiple getObject calls, with each call returning its own InputStream:
MinioClient s3Client = new MinioClient("https://s3.amazonaws.com", "YOUR-ACCESSKEYID", "YOUR-SECRETACCESSKEY");
InputStream stream1 = s3Client.getObject("my-bucketname", "my-objectname1");
InputStream stream2 = s3Client.getObject("my-bucketname", "my-objectname2");
Here, the streams are not pre-fetched. If pre-fetching is a hard requirement, you could use another variant of getObject:
public void getObject(String bucketName, String objectName, String fileName)
The advantage of using this method is that it resumes a previous getObject if one exists.
MinioClient s3Client = new MinioClient("https://s3.amazonaws.com", "YOUR-ACCESSKEYID", "YOUR-SECRETACCESSKEY");
s3Client.getObject("my-bucketname", "my-objectname1", "/mycachedir/my-objectname1");
s3Client.getObject("my-bucketname", "my-objectname2", "/mycachedir/my-objectname2");
I wrote a Google App Engine application that makes use of Blobstore to save programmatically generated data. To do so, I used the Files API, which unfortunately has been deprecated in favor of Google Cloud Storage. So I'm rewriting my helper class to work with GCS.
I'd like to keep the interface as similar as possible to what it was before, also because I persist BlobKeys in the Datastore to keep references to the files (and changing the model of a production application is always painful). When I save something to GCS, I retrieve a BlobKey with
BlobKey blobKey = blobstoreService.createGsBlobKey("/gs/" + fileName.getBucketName() + "/" + fileName.getObjectName());
as prescribed here, and I persist it in the Datastore.
So here's the question: the documentation tells me how to serve a GCS file with blobstoreService.serve(blobKey, resp); in a servlet response, but how can I retrieve the file content (as an InputStream, a byte array, or whatever) to use it in my code for further processing? In my current implementation I do that with a FileReadChannel reading from an AppEngineFile (both deprecated).
Here is the code to open a Google Cloud Storage object as an InputStream. Unfortunately, you have to use the bucket name and object name, not the blob key:
GcsFilename gcs_filename = new GcsFilename(bucket_name, object_name);
GcsService service = GcsServiceFactory.createGcsService();
ReadableByteChannel rbc = service.openReadChannel(gcs_filename, 0);
InputStream stream = Channels.newInputStream(rbc);
Given a blobKey, use the BlobstoreInputStream class to read the value from Blobstore, as described in the documentation:
BlobstoreInputStream in = new BlobstoreInputStream(blobKey);
You can get the Cloud Storage filename only in the upload handler (fileInfo.gs_object_name); you must store it in your database, because after that it is lost and it does not seem to be preserved in BlobInfo or other metadata structures.
Google says:
Unlike BlobInfo metadata FileInfo metadata is not persisted to datastore. (There is no blob key either, but you can create one later if needed by calling create_gs_key.) You must save the gs_object_name yourself in your upload handler or this data will be lost.
Sorry, this is a Python link, but it should be easy to find something similar in Java.
https://developers.google.com/appengine/docs/python/blobstore/fileinfoclass
Here is the Blobstore approach (sorry, this is for Python, but I am sure you will find it quite similar in Java):
blob_reader = blobstore.BlobReader(blob_key)
if blob_reader:
    file_content = blob_reader.read()
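For reference, a rough Java equivalent of the snippet above, built on the BlobstoreInputStream class mentioned earlier (the readBlob helper name is only for illustration):

import com.google.appengine.api.blobstore.BlobKey;
import com.google.appengine.api.blobstore.BlobstoreInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Reads the whole blob into memory, like blob_reader.read() in the Python snippet
static byte[] readBlob(BlobKey blobKey) throws IOException {
    try (InputStream in = new BlobstoreInputStream(blobKey);
         ByteArrayOutputStream out = new ByteArrayOutputStream()) {
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }
}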
I'm currently working on a project that is done in Java, on Google App Engine. I have over 2000 records.
App Engine does not allow files to be stored, so on-disk representation objects such as the File class cannot be used.
I want to write the data out to a few CSV files for the user to download.
How can I do this without using any File classes? I'm not very experienced in file handling, so I hope you can advise me.
Thanks.
Just generate the CSV in memory using a StringBuffer and then use StringBuffer.toString().getBytes() to get a byte array, which can then be sent to your output stream.
For instance if using a servlet in GAE:
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
StringBuffer buffer = new StringBuffer();
buffer.append("header1, header2, header3\n");
buffer.append("row1column1, row1column2, row1column3\n");
buffer.append("row2column1, row2column2, row2column3\n");
// Add more CSV data to the buffer
byte[] bytes = buffer.toString().getBytes();
// This will suggest a filename for the browser to use
resp.addHeader("Content-Disposition", "attachment; filename=\"myFile.csv\"");
resp.getOutputStream().write(bytes, 0, bytes.length);
}
More information about GAE Servlets
More information about Content-Disposition
You can store data in memory using byte arrays, strings and streams. For example:
ByteArrayOutputStream csv = new ByteArrayOutputStream();
PrintStream printer = new PrintStream(csv);
printer.println("a;b;c");
printer.println("1;2;3");
printer.close();
csv.close();
Then in your servlet you can serve csv.toByteArray() as a stream. An example is given here: Implementing a simple file download servlet.
You can use the OpenCSV library in Google App Engine.
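A minimal in-memory sketch with OpenCSV (the header and row values are placeholders); the resulting bytes can be written to the servlet response exactly as in the StringBuffer example above:

import com.opencsv.CSVWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;

// Build the CSV entirely in memory, then hand the bytes to the response output stream
StringWriter out = new StringWriter();
try (CSVWriter writer = new CSVWriter(out)) {
    writer.writeNext(new String[] {"header1", "header2", "header3"});
    writer.writeNext(new String[] {"row1column1", "row1column2", "row1column3"});
}
byte[] bytes = out.toString().getBytes(StandardCharsets.UTF_8);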