My idea is to upload an int array from a Java servlet running on an AWS EC2 micro instance. As I understand it, I would have to convert my int array to a Java object file first and then upload that file into my bucket, but is there a way to do this "on the fly" without first creating a local file?
If I did have to create a local file first, what pathname would it have?
It will look like this:
public void arrayToS3(String bucket, String pathInS3, JSONArray array) {
    // Serialize the JSON array to bytes in memory -- no temporary file is needed
    byte[] dataInMemory = array.toString().getBytes(StandardCharsets.UTF_8);
    ObjectMetadata metadata = new ObjectMetadata();
    metadata.setContentLength(dataInMemory.length); // lets the SDK stream without re-buffering the payload
    s3client.putObject(bucket, pathInS3, new ByteArrayInputStream(dataInMemory), metadata);
}
Just convert anything into an InputStream. For example, the arrayToS3 method converts the JSONArray to a String, converts the String to a byte[], and finally wraps the byte[] in an InputStream.
Everything stays in memory. If your data is not very large, this is a simple way to do it. If your data is bigger than the heap the JVM is allowed to use, an OutOfMemoryError will hit you.
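To answer the original question directly: no local file (and therefore no pathname) is needed at all. Below is a minimal sketch for the int array case, assuming the AWS SDK for Java v1 and an existing AmazonS3 client; the bucket name and key are placeholders.
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.ObjectMetadata;

public class IntArrayUploader {
    // Uploads an int[] as a small JSON document straight from memory.
    public static void upload(AmazonS3 s3client, int[] values) {
        byte[] bytes = Arrays.toString(values).getBytes(StandardCharsets.UTF_8); // e.g. "[1, 2, 3]"
        ObjectMetadata metadata = new ObjectMetadata();
        metadata.setContentLength(bytes.length);
        metadata.setContentType("application/json");
        s3client.putObject("my-bucket", "data/numbers.json",
                new ByteArrayInputStream(bytes), metadata);
    }
}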
I'm passing an array of arrays to a Java method, and I need to write that data to a new file (which will be loaded into an S3 bucket).
How do I do this? I haven't been able to find an example of it.
Also, I'm fairly sure "object" is not the correct data type for this attribute, and Array doesn't seem to be the correct one either.
Java method -
public void uploadStreamToS3Bucket(String[][] locations) {
    try {
        AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
                .withRegion(String.valueOf(awsRegion))
                .build();
        String fileName = connectionRequestRepository.findStream() + ".json";
        String bucketName = "downloadable-cases";
        File locationData = new File(?????) // Convert locations attribute to a file and load it to putObject
        s3Client.putObject(new PutObjectRequest(bucketName, fileName, locationData));
    } catch (AmazonServiceException ex) {
        System.out.println("Error: " + ex.getMessage());
    }
}
You're trying to use PutObjectRequest(String,String,File)
but you don't have a file. So you can either:
Write your object to a file and then pass that file
or
Use the PutObjectRequest(String,String,InputStream,ObjectMetadata) version instead.
The latter is better, as you save the intermediate step.
As for how to write an object to a stream, check this: How can I convert an Object to InputStream
Bear in mind that to read it back you have to use the same format.
It might be worth thinking about what format you want to save your information in, because it might need to be read by another program, or maybe by a human directly from the bucket, and some formats / serializers are easier to read (JSON, for instance) while others are more efficient (a serializer that takes less space).
As for the array-of-arrays type, you can use the [][] syntax. For instance, an array of arrays of Strings would be:
String [][] arrayOfStringArrays;
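To make that concrete, here is a minimal sketch of the InputStream overload for the code in the question, reusing its names (s3Client, bucketName, fileName, locations) and assuming Jackson is on the classpath for the JSON serialization; swap in whatever serializer you prefer, and note that exception handling is omitted.
// Serialize the String[][] to JSON in memory and upload it without a temp file.
byte[] json = new com.fasterxml.jackson.databind.ObjectMapper().writeValueAsBytes(locations);

ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentLength(json.length);
metadata.setContentType("application/json");

s3Client.putObject(new PutObjectRequest(bucketName, fileName,
        new java.io.ByteArrayInputStream(json), metadata));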
I hope this helps.
I'm currently getting a ResponseInputStream<GetObjectResponse> from the S3Client (SDK 2), reading it into a byte array, and opening two ByteArrayInputStreams to pass to Apache Tika and ImageIO.read.
Tika detects the MIME type, and BufferedImage is used to get the height and width. Neither operation needs to read the whole file (at least not for all image types), but reading into a byte array requires consuming the whole file.
Now how could I open two streams and just discard them when I'm done? Is the only way to perform two getObject calls to S3? Mark and reset aren't supported by the SDK.
One possible way: if you add the metadata info to the request while uploading the image, then later you only need to call getObjectMetadata, and you can get the information you need without retrieving the whole object again.
// Upload a plain string as a new object.
s3Client.putObject(bucketName, stringObjKeyName, "Uploaded String Object");
// Upload a file as a new object with ContentType and title specified.
PutObjectRequest request = new PutObjectRequest(bucketName, fileObjKeyName, new File(fileName));
ObjectMetadata metadata = new ObjectMetadata();
metadata.setContentType("text/plain");
metadata.addUserMetadata("title", "someTitle");
request.setMetadata(metadata);
s3Client.putObject(request);
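For the use case in the question, that means computing the width, height, and MIME type once at upload time and storing them as user metadata; afterwards only the metadata needs to be fetched. A minimal sketch of the read side, assuming SDK v1 as in the snippet above and hypothetical "width"/"height" keys that you added yourself at upload (with SDK v2 the equivalent is a HeadObject request):
// Read back the metadata without downloading the object body.
ObjectMetadata meta = s3Client.getObjectMetadata(bucketName, fileObjKeyName);
String mimeType = meta.getContentType();
String width = meta.getUserMetadata().get("width");   // null unless stored at upload time
String height = meta.getUserMetadata().get("height"); // null unless stored at upload time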
I am trying to read a big compressed AWS S3 object (gz). I don't want to read the whole object; I want to read it in parts so that I can process the uncompressed data in parallel.
I am reading it with a GetObjectRequest with the "Range" header, where I set a byte range.
However, when I give a byte range somewhere in the middle, such as (100, 200), it fails with "Not in GZIP format".
The reason for the failure is that the AWS request returns a stream, but parsing it with GZIPInputStream fails because GZIPInputStream expects the stream to start with the gzip magic number (GZIP_MAGIC = 0x8b1f), which is not present at an arbitrary offset.
GetObjectRequest rangeObjectRequest = new GetObjectRequest(<<Bucket>>, <<Key>>).withRange(100, 200);
S3Object object = s3Client.getObject(rangeObjectRequest);
S3ObjectInputStream rawData = object.getObjectContent();
InputStream data = new GZIPInputStream(rawData);
Can anyone suggest the right approach?
GZIP is a compression format in which each byte in the file depends on all of the bytes that precede it. Which means that you can't pick an arbitrary byte range out of the file and make sense of it.
If you need to read byte ranges, you'll need to store it uncompressed.
You could also create your own file storage format that stores chunks of the file as separately-compressed blocks. You could do this using the ZIP format, where each file in the archive represents a specific block size. But you'd need to implement your own ZIP directory reader to make that work.
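As a concrete variant of that idea, here is a sketch that uses plain gzip members instead of ZIP: each chunk is compressed on its own, an index of (offset, length) pairs is recorded, and a single block can later be range-read and decompressed independently. Everything here (class names, the in-memory index, SDK v1 client) is an assumption to illustrate the approach; in practice the index would need to be stored somewhere, e.g. as a separate S3 object.
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.GetObjectRequest;

public class BlockGzip {
    // One entry of the block index: where a block starts and how long it is (compressed).
    public static class Block {
        final long offset;
        final int length;
        Block(long offset, int length) { this.offset = offset; this.length = length; }
    }

    // Compress each chunk as its own gzip member; concatenating them is still a valid
    // gzip file, but every block can also be decompressed on its own.
    public static List<Block> compress(List<byte[]> chunks, ByteArrayOutputStream out) throws IOException {
        List<Block> index = new ArrayList<>();
        for (byte[] chunk : chunks) {
            byte[] compressed = gzip(chunk);
            index.add(new Block(out.size(), compressed.length));
            out.write(compressed);
        }
        return index; // persist this alongside the object, e.g. under a separate S3 key
    }

    private static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(data);
        }
        return bos.toByteArray();
    }

    // Fetch exactly one block via a Range request and decompress it. This works because
    // the range starts at a gzip member boundary, so GZIPInputStream sees a valid header.
    public static byte[] readBlock(AmazonS3 s3, String bucket, String key, Block b) throws IOException {
        GetObjectRequest req = new GetObjectRequest(bucket, key)
                .withRange(b.offset, b.offset + b.length - 1);
        try (InputStream in = new GZIPInputStream(s3.getObject(req).getObjectContent())) {
            return in.readAllBytes(); // Java 9+
        }
    }
}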
I am uploading an image to HBase using a Java program. After retrieving the image, I found that the file size had increased and most of the Exif and meta data was lost
(GPS location data, camera details, etc.).
Code:
public ArrayList<Object> uploadImagesToHbase(MultipartFile uploadedFileRef) {
    byte[] bytes = uploadedFileRef.getBytes();
    ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
    ImageIO.write(image, "jpg", outputStream); // 'image' (a BufferedImage) is not defined in this snippet
    HBaseAdmin admin = new HBaseAdmin(configuration);
    HTable table = new HTable(configuration, "sample");
    Put image = new Put(Bytes.toBytes("1"));
    image.add(Bytes.toBytes("DataColumn"), Bytes.toBytes(DataQualifier), bytes);
    table.put(image);
How can I store and retrieve an image without any change / loss?
Please try using SerializationUtils from Apache Commons Lang.
Below are the methods:
static Object clone(Serializable object) //Deep clone an Object using serialization.
static Object deserialize(byte[] objectData) //Deserializes a single Object from an array of bytes.
static Object deserialize(InputStream inputStream) //Deserializes an Object from the specified stream.
static byte[] serialize(Serializable obj) //Serializes an Object to a byte array for storage/serialization.
static void serialize(Serializable obj, OutputStream outputStream) //Serializes an Object to the specified stream.
While storing into HBase, you can store the byte[] returned from serialize.
While getting it back, you can cast the deserialized Object to the corresponding type (for example a File object) and get it back.
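A minimal sketch of that round trip, assuming commons-lang3 on the classpath; the Payload class and its fields are just an example, and the HBase calls are left out:
import java.io.Serializable;
import java.util.Arrays;
import org.apache.commons.lang3.SerializationUtils;

public class SerializationRoundTrip {
    static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        byte[] imageBytes;
        String title;
    }

    public static void main(String[] args) {
        Payload payload = new Payload();
        payload.imageBytes = new byte[] {1, 2, 3};
        payload.title = "example";

        byte[] stored = SerializationUtils.serialize(payload);     // this byte[] is what would go into HBase
        Payload restored = SerializationUtils.deserialize(stored); // read it back and get the typed object

        System.out.println(restored.title + " " + Arrays.toString(restored.imageBytes));
    }
}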
Most likely you are just over-complicating things. :-)
The reason why you are losing the Exif and other metadata is that the ImageIO convenience methods ImageIO.read(...) and ImageIO.write(...) do not preserve metadata. The good news is, they are not needed.
As you seem to already have the image data from the MultipartFile, you should simply store that data (the byte array) in the database, and you will store exactly what the user uploaded. No difference in file size, and metadata will be untouched.
Your code above doesn't compile for me, and I'm no HBase expert, so I'll just leave that part out (as you have already been able to store an image and observe the size/quality difference and metadata loss, I assume you know how to do that :-) ). But here are the basics:
public ArrayList<Object> uploadImagesToHbase(MultipartFile uploadedFileRef) {
byte[] bytes = uploadedFileRef.getBytes();
// Store the above "bytes" byte array in HBase *as is* (no ImageIO)
}
I'm planning to use SheetJS with Rhino. SheetJS takes a binary object (Blob, if I'm correct) as its input, so I need to read a file from the system using standard Java I/O methods and store it into a blob before passing it to SheetJS. E.g.:
var XLDataWorkBook = XLSX.read(blobInput, {type : "binary"});
So how can I create a Blob (or the appropriate type) from a binary file in Java in order to pass it in?
I guess I can't pass streams, because XLSX presumably needs a completely created object to process.
I found the answer to this by myself; I was able to get it done this way.
Read the file with an InputStream and then write it to a ByteArrayOutputStream, like below.
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
...
buffer.write(bytes, 0, len);
Then create a byte array from it.
byte[] byteArray = buffer.toByteArray();
Finally, I converted it to a Base64 String (which is also applicable in my case) using the Base64.encodeBase64String() method from the org.apache.commons.codec.binary package, so I can pass the Base64 String as a method parameter.
If you need it, there are a lot of libraries (3rd-party and built-in) available for Base64-to-Blob conversion as well.
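Put together, a self-contained version of those steps might look like the sketch below; it assumes commons-codec on the classpath and uses a placeholder file name. On Java 8+, java.util.Base64.getEncoder().encodeToString(byteArray) would work without the extra dependency.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.commons.codec.binary.Base64;

public class FileToBase64 {
    // Read the whole file into a byte array and encode it as a Base64 String.
    public static String encode(String path) throws IOException {
        byte[] byteArray = Files.readAllBytes(Paths.get(path)); // replaces the manual stream copy above
        return Base64.encodeBase64String(byteArray);            // pass this String on to SheetJS
    }

    public static void main(String[] args) throws IOException {
        String encoded = encode("workbook.xlsx"); // placeholder file name
        System.out.println(encoded.length() + " Base64 characters");
    }
}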