I need to extract a bunch of zip files stored on S3, add them to a tar archive, and store that archive on S3. It is likely that the sum of the zip files will be greater than the 512 MB of local storage allowed for Lambda functions. I have a partial solution that gets the objects from S3, extracts them, and puts them into an S3 object without using the Lambda local storage.
Extract object Thread
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.S3Object;
import org.apache.commons.io.FilenameUtils;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ExtractObject implements Runnable {

    private final String objectName;
    private final String uuid;
    private final byte[] buffer = new byte[1024];

    public ExtractObject(String name, String uuid) {
        this.objectName = name;
        this.uuid = uuid;
    }

    @Override
    public void run() {
        final String srcBucket = "my-bucket-name";
        final AmazonS3 s3Client = new AmazonS3Client();
        try {
            S3Object s3Object = s3Client.getObject(new GetObjectRequest(srcBucket, objectName));
            ZipInputStream zis = new ZipInputStream(s3Object.getObjectContent());
            ZipEntry entry = zis.getNextEntry();
            while (entry != null) {
                String fileName = entry.getName();
                String mimeType = FileMimeType.fromExtension(FilenameUtils.getExtension(fileName)).mimeType();
                System.out.println("Extracting " + fileName + ", compressed: " + entry.getCompressedSize()
                        + " bytes, extracted: " + entry.getSize() + " bytes, mimetype: " + mimeType);
                ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
                int len;
                while ((len = zis.read(buffer)) > 0) {
                    outputStream.write(buffer, 0, len);
                }
                InputStream is = new ByteArrayInputStream(outputStream.toByteArray());
                ObjectMetadata meta = new ObjectMetadata();
                meta.setContentLength(outputStream.size());
                meta.setContentType(mimeType);
                System.out.println("##### " + srcBucket + ", " + FilenameUtils.getFullPath(objectName)
                        + "tmp" + File.separator + uuid + File.separator + fileName);
                // Add this to tar archive instead of putting back to s3
                s3Client.putObject(srcBucket, FilenameUtils.getFullPath(objectName) + "tmp" + File.separator
                        + uuid + File.separator + fileName, is, meta);
                is.close();
                outputStream.close();
                entry = zis.getNextEntry();
            }
            zis.closeEntry();
            zis.close();
        } catch (IOException ioe) {
            System.out.println(ioe.getMessage());
        }
    }
}
This runs for each object that needs to be extracted and saves each entry as an S3 object in the structure needed for the tar file; a hypothetical driver for it is sketched below.
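For concreteness, here is a minimal driver sketch (my own, not part of the original question) that runs one ExtractObject per zip key on a small thread pool; zipKeys and uuid are assumed inputs.

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExtractDriver {
    // Submits one ExtractObject task per zip key and waits for all of
    // them to finish before the tar step begins.
    public static void extractAll(List<String> zipKeys, String uuid) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Future<?>> futures = new ArrayList<>();
        for (String key : zipKeys) {
            futures.add(pool.submit(new ExtractObject(key, uuid)));
        }
        for (Future<?> f : futures) {
            f.get(); // rethrows if any extraction task failed
        }
        pool.shutdown();
    }
}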
I think what I need, instead of putting the object back to S3, is to keep it in memory, add it to a tar archive, and upload that; but after a lot of looking around and trial and error I have not created a successful tar file.
The main issue is that I can't use the tmp directory in Lambda.
Edit
Should I be creating the tar file as I go, instead of putting the objects to S3? (See the comment // Add this to tar archive instead of putting back to s3.)
If so, how do I create a tar stream without storing it locally?
EDIT 2: Attempt at tarring the files
ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(bucketName);
ListObjectsV2Result result;
ByteArrayOutputStream baos = new ByteArrayOutputStream();
TarArchiveOutputStream tarOut = new TarArchiveOutputStream(baos);
do {
    result = s3Client.listObjectsV2(req);
    for (S3ObjectSummary objectSummary : result.getObjectSummaries()) {
        if (objectSummary.getKey().startsWith("tmp/")) {
            System.out.printf(" - %s (size: %d)\n", objectSummary.getKey(), objectSummary.getSize());
            S3Object s3Object = s3Client.getObject(new GetObjectRequest(bucketName, objectSummary.getKey()));
            InputStream is = s3Object.getObjectContent();
            System.out.println("Pre Create entry");
            TarArchiveEntry archiveEntry = new TarArchiveEntry(IOUtils.toByteArray(is));
            // Getting the following exception on the line above:
            // IllegalArgumentException: Invalid byte 111 at offset 7 in ' positio' len=8
            System.out.println("Pre put entry");
            tarOut.putArchiveEntry(archiveEntry);
            System.out.println("Post put entry");
        }
    }
    String token = result.getNextContinuationToken();
    System.out.println("Next Continuation Token: " + token);
    req.setContinuationToken(token);
} while (result.isTruncated());
ObjectMetadata metadata = new ObjectMetadata();
InputStream is = new ByteArrayInputStream(baos.toByteArray());
s3Client.putObject(new PutObjectRequest(bucketName, bucketFolder + "tar-file", is, metadata));
I have found a solution, and it is very similar to my attempt in Edit 2 above. The exception there came from the TarArchiveEntry(byte[]) constructor, which parses the byte array as a raw tar header block rather than as file content; the entry has to be created from a path name instead, with the data written separately.
private final String bucketName = "bucket-name";
private final String bucketFolder = "tmp/";
private final String tarKey = "tar-dir/tared-file.tar";

private void createTar() throws IOException, ArchiveException {
    ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(bucketName);
    ListObjectsV2Result result;
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    TarArchiveOutputStream tarOut = new TarArchiveOutputStream(baos);
    do {
        result = s3Client.listObjectsV2(req);
        for (S3ObjectSummary objectSummary : result.getObjectSummaries()) {
            if (objectSummary.getKey().startsWith(bucketFolder)) {
                S3Object s3Object = s3Client.getObject(new GetObjectRequest(bucketName, objectSummary.getKey()));
                InputStream is = s3Object.getObjectContent();
                String s3Key = objectSummary.getKey();
                // Strip the leading "tmp/" folder so the tar paths start at the content root.
                String tarPath = s3Key.substring(s3Key.indexOf('/') + 1);
                byte[] ba = IOUtils.toByteArray(is);
                TarArchiveEntry archiveEntry = new TarArchiveEntry(tarPath);
                archiveEntry.setSize(ba.length);
                tarOut.putArchiveEntry(archiveEntry);
                tarOut.write(ba);
                tarOut.closeArchiveEntry();
            }
        }
        String token = result.getNextContinuationToken();
        System.out.println("Next Continuation Token: " + token);
        req.setContinuationToken(token);
    } while (result.isTruncated());
    // Finish the archive so the trailing tar blocks are written before upload.
    tarOut.finish();
    tarOut.close();
    ObjectMetadata metadata = new ObjectMetadata();
    metadata.setContentLength(baos.size());
    InputStream is = new ByteArrayInputStream(baos.toByteArray());
    s3Client.putObject(new PutObjectRequest(bucketName, tarKey, is, metadata));
}
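Note that createTar() still buffers the whole archive in a ByteArrayOutputStream, so the finished tar has to fit in memory. If that becomes a problem, the same loop can push the archive out in chunks with S3's low-level multipart upload API. The following is a minimal sketch of that variant (my own, not the code I actually ran), reusing the s3Client, bucketName, bucketFolder, and tarKey fields from above; 5 MiB is the S3 minimum part size.

import com.amazonaws.services.s3.model.*;
import org.apache.commons.compress.archivers.tar.TarArchiveEntry;
import org.apache.commons.compress.archivers.tar.TarArchiveOutputStream;
import org.apache.commons.io.IOUtils;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Streams the tar to S3 in parts, so only ~5 MiB of archive plus one
// extracted object are ever held in memory at once.
private void createTarMultipart() throws IOException {
    InitiateMultipartUploadResult init = s3Client.initiateMultipartUpload(
            new InitiateMultipartUploadRequest(bucketName, tarKey));
    List<PartETag> partETags = new ArrayList<>();
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    TarArchiveOutputStream tarOut = new TarArchiveOutputStream(buf);
    int partNumber = 1;

    ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(bucketName);
    ListObjectsV2Result result;
    do {
        result = s3Client.listObjectsV2(req);
        for (S3ObjectSummary summary : result.getObjectSummaries()) {
            if (!summary.getKey().startsWith(bucketFolder)) {
                continue;
            }
            String s3Key = summary.getKey();
            byte[] ba = IOUtils.toByteArray(s3Client.getObject(bucketName, s3Key).getObjectContent());
            TarArchiveEntry entry = new TarArchiveEntry(s3Key.substring(s3Key.indexOf('/') + 1));
            entry.setSize(ba.length);
            tarOut.putArchiveEntry(entry);
            tarOut.write(ba);
            tarOut.closeArchiveEntry();
            // Ship a part whenever at least 5 MiB has accumulated.
            if (buf.size() >= 5 * 1024 * 1024) {
                partETags.add(uploadPart(init.getUploadId(), partNumber++, buf.toByteArray(), false));
                buf.reset();
            }
        }
        req.setContinuationToken(result.getNextContinuationToken());
    } while (result.isTruncated());

    tarOut.finish();
    tarOut.close();
    // The final part is allowed to be smaller than 5 MiB.
    partETags.add(uploadPart(init.getUploadId(), partNumber, buf.toByteArray(), true));
    s3Client.completeMultipartUpload(new CompleteMultipartUploadRequest(
            bucketName, tarKey, init.getUploadId(), partETags));
}

private PartETag uploadPart(String uploadId, int partNumber, byte[] data, boolean last) {
    return s3Client.uploadPart(new UploadPartRequest()
            .withBucketName(bucketName)
            .withKey(tarKey)
            .withUploadId(uploadId)
            .withPartNumber(partNumber)
            .withInputStream(new ByteArrayInputStream(data))
            .withPartSize(data.length)
            .withLastPart(last)).getPartETag();
}

Each extracted object is still read fully into memory before it is written into the tar, matching the approach above; only the archive itself is no longer accumulated.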
Related
I am writing a REST API in a Groovy script that will receive a file upload from the client side.
The REST API receives the file via HttpServletRequest.
I am trying to get the file from the HttpServletRequest by reading its InputStream and writing it to a File in the proper folder.
My code is as below:
RestApiResponse doHandle(HttpServletRequest request, RestApiResponseBuilder apiResponseBuilder, RestAPIContext context) {
    InputStream inputStream = request.getInputStream()
    def file = new File(tempFolder + "//" + fileName)
    FileOutputStream outputStream = null
    try {
        outputStream = new FileOutputStream(file, false)
        int read;
        byte[] bytes = new byte[DEFAULT_BUFFER_SIZE];
        while ((read = inputStream.read(bytes)) != -1) {
            outputStream.write(bytes, 0, read);
        }
    }
    finally {
        if (outputStream != null) {
            outputStream.close();
        }
    }
    inputStream.close();
    // the rest of the code
}
The files are created, but all of them are corrupted.
When I try to open them with Notepad, all of them have something similar to the below at the beginning:
-----------------------------134303111730200325402357640857
Content-Disposition: form-data; name="pbUpload1"; filename="Book1.xlsx"
Content-Type: application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
Am I doing this wrong? How do I get the file correctly?
Found the solution with MultipartStream
import org.apache.commons.fileupload.MultipartStream
import org.apache.commons.io.FileUtils

InputStream inputStream = request.getInputStream()
//file << inputStream;
String fileName = "";
final String CD = "Content-Disposition: "
// The multipart boundary comes from the request's Content-Type header,
// e.g. "multipart/form-data; boundary=----...".
String contentType = request.getContentType()
byte[] boundary = contentType.substring(contentType.indexOf("boundary=") + "boundary=".length()).getBytes()
MultipartStream multipartStream = new MultipartStream(inputStream, boundary);
//Blocked the below line because it always returns false for some reason,
// but it should be used as stated in the documentation
//boolean nextPart = multipartStream.skipPreamble();
//Blocked the below line as in my case the part I need is the first part,
// or maybe I should use it and break after successfully getting the file name
//while(nextPart) {
String[] headers = multipartStream.readHeaders().split("\\r\\n")
ContentDisposition cd = null
for (String h in headers) {
    if (h.startsWith(CD)) {
        cd = new ContentDisposition(h.substring(CD.length()))
        fileName = cd.getParameter("filename")
    }
}
def file = new File(tempFolder + "//" + fileName)
ByteArrayOutputStream output = new ByteArrayOutputStream(1024)
try {
    multipartStream.readBodyData(output)
    FileUtils.writeByteArrayToFile(file, output.toByteArray());
}
finally {
    if (output != null) {
        output.flush()
        output.close()
    }
}
inputStream.close();
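For reference, the same Commons FileUpload library also offers a higher-level API that does the boundary and header handling itself. A minimal sketch (my assumption, not part of the solution above), with tempFolder as before:

import org.apache.commons.fileupload.FileItem
import org.apache.commons.fileupload.disk.DiskFileItemFactory
import org.apache.commons.fileupload.servlet.ServletFileUpload

// Parses every part of the multipart request; file parts are written out,
// ordinary form fields are skipped.
List<FileItem> items = new ServletFileUpload(new DiskFileItemFactory()).parseRequest(request)
for (FileItem item : items) {
    if (!item.isFormField()) {
        item.write(new File(tempFolder, item.getName()))
    }
}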
What I did was try to download a large zipped data file from an S3 bucket:
S3ObjectInputStream inputStream = s3object.getObjectContent();
File newFile = new File(zipFileTempLocation + File.separator + CommonConstant.FILE_NAME);
FileOutputStream fileOutputStream = new FileOutputStream(newFile);
GZIPInputStream gzipInputStream = new GZIPInputStream(inputStream);
LOGGER.info("starting to write {}", newFile.toPath());
byte[] buffer = new byte[5000];
int len;
while ((len = gzipInputStream.read(buffer)) > 0) {
    fileOutputStream.write(buffer, 0, len);
}
gzipInputStream.close();
fileOutputStream.close();
String newFileUrl = newFile.getAbsolutePath();
Path path = Paths.get(newFileUrl);
return Files.readAllBytes(path);
}
When I try to run my service, it fails with an out-of-heap-memory error. Can you help me with this?
You can have a look at this answer and stream it so it's not loaded into memory.
@GetMapping(value = "/downloadfile/**", produces = { MediaType.APPLICATION_OCTET_STREAM_VALUE })
public ResponseEntity<StreamingResponseBody> downloadFile(HttpServletRequest request) {
    // reads the content from the S3 bucket and returns an S3ObjectInputStream
    S3Object object = publishAmazonS3.getObject("12345bucket", "/logs/file1.log");
    S3ObjectInputStream finalObject = object.getObjectContent();
    // the body streams straight from S3 to the HTTP response, 1 KiB at a time
    final StreamingResponseBody body = outputStream -> {
        int numberOfBytesToWrite = 0;
        byte[] data = new byte[1024];
        while ((numberOfBytesToWrite = finalObject.read(data, 0, data.length)) != -1) {
            System.out.println("Writing some bytes..");
            outputStream.write(data, 0, numberOfBytesToWrite);
        }
        finalObject.close();
    };
    return new ResponseEntity<>(body, HttpStatus.OK);
}
I need to upload a large file to AWS S3 using multipart upload, using a stream instead of Lambda's /tmp. The file is uploaded, but not completely.
In my case the size of each file in the zip cannot be predicted; a file may go up to 1 GiB in size. So I used ZipInputStream to read from S3, and I want to upload each entry back to S3. Since I am working in Lambda, I cannot save the files in /tmp due to the large file size, so I tried to read and upload directly to S3 without saving locally, using S3 multipart upload.
But I faced an issue: the file is not written completely. I suspect that the file is overwritten every time. Please review my code and help.
public void zipAndUpload() {
    // outputFolder and filePaths are fields defined elsewhere in the class
    byte[] buffer = new byte[1024];
    try {
        File folder = new File(outputFolder);
        if (!folder.exists()) {
            folder.mkdir();
        }
        AmazonS3 s3Client = AmazonS3ClientBuilder.defaultClient();
        S3Object object = s3Client.getObject("mybucket.s3.com", "MyFilePath/MyZip.zip");
        TransferManager tm = TransferManagerBuilder.standard()
                .withS3Client(s3Client)
                .build();
        ZipInputStream zis = new ZipInputStream(object.getObjectContent());
        ZipEntry ze = zis.getNextEntry();
        while (ze != null) {
            String fileName = ze.getName();
            System.out.println("ZE " + ze + " : " + fileName);
            File newFile = new File(outputFolder + File.separator + fileName);
            if (ze.isDirectory()) {
                System.out.println("DIRECTORY" + newFile.mkdirs());
            } else {
                filePaths.add(newFile);
                int len;
                while ((len = zis.read(buffer)) > 0) {
                    ObjectMetadata meta = new ObjectMetadata();
                    meta.setContentLength(len);
                    InputStream targetStream = new ByteArrayInputStream(buffer);
                    PutObjectRequest request = new PutObjectRequest("mybucket.s3.com", fileName, targetStream, meta);
                    request.setGeneralProgressListener(new ProgressListener() {
                        public void progressChanged(ProgressEvent progressEvent) {
                            System.out.println("Transferred bytes: " + progressEvent.getBytesTransferred());
                        }
                    });
                    Upload upload = tm.upload(request);
                }
            }
            ze = zis.getNextEntry();
        }
        zis.closeEntry();
        zis.close();
        System.out.println("Done");
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
The problem is your inner while loop. Basically, you're reading 1024 bytes at a time from the ZipInputStream and uploading just those into S3.
Instead of streaming into S3, you overwrite the target key again and again and again.
The solution to this is a bit more complex because you don't have one stream per file, but one stream for the whole zip container.
This means you can't do something like the below, because the stream will be closed by AWS after the first upload is done:
// Not possible
PutObjectRequest request = new PutObjectRequest(targetBucket, name,
        zipInputStream, meta);
You have to write the ZipInputStream into a PipedOutputStream object, once per ZipEntry position.
Below is a working example:
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.PutObjectRequest;
import com.amazonaws.services.s3.model.S3Object;
import com.amazonaws.services.s3.transfer.TransferManager;
import com.amazonaws.services.s3.transfer.TransferManagerBuilder;

import java.io.*;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class Pipes {

    public static void main(String[] args) throws IOException {
        Regions clientRegion = Regions.DEFAULT;
        String sourceBucket = "<sourceBucket>";
        String key = "<sourceArchive.zip>";
        String targetBucket = "<targetBucket>";

        PipedOutputStream out = null;
        PipedInputStream in = null;
        S3Object s3Object = null;
        ZipInputStream zipInputStream = null;

        try {
            AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
                    .withRegion(clientRegion)
                    .withCredentials(new ProfileCredentialsProvider())
                    .build();
            TransferManager transferManager = TransferManagerBuilder.standard()
                    .withS3Client(s3Client)
                    .build();

            System.out.println("Downloading an object");
            s3Object = s3Client.getObject(new GetObjectRequest(sourceBucket, key));
            zipInputStream = new ZipInputStream(s3Object.getObjectContent());

            ZipEntry zipEntry;
            while (null != (zipEntry = zipInputStream.getNextEntry())) {
                long size = zipEntry.getSize();
                String name = zipEntry.getName();
                if (zipEntry.isDirectory()) {
                    System.out.println("Skipping directory " + name);
                    continue;
                }
                System.out.printf("Processing ZipEntry %s : %d bytes\n", name, size);

                // take a copy of the stream and re-expose it as an InputStream
                out = new PipedOutputStream();
                in = new PipedInputStream(out);

                ObjectMetadata metadata = new ObjectMetadata();
                metadata.setContentLength(size);

                // the upload runs on a TransferManager thread, reading from the
                // pipe while this thread writes the entry into it
                PutObjectRequest request = new PutObjectRequest(targetBucket, name, in, metadata);
                transferManager.upload(request);

                long actualSize = copy(zipInputStream, out, 1024);
                if (actualSize != size) {
                    throw new RuntimeException("Filesize of ZipEntry " + name + " is wrong");
                }
                out.flush();
                out.close();
            }
        } finally {
            if (out != null) {
                out.close();
            }
            if (in != null) {
                in.close();
            }
            if (s3Object != null) {
                s3Object.close();
            }
            if (zipInputStream != null) {
                zipInputStream.close();
            }
            System.exit(0);
        }
    }

    private static long copy(final InputStream input, final OutputStream output, final int buffersize) throws IOException {
        if (buffersize < 1) {
            throw new IllegalArgumentException("buffersize must be bigger than 0");
        }
        final byte[] buffer = new byte[buffersize];
        int n = 0;
        long count = 0;
        while (-1 != (n = input.read(buffer))) {
            output.write(buffer, 0, n);
            count += n;
        }
        return count;
    }
}
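One refinement worth noting (my addition, not part of the answer above): transferManager.upload(request) returns immediately, and the System.exit(0) in the finally block can kill transfers that are still draining the pipe. Keeping the Upload handle and blocking on it per entry avoids that; inside the while loop this would look like:

import com.amazonaws.services.s3.transfer.Upload;

Upload upload = transferManager.upload(request);
long actualSize = copy(zipInputStream, out, 1024);
out.flush();
out.close(); // signals EOF to the pipe so the upload can finish
try {
    upload.waitForCompletion(); // blocks until this entry is fully in S3
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    throw new IOException("Upload interrupted for " + name, e);
}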
I faced a similar problem and solved it by using the Java S3 SDK. As you say, the key here is that since the files are large you want to "stream" the content, without keeping any data in memory or writing to disk.
I've made a library that can be used for this purpose; it is available on Maven Central, and here is the GitHub link: nejckorasa/s3-stream-unzip
I used the code provided in the accepted answer of the thread below to download a 500 KB zip file:
How to download and save a file from Internet using Java?
public static File downloadFile(String fileURL, String saveDir)
        throws IOException {
    File downloadFolder = null;
    String saveFilePath = null;
    URL url = new URL(fileURL);
    ReadableByteChannel rbc = Channels.newChannel(url.openStream());
    String fileName = fileURL.substring(fileURL.lastIndexOf("/") + 1, fileURL.length());
    FileOutputStream fos = new FileOutputStream(fileName);
    saveFilePath = saveDir + File.separator + fileName;
    downloadFolder = new File(saveDir);
    downloadFolder.deleteOnExit();
    downloadFolder.mkdirs();
    FileOutputStream outputStream = new FileOutputStream(saveFilePath);
    outputStream.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
    outputStream.close();
    rbc.close();
    return new File(saveFilePath);
}
The code is able to identify the file, but the problem is that the download is incomplete. It always downloads 5 KB and stops thereafter.
I don't want to use Apache Commons file utils.
The problem was that I didn't set authentication. I set the authentication and it worked; code below:
public static File downloadFile(String fileURL, String saveDir)
        throws IOException {
    File downloadFolder = null;
    String saveFilePath = null;
    String username = "guest";
    String password = "guest";
    String usepass = username + ":" + password;
    String basicAuth = "Basic " + javax.xml.bind.DatatypeConverter.printBase64Binary(usepass.getBytes());
    URL url = new URL(fileURL);
    HttpURLConnection httpConn = (HttpURLConnection) url.openConnection();
    httpConn.setRequestProperty("Authorization", basicAuth);
    ReadableByteChannel rbc = Channels.newChannel(httpConn.getInputStream());
    String fileName = fileURL.substring(fileURL.lastIndexOf("/") + 1, fileURL.length());
    saveFilePath = saveDir + File.separator + fileName;
    downloadFolder = new File(saveDir);
    downloadFolder.deleteOnExit();
    downloadFolder.mkdirs();
    FileOutputStream outputStream = new FileOutputStream(saveFilePath);
    outputStream.getChannel().transferFrom(rbc, 0, Long.MAX_VALUE);
    outputStream.close();
    rbc.close();
    return new File(saveFilePath);
}
I am very new to Java and servlet programming.
I am not sure whether it is possible to write a servlet which, when passed a URL from the local client machine, uploads the file to the server.
Basically, on the client machine we have a C# program, and on the server side we have Apache Tomcat installed. I need to upload file(s) to the server using the C# program on the client machine.
Should I provide any more information?
Thanks in advance.
Note: this code illustrates the general idea and is not guaranteed to work without modification.
The C# file upload part
// this code shows you how the browsers wrap the file upload request;
// you can still find a simpler way to do the same thing.
public void PostMultipleFiles(string url, string[] files)
{
    string boundary = "----------------------------" + DateTime.Now.Ticks.ToString("x");
    HttpWebRequest httpWebRequest = (HttpWebRequest)WebRequest.Create(url);
    httpWebRequest.ContentType = "multipart/form-data; boundary=" + boundary;
    httpWebRequest.Method = "POST";
    httpWebRequest.KeepAlive = true;
    httpWebRequest.Credentials = System.Net.CredentialCache.DefaultCredentials;
    Stream memStream = new System.IO.MemoryStream();
    byte[] boundarybytes = System.Text.Encoding.ASCII.GetBytes("\r\n--" + boundary + "\r\n");
    string formdataTemplate = "\r\n--" + boundary + "\r\nContent-Disposition: form-data; name=\"{0}\";\r\n\r\n{1}";
    string headerTemplate = "Content-Disposition: form-data; name=\"{0}\"; filename=\"{1}\"\r\nContent-Type: application/octet-stream\r\n\r\n";
    memStream.Write(boundarybytes, 0, boundarybytes.Length);
    for (int i = 0; i < files.Length; i++)
    {
        string header = string.Format(headerTemplate, "file" + i, files[i]);
        //string header = string.Format(headerTemplate, "uplTheFile", files[i]);
        byte[] headerbytes = System.Text.Encoding.UTF8.GetBytes(header);
        memStream.Write(headerbytes, 0, headerbytes.Length);
        FileStream fileStream = new FileStream(files[i], FileMode.Open, FileAccess.Read);
        byte[] buffer = new byte[1024];
        int bytesRead = 0;
        while ((bytesRead = fileStream.Read(buffer, 0, buffer.Length)) != 0)
        {
            memStream.Write(buffer, 0, bytesRead);
        }
        memStream.Write(boundarybytes, 0, boundarybytes.Length);
        fileStream.Close();
    }
    httpWebRequest.ContentLength = memStream.Length;
    Stream requestStream = httpWebRequest.GetRequestStream();
    memStream.Position = 0;
    byte[] tempBuffer = new byte[memStream.Length];
    memStream.Read(tempBuffer, 0, tempBuffer.Length);
    memStream.Close();
    requestStream.Write(tempBuffer, 0, tempBuffer.Length);
    requestStream.Close();
    try
    {
        WebResponse webResponse = httpWebRequest.GetResponse();
        Stream stream = webResponse.GetResponseStream();
        StreamReader reader = new StreamReader(stream);
        string var = reader.ReadToEnd();
    }
    catch (Exception ex)
    {
        response.InnerHtml = ex.Message;
    }
    httpWebRequest = null;
}
To understand how the above code was written, you might want to take a look at How does HTTP file upload work?
POST /upload?upload_progress_id=12344 HTTP/1.1
Host: localhost:3000
Content-Length: 1325
Origin: http://localhost:3000
... other headers ...
Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryePkpFF7tjBAqx29L
------WebKitFormBoundaryePkpFF7tjBAqx29L
Content-Disposition: form-data; name="MAX_FILE_SIZE"
100000
------WebKitFormBoundaryePkpFF7tjBAqx29L
Content-Disposition: form-data; name="uploadedfile"; filename="hello.o"
Content-Type: application/x-object
... contents of file goes here ...
------WebKitFormBoundaryePkpFF7tjBAqx29L--
Finally, all you have to do is implement a servlet that can handle the file upload request; then you can do whatever you want with the file. Take a look at this file upload tutorial:
protected void processRequest(HttpServletRequest request,
        HttpServletResponse response)
        throws ServletException, IOException {
    response.setContentType("text/html;charset=UTF-8");

    // Create path components to save the file
    final String path = request.getParameter("destination");
    final Part filePart = request.getPart("file");
    final String fileName = getFileName(filePart);

    OutputStream out = null;
    InputStream filecontent = null;
    final PrintWriter writer = response.getWriter();

    try {
        out = new FileOutputStream(new File(path + File.separator + fileName));
        filecontent = filePart.getInputStream();

        int read = 0;
        final byte[] bytes = new byte[1024];

        while ((read = filecontent.read(bytes)) != -1) {
            out.write(bytes, 0, read);
        }
        writer.println("New file " + fileName + " created at " + path);
        LOGGER.log(Level.INFO, "File{0}being uploaded to {1}",
                new Object[]{fileName, path});
    } catch (FileNotFoundException fne) {
        writer.println("You either did not specify a file to upload or are "
                + "trying to upload a file to a protected or nonexistent "
                + "location.");
        writer.println("<br/> ERROR: " + fne.getMessage());
        LOGGER.log(Level.SEVERE, "Problems during file upload. Error: {0}",
                new Object[]{fne.getMessage()});
    } finally {
        if (out != null) {
            out.close();
        }
        if (filecontent != null) {
            filecontent.close();
        }
        if (writer != null) {
            writer.close();
        }
    }
}

private String getFileName(final Part part) {
    final String partHeader = part.getHeader("content-disposition");
    LOGGER.log(Level.INFO, "Part Header = {0}", partHeader);
    for (String content : part.getHeader("content-disposition").split(";")) {
        if (content.trim().startsWith("filename")) {
            return content.substring(content.indexOf('=') + 1).trim().replace("\"", "");
        }
    }
    return null;
}
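As an aside (my addition, not from the tutorial): on Servlet 3.1 and newer containers, the manual content-disposition parsing in getFileName() can be replaced with the built-in accessor.

// Servlet 3.1+ exposes the client-supplied file name directly on Part.
final String fileName = filePart.getSubmittedFileName();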