Read Zip file content without extracting in Java

I have a byte[] zipFileAsByteArray.
This zip file has the following structure:
rootDir
|--- Folder1 - first.txt
|--- Folder2 - second.txt
|--- PictureFolder - image.png
What I need is to get the two txt files and read them, without saving any files to disk. Just do it in memory.
I tried something like this:
ByteArrayInputStream bis = new ByteArrayInputStream(zipFileAsByteArray);
ZipInputStream zis = new ZipInputStream(bis);
I will also need a separate method to get the picture. Something like this:
public byte[] getImage(byte[] zipContent);
Can someone help me with an idea or a good example of how to do that?

Here is an example:
public static void main(String[] args) throws IOException {
    ZipFile zip = new ZipFile("C:\\Users\\mofh\\Desktop\\test.zip");
    for (Enumeration<? extends ZipEntry> e = zip.entries(); e.hasMoreElements(); ) {
        ZipEntry entry = e.nextElement();
        if (!entry.isDirectory()) {
            // FilenameUtils comes from Apache commons-io
            if (FilenameUtils.getExtension(entry.getName()).equals("png")) {
                byte[] image = getImage(zip.getInputStream(entry));
                // do your thing
            } else if (FilenameUtils.getExtension(entry.getName()).equals("txt")) {
                StringBuilder out = getTxtFiles(zip.getInputStream(entry));
                // do your thing
            }
        }
    }
}
private static StringBuilder getTxtFiles(InputStream in) {
    StringBuilder out = new StringBuilder();
    BufferedReader reader = new BufferedReader(new InputStreamReader(in));
    String line;
    try {
        while ((line = reader.readLine()) != null) {
            out.append(line).append('\n'); // keep line breaks; readLine() strips them
        }
    } catch (IOException e) {
        // do something, probably not a text file
        e.printStackTrace();
    }
    return out;
}
private static byte[] getImage(InputStream in) {
    try {
        BufferedImage image = ImageIO.read(in); // just checking if the InputStream belongs in fact to an image
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        ImageIO.write(image, "png", baos);
        return baos.toByteArray();
    } catch (IOException e) {
        // do something, it is not an image
        e.printStackTrace();
    }
    return null;
}
Keep in mind, though, that I am checking a string to differentiate the possible types, and this is error-prone. Nothing stops me from sending another type of file with an expected extension.
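If the extension check is too fragile for you, one option is to sniff the entry's leading bytes instead. A minimal sketch (the PNG signature is the standard one from the PNG specification; the class and method names are mine):
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

public class MagicBytes {

    // The 8-byte PNG file signature, as defined by the PNG specification.
    private static final byte[] PNG_SIGNATURE = {
            (byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A
    };

    // Reads the first bytes of the stream and compares them to the PNG signature.
    // Pass a fresh stream, since this consumes its first 8 bytes.
    public static boolean looksLikePng(InputStream in) throws IOException {
        byte[] header = new byte[PNG_SIGNATURE.length];
        int off = 0;
        while (off < header.length) {
            int n = in.read(header, off, header.length - off);
            if (n == -1) {
                return false; // stream ended before a full header was read
            }
            off += n;
        }
        return Arrays.equals(header, PNG_SIGNATURE);
    }
}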

You can do something like:
public static void main(String[] args) throws Exception {
    // bis, zis as you have
    try {
        ZipEntry entry;
        while ((entry = zis.getNextEntry()) != null) { // get next entry and continue only if it is not null
            // entry.getSize() may be -1 when the size is unknown, and a single
            // read() call is not guaranteed to fill the array, so copy in a loop.
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = zis.read(buffer)) != -1) {
                baos.write(buffer, 0, n);
            }
            byte[] b = baos.toByteArray(); // the entry's content
            if (entry.getName().endsWith(".txt")) {
                // read files. You have data in `b`
            } else if (entry.getName().endsWith(".png")) {
                // process image
            }
        }
    } finally {
        zis.close();
    }
}
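Building on that loop, here is a minimal sketch of the getImage(byte[]) method the question asks for; the class name and the 4 KB buffer size are my own choices:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipContentReader {

    // Scans the zip content in memory and returns the bytes of the first .png
    // entry, or null if there is none. Nothing is written to disk.
    public static byte[] getImage(byte[] zipContent) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(zipContent))) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (!entry.isDirectory() && entry.getName().endsWith(".png")) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[4096];
                    int n;
                    while ((n = zis.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    return out.toByteArray();
                }
            }
        }
        return null;
    }
}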

You can use the code below, but make sure your S3 bucket is set up first.
import com.amazonaws.AmazonServiceException;
import com.amazonaws.SdkClientException;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;
import com.amazonaws.services.s3.model.ObjectMetadata;
import com.amazonaws.services.s3.model.ResponseHeaderOverrides;
import com.amazonaws.services.s3.model.S3Object;
import java.io.*;
import static com.amazonaws.regions.Regions.US_EAST_1;
public class GetObject2 {
    public static void main(String[] args) throws IOException {
        String bucketName = "Give Your Bucket Name";
        String key = "Give Your String Key";
        S3Object fullObject = null, objectPortion = null, headerOverrideObject = null;
        try {
            AmazonS3 s3Client = AmazonS3ClientBuilder.standard()
                    .withRegion(US_EAST_1)
                    .withCredentials(new ProfileCredentialsProvider())
                    .build();
            // Get an object and print its contents.
            System.out.println("Downloading an object");
            fullObject = s3Client.getObject(new GetObjectRequest(bucketName, key));
            System.out.println("Content-Type: " + fullObject.getObjectMetadata().getContentType());
            System.out.println("Content: ");
            displayTextInputStream(fullObject.getObjectContent());
            File localFile = new File("C:\\awstest.zip");
            ObjectMetadata object = s3Client.getObject(new GetObjectRequest(bucketName, key), localFile);
            // Get a range of bytes from an object and print the bytes.
            GetObjectRequest rangeObjectRequest = new GetObjectRequest(bucketName, key)
                    .withRange(0, 9);
            objectPortion = s3Client.getObject(rangeObjectRequest);
            System.out.println("Printing bytes retrieved.");
            displayTextInputStream(objectPortion.getObjectContent());
            // Get an entire object, overriding the specified response headers, and print the object's content.
            ResponseHeaderOverrides headerOverrides = new ResponseHeaderOverrides()
                    .withCacheControl("No-cache")
                    .withContentDisposition("attachment; filename=example.txt");
            GetObjectRequest getObjectRequestHeaderOverride = new GetObjectRequest(bucketName, key)
                    .withResponseHeaders(headerOverrides);
            headerOverrideObject = s3Client.getObject(getObjectRequestHeaderOverride);
            displayTextInputStream(headerOverrideObject.getObjectContent());
        } catch (AmazonServiceException e) {
            // The call was transmitted successfully, but Amazon S3 couldn't process
            // it, so it returned an error response.
            e.printStackTrace();
        } catch (SdkClientException e) {
            // Amazon S3 couldn't be contacted for a response, or the client
            // couldn't parse the response from Amazon S3.
            e.printStackTrace();
        } finally {
            // To ensure that the network connection doesn't remain open, close any open input streams.
            if (fullObject != null) {
                fullObject.close();
            }
            if (objectPortion != null) {
                objectPortion.close();
            }
            if (headerOverrideObject != null) {
                headerOverrideObject.close();
            }
        }
    }

    static void displayTextInputStream(InputStream input) throws IOException {
        // Read the text input stream one line at a time and display each line.
        BufferedReader reader = new BufferedReader(new InputStreamReader(input));
        String line = null;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);
        }
        System.out.println();
    }
}

Related

Get Base64 encoded GZipped String from RabbitMQ with Java

I've implemented a GzipUtil which works pretty well and looks like this:
import com.sun.org.apache.xml.internal.security.exceptions.Base64DecodingException;
import com.sun.org.apache.xml.internal.security.utils.Base64;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
public class GzipUtil {
    public static void unzip(String putBase64EncodedGzippedStringHere) throws Base64DecodingException {
        byte[] compressed = Base64.decode(putBase64EncodedGzippedStringHere);
        if ((compressed == null) || (compressed.length == 0)) {
            throw new IllegalArgumentException("Cannot unzip null or empty bytes");
        }
        if (!isZipped(compressed)) {
            System.out.println(compressed);
        }
        try (ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(compressed)) {
            try (GZIPInputStream gzipInputStream = new GZIPInputStream(byteArrayInputStream)) {
                try (InputStreamReader inputStreamReader =
                             new InputStreamReader(gzipInputStream, StandardCharsets.UTF_8)) {
                    try (BufferedReader bufferedReader = new BufferedReader(inputStreamReader)) {
                        StringBuilder output = new StringBuilder();
                        String line;
                        while ((line = bufferedReader.readLine()) != null) {
                            output.append(line);
                            System.out.println(output.toString());
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new RuntimeException("Failed to unzip content", e);
        }
    }

    public static boolean isZipped(final byte[] compressed) {
        return (compressed[0] == (byte) (GZIPInputStream.GZIP_MAGIC))
                && (compressed[1] == (byte) (GZIPInputStream.GZIP_MAGIC >> 8));
    }
}
Now I've got some other code that consumes a RabbitMQ queue like this:
ConnectionFactory rabbitMqConnectionFactory = new ConnectionFactory();
rabbitMqConnectionFactory.setHost("MyHostname");
rabbitMqConnectionFactory.setPort(5672);
rabbitMqConnectionFactory.setUsername("MyUsername");
rabbitMqConnectionFactory.setPassword("MyPassword");
rabbitMqConnectionFactory.setVirtualHost("MyVirtualHost");
Connection physicalSocketConnectionToRabbitMq = rabbitMqConnectionFactory.newConnection();
Channel messageChannel = physicalSocketConnectionToRabbitMq.createChannel();
// if the queue already exists, it won't do anything, it just skips the operation
messageChannel.queueDeclare(
        "MyQueueName", // queue
        true,          // durable
        false,         // exclusive
        false,         // autoDelete
        null           // arguments
);
DeliverCallback deliverCallback = new DeliverCallback() {
    public void handle(String consumerTag, Delivery message) throws IOException {
        System.out.println("consumerTag=" + consumerTag);
        System.out.println("exchangeName=" + message.getEnvelope().getExchange());
        System.out.println("routingKey=" + message.getEnvelope().getRoutingKey());
        System.out.println("deliveryTag=" + message.getEnvelope().getDeliveryTag());
        byte[] data = message.getBody();
        if (data == null) {
            System.err.println("body is null");
        } else {
            System.err.println("body is not null: " + data);
            try {
                GzipUtil.unzip(new String(data));
            } catch (Exception e) {
                System.err.println("Exception: " + e);
            }
        }
    }
};
boolean autoAcknowledge = true; // declared here for completeness; the original snippet assumes it exists
messageChannel.basicConsume(
        "MyQueueName",
        autoAcknowledge,
        deliverCallback,
        new CancelCallback() {
            public void handle(String consumerTag) throws IOException {
                // nothing to do
            }
        }
);
If I run this code, the output looks like this:
body is not null: [B@5f571234
consumerTag=XXX
exchangeName=XXX
routingKey=tomato_gzip_b64
deliveryTag=1
message.getBody() returns some byte array. As I understood it, I have to use new String to turn it into a decoded string:
byte[] decodedBytes = Base64.getDecoder().decode(encodedString);
System.out.println("decodedBytes=" + decodedBytes);
String decodedString = new String(decodedBytes);
System.out.println("decodedString=" + decodedString);
It seems like I missed something, because my GzipUtil prints nothing when I call GzipUtil.unzip(new String(data));! Does anyone know why?
String payload = new String(message.getBody(), StandardCharsets.UTF_8);
Even if the message content is a JSON file inside a GZIP archive, the payload can still be read as a string this way, with UTF-8 as the charset. The correct content then comes out at the other end and can be processed further.
From this string you can build the JSON file again, which can also be zipped again.
Good hint from VGA (comment above)!
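Put together, the decode path might look like the sketch below, using java.util.Base64 instead of the internal com.sun class. It assumes the broker really delivers a Base64 string of a gzipped UTF-8 payload; the class and method names are illustrative:
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;

public class PayloadDecoder {

    // Turns a RabbitMQ body (UTF-8 text holding Base64 of a gzipped payload) back into a String.
    public static String decode(byte[] messageBody) throws IOException {
        String base64 = new String(messageBody, StandardCharsets.UTF_8);
        byte[] gzipped = Base64.getDecoder().decode(base64);
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(gzipped));
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toString("UTF-8");
        }
    }
}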

Extract pdf attachment on AWS S3 using iText Java

I am using the iText Java code below to extract attachments from a PDF file. It works fine on a local system: it extracts an XML file from the PDF and stores it at strOutputPath. I want to perform this operation on AWS S3: the PDF file will be on S3 and the attachment should be extracted to S3. How can I use the absolute path of a file on S3 in this case? I used s3client.getUrl().toExternalForm(); but I get an HTTP 403 error.
import java.util.Iterator;
import java.util.Set;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.File;
import com.itextpdf.text.pdf.PdfObject;
import com.itextpdf.text.pdf.PRStream;
import com.itextpdf.text.pdf.PdfArray;
import com.itextpdf.text.pdf.PdfDictionary;
import java.io.IOException;
import com.itextpdf.text.pdf.PdfName;
import com.itextpdf.text.pdf.PdfReader;
public class app {
    public static void main(final String[] args) {
        try {
            final String strInputPath = args[0];
            final String strOutputPath = args[1];
            final PdfReader pdfReader = new PdfReader(strInputPath);
            final PdfDictionary catalog = pdfReader.getCatalog();
            final PdfDictionary names = catalog.getAsDict(PdfName.NAMES);
            final PdfDictionary embeddedFiles = names.getAsDict(PdfName.EMBEDDEDFILES);
            final PdfArray embeddedFilesArray = embeddedFiles.getAsArray(PdfName.NAMES);
            for (int i = 0; i < embeddedFilesArray.size(); ++i) {
                final PdfDictionary FileSpec = embeddedFilesArray.getAsDict(i);
                if (FileSpec != null) {
                    String strFileName = FileSpec.getAsString(PdfName.F).toString();
                    System.out.println(strFileName);
                    if (strFileName.endsWith(".xml")) {
                        strFileName = String.valueOf(System.currentTimeMillis()) + ".xml";
                        extractFiles(pdfReader, FileSpec, String.valueOf(strOutputPath) + strFileName);
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void extractFiles(final PdfReader pdfReader, final PdfDictionary filespec, final String strFileName) {
        final PdfDictionary refs = filespec.getAsDict(PdfName.EF);
        PRStream prStream = null;
        FileOutputStream outputStream = null;
        final Set<PdfName> keys = (Set<PdfName>) refs.getKeys();
        try {
            for (final PdfName key : keys) {
                prStream = (PRStream) PdfReader.getPdfObject((PdfObject) refs.getAsIndirectObject(key));
                outputStream = new FileOutputStream(new File(strFileName));
                outputStream.write(PdfReader.getStreamBytes(prStream));
                outputStream.flush();
                outputStream.close();
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e2) {
            e2.printStackTrace();
        } finally {
            // a single close-if-not-null is enough here; the original had this block duplicated
            try {
                if (outputStream != null) {
                    outputStream.close();
                }
            } catch (IOException e3) {
                e3.printStackTrace();
            }
        }
    }
}
I think what you need to do is write a Java client that works on the files in your S3 bucket and performs the following steps:
1. Download the required file from S3.
2. Extract the attachment from the file.
3. Upload the resultant file back to S3.
Sample code to perform the above steps is as follows:
import java.io.*;
import java.util.Set;
import com.amazonaws.services.s3.*;
import com.amazonaws.services.s3.model.*;
import com.itextpdf.text.pdf.*;
public class S3PDFAttachmentExtractor {
    public static void main(String[] args) throws IOException {
        // download file from S3
        AmazonS3Client amazonS3Client = new AmazonS3Client();
        S3Object object = amazonS3Client.getObject("<yours3location>", "fileKey");
        // write the file content to a local file.
        S3ObjectInputStream objectContent = object.getObjectContent();
        FileOutputStream out = new FileOutputStream("tempOutputFile.pdf");
        writeToFile(objectContent, out);
        // Extract attachment from the downloaded file.
        extractAttachment("tempOutputFile.pdf", "tempAttachement.xml");
        // upload the attachment
        uploadFile("<s3bucket.fully.qualified.name>", "tempAttachement.xml", "attachementNameOnS3.xml");
    }

    private static void writeToFile(InputStream input, FileOutputStream out) throws IOException {
        // Copy the stream to the file in chunks, writing only as many bytes as were actually read.
        try (BufferedInputStream in = new BufferedInputStream(input)) {
            byte[] chunk = new byte[1024];
            int bytesRead;
            while ((bytesRead = in.read(chunk)) > 0) {
                out.write(chunk, 0, bytesRead);
            }
        } finally {
            input.close();
        }
    }

    public static void extractAttachment(final String strInputPath, final String strOutputPath) {
        try {
            final PdfReader pdfReader = new PdfReader(strInputPath);
            final PdfDictionary catalog = pdfReader.getCatalog();
            final PdfDictionary names = catalog.getAsDict(PdfName.NAMES);
            final PdfDictionary embeddedFiles = names.getAsDict(PdfName.EMBEDDEDFILES);
            final PdfArray embeddedFilesArray = embeddedFiles.getAsArray(PdfName.NAMES);
            for (int i = 0; i < embeddedFilesArray.size(); ++i) {
                final PdfDictionary FileSpec = embeddedFilesArray.getAsDict(i);
                if (FileSpec != null) {
                    String strFileName = FileSpec.getAsString(PdfName.F).toString();
                    System.out.println(strFileName);
                    if (strFileName.endsWith(".xml")) {
                        strFileName = String.valueOf(System.currentTimeMillis()) + ".xml";
                        extractFiles(pdfReader, FileSpec, String.valueOf(strOutputPath) + strFileName);
                    }
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static void extractFiles(final PdfReader pdfReader, final PdfDictionary filespec, final String strFileName) {
        final PdfDictionary refs = filespec.getAsDict(PdfName.EF);
        PRStream prStream = null;
        FileOutputStream outputStream = null;
        final Set<PdfName> keys = (Set<PdfName>) refs.getKeys();
        try {
            for (final PdfName key : keys) {
                prStream = (PRStream) PdfReader.getPdfObject((PdfObject) refs.getAsIndirectObject(key));
                outputStream = new FileOutputStream(new File(strFileName));
                outputStream.write(PdfReader.getStreamBytes(prStream));
                outputStream.flush();
                outputStream.close();
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e2) {
            e2.printStackTrace();
        } finally {
            try {
                if (outputStream != null) {
                    outputStream.close();
                }
            } catch (IOException e3) {
                e3.printStackTrace();
            }
        }
    }

    private static void uploadFile(String bucketFullPath, String fileLocation, String fileName) throws IOException {
        AmazonS3Client amazonS3Client = new AmazonS3Client();
        InputStream bis = new FileInputStream(fileLocation);
        ObjectMetadata objectMetadata = new ObjectMetadata();
        objectMetadata.setContentType("application/xml");
        amazonS3Client.putObject(bucketFullPath, fileName, bis, objectMetadata);
    }
}
Please note that a better way to do this type of thing is to write an AWS Lambda function in Java using the above code. Since AWS Lambda can easily be configured to process events from S3 storage, your code will automatically be invoked when a file is written or modified in the S3 bucket. For further details you can check the AWS Lambda documentation.
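A minimal handler skeleton for that Lambda approach might look like this. It is a sketch assuming the aws-lambda-java-core and aws-lambda-java-events dependencies; the actual processing step is left as a comment:
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.S3Event;

public class PdfAttachmentHandler implements RequestHandler<S3Event, String> {

    @Override
    public String handleRequest(S3Event event, Context context) {
        // Each record describes one object that was created or modified in the bucket.
        event.getRecords().forEach(record -> {
            String bucket = record.getS3().getBucket().getName();
            // Note: keys in S3 event notifications arrive URL-encoded.
            String key = record.getS3().getObject().getKey();
            context.getLogger().log("Processing s3://" + bucket + "/" + key);
            // Download to /tmp, run the extractAttachment logic above, upload the result.
        });
        return "done";
    }
}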
Edit:
Another alternative: if you are running the Java code on AWS EC2, there is a way to mount an S3 bucket as a file system. This allows you to access files as if they were stored locally, and your original code would work unchanged. But this approach only works in an AWS EC2 environment.

Issue with unpacking/decrypting a password-protected (AES 256) 7z file in Java using Apache Commons Compress/org.tukaani.xz

I am getting an org.tukaani.xz.CorruptedInputException: Compressed data is corrupt error while trying to decrypt a password-protected (AES 256) 7z file, whereas a 7z file without password protection unpacks without any issue. In both cases the same xls file was compressed.
I am using Apache Commons Compress and org.tukaani.xz.
Sample code for reference:
package com.concept.utilities.zip;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.lang.reflect.Field;
import org.apache.commons.compress.archivers.sevenz.SevenZArchiveEntry;
import org.apache.commons.compress.archivers.sevenz.SevenZFile;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;
import org.apache.poi.ss.usermodel.Workbook;
public class DecryptionUtil {

    static {
        try {
            Field field = Class.forName("javax.crypto.JceSecurity").getDeclaredField("isRestricted");
            field.setAccessible(true);
            field.set(null, java.lang.Boolean.FALSE);
        } catch (Exception ex) {
        }
    }

    public void SevenZFile(String directory, String encryptCompressFileName, String password) {
        SevenZFile sevenZFile = null;
        SevenZArchiveEntry entry = null;
        try {
            File file = new File(directory + encryptCompressFileName);
            byte[] inputData = new byte[(int) file.length()];
            FileInputStream fis = new FileInputStream(file);
            fis.read(inputData);
            fis.close();
            // SeekableInMemoryByteChannel inMemoryByteChannel = new SeekableInMemoryByteChannel(inputData);
            if (null != password) {
                byte[] pass = password.getBytes("UTF16");
                sevenZFile = new SevenZFile(file, pass);
            } else {
                sevenZFile = new SevenZFile(file);
            }
            // Go through all entries
            while (null != (entry = sevenZFile.getNextEntry())) {
                // Maybe filter by name. Name can contain a path.
                String processingFileName = entry.getName();
                if (entry.isDirectory()) {
                    System.out.println(String.format("Found directory entry %s", processingFileName));
                } else {
                    // If this is a file, we read the file content into a ByteArrayOutputStream ...
                    System.out.println(String.format("Unpacking start %s ...", processingFileName));
                    ByteArrayOutputStream contentBytes = new ByteArrayOutputStream();
                    // ... using a small buffer byte array.
                    byte[] buffer = new byte[2048];
                    int bytesRead;
                    while ((bytesRead = sevenZFile.read(buffer)) != -1) {
                        contentBytes.write(buffer, 0, bytesRead);
                    }
                    if (processingFileName.endsWith("xls")) {
                        // Writing into xls
                        Workbook wb = new HSSFWorkbook();
                        //String safeName = WorkbookUtil.createSafeSheetName(processingFileName);
                        //Sheet sheet = wb.createSheet(safeName);
                        FileOutputStream fileOut = new FileOutputStream(directory + processingFileName);
                        fileOut.write(contentBytes.toByteArray());
                        fileOut.flush();
                        wb.write(fileOut);
                        fileOut.close();
                        wb.close();
                    } else { // regular file
                        System.out.println(contentBytes.toString("UTF-8"));
                    }
                    System.out.println(String.format("Unpacking finish %s ...", processingFileName));
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            try {
                sevenZFile.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public static void main(String[] args) {
        DecryptionUtil decrypt = new DecryptionUtil();
        decrypt.SevenZFile("H:\\archives\\", "StudentsWoPassword.7z", null);
        decrypt.SevenZFile("H:\\archives\\", "StudentsWithPassAES256.7z", "test");
    }
}
StudentsWoPassword.7z is successfully unpacked, but StudentsWithPassAES256.7z throws an exception:
Unpacking start Students.xls ...
Unpacking finish Students.xls ...
org.tukaani.xz.CorruptedInputException: Compressed data is corrupt
at org.tukaani.xz.rangecoder.RangeDecoderFromStream.<init>(Unknown Source)
at org.tukaani.xz.LZMAInputStream.initialize(Unknown Source)
at org.tukaani.xz.LZMAInputStream.initialize(Unknown Source)
at org.tukaani.xz.LZMAInputStream.<init>(Unknown Source)
at org.apache.commons.compress.archivers.sevenz.LZMADecoder.decode(LZMADecoder.java:43)
at org.apache.commons.compress.archivers.sevenz.Coders.addDecoder(Coders.java:76)
at org.apache.commons.compress.archivers.sevenz.SevenZFile.buildDecoderStack(SevenZFile.java:933)
at org.apache.commons.compress.archivers.sevenz.SevenZFile.buildDecodingStream(SevenZFile.java:909)
at org.apache.commons.compress.archivers.sevenz.SevenZFile.getNextEntry(SevenZFile.java:222)
at com.concept.utilities.zip.DecryptionUtil.SevenZFile(DecryptionUtil.java:50)
at com.concept.utilities.zip.DecryptionUtil.main(DecryptionUtil.java:107)
Am I missing something? Is there any other way I can extract AES256 7z?
Your code is fine; you are just using the wrong charset/encoding when extracting bytes from the password. The SevenZFile class expects UTF-16 in little-endian order, so you have to use UTF-16LE rather than UTF-16 (which will use big endian when encoding data).
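Concretely, the fix is one line in the question's code. A small sketch (the char[] alternative assumes commons-compress 1.17 or newer):
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.commons.compress.archivers.sevenz.SevenZFile;

public class SevenZOpen {

    // Opens a (possibly password-protected) 7z archive. The 7z format stores
    // passwords as little-endian UTF-16, so UTF_16LE is the charset to use.
    static SevenZFile open(File archive, String password) throws IOException {
        if (password == null) {
            return new SevenZFile(archive);
        }
        return new SevenZFile(archive, password.getBytes(StandardCharsets.UTF_16LE));
        // With commons-compress 1.17+ you can pass password.toCharArray() instead
        // and let the library handle the encoding.
    }
}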

Cannot read data from file

I am trying to read values from a CSV file which is present in the package com.example.
But when I run the code with the following syntax:
DataModel model = new FileDataModel(new File("Dataset.csv"));
it says:
java.io.FileNotFoundException: Dataset.csv
I have also tried using:
DataModel model = new FileDataModel(new File("/com/example/Dataset.csv"));
Still not working.
Any help would be appreciated.
Thanks.
If this is the FileDataModel from org.apache.mahout.cf.taste.impl.model.file then it can't take an input stream and needs just a file. The problem is you can't assume the file is available to you that easily (see answer to this question).
It might be better to read the contents of the file and save it to a temp file, then pass that temp file to FileDataModel.
InputStream initStream = getClass().getClassLoader().getResourceAsStream("Dataset.csv");
// simplistic approach is to put all the contents of the file stream into memory at once,
// but it would be smarter to buffer and do it in chunks
byte[] buffer = new byte[initStream.available()];
initStream.read(buffer);
initStream.close();
// now save the file contents in memory to a temporary file on the disk
// choose your own temporary location - this one is typical for linux
String tempFilePath = "/tmp/Dataset.csv";
File tempFile = new File(tempFilePath);
OutputStream outStream = new FileOutputStream(tempFile);
outStream.write(buffer);
outStream.close();
DataModel model = new FileDataModel(new File(tempFilePath));
...
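On Java 7+, the same copy-to-temp-file idea can be written more robustly with NIO. A sketch (the temp-file prefix and suffix are arbitrary; Files.createTempFile picks the platform's temp directory instead of hard-coding /tmp):
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ResourceToTempFile {

    // Copies a classpath resource to a temp file and returns its path,
    // streaming the bytes instead of loading everything into memory.
    static Path copyResourceToTempFile(String resourceName) throws IOException {
        Path tempFile = Files.createTempFile("dataset-", ".csv");
        tempFile.toFile().deleteOnExit();
        try (InputStream in = ResourceToTempFile.class.getClassLoader()
                .getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Resource not found: " + resourceName);
            }
            Files.copy(in, tempFile, StandardCopyOption.REPLACE_EXISTING);
        }
        return tempFile;
    }
}
A FileDataModel can then be built with new FileDataModel(copyResourceToTempFile("Dataset.csv").toFile()).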
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class ReadCVS {
    public static void main(String[] args) {
        ReadCVS obj = new ReadCVS();
        obj.run();
    }

    public void run() {
        String csvFile = "file path of csv";
        BufferedReader br = null;
        String line = "";
        String cvsSplitBy = ",";
        try {
            br = new BufferedReader(new FileReader(csvFile));
            while ((line = br.readLine()) != null) {
                // Do stuff here
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }
        System.out.println("Done");
    }
}
CSV file which is present in package com.example
You can use getResource() or getResourceAsStream() to access the resource from within the package. For example
InputStream is = getClass().getResourceAsStream("/com/example/Dataset.csv");//uses absolute (package root) path
BufferedReader br = new BufferedReader(new InputStreamReader(is));
//read from BufferedReader
(note exception handling and file closing are omitted above for brevity)

Precalculate file stream checksum

I'm trying to ensure the integrity of an output file in case of disk-out-of-space, a network problem, or any exception that might occur during the stream-to-file process.
Is there a way to precalculate the file stream's checksum before writing to disk, and then check whether the file was written properly?
It sounds a bit nonsensical to me that a system validates the integrity of its own exported XML through a checksum; normally it's the job of the other end to verify that the consumed file lives up to the file produced by the other system.
But it's a requirement I have to implement.
Here's the stream I write as a file:
String xmlTransfer = "";
File testFile = new File("testFile.xml");
InputStream in = new ByteArrayInputStream(xmlTransfer.getBytes("utf-8"));
FileOutputStream out = new FileOutputStream(testFile);
byte[] buffer = new byte[2048];
int bytesRead;
while ((bytesRead = in.read(buffer)) != -1) {
    out.write(buffer, 0, bytesRead);
}
out.close();
in.close();
No, you can't figure out how much data will come from a stream in advance. That's simply not how streams are meant to work.
What you could do, if you are writing both ends of the code, is to first calculate the file size on the sending end and send that before sending the file contents itself.
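If you do control both ends, that size-prefix idea is straightforward with DataOutputStream/DataInputStream. A sketch under that assumption (class and method names are mine):
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class SizePrefixedTransfer {

    // Sender: write the payload length first, then the payload itself.
    static void send(DataOutputStream out, byte[] payload) throws IOException {
        out.writeLong(payload.length);
        out.write(payload);
        out.flush();
    }

    // Receiver: read the announced length, then exactly that many bytes.
    // readFully throws EOFException if the stream ends early, which is
    // precisely the integrity signal the question is after.
    static byte[] receive(DataInputStream in) throws IOException {
        long length = in.readLong();
        if (length < 0 || length > Integer.MAX_VALUE) {
            throw new IOException("Implausible payload length: " + length);
        }
        byte[] payload = new byte[(int) length];
        in.readFully(payload);
        return payload;
    }
}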
The best way is to catch exceptions: if something goes wrong, an exception will be thrown, and you can remove the partially written file in that case.
A second way is to write to an in-memory stream first and only then write it down to the filesystem, but that consumes memory.
A third way is to check the destination disk's free capacity up front (new File(path).getFreeSpace()).
An MD5 check sounds too slow to me in regard to the question.
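The first way (catch the exception and clean up) might look like this sketch, where removing the partial file is the whole point:
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class SafeFileWriter {

    // Writes the stream to the target file; if anything fails midway,
    // the partially written file is removed so no truncated file survives.
    static void writeOrDelete(InputStream in, File target) throws IOException {
        try (FileOutputStream out = new FileOutputStream(target)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            // the stream is already closed when we get here, so deletion is safe
            if (target.exists() && !target.delete()) {
                target.deleteOnExit();
            }
            throw e;
        }
    }
}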
Try this:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.security.MessageDigest;
public class CheckSumFileTest {

    private File buildChecksumFile(File fileToCheck, String filePrefix, String checksumAlgorithm) throws Exception {
        String checksum = null;
        File checksumFile = null;
        String tempDir = System.getProperty("java.io.tmpdir");
        try {
            checksumFile = new File(tempDir, filePrefix + "." + checksumAlgorithm.toLowerCase());
            checksumFile.createNewFile();
            checksumFile.deleteOnExit();
        } catch (Exception e1) {
            e1.printStackTrace();
            throw e1;
        }
        FileWriter fw = null;
        try {
            checksum = checkSum(fileToCheck, checksumAlgorithm);
            fw = new FileWriter(checksumFile);
            fw.write(checksum);
        } catch (Exception e) {
            e.printStackTrace();
            throw e;
        } finally {
            if (fw != null)
                fw.close();
        }
        return checksumFile;
    }

    private static String checkSum(File file, String checksumAlgorithm) throws Exception {
        MessageDigest digest = MessageDigest.getInstance(checksumAlgorithm);
        InputStream input = null;
        StringBuffer sb = new StringBuffer();
        try {
            input = new FileInputStream(file);
            byte[] buffer = new byte[8192];
            do {
                int read = input.read(buffer);
                if (read <= 0)
                    break;
                digest.update(buffer, 0, read);
            } while (true);
            byte[] sum = digest.digest();
            for (int i = 0; i < sum.length; i++) {
                sb.append(Integer.toString((sum[i] & 0xff) + 0x100, 16).substring(1));
            }
        } catch (IOException io) {
        } finally {
            if (input != null)
                input.close();
        }
        return sb.toString();
    }

    private static String checkSumInStream(InputStream stream, String checksumAlgorithm) throws Exception {
        MessageDigest digest = MessageDigest.getInstance(checksumAlgorithm);
        InputStream input = null;
        StringBuffer sb = new StringBuffer();
        try {
            input = stream;
            byte[] buffer = new byte[8192];
            do {
                int read = input.read(buffer);
                if (read <= 0)
                    break;
                digest.update(buffer, 0, read);
            } while (true);
            byte[] sum = digest.digest();
            for (int i = 0; i < sum.length; i++) {
                sb.append(Integer.toString((sum[i] & 0xff) + 0x100, 16).substring(1));
            }
        } catch (IOException io) {
        } finally {
            if (input != null)
                input.close();
        }
        return sb.toString();
    }

    private boolean checkIntegrity(String targetFileName, String checksumFileName, String checksumAlgorithm) throws Exception {
        FileInputStream stream = null;
        BufferedReader br = null;
        InputStreamReader ipsr = null;
        File checksumFile = null;
        String checksumString = "";
        File targetFile = new File(targetFileName);
        try {
            checksumFile = new File(checksumFileName);
            stream = new FileInputStream(checksumFile);
            ipsr = new InputStreamReader(stream);
            br = new BufferedReader(ipsr);
            // In checksum file: only one line to read
            checksumString = br.readLine();
        } finally {
            if (br != null)
                br.close();
            if (ipsr != null)
                ipsr.close();
            if (stream != null)
                stream.close();
        }
        if (checksumString.equals(checkSum(targetFile, checksumAlgorithm))) {
            return true;
        } else {
            return false;
        }
    }

    /**
     * @param args
     */
    public static void main(String[] args) {
        String str = "Amine";
        InputStream stream = new ByteArrayInputStream(str.getBytes());
        // step1
        try {
            System.out.println(checkSumInStream(stream, "MD5"));
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        // step2
        File file = new File("c:/test.txt");
        // if file doesn't exist, then create it
        if (!file.exists()) {
            try {
                file.createNewFile();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }
        FileWriter fw;
        BufferedWriter bw;
        try {
            fw = new FileWriter(file.getAbsoluteFile());
            bw = new BufferedWriter(fw);
            bw.write(str);
            bw.close();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        try {
            System.out.println(checkSum(file, "MD5"));
        } catch (Exception e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        System.out.println("Done");
    }
}
You should check with MD5, not file size.
You can calculate the MD5 while you're reading the stream.
See https://stackoverflow.com/a/304350/3230038
Then, after saving the file, you can generate the MD5 again and compare.
UPDATE - here's my more detailed idea for this. I am assuming that you just want to calculate the MD5 without having to bring the whole byte[] into memory. In this case, I think you have two options:
1. Calculate the MD5 on the fly, as you're saving; then, after saving, check the MD5 again (if you're on Linux you can just use md5sum).
2. Calculate the MD5 in a first pass, then save the file in a second pass.
For example:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import org.apache.commons.io.IOUtils;
import org.apache.commons.io.output.NullOutputStream;
public class MD5OnTheFly {

    /**
     * @param args
     * @throws NoSuchAlgorithmException
     * @throws IOException
     */
    public static void main(String[] args) throws NoSuchAlgorithmException, IOException {
        long ini = System.currentTimeMillis();
        File file = new File("/home/leoks/Downloads/VirtualBox-4.3.0.tar");
        System.out.println("size:" + file.length());
        InputStream is = new FileInputStream(file);
        MessageDigest md = MessageDigest.getInstance("MD5");
        // DigestInputStream updates the digest as a side effect of every read,
        // so copying the stream to a NullOutputStream computes the hash on the fly.
        DigestInputStream dis = new DigestInputStream(is, md);
        IOUtils.copy(dis, new NullOutputStream());
        byte[] digest = md.digest();
        StringBuffer hexString = new StringBuffer();
        for (int i = 0; i < digest.length; i++) {
            String hex = Integer.toHexString(0xff & digest[i]);
            if (hex.length() == 1)
                hexString.append('0');
            hexString.append(hex);
        }
        System.out.println(hexString);
        long end = System.currentTimeMillis();
        System.out.println(end - ini + " millis");
    }
}
returns
410859520
dda81aea75a83b1489662c6bcd0677e4
1413 millis
and then
[leoks@home ~]$ md5sum /home/leoks/Downloads/VirtualBox-4.3.0.tar
dda81aea75a83b1489662c6bcd0677e4  /home/leoks/Downloads/VirtualBox-4.3.0.tar
[leoks@home ~]$
