How to convert a List to a CSV byte array safely? - java

Initially I had the following code:
Attempt 1
try (var output = new ByteArrayOutputStream();
     var printer = new CSVPrinter(new OutputStreamWriter(output), CSVFormat.DEFAULT)) {
    printer.printRecord(EMAIL);
    for (MyBean mb : items) {
        printer.printRecord(mb.getEmail());
    }
    externalHttpCall(output.toByteArray());
}
Here I found out that sometimes the byte array was not written fully.
I understand that this is because the stream is not flushed before externalHttpCall is invoked.
To fix it I wrote the following:
Attempt 2
try (var output = new ByteArrayOutputStream();
     var printer = new CSVPrinter(new OutputStreamWriter(output), CSVFormat.DEFAULT)) {
    printer.printRecord(EMAIL);
    for (MyBean mb : items) {
        printer.printRecord(mb.getEmail());
    }
    printer.flush();
    log.info("Printer was flushed");
    externalHttpCall(output.toByteArray());
}
That solved the problem, but it struck me that it is a bad idea to close the stream only after externalHttpCall. So I came up with the following solution:
Attempt 3
externalHttpCall(convertToByteArray(items));

public byte[] convertToByteArray(List<MyBean> items) {
    try (var output = new ByteArrayOutputStream();
         var printer = new CSVPrinter(new OutputStreamWriter(output), CSVFormat.DEFAULT)) {
        printer.printRecord(EMAIL);
        for (MyBean mb : items) {
            printer.printRecord(mb.getEmail());
        }
        return output.toByteArray();
    }
}
I expected that the flush would happen before the stream is closed, but based on my experiments it doesn't work. It looks like the flush does happen before the stream is closed, but only after the toByteArray invocation.
How could I fix it?

Given the three code snippets in the question I'd assume that this should work:
externalHttpCall(convertToByteArray(items));

public byte[] convertToByteArray(List<MyBean> items) {
    try (var output = new ByteArrayOutputStream();
         var printer = new CSVPrinter(new OutputStreamWriter(output), CSVFormat.DEFAULT)) {
        printer.printRecord(EMAIL);
        for (MyBean mb : items) {
            printer.printRecord(mb.getEmail());
        }
        printer.flush();
        log.info("Printer was flushed");
        return output.toByteArray();
    }
}
Depending on the CSVFormat, the CSVPrinter is flushed automatically on close (CSVFormat.DEFAULT will not be flushed automatically). You can use CSVFormat's builder-like pattern to make the format flush on close with CSVFormat.DEFAULT.withAutoFlush(true) (thanks to @PetrBodnár for this hint). This will, however, probably make no difference in the above example.
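For illustration, a minimal sketch of the auto-flush variant, assuming Commons CSV 1.6+ (where withAutoFlush is available) and keeping the ByteArrayOutputStream outside the try-with-resources so it can be read after close:

var output = new ByteArrayOutputStream();
CSVFormat autoFlushFormat = CSVFormat.DEFAULT.withAutoFlush(true);
try (var printer = new CSVPrinter(new OutputStreamWriter(output), autoFlushFormat)) {
    printer.printRecord(EMAIL);
    for (MyBean mb : items) {
        printer.printRecord(mb.getEmail());
    }
}
// the printer was flushed on close, so output now holds the complete CSV
externalHttpCall(output.toByteArray());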
If you translate the try-with-resource to the actual call order you will get something like this:
var output = new ByteArrayOutputStream();
var printer = new CSVPrinter(new OutputStreamWriter(output), CSVFormat.DEFAULT);
printer.printRecord(EMAIL);
...
var result = output.toByteArray();
printer.close(); // might call flush
output.close();
return result;
As the close operations will be called in the finally-block, they will take place after creation of the byte array. If flush is needed, you will need to do it prior to calling toByteArray.

The following is a correct usage:
var output = new ByteArrayOutputStream();
try (var printer = new CSVPrinter(
        new OutputStreamWriter(output, StandardCharsets.UTF_8), CSVFormat.DEFAULT)) {
    printer.printRecord(EMAIL);
    for (MyBean mb : items) {
        printer.printRecord(mb.getEmail());
    }
}
// Everything flushed and closed.
externalHttpCall(output.toByteArray());
This error behavior might also stem from something else.
For instance, externalHttpCall not flushing on its side. Or writing the bytes as text (using a Writer instead of an OutputStream) while expecting UTF-8, whose multi-byte sequences are brittle and may raise an exception. Or setting the HTTP Content-Length header incorrectly, e.g. from String.length() instead of the byte count.
Another cause: items containing a null, or getEmail throwing an exception that goes undetected.
Also available:
String s = output.toString(StandardCharsets.UTF_8);
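On the Content-Length point above: when posting the CSV yourself, the length must come from the byte array, not from a String. A minimal sketch with HttpURLConnection (the URL and headers are made up):

byte[] body = output.toByteArray();
HttpURLConnection connection = (HttpURLConnection) new URL("https://example.org/upload").openConnection(); // hypothetical endpoint
connection.setRequestMethod("POST");
connection.setDoOutput(true);
connection.setRequestProperty("Content-Type", "text/csv; charset=UTF-8");
connection.setFixedLengthStreamingMode(body.length); // length in bytes, not String.length()
try (OutputStream os = connection.getOutputStream()) {
    os.write(body);
}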

Related

Cannot read Part's content (Flux<DataBuffer>) into a single String

In the following snippet, I'm trying to extract a file's content (sent to a given service) using Spring's Part object and convert it into a String.
The issue is that it skips the mapper function and the code inside the mapper function doesn't execute, as if the filePartMono's content were empty; but when I inspect the object at runtime, its storage field has the file's data.
public void parseFilePart(Part filePartMono) {
    filePartMono.content().map(dataBuffer -> {
        byte[] bytes = new byte[dataBuffer.readableByteCount()];
        dataBuffer.read(bytes);
        DataBufferUtils.release(dataBuffer);
        String fileContent = new String(bytes, StandardCharsets.UTF_8);
    });
}
}
org.springframework.http.codec.multipart.Part.content() returns a Flux<DataBuffer>, meaning nothing happens until you subscribe to this Publisher.
If your code can be executed in a blocking way without causing errors, you can refactor it like this to get the String result:
public void parseFilePart(Part filePartMono) {
List<String> parts =
filePartMono.content()
.map(dataBuffer -> {
byte[] bytes = new byte[dataBuffer.readableByteCount()];
dataBuffer.read(bytes);
DataBufferUtils.release(dataBuffer);
return new String(bytes, StandardCharsets.UTF_8);
})
.collectList()
.block();
//do what you want here with the Strings you retrieved
}
If you're sure that the Flux<DataBuffer> will always emit 1 single DataBuffer, you can replace .collectList().block() with .blockFirst() and obtain a String result instead of List<String>.
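For example, a minimal sketch of that single-buffer variant (only safe if the Part really is backed by exactly one DataBuffer):

public String parseSingleBufferPart(Part filePart) {
    return filePart.content()
            .map(dataBuffer -> {
                byte[] bytes = new byte[dataBuffer.readableByteCount()];
                dataBuffer.read(bytes);
                DataBufferUtils.release(dataBuffer);
                return new String(bytes, StandardCharsets.UTF_8);
            })
            .blockFirst(); // blocks until the first (and only) buffer has been mapped
}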
If your code can't be executed in a blocking fashion, then you could refactor it like this:
public void parseFilePart(Part filePartMono) {
filePartMono.content()
.map(dataBuffer -> {
byte[] bytes = new byte[dataBuffer.readableByteCount()];
dataBuffer.read(bytes);
DataBufferUtils.release(dataBuffer);
return new String(bytes, StandardCharsets.UTF_8);
})
.subscribe(resultString -> {
//do what you want with the result String here
});
}
P.S. I didn't test your implementation for converting the DataBuffer to a String, so you might have to double-check it now that it's actually invoked.
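If the content can span several DataBuffers but you still want one String (rather than a List<String> cut at arbitrary buffer boundaries), an alternative sketch using DataBufferUtils.join, which merges the buffers before decoding (untested against your setup):

Mono<String> fileContent = DataBufferUtils.join(filePartMono.content())
        .map(dataBuffer -> {
            byte[] bytes = new byte[dataBuffer.readableByteCount()];
            dataBuffer.read(bytes);
            DataBufferUtils.release(dataBuffer);
            return new String(bytes, StandardCharsets.UTF_8);
        });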

Batching multiple files to Amazon S3 using the Java SDK

I'm trying to upload multiple files to Amazon S3 all under the same key, by appending the files. I have a list of file names and want to upload/append the files in that order. I am pretty much exactly following this tutorial but I am looping through each file first and uploading that in part. Because the files are on hdfs (the Path is actually org.apache.hadoop.fs.Path), I am using the input stream to send the file data. Some pseudocode is below (I am commenting the blocks that are word for word from the tutorial):
// Create a list of UploadPartResponse objects. You get one of these for
// each part upload.
List<PartETag> partETags = new ArrayList<PartETag>();
// Step 1: Initialize.
InitiateMultipartUploadRequest initRequest = new InitiateMultipartUploadRequest(
bk.getBucket(), bk.getKey());
InitiateMultipartUploadResult initResponse =
s3Client.initiateMultipartUpload(initRequest);
try {
int i = 1; // part number
for (String file : files) {
Path filePath = new Path(file);
// Get the input stream and content length
long contentLength = fss.get(branch).getFileStatus(filePath).getLen();
InputStream is = fss.get(branch).open(filePath);
long filePosition = 0;
while (filePosition < contentLength) {
// create request
//upload part and add response to our list
i++;
}
}
// Step 3: Complete.
CompleteMultipartUploadRequest compRequest = new
CompleteMultipartUploadRequest(bk.getBucket(),
bk.getKey(),
initResponse.getUploadId(),
partETags);
s3Client.completeMultipartUpload(compRequest);
} catch (Exception e) {
//...
}
However, I am getting the following error:
com.amazonaws.services.s3.model.AmazonS3Exception: The XML you provided was not well-formed or did not validate against our published schema (Service: Amazon S3; Status Code: 400; Error Code: MalformedXML; Request ID: 2C1126E838F65BB9), S3 Extended Request ID: QmpybmrqepaNtTVxWRM1g2w/fYW+8DPrDwUEK1XeorNKtnUKbnJeVM6qmeNcrPwc
at com.amazonaws.http.AmazonHttpClient.handleErrorResponse(AmazonHttpClient.java:1109)
at com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:741)
at com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:461)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:296)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3743)
at com.amazonaws.services.s3.AmazonS3Client.completeMultipartUpload(AmazonS3Client.java:2617)
If anyone knows what the cause of this error might be, that would be greatly appreciated. Alternatively, if there is a better way to concatenate a bunch of files into one S3 key, that would be great as well. I tried using Java's built-in SequenceInputStream but that did not work. Any help would be greatly appreciated. For reference, the total size of all the files could be as large as 10-15 GB.
I know it's probably a bit late but worth giving my contribution.
I've managed to solve a similar problem using the SequenceInputStream.
The trick is to calculate the total size of the resulting file up front and then feed the SequenceInputStream with an Enumeration<InputStream>.
Here's some example code that might help:
public void combineFiles() {
List<String> files = getFiles();
long totalFileSize = files.stream()
.map(this::getContentLength)
.reduce(0L, (f, s) -> f + s);
try {
try (InputStream partialFile = new SequenceInputStream(getInputStreamEnumeration(files))) {
ObjectMetadata resultFileMetadata = new ObjectMetadata();
resultFileMetadata.setContentLength(totalFileSize);
s3Client.putObject("bucketName", "resultFilePath", partialFile, resultFileMetadata);
}
} catch (IOException e) {
LOG.error("An error occurred while combining files. {}", e);
}
}
private Enumeration<? extends InputStream> getInputStreamEnumeration(List<String> files) {
return new Enumeration<InputStream>() {
private Iterator<String> fileNamesIterator = files.iterator();
@Override
public boolean hasMoreElements() {
return fileNamesIterator.hasNext();
}
@Override
public InputStream nextElement() {
try {
return new FileInputStream(Paths.get(fileNamesIterator.next()).toFile());
} catch (FileNotFoundException e) {
System.err.println(e.getMessage());
throw new RuntimeException(e);
}
}
};
}
Hope this helps!
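For completeness, getFiles() and getContentLength(...) are not shown above; a minimal sketch of what they might look like for plain local files (hypothetical helpers with made-up paths; adjust for HDFS or wherever your files actually live):

private List<String> getFiles() {
    // hypothetical: the files to concatenate, in the order they should appear
    return Arrays.asList("/data/part-0001.csv", "/data/part-0002.csv");
}

private long getContentLength(String file) {
    // hypothetical: size of a local file; on HDFS you would use FileSystem#getFileStatus(path).getLen() instead
    return new File(file).length();
}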

How to sign an InputStream from a PDF file with PDFBox 2.0.0

I want to sign an InputStream from a PDF file without using a temporary file.
Here I convert the InputStream to a File and this works fine:
InputStream inputStream = this.signatureObjPAdES.getSignatureDocument().getInputStream();
OutputStream outputStream = new FileOutputStream(new File("C:/temp.pdf"));
int read = 0;
byte[] bytes = new byte[1024];
while ((read = inputStream.read(bytes)) != -1) {
outputStream.write(bytes, 0, read);
}
PDDocument document = PDDocument.load(new File("C:/temp.pdf"));
...
document.addSignature(new PDSignature(this.dts.getDocumentTimeStamp()), this);
document.saveIncremental(new FileOutputStream("C:/result.pdf"));
document.close();
But I want to do this directly:
PDDocument document = PDDocument.load(inputStream);
Problem at runtime:
Exception in thread "main" java.lang.NullPointerException
at java.io.RandomAccessFile.<init>(Unknown Source)
at org.apache.pdfbox.io.RandomAccessBufferedFileInputStream.<init>(RandomAccessBufferedFileInputStream.java:77)
at org.apache.pdfbox.pdmodel.PDDocument.saveIncremental(PDDocument.java:961)
All ideas are welcome.
Thank you.
EDIT:
It's now working with the release of PDFBox 2.0.0.
The cause
The immediate hindrance is in the method PDDocument.saveIncremental() itself:
public void saveIncremental(OutputStream output) throws IOException
{
    InputStream input = new RandomAccessBufferedFileInputStream(incrementalFile);
    COSWriter writer = null;
    try
    {
        writer = new COSWriter(output, input);
        writer.write(this, signInterface);
        writer.close();
    }
    finally
    {
        if (writer != null)
        {
            writer.close();
        }
    }
}
(PDDocument.java)
The member incrementalFile used in the first line is only set during a PDDocument.load with a File parameter.
Thus, this method cannot be used.
A work-around
Fortunately the method PDDocument.saveIncremental() only uses methods and values publicly available with the sole exception of signInterface, but you know the value of it because you set it in your code in the line right before the saveIncremental call:
document.addSignature(new PDSignature(this.dts.getDocumentTimeStamp()), this);
document.saveIncremental(new FileOutputStream("C:/result.pdf"));
Thus, instead of calling PDDocument.saveIncremental() you can do the equivalent in your code.
To do so you furthermore need a replacement value for the InputStream input. It needs to return a stream with the identical content as inputStream in your
PDDocument document = PDDocument.load(inputStream);
So you need to use that stream twice. As you have not said whether that inputStream can be reset, we'll first copy it into a byte[] which we forward both to PDDocument.load and new COSWriter.
Thus, replace your
PDDocument document = PDDocument.load(inputStream);
...
document.addSignature(new PDSignature(this.dts.getDocumentTimeStamp()), this);
document.saveIncremental(new FileOutputStream("C:/result.pdf"));
document.close();
by
byte[] inputBytes = IOUtils.toByteArray(inputStream);
PDDocument document = PDDocument.load(new ByteArrayInputStream(inputBytes));
...
document.addSignature(new PDSignature(this.dts.getDocumentTimeStamp()), this);
saveIncremental(new FileOutputStream("C:/result.pdf"),
new ByteArrayInputStream(inputBytes), document, this);
document.close();
and add a new method saveIncremental to your class inspired by the original PDDocument.saveIncremental():
void saveIncremental(OutputStream output, InputStream input, PDDocument document, SignatureInterface signatureInterface) throws IOException
{
    COSWriter writer = null;
    try
    {
        writer = new COSWriter(output, input);
        writer.write(document, signatureInterface);
        writer.close();
    }
    finally
    {
        if (writer != null)
        {
            writer.close();
        }
    }
}
On the side
I said above
As you have not said whether that inputStream can be reset, we'll first copy it into a byte[] which we forward both to PDDocument.load and new COSWriter.
Actually there is another reason to do so: COSWriter.doWriteSignature() retrieves the length of the original PDF like this:
long inLength = incrementalInput.available();
(COSWriter.java)
The documentation of InputStream.available() states, though:
Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not.
To re-use inputStream instead of using a byte[] and ByteArrayInputStreams as above, therefore, inputStream not only needs to support reset() but also needs to be one of the few InputStream implementations which return the total number of bytes in the stream as available.
FileInputStream and ByteArrayInputStream both do return the total number of bytes in the stream as available.
There may still be more issues when using generic InputStreams instead of these two.
Hey Cyril Bremaud, you can use this approach. Since the PDDocument class has several overloaded load methods, you can provide just the file path if you like and it will work as well. But for your requirement to pass an InputStream directly to PDDocument, use this code:
lStrInputPDFfile = "samples_pdf_signing\\Country Calendar.pdf";
lOsPDFInput = new java.io.FileInputStream(lStrInputPDFfile);
jPDFDocument = org.apache.pdfbox.pdmodel.PDDocument.load(lOsPDFInput);
But this also works in my case:
lStrInputPDFfile = "samples_pdf_signing\\Country Calendar.pdf";
jPDFDocument = org.apache.pdfbox.pdmodel.PDDocument.load(new java.io.File(lStrInputPDFfile));
Note: InputStream is a parent class of FileInputStream, and that is why the above code works.
Updated my code, please check again. Thanks to @mkl for pointing that out.

reverse chunks in a file

My basic Java problem is this: I need to read in a file by chunks, then reverse the order of the chunks, then write that out to a new file. My first (naive) attempt followed this approach:
read a chunk from the file.
reverse the bytes of the chunk
push the bytes one at a time to the front of a results list
repeat for all chunks
write result list to new file.
So this is basically a very stupid and slow way to solve the problem, but it generates the correct output that I am looking for. To try to improve the situation, I changed to this algorithm:
read a chunk from the file
push that chunk onto the front of a list of arrays
repeat for all chunks
foreach chunk, write to new file
And to my mind, that should produce the same output, except it doesn't, and I am quite confused. The first chunk in the result file matches with both methods, but the rest of the file is completely different.
Here is the meat of the Java code I am using:
FileInputStream in;
FileOutputStream out, out2;
Byte[] t = new Byte[0];
LinkedList<Byte> reversed_data = new LinkedList<Byte>();
byte[] data = new byte[bufferSize];
LinkedList<byte[]> revd2 = new LinkedList<byte[]>();
try {
in = new FileInputStream(infile);
out = new FileOutputStream(outfile1);
out2 = new FileOutputStream(outfile2);
} catch (FileNotFoundException e) {
e.printStackTrace();
return;
}
while(in.read(data) != -1)
{
revd2.addFirst(data);
byte[] revd = reverse(data);
for (byte b : revd)
{
reversed_data.addFirst(b);
}
}
for (Byte b : reversed_data)
{
out.write(b);
}
for (byte[] b : revd2)
{
out2.write(b);
}
At http://pastie.org/3113665 you can see a complete example program (along with my debugging attempts). For simplicity I am using a bufferSize that evenly divides the size of the file, so all chunks will be the same size, but this won't hold in the real world. My question is: WHY don't these two methods generate the same output? It's driving me crazy because I can't grok it.
You're constantly overwriting the data you've read previously.
while(in.read(data) != -1)
{
revd2.addFirst(data);
// ignore byte-wise stuff
}
You're adding the same object repeatedly to the list revd2, so each list node will finally contain a reference to data filled with the result of the last read. I suggest replacing that with revd2.addFirst(data.clone()).
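A minimal sketch of the corrected loop, using Arrays.copyOf instead of clone() so the possibly shorter final chunk is also trimmed to the number of bytes actually read:

int n;
while ((n = in.read(data)) != -1) {
    // copy the chunk: a later read() would otherwise overwrite the array we stored
    revd2.addFirst(java.util.Arrays.copyOf(data, n));
}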
My guess is you want to change
revd2.addFirst(data);
byte[] revd = reverse(data);
to the following, so the reversed copy is added to the start of the list:
byte[] revd = reverse(data);
revd2.addFirst(revd);

writing java testng test cases

I am beginning with Java and testng test cases.
I need to write a class, which reads data from a file and makes an in-memory data structure and uses this data structure for further processing. I would like to test, if this DS is being populated correctly. This would call for dumping the DS into a file and then comparing the input file with the dumped file. Is there any testNG assert available for file matching? Is this a common practice?
I think it would be better to compare the data itself, not the written-out data.
So I would write a method in the class to return this data structure (let's call it getDataStructure()) and then write a unit test to compare with the correct data.
This only requires a correct equals() method in your data structure class; then you can do:
Assert.assertEquals(yourClass.getDataStructure(), correctData);
Of course if you need to write out the data structure to a file, then you can test the serialization and deserialization separately.
File compare/matching can be extracted to a utility method or something like that.
If you need it only for testing, there are add-ons for JUnit:
http://junit-addons.sourceforge.net/junitx/framework/FileAssert.html
If you need file comparison outside the testing environment, you can use this simple function:
public static boolean fileContentEquals(String filePathA, String filePathB) throws Exception {
if (!compareFilesLength(filePathA, filePathB)) return false;
BufferedInputStream streamA = null;
BufferedInputStream streamB = null;
try {
File fileA = new File(filePathA);
File fileB = new File(filePathB);
streamA = new BufferedInputStream(new FileInputStream(fileA));
streamB = new BufferedInputStream(new FileInputStream(fileB));
int chunkSizeInBytes = 16384;
byte[] bufferA = new byte[chunkSizeInBytes];
byte[] bufferB = new byte[chunkSizeInBytes];
int totalReadBytes = 0;
while (totalReadBytes < fileA.length()) {
int readBytes = streamA.read(bufferA);
streamB.read(bufferB);
if (readBytes == 0) break;
MessageDigest digestA = MessageDigest.getInstance(CHECKSUM_ALGORITHM);
MessageDigest digestB = MessageDigest.getInstance(CHECKSUM_ALGORITHM);
digestA.update(bufferA, 0, readBytes);
digestB.update(bufferB, 0, readBytes);
if (!MessageDigest.isEqual(digestA.digest(), digestB.digest()))
{
closeStreams(streamA, streamB);
return false;
}
totalReadBytes += readBytes;
}
closeStreams(streamA, streamB);
return true;
} finally {
closeStreams(streamA, streamB);
}
}
public static void closeStreams(Closeable ...streams) {
for (int i = 0; i < streams.length; i++) {
Closeable stream = streams[i];
closeStream(stream);
}
}
public static boolean compareFilesLength(String filePathA, String filePathB) {
File fileA = new File(filePathA);
File fileB = new File(filePathB);
return fileA.length() == fileB.length();
}
private static void closeStream(Closeable stream) {
try {
stream.close();
} catch (IOException e) {
// ignore exception
}
}
Your choice, but having a utility class with that functionality that can be reused is better IMHO.
Good luck and have fun.
Personally I would do the opposite. Surely you need a way to compare two of these data structures in the Java world, so the test would read from the file, build the DS, do its processing, and then assert it's equal to an "expected" DS you set up in your test.
(using JUnit4)
@Test
public void testProcessingDoesWhatItShould() {
    final DataStructure original = readFromFile(filename);
    final DataStructure actual = doTheProcessingYouNeedToDo(original);
    final DataStructure expected = generateMyExpectedResult();
    Assert.assertEquals("data structure", expected, actual);
}
If this DS is a simple Java bean, then you can use EqualsBuilder from Apache Commons to compare two objects.
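For instance, a minimal sketch with commons-lang3's reflection-based comparison (the bean variable names are made up):

import org.apache.commons.lang3.builder.EqualsBuilder;

// compares all fields of the two beans via reflection
Assert.assertTrue(EqualsBuilder.reflectionEquals(expectedBean, actualBean));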
Compare the bytes loaded from the file system with the bytes you are going to write to the file system.
Pseudo code:
byte[] loadedBytes = loadFileContentFromFile(file); // e.g. Apache Commons IOUtils.toByteArray(InputStream input)
byte[] writtenBytes = constructBytesFromDataStructure(dataStructure);
Assert.assertTrue(java.util.Arrays.equals(writtenBytes, loadedBytes));
