Adding up ByteArrayOutputStreams - java

Im having a bunch of ByteArrayOutputstreams onto which pdf reports are written over different parts of a particular workflow. I use IText to accomplish this. Now, at the end I would like to group all these single ByteArrayOutputstreams into a bigger ByteArrayOutputstream so as to group all pdf reports together.
I looked at Apache Commons library but could not find anything of use.
1 way that i know of is to convert each of these ByteArrayOutputstreams into byte[] and then using System.arraycopy to copy them into a bigger byte[]. The problem with this is me having to declare the size of the result byte[] upfront which makes it non-ideal.
Is there any other way to copy/append to/concatenate ByteArrayOutputStreams taht i may have missed ?

You can use a List<Byte[]> for that. You can add your bytes on the go to the list.
List<Byte[]> listOfAllBytes = new ArrayList<Byte[]>;
ByteArrayOutputstreams byteArray = //...
listOfAllBytes.add(byteArray.toByteArray);
At the end, you can just get the full byte array back.

Write all but one of their toByteArray() results into the remaining one.

public class Concatenate {
/** The resulting PDF file. */
public static final String RESULT
= "path/to/concatenated_pdf.pdf";
public static void main(String[] args)
throws IOException, DocumentException {
String[] files = { "1.pdf", "2.pdf" };
Document document = new Document();
PdfCopy copy = new PdfCopy(document, new FileOutputStream(RESULT));
document.open();
PdfReader reader;
int n;
// loop over the documents you want to concatenate
for (int i = 0; i < files.length; i++) {
reader = new PdfReader(files[i]);
// loop over the pages in that document
n = reader.getNumberOfPages();
for (int page = 0; page < n; ) {
copy.addPage(copy.getImportedPage(reader, ++page));
}
copy.freeReader(reader);
}
document.close();
}
}

Related

Background image itextpdf 5.5

I'm using itextpdf 5.5, can't change it to 7.
I have the problem with background image.
I have a document (text and tables) without stamp and I want to add stamp to it.
This is how I download existing doc.
PdfReader pdfReader = new PdfReader("/doc.pdf");
PdfImportedPage page = writer.getImportedPage(pdfReader, 1);
PdfContentByte pcb = writer.getDirectContent();
pcb.addTemplate(page, 0,0);
And this is how I download stamp image and add it to my doc.
PdfContentByte canvas = writer.getDirectContentUnder();
URL resource = getClass().getResource(getStamp());
Image background = new Jpeg(resource);
background.scaleToFit(463F, 132F);
background.setAbsolutePosition(275F, 100F);
canvas.addImage(background);
But when I download my document - I don't see the stamp. I tried to change getDirectContent() to getDirectContentUnder() when I download my doc but this leads to the opposite situation - my stamp isn't in background.
My first doc is generated this way.
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
Document document = new Document(PageSize.A4);
try {
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
document.open();
Paragraph title = new Paragraph(formatUtil.msg("my.header"), fontBold);
title.setAlignment(Element.ALIGN_CENTER);
document.add(title);
Template tmpl = fmConfig.getConfiguration().getTemplate("template.ftl");
Map<String, Object> params = new HashMap<>();
StringWriter writer = new StringWriter();
params.put("param", "param");
tmpl.process(params, writer);
document.add(new Paragraph(writer.toString(), fontCommon));
PdfPTable table = new PdfPTable(2);
document.add(table);
PdfContentByte canvas = writer.getDirectContentUnder();
Image background = new Jpeg(getClass().getResource("background.jpg"));
background.scaleAbsolute(PageSize.A4);
background.setAbsolutePosition(0,0);
canvas.addImage(background);
} finally {
if (document.isOpen()) {
document.close();
}
}
In comments it became clear that the task was to add some content (a bitmap image) to a PDF so that it is over the background (another bitmap image) added to the UnderContent during generation and under the text in the original DirectContent.
iText does not contain high-level code for such content manipulation. While the structure of iText generated PDFs would allow for such code, PDFs generated or manipulated by other PDF libraries may have a different structure; the structure in iText generated PDFs may even be changed during post-processing using other libraries. Thus, it is understandable that no high-level feature for this is provided by iText.
To implement the task nonetheless, therefore, we have to base our code on lower level iText APIs. In this context we can make use of the PdfContentStreamEditor helper class from this answer which already abstracts some details away. (That question originally is about iTextSharp for C# but further down in the answer Java versions of the code also are provided.)
In detail, we extend the PdfContentStreamEditor to remove the former UnderContent (and provide it as list of instructions). Now we can in a first step apply this editor to your file and in a second step add this former UnderContent plus an image over it to the intermediary file. (This could also be done in a single step but that would require a more complex, less maintainable editor class.)
First the new UnderContentRemover content stream editor class:
public class UnderContentRemover extends PdfContentStreamEditor {
/**
* Clears state of {#link UnderContentRemover}, in particular
* the collected content. Use this if you use this instance for
* multiple edit runs.
*/
public void clear() {
afterUnderContent = false;
underContent.clear();
depth = 0;
}
/**
* Retrieves the collected UnderContent instructions
*/
public List<List<PdfObject>> getUnderContent() {
return new ArrayList<List<PdfObject>>(underContent);
}
/**
* Adds the given instructions (which may previously have been
* retrieved using {#link #getUnderContent()}) to the given
* {#link PdfContentByte} instance.
*/
public static void write (PdfContentByte canvas, List<List<PdfObject>> operations) throws IOException {
for (List<PdfObject> operands : operations) {
int index = 0;
for (PdfObject object : operands) {
object.toPdf(canvas.getPdfWriter(), canvas.getInternalBuffer());
canvas.getInternalBuffer().append(operands.size() > ++index ? (byte) ' ' : (byte) '\n');
}
}
}
protected void write(PdfContentStreamProcessor processor, PdfLiteral operator, List<PdfObject> operands) throws IOException {
String operatorString = operator.toString();
if (afterUnderContent) {
super.write(processor, operator, operands);
return;
} else if ("q".equals(operatorString)) {
depth++;
} else if ("Q".equals(operatorString)) {
depth--;
if (depth < 1)
afterUnderContent = true;
} else if (depth == 0) {
afterUnderContent = true;
super.write(processor, operator, operands);
return;
}
underContent.add(new ArrayList<>(operands));
}
boolean afterUnderContent = false;
List<List<PdfObject>> underContent = new ArrayList<>();
int depth = 0;
}
(UnderContentRemover)
As you see, its write method stores the leading instructions forwarded to it in the underContent list until it finds the restore-graphics-state (Q) instruction matching the initial save-graphics-state instruction (q). After that it instead forwards all further instructions to the parent write implementation which writes them to the edited page content.
We can use this for the task at hand as follows:
PdfReader pdfReader = new PdfReader(YOUR_DOCUMENT);
List<List<List<PdfObject>>> underContentByPage = new ArrayList<>();
byte[] sourceWithoutUnderContent = null;
try ( ByteArrayOutputStream outputStream = new ByteArrayOutputStream() ) {
PdfStamper pdfStamper = new PdfStamper(pdfReader, outputStream);
UnderContentRemover underContentRemover = new UnderContentRemover();
for (int i = 1; i <= pdfReader.getNumberOfPages(); i++) {
underContentRemover.clear();
underContentRemover.editPage(pdfStamper, i);
underContentByPage.add(underContentRemover.getUnderContent());
}
pdfStamper.close();
pdfReader.close();
sourceWithoutUnderContent = outputStream.toByteArray();
}
Image background = YOUR_IMAGE_TO_ADD_INBETWEEN;
background.scaleToFit(463F, 132F);
background.setAbsolutePosition(275F, 100F);
pdfReader = new PdfReader(sourceWithoutUnderContent);
byte[] sourceWithStampInbetween = null;
try ( ByteArrayOutputStream outputStream = new ByteArrayOutputStream() ) {
PdfStamper pdfStamper = new PdfStamper(pdfReader, outputStream);
for (int i = 1; i <= pdfReader.getNumberOfPages(); i++) {
PdfContentByte canvas = pdfStamper.getUnderContent(i);
UnderContentRemover.write(canvas, underContentByPage.get(i-1));
canvas.addImage(background);
}
pdfStamper.close();
pdfReader.close();
sourceWithStampInbetween = outputStream.toByteArray();
}
Files.write(new File("PdfLikeVladimirSafonov-WithStampInbetween.pdf").toPath(), sourceWithStampInbetween);
(AddImageInBetween test testForVladimirSafonov)

Using iText 2.1.7 to merge large PDFs

I am using an older version of iText (2.1.7) to merge PDFs. Because that is the last version under the MPL available to me. I cannot change this.
Anyways. I am trying to merge multiple PDFs. Everything seems to work ok, but when I go over about 1500 pages, then the generated PDF fails to open (behaves as if it is corrupted)
This is how I am doing it:
private byte[] mergePDFs(List<byte[]> pdfBytesList) throws DocumentException, IOException {
Document document = new Document();
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
PdfCopy copy = new PdfCopy(document, outputStream);
document.open();
for (byte[] pdfByteArray : pdfBytesList) {
ByteArrayInputStream readerStream = new ByteArrayInputStream(pdfByteArray);
PdfReader reader = new PdfReader(readerStream);
for (int i = 0; i < reader.getNumberOfPages(); ) {
copy.addPage(copy.getImportedPage(reader, ++i));
}
copy.freeReader(reader);
reader.close();
}
document.close();
return outputStream.toByteArray();
}
Is this the correct approach? Is there anything about this that would hint at breaking when going over a certain amount of pages? There are no exceptions thrown or anything.
For anyone curious, the issue had nothing to do with iText and instead was the code responsible for returning the response from iText.

Extracting an embedded object from a pdf

I had embedded a byte array into a pdf file (Java).
Now I am trying to extract that same array.
The array was embedded as a "MOVIE" file.
I couldn't find any clue on how to do that...
Any ideas?
Thanks!
EDIT
I used this code to embed the byte array:
public static void pack(byte[] file) throws IOException, DocumentException{
Document document = new Document();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(RESULT));
writer.setPdfVersion(PdfWriter.PDF_VERSION_1_7);
writer.addDeveloperExtension(PdfDeveloperExtension.ADOBE_1_7_EXTENSIONLEVEL3);
document.open();
RichMediaAnnotation richMedia = new RichMediaAnnotation(writer, new Rectangle(0,0,0,0));
PdfFileSpecification fs
= PdfFileSpecification.fileEmbedded(writer, null, "test.avi", file);
PdfIndirectReference asset = richMedia.addAsset("test.avi", fs);
RichMediaConfiguration configuration = new RichMediaConfiguration(PdfName.MOVIE);
RichMediaInstance instance = new RichMediaInstance(PdfName.MOVIE);
RichMediaParams flashVars = new RichMediaParams();
instance.setAsset(asset);
configuration.addInstance(instance);
RichMediaActivation activation = new RichMediaActivation();
richMedia.setActivation(activation);
PdfAnnotation richMediaAnnotation = richMedia.createAnnotation();
richMediaAnnotation.setFlags(PdfAnnotation.FLAGS_PRINT);
writer.addAnnotation(richMediaAnnotation);
document.close();
I have written a brute force method to extract all streams in a PDF and store them as a file without an extension:
public static final String SRC = "resources/pdfs/image.pdf";
public static final String DEST = "results/parse/stream%s";
public static void main(String[] args) throws IOException {
File file = new File(DEST);
file.getParentFile().mkdirs();
new ExtractStreams().parse(SRC, DEST);
}
public void parse(String src, String dest) throws IOException {
PdfReader reader = new PdfReader(src);
PdfObject obj;
for (int i = 1; i <= reader.getXrefSize(); i++) {
obj = reader.getPdfObject(i);
if (obj != null && obj.isStream()) {
PRStream stream = (PRStream)obj;
byte[] b;
try {
b = PdfReader.getStreamBytes(stream);
}
catch(UnsupportedPdfException e) {
b = PdfReader.getStreamBytesRaw(stream);
}
FileOutputStream fos = new FileOutputStream(String.format(dest, i));
fos.write(b);
fos.flush();
fos.close();
}
}
}
Note that I get all PDF objects that are streams as a PRStream object. I also use two different methods:
When I use PdfReader.getStreamBytes(stream), iText will look at the filter. For instance: page content streams consists of PDF syntax that is compressed using /FlateDecode. By using PdfReader.getStreamBytes(stream), you will get the uncompressed PDF syntax.
Not all filters are supported in iText. Take for instance /DCTDecode which is the filter used to store JPEGs inside a PDF. Why and how would you "decode" such a stream? You wouldn't, and that's when we use PdfReader.getStreamBytesRaw(stream) which is also the method you need to get your AVI-bytes from your PDF.
This example already gives you the methods you'll certainly need to extract PDF streams. Now it's up to you to find the path to the stream you need. That calls for iText RUPS. With iText RUPS you can look at the internal structure of a PDF file. In your case, you need to find the annotations as is done in this question: All links of existing pdf change the action property to inherit zoom - iText library
You loop over the page dictionaries, then loop over the /Annots array of this dictionary (if it's present), but instead of checking for /Link annotations (which is what was asked in the question I refer to), you have to check for /RichMedia annotations and from there examine the assets until you find the stream that contains the AVI file. RUPS will show you how to dive into the annotation dictionary.

Remove rectangles from PDF file

I'd like to have a program that removes all rectangles from a PDF file. One use case for this is to unblacken a given PDF file to see if there is any hidden information behind the rectangles. The rest of the PDF file should be kept as-is.
Which PDF library is suitable to this task? In Java, I would like the code to look like this:
PdfDocument doc = PdfDocument.load(new File("original.pdf"));
PdfDocument unblackened = doc.transform(new CopyingPdfVisitor() {
public void visitRectangle(PdfRect rect) {
if (rect.getFillColor().getBrightness() >= 0.1) {
super.visitRectangle(rect);
}
}
});
unblackened.save(new File("unblackened.pdf"));
The CopyingPdfVisitor would copy a PDF document exactly as-is, and my custom code would leave out all the dark rectangles.
Itext pdf library have ways to modify pdf content.
The *ITEXT CONTENTPARSER Example * may give you any idea. "qname" parameter (qualified name) may be used to detected rectangle element.
http://itextpdf.com/book/chapter.php?id=15
Other option, if you want obtain the text on the document use the PdfReaderContentParser to extract text content
public void parsePdf(String pdf, String txt) throws IOException {
PdfReader reader = new PdfReader(pdf);
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
PrintWriter out = new PrintWriter(new FileOutputStream(txt));
TextExtractionStrategy strategy;
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
strategy = parser.processContent(i, new SimpleTextExtractionStrategy());
out.println(strategy.getResultantText());
}
out.flush();
out.close();
reader.close();
}
example at http://itextpdf.com/examples/iia.php?id=277

Read .bmp image and subtract 10 from 10th byte of the image and re-create the image in Java

I am creating an application which will read image byte/pixel/data from an .bmp image and store it in an byte/char/int/etc array.
Now, from this array, I want to subtract 10 (in decimal) from the data stored in the 10th index of an array.
I am able to successfully store the image information in the array created. But when I try to write the array information back to .bmp image, the image created is not viewable.
This is the piece of code which I tried to do so.
In this code, I am not subtracting 10 from the 10th index of an array.
public class Test1 {
public static void main(String[] args) throws IOException{
File inputFile = new File("d://test.bmp");
FileReader inputStream = new FileReader("d://test.bmp");
FileOutputStream outputStream = new FileOutputStream("d://test1.bmp");
/*
* Create byte array large enough to hold the content of the file.
* Use File.length to determine size of the file in bytes.
*/
char fileContent[] = new char[(int)inputFile.length()];
for(int i = 0; i < (int)inputFile.length(); i++){
fileContent[i] = (char) inputStream.read();
}
for(int i = 0; i < (int)inputFile.length(); i++){
outputStream.write(fileContent[i]);
}
}
}
Instead of char[], use byte[]
Here's a modified version if your code which works:
public class Test {
public static void main(String[] args) throws IOException {
File inputFile = new File("someinputfile.bmp");
FileOutputStream outputStream = new FileOutputStream("outputfile.bmp");
/*
* Create byte array large enough to hold the content of the file.
* Use File.length to determine size of the file in bytes.
*/
byte fileContent[] = new byte[(int)inputFile.length()];
new FileInputStream(inputFile).read(fileContent);
for(int i = 0; i < (int)inputFile.length(); i++){
outputStream.write(fileContent[i]);
}
outputStream.close();
}
}
To make your existing code work you should replace the FileReader with a FileInputStream. According to the FileReader javadoc:
FileReader is meant for reading streams of characters. For reading streams of raw bytes, consider using a FileInputStream.
Modifying your sample as below
public static void main(String[] args) throws IOException
{
File inputFile = new File("d://test.bmp");
FileInputStream inputStream = new FileInputStream("d://test.bmp");
FileOutputStream outputStream = new FileOutputStream("d://test1.bmp");
/*
* Create byte array large enough to hold the content of the file.
* Use File.length to determine size of the file in bytes.
*/
byte fileContent[] = new byte[(int)inputFile.length()];
for(int i = 0; i < (int)inputFile.length(); i++){
fileContent[i] = (byte) inputStream.read();
}
inputStream.close();
for(int i = 0; i < (int)inputFile.length(); i++){
outputStream.write(fileContent[i]);
}
outputStream.flush();
outputStream.close();
}
This work for me to create a copy of the original image.
Though as mentioned in the comments above this is probably not the correct approach for what you are trying to achieve.
Other have pointed you at errors in your code (using char instead of byte mostly), however, even if you fix that, you probably will end up with a non-loadable image if you change the value of the 10th byte in the file.
This is because, a .bmp image file starts with an header containing information about the file (color depth, dimensions, ... see BMP file format) before any actual image data. Specifically, the 10th byte is part of a 4 byte integer storing the offset of the actual image data (pixel array). So subtracting 10 from this value will probably make the offset pointing at the wrong point in the file, and your image loader doing bound checking will probably consider this invalid.
What you really want to do is load the image as an image and manipulate the pixel values directly. Something like that:
BufferedImage originalImage = ImageIO.read(new File("d://test.bmp"));
int rgb = originalImage.getRGB(10, 0);
originalImage.setRGB(rgb >= 10 ? rgb - 10 : 0);
ImageIO.write(originalImage, "bmp", new File("d://test1.bmp"));

Categories

Resources