Merge documents with PDFMergerUtility in PDFBox 2.0.0


With PDFBox 1.8.x, merging documents with mergePdf.mergeDocuments() works fine. In PDFBox 2.0.0, the method takes an argument: org.apache.pdfbox.multipdf.PDFMergerUtility.mergeDocuments(MemoryUsageSetting arg0).
What is MemoryUsageSetting, and how do I use it with mergeDocuments? The Javadoc only says "Merge the list of source documents, saving the result in the destination file." Could someone provide code equivalent to the following for version 2.0.0?
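import java.io.File;
import org.apache.pdfbox.util.PDFMergerUtility; // PDFBox 1.8.x package

public void combine()
{
    try
    {
        PDFMergerUtility mergePdf = new PDFMergerUtility();
        String folder = "pdf";
        File _folder = new File(folder);
        File[] filesInFolder = _folder.listFiles(); // null if "pdf" is not a directory
        if (filesInFolder == null)
        {
            return;
        }
        for (File file : filesInFolder)
        {
            mergePdf.addSource(file);
        }
        mergePdf.setDestinationFileName("Combined.pdf");
        mergePdf.mergeDocuments();
    }
    catch (Exception e)
    {
        e.printStackTrace(); // don't swallow errors silently
    }
}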

According to the javadoc, MemoryUsageSetting controls how memory/temporary files are used for buffering.
The two easiest usages are:
MemoryUsageSetting.setupMainMemoryOnly()
this sets buffering memory usage to only use main-memory (no temporary file) which is not restricted in size.
MemoryUsageSetting.setupTempFileOnly()
this sets buffering memory usage to only use temporary file(s) (no main-memory) which is not restricted in size.
So for you, the call would be
mergePdf.mergeDocuments(MemoryUsageSetting.setupMainMemoryOnly());
or
mergePdf.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
Or just pass null. This defaults to main memory only, which is also what the Javadoc says: memUsageSetting defines how memory is used for buffering PDF streams; in case of null, unrestricted main memory is used.
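For a complete 2.0.x counterpart to the question's combine(), here is a sketch assuming PDFBox 2.0.x on the classpath. The class name PdfCombiner, the folder/destination parameters, and the ".pdf" filename filter are illustrative choices, not part of the original code; it picks the temp-file setting, but either setting from the answer (or null) works the same way.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;

import org.apache.pdfbox.io.MemoryUsageSetting;
import org.apache.pdfbox.multipdf.PDFMergerUtility;

public class PdfCombiner {

    /**
     * Merges every *.pdf file in folder (sorted by name) into destination.
     * Assumes PDFBox 2.0.x; PdfCombiner is a hypothetical helper class.
     */
    public static void combine(File folder, String destination) throws IOException {
        File[] pdfs = folder.listFiles((dir, name) -> name.toLowerCase().endsWith(".pdf"));
        if (pdfs == null || pdfs.length == 0) {
            throw new IOException("No PDF files found in " + folder);
        }
        Arrays.sort(pdfs); // listFiles() gives no ordering guarantee

        PDFMergerUtility mergePdf = new PDFMergerUtility();
        for (File pdf : pdfs) {
            mergePdf.addSource(pdf); // may throw FileNotFoundException
        }
        mergePdf.setDestinationFileName(destination);
        // Buffer in temporary files instead of the heap; for small inputs,
        // MemoryUsageSetting.setupMainMemoryOnly() (or null) is fine too.
        mergePdf.mergeDocuments(MemoryUsageSetting.setupTempFileOnly());
    }
}
```

Usage would be something like PdfCombiner.combine(new File("pdf"), "Combined.pdf"). Writing the result outside the input folder avoids merging a previous Combined.pdf into the next run.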

Related

How to lock folder in java before writing file?

I have a use case where multiple threads can write files to a folder. At any given point in time I want to identify the latest file in that folder.
I cannot use timestamps, since more than one file in the folder can have the same timestamp. So I want to lock the folder, generate a sequence number by counting the files in the folder, write the new file using the generated sequence number, and release the lock. Is this possible in Java?
Similarly, while reading, take the file with the largest sequence number.
The chance of concurrent writes to the folder is low, so performance won't be an issue.

You can't use FileLock on a directory, so you will have to handle the locking in Java (which coordinates threads within a single JVM). You could do something like:
import java.io.File;
import java.io.FileFilter;

// static, so every instance in this JVM shares the same lock
private static final Object lock = new Object();

public void writeToNext(String dirPath) {
    synchronized (lock) {
        File dir = new File(dirPath);
        // count regular files only; listFiles() returns null if dirPath is not a directory
        File[] files = dir.listFiles(new FileFilter() {
            @Override
            public boolean accept(File pathname) {
                return !pathname.isDirectory();
            }
        });
        int numFiles = (files == null) ? 0 : files.length;
        String nextFile = dir.getAbsolutePath() + File.separator + (numFiles + 1) + ".txt"; // get a path for the new file
        System.out.println("Writing to " + nextFile);
        // TODO write to file
    }
}
Note
You could also implement your solution so that each write increments a counter somewhere, and just use that counter to get the next value; only list the folder and look for the last file if the counter hasn't been initialized yet.
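A minimal sketch of that counter idea, assuming all writers run in the same JVM and share one instance. SequenceCounter is a hypothetical class name; the folder is listed only on the first call, after which the number comes from memory.

```java
import java.io.File;

/**
 * Hypothetical helper: hands out the next sequence number for one folder.
 * Only the first call lists the folder; later calls just bump an in-memory counter.
 */
public class SequenceCounter {
    private final File dir;
    private int counter = -1; // -1 = not yet initialised from the folder

    public SequenceCounter(File dir) {
        this.dir = dir;
    }

    /** synchronized, so two threads can never receive the same number. */
    public synchronized int next() {
        if (counter < 0) {
            File[] files = dir.listFiles(File::isFile); // null if dir does not exist
            counter = (files == null) ? 0 : files.length;
        }
        return ++counter;
    }
}
```

Each writer would then do something like new File(dir, counter.next() + ".txt"). Because the count is only read once, files deleted later do not cause a number to be reused.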

Using Java SE 7 or above:
The WatchService API allows tracking file operations (create, modify and delete) in a specified directory. In this scenario, create a watch service to track new files created in the specific folder. Each time a new file is created, an event is triggered and the process can perform some user-defined action.
The file already has a creation time attribute (java.nio.file.attribute.BasicFileAttributes). It is returned as a java.nio.file.attribute.FileTime, which can be converted to milliseconds or, via to(TimeUnit), to a more specific java.util.concurrent.TimeUnit (down to nanoseconds, although the precision actually stored depends on the file system). This gives a chance to be more specific about which file is the newest.
Also, there is an option to create a custom user-defined file attribute for any file. The attribute is defined as a key-value pair. Such a unique attribute value can be associated with a file to identify whether it's the latest. The following APIs allow creating and reading a custom file attribute: java.nio.file.attribute.UserDefinedFileAttributeView and Files.getFileAttributeView().
I think using the above APIs and methods one can create an application to track the latest files created in a specified folder and perform a required action. Note there is no locking mechanism involved if one is using these APIs.
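As a sketch of the creation-time idea (NewestFile is a hypothetical class name): it reads BasicFileAttributes.creationTime() for every regular file in the folder and returns the largest. Note that on file systems that do not record a creation time, the JDK returns an implementation-specific default (typically the last-modified time), and files with equal times are not ordered further.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.nio.file.attribute.FileTime;
import java.util.Comparator;
import java.util.Optional;
import java.util.stream.Stream;

public class NewestFile {

    /** Reads creationTime(), wrapping the checked IOException so it can be used in a stream. */
    static FileTime creationTime(Path p) {
        try {
            return Files.readAttributes(p, BasicFileAttributes.class).creationTime();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the regular file in dir with the latest creation time, if there is one. */
    public static Optional<Path> newest(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files.filter(Files::isRegularFile)
                        .max(Comparator.comparing(NewestFile::creationTime));
        }
    }
}
```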
EDIT (included):
Using a collection to retrieve latest file:
A thread-safe collection can be used to store the filenames (or file paths) and retrieve them LIFO (last-in-first-out). The watch service (or a similar process) can add the filename of the latest file created in the folder to this collection. A read operation just gets the latest filename from this collection and works with it. One can consider java.util.concurrent.ConcurrentLinkedDeque or LinkedBlockingDeque, depending on the requirements.
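A sketch combining the two ideas: a WatchService thread pushes every created file onto a ConcurrentLinkedDeque, and readers peek at the head. LatestFileTracker and its method names are hypothetical. "Latest" here means the last event received, which is close to, but not guaranteed to match, the exact creation order.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.concurrent.ConcurrentLinkedDeque;

/** Hypothetical helper: remembers created files, newest first. */
public class LatestFileTracker {
    private final ConcurrentLinkedDeque<Path> created = new ConcurrentLinkedDeque<>();

    /** Called by the watcher thread for every new file. */
    public void record(Path file) {
        created.push(file); // push == addFirst, so the head is always the most recent
    }

    /** Returns the most recently recorded file, or null if none was recorded yet. */
    public Path latest() {
        return created.peekFirst();
    }

    /** Blocks and records ENTRY_CREATE events for dir until interrupted or dir becomes invalid. */
    public void watch(Path dir) throws IOException, InterruptedException {
        try (WatchService ws = dir.getFileSystem().newWatchService()) {
            dir.register(ws, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = ws.take(); // waits for the next batch of events
                for (WatchEvent<?> event : key.pollEvents()) {
                    if (event.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue; // events were lost; a real app might rescan the folder here
                    }
                    record(dir.resolve((Path) event.context()));
                }
                if (!key.reset()) {
                    break; // the directory is no longer accessible
                }
            }
        }
    }
}
```

watch() would typically run on its own thread, while readers call latest() from anywhere; neither side needs a lock.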

Use File.createNewFile() in a loop for writing, because according to its Javadoc it:
Atomically creates a new, empty file named by this abstract pathname if and only if a file with this name does not yet exist. The check for the existence of the file and the creation of the file if it does not exist are a single operation that is atomic with respect to all other filesystem activities that might affect the file.
Like this:
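import java.io.*;
import java.util.*;

public class FileCreator {
    public static void main(String[] args) throws IOException {
        // identifies this process in the files it creates
        String creatorId = UUID.randomUUID().toString();
        File dir = new File("dir");
        dir.mkdirs(); // make sure the target folder exists
        for (int filesCreated = 0; filesCreated < 1000; filesCreated++) {
            File newFile;
            // start at the current file count and probe upwards until
            // createNewFile() atomically claims a free name
            String[] existing = dir.list();
            for (int fileIdx = (existing == null) ? 0 : existing.length; ; fileIdx++) {
                newFile = new File(dir, "file-" + fileIdx + ".txt");
                if (newFile.createNewFile()) {
                    break;
                }
            }
            try (PrintWriter pw = new PrintWriter(newFile)) {
                pw.println(creatorId);
            }
        }
    }
}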
Another option would be Files.createFile(...). It throws an exception if the file already exists.
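A sketch of that Files.createFile variant, assuming the same "file-N.txt" naming as above. SequencedFiles and createNext are hypothetical names. Files.createFile throws FileAlreadyExistsException when the name is taken, so the loop simply moves on to the next index.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

public class SequencedFiles {

    /**
     * Creates dir/file-N.txt for the smallest N >= start that is still free.
     * Files.createFile either creates the file or throws FileAlreadyExistsException,
     * as one atomic step, so two writers can never claim the same N.
     */
    public static Path createNext(Path dir, int start) throws IOException {
        for (int idx = start; ; idx++) {
            try {
                return Files.createFile(dir.resolve("file-" + idx + ".txt"));
            } catch (FileAlreadyExistsException taken) {
                // another writer got this index first; try the next one
            }
        }
    }
}
```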
As for reading:
Similarly while reading take the file with largest sequence number.
What's the question here? Just take it.
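One detail when "just taking it": the sequence numbers must be compared numerically, because sorting the names as strings puts "file-10.txt" before "file-9.txt". A sketch assuming the "file-N.txt" pattern used above (LatestSequencedFile is a hypothetical name):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class LatestSequencedFile {
    private static final Pattern NAME = Pattern.compile("file-(\\d+)\\.txt");

    /** Returns the file-N.txt in dir with the numerically largest N; other files are ignored. */
    public static Optional<Path> latest(Path dir) throws IOException {
        try (Stream<Path> files = Files.list(dir)) {
            return files
                .filter(p -> NAME.matcher(p.getFileName().toString()).matches())
                .max(Comparator.comparingLong(LatestSequencedFile::seq));
        }
    }

    private static long seq(Path p) {
        Matcher m = NAME.matcher(p.getFileName().toString());
        m.matches(); // always true here (filtered above); needed to populate group(1)
        return Long.parseLong(m.group(1));
    }
}
```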

Regarding stitching of multiple files into a single file

I work on query latencies and have a requirement where I have several files containing data, and I want to aggregate this data into a single file. I use a naive technique where I open each file and copy all its data into a global file. I do this for all the files, but it is time-consuming. Is there a way to stitch the end of one file to the beginning of another and create one big file containing all the data? I think many people might have faced this problem before. Can anyone kindly help?

I suppose you are currently doing the opening and appending by hand; otherwise I do not know why it would take a long time to aggregate the data, especially since you describe the number of files as "several", which suggests it is not an enormous number.
Thus, I think you are just looking for a way to do the opening and appending automatically. In that case, you can use an approach similar to the one below. Note that this creates the output file, or overwrites it if it already exists, and then appends the contents of all specified files. If you want to call the method multiple times and append to the same file instead of overwriting it, you can use a FileWriter instead, with true as the second argument to its constructor, so that it appends to an existing file.
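import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

void aggregateFiles(List<String> fileNames, String outputFile) {
    // try-with-resources closes the writer even if reading one of the files fails
    try (PrintWriter writer = new PrintWriter(outputFile)) {
        for (String fileName : fileNames) {
            Path path = Paths.get(fileName);
            // reads the whole file into memory, decoded with the platform default charset
            String fileContents = new String(Files.readAllBytes(path));
            writer.println(fileContents);
        }
    } catch (IOException e) {
        // Handle IOException
        e.printStackTrace();
    }
}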
List<String> files = new ArrayList<>();
files.add("f1.txt");
files.add("someDir/f2.txt");
files.add("f3.txt");
aggregateFiles(files, "output.txt");
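If the goal is literally to "stitch the end of one file to the beginning of another", the bytes can be copied straight through without decoding them into Strings or adding the extra newline that println writes. A sketch using only java.nio.file (FileStitcher is a hypothetical name):

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class FileStitcher {

    /**
     * Concatenates the raw bytes of each input, in order, into output
     * (created, or truncated if it exists). Nothing is decoded and no separator is added.
     */
    public static void stitch(List<Path> inputs, Path output) throws IOException {
        try (OutputStream out = Files.newOutputStream(output)) {
            for (Path in : inputs) {
                Files.copy(in, out); // streams the file; never holds it whole in memory
            }
        }
    }
}
```

To append across several calls instead of overwriting, the output could be opened with Files.newOutputStream(output, StandardOpenOption.CREATE, StandardOpenOption.APPEND).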

Understanding Simple XML Parser - New File Output - Java

I am trying to learn how to use the Simple XML Framework, as described in this thread: Best practices for parsing XML.
I am using the following code :
public class SimpleXMLParserActivity extends Activity {
/** Called when the activity is first created. */
@Override
public void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
//setContentView(R.layout.main);
Serializer serializer = new Persister();
Example example = new Example("Example message", 123);
File result = new File("example.xml");
try {
Log.d("Start", "Starting Serializer");
serializer.write(example, result);
} catch (Exception e) {
// TODO Auto-generated catch block
Log.d("Self", "Error");
e.printStackTrace();
}
}
}
I am having a problem understanding the line
File result = new File("example.xml");
1) Does this line create a new file in my app called example.xml? If so, where is this file located?
2) Or does this line look for an existing file called example.xml and then add to it? If so, where should the example.xml file be placed in my app bundle so that it can be found? I notice that at the moment I am getting this error message:
java.io.FileNotFoundException: /example.xml (Read-only file system)
Thank you.

File result = new File("example.xml")
This line just stores the filename "example.xml" in a new File object. It does not check whether that file actually exists, and it does not try to create it either.
A file created without an absolute path (one starting with /, like new File("/sdcard/example.xml")) is resolved against the current working directory, which I guess is / for Android apps (hence /example.xml (Read-only file system)).
I guess serializer.write(example, result); tries to create the actual file for you, but fails since you can't write to '/'.
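The point that a File object is only a path can be checked on a desktop JVM with plain java.io. FileObjectDemo is a hypothetical class; on a desktop the working directory is whatever the JVM was started in (the "user.dir" system property), not / as guessed for Android above.

```java
import java.io.File;

public class FileObjectDemo {
    public static void main(String[] args) {
        // Constructing a File only stores the path; nothing is checked or created on disk.
        File result = new File("example.xml");
        System.out.println("exists:   " + result.exists());
        // A relative path is resolved against the working directory ("user.dir").
        System.out.println("absolute: " + result.getAbsolutePath());
        System.out.println("user.dir: " + System.getProperty("user.dir"));
    }
}
```

The file only appears once something actually opens it for writing, e.g. a FileOutputStream or, in the question's case, serializer.write(...).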
You have to specify a path for that file. There are several places you can store files, e.g.
Context#getFilesDir() will give you a place in your app's home directory (/data/data/your.package/files/) where only you can read / write - without extra permission.
Environment#getExternalStorageDirectory() will give you the primary shared storage directory (it might be /sdcard/, but that varies a lot between devices). To write there you'll need the WRITE_EXTERNAL_STORAGE permission.
There are more specialized places available in Environment, e.g. for media files, downloads, caching, etc.
There is also Context#getExternalFilesDir() for app-private (big) files you want to store on the external storage (something like /sdcard/Android/data/your.package/).
to fix your code you could do
File result = new File(Environment.getExternalStorageDirectory(), "example.xml");
Edit: either use the provided mechanisms to get an existing directory (preferred but you are limited to the folders you are supposed to use):
// via File - /data/data/your.package/app_assets/example.xml
File outputFile = new File(getDir("assets", Context.MODE_PRIVATE), "example.xml");
serializer.write(example, outputFile); // write(object to serialize, destination)
// via FileOutputStream - /data/data/your.package/files/example.xml
FileOutputStream outputStream = openFileOutput("example.xml", Context.MODE_PRIVATE);
serializer.write(example, outputStream);
outputStream.close();
or you may need to create the directories yourself (a hackish way to get your app dir, but it should work):
File outputFile = new File(new File(getFilesDir().getParentFile(), "assets"), "example.xml");
outputFile.getParentFile().mkdirs(); // create the "assets" folder, not a folder named example.xml
serializer.write(example, outputFile);
Try to avoid specifying full paths like "/data/data/com.simpletest.test/assets/example.xml" since they might be different on other devices / Android versions. Even the / is not guaranteed to be /. It's safer to use File.separatorChar instead if you have to.

Two solutions to do it cleanly:
Use openFileOutput to write a private file in the application's private directory (which could be located in internal memory, or on external storage if the app was moved there).
Or use the File constructor to create the File anywhere your app has write access. This is for when you want to store the file on the SD card, for example. Instantiating a File doesn't create it on the file system; that only happens once you start writing to it (with a FileOutputStream, for example).
I'd recommend approach 1; it's better for users because these files get erased when your app is uninstalled. If the file is large, then using the external storage is probably better.

From what I read on the Android pages (File constructor), it creates a file with that name.
I think it writes it to the /data/data/<packagename> directory.

Try saving to /sdcard/example.xml.

Detecting a PDF Package or Portfolio in code

Does anyone know of a way to detect whether a given PDF file is a PDF Portfolio or a PDF Package, rather than a "regular" PDF? I'd prefer Java solutions, although since I haven't yet found any information on detecting the specific type of PDF, I'll take what I can get and then try to figure out the Java solution afterwards.
(In searching past questions, it appears that a bunch of folks don't know that such things as PDF Portfolios and PDF Packages exist. Generally, they're both ways that Adobe allows multiple, discrete PDFs to be packaged into a single PDF file. Opening a PDF Package in Reader shows the user a list of the embedded PDFs and allows further viewing from there. PDF Portfolios appear to be a bit more complicated: they also include a Flash-based browser for the embedded files, and then allow users to extract the discrete PDFs from there. My issue with them, and the reason I'd like to be able to detect them in code, is that OS X's built-in Preview.app can't read these files, so I'd like to at least warn users of a web app of mine that uploading them can lead to diminished compatibility across platforms.)

This question is old, but in case someone wants to know, it is possible. It can be done in Acrobat with JavaScript by using the following check:
// in a document-level script, `this` is the Doc object; collection is a property, not a method
if (this.collection != null)
{
    // It is a Portfolio
}
Acrobat JavaScript API says, "A collection object is obtained from the Doc.collection property. Doc.collection returns a null value when there is no PDF collection (also called PDF package and PDF portfolio).The collection object is used to set the initial document in the collection, set the initial view of the collection, and to get, add, and remove collection fields (or categories)."

I was also facing the same problem while extracting data through Kofax, but I found a solution and it works fine. You need to add an extra JAR (Aspose.PDF) for the Document class.
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class PDFPortfolio {
    /**
     * @param args
     */
    public static void main(String[] args) {
        com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document("e:/pqr1.pdf");
        // get collection of embedded files
        com.aspose.pdf.EmbeddedFileCollection embeddedFiles = pdfDocument.getEmbeddedFiles();
        // iterate through the individual files of the Portfolio (indices run from 1 to size())
        for (int counter = 1; counter <= embeddedFiles.size(); counter++) {
            com.aspose.pdf.FileSpecification fileSpecification = embeddedFiles.get_Item(counter);
            File file = new File("e:/" + fileSpecification.getName());
            // create path for file from pdf (needed if the name contains folders)
            // file.getParentFile().mkdirs();
            // create and extract file from pdf; try-with-resources closes both streams
            try (InputStream input = fileSpecification.getContents();
                 OutputStream output = new FileOutputStream(file, true)) {
                byte[] buffer = new byte[4096];
                int n;
                while ((n = input.read(buffer)) != -1) {
                    output.write(buffer, 0, n);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
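Neither answer shows the detection itself in Java. The PDF specification (PDF 1.7 / ISO 32000-1) marks a portable collection, which Acrobat presents as a package or portfolio, with a Collection entry in the document catalog, so one way is to look for that key. This sketch assumes PDFBox 2.0.x rather than Aspose, and PortfolioDetector is a hypothetical name.

```java
import java.io.File;
import java.io.IOException;

import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;

public class PortfolioDetector {

    /**
     * True if the document catalog has a /Collection entry, i.e. the file is a
     * PDF package/portfolio. Assumes PDFBox 2.0.x (PDDocument.load).
     */
    public static boolean isPortfolio(File pdf) throws IOException {
        try (PDDocument doc = PDDocument.load(pdf)) {
            COSDictionary catalog = doc.getDocumentCatalog().getCOSObject();
            return catalog.containsKey(COSName.getPDFName("Collection"));
        }
    }
}
```

A plain PDF that merely has file attachments does not have this entry, so it is reported as a regular PDF.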

How to estimate zip file size in java before creating it

I have a requirement where I have to create a zip file from a list of available files. The files are of different types, like txt, pdf, xml, etc. I am using the java.util classes to do it.
The requirement here is to keep the zip file at a maximum size of 5 MB. I should select the files from the list based on timestamp and add them to the zip until the zip file size reaches 5 MB, skipping the remaining files.
Is there a way in Java to estimate the zip file size in advance, without creating the actual file?
Or is there any other approach to handle this?

Wrap your ZipOutputStream (zos1) in a custom OutputStream, named here YourOutputStream.
The constructor of YourOutputStream will create another ZipOutputStream (zos2), which wraps a new ByteArrayOutputStream (baos):
public YourOutputStream(ZipOutputStream zos, int maxSizeInBytes)
When you want to write a file with YourOutputStream, it will first write it to zos2:
public void writeFile(File file) throws ZipFileFullException
public void writeFile(String path) throws ZipFileFullException
etc...
if baos.size() is under maxSizeInBytes
write the file to zos1
else
close zos1, baos and zos2, and throw an exception. I can't think of an existing exception for this; if there is one, use it, otherwise create your own IOException subclass, ZipFileFullException.
You need two ZipOutputStreams: one that is written to your drive, and one to check whether your content is over 5 MB.
EDIT: In fact, I checked, and you can't easily remove a ZipEntry.
http://download.oracle.com/javase/6/docs/api/java/io/ByteArrayOutputStream.html#size()

+1 for Colin Herbert: add files one by one, and either back up the previous step or remove the last file if the archive is too big. I just want to add some details:
Prediction is far too unreliable. E.g. a PDF can contain uncompressed text and compress down to 30% of the original, or it can contain already-compressed text and images and compress only to 80%. You would need to inspect the entire PDF for compressibility, which basically means compressing it.
A statistical prediction could reduce the number of failed attempts, but you would still have to implement the recommendation above. Go with the simpler implementation first, and see if it's enough.
Alternatively, compress files individually, then pick the files that won't exceed 5 MB when bundled together. If unpacking is automated too, you could bundle the zip files into a single uncompressed zip file.

There is a better option. Create a dummy LengthOutputStream that just counts the written bytes:
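import java.io.File;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class LengthOutputStream extends OutputStream {
    private long length = 0L;

    @Override
    public void write(int b) {
        length++;
    }

    @Override
    public void write(byte[] b, int off, int len) {
        length += len; // count bulk writes without touching the bytes
    }

    public long getLength() {
        return length;
    }
}
You can simply connect the LengthOutputStream to a ZipOutputStream:
public static long sizeOfZippedDirectory(File dir) throws IOException {
    LengthOutputStream sos = new LengthOutputStream();
    try (ZipOutputStream zos = new ZipOutputStream(sos)) {
        // Add ZIP entries to the stream: one per regular file in dir
        File[] files = dir.listFiles(File::isFile);
        if (files != null) {
            for (File f : files) {
                zos.putNextEntry(new ZipEntry(f.getName()));
                Files.copy(f.toPath(), zos);
                zos.closeEntry();
            }
        }
        zos.finish(); // writes the central directory, so it is counted too
        return sos.getLength();
    }
}
The LengthOutputStream object counts the bytes of the zipped stream but stores nothing, so there is no file size limit. The result is exact, but computing it is almost as slow as creating the ZIP file.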

I don't think there is any way to estimate the size of the zip that will be created, because zips are processed as streams. It is also not technically possible to predict the size of the compressed output without actually compressing it.

I did this once on a project with known input types. We knew that, generally speaking, our data compressed at around 5:1 (it was all text). So I'd check the file size and divide by 5...
In this case, the purpose was to check that files would likely be below a certain size. We only needed a rough estimate.
All that said, I have noticed that zip applications like 7-Zip can create a zip file of a certain size (like a CD) and then continue in a new file once it reaches the limit. You could look at that source code. I have actually used the command-line version of that app in code before. They have a library you can use as well; I'm not sure how well that will integrate with Java, though.
For what it is worth, I've also used a library called SharpZipLib. It was very good. I wonder if there is a Java port of it.

Maybe you could add one file at a time until you reach the 5 MB limit, and then discard the last file. Like @Gopi, I don't think there is any way to estimate the size without actually compressing the files.
Of course, a file's compressed size will not be larger than its original size (or only slightly, because of the zip headers), so the sum of the original sizes at least gives you a "worst case" estimate.
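The add-and-discard approach can be combined with a byte-counting stream so that nothing is written to disk while the files are chosen. In this sketch (ZipBudget, zippedSize and selectWithin are hypothetical names), the files are assumed to be pre-sorted by timestamp and to have distinct file names, and the selection stops at the first file that would push the archive over the limit, as the question asks. Because every step recompresses the chosen files, the cost grows quadratically with the number of files; that is acceptable for a handful of attachments but not for thousands.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipBudget {

    /** Discards everything written to it and only counts the bytes. */
    static final class CountingOutputStream extends OutputStream {
        long count;
        @Override public void write(int b) { count++; }
        @Override public void write(byte[] b, int off, int len) { count += len; }
    }

    /** Size of a ZIP holding these files, computed by compressing into the counter. */
    static long zippedSize(List<Path> files) throws IOException {
        CountingOutputStream counter = new CountingOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(counter)) {
            for (Path f : files) {
                zos.putNextEntry(new ZipEntry(f.getFileName().toString()));
                Files.copy(f, zos);
                zos.closeEntry();
            }
        } // close() writes the central directory, so it is included in the count
        return counter.count;
    }

    /** Adds files in the given order; stops at the first one that would exceed maxBytes. */
    static List<Path> selectWithin(List<Path> ordered, long maxBytes) throws IOException {
        List<Path> chosen = new ArrayList<>();
        for (Path candidate : ordered) {
            chosen.add(candidate);
            if (zippedSize(chosen) > maxBytes) {
                chosen.remove(chosen.size() - 1); // discard the last file
                break;
            }
        }
        return chosen;
    }
}
```

The selected list can then be written to a real ZipOutputStream with the same entries, and that archive will have the size computed here; for the question's limit, maxBytes would be 5L * 1024 * 1024.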

Just wanted to share how we implemented it manually:
int maxSizeForAllFiles = 70000; // Read from property
int sizePerFile = 22000; // Read from property
long totalFileSize = 0; // running sum of attachment sizes in bytes
boolean toBeZipped = false;
// inputAttachmentList (paths as Strings) and logger come from the surrounding code
/**
 * Iterate all attachment list to verify if ZIP is required
 */
for (String attachFile : inputAttachmentList) {
    File file = new File(attachFile);
    totalFileSize += file.length();
    /**
     * if ZIP required ??? based on the size
     */
    if (file.length() >= sizePerFile) {
        toBeZipped = true;
        logger.info("File: "
                + attachFile
                + " Size: "
                + file.length()
                + " File required to be zipped, MAX allowed per file: "
                + sizePerFile);
        break;
    }
}
/**
 * Check if all attachments put together cross MAX_SIZE_FOR_ALL_FILES
 */
if (totalFileSize >= maxSizeForAllFiles) {
    toBeZipped = true;
}
if (toBeZipped) {
    // Zip Here iterating all attachments
}
