Detecting a PDF Package or Portfolio in code

Does anyone know of a way to detect whether a given PDF file is a PDF Portfolio or a PDF Package, rather than a "regular" PDF? I'd prefer Java solutions, although since I haven't yet found any information on detecting the specific type of PDF, I'll take what I can get and then try to figure out the Java solution afterwards.
(In searching past questions, it appears that a bunch of folks don't know that such things as PDF Portfolios and PDF Packages exist. Generally, they're both ways that Adobe allows multiple, discrete PDFs to be packaged into a single PDF file. Opening a PDF Package in Reader shows the user a list of the embedded PDFs and allows further viewing from there. PDF Portfolios appear to be a bit more complicated -- they also include a Flash-based browser for the embedded files, and then allow users to extract the discrete PDFs from there. My issue with them, and the reason I'd like to be able to detect them in code, is that OS X's built-in Preview.app can't read these files -- so I'd like to at least warn users of a web app of mine that uploading them can lead to diminished compatibility across platforms.)

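This question is old, but in case someone wants to know: it is possible. It can be done with Acrobat and JavaScript by checking the collection property of the Doc object (this, in a document-level script). Note that collection is a property, not a method.
if (this.collection != null)
{
// It is a Portfolio
}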
Acrobat JavaScript API says, "A collection object is obtained from the Doc.collection property. Doc.collection returns a null value when there is no PDF collection (also called PDF package and PDF portfolio). The collection object is used to set the initial document in the collection, set the initial view of the collection, and to get, add, and remove collection fields (or categories)."
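Since the question asks for Java, here is a minimal, dependency-free sketch of the same idea. It assumes that a portfolio is marked by a /Collection entry in the document catalog (as described for PDF 1.7 collections) and simply scans the raw bytes for that name. The class and method names are hypothetical. This is only a heuristic: it will miss files whose catalog is stored in a compressed object stream, and it can be fooled by the same bytes appearing in unrelated data, so a real PDF parser that looks the key up in the catalog would be more reliable.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any PDF library.
public class PortfolioSniffer {

    private static final String KEY = "/Collection";

    /** Heuristic: true if the raw bytes contain a standalone /Collection name. */
    static boolean looksLikePortfolio(byte[] pdfBytes) {
        // ISO-8859-1 maps every byte to exactly one char, so nothing is lost.
        String raw = new String(pdfBytes, StandardCharsets.ISO_8859_1);
        int idx = raw.indexOf(KEY);
        while (idx >= 0) {
            int end = idx + KEY.length();
            // Skip longer names that merely start with the key, e.g. /CollectionItem.
            if (end == raw.length() || !Character.isLetterOrDigit(raw.charAt(end))) {
                return true;
            }
            idx = raw.indexOf(KEY, end);
        }
        return false;
    }

    /** Convenience overload that reads the whole file into memory first. */
    static boolean looksLikePortfolio(Path pdf) throws IOException {
        return looksLikePortfolio(Files.readAllBytes(pdf));
    }
}
```

For an upload warning in a web app, a result of true could be enough to show the compatibility notice, while false should be read as "probably not a portfolio" rather than as a guarantee.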

I faced the same problem while extracting data through Kofax, but I found a solution and it is working fine. It needs an extra jar (Aspose.PDF) for the Document class.
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class PDFPortfolio {
    /**
     * @param args unused
     */
    public static void main(String[] args) {
        com.aspose.pdf.Document pdfDocument = new com.aspose.pdf.Document("e:/pqr1.pdf");
        // get the collection of embedded files
        com.aspose.pdf.EmbeddedFileCollection embeddedFiles = pdfDocument.getEmbeddedFiles();
        // iterate through the individual files of the Portfolio (indexing starts at 1 here)
        for (int counter = 1; counter <= embeddedFiles.size(); counter++) {
            com.aspose.pdf.FileSpecification fileSpecification = embeddedFiles.get_Item(counter);
            // extract the embedded file; try-with-resources closes both streams,
            // and the output file is overwritten rather than appended to
            try (InputStream input = fileSpecification.getContents();
                 FileOutputStream output = new FileOutputStream("e:/" + fileSpecification.getName())) {
                byte[] buffer = new byte[4096];
                int n;
                while (-1 != (n = input.read(buffer))) {
                    output.write(buffer, 0, n);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

Related

How can I open a FileInputStream that has its share set to allow ReadWrite?

In .Net I can open a FileStream set to FileAccess.Read, FileShare.ReadWrite. How can I do the same in Java?
Files.newInoutStream() does not appear to support either of these capabilities.
Update: Let me explain why. We have a common use case where our application opens a DOCX file while Word has it open for editing. Because of the locks Word holds on the file, the only way Windows allows this is FileAccess.Read & FileShare.ReadWrite.
And yes, that's dangerous (it would be fine if it were FileShare.Read). But the world is what it is here, and in practice this works great.
But it means I need to find a way in Java to open an InputStream on that file that satisfies the constraints imposed by Word holding it open.

There is no 'InoutStream' in Java.
You're probably looking for Files.newByteChannel:
import java.nio.ByteBuffer;
import java.nio.file.*;
import java.nio.charset.StandardCharsets;
class Snippet {
public static void main(String[] args) throws Exception {
Path path = Paths.get("test.txt");
try (var channel = Files.newByteChannel(path, StandardOpenOption.WRITE, StandardOpenOption.READ)) {
ByteBuffer bb = ByteBuffer.allocate(1024);
channel.read(bb);
// Note that 'read' reads 1 to x bytes depending on file system and
// phase of the moon.
bb.flip();
System.out.println("Read: " + StandardCharsets.UTF_8.decode(bb));
bb.clear();
channel.position(0);
channel.write(StandardCharsets.UTF_8.encode("Hello, World!"));
channel.position(0);
channel.read(bb);
bb.flip();
System.out.println("Read: " + StandardCharsets.UTF_8.decode(bb));
}
}
}
Make a file named 'test.txt', put in whatever you like, then run this. It'll print whatever is there, then overwrite it with Hello, World!.
Note that the read call is guaranteed to read at least 1 byte, but will not necessarily fill the entire buffer even if the file is that large: the idea is that you read one 'block' that the OS and file system can efficiently transfer in one operation. You'll need to add some loops if you want to read a guaranteed minimum, or even the entire file.
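The loop mentioned above could look roughly like this. It is a sketch, not the only way to do it: the class and method names are made up for illustration, and the 8 KiB buffer size is an arbitrary choice. It keeps calling read until the channel reports end of stream (-1) and collects everything into a byte array.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;
import java.nio.channels.SeekableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper class for illustration.
public class ChannelReader {

    /** Reads from the channel until end of stream and returns all bytes. */
    static byte[] readFully(ReadableByteChannel channel) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ByteBuffer buffer = ByteBuffer.allocate(8192);
        // Each read may deliver fewer bytes than the buffer holds; -1 means end of stream.
        while (channel.read(buffer) != -1) {
            buffer.flip();                                // switch from filling to draining
            out.write(buffer.array(), 0, buffer.limit()); // copy what was actually read
            buffer.clear();                               // make the buffer writable again
        }
        return out.toByteArray();
    }

    /** Opens the file read-only and reads its full contents. */
    static byte[] readFile(Path path) throws IOException {
        try (SeekableByteChannel channel = Files.newByteChannel(path, StandardOpenOption.READ)) {
            return readFully(channel);
        }
    }
}
```

Whether opening the file this way succeeds while Word holds it open still depends on the sharing mode the JVM requests on Windows, which this sketch does not control.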

Can't delete file after calling FontFactory.getFont() method

I am using iTextPdf 5.5.3 to create PDF/A documents. I want the user to select custom fonts by uploading the font's .ttf file. Because the FontFactory.getFont() method only takes the font name as a string, I have to write the uploaded file to the user's drive (I KNOW, I ASKED MY CUSTOMER FOR PERMISSION TO WRITE TO THE DRIVE) and then pass the path of the uploaded file to the getFont() method. After everything is finished, I want to delete the uploaded files from the drive. Here is my code:
File fontFile = new File("d:/temp/testFont.ttf");
try {
FileOutputStream outStream = new FileOutputStream(fontFile);
outStream.write(fontBytes); // fontBytes (placeholder name): the bytes of the uploaded font file
outStream.flush();
outStream.close();
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
Font font = FontFactory.getFont(fontFile.getAbsolutePath(), BaseFont.CP1250 , BaseFont.EMBEDDED);
fontFile.delete();
This code is not working: somehow the getFont() method is locking the font file, so the file is not being deleted. I tried lots of ways to do this, like fontFile.deleteOnExit(); or FileDeleteStrategy.FORCE.delete("file path"); but nothing works for me. Please advise. Thanks

I am not going to answer the question mentioned in the title of your post (because it is secondary). Instead I am going to answer the question in the body (which is the essential question).
You claim that FontFactory.getFont() requires a font file on the file system. That is not incorrect. However, that doesn't mean you can't create a font from a byte[].
You are trying to solve your problem by saving a ttf on disk (which is forbidden by your customer), but that isn't necessary. In a way, your customer is right: it's not a good idea to save the TTF as a temporary file on disk (which is why I'm ignoring your secondary question).
Take a look at the following createFont() method:
public static BaseFont createFont(String name,
String encoding,
boolean embedded,
boolean cached,
byte[] ttfAfm,
byte[] pfb)
throws DocumentException,
IOException
This is how you should interpret the parameters in your case:
name - the name of the font (not the location)
encoding - the encoding to be applied to this font
embedded - true if the font is to be embedded in the PDF
cached - probably false in your case, as you won't be reusing the font in the JVM
ttfAfm - the bytes of the .ttf file
pfb - in your case, this value will be null (it only makes sense in the context of Type1 fonts).
Now you can meet the requirements of your customer and you do not need to introduce a suboptimal workaround.
Note: you are using iText 5.5.3 which is available under the AGPL. Please be aware that your customer will need to purchase a commercial license with iText Software as soon as he starts using iText in a web service, in a product,...

Properties missing in docx file

I have a situation where I receive an MS Word (docx) document as a stream/byte array from a web service.
I then try to recreate the file, giving it the same name and content as before.
If I compare the original file and the one created after the download, they are identical.
However, when I try to open the new one in Word I get an error, and only if I accept the risks can I open it.
If I look at the properties of the file in Windows, the new one is missing a lot of information.
Does anyone know how to recreate the properties so the file can be opened without errors?
Just an extra piece of information: if I use .doc (Word 97-2003) documents, everything works fine; only .docx documents are a problem (also .xlsx and all the other Office 2007-2010 documents).
This is my code for creating the files:
private static void saveBytesAsFile(String path, String filename, byte[] data){
try {
File dir = new File(path);
dir.mkdirs();
OutputStream os = new FileOutputStream(path + "/" + filename);
os.write(data);
os.flush();
os.close();
} catch (FileNotFoundException fnfe) {
fnfe.printStackTrace();
} catch (IOException ioe) {
ioe.printStackTrace();
}
}
I compared the original and the recreated file in Notepad++, and the result was that they are identical.
This is how I see that some properties are missing:
[Image of properties]
These are the warnings I get from Word:
[Image: Word Warnings]
If I press OK on the first and Yes on the second, I can open the document anyway.

If you are asked to "accept the risks" then this sounds more like the default Word behaviour when downloading documents from the internet, not an error. You can change the Word behaviour from the Word Options, Trust Center (assuming you are using Word 2007 or later).
So I doubt that the missing properties are an issue. It is possible to change the creation date of the document that you are re-creating, by changing the system clock before building your new document (based on the previous document's content). I am not recommending these steps.

SOLUTION
It turned out that there was no problem with the properties.
It was simply that somewhere in the services that provided the document data, an extra blank character was added at the end of the file data.
This resulted in a mismatch between the expected file length and the actual one, and therefore the Office components complained when trying to open the documents.
This also prevented the properties of the file from being parsed.
What was pretty annoying was that file comparison tools did not catch this. (Or maybe they need to be configured so that trailing spaces are not ignored.)
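A byte-level comparison in code would have flagged the extra trailing byte right away. The following is a small sketch of such a check; the class and method names are invented for illustration, and it relies on Arrays.mismatch, which is available from Java 9 onwards.

```java
import java.util.Arrays;

// Hypothetical helper for illustration.
public class ByteDiff {

    /** Describes how two byte arrays differ, or reports that they are identical. */
    static String describeDifference(byte[] expected, byte[] actual) {
        // Returns -1 if equal, otherwise the index of the first mismatch.
        // If one array is a prefix of the other, that index is the shorter length.
        int idx = Arrays.mismatch(expected, actual);
        if (idx < 0) {
            return "identical";
        }
        if (idx == Math.min(expected.length, actual.length)) {
            return "length differs: expected " + expected.length
                    + " bytes, got " + actual.length;
        }
        return "first difference at offset " + idx;
    }
}
```

Calling describeDifference with the bytes received from the service and the bytes of the original file would report a length difference in the situation described above, since the two arrays agree up to the end of the shorter one.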

Binary editing in java

I have a file that I am trying to binary edit to cut off a header.
I have identified the start address of the actual data I want to keep in the file, however I am trying to find a way in Java where I can specify a range of bytes to delete from a file.
At the moment I am reading the file in a (Buffered)FileInputStream, and the only way I can see to cut off the header of this file is to save from my start address to the end of the file in memory, then write that out overwriting the original file.
Is there any functionality to remove bits in files without having to go through the process of creating a whole new file?

There is a method to truncate the file (setLength), but there is no API to remove an arbitrary sequence from inside it.
If the file is so large that rewriting it is a performance issue, I suggest splitting it into several files. Some performance may be gained by using RandomAccessFile to seek to the point of deletion, rewrite from there and then truncate.
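The "rewrite from there and then truncate" idea can be sketched as follows for the header case from the question. This version uses a FileChannel rather than RandomAccessFile (a substitution on my part), and the class name, method name and 64 KiB buffer size are assumptions made for illustration. It shifts the data after the header towards the start of the file chunk by chunk, then cuts off the leftover tail.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper for illustration.
public class HeaderStripper {

    /** Removes the first headerLength bytes of the file in place. */
    static void removeHeader(Path file, long headerLength) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            long size = channel.size();
            if (headerLength < 0 || headerLength > size) {
                throw new IllegalArgumentException("header length out of range: " + headerLength);
            }
            ByteBuffer buffer = ByteBuffer.allocate(64 * 1024);
            long readPos = headerLength; // next byte to keep
            long writePos = 0;           // where that byte should end up
            while (readPos < size) {
                buffer.clear();
                int n = channel.read(buffer, readPos);
                if (n <= 0) {
                    break; // unexpected end of file
                }
                buffer.flip();
                // A single write may not drain the buffer, so loop until it is empty.
                while (buffer.hasRemaining()) {
                    writePos += channel.write(buffer, writePos);
                }
                readPos += n;
            }
            // Everything up to writePos is the kept data; drop the rest.
            channel.truncate(writePos);
        }
    }
}
```

Because the write position always trails the read position, a chunk is never overwritten before it has been read. The trade-off is that the file is left in a half-shifted state if the process dies midway, which writing to a new file and swapping it in avoids.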

Try this, it uses a RandomAccessFile to wipe out the un-needed parts of the file, by first seeking to the start index, then wiping the un-needed characters onwards.
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
public class Main {
public static void main(String[] args) {
int startIndex = 21;
int numberOfCharsToRemove = 20;
// Using a RandomAccessFile, overwrite the part you want to wipe
// out with the NUL character (bytes are overwritten in place, not removed)
try (RandomAccessFile raf = new RandomAccessFile(new File("/Users/waleedmadanat/Desktop/sample.txt"), "rw")) {
raf.seek(startIndex);
for (int i = 1; i <= numberOfCharsToRemove; i++) {
raf.write('\u0000');
}
} catch (IOException e) {
e.printStackTrace();
}
}
}

I couldn't find any API method to perform what I wanted (which agrees with the answers above).
I solved it by just re-writing the file back out to a new file, then replacing the old one with the new one.
I used the following code to perform the replacement:
// fin is assumed to be a FileInputStream on inFile
FileInputStream fin = new FileInputStream(inFile);
FileOutputStream fout = new FileOutputStream(inFile.getAbsolutePath() + ".tmp");
FileChannel chanIn = fin.getChannel();
FileChannel chanOut = fout.getChannel();
// copy everything from pos to the end of the file
chanIn.transferTo(pos, chanIn.size() - pos, chanOut);
Where pos is my start address to begin the file transfer, which is directly after the header that I am cutting out of this file.
I have also noticed no slowdowns using this method.

How to open a file without saving it to disk

My Question: How do I open a file (in the system default [external] program for the file) without saving the file to disk?
My Situation: I have files in my resources and I want to display those without saving them to disk first. For example, I have an xml file and I want to open it on the user's machine in the default program for reading xml file without saving it to the disk first.
What I have been doing: So far I have just saved the file to a temporary location, but I have no way of knowing when they no longer need the file so I don't know when/if to delete it. Here's my SSCCE code for that (well, it's mostly sscce, except for the resource... You'll have to create that on your own):
package main;
import java.io.*;
public class SOQuestion {
public static void main(String[] args) throws IOException {
new SOQuestion().showTemplate();
}
/** Opens the temporary file */
private void showTemplate() throws IOException {
String tempDir = System.getProperty("java.io.tmpdir") + "\\BONotifier\\";
File parentFile = new File(tempDir);
if (!parentFile.exists()) {
parentFile.mkdirs();
}
File outputFile = new File(parentFile, "template.xml");
InputStream inputStream = getClass().getResourceAsStream("/resources/template.xml");
int size = 4096;
try (OutputStream out = new FileOutputStream(outputFile)) {
byte[] buffer = new byte[size];
int length;
while ((length = inputStream.read(buffer)) > 0) {
out.write(buffer, 0, length);
}
inputStream.close();
}
java.awt.Desktop.getDesktop().open(outputFile);
}
}

Because of this line:
String tempDir = System.getProperty("java.io.tmpdir") + "\\BONotifier\\";
I deduce that you're working on Windows. You can easily make this code multiplatform, you know.
The answer to your question is: no. The Desktop class needs to know where the file is in order to invoke the correct program with a parameter. Note that there is no method in that class accepting an InputStream, which could be a solution.
Anyway, I don't see where the problem is: you create a temporary file, then open it in an editor or whatever. That's fine. On Linux, the system temporary directory is typically cleaned up by the operating system (for example at reboot). On Windows, the user usually has to trigger the deletion of temporary files. However, provided you don't have security constraints, I can't understand where the problem is. After all, temporary files are the operating system's concern.
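A platform-neutral version of the temp-file approach might look like the sketch below. The class and method names are hypothetical. It lets the JDK pick the temporary directory instead of concatenating a Windows-style path, and it asks the JVM to delete the file on normal exit, which may still fail if another program has the file open at that point.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration.
public class TempResource {

    /** Copies the stream into a new temp file and schedules it for deletion on JVM exit. */
    static Path copyToTempFile(InputStream in, String prefix, String suffix) throws IOException {
        // Created in the directory named by java.io.tmpdir, with a unique name.
        Path temp = Files.createTempFile(prefix, suffix);
        // Best-effort cleanup when the JVM terminates normally.
        temp.toFile().deleteOnExit();
        // createTempFile already created the file, so it has to be replaced.
        Files.copy(in, temp, StandardCopyOption.REPLACE_EXISTING);
        return temp;
    }
}
```

With the code from the question, the resource stream from getResourceAsStream("/resources/template.xml") could be passed in, and the returned path handed to java.awt.Desktop.getDesktop().open(temp.toFile()).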

Depending on how portable your application needs to be, there might be no "one fits all" solution to your problem. However, you can help yourself a bit:
At least under Linux, you can use a pipe (|) to direct the output of one program to the input of another. A simple example for that (using the gedit text editor) might be:
echo "hello world" | gedit
This will (for gedit) open up a new editor window and show the contents "hello world" in a new, unsaved document.
The problem with the above is, that this might not be a platform-independent solution. It will work for Linux and probably OS X, but I don't have a Windows installation here to test it.
Also, you'd need to find out the default editor by yourself. This older question and its linked article give some ideas on how this might work.

I don't understand your question very well. I can see only two possible readings of it:
You open an existing file and wish to operate on its stream, but do not want to save any modifications.
You create a file so that you can use file I/O to operate on the file stream, but you don't wish to save the stream to a file.
In either case, your main motivation is to make use of the file I/O facilities that already exist, am I correct?
I have a feeling that the question is not that simple and that this answer is probably not the one you seek. However, if my understanding of the question does match your question ...
If you wish to use stream I/O, then instead of using FileOutputStream or FileInputStream, which go through a File object, why not use a non-file InputStream or OutputStream? Your file I/O utilities will ultimately boil down to manipulating I/O streams anyway.
http://docs.oracle.com/javase/7/docs/api/java/io/OutputStream.html
http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html
No need to involve temp files.
