Possible bug with load() and parse() methods in PDFBox?


I tried to use PDFBox on regular .pdf files and it worked fine.
However, when I ran it on a corrupted .pdf, the code would "freeze": no error is thrown, and the load or parse call simply takes forever to execute.
Here is the corrupted file (I have zipped it so that everybody can download it). It is probably not a native PDF file, but it was saved with a .pdf extension and it is only 4 KB.
I am not an expert at all, but I think that this is a bug in PDFBox. According to the documentation, both the load() and parse() methods are supposed to throw exceptions if they fail. With my file, however, the code takes forever to execute and does not throw an exception.
I tried using only load(); one can try parse(), but the result is the same.
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.util.PDFTextStripper;
public class TestTest {
public static void main(String[] args) throws FileNotFoundException, IOException {
System.out.println(pdfToText("C:\\..............MYFILE.pdf"));
System.out.println("done ! ! !");
}
private static String pdfToText(String fileName) throws IOException {
PDDocument document = null;
document = PDDocument.load(new File(fileName)); // THIS TAKES FOREVER
PDFTextStripper stripper = new PDFTextStripper();
document.close();
return stripper.getText(document);
}
}
How can I force this code to throw an exception or stop executing if the .pdf file is corrupted?
Thanks

Try this solution:
private static String pdfToText(String fileName) {
PDDocument document = null;
try {
document = PDDocument.load(fileName);
PDFTextStripper stripper = new PDFTextStripper();
return stripper.getText(document);
} catch (IOException e) {
System.err.println("Unable to open PDF Parser. " + e.getMessage());
return null;
} finally {
if (document != null) {
try {
document.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}

For implementing simple timeouts for 3rd party libs I often use an implementation like Apache Commons ThreadMonitor:
long timeoutInMillis = 1000;
try {
Thread monitor = ThreadMonitor.start(timeoutInMillis);
// do some work here
ThreadMonitor.stop(monitor);
} catch (InterruptedException e) {
// timed amount was reached
}
Example code is from Apache's ThreadMonitor Javadoc.
I only use this when the 3rd party API does not provide some timeout mechanism, of course.
However, I was forced to tweak this a bit some weeks ago, because this solution does not work well with (3rd party) code that uses exception masking.
In particular, we ran into problems with c3p0, which masks all exceptions (including InterruptedException). Our solution was to tweak the implementation to also check the exception's cause chain for an InterruptedException.
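A minimal sketch of the same idea using only java.util.concurrent, assuming Java 8 or later. The class and method names (TimeoutSketch, callWithTimeout, wasInterrupted) are hypothetical and not part of Commons IO or PDFBox. The blocking call runs on a worker thread, the caller gives up after a timeout, and a helper walks the cause chain for a masked InterruptedException:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeoutSketch {

    /** Walks the cause chain looking for a (possibly wrapped) InterruptedException. */
    static boolean wasInterrupted(Throwable t) {
        // Bounded walk so an unusual cyclic cause chain cannot loop forever.
        for (int depth = 0; t != null && depth < 32; depth++) {
            if (t instanceof InterruptedException) {
                return true;
            }
            t = t.getCause();
        }
        return false;
    }

    /** Runs the task on a worker thread and stops waiting after timeoutMillis. */
    static <T> T callWithTimeout(Callable<T> task, long timeoutMillis)
            throws TimeoutException, ExecutionException, InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread worker = new Thread(r, "timeout-worker");
            worker.setDaemon(true); // a stuck worker must not keep the JVM alive
            return worker;
        });
        Future<T> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; code that ignores interrupts keeps running.
            future.cancel(true);
            throw e;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callWithTimeout(() -> "finished", 1000));
        try {
            callWithTimeout(() -> { Thread.sleep(5000); return "never"; }, 100);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

For the PDFBox case, the call would look something like callWithTimeout(() -> PDDocument.load(new File(fileName)), 10000). Note that this only stops the caller from waiting: if the parser never checks the interrupt flag, the worker thread keeps spinning in the background until the JVM exits.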

Related

Add txt files to a runnable JAR file

I'm trying to make a runnable JAR file and I'm having problems with my .txt files.
My program also has images, but fortunately I've figured out how to manage them. I'm using something like this for them, and it works just fine both in Eclipse and in the JAR:
logoLabel.setIcon(new ImageIcon(getClass().getResource("/logo.png")));
My problem is when I've something like this in one of my classes:
try {
employeeList = (TreeSet<Employee>) ListManager.readFile("list/employeeList.txt");
} catch (ClassNotFoundException i) {
i.printStackTrace();
} catch (IOException i) {
i.printStackTrace();
}
And this in the class ListManager that I use to read my lists serialized in the .txt files:
public static Object readFile(String file) throws IOException, ClassNotFoundException {
ObjectInputStream is = new ObjectInputStream(new FileInputStream(file));
Object o = is.readObject();
is.close();
return o;
}
I also have a similar method to write in the files.
I've tried several combinations that I've found here:
How to include text files with Executable Jar
Creating Runnable Jar with external files included
Including a text file inside a jar file and reading it
I've also tried with a slash, without a slash, using openStream, not using openStream... But either I get a NullPointerException or it doesn't compile at all...
Maybe it is something silly, or maybe I have a conceptual misunderstanding of how the URL class works. I'm new to programming...
Thank you very much in advance for your advice!
EDIT:
It's me again... The answer Raniz gave was just what I needed and it worked perfect, but now my problem is with the method that I use to write in the files...
public static void writeFile(Object o, String file) throws IOException {
ObjectOutputStream os = new ObjectOutputStream(new FileOutputStream(file));
os.writeObject(o);
os.close();
}
try {
ListManager.writeFile(employeeList.getEmployeeList(), "lists/employeeList.txt");
} catch (IOException i) {
i.printStackTrace();
}
Could you help me please? I don't know what I should use to replace FileOutputStream, because I think there is the problem again, am I right?
Thank you very much!

The problem is that you're trying to access a file inside of a JAR archive as a file in the file system (because that's what FileInputStream is for) and that won't work.
You can convert readFile to use a URL instead and let the URL handle opening the stream for you:
public static Object readFile(URL url) throws IOException, ClassNotFoundException {
ObjectInputStream is = new ObjectInputStream(url.openStream());
Object o = is.readObject();
is.close();
return o;
}
You should also put your code in a try-with-resources statement, since it currently doesn't close the stream if an IOException occurs:
public static Object readFile(URL url) throws IOException, ClassNotFoundException {
try(ObjectInputStream is = new ObjectInputStream(url.openStream())) {
Object o = is.readObject();
return o;
}
}
try {
employeeList = (TreeSet<Employee>) ListManager.readFile(getClass().getResource("/list/employeeList.txt"));
} catch (ClassNotFoundException i) {
i.printStackTrace();
} catch (IOException i) {
i.printStackTrace();
}
I also have a similar method to write in the files.
That won't work if the files are inside the JAR so you should probably consider having your files outside your JAR.

Yes, if you want to read resources from inside a jar file, you shouldn't use FileInputStream. Perhaps you should add a readResource method:
public static Object readResource(Class<?> clazz, String resource)
throws IOException, ClassNotFoundException {
try (ObjectInputStream is =
new ObjectInputStream(clazz.getResourceAsStream(resource))) {
return is.readObject();
}
}
(I'd also suggest updating your readFile method to use a try-with-resources block - currently if there's an exception you won't close the stream...)
Note that when you say "I also have a similar method to write in the files" - you won't be able to easily write to a resource in the jar file.
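One way to act on both answers is to keep the writable copy of the list outside the JAR and treat the bundled resource as a read-only default. The following is a rough sketch under that assumption, for Java 7 or later. ListStore and its methods are hypothetical names, and the file location is up to the application:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

final class ListStore {

    /** Serializes the object to a file on disk (outside the JAR), creating parent directories. */
    static void write(Object o, Path file) throws IOException {
        Path parent = file.toAbsolutePath().getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        try (ObjectOutputStream os = new ObjectOutputStream(Files.newOutputStream(file))) {
            os.writeObject(o);
        }
    }

    /** Reads the external file if it exists, otherwise the read-only resource bundled in the JAR. */
    static Object read(Path file, String fallbackResource)
            throws IOException, ClassNotFoundException {
        InputStream in = Files.exists(file)
                ? Files.newInputStream(file)
                : ListStore.class.getResourceAsStream(fallbackResource);
        if (in == null) {
            throw new IOException("Neither " + file + " nor resource " + fallbackResource + " exists");
        }
        // Both streams are closed, even if the ObjectInputStream constructor fails.
        try (InputStream raw = in; ObjectInputStream is = new ObjectInputStream(raw)) {
            return is.readObject();
        }
    }
}
```

A call might then look like ListStore.read(Paths.get(System.getProperty("user.home"), ".myapp", "employeeList.txt"), "/list/employeeList.txt"), where the .myapp directory is just an example location. The first write creates the external file, and later reads pick it up instead of the bundled default.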

How to watch file for new content and retrieve that content

I have a file named foo.txt. This file contains some text. I want to achieve the following functionality:
I launch the program.
I write something to the file (for example, add one row: new string in foo.txt).
I want to get ONLY the NEW content of this file.
Can you suggest the best solution to this problem? I also want to resolve a related issue: if I modify foo.txt, I want to see the diff.
The closest tool I found in Java is WatchService, but if I understood correctly, this tool can only detect the type of event that happened on the filesystem (a file was created, deleted, or modified).

Java Diff Utils is designed for that purpose.
final List<String> originalFileContents = new ArrayList<String>();
final String filePath = "C:/Users/BackSlash/Desktop/asd.txt";
FileListener fileListener = new FileListener() {
@Override
public void fileDeleted(FileChangeEvent paramFileChangeEvent)
throws Exception {
// use this to handle file deletion event
}
@Override
public void fileCreated(FileChangeEvent paramFileChangeEvent)
throws Exception {
// use this to handle file creation event
}
@Override
public void fileChanged(FileChangeEvent paramFileChangeEvent)
throws Exception {
System.out.println("File Changed");
//get new contents
List<String> newFileContents = new ArrayList<String> ();
getFileContents(filePath, newFileContents);
//get the diff between the two files
Patch patch = DiffUtils.diff(originalFileContents, newFileContents);
//get single changes in a list
List<Delta> deltas = patch.getDeltas();
//print the changes
for (Delta delta : deltas) {
System.out.println(delta);
}
}
};
DefaultFileMonitor monitor = new DefaultFileMonitor(fileListener);
try {
FileObject fileObject = VFS.getManager().resolveFile(filePath);
getFileContents(filePath, originalFileContents);
monitor.addFile(fileObject);
monitor.start();
} catch (InterruptedException ex) {
ex.printStackTrace();
} catch (FileNotFoundException e) {
//handle
e.printStackTrace();
} catch (IOException e) {
//handle
e.printStackTrace();
}
Where getFileContents is :
void getFileContents(String path, List<String> contents) throws FileNotFoundException, IOException {
contents.clear();
BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(path), "UTF-8"));
try {
String line = null;
while ((line = reader.readLine()) != null) {
contents.add(line);
}
} finally {
// release the file handle even if reading fails
reader.close();
}
}
What I did:
I loaded the original file contents in a List<String>.
I used Apache Commons VFS to listen for file changes, using FileMonitor. You may ask, why? Because WatchService is only available starting from Java 7, while FileMonitor works with at least Java 5 (personal preference; if you prefer WatchService you can use it). Note: Apache Commons VFS depends on Apache Commons Logging, so you'll have to add both to your build path in order to make it work.
I created a FileListener, then I implemented the fileChanged method.
That method loads the new contents from the file and uses DiffUtils.diff to retrieve all differences, then prints them.
I created a DefaultFileMonitor, which basically listens for changes to a file, and I added my file to it.
I started the monitor.
After the monitor is started, it will begin listening for file changes.
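If only appended text is needed (the "ONLY NEW content" case from the question) and Java 7 is available, a lighter alternative is to remember how many bytes have already been read and let WatchService trigger the next read. This is a sketch, not a replacement for a real diff. TailReader is a hypothetical name, and it assumes the file is only appended to in UTF-8 and that each append fits in memory:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

final class TailReader {
    private final Path file;
    private long offset; // number of bytes already handed out

    TailReader(Path file) throws IOException {
        this.file = file;
        this.offset = Files.size(file); // existing content counts as already seen
    }

    /** Returns the text appended since the previous call (empty if nothing is new). */
    String readNew() throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long length = raf.length();
            if (length < offset) {
                offset = 0; // file was truncated or replaced: start over
            }
            byte[] buffer = new byte[(int) (length - offset)];
            raf.seek(offset);
            raf.readFully(buffer);
            offset = length;
            return new String(buffer, StandardCharsets.UTF_8);
        }
    }

    /** Blocks and prints appended text whenever the file's directory reports a modification. */
    static void watch(Path file) throws IOException, InterruptedException {
        TailReader reader = new TailReader(file);
        Path dir = file.toAbsolutePath().getParent();
        try (WatchService watcher = dir.getFileSystem().newWatchService()) {
            dir.register(watcher, StandardWatchEventKinds.ENTRY_MODIFY);
            while (true) {
                WatchKey key = watcher.take();
                for (WatchEvent<?> event : key.pollEvents()) {
                    Object context = event.context(); // file name relative to dir
                    if (context instanceof Path && context.equals(file.getFileName())) {
                        System.out.print(reader.readNew());
                    }
                }
                if (!key.reset()) {
                    return; // directory is no longer accessible
                }
            }
        }
    }
}
```

Edits in the middle of the file are not detected as such by this approach, so the diff-based solution above is still the one to use when arbitrary modifications need to be shown.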

How to use StaX

Hey guys, I am brand new to the world of Java XML parsing and found that the StAX API is probably my best bet, as I need to both read and write XML files. I have a very short (and what should be very simple) program that should create an XMLInputFactory and use it to create an XMLStreamReader. The XMLStreamReader is created using a FileInputStream attached to an XML file in the same directory as the source file. However, even though the FileInputStream line compiles properly, the XMLInputFactory cannot access it, and without the FileInputStream it cannot create the XMLStreamReader. Please help, as I have no idea what to do and am frustrated to the point of giving up!
import javax.xml.stream.*;
import java.io.*;
public class xml {
static String status;
public static void main(String[] args) {
status = "Program has started";
printStatus();
XMLInputFactory inFactory = XMLInputFactory.newInstance();
status = "XMLInputFactory (inFactory) defined"; printStatus();
try { FileInputStream fIS = new FileInputStream("stax.xml"); }
catch (FileNotFoundException na) { System.out.println("FileNotFound"); }
status = "InputStream (fIS) declared"; printStatus();
try { XMLStreamReader xmlReader = inFactory.createXMLStreamReader(fIS); } catch (XMLStreamException xmle) { System.out.println(xmle); }
status = "XMLStreamReader (xmlReader) created by 'inFactory'"; printStatus();
}
public static void printStatus(){ //this is a little code that send notifications when something has been done
System.out.println("Status: " + status);
}
}
also here is the XML file if you need it:
<?xml version="1.0"?>
<dennis>
<hair>brown</hair>
<pants>blue</pants>
<gender>male</gender>
</dennis>

Your problem has to do with basic Java programming and nothing to do with StAX. Your FileInputStream is scoped within a try block (some decent code formatting would help) and is therefore not visible to the code where you are attempting to create the XMLStreamReader. With formatting:
XMLInputFactory inFactory = XMLInputFactory.newInstance();
try {
// fIS is only visible within this try{} block
FileInputStream fIS = new FileInputStream("stax.xml");
} catch (FileNotFoundException na) {
System.out.println("FileNotFound");
}
try {
// fIS is not visible here
XMLStreamReader xmlReader = inFactory.createXMLStreamReader(fIS);
} catch (XMLStreamException xmle) {
System.out.println(xmle);
}
On a secondary note, StAX is a nice API, and a great one for highly performant XML processing in Java. However, it is not the simplest XML API. You would probably be better off starting with the DOM-based APIs, and only using StAX if you experience performance issues using DOM. If you do stay with StAX, I'd advise using XMLEventReader instead of XMLStreamReader (again, an easier API).
Lastly, do not hide exception details (e.g. catch them and print out something which does not include the exception itself) or ignore them (e.g. continue processing after the exception is thrown without attempting to deal with the problem).
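Putting those points together, a corrected version could look roughly like the sketch below, assuming Java 7 or later. The stream is declared in a try-with-resources header so it is in scope where the reader is created, XMLEventReader is used instead of XMLStreamReader, and exceptions propagate instead of being swallowed. StaxSketch and readLeafElements are hypothetical names chosen for illustration:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class StaxSketch {

    /** Collects the text of every element that directly contains non-blank text. */
    static Map<String, String> readLeafElements(InputStream in) throws XMLStreamException {
        Map<String, String> values = new LinkedHashMap<>();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the parser to deliver adjacent character data as a single event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, Boolean.TRUE);
        XMLEventReader reader = factory.createXMLEventReader(in);
        try {
            String current = null; // name of the element whose text we are inside
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()) {
                    current = event.asStartElement().getName().getLocalPart();
                } else if (event.isCharacters() && current != null
                        && !event.asCharacters().isWhiteSpace()) {
                    values.put(current, event.asCharacters().getData());
                } else if (event.isEndElement()) {
                    current = null;
                }
            }
        } finally {
            reader.close(); // XMLEventReader is not AutoCloseable
        }
        return values;
    }

    public static void main(String[] args) throws IOException, XMLStreamException {
        // fIS is visible to everything inside the block and is closed automatically.
        try (InputStream fIS = new FileInputStream("stax.xml")) {
            System.out.println(readLeafElements(fIS));
        }
    }
}
```

Letting main declare the exceptions means a missing file or malformed XML stops the program with a full stack trace, rather than continuing with an unusable reader.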

iText - Fail on second attempt to generate PDF

I have a Java desktop application that is using iText to generate PDFs from a resultset. The first time you generate a PDF, it works fine. The problem comes when you try to generate a second one. It throws a DocumentException saying that the document is closed. I have tried to find other examples of people having this problem, and I come up with very little, which leads me to believe that I have made a very simple mistake and I cannot find it.
The code below is a snippet of the event handler that calls the report class:
RptPotReport report = new RptPotReport();
try {
report.rptPot();
} catch (DocumentException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
And here is the code for the report class itself. The error occurs on the second run through this code:
public class RptPotReport {
public static void main(String[] args) throws IOException, DocumentException, SQLException {
new RptPotReport().rptPot();
}
String fileOutput = "Potting Report.pdf";
public void rptPot() throws DocumentException, IOException {
File f = new File("Potting Report.pdf");
if (f.exists()) {
f.delete();
}
Document document = new Document();
document = pdfSizes.getPdfLetter();
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(fileOutput));
document.open();
Phrase title = new Phrase();
title.add(new Chunk("Potting Report"));
document.add(title); // ******* DocumentException here: "The document has been closed. You can't add any Elements."
document.close();
try {
File pdfFile = new File(fileOutput);
if (pdfFile.exists()) {
if (Desktop.isDesktopSupported()) {
Desktop.getDesktop().open(pdfFile);
} else {
System.out.println("Awt Desktop is not supported!");
}
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
EDIT: At someone's suggestion, I tried calling the RptPotReport from a second thread, but that did not change anything. Looking into it further, the Document class of iText creates a new thread when it's instantiated. So I'm right back where I started, still stuck.

What does this line do exactly in your application:
document = pdfSizes.getPdfLetter();
Without the code, and going by your explanation, it seems that this line sets the document variable to the reference returned by pdfSizes.getPdfLetter(), which is reused between runs, so you no longer have the reference created by the new Document() statement.
I tend to think the pdfSizes.getPdfLetter() method is bugged.
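Since the source of pdfSizes is not shown, the following is only an illustration of the suspected pattern. It uses a stand-in FakeDocument class rather than iText's Document, and every name in it is hypothetical. It shows why handing out one cached instance fails on the second run, while creating a new instance per call does not:

```java
final class SharedDocumentSketch {

    /** Stand-in for a document type that refuses additions once it has been closed. */
    static final class FakeDocument {
        private boolean closed;

        void add(String text) {
            if (closed) {
                throw new IllegalStateException("The document has been closed.");
            }
        }

        void close() {
            closed = true;
        }
    }

    /** Suspected pattern: a single instance is created once and returned on every call. */
    static final class CachingSizes {
        private static final FakeDocument LETTER = new FakeDocument();

        static FakeDocument getPdfLetter() {
            return LETTER;
        }
    }

    /** Alternative: every call builds a new instance. */
    static final class FreshSizes {
        static FakeDocument getPdfLetter() {
            return new FakeDocument();
        }
    }

    public static void main(String[] args) {
        FakeDocument first = CachingSizes.getPdfLetter();
        first.add("Potting Report");
        first.close();
        // Second run: same instance, already closed, so add() throws.
        CachingSizes.getPdfLetter().add("Potting Report");
    }
}
```

If getPdfLetter() turns out to work like CachingSizes here, changing it to build a new Document on each call should remove the error.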

Not possible to launch a file on a network using Java Desktop?

(I have a problem that I illustrated in this question but had no correct answers. I refined my problem and tried to edit the initial question to reflect that but I guess because of the way SO displays unanswered questions it lost momentum and there is no way to revive it. So I am posting my correct question again).
I have a file that resides on a shared network location :
"\\KUROSAVVAS-PC\Users\kuroSAVVAS\Desktop\New Folder\Warsaw Panorama.JPG"
(The spaces are there intentionally)
The following code :
import java.awt.Desktop;
import java.io.File;
import java.io.IOException;
public class Test {
public static void main(String[] args) {
try {
String s = "\\\\KUROSAVVAS-PC\\Users\\kuroSAVVAS\\Desktop\\New Folder\\Warsaw Panorama.jpg";
File f = new File(s);
System.out.println(f.exists());
Desktop.getDesktop().open(f);
} catch (IOException e) {
e.printStackTrace();
}
}
}
Prints to the console that the file exists (System.out.println(f.exists());) but throws this exception! :
java.io.IOException: Failed to open file:////KUROSAVVAS-PC/Users/kuroSAVVAS/Desktop/New%20%20%20%20%20Folder/Warsaw%20%20%20%20Panorama.jpg. Error message: The system cannot find the file specified.
at sun.awt.windows.WDesktopPeer.ShellExecute(WDesktopPeer.java:59)
at sun.awt.windows.WDesktopPeer.open(WDesktopPeer.java:36)
at java.awt.Desktop.open(Desktop.java:254)
at Test.main(Test.java:13)
Has anyone any idea why something like this may happen? I have tried everything from creating URIs to decoding them afterwards... Nothing works.

With Java 7 you can do this:
public static void main(String[] args) throws IOException {
String s = "\\\\KUROSAVVAS-PC\\Users\\kuroSAVVAS\\Desktop\\New Folder\\Warsaw Panorama.jpg";
Path p = Paths.get(s);
Desktop.getDesktop().browse(p.toUri());
}

Java 6 solution:
public static void launchFile(File file) {
if (!Desktop.isDesktopSupported())
return;
Desktop dt = Desktop.getDesktop();
try {
dt.open(file);
} catch (IOException ex) {
// this is sometimes necessary with files on other servers ie
// \\xxx\xxx.xls
launchFile(file.getPath());
}
}
// this can launch both local and remote files
public static void launchFile(String filePath) {
if (filePath == null || filePath.trim().length() == 0)
return;
if (!Desktop.isDesktopSupported())
return;
Desktop dt = Desktop.getDesktop();
try {
dt.browse(getFileURI(filePath));
} catch (Exception ex) {
ex.printStackTrace();
}
}
// generate uri according to the filePath
private static URI getFileURI(String filePath) {
URI uri = null;
filePath = filePath.trim();
if (filePath.indexOf("http") == 0 || filePath.indexOf("\\") == 0) {
if (filePath.indexOf("\\") == 0){
filePath = "file:" + filePath;
filePath = filePath.replaceAll("#", "%23");
}
try {
filePath = filePath.replaceAll(" ", "%20");
URL url = new URL(filePath);
uri = url.toURI();
} catch (MalformedURLException ex) {
ex.printStackTrace();
} catch (URISyntaxException ex) {
ex.printStackTrace();
}
} else {
File file = new File(filePath);
uri = file.toURI();
}
return uri;
}
This answer was on the bug report, but I've edited it to fix when there is a hash.

TL;DR of ZAMMBI's answer (+1 BTW). (Using Java 6)
This works, as expected
Desktop.getDesktop().open(new File("\\\\host\\path_without\\spaces.txt")); //works
This fails, due to a known Java bug:
Desktop.getDesktop().open(new File("\\\\host\\path with\\spaces.txt")); //fails <shakes fist>
This work-around works
Desktop.getDesktop().browse(new URI("file://host/path%20with/spaces.txt")); //works (note slash direction and escape sequences)
This work-around seems like it should work, but does not:
Desktop.getDesktop().browse((new File("\\\\host\\path with\\spaces.txt")).toURI());
This work-around works, and seems to be the most general form:
File curFile = new File("\\\\host\\path with\\or_without\\spaces\\local or network.txt");
Desktop.getDesktop().browse(new URI(curFile.toURI().toString().replace("file:////","file://")));
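The last work-around can be wrapped in a small helper so the string manipulation lives in one place. This is a sketch with hypothetical names (UncUris, normalize). It only rewrites the "file:////" prefix that File.toURI() produces for UNC paths on Windows and leaves every other URI unchanged:

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UncUris {

    private static final String UNC_PREFIX = "file:////";

    /** Rewrites "file:////host/share/..." to "file://host/share/..."; other URIs are returned unchanged. */
    static URI normalize(URI uri) throws URISyntaxException {
        String text = uri.toString();
        if (text.startsWith(UNC_PREFIX)) {
            // Moving the host into the authority part is what the work-around relies on.
            return new URI("file://" + text.substring(UNC_PREFIX.length()));
        }
        return uri;
    }
}
```

It would be used as Desktop.getDesktop().browse(UncUris.normalize(curFile.toURI())), which behaves the same as the one-liner above for network paths and passes local files through untouched.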

It seems that there is a bug when you try to access a resource on a network drive with spaces in the path. See this entry in Sun's bug database.
Since the bug is already a year old, I don't think you'll get a fix anytime soon. Try the latest VM. If that doesn't help, try to get the source for WDesktopPeer. Instead of encoding the path, try to keep it as it was (with backslashes and all) and put quotes around it. That might work.
[EDIT] Specifically, don't replace \ with /, do not prepend file:// and leave the spaces as they are (instead of replacing them with %20)
