PDF Box - Unable to renameTo or Delete files - java

I'm fairly new to programming and I've been trying to use PDFBox for a personal project. I'm basically trying to verify whether a PDF contains specific keywords; if it does, I want to move the file to an "approved" folder.
I know the code below is poorly written, but I'm not able to move or delete the file correctly:
try (Stream<Path> filePathStream = Files.walk(Paths.get("C://pdfbox_teste"))) {
filePathStream.forEach(filePath -> {
if (Files.isRegularFile(filePath)) {
String arquivo = filePath.toString();
File file = new File(arquivo);
try {
// Loading an existing document
PDDocument document = PDDocument.load(file);
// Instantiate PDFTextStripper class
PDFTextStripper pdfStripper = new PDFTextStripper();
String text = pdfStripper.getText(document);
String[] words = text.split("\\.|,|\\s");
for (String word : words) {
// System.out.println(word);
if (word.equals("Revisão") || word.equals("Desenvolvimento")) {
// System.out.println(word);
if(file.renameTo(new File("C://pdfbox_teste//Aprovados//" + file.getName()))){
document.close();
System.out.println("Arquivo transferido corretamente");
file.delete();
};
}
}
System.out.println("Fim do documento: " + arquivo);
System.out.println("----------------------------");
document.close();
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
});
I wanted to have the files transferred into the new folder. Instead, sometimes they only get deleted and sometimes nothing happens. I imagine the error is probably on the foreach, but I can't seem to find a way to fix it.

You are trying to rename the file while it is still open, and you only close it afterwards:
// your code, does not work
if(file.renameTo(new File("C://pdfbox_teste//Aprovados//" + file.getName()))){
document.close();
System.out.println("Arquivo transferido corretamente");
file.delete();
};
Try to close the document first, so the file is no longer accessed by your process, and then it should be possible to rename it:
// fixed code:
document.close();
if(file.renameTo(new File("C://pdfbox_teste//Aprovados//" + file.getName()))){
System.out.println("Arquivo transferido corretamente");
}
And as Mahesh K pointed out, you don't have to delete the (original) file after you have renamed it. Rename does not create a duplicate that would leave the original file to be deleted; it just renames it.
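A minimal sketch of the same ordering with java.nio.file.Files.move, which throws an IOException describing the failure instead of silently returning false like File.renameTo. The class ApprovedMover and its method names are hypothetical. It assumes the text has already been extracted and the PDDocument closed (for example by a try-with-resources block) before moveToApproved is called.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.Set;

// Hypothetical helper for illustration; not part of PDFBox.
final class ApprovedMover {

    // True if any word of the extracted text is one of the keywords.
    // The split pattern is the one used in the question.
    static boolean containsKeyword(String text, Set<String> keywords) {
        return Arrays.stream(text.split("\\.|,|\\s")).anyMatch(keywords::contains);
    }

    // Moves the file into approvedDir, creating the directory if needed.
    // Call this only after the document that read the file has been closed.
    static Path moveToApproved(Path file, Path approvedDir) throws IOException {
        Files.createDirectories(approvedDir);
        Path target = approvedDir.resolve(file.getFileName());
        return Files.move(file, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Because the keyword check returns a single boolean, the move happens at most once per file, rather than once per matching word as in the loop from the question.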

After calling renameTo you shouldn't call delete. As far as I understand, renameTo works like a move command.

Related

Apache PDFBox to open temporary created PDF file

I'm using apache pdfbox 2.x version and I am trying to read a temp created file.
Below is my code to create a temp file and read it:
Path mergedTempFile = null;
try {
mergedTempFile = Files.createTempFile("merge_", ".pdf");
PDDocument pdDocument = PDDocument.load(mergedTempFile.toFile());
But it gives error:
java.io.IOException: Error: End-of-File, expected line
at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1098)
at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2577)
at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:2560)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:219)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1099)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1082)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1006)
at com.howtodoinjava.demo.PdfboxApi.test(PdfboxApi.java:326)
at com.howtodoinjava.demo.PdfboxApi.main(PdfboxApi.java:317)
I found a similar issue reported elsewhere, but it did not help.
Please help me with this; I still cannot get rid of the error.
PDDocument.load(...) is used to parse an existing PDF.
The passed temporary file (mergedTempFile) is empty, thus the exception. Just create a PDDocument with the constructor (resides in-memory) and later save it with PDDocument.save(...).
Path mergedTempFile = null;
try {
mergedTempFile = Files.createTempFile("merge_", ".pdf");
try (PDDocument pdDocument = new PDDocument()) {
// add content
pdDocument.addPage(new PDPage()); // empty page as an example
pdDocument.save(mergedTempFile.toFile());
}
} catch (IOException e) {
// exception handling
}
// use mergedTempFile for further logic
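To see why the load call fails, here is a small check that a freshly created temp file is zero bytes long, so there is no PDF header for the parser to read. TempFileCheck and isEmpty are hypothetical names, and only java.nio is used.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration.
final class TempFileCheck {

    // True if the file exists but holds no bytes, so there is nothing to parse.
    static boolean isEmpty(Path file) throws IOException {
        return Files.size(file) == 0L;
    }

    public static void main(String[] args) throws IOException {
        Path temp = Files.createTempFile("merge_", ".pdf");
        try {
            // Files.createTempFile only creates the file; it writes no content.
            System.out.println(temp + " empty? " + isEmpty(temp));
        } finally {
            Files.deleteIfExists(temp);
        }
    }
}
```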

access denied while saving file using DirectoryChooser

I'm using Apache libraries to edit a DOCX file and I want the user to choose the directory where the file is saved. No matter which folder is selected, it always throws an exception saying "path (Access denied)"; however, if I hard-code the directory in my code it works perfectly. Here's some of my code:
XWPFDocument doc = null;
try {
doc = new XWPFDocument(new ByteArrayInputStream(byteData));
} catch (IOException e) {
e.printStackTrace();
}
/* editing docx file somehow (a lot of useless code) */
Alert alert = new Alert(Alert.AlertType.INFORMATION);
DirectoryChooser dirChooser = new DirectoryChooser();
dirChooser.setTitle("Choose folder");
Stage stage = (Stage) (((Node) event.getSource()).getScene().getWindow());
File file = dirChooser.showDialog(stage);
if (file != null) {
try {
doc.write(new FileOutputStream(file.getAbsoluteFile()));
alert.setContentText("Saved to folder " + file.getAbsolutePath());
} catch (IOException e) {
alert.setContentText(e.getLocalizedMessage());
}
} else {
try {
doc.write(new FileOutputStream("C://output.docx"));
alert.setContentText("Saved to folder C:\\");
} catch (IOException e) {
alert.setContentText(e.getLocalizedMessage());
}
}
alert.showAndWait();
Please help me to figure out what I'm doing wrong :(
DirectoryChooser returns a File object that is either a directory or null (if you cancel or close the dialog without choosing one). So in order to save your file, you also need to append the file name to the absolute path of the chosen directory. You can do that with:
doc.write(new FileOutputStream(file.getAbsoluteFile()+"\\doc.docx"));
But this is platform dependent, because the separator is '\' on Windows and '/' on Unix, so it is better to use File.separator:
doc.write(new FileOutputStream(file.getAbsoluteFile()+File.separator+"doc.docx"));
Edit: As Fabian mentioned in the comments, you can use the File constructor, passing the folder (the File you got from the DirectoryChooser) and the new file name as parameters, which makes the code far more readable:
new FileOutputStream(new File(file, "doc.docx"))
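A minimal sketch of that approach with the stream closed properly. SaveTarget and its methods are hypothetical names, and the byte array stands in for the document content; with the code from the question, the body of the try block would be doc.write(out) instead.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical helper for illustration.
final class SaveTarget {

    // Builds the file to write inside the directory returned by the chooser.
    static File inDirectory(File chosenDir, String fileName) {
        if (!chosenDir.isDirectory()) {
            throw new IllegalArgumentException("Not a directory: " + chosenDir);
        }
        return new File(chosenDir, fileName);
    }

    // Writes the data to the target file; try-with-resources closes the stream.
    static void write(File target, byte[] data) throws IOException {
        try (OutputStream out = new FileOutputStream(target)) {
            out.write(data);
        }
    }
}
```

Opening a FileOutputStream directly on the directory is what fails in the question: the constructor throws a FileNotFoundException because a directory cannot be opened for writing.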

Android reading pdf metadata - memory issue

I'm currently building an Android application that displays a set of PDF files in a ListView. Instead of just displaying the filename, I want to grab the Title from the metadata of the PDF and display that in the list; if the file doesn't have a Title set, then just use the filename. I'm using iText at the moment; here is what I have:
File[] filteredFiles = root.listFiles(filter);
for (int i=0;i<filteredFiles.length;i++) {
try {
File f = filteredFiles[i];
PdfReader reader = new PdfReader(f.getAbsolutePath());
String title = reader.getInfo().get("Title");
reader.close();
//Do other stuff here...
} catch (Exception e) {
e.printStackTrace();
}
}
This works fine and gets the data I want, but it's slow. Also, I sometimes get memory crashes if the file is over 2MB. Is there a better way of doing this? Maybe a way of getting the metadata without having to actually open the PDF file?
Any help is much appreciated, Thanks.
You can try the PDFParse library. It is optimized for performance and low memory consumption.
File[] filteredFiles = root.listFiles(filter);
for (int i=0;i<filteredFiles.length;i++) {
try {
File f = filteredFiles[i];
PDFDocument reader = new PDFDocument(f.getAbsolutePath());
String title = reader.getDocumentInfo().getTitle();
reader.close();
//Do other stuff here...
} catch (Exception e) {
e.printStackTrace();
}
}

Java: Read in text files from a directory, from the internet

Does anybody know how to recursively read in files from a specific directory on the internet, in Java?
I want to read in all the text files from this web directory: http://www.cs.ucdavis.edu/~davidson/courses/170-S11/Female/
I know how to read in multiple files that are in a folder on my computer, and I know how to read in a single file from the internet. But how can I read in multiple files on the internet without hardcoding the URLs?
Stuff I tried:
// List the files on my Desktop
final File folder = new File("/Users/crystal/Desktop");
File[] listOfFiles = folder.listFiles();
for (int i = 0; i < listOfFiles.length; i++) {
File fileEntry = listOfFiles[i];
if (!fileEntry.isDirectory()) {
System.out.println(fileEntry.getName());
}
}
Another thing I tried:
// Reading data from the web
try
{
// Create a URL object
URL url = new URL("http://www.cs.ucdavis.edu/~davidson/courses/170-S11/Female/5_1_1.txt");
// Read all of the text returned by the HTTP server
BufferedReader in = new BufferedReader (new InputStreamReader(url.openStream()));
String htmlText; // String that holds current file line
// Read through file one line at a time. Print line
while ((htmlText = in.readLine()) != null)
{
System.out.println(htmlText);
}
in.close();
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException e) {
// If another exception is generated, print a stack trace
e.printStackTrace();
}
Thanks!
Since the URL you mentioned has indexes enabled, you're in luck.
You've got a few options here.
Parse the HTML to find the href attribute of the a tags, using SAX2 or any other XML parser. HtmlUnit would also work, I think.
Use a little regexp magic to match every string between <a href=" and "> and use those as the URLs to read from.
Once you've got a list of all the URLs you need, then the second piece of code should work just fine. Just iterate over your list, and construct your URL from that list.
Here's a sample regex that should match what you want. It does catch a few extra links, but you should be able to filter those out.
<a\ href="(.+?)">
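A rough sketch of the regex option in Java. IndexLinks and textFileLinks are hypothetical names, and the pattern is a slightly loosened variant of the one above (any whitespace after <a, no requirement that the tag closes right after the quote). It assumes the index page HTML has already been downloaded into a string with the code from the question, and that the links need no escaping. Only links ending in .txt are kept, which filters out the extra sorting and parent-directory links.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration.
final class IndexLinks {

    // Captures the value of href in tags of the form <a href="...">.
    private static final Pattern HREF =
            Pattern.compile("<a\\s+href=\"(.+?)\"", Pattern.CASE_INSENSITIVE);

    // Returns the absolute URIs of all linked .txt files on the index page.
    static List<URI> textFileLinks(URI base, String html) {
        List<URI> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1);
            if (href.endsWith(".txt")) {
                // Relative links are resolved against the directory URL.
                links.add(base.resolve(href));
            }
        }
        return links;
    }
}
```

Each returned URI can then be turned into a URL with toURL() and read line by line exactly as in the second snippet from the question.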

File Delete and Rename in Java

I have the following Java code, which searches an XML file for a specific tag, adds some text to it, and saves the result to a temporary file. I couldn't find a way to rename the temporary file to the original file. Please suggest.
import java.io.*;
class ModifyXML {
public void readMyFile(String inputLine) throws Exception
{
String record = "";
File outFile = new File("tempFile.tmp");
FileInputStream fis = new FileInputStream("InfectiousDisease.xml");
BufferedReader br = new BufferedReader(new InputStreamReader(fis));
FileOutputStream fos = new FileOutputStream(outFile);
PrintWriter out = new PrintWriter(fos);
while ( (record=br.readLine()) != null )
{
if(record.endsWith("<add-info>"))
{
out.println(" "+"<add-info>");
out.println(" "+inputLine);
}
else
{
out.println(record);
}
}
out.flush();
out.close();
br.close();
//Also we need to delete the original file
//outFile.renameTo(InfectiousDisease.xml);//Not working
}
public static void main (String[] args) {
try
{
ModifyXML f = new ModifyXML();
f.readMyFile("This is infectious disease data");
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
Thanks
First delete the original file and then rename the new file:
File inputFile = new File("InfectiousDisease.xml");
File outFile = new File("tempFile.tmp");
if(inputFile.delete()){
outFile.renameTo(inputFile);
}
A good way to rename files is:
File file = new File("path-here");
file.renameTo(new File("new path here"));
In your code there are several issues.
First, your description mentions renaming the original file and adding some text to it. Your code doesn't do that; it opens two files, one for reading and one for writing (with the additional text). That is the right way to do things, as adding text in place is not really feasible with the techniques you are using.
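The second point concerns the temporary file. A file created with new File("tempFile.tmp") and written through a FileOutputStream is an ordinary file: it is not removed when you close it, so the text you added stays in tempFile.tmp until you rename it over the original or delete it yourself.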
The third issue is that you are modifying XML files as plain text. This sometimes works as XML files are a subset of plain text files, but there is no indication that you attempted to ensure that the output file was an XML file. Perhaps you know more about your input files than is mentioned, but if you want this to work correctly for 100% of the input cases, you probably want to create a SAX writer that writes out all a SAX reader reads, with the additional information in the correct tag location.
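As a rough illustration of treating the file as XML rather than text, here is a sketch that uses the JDK's DOM parser and an identity Transformer instead of the SAX reader/writer pair described above (a swapped-in technique, chosen only because it is shorter). AddInfoEditor and appendToAddInfo are hypothetical names, and the sketch works on strings; reading and writing InfectiousDisease.xml is left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration.
final class AddInfoEditor {

    // Appends the text to every <add-info> element and returns the serialized XML.
    static String appendToAddInfo(String xml, String text) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = doc.getElementsByTagName("add-info");
        for (int i = 0; i < nodes.getLength(); i++) {
            nodes.item(i).appendChild(doc.createTextNode(text));
        }
        // An identity transform serializes the modified tree back to text.
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }
}
```

The parser rejects input that is not well-formed, and the serializer escapes characters such as < and & in the added text, which the line-by-line approach does not do.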
You can use
outFile.renameTo(new File(newFileName));
You have to ensure these files are not open at the time.
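A minimal sketch of the replace step with java.nio.file.Files.move. The REPLACE_EXISTING option overwrites the original, so no separate delete is needed, and a failure surfaces as an IOException rather than a false return value. ReplaceOriginal and replace are hypothetical names; both files are assumed to be closed before the call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration.
final class ReplaceOriginal {

    // Moves 'rewritten' over 'original'; afterwards only 'original' exists,
    // holding the rewritten content.
    static void replace(Path rewritten, Path original) throws IOException {
        Files.move(rewritten, original, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

With the file names from the question, the call at the end of readMyFile would be ReplaceOriginal.replace(Paths.get("tempFile.tmp"), Paths.get("InfectiousDisease.xml")).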
