Running multiple PDFs through a PDFBox program - java

I am trying to use PDFBox in Eclipse to run multiple PDF files in a folder through a text reader that extracts certain terms and outputs them into a text file, which I will then convert to an Excel sheet. I have the program and it works correctly for a single PDF file:
public static void main(String args[]) throws IOException {
    // Loading an existing document
    File file = new File("ADE_acetylfuranoside_120319_pfister.pdf");
    PDDocument document = PDDocument.load(file);
    // Instantiate PDFTextStripper class
    PDFTextStripper pdfStripper = new PDFTextStripper();
    // Retrieving text from PDF document
    String text = pdfStripper.getText(document);
    //..."Actual code that extracts text"...
    PrintStream o = new PrintStream(new File("output.txt"));
    PrintStream console = System.out;
    System.setOut(o);
    System.out.println(finalSheet);
}
My problem is that I want to run 500 PDFs in one folder through this program in Eclipse rather than putting in the name of each one individually. I also want it to output like:
Name1, Number1, ID1
Name2, Number2, ID2
but I think the way it is written now it will just overwrite line number one if I run multiple PDFs through it.
Thanks for the help!

For the first part, you could just use the File class with a FileFilter:
// directoryName could be as simple as "."
File folder = new File(directoryName);
File[] listOfFiles = folder.listFiles(new FileFilter() {
    @Override
    public boolean accept(File pathname) {
        return pathname.getName().toLowerCase().endsWith(".pdf");
    }
});
This gives you an array of File objects of all the files in a particular folder/directory. Now you can loop through it with pretty much the code you have.
On the output side, you'll likely want to correlate the output with the input. I'm a bit confused by your code and I'm guessing you'd just like an output file for each input file. So, perhaps, something like:
// index is the value you used to loop through the `listOfFiles` array
try (FileWriter fileWriter = new FileWriter(listOfFiles[index].getName() + ".output.txt")) {
    fileWriter.write(text); // text: the String you want in the file
}
This creates a file named (as taken from your example) "ADE_acetylfuranoside_120319_pfister.pdf.output.txt". Obviously this could change. In this case a new file is created for each input file.
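If a single output file with one line per PDF is what you are after (as the "Name1, Number1, ID1" example in the question suggests), a minimal sketch could look like the following. It uses only the standard library; `PdfBatch`, `writeRows` and the `extractRow` function are hypothetical names, and `extractRow` stands in for the PDFBox extraction code from the question, which is not reproduced here. The key point is that the writer is opened once, outside the loop, so each PDF adds a line instead of overwriting the previous one.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.function.Function;

class PdfBatch {
    /**
     * Calls extractRow once per .pdf file in folder and writes each result as
     * one line of output. Returns the number of files processed.
     * extractRow is a hypothetical stand-in for the PDFBox extraction code.
     */
    static int writeRows(File folder, File output, Function<File, String> extractRow)
            throws IOException {
        File[] pdfs = folder.listFiles(
                (dir, name) -> name.toLowerCase().endsWith(".pdf"));
        if (pdfs == null) {
            throw new IOException("Not a readable directory: " + folder);
        }
        Arrays.sort(pdfs); // listFiles() makes no ordering guarantee
        // Open the writer once, outside the loop, so the rows accumulate.
        try (PrintWriter out = new PrintWriter(new FileWriter(output))) {
            for (File pdf : pdfs) {
                out.println(extractRow.apply(pdf));
            }
        }
        return pdfs.length;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder extractor: replace the lambda with the code that
        // builds the "Name, Number, ID" line for one PDF.
        int n = writeRows(new File("."), new File("output.txt"), pdf -> pdf.getName());
        System.out.println("Processed " + n + " PDF file(s)");
    }
}
```

Passing the extraction step in as a function keeps the folder loop independent of PDFBox, so the loop can be tried out before the real extraction is plugged in.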

Related

Jsoup (Java) overwrites the file when it should append to it

I have code that should read an HTML page and write the result to another file. The BufferedWriter writes the file, but when the code is run with a different URL it doesn't append; it rewrites the file and the previous content disappears.
What I need is that when Jsoup processes a new HTML page, the result is added to the output file rather than replacing it.
I have tried writer types other than BufferedWriter.
public class WriteFile
{
public static void main(String args[]) throws IOException
{
String url = "http://www.someurl.com/registers";
Document doc = Jsoup.connect(url).get();
Elements es = doc.getElementsByClass("a_code");
for (Element clas : es)
{
System.out.println(clas.text());
BufferedWriter writer = new BufferedWriter(new FileWriter("D://Author.html"));
writer.append(clas.text());
writer.close();
}
}
}
Don't mistake the append-method of the BufferedWriter as appending content to the file. It actually appends to the given writer.
To actually append additional content to the file you need to specify that when opening the file writer. FileWriter has an additional constructor parameter allowing to specify that:
new FileWriter("D://Author.html", /* append = */ true)
You may even be interested in the Java Files API instead, so you can spare instantiating your own BufferedWriter, etc.:
Files.write(Paths.get("D://Author.html"), clas.text().getBytes(), StandardOpenOption.CREATE, StandardOpenOption.APPEND);
Your loop and what you are writing may further be simplifiable to something as follows (you may then even omit the APPEND-open option again, if that makes sense):
Files.write(Paths.get("D://Author.html"),
String.join("" /* or new line? */,
doc.getElementsByClass("a_code")
.eachText()
).getBytes(),
StandardOpenOption.CREATE, StandardOpenOption.APPEND);
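To see the effect of the append flag in isolation, here is a minimal sketch that leaves Jsoup out entirely. `AppendDemo` and `appendLines` are hypothetical names, and the strings in `main` stand in for the scraped text. The writer is opened once per call, outside the loop, rather than once per element as in the question.

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

class AppendDemo {
    /** Appends each entry as its own line; existing file content is kept. */
    static void appendLines(String path, List<String> lines) throws IOException {
        // The second constructor argument (true) means append instead of truncate.
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(path, true))) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Running this twice leaves four lines in the file, not two.
        appendLines("Author.html", Arrays.asList("first entry", "second entry"));
    }
}
```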

FileNotFoundException: Java cannot find a basic file even though it's there

I'm trying to read a basic txt file that contains prices in euros. My program is supposed to loop through these prices and then create a new file with the other prices. Now, the problem is that java says it cannot find the first file.
It is in the exact same package like this:
Java already fails at the following code:
FileReader fr = new FileReader("prices_usd.txt");
Whole code :
import java.io.*;
public class DollarToEur {
public static void main(String[] arg) throws IOException, FileNotFoundException {
FileReader fr = new FileReader("prices_usd.txt");
BufferedReader br = new BufferedReader(fr);
FileWriter fw = new FileWriter("prices_eur");
PrintWriter pw = new PrintWriter(fw);
String regel = br.readLine();
while(regel != null) {
String[] values = regel.split(" : ");
String beschrijving = values[0];
String prijsString = values[1];
double prijs = Double.parseDouble(prijsString);
double newPrijs = prijs * 0.913;
pw.println(beschrijving + " : " + newPrijs);
regel = br.readLine();
}
pw.close();
br.close();
}
}
Your file looks to be named "prices_usd" and your code is looking for "prices_usd.txt"
There are a couple of things you need to do:
Put the file directly under the project folder in Eclipse. When you execute your code in Eclipse, the project folder is considered to be the working directory, so you need to put the file there for Java to find it.
Rename the file correctly with the .txt extension. From your screenshot it looks like the file does not have an extension, or maybe it's just not visible.
Hope this helps!
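If you are unsure where Java is actually looking, a small diagnostic can help. This is a minimal sketch, with `WhereIsMyFile` and `describe` as hypothetical names; it only reports the working directory, the absolute path a relative name resolves to, and whether anything exists there.

```java
import java.io.File;

class WhereIsMyFile {
    /** Describes how a relative file name is resolved by the running JVM. */
    static String describe(String relativeName) {
        File file = new File(relativeName);
        return "working dir = " + System.getProperty("user.dir")
                + ", resolved to = " + file.getAbsolutePath()
                + ", exists = " + file.exists();
    }

    public static void main(String[] args) {
        // File name taken from the question.
        System.out.println(describe("prices_usd.txt"));
    }
}
```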
It is bad practice to put resource files (like prices_usd.txt) in a package. Please put it under the resources/ directory. If you put it directly in the resources/ directory, you can access the file like this:
new FileReader(new File(DollarToEur.class.getClassLoader().getResource("prices_usd.txt").getFile()));
But if you really have a good reason to put it in the package, you can access it like this:
new FileReader("src/main/java/week5/practicum13/prices_usd.txt");
But this will not work when you export your project (for example: as a jar).
EDIT 0: Also of course, your file's name needs to be "prices_usd.txt" and not just "prices_usd".
EDIT 1: The first (recommended) solution returns a String from .getFile(), which cannot be passed directly to the new File(...) constructor when the application is built (i.e. not run from the IDE). Spring has a solution for this, though: org.springframework.core.io.ClassPathResource.
Simply use this code with Spring:
new FileReader(new ClassPathResource("prices_usd.txt").getFile());
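If Spring is not available, one alternative (not part of the answer above) is to read the resource as a stream instead of converting it to a File, which also works when the resource is packed inside a jar. A minimal sketch, where `ResourceReader` and `open` are hypothetical names and UTF-8 is an assumed encoding:

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class ResourceReader {
    /**
     * Opens a classpath resource for line-by-line reading.
     * Throws FileNotFoundException if the resource is not on the classpath.
     */
    static BufferedReader open(String resourceName) throws FileNotFoundException {
        InputStream in = ResourceReader.class.getClassLoader()
                .getResourceAsStream(resourceName);
        if (in == null) {
            throw new FileNotFoundException("Not on the classpath: " + resourceName);
        }
        // UTF-8 is an assumption; use the encoding the file was saved with.
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }
}
```

With this, `BufferedReader br = ResourceReader.open("prices_usd.txt");` could take the place of the FileReader/BufferedReader pair in the question, provided the file sits at the root of the classpath.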

How to convert more than one doc file in a folder into text files using Java

I have code for one doc file only. I need to convert multiple doc files in a folder into their respective text files.
Code for converting a single doc file into a text file:
import java.io.*;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
public class ReadDocFile {
public static void main(String[] args) {
File file = null;
try {
// Read the Doc/DOCx file
file = new File("document");
FileInputStream fis = new FileInputStream(file.getAbsolutePath());
XWPFDocument doc = new XWPFDocument(fis);
XWPFWordExtractor ex = new XWPFWordExtractor(doc);
String text = ex.getText();
//write the text in txt file
File fil = new File("D:\\wordtotextoutput\\java1new.txt");
Writer output = new BufferedWriter(new FileWriter(fil));
output.write(text);
output.close();
} catch (Exception exep) {
}
}
}
I will just give you the logic; you should be able to convert it to Java.
First, the input doc files should land in a particular folder.
Scan the folder and get the number of files in it.
Write a for loop that fetches the files one by one. Put all your code logic inside the loop.
Check the file type of the fetched file. If it is .doc/.docx, process it.
Process all the files in the same way.
Later, delete the processed files.
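The steps above can be sketched roughly as follows. This is only an illustration using the standard library: `FolderConverter`, `convertAll` and `textFileNameFor` are hypothetical names, and the `extractText` function stands in for the Apache POI extraction code from the question. The final "delete the processed files" step is left out on purpose.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.function.Function;

class FolderConverter {
    /** "report.docx" -> "report.txt"; names without a dot just get ".txt" appended. */
    static String textFileNameFor(String docFileName) {
        int dot = docFileName.lastIndexOf('.');
        String base = (dot > 0) ? docFileName.substring(0, dot) : docFileName;
        return base + ".txt";
    }

    /**
     * Writes one .txt file into outputDir for every .doc/.docx file in inputDir.
     * extractText is a hypothetical stand-in for the POI extraction code.
     * Returns the number of files converted.
     */
    static int convertAll(File inputDir, File outputDir, Function<File, String> extractText)
            throws IOException {
        File[] candidates = inputDir.listFiles();
        if (candidates == null) {
            throw new IOException("Not a readable directory: " + inputDir);
        }
        Files.createDirectories(outputDir.toPath());
        int converted = 0;
        for (File candidate : candidates) {
            String name = candidate.getName().toLowerCase();
            if (!candidate.isFile() || !(name.endsWith(".doc") || name.endsWith(".docx"))) {
                continue; // skip sub-directories and other file types
            }
            File target = new File(outputDir, textFileNameFor(candidate.getName()));
            Files.write(target.toPath(),
                    extractText.apply(candidate).getBytes(StandardCharsets.UTF_8));
            converted++;
        }
        return converted;
    }
}
```

Note that two inputs such as a.doc and a.docx would map to the same a.txt in this sketch; a real version would need to decide how to handle that.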
With the org.apache.commons.io.FileUtils class (Apache Commons IO)
you can use
FileUtils.copyDirectory(srcDir, destDir);
and then delete the old file if you want to.

Program Not Creating File. What is Wrong?

I have tried creating a file, using the code below:
import java.io.File;
public class DeleteEvidence {
public static void main(String[] args) {
File evidence = new File("cookedBooks.txt");
However, the file cookedBooks.txt does not exist anywhere on my computer. I'm pretty new to this, so I'm having problems understanding other threads about similar problems.
You have successfully created an instance of the class File, which is very different from creating an actual file on your hard drive.
Instances of the File class are used to refer to files on the disk. You can use them to do many things, for instance:
check if files or directories exist;
create/delete/rename files or directories; and
open "streams" to write data into the files.
To create a file in your hard disk and write some data to it, you could use, for instance, FileOutputStream.
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
public class AnExample {
public static void main(String... args) throws Throwable {
final File file = new File("file.dat");
try (FileOutputStream fos = new FileOutputStream(file);
DataOutputStream out = new DataOutputStream(fos)) {
out.writeInt(42);
}
}
}
Here, fos is an instance of FileOutputStream, which is an OutputStream that writes all bytes written to it to an underlying file on disk.
Then, I create an instance of DataOutputStream around that FileOutputStream: this way, we can write more complex data types than bytes and byte arrays (which is your only possibility using the FileOutputStream directly).
Finally, four bytes of data are written to the file: the four bytes representing the integer 42. Note that, if you open this file in a text editor, you will see garbage, since the code above did not write the characters '4' and '2'.
Another possibility would have been to use an OutputStreamWriter, which would give you an instance of Writer that can be used to write text (non-binary) files:
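import java.io.File;
import java.io.FileOutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
public class AnExample {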
public static void main(String... args) throws Throwable {
final File file = new File("file.txt");
try (FileOutputStream fos = new FileOutputStream(file);
OutputStreamWriter out = new OutputStreamWriter(fos, StandardCharsets.UTF_8)) {
out.write("You can read this with a text editor.");
}
}
}
Here, you can open the file file.txt in a text editor and read the message written to it.
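File evidence = new File(path);
File parent = evidence.getParentFile();
if (parent != null) parent.mkdirs(); // create missing parent directories, not a directory named like the file
evidence.createNewFile();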
File is an abstract concept of a file which does not have to exist. Simply creating a File object does not actually create a physical object.
You can do this in (at least) two ways.
Write something to the file (reference by the abstract File object)
Calling File#createNewFile
You can also create temporary files using File#createTempFile but I don't think this is what you are trying to achieve.
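A minimal sketch of the second option, assuming you only want an empty file to appear on disk. `CreateFileDemo` and `createEmptyFile` are hypothetical names; the file name is the one from the question.

```java
import java.io.File;
import java.io.IOException;

class CreateFileDemo {
    /**
     * Creates an empty file, including any missing parent directories.
     * Returns true if the file was newly created, false if it already existed.
     */
    static boolean createEmptyFile(File file) throws IOException {
        File parent = file.getParentFile();
        if (parent != null) {
            parent.mkdirs(); // does nothing if the directories already exist
        }
        return file.createNewFile();
    }

    public static void main(String[] args) throws IOException {
        File evidence = new File("cookedBooks.txt");
        // Constructing the File object alone does not touch the disk.
        System.out.println("exists before: " + evidence.exists());
        System.out.println("newly created: " + createEmptyFile(evidence));
        System.out.println("exists after: " + evidence.exists());
    }
}
```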
You have only created an object which can represent a file. This is just in memory, though. If you want to access the file you must use a FileInputStream or a FileOutputStream. Then it will also be created on the drive (in the case of the output stream).
FileOutputStream fo = new FileOutputStream(new File(oFileName));
fo.write("test".getBytes());
fo.close();
Here you are only creating a File object; to create the file itself you need to call a method on that object, i.e. createNewFile().
So use evidence.createNewFile(); if you are just creating a file.
If you want to create the file in a specific location, specify its path instead of just a file name,
i.e. File evidence = new File("path");
For example, with a path (which may include a directory):
String path="abc.txt";
File file = new File(path);
if (file.createNewFile()) {
System.out.println("File is created");
}
else {
System.out.println("File is already created");
}
FileWriter fw = new FileWriter(file, true);
String ab = "Hello";
fw.write(ab);
fw.write(summary); // summary: another String defined elsewhere in the program
fw.close();

File Delete and Rename in Java

I have the following Java code, which searches an XML file for a specific tag, adds some text to it, and saves the file. I couldn't find a way to rename the temporary file to the original file. Please suggest.
import java.io.*;
class ModifyXML {
public void readMyFile(String inputLine) throws Exception
{
String record = "";
File outFile = new File("tempFile.tmp");
FileInputStream fis = new FileInputStream("InfectiousDisease.xml");
BufferedReader br = new BufferedReader(new InputStreamReader(fis));
FileOutputStream fos = new FileOutputStream(outFile);
PrintWriter out = new PrintWriter(fos);
while ( (record=br.readLine()) != null )
{
if(record.endsWith("<add-info>"))
{
out.println(" "+"<add-info>");
out.println(" "+inputLine);
}
else
{
out.println(record);
}
}
out.flush();
out.close();
br.close();
//Also we need to delete the original file
//outFile.renameTo(InfectiousDisease.xml);//Not working
}
public static void main (String[] args) {
try
{
ModifyXML f = new ModifyXML();
f.readMyFile("This is infectious disease data");
}
catch(Exception e)
{
e.printStackTrace();
}
}
}
Thanks
First delete the original file and then rename the new file:
File inputFile = new File("InfectiousDisease.xml");
File outFile = new File("tempFile.tmp");
if(inputFile.delete()){
outFile.renameTo(inputFile);
}
A good way to rename a file is:
File file = new File("path-here");
file.renameTo(new File("new path here"));
In your code there are several issues.
First, your description mentions renaming the original file and adding some text to it. Your code doesn't do that; it opens two files, one for reading and one for writing (with the additional text). That is the right way to do things, as adding text in place is not really feasible with the techniques you are using.
The second issue concerns the temporary file. A file opened via new File("tempFile.tmp") is an ordinary file: Java does not remove it when it is closed (only files marked with File#deleteOnExit or opened with StandardOpenOption.DELETE_ON_CLOSE are removed automatically), so the added text stays on disk, and the file still has to be renamed over the original.
The third issue is that you are modifying XML files as plain text. This sometimes works as XML files are a subset of plain text files, but there is no indication that you attempted to ensure that the output file was an XML file. Perhaps you know more about your input files than is mentioned, but if you want this to work correctly for 100% of the input cases, you probably want to create a SAX writer that writes out all a SAX reader reads, with the additional information in the correct tag location.
You can use
outFile.renameTo(new File(newFileName));
You have to ensure these files are not open at the time.
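File#renameTo only reports failure through its boolean return value, and its behaviour is platform-dependent. As an alternative, here is a minimal sketch using java.nio.file.Files.move, which replaces the target in one call and throws an IOException explaining what went wrong. `ReplaceFile` and `replace` are hypothetical names; the file names are the ones from the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

class ReplaceFile {
    /** Moves tempFile over originalFile, replacing it if it already exists. */
    static void replace(Path tempFile, Path originalFile) throws IOException {
        Files.move(tempFile, originalFile, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        // Call this only after all readers and writers on both files are closed.
        replace(Paths.get("tempFile.tmp"), Paths.get("InfectiousDisease.xml"));
    }
}
```

With REPLACE_EXISTING there is no need to delete the original file first.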
