I don't have much experience with PDF processing in Java. I want to read a table in a PDF file using the iText Java library. How should I proceed?
You can extract text from a content stream, but for ordinary PDFs the result will be plain text without any structure. If there's a table on the page, it won't be recognized as such: you'll get the content and some white space, but not a tabular structure. Only if you have a tagged PDF can you obtain an XML file. If the PDF contains tags that are recognized as table tags, this will be reflected in the XML.
That's what I found out here
To read the content of a table from a PDF file, you only have to convert the PDF into text using any API (I used PdfTextExtractor.getTextFromPage() from iText) and then process that text in your Java program. Once the text is read, the major task is done. You then have to filter out the data you need, which you can do by repeatedly applying the split method of the String class until you find the record you want.
Below is my code in which I have extracted part of the record from a PDF file and written it into a .CSV file. You can view the PDF file here: http://www.cea.nic.in/reports/monthly/generation_rep/actual/jan13/opm_02.pdf
public static void genrateCsvMonth_Region(String pdfpath, String csvpath) {
try {
String line = null;
// Appending Header in CSV file...
BufferedWriter writer1 = new BufferedWriter(new FileWriter(csvpath,
true));
writer1.close();
// Checking whether file is empty or not..
BufferedReader br = new BufferedReader(new FileReader(csvpath));
if ((line = br.readLine()) == null) {
BufferedWriter writer = new BufferedWriter(new FileWriter(
csvpath, true));
writer.append("REGION,");
writer.append("YEAR,");
writer.append("MONTH,");
writer.append("THERMAL,");
writer.append("NUCLEAR,");
writer.append("HYDRO,");
writer.append("TOTAL\n");
writer.close();
}
// Release the reader used for the emptiness check
br.close();
// Reading the pdf file..
PdfReader reader = new PdfReader(pdfpath);
BufferedWriter writer = new BufferedWriter(new FileWriter(csvpath,
true));
// Extracting records from page into String..
String page = PdfTextExtractor.getTextFromPage(reader, 1);
// Extracting month and Year from String..
String period1[] = page.split("PEROID");
String period2[] = period1[0].split(":");
String month[] = period2[1].split("-");
String period3[] = month[1].split("ENERGY");
String year[] = period3[0].split("VIS");
// Extracting Northen region
String northen[] = page.split("NORTHEN REGION");
String nthermal1[] = northen[0].split("THERMAL");
String nthermal2[] = nthermal1[1].split(" ");
String nnuclear1[] = northen[0].split("NUCLEAR");
String nnuclear2[] = nnuclear1[1].split(" ");
String nhydro1[] = northen[0].split("HYDRO");
String nhydro2[] = nhydro1[1].split(" ");
String ntotal1[] = northen[0].split("TOTAL");
String ntotal2[] = ntotal1[1].split(" ");
// Appending filtered data into CSV file..
writer.append("NORTHEN" + ",");
writer.append(year[0] + ",");
writer.append(month[0] + ",");
writer.append(nthermal2[4] + ",");
writer.append(nnuclear2[4] + ",");
writer.append(nhydro2[4] + ",");
writer.append(ntotal2[4] + "\n");
// Extracting Western region
String western[] = page.split("WESTERN");
String wthermal1[] = western[1].split("THERMAL");
String wthermal2[] = wthermal1[1].split(" ");
String wnuclear1[] = western[1].split("NUCLEAR");
String wnuclear2[] = wnuclear1[1].split(" ");
String whydro1[] = western[1].split("HYDRO");
String whydro2[] = whydro1[1].split(" ");
String wtotal1[] = western[1].split("TOTAL");
String wtotal2[] = wtotal1[1].split(" ");
// Appending filtered data into CSV file..
writer.append("WESTERN" + ",");
writer.append(year[0] + ",");
writer.append(month[0] + ",");
writer.append(wthermal2[4] + ",");
writer.append(wnuclear2[4] + ",");
writer.append(whydro2[4] + ",");
writer.append(wtotal2[4] + "\n");
// Extracting Southern Region
String southern[] = page.split("SOUTHERN");
String sthermal1[] = southern[1].split("THERMAL");
String sthermal2[] = sthermal1[1].split(" ");
String snuclear1[] = southern[1].split("NUCLEAR");
String snuclear2[] = snuclear1[1].split(" ");
String shydro1[] = southern[1].split("HYDRO");
String shydro2[] = shydro1[1].split(" ");
String stotal1[] = southern[1].split("TOTAL");
String stotal2[] = stotal1[1].split(" ");
// Appending filtered data into CSV file..
writer.append("SOUTHERN" + ",");
writer.append(year[0] + ",");
writer.append(month[0] + ",");
writer.append(sthermal2[4] + ",");
writer.append(snuclear2[4] + ",");
writer.append(shydro2[4] + ",");
writer.append(stotal2[4] + "\n");
// Extracting eastern region
String eastern[] = page.split("EASTERN");
String ethermal1[] = eastern[1].split("THERMAL");
String ethermal2[] = ethermal1[1].split(" ");
String ehydro1[] = eastern[1].split("HYDRO");
String ehydro2[] = ehydro1[1].split(" ");
String etotal1[] = eastern[1].split("TOTAL");
String etotal2[] = etotal1[1].split(" ");
// Appending filtered data into CSV file..
writer.append("EASTERN" + ",");
writer.append(year[0] + ",");
writer.append(month[0] + ",");
writer.append(ethermal2[4] + ",");
writer.append(" " + ",");
writer.append(ehydro2[4] + ",");
writer.append(etotal2[4] + "\n");
// Extracting northernEastern region
String neestern[] = page.split("NORTH");
String nethermal1[] = neestern[2].split("THERMAL");
String nethermal2[] = nethermal1[1].split(" ");
String nehydro1[] = neestern[2].split("HYDRO");
String nehydro2[] = nehydro1[1].split(" ");
String netotal1[] = neestern[2].split("TOTAL");
String netotal2[] = netotal1[1].split(" ");
writer.append("NORTH EASTERN" + ",");
writer.append(year[0] + ",");
writer.append(month[0] + ",");
writer.append(nethermal2[4] + ",");
writer.append(" " + ",");
writer.append(nehydro2[4] + ",");
writer.append(netotal2[4] + "\n");
writer.close();
} catch (IOException ioe) {
ioe.printStackTrace();
}
}
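The chained split() calls above depend on exact token positions, so they break as soon as the spacing in the extracted text shifts. As a hedged alternative (not the original author's method), a regular expression can pick the number that directly follows a label. The sketch below works on a plain String such as the one returned by PdfTextExtractor.getTextFromPage(); the sample text, the class name LabelValueExtractor, and the helper firstNumberAfter are made up for illustration and do not reflect the real layout of the report.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LabelValueExtractor {

    // Returns the number that directly follows the given label
    // (separated only by whitespace), or empty if there is none.
    static Optional<String> firstNumberAfter(String text, String label) {
        Pattern p = Pattern.compile(Pattern.quote(label) + "\\s+(-?\\d+(?:\\.\\d+)?)");
        Matcher m = p.matcher(text);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical excerpt of extracted page text
        String page = "THERMAL 1200.50 1180.25\nNUCLEAR 300.00 290.10\nHYDRO 450.75 440.00";
        System.out.println(firstNumberAfter(page, "NUCLEAR").orElse(""));
    }
}
```

Returning an Optional makes the "label not found" case explicit, where the split-based code would throw an ArrayIndexOutOfBoundsException.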
My Solution
package com.geek.tutorial.itext.table;
import java.io.FileOutputStream;
import com.lowagie.text.pdf.PdfPTable;
import com.lowagie.text.pdf.PdfPCell;
import com.lowagie.text.pdf.PdfWriter;
import com.lowagie.text.Document;
import com.lowagie.text.Paragraph;
public class SimplePDFTable
{
public SimplePDFTable() throws Exception
{
Document document = new Document();
PdfWriter.getInstance(document,
new FileOutputStream("SimplePDFTable.pdf"));
document.open();
PdfPTable table = new PdfPTable(2); // Code 1
// Code 2
table.addCell("1");
table.addCell("2");
// Code 3
table.addCell("3");
table.addCell("4");
// Code 4
table.addCell("5");
table.addCell("6");
// Code 5
document.add(table);
document.close();
}
public static void main(String[] args)
{
try
{
SimplePDFTable pdfTable = new SimplePDFTable();
}
catch(Exception e)
{
System.out.println(e);
}
}
}
class ReadPDF {
public void Read() throws IOException {
int amountOfWords = 0;
int amountOfChars = 0;
String sourceCode ="";
try {
PDDocument doc = PDDocument.load(new File("C:\\Users\\ccw\\Desktop\\articles\\RECYCLING-BEHAVIOUR-AMONG-MALAYSIAN-TERTIARY-STUDENTS.pdf"));
String text = new PDFTextStripper().getText(doc);
sourceCode = sourceCode.replace ("-", "").replace (".", "");
while(doc!=null){
String[] words = sourceCode.split(" ");
amountOfWords = amountOfWords + words.length;
for (String word : words) {
amountOfChars = amountOfChars + word.length();
}
}
System.out.println("Amount of Chars is " + amountOfChars);
System.out.println("Amount of Words is " + (amountOfWords + 1));
System.out.println("Average Word Length is "+ (amountOfChars/amountOfWords));
}catch (IOException e) {
System.out.println(e);
}
}
}
I'm trying to count all the words and characters in a PDF file using PDFBox.
But now I'm getting an error: sourceCode is not initialized.
Replace the line sourceCode = sourceCode.replace ("-", "").replace (".", ""); with sourceCode = text.replace ("-", "").replace (".", ""); and remove the while loop, which never terminates because doc is never set to null.
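With both changes applied, the counting logic reduces to a few lines. The sketch below works on a plain String so that it runs without PDFBox; in the original code the input would be the text returned by new PDFTextStripper().getText(doc). The class name WordStats and the choice to split on any run of whitespace (rather than a single space) are my assumptions.

```java
class WordStats {

    // Returns {wordCount, charCount} after removing '-' and '.' as in the original code.
    static int[] count(String text) {
        String cleaned = text.replace("-", "").replace(".", "").trim();
        if (cleaned.isEmpty()) {
            return new int[] { 0, 0 };
        }
        String[] words = cleaned.split("\\s+");
        int chars = 0;
        for (String word : words) {
            chars += word.length();
        }
        return new int[] { words.length, chars };
    }

    public static void main(String[] args) {
        int[] stats = count("Recycling behaviour among students.");
        System.out.println("Amount of Words is " + stats[0]);
        System.out.println("Amount of Chars is " + stats[1]);
        // Guard against division by zero for empty input
        if (stats[0] > 0) {
            System.out.println("Average Word Length is " + (double) stats[1] / stats[0]);
        }
    }
}
```

Casting to double before dividing avoids the truncation that the integer division in the original print statement would produce.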
Hey, I am trying to figure out how to get my save/load methods to put the .txt file they generate into a zip archive, and how I can set the location of that file. This is my code so far:
The save method pulls data from the current array list and writes it to the file:
public void save(String path) {
ArrayList<String> lines = new ArrayList<String>();
for (int i = 0; i < this.listaKorisnika.size(); i++) {
User korisnik = this.listaKorisnika.get(i);
String ime = korisnik.getIme();
String prezime = korisnik.getPrezime();
LocalDate datum = korisnik.getDatumRodjenja();
DateTimeFormatter dtf = DateTimeFormatter.ofPattern("dd.MM.yyyy.");
String formiraniDatum = dtf.format(datum);
String jmbg = korisnik.getJmbg();
String zanimanjePoSk = korisnik.getZanimanjePoSkoli();
String gdeRadi = korisnik.getGdeRadi();
String bolesti = korisnik.getBolesti();
String alergije = korisnik.getAlergije();
String line = ime + ";" + prezime + ";" + formiraniDatum + ";" + jmbg + ";" + zanimanjePoSk + ";" + gdeRadi
+ ";" + bolesti + ";" + alergije;
lines.add(line);
}
try {
Files.write(Paths.get(path), lines, Charset.defaultCharset(), StandardOpenOption.CREATE,
StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
} catch (java.io.IOException e) {
System.out.println("Datoteka " + path + " nije pronađena.");
}
}
// this load method pulls data from the generated .txt file and puts them in memory.
public void load(String path) {
this.listaKorisnika = new ArrayList<User>();
List<String> lines;
try {
lines = Files.readAllLines(Paths.get(path), Charset.defaultCharset());
for (String line : lines) {
String[] attributes = line.split(";");
/*
* String line = ime +";"+ prezime +";"+ formiraniDatum +";"+ jmbg +";"+
* zanimanjePoSk +";"+ gdeRadi +";"+ bolesti +";"+ alergije;
*/
String ime = attributes[0];
String prezime = attributes[1];
String datum = attributes[2];
DateTimeFormatter dtf = DateTimeFormatter.ofPattern("dd.MM.yyyy.");
LocalDate datumZaCuvanje = LocalDate.parse(datum, dtf);
String jmbg = attributes[3];
String zanimanjePoSkoli = attributes[4];
String gdeRadi = attributes[5];
String bolesti = attributes[6];
String Alergije = attributes[7];
User user = new User(ime, prezime, datumZaCuvanje, jmbg, zanimanjePoSkoli, gdeRadi, bolesti, Alergije);
this.listaKorisnika.add(user);
}
} catch (java.io.IOException e) {
System.out.println("Datoteka " + path + " nije pronađena.");
} catch (Exception e) {
System.out.println("Desila se greška pri parsiranju datuma.");
}
}
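One possible approach, sketched under assumptions: java.util.zip.ZipOutputStream can wrap the output so that the lines end up as a single entry inside a zip archive, and the archive's location is simply whatever path is passed in. The class name ZipLines, the entry name, and the use of UTF-8 instead of the default charset are choices made for this sketch; the save and load methods above would hand over their lines list and read it back in the same way.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

class ZipLines {

    // Writes the lines as one text entry inside a zip archive at zipPath.
    static void save(Path zipPath, String entryName, List<String> lines) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipPath))) {
            zos.putNextEntry(new ZipEntry(entryName));
            for (String line : lines) {
                zos.write((line + "\n").getBytes(StandardCharsets.UTF_8));
            }
            zos.closeEntry();
        }
    }

    // Reads the lines of the first entry in the zip archive at zipPath.
    static List<String> load(Path zipPath) throws IOException {
        List<String> lines = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(Files.newInputStream(zipPath))) {
            if (zis.getNextEntry() != null) {
                BufferedReader reader =
                        new BufferedReader(new InputStreamReader(zis, StandardCharsets.UTF_8));
                String line;
                while ((line = reader.readLine()) != null) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample records in the same semicolon-separated layout
        Path zip = Files.createTempFile("korisnici", ".zip");
        save(zip, "korisnici.txt", Arrays.asList("Ana;Anic;01.02.1990.", "Marko;Maric;03.04.1985."));
        System.out.println(load(zip));
        Files.delete(zip);
    }
}
```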
I have this:
for (String[] aZkratkyArray1 : zkratkyArray) {
String oldString = " " + aZkratkyArray1[0] + " ";
String firstString = aZkratkyArray1[0] + " ";
String newString = " " + aZkratkyArray1[1] + " ";
System.out.println(newString);
System.out.println(fileContentsSingle);
fileContentsSingle = fileContentsSingle.replaceAll(oldString, newString);
if (fileContentsSingle.startsWith(firstString)) {
fileContentsSingle = aZkratkyArray1[1] + " " + fileContentsSingle.substring(firstString.length(),fileContentsSingle.length());
}
}
fileContentsSingle is just a regular string; zkratkyArray is an array of abbreviations and their expansions, e.g.:
ht, hello there
wru, who are you
So when fileContentsSingle = ht I am robot
it should end up : hello there I am robot
or when fileContentsSingle = I am robot wru
it should end up : I am robot who are you
But when I print fileContentsSingle after this iteration, or during it, the string is never changed.
I tried both replace and replaceAll, and probably everything else I could think of.
Where is the mistake?
EDIT:
This is how I import the array:
String[][] zkratkyArray;
try {
LineNumberReader lineNumberReader = new LineNumberReader(new FileReader("zkratky.csv"));
lineNumberReader.skip(Long.MAX_VALUE);
int lines = lineNumberReader.getLineNumber();
lineNumberReader.close();
FileReader fileReader = new FileReader("zkratky.csv");
BufferedReader reader = new BufferedReader(fileReader);
zkratkyArray = new String[lines + 1][2];
String line;
int row = 0;
while ((line = reader.readLine()) != null) {
String[] array = line.split(",");
for (int i = 0; i < array.length; i++) {
zkratkyArray[row][i] = array[i];
}
row++;
}
reader.close();
fileReader.close();
} catch (FileNotFoundException e) {
System.out.println("Soubor se zkratkami nenalezen.");
zkratkyArray = new String[0][0];
}
Your code will work correctly for "ht I am robot". If you print fileContentsSingle after your for loop, it will print what you expect it to print:
final String[][] zkratkyArray = new String[2][];
zkratkyArray[0] = new String[] { "ht", "hello there" };
zkratkyArray[1] = new String[] { "wru", "who are you" };
String fileContentsSingle = "ht I am robot";
for (String[] aZkratkyArray1 : zkratkyArray) {
String oldString = " " + aZkratkyArray1[0] + " ";
String firstString = aZkratkyArray1[0] + " ";
String newString = " " + aZkratkyArray1[1] + " ";
fileContentsSingle = fileContentsSingle.replaceAll(oldString, newString);
if (fileContentsSingle.startsWith(firstString)) {
fileContentsSingle = aZkratkyArray1[1] + " "
+ fileContentsSingle.substring(firstString.length(), fileContentsSingle.length());
}
}
System.out.println(fileContentsSingle); // prints "hello there I am robot"
Concerning "I am robot wru", it will not work because "wru" is at the end of the String and is not followed by a space, while the String you are replacing is " wru " (with a space before and after).
As you don't use regexps, you don't need replaceAll(), and you can use replace() instead.
Using regexps, you can do a more generic solution working everywhere in the line:
final String[][] zkratkyArray = new String[2][];
zkratkyArray[0] = new String[] { "ht", "hello there" };
zkratkyArray[1] = new String[] { "wru", "who are you" };
String fileContentsSingle = "ht I am robot wru";
for (String[] aZkratkyArray1 : zkratkyArray) {
fileContentsSingle = fileContentsSingle.replaceAll("\\b" + Pattern.quote(aZkratkyArray1[0]) + "\\b",
Matcher.quoteReplacement(aZkratkyArray1[1]));
}
System.out.println(fileContentsSingle); // hello there I am robot who are you
I don't think you are using any regex here. You are just looking for a substring and replacing it with another one.
Just use the other version, which does not use regex, and assign the result back, because Strings are immutable:
fileContentsSingle = fileContentsSingle.replace(oldString, newString);
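A small illustration of the difference, using made-up strings: replace() treats its first argument literally, replaceAll() treats it as a regular expression, and both return a new String rather than modifying the original, so the result has to be assigned. The class and method names are hypothetical.

```java
class ReplaceDemo {

    // Literal replacement: every occurrence of target is swapped for replacement.
    static String literal(String input, String target, String replacement) {
        return input.replace(target, replacement);
    }

    // Regex replacement: target is compiled as a regular expression.
    static String regex(String input, String target, String replacement) {
        return input.replaceAll(target, replacement);
    }

    public static void main(String[] args) {
        String s = "a.b.c";
        s.replace(".", "-");                       // result discarded, s is unchanged
        System.out.println(s);                     // a.b.c
        System.out.println(literal(s, ".", "-"));  // a-b-c
        System.out.println(regex(s, ".", "-"));    // ----- ('.' matches any character)
    }
}
```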
In the end, I found out that I had BOMs in the input.csv file.
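A byte order mark at the start of a file ends up as the character U+FEFF at the front of the first line when the file is decoded as UTF-8, so the first abbreviation never matches. A minimal sketch of stripping it, assuming the file is UTF-8; the class and method names are hypothetical:

```java
class BomStripper {

    private static final char BOM = '\uFEFF';

    // Removes a single leading U+FEFF character, if present.
    static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    public static void main(String[] args) {
        String firstLine = "\uFEFFht,hello there";
        System.out.println(firstLine.startsWith("ht"));           // false
        System.out.println(stripBom(firstLine).startsWith("ht")); // true
    }
}
```

Only the first line read from the file needs this treatment, since the mark appears once at the very beginning.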
I would like to know how can I check for a specific string on a line of a text file, save it in an array, and then move on to the next line of that text file.
For example:
(what he is/ year edition/name/age/profession/number)
competitor 2014 joseph 21 student 20232341
competitor 2013 michael 23 engineer 23425123
As output, it would give me this:
Song Festival'2014
here are the competitors:
Joseph, student of 21 years - 20232341
Song Festival'2013
are the competitors
Michael, engineer of 23 years - 23425123
edit: java language
For this you can use a BufferedReader. Separate the fields in the text file with tabs, which the reader returns as '\t', and split each line on that character. Here is an example:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
public class StackoverflowExample {
    private static final int INFORMATION_CAP = 100;
    // Each row holds: type, year, name, age, profession, number
    private static final String[][] INFORMATION = new String[INFORMATION_CAP][];
    // Number of rows actually read from the file
    private static int lineCount = 0;
    public StackoverflowExample() {
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader("textfile.txt"))) {
            String currentLine;
            while (lineCount < INFORMATION_CAP && (currentLine = reader.readLine()) != null) {
                String[] textData = currentLine.trim().split("\\t");
                if (textData.length < 6) {
                    continue; // skip blank or malformed lines
                }
                String userType = textData[0];
                String year = textData[1];
                String name = textData[2];
                String age = textData[3];
                String profession = textData[4];
                String number = textData[5];
                INFORMATION[lineCount] = new String[] { userType, year, name, age, profession, number };
                lineCount++;
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    public static void main(String[] args) {
        new StackoverflowExample();
        // Print only the rows that were read, not the whole array
        for (int i = 0; i < lineCount; i++) {
            System.out.println("Type: " + INFORMATION[i][0] +
                    ", year: " + INFORMATION[i][1] +
                    ", name: " + INFORMATION[i][2] +
                    ", age: " + INFORMATION[i][3] +
                    ", profession: " + INFORMATION[i][4] +
                    ", number: " + INFORMATION[i][5]);
        }
    }
}
Create a text file called "textfile.txt" in your project's main directory, and format each line like this:
type(tab)year(tab)name(tab)age(tab)profession(tab)number(newline)
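To get from the parsed fields to the output asked for in the question, the records can be grouped by year. The sketch below is one way to do it, assuming whitespace-separated fields as in the question's sample (splitting on "\\s+" also accepts tabs); the class name FestivalReport and its helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FestivalReport {

    // Builds the report text from lines of the form:
    // type year name age profession number
    static String build(List<String> lines) {
        // Newest edition first
        Map<String, List<String>> byYear = new TreeMap<>(Comparator.<String>reverseOrder());
        for (String line : lines) {
            String[] f = line.trim().split("\\s+");
            if (f.length < 6 || !f[0].equals("competitor")) {
                continue; // skip blank, malformed, or non-competitor lines
            }
            String entry = capitalize(f[2]) + ", " + f[4] + " of " + f[3] + " years - " + f[5];
            byYear.computeIfAbsent(f[1], y -> new ArrayList<>()).add(entry);
        }
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<String>> e : byYear.entrySet()) {
            out.append("Song Festival'").append(e.getKey()).append('\n');
            out.append("here are the competitors:\n");
            for (String entry : e.getValue()) {
                out.append(entry).append('\n');
            }
        }
        return out.toString();
    }

    private static String capitalize(String s) {
        return s.isEmpty() ? s : Character.toUpperCase(s.charAt(0)) + s.substring(1);
    }

    public static void main(String[] args) {
        System.out.print(build(Arrays.asList(
                "competitor 2014 joseph 21 student 20232341",
                "competitor 2013 michael 23 engineer 23425123")));
    }
}
```

A map keyed by year keeps competitors of the same edition together even when their lines are not adjacent in the file.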
I am trying to export four columns with the code below. The last column, Organization, is a List.
String appname = "abc";
String path = "//home/exportfile//";
String filename = path + "ApplicationExport-" + appname + ".txt";
String ret = "false";
QueryOptions ops = new QueryOptions();
Filter[] filters = new Filter[1];
filters[0] = Filter.eq("application.name", appname);
ops.add(filters);
List props = new ArrayList();
props.add("identity.name");
// Do search
Iterator it = context.search(Link.class, ops, props);
// Build file and export header row
BufferedWriter out = new BufferedWriter(new FileWriter(filename));
out.write("IdentityName,UserName,WorkforceID,Organization");
out.newLine();
// Iterate Search Results
if (it != null) {
while (it.hasNext()) {
// Get link and create object
Object[] record = it.next();
String identityName = (String) record[0];
Identity user = (Identity) context.getObject(Identity.class, identityName);
// Get Identity attributes for export
String workforceid = (String) user.getAttribute("workforceID");
// Get application attributes for export
String userid = "";
List links = user.getLinks();
if (links != null) {
Iterator lit = links.iterator();
while (lit.hasNext()) {
Link l = lit.next();
String lname = l.getApplicationName();
if (lname.equalsIgnoreCase(appname)) {
userid = (String) l.getAttribute("User Name");
List organizations = l.getAttribute("Organization");
StringBuilder sb = new StringBuilder();
String listItemsSeparator = ",";
for (Object organization : organizations) {
sb.append(organization.toString());
sb.append(listItemsSeparator);
}
org = sb.toString().trim();
}
}
}
// Output file
out.write(identityName + "," + userid + "," + workforceid + "," + org);
out.newLine();
out.flush();
}
ret = "true";
}
// Close file and return
out.close();
return ret;
The output of the above code will be, for example:
IdentityName,UserName,WorkforceID,Organization
dthomas,dthomas001,12345,Finance,HR
How do I get the output in the following form?
IdentityName,UserName,WorkforceID,Organization
dthomas,dthomas001,12345,Finance
dthomas,dthomas001,12345,HR
What do I need to change in the code, and where?
You'll have to write one line to the file for each organization. So, basically, do not concatenate all organizations for a user with the string builder and move the output statements into the for loop that iterates through the organizations.
But it's difficult to provide a working example, because the code you've shown doesn't compile yet...
This should bring you somewhat closer to the solution:
if (links != null) {
Iterator lit = links.iterator();
while (lit.hasNext()) {
Link l = lit.next();
String lname = l.getApplicationName();
if (lname.equalsIgnoreCase(appname)) {
userid = (String) l.getAttribute("User Name");
List organizations = l.getAttribute("Organization");
for (Object organization : organizations) {
// Output file
out.write(identityName + "," + userid + "," + workforceid + "," + organization);
out.newLine();
out.flush();
}
}
}
}
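The same idea in a self-contained form that runs without the identity API: given the fixed columns and a list of organizations, emit one CSV row per list element. The class CsvRows and its method are hypothetical, and the fallback of a single row with an empty last column when the list is null or empty is my assumption about the desired behaviour.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CsvRows {

    // Returns one CSV row per organization, or a single row with an empty
    // last column if there are no organizations.
    static List<String> rowsFor(String identityName, String userId, String workforceId,
                                List<?> organizations) {
        String prefix = identityName + "," + userId + "," + workforceId + ",";
        List<String> rows = new ArrayList<>();
        if (organizations == null || organizations.isEmpty()) {
            rows.add(prefix);
            return rows;
        }
        for (Object organization : organizations) {
            rows.add(prefix + organization);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String row : rowsFor("dthomas", "dthomas001", "12345", Arrays.asList("Finance", "HR"))) {
            System.out.println(row);
        }
    }
}
```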
Remove this innermost for block and associated variables:
StringBuilder sb = new StringBuilder();
for (Object organization : organizations)
{
sb.append(organization.toString());
sb.append(listItemsSeparator);
}
org = sb.toString().trim();
Move the declaration of organizations outside the if (links != null) { block:
// Get application attributes for export
String userid = "";
List organizations = null;
List links = user.getLinks();
if (links != null)
{
Iterator lit = links.iterator();
while (lit.hasNext())
{
Link l = lit.next();
String lname = l.getApplicationName();
if (lname.equalsIgnoreCase(appname))
{
userid = (String) l.getAttribute("User Name");
organizations = l.getAttribute("Organization");
And then change this file output code:
// Output file
out.write(identityName + "," + userid + "," + workforceid + "," + org);
out.newLine();
To this:
// Output file: one row per organization
if (organizations != null)
{
for (Object organization : organizations)
{
out.write(identityName + "," + userid + "," + workforceid + "," + organization.toString());
out.newLine();
}
}