I'm wondering if is there a way to obtain the content of a pdf file (raw bytes) as a String using Apache PdfBox 2.0.8. What I'm doing is to save the PDDocument object to a ByteArrayOutputStream and then create a new String getting ByteArrayOutputStream's byte array. But if I save the String to a file, the result is a blank pdf. The reason for this is because pdf's stream section bytes are different from a pdf created directly from PdDocument object to a file. After knowing this, I tried to get the ByteArrayOutputStream's character encoding using juniversalchardet, but no luck. So, is there a way to acomplish this?
This is what I have tried so far:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PDDocument doc = new PDDocument();
... //Add page, font, pdPageContentStream and text only to doc object with some latin chars (áéíóú)
doc.save(baos);
So, if I create a file using baos object, the pdf file looks as expected, but if I do this:
String str = new String(baos.toByteArray());
And then create a file using str bytes, the pdf file only shows a blank page.
Hope I was clear enough this time :)
Using this, just append everything to a String.
StringBuilder sb = new StringBuilder();
try (PDDocument document = PDDocument.load(new File("your\\path\\file.pdf"))) {
document.getClass();
if (!document.isEncrypted()) {
PDFTextStripperByArea stripper = new PDFTextStripperByArea();
stripper.setSortByPosition(true);
PDFTextStripper tStripper = new PDFTextStripper();
String pdfFileInText = tStripper.getText(document);
String lines[] = pdfFileInText.split("\\r?\\n");
for (String line : lines) {
sb.append(line);
}
}
}
return sb.toString();
Related
I have a String that i want to attach either to the pdf raw code somehow or, what i'm trying here is adding the string as a text file to the pdf, this is what i've come up with but it does not work.
Document document = new Document();
String directoryPath = "/storage/emulated/0/Download";
PdfWriter.getInstance(document, new FileOutputStream(directoryPath+"/Aa.pdf"));
String data = PaintView.full_string_return();
System.out.println(data);
PdfReader reader = new PdfReader(directoryPath+"/Aa.pdf");
PdfStamper stamper = new PdfStamper(reader,new FileOutputStream(directoryPath+"/Aa.pdf") );
PdfFileSpecification fs = PdfFileSpecification.fileEmbedded(stamper.getWriter(), null, String.format("data.txt",data),String.format("data",data).getBytes(StandardCharsets.UTF_8));
stamper.addFileAttachment(String.format("data",data),fs);
stamper.close();
I know i'm not using the pdfWriter here but i use it elsewhere, just added it in case it is messing up something.
I am writing code to read pdf file in selenium using Java PDF Library.
I wrote my code as
URL url = new URL(str);
InputStream is=url.openStream();
BufferedInputStream fileParse=new BufferedInputStream(is);
PDDocument document=null;
document=PDDocument.load(fileParse);
String pdfContent=new PDFTextStripper().getText(document);
But I am getting error at line document=PDDocument.load(fileParse) along with
java.io.IOException: Error: End-of-File, expected line
at org.apache.pdfbox.pdfparser.BaseParser.readLine(BaseParser.java:1119)
at org.apache.pdfbox.pdfparser.COSParser.parseHeader(COSParser.java:2017)
at org.apache.pdfbox.pdfparser.COSParser.parsePDFHeader(COSParser.java:1988)
at org.apache.pdfbox.pdfparser.PDFParser.parse(PDFParser.java:269)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1143)
at org.apache.pdfbox.pdmodel.PDDocument.load(PDDocument.java:1040)
I need to verify the content on the pdf file .
Appreciate the help.
Thanks
Simply you can use below line of code and its working:
//Loading an existing document
File file = new File("yourPdfFilepath");
PDDocument document = PDDocument.load(file);
//Instantiate PDFTextStripper class
PDFTextStripper pdfStripper = new PDFTextStripper();
//Retrieving text from PDF document
String pdfcontent = pdfStripper.getText(document);
I hope it helps you
Does anyone know how image file can be easily converted into PDF format. What I need is to get the image from database and display it on the screen as PDF. What am I doing wrong? I tried to use iText but with no results.
My code:
StreamResource resource = file.downloadFromDatabase();//get file from db
Document converToPdf=new Document();//Create Document Object
PdfWriter.getInstance(convertToPdf, new FileOutputStream(""));//Create PdfWriter for Document to hold physical file
convertToPdf.open();
Image convertJpg=Image.getInstance(resource); //Get the input image to Convert to PDF
convertToPdf.add(convertJpg);//Add image to Document
Embedded pdf = new Embedded("", convertToPdf);//display document
pdf.setMimeType("application/pdf");
pdf.setType(Embedded.TYPE_BROWSER);
pdf.setSizeFull();
Thanks.
You're not using iText correctly:
You never close your writer, so the addition of the image never gets written to the outputstream.
You pass an empty string to your FileOutputStream. If you want to keep the pdf in memory, use a ByteArrayOutputStream. If not, define a temporary name instead.
You pass your Document object, which is a iText-specific object to your Embedded object and treat it like a file. It is not a pdf-file or byte[]. You'll probably want to pass either your ByteArrayOutputStream or read the temp file as a ByteArrayOutputStream into memory and pass that to Embedded.
Maybe someone will use (Vaadin + iText)
Button but = new Button("FV");
StreamResource myResource = getPDFStream();
FileDownloader fileDownloader = new FileDownloader(myResource);
fileDownloader.extend(but);
hboxBottom.addComponent( but );
private StreamResource getPDFStream() {
StreamResource.StreamSource source = new StreamResource.StreamSource() {
public InputStream getStream() {
// step 1
com.itextpdf.text.Document document = new com.itextpdf.text.Document();
// step 2
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try {
com.itextpdf.text.pdf.PdfWriter.getInstance(document, baos);
// step 3
document.open();
document.add(Chunk.NEWLINE); //Something like in HTML :-)
document.add(new Paragraph("TEST" ));
document.add(Chunk.NEWLINE); //Something like in HTML :-)
document.newPage(); //Opened new page
//document.add(list); //In the new page we are going to add list
document.close();
//file.close();
System.out.println("Pdf created successfully..");
} catch (DocumentException ex) {
Logger.getLogger(WndOrderZwd.class.getName()).log(Level.SEVERE, null, ex);
}
ByteArrayOutputStream stream = baos;
InputStream input = new ByteArrayInputStream(stream.toByteArray());
return input;
}
};
StreamResource resource = new StreamResource ( source, "test.pdf" );
return resource;
}
I am converting a text file to PDF using iText. The conversion works fine but I need that during conversion if the BufferedReader encounters a certain text, a new PDF Page is Started. This is what I have tried But A new Page is not started when that Text is encountered. My Sample code is as Below(Just the relevant part).
Document output = new Document(PageSize.B3);
FileInputStream fs = new FileInputStream("C:/ABC Statements final/File.TXT");
FileOutputStream file = new FileOutputStream(new File("C:/Pdf Statements/File.PDF"));
BufferedReader br = new BufferedReader(new InputStreamReader(fs));
PdfWriter writer = PdfWriter.getInstance(output, file);
output.open();
writer.open();
.............................
String pageend = "Page Total";
String trimmedend = br.readLine().trim();
if (trimmedend.startsWith(pageend)) {
output.newPage();
}
Maybe you need to change your if-statement to something like this:
String pageend = "page total";
...
if (trimmedend.toLowerCase().contains(pageend)) {
...
}
This way, you avoid case-sensitivity and you avoid the problem of having characters that aren't considered being white space before "page total". Of course: this is just an educated guess. I don't know what your original data stream looks like.
I want to convert a PDF file into a CSV file.
I am using iText library for this.
The program is working fine but the output is not in desired format.
All the data is coming in first line of the csv file. The output should be exactly same as pdf file(means with line breaks).
Please help.
Thanks in advance.
Document document = new Document();
document.open();
PdfReader reader = new PdfReader("C:\\Indiaops-projects\\PREMIUM_PAID_ACKNOWLEDGEMENT.pdf");
PdfDictionary dictionary = reader.getPageN(1);
AcroFields fileds = reader.getAcroFields();
PRIndirectReference reference = (PRIndirectReference)
dictionary.get(PdfName.CONTENTS);
PRStream stream = (PRStream) PdfReader.getPdfObject(reference);
byte[] bytes = PdfReader.getStreamBytes(stream);
PRTokeniser tokenizer = new PRTokeniser(bytes);
FileOutputStream fos=new FileOutputStream("C:\\Indiaops-projects\\pdf.csv");
StringBuffer buffer = new StringBuffer();
StringBuffer data = new StringBuffer();
int i=0;
while (tokenizer.nextToken()) {
if (tokenizer.getTokenType() == PRTokeniser.TK_STRING) {
String value = tokenizer.getStringValue();
if("x-none".equals(value)){
String datastr =data.toString();
if(!"".equals(datastr)){
buffer.append("\""+datastr+"\",");
data = new StringBuffer();
}
}else{
data.append(value);
}
}
}
String test=buffer.toString();
StringReader stReader = new StringReader(test);
int t;
while((t=stReader.read())>0)
fos.write(t);
document.add(new Paragraph(".."));
document.close();
You need to introduce a line break '\n' in the buffer after each table row.
buffer.append("\n");