xhtmlrenderer creating PDFs of length 0

xhtmlrenderer creating PDFs of length 0 - java

I am new to org.xhtmlrenderer.pdf.ITextRenderer and have this problem:
The PDFs that my test servlet streams to my Downloads folder are in fact empty files.
The relevant method, streamAndDeleteTheClob, is shown below.
The first try block is definitely not a problem.
The server spends a lot of time in the second try block. No exception thrown.
Can anyone suggest a solution to this problem or a good approach to to debugging it?
Can anyone point me to essentially similar code that really works?
Any help would be much appreciated.
res.setContentType("application/pdf");
ServletOutputStream out = res.getOutputStream();
...
private boolean streamAndDeleteTheClob(int pageid,
Connection con,
ServletOutputStream out) throws IOException, ServletException {
Statement statement;
Clob htmlpage;
StringBuffer pdfbuf = new StringBuffer();
final String pageToSendQuery = "SELECT text FROM page WHERE pageid = " + pageid;
// create xhtml file as a CLOB (Oracle large character object) and stream it into StringBuffer pdfbuf
try { // definitely no problem in this block
statement = con.createStatement();
resultSet = statement.executeQuery(pageToSendQuery);
if (resultSet.next()) {
htmlpage = resultSet.getClob(1);
} else {
return true;
}
final Reader in = htmlpage.getCharacterStream();
final char[] buffer = new char[4096];
while ((in.read(buffer)) != -1) {
pdfbuf.append(buffer);
}
} catch (Exception ex) {
out.println("buffering CLOB failed: " + ex);
}
// create pdf from StringBuffer
try {
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(new InputSource(new StringReader(pdfbuf.toString())));
ITextRenderer renderer = new ITextRenderer();
renderer.setDocument(doc, null);
renderer.layout();
renderer.createPDF(out);
out.close();
} catch (Exception ex) {
out.println("streaming of pdf failed: " + ex);
}
deleteClob(con, pageid);
return false;
}

Using the DocumentBuilder.parse this way will try to resolve the DTD referenced in the XHTML page. It takes a really long time. The easyest way to aviod that if you are using the Flying Saucer (xhtmlrenderer), is to create the document this way:
Document myDocument = XMLResource.load(myInputStream).getDocument();
Note that you can use XMLResource.load with a Reader too.

Two things I can think of.
1) If the iText document is not closed, it'll be empty. Looks like renderer.finish() will work, but createPDF(out) should do that already.
2) If there are no pages, you could get an empty doc as well... so an empty input could result in a 0-byte PDF.
3) You might be getting a perfectly valid PDF that's not being streamed properly. Try writing to a ByteArrayOutputStream and checking the length there.
4) An almost fanatical dedication to the Pope!

Related

Save file from a website with java

I'm trying to build a jsoup based java app to automatically download English subtitles for films (I'm lazy, I know. It was inspired from a similar python based app). It's supposed to ask you the name of the film and then download an English subtitle for it from subscene.
I can make it reach the download link but I get an Unhandled content type error when I try to 'go' to that link. Here's my code
public static void main(String[] args) {
try {
String videoName = JOptionPane.showInputDialog("Title: ");
subscene(videoName);
}
catch (Exception e) {
System.out.println(e.getMessage());
}
}
public static void subscene(String videoName){
try {
String siteName = "http://www.subscene.com";
String[] splits = videoName.split("\\s+");
String codeName = "";
String text = "";
if(splits.length>1){
for(int i=0;i<splits.length;i++){
codeName = codeName+splits[i]+"-";
}
videoName = codeName.substring(0, videoName.length());
}
System.out.println("videoName is "+videoName);
// String url = "http://www.subscene.com/subtitles/"+videoName+"/english";
String url = "http://www.subscene.com/subtitles/title?q="+videoName+"&l=";
System.out.println("url is "+url);
Document doc = Jsoup.connect(url).get();
Element exact = doc.select("h2.exact").first();
Element yuel = exact.nextElementSibling();
Elements lis = yuel.children();
System.out.println(lis.first().children().text());
String hRef = lis.select("div.title > a").attr("href");
hRef = siteName+hRef+"/english";
System.out.println("hRef is "+hRef);
doc = Jsoup.connect(hRef).get();
Element nonHI = doc.select("td.a40").first();
Element papa = nonHI.parent();
Element link = papa.select("a").first();
text = link.text();
System.out.println("Subtitle is "+text);
hRef = link.attr("href");
hRef = siteName+hRef;
Document subDownloadPage = Jsoup.connect(hRef).get();
hRef = siteName+subDownloadPage.select("a#downloadButton").attr("href");
Jsoup.connect(hRef).get(); //<-- Here's where the problem lies
}
catch (java.io.IOException e) {
System.out.println(e.getMessage());
}
}
Can someone please help me so I don't have to manually download subs?
I just found out that using
java.awt.Desktop.getDesktop().browse(java.net.URI.create(hRef));
instead of
Jsoup.connect(hRef).get();
downloads the file after prompting me to save it. But I don't want to be prompted because this way I won't be able to read the name of the downloaded zip file (I want to unzip it after saving using java).

Assuming that your files are small, you can do it like this. Note that you can tell Jsoup to ignore the content type.
// get the file content
Connection connection = Jsoup.connect(path);
connection.timeout(5000);
Connection.Response resultImageResponse = connection.ignoreContentType(true).execute();
// save to file
FileOutputStream out = new FileOutputStream(localFile);
out.write(resultImageResponse.bodyAsBytes());
out.close();
I would recommend to verify the content before saving.
Because some servers will just return a HTML page when the file cannot be found, i.e. a broken hyperlink.
...
String body = resultImageResponse.body();
if (body == null || body.toLowerCase().contains("<body>"))
{
throw new IllegalStateException("invalid file content");
}
...

Here:
Document subDownloadPage = Jsoup.connect(hRef).get();
hRef = siteName+subDownloadPage.select("a#downloadButton").attr("href");
//specifically here
Jsoup.connect(hRef).get();
Looks like jsoup expects that the result of Jsoup.connect(hRef) should be an HTML or some text that it's able to parse, that's why the message states:
Unhandled content type. Must be text/*, application/xml, or application/xhtml+xml
I followed the execution of your code manually and the last URL you're trying to access returns a content type of application/x-zip-compressed, thus the cause of the exception.
In order to download this file, you should use a different approach. You could use the old but still useful URLConnection, URL or use a third party library like Apache HttpComponents to fire a GET request and retrieve the result as an InputStream, wrap it into a proper writer and write your file into your disk.
Here's an example about doing this using URL:
URL url = new URL(hRef);
InputStream in = url.openStream();
OutputStream out = new BufferedOutputStream(new FileOutputStream("D:\\foo.zip"));
final int BUFFER_SIZE = 1024 * 4;
byte[] buffer = new byte[BUFFER_SIZE];
BufferedInputStream bis = new BufferedInputStream(in);
int length;
while ( (length = bis.read(buffer)) > 0 ) {
out.write(buffer, 0, length);
}
out.close();
in.close();

How to sign an InputStream from a PDF file with PDFBox 2.0.0

I want to sign a InputStream from a PDF file without using a temporary file.
Here I convert InputStream to File and this work fine :
InputStream inputStream = this.signatureObjPAdES.getSignatureDocument().getInputStream();
OutputStream outputStream = new FileOutputStream(new File("C:/temp.pdf"));
int read = 0;
byte[] bytes = new byte[1024];
while ((read = inputStream.read(bytes)) != -1) {
outputStream.write(bytes, 0, read);
}
PDDocument document = PDDocument.load(new File("C:/temp.pdf"));
...
document.addSignature(new PDSignature(this.dts.getDocumentTimeStamp()), this);
document.saveIncremental(new FileOutputStream("C:/result.pdf");
document.close();
But I want to do this directly :
PDDocument document = PDDocument.load(inputStream);
Problem: at run
Exception in thread "main" java.lang.NullPointerException
at java.io.RandomAccessFile.<init>(Unknown Source)
at org.apache.pdfbox.io.RandomAccessBufferedFileInputStream.<init>(RandomAccessBufferedFileInputStream.java:77)
at org.apache.pdfbox.pdmodel.PDDocument.saveIncremental(PDDocument.java:961)
All ideas are welcome.
Thank you.
EDIT:
It's now working with the release of PDFBox 2.0.0.

The cause
The immediate hindrance is in the method PDDocument.saveIncremental() itself:
public void saveIncremental(OutputStream output) throws IOException
{
InputStream input = new RandomAccessBufferedFileInputStream(incrementalFile);
COSWriter writer = null;
try
{
writer = new COSWriter(output, input);
writer.write(this, signInterface);
writer.close();
}
finally
{
if (writer != null)
{
writer.close();
}
}
}
(PDDocument.java)
The member incrementalFile used in the first line is only set during a PDDocument.load with a File parameter.
Thus, this method cannot be used.
A work-around
Fortunately the method PDDocument.saveIncremental() only uses methods and values publicly available with the sole exception of signInterface, but you know the value of it because you set it in your code in the line right before the saveIncremental call:
document.addSignature(new PDSignature(this.dts.getDocumentTimeStamp()), this);
document.saveIncremental(new FileOutputStream("C:/result.pdf"));
Thus, instead of calling PDDocument.saveIncremental() you can do the equivalent in your code.
To do so you furthermore need a replacement value for the InputStream input. It needs to return a stream with the identical content as inputStream in your
PDDocument document = PDDocument.load(inputStream);
So you need to use that stream twice. As you have not said whether that inputStream can be reset, we'll first copy it into a byte[] which we forward both to PDDocument.load and new COSWriter.
Thus, replace your
PDDocument document = PDDocument.load(inputStream);
...
document.addSignature(new PDSignature(this.dts.getDocumentTimeStamp()), this);
document.saveIncremental(new FileOutputStream("C:/result.pdf"));
document.close();
by
byte[] inputBytes = IOUtils.toByteArray(inputStream);
PDDocument document = PDDocument.load(new ByteArrayInputStream(inputBytes));
...
document.addSignature(new PDSignature(this.dts.getDocumentTimeStamp()), this);
saveIncremental(new FileOutputStream("C:/result.pdf"),
new ByteArrayInputStream(inputBytes), document, this);
document.close();
and add a new method saveIncremental to your class inspired by the original PDDocument.saveIncremental():
void saveIncremental(OutputStream output, InputStream input, PDDocument document, SignatureInterface signatureInterface) throws IOException
{
COSWriter writer = null;
try
{
writer = new COSWriter(output, input);
writer.write(document, signatureInterface);
writer.close();
}
finally
{
if (writer != null)
{
writer.close();
}
}
}
On the side
I said above
As you have not said whether that inputStream can be reset, we'll first copy it into a byte[] which we forward both to PDDocument.load and new COSWriter.
Actually there is another reason to do so: COSWriter.doWriteSignature() retrieves the length of the original PDF like this:
long inLength = incrementalInput.available();
(COSWriter.java)
The documentation of InputStream.available() states, though:
Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not.
To re-use inputStream instead of using a byte[] and ByteArrayInputStreams as above, therefore, inputStream not only needs to support reset() but also needs to be one of the few InputStream implementations which return the total number of bytes in the stream as available.
FileInputStream and ByteArrayInputStream both do return the total number of bytes in the stream as available.
There may still be more issues when using generic InputStreams instead of these two.

Hey Cyril Bremaud, you can use this approach, since the PDDocument class has 3 overloaded constructor, you can go ahead and provide only the file path if you like and it will work as well. But for your requirement to be able to pass an InputStream directly to the PDDocument constructor, use this code:
lStrInputPDFfile = "samples_pdf_signing\Country Calendar.pdf";
lOsPDFInput = new java.io.FileInputStream(lStrInputPDFfile);
jPDFDocument = new org.apache.pdfbox.pdmodel.PDDocument().load(lOsPDFInput);
But this also work in my case:
lStrInputPDFfile = "samples_pdf_signing\Country Calendar.pdf";
jPDFDocument = new org.apache.pdfbox.pdmodel.PDDocument().load(lStrInputPDFfile);
Note: `InputStream is a parent class of FileInputStream and that is why the above code works.
updated my code, please check again. Thanks to #mkl for pointing that out.

Specific characters not rendering properly in Java

I have an issue when displaying strings received from a server in a JTable. Some specific characters appear as little white squares instead of "é" or "à" etc. I tried a lot of things but none of them fixed my problem. I'm working with Eclipse under Windows. The server was developped using Visual Studio 2010.
The server sends an XML file using tinyXML2, the client uses JDom to read it. The font used is "Dialog". The server takes the strings from an Oracle database.
I assume this is an encoding problem, but I haven't been able to fix it yet.
Does anyone have an idea ?
Thx
Arnaud
EDIT : As requested, this is how I use JDom
public static Player fromXML(Element e)
{
Player result = new Player();
String e_text = null;
try
{
e_text = e.getChildText(XMLTags.XML_Player_playerId);
if (e_text != null) result.setID(Integer.parseInt(e_text));
e_text = e.getChildText(XMLTags.XML_Player_lastName);
if (e_text != null) result.setName(e_text);
e_text = e.getChildText(XMLTags.XML_Player_point_scored);
if (e_text != null) result.addSpecial(STAT_SCORED, Double.parseDouble(e_text));
e_text = e.getChildText(XMLTags.XML_Player_point_scored_last);
if (e_text != null) result.addSpecial(STAT_SCORED_LAST, Double.parseDouble(e_text));
}
catch (Exception ex) {
ex.printStackTrace();
}
return result;
}
public static Document load(String filename) {
File XMLFile = new File(CLIENT_TO_SERVER, filename);
SAXBuilder sxb = new SAXBuilder();
Document document = new Document();
try
{
document = sxb.build(new File(XMLFile.getPath()));
} catch(Exception e){e.printStackTrace();}
return document;
}

read the file using correct encoding, something like:
document = sxb.build(new BufferedReader(new InputStreamReader(new FileInputStream(XMLFile.getPath()), "UTF8")));
Note: 1. 1st determine which char encoding used in that file. specify that charset instead of UTF8 above.
Incase encoding is not known or it's being generated from various systems with different encoding, you may use 'encoding detector library of Mozilla'. #see https://code.google.com/p/juniversalchardet/
need to handle UnsupportedEncodingException

Inputstream handled by different objects depending on the content

I am writing a crawler/parser that should be able to process different types of content, being RSS, Atom and just plain html files. To determine the correct parser, I wrote a class called ParseFactory, which takes an URL, tries to detect the content-type, and returns the correct parser.
Unfortunately, checking the content-type using the provided in method in URLConnection doesn't always work. For example,
String contentType = url.openConnection().getContentType();
doesn't always provide the correct content-type (e.g "text/html" where it should be RSS) or doesn't allow to distinguish between RSS and Atom (e.g. "application/xml" could be both an Atom or a RSS feed). To solve this problem, I started looking for clues in the InputStream. Problem is that I am having trouble coming up an elegant class design, where I need to download the InputStream only once. In my current design I have wrote a separate class first that determines the correct content-type, next the ParseFactory uses this information to create an instance of the corresponding parser, which in turn, when the method 'parse()' is called, downloads the entire InputStream a second time.
public Parser createParser(){
InputStream inputStream = null;
String contentType = null;
String contentEncoding = null;
ContentTypeParser contentTypeParser = new ContentTypeParser(this.url);
Parser parser = null;
try {
inputStream = new BufferedInputStream(this.url.openStream());
contentTypeParser.parse(inputStream);
contentType = contentTypeParser.getContentType();
contentEncoding = contentTypeParser.getContentEncoding();
assert (contentType != null);
inputStream = new BufferedInputStream(this.url.openStream());
if (contentType.equals(ContentTypes.rss))
{
logger.info("RSS feed detected");
parser = new RssParser(this.url);
parser.parse(inputStream);
}
else if (contentType.equals(ContentTypes.atom))
{
logger.info("Atom feed detected");
parser = new AtomParser(this.url);
}
else if (contentType.equals(ContentTypes.html))
{
logger.info("html detected");
parser = new HtmlParser(this.url);
parser.setContentEncoding(contentEncoding);
}
else if (contentType.equals(ContentTypes.UNKNOWN))
logger.debug("Unable to recognize content type");
if (parser != null)
parser.parse(inputStream);
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
inputStream.close();
} catch (IOException e) {
e.printStackTrace();
}
}
return parser;
}
Basically, I am looking for a solution that allows me to eliminate the second "inputStream = new BufferedInputStream(this.url.openStream())".
Any help would be greatly appreciated!
Side note 1: Just for the sake of being complete, I also tried using the URLConnection.guessContentTypeFromStream(inputStream) method, but this returns null way too often.
Side note 2: The XML-parsers (Atom and Rss) are based on SAXParser, the Html-parser on Jsoup.

Can you just call mark and reset?
inputStream = new BufferedInputStream(this.url.openStream());
inputStream.mark(2048); // Or some other sensible number
contentTypeParser.parse(inputStream);
contentType = contentTypeParser.getContentType();
contentEncoding = contentTypeParser.getContentEncoding();
inputstream.reset(); // Let the parser have a crack at it now

Perhaps your ContentTypeParser should cache the content internally and feed it to the appropiate ContentParser instead of reacquiring data from InputStream.

NullPointerException using ImageIO.read

I'm getting an NPE while trying to read in an image file, and I can't for the life of me figure out why. Here is my line:
BufferedImage source = ImageIO.read(new File(imgPath));
imgPath is basically guaranteed to be valid and right before it gets here it copies the file from the server. When it hits that line, I get this stack trace:
Exception in thread "Thread-26" java.lang.NullPointerException
at com.ctreber.aclib.image.ico.ICOReader.getICOEntry(ICOReader.java:120)
at com.ctreber.aclib.image.ico.ICOReader.read(ICOReader.java:89)
at javax.imageio.ImageIO.read(ImageIO.java:1400)
at javax.imageio.ImageIO.read(ImageIO.java:1286)
at PrintServer.resizeImage(PrintServer.java:981) <---My function
<Stack of rest of my application here>
Also, this is thrown into my output window:
Can't create ICOFile: Can't read bytes: 2
I have no idea what is going on, especially since the File constructor is succeeding. I can't seem to find anybody who has had a similar problem. Anybody have any ideas? (Java 5 if that makes any difference)

I poked around some more and found that you can specify which ImageReader ImageIO will use and read it in that way. I poked around our codebase and found that we already had a function in place for doing EXACTLY what I was trying to accomplish here. Just for anybody else who runs into a similar issue, here is the crux of the code (some of the crap is defined above, but this should help anybody who tries to do it):
File imageFile = new File(filename);
Iterator<ImageReader> imageReaders = ImageIO.getImageReadersByFormatName("jpeg");
if ( imageReaders.hasNext() ) {
imageReader = (ImageReader)imageReaders.next();
stream = ImageIO.createImageInputStream(imageFile);
imageReader.setInput(stream, true);
ImageReadParam param = imageReader.getDefaultReadParam();
curImage = imageReader.read(0, param);
}
Thanks for the suggestions and help all.

The File constructor will almost certainly succeed, regardless of whether it points to a valid/existing file. At the very least, I'd check whether your underlying file exists via the exists() method.

Also note that ImageIO.read is not thread-safe (it reuses cached ImageReaders which are not thread-safe).
This means you can't easily read multiple files in parallel. To do that, you'll have to deal with ImageReaders yourself.

Have you considered that the file may simply be corrupted, or that ImageIO is trying to read it as the wrong type of file?

Googling for the ICOReader class results in one hit: IconsFactory from jide-common.
Apparently they had the same problem:
// Using ImageIO approach results in exception like this.
// Exception in thread "main" java.lang.NullPointerException
// at com.ctreber.aclib.image.ico.ICOReader.getICOEntry(ICOReader.java:120)
// at com.ctreber.aclib.image.ico.ICOReader.read(ICOReader.java:89)
// at javax.imageio.ImageIO.read(ImageIO.java:1400)
// at javax.imageio.ImageIO.read(ImageIO.java:1322)
// at com.jidesoft.icons.IconsFactory.b(Unknown Source)
// at com.jidesoft.icons.IconsFactory.a(Unknown Source)
// at com.jidesoft.icons.IconsFactory.getImageIcon(Unknown Source)
// at com.jidesoft.plaf.vsnet.VsnetMetalUtils.initComponentDefaults(Unknown Source)
// private static ImageIcon createImageIconWithException(final Class<?> baseClass, final String file) throws IOException {
// try {
// InputStream resource =
// baseClass.getResourceAsStream(file);
// if (resource == null) {
// throw new IOException("File " + file + " not found");
// }
// BufferedInputStream in =
// new BufferedInputStream(resource);
// return new ImageIcon(ImageIO.read(in));
// }
// catch (IOException ioe) {
// throw ioe;
// }
// }
What did they do instead?
private static ImageIcon createImageIconWithException(
final Class<?> baseClass, final String file)
throws IOException {
InputStream resource = baseClass.getResourceAsStream(file);
final byte[][] buffer = new byte[1][];
try {
if (resource == null) {
throw new IOException("File " + file + " not found");
}
BufferedInputStream in = new BufferedInputStream(resource);
ByteArrayOutputStream out = new ByteArrayOutputStream(1024);
buffer[0] = new byte[1024];
int n;
while ((n = in.read(buffer[0])) > 0) {
out.write(buffer[0], 0, n);
}
in.close();
out.flush();
buffer[0] = out.toByteArray();
} catch (IOException ioe) {
throw ioe;
}
if (buffer[0] == null) {
throw new IOException(baseClass.getName() + "/" + file
+ " not found.");
}
if (buffer[0].length == 0) {
throw new IOException("Warning: " + file
+ " is zero-length");
}
return new ImageIcon(Toolkit.getDefaultToolkit().createImage(
buffer[0]));
}
So you might want to try the same approach: read the raw bytes and use Toolkit to create an image from them.

"it's a jpeg but doesn't have a jpeg
extension."
That might be it.
It appears that the library AC.lib-ICO is throwing the NPE. Since this library is intended to read the Microsoft ICO file format, a JPEG might be a problem for it.
Consider explicitly providing the format using an alternative method.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

xhtmlrenderer creating PDFs of length 0 - java

Related

Save file from a website with java

How to sign an InputStream from a PDF file with PDFBox 2.0.0

Specific characters not rendering properly in Java

Inputstream handled by different objects depending on the content

NullPointerException using ImageIO.read

Categories

Resources