PDF validate using PDFBOX PreflightParser for PDDocument

I would like to validate a PDF that is created not as a file but as a ByteArrayOutputStream, which is then downloaded to the browser. To avoid security issues, I would like to validate it using the PDFBox PreflightParser, but it only seems to offer options for parsing a file, not a PDDocument.
ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
doc.save(byteArrayOutputStream);
PreflightParser parser = new PreflightParser(doc);
//this constructor accepts only file.
I expect to validate the PDF on the fly instead of loading it from the file system.

You can also pass a DataSource. To facilitate this, use org.apache.pdfbox.preflight.utils.ByteArrayDataSource, whose constructor accepts an InputStream.

If you don't need the extra information PreflightParser can give you, you can use PDFParser. Its constructor accepts a RandomAccessBuffer, which can be created from a byte[].

Related

Importing file to Alfresco programmatically (through java backed webscript)

I am having a problem importing a document (PDF) into the Alfresco repository inside a Java-backed webscript. I am using the writer of ContentService.
If I use
ContentWriter writer = ContentService.getWriter(nodeRef, ContentModel.PROP_CONTENT, true);
writer.setEncoding("UTF-8");
writer.setMimetype("application/pdf");
writer.putContent(new String(byte []) );
or
writer.putContent(new String(byte [], "UTF-8") );
my document is not previewable (I see a blank PDF file; I tried a few small PDF files and don't know what would happen with other or larger files).
But if I use the other putContent method, which takes a File as its argument, the document is imported successfully.
writer.setEncoding("UTF-8");
writer.setMimetype("application/pdf");
writer.putContent(File);
I don't want to import the file from disk, since I receive it as a Base64-encoded String, but I don't know what I am missing.

You could pass an InputStream to ContentWriter::putContent. That avoids the byte array to String (and back) conversions, which are what cause the encoding problems.
writer.putContent(new ByteArrayInputStream(Base64.decodeBase64("yourBase64EncodedString")));
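To see why the String detour breaks a PDF, here is a minimal sketch using only the JDK. It swaps in java.util.Base64 for the commons-codec call above, and the class and method names (BinaryContentDemo, toStream, throughString) are made up for illustration. It is not Alfresco code; it only compares the two ways of handling the bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

class BinaryContentDemo {

    /** Decodes Base64 text into a stream of the original bytes, with no String step in between. */
    static InputStream toStream(String base64) {
        return new ByteArrayInputStream(Base64.getDecoder().decode(base64));
    }

    /** Pushes binary data through a UTF-8 String and back, as new String(bytes, "UTF-8") would. */
    static byte[] throughString(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Sample bytes that are not valid UTF-8, similar to what compressed PDF streams contain.
        byte[] raw = {(byte) 0x25, (byte) 0x50, (byte) 0xE2, (byte) 0xFF};
        String base64 = Base64.getEncoder().encodeToString(raw);

        byte[] viaStream = toStream(base64).readAllBytes();
        // The stream path returns the bytes unchanged.
        System.out.println("stream intact: " + Arrays.equals(raw, viaStream));
        // The String path replaces invalid sequences, so the bytes differ.
        System.out.println("string intact: " + Arrays.equals(raw, throughString(raw)));
    }
}
```

Invalid byte sequences are replaced during decoding to a String, so the content that reaches the repository is no longer the original PDF. The stream from toStream could be handed to putContent in the same way as the ByteArrayInputStream above.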

Reading a binary file from the file system as a BLOB to use in rhino with javascript

I'm planning to use SheetJS with Rhino. SheetJS takes a binary object (a BLOB, if I'm correct) as its input, so I need to read a file from the file system using standard Java I/O methods and store it in a blob before passing it to SheetJS, e.g.:
var XLDataWorkBook = XLSX.read(blobInput, {type : "binary"});
So how can I create a BLOB (or another appropriate type) from a binary file in Java in order to pass it in?
I guess I can't pass streams, because XLSX presumably needs a completely created object to process.

I found the answer myself and was able to get it done this way.
Read the file with an InputStream and write it to a ByteArrayOutputStream, as below.
ByteArrayOutputStream buffer = new ByteArrayOutputStream();
...
buffer.write(bytes, 0, len);
Then create a byte array from it.
byte[] byteArray = buffer.toByteArray();
Finally I converted it to a Base64 String (which also works in my case) using the Base64.encodeBase64String() method in the org.apache.commons.codec.binary package, so I can pass the Base64 String as a method parameter.
If you need to go further, there are plenty of libraries (third-party and built-in) for Base64 to Blob conversion as well.
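The steps above could be put together roughly as follows. This is a sketch, not the poster's actual code: it uses the JDK's java.util.Base64 instead of commons-codec, and the class and method names (FileToBase64, readAll, encodeFile) are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

class FileToBase64 {

    /** Reads the whole stream into memory by copying fixed-size chunks into a growing buffer. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int len;
        while ((len = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, len);
        }
        return buffer.toByteArray();
    }

    /** Loads a file and returns its content as a Base64 string. */
    static String encodeFile(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return Base64.getEncoder().encodeToString(readAll(in));
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: pass the path of the spreadsheet as the first argument.
        System.out.println(encodeFile(Path.of(args[0])));
    }
}
```

On the JavaScript side, SheetJS documents a "base64" value for the type option of XLSX.read, which would fit a string produced this way.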

Detect Content-Type Based on FileName

I'm trying to use Apache Tika to determine the content type (e.g. application/pdf for .pdf files). I would like to use Apache Tika's org.apache.tika.detect.NameDetector class. My problem is that its detect method only accepts an InputStream, and I do not have access to the file's InputStream. I only have the file's name (e.g. myFile.pdf).
Is there a good way to use Apache Tika to determine the content type based only on the extension/name of the file? (Note: I would like to avoid creating a temp file with the desired name just to determine its content type.)
Thanks.

You can use the normal Apache Tika Detector interface passing in null for the InputStream, and supplying the filename.
Your code would look something like:
TikaConfig config = new TikaConfig();
Metadata metadata = new Metadata();
metadata.set(Metadata.RESOURCE_NAME_KEY, filename);
String mimetype = config.getDetector().detect(null, metadata).toString(); // detect() returns a MediaType
To simplify things even more, if you use the Tika facade class you can just do:
Tika tika = new Tika();
String mimetype = tika.detect(filename);
You'll then get back the MIME type guessed from the filename alone.
For more information, see the "Ways of triggering Detection" documentation on the Apache Tika website.

I did some searching and found a blog post which contains a code example that determines the type using the org.apache.tika.Tika class's detect method.
So I could write something like this:
org.apache.tika.Tika tika = new org.apache.tika.Tika();
String mimeType = tika.detect("abc.pdf"); // replace abc.pdf with a string variable

How to save content of a page call into a file in jsp/java?

In JSP/Java, how can you call a page that outputs XML and save its result to an XML file on the server? Both files (the one that produces the XML and the one we want to save/overwrite) live on the same server.
Basically, I want to update my test.xml every now and then by calling generate.jsp, which outputs an XML result.
Thank you.

If the request is idempotent, then just use java.net.URL to get an InputStream of the JSP output. E.g.
InputStream input = new URL("http://example.com/context/page.jsp").openStream();
If the request is not idempotent, then you need to replace the PrintWriter of the response with a custom implementation which copies the output into some buffer/builder. I've posted a code example here before: Capture generated dynamic content at server side
Once you have the output, just write it to disk the usual java.io way, assuming that the JSPs already produce XHTML.
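For the idempotent case, fetching and saving could look roughly like this. It is a sketch using java.nio rather than plain java.io; the URL, the target file name, and the PageSnapshot class are placeholders, not part of the original answer.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

class PageSnapshot {

    /** Fetches whatever the URL returns and overwrites the target file with it. */
    static long save(URL source, Path target) throws IOException {
        try (InputStream input = source.openStream()) {
            return Files.copy(input, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        // Both values are placeholders for illustration only.
        URL page = URI.create("http://example.com/context/generate.jsp").toURL();
        long bytes = save(page, Path.of("test.xml"));
        System.out.println("Wrote " + bytes + " bytes");
    }
}
```

Because REPLACE_EXISTING overwrites the target, calling save on a schedule would refresh test.xml each time.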

Register a filter that adds a wrapper to your response. That is, it passes down the chain a new HttpServletResponse object that extends the original HttpServletResponse and returns your custom OutputStream and PrintWriter instead of the original ones.
Your OutputStream and PrintWriter call the original OutputStream and PrintWriter, but also write to your file (using a new FileOutputStream).
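The core of that idea, a stream that forwards every write to the original target and to a copy, can be sketched with plain java.io. The servlet filter and response wrapper are left out here, and CopyingOutputStream is a hypothetical name. In a real filter the second stream would be a FileOutputStream; the demo uses in-memory streams so it runs anywhere.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class CopyingOutputStream extends OutputStream {
    private final OutputStream primary; // e.g. the original response stream
    private final OutputStream copy;    // e.g. a FileOutputStream

    CopyingOutputStream(OutputStream primary, OutputStream copy) {
        this.primary = primary;
        this.copy = copy;
    }

    @Override
    public void write(int b) throws IOException {
        primary.write(b);
        copy.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        primary.write(b, off, len);
        copy.write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        primary.flush();
        copy.flush();
    }

    @Override
    public void close() throws IOException {
        // Close the copy even if closing the primary stream fails.
        try {
            primary.close();
        } finally {
            copy.close();
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream client = new ByteArrayOutputStream();
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        try (OutputStream out = new CopyingOutputStream(client, file)) {
            out.write("<root/>".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(client.toString("UTF-8").equals(file.toString("UTF-8")));
    }
}
```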

Why don't you use a real template engine like FreeMarker? That would be easier.

Converting a raw file (binary data ) into XML file

I'm working on a project in which I have to take a raw file from the server and convert it into an XML file.
Is there any tool available in Java that can help me accomplish this, the way JAXP can be used to parse an XML document?

I guess you will need your objects for later use, so create MyObject as a bean, load the values from your raw file into it, and write it to someFile.xml:
FileOutputStream os = new FileOutputStream("someFile.xml");
XMLEncoder encoder = new XMLEncoder(os);
MyObject p = new MyObject();
p.setFirstName("Mite");
encoder.writeObject(p);
encoder.close();
Or you can go with TransformerFactory if you don't need the objects for later use.

Yes. This assumes that the text in the raw file is already XML.
You start with the DocumentBuilderFactory to get a DocumentBuilder, and then you can use its parse() method to turn an input stream into a Document, which is an internal XML representation.
If the raw file contains something other than XML, you'll want to scan it somehow (your own code here) and use the stuff you find to build up from an empty Document.
I then usually use a Transformer from a TransformerFactory to convert the Document into XML text in a file, but there may be a simpler way.
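A rough sketch of that parse-then-transform flow, using only the JDK's JAXP classes. The XmlRoundTrip class, its method names, and the sample XML are invented for illustration. It serializes to a String so it can run without touching disk; passing a StreamResult built from a File would write to a file instead.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

class XmlRoundTrip {

    /** Parses XML text from a stream into an in-memory Document. */
    static Document parse(InputStream in) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
    }

    /** Converts a Document back into XML text with an identity Transformer. */
    static String serialize(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample input standing in for a raw file that already contains XML.
        byte[] raw = "<person><name>Mite</name></person>".getBytes(StandardCharsets.UTF_8);
        Document doc = parse(new ByteArrayInputStream(raw));
        System.out.println(doc.getDocumentElement().getNodeName());
        System.out.println(serialize(doc));
    }
}
```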

JAXP can also be used to create a new, empty document:
Document dom = DocumentBuilderFactory.newInstance()
.newDocumentBuilder()
.newDocument();
Then you can use that Document to create elements, and append them as needed:
Element root = dom.createElement("root");
dom.appendChild(root);
But, as Jørn noted in a comment to your question, it all depends on what you want to do with this "raw" file: how should it be turned into XML. And only you know that.

I think if you try to load it in an XmlDocument this will be fine
