apache commons imaging writing image error - java

Below is sample code that fails with the Microsoft sample picture Tulips.jpg:
BufferedImage bufferedImage = Imaging.getBufferedImage(new File("Tulips.jpg"));
File imageFile = new File("outputfile.jpg");
final Map<String, Object> optionalParams = new HashMap<String, Object>();
Imaging.writeImage(bufferedImage, imageFile, ImageFormats.JPEG, optionalParams);
This code fails with "This image format (Jpeg-Custom) cannot be written." Any pointers would be a great help. I have searched Stack Overflow and Google, with no luck so far.
When I read the documentation, it states that this message appears if bufferedImage.getType() == TYPE_UNKNOWN, but I have no idea why the type would be UNKNOWN.
Thanks so much for your help.

Apache Commons Imaging does not support writing JPEG files. The supported formats are listed here: http://commons.apache.org/proper/commons-imaging/formatsupport.html (even JPEG reading is not fully supported). Writing to some other formats is supported, however; for example, if you change the image format to PNG in your writeImage call, it will work.
Moreover, Apache Commons Imaging has not been released yet, so I wouldn't recommend using it in critical code.
As an alternative, take a look at the JDK's javax.imageio.ImageIO class (some examples: https://docs.oracle.com/javase/tutorial/2d/images/saveimage.html).
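To illustrate the ImageIO route, here is a minimal sketch that uses only JDK classes. The class name JpegWriteSketch and its two helper methods are made up for this example. The copy into a TYPE_INT_RGB image is a precaution I am assuming is useful here, since the JDK's JPEG writer may refuse images that carry an alpha channel or a custom type:

```java
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical helper class, not part of any library.
class JpegWriteSketch {

    /** Copies the image into an opaque RGB image of the same size. */
    static BufferedImage toRgb(BufferedImage source) {
        BufferedImage rgb = new BufferedImage(
                source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return rgb;
    }

    /** Writes the image as a JPEG file, failing loudly if no writer accepts it. */
    static void writeJpeg(BufferedImage image, File target) throws IOException {
        // ImageIO.write returns false when no registered writer can handle the image.
        if (!ImageIO.write(toRgb(image), "jpg", target)) {
            throw new IOException("No JPEG writer available for " + target);
        }
    }
}
```

With this in place, something like JpegWriteSketch.writeJpeg(ImageIO.read(new File("Tulips.jpg")), new File("outputfile.jpg")) should cover the case in the question. Note that ImageIO.read returns null when it cannot decode the input, so check for that first.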
What exactly are you trying to achieve in your code?

Related

Issues using the Apache Tika Parser object to parse .doc and .docx file formats

I am trying to use org.apache.tika.parser.Parser and DefaultDetector() to detect and parse the .doc and .docx file formats, but I am getting an error (not an exception) thrown from the Tika jars, and it doesn't have any helpful stack trace for me to put here. I can confirm that it happens for .doc and .docx only; PDF, JPEG and text files are fine. Has anyone come across this problem with the .doc and .docx file formats? Is there a solution that you have adopted?
My code is the following:
unzippedBytes = loadUnzippedByteCode(attachment.getContents()); /* This is utility method written using native Java Zip library - returns byte array byte[] */
/* All the objects below were declared beforehand, but not initialised until now */
parseContextObj = new ParseContext();
dObj = new DefaultDetector();
detectedParser = new AutoDetectParser(dObj);
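parseContextObj.set(Parser.class, detectedParser); /* use the variables declared above, so nested documents are parsed with the same parser */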
OutputStream outputstream = new ByteArrayOutputStream();
metadata = new Metadata();
InputStream input = TikaInputStream.get(unzippedBytes, metadata);
ContentHandler handler = new BodyContentHandler(outputstream);
detectedParser.parse(input, handler, metadata, parseContextObj); // This is where it throws NoSuchMethodError; I cannot understand why and cannot get the stack trace. Using Tika 1.10.
input.close();
I found the code above in another SO question and decided to use it for my work. The byte[] that I use comes from the very old Struts 1.0 FormFile interface (getFileData(), which returns byte[]). I used to parse with Bullhorn's irex parser, but decided to switch to Tika for numerous reasons. The byte[] works fine with irex, but there are issues whenever I try to parse .docx and .doc contents.
The following is the stack trace, parts of which I have masked for privacy reasons:
2016-01-15 16:21:06,947 [http-apr-80-exec-3] [ERROR] XXXXX.XXXX.XXXXService - java.lang.NoSuchMethodError: org.apache.poi.util.POILogger.log(I[Ljava/lang/Object;)V
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.parseRelationshipsPart(PackageRelationshipCollection.java:313)
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:163)
at org.apache.poi.openxml4j.opc.PackageRelationshipCollection.<init>(PackageRelationshipCollection.java:131)
at org.apache.poi.openxml4j.opc.PackagePart.loadRelationships(PackagePart.java:561)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:109)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:80)
at org.apache.poi.openxml4j.opc.PackagePart.<init>(PackagePart.java:125)
at org.apache.poi.openxml4j.opc.ZipPackagePart.<init>(ZipPackagePart.java:78)
at org.apache.poi.openxml4j.opc.ZipPackage.getPartsImpl(ZipPackage.java:245)
at org.apache.poi.openxml4j.opc.OPCPackage.getParts(OPCPackage.java:684)
at org.apache.poi.openxml4j.opc.OPCPackage.open(OPCPackage.java:227)
at org.apache.tika.parser.pkg.ZipContainerDetector.detectOPCBased(ZipContainerDetector.java:208)
at org.apache.tika.parser.pkg.ZipContainerDetector.detectZipFormat(ZipContainerDetector.java:145)
at org.apache.tika.parser.pkg.ZipContainerDetector.detect(ZipContainerDetector.java:88)
at org.apache.tika.detect.CompositeDetector.detect(CompositeDetector.java:77)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:112)
I realised that my classpath has POI jar version 2.5.1, which, according to the Maven Central repository, is ancient. Could that be the cause? I still get an error after switching to versions 3.13 and 2.6.0 of the POI artifacts and xmlbeans respectively (as suggested by #venkyreddy in that answer).
UPDATE
I tried building a new project separately from my original work, with ONLY tika-app-1.10.jar on my classpath. I also inspected tika-app-1.10.jar and found that all the POI dependencies are actually there, including xmlbeans and 'xml-schema'. With only tika-app-1.10.jar on my classpath, I get the following Error (not Exception):
java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader
at org.openxmlformats.schemas.wordprocessingml.x2006.main.DocumentDocument$Factory.parse(Unknown Source)
at org.apache.poi.xwpf.usermodel.XWPFDocument.onDocumentRead(XWPFDocument.java:158)
at org.apache.poi.POIXMLDocument.load(POIXMLDocument.java:167)
at org.apache.poi.xwpf.usermodel.XWPFDocument.<init>(XWPFDocument.java:119)
at org.apache.poi.xwpf.extractor.XWPFWordExtractor.<init>(XWPFWordExtractor.java:59)
at org.apache.poi.extractor.ExtractorFactory.createExtractor(ExtractorFactory.java:204)
at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:86)
at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:87)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
at xxx.xxx.xxx.xxx.xxxxxAttachmentWithTika(xxxService.java:792)
I browsed the package and couldn't find any POIXMLTypeLoader class. Is this a known issue? Could someone please help?
Make sure there are no outdated POI jars on the classpath, and use the version of POI that matches the version of Tika you are trying to use.
The class POIXMLTypeLoader was added to POI after POI 3.13 was released, so it seems you are somehow mixing in newer versions. This class first appears in POI 3.14-beta1! Make sure you do not include that version somehow.
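One way to check which jar a class is really being loaded from is to ask the JVM for its code source. The sketch below uses only JDK calls. The class name ClassOriginSketch and the method locate are invented for this example, and the returned messages are arbitrary:

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, not part of Tika or POI.
class ClassOriginSketch {

    /** Returns where the named class was loaded from, or a short explanation if that is unknown. */
    static String locate(String className) {
        try {
            // 'false' avoids running static initialisers; we only want to know where the class lives.
            Class<?> type = Class.forName(className, false, ClassOriginSketch.class.getClassLoader());
            CodeSource source = type.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK itself typically have no code source.
            return source == null
                    ? "no code source (JDK class?)"
                    : String.valueOf(source.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "not loadable: " + e;
        }
    }
}
```

Printing locate("org.apache.poi.util.POILogger") and locate("org.apache.poi.POIXMLTypeLoader") from inside the failing application should show whether an old POI jar is shadowing the one Tika expects, or whether the class is missing altogether.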

Using xdocreport, is it possible to read drawings from a docx file?

I need to convert a docx to a PDF. The following code uses the xdocreport library and works pretty well.
The problem is with some specific docx files that contain drawings: the drawings are not visible in the final PDF. I've tested the conversion with the live demo available from the GitHub page and I have the same problem.
So I'm wondering: is this possible, or do I need to use another library? Which one? (docx4j doesn't seem to work either.)
final XWPFDocument document = new XWPFDocument(inputStream);
final OutputStream outPdf = new FileOutputStream("myFile.pdf");
PdfConverter.getInstance().convert(document, outPdf, optionsPdf);
outPdf.close();
XDocReport doesn't support drawings. It could support them, since the docx->pdf conversion is based on iText, which supports drawing, but it's a big task (any contributions are welcome!).
You can see here the limitations of the XDocReport docx->pdf converter.

Microdata extraction from HTML in Java

I really need help extracting Microdata embedded in HTML5. My goal is to get structured data from a webpage, just like this Google tool: http://www.google.com/webmasters/tools/richsnippets. I have searched a lot but have not found a workable solution.
Currently I use the any23 library, but I can't find any documentation other than the javadocs, which don't give me enough information.
I use any23's Microdata Extractor but am stuck at the third parameter: "org.w3c.dom.Document in". I can't parse HTML content into a W3C DOM. I have tried JTidy as well as JSoup, but the DOM objects in these libraries do not fit the Extractor constructor. I am also unsure about the second parameter of the Microdata Extractor.
I hope someone can help me with any23 or suggest another library that can solve this extraction problem.
Edit: I found a solution myself by doing what the any23 command line tool does. Here is the code snippet:
HTTPDocumentSource doc = new HTTPDocumentSource(DefaultHTTPClient.createInitializedHTTPClient(), value);
InputStream documentInputInputStream = doc.openInputStream();
TagSoupParser tagSoupParser = new TagSoupParser(documentInputInputStream, doc.getDocumentURI());
Document document = tagSoupParser.getDOM();
ByteArrayOutputStream byteArrayOutput = new ByteArrayOutputStream();
MicrodataParser.getMicrodataAsJSON(tagSoupParser.getDOM(),new PrintStream(byteArrayOutput));
String result = byteArrayOutput.toString("UTF-8");
These lines of code only extract microdata from HTML and write it out in JSON format. I tried to use MicrodataExtractor, which can change the output format to others (RDF, Turtle, ...), but the input document seems to accept XML only. It throws "Document didn't start" when I pass in an HTML document.
If anyone finds a way to use MicrodataExtractor, please leave an answer here.
Thank you.
XPath is generally the way to consume HTML or XML.
Have a look at: How to read XML using XPath in Java
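As a rough illustration of the XPath suggestion, the sketch below collects every element that carries an itemprop attribute, using only the JDK's XML and XPath APIs. It assumes the markup is already well-formed XML. Real-world HTML usually is not, so it would first need to go through a tidying parser that yields an org.w3c.dom.Document. The class name ItempropXPathSketch is made up, and the code deliberately ignores the finer Microdata rules (nested itemscope items, values taken from content or href attributes), so it is not a replacement for a real Microdata parser:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class for illustration only.
class ItempropXPathSketch {

    /** Maps each itemprop name to the trimmed text content of its element. */
    static Map<String, String> extract(String wellFormedMarkup) throws Exception {
        Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(wellFormedMarkup)));

        // Select every element, at any depth, that has an itemprop attribute.
        NodeList hits = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate("//*[@itemprop]", dom, XPathConstants.NODESET);

        Map<String, String> properties = new LinkedHashMap<>();
        for (int i = 0; i < hits.getLength(); i++) {
            Element element = (Element) hits.item(i);
            // A later element with the same itemprop name overwrites an earlier one.
            properties.put(element.getAttribute("itemprop"), element.getTextContent().trim());
        }
        return properties;
    }
}
```

The same XPath expression can be evaluated against any org.w3c.dom.Document, so it could in principle be applied to a DOM produced by an HTML-tolerant parser instead of the string input used here.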

How to get a thumbnail from the Dropbox API using the Java SDK?

I am working with the Dropbox API using the Java SDK. I am trying to get the thumbnail for each image in my Dropbox account via the API. Honestly, I read through the class, but it only provides a description that is not detailed enough for a beginner. I started my code like this:
public void getThumbnails() throws DropboxException{
DropboxInputStream dis = api.getThumbnailStream("/Koala.jpg", ThumbSize.ICON_256x256, ThumbFormat.JPEG);
}
What I don't understand is:
I should return something to the client side in order to show the thumbnail I got from the Dropbox API, but I don't know what to return. Maybe the DropboxInputStream?
How do I get the thumbnail from the API? I spent a day looking for an example or guide but couldn't find one.
Could someone please show me how to get the thumbnail via the Dropbox API?
DropboxInputStream is just a FilterInputStream, so once you have the input stream as in your code, you can simply read from it.
Then it's only a question of how you need to present it.
Is it a Swing application you are writing? How do you need to show the image?
You should be able to read the image with ImageIO.read:
Image image = ImageIO.read(dis);
http://docs.oracle.com/javase/6/docs/api/javax/imageio/ImageIO.html
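For what it's worth, here is a small sketch of what "reading the stream" could look like with plain JDK classes. It takes any InputStream, which should include the DropboxInputStream from the question if it is indeed a FilterInputStream. The class name ThumbnailStreamSketch and both method names are invented for this example, and re-encoding to PNG bytes is just one assumed way of handing the image to a client:

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.imageio.ImageIO;

// Hypothetical helper class, not part of the Dropbox SDK.
class ThumbnailStreamSketch {

    /** Decodes the stream into an image and closes the stream afterwards. */
    static BufferedImage decode(InputStream in) throws IOException {
        try (InputStream stream = in) {
            BufferedImage image = ImageIO.read(stream);
            // ImageIO.read returns null when no registered reader understands the data.
            if (image == null) {
                throw new IOException("Stream did not contain a decodable image");
            }
            return image;
        }
    }

    /** Encodes the image as PNG bytes, for example to send in an HTTP response body. */
    static byte[] encodePng(BufferedImage image) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (!ImageIO.write(image, "png", out)) {
            throw new IOException("No PNG writer available");
        }
        return out.toByteArray();
    }
}
```

In a Swing application the decoded BufferedImage can be shown directly, for instance via new ImageIcon(image) on a JLabel. For a web client, the raw thumbnail bytes could also be streamed through unchanged without decoding them at all.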

How can you serialize an XMP XML block to an existing JPEG Image?

I have many JPEG images which contain corrupted XMP XML blocks. I can easily fix these blocks, but I'm unsure how to write the 'fixed' data back to the image files.
I'm currently using Java but am open to anything that will make this task easy.
This is the goal behind another question about XMP XML that I asked earlier.
In Java you can use the Apache Sanselan library:
String newXmpXmlString = "<the><new/><xmp/><xml/></the>";
File source = new File("path/to/file");
// Write to a separate file: opening the source itself for output would truncate it before it is read.
File target = new File("path/to/file.fixed");
try (OutputStream out = new BufferedOutputStream(new FileOutputStream(target))) {
new JpegXmpRewriter().updateXmpXml(new ByteSourceFile(source), out, newXmpXmlString);
}
For a more detailed example of the solution outlined above, there is an open source project on Google Code that houses a small JPEG XMP XML Trimmer.
