Getting image from fb2 file in Java

I'm working on an e-book reader written in Java. The primary file type is fb2, which is an XML-based format.
Images inside these books are stored inside <binary> tags as a long line of text (at least it looks like text in a text editor).
How can I transform this text into actual pictures in Java? For working with XML I'm using the JDOM2 library.
What I've tried does not produce valid pictures (jpeg files):
private void saveCover(Object book) {
    // Necessary cast to process with book
    Document doc = (Document) book;
    // Document root and namespace
    Element root = doc.getRootElement();
    Namespace ns = root.getNamespace();
    Element binaryEl = root.getChild("binary", ns);
    String binaryText = binaryEl.getText();
    File cover = new File(tempFolderPath + "cover.jpeg");
    try (
        FileOutputStream fileOut = new FileOutputStream(cover);
        BufferedOutputStream bufferOut = new BufferedOutputStream(
            fileOut);) {
        bufferOut.write(binaryText.getBytes());
    } catch (IOException e) {
        e.printStackTrace();
    }
}

The image content is specified as being Base64 encoded (see: http://wiki.mobileread.com/wiki/FB2#Binary ).
As a consequence, you have to take the text from the binary element and decode it into binary data (in Java 8 use java.util.Base64 and this method: http://docs.oracle.com/javase/8/docs/api/java/util/Base64.html#getDecoder-- ).
If you take the binaryText value from your code and feed it into the decoder's decode() method, you should get the right byte[] value for the image.
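A minimal sketch of that decoding step, using only the JDK. The names Fb2BinaryDecoder, decodeBinary and saveImage are made up for illustration. It assumes the <binary> text may be wrapped over several lines, so it uses Base64.getMimeDecoder(), which skips line breaks; the basic Base64.getDecoder() rejects them with an IllegalArgumentException.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper, not part of JDOM2 or any fb2 library.
class Fb2BinaryDecoder {

    /** Decodes the text content of a <binary> element into raw image bytes. */
    static byte[] decodeBinary(String binaryText) {
        // The MIME decoder ignores line separators and other characters
        // outside the Base64 alphabet, which wrapped fb2 payloads may contain.
        return Base64.getMimeDecoder().decode(binaryText);
    }

    /** Decodes the text and writes the resulting bytes to the target file. */
    static void saveImage(String binaryText, Path target) throws IOException {
        Files.write(target, decodeBinary(binaryText));
    }

    public static void main(String[] args) {
        // Made-up sample: the first bytes of a JPEG header, split over two lines.
        byte[] bytes = decodeBinary("/9j/\n4AAQ");
        System.out.printf("decoded %d bytes%n", bytes.length);
    }
}
```

In saveCover, this would mean writing decodeBinary(binaryText) to the stream instead of binaryText.getBytes().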

Related

apache pdfbox - how to test if a document is flattened?

I have written the following small Java main method. It takes a (hardcoded, for testing purposes!) PDF document that I know contains active form elements and needs to flatten it.
public static void main(String[] args) {
    try {
        // for testing
        Tika tika = new Tika();
        String filePath = "<path-to>/<pdf-document-with-active-elements>.pdf";
        String fileName = filePath.substring(0, filePath.length() - 4);
        File file = new File(filePath);
        if (tika.detect(file).equalsIgnoreCase("application/pdf")) {
            PDDocument pdDocument = PDDocument.load(file);
            PDAcroForm pdAcroForm = pdDocument.getDocumentCatalog().getAcroForm();
            if (pdAcroForm != null) {
                pdAcroForm.flatten();
                pdAcroForm.refreshAppearances();
                pdDocument.save(fileName + "-flattened.pdf");
            }
            pdDocument.close();
        }
    }
    catch (Exception e) {
        System.err.println("Exception: " + e.getLocalizedMessage());
    }
}
What kind of test would assert the File(<path-to>/<pdf-document-with-active-elements>-flattened.pdf) generated by this code would, in fact, be flat?

What kind of test would assert that the file generated by this code would, in fact, be flat?
Load that document anew and check whether it has any form fields in its PDAcroForm (if there is a PDAcroForm at all).
If you want to be thorough, also iterate through the pages and ensure that there are no Widget annotations associated with them anymore.
And to be really thorough, additionally determine the field positions and contents before flattening and apply text extraction at those positions to the flattened PDF. This verifies that the form has not merely been dropped but has indeed been flattened.

Extracting an embedded object from a pdf

I had embedded a byte array into a pdf file (Java).
Now I am trying to extract that same array.
The array was embedded as a "MOVIE" file.
I couldn't find any clue on how to do that...
Any ideas?
Thanks!
EDIT
I used this code to embed the byte array:
public static void pack(byte[] file) throws IOException, DocumentException {
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(RESULT));
    writer.setPdfVersion(PdfWriter.PDF_VERSION_1_7);
    writer.addDeveloperExtension(PdfDeveloperExtension.ADOBE_1_7_EXTENSIONLEVEL3);
    document.open();
    RichMediaAnnotation richMedia = new RichMediaAnnotation(writer, new Rectangle(0,0,0,0));
    PdfFileSpecification fs
        = PdfFileSpecification.fileEmbedded(writer, null, "test.avi", file);
    PdfIndirectReference asset = richMedia.addAsset("test.avi", fs);
    RichMediaConfiguration configuration = new RichMediaConfiguration(PdfName.MOVIE);
    RichMediaInstance instance = new RichMediaInstance(PdfName.MOVIE);
    RichMediaParams flashVars = new RichMediaParams();
    instance.setAsset(asset);
    configuration.addInstance(instance);
    RichMediaActivation activation = new RichMediaActivation();
    richMedia.setActivation(activation);
    PdfAnnotation richMediaAnnotation = richMedia.createAnnotation();
    richMediaAnnotation.setFlags(PdfAnnotation.FLAGS_PRINT);
    writer.addAnnotation(richMediaAnnotation);
    document.close();
}

I have written a brute force method to extract all streams in a PDF and store them as a file without an extension:
public static final String SRC = "resources/pdfs/image.pdf";
public static final String DEST = "results/parse/stream%s";

public static void main(String[] args) throws IOException {
    File file = new File(DEST);
    file.getParentFile().mkdirs();
    new ExtractStreams().parse(SRC, DEST);
}

public void parse(String src, String dest) throws IOException {
    PdfReader reader = new PdfReader(src);
    PdfObject obj;
    for (int i = 1; i <= reader.getXrefSize(); i++) {
        obj = reader.getPdfObject(i);
        if (obj != null && obj.isStream()) {
            PRStream stream = (PRStream)obj;
            byte[] b;
            try {
                b = PdfReader.getStreamBytes(stream);
            }
            catch(UnsupportedPdfException e) {
                b = PdfReader.getStreamBytesRaw(stream);
            }
            FileOutputStream fos = new FileOutputStream(String.format(dest, i));
            fos.write(b);
            fos.flush();
            fos.close();
        }
    }
}
Note that I get all PDF objects that are streams as a PRStream object. I also use two different methods:
When I use PdfReader.getStreamBytes(stream), iText will look at the filter. For instance: page content streams consist of PDF syntax that is compressed using /FlateDecode. By using PdfReader.getStreamBytes(stream), you will get the uncompressed PDF syntax.
Not all filters are supported in iText. Take for instance /DCTDecode which is the filter used to store JPEGs inside a PDF. Why and how would you "decode" such a stream? You wouldn't, and that's when we use PdfReader.getStreamBytesRaw(stream) which is also the method you need to get your AVI-bytes from your PDF.
This example already gives you the methods you'll certainly need to extract PDF streams. Now it's up to you to find the path to the stream you need. That calls for iText RUPS. With iText RUPS you can look at the internal structure of a PDF file. In your case, you need to find the annotations as is done in this question: All links of existing pdf change the action property to inherit zoom - iText library
You loop over the page dictionaries, then loop over the /Annots array of this dictionary (if it's present), but instead of checking for /Link annotations (which is what was asked in the question I refer to), you have to check for /RichMedia annotations and from there examine the assets until you find the stream that contains the AVI file. RUPS will show you how to dive into the annotation dictionary.

JEditorPane html document inline (embedded) image from file

I'm trying to inline (embed) an image in a JEditorPane from a file such as:
<img src="data:image/gif;utf-8,data...">
But I'm struggling with the code.
So far I have (assuming a gif file):
try
{
    File imageFile = new File("C:\\test\\testImage.gif");
    File htmlFile = new File("C:\\test\\testOutput.html");
    byte[] imageBytes = Files.toByteArray(imageFile);
    String imageData = new String(imageBytes, "UTF-8");
    String html = "<html><body><img src=\"data:image/gif;utf-8," + imageData + "\"></body></html>";
    FileUtils.writeStringToFile(htmlFile, html);
} catch (Exception e) {
    e.printStackTrace();
}
This does create a file but the image is invalid. I'm sure I'm not converting the image the proper way...

JEditorPane (and Java HTML rendering in general) does not support Base64 encoded images.
Of course 'does not' != 'could not'.
The thing is, you'd need to create (or adjust) an EditorKit that can have new elements defined. An example can be seen in the AppletEditorKit. You'd need to look for HTML.Tag.IMG: if it is a standard image, call the super functionality; otherwise use this source (or similar) to convert it to an image, then embed it.
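As for why the generated file shows a broken image: a data URI expects the payload to be Base64 encoded and marked with ;base64, whereas new String(imageBytes, "UTF-8") mangles the binary data. Below is a minimal sketch of building such a URI with the JDK. DataUriBuilder and toDataUri are hypothetical names, and the result is meant for an HTML file opened in a browser, since JEditorPane will still not render it out of the box.

```java
import java.util.Base64;

// Hypothetical helper for illustration only.
class DataUriBuilder {

    /** Builds a data: URI carrying the content as a Base64 payload. */
    static String toDataUri(String mimeType, byte[] content) {
        return "data:" + mimeType + ";base64,"
                + Base64.getEncoder().encodeToString(content);
    }

    public static void main(String[] args) {
        // Stand-in bytes; a real program would read the whole GIF file instead.
        byte[] imageBytes = {'G', 'I', 'F', '8', '9', 'a'};
        String html = "<html><body><img src=\""
                + toDataUri("image/gif", imageBytes)
                + "\"></body></html>";
        System.out.println(html);
    }
}
```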

Getting metadata from JPEG in byte array form

I have a jpeg image in the form of a byte array. How can I get the byte array in a form where I can strip the comments node of the metadata?
byte[] myimagedata = ...
ImageWriter writer = ImageIO.getImageWritersBySuffix("jpeg").next();
ImageReader reader = ImageIO.getImageReader(writer);
//Looking for file here but have byte array
reader.setInput(new FileImageInputStream(new File(Byte array cant go here)));
IIOMetadata imageMetadata = reader.getImageMetadata(0);
Element tree = (Element) imageMetadata.getAsTree("javax_imageio_jpeg_image_1.0");
NodeList comNL = tree.getElementsByTagName("com");
IIOMetadataNode comNode;
if (comNL.getLength() == 0) {
comNode = new IIOMetadataNode("com");
Node markerSequenceNode = tree.getElementsByTagName("markerSequence").item(0);
markerSequenceNode.insertBefore(comNode,markerSequenceNode.getFirstChild());
} else {
comNode = (IIOMetadataNode) comNL.item(0);
}

You seem to be (just) asking how to create an ImageInputStream that reads from a byte array. From reading the javadocs, I think this should work:
new MemoryCacheImageInputStream(new ByteArrayInputStream(myimagedata))
The FileImageInputStream class doesn't have a constructor that allows you to read from anything but a file in the file system.
The FileCacheImageInputStream would also be an option, but it involves providing a directory in the file system for temporary caching ... and that seems undesirable in this context.
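A sketch of how that stream plugs into the metadata lookup from the question, using only javax.imageio. The class name JpegMetadataFromBytes and the methods countCommentNodes and sampleJpeg are invented for the example, and the sample image is generated on the fly purely for demonstration.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;
import org.w3c.dom.Element;

// Hypothetical demo class, not taken from the question.
class JpegMetadataFromBytes {

    /** Counts the "com" (comment) nodes in the native metadata of an in-memory JPEG. */
    static int countCommentNodes(byte[] jpegBytes) throws IOException {
        Iterator<ImageReader> readers = ImageIO.getImageReadersByFormatName("jpeg");
        if (!readers.hasNext()) {
            throw new IOException("No JPEG reader available");
        }
        ImageReader reader = readers.next();
        // Wrap the byte array; no file on disk is involved.
        try (ImageInputStream in =
                 new MemoryCacheImageInputStream(new ByteArrayInputStream(jpegBytes))) {
            reader.setInput(in);
            IIOMetadata metadata = reader.getImageMetadata(0);
            Element tree = (Element) metadata.getAsTree("javax_imageio_jpeg_image_1.0");
            return tree.getElementsByTagName("com").getLength();
        } finally {
            reader.dispose();
        }
    }

    /** Demo only: encodes a tiny blank image as JPEG to have some bytes to read. */
    static byte[] sampleJpeg() throws IOException {
        ImageIO.setUseCache(false); // keep the demo entirely in memory
        BufferedImage img = new BufferedImage(8, 8, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, "jpeg", out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("comment nodes: " + countCommentNodes(sampleJpeg()));
    }
}
```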

How can I save a PNG with a tEXt or iTXt chunk from Java?

I am currently using javax.imageio.ImageIO to write a PNG file. I would like to include a tEXt chunk (and indeed any of the chunks listed here), but can see no means of doing so.
By the looks of com.sun.imageio.plugins.png.PNGMetadata it should be possible.
I should be most grateful for any clues or answers.
M.

The solution I struck upon after some decompilation, goes as follows ...
RenderedImage image = getMyImage();
Iterator<ImageWriter> iterator = ImageIO.getImageWritersBySuffix( "png" );
if(!iterator.hasNext()) throw new Error( "No image writer for PNG" );
ImageWriter imagewriter = iterator.next();
ByteArrayOutputStream bytes = new ByteArrayOutputStream();
imagewriter.setOutput( ImageIO.createImageOutputStream( bytes ) );
// Create & populate metadata
PNGMetadata metadata = new PNGMetadata();
// see http://www.w3.org/TR/PNG-Chunks.html#C.tEXt for standardized keywords
metadata.tEXt_keyword.add( "Title" );
metadata.tEXt_text.add( "Mandelbrot" );
metadata.tEXt_keyword.add( "Comment" );
metadata.tEXt_text.add( "..." );
metadata.tEXt_keyword.add( "MandelbrotCoords" ); // custom keyword
metadata.tEXt_text.add( fractal.getCoords().toString() );
// Render the PNG to memory
IIOImage iioImage = new IIOImage( image, null, null );
iioImage.setMetadata( metadata ); // Attach the metadata
imagewriter.write( null, iioImage, null );

I realise this question was answered long ago, but if you want to do it without dipping into the "com.sun" hierarchy, here's a quick and very ugly example, as I couldn't find this documented anywhere else.
BufferedImage img = new BufferedImage(300, 300, BufferedImage.TYPE_INT_ARGB);
// Create a DOM Document describing the metadata;
// I've gone the quick and dirty route. The description for PNG is at
// http://download.oracle.com/javase/1.4.2/docs/api/javax/imageio/metadata/doc-files/png_metadata.html
Calendar c = Calendar.getInstance();
String xml = "<?xml version='1.0'?><javax_imageio_png_1.0><tIME year='"+c.get(c.YEAR)+"' month='"+(c.get(c.MONTH)+1)+"' day='"+c.get(c.DAY_OF_MONTH)+"' hour='"+c.get(c.HOUR_OF_DAY)+"' minute='"+c.get(c.MINUTE)+"' second='"+c.get(c.SECOND)+"'/><pHYs pixelsPerUnitXAxis='"+11811+"' pixelsPerUnitYAxis='"+11811+"' unitSpecifier='meter'/></javax_imageio_png_1.0>";
DOMResult domresult = new DOMResult();
TransformerFactory.newInstance().newTransformer().transform(new StreamSource(new StringReader(xml)), domresult);
Document document = dom.getResult();
// Apply the metadata to the image
ImageWriter writer = (ImageWriter)ImageIO.getImageWritersBySuffix("png").next();
IIOMetadata meta = writer.getDefaultImageMetadata(new ImageTypeSpecifier(img), null);
meta.setFromTree(meta.getMetadataFormatNames()[0], document.getFirstChild());
FileOutputStream out = new FileOutputStream("out.png");
writer.setOutput(ImageIO.createImageOutputStream(out));
writer.write(new IIOImage(img, null, meta));
out.close();

Using Java 1.6, I edited Mike's code to
Node document = domresult.getNode();
instead of his line
Document document = dom.getResult();
Moreover, I'd suggest adding the line
writer.dispose()
after the job has been done, so that any resources held by the writer are released.
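For the tEXt chunk the question actually asks about, the same standard metadata route can be taken by assembling the tree from IIOMetadataNode objects instead of parsing an XML string. The following is a sketch under that assumption, not a tested recipe from any of the answers. PngTextChunk, writeWithText and readText are made-up names, while the element and attribute names (tEXt, tEXtEntry, keyword, value) come from the javax_imageio_png_1.0 format description linked above.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.IIOImage;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.ImageTypeSpecifier;
import javax.imageio.ImageWriter;
import javax.imageio.metadata.IIOMetadata;
import javax.imageio.metadata.IIOMetadataNode;
import javax.imageio.stream.ImageInputStream;
import javax.imageio.stream.ImageOutputStream;
import javax.imageio.stream.MemoryCacheImageInputStream;
import javax.imageio.stream.MemoryCacheImageOutputStream;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only.
class PngTextChunk {
    private static final String PNG_FORMAT = "javax_imageio_png_1.0";

    /** Encodes the image as PNG with one tEXt keyword/value pair attached. */
    static byte[] writeWithText(BufferedImage img, String keyword, String value)
            throws IOException {
        ImageWriter writer = ImageIO.getImageWritersByFormatName("png").next();
        IIOMetadata meta = writer.getDefaultImageMetadata(
                ImageTypeSpecifier.createFromRenderedImage(img), null);

        // Build <javax_imageio_png_1.0><tEXt><tEXtEntry keyword=.. value=../></tEXt></..>
        IIOMetadataNode entry = new IIOMetadataNode("tEXtEntry");
        entry.setAttribute("keyword", keyword);
        entry.setAttribute("value", value);
        IIOMetadataNode text = new IIOMetadataNode("tEXt");
        text.appendChild(entry);
        IIOMetadataNode root = new IIOMetadataNode(PNG_FORMAT);
        root.appendChild(text);
        meta.mergeTree(PNG_FORMAT, root);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ImageOutputStream out = new MemoryCacheImageOutputStream(bytes)) {
            writer.setOutput(out);
            writer.write(null, new IIOImage(img, null, meta), null);
        } finally {
            writer.dispose(); // release resources held by the writer
        }
        return bytes.toByteArray();
    }

    /** Returns the tEXt value stored under the keyword, or null if absent. */
    static String readText(byte[] png, String keyword) throws IOException {
        ImageReader reader = ImageIO.getImageReadersByFormatName("png").next();
        try (ImageInputStream in =
                 new MemoryCacheImageInputStream(new ByteArrayInputStream(png))) {
            reader.setInput(in);
            Element tree = (Element) reader.getImageMetadata(0).getAsTree(PNG_FORMAT);
            NodeList entries = tree.getElementsByTagName("tEXtEntry");
            for (int i = 0; i < entries.getLength(); i++) {
                Element e = (Element) entries.item(i);
                if (keyword.equals(e.getAttribute("keyword"))) {
                    return e.getAttribute("value");
                }
            }
            return null;
        } finally {
            reader.dispose();
        }
    }

    public static void main(String[] args) throws IOException {
        BufferedImage img = new BufferedImage(4, 4, BufferedImage.TYPE_INT_ARGB);
        byte[] png = writeWithText(img, "Title", "Mandelbrot");
        System.out.println("Title = " + readText(png, "Title"));
    }
}
```

Reading the value back through an ImageReader, as readText does, is one way to check that the chunk really ended up in the file.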

We do this in the JGraphX project. Download the source code and have a look in the com.mxgraph.util.png package, there you'll find three classes for encoding that we copied from the Apache Batik sources. An example of usage is in com.mxgraph.examples.swing.editor.EditorActions in the saveXmlPng method. Slightly edited the code looks like:
mxPngEncodeParam param = mxPngEncodeParam
.getDefaultEncodeParam(image);
param.setCompressedText(new String[] { "mxGraphModel", xml });
// Saves as a PNG file
FileOutputStream outputStream = new FileOutputStream(new File(
filename));
try
{
mxPngImageEncoder encoder = new mxPngImageEncoder(outputStream,
param);
if (image != null)
{
encoder.encode(image);
}
}
finally
{
outputStream.close();
}
Here image is the BufferedImage that will form the .PNG and xml is the string we wish to place in the iTXt section. "mxGraphModel" is the key for that xml string (the section comprises some number of key/value pairs); obviously you replace that with your own key.
Also under com.mxgraph.util.png we've written a really simple class that extracts the iTXt without processing the whole image. You could apply the same idea to the tEXt chunk using mxPngEncodeParam.setText() instead of setCompressedText(), but the compressed text section does allow for considerably larger text sections.

Try the Sixlegs Java PNG library (http://sixlegs.com/software/png/).
It claims to have support for all chunk types and does private chunk handling.

Old question, but... PNGJ gives full control for reading and writing PNG chunks
