I am trying to get the metadata of an APNG image. I have been able to extract the different frames from an APNG file flawlessly, and I am using PNGJ (a really great standalone Java library for reading and writing PNG images), but I am not able to get the per-frame information, such as the delay of each frame.
At the moment I am only able to get the basic PNG image info stored in the header, using
PngReader pngr = FileHelper.createPngReader(File);
pngr.imgInfo;
But I don't know how to get the information stored in the fcTL chunks. How can I do that?
You omitted the information that you are using the PNGJ library. As I mentioned in the other answer, this library does not parse APNG chunks (fcTL, fdAT). It loads them (you can inspect them in the ChunksList property), but they will be instantiated as "UNKNOWN" chunks, so the binary data is left in raw form. If you want to look inside the content of the fcTL chunks, you'd either have to parse the binary yourself, or implement the logic for that chunk type yourself and register it in the reader (here's an example for a custom chunk).
Look at how you're currently reading the 4-byte integer 'seq' from fdAT.
You can read information from fcTL the same way.
Just keep in mind that some fields in fcTL are stored as 4 bytes, some as 2 bytes, and some as 1 byte.
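To make that concrete, here is a minimal sketch of parsing the raw payload of an fcTL chunk. The offsets follow the APNG specification; the `byte[]` is assumed to be the bare chunk data (e.g. obtained from PNGJ's unknown-chunk raw bytes), without the length/type/CRC framing. The class and method names are made up for the example.

```java
// Minimal sketch: parsing the raw payload of an fcTL chunk.
// Offsets follow the APNG specification.
public class FcTLParser {

    // PNG stores multi-byte integers big-endian.
    public static long readUInt32(byte[] d, int off) {
        return ((long) (d[off] & 0xFF) << 24)
             | ((d[off + 1] & 0xFF) << 16)
             | ((d[off + 2] & 0xFF) << 8)
             | (d[off + 3] & 0xFF);
    }

    public static int readUInt16(byte[] d, int off) {
        return ((d[off] & 0xFF) << 8) | (d[off + 1] & 0xFF);
    }

    public static void printFcTL(byte[] d) {
        long seq      = readUInt32(d, 0);   // sequence_number
        long width    = readUInt32(d, 4);   // frame width
        long height   = readUInt32(d, 8);   // frame height
        long xOffset  = readUInt32(d, 12);  // x_offset
        long yOffset  = readUInt32(d, 16);  // y_offset
        int delayNum  = readUInt16(d, 20);  // delay numerator
        int delayDen  = readUInt16(d, 22);  // delay denominator (0 means 100)
        int disposeOp = d[24] & 0xFF;
        int blendOp   = d[25] & 0xFF;
        double delaySeconds = (double) delayNum / (delayDen == 0 ? 100 : delayDen);
        System.out.println("frame " + seq + ": " + width + "x" + height
                + " at (" + xOffset + "," + yOffset + "), delay " + delaySeconds + "s"
                + ", dispose=" + disposeOp + ", blend=" + blendOp);
    }
}
```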
Related
I have a component that converts PDF documents to images, one image per page. Since the component uses converters producing in-memory images, it hits the JVM heap heavily and takes some time to finish conversions.
I'm trying to improve the overall performance of the conversion process, and found a native library with a JNI binding that converts PDFs to TIFFs. That library can only convert PDFs to single TIFF files (it requires intermediate file-system storage and does not even consume conversion streams), so the resulting TIFF files contain the converted pages embedded, rather than per-page images on the file system. Using the native library speeds up conversion drastically, but there is a real bottleneck: since I have to perform a source-page to destination-page conversion, I now must extract every page from the result file and write each of them elsewhere. A simple and naive approach with RenderedImages:
final SeekableStream seekableStream = new FileSeekableStream(tempFile);
final ImageDecoder imageDecoder = createImageDecoder("tiff", seekableStream, null);
...
// V--- heap is wasted here
final RenderedImage renderedImage = imageDecoder.decodeAsRenderedImage(pageNumber);
// ... do the rest ...
What I would really like is to extract the input stream of a specific page from the TIFF container file (tempFile) and redirect it elsewhere, without it being stored as an in-memory image. I imagine an approach similar to container processing, where I seek to a specific entry to extract data from it (something like ZIP file processing, etc.). But I couldn't find anything like that in ImageDecoder, or perhaps my expectations are wrong and I'm missing something important here...
Is it possible to extract TIFF container page input streams using JAI API or probably third-party alternatives? Thanks in advance.
I could be wrong, but I don't think JAI has support for splitting TIFFs without decoding the files to in-memory images. And, sorry for promoting my own library, but I think it does exactly what you need (the main part of the solution used to split TIFFs was contributed by a third party).
By using the TIFFUtilities class from com.twelvemonkeys.contrib.tiff, you should be able to split your multi-page TIFF to multiple single-page TIFFs like this:
TIFFUtilities.split(tempFile, new File("output"));
No decoding of the images is done; each IFD is simply split into a separate file, and the streams are written with corrected offsets and byte counts.
Files will be named output/0001.tif, output/0002.tif etc. If you need more control over the output name or have other requirements, you can easily modify the code. The code comes with a BSD-style license.
I am learning Inubit. How can I store images in a database using the Inubit tool set?
The question is more than a year old. I guess you solved it by now.
For all others coming here, let me sketch out the typical way you'd do that.
0. (optional) Compress data.
Depending on the image's compression (e.g. if it is GIF, PDF, or uncompressed TIFF rather than JPEG), you might want to compress it via a Compressor module first, to reduce the needed database space and improve overall performance in the next steps. Be sure to compress the binary data, not the base64-encoded string (see next step)!
1. Encode binary stream to base64.
Depending on where you get the image data from, chances are it is already base64-encoded, e.g. if you used a file connector to retrieve it from disk with the appropriate option checked, or used a web service connector. If you really have a binary data stream, convert it to base64 using an Encoder module (more self-documenting) or a variable assignment using the XPath function isxp:encode (more concise).
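Outside Inubit, the encoding step itself is plain base64. A minimal Java sketch (class and method names are illustrative, using `java.util.Base64` from Java 8+):

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

// Sketch of the encoding step: read the binary image from disk
// and turn it into a base64 string suitable for a TEXT/CLOB column.
public class ImageEncoder {

    public static String encodeBytes(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    public static String encodeFile(String path) throws Exception {
        // Reads the whole file into memory; fine for typical image sizes.
        return encodeBytes(Files.readAllBytes(Paths.get(path)));
    }
}
```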
2. Save the encoded data via a database connector.
Well, the details of doing this right are pretty much database specific. The cheap trick that should work on any database is storing the base64 string simply as a string in a TEXT / CLOB column. This will use roughly a third more space in the database than the original binary data, since base64 packs three bytes into four characters. Doing it right would mean constructing a forced SQL query in an XSLT that decodes the base64 string to binary and stores it. Here is some reference to how it can be done in Oracle.
Hope, this might be of some help.
Cheers,
Jörn
Jörn Willhöft
Willhöft IT-Beratung GmbH, Berlin, Germany
You do not store the image in the database, you only record the path to the image. The Image will be stored on the server.
Here is an example of how to store the path to the image : How to insert multiple images path to database
I am using android pdf writer
(apw) in my app successfully for the most part. However, when I try to include a high-resolution image in a pdf document, I get an out of memory exception.
Immediately before creating the pdf file, the library converts the content into a string (representing the raw pdf content), which is then converted to a byte array. The byte array is written to the file via a file output stream (see example via website).
The out of memory exception occurs when the string is generated, because representing all the pixels of a bitmap image in string format is very memory intensive. I could downsample the image using the Android API; however, it is essential that the images are put into the pdf at high resolution (~2000 x 1000).
There are many scanner-type apps which seem to be able to generate pdfs with high-res images, so there must be a way around it, surely. Granted, they may be using other libraries, but surely someone has figured out a way around it with this library, given that it is free and therefore popular(?)
I emailed the developer, but there was no response.
Potential solutions (I can think of) include:
Modifying the library to load a string representing e.g. the first 10% of the PDF, and writing to file chunk by chunk. (edit)
Modifying the library to output a string output stream, or some other output stream, to a temp file (or the final file) as the actual pdf content is being written by the pdfwriter object.
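The chunk-by-chunk idea could look roughly like this (a generic Java sketch, not APW-specific; it assumes you already have the full content string, so it only avoids the extra full-size byte array that `getBytes()` would create, not the string itself):

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Sketch: instead of content.getBytes() (which momentarily doubles
// memory use), stream the string out in fixed-size slices.
public class ChunkedWriter {

    public static void writeInChunks(String content, OutputStream out,
                                     int chunkSize) throws Exception {
        // ISO-8859-1 maps each char to one byte, matching raw pdf content.
        Writer w = new OutputStreamWriter(out, StandardCharsets.ISO_8859_1);
        for (int i = 0; i < content.length(); i += chunkSize) {
            int end = Math.min(i + chunkSize, content.length());
            w.write(content, i, end - i);  // no full byte[] copy is created
        }
        w.flush();
    }
}
```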
However as a relative java noob (and even more of a pdf specification noob), I am unable to understand the library well enough to do this myself.
Has anyone come across this problem and found a way around it? Anyone willing to hazard a suggestion, or take a look at the library itself even to see if there is a fix of some sort.
Thanks for your help.
nme32
Edit:
Logcat says the heap size is in the range of 40 to 60 MB before the crash. I understand (do correct me if not) that Android limits the memory available to apps depending on what else is running, though it is in the 50 MB ballpark, depending on the device.
When loading the image, I think APW essentially converts it to a bitmap, that is, it represents the image pixel by pixel and then puts it into string format, meaning it doesn't matter which image format you use; it may as well be a bitmap.
First of all, the resolution you are mentioning is very high, and I have already covered the issues related to images in Android in this answer.
Secondly, in case the first solution doesn't work for you, I would suggest a disk-based LruCache: store the chunks in that disk-based cache, then retrieve and use them. Here is an example of that.
Hope this helps. If it doesn't, comment on this answer and I will add more solutions.
There is a very large image (~200MB) in HDFS (block size 64MB). I want to know the following:
How to read the image in a mapReduce job?
Many topics suggest WholeInputFormat. Is there any other alternative and how to do it?
When WholeInputFormat is used, will there be any parallel processing of the blocks? I guess no.
If your block size is 64 MB, HDFS will most probably have split your image file into chunks and replicated them across the cluster, depending on your cluster configuration.
Assuming that you want to process your image file as 1 record rather than multiple blocks/line by line, here are a few options I can think of to process image file as a whole.
You can implement a custom input format and a record reader. The isSplitable() method in the input format should return false. The RecordReader.next(LongWritable pos, RecType val) method should read the entire file and set val to the file contents. This will ensure that the entire file goes to one map task as a single record.
You can sub-class the input format and override the isSplitable() method so that it returns false. This example shows how to create a sub-class of SequenceFileInputFormat to implement a NonSplittableSequenceFileInputFormat.
I guess it depends on what type of processing you want to perform. If you are trying to perform something that can be done by first splitting the big input into smaller image files, then independently processing the blocks, and finally stitching the output parts back into one large final output, then it may be possible. I'm no image expert, but suppose you want to turn a color image into grayscale: you could cut the large image into small images, convert them in parallel using MR, and once the mappers are done, stitch them back into one large grayscale image.
If you understand the format of the image, you may write your own record reader to help the framework understand the record boundaries, preventing corruption when they are input to the mappers.
Although you can use WholeFileInputFormat or SequenceFileInputFormat or something custom to read the image file, the actual issue (in my view) is to draw something out of the read file. OK, you have read the file, now what? How are you going to process your image to detect any object inside your mapper? I'm not saying it's impossible, but it would require a lot of work.
IMHO, you are better off using something like HIPI. HIPI provides an API for performing image processing tasks on top of MapReduce framework.
Edit:
If you really want to do it your way, then you need to write a custom InputFormat. Since images are not like text files, you can't use delimiters like \n for split creation. One possible workaround could be to create splits based on some given number of bytes. For example, if your image file is of 200MB, you could write an InputFormat which will create splits of 100MB(or whatever you give as a parameter in your Job configuration). I had faced such a scenario long ago while dealing with some binary files and this project had helped me a lot.
HTH
I am looking for the best possible way to create a file store. In this file store I will be keeping information such as contact details, and I will need to modify the details in the text file. I just wanted to get some opinions on what would be the best way to do this.
It depends on the size of the project.
For small projects: you can use something simple like XML, JSON, etc...
For bigger projects: you should use an SQL database, like MySQL or SQLite.
And it is always nice to use SQLite! It is a simple SQL database library that stores each database in a single file.
Since you have to use something simple with basic file IO, I would suggest taking one of the simple formats, like XML, JSON, or CSV. Or, to go one better, write your own binary files using DataOutputStream and DataInputStream.
Your file format could be something like this:
arbitrary number of bytes: fixed header (like: "PendoContactsFormat")
4 bytes (ie: int), number of contacts
(for each contact:)
2 bytes (ie: short), number of fields in this contact
(for each field in this contact:)
2 bytes: size of the field header
n bytes: field header
2 bytes: size of the field value
n bytes: field value
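A writer for the layout above could be sketched like this (the header text and the nested-array representation of contacts are just example choices, not a fixed API):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Sketch of a writer for the binary contact format described above.
public class ContactWriter {
    static final byte[] HEADER =
            "PendoContactsFormat".getBytes(StandardCharsets.US_ASCII);

    // Each contact is an array of {fieldHeader, fieldValue} string pairs.
    public static byte[] write(String[][][] contacts) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        out.write(HEADER);                  // fixed header bytes
        out.writeInt(contacts.length);      // 4 bytes: number of contacts
        for (String[][] contact : contacts) {
            out.writeShort(contact.length); // 2 bytes: fields in this contact
            for (String[] field : contact) {
                writeString(out, field[0]); // field header
                writeString(out, field[1]); // field value
            }
        }
        out.flush();
        return bos.toByteArray();
    }

    static void writeString(DataOutputStream out, String s) throws IOException {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        out.writeShort(b.length);           // 2 bytes: size
        out.write(b);                       // n bytes: content
    }
}
```

Reading it back is symmetrical with DataInputStream (readInt, readShort, readFully).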
CSV may be the right format if you want to be able to edit this file manually and let non-programmers edit it.
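If you go the CSV route, the only subtlety is quoting fields that contain commas, quotes, or newlines. A minimal sketch (class name illustrative):

```java
// Sketch: writing contact fields as one CSV row with minimal quoting,
// so the file stays readable and hand-editable.
public class CsvContacts {

    public static String toCsvRow(String... fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) sb.append(',');
            sb.append(escape(fields[i]));
        }
        return sb.toString();
    }

    // Quote only when needed; double any embedded quotes.
    static String escape(String s) {
        if (s.contains(",") || s.contains("\"") || s.contains("\n")) {
            return '"' + s.replace("\"", "\"\"") + '"';
        }
        return s;
    }
}
```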