GIF 87a encoding in Java

I've done some research into writing a GIF (version 87a) encoder, but there are some implementation specifics I can't find, especially about the data blocks.
First, I can only determine how many bytes each block will have once I reach either 255 bytes or the end of the image, right? There's no way to tell in advance how many bytes I will need.
Second, since GIF encoding is little-endian, how can I write the resulting integers of the LZW compression in Java, and how should I align them? I looked at ByteBuffer (I'm coding in Java), but since the integers won't necessarily fit in a whole number of bytes, it won't work, right? What should I do instead?
Finally, between data blocks, should I start a new LZW compression or just continue where I left off in the previous block?
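A sketch of the bit-packing part may help: GIF packs variable-width LZW codes least-significant-bit first, with no alignment between codes, so you accumulate bits in an integer and emit each completed byte. Class and method names below are illustrative, not from any particular library:

```java
import java.io.ByteArrayOutputStream;

// Packs variable-width LZW codes LSB-first into a byte stream, the way GIF
// expects. Names are illustrative.
class LzwBitPacker {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();
    private int bitBuffer = 0; // pending bits, least significant bit first
    private int bitCount = 0;  // number of valid bits in bitBuffer

    void writeCode(int code, int codeSize) {
        bitBuffer |= code << bitCount; // append above the pending bits
        bitCount += codeSize;
        while (bitCount >= 8) {        // emit every completed byte
            out.write(bitBuffer & 0xFF);
            bitBuffer >>>= 8;
            bitCount -= 8;
        }
    }

    byte[] finish() {
        if (bitCount > 0) out.write(bitBuffer & 0xFF); // zero-pad the last byte
        return out.toByteArray();
    }
}
```

The packed byte stream is then chopped into sub-blocks only at output time: write min(remaining, 255) as the length byte followed by that many bytes. The sub-blocks are purely a transport layer, so the LZW compression state continues uninterrupted across sub-block boundaries.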

Related

Maintaining LSB through JPG compilation - is it possible?

This is one of those "pretty sure we found the answer, but hoping we're wrong" questions. We are looking at a steganography problem and it's not pretty.
Situation:
We have a series of images. We want to mark them (watermark) so the watermarks survive a series of conditions. The kicker is, we are using a lossy format, JPG, rather than a lossless one such as PNG. Our watermarks need to survive screenshotting and, furthermore, need to be invisible to the naked eye. Finally, they need to contain at least 32 bytes of data (we expect them to be repeating patterns across an image, of course). Due to the above, we need to hide the information in the pixels themselves. I am trying a Least Significant Bit change, including using large blocks per "bit" (I tried both increments of 16, as these are the JPG compression algorithm's chunk sizes from what we understand, as well as various prime numbers) and reading the average of the resulting block. This sort of leads to requirements:
Must be .jpg
Must survive the jpg compression algorithm
Must survive screenshotting (assume screenshots are saved losslessly)
Problem:
JPG compression, even at 100% "minimum loss", changes the pixel values. E.g. if we draw a huge band across an image, setting the red channel to 255 in a block 64 pixels high, more than half the pixels are not 255 in the compiled image. This means that even using an average of the blocks yields a random LSB, rather than what we "encoded". Our current prototype can take a random image, compress the message into a bit-encoded string, and convert it to an X-by-X array which is then superimposed on the image using the LSB of one of the three color channels. This works and is detectable while it remains a BufferedImage, but once we convert to a JPG the compression destroys the message.
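For reference, the per-block LSB step described above looks roughly like this (a hypothetical sketch, not the actual prototype; block size and channel choice are illustrative):

```java
import java.awt.image.BufferedImage;

// Sketch of block-based LSB embedding: force the blue channel's least
// significant bit across a whole block, then recover the bit by majority
// vote over the block average. Names are illustrative.
class LsbEmbedder {
    static void embedBit(BufferedImage img, int x0, int y0, int block, int bit) {
        for (int y = y0; y < y0 + block; y++) {
            for (int x = x0; x < x0 + block; x++) {
                int rgb = img.getRGB(x, y);
                int blue = (rgb & 0xFE) | (bit & 1); // overwrite blue LSB
                img.setRGB(x, y, (rgb & 0xFFFFFF00) | blue);
            }
        }
    }

    static int readBit(BufferedImage img, int x0, int y0, int block) {
        int sum = 0;
        for (int y = y0; y < y0 + block; y++)
            for (int x = x0; x < x0 + block; x++)
                sum += img.getRGB(x, y) & 1; // blue LSB
        return sum * 2 >= block * block ? 1 : 0; // majority vote over the block
    }
}
```

This round-trips perfectly in memory, which matches the observation above: the failure comes only once the BufferedImage passes through JPG quantization.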
Question:
Is there a way to better control a JPG compression's pixel values? Or are we simply out of luck here and need to drop this avenue, either shifting to PNG output (unlikely) or needing to understand the JPG compression algorithm at length and use it to somehow determine LSB pattern outcomes? Preferably Java, but we are open to alternative languages if any can solve our problem (our current PoC is in Java).

Generate Image using Text

I visited this website,
https://xcode.darkbyte.ru/
Basically the website takes a text as Input and generates an Image.
It also takes an image as input and decodes it back to text.
I really wish to know what this is called and how it is done
I'd like to know the algorithm [preferably in Java]
Please help, Thanks in advance
There are many ways to encode a text (series of bytes) as an image, but the site you quoted does it in a pretty simple and straightforward way. And you can reverse-engineer it easily:
Up to 3 chars are coded as 1 pixel; 4 chars as 2 pixels -- we learn from this that only R(ed), G(reen) and B(lue) channels for each pixel are used (and not alpha/transparency channel).
We know PNG supports 8 bits per channel, and each ASCII char is 8 bits wide. Let's test if first char (first 8 bits) are stored in red channel.
Let's try z... Since z is relatively high in the ASCII table (122) and . is relatively low (46), we expect to get a reddish 1x1 PNG. And we do.
Let's try .z.. It should be greenish... And it is.
Similarly, for ..z we get a bluish pixel.
Now let's see what happens with a non-ASCII input. Try entering: ① (unicode char \u2460). The site html-encodes the string into ① and then encodes that ASCII text into the image as before.
Compression. When entering a larger amount of text, we notice the output is shorter than expected. It means the back-end is running some compression algorithm on the raw input before (or after?) encoding it as an image. By noticing that the resolution of the image and its maximum information content (HxWx3x8 bits) are smaller than the input, we can conclude the compression is done before encoding to the image, and not after (thus not relying on PNG compression). We could go further in detecting which compression algorithm is used by encoding the raw input with the common culprits like Huffman coding, Lempel-Ziv, LZW, DEFLATE, even Brotli, and comparing the output with the bytes from the image pixels. (Note we can't detect it directly by inspecting a magic prefix; chances are the author stripped everything but the raw compressed data.)
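A minimal sketch of the 3-chars-per-pixel packing deduced above (compression step omitted; names are mine, and the site's exact layout may differ):

```java
import java.awt.image.BufferedImage;
import java.nio.charset.StandardCharsets;

// Packs up to 3 ASCII bytes into each pixel's R, G, B channels, and unpacks
// them again. A sketch of the reverse-engineered scheme, not the site's code.
class PixelText {
    static BufferedImage encode(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        int pixels = (bytes.length + 2) / 3; // 3 chars per pixel, rounded up
        BufferedImage img = new BufferedImage(pixels, 1, BufferedImage.TYPE_INT_RGB);
        for (int i = 0; i < pixels; i++) {
            int r = bytes[3 * i] & 0xFF;
            int g = 3 * i + 1 < bytes.length ? bytes[3 * i + 1] & 0xFF : 0;
            int b = 3 * i + 2 < bytes.length ? bytes[3 * i + 2] & 0xFF : 0;
            img.setRGB(i, 0, (r << 16) | (g << 8) | b);
        }
        return img;
    }

    static String decode(BufferedImage img) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < img.getWidth(); i++) {
            int rgb = img.getRGB(i, 0);
            for (int shift : new int[] {16, 8, 0}) {
                int c = (rgb >> shift) & 0xFF;
                if (c != 0) sb.append((char) c); // zero bytes are padding
            }
        }
        return sb.toString();
    }
}
```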

Compressing unicode characters

I am using GZIPOutputStream in my java program to compress big strings, and finally storing it in database.
I can see that while compressing English text, I am achieving a 1/4 to 1/10 compression ratio (depending on the string value). So, for example, if my original English text is 100kb, then on average the compressed text will be somewhere around 30kb.
But when I am compressing unicode characters, the compressed string is actually occupying more bytes than the original string. Say for example, my original unicode string is 100kb, then the compressed version is coming out to 200kb.
Unicode string example: "嗨,这是,短信计数测试持续for.Hi这是短"
Can anyone suggest how I can achieve compression for Unicode text as well? And why is the compressed version actually bigger than the original version?
My compression code in Java:
ByteArrayOutputStream baos = new ByteArrayOutputStream();
try (GZIPOutputStream zos = new GZIPOutputStream(baos)) {
    zos.write(text.getBytes("UTF-8"));
} // close() finishes the deflate stream and writes the GZIP trailer
byte[] udpBuffer = baos.toByteArray();
Java's GZIPOutputStream uses the Deflate compression algorithm to compress data. Deflate is a combination of LZ77 and Huffman coding. According to Unicode's Compression FAQ:
Q: What's wrong with using standard compression algorithms such as Huffman coding or patent-free variants of LZW?
A: SCSU bridges the gap between an 8-bit based LZW and a 16-bit encoded Unicode text, by removing the extra redundancy that is part of the encoding (sequences of every other byte being the same) and not a redundancy in the content. The output of SCSU should be sent to LZW for block compression where that is desired.
To get the same effect with one of the popular general purpose algorithms, like Huffman or any of the variants of Lempel-Ziv compression, it would have to be retargeted to 16-bit, losing effectiveness due to the larger alphabet size. It's relatively easy to work out the math for the Huffman case to show how many extra bits the compressed text would need just because the alphabet was larger. Similar effects exist for LZW. For a detailed discussion of general text compression issues see the book Text Compression by Bell, Cleary and Witten (Prentice Hall 1990).
I was able to find this set of Java classes for SCSU compression on the unicode website, which may be useful to you, however I couldn't find a .jar library that you could easily import into your project, though you can probably package them into one if you like.
I don't really know Chinese, but as far as I know, GZIP compression depends on repeating sequences of text, and those repeating sequences are replaced with back-references (this is a very high-level explanation). This means that if you have the word "library" in 20 places in a string, the algorithm will store the word "library" once on the side and then note that it should appear at places x, y, z... So your original string may not have a lot of redundancy, and you cannot save a lot. Instead, you get more overhead than savings.
I'm not really a compression expert, and I don't know the details, but this is the basic principle of the compression.
P.S
This question might just be a duplicate of: Why gzip compressed buffer size is greater then uncompressed buffer?

How to write a TIFF from a 2D array of floats in Java?

In a Java program I have a 1024 x 1024 array of floats. How can I write a TIFF file corresponding to the image represented by this array?
Clarifications:
I am asking for a code snippet illustrating how to write a TIFF corresponding to the array of floats.
I'm looking for a grayscale image.
I know how to convert the 1024 x 1024 array of floats into any other 1024 x 1024 array of numerical values; e.g. if the method you have in mind requires, say, 1024 x 1024 floats in the range [0, 1.0), no problem, I know how to convert my data so that this constraint holds.
Thanks!
kjo
The problem that you will have is that, while it is possible to have floating point values for pixel data in TIFF, this is not part of the baseline specification. TIFF is a mushy enough spec to allow floating point samples, but not to standardize their semantic meaning. For example, I had a customer who had floating point samples generated by a Java app (using ImageJ, I believe) and expected us to read them correctly. ImageJ had put in a badly serialized hashtable into one of the description strings so I had to give them code that would work for that sample file but probably for no others. Don't be that Java app. And if you're going to use ImageJ to write floating point TIFFs, normalize your data between 0 and 1, because then I can guarantee that at least my tools will read it correctly without depending on semantic meaning.
While 16-bit-per-channel samples aren't part of the baseline spec either, they are more likely to be recognized by current TIFF consumers. So you might be happier in the long run writing grayscale with 16-bit samples in the range 0..65535, if you're hell-bent on writing TIFF.
If you think that you're going to write a non-compliant TIFF, just write your own file format and publish the spec and the reading and writing code. If you shoe-horn it into TIFF, you are creating a new format anyway and you will break most TIFF consuming applications as a side-effect. Which is better for the ecosystem?
Remember, when you write a bad TIFF, an angel gets set on fire.
AFAIU JAI can write TIFF files.
The canonical standard for handling TIFF images is the libtiff library, which is written in C.
It is possible to call native C code from Java.
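Following the 16-bit-grayscale suggestion above, here is a sketch that scales floats in [0, 1.0) to 16-bit samples and lets ImageIO write the TIFF. It assumes Java 9+, whose ImageIO ships a TIFF plugin; on older JVMs you would need JAI or a third-party ImageIO TIFF plugin instead. The class name is mine:

```java
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

// Writes a 2D float array (values in [0, 1.0)) as a 16-bit grayscale TIFF.
class FloatTiffWriter {
    static void write(float[][] data, File out) throws IOException {
        int h = data.length, w = data[0].length;
        BufferedImage img = new BufferedImage(w, h, BufferedImage.TYPE_USHORT_GRAY);
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                img.getRaster().setSample(x, y, 0, (int) (data[y][x] * 65535f));
        if (!ImageIO.write(img, "TIFF", out))
            throw new IOException("no TIFF writer registered on this JVM");
    }
}
```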

Huffman coding in Java

I want to encode every file with a Huffman code.
I have found the length in bits of each symbol's Huffman code.
Is it possible to write such codes to a file in Java: are there any existing classes that read and write to a file bit by bit, rather than with a minimum unit of a char?
You could create a BitSet to store your encoding as you are creating it and simply write the String representation to a file when you are done.
You really don't want to write single bits to a file, believe me. Usually we define a byte buffer, build the "file" in memory and, after all work is done, write the complete buffer. Otherwise it would take forever (nearly).
If you need a fast bit vector, then have a look at the colt library. That's pretty convenient if you want to write single bits and don't do all this bit shifting operations on your own.
I'm sure there are Huffman classes out there, but I'm not immediately aware of where they are. If you want to roll your own, two ways to do this spring to mind immediately.
The first is to assemble the bit strings in memory by using mask and shift operators, accumulate the bits into larger data objects (i.e. ints or longs), and then write those out to a file with standard streaming.
The second, more ambitious and self-contained idea would be to write an implementation of OutputStream that has a method for writing a single bit and then this OutputStream class would do the aforementioned buffering/shifting/accumulating itself and probably pass the results down to a second, wrapped OutputStream.
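The second idea can be sketched in a few lines; this is a minimal illustration (MSB-first bit order and the class name are my choices), buffering bits and flushing whole bytes to the wrapped stream:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Minimal bit-level writer: accumulates single bits MSB-first and emits each
// completed byte to the wrapped OutputStream. close() zero-pads the last byte.
class BitOutputStream implements AutoCloseable {
    private final OutputStream out;
    private int current = 0; // bits accumulated so far
    private int filled = 0;  // how many of the 8 slots are used

    BitOutputStream(OutputStream out) { this.out = out; }

    void writeBit(int bit) throws IOException {
        current = (current << 1) | (bit & 1);
        if (++filled == 8) {
            out.write(current);
            current = 0;
            filled = 0;
        }
    }

    @Override
    public void close() throws IOException {
        while (filled != 0) writeBit(0); // pad the final partial byte
        out.close();
    }
}
```

Note the padding on close: a real Huffman decoder needs some way (a length prefix or an end-of-stream symbol) to know where the payload ends and the padding begins.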
Try writing a bit vector in java to do the bit representation: it should allow you to set/reset the individual bits in a bit stream.
The bit stream can thus hold your Huffman encoding. This is the best approach, and lightning fast too.
Huffman sample analysis here
You can find a working (and fast) implementation here: http://code.google.com/p/kanzi/source/browse/src/kanzi/entropy/HuffmanTree.java
