Different compressed output but same result with zlib - Java

I have a duplicate check that doesn't work because my zlib hashes differ for the same file.
I receive AES-encrypted data (an XML file) from my client.
I decrypt the data (with Cipher) and get a byte array containing the data, zlib-compressed and Base64-encoded.
I decode the Base64, inflate the zlib data, and get my XML file.
If I do it again, I get a different Base64 string out of the Cipher. I decode it, inflate it, and get exactly the same XML as before.
Because of this, my duplicate check doesn't work: the Base64 value is different, and I don't understand why.
My Base64 value is around 3000 characters, and only the last 10-15 characters differ.
The current software is written in PHP and works fine. On the new Java server we get this error.
So the client data is correct; Java is doing something I can't explain.
Any ideas?
Thanks

Your question is rather difficult to parse, but I think what you're saying is that if you decompress something compressed by PHP and then recompress it with Java, you get different compressed data. When you decompress that data, you get exactly the original uncompressed data.
If that is correct, then there is no problem. There is no assurance that a different compressor will produce the same result, or even the same compressor, since you can have different settings, or even the same compressor with the same settings, since you could be using a different version. "I decode it, inflate it, and get exactly the same XML as before" means that all the compressors and decompressors are doing what they are supposed to do. There is no assurance that decompression followed by compression will ever produce exactly the same result. The only assurance of a lossless compressor is that compression followed by decompression will produce exactly the same result.
You are creating a problem for yourself with the duplicate check. Checking the compressed data does not check for duplicated uncompressed data. If you want to look for duplicates, or if you want to check the integrity of your compression, transmission, and decompression process, then you need to do both using the uncompressed data, not the compressed data.
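A minimal sketch of that idea in Java, assuming the payload is a Base64 string wrapping a zlib stream as described in the question (the fingerprint method name and the choice of SHA-256 are illustrative, not from the original post):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.security.MessageDigest;
    import java.util.Base64;
    import java.util.zip.InflaterInputStream;

    public class DuplicateCheck {

        // Decode the Base64 text, inflate the zlib stream, and hash the
        // uncompressed XML. Two deliveries of the same document yield the
        // same digest even when their compressed bytes differ.
        static String fingerprint(String base64Zlib) throws Exception {
            byte[] compressed = Base64.getDecoder().decode(base64Zlib);

            ByteArrayOutputStream xml = new ByteArrayOutputStream();
            try (InflaterInputStream in =
                    new InflaterInputStream(new ByteArrayInputStream(compressed))) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = in.read(buf)) != -1) {
                    xml.write(buf, 0, n);
                }
            }

            byte[] digest = MessageDigest.getInstance("SHA-256").digest(xml.toByteArray());
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        }
    }

Comparing these digests detects duplicate XML regardless of how it happened to be compressed.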

Related

Obtain a string from the compressed data and vice versa in Java

I want to compress a string (an XML document) in Java and store it in a Cassandra DB as a varchar. I should be able to decompress it when reading it back from the DB. I looked into GZIP and LZ4, and both return a byte array on compression.
My goal is to obtain a string from the compressed data which can also be used to decompress and get back the original string.
What is the best possible approach?
I don't see any good reason for you to compress your data: Cassandra can do it for you transparently (it will LZ4 your data by default). So if your goal is to reduce your data footprint, you have a non-existent problem, and I'd feed the XML document directly to C*.
By the way, all compression algorithms take an array of bytes and produce an array of bytes. As a solution, you could apply something like Base64 encoding to your compressed byte array. On decompression, reverse the logic: Base64-decode your string and then apply your decompression algorithm.
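If you do end up compressing in the application anyway, here is a sketch of that round trip using the JDK's GZIP streams and java.util.Base64 (Java 8+):

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;
    import java.util.zip.GZIPInputStream;
    import java.util.zip.GZIPOutputStream;

    public class XmlCodec {

        // Compress the XML with GZIP, then Base64-encode the compressed
        // bytes so they can be stored in a varchar column.
        static String compressToString(String xml) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
                gzip.write(xml.getBytes(StandardCharsets.UTF_8));
            }
            return Base64.getEncoder().encodeToString(bytes.toByteArray());
        }

        // Reverse the logic: decode the Base64 string, then gunzip.
        static String decompressFromString(String stored) throws IOException {
            byte[] compressed = Base64.getDecoder().decode(stored);
            ByteArrayOutputStream xml = new ByteArrayOutputStream();
            try (GZIPInputStream gzip =
                    new GZIPInputStream(new ByteArrayInputStream(compressed))) {
                byte[] buf = new byte[4096];
                int n;
                while ((n = gzip.read(buf)) != -1) {
                    xml.write(buf, 0, n);
                }
            }
            return new String(xml.toByteArray(), StandardCharsets.UTF_8);
        }
    }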
Not enough reputation to comment, so posting as an answer. If you want a string back, then whether you get significant compression will depend on your data. A very simple solution might be something like Java compressing Strings, but that would only work if your string contains only letters and no numbers. You can modify this solution to work for most characters, but if you don't have repeating characters, you might actually get a larger string than your original one.

How may I store images in a database using the Inubit tool set?

I am learning Inubit. I want to know how I can store images in a database using the Inubit tool set.
The question is more than a year old. I guess you solved it by now.
For all others coming here, let me sketch out the typical way you'd do that.
0. (optional) Compress data.
Depending on the compression of the image (e.g. it is a GIF, PDF, or uncompressed TIFF, and not a JPEG), you might want to compress it via a Compressor module first, to reduce the needed database space and increase overall performance in the next steps. Be sure to compress the binary data and not the Base64-encoded string (see next step)!
1. Encode the binary stream to Base64.
Depending on where you get the image data from, chances are that it is already Base64-encoded, e.g. you used a file connector to retrieve it from disk with the appropriate option checked, or you used a web service connector. If you really do have a binary data stream, convert it to Base64 using an encoder module (better self-documenting) or a variable assignment using the XPath function isxp:encode (more concise).
2. Save the encoded data via a database connector.
Well, the details of doing this right are pretty much database-specific. The cheap trick that should work on any database is storing the Base64 string simply as a string in a TEXT / CLOB column. This wastes space in the database, roughly a third more than the original binary data (or more, depending on the column's character encoding), since Base64 is poorly packed. Doing it right would mean constructing a forced SQL query in an XSLT that decodes the Base64 string to binary and stores it. Here is some reference to how it can be done in Oracle.
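Outside of the Inubit-specific modules, one portable variant of "doing it right" is to decode the Base64 in the application and bind the raw bytes to a BLOB parameter, so the database stores the compact binary form rather than the inflated text. A minimal JDBC sketch; the images table and its columns are hypothetical:

    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.Base64;

    public class ImageStore {

        // Decode the Base64 string back to raw bytes and bind them to a
        // BLOB parameter, avoiding the Base64 storage overhead.
        static void storeImage(Connection conn, String id, String base64Image)
                throws SQLException {
            byte[] raw = Base64.getDecoder().decode(base64Image);
            String sql = "INSERT INTO images (id, data) VALUES (?, ?)"; // hypothetical table
            try (PreparedStatement ps = conn.prepareStatement(sql)) {
                ps.setString(1, id);
                ps.setBytes(2, raw);
                ps.executeUpdate();
            }
        }
    }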
Hope this is of some help.
Cheers,
Jörn
Jörn Willhöft
Willhöft IT-Beratung GmbH, Berlin, Germany
You do not store the image in the database; you only record the path to the image. The image will be stored on the server.
Here is an example of how to store the path to the image : How to insert multiple images path to database

Storing files on Database

I have to save a file in any format (XLS, PDF, DOC, JPG, ...) in a database using Java. In my experience I would do this by storing the binary data of the file in a BLOB-type field, but someone told me that an alternative is encoding the binary data as text using Base64 and storing the string in a TEXT-type field. Which is the best option to perform this task?
Thanks.
Paul Manjarres
BLOB would be better, simply because you can use a byte[] data type, and you don't have to encode/decode from Base64. No reason to use Base64 for simple storage.
The argument for using BLOB is that it takes fewer CPU cycles, less disk and network i/o, less code, and reduces the likelihood of bugs:
As Will Hartung says, using BLOB enables you to skip the encode/decode steps, which saves CPU cycles. Moreover, there are many Java libraries for Base64 encoding and decoding, and there are nuances between implementations (e.g. PEM line wraps). This means that, to be safe, the same library should be used for encoding and decoding, which creates an unnecessary coupling between the application that creates the record and the application that reads it.
The encoded output will be larger than the raw bytes, which means it will take up more disk space (and network i/o).
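To make the comparison concrete, here is a sketch of the BLOB route in plain JDBC; the documents table and its columns are hypothetical:

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.sql.Connection;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;

    public class FileBlobDao {

        // Store the raw file bytes directly in a BLOB column: no Base64
        // encode/decode step, no size inflation, no library coupling.
        static void save(Connection conn, String name, Path file) throws Exception {
            try (PreparedStatement ps = conn.prepareStatement(
                    "INSERT INTO documents (name, content) VALUES (?, ?)")) {
                ps.setString(1, name);
                ps.setBytes(2, Files.readAllBytes(file));
                ps.executeUpdate();
            }
        }

        // Read the bytes back exactly as they were written.
        static byte[] load(Connection conn, String name) throws Exception {
            try (PreparedStatement ps = conn.prepareStatement(
                    "SELECT content FROM documents WHERE name = ?")) {
                ps.setString(1, name);
                try (ResultSet rs = ps.executeQuery()) {
                    return rs.next() ? rs.getBytes(1) : null;
                }
            }
        }
    }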
Use BLOB to put them in the database.
FILE to BLOB = the DB will not query the content and treats it as, well... a meaningless binary blob, regardless of its content. The DB knows this field may be 1 KB or 1 GB and allocates resources accordingly.
FILE to TEXT = the DB can query this thing. Strings can be searched, replaced, and modified within the file. But this time the DBMS will spend more resources to make it work: a field may hold 100 characters of text or 1 million, and files can use any text encoding, so invalid characters may be lost due to table/DB encoding settings. No need to use this if the content of the files will not be used in SQL queries.
BASE64 = converts any content to lovely, super-valid text. A workaround to bypass every compatibility issue. Store it anywhere, print it, telegraph it, write it on paper, convert your favorite selfie to a private key. The output will be meaningless and bigger, but it will be ordinary text.

Efficient JSON encoding for data that may be binary, but is often text

I need to send a JSON packet across the wire with the contents of an arbitrary file. This may be a binary file (like a ZIP file), but most often it will be plain ASCII text.
I'm currently using Base64 encoding, which handles all files, but it increases the size of the data significantly, even if the file is ASCII to begin with. Is there a more efficient way I can encode the data, other than manually checking for any non-ASCII characters and then deciding whether or not to Base64-encode it?
I'm currently writing this in Python, but will probably need to do the same in Java, C# and C++, so an easily portable solution would be preferable.
Use quoted-printable encoding. Any language should support that.
http://en.wikipedia.org/wiki/Quoted-printable
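A sketch of that in Java, assuming Apache Commons Codec's QuotedPrintableCodec is on the classpath (Java SE itself ships no quoted-printable codec):

    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    import org.apache.commons.codec.net.QuotedPrintableCodec;

    public class QuotedPrintableDemo {
        public static void main(String[] args) throws Exception {
            QuotedPrintableCodec codec = new QuotedPrintableCodec();

            // Plain ASCII passes through essentially unchanged, so the
            // size penalty for the common case is near zero.
            byte[] text = "plain ASCII stays readable".getBytes(StandardCharsets.US_ASCII);
            System.out.println(new String(codec.encode(text), StandardCharsets.US_ASCII));

            // Bytes outside the printable range become =XX escapes, so
            // binary input still survives the round trip.
            byte[] binary = {0x50, 0x4B, 0x03, 0x04, (byte) 0xFF, 0x00};
            byte[] back = codec.decode(codec.encode(binary));
            System.out.println(Arrays.equals(binary, back)); // true
        }
    }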

What are the different zlib compression methods and how do I force the default in Java's Deflater?

I am using DeflaterOutputStream to compress data as part of a proprietary archive file format. I'm then using JCraft's zlib code to decompress that data on the other end. The other end is a J2ME application, hence my reliance on third-party zip decompression code rather than the standard Java libraries.
My problem is that some files zip and unzip just fine, and others do not.
For the ones that do not, the compression method in the first byte of the data seems to be '5'.
From my reading up on zlib, I understand that a value of '8' indicates the default deflate compression method. Any other value appears to be unacceptable to the decompressor.
What I'd like to know is:
What does '5' indicate?
Why does DeflaterOutputStream use different compression methods some of the time?
Can I stop it from doing that somehow?
Is there another way to generate deflated data that uses only the default compression method?
It might help to pin down exactly what you're looking at.
Before the whole of your data there's usually a two-byte zlib header. As far as I'm aware, the lower 4 bits of the first byte should ALWAYS be 8. If you initialise your Deflater in nowrap mode, then you won't get these two bytes at all (though your other library must then expect not to receive them).
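A quick way to see what you are actually looking at is to inspect the first byte the Deflater emits; in the default (wrapped) mode, the lower four bits should be 8. A small sketch:

    import java.util.zip.Deflater;

    public class HeaderCheck {
        public static void main(String[] args) {
            Deflater deflater = new Deflater(); // default: zlib wrapper on (nowrap = false)
            deflater.setInput("some test data".getBytes());
            deflater.finish();

            byte[] out = new byte[64];
            deflater.deflate(out);
            deflater.end();

            // Lower 4 bits of the first zlib header byte: 8 = deflate.
            System.out.println("compression method = " + (out[0] & 0x0F));
        }
    }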
Then, before each individual block of data, there's a 3-bit block header (notice, defined as a number of bits, not a whole number of bytes). Conceivably, you could have a block starting with byte 5, which would indicate a compressed block that is the final block, or with byte 8, which would be a non-compressed, non-final block.
When you create your DeflaterOutputStream, you can pass a Deflater of your choosing to the constructor, and on that Deflater there are some options you can set. The level is essentially the amount of look-ahead the compression uses when searching for repeated patterns in the data; on the off chance, you might try setting this to a non-default value and see if it makes any difference to whether your decompressor can cope.
The strategy setting (see the setStrategy() method) can be used in some special circumstances to tell the deflater to apply only Huffman compression. This can occasionally be useful where you have already transformed your data so that the frequencies of values are near negative powers of 2 (i.e. the distribution that Huffman coding works best on). I wouldn't expect this setting to affect whether a library can read your data, but just on the off chance, you might try changing it.
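For reference, here is a sketch of how the settings from the last two paragraphs are wired together; the output file name is illustrative:

    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.util.zip.Deflater;
    import java.util.zip.DeflaterOutputStream;

    public class DeflateConfig {
        public static void main(String[] args) throws IOException {
            // Explicitly request the defaults: deflate method, default level,
            // zlib header included (nowrap = false).
            Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, false);

            // Experiments suggested above: try a non-default level...
            deflater.setLevel(Deflater.BEST_COMPRESSION);
            // ...or restrict the deflater to Huffman coding only.
            deflater.setStrategy(Deflater.HUFFMAN_ONLY);

            try (DeflaterOutputStream out =
                    new DeflaterOutputStream(new FileOutputStream("data.z"), deflater)) {
                out.write("payload".getBytes());
            }
            deflater.end(); // a user-supplied Deflater is not ended by close()
        }
    }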
In case it's helpful, I've written a little bit about configuring Deflater, including the use of Huffman-only compression on transformed data. I must admit, whatever options you choose, I'd really expect your library to be able to read the data. If you're really sure your compressed data is correct (i.e. zlib/Inflater can re-read your file), then you might consider just using another library...!
Oh, and stating the bleeding obvious, but I'll mention it anyway: if your data is fixed, you can of course just stick it in the jar, and it'll effectively be deflated/inflated "for free". Ironically, your J2ME device MUST be able to decode zlib-compressed data, because that's essentially the format the jar is in...
