Obtain a string from the compressed data and vice versa in Java

I want to compress a string (an XML document) in Java and store it in a Cassandra database as varchar. I should be able to decompress it when reading it back from the database. I looked into GZIP and LZ4, and both return a byte array when compressing.
My goal is to obtain a string from the compressed data which can also be used to decompress and get back the original string.
What is the best possible approach?

I don't see any good reasons for you to compress your data: Cassandra can do it for you transparently (it will LZ4 your data by default). So, if your goal is to reduce your data footprint then you have a non-existent problem, and I'd feed the XML document directly to C*.
By the way, compression algorithms take an array of bytes and produce an array of bytes. As a solution, you could apply something like Base64 encoding to your compressed byte array. On decompression, reverse the logic: Base64-decode your string and then apply your decompression algorithm.
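A minimal sketch of the round trip described above, assuming GZIP from `java.util.zip`, the JDK's `java.util.Base64`, and Java 9+ (for `readAllBytes`). The class and method names are made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; the names are illustrative, not from any library.
final class CompressedText {

    // String -> UTF-8 bytes -> GZIP -> Base64 text that fits a varchar/text column.
    static String compress(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The GZIP stream is closed here, so the trailer has been written.
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    // Base64 text -> GZIP bytes -> original string (reverse of compress).
    static String decompress(String encoded) {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        try (GZIPInputStream gzip =
                 new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Keep in mind that Base64 gives back about a third of what the compression saved, so this only pays off for documents that compress well.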

Not enough reputation to comment, so posting as an answer. If you want a string back, how much compression you get will depend on your data. A very simple solution might be something like the one in "Java compressing Strings", but that only works if your string contains letters and no digits. You can modify that solution to work for most characters, but if your string has no repeating characters you might actually end up with a string larger than the original.

Related

Different same but same result with zlib

My duplicate check doesn't work because the zlib output is different for the same file.
I get AES-encrypted data (an XML file) from my client.
I decrypt the data (with Cipher) and get a byte array of the data, which is zlib-compressed and base64 encoded.
I decode the base64, inflate it with zlib, and get my XML file.
If I do it again, I got a different base64 out of the Cipher. I decode it, unzlib and got exactly the same XML as below.
Because of this, my duplicate check fails: the base64 value is different and I don't understand why.
The base64 value is around 3000 characters long and only the last 10-15 characters differ.
The existing software is written in PHP and works fine there. On the new server, written in Java, we get this error.
So the client data is correct; Java is doing something I can't explain.
Any ideas?
Thanks
Your question is rather difficult to parse, but I think what you're saying is that if you decompress something compressed by PHP and then recompress it with Java, you get different compressed data. When you decompress that data, you get exactly the original uncompressed data.
If that is correct, then there is no problem. There is no assurance that a different compressor will produce the same result, or even the same compressor since you can have different settings, or even the same compressor with the same settings, since you could be using a different version. "I decode it, unzlib and got exactly the same XML as below.", means that all the compressors and decompressors are doing what they are supposed to do. There is no assurance that decompression followed by compression will ever produce exactly the same result. The only assurance of a lossless compressor is that compression followed by decompression will produce exactly the same result.
You are creating a problem for yourself with "I have duplicate check". Checking the compressed data does not check for duplicated uncompressed data. If you want to look for duplicates, or if you want to check the integrity of your compression, transmission, and decompression process, then you need to do both using the uncompressed data, not the compressed data.
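To illustrate that last point, here is a rough sketch of a duplicate check that fingerprints the uncompressed bytes instead of the compressed ones. It assumes zlib-format data (the default for `java.util.zip.Deflater` and `InflaterInputStream`), SHA-256 as the hash, and Java 9+. The class and method names are invented for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper; names are illustrative only.
final class ContentFingerprint {

    // Compress with an explicit level, to mimic two differently configured compressors.
    static byte[] deflate(byte[] data, int level) {
        Deflater deflater = new Deflater(level);
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(buffer, deflater)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            deflater.end(); // a caller-supplied Deflater is not released by close()
        }
        return buffer.toByteArray();
    }

    static byte[] inflate(byte[] zlibData) {
        try (InflaterInputStream in =
                 new InflaterInputStream(new ByteArrayInputStream(zlibData))) {
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Hash the *uncompressed* content, so the compressor's settings do not matter.
    static String fingerprint(byte[] zlibData) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(inflate(zlibData));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Two payloads produced with `deflate(xml, 1)` and `deflate(xml, 9)` may differ byte for byte, yet `fingerprint` returns the same value for both because it only looks at what comes out after inflating.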

Compressing content of Java string variable

I have a Java application and I would like to compress the content of a certain String variable (it's a huge JSON). I have seen a lot of examples that compress a String into a byte array, but my application has to (optionally!) send the compressed data as a REST response (I'm using Java Spark), so I would like to compress that String into another (smaller) string.
I'm not sure that's really possible, which is why I'm here :)
Why don't I want to send a byte array over the network? Because my response has two parts: metadata and the actual data returned from the DB. I would like to keep the metadata readable and compress only the actual data.
Is there a way to achieve this?

How may I store images in a database using the Inubit tool set?

I am learning Inubit. I want to know, how may I store images in a database using the Inubit tool set?
The question is more than a year old. I guess you solved it by now.
For all others coming here, let me sketch out the typical way you'd do that.
0. (optional) Compress data.
Depending on how the image is already compressed (e.g. it's a GIF, PDF, or uncompressed TIFF rather than a JPEG), you might want to compress it via a Compressor module first to reduce the database space needed and improve overall performance in the next steps. Be sure to compress the binary data and not the base64-encoded string (see next step)!
1. Encode binary stream to base64.
Depending on where you get the image data from, chances are that it is already base64 encoded, e.g. if you used a file connector to retrieve it from disk with the appropriate option checked, or used a web service connector. If you really have a binary data stream, convert it to base64 using an encoder module (better self-documenting) or using a variable assignment with the XPath function isxp:encode (more concise).
2. Save the encoded data via a database connector.
Well, the details of doing this right are pretty much database specific. The cheap trick that should work on any database is to store the base64 string simply as a string in a TEXT / CLOB column. This takes roughly a third more characters than the original binary data has bytes, since base64 turns every 3 bytes into 4 characters (and possibly more space, depending on the column's character encoding). Doing it right would mean constructing a forced SQL query in an XSLT that decodes the base64 string to binary and stores it. Here is some reference to how it can be done in Oracle.
Hope, this might be of some help.
Cheers,
Jörn
Jörn Willhöft
Willhöft IT-Beratung GmbH, Berlin, Germany
You do not store the image in the database, you only record the path to the image. The Image will be stored on the server.
Here is an example of how to store the path to the image : How to insert multiple images path to database

Storing files on Database

I have to save a file in any format (XLS, PDF, DOC, JPG, ...) in a database using Java. In my experience, I would do this by storing the binary data of the file in a BLOB field. Someone told me that an alternative is to encode the binary data as text using BASE64 and store the string in a TEXT field. Which one is the best option to perform this task?
Thanks.
Paul Manjarres
BLOB would be better, simply because you can use a byte[] data type, and you don't have to encode/decode from BASE64. No reason to use BASE64 for simple storage.
The argument for using BLOB is that it takes fewer CPU cycles, less disk and network i/o, less code, and reduces the likelihood of bugs:
As Will Hartung says, using BLOB enables you to skip the encode/decode steps, which reduces CPU cycles. Moreover, there are many Java libraries for Base64 encoding and decoding, and there are nuances between implementations (e.g. PEM line wraps). This means that, to be safe, the same library should be used for encoding and decoding, which creates an unnecessary coupling between the application that creates the record and the application that reads it.
The encoded output will be larger than the raw bytes, which means it will take up more disk space (and network i/o).
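Both points (the size overhead and the line-wrap nuance) can be seen with the JDK's own `java.util.Base64`. This is just a small sketch; the class and method names are invented for the example:

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class; names are illustrative only.
final class Base64Overhead {

    // Length of the unwrapped ("basic") encoding: 4 characters per 3 input bytes.
    static int basicLength(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw).length();
    }

    // Length of the MIME encoding, which also inserts CRLF line breaks.
    static int mimeLength(byte[] raw) {
        return Base64.getMimeEncoder().encodeToString(raw).length();
    }

    // The basic decoder rejects the line breaks written by the MIME encoder.
    static boolean basicDecoderAcceptsMime(byte[] raw) {
        String wrapped = Base64.getMimeEncoder().encodeToString(raw);
        try {
            Base64.getDecoder().decode(wrapped);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    // The MIME decoder skips the line breaks and restores the original bytes.
    static boolean mimeRoundTrips(byte[] raw) {
        String wrapped = Base64.getMimeEncoder().encodeToString(raw);
        return Arrays.equals(Base64.getMimeDecoder().decode(wrapped), raw);
    }
}
```

For a 3000-byte input, `basicLength` is 4000, and the MIME variant is a bit longer still because of the added line breaks. A reader that uses the basic decoder on MIME-wrapped text fails, which is the kind of coupling between writer and reader mentioned above.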
Use BLOB to put them in database
FILE to BLOB = DB will not query the content and treat it as, as a ... well ... a meaningless binary BLOB regardless of its content. DB knows this field may be 1KB or 1GB and allocates resources accordingly.
FILE to TEXT = DB can query this thing. Strings can be searched, replaced, and modified in the file. But this time the DBMS will spend more resources to make this thing work. There may be a 100 char long text inside a field which may or may not be storing 1 million char long text. Files can have any kind of text encoding, and invalid characters may be lost due to table/DB encoding settings. No need to use this if the content of the files will not be used in SQL queries.
BASE64 = Converts any content to a lovely super valid text. A work around to bypass every compatibility issue. Store anywhere, print it, telegraph it, write it on a paper, convert your favorite selfie to a private key. Output will be meaningless and bigger but it will be an ordinary text.

Efficient JSON encoding for data that may be binary, but is often text

I need to send a JSON packet across the wire with the contents of an arbitrary file. This may be a binary file (like a ZIP file), but most often it will be plain ASCII text.
I'm currently using base64 encoding, which handles all files, but it increases the size of the data significantly - even if the file is ASCII to begin with. Is there a more efficient way I can encode the data, other than manually checking for any non-ASCII characters and then deciding whether or not to base64-encode it?
I'm currently writing this in Python, but will probably need to do the same in Java, C# and C++, so an easily portable solution would be preferable.
Use quoted-printable encoding. Any language should support that.
http://en.wikipedia.org/wiki/Quoted-printable
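Since the asker mentions Java as a later target, here is a deliberately simplified sketch of the quoted-printable idea. It is not a complete RFC 2045 implementation: it leaves out soft line breaks and the trailing-whitespace rule, and it escapes CR, LF and tab as well. The class name is made up, and a real project would more likely use an existing library:

```java
import java.io.ByteArrayOutputStream;

// Hypothetical, simplified quoted-printable codec (no soft line breaks).
final class SimpleQuotedPrintable {

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Printable ASCII (except '=') and spaces pass through; everything else becomes =XX.
    static String encode(byte[] data) {
        StringBuilder out = new StringBuilder(data.length);
        for (byte b : data) {
            int v = b & 0xFF;
            boolean literal = (v >= 33 && v <= 126 && v != '=') || v == ' ';
            if (literal) {
                out.append((char) v);
            } else {
                out.append('=').append(HEX[v >>> 4]).append(HEX[v & 0x0F]);
            }
        }
        return out.toString();
    }

    // Reverse of encode: =XX becomes one byte, any other character is taken as-is.
    static byte[] decode(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '=') {
                out.write(Integer.parseInt(text.substring(i + 1, i + 3), 16));
                i += 2;
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }
}
```

Plain ASCII text comes out unchanged, so there is no overhead in the common case. The trade-off is that every non-printable byte takes three characters, so a mostly binary file such as a ZIP can end up larger than its base64 form.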
