Decompressing in Java a string compressed in Visual Basic

I have a problem trying to decompress, in Java, a string that was compressed in Visual Basic using DeflateStream. I did some research and tried a lot of code that decompresses in Java using Inflater and InflaterOutputStream, but none of it worked. I have looked over a lot of sites and searched Google many different ways; I can say that the first three pages of Google results concerning Java, Inflater/Deflater compression, Visual Basic, and DeflateStream have already been read, without success. I want to know whether this is even possible (it should be, since I have found a way of compressing/decompressing between Java and C# here: http://www.dreamincode.net/forums/topic/243808-compression-and-decompression-between-java-and-c%23/ )
I would also like to know whether there are compatibility issues between Visual Basic's DeflateStream and the analogous Java classes Inflater/Deflater. If someone has had a similar problem, I would also highly appreciate some code snippets.
Thanks,
Mihai

Try setting the nowrap parameter in the Inflater class.
Per the documentation, the .NET DeflateStream produces the raw deflate format, RFC 1951. I presume that that's what you're using from Visual Basic.
Also per the documentation, Java's Inflater class decodes the zlib format, RFC 1950. The zlib format is the raw deflate format wrapped with a zlib header and trailer. Normally the Inflater class would fail on raw deflate data from .NET's DeflateStream. Fortunately, however, the Inflater constructor takes a nowrap parameter which, when set to true, allows Inflater to decode raw deflate data without the zlib wrapper.
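By way of illustration, a minimal sketch of that approach (the class and method names are just illustrative):

import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class RawDeflate {
    // Inflates raw deflate data (RFC 1951), e.g. bytes written by .NET's
    // DeflateStream. nowrap=true tells Inflater not to expect the zlib wrapper.
    public static byte[] inflateRaw(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater(true); // true = no zlib header/trailer
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!inflater.finished()) {
            int n = inflater.inflate(buffer);
            if (n == 0 && inflater.needsInput()) break; // truncated input
            out.write(buffer, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }
}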
So that should work. What are the first 20 bytes in hexadecimal of the data produced by DeflateStream?

From what I've read, Java's Deflater/Inflater classes use zlib. What format does VB deflate to? I've failed to google this, but some references seem to indicate it uses ZIP, not zlib.
Have you tried running file compressed_file to determine what format the data is compressed in?
In any case, you might want to try Apache Commons to decompress using the ZIP algorithm.
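If you want to detect the format programmatically, Apache Commons Compress can sniff several compression formats from the stream header. A rough sketch, assuming the commons-compress jar is on the classpath (the file name is illustrative):

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.commons.compress.compressors.CompressorInputStream;
import org.apache.commons.compress.compressors.CompressorStreamFactory;

public class DetectAndDecompress {
    public static void main(String[] args) throws Exception {
        // Auto-detection needs a stream that supports mark(), hence the buffering.
        try (InputStream in = new BufferedInputStream(new FileInputStream("compressed_file"));
             CompressorInputStream decompressed =
                     new CompressorStreamFactory().createCompressorInputStream(in)) {
            System.out.println("first decompressed byte: " + decompressed.read());
        }
    }
}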

OK, so I finally managed to solve this. It was a beginner's error, but it took me long enough to find it. The problem was that when reading the initial file (which was compressed and encrypted), I was keeping the content of the file in memory as a String instead of reading it directly as an array of bytes. Of course, some of the bytes were not valid in the String's encoding and were lost, which is why the compression format was not recognized. The Inflater/Deflater classes from Java are compatible with the DeflateStream class from VB.NET!
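For anyone hitting the same pitfall, the fix boils down to reading the payload as raw bytes instead of text; a minimal sketch (the path is illustrative):

import java.nio.file.Files;
import java.nio.file.Paths;

// Read the compressed (and encrypted) payload as raw bytes; decoding it to a
// String first corrupts any byte sequences that are invalid in the chosen
// encoding, which is exactly what made the deflate format unrecognizable.
byte[] payload = Files.readAllBytes(Paths.get("compressed.bin"));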
Thank you very much for your help and assistance.

Related

Converting base64 proto binary to human readable (json/xml) using .proto schema

I'm new to Google Protobuf. I want to solve an issue and I can't find the answer on the internet; maybe I'm asking the wrong questions.
I want to build a tool which uses a .proto schema (and the generated classes) to convert input strings from readable form to a Base64 string and the reverse (in Java).
The purpose is to debug the messages. I don't know exactly which message is encoded (and therefore which class to use).
How can I proceed, please? Could you give me some pointers, links to projects, or anything else that may help?
Edit: for the moment, I will try to use the ParseFromString function with every class until I find the protobuf class that matches the Base64 blob.
Thank you
Protobuf can be used to serialize data on one machine and send it to a second machine, where it is deserialized back into its original form.
Protobuf uses its own logic to serialize things like strings, signed/unsigned number values, etc.
Knowing this, protobuf cannot/should not be used as a general converter for turning one representation into another. It is used to transport data across machine boundaries.
Maybe you just want to convert a String into a Base64 representation without protobuf? Then the hint from Thomas Timbul might be useful for you.
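To sketch the brute-force approach mentioned in the question's edit (decode the Base64 blob, then try each generated class until one parses), something like the following could work in Java. FooMessage and BarMessage are hypothetical placeholders for your own generated classes:

import java.util.Base64;
import java.util.List;

import com.google.protobuf.InvalidProtocolBufferException;
import com.google.protobuf.Message;
import com.google.protobuf.Parser;

public class ProtoGuesser {

    // FooMessage and BarMessage are hypothetical stand-ins for the classes
    // generated from your .proto schema.
    private static final List<Parser<? extends Message>> CANDIDATES =
            List.of(FooMessage.parser(), BarMessage.parser());

    public static Message guess(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64);
        for (Parser<? extends Message> parser : CANDIDATES) {
            try {
                return parser.parseFrom(raw); // succeeds if the bytes fit this schema
            } catch (InvalidProtocolBufferException e) {
                // wrong schema for these bytes; try the next candidate
            }
        }
        return null; // nothing matched
    }
}

Bear in mind that protobuf parsing is permissive: several schemas may accept the same bytes, so a successful parse is a hint, not proof.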

Characters altered by Lotus when receiving a POST through a Java WebAgent with OpenURL command

I have a Java WebAgent in Lotus Domino which runs through the OpenURL command (https://link.com/db.nsf/agentName?openagent). This agent was created to receive a POST with XML content. Before even parsing or saving the (XML) content, the WebAgent saves the content into an in-memory document:
For an agent run from a browser with the OpenAgent URL command, the
in-memory document is a new document containing an item for each CGI
(Common Gateway Interface) variable supported by Domino®. Each item
has the name and current value of a supported CGI variable. (No design
work on your part is needed; the CGI variables are available
automatically.)
https://www.ibm.com/support/knowledgecenter/en/SSVRGU_9.0.1/basic/H_DOCUMENTCONTEXT_PROPERTY_JAVA.html
The content of the POST is saved (by Lotus) into the request_content field. When receiving content with the character é in it, like:
<Name xml:lang="en">tést</Name>
the é is changed by Lotus to ?®. This is also what I see when reading out the request_content field in the document properties. Is it possible to save the é as an é and not as ?® in Lotus?
Solution:
The way I fixed it is via this post: http://www-10.lotus.com/ldd/nd85forum.nsf/DateAllFlatWeb/ab8a5283e5a4acd485257baa006bbef2?OpenDocument (the same thread referenced in the answer below).
The solution, ported to Java:
/****** INITIALIZATION ******/
Session session = getSession();
AgentContext agentContext = session.getAgentContext();
Stream stream = session.createStream();

// Write the raw REQUEST_CONTENT out to disk as LMBCS (the native Notes charset)...
stream.open("C:\\Temp\\test.txt", "LMBCS");
stream.writeText(agentContext.getDocumentContext().getItemValueString("REQUEST_CONTENT"));
stream.close();

// ...then read the same file back in as UTF-8 to get the correctly decoded text.
stream.open("C:\\Temp\\test.txt", "UTF-8");
String content = stream.readText();
stream.close();

System.out.println("Content: " + content);
I've dealt with this before, but I no longer have access to the code so I'm going to have to work from memory.
This looks like a UTF-8 vs UTF-16 issue, but there are up to five charsets that can come into play: the charset used in the code that does the POST, the charset of the JVM the agent runs in, the charset of the Domino server code, the charset of the NSF (which is always LMBCS), and the charset of the Domino server's host OS.
If I recall correctly, REQUEST_CONTENT is treated as raw data, not character data. To get it right, you have to handle the conversion of REQUEST_CONTENT yourself.
The Notes API calls that you use to save data in the Java agent will automatically convert from Unicode to LMBCS and vice versa, but this only works if Java has interpreted the incoming data stream correctly. I think in most cases, the JVM running under Domino is configured for UTF-16 - though that may not be the case. (I recall some issue with a server in Japan, and one of the charsets that came into play was one of the JIS standard charsets, but I don't recall if that was in the JVM.)
So, if I recall correctly, you need to read REQUEST_CONTENT as UTF-8 from a String into a byte array by using getBytes("UTF-8") and then construct a new String from the byte array using new String(bytes, "UTF-16"). That's assuming the JVM charset really is UTF-16. Then pass that string to NotesDocument.replaceItemValue(), and the Notes API calls should interpret it correctly.
I may have some details wrong here; it's been a while. Years ago I built a database that shows the LMBCS, UTF-8 and UTF-16 values for all Unicode characters. If you can get down to the byte values, it can be a useful tool for looking at data like this and figuring out what's really going on. It's downloadable from OpenNTF here. In a situation like this, I recall writing code that got the byte array, converted it to hex, and wrote it to a NotesItem so that I could see exactly what was coming in and compare it to the database entries.
And, yes, as per the comments, it's much better if you let the XML tools on both sides handle the charset issues and encoding, but it's not always foolproof. You're adding another layer of charsets into the process, and you have to get it right. If the goal is to store data in NotesItems, you still have to make sure that the server-side XML tools decode into the correct charset, which may not be the default.
My heart breaks looking at this. I also just passed through this hell and found the old advice, but... I just could not bring myself to write to disk to solve this trivial matter.
import java.nio.charset.Charset;

// Grab REQUEST_CONTENT as raw bytes instead of letting Notes decode it...
Item item = agentContext.getDocumentContext().getFirstItem("REQUEST_CONTENT");
byte[] bytes = item.getValueCustomDataBytes("");
// ...and decode those bytes as the UTF-8 they really are.
String content = new String(bytes, Charset.forName("UTF-8"));
Edited in response to a comment by the OP: there is an old post on this theme:
http://www-10.lotus.com/ldd/nd85forum.nsf/DateAllFlatWeb/ab8a5283e5a4acd485257baa006bbef2?OpenDocument (the same thread that the OP used for his workaround)
The poster there claims that when he uses a particular HTTP header, the method fails.
He was working with 8.5 and using LotusScript, though. In my case I cannot make it fail by sending an additional header (or as a function of the string argument).
How I Learned to Stop Worrying and Love Notes/Domino:
For what it's worth, getValueCustomDataBytes() works only with very short payloads, and the limit depends on the content! Starting your text with an accented character such as 'é' increases the length it still works with... but whatever I tried, I could not get past 195 characters. Am I surprised? After all these years with Notes, I must admit I still am...
Well, admittedly, it should not have worked in the first place, as it is documented for use only with User Defined Data fields.
Finally
Use IBM's icu4j and icu4j-charset packages (drop them in jvm/lib/ext). Then the code becomes:
// Re-encode the mis-decoded item text back into LMBCS bytes (via ICU's
// LMBCS charset), then decode those bytes as the UTF-8 they really are.
byte[] bytes = item.getText().getBytes(CharsetICU.forNameICU("LMBCS"));
String content = new String(bytes, Charset.forName("UTF-8"));
and yes, you will need a permission in java.policy:
permission java.lang.RuntimePermission "charsetProvider";
Is this any better than passing through the file system? I don't know, but it kinda looks cleaner.

Uncompress a zlib-compressed string in Java

I have a Java module that is receiving a compressed string from a remote Python script. The Python script compresses the string using zlib.compress(). I simply want to uncompress it in Java and display it to the user.
The documentation for Java's built-in java.util.zip.Deflater class describes pretty explicitly how to uncompress something that has been compressed using zlib.compress(). However, that method does not work for me. Depending on which encoding I use, I either get "Incorrect Header Check" errors or the decompression returns an empty string.
So, how am I supposed to uncompress this? The data is not getting corrupted in transmission, and the compressed string begins with "x\x9c", which is apparently correct for zlib-compressed data.
I've never dealt with compression/decompression at this level before and am getting confused. For extra credit, I'd appreciate an explanation of the difference between compressed/uncompressed and inflated/deflated. According to this they are different, but most of the internet seems to use them interchangeably for zlib. This just makes finding a solution even more difficult, as I couldn't tell you whether I'm actually trying to "uncompress" or "inflate" this data.
The confusion has arisen because some bright spark started describing the zlib protocol as "deflate". It might help you to read the RFCs mentioned in these Java docs.
Also this SO topic is quite relevant.
I suggest that you do
print repr(zlib.compress("The quick brown dog etc etc"))
in Python (A) and compare the result with that of the equivalent Java code using Deflater (B). Also ensure that you can inflate B to recover your test input. Check that you are not suffering from unicode <-> bytes complications in Python, Java, or both.
Have you tried doing a Python "deflate" as per the answer by @patthoyts in the SO topic that you quoted?
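For the Java (B) side of that comparison, here is a minimal sketch that inflates zlib-wrapped data (RFC 1950, i.e. data starting with the "x\x9c" header mentioned in the question) using a default Inflater; class and method names are illustrative:

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

public class ZlibInflate {
    // Inflates zlib-wrapped data (RFC 1950), which is what Python's
    // zlib.compress() produces; the no-arg constructor expects the wrapper.
    public static String inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater(); // default: expect zlib header/trailer
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!inflater.finished()) {
            int n = inflater.inflate(buf);
            if (n == 0 && inflater.needsInput()) break; // truncated input
            out.write(buf, 0, n);
        }
        inflater.end();
        // Assumes the Python side compressed UTF-8 text.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}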
It seems Python's zlib.compress() uses gzip; are you sure you created the Inflater with the nowrap parameter needed for gzip-compatible decompression?
Inflate/deflate are used only in connection with the DEFLATE algorithm, I believe, whereas compress/uncompress are more general terms.

Decoding Java's JSON Unicode values with PHP

I have experienced different JSON-encoded values for the same string depending on the language used in the past. Since the APIs were used in a closed environment (no third parties allowed), we made a compromise, and all our Java applications manually encode Unicode characters. LinkedIn's API is returning "corrupted" values, basically the same as our Java applications. I've already posted a question on their forum; the reason I am asking it here as well is quite simple: sharing is caring :) This question is therefore partially connected with LinkedIn, but it is mostly trying to find an answer to the general encoding problem described below.
As you can see, my last name contains the letter ž, which should be \u017e, but Java (or LinkedIn's API, for that matter) returns \u009e in the JSON response and nothing in the XML response. PHP's json_decode() ignores it and my last name becomes Kurida.
After an investigation, I've found that ž apparently has two representations, 9e and 17e. What exactly is going on here? Is there a solution to this problem?
U+009E is a usually-invisible control character and not an acceptable alternative representation for ž.
The byte 0x9E represents the character ž in Windows code page 1252. That byte, if decoded using ISO-8859-1, would turn into U+009E.
(The confusion comes from the fact that if you write ž in an HTML page, the browser doesn't actually give you character U+009E, as you might expect, but converts it to U+017E. The same is true of all the character references 0080–009F: they get changed as if the numbers referred to cp1252 bytes instead of Unicode characters. This is utterly bizarre and wrong behaviour, but all the major browsers do it so we're stuck with it now. Except in proper XHTML served as XML, since that has to follow the more sensible XML rules.)
Looking at the forum page, the JSON reading is clearly not wrong: your name is registered as being "David Kurid[U+009E]a". How that data got into their system is what needs looking at.
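To illustrate the mechanism described above, here is a small hypothetical demo of how the cp1252 byte 0x9E turns into U+009E under an ISO-8859-1 decode, and how round-tripping through the right charset recovers ž:

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");

        // 'ž' (U+017E) encodes to the single byte 0x9E in Windows code page 1252.
        byte[] bytes = "\u017e".getBytes(cp1252);

        // Mis-decoding that byte as ISO-8859-1 yields the control character U+009E.
        String garbled = new String(bytes, StandardCharsets.ISO_8859_1);
        System.out.printf("mis-decoded: U+%04X%n", (int) garbled.charAt(0)); // U+009E

        // Round-tripping through the correct charset recovers U+017E.
        String fixed = new String(garbled.getBytes(StandardCharsets.ISO_8859_1), cp1252);
        System.out.printf("recovered:   U+%04X%n", (int) fixed.charAt(0));   // U+017E
    }
}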

What are the different zlib compression methods and how do I force the default in Java's Deflater?

I am using DeflaterOutputStream to compress data as part of a proprietary archive file format. I'm then using JCraft's zlib code to decompress that data on the other end. The other end is a J2ME application, hence my reliance on third-party zip decompression code rather than the standard Java libraries.
My problem is that some files zip and unzip just fine, and others do not.
For the ones that do not, the compression method in the first byte of the data seems to be '5'.
From my reading up on zlib, I understand that a value of '8' indicates the default deflate compression method, and any other value appears to be unacceptable to the decompressor.
What I'd like to know is:
What does '5' indicate?
Why does DeflaterOutputStream use different compression methods some of the time?
Can I stop it from doing that somehow?
Is there another way to generate deflated data that uses only the default compression method?
It might help to pin down exactly what you're looking at.
Before the whole of your data, there's usually a two-byte zlib header. As far as I'm aware, the lower 4 bits of the first of these bytes should ALWAYS be 8. If you initialise your Deflater in nowrap mode, then you won't get these two bytes at all (though your other library must then expect not to get them).
Then, before each individual block of data, there's a 3-bit block header (note: defined as a number of bits, not a whole number of bytes). Conceivably, you could have a block starting with byte 5, which would indicate a compressed block that is the final block, or with byte 8, which would be a non-compressed, non-final block.
When you create your DeflaterOutputStream, you can pass a Deflater of your choosing to the constructor, and on that Deflater there are some options you can set. The level is essentially the amount of look-ahead that the compression uses when looking for repeated patterns in the data; on the off chance, you might try setting this to a non-default value and see if it makes any difference to whether your decompressor can cope.
The strategy setting (see the setStrategy() method) can be used in some special circumstances to tell the deflater to apply only Huffman compression. This can occasionally be useful in cases where you have already transformed your data so that the frequencies of values are close to negative powers of 2 (i.e. the distribution that Huffman coding works best on). I wouldn't expect this setting to affect whether a library can read your data, but juuust on the off chance, you might try changing it.
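To make those knobs concrete, here is a rough sketch of passing a configured Deflater into a DeflaterOutputStream (the particular level and strategy values are just examples, not a recommendation):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

public class ConfiguredDeflate {
    public static byte[] compress(byte[] data) throws IOException {
        // Non-default level, just to see whether it changes anything for the
        // decompressing library; 9 (BEST_COMPRESSION) is maximum compression.
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        // Huffman-only strategy; mainly useful on pre-transformed data.
        deflater.setStrategy(Deflater.HUFFMAN_ONLY);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bytes, deflater)) {
            out.write(data);
        } // close() finishes the deflate stream
        deflater.end();
        return bytes.toByteArray();
    }
}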
In case it's helpful, I've written a little bit about configuring Deflater, including the use of Huffman-only compression on transformed data. I must admit that, whatever options you choose, I'd really expect your library to be able to read the data. If you're really sure your compressed data is correct (i.e. ZLIB/Inflater can re-read your file), then you might consider just using another library...!
Oh, and stating the bleeding obvious, but I'll mention it anyway: if your data is fixed, you can of course just stick it in the jar, and it'll effectively be deflated/inflated "for free". Ironically, your J2ME device MUST be able to decode zlib-compressed data, because that's essentially the format the jar is in...
