Whirlpool hash in java and in python give different results


I have two projects, panager and panager-android. Both use the Whirlpool hash algorithm, but for the same data panager gives different results than panager-android.
panager is written in Python and panager-android (as you might guess) in Java.
I'm ultra-new to Java, so take it easy :P
In Python I use a module that I found on the net (whirlpool.py), and in Java I use the Jacksum library.

There are different versions of the Whirlpool spec which generate different output for the same input. It looks like whirlpool.py might be implementing the original Whirlpool (referred to as "Whirlpool-0"), whereas in panager-android you use Whirlpool-2:
AbstractChecksum encode = JacksumAPI.getChecksumInstance("whirlpool2");
Try changing that to "whirlpool0" and see if it matches your Python implementation now. Failing that, try "whirlpool1".
Wikipedia has known Whirlpool hashes from each version for a given test input which you may use to identify the version of a questioned Whirlpool implementation, or find out if it's just entirely wrong and broken.

Related

Why does Path.relativize behave differently on Java 8 and Java 11?

Why does the method relativize behave differently on java-8 and java-11?
Path path1 = Paths.get("/a/./b/../image.png");
Path path2 = Paths.get("/a/file.txt");
Path path = path1.relativize(path2);
System.out.println(path);
java-8 (1.8.0_66 to be exact) prints ../../../../file.txt. JavaDoc.
java-11 (11.0.4 to be exact) prints ../file.txt. JavaDoc.
The JavaDoc description of both versions is equal. I feel the java-11 way looks like a correct behavior to me:
path1: /a/./b/../image.png normalizes to /a/b/../image.png which normalizes to /a/image.png
path2: /a/file.txt
the way to navigate from /a/image.png to /a/file.txt is ../file.txt
Questions
How is the java-8 way supposed to be calculated? Doesn't it normalize the path? I don't understand how to work out the result in my head.
Why is there a difference between these two versions that is not documented at all?

Maybe this bug will answer your question:
https://bugs.openjdk.java.net/browse/JDK-8066943
This affected relativizing paths containing . or .. and was fixed for Java 9.
So that's why you see a difference between 8 and 11.

This answer is based on the Windows source code.
Looking at the source code (let's take sun.nio.fs.WindowsPath, one of the implementations of Path), the java-11 version has additional code, including normalization, compared to java-8.
java-8 sun.nio.fs.WindowsPath source code at GitHub
java-11 sun.nio.fs.WindowsPath source code at GitHub
The key part of the latter implementation starts at line 411: it normalizes both paths before calculating the relative path:
WindowsPath base = this;
if (base.hasDotOrDotDot() || child.hasDotOrDotDot()) {
    base = base.normalize();
    child = child.normalize();
}
Digging further, the implementation changed between jdk8-b120 (source) and jdk-9+95 (source). Since the module system was introduced in between, both the class's implementation and its location differ:
Java 8 and below: /jdk/src/windows/classes/sun/nio/fs/WindowsPath.java
Java 9 and above: /jdk/src/java.base/windows/classes/sun/nio/fs/WindowsPath.java
How is the java-8 way supposed to be calculated? Doesn't it normalize the path? I don't understand how to work out the result in my head.
The most straightforward way to go is to normalize both paths first before relativizing them. But I have no idea whether it completely covers all the java.nio.file.Path implementations and is safe to do so.
Path path = path1.normalize().relativize(path2.normalize());
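As a minimal sketch of the workaround above, the helper below normalizes both paths before relativizing them. The class and method names (RelativizeDemo, relativizeNormalized) are made up for illustration, and the sketch has only been reasoned through for the default file system provider, not for every Path implementation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class RelativizeDemo {
    private RelativizeDemo() {}

    // Hypothetical helper: strip "." and ".." from both paths first, so the
    // result does not depend on how a given JDK version treats them.
    static Path relativizeNormalized(Path base, Path target) {
        return base.normalize().relativize(target.normalize());
    }

    public static void main(String[] args) {
        Path path1 = Paths.get("/a/./b/../image.png");
        Path path2 = Paths.get("/a/file.txt");
        // Both inputs are dot-free after normalize(), so Java 8 and Java 11
        // should agree on the result here.
        System.out.println(relativizeNormalized(path1, path2));
    }
}
```

Because the normalized paths contain no . or .. elements, the code path affected by JDK-8066943 should not be reached.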

EBCDIC unpacking comp-3 data returns 40404** in Java

I have used the unpack logic for Java provided in the link below:
How to unpack COMP-3 digits using Java?
But for null data in the source, the Java unpack code returns something like 404040404. I understand this is the space character in EBCDIC, but how can I handle or avoid these spaces when unpacking?

There are two problems that we have to deal with. First, is the data valid comp-3 data and second, is the data considered “valid” by older language implementations like COBOL since Comp-3 was mentioned.
If the offsets are not misaligned, it would appear that spaces are being interpreted by existing programs as 0 instead of spaces. This would be incorrect but could be an artifact of older programs that were engineered to tolerate this bad behaviour.
The approach I would take in a legacy shop (assuming no misalignment) is to consider “spaces” (which are sequences of 0x404040404040) as being zero. This would be a legacy check to compare the field with spaces and then assume 0x00000000000f as the actual default. This is something an individual shop would have to determine and is not recognized as a general programming approach.
In terms of Java, one has to remember that bytes are “signed”, so comparisons can be tricky depending on how the code is written. The only “unsigned” data type I recall in Java is char, which is really two bytes (basically a uint16).
This is less of a programming problem than it is recognizing historical tolerance and remediation.
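The legacy check described above could be sketched in Java roughly as follows. This is an illustration under stated assumptions, not the unpack code from the linked question: the class Comp3 and its methods are hypothetical, the field is assumed to be correctly aligned, and treating an all-spaces field as zero is a shop-specific decision rather than a general rule.

```java
final class Comp3 {
    private Comp3() {}

    // True when every byte is an EBCDIC space (0x40). Java bytes are signed,
    // so each byte is masked with 0xFF before comparing.
    static boolean isAllEbcdicSpaces(byte[] field) {
        if (field.length == 0) {
            return false;
        }
        for (byte b : field) {
            if ((b & 0xFF) != 0x40) {
                return false;
            }
        }
        return true;
    }

    // Unpacks a COMP-3 (packed decimal) field into a long.
    // Assumption: an all-spaces field is tolerated and read as zero.
    static long unpack(byte[] field) {
        if (isAllEbcdicSpaces(field)) {
            return 0L;
        }
        long value = 0;
        for (int i = 0; i < field.length; i++) {
            int high = (field[i] & 0xF0) >>> 4;
            int low = field[i] & 0x0F;
            value = value * 10 + digit(high);
            if (i < field.length - 1) {
                value = value * 10 + digit(low);
            } else if (low == 0x0D || low == 0x0B) {
                value = -value;            // negative sign nibble
            } else if (low < 0x0A) {
                throw new IllegalArgumentException("invalid sign nibble: " + low);
            }                              // 0xA, 0xC, 0xE, 0xF: positive or unsigned
        }
        return value;
    }

    private static int digit(int nibble) {
        if (nibble > 9) {
            throw new IllegalArgumentException("invalid digit nibble: " + nibble);
        }
        return nibble;
    }
}
```

With this sketch, a field that is neither valid packed decimal nor all spaces is still rejected, so only the specific legacy case is tolerated.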

What is the definition of "lexically-ordered" base64 and why is RFCC-1940 apparently the canonical reference?

Today I was reading the documentation for Netty's Base64Dialect class.
It includes a dialect called ORDERED, of which it says, somewhat briefly:
Special "ordered" dialect of Base64 described in RFC1940.
To cut to the chase, I can't find any definition of what this is, and the documentation includes an erroneous reference which seems to be replicated all over the internet.
Instead of RFC-1940, the document actually links to RFCC-1940, which apparently is a "reader comment", and a nonsensical one at that:
RFC 920: whkpiy clujzis brkyh dwojfmz jydwq hrnwcgklt fsltaiu
Comment by lsnxkrjo sxavymwpg
Submitted on 10/26/2006
Related RFC: RFC-920
Now RFC-920 appears to have nothing to do with base 64:
Domain requirements
This memo restates and refines the requirements on establishing a
Domain first described in RFC-881. It adds considerable detail
to that discussion, and introduces the limited set of top level
domains.
Is RFC-1940 relevant? Skimming, no I can't see any base 64 encoding definitions here:
Source Demand Routing: Packet Format and Forwarding Specification (Version 1).
The purpose of SDRP is to support source-initiated selection of
routes to complement the route selection provided by existing routing
protocols for both inter-domain and intra-domain routes. [...]
In fact, searching the web for "rfcc 1940 ordered base64" finds this same URL in lots of other documentation, but sadly no explanation of "lexically ordered base 64".
Is there a legitimate definition of this anywhere? And why hasn't anyone else noticed this URL refers to nonsense?

I have not found a "legitimate definition" of ordered Base64. (At time of writing this, it is not even mentioned in the Wikipedia page on Base64.)
If you treat the code as a specification(!), ordered Base64 is a variant in which the alphabet has been reordered into ascending ASCII order. This means that the natural ordering for ordered Base64 is the same as the natural ordering for the corresponding byte sequence.
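To illustrate the ordering property, here is a small self-contained sketch. The alphabet below is an assumption (64 characters in ascending ASCII order, which is what the "ordered" variant is understood to use), the class OrderedBase64 is hypothetical, and padding is left out for simplicity, so this is not a drop-in replacement for any library's encoder.

```java
final class OrderedBase64 {
    private OrderedBase64() {}

    // Assumed alphabet: '-', digits, upper case, '_', lower case,
    // which is strictly ascending in ASCII.
    private static final char[] ALPHABET =
        "-0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz".toCharArray();

    // Encodes 3 bytes into 4 characters; no '=' padding is emitted.
    static String encode(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i += 3) {
            int remaining = data.length - i;
            int b0 = data[i] & 0xFF;
            int b1 = remaining > 1 ? data[i + 1] & 0xFF : 0;
            int b2 = remaining > 2 ? data[i + 2] & 0xFF : 0;
            int chunk = (b0 << 16) | (b1 << 8) | b2;
            sb.append(ALPHABET[(chunk >>> 18) & 0x3F]);
            sb.append(ALPHABET[(chunk >>> 12) & 0x3F]);
            if (remaining > 1) sb.append(ALPHABET[(chunk >>> 6) & 0x3F]);
            if (remaining > 2) sb.append(ALPHABET[chunk & 0x3F]);
        }
        return sb.toString();
    }

    // Lexicographic comparison of byte arrays, treating bytes as unsigned.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (diff != 0) return diff;
        }
        return a.length - b.length;
    }
}
```

For two byte arrays a and b, the sign of encode(a).compareTo(encode(b)) should then match the sign of compareUnsigned(a, b), which is not the case with the standard alphabet, where for example '+' and '/' sort before the letters they follow in the table.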
Is it a problem that there isn't a specification for ordered Base64?
Probably not.
In reality the RFCs that "specify" the different variants of Base64 (and Base32 / Base16) are actually more of an attempt to describe the variants rather than specify them. And the same applies to the Wikipedia article.
From what I can tell (google searches), the ordered Base64 variant is rarely used.
The Base64 implementation that introduced the ordered variant is legacy code. (It hasn't been changed in the last 8 years). New Java code that requires Base64 encoding / decoding capability should be using the standard Java java.util.Base64 class introduced in Java 8.
But it is concerning that the javadocs you linked to (and others!) all refer to a nonsense page. That page probably had a legitimate description at some point, but it looks like it has been vandalized.

Load a Perl Hash into Java

I have a big .pm file, which consists only of a very big Perl hash with lots of sub-hashes. I have to load this hash into a Java program, do some work and make changes to the underlying data, and save it back into a .pm file that should look similar to the one I started with.
So far, I have tried converting it line by line with regexes and string matching into an XML document, and later parsing it element by element back into a Perl hash.
This somehow works, but seems quite dodgy. Is there a more reliable way to parse the Perl hash without having a Perl runtime installed?

You're quite right, it's utterly filthy. Regex and string for XML in the first place is a horrible idea, and honestly XML is probably not a good fit for this anyway.
I would suggest that you consider JSON. I would be stunned to find java can't handle JSON and it's inherently a hash-and-array oriented data structure.
So you can quite literally:
use JSON;
print to_json ( $data_structure, { pretty => 1 } );
Note - it won't work for serialising objects, but for perl hash/array/scalar type structures it'll work just fine.
You can then import it back into perl using:
my $new_data = from_json $string;
print Dumper $new_data;
You could Dumper it to a file, but given that your requirement is multi-language going forward, just using native JSON as your 'at rest' data is probably a more sensible choice.
But if you're looking at parsing perl code within java, without a perl interpreter? No, that's just insanity.

Is there a Java equivalent of Python's printf hash replacement?

Specifically I am converting a python script into a java helper method. Here is a snippet (slightly modified for simplicity).
import sys

# hash of values
vals = {}
vals['a'] = 'a'
vals['b'] = 'b'
vals['1'] = 1
output = sys.stdout
# filename is defined elsewhere in the original script
file = open(filename).read()
# Python 2 syntax: substitute the placeholders and print to output
print >>output, file % vals,
So in the file there are %(a), %(b), %(1) etc. that I want substituted with the corresponding hash values. I perused the API but couldn't find anything. Did I miss it or does something like this not exist in the Java API?

You can't do this directly without some additional templating library. I recommend StringTemplate. Very lightweight, easy to use, and very optimized and robust.

I doubt you'll find a pure Java solution that'll do exactly what you want out of the box.
With this in mind, the best answer depends on the complexity and variety of Python formatting strings that appear in your file:
If they're simple and not varied, the easiest way might be to code something up yourself.
If the opposite is true, one way to get the result you want with little work is by embedding Jython into your Java program. This will enable you to use Python's string formatting operator (%) directly. What's more, you'll be able to give it a Java Map as if it were a Python dictionary (vals in your code).
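For the simple case, "coding something up yourself" could look roughly like the sketch below. It assumes the placeholders follow Python's named form with an s or d conversion, such as %(a)s or %(1)d, and ignores widths, flags and other conversions; the class NamedFormat and its format method are hypothetical names chosen for this example.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NamedFormat {
    private NamedFormat() {}

    // Matches %(key)s or %(key)d; the key is captured in group 1.
    private static final Pattern PLACEHOLDER = Pattern.compile("%\\((\\w+)\\)[sd]");

    // Replaces each placeholder with the string form of the mapped value.
    static String format(String template, Map<String, ?> vals) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String key = m.group(1);
            if (!vals.containsKey(key)) {
                throw new IllegalArgumentException("missing key: " + key);
            }
            // quoteReplacement keeps '$' and '\' in values from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(vals.get(key))));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

A call such as NamedFormat.format(fileContents, vals) with a Map<String, Object> would then play the role of file % vals in the Python snippet, for this restricted placeholder syntax only.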
