GRIB file to text in Java

GRIB file to text in Java - java

I am trying to create a small library to read grib data from a flight simulator using Java.
Created a code to read a binary file and convert it into a string.
public class MainClass
{
final static String FILE_NAME = "PATH\\global_winds.grib";
public static void main(String... aArgs) throws IOException
{
MainClass binary = new MainClass();
byte[] bytes = binary.readFile(FILE_NAME);
String myString = asciiBytesToString(bytes);
System.out.println(myString);
}
byte[] readFile(String aFileName) throws IOException
{
Path path = Paths.get(aFileName);
return Files.readAllBytes(path);
}
public static String asciiBytesToString( byte[] bytes )
{
if ( (bytes == null) || (bytes.length == 0 ) )
{
return "";
}
char[] result = new char[bytes.length];
for ( int i = 0; i < bytes.length; i++ )
{
result[i] = (char)bytes[i];
}
return new String( result );
}
}
The file is read and returns gibberish. Is anyone familiar with the grib format? I thought it is using ASCII...
I do know about third-party libraries like NetCDF. The grib file I am trying to process includes limited data and I really wanted to make a small API for reading this file.
Thank you!

GRIB files are binary, not in ASCII format. Furthermore, the data is packed and you need to do some math to obtain the original values.
To answer your question, GRIB2 uses
a Reference Value R (in IEEE 32-bit floating-point format)
a Binary Scale Factor E (signed integer, see comments below)
a Decimal Scale Factor D (signed integer, see comments below)
to pack the original data. The Reference Value R is used to eliminate negative numbers, the Scale Factors are used to achieve the required precision. According to the specification, the original value v is recovered from the packed value p using the formula
v = R + p * 2^E / 10^D
Note that GRIB2 uses a special representation for signed integers, which is not compatible with the two's complement. Instead, GRIB2 uses the most significant bit to represent the sign and the remaining bits to represent the magnitude of the signed integer. If the most significant bit is set, the number is negative.
If you want to look more into detail you can find a list of the relevant literature and specifications at http://www.theweatherserver.com/#grib2.
If you don't want to look into details, you can use the GRIB2Tools Java library from https://github.com/philippphb/GRIB2Tools, it does all the math and allows accessing the data using longitude and latitude.

Related

Why am I getting a number format exception when the data type accepts the values im parsing?

I do not want the answer, I would just like guidance and for someone to point to why my code is not performing as expected
My task is to flip an integer into binary, reformat the binary to a 32 bit number and then return the unsigned integer. So far my code successfully makes the conversions and flips the bits however I am getting a NumberFormatException when I attempt to parse the string value into a long that ill convert to an unsigned integer.
What is the issue with my code? What have I got misconstrued here? I know there are loads of solutions to this problem online but I prefer working things out my own way?
Could I please get some guidance? Thank you
public class flippingBits {
public static void main(String[] args) {
//change the number to bits
long bits = Long.parseLong(Long.toBinaryString(9));
String tempBits = String.valueOf(bits);
//reformat so that you get 32 bits
tempBits = String.format("%" + (32) + "s", tempBits).replace(" ", "0");
//flip the bits
tempBits = tempBits.replace("1", "5");
tempBits = tempBits.replace("0", "1");
tempBits = tempBits.replace("5", "0");
//Pass this to a long data type so that you can then eventually convert the new bits
// to an unsigned integer
long backToNum = Long.parseLong(tempBits);
}
}

You're directly parsing the bits into a long value instead of converting the bits into an equivalent value.
You need to use the following method (Long.parseUnsignedLong()):
long backToNum = Long.parseUnsignedLong(tempBits, 2); //output: 4294967286
The second argument represents radix:
To interpret a number written in a particular representation, it is necessary to know the radix or base of that representation. This allows the number to be converted into a real value.
See the representation of each radix (From Wikipedia):

Parse signed Byte from binary | Java

I have the following problem in Java. I am using an encryption algorithm that can produce negative bytes as outcome, and for my purposes I must be able to deal with them in binary. For negative bytes, the first or most significant bit in the of the 8 is 1. When I am trying to convert binary strings back to bytes later on, I am getting a NumberFormatException because my byte is too long. Can I tell Java to treat it like an unsigned byte and end up with negative bytes? My code so far is this:
private static String intToBinByte(int in) {
StringBuilder sb = new StringBuilder();
sb.append("00000000");
sb.append(Integer.toBinaryString(in));
return sb.substring(sb.length() - 8);
}
intToBinByte(-92); // --> 10100100
Byte.parseByte("10100100", 2) // --> NumberFormatException
Value out of range. Value:"10100100" Radix:2
Is there a better way to parse signed Bytes from binary in Java?
Thanks in advance!

You can just parse it with a larger type, then cast it to byte. Casting simply truncates the number of bits:
byte b = (byte) Integer.parseInt("10100100", 2);

I have written the following function to solve the problem:
private static byte binStringToByte(String in) {
byte ret = Byte.parseByte(in.substring(1), 2);
ret -= (in.charAt(0) - '0') * 128;
return ret;
}

How to decode comp 3 packed fields in an ascii file using java?

I have a huge mainframe file extracted from a legacy system. The file is encoded in ascii format. I want to convert that to comp3. Is there any algorithm available in java to do that? Also I need assistance on how to unpack the comp3 fields. I tried a java code to unpack comp3 but I found improper result
Please refer the code to unpack comp3 fields
import java.math.BigInteger;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
/**
* Converts between integer and an array of bytes in IBM mainframe packed
* decimal format. The number of bytes required to store an integer is (digits +
* 1) / 2. For example, a 7 digit number can be stored in 4 bytes. Each pair of
* digits is packed into the two nibbles of one byte. The last nibble contains
* the sign, 0F for positive and 0C for negative. For example 7654321 becomes
* 0x76 0x54 0x32 0x1F.
**/
public class PackedDecimalToComp {
public static void main(String[] args) {
try {
// test.unpackData(" 0x12345s");
Path path = Paths.get("C:\\Users\\AV00499269\\Desktop\\Comp3 data file\\Comp3Test.txt");
byte[] data = Files.readAllBytes(path);
PackedDecimalToComp test = new PackedDecimalToComp();
test.unpackData(data);
} catch (Exception ex) {
System.out.println("Exception is :" + ex.getMessage());
}
}
private static String unpackData(byte[] packedData) {
String unpackedData = "";
final int negativeSign = 13;
for (int currentCharIndex = 0; currentCharIndex < packedData.length; currentCharIndex++) {
byte firstDigit = (byte) ((packedData[currentCharIndex] >>> 4) & 0x0F);
byte secondDigit = (byte) (packedData[currentCharIndex] & 0x0F);
unpackedData += String.valueOf(firstDigit);
if (currentCharIndex == (packedData.length - 1)) {
if (secondDigit == negativeSign) {
unpackedData = "-" + unpackedData;
}
} else {
unpackedData += String.valueOf(secondDigit);
}
}
System.out.println("Unpackeddata is :" + unpackedData);
return unpackedData;
}
}

The comment in your code is incorrect. Packed data with a positive sign has x'A', x'C', x'E', or x'F' in the last nibble. Mainframes also have a concept of "preferred sign" which is x'C' in the last nibble for positive and x'D' in the last nibble for negative.
It is common for mainframe data to include both text and binary data in a single record, for example a name, a currency amount, and a quantity:
Hopper Grace ar% .
...which would be...
x'C8969797859940404040C799818385404040404081996C004B'
...in hex. This is code page 37, commonly referred to as EBCDIC.
Without knowing that the family name is confined to the first 10 bytes, the given name confined the the subsequent 10 bytes, the currency amount is in packed decimal (also known as binary coded decimal) in the next 3 bytes, and the quantity in the next two bytes, you cannot accurately transfer the data because code page conversion will destroy the currency amount. Converting to code page 1250, commonly in use on Microsoft Windows, you would end up with...
x'486F707065722020202047726163652020202020617225002E'
...where the text data is translated but the packed data is destroyed. The packed data no longer has a valid sign in the last nibble (the lower half of the last byte), the currency amount itself has been changed as has the quantity (from decimal 75 to decimal 11,776 due to both code page conversion and mangling of a big endian number as a little endian number).
This question has some answers that may help you. My recommendation is to have all the data converted to text on the mainframe prior to transferring it to another platform. There are mainframe utilities that excel at this.

Represent long in least amount of characters

I need to represent both very large and small numbers in the shortest string possible. The numbers are unsigned. I have tried just straight Base64 encode, but for some smaller numbers, the encoded string is longer than just storing the number as a string. What would be the best way to most optimally store a very large or short number in the shortest string possible with it being URL safe?

I have tried just straight Base64 encode, but for some smaller numbers, the encoded string is longer than just storing the number as a string
Base64 encoding of binary byte data will make it longer, by about a third. It is not supposed to make it shorter, but to allow safe transport of binary data in formats that are not binary safe.
However, base 64 is more compact than decimal representation of a number (or of byte data), even if it is less compact than base 256 (the raw byte data). Encoding your numbers in base 64 directly will make them more compact than decimal. This will do it:
private static final String base64Chars =
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
static String encodeNumber(long x) {
char[] buf = new char[11];
int p = buf.length;
do {
buf[--p] = base64Chars.charAt((int)(x % 64));
x /= 64;
} while (x != 0);
return new String(buf, p, buf.length - p);
}
static long decodeNumber(String s) {
long x = 0;
for (char c : s.toCharArray()) {
int charValue = base64Chars.indexOf(c);
if (charValue == -1) throw new NumberFormatException(s);
x *= 64;
x += charValue;
}
return x;
}
Using this encoding scheme, Long.MAX_VALUE will be the string H__________, which is 11 characters long, compared to its decimal representation 9223372036854775807 which is 19 characters long. Numbers up to about 16 million will fit in a mere 4 characters. That's about as short as you'll get it. (Technically there are two other characters which do not need to be encoded in URLs: . and ~. You can incorporate those to get base 66, which would be a smidgin shorter for some numbers, although that seems a bit pedantic.)

To extend on Stephen C's answer, here is a piece of code to convert to base 62 (but you can increase this by adding more characters to the digits String (just pick what characters are valid for you):
public static String toString(long n) {
String digits = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
int base = digits.length();
String s = "";
while (n > 0) {
long d = n % base;
s = digits.charAt(d) + s;
n = n / base;
}
return s;
}
This will never result in the string representation being longer than the digit one.

Assuming that you don't do any compression, and that you restrict yourself to URL safe characters, then the following procedure will give you the most compact encoding possible.
Make a list of all URL safe characters
Count them. Suppose you have N.
Represent your number in base N, representing 0 by the first character, 1 by the 2nd and so on.
So, what about compression ...
If you assume that the numbers you are representing are uniformly distributed across their range, then there is no real opportunity for compression.
Otherwise, there is potential for compression. If you can reduce the size of the common numbers then you can typically achieve a saving by compression. This is how Huffman encoding works.
But the downside is that compression at this level is not perfect across the range of numbers. It reduces the size of some numbers, but it inevitably increases the size of others.
So what does this mean for your use-case?
I think it means that you are looking at the problem the wrong way. You should not be aiming for a minimal encoded size for every number. You should be aiming to minimize the size on average ... averaged over the actual distribution of your numbers.

Same consistent-hashing algorithm implementation for Java and Python program

We have an app that the Python module will write data to redis shards and the Java module will read data from redis shards, so I need to implement the exact same consistent hashing algorithm for Java and Python to make sure the data can be found.
I googled around and tried several implementations, but found the Java and Python implementations are always different, can't be used togather. Need your help.
Edit, online implementations I have tried:
Java: http://weblogs.java.net/blog/tomwhite/archive/2007/11/consistent_hash.html
Python: http://techspot.zzzeek.org/2012/07/07/the-absolutely-simplest-consistent-hashing-example/
http://amix.dk/blog/post/19367
Edit, attached Java (Google Guava lib used) and Python code I wrote. Code are based on the above articles.
import java.util.Collection;
import java.util.SortedMap;
import java.util.TreeMap;
import com.google.common.hash.HashFunction;
public class ConsistentHash<T> {
private final HashFunction hashFunction;
private final int numberOfReplicas;
private final SortedMap<Long, T> circle = new TreeMap<Long, T>();
public ConsistentHash(HashFunction hashFunction, int numberOfReplicas,
Collection<T> nodes) {
this.hashFunction = hashFunction;
this.numberOfReplicas = numberOfReplicas;
for (T node : nodes) {
add(node);
}
}
public void add(T node) {
for (int i = 0; i < numberOfReplicas; i++) {
circle.put(hashFunction.hashString(node.toString() + i).asLong(),
node);
}
}
public void remove(T node) {
for (int i = 0; i < numberOfReplicas; i++) {
circle.remove(hashFunction.hashString(node.toString() + i).asLong());
}
}
public T get(Object key) {
if (circle.isEmpty()) {
return null;
}
long hash = hashFunction.hashString(key.toString()).asLong();
if (!circle.containsKey(hash)) {
SortedMap<Long, T> tailMap = circle.tailMap(hash);
hash = tailMap.isEmpty() ? circle.firstKey() : tailMap.firstKey();
}
return circle.get(hash);
}
}
Test code:
ArrayList<String> al = new ArrayList<String>();
al.add("redis1");
al.add("redis2");
al.add("redis3");
al.add("redis4");
String[] userIds =
{"-84942321036308",
"-76029520310209",
"-68343931116147",
"-54921760962352"
};
HashFunction hf = Hashing.md5();
ConsistentHash<String> consistentHash = new ConsistentHash<String>(hf, 100, al);
for (String userId : userIds) {
System.out.println(consistentHash.get(userId));
}
Python code:
import bisect
import md5
class ConsistentHashRing(object):
"""Implement a consistent hashing ring."""
def __init__(self, replicas=100):
"""Create a new ConsistentHashRing.
:param replicas: number of replicas.
"""
self.replicas = replicas
self._keys = []
self._nodes = {}
def _hash(self, key):
"""Given a string key, return a hash value."""
return long(md5.md5(key).hexdigest(), 16)
def _repl_iterator(self, nodename):
"""Given a node name, return an iterable of replica hashes."""
return (self._hash("%s%s" % (nodename, i))
for i in xrange(self.replicas))
def __setitem__(self, nodename, node):
"""Add a node, given its name.
The given nodename is hashed
among the number of replicas.
"""
for hash_ in self._repl_iterator(nodename):
if hash_ in self._nodes:
raise ValueError("Node name %r is "
"already present" % nodename)
self._nodes[hash_] = node
bisect.insort(self._keys, hash_)
def __delitem__(self, nodename):
"""Remove a node, given its name."""
for hash_ in self._repl_iterator(nodename):
# will raise KeyError for nonexistent node name
del self._nodes[hash_]
index = bisect.bisect_left(self._keys, hash_)
del self._keys[index]
def __getitem__(self, key):
"""Return a node, given a key.
The node replica with a hash value nearest
but not less than that of the given
name is returned. If the hash of the
given name is greater than the greatest
hash, returns the lowest hashed node.
"""
hash_ = self._hash(key)
start = bisect.bisect(self._keys, hash_)
if start == len(self._keys):
start = 0
return self._nodes[self._keys[start]]
Test code:
import ConsistentHashRing
if __name__ == '__main__':
server_infos = ["redis1", "redis2", "redis3", "redis4"];
hash_ring = ConsistentHashRing()
test_keys = ["-84942321036308",
"-76029520310209",
"-68343931116147",
"-54921760962352",
"-53401599829545"
];
for server in server_infos:
hash_ring[server] = server
for key in test_keys:
print str(hash_ring[key])

You seem to be running into two issues simultaneously: encoding issues and representation issues.
Encoding issues come about particularly since you appear to be using Python 2 - Python 2's str type is not at all like Java's String type, and is actually more like a Java array of byte. But Java's String.getBytes() isn't guaranteed to give you a byte array with the same contents as a Python str (they probably use compatible encodings, but aren't guaranteed to - even if this fix doesn't change things, it's a good idea in general to avoid problems in the future).
So, the way around this is to use a Python type that behaves like Java's String, and convert the corresponding objects from both languages to bytes specifying the same encoding. From the Python side, this means you want to use the unicode type, which is the default string literal type if you are using Python 3, or put this near the top of your .py file:
from __future__ import unicode_literals
If neither of those is an option, specify your string literals this way:
u'text'
The u at the front forces it to unicode. This can then be converted to bytes using its encode method, which takes (unsurprisingly) an encoding:
u'text'.encode('utf-8')
From the Java side, there is an overloaded version of String.getBytes that takes an encoding - but it takes it as a java.nio.Charset rather than a string - so, you'll want to do:
"text".getBytes(java.nio.charset.Charset.forName("UTF-8"))
These will give you equivalent sequences of bytes in both languages, so that the hashes have the same input and will give you the same answer.
The other issue you may have is representation, depending on which hash function you use. Python's hashlib (which is the preferred implementation of md5 and other cryptographic hashes since Python 2.5) is exactly compatible with Java's MessageDigest in this - they both give bytes, so their output should be equivalent.
Python's zlib.crc32 and Java's java.util.zip.CRC32, on the other hand, both give numeric results - but Java's is always an unsigned 64 bit number, while Python's (in Python 2) is a signed 32 bit number (in Python 3, its now an unsigned 32-bit number, so this problem goes away). To convert a signed result to an unsigned one, do: result & 0xffffffff, and the result should be comparable to the Java one.

According to this analysis of hash functions:
Murmur2, Meiyan, SBox, and CRC32 provide good performance for all kinds of keys. They can be recommended as general-purpose hashing functions on x86.
Hardware-accelerated CRC (labeled iSCSI CRC in the table) is the fastest hash function on the recent Core i5/i7 processors. However, the CRC32 instruction is not supported by AMD and earlier Intel processors.
Python has zlib.crc32 and Java has a CRC32 class. Since it's a standard algorithm, you should get the same result in both languages.
MurmurHash 3 is available in Google Guava (a very useful Java library) and in pyfasthash for Python.
Note that these aren't cryptographic hash functions, so they're fast but don't provide the same guarantees. If these hashes are important for security, use a cryptographic hash.

Differnt language implementations of a hashing algorithm does not make the hash value different. The SHA-1 hash whether generated in java or python will be the same.

I'm not familiar with Redis, but the Python example appears to be hashing keys, so I'm assuming we're talking about some sort of HashMap implementation.
Your python example appears to be using MD5 hashes, which will be the same in both Java and Python.
Here is an example of MD5 hashing in Java:
http://www.dzone.com/snippets/get-md5-hash-few-lines-java
And in Python:
http://docs.python.org/library/md5.html
Now, you may want to find a faster hashing algorithm. MD5 is focused on cryptographic security, which isn't really needed in this case.

Here is a simple hashing function that produces the same result on both python and java for your keys:
Python
def hash(key):
h = 0
for c in key:
h = ((h*37) + ord(c)) & 0xFFFFFFFF
return h;
Java
public static int hash(String key) {
int h = 0;
for (char c : key.toCharArray())
h = (h * 37 + c) & 0xFFFFFFFF;
return h;
}
You don't need a cryptographically secure hash for this. That's just overkill.

Let's get this straight: the same binary input to the same hash function (SHA-1, MD5, ...) in different environments/implementations (Python, Java, ...) will yield the same binary output. That's because these hash functions are implemented according to standards.
Hence, you will discover the sources of the problem(s) you experience when answering these questions:
do you provide the same binary input to both hash functions (e.g. MD5 in Python and Java)?
do you interpret the binary output of both hash functions (e.g. MD5 in Python and Java) equivalently?
#lvc's answer provides much more detail on these questions.

For the java version, I would recommend using MD5 which generates 128bit string result and it can then be converted into BigInteger (Integer and Long are not enough to hold 128bit data).
Sample code here:
private static class HashFunc {
static MessageDigest md5;
static {
try {
md5 = MessageDigest.getInstance("MD5");
} catch (NoSuchAlgorithmException e) {
//
}
}
public synchronized int hash(String s) {
md5.update(StandardCharsets.UTF_8.encode(s));
return new BigInteger(1, md5.digest()).intValue();
}
}
Note that:
The java.math.BigInteger.intValue() converts this BigInteger to an int. This conversion is analogous to a narrowing primitive conversion from long to int. If this BigInteger is too big to fit in an int, only the low-order 32 bits are returned. This conversion can lose information about the overall magnitude of the BigInteger value as well as return a result with the opposite sign.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.