How to parse bit fields from a byte array in Java? - java

I've been given the arduous task of parsing some incoming UDP packets from a source into an appropriate Java representation. The kicker is the data held within the packets are not byte aligned. In an effort to make the protocol as tight as possible, there are a number of bit fields indicating the presence or absence of data fields.
For example, at bit index 34 you may find a 24 bit field that should be converted to a float. Bit index 110 may be a flag indicating that the next 3 fields are each 5 and 6 bit values containing the hour, minute, and second of the day. (These are just made up by me but are representative of what the spec says). These packets are probably a few hundred bits long.
The spec is not likely to change, but it's completely possible that I'll be asked to decode other similar packet formats.
I can certainly bit shift and mask as needed, but I'm worried about ending up in bit shift hell and then drowning in it when more packet formats are discovered.
I'd love to hear any suggestions on best practices or Java libraries that may make the task more manageable.

Decoding QR codes is much the same exercise in reading a couple of bits at a time, completely unaligned. Here's what I wrote to handle it -- maybe you can just reuse this.
http://code.google.com/p/zxing/source/browse/trunk/core/src/com/google/zxing/common/BitSource.java
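If pulling in a dependency isn't an option, the core idea is small enough to sketch inline. Here is a minimal MSB-first bit reader over a byte array (my own sketch with made-up names, not ZXing's actual BitSource):

/** Reads an arbitrary number of bits from a byte array, MSB first. */
public class BitReader {
    private final byte[] bytes;
    private int bitOffset; // absolute bit index into the array

    public BitReader(byte[] bytes) {
        this.bytes = bytes;
    }

    /** Returns the next numBits (1..32) packed into the low bits of an int. */
    public int readBits(int numBits) {
        int result = 0;
        for (int i = 0; i < numBits; i++) {
            int byteIndex = bitOffset >> 3;      // which byte
            int bitInByte = 7 - (bitOffset & 7); // which bit within it, MSB first
            result = (result << 1) | ((bytes[byteIndex] >> bitInByte) & 1);
            bitOffset++;
        }
        return result;
    }
}

For the question's example, you'd read (or discard) the first 34 bits and then call readBits(24) for the 24-bit field; every call advances the offset, so unaligned fields fall out naturally.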

For such cases I have developed the JBBP library, which is available in Maven Central.
For instance, parsing a file into bits and printing the parsed values looks like:
public static final void main(final String... args) throws Exception {
    try (InputStream inStream = ClassLoader.getSystemClassLoader().getResourceAsStream("somefile.txt")) {
        // Map the whole stream onto a single bit-array field.
        class Bits {
            @Bin(type = BinType.BIT_ARRAY) byte[] bits;
        }
        for (final byte b : JBBPParser.prepare("bit [_] bits;", JBBPBitOrder.MSB0)
                .parse(inStream).mapTo(Bits.class).bits) {
            System.out.print(b != 0 ? "1" : "0");
        }
    }
}

Related

Decoding unix time stamp from extended RTP packet header to calculate latency

I am working on a project where I am trying to calculate the latency of the packets received between two android devices using RTP.
So I went on to extend the RTP header with a unix timestamp in its 12th-19th bytes.
I've received the packets and tried to extract the unix time from them, but I'm doing something wrong in the decoding process: the time I decode from the packet doesn't match the packet's actual time of arrival.
I've converted the bytes to hex in order to try to debug the ginormous numbers I was getting when converting my byte array to a long. And I haven't noticed many clues, except for the consistent "41" in my hex values and "14" in my long values.
I'm currently out of ideas as to how to fix this. How do I extract the correct Unix Time in millis from my packet?
I'm using someone else's code to generate the bytes I'm putting in the packets, he uses this code to put the SSRC in the header (which is also 64-bits).
private void setLong(byte[] buffer, long n, int begin, int end) {
    // Writes n into buffer[begin..end-1], big-endian (least significant byte last).
    for (end--; end >= begin; end--) {
        buffer[end] = (byte) (n % 256);
        n >>= 8;
    }
}
And my code utilizing the above method:
public void setUnixTime() {
    for (int i = 0; i < mBufferCount; i++) {
        setLong(mBuffers[i], System.currentTimeMillis(), 13, 20);
    }
}
I'm also interested in people's thoughts on calculating lag over RTP in this way (setting unix time on packets and comparing that time to time of arrival).
I believe since you say your timestamp should be in bytes 12-19, your begin and end should also be 12 and 20, respectively. As it currently reads, your for loop will only execute 7 times, leaving the final byte blank. Bytes 19, 18, 17, 16, 15, 14, and 13 will be set, but byte 12 will never be set.
Additionally, you may consider using bitwise & instead of modulo to truncate your larger numbers (n & 255), as it is slightly faster.
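For illustration, a runnable sketch of both suggestions (the class name and buffer are mine): begin = 12 and end = 20 so that bytes 12..19 hold the 64-bit timestamp, and & 0xFF instead of modulo:

public class TimestampPatch {
    private static void setLong(byte[] buffer, long n, int begin, int end) {
        for (end--; end >= begin; end--) {
            buffer[end] = (byte) (n & 0xFF); // bitwise mask instead of n % 256
            n >>= 8;
        }
    }

    public static void main(String[] args) {
        byte[] packet = new byte[32];
        setLong(packet, System.currentTimeMillis(), 12, 20); // was (13, 20)

        // Decode bytes 12..19 back into a long to verify the round trip.
        long decoded = 0;
        for (int i = 12; i < 20; i++) {
            decoded = (decoded << 8) | (packet[i] & 0xFF);
        }
        System.out.println(decoded);
    }
}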
Everything above was correct. The reason I was getting incorrect values was that the packetizer in the Libstreaming library was overwriting my extended header. I had to manually adjust the header size in the library code in order to extend it.

Can I add a binary file to a String based server message queue?

I have a multi-threaded client-server application that uses Vector<String> as a queue of messages to send.
I need, however, to send a file using this application. In C++ I would not really worry, but in Java I'm a little confused about converting anything to a string.
Java has 2-byte characters. When you view a Java string in hex, it usually looks like:
00XX 00XX 00XX 00XX
unless some non-ASCII Unicode characters are present.
Java also uses big-endian byte order.
These facts make me unsure whether, and if so how, to add the file into the queue. The preferred format of the file would be:
-- Headers --
2 bytes   Size of the block (excluding the header, i.e. the first four bytes)
2 bytes   Data type (text message/file)
-- End of headers --
2 bytes   Internal file ID (to avoid referring by filenames)
2 bytes   Length of filename
X bytes   Filename
X bytes   Data
You can see I'm already using 2 bytes for all numbers to avoid some horrible operations required when getting 2 numbers out of one char.
But I have really no idea how to add the file data correctly. For numbers, I assume this would do:
StringBuilder packetData = new StringBuilder();
packetData.append((char) packetSize);
packetData.append((char) PacketType.BINARY.ordinal()); //Just convert enum constant to number
But the file itself is really a problem. If I have described anything wrongly regarding the Java data types, please correct me - I'm a beginner.
Does it have to send only Strings? I think if it does then you really need to encode it using base64 or similar. The best approach overall would probably be to send it as raw bytes. Depending on how difficult it would be to refactor your code to support byte arrays instead of just Strings, that may be worth doing.
To answer your String question I just saw pop up in the comments, there's a getBytes method on a String.
For the socket question, see:
Java sending and receiving file (byte[]) over sockets
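If refactoring to byte arrays isn't feasible and the queue must stay Vector<String>, here is a minimal Base64 sketch using java.util.Base64 (Java 8+); the file name is hypothetical:

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;
import java.util.Vector;

public class FileQueueDemo {
    public static void main(String[] args) throws Exception {
        Vector<String> queue = new Vector<>();

        // Sender: read the file and enqueue it as a Base64 string.
        byte[] fileBytes = Files.readAllBytes(Paths.get("example.bin"));
        queue.add(Base64.getEncoder().encodeToString(fileBytes));

        // Receiver: decode the string back into the original bytes.
        byte[] restored = Base64.getDecoder().decode(queue.remove(0));
        System.out.println("Round-tripped " + restored.length + " bytes");
    }
}

Base64 inflates the payload by about a third, which is the usual price of pushing binary through a text channel.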

Creating a bitmask with a large number of options

In my Android app I have a class containing only data (exposed with getters). This class needs to be serialized and sent across to other clients (done naively by iterating over all getters, and storing them in a ByteBuffer).
public class Data
{
    public int getOption1() { ... }
    public int getOption2() { ... }
    // ...
    public int getOptionN() { ... }
}
Serialize:
public void serialize(Data data) {
    // write getOption1();
    // write getOption2();
    // ...
}
Deserialize:
public void deserialize() {
    // read Option1();
    // read Option2();
    // ...
}
I'd like to be able to define which fields actually get sent (instead of blindly sending all of them), and one potential solution for this would be to define another field which is a bitmask that defines which fields are actually sent.
The receiving side parses the bitmask, and can tell which of the fields should be deserialized from the received message.
The problem is that using an int (32 bits) for the bitmask allows only 32 unique options (with the "standard" power-of-2 enum values).
How can one define a bitmask that can support a larger number of items? Is there any other encoding (other than storing each value as a power of 2)?
The number of actual values may vary (depending on user input) and may be anything from ~ 50 up to 200.
I'd like to encode the different options in the most efficient encoding.
An int provides a bit for each of 32 options. You can use a long to get a bit for each of 64 options. For larger numbers of options, you can use an int or long array: take the number of options, divide by 32 (for an int array) or 64 (for a long array), and round up.
A byte array wastes the least space: divide the number of options by 8 and round up. You can reserve the first byte to hold the length of the byte array (if you're passing other data as well). Since Byte.MAX_VALUE is 127 (but you can treat the stored value as the maximum valid index rather than the byte count), this limits you to 128 * 8 - 1 = 1023 options (or 2047 options if you are willing to do a little extra work to handle negative byte-count values). The maximum waste is less than one byte (plus an additional byte of overhead to store the count).
If each option can independently be present or absent, you cannot do much better. If options can be grouped such that all options in a group are always either all present or all absent, then some additional compression may be possible.
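As a concrete sketch of the array-of-bits approach, java.util.BitSet already packs bits into longs and converts to and from byte arrays for the wire (the option indices below are made up):

import java.util.BitSet;

public class OptionMask {
    public static void main(String[] args) {
        BitSet mask = new BitSet(200); // up to ~200 options

        // Sender: mark which fields are present.
        mask.set(0);   // option 1 present
        mask.set(57);  // option 58 present
        mask.set(199); // option 200 present

        byte[] wire = mask.toByteArray(); // trailing zero bytes are dropped

        // Receiver: rebuild the mask and test membership.
        BitSet received = BitSet.valueOf(wire);
        System.out.println(received.get(57)); // true
        System.out.println(received.get(58)); // false
    }
}

BitSet.valueOf and toByteArray use the same little-endian bit packing, so the mask round-trips as long as both sides agree on the option numbering.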

Unwanted chars written from java REST-API to HadoopDFS using FSDataOutputStream

We built a Java REST API to receive event data (like a click on a buy button) and write that data to HDFS.
Essentially, we open a stream for every host that sends data (in JSON), or reuse an existing one, enrich the data with a timestamp, an event name, and the hostname, and write it into an (FS)DataOutputStream:
1 public synchronized void writeToFile(String filename, String hostname, String content) throws IOException {
2     FSDataOutputStream stream = registry.getStream(filename, hostname);
3     stream.writeBytes(content);
4     stream.hflush();
5 }
First, we used stream.writeChars(content) in line 3, resulting in files like:
.{.".m.e.s.s.a.g.e.".:.".h.e.l.l.o.".}
Looking into the implementation of DataOutputStream.writeChars(String s), you see that it writes each 16-bit char as two bytes, high byte first, so every ASCII character gets a leading 0x00.
Then I tried stream.writeUTF(content) in line 3, files looked much better:
.W{"message":"hello"}
But still, a few bytes too many. Looking into the code, writeUTF(String s) sends the number of bytes in s first, and then the string itself. So .W represents the number of bytes in the event data, confirmed by varying the length of the event data, which changed the leading chars in the file.
So my last resort, stream.writeBytes(content). Here everything looked fine:
{"message":"hello"} until special characters came into play:
{"message":"hallöchen"} became {"message":"hall.chen"}. writeBytes cuts the leading 8 bits of the character before writing it. I think I need some UTF-8 functionality to write these chars correctly.
So, now I'm kind of lost. How can I solve that?
When I read this: Why does DataOutputStream.writeUTF() add additional 2 bytes at the beginning? I felt like the mentioned FSDataOutputStream methods would not work for this.
A quick (and maybe dirty) solution is this:
3     byte[] contentAsBytes = content.getBytes("UTF-8");
4     for (byte singleByte : contentAsBytes) {
5         stream.writeByte(singleByte);
6     }
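The same workaround collapses into a single call, since write(byte[]) is inherited from java.io.OutputStream. A sketch of the full method under the setup above (registry as in the original code):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.fs.FSDataOutputStream;

public synchronized void writeToFile(String filename, String hostname, String content)
        throws IOException {
    FSDataOutputStream stream = registry.getStream(filename, hostname);
    // Encode once and write the raw UTF-8 bytes in one call:
    // no length prefix, no high-byte truncation, no per-byte loop.
    stream.write(content.getBytes(StandardCharsets.UTF_8));
    stream.hflush();
}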
A cleaner way would be not to use the FSDataOutputStream, but I couldn't find an alternative.
Any hint is still appreciated.
Have you tried wrapping the FSDataOutputStream in a java.io.PrintStream and using its print methods? It is a long shot, but let me know if that works for you.

How to compress a String in Java?

I use GZIPOutputStream or ZipOutputStream to compress a String (my string.length() is less than 20), but the compressed result is longer than the original string.
On some site, I found people saying that this is because my original string is too short, and that GZIPOutputStream is only useful for compressing longer strings.
So, can somebody help me compress a String?
My function is like:
String compress(String original) throws Exception {
}
Update:
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPOutputStream;

// ZipUtil
public class ZipUtil {
    public static String compress(String str) throws IOException {
        if (str == null || str.length() == 0) {
            return str;
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        GZIPOutputStream gzip = new GZIPOutputStream(out);
        gzip.write(str.getBytes());
        gzip.close();
        return out.toString("ISO-8859-1");
    }

    public static void main(String[] args) throws IOException {
        String string = "admin";
        System.out.println("after compress:");
        System.out.println(ZipUtil.compress(string));
    }
}
The result is garbled binary output, and it is longer than the original string.
Compression algorithms almost always have some form of space overhead, which means that they are only effective when compressing data which is sufficiently large that the overhead is smaller than the amount of saved space.
Compressing a string which is only 20 characters long is not too easy, and it is not always possible. If you have repetition, Huffman Coding or simple run-length encoding might be able to compress, but probably not by very much.
When you create a String, you can think of it as a list of chars. This means that for each character in your String, you need to support all the possible values of char. From the Sun docs:
char: The char data type is a single 16-bit Unicode character. It has a minimum value of '\u0000' (or 0) and a maximum value of '\uffff' (or 65,535 inclusive).
If you have a reduced set of characters you want to support, you can write a simple compression algorithm, which is analogous to binary->decimal->hex radix conversion. You go from 65,536 (or however many characters your target system supports) down to 26 (alphabetical) / 36 (alphanumeric) etc.
I've used this trick a few times, for example encoding timestamps as text (target 36+, source 10) - just make sure you have plenty of unit tests!
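Java's built-in radix support already covers the simple version of this trick; a tiny sketch (base 36 chosen to match the alphanumeric case):

public class RadixDemo {
    public static void main(String[] args) {
        long timestamp = System.currentTimeMillis();   // 13 digits in base 10
        String encoded = Long.toString(timestamp, 36); // 8 characters in base 36
        long decoded = Long.parseLong(encoded, 36);    // round-trips exactly
        System.out.println(encoded + " -> " + decoded);
    }
}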
If the passwords are more or less "random" you are out of luck, you will not be able to get a significant reduction in size.
But: why do you need to compress the passwords? Maybe what you need is not compression but some sort of hash value? If you just need to check whether a name matches a given password, you don't need to save the password; you can save the hash of the password instead. To check whether a typed-in password matches a given name, you build the hash value the same way and compare it to the saved hash. As a hash (Object.hashCode()) is an int, you will be able to store all 20 password hashes in 80 bytes.
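A minimal sketch of that check (String.hashCode() used for brevity; a real system should use a salted cryptographic hash instead):

import java.util.HashMap;
import java.util.Map;

public class HashCheck {
    public static void main(String[] args) {
        // Store only the hash of each password, never the password itself.
        Map<String, Integer> nameToHash = new HashMap<>();
        nameToHash.put("alice", "s3cret".hashCode());

        // Verify a typed-in password by hashing it the same way.
        boolean matches = nameToHash.get("alice") == "s3cret".hashCode();
        System.out.println(matches); // true
    }
}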
Your friend is correct. Both gzip and ZIP are based on DEFLATE. This is a general-purpose algorithm, and is not intended for encoding small strings.
If you need this, a possible solution is a custom encoding and decoding HashMap<String, String>. This can allow you to do a simple one-to-one mapping:
HashMap<String, String> toCompressed, toUncompressed;
String compressed = toCompressed.get(uncompressed);
// ...
String uncompressed = toUncompressed.get(compressed);
Clearly, this requires setup, and is only practical for a small number of strings.
Huffman Coding might help, but only if you have a lot of frequent characters in your small String
The ZIP algorithm is a combination of LZ77 and Huffman coding. You can use either of these algorithms separately.
The compression is based on two factors:
the repetition of substrings in your original string (LZ77): if there are a lot of repetitions, the compression will be efficient. This algorithm performs well for compressing long plain text, since words are often repeated
the frequency of each character in the compressed string (Huffman): the more unbalanced the distribution of characters is, the more efficient the compression will be
In your case, you should try the LZ77 algorithm only. Used on its own, the string can be compressed without adding metadata: it is probably better for short-string compression.
For the Huffman algorithm, the coding tree has to be sent with the compressed text. So, for a small text, the result can be larger than the original text because of the tree.
Huffman encoding is a sensible option here. Gzip and friends do this, but the way they work is to build a Huffman tree for the input, send that, then send the data encoded with the tree. If the tree is large relative to the data, there may be no net saving in size.
However, it is possible to avoid sending a tree: instead, you arrange for the sender and receiver to already have one. It can't be built specifically for every string, but you can have a single global tree used to encode all strings. If you build it from the same language as the input strings (English or whatever), you should still get good compression, although not as good as with a custom tree for every input.
If you know that your strings are mostly ASCII you could convert them to UTF-8.
byte[] bytes = string.getBytes("UTF-8");
This may reduce the memory size by about 50%. However, you will get a byte array out and not a string. If you are writing it to a file though, that should not be a problem.
To convert back to a String:
private final Charset UTF8_CHARSET = Charset.forName("UTF-8");
...
String s = new String(bytes, UTF8_CHARSET);
You don't see any compression happening for your String because you need at least a couple of hundred bytes to get real compression using GZIPOutputStream or ZipOutputStream. Your String is too small. (I don't understand why you require compression for it.)
Check the conclusion from this article:
The article also shows how to compress and decompress data on the fly in order to reduce network traffic and improve the performance of your client/server applications. Compressing data on the fly, however, improves the performance of client/server applications only when the objects being compressed are more than a couple of hundred bytes. You would not be able to observe improvement in performance if the objects being compressed and transferred are simple String objects, for example.
Take a look at the Huffman algorithm.
https://codereview.stackexchange.com/questions/44473/huffman-code-implementation
The idea is that each character is replaced with a sequence of bits, depending on its frequency in the text (the more frequent, the shorter the sequence).
You can read your entire text and build a table of codes, for example:
Symbol  Code
a       0
s       10
e       110
m       111
The algorithm builds a symbol tree based on the text input. The more variety of characters you have, the worse the compression will be.
But depending on your text, it could be effective.
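For illustration, a compact sketch of building such a code table (frequency counting plus the standard merge-two-smallest loop; no bitstream output or tree serialization, and the input string is made up):

import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class HuffmanDemo {
    static class Node implements Comparable<Node> {
        final int freq;
        final Character symbol; // null for internal nodes
        final Node left, right;

        Node(int freq, Character symbol, Node left, Node right) {
            this.freq = freq; this.symbol = symbol; this.left = left; this.right = right;
        }

        public int compareTo(Node o) { return Integer.compare(freq, o.freq); }
    }

    // Walk the tree, assigning "0" for left edges and "1" for right edges.
    static void buildCodes(Node n, String prefix, Map<Character, String> codes) {
        if (n.symbol != null) {
            codes.put(n.symbol, prefix.isEmpty() ? "0" : prefix);
            return;
        }
        buildCodes(n.left, prefix + "0", codes);
        buildCodes(n.right, prefix + "1", codes);
    }

    public static void main(String[] args) {
        String text = "assessme"; // hypothetical input
        Map<Character, Integer> freq = new HashMap<>();
        for (char c : text.toCharArray()) {
            freq.merge(c, 1, Integer::sum);
        }

        PriorityQueue<Node> pq = new PriorityQueue<>();
        for (Map.Entry<Character, Integer> e : freq.entrySet()) {
            pq.add(new Node(e.getValue(), e.getKey(), null, null));
        }
        while (pq.size() > 1) { // repeatedly merge the two rarest nodes
            Node a = pq.poll(), b = pq.poll();
            pq.add(new Node(a.freq + b.freq, null, a, b));
        }

        Map<Character, String> codes = new HashMap<>();
        buildCodes(pq.poll(), "", codes);
        System.out.println(codes); // frequent symbols get shorter codes
    }
}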
