How to encode chars in 2-bits? in java - java

DNA molecules are denoted by one of four values: A, C, G, or T. I need to convert a string of characters from A, C, G, and T to an array of bytes, encoding each of the characters
with two bits.A with bits 00, C with bits 01, G with 10, and T with 11. I don't understand how to convert characters to 2 bits. I was trying to shift and mask, but got wrong result.
At the very beginning, I check if there are characters in the line. Then i convert each character into a bit value and insert it into an array. When i insert ACGT, in the output i got 0 1 3 2. And here I have a problem, because I don’t understand how to convert the value to 2 bits.
Scanner text = new Scanner(System.in);
String str = text.nextLine();
if (str.contains("A") && str.contains("C") && str.contains("G") && str.contains("T")){
System.out.println("");
}
else
{
System.out.println("wrong command format");
}
byte mas[] = str.getBytes();
System.out.println("String in byte array : " + Arrays.toString(mas));
for (int i = 0; i < mas.length; i++){
byte mask = 3;
byte number = mas[i];
byte result = (byte)((number >> 1) & mask);
System.out.println(result);
}
}
}

It seems that you want to save the bits in a byte. The following example might give some ideas.
public class Main
{
private static final int A = 0x00; // b00
private static final int C = 0x01; // b01
private static final int G = 0x02; // b10
private static final int T = 0x03; // b11
public static void main(String[] args) throws Exception
{
byte store = 0;
store = setByte(store, 0, A);
store = setByte(store, 1, C);
store = setByte(store, 2, G);
store = setByte(store, 3, T);
System.out.println(Integer.toBinaryString(store));
//11111111111111111111111111100100
System.out.println(getByte(store, 0)); //0
System.out.println(getByte(store, 1)); //1
System.out.println(getByte(store, 2)); //2
System.out.println(getByte(store, 3)); //3
}
//Behavior :: Store "value" into "store".
//Reminder :: Valid index 0 - 3. Valid value 0 - 3.
private static byte setByte(byte store, int index, int value)
{
store = (byte)(store & ~(0x3 << (2 * index)));
return store |= (value & 0x3) << (2 * index);
}
private static byte getByte(byte store, int index)
{
return (byte)((store >> (2 * index)) & 0x3);
}
}

I haven't tested this, but it may help you.
byte test = 69;
byte insert = 0b01;
byte index = 2;
final byte ones = 0b00000011;
//Clear out the data at specified index
test = (byte) (test & ~(ones << index));
//Insert data
test |= (byte) (insert << index);
It works as follows:
Clear the 2 bits at the index in the byte (using bitwise AND).
Insert the 2 data bits at the index in the byte using bitwise OR).

You can "convert" the chars ACGT to 0, 1, 2, 3 using bit arithmetic.
byte[] bytes = str.getBytes();
for (int i = 0; i < bytes.length; i++) {
bytes[i] = (byte)(bytes[i] >> 1 & 3 ^ bytes[i] >> 2 & 1);
}
I suspect your initial check should be:
if (!str.matches("[ACGT]+") {
System.out.println("wrong command format");
return;
}

Related

Bit shuffling to change encoding from little endian

So in the program I need to read an number from the user which needs to be changed from little endian encoding to whatever encoding the user wants to change it to. The encoding entered by the user is just a 4 digits number which just means which byte should be where after the encoding. e.g. 4321 means put the 4th byte first followed by the 3rd and so on. the encoding can take other form such as 3214 etc.
This is my code, would really appreciate if someone point out where I am missing out.
import java.util.Scanner;
class encoding {
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
int n = sc.nextInt();
String byteOrder = sc.next();
long[] bitMask = { // little endian
Long.parseLong("11111111000000000000000000000000", 2),
Long.parseLong("00000000111111110000000000000000", 2),
Long.parseLong("00000000000000001111111100000000", 2),
Long.parseLong("00000000000000000000000011111111", 2)
};
int[] bytes = {
(int)(bitMask[0] & n),
(int)(bitMask[1] & n),
(int)(bitMask[2] & n),
(int)(bitMask[3] & n)
};
int result = 0;
shuffleBytes(bytes, byteOrder);
for (int i = 0; i < 4; i++) {
bytes[i] = bytes[i] << (i * 8);
result |= bytes[i];
}
System.out.println(result);
}
static void shuffleBytes(int[] bytes, String encoding) {
for (int i = 0; i < 4; i++) {
int index = Integer.parseInt(encoding.substring(i, i+1))-1;
int copy = bytes[i];
bytes[i] = bytes[index];
bytes[index] = copy;
}
}
}
Fixing your current solution
There are two problems:
1. Forgot to right-align bytes
In ...
int[] bytes = {
(int)(bitMask[0] & n),
(int)(bitMask[1] & n),
(int)(bitMask[2] & n),
(int)(bitMask[3] & n)
};
... you forgot to shift each "byte" to the right. As a result, you end up with a list of "bytes" of the form 0x……000000, 0x00……0000, 0x0000……00, 0x000000……. This is not a problem yet, but after shuffleBytes you shift each of these entries again using bytes[i] = bytes[i] << (i * 8);. As a result, the relevant parts (__) end up at a completely different spot or are shifted completely out of the integer.
To fix this, shift each (int)(bitMask[…] & n) to the right:
int[] bytes = {
(int)(bitMask[0] & n) >> (3*8),
(int)(bitMask[1] & n) >> (2*8),
(int)(bitMask[2] & n) >> (1*8),
(int)(bitMask[3] & n) >> (0*8)
};
2. Swapping more than once
In ...
static void shuffleBytes(int[] bytes, String encoding) {
for (int i = 0; i < 4; i++) {
int index = Integer.parseInt(encoding.substring(i, i+1))-1;
int copy = bytes[i];
bytes[i] = bytes[index];
bytes[index] = copy;
}
}
... you swap some bytes multiple times because you operate in-place. To understand what happens consider the following minimal example where we want to swap two bytes using order = "21". We inspect the variables before/after each iteration of the for loop.
The original input is bytes = {x, y} and order = "21"
We moved bytes[0] to bytes[1]. Now we have bytes = {y, x}.
But we are not finished yet. The loop continues and moves bytes[1] to bytes[0]. You assumed that bytes[1] would still be y at this point. However, because of the previous iteration this entry now holds x instead. Therefore, the result is bytes = {x, y}.
Here nothing changed, but for more entries you might also end up with something that is neither the original order nor the expected output order.
The easiest way to fix this is to write the result into a new array:
static int[] shuffleBytes(int[] bytes, String encoding) {
int[] result = new int[bytes.length];
for (int i = 0; i < 4; i++) {
int index = Integer.parseInt(encoding.substring(i, i+1))-1;
result[index] = bytes[i];
}
return result; // also adapt main() to use this return value
}
Alternative Solution
Even though you could fix your solution as described above I'm not too happy with it. Therefore, I propose this alternative solution which is cleaner, shorter, and more efficient.
import java.util.Scanner;
public class Encoding {
public static void main(String[] args) {
Scanner sc = new Scanner(System.in);
int input = sc.nextInt();
System.out.format("input = 0x%08x = %1$d%n", input);
String newOrder = sc.next();
int output = reorder(input, newOrder);
System.out.format("output = 0x%08x = %1$d%n", output);
}
/** #param newOrder permutation of "1234" */
static int reorder(int input, String newOrder) {
int output = 0;
for (char byte1Based : newOrder.toCharArray()) {
output <<= 8;
int shift = (byte1Based - '1') * 8;
output |= ((0xFF << shift) & input) >> shift;
}
return output;
}
}

Convert a array having bits into one byte

I m trying to convert 8 bits into one byte. The way the bits are represented are by using a byte object that only contains a 1 or a 0.
If i have a 8 length byte array with these bits, how can i convert them into one byte.
public byte bitsToByte(byte[] bits) {
//Something in here. Each byte inside bits is either a 1 or a 0.
}
Can anyone help?
Thanks
public static byte bitsToByte(byte[] bits){
byte b = 0;
for (int i = 0; i < 8; i++)
b |= (bits[i]&1) << i;
return b;
}
//as an added bonus, the reverse.
public static byte[] bitsToByte(byte bits){
byte[] b = new byte[8];
for (int i = 0; i < 8; i++)
b[i] = (byte) ((bits&(1 << i)) >>> i);
return b;
}
left shift by 1 first for each bit in the array and then add the bit to the byte.
The implementation is based on the assumption that first bit in the array is sign bit and following bits are the magnitude in higher to lower positions of the byte.
public byte bitsToByte(byte[] bits) {
byte value = 0;
for (byte b : bits) {
value <<=1;
value += b;
}
return value;
}
Test the method:
public static void main(String[] args) {
BitsToByte bitsToByte = new BitsToByte();
byte bits[] = new byte[]{0,0,1,0,1,1,0,1}; // 1 + 0 + 4 + 8 + 0 + 32 + 0 + 0
byte value = bitsToByte.bitsToByte(bits);
System.out.println(value);
}
output:
45
Covert the byte array into a byte value (in the same order):
public static byte bitsToByte1(byte[] bits){
byte result = 0;
for (byte i = 0; i < bits.length; i++) {
byte tmp = bits[i];
tmp <<= i; // Perform the left shift by "i" times. "i" position of the bit
result |= tmp; // perform the bit-wise OR
}
return result;
}
input: (same array in reverse)
byte bits1[] = new byte[]{1,0,1,1,0,1,0,0};
value = bitsToByte1(bits1);
System.out.println(value);
output:
45

Java Byte Operation - Converting 3 Byte To Integer Data

I have some byte-int operations. But I cant figure out the problem.
First of all I have a hex data and I am holding it as an integer
public static final int hexData = 0xDFC10A;
And I am converting it to byte array with this function:
public static byte[] hexToByteArray(int hexNum)
{
ArrayList<Byte> byteBuffer = new ArrayList<>();
while (true)
{
byteBuffer.add(0, (byte) (hexNum % 256));
hexNum = hexNum / 256;
if (hexNum == 0) break;
}
byte[] data = new byte[byteBuffer.size()];
for (int i=0;i<byteBuffer.size();i++){
data[i] = byteBuffer.get(i).byteValue();
}
return data;
}
And I want to convert 3 byte array to integer back again how can I do that?
Or you can also suggest other converting functions like hex-to-3-bytes-array and 3-bytes-to-int thank you again.
UPDATE
In c# someone use below function but not working in java
public static int byte3ToInt(byte[] byte3){
int res = 0;
for (int i = 0; i < 3; i++)
{
res += res * 0xFF + byte3[i];
if (byte3[i] < 0x7F)
{
break;
}
}
return res;
}
This will give you the value:
(byte3[0] & 0xff) << 16 | (byte3[1] & 0xff) << 8 | (byte3[2] & 0xff)
This assumes, the byte array is 3 bytes long. If you need to convert also shorter arrays you can use a loop.
The conversion in the other direction (int to bytes) can be written with logical operations like this:
byte3[0] = (byte)(hexData >> 16);
byte3[1] = (byte)(hexData >> 8);
byte3[2] = (byte)(hexData);
You could use Java NIO's ByteBuffer:
byte[] bytes = ByteBuffer.allocate(4).putInt(hexNum).array();
And the other way round is possible too. Have a look at this.
As an example:
final byte[] array = new byte[] { 0x00, (byte) 0xdf, (byte) 0xc1, 0x0a };//you need 4 bytes to get an integer (padding with a 0 byte)
final int x = ByteBuffer.wrap(array).getInt();
// x contains the int 0x00dfc10a
If you want to do it similar to the C# code:
public static int byte3ToInt(final byte[] byte3) {
int res = 0;
for (int i = 0; i < 3; i++)
{
res *= 256;
if (byte3[i] < 0)
{
res += 256 + byte3[i]; //signed to unsigned conversion
} else
{
res += byte3[i];
}
}
return res;
}
to convert Integer to hex: integer.toHexString()
to convert hexString to Integer: Integer.parseInt("FF", 16);

Java: convert byte[] to base36 String

I'm a bit lost. For a project, I need to convert the output of a hash-function (SHA256) - which is a byte array - to a String using base 36.
So In the end, I want to convert the (Hex-String representation of the) Hash, which is
43A718774C572BD8A25ADBEB1BFCD5C0256AE11CECF9F9C3F925D0E52BEAF89
to base36, so the example String from above would be:
3SKVHQTXPXTEINB0AT1P0G45M4KI8U0HR8PGB96DVXSTDJKI1
For the actual conversion to base36, I found some piece of code here on StackOverflow:
public static String toBase36(byte[] bytes) {
//can provide a (byte[], offset, length) method too
StringBuffer sb = new StringBuffer();
int bitsUsed = 0; //will point how many bits from the int are to be encoded
int temp = 0;
int tempBits = 0;
long swap;
int position = 0;
while((position < bytes.length) || (bitsUsed != 0)) {
swap = 0;
if(tempBits > 0) {
//there are bits left over from previous iteration
swap = temp;
bitsUsed = tempBits;
tempBits = 0;
}
//fill some bytes
while((position < bytes.length) && (bitsUsed < 36)) {
swap <<= 8;
swap |= bytes[position++];
bitsUsed += 8;
}
if(bitsUsed > 36) {
tempBits = bitsUsed - 36; //this is always 4
temp = (int)(swap & ((1 << tempBits) - 1)); //get low bits
swap >>= tempBits; //remove low bits
bitsUsed = 36;
}
sb.append(Long.toString(swap, 36));
bitsUsed = 0;
}
return sb.toString();
}
Now I'm doing this:
// this creates my hash, being a 256-bit byte array
byte[] hash = PBKDF2.deriveKey(key.getBytes(), salt.getBytes(), 2, 256);
System.out.println(hash.length); // outputs "256"
System.out.println(toBase36(hash)); // outputs total crap
the "total crap" is something like
-7-14-8-1q-5se81u0e-3-2v-24obre-73664-7-5-5cor1o9s-6h-4k6hr-5-4-rt2z0-30-8-2u-8-onz-4a2j-6-8-18-8trzza3-3-2x-6-4153to-4e3l01me-6-azz-2-k-4ckq-nav-gu-irqpxx-el-1j-6-rmf8hs-1bb5ax-3z25u-2-2r-t5-22-6-6w1v-1p
so it's not even close to what I want. I tried to find a solution now, but it seems I'm a bit lost here. How do I get the base36-encoded String representation of the Hash that I need?
Try using BigInteger:
String hash = "43A718774C572BD8A25ADBEB1BFCD5C0256AE11CECF9F9C3F925D0E52BEAF89";
//use a radix of 16, default would be 10
String base36 = new BigInteger( hash, 16 ).toString( 36 ).toUpperCase();
This might work:
BigInteger big = new BigInteger(your_byte_array_to_hex_string, 16);
big.toString(36);

A question in java.lang.Integer internal code

While looking in the code of the method:
Integer.toHexString
I found the following code :
public static String toHexString(int i) {
return toUnsignedString(i, 4);
}
private static String toUnsignedString(int i, int shift) {
char[] buf = new char[32];
int charPos = 32;
int radix = 1 << shift;
int mask = radix - 1;
do {
buf[--charPos] = digits[i & mask];
i >>>= shift;
} while (i != 0);
return new String(buf, charPos, (32 - charPos));
}
The question is, in toUnsignedString, why we create a char arr of 32 chars?
32 characters is how much you need to represent an int in binary (base-2, shift of 1, used by toBinaryString).
It could be sized exactly, but I guess it has never made business sense to attempt that optimisation.
Because that method is also called by toBinaryString(), and an int is up to 32 digits in binary.
Because the max value for an int in Java is : 2^31 - 1

Categories

Resources