Trying to output UTF-8 Text, and it doesn't work

I have this code:
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
public class Decoder {
public static void Decode() throws IOException{
String input = "";
input = readFile("C:\\Users\\Dragon\\Pictures\\Binary.txt", StandardCharsets.UTF_8);
input = input.replace(" ","");
System.out.println(input);
String output = "";
Writer out = new BufferedWriter(new OutputStreamWriter(
new FileOutputStream(new File("C:\\Users\\Dragon\\Pictures\\Binary2.txt")), "UTF8"));
for(int i = 0; i <= input.length() - 8; i+=8)
{
int k = Integer.parseInt(input.substring(i, i+8), 2);
out.append((char)k);
}
out.close();
System.out.println("Your File has been saved at C:\\Users\\Dragon\\Pictures\\Binary2.txt");
}
static String readFile(String path, Charset encoding)
throws IOException
{
byte[] encoded = Files.readAllBytes(Paths.get(path));
return new String(encoded, encoding);
}
}
Basically, I'm converting a text file of binary digits back into the data it encodes. The decoding itself works, but in the output file every non-ASCII character is replaced with '?'. Why is this happening?
EDIT:
Example Input:
00111111 01010000 01001110 01000111 00001101 00001010 00011010 00001010 00000000 00000000 00000000 00001101 01001001 01001000 01000100 01010010 00000000 00000000 00000000 00000001 00000000 00000000 00000000 00000001 00001000 00000110 00000000 00000000 00000000 00011111 00010101 00111111 00000000 00000000 00000000 00000001 01110011 01010010 01000111 01000010 00000000 00111111 00111111 00011100 00111111 00000000 00000000 00000000 00000100 01100111 01000001 01001101 01000001 00000000 00000000 00111111 00111111 00001011 00111111 01100001 00000101 00000000 00000000 00000000 00001001 01110000 01001000 01011001 01110011 00000000 00000000 00001110 00111111 00000000 00000000 00001110 00111111 00000001 00111111 01101111 00111111 01100100 00000000 00000000 00000000 00011000 01110100 01000101 01011000 01110100 01010011 01101111 01100110 01110100 01110111 01100001 01110010 01100101 00000000 01110000 01100001 01101001 01101110 01110100 00101110 01101110 01100101 01110100 00100000 00110100 00101110 00110000 00101110 00110011 00111111 00111111 01010000 00000000 00000000 00000000 00001101 01001001 01000100 01000001 01010100 00011000 01010111 01100011 00101000 00001000 01110011 01011011 00001011 00000000 00000100 00000000 00000001 00111111 00011110 01110011 00111111 00111111 00000000 00000000 00000000 00000000 01001001 01000101 01001110 01000100 00111111 01000010 01100000 00111111
Expected output:
‰PNG
IHDR Ä‰ sRGB ®Îé gAMA ±üa pHYs Ã ÃÇo¨d tEXtSoftware paint.net 4.0.3Œæ—P
IDATWc(s[ ºs¾² IEND®B`‚
Output gotten:
?PNG
IHDR ? sRGB ??? gAMA ???a pHYs ? ??o?d tEXtSoftware paint.net 4.0.3??P
IDATWc(s[ ?s?? IEND?B`?

Your output is binary data, so you should write it to a binary file, not a text file. Create your output simply like this:
OutputStream out = new FileOutputStream("output.png");
I named your output with extension .png because your expected output looks like a PNG image.
And write bytes to your output like this:
for(int i = 0; i <= input.length() - 8; i += 8) {
int k = Integer.parseInt(input.substring(i, i+8), 2);
out.write(k);
}
Improvements:
You don't have to remove the spaces from your input string; you can just skip them when iterating over the String:
// If spaces are not removed:
for(int i = 0; i + 8 <= input.length(); i += 9) { // NOTE THE +9; the last group needs no trailing space
int k = Integer.parseInt(input.substring(i, i+8), 2); // AND STILL +8 HERE
out.write(k);
}
Furthermore, you don't even need to read the whole input file into memory: you can read it in chunks of 8 or 9 characters, since each chunk yields one byte of the output file.
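The byte-oriented approach above can be sketched as a small self-contained program. The class and method names (BinaryTextDecoder, decode) are made up for illustration, and the sketch assumes the groups are separated by whitespace and each holds at most 8 binary digits:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class BinaryTextDecoder {
    // Turns "01010000 01001110 ..." into the raw bytes those groups describe.
    static byte[] decode(String bits) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String token : bits.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only when the input is blank
            }
            // write(int) stores only the low 8 bits, i.e. exactly one byte
            out.write(Integer.parseInt(token, 2));
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] bytes = decode("01010000 01001110 01000111");
        System.out.println(new String(bytes, StandardCharsets.US_ASCII)); // prints PNG
    }
}
```

The resulting array can then be written unchanged with Files.write(Paths.get("output.png"), bytes), so no character encoding is involved at any point.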

It seems that you have a problem with your input.
You expect your output to contain the character ¾. Looking at the input for that character, I found:
The integer value for ¾ is decimal 190 (hex BE), which is 10111110 in binary.
The value in your input that corresponds to that place is 63 (00111111), which is 3F in hex. U+003F is the QUESTION MARK character (?), so the problem lies within your input.
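How the 00111111 groups got into the input is not stated in the question. One plausible cause, shown here purely as an assumption, is that the original data passed through a character encoding that cannot represent values above 127. The class name LossyEncodingDemo is hypothetical:

```java
import java.nio.charset.StandardCharsets;

class LossyEncodingDemo {
    // Encodes the string as US-ASCII and returns its first byte as 0..255.
    static int firstByte(String s) {
        return s.getBytes(StandardCharsets.US_ASCII)[0] & 0xFF;
    }

    public static void main(String[] args) {
        // U+00BE (¾) has no ASCII encoding, so the encoder substitutes '?'
        System.out.println(firstByte("\u00BE")); // prints 63
        // The value that should have been in the input instead:
        System.out.println(Integer.toBinaryString(0xBE)); // prints 10111110
    }
}
```

If the input was produced this way, the information is already lost before the decoder runs, and the binary text has to be regenerated from the original file's bytes.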

Related

Bitwise operator with negative numbers

Why does..
-23&30 = 8
5&-3 = 5
15&-1 = 15
I understand & with positive numbers but for some reason when a negative number is thrown, I don't understand how the answer is derived.

You should read about the two's complement method of representing negative numbers in binary.
For example:
5 == 00000000 00000000 00000000 00000101
&
-3 == 11111111 11111111 11111111 11111101
= -----------------------------------
5 == 00000000 00000000 00000000 00000101
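The same reasoning can be checked for all three expressions from the question. This is a small sketch; the class name NegativeAndDemo and the helper bits are made up for illustration:

```java
class NegativeAndDemo {
    // Full 32-bit two's complement representation, padded with leading zeros.
    static String bits(int v) {
        return String.format("%32s", Integer.toBinaryString(v)).replace(' ', '0');
    }

    public static void main(String[] args) {
        System.out.println(bits(-23));
        System.out.println(bits(30));
        System.out.println(-23 & 30); // prints 8
        System.out.println(5 & -3);   // prints 5
        System.out.println(15 & -1);  // prints 15
    }
}
```

Since -1 is all ones in two's complement, AND-ing any value with -1 leaves it unchanged, which is why 15 & -1 is 15.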

JAVA Bitwise code purpose , &

// following code prints out Letters aA bB cC dD eE ....
class UpCase {
public static void main(String args[]) {
char ch;
for(int i = 0; i < 10; i++) {
ch = (char)('a' + i);
System.out.print(ch);
ch = (char)((int) ch & 66503);
System.out.print(ch + " ");
}
}
}
I'm still learning Java and struggling to understand bitwise operations. Both programs work, but I don't understand the binary reasoning behind them. Why is ch cast to int and then back to char, and what does 66503 do that makes it print the other letter case?
//following code displays bits within a byte
class Showbits {
public static void main(String args[]) {
int t;
byte val;
val = 123;
for(t = 128; t > 0; t = t/2) {
if((val & t) != 0)
System.out.print("1 ");
else System.out.print("0 ");
}
}
}
//output is 0 1 1 1 1 0 1 1
For this code's output, what is the step-by-step breakdown? If 123 is 01111011, and 128 as well as 64 and 32 are 10000000, shouldn't the output be 00000000, since & turns anything AND-ed with 0 into 0? Really confused.

Second piece of code(Showbits):
The code is actually converting decimal to binary. The algorithm uses some bit magic, mainly the AND(&) operator.
Consider the number 123 = 01111011 and 128 = 10000000. When we AND them together, we get 0 or a non-zero number depending whether the 1 in 128 is AND-ed with a 1 or a 0.
10000000
& 01111011
----------
00000000
In this case, the answer is a 0 and we have the first bit as 0.
Moving forward, we take 64 = 01000000 and, AND it with 123. Notice the shift of the 1 rightwards.
01000000
& 01111011
----------
01000000
AND-ing with 123 produces a non-zero number this time, and the second bit is 1. This procedure is repeated.
First piece of code(UpCase):
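Here the intended mask is 65503, the 16-bit complement of 32 (the 66503 in the question looks like a typo; it only happens to give the right result for a through g).
32 = 0000 0000 0010 0000
~32 = 1111 1111 1101 1111
Essentially, AND-ing with the complement of 32 clears the bit worth 32, which subtracts 32 from a lowercase letter's code. In ASCII, subtracting 32 from a lowercase letter gives the corresponding uppercase letter.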
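A minimal sketch of the masking trick, assuming the input is an ASCII lowercase letter (it is not a general-purpose replacement for Character.toUpperCase). The class name CaseMaskDemo and the method toUpper are made up for illustration:

```java
class CaseMaskDemo {
    // Clears the bit worth 32; valid only for ASCII letters a..z.
    static char toUpper(char ch) {
        return (char) (ch & ~32);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++) {
            char ch = (char) ('a' + i);
            System.out.print("" + ch + toUpper(ch) + " ");
        }
        System.out.println();
        System.out.println((char) ('h' & 65503)); // prints H
        System.out.println((char) ('h' & 66503)); // prints @, the typo clears the wrong bits
    }
}
```

The cast back to char is needed because & promotes both operands to int, so the result of the expression is an int.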

UpCase
The decimal number 66503 represented by a 32 bit signed integer is 00000000 00000001 00000011 11000111 in binary.
The ASCII letter a, stored in a 16-bit Java char, is 00000000 01100001 in binary (97 in decimal).
Widening the char to a 32-bit signed integer gives 00000000 00000000 00000000 01100001.
&ing the two integers together gives:
00000000 00000000 00000000 01100001
00000000 00000001 00000011 11000111
===================================
00000000 00000000 00000000 01000001
which, cast back to char, gives 00000000 01000001, which is decimal 65, the ASCII letter A.
Showbits
No idea why you think that 128, 64 and 32 are all 10000000. They obviously can't be the same number, since they are, well, different numbers. 10000000 is 128 in decimal.
What the for loop does is start at 128 and go through every consecutive next smallest power of 2: 64, 32, 16, 8, 4, 2 and 1.
These are the following binary numbers:
128: 10000000
64: 01000000
32: 00100000
16: 00010000
8: 00001000
4: 00000100
2: 00000010
1: 00000001
So in each loop it &s the given value together with each of these numbers, printing "0 " when the result is 0, and "1 " otherwise.
Example:
val is 123, which is 01111011.
So the loop will look like this:
128: 10000000 & 01111011 = 00000000 -> prints "0 "
64: 01000000 & 01111011 = 01000000 -> prints "1 "
32: 00100000 & 01111011 = 00100000 -> prints "1 "
16: 00010000 & 01111011 = 00010000 -> prints "1 "
8: 00001000 & 01111011 = 00001000 -> prints "1 "
4: 00000100 & 01111011 = 00000000 -> prints "0 "
2: 00000010 & 01111011 = 00000010 -> prints "1 "
1: 00000001 & 01111011 = 00000001 -> prints "1 "
Thus the final output is "0 1 1 1 1 0 1 1", which is exactly right.

understanding 3>>2 and -3>>2 results in Java

When I run this code:
public class OperateDemo18{
public static void main(String args[]){
int x = 3 ; // 00000000 00000000 00000000 00000011
int y = -3 ; // 11111111 11111111 11111111 11111101
System.out.println(x>>2) ;
System.out.println(y>>2) ;
}
};
I get output as:
x>>2 is 0
y>>2 is -1
As my understanding, since int x = 3, x>>2 is equal to (3/2)/2 which is 0.75, to integer, x>>2 is 0.
But I don't understand why for int y = -3, y>>2 is -1. Could anyone please explain it?

As my understanding, since int x = 3, x>>2 is equal to (3/2)/2 which is 0.75, to integer, x>>2 is 0.
That's not entirely true; >> is a bit-shift operation, nothing else. On non-negative integers, its effect is the same as integer division by a power of two, yes. But for negative integers, it's not:
You conveniently supplied the binary form of y == -3 yourself:
11111111 11111111 11111111 11111101
let's bitshift that right by two!
y == 11111111 11111111 11111111 11111101
y>>2== xx111111 11111111 11111111 11111111
Now, what do you fill in for x?
Java, like most reasonable languages, sign-extends, i.e. it fills in copies of the original highest (leftmost) bit:
y == 11111111 11111111 11111111 11111101
y>>2== 11111111 11111111 11111111 11111111
It isn't hard to see that this is the "biggest" negative integer (remember, negative integers are represented as "two's complement"!), i.e. -1.
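The difference between shifting and dividing can be seen side by side. This is a small sketch; the class name ShiftDemo and its helper methods are made up for illustration:

```java
class ShiftDemo {
    // Arithmetic shift: fills from the left with copies of the sign bit.
    static int signedShift(int v, int n) {
        return v >> n;
    }

    // Logical shift: always fills from the left with zeros.
    static int unsignedShift(int v, int n) {
        return v >>> n;
    }

    public static void main(String[] args) {
        System.out.println(signedShift(3, 2));       // prints 0
        System.out.println(signedShift(-3, 2));      // prints -1
        System.out.println(unsignedShift(-3, 2));    // prints 1073741823
        System.out.println(-3 / 4);                  // prints 0, division rounds toward zero
        System.out.println(Math.floorDiv(-3, 4));    // prints -1, rounds toward negative infinity
    }
}
```

So for negative values, >> behaves like Math.floorDiv by a power of two rather than like the / operator.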

The >> operator shifts the bits to the right and fills the vacated positions on the left with copies of the most significant (leftmost) bit. Therefore, in the case of:
00000000 00000000 00000000 00000011
it becomes:
00000000 00000000 00000000 00000000
and in case of:
11111111 11111111 11111111 11111101
it becomes:
11111111 11111111 11111111 11111111

Java8 unsigned arithmetic

Java 8 is widely reported to have library support for unsigned integers. However, there seem to be no articles explaining how to use it and how much is possible.
Some functions like Integer.compareUnsigned are easy enough to find and seem to do what one would expect. However, I fail to write even a simple loop that loops over all powers of two within the range of unsigned long.
int i = 0;
for(long l=1; (Long.compareUnsigned(l, Long.MAX_VALUE*2) < 0) && i<100; l+=l) {
System.out.println(l);
i++;
}
produces the output
1
2
4
8
...
1152921504606846976
2305843009213693952
4611686018427387904
-9223372036854775808
0
0
0
...
0
Am I missing something or are external libraries still required for this simple task?

If you're referring to
(Long.compareUnsigned(l, Long.MAX_VALUE*2) < 0)
l reaches
-9223372036854775808
unsigned it is
9223372036854775808
and
Long.MAX_VALUE*2
is
18446744073709551614
So l is smaller than Long.MAX_VALUE*2 in the unsigned world.
Assuming you're asking about the 0's
0
0
0
...
0
the problem (if you see it that way) is that, for long (as for the other signed integer primitives), the first bit is the sign bit.
so
10000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
is
-9223372036854775808
When you do
-9223372036854775808 + -9223372036854775808
the addition overflows, since
10000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
+ 10000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
is
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
which is 0. On later loop iterations, 0 + 0 remains 0.

One problem here is that you're printing l as a signed integer. You can use Long.toUnsignedString to get the results you're expecting:
int i = 0;
for(long l=1; (Long.compareUnsigned(l, Long.MAX_VALUE*2) < 0) && i<100; l+=l) {
System.out.println(Long.toUnsignedString(l)); // <== MODIFIED THIS LINE
i++;
}
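To visit every power of two in the unsigned long range exactly once and stop before the zeros appear, one option is to end the loop when the doubling wraps around to 0. This is a sketch rather than the only way to do it; the class name UnsignedPowersOfTwo and the method powers are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

class UnsignedPowersOfTwo {
    // Collects 2^0 .. 2^63 as unsigned decimal strings.
    static List<String> powers() {
        List<String> result = new ArrayList<>();
        // Doubling 2^63 overflows to 0, which ends the loop.
        for (long l = 1; l != 0; l += l) {
            result.add(Long.toUnsignedString(l));
        }
        return result;
    }

    public static void main(String[] args) {
        for (String p : powers()) {
            System.out.println(p);
        }
    }
}
```

The last value printed is 9223372036854775808, which is the bit pattern that a signed print shows as -9223372036854775808.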

count leading zeros (clz) or number of leading zeros (nlz) in Java

I need int 32 in binary as 00100000 or int 127 in binary as 0111 1111.
Integer.toBinaryString only returns the digits starting from the first 1, so the leading zeros are dropped.
If I build the for loop this way:
for (int i = 32; i <= 127; i++) {
System.out.println (i);
System.out.println (Integer.toBinaryString (i));
}
From the binary numbers I also need the number of leading zeros (count leading zeros (clz) or number of leading zeros (nlz)). I mean the exact number of 0s, e.g. 00100000 -> 2 and 0111 1111 -> 1.

How about
int lz = Integer.numberOfLeadingZeros(i & 0xFF) - 24;
int tz = Integer.numberOfTrailingZeros(i | 0x100); // max is 8.
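Put together with zero-padded output, this might look as follows. It is a sketch that assumes values are treated as 8-bit (only the low 8 bits are used); the class name LeadingZeros8 and its methods are made up for illustration:

```java
class LeadingZeros8 {
    // Binary form of the low 8 bits, padded with leading zeros to width 8.
    static String toBinary8(int i) {
        return String.format("%8s", Integer.toBinaryString(i & 0xFF)).replace(' ', '0');
    }

    // Leading zeros within 8 bits: the 32-bit count minus the 24 unused high bits.
    static int leadingZeros8(int i) {
        return Integer.numberOfLeadingZeros(i & 0xFF) - 24;
    }

    public static void main(String[] args) {
        System.out.println(toBinary8(32) + " -> " + leadingZeros8(32));   // prints 00100000 -> 2
        System.out.println(toBinary8(127) + " -> " + leadingZeros8(127)); // prints 01111111 -> 1
    }
}
```

For an input of 0 this returns 8, since all eight bits are zero.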

Count the number of leading zeros as follows:
int lz = 8;
while (i != 0)
{
lz--;
i >>>= 1;
}
Of course, this supposes the number doesn't exceed 255, otherwise, you would get negative results.

An efficient solution is int ans = 8 - (log2(x) + 1), where log2(x) is rounded down and x > 0.
You can calculate log2(x) = logy(x) / logy(2) for any base y.

public class UtilsInt {
int leadingZerosInt(int i) {
return leadingZeros(i,Integer.SIZE);
}
/**
* use recursion to find occurence of first set bit
* rotate right by one bit & adjust complement
* check if rotate value is not zero if so stop counting/recursion
* @param i - integer to check
* @param maxBitsCount - size of type (in this case int)
* if we want to check only for:
* positive values we can set this to Integer.SIZE / 2
* (as int is signed in java - positive values are in L16 bits)
*/
private synchronized int leadingZeros(int i, int maxBitsCount) {
try {
logger.debug("checking if bit: "+ maxBitsCount
+ " is set | " + UtilsInt.intToString(i,8));
return (i >>>= 1) != 0 ? leadingZeros(i, --maxBitsCount) : maxBitsCount;
} finally {
if(i==0) logger.debug("bits in this integer from: " + --maxBitsCount
+ " up to last are not set (i'm counting from msb->lsb)");
}
}
}
test statement:
int leadingZeros = new UtilsInt().leadingZerosInt(255);
test output:
checking if bit: 32 is set |00000000 00000000 00000000 11111111
checking if bit: 31 is set |00000000 00000000 00000000 01111111
checking if bit: 30 is set |00000000 00000000 00000000 00111111
checking if bit: 29 is set |00000000 00000000 00000000 00011111
checking if bit: 28 is set |00000000 00000000 00000000 00001111
checking if bit: 27 is set |00000000 00000000 00000000 00000111
checking if bit: 26 is set |00000000 00000000 00000000 00000011
checking if bit: 25 is set |00000000 00000000 00000000 00000001
bits in this integer from: 24 up to last are not set (i'm counting from msb->lsb)
