How to transplant the java MD5 encrypt code into Python? - java

Here is the standard code that java use to do the MD5 encryption for a string,
import java.security.MessageDigest;
public class TransCode{
public static byte[] transcode(String text) throws Exception{
byte[] bytes = text.getBytes("UTF-8");
return bytes;
}
public static void main(String[] args){
try{
System.out.println("ORIGIN STRING: hello world");
byte[] byteArray = TransCode.transcode("hello world");
int arrLen = byteArray.length;
MessageDigest messageDigest = MessageDigest.getInstance("MD5");
messageDigest.update(byteArray);
System.out.print("BYTE ARRAY: ");
for (int i=0;i<arrLen;i++){
System.out.print(byteArray[i]);
System.out.print(" ");
}
System.out.println();
byteArray = messageDigest.digest();
System.out.print("MD5 RESULT: ");
for (int i=0;i<arrLen;i++){
System.out.print(byteArray[i]);
System.out.print(" ");
}
System.out.print("\n");
}
catch(Exception ex){
System.out.println(ex);
}
}
}
The result is as follows,
ORIGIN STRING: hello world
BYTE ARRAY: 104 101 108 108 111 32 119 111 114 108 100
MD5 RESULT: 94 -74 59 -69 -32 30 -18 -48 -109 -53 34
so how to transplant this standard java code into python and get the same result, here is the code i wrote that has bugs:
# -*- coding: utf-8 -*-
from hashlib import md5
def MD5Digest(text):
byteList = []
for item in text:
md=md5()
unit = str(ord(item))
print unit,
md.update(unit)
byteList.append(md.hexdigest())
print
return byteList
print MD5Digest(u"hello world".encode("UTF8"))
Result is as follows,
104 101 108 108 111 32 119 111 114 108 100
['c9e1074f5b3f9fc8ea15d152add07294', '38b3eff8baf56627478ec76a704e9b52', 'a3c65c2974270fd093ee8a9bf8ae7d0b', 'a3c65c2974270fd093ee8a9bf8ae7d0b', '698d51a19d8a121ce581499d7b701668', '6364d3f0f495b6ab9dcf8d3b5c6e0b01', '07e1cd7dca89a1678042477183b7ac3f', '698d51a19d8a121ce581499d7b701668', '5fd0b37cd7dbbb00f97ba6ce92bf5add', 'a3c65c2974270fd093ee8a9bf8ae7d0b', 'f899139df5e1059396431415e770c6dd']
You can see although the byte code results seem to be the same, the MD5 encrypt results are totally different.
PLEASE help me to rewrite my python code to debug, thanks a lot.
Due to the project reason, the solution to just reference this piece of java code in my python project is not actually allowed, please understand.

There are a few issues, first you calculate the hexdigest instead of the digest. Second you calculate the hexdigest inside the loop instead of outside.
Instead it should look like this:
# Python 3
from hashlib import md5
def MD5Digest(text):
digest = md5(text).digest()
print('ORIGIN STRING:', text)
print('BYTE ARRAY:', *text)
print('MD5 RESULT:', *digest)
return digest
# Python 2
from hashlib import md5
def MD5Digest(text):
digest = md5(text).digest()
print 'ORIGIN STRING:', text
print 'BYTE ARRAY:', " ".join(str(ord(c)) for c in text)
print 'MD5 RESULT:', " ".join(str(ord(c)) for c in digest)
return digest
This outputs
>>> MD5Digest(b'hello world')
ORIGIN STRING: b'hello world'
BYTE ARRAY: 104 101 108 108 111 32 119 111 114 108 100
MD5 RESULT: 94 182 59 187 224 30 238 208 147 203 34 187 143 90 205 195
The only difference is the signage, just remember the numbers are equal mod 256.

Related

How to generate base64 string from Java to C#?

I am trying to convert a Java function in C#. Here is the original code:
class SecureRandomString {
private static SecureRandom random = new SecureRandom();
private static Base64.Encoder encoder = Base64.getUrlEncoder().withoutPadding();
public static String generate(String seed) {
byte[] buffer;
if (seed == null) {
buffer = new byte[20];
random.nextBytes(buffer);
}
else {
buffer = seed.getBytes();
}
return encoder.encodeToString(buffer);
}
}
And here is what I did in C#:
public class Program
{
private static readonly Random random = new Random();
public static string Generate(string seed = null)
{
byte[] buffer;
if (seed == null)
{
buffer = new byte[20];
random.NextBytes(buffer);
}
else
{
buffer = Encoding.UTF8.GetBytes(seed);
}
return System.Web.HttpUtility.UrlPathEncode(RemovePadding(Convert.ToBase64String(buffer)));
}
private static string RemovePadding(string s) => s.TrimEnd('=');
}
I wrote some testcases:
Assert(Generate("a"), "YQ");
Assert(Generate("ab"), "YWI");
Assert(Generate("abc"), "YWJj");
Assert(Generate("abcd"), "YWJjZA");
Assert(Generate("abcd?"), "YWJjZD8");
Assert(Generate("test wewqe_%we()21-3012"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTI");
Assert(Generate("test wewqe_%we()21-3012_"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTJf");
Assert(Generate("test wewqe_%we()21-3012/"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTIv");
Assert(Generate("test wewqe_%we()21-3012!"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTIh");
Assert(Generate("test wewqe_%we()21-3012a?"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTJhPw");`
And everything works fine, until I try the following one:
Assert(Generate("test wewqe_%we()21-3012?"), "dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTI_");
My code output dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTI/ instead of the expected dGVzdCB3ZXdxZV8ld2UoKTIxLTMwMTI_. Why?
I think that the culprit is the encoder. The original code configure its encoder like this Base64.getUrlEncoder().withoutPadding(). The withoutPadding() is basically a TrimEnd("=") but I am not sure how to code the getUrlEncoder().
I looked into this handy conversion table URL Encoding using C# without finding nothing for my case.
I tried HttpUtility.UrlEncode but the output is not right.
What did I missed?
According to Oracle documentation, here is what getUrlEncoder() does:
Returns a Base64.Encoder that encodes using the URL and Filename safe type base64 encoding scheme.
Alright what is "URL and Filename safe". Once more the documenation is helping:
Uses the "URL and Filename safe Base64 Alphabet" as specified in Table 2 of RFC 4648 for encoding and decoding. The encoder does not add any line feed (line separator) character. The decoder rejects data that contains characters outside the base64 alphabet.
We can now look online for the RFC 4648. Here is the Table 2:
Table 2: The "URL and Filename safe" Base 64 Alphabet
Value Encoding Value Encoding Value Encoding Value Encoding
0 A 17 R 34 i 51 z
1 B 18 S 35 j 52 0
2 C 19 T 36 k 53 1
3 D 20 U 37 l 54 2
4 E 21 V 38 m 55 3
5 F 22 W 39 n 56 4
6 G 23 X 40 o 57 5
7 H 24 Y 41 p 58 6
8 I 25 Z 42 q 59 7
9 J 26 a 43 r 60 8
10 K 27 b 44 s 61 9
11 L 28 c 45 t 62 - (minus)
12 M 29 d 46 u 63 _
13 N 30 e 47 v (underline)
14 O 31 f 48 w
15 P 32 g 49 x
16 Q 33 h 50 y (pad) =
It is an encoding table. For example given 0 should output A, given 42 should ouput q, etc.
Let's check the decoding table, the Table 1:
Table 1: The Base 64 Alphabet
Value Encoding Value Encoding Value Encoding Value Encoding
0 A 17 R 34 i 51 z
1 B 18 S 35 j 52 0
2 C 19 T 36 k 53 1
3 D 20 U 37 l 54 2
4 E 21 V 38 m 55 3
5 F 22 W 39 n 56 4
6 G 23 X 40 o 57 5
7 H 24 Y 41 p 58 6
8 I 25 Z 42 q 59 7
9 J 26 a 43 r 60 8
10 K 27 b 44 s 61 9
11 L 28 c 45 t 62 +
12 M 29 d 46 u 63 /
13 N 30 e 47 v
14 O 31 f 48 w (pad) =
15 P 32 g 49 x
16 Q 33 h 50 y
Note that both table are strictly equals minus two things:
'+' is encoded to '-'
'/' is encoded to '_'
You should be able to fix your problem with:
private static string Encode(string s) => s.Replace("+", "-").Replace("/", "_");

How to convert a byte into a String that displays its hex value (0x0e -> "0E") [duplicate]

This question already has answers here:
How to convert byte array to hex format in Java
(4 answers)
Closed 2 years ago.
So i have a piece of code, which converted byte[] from a binary file into int[] which then went on to be converted to String.
Example: 02 09 1A would translate to "2926" using String.valueOf(byte);
But this is where the fun begins: the old Code already split up the array before it became String so it didn't matter. Now i have code which needs to somehow figure out if there is a 2 9 or 29... How can i get the String to
- stay in Hex instead of "switching" to decimal and
- keep the zeros in the string
so i can always grab the next two chars, figure out which info they display and go on?
Input: 05 06 1D 11 07 08 01 32 21 28 2F 20 2E 21 34 22 25 33 01 02 09 0A FF 0B 0C 2!(/ .!4"%3
ÿ
Current Output: 562917781503340473246335234375112910-11112
Expected Output: 05061D110708013221282F202E21342225330102090AFF0B0C
(Didn't put my old code in here since it's based on int which is crap)
If I understand correctly, you have a byte[] as input, and you want to output a string that is all those bytes, in hex form, joined together:
private static String byteArrayToString(byte[] byteArray) {
StringBuilder builder = new StringBuilder();
for (byte b : byteArray) {
int i = Byte.toUnsignedInt(b);
if (i < 16) {
builder.append("0");
}
builder.append(Integer.toHexString(i).toUpperCase());
}
return builder.toString();
}
Note the use of Byte.toUnsignedInt. This converts things like -1 the byte to the int 0xff. This is required because Java's bytes are signed.
Also note where I pad with 0. I only do this if the byte is a single digit in hex. i < 16 is the condition to check this.
Another way:
char[] hexVals = {'0', '1','2','3','4','5','6','7','8','9','A', 'B','C','D','E','F'};
byte[] bytes = {5, 6, (byte)0xAB, (byte)0xde}; // sample input
char[] output = new char[bytes.length*2];
// split each byte's hex digits into two chars
for(int i=0; i < bytes.length; i++) {
output[2*i] = hexVals[(bytes[i] >> 4) & 0xf]; // HO nibble
output[2*i+1] = hexVals[bytes[i] & 0xf]; // LO nibble
}
System.out.println(new String(output)); // 0506ABDE

TCP/IP client incorrectly reading inputstream byte array

I'm creating a Java Client program that sends a command to server and server sends back an acknowledgement and a response string.
The response is sent back in this manner
client -> server : cmd_string
server -> client : ack_msg(06)
server -> client : response_msg
Client code
public static void readStream(InputStream in) {
byte[] messageByte = new byte[20];// assuming mug size -need to
// know eact msg size ?
boolean end = false;
String dataString = "";
int bytesRead = 0;
try {
DataInputStream in1 = new DataInputStream(in);
// while ctr==2 todo 2 streams
int ctr = 0;
while (ctr < 2) {//counter 2 if ACK if NAK ctr=1 todo
bytesRead = in1.read(messageByte);
if (bytesRead > -1) {
ctr++;
}
dataString += new String(messageByte, 0, bytesRead);
System.out.println("\ninput byte arr "+ctr);
for (byte b : messageByte) {
char c=(char)b;
System.out.print(" "+b);
}
}
System.out.println("MESSAGE: " + dataString + "\n bytesread " + bytesRead + " msg length "
+ dataString.length() + "\n");
char[] chars = dataString.toCharArray();
ArrayList<String> hex=new ArrayList<>();
// int[] msg ;
for (int i = 0; i < chars.length; i++) {
int val = (int) chars[i];
System.out.print(" " + val);
hex.add(String.format("%04x", val));
}
System.out.println("\n"+hex);
} catch (Exception e) {
e.printStackTrace();
}
// ===
}
Output
client Socket created ..
response:
input byte arr 1
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
input byte arr 2
2 -77 67 79 -77 48 -77 3 -116 0 0 0 0 0 0 0 0 0 0 0
MESSAGE: ##³CO³0³##
(where # is some not supported special character )
bytesread 9 msg length 10
dec: 6 2 179 67 79 179 48 179 3 338
hex: [0006, 0002, 00b3, 0043, 004f, 00b3, 0030, 00b3, 0003, 0152]
bytes: 2 -77 67 79 -77 48 -77 3 -116 0 0 0 0 0 0 0 0 0 0 0 (bytes recieved in 2nd packet)
connection closed
Problem: I'm reading the last value incorrect, I have verified using wireshark the server has sent back the response as 06 02 b3 43
4f b3 30 b3 03 8c
Some how I'm reading the last value in correctly. Is there some issue with the reading stream?
EDIT
Rest of the response is read correctly but the last character should be 8c But is read as 0152Hex
Response from server : 06 02 b3 43 4f b3 30 b3 03 8c
Read by program : [0006, 0002, 00b3, 0043, 004f, 00b3, 0030, 00b3, 0003, 0152]
issue with reading the last character
EDIT 2
Response is received as 2 packets/streams
packet 1 byte arr : 6 (ACK)
packet 2 byte arr: 2 -77 67 79 -77 48 -77 3 -116 (response)
complete response read by client
dec: 6 2 179 67 79 179 48 179 3 338
hex: [0006, 0002, 00b3, 0043, 004f, 00b3, 0030, 00b3, 0003, 0152]
Thanks
The problem in this question was a matter of signed variables versus unsigned variables. When you have a number in computer memory, it is represented by a bunch of bits, each of them 0 or 1. Bytes are generally 8 bits, shorts are 16 etc. In an unsigned number, 8 bits will get you from positive 0 to 255, but not to negative numbers.
This is where signed numbers come in. In a signed number, the first bit tells you whether the following bits represent a negative or positive value. So now you can use 8 bits to represent -128 to +127. (Notice that the positive range is halved, from 255 to 127, because you "sacrifice" half of your range to the negative numbers).
So now what happens if you convert signed to unsigned? Depending on how you do it, things can go wrong. In the problem above, the code char c=(char)b; was converting a signed byte to an unsigned char. The proper way to do this is to "make your byte unsigned" before converting it to a char. You can do that like this: char c=(char)(b&0xFF); more info on casting a byte here.
Essentially, you can just remember that except for char, all java numbers are signed, and all you need to do is paste the &0xFF to make it work for a byte, 0xFFFF to make it work for a short, etc.
The details about why this works are as follows. Calling & means a bitwise and, and 0xFF is hexadecimal for 255. 255 is above the limit of a signed byte (127), so the number b&0xFF gets upgraded to a short by java. However, the short signed bit is on bit 16, while the byte signed bit is on bit 8. So now the byte signed bit becomes a normal 'data' bit in the short, and so the sign of your byte is essentially discarded.
If you do a normal cast, java recognizes that doing direct bitconversion like above would mean that you lose the sign, and java expects you don't like that (at least, that is my assumption), so it preserves the sign for you. So if this isn't what you want, you explicitly tell java what to do.

Encoding and Decoding bin file

I have seen so many examples online but they are not exactly what Im looking for.
I have a .bin file that I'd like to POST to a server, so I am encoding and storing it as a string.
However, decoding it doesn't give the same result.
import org.apache.commons.codec.binary.Base64;
File file = getFileStreamPath(ENROLL_FILENAME);
byte[] bytes = loadFile(file);
byte[] encoded = Base64.encodeBase64(bytes);
for(int i =0; i< encoded.length; i++){
System.out.print(encoded[i]);
}
System.out.println();
String encodedString = new String(encoded); //send to sever and decode
System.out.println("Encode "+ encodedString);
byte[] decoded = Base64.decodeBase64(encodedString.getBytes());
for(int i =0; i< decoded.length; i++){
System.out.print(decoded[i]);
}
System.out.println();
Result:
System.out: 1147948656688781216567661069850481179071904876109571171018810311782109108117905086121997274112981108285908749119987170489081656565656565656565666511965678710365769951100112904878789087498010050537565651041221005010811081496648991101041191011036565666565656567122119891035265658065115656565686565656511081736565757366656565706565656565656565656565656565661016881656583857712010311971576582896765119491017310369786574119798181658665676580886569866574107788465658765661038082119698965748977756569906573119781148169976580115751026569986577657448103659965736578886569996566697811610369102657273755310365103656548781181036510465658579104656910565777374122119651086565527811281691086572107707165691106565119794365651126573897855656511465651157790656911465746979791196911865668178103816911965795675656569122656669787465691226566107795511965496574737768103694965668179841036550657473798811965516566103808465695365747778108119695365806573104816554656511580102656954656681759010369576566567886119666665737379831197066656765789711966686574103805211966736573111809065707665678979101103668165666980115103708365785677856570886567997970816697657289789011966976573858011381709965767378838166102656510779708166103656910778791196610465651077885817010465686580741036610665801117610265701096568657779103701146575698080656611565678178861196611565668979119103661156565119801001196611665737378107119661166573568097816612065658178978170122656885801121197050657665791081196653657248789065665565655276102119665565731077767103705665686980103817043657673801038167676580103777011971676568858011465717165688579122816772657399785011967776574737810211971776568103791068171806576997953119678265705278571036783656948808910371836568568010311967856573817610981718965699977100119679065805277111657197656973771161036798657377781161036710465801197811410371109656965797665671126566898085816711265756580108657111565778178791037111765777380106656712065657378102817112165698979110119711226577737869656756656777801066567476565103791101036865656565781148168656572737789103726665778980100816867656665805565686965711118011781687065738978120816872657252784881687665727780110816878657356791076572786577857976119688165655280114817281656969767711972876570111809711968906573103751011196890657369768481689965731157811365689965721118010211972996577107808010372104657873808511968106657277781081196810665728579116103721086574487376103681096567111798811968109657211177801036811265714879701196811365688980891197211765781038010110368118657156771076572118657077751001197248657811579731036851657552797910368536579107791141197257656910373821036843658069797065724765796580578165676510185801066569686585697487816971651018179901196972658611179761036574651004879114816977651006977688269817268798174105756874476571675081737353717976897773778950101661211017370876968701201077373101479067431225273759771741127765767454877010184577010275707110654677250107756897721197086101657110611212272821005267112113746811643677411652767511848536757514370118102827283103118721196711567854912170651041227088905572104864966121575269901176570102100557371538267113785169111685375727567771111211196890566770739772708011778705478556989885170887170757211452765284546749115676810310284718111711772117768367844379691021197471711076672110111686811770436911810772101103656566656589897310710410555998210880778212210210386865265985210211577995111578725185808680818510688107751138810777981118578731025287104102111821095265837551738811110210399871185285103111698853102115891064781978172527653781157987105488783675689111102567870731157910010248825411056874711989826611948875211985888965859776894875111111998012165119825489698782811037956103778282811038797801159065735279691046582568811110347471111031147211178120748982105736582545289107110537773771107785103515697565181995065738973656989978899986857119101786752901218011199547277109438077115497389768012185875347859079867398971218179112471078372895287886511597535211978501091117911111865856511711110174474873748948881084710789651191199899801118210573658566899985986689887612077871221021031011191105610751881071081
System.out: Encode rO0ABXNyACBjb20uZGZ0Lm9ueXguRmluZ2VycHJpbnRUZW1wbGF0ZQAAAAAAAAABAwACWgALc3dpZ0NNZW1Pd25KAAhzd2lnQ1B0cnhwegAABAAAACzwYg4AAPAsAAADAAAAnQIAAKIBAAAFAAAAAAAAAAAAAABeDQAASUMxgwG9ARYCAw1eIgENAJwOQQAVACAPXAEVAJkNTAAWABgPRwEYAJYMKAEZAIwNrQEaAPsKfAEbAMAJ0gAcAIANXAEcABENtgEfAHIK5gAgAA0NvgAhAAUOhAEiAMIJzwAlAA4NpQElAHkFGAEnAAwO+AApAIYN7AArAAsMZAErAJEOOwEvABQNgQEwAO8KAAEzABENJAEzABkO7wA1AJIMDgE1ABQOTgA2AJIOXwA3ABgPTAE5AJMNlwE5APAIhQA6AAsPfAE6ABQKZgE9AB8NVwBBAIIOSwFBACANawBDAJgP4wBIAIoPZAFLACYOegBQABEPsgFSAN8MUAFXACcOFQBaAHYNZwBaAIUPqQFcALINSQBfAAkOFQBgAEkNOwBhAAkNUQFhADAPJgBjAPoLfAFmADAMOgFrAKEPPABsACQNVwBsABYOwgBsAAwPdwBtAIINkwBtAI8PaQBxAAQNaQFzADUPpwF2ALAOlwB5AH0NZAB7AA4LfwB7AIkMCgF8ADEPgQF+ALIPgQCCAPgMFwGCADUPrAGGADUOzQCHAIcN2wCMAJINfwGMADgOjQGPALcO5wCRAF4N9gCSAE0PYgGSAD8PgwCUAIQLmQGYAEcMdwCZAP4MoAGaAEIMtgCbAIMNtgChAPwNrgGmAEAOLACpABYPUQCpAKAPlAGsAMQNOgGuAMIPjACxAAINfQGyAEYOnwGzAMINEAC8ACMPjAC/AAgOngDAAAANrQDAAHIMYgHBAMYPdQDCABAP7ADEAGoPuQDFAIYNxQDHAH4N0QDLAHMPnQDNAI8OkAHNAMUOLwDQAA4PrQHQAEELMwHWAFoPawDZAIgKewDZAIELTQDcAIsNqADcAHoPfwHcAMkPPgHhANIPUwDjAHMNlwDjAHUOtgHlAJ0ILgDmACoOXwDmAHoMPgDpAG0OFwDqADYPYwHuANgPegDvAG8MkAHvAFMKdwH0ANsOIgD3AK4OOgD5AOkOrwH9AEgIRgD+APEOFAH/AOAP9QACAeUPjAEDAUEJWQEGAeQOZwEHAVoOLgAJAd0OrQEMAdEMDREQHDOQJiKDJ/AGC2QII5GOLYMIMY2eByeIFWEDFxkIIe/ZC+z4IKaGJpMALJ6WFeT9FfKFGj6CH2kKDaHwFVeAGjpzHRd4CpqJDt+CJt4LKv05C93+FvfRHSgvHwCsCU1yFAhzFXZ7HhV1By94EZuAFfd7IG5RCqN3EoD5KHKCMoywDZ8CFIaHFPuNF6N7EYX3FXGFKHr4L4T6C1sCDgfTGQuuHuLSCT+OEfwJGGkBHnoDDuF+EvkHegAABAAYYIkhi7cRlPMRzfgVV4Ab4fsMc3sNH3UPVPQUjXkKqXkMboUNIf4WhfoRm4ASK3IXofgcWv4UgoEX5fsYj/QaQH4L5NsOWi0WSC8Yof8NFIsOdf0R6n8W/wYRBw0W4wUXYAUaLY0KoocPyAwR6YEWRQgO8gMRRQgWaPsZAI4OEhAR8Xog//ogrHoNxJYRiIAR64Ykn5MIMnMUg38a83Qc2AIYIAEYaXcbD9weNC4ZyPoc6HMm+PMs1IYLPyUW5/UZOVIbayQOp/kSHY4WXAsa54wN2moOovAUAuoeJ/0IJY0Xl/kYAwwbcPoRiIAUBYcUbBYXLxMWzfgewn8k3XklzgQOIAcXxP8b6gYdt/YUDHkV24wZQgYds48N/y0lQO8m7S8vVmYKlfcXEAcXQwEeu4YGSi0THnwnEm0rHFIV53QXOIEXwu0YWW8NrNMaEQIne8IqC30O8QAS1+UTlvMecIQG99MRPk8mOEApQyULs+UOcQAV8w8ehvMKDAkZv48dLQoeavsRDbETGoQVgvEYltYXuPsZiH4k8womjfgZznEewXoqR2wtr4ELGBsSPBsY7CobXA4TdRITCQ0UjQgYSJEdpIUpmXot6I8v830Pc34QKfkXLIoYZHQNJBIYWAYcTpcccQ0Lz/YPcYITB+4XOX8XMwUaXYMe5QUglPYRBHsak/4kMfklGG4Ntu4Yc4UYgPQiXfkLRQoUFfgYWIwbcoUHLJEQogcY9AwYiYUOYfwereQpudMywp8OFHoV9fsZhoIaYH0HPW8SxHQXNnYYeHsO3QQluegqM5Qyv9cR/4UXofMarO4gifUPB/UcCCkdqXseXAQN/jQPfAsc90UoXg8ODoYOUYEdWvEewPkOUn8P43AVcAUW33UNsswPJREcsdcpDC0Ptu8cvLseERwlIRgeOQcgCgolKo0qZfkNbIYSuIwZtvseMIIHRQUP05AVg4MZUQcNcnoZUwYfTXUgpfwHyvsSWwIWln4W1IsGPYce4PwmRxEo7fEGRHkg0IojYnUkwXYS3f4Uo34Z2Pkbt3wi0vMlanYnpwg9k4slYIop0KIs+ZAysJIN0wIVhn0WlIIYMn4pBWgs0PwxpYMz+/AOvvoXmgId5PIeMn4YMIIfpoAhYYEiO4QNVf4UoYIZgIAcdXsiRQ0ldRUvl+00hPkOOAYSeggWlHkX8/gNw3EPgI4SAvgXHP4NHuwPjnIVcuMZGvQfpoAo2f0sTAQt+e4XawgYpo8ZwYgdVg4cjPcnk+wz3h0zleQMAQgNihQZBBMbgoYMifgNDwsZjgwkaQQNmvUZl+0cAwkmmQENtI8TFxUV1R0WjYcXHvwdv4QekAMhYn8WtOQdnKEggoMjegAABADR2BdvpB3DfCDtfyoAARBGiCntmDOBejMVlBD4BxJXDhnJeBpoFRB/+RZBEhi1cR4NDAkYGBRYHhUKER7xAxJ2BROs6xzNCB3g9BeaBBkE7hnZdiEg8RBOeCe8+i7V7jCRDAmw6AwX+RaCBhr96xL7+xfW5h6Z9B97BheTXBkvVSdZSigszBBevRWvfBYYHBZ/QQyQBxLl8hWb7xwTCxCbQxBdhBT24hYI+hFuiBdDDCMpKCZbTRW0/Rnifhr49CGRDxbT7hu55xye9R8B+hR1EhnPihmleCLtjhUxAxkCgRlyEhmdiBF2eBWrhBb60RjWxQ2n+BBhfBTxDBZAvw9+dxkEqyLJ9SSxBw0fCBZAhBr+FB5hfh8c+zN7hjSHDkWk/B+XBT9Ve0JEckyCixQH7hrYZiJKcCQ4+Q6fihpsDCOGCS5DEg6pdhkDfxnggiUiGRR99BZLLxqS7CNjsw91iSI6kCh4NClPfsnClkBGrHEx2pwkEL58Oh8BAQFSUCYgEgQAAP8AAACasqW4j7W2riBtAQP1MLMEABgFJ/7DBQAkCSdzBAAOCif/xAcALwse/nHCBAD7Dg9/DABSDxfAd1rASxgAWQ0TwMHBWsH+wT7APf/A/P44IwCwDwNCZFH/R8D+O0cv/cH9Pv5CBQEmEBfBfwMAcBIMwwwAUBYXwHdlWxMAQxggwMJtd8D+wsD//kAFAE4aHsF2BACeGwBsCQFeIBb+p8VQBgCdIQzC/z8DAPwrBsEGAT8uEP7B/cAIAO4vFlhBBAG5Mm0wCAEnMxP/N0AFAA8098D8/hkA4TUPwD7/Qf4uRPxd//1lFAEDNhbAwP7/KcApwD8xBQEmNxf/QiIAYjgWcsJYV/4u/v4hRP/+wMJDYgcAiTgJ/13CBAAuPP1QHQCIPQ9za0H+wC4p/S9GRAsBT0Aa/ktVLBUA3UETW//+wSL9wP7+/v/A/zsGAFNDei/GBwFNRSDB/cA7FgDdTQYr//3AKv/9wP79aMA1BQFmTiL+wP4QAApO/cL8wMMqTv/DPgcAflAM/23DCAAJUwnE//7DMQYAe1QTdMEDAZdVIvwRAVNbIP79S8H+KFH+wAQAY1yDfQQAa1wXjwoADpA3wcPCjG8EAEtgCcL8BAGqYDf/whAAE2RDf8LDwMTAwl1uCAFVZCkvNRkAzGUG/if//fsp///+Vv9zwMDC
System.out: -84-19051151140329911110946100102116461111101211204670105110103101114112114105110116841011091121089711610100000001302900111151191051036777101109791191107408115119105103678011611412011212200400044-16981400-1644003000-99200-94100500000000000941300736749-1251-67122231394341130-100146502103215921210-103137602202415711240-10612401250-11613-831260-5101241270-649-460280-128139212801713-74131011410-2603201313-660330514-1241340-629-4903701413-91137012152413901214-80410-12213-20043011121001430-111145914702013-1271480-17100151017133615102514-170530-110121415302014780540-110149505502415761570-10913-1051570-168-123058011151241580201010216103113870650-1261475165032131070670-10415-290720-118151001750381412208001715-781820-33128018703914210900118131030900-12315-871920-781373095091421096073135909709138119704815380990-6111241102048125811070-95156001080361387010802214-6201080121511901090-12613-10901090-1131510501130413105111505315-8911180-8014-105012101251310001230141112701230-1191210112404915-12711260-7815-1270-1260-812231-12605315-841-12205314-510-1210-12113-370-1160-110131271-11605614-1151-1130-7314-250-11109413-100-11007715981-11006315-1250-1080-12411-1031-104071121190-1030-212-961-10206612-740-1010-12513-740-950-413-821-9006414440-8702215810-870-9615-1081-840-6013581-820-6215-1160-7902131251-7807014-971-770-6213160-6803515-1160-650814-980-640013-830-64011412981-630-58151170-6201615-200-60010615-710-590-12213-590-57012613-470-53011515-990-510-11314-1121-510-5914470-4801415-831-4806511511-42090151070-390-120101230-390-12711770-360-11713-880-360122151271-360-5515621-310-4615830-29011513-1050-29011714-741-270-998460-2604214950-26012212620-23010914230-2205415991-180-40151220-17011112-1121-17083101191-120-3714340-90-8214580-70-2314-811-30728700-20-1514201-10-3215-11021-2715-11613165989161-2814103171901446091-3514-831121-47121317162851-1123834-12539-16611100835-111-11445-125849-115-98739-120219732325833-17-3911-20-832-90-12238-109044-98-10621-28-321-14-1232662-126311051013-95-162187-1282658115292312010-102-11914-33-12638-341142-35711-35-222-9-47294047310-8497711420811521118123302111774712017-101-12821-9123321108110-9311918-128-740114-12650-116-8013-97220-122-12120-5-11523-9312317-123-921113-12340122-847-124-611912147-452511-8230-30-46963-11417-4924105130122314-3112618-7712200402496-11933-117-7317-108-1317-51-82187-12827-31-51211512313311171584-1220-11512110-8712112110-1231333-222-123-617-101-128184311423-95-82890-220-126-12723-27-524-113-12266412611-28-3714904522724724-95-11320-11714117-317-2212722-161771322-295239652645-11510-94-12115-561217-23-1272269814-1431769822104-5250-11414181617-1512232-1-632-8412213-60-10617-120-12817-21-12236-97-10985011520-12512726-1311628-40224321241051192715-3630524625-56-628-2411538-8-1344-44-12211633722-25-11255782271073614-89-71829-11422921126-25-11613-3810614-94-16202-223039-3837-11523-105-72431227112-617-120-128205-121201082223471922-51-830-6212736-3512137-5041432723-60-127-22629-73-10201212121-37-1162566629-77-11313-1453764-1738-1947478610210-107-9231672367130-69-122674451930124391810943288221-251162356-12723-62-19248911113-84-452617239123-62421112514-15018-41-2719-106-1330112-1246-9-4517627938566441673711-77-2714113021-131530-122-131012925-65-11329451030106-51713-791926-12421-126-1524-106-4223-72-525-12012636-131038-115-825-5011330-63122427110845-81-12711242718602724-204227921419117181991320-11582472-11129-92-12341-10312245-24-11347-13125151151261641-72344-11824100116133618248862878-105281131311-49-1015113-126197-182357127235152693-12530-27532-108-1017412326-109-23649-7372411013-74-1824115-12324-128-123493-71169102021-82488-11627114-123744-11116-94724-121224-119-1231497-430-83-2841-71-4550-62-97142012221-11-525-122-126269612576111118-6011623541182412012314-35437-71-244251-10850-65-4117-1-12323-95-1326-84-1832-119-11157-112884129-871233092413-252151241128-9694094151414-1221481-1272990-1530-64-7148212715-2911221112522-3311713-78-5215371728-79-4141124515-74-1728-68-69301728373324305773210103742-11542101-713108-12218-72-11625-74-53048-126769515-45-11221-125-1252581713114122258
Why are the byte arrays not the same?
I think they are the same. Note that the first line of your output is the encoded bytes, and 114 is the decimal ASCII code for 'r', 79 is the decimal ASCII code for 'O', 48 is the decimal ASCII code for '0' etc., so your first and second lines match and are what I'd expect given your code.
Then when you've decoded the values, the bytes are being sign-extended to ints, so -84 is actually 256 - 84 = 172, -19 is 256 - 19 = 237, 0 is 0, etc., and if you base64 decode the second line using command line tools:
$ base64 -d < x.bin | od -tu1 | head -2
0000000 172 237 0 5 115 114 0 32 99 111 109 46 100 102 116 46
0000020 111 110 121 120 46 70 105 110 103 101 114 112 114 105 110 116
You'll see things match. So I think everything's fine, it's just how and what you're printing that makes things look wrong.
As far as I see you print out encoded, encodedString and decoded but not bytes. Can you check bytewise whether your arrays are equal or not?

Issue with getBytes() for accented charaters

I'm trying to convert a string with special characters like É into a string with UTF-8 encoding. I tried doing this:
String str = "MARIE-HÉLÈNE";
byte sByte[] = str.getBytes("UTF-8");
str = new String(sByte,"UTF-8");
The problem is, when I do "É".getBytes("UTF-8"), I get 63 which is interpreted as '?' when it's being converted to a new string. How can I fix this issue?
EDIT: I also noticed that this issue was not reproducible on Eclipse, probably because the text file encoding is usually set to UTF-8.
I tried doing byte[] str = "MARIE-HÉLÈNE".getBytes("UTF-8") in http://www.javarepl.com/console.html and got the result byte[] str = [77, 65, 82, 73, 69, 45, 72, 63, 76, 63, 78, 69]
This kind of error happens when information about the encoding of the source file is not given to the compiler (javac) properly. If the encoding of your source file is UTF-8, compile the file like the following.
javac -encoding UTF-8 E.java
The following is another example for the case where the encoding of the source file is UTF-16 Big Endian.
javac -encoding UTF-16BE E.java
I've already confirmed that the program below properly shows "0xC3 0x89". So, there is no problem in your code.
public class E
{
public static void main(String[] args) throws Exception
{
byte[] bytes = "É".getBytes("UTF-8");
for (int i = 0; i < bytes.length; ++i)
{
System.out.format("0x%02X ", (byte)(bytes[i]));
}
System.out.println();
}
}
"É".getBytes("UTF-8") returns a byte[] of 2 bytes: c3 89.
"MARIE-HÉLÈNE" is 4d 41 52 49 45 2d 48 c3 89 4c c3 88 4e 45.
4d 41 52 49 45 2d 48 c3 89 4c c3 88 4e 45
M A R I E - H É L È N E
Converting the bytes back using new String(bytes,"UTF-8") will restore the original string.

Categories

Resources