Encoding and Decoding bin file - java

I have seen many examples online, but none are exactly what I'm looking for.
I have a .bin file that I'd like to POST to a server, so I am encoding and storing it as a string.
However, decoding it doesn't give the same result.
import org.apache.commons.codec.binary.Base64;
File file = getFileStreamPath(ENROLL_FILENAME);
byte[] bytes = loadFile(file);
byte[] encoded = Base64.encodeBase64(bytes);
for(int i =0; i< encoded.length; i++){
System.out.print(encoded[i]);
}
System.out.println();
String encodedString = new String(encoded); //send to server and decode
System.out.println("Encode "+ encodedString);
byte[] decoded = Base64.decodeBase64(encodedString.getBytes());
for(int i =0; i< decoded.length; i++){
System.out.print(decoded[i]);
}
System.out.println();
Result:
System.out: 11479486566887812165676610698504811790719048761095711710188103117821091081179050861219972741129811082859087491199871704890816565656565656565656665119656787103657699511001129048787890874980100505375656510412210050108110814966489911010411910110365656665656565671221198910352656580651156565656865656565110817365657573666565657065656565656565656565656565656610168816565838577120103119715765828967651194910173103697865741197981816586656765808865698665741077884656587656610380821196989657489777565699065731197811481699765801157510265699865776574481036599657365788865699965666978116103691026572737553103651036565487811810365104656585791046569105657773741221196510865655278112816910865721077071656911065651197943656511265738978556565114656511577906569114657469797911969118656681781038169119657956756565691226566697874656912265661077955119654965747377681036949656681798410365506574737988119655165661038084656953657477781081196953658065731048165546565115801026569546566817590103695765665678861196666657373798311970666567657897119666865741038052119667365731118090657076656789791011036681656669801151037083657856778565708865679979708166976572897890119669765738580113817099657673788381661026565107797081661036569107787911966104656510778858170104656865807410366106658011176102657010965686577791037011465756980806566115656781788611966115656689791191036611565651198010011966116657373781071196611665735680978166120656581789781701226568858011211970506576657910811966536572487890656655656552761021196655657310777671037056656869801038170436576738010381676765801037770119716765688580114657171656885791228167726573997850119677765747378102119717765681037910681718065769979531196782657052785710367836569488089103718365685680103119678565738176109817189656999771001196790658052771116571976569737711610367986573777811610367104658011978114103711096569657976656711265668980858167112657565801086571115657781787910371117657773801066567120656573781028171121656989791101197112265777378696567566567778010665674765651037911
01036865656565781148168656572737789103726665778980100816867656665805565686965711118011781687065738978120816872657252784881687665727780110816878657356791076572786577857976119688165655280114817281656969767711972876570111809711968906573103751011196890657369768481689965731157811365689965721118010211972996577107808010372104657873808511968106657277781081196810665728579116103721086574487376103681096567111798811968109657211177801036811265714879701196811365688980891197211765781038010110368118657156771076572118657077751001197248657811579731036851657552797910368536579107791141197257656910373821036843658069797065724765796580578165676510185801066569686585697487816971651018179901196972658611179761036574651004879114816977651006977688269817268798174105756874476571675081737353717976897773778950101661211017370876968701201077373101479067431225273759771741127765767454877010184577010275707110654677250107756897721197086101657110611212272821005267112113746811643677411652767511848536757514370118102827283103118721196711567854912170651041227088905572104864966121575269901176570102100557371538267113785169111685375727567771111211196890566770739772708011778705478556989885170887170757211452765284546749115676810310284718111711772117768367844379691021197471711076672110111686811770436911810772101103656566656589897310710410555998210880778212210210386865265985210211577995111578725185808680818510688107751138810777981118578731025287104102111821095265837551738811110210399871185285103111698853102115891064781978172527653781157987105488783675689111102567870731157910010248825411056874711989826611948875211985888965859776894875111111998012165119825489698782811037956103778282811038797801159065735279691046582568811110347471111031147211178120748982105736582545289107110537773771107785103515697565181995065738973656989978899986857119101786752901218011199547277109438077115497389768012185875347859079867398971218179112471078372895287886511597535211978501091117911111865856511711110174474873748948881084710789651191199899
801118210573658566899985986689887612077871221021031011191105610751881071081
System.out: Encode rO0ABXNyACBjb20uZGZ0Lm9ueXguRmluZ2VycHJpbnRUZW1wbGF0ZQAAAAAAAAABAwACWgALc3dpZ0NNZW1Pd25KAAhzd2lnQ1B0cnhwegAABAAAACzwYg4AAPAsAAADAAAAnQIAAKIBAAAFAAAAAAAAAAAAAABeDQAASUMxgwG9ARYCAw1eIgENAJwOQQAVACAPXAEVAJkNTAAWABgPRwEYAJYMKAEZAIwNrQEaAPsKfAEbAMAJ0gAcAIANXAEcABENtgEfAHIK5gAgAA0NvgAhAAUOhAEiAMIJzwAlAA4NpQElAHkFGAEnAAwO+AApAIYN7AArAAsMZAErAJEOOwEvABQNgQEwAO8KAAEzABENJAEzABkO7wA1AJIMDgE1ABQOTgA2AJIOXwA3ABgPTAE5AJMNlwE5APAIhQA6AAsPfAE6ABQKZgE9AB8NVwBBAIIOSwFBACANawBDAJgP4wBIAIoPZAFLACYOegBQABEPsgFSAN8MUAFXACcOFQBaAHYNZwBaAIUPqQFcALINSQBfAAkOFQBgAEkNOwBhAAkNUQFhADAPJgBjAPoLfAFmADAMOgFrAKEPPABsACQNVwBsABYOwgBsAAwPdwBtAIINkwBtAI8PaQBxAAQNaQFzADUPpwF2ALAOlwB5AH0NZAB7AA4LfwB7AIkMCgF8ADEPgQF+ALIPgQCCAPgMFwGCADUPrAGGADUOzQCHAIcN2wCMAJINfwGMADgOjQGPALcO5wCRAF4N9gCSAE0PYgGSAD8PgwCUAIQLmQGYAEcMdwCZAP4MoAGaAEIMtgCbAIMNtgChAPwNrgGmAEAOLACpABYPUQCpAKAPlAGsAMQNOgGuAMIPjACxAAINfQGyAEYOnwGzAMINEAC8ACMPjAC/AAgOngDAAAANrQDAAHIMYgHBAMYPdQDCABAP7ADEAGoPuQDFAIYNxQDHAH4N0QDLAHMPnQDNAI8OkAHNAMUOLwDQAA4PrQHQAEELMwHWAFoPawDZAIgKewDZAIELTQDcAIsNqADcAHoPfwHcAMkPPgHhANIPUwDjAHMNlwDjAHUOtgHlAJ0ILgDmACoOXwDmAHoMPgDpAG0OFwDqADYPYwHuANgPegDvAG8MkAHvAFMKdwH0ANsOIgD3AK4OOgD5AOkOrwH9AEgIRgD+APEOFAH/AOAP9QACAeUPjAEDAUEJWQEGAeQOZwEHAVoOLgAJAd0OrQEMAdEMDREQHDOQJiKDJ/AGC2QII5GOLYMIMY2eByeIFWEDFxkIIe/ZC+z4IKaGJpMALJ6WFeT9FfKFGj6CH2kKDaHwFVeAGjpzHRd4CpqJDt+CJt4LKv05C93+FvfRHSgvHwCsCU1yFAhzFXZ7HhV1By94EZuAFfd7IG5RCqN3EoD5KHKCMoywDZ8CFIaHFPuNF6N7EYX3FXGFKHr4L4T6C1sCDgfTGQuuHuLSCT+OEfwJGGkBHnoDDuF+EvkHegAABAAYYIkhi7cRlPMRzfgVV4Ab4fsMc3sNH3UPVPQUjXkKqXkMboUNIf4WhfoRm4ASK3IXofgcWv4UgoEX5fsYj/QaQH4L5NsOWi0WSC8Yof8NFIsOdf0R6n8W/wYRBw0W4wUXYAUaLY0KoocPyAwR6YEWRQgO8gMRRQgWaPsZAI4OEhAR8Xog//ogrHoNxJYRiIAR64Ykn5MIMnMUg38a83Qc2AIYIAEYaXcbD9weNC4ZyPoc6HMm+PMs1IYLPyUW5/UZOVIbayQOp/kSHY4WXAsa54wN2moOovAUAuoeJ/0IJY0Xl/kYAwwbcPoRiIAUBYcUbBYXLxMWzfgewn8k3XklzgQOIAcXxP8b6gYdt/YUDHkV24wZQgYds48N/y0lQO8m7S8vVmYKlfcXEAcXQwEeu4YGSi0THnwnEm0rHFIV53QXOIEXwu0YWW8NrNMaEQIne8IqC30O8QAS1+UTlvMecIQG99MRPk8mOEApQyULs+UOcQAV8
w8ehvMKDAkZv48dLQoeavsRDbETGoQVgvEYltYXuPsZiH4k8womjfgZznEewXoqR2wtr4ELGBsSPBsY7CobXA4TdRITCQ0UjQgYSJEdpIUpmXot6I8v830Pc34QKfkXLIoYZHQNJBIYWAYcTpcccQ0Lz/YPcYITB+4XOX8XMwUaXYMe5QUglPYRBHsak/4kMfklGG4Ntu4Yc4UYgPQiXfkLRQoUFfgYWIwbcoUHLJEQogcY9AwYiYUOYfwereQpudMywp8OFHoV9fsZhoIaYH0HPW8SxHQXNnYYeHsO3QQluegqM5Qyv9cR/4UXofMarO4gifUPB/UcCCkdqXseXAQN/jQPfAsc90UoXg8ODoYOUYEdWvEewPkOUn8P43AVcAUW33UNsswPJREcsdcpDC0Ptu8cvLseERwlIRgeOQcgCgolKo0qZfkNbIYSuIwZtvseMIIHRQUP05AVg4MZUQcNcnoZUwYfTXUgpfwHyvsSWwIWln4W1IsGPYce4PwmRxEo7fEGRHkg0IojYnUkwXYS3f4Uo34Z2Pkbt3wi0vMlanYnpwg9k4slYIop0KIs+ZAysJIN0wIVhn0WlIIYMn4pBWgs0PwxpYMz+/AOvvoXmgId5PIeMn4YMIIfpoAhYYEiO4QNVf4UoYIZgIAcdXsiRQ0ldRUvl+00hPkOOAYSeggWlHkX8/gNw3EPgI4SAvgXHP4NHuwPjnIVcuMZGvQfpoAo2f0sTAQt+e4XawgYpo8ZwYgdVg4cjPcnk+wz3h0zleQMAQgNihQZBBMbgoYMifgNDwsZjgwkaQQNmvUZl+0cAwkmmQENtI8TFxUV1R0WjYcXHvwdv4QekAMhYn8WtOQdnKEggoMjegAABADR2BdvpB3DfCDtfyoAARBGiCntmDOBejMVlBD4BxJXDhnJeBpoFRB/+RZBEhi1cR4NDAkYGBRYHhUKER7xAxJ2BROs6xzNCB3g9BeaBBkE7hnZdiEg8RBOeCe8+i7V7jCRDAmw6AwX+RaCBhr96xL7+xfW5h6Z9B97BheTXBkvVSdZSigszBBevRWvfBYYHBZ/QQyQBxLl8hWb7xwTCxCbQxBdhBT24hYI+hFuiBdDDCMpKCZbTRW0/Rnifhr49CGRDxbT7hu55xye9R8B+hR1EhnPihmleCLtjhUxAxkCgRlyEhmdiBF2eBWrhBb60RjWxQ2n+BBhfBTxDBZAvw9+dxkEqyLJ9SSxBw0fCBZAhBr+FB5hfh8c+zN7hjSHDkWk/B+XBT9Ve0JEckyCixQH7hrYZiJKcCQ4+Q6fihpsDCOGCS5DEg6pdhkDfxnggiUiGRR99BZLLxqS7CNjsw91iSI6kCh4NClPfsnClkBGrHEx2pwkEL58Oh8BAQFSUCYgEgQAAP8AAACasqW4j7W2riBtAQP1MLMEABgFJ/7DBQAkCSdzBAAOCif/xAcALwse/nHCBAD7Dg9/DABSDxfAd1rASxgAWQ0TwMHBWsH+wT7APf/A/P44IwCwDwNCZFH/R8D+O0cv/cH9Pv5CBQEmEBfBfwMAcBIMwwwAUBYXwHdlWxMAQxggwMJtd8D+wsD//kAFAE4aHsF2BACeGwBsCQFeIBb+p8VQBgCdIQzC/z8DAPwrBsEGAT8uEP7B/cAIAO4vFlhBBAG5Mm0wCAEnMxP/N0AFAA8098D8/hkA4TUPwD7/Qf4uRPxd//1lFAEDNhbAwP7/KcApwD8xBQEmNxf/QiIAYjgWcsJYV/4u/v4hRP/+wMJDYgcAiTgJ/13CBAAuPP1QHQCIPQ9za0H+wC4p/S9GRAsBT0Aa/ktVLBUA3UETW//+wSL9wP7+/v/A/zsGAFNDei/GBwFNRSDB/cA7FgDdTQYr//3AKv/9wP79aMA1BQFmTiL+wP4QAApO/cL8wMMqTv/DPgcAflAM/23DCAAJUwnE//7DMQYAe1QTdMEDAZdVIvwRAVNbIP79S8H+KFH+wAQAY1yDfQQAa1wXjwoADpA3wcPCj
G8EAEtgCcL8BAGqYDf/whAAE2RDf8LDwMTAwl1uCAFVZCkvNRkAzGUG/if//fsp///+Vv9zwMDC
System.out: -84-19051151140329911110946100102116461111101211204670105110103101114112114105110116841011091121089711610100000001302900111151191051036777101109791191107408115119105103678011611412011212200400044-16981400-1644003000-99200-94100500000000000941300736749-1251-67122231394341130-100146502103215921210-103137602202415711240-10612401250-11613-831260-5101241270-649-460280-128139212801713-74131011410-2603201313-660330514-1241340-629-4903701413-91137012152413901214-80410-12213-20043011121001430-111145914702013-1271480-17100151017133615102514-170530-110121415302014780540-110149505502415761570-10913-1051570-168-123058011151241580201010216103113870650-1261475165032131070670-10415-290720-118151001750381412208001715-781820-33128018703914210900118131030900-12315-871920-781373095091421096073135909709138119704815380990-6111241102048125811070-95156001080361387010802214-6201080121511901090-12613-10901090-1131510501130413105111505315-8911180-8014-105012101251310001230141112701230-1191210112404915-12711260-7815-1270-1260-812231-12605315-841-12205314-510-1210-12113-370-1160-110131271-11605614-1151-1130-7314-250-11109413-100-11007715981-11006315-1250-1080-12411-1031-104071121190-1030-212-961-10206612-740-1010-12513-740-950-413-821-9006414440-8702215810-870-9615-1081-840-6013581-820-6215-1160-7902131251-7807014-971-770-6213160-6803515-1160-650814-980-640013-830-64011412981-630-58151170-6201615-200-60010615-710-590-12213-590-57012613-470-53011515-990-510-11314-1121-510-5914470-4801415-831-4806511511-42090151070-390-120101230-390-12711770-360-11713-880-360122151271-360-5515621-310-4615830-29011513-1050-29011714-741-270-998460-2604214950-26012212620-23010914230-2205415991-180-40151220-17011112-1121-17083101191-120-3714340-90-8214580-70-2314-811-30728700-20-1514201-10-3215-11021-2715-11613165989161-2814103171901446091-3514-831121-47121317162851-1123834-12539-16611100835-111-11445-125849-115-98739-120219732325833-17-3911-20-832-90-12238-109044-98-10621-28-321-14-1232662-126311051013-9
5-162187-1282658115292312010-102-11914-33-12638-341142-35711-35-222-9-47294047310-8497711420811521118123302111774712017-101-12821-9123321108110-9311918-128-740114-12650-116-8013-97220-122-12120-5-11523-9312317-123-921113-12340122-847-124-611912147-452511-8230-30-46963-11417-4924105130122314-3112618-7712200402496-11933-117-7317-108-1317-51-82187-12827-31-51211512313311171584-1220-11512110-8712112110-1231333-222-123-617-101-128184311423-95-82890-220-126-12723-27-524-113-12266412611-28-3714904522724724-95-11320-11714117-317-2212722-161771322-295239652645-11510-94-12115-561217-23-1272269814-1431769822104-5250-11414181617-1512232-1-632-8412213-60-10617-120-12817-21-12236-97-10985011520-12512726-1311628-40224321241051192715-3630524625-56-628-2411538-8-1344-44-12211633722-25-11255782271073614-89-71829-11422921126-25-11613-3810614-94-16202-223039-3837-11523-105-72431227112-617-120-128205-121201082223471922-51-830-6212736-3512137-5041432723-60-127-22629-73-10201212121-37-1162566629-77-11313-1453764-1738-1947478610210-107-9231672367130-69-122674451930124391810943288221-251162356-12723-62-19248911113-84-452617239123-62421112514-15018-41-2719-106-1330112-1246-9-4517627938566441673711-77-2714113021-131530-122-131012925-65-11329451030106-51713-791926-12421-126-1524-106-4223-72-525-12012636-131038-115-825-5011330-63122427110845-81-12711242718602724-204227921419117181991320-11582472-11129-92-12341-10312245-24-11347-13125151151261641-72344-11824100116133618248862878-105281131311-49-1015113-126197-182357127235152693-12530-27532-108-1017412326-109-23649-7372411013-74-1824115-12324-128-123493-71169102021-82488-11627114-123744-11116-94724-121224-119-1231497-430-83-2841-71-4550-62-97142012221-11-525-122-126269612576111118-6011623541182412012314-35437-71-244251-10850-65-4117-1-12323-95-1326-84-1832-119-11157-112884129-871233092413-252151241128-9694094151414-1221481-1272990-1530-64-7148212715-2911221112522-3311713-78-5215371728-79-4141124515-74-1728-68-69301728373324305773210103742-1154210
1-713108-12218-72-11625-74-53048-126769515-45-11221-125-1252581713114122258
Why are the byte arrays not the same?

I think they are the same. Note that the first line of your output is the encoded bytes printed as decimal numbers: 114 is the ASCII code for 'r', 79 for 'O', 48 for '0', and so on. So your first and second lines match, and they are what I'd expect given your code.
When you print the decoded values, each byte is sign-extended to an int, so -84 is actually 256 - 84 = 172, -19 is 256 - 19 = 237, 0 is 0, and so on. If you Base64-decode the second line with command-line tools (here x.bin holds the encoded text), you get:
$ base64 -d < x.bin | od -tu1 | head -2
0000000 172 237 0 5 115 114 0 32 99 111 109 46 100 102 116 46
0000020 111 110 121 120 46 70 105 110 103 101 114 112 114 105 110 116
These match the decoded values once the signs are accounted for. So I think everything is fine; it's just how and what you're printing that makes things look wrong.
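To make the decoded output directly comparable with the `od -tu1` listing, the bytes can be printed as unsigned values by masking each one with `0xFF`. A minimal sketch; the class and method names are hypothetical, and the sample array is just the first six decoded values from the question's output.

```java
public class UnsignedPrint {
    // Formats each byte as an unsigned value (0..255), space-separated,
    // instead of the sign-extended int that System.out.print(byte) shows.
    static String toUnsignedString(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < data.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append(data[i] & 0xFF); // promoted to int, only the low 8 bits kept
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // First six values of the decoded output in the question
        byte[] decoded = {-84, -19, 0, 5, 115, 114};
        System.out.println(toUnsignedString(decoded)); // 172 237 0 5 115 114
    }
}
```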

As far as I can see, you print encoded, encodedString and decoded, but never bytes itself. Can you check byte by byte whether bytes and decoded are equal?
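A byte-by-byte check could look like the sketch below. It swaps in `java.util.Base64` (Java 8+) for Apache Commons Codec so it runs without extra dependencies; with Commons Codec the same check works using `Base64.encodeBase64` and `Base64.decodeBase64`. The sample bytes are made up for illustration; in the question they would come from `loadFile(file)`.

```java
import java.util.Arrays;
import java.util.Base64;

public class RoundTripCheck {
    // Encodes to a Base64 String (what would be POSTed), decodes it again,
    // and reports whether the result is identical to the input, byte for byte.
    static boolean roundTrips(byte[] original) {
        String encoded = Base64.getEncoder().encodeToString(original);
        byte[] decoded = Base64.getDecoder().decode(encoded);
        return Arrays.equals(original, decoded);
    }

    public static void main(String[] args) {
        // Hypothetical sample; negative values are just bytes >= 0x80
        byte[] bytes = {(byte) 0xAC, (byte) 0xED, 0x00, 0x05, 0x73, 0x72};
        System.out.println(Base64.getEncoder().encodeToString(bytes)); // rO0ABXNy
        System.out.println(roundTrips(bytes)); // true
    }
}
```

Comparing with `Arrays.equals` sidesteps the signed/unsigned printing question entirely, since it compares the raw byte values.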

Related

Java String Encoding : some characters are wrong

I have a String which is :
PRESIÓN MÁXIMA:
This string is in ISO-8859, and I want to write it to a UTF-8 XML file.
I get its UTF-8 byte values, which seem correct:
-61, -109 = C3 93 = Ó
-61, -127 = C3 81 = Á
But when I turn this byte array back into a String, the Ó is fine, but the Á is not:
for some unknown reason, C3 81 becomes C3 3F.
There is something I don't understand about encoding; I would at least expect both characters to be wrong.
How can I fix or convert my String?
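The symptom can be reproduced under one assumption: that the UTF-8 bytes are decoded somewhere with windows-1252 (a common Windows default charset) and then re-encoded. In windows-1252, 0x93 is a real character (a curly quote), so C3 93 survives the trip, but 0x81 is unmapped, so it becomes U+FFFD and is written back as `?` (0x3F). Decoding explicitly as UTF-8 keeps both letters. A hedged sketch; the class and method names are hypothetical, and the failure path is a guess at what the original code does, not a confirmed diagnosis.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // Correct path: encode and decode with the same, explicit charset.
    static String viaUtf8(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8); // O-acute -> C3 93, A-acute -> C3 81
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Assumed failure path: the UTF-8 bytes are misread as windows-1252, then re-encoded.
    static byte[] mangledViaCp1252(String s) {
        Charset cp1252 = Charset.forName("windows-1252");
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, cp1252); // 0x93 -> U+201C, but 0x81 is unmapped -> U+FFFD
        return misread.getBytes(cp1252);           // U+FFFD has no windows-1252 byte -> '?' (0x3F)
    }

    public static void main(String[] args) {
        // Accented letters written as escapes so javac's -encoding doesn't matter
        String s = "PRESI\u00D3N M\u00C1XIMA:";
        System.out.println(viaUtf8(s).equals(s)); // true

        for (byte b : mangledViaCp1252("\u00D3\u00C1")) {
            System.out.printf("%02X ", b); // C3 93 C3 3F
        }
        System.out.println();
    }
}
```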

str.replaceAll() not matching "\r\n"

I'm trying to convert Unix-style line-endings (LF) in a multi-line string to Windows-style (CR LF).
My plan of attack is to:
replace all CR LF instances with just LF
then replace all LF instances with CR LF
However, this snippet of code isn't matching the "\r\n":
String test = "test\r\ncase";
test.replaceAll("\r\n","\n");
PrintWriter testFile = new PrintWriter("test.txt");
testFile.print(test);
testFile.close();
I've already tried using double/triple/quadruple backslashes. No dice.
I also know that the test string doesn't contain a literal \r\n, because the characters show up as CR LF when printed to the file.
What am I missing here?
You are not getting the modified String from your code.
Strings are immutable, so you need to save the value returned by replaceAll. No method can change a String instance in place.
String test = "test\r\ncase";
//Print the character before
for(char c : test.toCharArray()){ System.out.print((int)c + " ");};
System.out.println();
//Save the replace result
test = test.replaceAll("\r\n","\n");
//Print the character after
for(char c : test.toCharArray()){ System.out.print((int)c + " ");};
The output shows the CR (13) present before the replacement and gone once the result of replaceAll is assigned back:
116 101 115 116 13 10 99 97 115 101 //BEFORE
116 101 115 116 10 99 97 115 101 //AFTER
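Applying the same fix to the original plan (CR LF to LF, then LF to CR LF) might look like the sketch below. It uses `String.replace(CharSequence, CharSequence)`, which matches literally rather than as a regex; `replaceAll` works too for these patterns, but in either case the returned string has to be kept. The class and method names are hypothetical.

```java
public class LineEndings {
    // Normalizes any mix of CR LF and LF line endings to CR LF.
    // Step 1 collapses existing CR LF pairs so step 2 doesn't produce CR CR LF.
    static String toWindows(String s) {
        String unix = s.replace("\r\n", "\n"); // String is immutable: keep the result
        return unix.replace("\n", "\r\n");
    }

    public static void main(String[] args) {
        String test = "test\r\ncase\nmore";
        test = toWindows(test);
        for (char c : test.toCharArray()) {
            System.out.print((int) c + " ");
        }
        System.out.println();
    }
}
```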

TCP/IP client incorrectly reading inputstream byte array

I'm writing a Java client program that sends a command to a server; the server sends back an acknowledgement and then a response string.
The exchange looks like this:
client -> server : cmd_string
server -> client : ack_msg(06)
server -> client : response_msg
Client code
public static void readStream(InputStream in) {
byte[] messageByte = new byte[20];// assuming max size - need to
// know exact msg size?
boolean end = false;
String dataString = "";
int bytesRead = 0;
try {
DataInputStream in1 = new DataInputStream(in);
// while ctr==2 todo 2 streams
int ctr = 0;
while (ctr < 2) {//counter 2 if ACK if NAK ctr=1 todo
bytesRead = in1.read(messageByte);
if (bytesRead > -1) {
ctr++;
}
dataString += new String(messageByte, 0, bytesRead);
System.out.println("\ninput byte arr "+ctr);
for (byte b : messageByte) {
char c=(char)b;
System.out.print(" "+b);
}
}
System.out.println("MESSAGE: " + dataString + "\n bytesread " + bytesRead + " msg length "
+ dataString.length() + "\n");
char[] chars = dataString.toCharArray();
ArrayList<String> hex=new ArrayList<>();
// int[] msg ;
for (int i = 0; i < chars.length; i++) {
int val = (int) chars[i];
System.out.print(" " + val);
hex.add(String.format("%04x", val));
}
System.out.println("\n"+hex);
} catch (Exception e) {
e.printStackTrace();
}
// ===
}
Output
client Socket created ..
response:
input byte arr 1
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
input byte arr 2
2 -77 67 79 -77 48 -77 3 -116 0 0 0 0 0 0 0 0 0 0 0
MESSAGE: ##³CO³0³##
(where # is an unsupported special character)
bytesread 9 msg length 10
dec: 6 2 179 67 79 179 48 179 3 338
hex: [0006, 0002, 00b3, 0043, 004f, 00b3, 0030, 00b3, 0003, 0152]
bytes: 2 -77 67 79 -77 48 -77 3 -116 0 0 0 0 0 0 0 0 0 0 0 (bytes received in 2nd packet)
connection closed
Problem: I'm reading the last value incorrectly. I have verified with Wireshark that the server sends back the response as 06 02 b3 43 4f b3 30 b3 03 8c.
Is there some issue with how I'm reading the stream?
EDIT
The rest of the response is read correctly, but the last character, which should be 8c, is read as 0x0152.
Response from server : 06 02 b3 43 4f b3 30 b3 03 8c
Read by program : [0006, 0002, 00b3, 0043, 004f, 00b3, 0030, 00b3, 0003, 0152]
issue with reading the last character
EDIT 2
Response is received as 2 packets/streams
packet 1 byte arr : 6 (ACK)
packet 2 byte arr: 2 -77 67 79 -77 48 -77 3 -116 (response)
complete response read by client
dec: 6 2 179 67 79 179 48 179 3 338
hex: [0006, 0002, 00b3, 0043, 004f, 00b3, 0030, 00b3, 0003, 0152]
Thanks
The problem in this question was a matter of signed variables versus unsigned variables. When you have a number in computer memory, it is represented by a bunch of bits, each of them 0 or 1. Bytes are generally 8 bits, shorts are 16 etc. In an unsigned number, 8 bits will get you from positive 0 to 255, but not to negative numbers.
This is where signed numbers come in. In a signed (two's complement) number, the most significant bit tells you whether the value is negative or positive, so 8 bits can represent -128 to +127. (Notice that the positive range is halved, from 255 to 127, because you "sacrifice" half of the range to negative numbers.)
So what happens if you convert signed to unsigned? Depending on how you do it, things can go wrong. In the problem above, the code char c=(char)b; converts a signed byte to an unsigned char. The proper way is to "make the byte unsigned" before converting it to a char: char c=(char)(b&0xFF);
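The difference between the two casts can be checked directly with the last byte from the question (0x8C, printed as -116). A minimal sketch with hypothetical names; `Byte.toUnsignedInt` (Java 8+) is an equivalent, more self-documenting form of `b & 0xFF`.

```java
public class ByteToChar {
    // Plain cast: the byte is sign-extended to int first, then narrowed to 16 bits.
    static char plainCast(byte b) {
        return (char) b;
    }

    // Masked cast: b & 0xFF is an int in 0..255, so the sign is dropped.
    static char maskedCast(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0x8C; // printed as -116 in the question
        System.out.printf("%04x %04x %d%n",
                (int) plainCast(b), (int) maskedCast(b), Byte.toUnsignedInt(b));
        // prints "ff8c 008c 140"
    }
}
```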
In short: except for char, all Java integer types are signed, so mask with &0xFF for a byte, &0xFFFF for a short, and so on.
The details of why this works are as follows. The & operator is a bitwise AND, and 0xFF is the int literal 255. Before the AND is applied, Java promotes b to an int, which sign-extends it: -116 (0x8C) becomes 0xFFFFFF8C. Masking with 0xFF keeps only the low 8 bits, giving 0x0000008C = 140. The byte's sign bit is now an ordinary data bit in the int, so the sign is effectively discarded.
With a plain (char) b cast, Java also widens the byte to an int first (sign-extending it) and then narrows it to 16 bits, so -116 becomes 0xFF8C (65420) instead of 140. In other words, the sign is carried along rather than dropped; if that isn't what you want, you have to mask explicitly.
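The 0152 in the question's hex dump, though, comes from dataString, which is built with `new String(messageByte, 0, bytesRead)`, not from the `(char) b` cast. That constructor decodes with the platform default charset. Assuming that default is windows-1252 (common on Windows), byte 0x8C decodes to U+0152 (Œ), while every other byte in this response maps to the same code point, which matches the output exactly. Decoding with ISO-8859-1 maps each byte 0x00 to 0xFF to the char with the same value. A hedged sketch with hypothetical names:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ResponseDecoding {
    // Decodes bytes with the given charset and dumps each char as "%04x",
    // in the same format as the question's hex output.
    static String decodeToHex(byte[] data, Charset cs) {
        StringBuilder sb = new StringBuilder("[");
        for (char c : new String(data, cs).toCharArray()) {
            if (sb.length() > 1) sb.append(", ");
            sb.append(String.format("%04x", (int) c));
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // Response bytes as captured in Wireshark: 06 02 b3 43 4f b3 30 b3 03 8c
        byte[] response = {0x06, 0x02, (byte) 0xb3, 0x43, 0x4f,
                           (byte) 0xb3, 0x30, (byte) 0xb3, 0x03, (byte) 0x8c};

        // Assumption: the client's default charset is windows-1252
        System.out.println(decodeToHex(response, Charset.forName("windows-1252")));
        // [0006, 0002, 00b3, 0043, 004f, 00b3, 0030, 00b3, 0003, 0152]

        // ISO-8859-1 maps byte 0xNN to U+00NN, i.e. the unsigned byte value
        System.out.println(decodeToHex(response, StandardCharsets.ISO_8859_1));
        // [0006, 0002, 00b3, 0043, 004f, 00b3, 0030, 00b3, 0003, 008c]
    }
}
```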

Issue with getBytes() for accented characters

I'm trying to convert a string with special characters like É into a string with UTF-8 encoding. I tried doing this:
String str = "MARIE-HÉLÈNE";
byte sByte[] = str.getBytes("UTF-8");
str = new String(sByte,"UTF-8");
The problem is that "É".getBytes("UTF-8") gives me 63, which is interpreted as '?' when it's converted back to a string. How can I fix this issue?
EDIT: I also noticed that the issue does not reproduce in Eclipse, probably because its text file encoding is usually set to UTF-8.
I tried doing byte[] str = "MARIE-HÉLÈNE".getBytes("UTF-8") in http://www.javarepl.com/console.html and got the result byte[] str = [77, 65, 82, 73, 69, 45, 72, 63, 76, 63, 78, 69]
This kind of error happens when the encoding of the source file is not passed to the compiler (javac) correctly. If your source file is encoded in UTF-8, compile it as follows:
javac -encoding UTF-8 E.java
If the source file is encoded in UTF-16 big-endian, use:
javac -encoding UTF-16BE E.java
I've confirmed that the program below prints "0xC3 0x89", so there is no problem in your code itself.
public class E
{
public static void main(String[] args) throws Exception
{
byte[] bytes = "É".getBytes("UTF-8");
for (int i = 0; i < bytes.length; ++i)
{
System.out.format("0x%02X ", (byte)(bytes[i]));
}
System.out.println();
}
}
"É".getBytes("UTF-8") returns a byte[] of 2 bytes: c3 89.
"MARIE-HÉLÈNE" is 4d 41 52 49 45 2d 48 c3 89 4c c3 88 4e 45.
Character by character: M=4d, A=41, R=52, I=49, E=45, -=2d, H=48, É=c3 89, L=4c, È=c3 88, N=4e, E=45.
Converting the bytes back using new String(bytes,"UTF-8") will restore the original string.
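If recompiling with the right `-encoding` isn't convenient (for example in an online REPL), one way to take the source file encoding out of the picture is to write the non-ASCII characters as `\uXXXX` escapes, which are plain ASCII in the file. A minimal sketch with hypothetical names; it produces the same byte sequence listed above.

```java
import java.nio.charset.StandardCharsets;

public class AccentBytes {
    // Returns the UTF-8 bytes of s as space-separated lowercase hex.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b)); // Formatter prints a negative Byte as unsigned for %x
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // MARIE-HELENE with the accented E's written as escapes for U+00C9 and U+00C8,
        // so the result doesn't depend on javac's -encoding setting
        String name = "MARIE-H\u00C9L\u00C8NE";
        System.out.println(utf8Hex(name)); // 4d 41 52 49 45 2d 48 c3 89 4c c3 88 4e 45

        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(name)); // true
    }
}
```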

How to transplant the java MD5 encrypt code into Python?

Here is the standard Java code that computes the MD5 hash of a string:
import java.security.MessageDigest;
public class TransCode{
public static byte[] transcode(String text) throws Exception{
byte[] bytes = text.getBytes("UTF-8");
return bytes;
}
public static void main(String[] args){
try{
System.out.println("ORIGIN STRING: hello world");
byte[] byteArray = TransCode.transcode("hello world");
int arrLen = byteArray.length;
MessageDigest messageDigest = MessageDigest.getInstance("MD5");
messageDigest.update(byteArray);
System.out.print("BYTE ARRAY: ");
for (int i=0;i<arrLen;i++){
System.out.print(byteArray[i]);
System.out.print(" ");
}
System.out.println();
byteArray = messageDigest.digest();
System.out.print("MD5 RESULT: ");
for (int i=0;i<arrLen;i++){
System.out.print(byteArray[i]);
System.out.print(" ");
}
System.out.print("\n");
}
catch(Exception ex){
System.out.println(ex);
}
}
}
The result is as follows,
ORIGIN STRING: hello world
BYTE ARRAY: 104 101 108 108 111 32 119 111 114 108 100
MD5 RESULT: 94 -74 59 -69 -32 30 -18 -48 -109 -53 34
So how can I port this standard Java code to Python and get the same result? Here is the code I wrote, which has bugs:
# -*- coding: utf-8 -*-
from hashlib import md5
def MD5Digest(text):
byteList = []
for item in text:
md=md5()
unit = str(ord(item))
print unit,
md.update(unit)
byteList.append(md.hexdigest())
print
return byteList
print MD5Digest(u"hello world".encode("UTF8"))
Result is as follows,
104 101 108 108 111 32 119 111 114 108 100
['c9e1074f5b3f9fc8ea15d152add07294', '38b3eff8baf56627478ec76a704e9b52', 'a3c65c2974270fd093ee8a9bf8ae7d0b', 'a3c65c2974270fd093ee8a9bf8ae7d0b', '698d51a19d8a121ce581499d7b701668', '6364d3f0f495b6ab9dcf8d3b5c6e0b01', '07e1cd7dca89a1678042477183b7ac3f', '698d51a19d8a121ce581499d7b701668', '5fd0b37cd7dbbb00f97ba6ce92bf5add', 'a3c65c2974270fd093ee8a9bf8ae7d0b', 'f899139df5e1059396431415e770c6dd']
As you can see, although the byte arrays are the same, the MD5 results are completely different.
Please help me rewrite my Python code to fix this. Thanks a lot.
For project reasons, simply calling this Java code from my Python project is not allowed; please understand.
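There are a few issues. First, you calculate the hexdigest instead of the digest. Second, you calculate it inside the loop, once per character, instead of once over the whole input. Third, each of those hashes is over the decimal string of the character code (e.g. "104") rather than over the bytes themselves.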
Instead it should look like this:
# Python 3
from hashlib import md5
def MD5Digest(text):
digest = md5(text).digest()
print('ORIGIN STRING:', text)
print('BYTE ARRAY:', *text)
print('MD5 RESULT:', *digest)
return digest
# Python 2
from hashlib import md5
def MD5Digest(text):
digest = md5(text).digest()
print 'ORIGIN STRING:', text
print 'BYTE ARRAY:', " ".join(str(ord(c)) for c in text)
print 'MD5 RESULT:', " ".join(str(ord(c)) for c in digest)
return digest
This outputs
>>> MD5Digest(b'hello world')
ORIGIN STRING: b'hello world'
BYTE ARRAY: 104 101 108 108 111 32 119 111 114 108 100
MD5 RESULT: 94 182 59 187 224 30 238 208 147 203 34 187 143 90 205 195
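Apart from formatting, the only difference is the sign: Java's byte is signed, so values above 127 print as negative, but the numbers are equal mod 256 (for example, -74 + 256 = 182). The Java output also stops after 11 values because its second loop reuses arrLen, the length of the input, instead of the 16-byte digest length.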
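For a like-for-like comparison with Python's `hexdigest()`, the Java side can format all 16 digest bytes as unsigned hex. This is a sketch, not the asker's code: it loops over the digest itself rather than `arrLen`, and masks each byte so the signs disappear. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Md5Hex {
    // Returns the MD5 digest of the UTF-8 bytes of text as 32 lowercase hex digits,
    // the same format as Python's hashlib.md5(...).hexdigest().
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8)); // always 16 bytes
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xFF)); // unsigned, two digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support MD5, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("hello world")); // 5eb63bbbe01eeed093cb22bb8f5acdc3
    }
}
```

In Python 3, `hashlib.md5(b"hello world").hexdigest()` returns the same string.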
