I am using an HTML parser called Jsoup to load and parse HTML files. The problem is that the webpage I'm scraping is encoded in the ISO-8859-1 charset, while Android uses UTF-8 encoding (?). This results in some characters showing up as question marks.
So now I guess I should convert the string to UTF-8 format.
Now I have found a class called CharsetEncoder in the Android SDK, which I guess could help me. But I can't figure out how to use it in practice, so I wonder if I could get some help with a practical example.
UPDATE: Code to read data (Jsoup)
URL url = new URL("http://www.example.com");
Document doc = Jsoup.parse(url, 4000); // 4000 ms timeout
You can let Android do the work for you by reading the page into a byte[] and then using the Jsoup methods that parse String objects.
Don't forget to specify the encoding when you create the string from the data read from the server using the correct String constructor.
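To make "specify the encoding" concrete, here is a minimal stdlib-only sketch (it does not use Jsoup, and the class name CharsetDemo is made up). The same ISO-8859-1 bytes decode correctly with the matching charset. Decoding them as UTF-8 produces the replacement character U+FFFD, which is typically displayed as a question mark. Once you have the correctly decoded String, you can pass it to Jsoup.parse(String html). Jsoup also has a Jsoup.parse(InputStream in, String charsetName, String baseUri) overload that accepts the charset directly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Builds a String from raw bytes using an explicit charset.
    // The charset must match the one the server used to encode the page.
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "A", e-circumflex, "C" encoded as ISO-8859-1: the accented letter is the single byte 0xEA
        byte[] latin1Bytes = {0x41, (byte) 0xEA, 0x43};

        // Correct: decode with the charset the bytes were actually written in
        String right = decode(latin1Bytes, StandardCharsets.ISO_8859_1);

        // Wrong: 0xEA followed by 0x43 is not valid UTF-8, so the decoder
        // substitutes the replacement character U+FFFD
        String wrong = decode(latin1Bytes, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 decode has U+FFFD: " + right.contains("\uFFFD")); // false
        System.out.println("UTF-8 decode has U+FFFD: " + wrong.contains("\uFFFD"));      // true
    }
}
```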
Byte encodings and Strings
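import java.io.UnsupportedEncodingException;

public class StringConverter {
    // Prints every byte of the array in hex, one per line
    public static void printBytes(byte[] array, String name) {
        for (int k = 0; k < array.length; k++) {
            System.out.println(name + "[" + k + "] = 0x" + String.format("%02X", array[k] & 0xFF));
        }
    }

    public static void main(String[] args) {
        System.out.println(System.getProperty("file.encoding"));
        String original = "A" + "\u00ea" + "\u00f1" + "\u00fc" + "C";
        System.out.println("original = " + original);
        System.out.println();
        try {
            byte[] utf8Bytes = original.getBytes("UTF8");
            byte[] defaultBytes = original.getBytes(); // platform default charset
            String roundTrip = new String(utf8Bytes, "UTF8");
            System.out.println("roundTrip = " + roundTrip);
            System.out.println();
            printBytes(utf8Bytes, "utf8Bytes");
            System.out.println();
            printBytes(defaultBytes, "defaultBytes");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    } // main
}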
Related
// Encode only the parameter value; encoding "input=" as well would turn '=' into %3D
String encodedInputText = "input=" + URLEncoder.encode(question, "UTF-8");
String urlStr = Parameters.getWebserviceURL();
URL url = new URL(urlStr + encodedInputText + "&sku=" + sku);
BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
String jsonOutput = in.readLine();
in.close();
The problem is that the returned JSON string contains Unicode escape sequences like
"question":"\u51e0\u5339\u7684",
instead of the actual Chinese characters. Specifying "UTF-8" should solve the problem, so why doesn't it?
EDIT:
ObjectMapper mapper = new ObjectMapper();
ResponseList responseList = mapper.readValue(jsonOutput, ResponseList.class);
This is not a problem of encoding; it is a problem with your data source. Encoding comes into play when you convert bytes into a String. You expect the encoding to convert a string of the form \uxxxx into another string, and that is not going to happen.
The whole point is that the data source serializes the data this way, so your raw characters are gone and have been replaced with \uxxxx sequences.
Now you would have to manually capture the \uxxxx sequences and convert them to actual characters.
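Here is a minimal sketch of that manual conversion using a regular expression (the class and method names are hypothetical). One caveat: \uxxxx is standard JSON string escaping, so a JSON parser such as the ObjectMapper in the EDIT already turns these escapes back into characters when it reads the value. The sketch below is only needed when the text is handled without a JSON parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeUnescape {
    // Matches a literal backslash, the letter 'u', and exactly four hex digits
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces every backslash-u-XXXX sequence with the UTF-16 char it names.
    // Surrogate pairs written as two escapes end up as two adjacent chars,
    // which together form the correct supplementary character.
    static String unescape(String s) {
        Matcher m = ESCAPE.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops '$' or a backslash in the result from being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "\"question\":\"\\u51e0\\u5339\\u7684\"";
        System.out.println(unescape(raw));
    }
}
```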
Firstly, I know very little about Java. I'm taking part in a Capture the Flag challenge and need to deserialize some Base64-encoded cookie information to alter some parameter values and progress further.
I have reviewed many other questions on SO about this error message, but my Java knowledge is limited (I have only ever written Java code to answer this challenge question), so I need some expert help to understand it quickly.
I have the following code to decode the base64 cookie value and then attempt to deserialize.
import java.util.Base64;
import java.util.UUID;
import java.io.*;
public class decode {
public static void main(String args[]){
try {
// get the encoded string and print for reference
String base64encodedString = "mybase64encodedstring==";
System.out.println("Base64 encoded string : " + base64encodedString + "\n");
// base64 decode and print
byte[] base64decodedBytes = Base64.getDecoder().decode(base64encodedString);
System.out.println("Base64 decoded string : " + new String(base64decodedBytes, "utf-8") + "\n");
// deserialize
InputStream in = new ByteArrayInputStream(base64decodedBytes);
ObjectInputStream obin = new ObjectInputStream(in);
Object object = obin.readObject();
obin.close();
System.out.println("Object class is " + object.getClass().toString());
}catch(UnsupportedEncodingException e){
System.out.println("UnsupportedEncodingException :" + e.getMessage());
return;
}catch(IOException e) {
System.out.println("IOException :" + e.getMessage());
return;
}catch(ClassNotFoundException e) {
System.out.println("Class not found");
e.printStackTrace();
return;
}
}
}
Note, I've not included my actual base64 encoded string in fear of others on the challenge googling for it and finding the answer they may also need :)
The output of the base64 decoded string is something like the following format:
Base64 decoded string : ��srcom.myapp.name.model.User��x�g�e�Iidis_adminpasswordtLjava/lang/String;L password2q~usernameq~xpt<somepasswordstring>
bsmith
And the error I get is:
java.lang.ClassNotFoundException: com.myapp.name.model.User
I understand that this relates to com.myapp.name.model.User not being on the classpath? I have no idea what that means or what I need to do to deserialize this data.
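Some context on the error. The classpath is the list of directories and JAR files where the JVM looks for classes. ObjectInputStream.readObject needs the real class, found by its fully qualified name, before it can rebuild the object. A common workaround, offered here as an assumption rather than a guaranteed fix, is to write a stub class with the same package and name that implements Serializable, declares the same serialVersionUID, and has matching fields. You then put that class on the classpath. The sketch below reads the class name and serialVersionUID directly from the stream header, following the Java Object Serialization stream protocol (STREAM_MAGIC 0xACED, TC_OBJECT 0x73, TC_CLASSDESC 0x72). It handles only the common case of an ordinary object at the top level. SerialHeader and describe are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.util.ArrayList;

public class SerialHeader {
    // Reads the class name and serialVersionUID of the first object in a
    // Java serialization stream, without needing that class on the classpath.
    static String describe(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        if (in.readUnsignedShort() != 0xACED) {    // STREAM_MAGIC
            throw new IOException("not a Java serialization stream");
        }
        in.readUnsignedShort();                     // STREAM_VERSION (normally 5)
        if (in.readUnsignedByte() != 0x73) {        // TC_OBJECT
            throw new IOException("first item is not an ordinary object");
        }
        if (in.readUnsignedByte() != 0x72) {        // TC_CLASSDESC
            throw new IOException("expected a class descriptor");
        }
        String className = in.readUTF();            // 2-byte length + modified UTF-8
        long serialVersionUID = in.readLong();
        return className + " serialVersionUID=" + serialVersionUID + "L";
    }

    public static void main(String[] args) throws IOException {
        // Demo on a stream produced locally; in the challenge you would
        // pass base64decodedBytes instead.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(new ArrayList<String>());
        }
        System.out.println(describe(bos.toByteArray()));
    }
}
```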
Any help on this appreciated!
I am trying to send a POST request from a C# program to my Java server.
I send the request together with a JSON object.
I receive the request on the server and can read what is sent using the following Java code:
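// conn is assumed to be the accepted client Socket
BufferedReader reader = new BufferedReader(new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
OutputStream out = conn.getOutputStream();
String line = reader.readLine();
String contentLengthString = "Content-Length: ";
int contentLength = 0;
// Read header lines until the blank line that ends the headers
while (line != null && line.length() > 0) {
    if (line.startsWith(contentLengthString))
        contentLength = Integer.parseInt(line.substring(contentLengthString.length()).trim());
    line = reader.readLine();
}
// read() may return fewer chars than requested, so loop until the body is complete.
// Content-Length counts bytes; it equals the char count only for ASCII bodies.
char[] temp = new char[contentLength];
int total = 0;
while (total < contentLength) {
    int n = reader.read(temp, total, contentLength - total);
    if (n < 0) break;
    total += n;
}
String s = new String(temp, 0, total);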
The string s is now the representation of the JSON object that I sent from the C# client. However, some characters are now messed up.
Original JSON object:
{"key1":"value1","key2":"value2","key3":"value3"}
Received string:
%7b%22key1%22%3a%22value1%22%2c%22key2%22%3a%22value2%22%2c%22key3%22%3a%22value3%22%%7d
So my question is: how do I convert the received string so that it looks like the original one?
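This looks URL-encoded, so why not use java.net.URLDecoder?
String s = java.net.URLDecoder.decode(new String(temp), StandardCharsets.UTF_8);
This assumes the charset is in fact UTF-8. The decode(String, Charset) overload requires Java 10 or later; on older versions use decode(String, "UTF-8").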
Those appear to be URL-encoded, so I'd use URLDecoder, like so:
String in = "%7b%22key1%22%3a%22value1%22%2c%22key2"
+ "%22%3a%22value2%22%2c%22key3%22%3a%22value3%22%7d";
try {
String out = URLDecoder.decode(in, "UTF-8");
System.out.println(out);
} catch (UnsupportedEncodingException e) {
e.printStackTrace();
}
Note that you seem to have an extra percent sign in your example (just before the final %7d). I removed it, and the above prints
{"key1":"value1","key2":"value2","key3":"value3"}
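The stray percent sign matters because URLDecoder rejects a malformed escape. It throws an IllegalArgumentException rather than silently skipping the escape. Here is a minimal sketch of both cases (the class name DecodeBody is hypothetical). It uses the decode(String, String) overload, which works on all Java versions.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class DecodeBody {
    // URL-decodes a form-style body; throws IllegalArgumentException
    // if the input contains a malformed percent escape
    static String decode(String body) throws UnsupportedEncodingException {
        return URLDecoder.decode(body, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String good = "%7b%22key1%22%3a%22value1%22%7d";
        System.out.println(decode(good)); // {"key1":"value1"}

        // Stray '%' before the closing brace, as in the question
        String bad = "%7b%22key1%22%3a%22value1%22%%7d";
        try {
            decode(bad);
        } catch (IllegalArgumentException e) {
            System.out.println("Malformed input: " + e.getMessage());
        }
    }
}
```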
I realise this is probably more of a general Java question, but since it's running in a Notes/Domino environment, I thought I'd check with that community first.
Summary:
I don't seem to be able to decode the string: dABlAHMAdAA= using lotus.domino.axis.encoding.Base64 or sun.misc.BASE64Decoder
I know the original text is: test
I confirmed by decoding at http://www5.rptea.com/base64/ that it appears to be UTF-16.
As a simple test, using either of the following:
String s_base64 = "dABlAHMAdAA=";
byte[] byte_base64 = null;
String s_decoded = "";
byte_base64 = new sun.misc.BASE64Decoder().decodeBuffer(s_base64);
s_decoded = new String(byte_base64, "UTF-16");
System.out.println("Test1: " + s_decoded);
byte_base64 = lotus.domino.axis.encoding.Base64.decode(s_base64);
s_decoded = new String(byte_base64, "UTF-16");
System.out.println("Test2: " + s_decoded);
System.out.println("========= FINISH.");
I get the output:
Test1: ????
Test2: ????
If I create the String as UTF-8:
s_decoded = new String(byte_base64, "UTF-8");
it outputs:
t
No error is thrown, but the code doesn't complete; it never gets to the "FINISH" line.
Detail
I'm accessing an ASMX web service, and in the SOAP response some nodes contain Base64-encoded data. At this point in time there is no way to get the service changed, so I have to use XPath and decode the data myself. The encoded data is either text or HTML. If I pass the encoded data through http://www5.rptea.com/base64/ and select UTF-16, it decodes correctly, so I must be doing something wrong.
As a side note, I encoded "test":
s_base64 = lotus.domino.axis.encoding.Base64.encode(s_text.getBytes());
System.out.println("test1 encodes to: " + s_base64);
s_base64 = new sun.misc.BASE64Encoder().encode(s_text.getBytes());
System.out.println("test2 encodes to: " + s_base64);
they both encode to:
dGVzdA==
...which, as expected, decodes correctly if you then feed it into the two decoders above.
If I go to the site above and encode "test" as UTF-16, I get dABlAHMAdAA=, which confirms that the data is in UTF-16.
It's as if the data is genuine Base64 data, but the decoder doesn't recognise it as such. I'm slightly stumped at the moment.
Any pointers or comments would be gratefully received.
The string has been encoded in UTF-16LE (little-endian), where the least significant byte of each 16-bit unit is stored first. When there is no byte-order mark, Java's "UTF-16" charset assumes big-endian. You need to use:
s_decoded = new String(byte_base64, "UTF-16LE");
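Here is a quick way to check this, as a sketch using java.util.Base64. That class requires Java 8, which an older Domino JVM may not have; in that case, keep the existing decoder and change only the charset name. The class name Utf16Demo is hypothetical. The decoded bytes are 74 00 65 00 73 00 74 00, so the low byte of each character comes first. Read as big-endian, each pair would form a different, unrelated character.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Utf16Demo {
    // Decodes Base64 text whose underlying bytes are UTF-16 little-endian
    static String decodeUtf16Le(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        return new String(bytes, StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        String s = "dABlAHMAdAA=";
        // Print the raw bytes in hex to show the little-endian layout
        for (byte b : Base64.getDecoder().decode(s)) {
            System.out.printf("%02x ", b);
        }
        System.out.println();
        System.out.println("UTF-16LE: " + decodeUtf16Le(s)); // UTF-16LE: test
    }
}
```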
I have used your sample "dABlAHMAdAA=" on my online Base64 decoding tool, and it seems you are missing the Apache Base64 JAR files.
Click the link below.
http://www.hosting4free.info/Base64Decode/Base64-Decode.jsp
The code behind the website is:
import java.io.UnsupportedEncodingException;
import org.apache.commons.codec.binary.Base64;

public class base64decode
{
    public static void main(String[] args) throws UnsupportedEncodingException
    {
        // "YWJjZGVmZw==" decodes to the ASCII bytes of "abcdefg"
        byte[] decoded = Base64.decodeBase64("YWJjZGVmZw==".getBytes("US-ASCII"));
        System.out.println(new String(decoded, "UTF-8") + "\n");
    }
}
I am trying to use the Base64 encoding functionality from Apache Commons, but I seem to get the "wrong" result. The doc is here.
My code is like this:
import org.apache.commons.codec.binary.Base64;
String data = "hi,all,how can this happen?";
byte[] databytes = Base64.encodeBase64(data.getBytes());
data = databytes.toString();
System.out.println(data);
//the result is:
//[B@121cc40
However I encode the same string using Python, the code is:
import base64
print base64.b64encode("hi,all,how can this happen?")
#The result is aGksYWxsLGhvdyBjYW4gdGhpcyBoYXBwZW4/
How can there be such a difference?
That's not how you print a byte (or any other type of) array in Java if you want to view its contents.
This is:
System.out.println(Arrays.toString(databytes));
import org.apache.commons.codec.binary.Base64;
public class Codec {
public static void main(String[] args) {
try {
String data = "hi,all,how can this happen?";
String encodedText;
// Base64
encodedText = new String(Base64.encodeBase64(data.getBytes()));
System.out.println("Encoded: " + encodedText);
System.out.println("Decoded:"
+ new String(Base64.decodeBase64(encodedText.getBytes())));
}
catch (Exception e) {
e.printStackTrace();
}
}
}
This encodes your String data and then decodes it back.
You are printing the array's default toString() (its type and identity hash code), not its contents.
If you want to print the encoded data as a String, you can use:
String password = new String(databytes);
System.out.println("Encoded String " + password);
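For comparison, here is the same encoding done with java.util.Base64 (Java 8+) instead of Apache Commons, as a sketch with a hypothetical class name. It shows the same fix: turn the encoded bytes into a String (encodeToString does this for you) rather than calling toString() on the byte[]. The output matches the Python result quoted in the question.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class Base64Compare {
    // Encodes the UTF-8 bytes of a String and returns the Base64 text
    static String encode(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String data = "hi,all,how can this happen?";
        String encoded = encode(data);
        System.out.println(encoded); // aGksYWxsLGhvdyBjYW4gdGhpcyBoYXBwZW4/

        // Decoding round-trips back to the original text
        String decoded = new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
        System.out.println(decoded.equals(data)); // true
    }
}
```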