Encoding problems in Android Application (WebView.LoadData()) - java

I'm having a problem encoding part of a web page in my Android application. The application collects part of a web page and displays it to the user. For this question, let's say I've got a web page with some text, a table below the text, and below the table a lot of junk I'm not interested in. I choose what to show using a start position (for example, a unique tag) and an end position (again, something unique), reading the page through an InputStreamReader with those start/end positions.
Then on my string ("string") I run:
String s = Uri.encode(string);
The string s is then used accordingly:
web.loadData(s, "text/html","ISO-8859-1");
But this gives me some unwanted characters in the middle of the text: "Â" appears. I've tried calling .replace("Â", "") on the string, but this doesn't solve the problem.
I've also tried the following:
web.loadData(s, "text/html", "UTF-8");
web.loadData(s,"text/html;utf-8",null);
But the "Â" and one or two "*" still appear.
I've been searching the web and found loadDataWithBaseURL, but this doesn't solve it either, so I would very much appreciate some assistance :)
On the top of the page:
<html xmlns="http://www.w3.org/1999/xhtml" lang="sv-se" dir="ltr">
On another page:
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-us" dir="ltr">
So I've got one English and one Swedish page, but the error occurs for both URLs.
Best regards!

Use this:
webview.loadData(html_content, "text/html; charset=utf-8", "utf-8");
I tested it, and it works.

This code worked for me.
String base64EncodedString = null;
try {
    base64EncodedString = android.util.Base64.encodeToString((preString+mailContent.getBody()+postString).getBytes("UTF-8"), android.util.Base64.DEFAULT);
} catch (UnsupportedEncodingException e1) {
    // TODO Auto-generated catch block
    e1.printStackTrace();
}
if(base64EncodedString != null)
{
    wvMailContent.loadData(base64EncodedString, "text/html; charset=utf-8", "base64");
}
else
{
    wvMailContent.loadData(preString+mailContent.getBody()+postString, "text/html; charset=utf-8", "utf-8");
}
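For anyone wondering where a stray "Â" can come from: one common cause is UTF-8 bytes being read back as ISO-8859-1. A non-breaking space is the two bytes C2 A0 in UTF-8, and C2 alone is "Â" in ISO-8859-1. Whether that is what happened in the question is an assumption on my part. Below is a minimal plain-JVM sketch of the effect and of the Base64 round trip. It uses java.util.Base64 as a stand-in for android.util.Base64, and the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class MojibakeDemo {
    // Encode as UTF-8, then decode with the wrong charset (ISO-8859-1).
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Base64 of the UTF-8 bytes: the result is pure ASCII.
    static String toBase64(String html) {
        return Base64.getEncoder().encodeToString(html.getBytes(StandardCharsets.UTF_8));
    }

    // Reverse of toBase64, decoding the bytes explicitly as UTF-8.
    static String fromBase64(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // \u00a0 is a non-breaking space, \u00c2 is the A-circumflex character.
        String html = "<p>10\u00a0kr</p>";
        System.out.println("stray character present: " + misread(html).contains("\u00c2"));
        System.out.println("round trip intact: " + fromBase64(toBase64(html)).equals(html));
    }
}
```

The point of the Base64 route is that only ASCII travels through loadData, so there is nothing left for a charset mismatch to damage before the WebView decodes it.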

Related

From Angular Actual parameter value is "Ébénisterie" but in JAVA getting value "Ã?bénisterie"

From Angular, there is one parameter whose value is Ébénisterie, but when I print that variable in Java I get Ã?bénisterie. How can I convert it back to the original text Ébénisterie? Which encoding/decoding do I have to apply?
I have tried the following:
new String(readable.getBytes("ISO-8859-15"), "UTF-8");
new String(readable.getBytes("UTF-8"), "ISO-8859-15");
but it's not working.
String readable ="�bénisterie Distinction";
String test = null;
try {
    test = new String(readable.getBytes("ISO-8859-15"), "UTF-8");
    System.out.println("test"+test);
} catch (UnsupportedEncodingException e) {
}
Expected: Ébénisterie
Actual: �bénisterie
After long research I didn't find anything.
So the solution I came up with is Base64 encoding/decoding: AngularJS now sends the encoded text, and the Java side decodes it.
Here is the sample code.
AngularJS
window.btoa("Ébénisterie")
JAVA
String actualString= new String(Base64.getDecoder().decode("ENCODED STRING"));
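One caveat with the snippet above: window.btoa encodes each character as a single byte (it only accepts code points up to 255), while new String(byte[]) without a charset uses the platform default. If that default is UTF-8, the accented characters will not come back intact. Here is a small sketch that names the charset explicitly. The class and method names are hypothetical, and encodeLikeBtoa only imitates the browser side so the example can run on its own.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BtoaDecodeDemo {
    // Stand-in for the browser side: one byte per character, as btoa does for code points <= 255.
    static String encodeLikeBtoa(String text) {
        return Base64.getEncoder().encodeToString(text.getBytes(StandardCharsets.ISO_8859_1));
    }

    // Decode with the charset that matches the sender instead of the platform default.
    static String decodeLatin1(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.ISO_8859_1);
    }

    // Decoding the same bytes as UTF-8 does not round-trip for accented characters.
    static String decodeUtf8(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The word from the question, with the accented letters written as escapes.
        String original = "\u00c9b\u00e9nisterie";
        String encoded = encodeLikeBtoa(original);
        System.out.println("ISO-8859-1 round trip ok: " + decodeLatin1(encoded).equals(original));
        System.out.println("UTF-8 round trip ok: " + decodeUtf8(encoded).equals(original));
    }
}
```

If the text can contain characters above code point 255, btoa will refuse it, so the sending side would need to produce UTF-8 bytes first and the Java side would then decode as UTF-8.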

How to find the specific attribute of a tag inside a script using Java?

I've been looking here on Stack Overflow for how to search for part of a string inside large texts, but I haven't managed to find how to get the value of a specific attribute inside a script using Java. The goal is to read a file (script) line by line and extract the value of the "src" attribute.
For instance, the file has many lines containing this structure:
<script src="js/vendor/modernizr-2.6.2.min.js"></script>
<script data-main="js/" src="js/require.min.js"></script>
<script data-main="js/" src="js/main.js"></script>
<script src="js/vendor/modernizr-2.6.2.min.js"></script>
<script data-main="js/" src="js/require.min.js"></script>
So, using Java, I read the file with the BufferedReader class, and for each line I want to get the value of "src". For example, for the first line I want js/vendor/modernizr-2.6.2.min.js, for the second line js/require.min.js, and so on. I saw some suggestions such as using a regex, but I don't know whether that is the most effective approach in this case:
public Helper(String scriptPath) {
    File scriptFile = null;
    try {
        scriptFile = new File(scriptPath);
        String relativePath = scriptFile.getParent();
        System.out.println(relativePath);
        BufferedReader reader = new BufferedReader(new FileReader(scriptFile));
        String readLine;
        while ((readLine = reader.readLine()) != null) {
            // How to match the src?
        }
        reader.close();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Please, if somebody could help me I would really appreciate it. If someone knows that there's already an answer for this, please let me know so I can close this question; in the time I've been searching, I haven't found this kind of problem yet.
Thank you very much in advance.
Your file looks like HTML, so I would consider using an HTML parser.
http://jsoup.org/ is very easy to use, with CSS-like selectors.
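If pulling in a parser is not an option, the regex route the question mentions can be sketched with the standard library alone. This is not the parser approach recommended above, just a rough alternative that assumes quoted src values and one tag per match. The class name and pattern are mine, and it will be fooled by things a real HTML parser handles (for example an attribute named data-src, or unquoted values).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SrcExtractor {
    // Matches src="..." or src='...' inside a <script ...> tag; group 2 holds the value.
    private static final Pattern SCRIPT_SRC = Pattern.compile(
            "<script\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE);

    // Returns every script src value found in the given line, in order of appearance.
    static List<String> extract(String line) {
        List<String> result = new ArrayList<>();
        Matcher matcher = SCRIPT_SRC.matcher(line);
        while (matcher.find()) {
            result.add(matcher.group(2));
        }
        return result;
    }

    public static void main(String[] args) {
        String[] lines = {
            "<script src=\"js/vendor/modernizr-2.6.2.min.js\"></script>",
            "<script data-main=\"js/\" src=\"js/require.min.js\"></script>"
        };
        for (String line : lines) {
            System.out.println(extract(line));
        }
    }
}
```

Inside the while loop from the question, extract(readLine) would be called once per line.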

loadDataWithBaseURL does not load images which are part of html page

I have been battling with this issue for a long time and have dug around Google and SO, but still no luck, so I'm here to ask for your help or advice.
My problem with the following source code is that it only displays the text content. The images do not display; instead it shows a white rectangle or sometimes a blue image with a question mark.
Q: How to display images?
Here is my code:
private void openURL() {
    DefaultHttpClient client = new HttpsRequest(getApplicationContext());
    HttpGet get = new HttpGet(getUrlField().getText().toString());
    // Execute the GET call and obtain the response
    HttpResponse getResponse;
    try {
        getResponse = client.execute(get);
        HttpEntity responseEntity = getResponse.getEntity();
        String content = EntityUtils.toString(responseEntity);
        getWebView().loadDataWithBaseURL(null, content, "text/html", "utf-8", null);
    } catch (ClientProtocolException e) {
        WSLog.e(THIS_FILE, "HTTP Error.");
        e.printStackTrace();
    } catch (IOException e) {
        WSLog.e(THIS_FILE, "Url Load Error.");
        e.printStackTrace();
    }
}
With Vasarat's help and a few modifications of my own, I was able to answer this question. I changed the loadDataWithBaseURL call to:
getWebView().loadDataWithBaseURL("http://mywebSite.com/parent_dir_to_iamges/", content, "text/html", "UTF-8","about:blank");
This modification gave me exactly the output I expected.
Please follow the comments to understand the details of the issue.
Note: I used http in the base URL instead of https. Please let me know whether I can use https. webview.loadUrl() with an https URL works fine on API level 10 or above, but it shows a blank page on API level 8.

How do I get parsed HTML special characters using JSOUP

I am using JSoup to get the H1 tag value from a webpage, this tag contains the following HTML.
Hexyl β-D-glucopyranoside
When I use the .text() method I get the following (note the ?). I assume this is because it cannot work out the HTML for the "β" character. How do I get this value as it is rendered on a web page?
Hexyl ?-D-glucopyranoside
Do I need to do some kind of conversion after I have picked up the text I want?
Here is my code.
String check = "<title>Hexyl β-D-glucopyranoside ≥98.0% (TLC) | ≥ ≥</title>";
Document doc3 = Jsoup.parse(check);
doc3.outputSettings().escapeMode(Entities.EscapeMode.base); // default
doc3.outputSettings().charset("UTF-8");
System.out.println("UTF-8: " + doc3.html());
//doc3.outputSettings().charset("ISO 8859-1");
doc3.outputSettings().charset("ASCII");
System.out.println("ASCII: " + doc3.html());
-----Output at console-----
UTF-8: <html>
<head>
<title>Hexyl ?-D-glucopyranoside ?98.0% (TLC) | ? ? </title>
</head>
<body></body>
</html>
ASCII: <html>
<head>
<title>Hexyl β-D-glucopyranoside ≥98.0% (TLC) | ≥ ≥</title>
</head>
<body></body>
</html>
It looks like the IDE you're using has the wrong character encoding for its console.
It's nothing to do with your code: I've run it and it's fine (it outputs the special characters). If you're using Eclipse, go to the run configuration settings for that particular project, click the 'Common' tab, and choose UTF-8 as the encoding.
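To see why a console with the wrong encoding prints "?", here is a small stdlib-only sketch (no jsoup involved). It writes the same string through two PrintStream instances with different encodings and looks at the bytes that come out. The class and method names are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class ConsoleEncodingDemo {
    // Prints the text through a PrintStream using the named charset and returns the raw bytes.
    static byte[] render(String text, String charsetName) {
        try {
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            PrintStream out = new PrintStream(sink, true, charsetName);
            out.print(text);
            out.flush();
            return sink.toByteArray();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String beta = "\u03b2"; // Greek small letter beta
        // US-ASCII cannot represent beta, so the encoder substitutes a question mark.
        System.out.println("ASCII bytes: " + render(beta, "US-ASCII").length);
        System.out.println("UTF-8 bytes: " + render(beta, "UTF-8").length);

        // Wrapping System.out with an explicit charset avoids relying on the default.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("Hexyl " + beta + "-D-glucopyranoside");
    }
}
```

The wrapped stream only helps if the terminal or IDE console itself interprets the bytes as UTF-8, which is what the Eclipse setting above takes care of.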
It's too late to set charset after parsing a document. I had the same problem once, tried to do it your way and failed miserably.
This worked for me:
String url = "url to html page";
InputStream is = new URL(url).openStream();
org.jsoup.nodes.Document doc = org.jsoup.Jsoup.parse(is , "ISO-8859-2", url);
If I only have the HTML text as a string, I convert it to an InputStream first (http://www.kodejava.org/examples/265.html)
InputStream is = new ByteArrayInputStream(text.getBytes("UTF-8"));
then read it with correct charset:
BufferedReader r = new BufferedReader(new InputStreamReader(is, "UTF-8"), 4*1024);
StringBuilder total = new StringBuilder();
String line = "";
while ((line = r.readLine()) != null) {
    total.append(line);
}
r.close();
is.close();
String html = total.toString();
...and parse:
doc = org.jsoup.Jsoup.parse(html);
The important thing is to get hold of an InputStream object somehow; from there, there are ways to apply your desired charset to it. Maybe it can be done in a more straightforward way, but it works.

JSP/Servlet : How to Read an image via URL and render it on a JSP page (image URL is not public)

What's the best approach to read an image via URL and render it on a JSP page?
So far, I've coded two JSP pages.
EDIT START:
*Experimental: Obviously the ImageServ will be a servlet, not a jsp.
EDIT END:
index.jsp
<%@page ....
<html>
......
<img src="ImageServ.jsp?url=http://serveripaddress/folder/image.jpg" />
.....
ImageServ.jsp
<%@page import="javax.imageio.ImageIO"%>
<%@page import="java.net.URL"%>
<%@page import="java.io.*, java.awt.*, java.awt.image.*,com.sun.image.codec.jpeg.*" %>
<%
try {
    String urlStr = "";
    if(request.getParameter("url") != null)
    {
        urlStr = request.getParameter("url");
        URL url = new URL(urlStr);
        BufferedImage img = null;
        try{
            img = ImageIO.read(url);
            out.println(" READ SUCCESS" + "<br>");
        }catch(Exception e) {
            out.println("READ ERROR " + "<br>");
            e.printStackTrace(new PrintWriter(out));
        }
        try {
            response.setContentType("image/jpeg");
            JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(response.getOutputStream());
            encoder.encode(img);
        }catch(Exception ee) {
            response.setContentType("text/html");
            out.println("ENCODING ERROR " + "<br>");
            ee.printStackTrace(new PrintWriter(out));
        }
    }
} catch (Exception e) {
    e.printStackTrace(new PrintWriter(out));
}
%>
But this doesn't seem to be working.
I see this error every time:
READ SUCCESS
ENCODING ERROR
java.io.IOException: reading encoded JPEG Stream
at sun.awt.image.codec.JPEGImageEncoderImpl.writeJPEGStream(Native Method)
at sun.awt.image.codec.JPEGImageEncoderImpl.encode(JPEGImageEncoderImpl.java:476)
at sun.awt.image.codec.JPEGImageEncoderImpl.encode(JPEGImageEncoderImpl.java:228)
Any ideas on how to get this working???
Your image data is already encoded so you can simply write it: ImageIO.write(img, "jpeg", response.getOutputStream());. You don't need to (and can't) use JPEGImageEncoder.
Classic question. Here's an example: http://www.exampledepot.com/egs/javax.servlet/GetImage.html
Also, don't do all that coding in a JSP - keep that for front-end rendering coding only; do the Java coding in a backend class.
Terrible and awful code. NEVER EVER write controller logic in a JSP; that's why I hate JSP to the guts. You cannot write binary data to a JSP output stream, because the stream has already been initialized for text output. Put your logic in a servlet and pipe the input stream to the response output stream with Commons IO. This will work. If you still insist on that crappy solution, you will need to write a filter which completely wraps the response and serves binary data instead. See this for reference and examine its code. Good luck.
Edit:
doGet(...) {
    response.setContentType("image/jpeg");
    String url = request.getParameter("url");
    ...
    InputStream is = ....getInputStream();
    IOUtils.copy(is, response.getOutputStream());
    // cleanup
} // done
This is how I pipe PDF from local disk but there is no difference to serving from a URL.
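If you would rather not add Commons IO just for this, the piping step is a short loop. The sketch below is a stdlib-only stand-in for what IOUtils.copy does in the snippet above, not the library's actual implementation. The class and method names are hypothetical, and in-memory streams take the place of the URL input and the servlet response so it can run anywhere.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class StreamPipe {
    // Copies all bytes from in to out and returns how many bytes were copied.
    // IOException is wrapped so this demo needs no checked-exception handling;
    // inside a servlet's doGet you could let the IOException propagate instead.
    static long pipe(InputStream in, OutputStream out) {
        try {
            byte[] buffer = new byte[8192];
            long total = 0;
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
                total += read;
            }
            out.flush();
            return total;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Fake "image" bytes standing in for the remote stream.
        byte[] fakeImage = {(byte) 0xFF, (byte) 0xD8, 0x01, 0x02, (byte) 0xFF, (byte) 0xD9};
        ByteArrayOutputStream response = new ByteArrayOutputStream();
        long copied = pipe(new ByteArrayInputStream(fakeImage), response);
        System.out.println("copied " + copied + " bytes");
    }
}
```

Because the bytes are passed through untouched, the image is never decoded or re-encoded on the server, which is why this avoids the JPEG encoder error from the question.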
