App Engine URL request: UTF-8 characters becoming '??' or '???' - java

I have an error when loading data from a web service into the datastore. The XML returned by the web service contains UTF-8 characters, and App Engine is not interpreting them correctly: it renders them as ??.
I'm fairly sure I've tracked this down to the URL Fetch request. The basic flow is: task queue -> fetch the web-service data -> put the data into the datastore, so it definitely has nothing to do with the request or response encoding of the main site.
I put log messages before and after Apache Digester to see whether that was the cause, and determined it was not. This is what I saw in the logs:
string from the XML: "Doppelg��nger"
After digester processed: "Doppelg??nger"
Here is my url fetching code:
public static String getUrl(String pageUrl) {
StringBuilder data = new StringBuilder();
log.info("Requesting: " + pageUrl);
for(int i = 0; i < 5; i++) {
try {
URL url = new URL(pageUrl);
URLConnection connection = url.openConnection();
connection.connect();
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String line;
while ((line = reader.readLine()) != null) {
data.append(line);
}
reader.close();
break;
} catch (Exception e) {
log.warn("Failed to load page: " + pageUrl, e);
}
}
String resp = data.toString();
if(resp.isEmpty()) {
return null;
}
return resp;
}
Is there a way I can force this to recognize the input as UTF-8? I tested the page I am loading, and the W3C validator recognized it as valid UTF-8.
The issue occurs only on the App Engine servers; it works fine on the development server.
Thanks

Try passing the charset explicitly to the InputStreamReader:
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF-8"));
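The one-argument InputStreamReader constructor decodes with the platform default charset, which may well differ between the development server and the App Engine servers (that difference is an assumption here, not something the question confirms). A minimal sketch of reading a stream with an explicit charset follows; the class and method names are made up for illustration, and a ByteArrayInputStream stands in for connection.getInputStream().

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class Utf8ReadSketch {

    // Reads the whole stream as UTF-8 text, independent of the platform default charset.
    // Like the code in the question, it drops the line terminators.
    static String readUtf8(InputStream in) throws IOException {
        StringBuilder data = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                data.append(line);
            }
        }
        return data.toString();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the bytes returned by the web service.
        byte[] utf8 = "Doppelg\u00e4nger".getBytes(StandardCharsets.UTF_8);
        System.out.println(readUtf8(new ByteArrayInputStream(utf8)));
    }
}
```

Using the StandardCharsets.UTF_8 constant instead of the string "UTF-8" avoids the checked UnsupportedEncodingException.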

I ran into the same issue three months ago, Mike, and I assume your problem is the same.
Let me recollect and put it down here. Feel free to add anything I miss.
My setup was Tomcat and Struts.
I resolved it through the correct configuration in Tomcat.
Basically, Tomcat itself has to support UTF-8 characters. Set useBodyEncodingForURI on the connector; this covers GET parameters.
In addition, you can use a filter for POST parameters.
A good resource where you can find all of this under one roof is: Click here!
I had a further problem in production, where an Apache web server was forwarding requests to Tomcat :). UTF-8 had to be enabled there too. The moral of the story: resolve each problem as it comes :)

Related

JAVA Delete API with Array String Body

Sorry in advance for my machine-translated English.
I am working with an API and writing Java software that uses it.
I need to perform a DELETE request from that software.
With the software supplied to test the API, I can see that to delete a line I have to put it in the request body, like this:
["email","Termine","13/03/2018 09:52:20",etc...,""].
The body must contain a String array with all the contents of the line to delete.
I can make it work in the test software.
However, I cannot work out how to make a DELETE with a body in Java. This is what I have so far:
public static String delete(String json, String nomUrl) throws IOException {
URL url = new URL(baseUrl + "survey/"+ nomUrl + "/data");
//String json = "[\"Marc#Houdijk.nl\",\"Contacte\",\"10/04/2018 11:30:05\",\"Avoriaz\",\"Office de Tourisme\",\"Accueil OT\",\"Neerlandais\",\"Semaine 6\",\"Periode 2\",\"16\",\"\",\"Hiver 2018\",\"BJBR-CDQB\",\"04/12/2018 14:15:13\",\"04/12/2018 14:15:13\",\"04/12/2018 14:15:13\",\"\",\"Direct\",\"\",\"\",\"\"]\n";
HttpURLConnection con = (HttpURLConnection) url.openConnection();
con.setRequestMethod("DELETE");
con.setRequestProperty("Content-Type","application/json");
con.setRequestProperty("Accept","application/json");
con.setRequestProperty("Authorization","Bearer "+token);
con.setDoOutput(true);
DataOutputStream wr = new DataOutputStream(con.getOutputStream());
wr.writeBytes(json);
wr.flush();
wr.close();
int responseCode = con.getResponseCode();
StringBuilder responce = new StringBuilder();
responce.append("\nSending 'DELETE' request to URL : ").append(url);
responce.append("\nResponse Code : ").append(responseCode);
BufferedReader in = new BufferedReader(
new InputStreamReader(con.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
responce.append("\n").append(inputLine);
}
in.close();
return responce.toString();
}
I was inspired by what I did for the post and the get. But I do not see how to add a body correctly with my String Array to my delete function because it doesn't work, and the internet did not help me ...
Thank you in advance for your help !
EDIT: Finally, my code works, so if you want to send a DELETE with a body, you can use this code. The problem actually came from the JSON: I'm French, so there were accents and special characters in my strings. After cleaning my string, everything works.
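One possible explanation for the accent problem, offered as a guess rather than something the post confirms: DataOutputStream.writeBytes keeps only the low-order byte of each char, so non-ASCII characters are not sent as UTF-8. The sketch below compares that with encoding the string explicitly; the class and method names are invented for illustration, and a ByteArrayOutputStream stands in for con.getOutputStream().

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class JsonBodySketch {

    // What the code in the question does: each char is truncated to its low byte.
    static byte[] viaWriteBytes(String json) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream wr = new DataOutputStream(sink)) {
            wr.writeBytes(json);
        }
        return sink.toByteArray();
    }

    // Alternative: encode the body as UTF-8 and write the raw bytes.
    static byte[] viaUtf8(String json) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        sink.write(json.getBytes(StandardCharsets.UTF_8));
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String json = "[\"Termin\u00e9\"]"; // 11 chars, one of them accented
        System.out.println(viaWriteBytes(json).length); // one byte per char
        System.out.println(viaUtf8(json).length);       // the accented char takes two bytes
    }
}
```

With a real connection, the second variant would be con.getOutputStream().write(json.getBytes(StandardCharsets.UTF_8)), ideally together with a Content-Type of "application/json; charset=UTF-8"; whether the server honors that charset is an assumption.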
You can create a POJO class with the fields required by the request body, serialize an instance of it to JSON (for example with the Gson library), and send that to the API. On the API side, create the same POJO class; the request body is then deserialized from JSON into that class, and you can read the ArrayList or whatever other fields you need from the resulting object. Hope this helps.

Using Alchemy Entity Extraction to retrieve JSON output

I am running the EntityTest.java file from the Alchemy API Java SDK, which can be found here. The program works just fine, but it seems there is no way to change the output format to JSON.
I tried executing this code:
// Create an AlchemyAPI object.
AlchemyAPI alchemyObj = AlchemyAPI.GetInstanceFromFile("api_key.txt");
// Force the output type to be JSON
AlchemyAPI_NamedEntityParams params = new AlchemyAPI_NamedEntityParams();
params.setOutputMode("json");
// Extract a ranked list of named entities for a web URL.
Document doc = alchemyObj.URLGetRankedNamedEntities("http://www.techcrunch.com/", params);
System.out.println(getStringFromDocument(doc));
But the code throws a RuntimeException and prints the following on the console:
Exception in thread "main" java.lang.RuntimeException: Invalid setting json for parameter outputMode
at com.alchemyapi.api.AlchemyAPI_Params.setOutputMode(AlchemyAPI_Params.java:42)
at com.alchemyapi.test.EntityTest.main(EntityTest.java:29)
Also, here is the setOutputMode method from the AlchemyAPI_Params.java file:
public void setOutputMode(String outputMode) {
if( !outputMode.equals(AlchemyAPI_Params.OUTPUT_XML) && !outputMode.equals(OUTPUT_RDF) )
{
throw new RuntimeException("Invalid setting " + outputMode + " for parameter outputMode");
}
this.outputMode = outputMode;
}
As is evident from the code, the only two acceptable output formats seem to be XML and RDF. Is that so? Is there no way to get the output in JSON?
Can anybody please help me with this?
You will need to add a new constant, OUTPUT_JSON, in AlchemyAPI_Params and modify the setOutputMode method to accept it.
After that, in AlchemyAPI, you will need to modify the doRequest method to handle the new OUTPUT_JSON case.
You can use:
http://www.oracle.com/technetwork/articles/java/json-1973242.html
to create the new content.
Hope it helps.
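The first half of that suggestion could look roughly like the sketch below. It is a stand-alone stand-in class, not the SDK's actual AlchemyAPI_Params; the constant values "xml", "rdf" and "json" are assumptions, and the matching change to doRequest is not shown.

```java
public class OutputModeSketch {

    // Assumed values; the real SDK constants may differ.
    public static final String OUTPUT_XML = "xml";
    public static final String OUTPUT_RDF = "rdf";
    // Hypothetical new constant suggested by the answer.
    public static final String OUTPUT_JSON = "json";

    private String outputMode = OUTPUT_XML;

    // Same check as the SDK method in the question, extended to accept JSON.
    public void setOutputMode(String outputMode) {
        if (!OUTPUT_XML.equals(outputMode)
                && !OUTPUT_RDF.equals(outputMode)
                && !OUTPUT_JSON.equals(outputMode)) {
            throw new RuntimeException("Invalid setting " + outputMode + " for parameter outputMode");
        }
        this.outputMode = outputMode;
    }

    public String getOutputMode() {
        return outputMode;
    }

    public static void main(String[] args) {
        OutputModeSketch params = new OutputModeSketch();
        params.setOutputMode("json");
        System.out.println(params.getOutputMode());
    }
}
```

Accepting the parameter is only half the work: the question's code receives a Document from the SDK call, so the request path would also need to stop parsing the response as XML when JSON is requested.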
I solved the problem with a completely different approach. Instead of using the Java SDK, I made an HTTP connection to the endpoint of the URLGetRankedNamedEntities API and retrieved the response.
Here is a code sample that demonstrates how to do this:
URL urlObj = new URL("http://access.alchemyapi.com/calls/url/URLGetRankedNamedEntities?apikey=" + API_KEY_HERE + "&url=http://www.smashingmagazine.com/2015/04/08/web-scraping-with-nodejs/&outputMode=json");
System.out.println(urlObj.toString() + "\n");
URLConnection connection = urlObj.openConnection();
connection.connect();
BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String line;
StringBuilder builder = new StringBuilder();
while ((line = reader.readLine()) != null) {
builder.append(line + "\n");
}
System.out.println(builder);
Similar endpoints are available for other APIs as well, which can be found here.
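One detail worth considering in the request above: the target page URL is placed in the query string unencoded, which can break if it contains characters such as & or ?. A small sketch of building the same request URL with URLEncoder follows; the class and method names are illustrative, and the endpoint and parameter names are copied from the answer.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class EntityUrlSketch {

    // Builds the request URL, percent-encoding the query parameter values.
    static String buildUrl(String apiKey, String targetUrl) throws UnsupportedEncodingException {
        return "http://access.alchemyapi.com/calls/url/URLGetRankedNamedEntities"
                + "?apikey=" + URLEncoder.encode(apiKey, "UTF-8")
                + "&url=" + URLEncoder.encode(targetUrl, "UTF-8")
                + "&outputMode=json";
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "KEY" is a placeholder, not a real API key.
        System.out.println(buildUrl("KEY", "http://example.com/a?b=c&d=e"));
    }
}
```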

How to display captcha in ImageView in Android.?

I have a PNR Inquiry app on Google Play. It was working fine, but Indian Railways recently added a captcha to their PNR Inquiry section, and because of this I am not able to pass the proper data to the server to get a proper response. How can I show this captcha in my app as an ImageView and ask users to enter the captcha as well, so that I can send the proper data and get a proper response?
Indian Railways PNR Inquiry Link
If you check the HTML code, it's actually a pretty bad captcha.
The background of the captcha is: http://www.indianrail.gov.in/1.jpg
The numbers are actually in an input tag:
<input name="lccp_cap_val" value="14167" id="txtCaptcha" type="hidden">
What they are doing is using JavaScript to take the numbers from that hidden input tag and put them on that span with the "captcha" background.
So basically your flow is:
read their HTML
get the "captcha" value (lol, funny captcha though) from the input field
wait for the user to enter a PNR in your PNR field and press Get Status
post the form, putting the PNR and the captcha in the proper fields
parse the response
Oh yeah, one more thing. You can put any value in the hidden input and the "captcha" input, as long as they are the same. They aren't checking it via a session or anything.
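The "get the captcha value from the input field" step could be sketched with a regular expression, as below. This assumes the markup looks like the input tag quoted above, with the name attribute before the value attribute; the class and method names are made up, and an HTML parser such as jsoup would be more robust than a regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HiddenInputSketch {

    // Matches an <input> tag whose name is lccp_cap_val and captures its value.
    // Assumes name appears before value, as in the quoted markup.
    private static final Pattern CAPTCHA = Pattern.compile(
            "<input[^>]*name=\"lccp_cap_val\"[^>]*value=\"([^\"]*)\"");

    // Returns the hidden captcha value, or null if the tag is not found.
    static String extractCaptcha(String html) {
        Matcher m = CAPTCHA.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<input name=\"lccp_cap_val\" value=\"14167\" id=\"txtCaptcha\" type=\"hidden\">";
        System.out.println(extractCaptcha(html));
    }
}
```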
EDIT (code sample for submitting the form):
To simplify posting the form, I recommend the HttpClient components from Apache:
http://hc.apache.org/downloads.cgi
Let's say you downloaded HttpClient 4.3.1. Include the client, core and mime libraries in your project (copy them to the libs folder, right-click the project, Properties, Java Build Path, Libraries, Add Jars -> add those 3).
A code example would be:
private static final String FORM_TARGET = "http://www.indianrail.gov.in/cgi_bin/inet_pnstat_cgi.cgi";
private static final String INPUT_PNR = "lccp_pnrno1";
private static final String INPUT_CAPTCHA = "lccp_capinp_val";
private static final String INPUT_CAPTCHA_HIDDEN = "lccp_cap_val";
private void getHtml(String userPnr) {
MultipartEntityBuilder builder = MultipartEntityBuilder.create();
builder.addTextBody(INPUT_PNR, userPnr); // users PNR code
builder.addTextBody(INPUT_CAPTCHA, "123456");
builder.addTextBody("submit", "Get Status");
builder.addTextBody(INPUT_CAPTCHA_HIDDEN, "123456"); // values don't matter as long as they are the same
HttpEntity entity = builder.build();
HttpPost httpPost = new HttpPost(FORM_TARGET);
httpPost.setEntity(entity);
HttpClient client = new DefaultHttpClient();
HttpResponse response = null;
String htmlString = "";
try {
response = client.execute(httpPost);
htmlString = convertStreamToString(response.getEntity().getContent());
// now you can parse this string to get data you require.
} catch (Exception letsIgnoreItForNow) {
}
}
private static String convertStreamToString(InputStream is) {
BufferedReader reader = new BufferedReader(new InputStreamReader(is));
StringBuilder sb = new StringBuilder();
String line = null;
try {
while ((line = reader.readLine()) != null) {
sb.append(line);
}
} catch (IOException ignoredOnceMore) {
} finally {
try {
is.close();
} catch (IOException manyIgnoredExceptions) {
}
}
return sb.toString();
}
Also, be warned that I didn't wrap this in an async call, so you will have to do that.
An image from the network can be displayed in Android via efficient image-loading APIs like Picasso or Volley, or simply in an ImageView via an AsyncTask.
Taking the above as the basics, build the logic so that you have an image URL for the captcha; if the user resets or refreshes the captcha, it should load a new image via a new network request. You would have to get REST API access to Indian Railways and check whether an image URI is available there (it may be in base64 format).
If a REST API is not available, you may think of building your own server with this code:
RESTful API to check the PNR Status
pnrapi
Update: you don't need these complex hacks; just implement Drago's answer!

Reading and printing HTML from website hangs up

I've been working on some Java code in which a string is converted into a URL and then used to download and print the corresponding HTML. Unfortunately, when I run the program, it just hangs. Does anyone have any suggestions?
Note: I've used import java.io.* and import java.net.*
public static boolean htmlOutput(String testURL) throws Exception {
URL myPage2 = new URL(testURL); //converting String to URL
System.out.println(myPage2);
BufferedReader webInput2 = new BufferedReader(
new InputStreamReader(myPage2.openStream()));
String individualLine=null;
String completeInput=""; // start empty so the result is not prefixed with "null"
while ((individualLine = webInput2.readLine()) != null) {
//System.out.println(inputLine);
System.out.println(individualLine);
completeInput=completeInput+individualLine;
}//end while
webInput2.close();
return true;
}//end htmlOutput()
[Though this answer helped the OP, it is wrong. HttpURLConnection does follow redirects, so this could not be the OP's problem. I will remove it as soon as the OP removes the accepted mark.]
My guess is that you don't get anything back in the response stream because the page you are trying to connect to sends you a redirect response (e.g. 302).
Try to verify that by reading the response code and iterating over the response headers. There should be a header named Location with a new URL that you need to follow:
HttpURLConnection connection = (HttpURLConnection) url.openConnection();
int code = connection.getResponseCode();
Map<String, List<String>> map = connection.getHeaderFields();
// iterate over the map and find new url
If you are having trouble getting the above snippet to work, take a look at a working example.
You could do yourself a favor and use a third-party HTTP client like Apache HttpClient, which can handle redirects; otherwise you have to do this manually.
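The "iterate over the map and find the new URL" step from the snippet could be sketched as follows. The helper name is made up; it takes the map returned by getHeaderFields() and guards against the null key that HttpURLConnection uses for the status line.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RedirectSketch {

    // Returns the first Location header value, or null if there is none.
    static String findLocation(Map<String, List<String>> headers) {
        for (Map.Entry<String, List<String>> entry : headers.entrySet()) {
            String name = entry.getKey(); // null for the HTTP status line
            if (name != null && name.equalsIgnoreCase("Location") && !entry.getValue().isEmpty()) {
                return entry.getValue().get(0);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Hand-built header map standing in for connection.getHeaderFields().
        Map<String, List<String>> headers = new HashMap<>();
        headers.put(null, Arrays.asList("HTTP/1.1 302 Found"));
        headers.put("Location", Arrays.asList("http://example.com/new"));
        System.out.println(findLocation(headers));
    }
}
```

In real code, connection.getHeaderField("Location") returns the same value directly; the loop only mirrors the iteration the answer describes.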

Scraping a site

I am trying to write an alert system that periodically scrapes a complaints board site to look for any complaints about my product. I am using Jsoup for this. Below is the code fragment that gives me the error.
doc = Jsoup.connect(finalUrl).timeout(10 * 1000).get();
This gives me the error
java.net.SocketException: Unexpected end of file from server
When I copy and paste the same finalUrl string into the browser, it works. I then tried a simple URL connection:
BufferedReader br = null;
try {
URL a = new URL(finalUrl);
URLConnection conn = a.openConnection();
// open the stream and put it into BufferedReader
br = new BufferedReader(new InputStreamReader(
conn.getInputStream()));
doc = Jsoup.parse(br.toString());
} catch (IOException e) {
e.printStackTrace();
}
But as it turned out, the connection itself returns null (br is null). Now the question is: why does the same string, when copied and pasted into the browser, open the site without any error?
The full stack trace is below:
java.net.SocketException: Unexpected end of file from server
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:774)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:771)
at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:633)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1195)
at ComplaintsBoardScraper.main(ComplaintsBoardScraper.java:46)
That one was tricky! :-)
The server blocks all requests which don't have a proper user agent. And that’s why you succeeded with your browser but failed with Java.
Fortunately changing user agent is not a big thing in jsoup:
final String url = "http://www.complaintsboard.com/?search=justanswer.com&complaints=Complaints";
final String userAgent = "Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.3) Gecko/20040924 Epiphany/1.4.4 (Ubuntu)";
Document doc = Jsoup.connect(url) // you get a 'Connection' object here
.userAgent(userAgent) // ! set the user agent
.timeout(10 * 1000) // set timeout
.get(); // execute GET request
I've taken the first user agent I found … I guess you can use any valid one instead too.
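The same idea applies to the plain URLConnection attempt from the question. A minimal sketch, with invented class and method names and with timeouts mirroring the 10-second value used above; whether this particular server accepts the request still depends on its user agent check.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;

public class UserAgentSketch {

    // Prepares a connection that sends the given User-Agent header.
    // No network I/O happens until the caller connects or reads from it.
    static URLConnection openWithUserAgent(String pageUrl, String userAgent) throws IOException {
        URLConnection conn = new URL(pageUrl).openConnection();
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setConnectTimeout(10 * 1000);
        conn.setReadTimeout(10 * 1000);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        String userAgent = "Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.3) Gecko/20040924 Epiphany/1.4.4 (Ubuntu)";
        URLConnection conn = openWithUserAgent("http://www.complaintsboard.com/", userAgent);
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```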
