Http response code 429 while reading HTML

In Java I want to read and save all the HTML from a URL (Instagram), but I get error 429 (Too Many Requests). I think it is because I am trying to read more lines than the request limit allows.
StringBuilder contentBuilder = new StringBuilder();
try {
URL url = new URL("https://www.instagram.com/username");
URLConnection con = url.openConnection();
InputStream is =con.getInputStream();
BufferedReader in = new BufferedReader(new InputStreamReader(is));
String str;
while ((str = in.readLine()) != null) {
contentBuilder.append(str);
}
in.close();
} catch (IOException e) {
log.warn("Could not connect", e);
}
String html = contentBuilder.toString();
The error is:
Could not connect
java.io.IOException: Server returned HTTP response code: 429 for URL: https://www.instagram.com/username/
The stack trace also shows that the error occurs on this line:
InputStream is =con.getInputStream();
Does anybody have an idea why I get this error and/or what to do to solve it?

The problem might have been caused by the connection not being closed/disconnected.
For the input, try-with-resources is useful too: it closes the reader automatically, even on an exception or return. Also, you constructed an InputStreamReader that uses the default encoding of the machine the application runs on, but you need the charset of the URL's content.
readLine returns the line without its line ending (which in general is very useful), so append one yourself.
StringBuilder contentBuilder = new StringBuilder();
try {
    URL url = new URL("https://www.instagram.com/username");
    // disconnect() is declared on HttpURLConnection, not on URLConnection, hence the cast.
    HttpURLConnection con = (HttpURLConnection) url.openConnection();
    try (BufferedReader in = new BufferedReader(
            new InputStreamReader(con.getInputStream(), "UTF-8"))) {
        String line;
        while ((line = in.readLine()) != null) {
            contentBuilder.append(line).append("\r\n");
        }
    } finally { // try-with-resources has already closed in at this point.
        con.disconnect();
    }
} catch (IOException e) {
    log.warn("Could not connect", e);
}
String html = contentBuilder.toString();
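Since 429 means the server is rate-limiting the client, closing the connection may not be enough on its own. Below is a minimal sketch of waiting and retrying. It assumes the server may send a Retry-After header expressed in seconds; the question does not show whether Instagram does. The class name PoliteFetcher, the method names, the attempt limit and the 5 second default are all invented for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class PoliteFetcher {

    /** Converts a Retry-After value given in seconds to milliseconds, else returns defaultMillis. */
    static long delayMillis(String retryAfter, long defaultMillis) {
        if (retryAfter == null) {
            return defaultMillis;
        }
        try {
            long seconds = Long.parseLong(retryAfter.trim());
            return seconds >= 0 ? seconds * 1000L : defaultMillis;
        } catch (NumberFormatException e) {
            // Retry-After can also be an HTTP date; this sketch does not parse that form.
            return defaultMillis;
        }
    }

    /** Fetches the page, sleeping and retrying while the server answers 429. */
    static String fetch(String address, int maxAttempts) throws IOException, InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            HttpURLConnection con = (HttpURLConnection) new URL(address).openConnection();
            long waitMillis;
            try {
                if (con.getResponseCode() != 429) {
                    return readAll(con); // other error codes surface as IOException here
                }
                waitMillis = delayMillis(con.getHeaderField("Retry-After"), 5000L);
            } finally {
                con.disconnect();
            }
            Thread.sleep(waitMillis); // back off before the next attempt
        }
        throw new IOException("Still rate-limited after " + maxAttempts + " attempts");
    }

    private static String readAll(HttpURLConnection con) throws IOException {
        StringBuilder content = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                content.append(line).append("\r\n");
            }
        }
        return content.toString();
    }
}
```

A call would look like PoliteFetcher.fetch("https://www.instagram.com/username", 3). If the server keeps answering 429 regardless of the delay, it is probably rejecting the client for other reasons, and retrying will not help.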

Related

How to get correct data from HTTP Request

I'm trying to get my user information from the Stack Overflow API using a simple HTTP GET request in Java.
I have used this code before to fetch other data over HTTP GET without problems:
URL obj;
StringBuffer response = new StringBuffer();
String url = "http://api.stackexchange.com/2.2/users?inname=HCarrasko&site=stackoverflow";
try {
obj = new URL(url);
HttpURLConnection con = (HttpURLConnection) obj.openConnection();
con.setRequestMethod("GET");
int responseCode = con.getResponseCode();
System.out.println("\nSending 'GET' request to URL : " + url);
System.out.println("Response Code : " + responseCode);
BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
System.out.println(response.toString());
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
But in this case I get only strange symbols when I print the response variable, like this:
�mRM��0�+�N!���FZq�\�pD�z�:V���JX���M��̛yO^���뾽�g�5J&� �9�YW�%c`do���Y'��nKC38<A�&It�3��6a�,�,]���`/{�D����>6�Ɠ��{��7tF ��E��/����K���#_&�yI�a�v��uw}/�g�5����TkBTķ���U݊c���Q�y$���$�=ۈ��ñ���8f�<*�Amw�W�ـŻ��X$�>'*QN�?�<v�ݠ FH*��Ҏ5����ؔA�z��R��vK���"���#�1��ƭ5��0��R���z�ϗ/�������^?r��&�f��-�OO7���������Gy�B���Rxu�#:0�xͺ}�\�����
Thanks in advance.

The content is likely GZIP encoded/compressed. The following is a general snippet that I use in all of my Java-based client applications that utilize HTTP, which is intended to deal with this exact problem:
// Read in the response.
// Set up an initial input stream:
InputStream inputStream = fetchAddr.getInputStream(); // fetchAddr is the HttpURLConnection
// Check whether inputStream is GZIP-compressed
if ("gzip".equalsIgnoreCase(fetchAddr.getContentEncoding())) {
    // Format is GZIP: replace inputStream with a GZIP-wrapped stream
    inputStream = new GZIPInputStream(inputStream);
} else if ("deflate".equalsIgnoreCase(fetchAddr.getContentEncoding())) {
    inputStream = new InflaterInputStream(inputStream, new Inflater(true));
} // Else, we assume it to just be plain text
BufferedReader sr = new BufferedReader(new InputStreamReader(inputStream));
String inputLine;
StringBuilder response = new StringBuilder();
// ... and from here forward just read the response...
This relies on the following imports: java.util.zip.GZIPInputStream; java.util.zip.Inflater; and java.util.zip.InflaterInputStream.
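For completeness, here is one way the snippet above could be packaged and finished, as a sketch rather than the exact code I use. The class name ResponseDecoder and its method names are made up, and reading the body as UTF-8 is an assumption (the real charset should come from the Content-Type header).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

final class ResponseDecoder {

    /** Wraps the raw stream according to the Content-Encoding value, which may be null. */
    static InputStream unwrap(InputStream raw, String contentEncoding) throws IOException {
        if ("gzip".equalsIgnoreCase(contentEncoding)) {
            return new GZIPInputStream(raw);
        }
        if ("deflate".equalsIgnoreCase(contentEncoding)) {
            // Same choice as the snippet above: treat the body as raw deflate data.
            return new InflaterInputStream(raw, new Inflater(true));
        }
        return raw; // no known encoding: assume plain text
    }

    /** Reads the whole (possibly compressed) body as text, assuming UTF-8. */
    static String readBody(InputStream raw, String contentEncoding) throws IOException {
        StringBuilder response = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(unwrap(raw, contentEncoding), StandardCharsets.UTF_8))) {
            String inputLine;
            while ((inputLine = reader.readLine()) != null) {
                response.append(inputLine).append('\n');
            }
        }
        return response.toString();
    }
}
```

With the connection from the question this would be called as ResponseDecoder.readBody(con.getInputStream(), con.getContentEncoding()).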

Download part of a URL's content to save bandwidth

I have a huge text file online. I know how to fetch the data at the URL; an example would be something like this:
URL url = new URL(address);
urlConnection = (HttpURLConnection) url.openConnection();
int responseCode = urlConnection.getResponseCode();
if(responseCode == HttpURLConnection.HTTP_OK) {
InputStream stream = urlConnection.getInputStream();
BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(stream));
StringBuilder builder = new StringBuilder();
try {
for (String line; (line = bufferedReader.readLine()) != null;)
builder.append(line);
response = builder.toString();
} catch (IOException e) {
e.printStackTrace();
}
}
This file gets updated every X minutes and a new line is added at the bottom, so the real information is only in the last line or lines. I was wondering whether it would be possible to download only that part, to save bandwidth.
Edit: I am looking for a client-side solution, without modifying the server.
Thank you very much.
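One client-side possibility is an HTTP Range request that asks for only the last N bytes. This is a sketch under assumptions, not a confirmed solution for this server: it only saves bandwidth if the server honours byte ranges (it then answers 206 Partial Content), otherwise the server sends the whole file with 200. The class name TailFetcher, the method names and the UTF-8 charset are invented for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class TailFetcher {

    /** Returns the last non-blank line of the text, or "" if there is none. */
    static String lastLine(String text) {
        String[] lines = text.split("\\r?\\n");
        for (int i = lines.length - 1; i >= 0; i--) {
            if (!lines[i].trim().isEmpty()) {
                return lines[i];
            }
        }
        return "";
    }

    /** Requests only the final tailBytes bytes of the resource. */
    static String fetchTail(String address, int tailBytes) throws IOException {
        HttpURLConnection con = (HttpURLConnection) new URL(address).openConnection();
        // Suffix range: "the last N bytes", whatever the total size is.
        con.setRequestProperty("Range", "bytes=-" + tailBytes);
        try {
            int code = con.getResponseCode();
            // 206: the range was honoured. 200: the server ignored it and sent everything.
            if (code != HttpURLConnection.HTTP_PARTIAL && code != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected response code " + code);
            }
            try (InputStream in = con.getInputStream()) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
            }
        } finally {
            con.disconnect();
        }
    }
}
```

Usage would be TailFetcher.lastLine(TailFetcher.fetchTail(address, 2048)). The returned chunk usually starts in the middle of a line, so tailBytes should be comfortably larger than the longest expected line, and only the final complete lines should be trusted.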

Sugarcrm - invalid session id via REST - post too large?

Hi everyone,
I'm seeing strange behaviour with SugarCRM.
Here is the code that I'm using to set a new entry via REST:
public SugarApi(String sugarUrl){
REST_ENDPOINT = sugarUrl + "/service/v4/rest.php";
json = new GsonBuilder().create();
codec = new URLCodec();
}
SetEntryRequest req = new SetEntryRequest(sessionId, nameValueListSetEntry, myKind.getModuleName());
String response = null;
try {
response = postToSugar(REST_ENDPOINT+"?method=set_entry&response_type=JSON&input_type=JSON&rest_data="+codec.encode(json.toJson(req)));
} catch (RemoteException e) {
System.out.println("Set entry failed. Message: " + e.getMessage());
e.printStackTrace();
} catch (EncoderException e) {
e.printStackTrace();
} catch (Exception e) {
e.printStackTrace();
}
where postToSugar is:
public String postToSugar(String urlStr) throws Exception {
URL url = new URL(urlStr);
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
conn.setDoOutput(true);
conn.setDoInput(true);
conn.setUseCaches(false);
conn.setAllowUserInteraction(false);
conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
if (conn.getResponseCode() != 200) {
throw new IOException(conn.getResponseMessage());
}
// Buffer the result into a string
BufferedReader rd = new BufferedReader(new InputStreamReader(conn.getInputStream()));
StringBuilder sb = new StringBuilder();
String line;
while ((line = rd.readLine()) != null) {
sb.append(line);
}
rd.close();
conn.disconnect();
if(System.getenv("sugardebug") != null){
System.out.println(sb.toString());
}
return sb.toString();
}
This code works fine when the post is small.
The largest post that still works is the following:
{"id":"8c8801c5-ce3b-093c-ee77-514985c19fe1","entry_list":{"account_id":{"name":"account_id","value":"9b37913b-994b-9bc9-4fbf-500e771d845b"},"status":{"name":"status","value":"New"},
"description":{"name":"description","value":"Ceci est un test \/ TICKET A SUPPRIMER"},"priority":{"name":"priority","value":"P1"},
"name":{"name":"name","value":"test longueur post --------------"},"caseorigin_c":{"name":"caseorigin_c","value":"OnLineForm"},"case_chechindate_c":{"name":"case_chechindate_c","value":"2013-01-12"},"type":{"name":"type","value":"ErrorOnCancel"}}}
But if the post is longer, the server returns:
{"name":"Invalid Session ID","number":11,"description":"The session ID is invalid"}
Any help would be appreciated

I encountered this problem with a request where the description field contained a newline character. Even when I URL-encoded the newline, it still gave me the error. Presumably Apache on Sugar's server was breaking the request at the newline, which meant the session ID wasn't found.
My solution was to replace all occurrences of %0A and %0D with %5Cn.
%0A (line feed) and %0D (carriage return) are the two URL-encoded newline characters, and %5Cn decodes to the two characters \n, which Sugar treats as a newline.
I don't have an extensive list of invalid characters for the REST API.
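A minimal sketch of that replacement, applied to the already URL-encoded rest_data string. The class and method names are made up, and collapsing an encoded CRLF pair (%0D%0A) into a single %5Cn is my own choice here rather than something Sugar documents.

```java
final class SugarNewlines {

    /**
     * Replaces URL-encoded line breaks with an encoded backslash followed by "n".
     * The match is case-insensitive because encoders may emit %0a or %0A.
     */
    static String escapeEncodedNewlines(String urlEncoded) {
        // The CRLF alternative comes first so a pair is replaced once, not twice.
        return urlEncoded.replaceAll("(?i)%0D%0A|%0A|%0D", "%5Cn");
    }
}
```

In the question's code this would wrap the encoded payload, for example SugarNewlines.escapeEncodedNewlines(codec.encode(json.toJson(req))). A literal "%0A" inside the data is not affected, because encoding turns its percent sign into %25 first.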

How to read compressed HTML page with Content-Encoding : gzip

I request a web page that sends a Content-Encoding: gzip header, but I'm stuck on how to read it.
My code:
try {
URLConnection connection = new URL("http://jquery.org").openConnection();
String html = "";
BufferedReader in = null;
connection.setReadTimeout(10000);
in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null){
html+=inputLine+"\n";
}
in.close();
System.out.println(html);
System.exit(0);
} catch (IOException ex) {
Logger.getLogger(Crawler.class.getName()).log(Level.SEVERE, null, ex);
}
The output looks very messy (I was unable to paste it here; it is just a jumble of symbols).
I believe this is compressed content. How do I parse it?
Note:
If I change jquery.org to jquery.com (which doesn't send that header), my code works well.

Actually, this is pb2q's answer, but I post the full code for future readers
try {
URLConnection connection = new URL("http://jquery.org").openConnection();
String html = "";
BufferedReader in = null;
connection.setReadTimeout(10000);
//The changed part
if (connection.getHeaderField("Content-Encoding")!=null && connection.getHeaderField("Content-Encoding").equals("gzip")){
in = new BufferedReader(new InputStreamReader(new GZIPInputStream(connection.getInputStream())));
} else {
in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
}
//End
String inputLine;
while ((inputLine = in.readLine()) != null){
html+=inputLine+"\n";
}
in.close();
System.out.println(html);
System.exit(0);
} catch (IOException ex) {
Logger.getLogger(Crawler.class.getName()).log(Level.SEVERE, null, ex);
}

There is a class for this: GZIPInputStream. It is an InputStream and so is very transparent to use.
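To illustrate what "transparent" means here, a small in-memory sketch: the method that reads lines takes a plain InputStream and never learns that the data was compressed. The class name GzipDemo and its helper methods are invented for this example.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

final class GzipDemo {

    /** Counts the lines of any InputStream; it does not care how the bytes were produced. */
    static int countLines(InputStream in) throws IOException {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }

    /** Compresses text in memory, standing in for a gzip-encoded HTTP body. */
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }
}
```

Calling GzipDemo.countLines(new GZIPInputStream(new ByteArrayInputStream(GzipDemo.gzip("a\nb\nc")))) reads three lines, exactly as it would from an uncompressed stream.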

There are two cases with the Content-Encoding: gzip header:
If the data was already compressed by the application, HTTP compression compresses it again, so it ends up double-compressed.
If the data was not compressed by the application, the server compresses it (mostly with gzip) and the client is expected to decompress it. Most web browsers do this by default when they find Content-Encoding: gzip in the response; Java's standard URLConnection does not, so the code has to wrap the stream itself.

Java UTF-8 encoding not set to URLConnection

I'm trying to retrieve data from http://api.freebase.com/api/trans/raw/m/0h47
As you can see, the text contains symbols like this: /ælˈdʒɪəriə/.
When I try to get the source of the page, I get text with codes like &#250; instead.
So far I've tried with the following code:
urlConnection.setRequestProperty("Accept-Charset", "UTF-8");
urlConnection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded;charset=utf-8");
What am I doing wrong?
My entire code:
URL url = null;
URLConnection urlConn = null;
DataInputStream input = null;
try {
url = new URL("http://api.freebase.com/api/trans/raw/m/0h47");
} catch (MalformedURLException e) {e.printStackTrace();}
try {
urlConn = url.openConnection();
} catch (IOException e) { e.printStackTrace(); }
urlConn.setRequestProperty("Accept-Charset", "UTF-8");
urlConn.setRequestProperty("Content-Type", "text/plain; charset=utf-8");
urlConn.setDoInput(true);
urlConn.setUseCaches(false);
StringBuffer strBseznam = new StringBuffer();
if (strBseznam.length() > 0)
strBseznam.deleteCharAt(strBseznam.length() - 1);
try {
input = new DataInputStream(urlConn.getInputStream());
} catch (IOException e) { e.printStackTrace(); }
String str = "";
StringBuffer strB = new StringBuffer();
strB.setLength(0);
try {
while (null != ((str = input.readLine())))
{
strB.append(str);
}
input.close();
} catch (IOException e) { e.printStackTrace(); }

The HTML page is in UTF-8 and could contain Arabic characters and such. But characters above code point 127 are still encoded as numeric entities like &#250;. A request header such as Accept-Charset will not help, and reading the response as UTF-8 is entirely right.
You have to decode the entities yourself. Something like:
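// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String decodeNumericEntities(String s) {
    // StringBuffer, because Matcher.appendReplacement only accepts it before Java 9.
    StringBuffer sb = new StringBuffer();
    Matcher m = Pattern.compile("&#(\\d+);").matcher(s);
    while (m.find()) {
        int uc = Integer.parseInt(m.group(1));
        m.appendReplacement(sb, ""); // copy the text before the entity, drop the entity
        sb.appendCodePoint(uc);      // then append the decoded character
    }
    m.appendTail(sb);
    return sb.toString();
}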
By the way, those entities could stem from processed HTML forms, that is, from the editing side of the web app.
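If the page also uses hexadecimal entities such as &#xFA;, the same idea extends as follows. This is a sketch with an invented class name (EntityDecoder); whether this particular page contains hex entities is an assumption, and named entities like &amp; are deliberately not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityDecoder {

    // Group 1 captures hex digits (&#xFA;), group 2 decimal digits (&#250;).
    // The length limits keep Integer.parseInt from overflowing.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]{1,6})|([0-9]{1,7}));");

    /** Replaces decimal and hexadecimal numeric entities with the characters they denote. */
    static String decode(String s) {
        StringBuffer sb = new StringBuffer();
        Matcher m = NUMERIC_ENTITY.matcher(s);
        while (m.find()) {
            int codePoint = (m.group(1) != null)
                    ? Integer.parseInt(m.group(1), 16)
                    : Integer.parseInt(m.group(2));
            if (!Character.isValidCodePoint(codePoint)) {
                continue; // not a real character: the entity text is left untouched
            }
            // Appending "" and then the code point avoids having to escape '$' or '\'.
            m.appendReplacement(sb, "");
            sb.appendCodePoint(codePoint);
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

For example, EntityDecoder.decode("&#250; and &#xFA;") turns both entities into the same character, ú.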
After the code was added to the question:
I have replaced DataInputStream with a (Buffered)Reader for text. InputStreams read binary data (bytes); Readers read text (Strings). An InputStreamReader takes an InputStream and an encoding as parameters and is itself a Reader.
try {
BufferedReader input = new BufferedReader(
new InputStreamReader(urlConn.getInputStream(), "UTF-8"));
StringBuilder strB = new StringBuilder();
String str;
while (null != (str = input.readLine())) {
strB.append(str).append("\r\n");
}
input.close();
} catch (IOException e) {
e.printStackTrace();
}
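Why the encoding argument matters can be seen without any network access. The sketch below decodes the same bytes twice through an InputStreamReader, once with the right charset and once with a wrong one; the class name CharsetDemo and the method name are made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;

final class CharsetDemo {

    /** Decodes the bytes with the given charset by reading them through an InputStreamReader. */
    static String read(byte[] bytes, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        }
        return text.toString();
    }
}
```

The UTF-8 bytes of "æ" are 0xC3 0xA6. Read with StandardCharsets.UTF_8 they give back "æ"; read with StandardCharsets.ISO_8859_1 they become the two characters "Ã¦", which is the kind of garbage you see when the platform default charset does not match the content.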

Try adding also the user agent to your URLConnection:
urlConnection.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.155 Safari/537.36");
This solved my decoding problem like a charm.

Well, I think the problem is in how you read from the stream. You should either call the readUTF method on the DataInputStream instead of readLine (though readUTF only understands the length-prefixed modified UTF-8 format that DataOutputStream.writeUTF produces) or, what I would do, create an InputStreamReader with the encoding set and then read from a BufferedReader line by line (this would go inside your existing try/catch):
Charset charset = Charset.forName("UTF8");
InputStreamReader stream = new InputStreamReader(urlConn.getInputStream(), charset);
BufferedReader reader = new BufferedReader(stream);
StringBuffer responseBuffer = new StringBuffer();
String read = "";
while ((read = reader.readLine()) != null) {
responseBuffer.append(read);
}
