Reading http file on url - java

I am trying to read an XML file from an HTTP URL, but "Unexpected end of file from server" keeps coming up. The page is password protected, and I would like to know whether I am passing my authentication details correctly in the URL.
I tried with non-protected pages and I can read them properly. Here is my code:
URL url = new URL("http://username:pass#0.0.0.0:0000/test.xml");
URLConnection yc = url.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null)
System.out.println(inputLine);
in.close();
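For reference (this note is an addition, not part of the original post): credentials embedded in a URL are normally separated from the host with @ rather than #, and even then java.net.HttpURLConnection does not use them. A minimal sketch that sends HTTP Basic credentials explicitly, with the host, port, and credentials kept as placeholders:
String userPass = "username:pass"; // placeholder credentials
String basicAuth = "Basic " + Base64.getEncoder().encodeToString(userPass.getBytes(StandardCharsets.UTF_8)); // java.util.Base64, Java 8+
URL url = new URL("http://0.0.0.0:0000/test.xml"); // placeholder host and port
URLConnection yc = url.openConnection();
yc.setRequestProperty("Authorization", basicAuth); // send the credentials as a request header
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream(), StandardCharsets.UTF_8));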

Related

Using anchor tag to link to an external URL in Java

I'm trying to send a Telegram message from an Android app. I want that message to contain a hyperlink, so I used the parse_mode=html param, but I have a problem with the anchor tag. It seems that Java is treating my URL as a local path.
This is the code:
String location = "http://www.google.com";
urlString = String.format("https://api.telegram.org/bot<bot_token>/sendMessage?chat_id=<chat_id>&parse_mode=html&text=<a href=%s>Location</a>", location);
URL url = new URL(urlString);
URLConnection conn = url.openConnection();
StringBuilder sb = new StringBuilder();
InputStream is = new BufferedInputStream(conn.getInputStream());
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String inputLine = "";
while ((inputLine = br.readLine()) != null) {
sb.append(inputLine);
}
And this is the error:
java.io.FileNotFoundException:
https://api.telegram.org/bot<bot_token>/sendMessage?chat_id=<chat_id>&parse_mode=html&text=<a href=http://google.com>Location</a>
How should I write this message so the href link will be treated as an external URL?
The error java.io.FileNotFoundException doesn't mean that the URL is treated as a local path.
It means HTTP 404 Not Found, and it is the server's response to your HTTP request.
So, first, you need to provide a proper <bot_token> and <chat_id>. Second, you should URL-encode the text parameter value (not the whole URL) before instantiating a URL object with it:
String text = String.format("<a href=\"%s\">Location</a>", location);
urlString = "https://api.telegram.org/bot<bot_token>/sendMessage?chat_id=<chat_id>&parse_mode=html&text=" + URLEncoder.encode(text, "UTF-8");
URL url = new URL(urlString);
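As a side note (a sketch added here, not part of the original answer): with HttpURLConnection you can inspect the status code and read the error body directly instead of relying on the FileNotFoundException; the variable names below are illustrative.
HttpURLConnection con = (HttpURLConnection) url.openConnection();
int status = con.getResponseCode(); // e.g. 404
InputStream body = (status >= 400) ? con.getErrorStream() : con.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(body, StandardCharsets.UTF_8));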

Reading HTML from URL in Java vs. Python

I'm trying to read the HTML from a particular URL and store it into a String for parsing. I referred to a previous post to help me out. When I print out what was read, all I get are special characters.
Here is my Java code (with try/catches left out) that reads from a URL and prints:
String path = "https://html1-f.scribdassets.com/913q5pjrsw60h9i4/pages/106-6b1bd15200.jsonp";
URL url = new URL(path);
InputStream in = url.openStream();
BufferedReader bw = new BufferedReader(new InputStreamReader(in, "UTF-8"));
String line;
while ((line = bw.readLine()) != null) {
System.out.println(line);
}
Program output:
�ĘY106-6b1bd15200.jsonpmP�r� �Ƨ�!�%m�vD"��Ra*��w�%����ݳ�sβ��MK�d�9+%�m��l^��މ����:���� ���8B�Vce�.A*��x$FCo���a�b�<����Xy��m�c�>t����� �Z������Gx�o� �J���oKe�0�5�kGYpb�*l����+|�U���-�N3��jBp�R�z5Cۥjh��o�;�~)����~��)~ɮhy��<c,=;tHW���'�c�=~�w���
Expected output:
window.page106_callback(["<div class=\"newpage\" id=\"page106\" style=\"width: 902px; height:1273px\">\n<div class=image_layer style=\"z-index: 1\">\n<div class=ie_fix>\n<img class=\"absimg\" style=\"left:18px;top:27px;width:860px;height:1077px;clip:rect(1px 859px 1076px 1px)\" orig=\"http://html.scribd.com/913q5pjrsw60h9i4/images/106-6b1bd15200.jpg\"/>\n</div>\n</div>\n</div>\n\n"]);
At first, I thought it was an issue with permissions or something that somehow encrypted the stream, but my friend wrote a small Python script to do the same thing and it worked, thereby ruling this out. This is what he wrote:
import requests
link = 'https://html1-f.scribdassets.com/913q5pjrsw60h9i4/pages/106-6b1bd15200.jsonp'
f = requests.get(link)
text = (f.text)
print(text)
So the question is, why is the Java version unable to correctly read and print from this particular URL? Note that I tried testing some other URLs from various websites and those worked fine. Maybe I should learn Python.
The response is gzip-encoded. You can do:
InputStream in = new GZIPInputStream(con.getInputStream());
@Maurice Perry is right. I tried it with the code below:
String url = "https://html1-f.scribdassets.com/913q5pjrsw60h9i4/pages/106-6b1bd15200.jsonp";
URL obj = new URL(url);
HttpURLConnection con = (HttpURLConnection) obj.openConnection();
BufferedReader in = new BufferedReader(
new InputStreamReader(new GZIPInputStream(con.getInputStream())));
String inputLine;
StringBuffer response = new StringBuffer();
while ((inputLine = in.readLine()) != null) {
response.append(inputLine);
}
in.close();
System.out.println(response.toString());
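A slightly more defensive variant (an addition, not part of either answer above) is to check the Content-Encoding response header before wrapping the stream, so the same code also works when the server returns an uncompressed response:
HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
InputStream raw = con.getInputStream();
// Wrap in java.util.zip.GZIPInputStream only when the server actually declares gzip encoding
InputStream in = "gzip".equalsIgnoreCase(con.getContentEncoding()) ? new GZIPInputStream(raw) : raw;
BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));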

Java InputStreamReader from URL does not encode "Umlaute"

I am trying to read the HTML content from a URL. When I print the content to the console, umlauts ("Umlaute") like ä, ö, ü are displayed incorrectly.
URL url = new URL("http://www.lauftreff.de/laeufe/halbmarathon-1-2017.html");
URLConnection conn = url.openConnection();
InputStreamReader input = new InputStreamReader(conn.getInputStream(),StandardCharsets.ISO_8859_1);
BufferedReader bi = new BufferedReader(input);
String inputLine;
while((inputLine = bi.readLine()) != null){
System.out.println(inputLine);
}
The HTML header says the charset is ISO-8859-1. UTF-8 does not work either.
Does anyone have an idea what to do?
On that website the umlauts are encoded as HTML entities, so you need to decode those. The code below should work, but it is untested.
URL url = new URL("http://www.lauftreff.de/laeufe/halbmarathon-1-2017.html");
URLConnection conn = url.openConnection();
InputStreamReader input = new InputStreamReader(conn.getInputStream(),StandardCharsets.ISO_8859_1);
BufferedReader bi = new BufferedReader(input);
String inputLine;
while((inputLine = bi.readLine()) != null){
inputLine = StringEscapeUtils.unescapeHtml4(inputLine);
System.out.println(inputLine);
}
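Note that StringEscapeUtils.unescapeHtml4 is not part of the JDK. Assuming the Commons Text flavour of the library, the import would be:
import org.apache.commons.text.StringEscapeUtils; // or org.apache.commons.lang3.StringEscapeUtils from Commons Lang 3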

URLConnection didn't return complete content of file

My code looks like this:
URL oracle = new URL(calURL);
FileWriter overall = new FileWriter("overall.txt");
HttpURLConnection yc = (HttpURLConnection) oracle.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
String inputLine;
while ((inputLine = in.readLine()) != null) {
overall.append("\n"+inputLine);
}
It seems it is returning only half of the content, not the full content.
Note: calURL is dynamically generated.
calURL takes a long time to load, so I guess my stream was starting to read before the response was ready. I added a timeout before the URL connection and it is getting the full data now.
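A sketch of that idea combined with closing the writer (the timeout values and the try-with-resources are assumptions added here, not code from the original answer); an unclosed FileWriter is never flushed, which can also truncate the output file:
URL oracle = new URL(calURL);
HttpURLConnection yc = (HttpURLConnection) oracle.openConnection();
yc.setConnectTimeout(10000); // wait up to 10 s for the connection to be established
yc.setReadTimeout(30000); // wait up to 30 s for data before giving up
try (BufferedReader in = new BufferedReader(new InputStreamReader(yc.getInputStream()));
FileWriter overall = new FileWriter("overall.txt")) {
String inputLine;
while ((inputLine = in.readLine()) != null) {
overall.append("\n").append(inputLine);
}
} // closing the FileWriter also flushes it, so no buffered output is lost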

How do I send a cookie while trying to grab a sites source?

I am trying to grab a site's source code using this code
private static String getUrlSource(String url) throws IOException {
URL pageUrl = new URL(url); // renamed so the local variable does not clash with the String parameter
URLConnection urlConn = pageUrl.openConnection();
BufferedReader in = new BufferedReader(new InputStreamReader(
urlConn.getInputStream(), "UTF-8"));
String inputLine;
StringBuilder a = new StringBuilder();
while ((inputLine = in.readLine()) != null)
a.append(inputLine);
in.close();
return a.toString();
}
When I grab the site code this way, I get an error about needing to allow cookies. Is there any way to allow cookies in a Java application just so I can grab some source code? I do have the cookie my browser uses to log me in, if that helps.
Thanks
John
That way you would have to deal with raw request data yourself. Go with the Apache HttpClient library instead; it gives you an abstraction and methods to set headers, such as cookies, on the request.
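If you want to stay with plain URLConnection, a minimal sketch is to set the Cookie request header yourself (the cookie name and value below are placeholders you would copy from your browser; this is an addition, not code from the original answer):
URL pageUrl = new URL(url);
URLConnection urlConn = pageUrl.openConnection();
urlConn.setRequestProperty("Cookie", "sessionid=YOUR_SESSION_COOKIE"); // placeholder cookie
BufferedReader in = new BufferedReader(new InputStreamReader(urlConn.getInputStream(), "UTF-8"));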
