I'm trying to open my university's website to read its dining menu. I've written a version that reads the menu when given the direct link to the menu page, but I want to step back a level and find the menu starting from the main website rather than from the direct link (in case that link ever changes).
Here is the URL I am opening:
https://nccudining.sodexomyway.com/dining-choices/index.html
Whenever I open the link to the website, this is the output that I get:
302
<html><head><title>Object moved</title></head><body>
<h2>Object moved to here.</h2>
</body></html>
The URL it outputs is the mobile version of the website, but when I try to use that URL, it outputs nothing.
This is my code:
import java.io.*;
import java.net.*;
public class test
{
public static void main( String[] args )
{
URL url = null;
try
{
url = new URL("https://nccudining.sodexomyway.com/dining-choices/index.html");
HttpURLConnection test = (HttpURLConnection) url.openConnection();
test.setInstanceFollowRedirects(true);
test.connect();
System.out.println(test.getResponseCode());
} catch ( MalformedURLException e1 )
{
System.out.println("URL cannot be opened.");
return;
}
BufferedReader in = null;
try
{
in = new BufferedReader(new InputStreamReader(url.openStream()));
} catch ( IOException e )
{
System.out.println("Error");
}
String inputLine;
try
{
while ((inputLine = in.readLine()) != null)
{
System.out.println(inputLine);
}
} catch ( IOException e )
{
System.out.println("Error");
}
}
}
I apologize for all the try/catch blocks. I don't want to just throw an IOException from main from the get-go because I've heard that's bad practice. Anyway, this code opens the URL, sets up a connection so I can make sure the URL actually exists, and then tries to read its HTML. It works on every other site I've tried it on, including Google.
My question is: why won't my code read the correct source code of the website? Is something wrong with my code (I figured adding the HttpURLConnection and allowing redirects would work), or is it just the website? And is there anything I can do to get around that, aside from opening the weekly menu's page directly?
Solution found! Thanks to #ShayHaned for the fixes. I added the following lines to the HttpURLConnection so I got a 200 response code rather than a 302:
test = (HttpURLConnection) url.openConnection();
test.setRequestMethod("GET");
test.setRequestProperty("User-Agent", "Mozilla/5.0");
test.setInstanceFollowRedirects(true);
Then I changed the InputStream from opening the stream from the URL to getting the input stream from the HttpURLConnection, as shown:
BufferedReader in = new BufferedReader(new InputStreamReader(test.getInputStream()));
That gave me the HTML I was looking for.
Your request is missing the headers a browser would normally send, and this server appears to answer such requests with a redirect instead of the page. Adding a few request headers, the User-Agent in particular, should get you the desired response:
HttpURLConnection test = (HttpURLConnection) url.openConnection();
test.addRequestProperty( "User-Agent", "Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko" );
test.addRequestProperty( "Accept" , "text/html,application/xhtml+xml,application/xml,image/png, image/svg+xml,;q=0.9,*/*;q=0.8");
test.addRequestProperty( "Accept-Charset" , "ISO-8859-1,utf-8;q=0.7,*;q=0.3");
test.addRequestProperty( "Accept-Language" , "en-US,en;q=0.8" );
test.addRequestProperty( "Connection" , "close" );
test.setRequestMethod("GET");
test.setInstanceFollowRedirects(true);
test.connect();
// Don't read via url.openStream(): it opens a second, separate connection without the headers set above
//in = new BufferedReader(new InputStreamReader(url.openStream()));
in = new BufferedReader( new InputStreamReader( test.getInputStream() ) );
String htmlContent = "";
for( String inputLine = ""; ( inputLine = in.readLine() ) != null; )
htmlContent += inputLine;
System.out.println( htmlContent );
Instead of in = new BufferedReader(new InputStreamReader(url.openStream())); use in = new BufferedReader(new InputStreamReader(test.getInputStream())); because url.openStream() opens a brand-new connection that carries none of the headers you set on the HttpURLConnection object.
If you want to understand the HTTP header part, see https://en.wikipedia.org/wiki/List_of_HTTP_header_fields for a detailed description of HTTP headers and their usage.
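A side note on the 302 itself: HttpURLConnection follows redirects automatically only when the target uses the same protocol, so a redirect from an https page to an http mobile site would still surface as a 302 even with setInstanceFollowRedirects(true). Whether that is what happened here is a guess. Below is a minimal sketch for checking where a redirect points. The names RedirectProbe, redirectTarget and resolveLocation and the example.com URL are made up for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class RedirectProbe {

    /** Resolves a Location header value (absolute or relative) against the requested URL. */
    static String resolveLocation(String requestedUrl, String location) {
        return URI.create(requestedUrl).resolve(location).toString();
    }

    /**
     * Sends one GET without following redirects and returns the redirect target,
     * or null if the response was not a 3xx carrying a Location header.
     * userAgent may be null to send Java's default User-Agent.
     */
    static String redirectTarget(String requestedUrl, String userAgent) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(requestedUrl).toURL().openConnection();
        conn.setInstanceFollowRedirects(false); // we want to see the 302 ourselves
        if (userAgent != null) {
            conn.setRequestProperty("User-Agent", userAgent);
        }
        try {
            int status = conn.getResponseCode(); // this call actually sends the request
            String location = conn.getHeaderField("Location");
            if (status / 100 == 3 && location != null) {
                return resolveLocation(requestedUrl, location);
            }
            return null;
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        // Offline demonstration of the resolution step only (hypothetical URL).
        System.out.println(
                resolveLocation("https://example.com/dining-choices/index.html", "/mobile/"));
    }
}
```

Calling redirectTarget(url, null) and then redirectTarget(url, "Mozilla/5.0") against the real site should show whether the User-Agent changes the server's answer.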
I have researched extensively and cannot find a solution. I have been using the solutions provided to other users and it does not seem to work for me.
My java code:
public class Post {
public static void main(String[] args) {
String name = "Bobby";
String address = "123 Main St., Queens, NY";
String phone = "4445556666";
String data = "";
try {
// POST as urlencoded is basically key-value pairs
// create key=value&key=value.... pairs
data += "name=" + URLEncoder.encode(name, "UTF-8");
data += "&address=" +
URLEncoder.encode(address, "UTF-8");
data += "&phone=" +
URLEncoder.encode(phone, "UTF-8");
// convert string to byte array, as it should be sent
byte[] dataBytes = data.toString().getBytes("UTF-8");
// open a connection to the site
URL url = new URL("http://xx.xx.xx.xxx/yyy.php");
HttpURLConnection conn =
(HttpURLConnection) url.openConnection();
// tell the server this is POST & the format of the data.
conn.setDoOutput(true);
conn.setRequestProperty("Content-Type",
"application/x-www-form-urlencoded");
conn.setRequestMethod("POST");
conn.setFixedLengthStreamingMode(dataBytes.length);
conn.getOutputStream().write(dataBytes);
conn.getInputStream();
// Print out the echo statements from the php script
BufferedReader in = new BufferedReader(
new InputStreamReader(url.openStream()));
String line;
while((line = in.readLine()) != null)
System.out.println(line);
in.close();
} catch(Exception e) {
e.printStackTrace();
}
}
}
and the php
<?php
echo $_POST["name"];
?>
The output I receive is an empty line. I tested to see if it was a php/server side issue by making an html form that sends data over to a similar script and prints the data on the screen and that worked. But, for the life of me, I cannot get this to work with a remote client.
I am using Ubuntu server and Apache.
Thank you in advance.
The problem is in what you read as output. You are making two separate requests:
1) conn.getInputStream(); sends the POST request with the desired body
2) BufferedReader in = new BufferedReader(
new InputStreamReader(url.openStream())); sends a second, empty GET request (!!)
Read the response from the same connection you posted to. Change it to:
// ...
conn.getOutputStream().write(dataBytes);
BufferedReader in = new BufferedReader(
new InputStreamReader(conn.getInputStream()));
and check the result.
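Putting the fix together, here is a minimal sketch that builds the url-encoded body and reads the reply from the same connection that sent it. The class name FormPost and its method names are made up for illustration, and the endpoint is whatever your PHP script's URL is.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class FormPost {

    /** Builds key=value&key=value... with both keys and values URL-encoded as UTF-8. */
    static String encodeForm(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(enc(field.getKey())).append('=').append(enc(field.getValue()));
        }
        return body.toString();
    }

    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** POSTs the fields and returns the response body, using ONE connection for both. */
    static String post(String endpoint, Map<String, String> fields) throws IOException {
        byte[] payload = encodeForm(fields).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(endpoint).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        conn.setFixedLengthStreamingMode(payload.length);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(payload);
        }
        StringBuilder response = new StringBuilder();
        // conn.getInputStream(), not url.openStream(): the latter would issue a new GET.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                response.append(line).append('\n');
            }
        } finally {
            conn.disconnect();
        }
        return response.toString();
    }

    public static void main(String[] args) {
        // Offline demonstration of the encoding step with the values from the question.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("name", "Bobby");
        fields.put("address", "123 Main St., Queens, NY");
        fields.put("phone", "4445556666");
        System.out.println(encodeForm(fields));
    }
}
```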
I want to get the HTML code of the following Web Page (http://www.studenti.ict.uniba.it/esse3/ListaAppelliOfferta.do) after:
selecting "Dipartimento di Informatica" among Facoltà
selecting "Informatica" (or one of the others available)
clicking "Avvia Ricerca"
I am not very experienced in this area, but I noticed that the URL of the page stays the same after each selection.
Can anyone describe, possibly in detail, how I can do that? Unfortunately I am not an expert in web programming.
Many thanks
After some tests: the page refreshes itself with a POST request that carries these parameters:
fac_id:1012 --
cds_id:197 --
ad_id: -- Attività didattica
docente_id: -- Id of the docent selected
data:06/03/2014 -- Date
Note that you did not say which values you want for Attività didattica, Docente and Data esame.
Just send an HTTP request using HttpURLConnection with these POST arguments, then use an XML parser to read the tplmessage table from the output.
Try this tutorial for HTTP request: click.
Try to read this to understand how to parse response: click
An example using the code of the tutorial:
HttpURLConnection connection = null;
try
{
URL url = new URL("http://www.studenti.ict.uniba.it/esse3/ListaAppelliOfferta.do");
connection = (HttpURLConnection) url.openConnection(); // open the connection with the url
String params =
"fac_id=1012&cds_id=197"; // You need to add ad_id, docente_id and data
connection.setRequestMethod("POST"); // the page refreshes itself via POST
connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded"); // format of the request body
connection.setRequestProperty("Content-Length", Integer.toString(params.getBytes().length)); // length of params in bytes
connection.setRequestProperty("Content-Language", "it-IT"); // language italian
connection.setUseCaches (false);
connection.setDoInput (true);
connection.setDoOutput (true);
DataOutputStream wr = new DataOutputStream(
connection.getOutputStream ());
wr.writeBytes (params); // pass params
wr.flush (); // send request
wr.close ();
//Get Response
InputStream is = connection.getInputStream();
BufferedReader rd = new BufferedReader(new InputStreamReader(is));
String line;
StringBuilder response = new StringBuilder();
while((line = rd.readLine()) != null) {
response.append(line);
response.append('\r');
}
rd.close();
}
catch (MalformedURLException e)
{
e.printStackTrace();
} catch (IOException e)
{
e.printStackTrace();
}
finally
{
// close connection if created
if (connection != null)
connection.disconnect();
}
The response variable will then hold the HTML of the page.
You can use Chrome's developer tools to see the arguments the browser sends with the request.
I am trying to use a signed java applet to post to a url like:
http://some.domain.com/something/script.asp?param=5041414F9015496EA699F3D2DBAB4AC2|178411|163843|557|1|1|164||attempt|1630315
But when java makes the connection, the java console shows:
network: Connecting http://some.domain.com/something/script.asp?param=5041414F9015496EA699F3D2DBAB4AC2%7C178411%7C163843%7C557%7C1%7C1%7C164%7C%7Cattempt%7C1630315
I do not want java to urlencode the pipes in the query from | to %7c. It seems the service I'm connecting to doesn't urldecode the param, and I can't change the server side code. Is there a way in java to make the post without escaping the query?
The java I'm using is below:
try {
URL url = new URL(myURL);
URLConnection connection = url.openConnection();
connection.setDoOutput(true);
OutputStreamWriter out = new OutputStreamWriter(
connection.getOutputStream());
out.write(toSend);
out.close();
BufferedReader in = new BufferedReader(
new InputStreamReader(
connection.getInputStream()));
String decodedString = "";
while ((decodedString = in.readLine()) != null) {
totalResponse = totalResponse + decodedString;
}
in.close();
} catch (Exception ex) {
}
Thank you for any help!
The URL class does not do any encoding; testing this on my dev server confirmed that suspicion. Your code must be encoding the '|' character somewhere before the snippet you included in your question.
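A quick way to check this locally, as a minimal sketch: parse a URL containing pipes and print the query string that the URL object holds, next to what URLEncoder would produce for the same value. The class name PipeInUrl, its method names and the shortened parameter value are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLEncoder;

final class PipeInUrl {

    /** Returns the query string exactly as java.net.URL stores it after parsing. */
    @SuppressWarnings("deprecation") // the URL(String) constructor is deprecated in recent JDKs
    static String queryOf(String spec) {
        try {
            return new URL(spec).getQuery();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** What explicit form-encoding does to the same value, for comparison. */
    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String spec = "http://some.domain.com/something/script.asp?param=5041|178411||attempt";
        System.out.println("URL keeps:        " + queryOf(spec));
        System.out.println("URLEncoder gives: param=" + formEncode("5041|178411||attempt"));
    }
}
```

If the first line still shows the pipes, any %7C in the outgoing request was introduced somewhere other than the URL constructor, for example by an earlier URLEncoder call on the whole string.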
As the title says ... I have tried to use the following code to execute a PHP script when user clicks a button in my Java Swing application :
URL url = new URL( "http://www.mywebsite.com/my_script.php" );
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.connect();
But nothing happens ... Is there something wrong ?
I think you're missing the next step which is something like:
InputStream is = conn.getInputStream();
HttpURLConnection basically only opens the socket on connect(). To actually send the request you need to call something like getInputStream() or, better still, getResponseCode():
URL url = new URL( "http://google.com/" );
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
if( conn.getResponseCode() == HttpURLConnection.HTTP_OK ){
InputStream is = conn.getInputStream();
// do something with the data here
}else{
InputStream err = conn.getErrorStream();
// err may have useful information.. but could be null see javadocs for more information
}
final URL url = new URL("http://domain.com/script.php");
final InputStream inputStream = url.openStream(); // sends the GET request and opens the response body
final BufferedReader reader = new BufferedReader(new InputStreamReader(inputStream));
String line, response = "";
while ((line = reader.readLine()) != null)
{
response = response + "\r" + line;
}
reader.close();
"response" will hold the text of the page. You may want to play around with the carriage return (depending on the OS, try \n, \r, or a combination of both).
Hope this helps.
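If the line-ending guesswork is a nuisance, one option is to let BufferedReader split the lines (it accepts \n, \r and \r\n) and join them with a single separator of your choice. A minimal sketch; the class name ResponseText and the method readAll are made up for illustration:

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.stream.Collectors;

final class ResponseText {

    /**
     * Reads everything from the source and joins the lines with "\n",
     * whatever line terminator the server used. The caller closes the source.
     */
    static String readAll(Reader source) {
        BufferedReader reader = new BufferedReader(source);
        return reader.lines().collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        // Offline demonstration with mixed line endings.
        System.out.println(readAll(new StringReader("first\r\nsecond\rthird\n")));
    }
}
```

With a live connection you would pass new InputStreamReader(conn.getInputStream()) as the source. Using a StringBuilder or a collector also avoids the repeated string concatenation in the loop above.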
I am trying to access an ASPX website where subsequent pages are returned based on
POST data. Unfortunately, all my attempts to get the following pages fail.
Hopefully someone here has an idea where to find the error!
In step one I read the session ID from the cookie, as well as the value of the
viewstate variable in the returned HTML page. Step two is meant to send both
back to the server to get the desired page.
Sniffing the data in the web browser gives:
Host=www.geocaching.com
User-Agent=Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100618 Iceweasel/3.5.9 (like Firefox/3.5.9)
Accept=text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language=en-us,en;q=0.5
Accept-Encoding=gzip,deflate
Accept-Charset=ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive=300
Connection=keep-alive
Referer=http://www.geocaching.com/seek/nearest.aspx?state_id=149
Cookie=Send2GPS=garmin; BMItemsPerPage=200; maprefreshlock=true; ASP.NET_SessionId=c4jgygfvu1e4ft55dqjapj45
Content-Type=application/x-www-form-urlencoded
Content-Length=4099
POSTDATA=__EVENTTARGET=ctl00%24ContentBody%24pgrBottom%24lbGoToPage_3&__EVENTARGUMENT=&__VIEWSTATE=%2FwEPD[...]2Xg%3D%3D&language=on&logcount=on&gpx=on
Currently, my script looks like this
import java.net.*;
import java.io.*;
import java.util.*;
import java.security.*;
import java.net.*;
public class test1 {
public static void main(String args[]) {
// String loginWebsite="http://geocaching.com/login/default.aspx";
final String loginWebsite = "http://www.geocaching.com/seek/nearest.aspx?state_id=159";
final String POST_CONTENT_TYPE = "application/x-www-form-urlencoded";
// step 1: get session ID from cookie
String sessionId = "";
String viewstate = "";
try {
URL url = new URL(loginWebsite);
String key = "";
URLConnection urlConnection = url.openConnection();
if (urlConnection != null) {
for (int i = 1; (key = urlConnection.getHeaderFieldKey(i)) != null; i++) {
// get ASP.NET_SessionId from cookie
// System.out.println(urlConnection.getHeaderField(key));
if (key.equalsIgnoreCase("set-cookie")) {
sessionId = urlConnection.getHeaderField(key);
sessionId = sessionId.substring(0, sessionId.indexOf(";"));
}
}
BufferedReader in = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));
// get the viewstate parameter
String aLine;
while ((aLine = in.readLine()) != null) {
// System.out.println(aLine);
if (aLine.lastIndexOf("id=\"__VIEWSTATE\"") > 0) {
viewstate = aLine.substring(aLine.lastIndexOf("value=\"") + 7, aLine.lastIndexOf("\" "));
}
}
}
} catch (IOException e) {
e.printStackTrace();
}
System.out.println(sessionId);
System.out.println("\n");
System.out.println(viewstate);
System.out.println("\n");
// String goToPage="3";
// step2: post data to site
StringBuilder htmlResult = new StringBuilder();
try {
String encoded = "__EVENTTARGET=ctl00$ContentBody$pgrBottom$lbGoToPage_3" + "&" + "__EVENTARGUMENT=" + "&"
+ "__VIEWSTATE=" + viewstate;
URL url = new URL(loginWebsite);
URLConnection urlConnection = url.openConnection();
urlConnection = url.openConnection();
// Specifying that we intend to use this connection for input
urlConnection.setDoInput(true);
// Specifying that we intend to use this connection for output
urlConnection.setDoOutput(true);
// Specifying the content type of our post
urlConnection.setRequestProperty("Content-Type", POST_CONTENT_TYPE);
// urlConnection.setRequestMethod("POST");
urlConnection.setRequestProperty("Cookie", sessionId);
urlConnection.setRequestProperty("Content-Type", "text/html");
DataOutputStream out = new DataOutputStream(urlConnection.getOutputStream());
out.writeBytes(encoded);
out.flush();
out.close();
BufferedReader in = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));
String aLine;
while ((aLine = in.readLine()) != null) {
System.out.println(aLine);
}
} catch (MalformedURLException e) {
// Print out the exception that occurred
System.err.println("Invalid URL " + e.getMessage());
} catch (IOException e) {
// Print out the exception that occurred
System.err.println("Unable to execute " + e.getMessage());
}
}
}
Any idea what's wrong? Any help is very appreciated!
Update
Thank you for the fast reply!
I switched to use the HttpURLConnection instead of the URLConnection which implements the setRequestMethod(). I also corrected the minor mistakes you mentioned, e.g. removed the obsolete first setRequestProperty call.
Unfortunately this doesn’t change anything... I think I set all relevant parameters but still get the first page of the list, only. It seems that the "__EVENTTARGET=ctl00$ContentBody$pgrBottom$lbGoToPage_3" is ignored. I don't have any clues why.
Internally, the form on the website looks like this:
<form name="aspnetForm" method="post" action="nearest.aspx?state_id=159" id="aspnetForm">
It is called by the following javascript:
<script type="text/javascript">
//<![CDATA[
var theForm = document.forms['aspnetForm'];
if (!theForm) {
theForm = document.aspnetForm;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}
//]]>
</script>
Hopefully, this helps to find a solution?
Greetings
maik.
Do you actually want to GET or POST? If you want to POST, then you may need the setRequestMethod() line.
You're setting Content-Type twice. The second call overwrites the first, so the request goes out as text/html; keep a single call with application/x-www-form-urlencoded.
Then, don't close the output stream before you try and read from the input stream.
Other than that, is there any more logging you can add, or any clues you can give about the way in which it's going wrong?
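One more thing worth comparing against the sniffed POSTDATA: the browser sends the values URL-encoded (ctl00%24ContentBody..., %2FwEPD...), while the script posts the raw `$` characters and the raw viewstate. A minimal sketch of pulling out the viewstate and building an encoded body. The class name PostBackBody, its method names and the sample markup are made up for illustration, the regex assumes id comes directly before value in the hidden input, and the real page may require more fields than these three.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PostBackBody {

    // Assumption: the hidden input is written as id="__VIEWSTATE" value="...".
    private static final Pattern VIEWSTATE =
            Pattern.compile("id=\"__VIEWSTATE\"\\s+value=\"([^\"]*)\"");

    /** Returns the raw viewstate value from the page, or "" if it is not found. */
    static String extractViewState(String html) {
        Matcher m = VIEWSTATE.matcher(html);
        return m.find() ? m.group(1) : "";
    }

    /** Builds the postback body with every value URL-encoded, as the browser does. */
    static String build(String eventTarget, String eventArgument, String viewState) {
        return "__EVENTTARGET=" + enc(eventTarget)
                + "&__EVENTARGUMENT=" + enc(eventArgument)
                + "&__VIEWSTATE=" + enc(viewState);
    }

    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical markup with a shortened viewstate, for an offline demonstration.
        String html = "<input type=\"hidden\" name=\"__VIEWSTATE\" "
                + "id=\"__VIEWSTATE\" value=\"/wEPD+x=\" />";
        String viewState = extractViewState(html);
        System.out.println(build("ctl00$ContentBody$pgrBottom$lbGoToPage_3", "", viewState));
    }
}
```

The result of build(...) would replace the hand-concatenated encoded string in step 2 of the script.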
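Try the following jsoup code. Here url, eventtarget, eventarg, viewState, viewStateGenarator and viewStateValidation are your own variables holding the page URL and the values of the hidden form fields: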
String userAgent = "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0";
org.jsoup.nodes.Document jsoupDoc = Jsoup.connect(url).timeout(15000).userAgent(userAgent).referrer("http://calendar.legis.ga.gov/Calendar/?chamber=House").ignoreContentType(true)
.data("__EVENTTARGET", eventtarget).data("__EVENTARGUMENT", eventarg).data("__VIEWSTATE", viewState).data("__VIEWSTATEGENERATOR", viewStateGenarator)
.data("__EVENTVALIDATION", viewStateValidation).parser(Parser.xmlParser()).post();