When I check the status codes of a list of sites, I start getting a 403 response code after a while. On the first run every site sends data back, but once my code repeats itself via the Timer, one web page returns a 403 response code. Here is my code:
public class Main {
public static void checkSites() {
Timer ifSee403 = new Timer();
try {
File links = new File("./linkler.txt");
Scanner scan = new Scanner(links);
ArrayList<String> list = new ArrayList<>();
while(scan.hasNext()) {
list.add(scan.nextLine());
}
File linkStatus = new File("LinkStatus.txt");
if(!linkStatus.exists()){
linkStatus.createNewFile();
}else{
System.out.println("File already exists");
}
BufferedWriter writer = new BufferedWriter(new FileWriter(linkStatus));
for(String link : list) {
try {
if(!link.startsWith("http")) {
link = "http://"+link;
}
URL url = new URL(link);
HttpURLConnection.setFollowRedirects(true);
HttpURLConnection http = (HttpURLConnection)url.openConnection();
http.setRequestMethod("HEAD");
http.setConnectTimeout(5000);
http.setReadTimeout(8000);
int statusCode = http.getResponseCode();
if (statusCode == 200) {
ifSee403.wait(5000);
System.out.println("Hello, here we go again");
}
http.disconnect();
System.out.println(link + " " + statusCode);
writer.write(link + " " + statusCode);
writer.newLine();
} catch (Exception e) {
writer.write(link + " " + e.getMessage());
writer.newLine();
System.out.println(link + " " +e.getMessage());
}
}
try {
writer.close();
} catch (Exception e) {
System.out.println(e.getMessage());
}
System.out.println("Finished.");
} catch (Exception e) {
System.out.println(e.getMessage());
}
}
public static void main(String[] args) throws Exception {
Timer myTimer = new Timer();
TimerTask sendingRequest = new TimerTask() {
public void run() {
checkSites();
}
};
myTimer.schedule(sendingRequest,0,150000);
}
}
How can I solve this? Thanks.
Edited comment:
I've added http.disconnect(); to close the connection after checking the status code.
I've also added
if(statusCode == 200) {
ifSee403.wait(5000);
System.out.println("Test message");
}
But it didn't work. At run time I got a "current thread is not owner" error. I need to fix this, replace 200 with 403, call ifSee403.wait(5000), and then check the status code again.
One "alternative", by the way, to IP spoofing or anonymizing would be to try "obeying" what the security code expects of you. If you are going to write a "scraper", and are aware that there is "bot detection" that doesn't like you visiting the site over and over while you debug your code, you should try the HTML download which I posted as an answer to the last question you asked.
If you download the HTML and save it (to a file, once an hour), and then write your HTML parsing / monitoring code against the contents of the file you have saved, you will (likely) be abiding by the security requirements of the web site and still be able to check availability.
If you wish to continue to use JSoup, that API has an option for receiving HTML as a String. So if you use the HTML scrape code I posted, and then write that HTML String to disk, you can feed it to JSoup as often as you like without setting off the bot detection security checks.
If you play by their rules once in a while, you can write your tester without much hassle.
import java.io.*;
import java.net.*;
...
// This line asks the "url" that you are trying to connect with for
// an instance of HttpURLConnection. These two classes (URL and HttpURLConnection)
// are in the standard JDK Package java.net.*
HttpURLConnection con = (HttpURLConnection) url.openConnection();
// Tells the connection to use "GET" ... and to "pretend" that you are
// using a "Chrome" web-browser. Note, the User-Agent sometimes means
// something to the web-server, and sometimes is fully ignored.
con.setRequestMethod("GET");
con.setRequestProperty("User-Agent", "Chrome/61.0.3163.100");
// The classes InputStream, InputStreamReader, and BufferedReader
// are all JDK 1.0 package java.io.* classes.
InputStream is = con.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
StringBuffer sb = new StringBuffer();
String s;
// This reads each line from the web-server.
while ((s = br.readLine()) != null) sb.append(s + "\n");
// This writes the results from the web-server to a file
// It is using classes java.io.File and java.io.FileWriter
File outF = new File("SavedSite.html");
outF.createNewFile();
FileWriter fw = new FileWriter(outF);
fw.write(sb.toString());
fw.close();
Again, this code is very basic stuff that doesn't use any special JAR Library Code at all. The next method uses the JSoup library (which you have explicitly requested - even though I don't use it... It is just fine!) ... This is the method "parse" which will parse the String you have just saved. You may load this HTML String from disk, and send it to JSoup using:
Method Documentation: org.jsoup.Jsoup.parse(File in, String charsetName, String baseUri)
If you wish to invoke JSoup just pass it a java.io.File instance using the following:
File f = new File("SavedSite.html");
Document d = Jsoup.parse(f, "UTF-8", url.toString());
I do not think you need timers at all...
AGAIN: the problem arises if you are making lots of calls to the server. The purpose of this answer is to show you how to save the response of the server to a file on disk, so you don't have to make lots of calls - JUST ONE! If you restrict your calls to the server to once per hour, then you will (likely, but not guaranteed) avoid getting a 403 Forbidden bot detection problem.
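On the "current thread is not owner" error from the question: Object.wait may only be called by a thread that holds the object's monitor (inside a synchronized block on that object), otherwise it throws IllegalMonitorStateException. For a plain pause before trying again, Thread.sleep is usually enough. Below is a minimal sketch of "pause and retry on 403". The class name, the method name, and the IntSupplier stand-in for the real HEAD request are all assumptions made for illustration, not part of the original code.

```java
import java.util.function.IntSupplier;

class RetryOn403 {
    /**
     * Calls probe until it returns something other than 403 or until
     * maxAttempts calls have been made, sleeping pauseMillis between calls.
     * Returns the last status code seen.
     */
    static int checkWithRetry(IntSupplier probe, int maxAttempts, long pauseMillis)
            throws InterruptedException {
        int status = probe.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && status == 403; attempt++) {
            // Thread.sleep needs no lock, unlike Object.wait
            Thread.sleep(pauseMillis);
            status = probe.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) throws InterruptedException {
        // Stand-in probe (hypothetical): answers 403 twice, then 200.
        // In the real program this would wrap the HEAD request.
        int[] canned = {403, 403, 200};
        int[] calls = {0};
        int status = checkWithRetry(() -> canned[calls[0]++], 5, 10);
        System.out.println(status + " after " + calls[0] + " attempts");
    }
}
```

In checkSites() the probe would be the block that opens the connection and returns http.getResponseCode(), with a pause of something like 5000 ms.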
Related
I'm trying to connect a java application to an external api for GuildWars2.
The link I am trying to test is:
http://api.guildwars2.com/v2/commerce/listings
A list of int IDs are returned when navigating to that page within a browser.
As a learning practice, I am trying to get that list of id's when running my java application.
I use the following code (hopefully it formats correctly; I'm currently on my phone, programming remotely on my desktop):
public class GuildWarsAPI
{
public static void main(String[] args)
{
GuildWarsAPI api = new GuildWarsAPI();
api.getAPIResponse("http://api.guildwars2.com/v2/commerce/listings");
}
public void getAPIResponse(String URLString)
{
URL url = null;
try {
url = new URL(URLString);
} catch (MalformedURLException e1) {
e1.printStackTrace();
}
HttpURLConnection connection = null;
try {
connection = (HttpURLConnection) url.openConnection();
} catch (IOException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
if (connection != null)
{
System.out.println("connection success");
connection.setRequestProperty("Accept", "application/json");
connection.setDoInput(true);
connection.setDoOutput(true);
connection.setReadTimeout(10000);
connection.setConnectTimeout(10000);
try {
/*BufferedReader in = new BufferedReader(new InputStreamReader(connection.getInputStream()));
StringBuilder input = new StringBuilder();
String nextLine = null;
while ((nextLine = in.readLine()) != null)
{
System.out.println("adding output");
input.append(nextLine);
}*/
InputStream in = new BufferedInputStream(connection.getInputStream());
int b = 0;
while ((b = in.read()) != -1)
{
System.out.println("byte:" + b);
}
System.out.println("done");
}
catch (Exception e) {
e.printStackTrace();
}
finally {
connection.disconnect();
System.out.println("closed");
}
}
}
}
Upon running my class, it immediately prints connection success, done, closed. It definitely isn't waiting for the timeouts, and I've been trying to play with those, the request header, and DoInput/DoOutput. I stepped through it, and it appears to connect but just doesn't receive any bytes of information back (it never enters the while loop).
So, while my ultimate question is: How do I get the IDs back like I expect?, my other question is: how can I figure out how to get the other IDs back like I expect?
Your code is getting response code 302 Found. It should follow the Location: header to the new location, as followRedirects is true by default, but it isn't. The server is however returning a Location: header of https://api.guildwars2.com/v2/commerce/listings. I don't know why HttpURLConnection isn't following that, but the simple fix is to use https: in the original URL.
You're setting doOutput(true) but you aren't sending any output.
Your code is poorly structured. Code that depends on the success of code in a prior try block should be inside that same try block. I would have the method throw MalformedURLException and IOException and not have any internal try/catch blocks at all.
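As far as I know, HttpURLConnection does not follow a redirect that switches protocol (here http to https), even with followRedirects enabled, which would explain the unfollowed 302. The points above can be sketched roughly as follows: the https: URL is used directly, doOutput is left alone, and exceptions propagate to the caller. The class name, the readAll helper, and the check for a 200 status are my own assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class GuildWarsListings {
    /** Reads the stream to its end and decodes it as UTF-8. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Performs a GET and returns the response body; errors go to the caller. */
    static String getAPIResponse(String urlString) throws IOException {
        HttpURLConnection connection =
                (HttpURLConnection) new URL(urlString).openConnection();
        connection.setRequestProperty("Accept", "application/json");
        connection.setConnectTimeout(10000);
        connection.setReadTimeout(10000);
        try {
            int status = connection.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                // A 3xx here would mean a redirect that was not followed
                throw new IOException("Unexpected response code " + status);
            }
            try (InputStream in = connection.getInputStream()) {
                return readAll(in);
            }
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // https: instead of http:, so no redirect is involved
        System.out.println(getAPIResponse("https://api.guildwars2.com/v2/commerce/listings"));
    }
}
```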
In my experience, wrestling with HttpURLConnection is more trouble than it's worth.
It's hard to debug, hard to use, and provides very little support for complex HTTP operations.
There are a bunch of better options.
My default choice is Apache HttpComponents Client (http://hc.apache.org/). It's not necessarily any better than all the other options, but it's quite well documented and widely used.
I use the following code to download a file from a specified URL using a socket rather than url.openConnection().
After downloading, the file did not work: when I opened it in an editor it was completely blank, with no data inside (an empty file). Any suggestions?
try {
String address="http://tineye.com/images/widgets/mona.jpg";
URL url_of_file=new URL(address);
String hostaddress=url_of_file.getHost();
Socket mysocket=new Socket(hostaddress, 80);
System.out.println("Socket opened to " + hostaddress + "\n");
String file=url_of_file.getFile();
System.out.println(" file = "+file);
OutputStreamWriter osw=new OutputStreamWriter(mysocket.getOutputStream());
osw.write("GET " + file + " HTTP/1.0\r\n\n");
osw.flush();
dis = new DataInputStream(mysocket.getInputStream());
fileData = new byte[7850];
for (int x = 0; fileData[x] > 0; x++){
fileData[x] = (byte) dis.read();
}
// close the data input stream
fos = new FileOutputStream(new File("C:\\Users\\down-to\\filedownloaded.jgp")); //create an object representing the file we want to save
fos.write(fileData); // write out the file we want to save.
dis.close();
fos.close();
} catch (UnknownHostException ex) {
Logger.getLogger(Check.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(Check.class.getName()).log(Level.SEVERE, null, ex);
}
Is this:
for(int x=0;fileData[x]>0;x++){
right ? It looks like you're trying to break based upon the content of the stream. As Flavio has indicated, this statement is false immediately since the array is newly created.
I think you're much more likely to want to read the indicated length of the content, or to read until the end of the stream is reached.
In fact, I'd much rather use an existing HttpClient and bypass all of the above. Writing reliable HTTP code is not as trivial as it first appears and 3rd party library will save you a lot of grief.
You're doing a few things wrong. First of all it's reinventing the wheel, since lots of HTTP libraries already exist.
Then you're crafting an invalid HTTP request. Just like HTTP 1.1, the headers of such a request should be ended with a \r\n, while you only send an \n:
osw.write("GET " + file + " HTTP/1.0\r\n\n");
The server will probably wait until you finish your request (it's still waiting for a complete request, ended with a double \r\n) or throw an error since it does not expect the second \n there.
Then you're not reading the response headers, which may indicate how much data to expect. It's funny that you initialize a byte array that's exactly large enough (the file you want to download is 7850 bytes), but you can't hardcode file sizes for every file on the web, since you'd soon run out of disk space storing them.
So, either read and parse a Content-length header or wait for the server to close the connection after sending all data (those are the two options in HTTP 1.0).
Finally you're not reading the response correctly, as pointed out by others. Please get these basics fixed, then you can try to store the response. Now your file is filled with zeroes.
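To make those points concrete, here is a rough sketch of a raw-socket HTTP/1.0 GET that ends the header block with an empty line, skips the response headers, and then reads the body until the server closes the connection. The class and method names are made up for illustration, the Host header is my addition (many servers expect it even for HTTP/1.0), and the status line and Content-Length are deliberately not checked.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RawHttpGet {
    /** Builds an HTTP/1.0 GET request; the header block ends with an empty line. */
    static byte[] buildRequest(String host, String path) {
        String request = "GET " + path + " HTTP/1.0\r\n"
                + "Host: " + host + "\r\n"
                + "\r\n";
        return request.getBytes(StandardCharsets.US_ASCII);
    }

    /** Reads one line without its terminator; returns null at end of stream. */
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream line = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                return line.toString("ISO-8859-1");
            }
            if (b != '\r') {
                line.write(b);
            }
        }
        return line.size() > 0 ? line.toString("ISO-8859-1") : null;
    }

    /** Skips status line and headers, then returns everything up to end of stream. */
    static byte[] readBody(InputStream in) throws IOException {
        String line;
        while ((line = readLine(in)) != null && !line.isEmpty()) {
            // header lines; a fuller client would inspect the status and Content-Length
        }
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            body.write(chunk, 0, n); // only the bytes actually read
        }
        return body.toByteArray();
    }

    /** Opens a socket, sends the request, and returns the response body. */
    static byte[] download(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, path));
            out.flush();
            return readBody(new BufferedInputStream(socket.getInputStream()));
        }
    }
}
```

The bytes returned by download(host, 80, file) can then be written to a FileOutputStream in one call, with no hardcoded array size.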
What is the meaning of the for condition?
for(int x=0;fileData[x]>0;x++){
The fileData array was just created, so it is filled with zeroes. fileData[x]>0 is immediately false.
By modifying your code as below I was able to fill the file with data and here is what i got:
ERROR: Access Denied
ERROR Access Denied Access
Denied by security policy The security policy for
your network prevents your request from being allowed at this time.
Please contact your administrator if you feel this is incorrect.
try {
String address = "http://tineye.com/images/widgets/mona.jpg";
URL url_of_file = new URL(address);
String hostaddress = url_of_file.getHost();
Socket mysocket = new Socket(hostaddress, 80);
System.out.println("Socket opened to " + hostaddress + "\n");
String file = url_of_file.getFile();
System.out.println(" file = " + file);
OutputStreamWriter osw = new OutputStreamWriter(mysocket.getOutputStream());
osw.write("GET " + file + " HTTP/1.0\r\n\n");
osw.flush();
DataInputStream dis = new DataInputStream(mysocket.getInputStream());
byte[] fileData = new byte[7850];
FileOutputStream fos = new FileOutputStream(new File("C:\\Users\\aboutros\\Desktop\\filedownloaded.jgp")); // create an object representing the file we want to save
int bytesRead;
while ((bytesRead = dis.read(fileData)) >= 0) {
fos.write(fileData, 0, bytesRead); // write only the bytes that were actually read
}
dis.close();
fos.close();
} catch (UnknownHostException ex) {
ex.printStackTrace();
} catch (IOException ex) {
ex.printStackTrace();
}
Is this condition correct? If yes what's the meaning behind this condition?
for(int x=0;fileData[x]>0;x++)
I'm writing a program that connects to a servlet through an HttpURLConnection, but I'm stuck on checking the URL:
public void connect (String method) throws Exception {
server = (HttpURLConnection) url.openConnection ();
server.setDoInput (true);
server.setDoOutput (true);
server.setUseCaches (false);
server.setRequestMethod (method);
server.setRequestProperty ("Content-Type", "application/xml");
server.connect ();
/*if (server.getResponseCode () == 200)
{
System.out.println ("Connection OK at the url:" + url);
System.out.println ("------------------------------------------- ------- ");
}
else
System.out.println ("Connection failed");
}*/
I get this error:
java.net.ProtocolException: Cannot write output after reading input.
when I check the URL with the commented-out code, but everything works perfectly without it.
Unfortunately, I need to check the URL, so I think the problem comes from the getResponseCode method, but I don't know how to resolve it.
Thank you very much
The HTTP protocol is based on a request-response pattern: you send your request first and the server responds. Once the server responded, you can't send any more content, it wouldn't make sense. (How could the server give you a response code before it knows what is it you're trying to send?)
So when you call server.getResponseCode(), you effectively tell the server that your request has finished and it can process it. If you want to send more data, you have to start a new request.
Looking at your code you want to check whether the connection itself was successful, but there's no need for that: if the connection isn't successful, an Exception is thrown by server.connect(). But the outcome of a connection attempt isn't the same as the HTTP response code, which always comes after the server processed all your input.
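The ordering described above can be sketched like this: write the whole request body first, and ask for the response code only afterwards. The class name and the post helper are hypothetical. The connection settings mirror the ones in the question, with the request method fixed to POST.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

class PostThenCheck {
    /** Sends body as an XML POST and returns the HTTP status code. */
    static int post(URL url, byte[] body) throws IOException {
        HttpURLConnection server = (HttpURLConnection) url.openConnection();
        server.setDoOutput(true);
        server.setUseCaches(false);
        server.setRequestMethod("POST");
        server.setRequestProperty("Content-Type", "application/xml");
        try {
            // 1. Write the complete request body first.
            try (OutputStream out = server.getOutputStream()) {
                out.write(body);
            }
            // 2. Only now ask for the status; this completes the request
            //    and reads the response, so no more output is possible.
            return server.getResponseCode();
        } finally {
            server.disconnect();
        }
    }
}
```

Calling getResponseCode() (or getInputStream(), or getHeaderFields()) before step 1 is what leads to "Cannot write output after reading input".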
I think the exception is not due to printing the URL. There should be some piece of code that tries to write the request body after the response has been read.
This exception will occur if you try to call HttpURLConnection.getOutputStream() after obtaining HttpURLConnection.getInputStream().
Here is the implementation of sun.net.www.protocol.http.HttpURLConnection.getOutputStream:
public synchronized OutputStream getOutputStream() throws IOException {
try {
if (!doOutput) {
throw new ProtocolException("cannot write to a URLConnection"
+ " if doOutput=false - call setDoOutput(true)");
}
if (method.equals("GET")) {
method = "POST"; // Backward compatibility
}
if (!"POST".equals(method) && !"PUT".equals(method) &&
"http".equals(url.getProtocol())) {
throw new ProtocolException("HTTP method " + method +
" doesn't support output");
}
// if there's already an input stream open, throw an exception
if (inputStream != null) {
throw new ProtocolException("Cannot write output after reading input.");
}
if (!checkReuseConnection())
connect();
/* REMIND: This exists to fix the HttpsURLConnection subclass.
* Hotjava needs to run on JDK1.1FCS. Do proper fix in subclass
* for 1.2 and remove this.
*/
if (streaming() && strOutputStream == null) {
writeRequests();
}
ps = (PrintStream)http.getOutputStream();
if (streaming()) {
if (strOutputStream == null) {
if (fixedContentLength != -1) {
strOutputStream =
new StreamingOutputStream (ps, fixedContentLength);
} else if (chunkLength != -1) {
strOutputStream = new StreamingOutputStream(
new ChunkedOutputStream (ps, chunkLength), -1);
}
}
}
return strOutputStream;
} else {
if (poster == null) {
poster = new PosterOutputStream();
}
return poster;
}
} catch (RuntimeException e) {
disconnectInternal();
throw e;
} catch (IOException e) {
disconnectInternal();
throw e;
}
}
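I had this problem too. What surprised me is that the error was caused by a line I had added: System.out.println(conn.getHeaderFields()); (getHeaderFields() has to read the response, so after that call the connection counts as having read its input.)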
Below is my code:
HttpURLConnection conn=(HttpURLConnection)url.openConnection();
conn.setRequestMethod("POST");
configureConnection(conn);
//System.out.println(conn.getHeaderFields()); //if i comment this code,everything is ok, if not the 'Cannot write output after reading input' error happens
conn.connect();
OutputStream os = conn.getOutputStream();
os.write(paramsContent.getBytes());
os.flush();
os.close();
I had the same problem.
The solution for the problem is that you need to use the sequence
openConnection -> getOutputStream -> write -> getInputStream -> read
That means..:
public String sendReceive(String urlString, String toSend) throws IOException {
URL url = new URL(urlString);
URLConnection conn = url.openConnection();
conn.setDoInput(true);
conn.setDoOutput(true);
conn.sets...
// 1. write the request body
OutputStreamWriter out = new OutputStreamWriter(conn.getOutputStream());
out.write(toSend);
out.close();
// 2. only then read the response
BufferedReader in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
String receive = "";
do {
String line = in.readLine();
if (line == null)
break;
receive += line;
} while (true);
in.close();
return receive;
}
String results1 = sendReceive("http://site.com/update.php", params1);
String results2 = sendReceive("http://site.com/update.php", params2);
...
I know there are many people who already asked this Question, but in all the threads I read I couldn't find 1 solution for my problem (even if others had the same one, it didn't work for me).
As the Title says, I'm trying to connect from a Flash/SWF-Application to a small Java server I wrote via Sockets. It works fine offline (on the same machine), but as soon as I put the .swf on a Webspace and open it from there, Flash requests the Policy file from the server. There's nothing bad with that, but my problem is that Flash disconnects after (hopefully) getting the policy-file but doesn't reconnect again.
My server always opens a new Thread when a client connects, but that's not where the trouble is made, as I already tried it without opening a new Thread.
Here's my code:
while (true) {
connection = providerSocket.accept();
System.out.println("Incoming connection from " +
connection.getInetAddress().getHostName());
BufferedReader in = new BufferedReader(
new InputStreamReader(connection.getInputStream()));
String request = in.readLine();
if (request != null && request.contains("<policy-file-request/>")) {
System.out.println("Authorization request.");
PrintStream out = new PrintStream(connection.getOutputStream(), true);
out.println("<?xml version=\"1.0\"?><cross-domain-policy><!DOCTYPE cross-domain-policy SYSTEM \"http://www.macromedia.com/xml/dtds/cross-domain-policy.dtd\"><allow-access-from domain=\"*\" to-ports=\"3002\" /></cross-domain-policy>\u0000");
out.flush();
System.out.println("AuthData sent.");
connection.close();
System.out.println("Authorization complete.");
connection = providerSocket.accept();
System.out.println("TEST");
RequestProcessor c = new RequestProcessor(connection, connectionCounter++);
Thread t = new Thread(c);
t.start();
} else {
RequestProcessor c = new RequestProcessor(connection, connectionCounter++);
Thread t = new Thread(c);
t.start();
}
}
You will surely notice that I am using "\u0000" at the end instead of "\0", but don't worry, I also tested that case, didn't change anything. :/
I don't even reach the "TEST" output, because I don't get a new connection. And if I don't close the connection myself, Flash automatically disconnects me.
The last thing I tried was just sending the XML without any request (right at the beginning, after the connection is established). I got a "recv failed" error for that.
PS: RequestProcessor is my new Thread, in which I would process the Strings/Commands sent from my .swf file...
Thanks for helping me! :)
I had this problem before. You cannot just use in.readLine() to get the policy file request string, because the request is terminated by a zero character rather than a line break.
To make sure you read the whole policy file request:
private String read(BufferedReader in) throws IOException {
StringBuilder builder = new StringBuilder();
int codePoint;
boolean zeroByteRead = false;
System.out.println("Reading...");
do {
codePoint = in.read();
if (codePoint == -1) {
return null;
}
if (codePoint == 0) {
zeroByteRead = true;
} else {
builder.appendCodePoint(codePoint);
}
} while (!zeroByteRead);
return builder.toString();
}
In the calling method:
BufferedReader in = new BufferedReader(new InputStreamReader(
clientSocket.getInputStream()));
String inputLine;
while ((inputLine = read(in)) != null) {
System.out.println("Receive from client: " + inputLine);
if ("<policy-file-request/>".equals(inputLine)) {
// Serve policy file, like the one in your question.
out.println(buildPolicy() +"\u0000");
} else {
// Do your job.
}
}
There is also a downloadable policy file project in Java; my thanks go to the people behind it.
I'm trying to login to a site that is using form-based authentication so that my application can go in, download the protected pages, and then exit (yes, I have a valid username/password combination).
I know:
1. the url to the login page
2. the url to the login authenticator
3. the method (post)
4. my information (obviously)
5. the username and password fields (which change based on...something. I already wrote a method to get the names).
Currently I'm using the code at this dream.in.code page as a base for my efforts.
Every time I run the application, it gets the login page sent back with a "bad username/password" message.
Code:
import java.net.*;
import java.util.LinkedList;
import java.io.*;
import javax.swing.JOptionPane;
public class ConnectToURL
{
// Variables to hold the URL object and its connection to that URL.
private static URL URLObj;
private static URLConnection connect;
private static String loginField;
private static String passwordField;
private static void getFields()
{
try
{
URLObj = new URL("http://url.goes.here/login.jsp");
connect = URLObj.openConnection();
// Now establish a buffered reader to read the URLConnection's input
// stream.
BufferedReader reader = new BufferedReader(new InputStreamReader(
connect.getInputStream()));
String lineRead = "";
LinkedList<String> lines = new LinkedList<String>();
// Read all available lines of data from the URL and print them to
// screen.
while ((lineRead = reader.readLine()) != null)
{
lines.add(lineRead);
}
reader.close();
while(lines.peekFirst().indexOf("<th>Username or E-mail:</th>") == -1)
{
lines.removeFirst();
}
String usernameCell = "";
while (usernameCell.indexOf("</td>") == -1)
{
usernameCell = usernameCell + lines.removeFirst().trim();
}
usernameCell = usernameCell.substring(usernameCell.indexOf("name=\"") + 6);
usernameCell = usernameCell.substring(0, usernameCell.indexOf("\""));
loginField = usernameCell;
while(lines.peekFirst().indexOf("<th>Password:</th>") == -1)
{
lines.removeFirst();
}
String passwordCell = "";
while (passwordCell.indexOf("</td>") == -1)
{
passwordCell = passwordCell + lines.removeFirst().trim();
}
passwordCell = passwordCell.substring(passwordCell.indexOf("name=\"") + 6);
passwordCell = passwordCell.substring(0, passwordCell.indexOf("\""));
passwordField = passwordCell;
}
catch (MalformedURLException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
catch (IOException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static void main(String[] args)
{
try
{
// getFields() grabs the names of the username and password fields and stores them into variables above
getFields();
// Establish a URL and open a connection to it. Set it to output
// mode.
URLObj = new URL("http://url.goes.here/login_submit.jsp");
connect = URLObj.openConnection();
HttpURLConnection.setFollowRedirects(true);
connect.setDoOutput(true);
}
catch (MalformedURLException ex)
{
System.out
.println("The URL specified was unable to be parsed or uses an invalid protocol. Please try again.");
System.exit(1);
}
catch (Exception ex)
{
System.out.println("An exception occurred. " + ex.getMessage());
System.exit(1);
}
try
{
// Create a buffered writer to the URLConnection's output stream and
// write our forms parameters.
BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(
connect.getOutputStream()));
// For obvious reasons, login info is edited.
// The line begins with username=& because there's a username field that sends no data and is set to display:none.
// When I observed the request in Chrome, username was sent, but left blank. Without it, my request doesn't go through.
writer.write("username=&" + loginField + "=" + URLEncoder.encode("Username", "UTF-8") + "&" + passwordField + "=" + URLEncoder.encode("myPassword", "UTF-8"));
writer.close();
// Now establish a buffered reader to read the URLConnection's input
// stream.
BufferedReader reader = new BufferedReader(new InputStreamReader(
connect.getInputStream()));
String lineRead = "";
// Read all available lines of data from the URL and print them to
// screen.
while ((lineRead = reader.readLine()) != null)
{
System.out.println(lineRead);
}
reader.close();
}
catch (Exception ex)
{
System.out.println("There was an error reading or writing to the URL: "
+ ex.getMessage());
}
}
}
I would try to use something like HttpFox or Fiddler to see what exactly is being sent during the login and try to emulate that. Sometimes login pages massage what is being sent with Javascript.
Use LiveHTTPHeaders to check out EVERYTHING that gets posted. There is probably cookie/session data that you aren't passing though to the POST command.
Also, the referrer is sometimes monitored and should be sent as well, by passing the header "Referer: http://homepage.com.../login.html" (the HTTP header name really is spelled "Referer").
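If missing cookie/session data turns out to be the cause, one way to carry it across requests with plain java.net is to install a CookieManager before fetching the login page, so that the later POST sends the session cookie back. This is only a sketch under that assumption. The class name, the helper names, and the field names in main are hypothetical, and the form encoder escapes field names as well as values.

```java
import java.io.UnsupportedEncodingException;
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class LoginSession {
    /** Installs a JVM-wide cookie store used by later HttpURLConnection requests. */
    static void enableCookies() {
        CookieHandler.setDefault(new CookieManager(null, CookiePolicy.ACCEPT_ALL));
    }

    /** Encodes fields as application/x-www-form-urlencoded, in insertion order. */
    static String encodeForm(Map<String, String> fields)
            throws UnsupportedEncodingException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(field.getKey(), "UTF-8"))
              .append('=')
              .append(URLEncoder.encode(field.getValue(), "UTF-8"));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        enableCookies(); // call once, before requesting the login page
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("username", "");              // the hidden, empty field from the question
        fields.put("login_field", "Username");   // hypothetical field name
        fields.put("password_field", "myPassword");
        System.out.println(encodeForm(fields));
    }
}
```

With this in place, getFields() and the login POST share one cookie store, and the encoded string is what gets written to the connection's output stream. A Referer header can be added with connect.setRequestProperty("Referer", ...) before writing.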
Have you tried using Apache HttpClient (http://hc.apache.org/httpcomponents-client-ga/) rather than writing your own code that parses HTML and inserts values?
I believe you don't have to parse the HTML. Your steps should be:
1. Look at the HTML and see which URL your authentication request goes to.
2. Open your connection to the URL you found, rather than requesting the login page and parsing it to find that URL.
The HttpClient class can help you manage your session and keep it alive. You can do it yourself, but it would be a lot of work.