First of all, I'm new to Java and my English isn't great, so I hope you can understand my problem.
I want to read the text file from this URL: http://www.cophieu68.com/export/metastock.php?id=AAA
Okay, let me explain. This is a Vietnamese stock data website, and the link above points to the file aaa.txt, which contains the information for the stock with the code AAA. I can fetch other stocks' info just by changing the value of the id parameter.
My problem is that what I get back is a bunch of HTML, not the text file I expect (aaa.txt).
And here is my code:
public static void main(String[] args) {
    try {
        URL url = new URL("http://www.cophieu68.com/export/metastock.php?id=AAA");
        URLConnection urlConn = url.openConnection();
        System.out.println(urlConn.getContentType()); // it returns text/html
        BufferedReader in = new BufferedReader(
                new InputStreamReader(urlConn.getInputStream()));
        String text;
        while ((text = in.readLine()) != null) {
            System.out.println(text);
        }
        in.close();
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
Thanks for your help.
The site seems to be sniffing the User-Agent header to decide what content to send down.
If you spoof the User-Agent as shown below, it works as you'd expect: the response is the plain-text file.
urlConn.setRequestProperty ( "User-agent", "Mozilla/5.0 (X11; U; Linux i686; pl-PL; rv:1.9.0.2) Gecko/20121223 Ubuntu/9.25 (jaunty) Firefox/3.8");
As you can probably tell, this pretends that the user-agent is Firefox 3.8 on Ubuntu.
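Putting the header together with the code from the question, a minimal sketch might look like this (the class name StockDownload is just illustrative, the header must be set before getInputStream() actually opens the connection, and the site's behavior may of course have changed since this answer was written):

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class StockDownload {
    public static void main(String[] args) throws Exception {
        URL url = new URL("http://www.cophieu68.com/export/metastock.php?id=AAA");
        URLConnection urlConn = url.openConnection();
        // Set the spoofed User-Agent BEFORE getInputStream() opens the connection.
        urlConn.setRequestProperty("User-agent",
                "Mozilla/5.0 (X11; U; Linux i686; pl-PL; rv:1.9.0.2) Gecko/20121223 "
                + "Ubuntu/9.25 (jaunty) Firefox/3.8");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(urlConn.getInputStream()))) {
            String text;
            while ((text = in.readLine()) != null) {
                System.out.println(text);
            }
        }
    }
}
```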
It is probably because the link (http://www.cophieu68.com/export/metastock.php?id=AAA) is sent as an attachment. If you have access to the PHP file, you should do nothing but print the data and include
header('Content-Type: text/plain');
in your PHP file
Related
When I check the status codes of several sites, I start getting a 403 response code after a while. The first time the code runs, every site sends back data, but after the code repeats itself via a Timer, I see one web page return a 403 response code. Here is my code.
public class Main {

    public static void checkSites() {
        Timer ifSee403 = new Timer();
        try {
            File links = new File("./linkler.txt");
            Scanner scan = new Scanner(links);
            ArrayList<String> list = new ArrayList<>();
            while (scan.hasNext()) {
                list.add(scan.nextLine());
            }
            File linkStatus = new File("LinkStatus.txt");
            if (!linkStatus.exists()) {
                linkStatus.createNewFile();
            } else {
                System.out.println("File already exists");
            }
            BufferedWriter writer = new BufferedWriter(new FileWriter(linkStatus));
            for (String link : list) {
                try {
                    if (!link.startsWith("http")) {
                        link = "http://" + link;
                    }
                    URL url = new URL(link);
                    HttpURLConnection.setFollowRedirects(true);
                    HttpURLConnection http = (HttpURLConnection) url.openConnection();
                    http.setRequestMethod("HEAD");
                    http.setConnectTimeout(5000);
                    http.setReadTimeout(8000);
                    int statusCode = http.getResponseCode();
                    if (statusCode == 200) {
                        ifSee403.wait(5000);
                        System.out.println("Hello, here we go again");
                    }
                    http.disconnect();
                    System.out.println(link + " " + statusCode);
                    writer.write(link + " " + statusCode);
                    writer.newLine();
                } catch (Exception e) {
                    writer.write(link + " " + e.getMessage());
                    writer.newLine();
                    System.out.println(link + " " + e.getMessage());
                }
            }
            try {
                writer.close();
            } catch (Exception e) {
                System.out.println(e.getMessage());
            }
            System.out.println("Finished.");
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }

    public static void main(String[] args) throws Exception {
        Timer myTimer = new Timer();
        TimerTask sendingRequest = new TimerTask() {
            public void run() {
                checkSites();
            }
        };
        myTimer.schedule(sendingRequest, 0, 150000);
    }
}
How can I solve this? Thanks
Edit:
I've added http.disconnect(); to close the connection after checking the status code.
I've also added

if (statusCode == 200) {
    ifSee403.wait(5000);
    System.out.println("Test message");
}

But it didn't work: it threw a "current thread is not owner" error. I need to fix this, change 200 to 403, call ifSee403.wait(5000), and then try the status code again.
One alternative to IP spoofing/anonymizing would be to try obeying what the security code expects you to do. If you are going to write a scraper, and you know there is bot detection that doesn't like you hitting the site over and over while debugging your code, you should try the HTML download approach which I posted as an answer to the last question you asked.
If you download the HTML and save it to a file (say, once an hour), and then write your HTML parsing/monitoring code against the contents of that saved file, you will likely be abiding by the security requirements of the website and still be able to check availability.
If you wish to continue using JSoup, that API has an option for receiving HTML as a String. So if you use the HTML scrape code I posted and then write that HTML String to disk, you can feed it to JSoup as often as you like without setting off the bot-detection security checks.
If you play by their rules once in a while, you can write your tester without much hassle.
import java.io.*;
import java.net.*;
...
// Ask the "url" you are trying to connect with for an instance of
// HttpURLConnection. Both classes (URL and HttpURLConnection) are in
// the standard JDK package java.net.
HttpURLConnection con = (HttpURLConnection) url.openConnection();

// Tell the connection to use "GET" and to "pretend" you are using a
// Chrome web browser. Note: the User-Agent sometimes means something
// to the web server, and sometimes is fully ignored.
con.setRequestMethod("GET");
con.setRequestProperty("User-Agent", "Chrome/61.0.3163.100");

// InputStream, InputStreamReader, and BufferedReader are all
// java.io classes from the standard JDK.
InputStream is = con.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
StringBuilder sb = new StringBuilder();
String s;

// Read each line from the web server.
while ((s = br.readLine()) != null) sb.append(s).append('\n');

// Write the results from the web server to a file, using
// java.io.File and java.io.FileWriter.
File outF = new File("SavedSite.html");
outF.createNewFile();
FileWriter fw = new FileWriter(outF);
fw.write(sb.toString());
fw.close();
Again, this is very basic code that doesn't use any special JAR library at all. The next step uses the JSoup library (which you explicitly requested; I don't use it myself, but it is just fine): its "parse" method will parse the String you have just saved. You may load this HTML String from disk and send it to JSoup using:
Method Documentation: org.jsoup.Jsoup.parse(File in, String charsetName, String baseUri)
If you wish to invoke JSoup just pass it a java.io.File instance using the following:
File f = new File("SavedSite.html");
Document d = Jsoup.parse(f, "UTF-8", url.toString());
I do not think you need timers at all...
AGAIN: this matters if you are making lots of calls to the server. The purpose of this answer is to show you how to save the server's response to a file on disk, so you don't have to make lots of calls - just one! If you restrict your calls to the server to once per hour, you will likely (but it's not a guarantee) avoid a 403 Forbidden bot-detection problem.
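As an aside, the "current thread is not owner" error mentioned in the question comes from calling ifSee403.wait(5000): Object.wait() may only be called while holding that object's monitor (inside a synchronized block), and java.util.Timer is not a delay utility anyway. If all you want is a pause before retrying, Thread.sleep is the usual tool. A minimal sketch, where the 403 retry logic is purely illustrative:

```java
public class RetrySketch {
    public static void main(String[] args) throws InterruptedException {
        int statusCode = 403; // pretend the HEAD request returned 403

        if (statusCode == 403) {
            // Object.wait(5000) would require: synchronized (lock) { lock.wait(5000); }
            // A plain pause needs no monitor at all:
            Thread.sleep(5000);
            System.out.println("Waited 5 seconds; re-issue the HEAD request here.");
        }
    }
}
```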
I'm doing an online Java course and created a very simple server. Using Edge as the client it works, but not with Chrome. I've tried turning off the firewall and some of the things suggested here (WAMP server localhost wont work with Chrome) and elsewhere, but they all seem to be overly complicated changes to Windows files. Surely there's a config setting somewhere?!
In Edge it's fine; in Chrome it returns the following whether I use localhost:999 or 127.0.0.1:999. I've tried other ports too.
This page isn’t working
127.0.0.1 sent an invalid response.
ERR_INVALID_HTTP_RESPONSE
Here is my code in case needed.
public static void main(String[] args) {
    try (ServerSocket aServer = new ServerSocket(999);
         Socket cn = aServer.accept();
         BufferedReader bis = new BufferedReader(new InputStreamReader(cn.getInputStream()));
         BufferedOutputStream bos = new BufferedOutputStream(cn.getOutputStream())) {
        String line = bis.readLine();
        while (line != null && !line.equals("")) {
            System.out.println(line);
            line = bis.readLine();
        }
        bos.write("Hello from java TCP Server!".getBytes());
    } catch (IOException ex) {
        System.out.println("Error in connection: " + ex.getMessage());
    }
}
Any help appreciated.
Web browsers require that the server speak the HTTP protocol. A proper server response, written to the socket's output stream (the bos from your try-with-resources block), would be:

byte[] message = "Hello from java TCP Server!".getBytes();
bos.write("HTTP/1.1 200 OK\r\n".getBytes());
bos.write("Content-Type: text/plain\r\n".getBytes());
bos.write(("Content-Length: " + message.length + "\r\n").getBytes());
bos.write("\r\n".getBytes()); // empty line between HTTP header and HTTP content
bos.write(message);

You may test your program with another tool that speaks plain TCP sockets. I recommend "Netcat" (command: nc). Download for Windows: http://stefanfrings.de/avr_tools/netcat-win32-1.12.zip
My issue is in my title; I will also give my Java code and details below.
void getSourceCode(String text_url) {
    String source_code = "";
    BufferedReader reader = null;
    try {
        reader = new BufferedReader(new InputStreamReader(
                (new URL(text_url)).openStream(), Charset.forName("UTF-8")));
        String inputLine;
        while ((inputLine = reader.readLine()) != null) {
            source_code += inputLine.replace(" ", "");
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (reader != null) {
            try {
                reader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
    System.out.println(source_code);
}
For example, I pass "http://ekenlermangalkomuru.com/urunlerimiz/liste/144/BinarKömürü" as the parameter (parsed from an HTML a href="..."), but the error report is:
java.io.IOException: Server returned HTTP response code: 400 for URL:
http://ekenlermangalkomuru.com/urunlerimiz/liste/144/BinarKömürü
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at java.net.URL.openStream(Unknown Source)
The link works in all browsers, but when I try to connect via Java code it doesn't work. How can I solve this? Thanks for any advice.
Error 400 means that the request is malformed, so the server is unable to process it. Are you sure that you have properly URL-encoded the request URL? For something like http://ekenlermangalkomuru.com/urunlerimiz/liste/144/BinarKömürü, at least ö and ü are not ASCII letters and need to be percent-encoded to create a well-formed request.
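One way to sketch this encoding is with the multi-argument java.net.URI constructor, whose toASCIIString() method percent-encodes non-ASCII characters (the host and path below are taken from the question):

```java
import java.net.URI;
import java.net.URISyntaxException;

public class EncodeDemo {
    public static void main(String[] args) throws URISyntaxException {
        // The multi-argument URI constructor quotes illegal characters, and
        // toASCIIString() percent-encodes the non-ASCII ö and ü via UTF-8.
        URI uri = new URI("http", "ekenlermangalkomuru.com",
                "/urunlerimiz/liste/144/BinarKömürü", null);
        String encoded = uri.toASCIIString();
        System.out.println(encoded);
        // prints: http://ekenlermangalkomuru.com/urunlerimiz/liste/144/BinarK%C3%B6m%C3%BCr%C3%BC
        // The encoded form can then be passed to new URL(encoded).openStream().
    }
}
```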
Error 400 means that you didn't make a "good" request to the site. I think you need something more than just a BufferedReader to open a URL via HTTP, because the protocol needs to know what you want to see, whether you have a cache available, etc. To open a URL you should use HttpURLConnection; check How to send HTTP request in java? for info, and http://docs.oracle.com/javase/7/docs/api/java/net/HttpURLConnection.html could also help.
Hey, I'm relatively new to Java, and I'm trying to make an application that does the following:
Sends a request to a live website
Retrieves the data of that page
For example, assume the following site displays game results, where 'game=324' shows the results for game number 324 of 500 different games: http://www.some-site.com/results.php?game=324
I would like to use a Java program to cycle automatically from game=1 to game=500, requesting each page and retrieving its results.
What is the best way to do this? Can anyone give me a simple example? If I knew the correct Java keywords, I would google for some tutorials on this concept.
Note: the target page in question is PHP.
URL url;
InputStream is = null;
BufferedReader br;
String line;
for (int i = 1; i <= 500; i++) {
    try {
        url = new URL("http://www.some-site.com/results.php?game=" + i);
        is = url.openStream(); // throws an IOException
        // BufferedReader replaces the deprecated DataInputStream.readLine()
        br = new BufferedReader(new InputStreamReader(is));
        while ((line = br.readLine()) != null) {
            // do something with the data
        }
    } catch (MalformedURLException mue) {
        mue.printStackTrace();
    } catch (IOException ioe) {
        ioe.printStackTrace();
    } finally {
        try {
            if (is != null) is.close();
        } catch (IOException ioe) {
            // nothing to see here
        }
    }
}
Do something like the answer on this other Stack Overflow page, and then use a for loop to cycle through pages 1 through 500.
Apache has some really good Java libraries for accessing HTTP. See this for more details.
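As an alternative to an external library, newer JDKs (11+) ship java.net.http.HttpClient in the standard library. A sketch of the 1-to-500 loop with it, where www.some-site.com is the placeholder host from the question, so send() will only succeed against a real server:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class FetchGames {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        for (int game = 1; game <= 500; game++) {
            HttpRequest request = HttpRequest.newBuilder(
                    URI.create("http://www.some-site.com/results.php?game=" + game))
                    .GET()
                    .build();
            // send() performs the request; BodyHandlers.ofString() collects
            // the response body into a single String.
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("game " + game + ": " + response.body());
        }
    }
}
```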
I know virtually no Java, but I need to make a simple Java application for mobiles that displays a form. On being submitted, the data would then be sent to a web page. As I am a PHP programmer, I would prefer to have it sent to a PHP file, which would then use the form's data. I only need a couple of text input areas; would anybody be able to help me with the Java part?
Thanks in advance.
Niall
You don't mention which platform, but no matter what you choose you will need to look into an HTTP client, and here is a good example of mimicking a form-based submission.
As far as presenting a form, that's very platform-dependent.
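If the platform supports standard java.net (Android does; classic Java ME uses javax.microedition.io instead), a form submission is just a POST whose body is the URL-encoded fields. A sketch, where submit.php, the host, and the field names are all made up for illustration:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class FormPostSketch {

    // Builds the application/x-www-form-urlencoded body for two text fields.
    static String encodeForm(String name, String message) throws Exception {
        return "name=" + URLEncoder.encode(name, "UTF-8")
             + "&message=" + URLEncoder.encode(message, "UTF-8");
    }

    // POSTs the body to the given URL (a placeholder for your PHP script);
    // PHP can then read the fields via $_POST['name'] and $_POST['message'].
    static int post(String target, String body) throws Exception {
        HttpURLConnection con = (HttpURLConnection) new URL(target).openConnection();
        con.setRequestMethod("POST");
        con.setDoOutput(true);
        con.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        try (OutputStream os = con.getOutputStream()) {
            os.write(body.getBytes("UTF-8"));
        }
        return con.getResponseCode();
    }

    public static void main(String[] args) throws Exception {
        String body = encodeForm("Niall", "Hello from the phone");
        System.out.println(body); // prints: name=Niall&message=Hello+from+the+phone
        // With a real server you would then call:
        // int status = post("http://myserver.com/submit.php", body);
    }
}
```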
If you can call your PHP file via a GET URL, then you should be able to solve your problem with the following piece of code:
int variable1 = 4;
String variable2 = "My Phone Service";
URL url = null;
try {
    // URL-encode variable2 so its spaces are legal in the query string (java.net.URLEncoder)
    url = new URL("http://myserver.com/service.php?var1=" + variable1
            + "&variable2=" + URLEncoder.encode(variable2, "UTF-8"));
    BufferedReader reader = new BufferedReader(
            new InputStreamReader(url.openConnection().getInputStream()));
    String s;
    while ((s = reader.readLine()) != null) {
        System.out.println(s);
    }
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}