When I try to download HTML using this method:
public class DownloadHtml extends AsyncTask<String, Void, String> {
@Override
protected String doInBackground(String... urls) {
String result = "";
URL url;
HttpURLConnection connection = null;
try {
url = new URL(urls[0]);
connection = (HttpURLConnection) url.openConnection();
InputStream inputStream = connection.getInputStream();
InputStreamReader reader = new InputStreamReader(inputStream);
int data = reader.read();
while (data != -1) {
char currentChar = (char) data;
result += currentChar;
data = reader.read();
}
return result;
} catch (Exception e) {
e.printStackTrace();
return "Failed";
}
}
}
and log the result:
DownloadHtml downloadHtml = new DownloadHtml();
String result = null;
try {
result = downloadHtml.execute("http://stackoverflow.com").get();
} catch (Exception e) {
e.printStackTrace();
}
Log.i("Html", result);
I am getting only a small part of it.
Is there a way to get the whole HTML of the webpage?
The solution was simple: it looks like Log.i doesn't print everything in one go.
When I extracted all the links from the HTML, they were printed successfully.
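Since logcat limits the size of a single entry (roughly 4 KB, treated as an assumption here), one workaround is to log the string in pieces. Below is a minimal sketch; the class name LogChunker and the chunk size of 3000 are hypothetical choices, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;

final class LogChunker {
    // Assumed safe size per log entry; logcat's real limit is roughly 4 KB.
    static final int CHUNK_SIZE = 3000;

    // Splits text into consecutive pieces of at most `size` characters.
    static List<String> chunk(String text, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<String> parts = new ArrayList<>();
        for (int start = 0; start < text.length(); start += size) {
            int end = Math.min(text.length(), start + size);
            parts.add(text.substring(start, end));
        }
        return parts;
    }
}
```

On Android, each piece would then be logged separately, for example by looping over LogChunker.chunk(result, LogChunker.CHUNK_SIZE) and calling Log.i("Html", part) for every part.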
Related
I am trying to download the HTML of a page. After it downloads, I try to log it. Everything goes smoothly, but the HTML stops at a certain point every time, even though there is a lot more HTML to show.
I tried a different page, my own page, which just has some instructions for my company, and it worked perfectly. Is there a limit, maybe? I tried it with and without urlConnection.connect(), and there is no difference.
public class MainActivity extends AppCompatActivity {
public class DownloadHTML extends AsyncTask<String, Void, String>{
@Override
protected String doInBackground(String... urls) {
URL url;
String result = "";
HttpURLConnection urlConnection = null;
try {
url = new URL(urls[0]);
urlConnection = (HttpURLConnection)url.openConnection();
InputStream in = urlConnection.getInputStream();
InputStreamReader reader = new InputStreamReader(in);
int data = reader.read();
while (data!=-1){
char current = (char) data;
result += current;
data = reader.read();
}
return result;
} catch (Exception e) {
e.printStackTrace();
return "Fail";
}
}
}
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
String Result = "";
DownloadHTML task = new DownloadHTML();
try {
Result = task.execute("http://www.posh24.se/kandisar").get();
} catch (Exception e) {
e.printStackTrace();
}
Log.i("URL", Result);
}
}
Here is the splitting code, and it won't work.
try {
Result = task.execute("http://www.posh24.se/kandisar").get();
String[] splitStrings = Result.split("<div class=\"channelListEntry\">");
Pattern p = Pattern.compile("<img src=\"(.*?)\"");
Matcher m = p.matcher(splitStrings[0]);
while (m.find()){
CelebUrls.add(m.group(1));
}
p = Pattern.compile("alt=\"(.*?)\"");
m = p.matcher(splitStrings[0]);
while (m.find()){
CelebNames.add(m.group(1));
}
} catch (Exception e) {
e.printStackTrace();
}
Log.i("URL", Arrays.toString(CelebUrls.toArray()));
}
}
Modifying your method like this will give you the content of the HTML page in UTF-8.
(In this case it's UTF-8 because the page is encoded that way; if in doubt, you can pass Charset.forName("utf-8") as the second parameter to the InputStreamReader constructor.)
When testing your example implementation, I only got output with various unreadable characters.
Ignore the class and method changes; I only made them to have a standalone example.
public class ParsingTest {
static String doInBackground(String address) {
URL url;
StringBuilder result = new StringBuilder(1000);
HttpURLConnection urlConnection = null;
try {
url = new URL(address);
urlConnection = (HttpURLConnection)url.openConnection();
InputStream in = urlConnection.getInputStream();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
String line = reader.readLine();
while (line != null){
result.append(line);
result.append("\n");
line = reader.readLine();
}
return result.toString();
} catch (Exception e) {
e.printStackTrace();
return "Fail";
}
}
public static void main(String[] args) {
String result = doInBackground("http://www.posh24.se/kandisar");
System.out.println(result);
}
}
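The remark about passing the charset explicitly can be sketched as follows. This is only an illustration under assumptions: the class name Utf8Lines and the method readAll are hypothetical, and UTF-8 is assumed to be the page's encoding. It uses StandardCharsets.UTF_8, which is equivalent to Charset.forName("utf-8").

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

final class Utf8Lines {
    // Reads the whole stream line by line, decoding the bytes as UTF-8
    // instead of relying on the platform's default charset.
    static String readAll(InputStream in) throws IOException {
        StringBuilder result = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result.append(line).append('\n');
            }
        }
        return result.toString();
    }
}
```

In the example above, the stream would come from urlConnection.getInputStream().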
If the only part that interests you is the images of the top 100, you can just adjust the while loop to:
String line = reader.readLine();
while (line != null){
if (line.contains("<div class=\"channelListEntry\">")) {
reader.readLine();
reader.readLine();
line = reader.readLine().trim();
// At this point it's probably easier to use a List<String> for the result instead
result.append(line);
result.append("\n");
}
line = reader.readLine();
}
This is only a simplified example based on the current design of the page, where the img comes 3 lines after the declaration of the div.
If you want, you can also extract the URL of the image and the alt description directly at this point. Instead of using a complicated regex, you could rely on String#indexOf.
private static final String SRC = "src=\"";
private static final String ALT = "\" alt=\"";
private static final String END = "\"/>";
public static void extract(String image) {
int index1 = image.indexOf(SRC);
int index2 = image.indexOf(ALT);
int index3 = image.indexOf(END);
System.out.println(image);
System.out.println(image.substring(index1 + SRC.length(), index2));
System.out.println(image.substring(index2 + ALT.length(), index3));
}
Note that if you directly process the content from the page your app does not require the memory to store the full page.
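The indexOf approach above throws a StringIndexOutOfBoundsException when one of the markers is missing from the line. A slightly more defensive variant might look like the sketch below; the class name ImageTagParser and the choice to return null for non-matching lines are assumptions made for illustration, and the markers still depend on the page layout described above.

```java
final class ImageTagParser {
    private static final String SRC = "src=\"";
    private static final String ALT = "\" alt=\"";
    private static final String END = "\"/>";

    // Returns {url, altText}, or null when the line does not match
    // the assumed layout: <img src="..." alt="..."/>
    static String[] extract(String image) {
        int srcStart = image.indexOf(SRC);
        if (srcStart < 0) {
            return null;
        }
        int urlStart = srcStart + SRC.length();
        int altStart = image.indexOf(ALT, urlStart);
        if (altStart < 0) {
            return null;
        }
        int nameStart = altStart + ALT.length();
        int end = image.indexOf(END, nameStart);
        if (end < 0) {
            return null;
        }
        return new String[] {
            image.substring(urlStart, altStart),
            image.substring(nameStart, end)
        };
    }
}
```

Searching for each marker from the position of the previous one also avoids matching an alt attribute that appears before the src.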
It's taking too long to compile the code (5+ minutes, just for this app).
Also, when it's finally done, the complete HTML is not displayed in logcat! Only part of it.
Can you guys please point out what's wrong with the code?
Is it because the "InputStream" is read character by character (as the HTML is huge)?
public class MainActivity extends AppCompatActivity {
public class DownloadTask extends AsyncTask<String, Void, String> {
@Override
protected String doInBackground(String... urls) {
String result = "";
URL url;
HttpURLConnection urlConnection = null;
try {
url = new URL(urls[0]);
urlConnection = (HttpURLConnection) url.openConnection();
InputStream in = urlConnection.getInputStream();
InputStreamReader reader = new InputStreamReader(in);
int data = reader.read();
while (data != -1) {
char current = (char) data;
result += current;
data = reader.read();
}
return result;
} catch (Exception e) {
e.printStackTrace();
return "Failed";
}
}
}
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
DownloadTask task = new DownloadTask();
String result = null;
try {
result = task.execute("http://www.amazon.com").get();
} catch (Exception e) {
e.printStackTrace();
}
Log.i("Result",result);
}
Yes. The more system calls you make like that, the worse your performance. You should be reading in multiple kilobytes at a time, not characters. If you need to loop over it one at a time, do that afterwards.
Also, use a StringBuilder! Using + on a String inside a loop is highly inefficient: for every character you make a new String object. StringBuilder avoids that.
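Both suggestions can be combined in a short sketch like the one below. The class name BlockReader and the 8192-character buffer size are assumptions chosen for illustration; any reasonably large buffer should behave similarly.

```java
import java.io.IOException;
import java.io.Reader;

final class BlockReader {
    // Reads everything from the reader in blocks of up to 8192 characters
    // and collects the result in a StringBuilder.
    static String readFully(Reader reader) throws IOException {
        StringBuilder result = new StringBuilder();
        char[] buffer = new char[8192];
        int count;
        while ((count = reader.read(buffer)) != -1) {
            // Append only the characters actually read in this pass.
            result.append(buffer, 0, count);
        }
        return result.toString();
    }
}
```

In the question's doInBackground, the reader would be the InputStreamReader created from urlConnection.getInputStream().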
I am trying to download the source code of a web page.
But the problem is that the whole code does not show up; only a small part is downloaded every time.
public class MainActivity extends AppCompatActivity {
public class DownloadTask extends AsyncTask < String , Void , String >
{
@Override
protected String doInBackground(String... params) {
String content ="";
URL url ;
HttpURLConnection conn = null;
try {
url = new URL (params[0]);
conn = (HttpURLConnection)url.openConnection();
InputStream is = conn.getInputStream();
InputStreamReader isr = new InputStreamReader(is);
int data = isr.read();
while(data!=-1)
{
char c = (char) data;
content += c;
data = isr.read();
}
Log.i("The Code is ",content);
}
catch (Exception e)
{
e.getStackTrace();
}
return content;
}
}
@Override
protected void onCreate(Bundle savedInstanceState) {
super.onCreate(savedInstanceState);
setContentView(R.layout.activity_main);
String result =" ";
DownloadTask DT = new DownloadTask();
try {
result = DT.execute("https://www.google.co.in").get();
}
catch (Exception e)
{
e.getStackTrace();
}
Log.i("The Code is ",result);
}
}
It's important to close the InputStreamReader. It might not be the cause of the problem, but it's good practice.
while(data!=-1)
{
char c = (char) data;
content += c;
data = isr.read();
}
isr.close();
is.close();
I think your first page is downloaded fine, but when you try to load it again and again you might face problems. As I said, this might not be a fix, but it's important. Hope this helps someone.
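With the explicit close() calls above, the streams stay open if read() throws. A try-with-resources block closes them in either case. The following is a minimal sketch; the class name ClosingDownload is hypothetical and UTF-8 is an assumed encoding.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class ClosingDownload {
    // Reads the stream fully; the reader is closed automatically
    // when the try block exits, even if read() throws.
    static String read(InputStream is) throws IOException {
        StringBuilder content = new StringBuilder();
        try (Reader isr = new InputStreamReader(is, StandardCharsets.UTF_8)) {
            int data;
            while ((data = isr.read()) != -1) {
                content.append((char) data);
            }
        }
        return content.toString();
    }
}
```

Closing the InputStreamReader also closes the InputStream it wraps, so a separate is.close() is not needed in this form.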
I am trying to parse the following page with AsyncTask, URLConnection and InputStreamReader:
public class DownloadTask extends AsyncTask<String, Void, String> {
URL url;
URLConnection urlConnection;
String result = null;
@Override
protected String doInBackground(String... urls) {
try {
url = new URL(urls[0]);
urlConnection = (URLConnection) url.openConnection();
InputStream in = urlConnection.getInputStream();
InputStreamReader reader = new InputStreamReader(in);
int data = reader.read();
while (data != -1) {
char current = (char) data;
result += current;
data = reader.read();
}
return result;
} catch (MalformedURLException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return null;
}
}
And I am using it this way:
DownloadTask downloadTask = new DownloadTask();
String data = null;
try {
data = downloadTask.execute("http://www.imdb.com/movies-in-theaters/").get();
} catch (InterruptedException e) {
e.printStackTrace();
} catch (ExecutionException e) {
e.printStackTrace();
}
The problem is that it takes more than 3 minutes to finish the DownloadTask. After that time it finally works on the emulator, but not on a real device.
I know that parsing a web page is not a good way to do this kind of thing, but I am doing it for educational reasons.
Any advice on how I can speed up the procedure?
Thanks!
Here is my code. I use a "GET" request to get a response from my DB.
Then I read my own CSV file. Everything is fine up to this point, but I have no idea how to do a "POST" request. I know that I need to use the "addRequestProperty" method.
Any idea how to create a vertex and an edge?
public void run() throws MalformedURLException, JSONException, IOException {
String viaURl = "http://xxxxxxxxxxxxxxxxxxxxxxxxxxx/mydb";
URL url = new URL(viaURl);
HttpURLConnection conexion = null;
String texto = null;
String json;
BufferedReader in = null, in2 = null;
int numDump = 5;
String dato;
String csvSplitBy = ";";
int numApps = 0;
OutputStreamWriter out;
try {
Authenticator.setDefault(new Authenticator() {
@Override
protected PasswordAuthentication getPasswordAuthentication() {
return new PasswordAuthentication("xxxxx", "xxxxxxxxxxxxx.".toCharArray());
}
});
conexion = (HttpURLConnection) url.openConnection();
conexion.setRequestMethod("GET");
conexion.connect();
System.out.println("¡¡¡Conectado!!!");
in = new BufferedReader(new InputStreamReader(conexion.getInputStream()));
out = new OutputStreamWriter(conexion.getOutputStream());
json = "";
while ((texto = in.readLine()) != null) {
json += texto;
}
in.close();
System.out.println(json);
conexion.setDoOutput(true);
try {
for (int i = 0; i < numDump; i++) {
String csvFile = "/home/danicroque/dump/dump_" + i;
try {
in2 = new BufferedReader(new FileReader(csvFile));
while ((dato = in2.readLine()) != null) {
numApps++;
String[] datos = dato.split(csvSplitBy, 15);
conexion.setRequestMethod("POST");
conexion.addRequestProperty("_id0" , datos[0]);
}
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
} catch (IOException ex) {
ex.printStackTrace();
}
} catch (IOException e) {
e.printStackTrace();
} finally {
System.out.println("Fin");
}
}
}
Thanks in advance.
You can use these POST methods. To create a class:
http://your_host:2480/class/mydb/className
To add a property to a class:
http://your_host:2480/property/mydb/className/propertyName
You can find more detailed information here.
Hope it helps,
Alex.
UPDATE:
To insert, use this POST method:
http://your_host:2480/command/mydb/sql/insert into className(propertyName) values("yourValue")
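From Java, the insert URL above could be sent with HttpURLConnection roughly as sketched below. The class name OrientCommand and both method names are hypothetical. The sketch assumes the SQL text must be percent-encoded to be a valid URL and that the server accepts an empty request body. Authentication is left to the Authenticator already set up in the question. Note that an HttpURLConnection handles a single request, so the POST needs its own connection, and setDoOutput(true) has to be called before the connection is opened.

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

final class OrientCommand {
    // Builds <baseUrl>/command/<db>/sql/<percent-encoded SQL>.
    static String buildCommandUrl(String baseUrl, String db, String sql) {
        try {
            // URLEncoder encodes spaces as '+'; a URL path needs %20 instead.
            String encoded = URLEncoder.encode(sql, "UTF-8").replace("+", "%20");
            return baseUrl + "/command/" + db + "/sql/" + encoded;
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Sends a POST with an empty body and returns the HTTP status code.
    static int post(String commandUrl) throws IOException {
        HttpURLConnection conn =
                (HttpURLConnection) new URL(commandUrl).openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);           // must be set before connecting
            conn.getOutputStream().close();   // empty request body
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

For each CSV row, the loop in the question would build one command URL from the row's values and call post on it, instead of calling setRequestMethod on the connection that was already used for the GET.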