How to get part of a URL's content - java

I want to display part of an HTML page, but when I search for a solution I only find how to get the whole source of an HTML page: How to get the html-source of a page from a html link in android?
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);
String html = "";
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while((line = reader.readLine()) != null)
{
    str.append(line);
}
in.close();
html = str.toString();
But if I do that, how can I get only a part of the page? That solution is also quite old. I have found another approach, WebView, but I don't know which is the best way to display part of an HTML page, or how to do it.
EDIT:
This page: http://www.solutis.fr/groupe-solutis,mentions-legales.html
Without tags, header, or footer: only the content of the body, also without tags.

What you need is one more line. The HttpResponse you get back contains the whole HTML of the page, so you need to remove all the tags from it, which you can do with a single line:
String responseAsText = android.text.Html.fromHtml(html).toString();
where html is your string with the response from the HttpResponse.
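If Html.fromHtml is not available (for example outside Android), a rough plain-Java sketch of the same idea is to cut out the body and strip the tags with regular expressions. The names BodyText and bodyText are made up for illustration, and regex-based stripping is fragile on real-world HTML. It also does not decode entities or remove a page-specific header or footer, which would need knowledge of that page's markup or a real HTML parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BodyText {
    // Everything between <body ...> and </body>, case-insensitive, across line breaks.
    private static final Pattern BODY =
            Pattern.compile("<body[^>]*>(.*?)</body>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the text inside the body, or of the whole input if no body element is found. */
    static String bodyText(String html) {
        Matcher m = BODY.matcher(html);
        String body = m.find() ? m.group(1) : html;
        return body
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ") // drop script and style blocks
                .replaceAll("(?s)<[^>]*>", " ")                          // drop the remaining tags
                .replaceAll("\\s+", " ")                                 // collapse runs of whitespace
                .trim();
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Legal</title></head>"
                + "<body><h1>Legal notice</h1><p>Some <b>text</b></p></body></html>";
        System.out.println(bodyText(html)); // prints "Legal notice Some text"
    }
}
```

Here html would be the string built from the HttpResponse in the question.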

Related

Preloading a website before fetching HTML from the URL

I'm trying to get data off of a URL, but the information I need takes a few seconds to load, and only shows as LOADING in the HTML until it does load, so when I use this code I can't pull the data I need.
URL url = new URL("https://www.cardservices.uga.edu/fs_mobile/");
URLConnection con = url.openConnection();
InputStream is = con.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line = null;
int lineNumber = 0; // was not declared in the original snippet
while ((line = br.readLine()) != null) {
    lineNumber++;
    System.out.println(lineNumber + ": " + line);
}
How could I go about allowing the URL to load for a set amount of time before pulling the HTML off of it?
The web page you are calling probably makes an AJAX call to fetch the data; that's why you won't get it with your approach.
You have 2 options to get that data:
Use the browser's inspect tools (F12 in Chrome) and, in the "Network" tab, find that AJAX call and use its URL instead of the URL in your code.
Load your URL with a headless browser library (e.g. ghoustjs) and crawl the data after the page has loaded.
IMO, option 1 is the better choice.
Here is a working alternative:
// This is the AJAX call that loads the data into the web page. You can find it by inspecting the network calls.
URL url = new URL("https://www.cardservices.uga.edu/fs_mobile/index.php/dashboard/occupancies/");
URLConnection con = url.openConnection();
InputStream is = con.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line = null;
while ((line = br.readLine()) != null) {
    System.out.println(line);
}
This gives you the JSON response containing the percentage.
Hope it helps.
Also, you can use Selenium to wait for the page to finish loading if you need the exact rendered HTML output.
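As a sketch of option 1 with a few safety nets added: connect and read timeouts, an explicit UTF-8 decoder, and a status check. The class name EndpointFetch, the 10-second timeouts, and the Accept header are assumptions for illustration, not something the endpoint is known to require. Note that waiting longer on the original page URL would not help, because URLConnection only downloads the document and never runs the page's JavaScript.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

class EndpointFetch {
    /** Reads the stream to the end as UTF-8 text, ending every line with '\n', then closes it. */
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    /** Fetches the given address (e.g. the AJAX URL found in the Network tab). */
    static String fetch(String address) throws IOException {
        HttpURLConnection con = (HttpURLConnection) URI.create(address).toURL().openConnection();
        con.setConnectTimeout(10_000); // assumed value: give up connecting after 10 s
        con.setReadTimeout(10_000);    // assumed value: give up waiting for data after 10 s
        con.setRequestProperty("Accept", "application/json"); // assumption, may be unnecessary
        try {
            int status = con.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            return readAll(con.getInputStream());
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(fetch(args[0]));
        }
    }
}
```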

Android - How to fill data in an HTML form on an external website and hit the upload button?

I have an Android device. I want to fill a form in my app, with edittexts etc (one of these fields would take the path of an image on the SDCard). I want these form contents to be the data for an HTML form in an external website where this file (from the SD Card) needs to be uploaded. The HTML form has an upload button. I do not want to show this HTML webpage to my android app users. Is there any way to do this? Please let me know! Thanks!
EDIT: I've looked through many websites and I understand that I should use a HttpPost. I have a few doubts though:
1. What is the url that you use in HttpPost- Is it the url which contains the form, or the url which the form redirects to.
2. In a multipartentity, what is the first parameter in addPart? Is it the ID given to the field or the name?
3. How does the HttpPost know which form it should go to?
Well, you need to make a MultiPart Http Post. You could use this sample:
HttpClient httpClient = new DefaultHttpClient();
HttpPost postRequest = new HttpPost("target_link");
MultipartEntity reqEntity = new MultipartEntity(HttpMultipartMode.BROWSER_COMPATIBLE);
reqEntity.addPart("data1", new StringBody("Data1"));
reqEntity.addPart("data2", new StringBody("Data2"));
reqEntity.addPart("data3",new StringBody("Data3"));
try {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    bitmap.compress(CompressFormat.JPEG, 80, bos);
    byte[] data = bos.toByteArray();
    ByteArrayBody bab = new ByteArrayBody(data, "forest.jpg");
    reqEntity.addPart("picture", bab);
}
catch (Exception e) {
    //Log.v("Exception in Image", ""+e);
    reqEntity.addPart("picture", new StringBody(""));
}
postRequest.setEntity(reqEntity);
HttpResponse response = httpClient.execute(postRequest);
BufferedReader reader = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), "UTF-8"));
String sResponse;
StringBuilder s = new StringBuilder();
while ((sResponse = reader.readLine()) != null) {
    s = s.append(sResponse);
}
Personally, I prefer to use Spring for Android as that is easier to configure. Here's a link with a multi-part Http Post.
Good luck!
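On the three doubts in the question: a browser posts a form to the URL in the form's action attribute (or to the page's own URL if there is no action), each part is keyed by the field's name attribute rather than its id, and the request never refers to the form itself, only to that URL and those names. The sketch below builds a multipart/form-data body by hand to make this visible. MultipartSketch and its methods are hypothetical helpers, the boundary string is arbitrary, and the field names are the ones from the sample above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class MultipartSketch {
    private static final String CRLF = "\r\n";

    private static void write(ByteArrayOutputStream out, String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
    }

    /** Adds a plain text field; name must match the form field's name attribute. */
    static void textPart(ByteArrayOutputStream out, String boundary, String name, String value) {
        write(out, "--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"" + name + "\"" + CRLF
                + CRLF
                + value + CRLF);
    }

    /** Adds a file field, such as the input of type file that the upload button submits. */
    static void filePart(ByteArrayOutputStream out, String boundary, String name,
                         String filename, String contentType, byte[] data) {
        write(out, "--" + boundary + CRLF
                + "Content-Disposition: form-data; name=\"" + name
                + "\"; filename=\"" + filename + "\"" + CRLF
                + "Content-Type: " + contentType + CRLF
                + CRLF);
        out.write(data, 0, data.length);
        write(out, CRLF);
    }

    /** Writes the closing delimiter that ends the body. */
    static void finish(ByteArrayOutputStream out, String boundary) {
        write(out, "--" + boundary + "--" + CRLF);
    }

    public static void main(String[] args) {
        String boundary = "----sketch-boundary"; // arbitrary; must not occur inside the data
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        textPart(out, boundary, "data1", "Data1");
        filePart(out, boundary, "picture", "forest.jpg", "image/jpeg",
                "fake-jpeg-bytes".getBytes(StandardCharsets.UTF_8));
        finish(out, boundary);
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```

The resulting bytes would be sent as the POST body to the form's action URL with the header Content-Type: multipart/form-data; boundary= followed by the same boundary string. MultipartEntity in the sample above does this assembly for you.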

Get raw text from html

I'm at quite a basic level of Android development.
I would like to get the text from a page such as "http://www.google.com". (The page I will be using will only have text, so no pictures or anything like that.)
So, to be clear: I want to get the text written on a page into, e.g., a string in my application.
I tried this code, but I'm not even sure if it does what I want.
URL url = new URL("http://www.google.com");
URLConnection connection = url.openConnection();
// Get the response
BufferedReader rd = new BufferedReader(new InputStreamReader(connection.getInputStream()));
String line = "";
I can't get any text from it anyhow. How should I do this?
From the sample code you gave, you are not even reading the response from the request. I would get the HTML with the following code:
URL u = new URL("http://www.google.com");
URLConnection conn = u.openConnection();
BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()));
StringBuffer buffer = new StringBuffer();
String inputLine;
while ((inputLine = in.readLine()) != null)
    buffer.append(inputLine);
in.close();
System.out.println(buffer.toString());
From there you would need to pass the string into some kind of HTML parser if you want only the text. From what I've heard, JTidy is a good library for this; however, I have never used any Java HTML parsing libraries.
You want to extract text from an HTML file? You can use a specialized tool such as the Jericho HTML Parser library. I'm not sure whether it can be used directly in an Android app, as it is quite big, but it is open source, so you can take only the code you need for your task.
Here is one way:
public String scrape(String urlString) throws Exception {
    URL url = new URL(urlString);
    URLConnection connection = url.openConnection();
    BufferedReader reader = new BufferedReader(new InputStreamReader(
            connection.getInputStream()));
    String line = null, data = "";
    while ((line = reader.readLine()) != null) {
        data += line + "\n";
    }
    return data;
}
Here is another.

Get HTML body from a webpage in Android?

I want my Android app to check for update so I hosted a simple HTML page with this code:
<html>
<body>2.3</body> // Latest version
</html>
So I would get the version in the Body and compare it to the current version that is in the phone.
How do I get that number from a web page?
Android still includes the java.net, java.io, and java.nio packages.
Try java.net.URLConnection: http://docs.oracle.com/javase/tutorial/networking/urls/readingWriting.html
URL url = new URL("http://url for your webpage");
URLConnection yc = url.openConnection();
BufferedReader in = new BufferedReader(
        new InputStreamReader(yc.getInputStream()));
String inputLine;
StringBuilder builder = new StringBuilder();
while ((inputLine = in.readLine()) != null)
    builder.append(inputLine.trim());
in.close();
String htmlPage = builder.toString();
String versionNumber = htmlPage.replaceAll("\\<.*?>","");
NOTE: this works only if your web page contains nothing but plain HTML elements like the ones you posted above; it does not handle character entities such as &amp; in your HTML page.
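For the comparison step, note that comparing versions as plain strings or as floating-point numbers puts 2.10 before 2.3. A minimal sketch of a numeric, component-by-component comparison follows. VersionCheck, extractVersion, and compareVersions are illustrative names, and the sketch assumes versions consist only of digits separated by dots and that the version is the first thing inside the body element.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class VersionCheck {
    // A dotted number directly after the opening body tag, e.g. <body>2.3</body>.
    private static final Pattern BODY_VERSION =
            Pattern.compile("(?is)<body[^>]*>\\s*(\\d+(?:\\.\\d+)*)");

    /** Returns the version found at the start of the body, or null if there is none. */
    static String extractVersion(String html) {
        Matcher m = BODY_VERSION.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    /** Negative if a is older than b, zero if equal, positive if a is newer. */
    static int compareVersions(String a, String b) {
        String[] x = a.trim().split("\\.");
        String[] y = b.trim().split("\\.");
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            // Missing components count as 0, so "2.3" equals "2.3.0".
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        String latest = extractVersion("<html>\n<body>2.10</body>\n</html>");
        String installed = "2.3"; // on Android this would come from the app's own version name
        System.out.println(compareVersions(latest, installed) > 0); // prints "true"
    }
}
```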

Unreadable Characters in Apache HttpClient

I'm trying to log in to a web page, but even before that, I'm loading the page using HttpGet, and this is one of the lines being returned:
ÓA;
That's all I could put, won't let me paste any other characters. But they are all like that, like I'm somehow getting the wrong encoding? Here is the code I am using to GET
HttpGet httpget = new HttpGet(url);
if (headers == null) {
    headers = getDefaultHeaders();
}
for (String s : headers.keySet()) {
    httpget.addHeader(s, headers.get(s));
}
HttpResponse response = getClient().execute(httpget);
HttpEntity entity = response.getEntity();
System.out.println("Status Line: " + response.getStatusLine());
if (entity != null) {
    InputStream input = entity.getContent();
    BufferedReader reader = new BufferedReader(new InputStreamReader(input));
    String ln = "";
    while ((ln = reader.readLine()) != null) {
        System.out.println("During Get - " + ln);
    }
}
What am I doing wrong?
Thanks for any help.
If you need any more information like headers, just ask.
The following line is possibly the cause of your problems:
BufferedReader reader = new BufferedReader(new InputStreamReader(input));
You are creating a reader using the default characterset of your platform, and completely ignoring any character set that may be specified in the HTTP response headers.
If you are getting the same problem when reading the content the correct way, then it is possible that the server is at fault for not setting the response header correctly.
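To illustrate what honoring the response's character set could look like without any library help, here is a small sketch that pulls the charset parameter out of a Content-Type header value and decodes the body bytes with it. ResponseCharset and its methods are hypothetical names, and falling back to UTF-8 when no charset is declared is an assumption; pick whatever default suits the server you are talking to.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ResponseCharset {
    // Matches charset=UTF-8 as well as charset="UTF-8" inside a Content-Type value.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the charset named in the header value, or fallback if absent or unknown. */
    static Charset fromContentType(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalArgumentException e) {
            // Covers both illegal and unsupported charset names.
            return fallback;
        }
    }

    /** Decodes the raw body using the declared charset (UTF-8 if none is declared). */
    static String decode(byte[] body, String contentType) {
        return new String(body, fromContentType(contentType, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        byte[] body = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(decode(body, "text/html; charset=ISO-8859-1"));
    }
}
```

With the code from the question, the header value would come from the entity's content type, and the resulting Charset would be passed as the second argument to the InputStreamReader constructor.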
Do the entity reading like this:
String content = org.apache.http.util.EntityUtils.toString( entity );
System.out.println(content);
This reads the whole entity for you, so you can check what is really being returned.
Make sure that you didn't accidentally go to port 443 with a simple HTTP connection. Because in that case you will get back the SSL handshake instead of an HTTP response.
