Jsoup display data to textview - java

I parsed an HTML web page with Jsoup. Now I want to display the parsed data in my TextView.
code
String ID = loginpreferences.getString("ID", null);
String Type = loginpreferences.getString("Type", null);
String myURL = "http://roosters.gepro-osi.nl/roosters/rooster.php?leerling=" + ID + "&type=Leerlingrooster&afdeling=" + Type + "&tabblad=2&school=905";
Document doc = null;
try {
    doc = Jsoup.connect(myURL).get();
} catch (IOException e) {
    e.printStackTrace();
}
Elements data = doc.select(".1nameheader");
I tried
Textview1.SetText(data);
But that didn't work.

Seems as if you want to print the text values from a list of Elements. To do so you need to iterate over the list of Elements and get the text out of them.
StringBuilder text = new StringBuilder();
for (Element e : data) {
    text.append(e.text());
}
Textview1.setText(text.toString());

The line
Textview1.SetText(data);
shouldn't even compile.
From Android TextView class reference:
final void setText(CharSequence text)
Sets the string value of the TextView.
You're giving an Elements instance to the method.
The Element and Elements classes of Jsoup provide html() and text() methods that you should use in this case.
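A self-contained illustration of the difference between text() and html() (assuming Jsoup is on the classpath; the .nameheader class and the sample markup are made up for the example):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public class TextVsHtmlDemo {
    // Returns {plain text of all matches, inner HTML of the first match}.
    static String[] demo(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Elements data = doc.select(cssQuery);
        return new String[] { data.text(), data.first().html() };
    }

    public static void main(String[] args) {
        String[] out = demo(
                "<div class=\"nameheader\"><b>Ada</b></div>"
                        + "<div class=\"nameheader\">Bob</div>",
                ".nameheader");
        System.out.println(out[0]); // plain text of all matched elements
        System.out.println(out[1]); // inner markup of the first element
    }
}
```

Either string is a CharSequence, so it can be passed straight to setText().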

Have you tried android.text.Html.fromHtml(String)?
This method takes HTML as input and returns a Spanned that you can set on a TextView.
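A minimal sketch of that approach inside an Activity (the R.id.text_view id is made up for the example; note that on API 24+ the two-argument overload of Html.fromHtml is preferred, as the one-argument form is deprecated there):

```java
import android.os.Build;
import android.text.Html;
import android.text.Spanned;
import android.widget.TextView;

// Inside an Activity, after setContentView(...):
TextView tv = findViewById(R.id.text_view); // hypothetical view id
Spanned spanned;
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.N) {
    spanned = Html.fromHtml("<b>Java</b> Programm", Html.FROM_HTML_MODE_LEGACY);
} else {
    spanned = Html.fromHtml("<b>Java</b> Programm");
}
tv.setText(spanned); // bold markup is rendered, not shown literally
```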

Related

Java XML Read with WSIL file

At the moment I am trying to write a program that can extract a link from an XML file. I use Jsoup; my current code is the following:
public static String XmlReader() {
    InputStream is = RestService.getInstance().getWsilFile();
    try {
        Document doc = Jsoup.parse(is, null, "", Parser.xmlParser());
        // extract and return the link here
        return doc.toString();
    } catch (Exception e) {
        e.printStackTrace();
        return null;
    }
}
I would like to read the following part from an XML file:
<wsil:service>
<wsil:abstract>Read the full documentation on: https://host/sap/bc/mdrs/cdo?type=psm_isi_r&objname=II_QUERY_PROJECT_IN&saml2=disabled</wsil:abstract>
<wsil:name>Query Projects</wsil:name>
<wsil:description location="host/sap/bc/srt/wsdl/srvc_00163E5E1FED1EE897C188AB4A5723EF/wsdl11/allinone/ws_policy/document?sap-vhost=host&saml2=disabled" referencedNamespace="http://schemas.xmlsoap.org/wsdl/"/>
</wsil:service>
I want to return the following URL as String
host/sap/bc/srt/wsdl/srvc_00163E5E1FED1EE897C188AB4A5723EF/wsdl11/allinone/ws_policy/document?sap-vhost=host&saml2=disabled
How can I do that ?
Thank you
If there is only one wsil:description tag, you can use this code:
doc.outputSettings().escapeMode(EscapeMode.xhtml);
String val = doc.select("wsil|description").attr("location");
The escape mode should be changed since you are working with XML rather than regular HTML.
If there is more than one tag with the given name, you can search for a distinct neighbouring element and find the required tag relative to it:
String val = doc.select("wsil|name:contains(Query Projects)").first().parent().select("wsil|description").attr("location");
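A self-contained sketch of the same idea, run against an inline copy of the XML from the question (the location URL is shortened here for readability; in a selector, '|' stands in for the ':' namespace separator):

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.parser.Parser;

public class WsilDemo {
    static String extractLocation(String xml) {
        // Parse as XML, not HTML, so the wsil: prefixes survive intact.
        Document doc = Jsoup.parse(xml, "", Parser.xmlParser());
        // Find the service by its name, then read its description's location.
        return doc.select("wsil|name:contains(Query Projects)")
                  .first().parent()
                  .select("wsil|description")
                  .attr("location");
    }

    public static void main(String[] args) {
        String xml = "<wsil:service>"
                + "<wsil:name>Query Projects</wsil:name>"
                + "<wsil:description location=\"host/sap/bc/srt/wsdl/document\" "
                + "referencedNamespace=\"http://schemas.xmlsoap.org/wsdl/\"/>"
                + "</wsil:service>";
        System.out.println(extractLocation(xml));
    }
}
```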

How to read a string which is written outside the HTML <> tags?

I have HTML code of 1000 lines and I want to extract the data that is written outside the HTML <> tags.
For example:
<>Java Programm<>
It should read only "Java Programm" and skip whatever is written inside the "<>" tags.
I tried the following code, but it reads the whole data including the <>, and I do not need the "<>" in my output.
public static void main(String[] args) throws Exception {
    try {
        FileInputStream fin = new FileInputStream("C:\\Users\\File.txt");
        int i;
        while ((i = fin.read()) != -1) {
            System.out.print((char) i);
        }
        fin.close();
    } catch (Exception e) {
        e.printStackTrace();
    }
}
You would need an HTML parser. For JSoup it's
File input = new File("C:\\Users\\File.txt");
Document doc = Jsoup.parse(input, "UTF-8", "http://example.com/");
Element body = doc.body(); //Get the body of the html
System.out.println(body.text()) ; //Get the all the text inside the body tag
This is one way to do it; simple enough :), and there are of course other ways. Note that this will leave out any text outside of the body tag. You can explore Jsoup further and find a solution for that as well.
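As a self-contained illustration (assuming Jsoup is on the classpath), parsing an inline snippet and printing only the text, with all tags stripped:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class BodyTextDemo {
    static String bodyText(String html) {
        Document doc = Jsoup.parse(html);
        return doc.body().text(); // all text inside <body>, tags stripped
    }

    public static void main(String[] args) {
        System.out.println(
                bodyText("<html><body><p>Java <b>Programm</b></p></body></html>"));
    }
}
```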

How to get link from ArrayList filling by Jsoup

I am trying to parse a website. After all the links are collected into an ArrayList, I want to parse them again, but I have trouble with initializing them.
This is my ArrayList:
public ArrayList<String> linkList = new ArrayList<String>();
How I collect links in "doInBackground":
try {
    Document doc = Jsoup.connect("http://forurl.com/archive/").get();
    Element links = doc.select("a[href]");
    for (Element link : links)
    {
        linkList.add(link.attr("abs:href"));
    }
}
In "onPostExecute" showing what I get:
lk.setText("Collected: " +linkList.size()); // showing how much is collected
lj.setText("First link: " +linkList.get(0)); // showing first link
Try to parse child links:
public class imgTread extends AsyncTask<Void, Void, Void> {
    Bitmap bitmap;
    String[] url = {"http://forurl.com/link1/",
            "http://forurl.com/link2/"}; // this way works well

    protected Void doInBackground(Void... params) {
        try {
            for (int i = 0; i < url.length; i++) {
                Document doc1 = Jsoup.connect(url[0]).get(); // connect to 1 link for example
                Elements img = doc1.select("#strip");
                String imgSrc = img.attr("src");
                InputStream input = new java.net.URL(imgSrc).openStream();
                bitmap = BitmapFactory.decodeStream(input);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }
I tried to make a String[] from the ArrayList, but it doesn't work:
String[] url = linkList.toArray(new String[linkList.size()]);
The output this way is Ljava.lang.String;#45ds364
The idea is: 1) collect all links from the url; 2) connect to them one by one and get the information I need.
The first point works, and the second too, but how do I tie them together?
Thanks for any advice.
Working code:
Document doc = Jsoup.connect(url).get(); // connect to the site
Elements links = doc.select("a[href]"); // get all links
String link_addr = links.get(3).attr("abs:href"); // choose the 3rd link
Document link_doc = Jsoup.connect(link_addr).get(); // connect to it
Elements img = link_doc.select("#strip"); // get all elements with id 'strip'
String imgSrc = img.attr("src"); // get the image url
InputStream input = new java.net.URL(imgSrc).openStream();
bitmap = BitmapFactory.decodeStream(input);
I hope this helps someone.
You are doing many unnecessary steps. You have a perfectly fine collection of Element objects in your Elements links object. Why do you have to add them to an ArrayList?
If I have understood your question correctly, your thought process should be something like this.
Get all the links from a URL
Establish a new connection to each link
Download all the images on that page where the element id = "strip".
Get all the links:
Document doc = Jsoup.connect(url).get();
Elements links = doc.select("a[href]");
doInBackground(links);
Call the doInBackground method with the links as a parameter:
public static void doInBackground(Elements links) throws IOException {
    try {
        for (Element element : links) {
            // Connect to the link
            Document doc = Jsoup.connect(element.attr("abs:href")).get();
            // Select all the elements with id 'strip'
            Elements img = doc.select("#strip");
            // Get the source of the image
            String imgSrc = img.attr("abs:src");
            // Open an InputStream
            InputStream input = new java.net.URL(imgSrc).openStream();
            // Get the image
            bitmap = BitmapFactory.decodeStream(input);
            ...
            // Perhaps save the image somewhere?
            // Close the InputStream
            input.close();
        }
    } catch (IllegalArgumentException e) {
        System.out.println(e.getMessage());
    } catch (MalformedURLException ex) {
        System.out.println(ex.getMessage());
    }
}
Of course, you will have to properly use AsyncTask and call the methods from preferred places, but this is the overall idea of how you can use Jsoup to do the job you want it to.
If you want to create an array instead of a list you can do:
try {
    Document doc = Jsoup.connect("http://forurl.com/archive/").get();
    Elements links = doc.select("a[href]");
    String[] array = new String[links.size()];
    for (int i = 0; i < links.size(); i++)
    {
        array[i] = links.get(i).attr("abs:href");
    }
}
First of all, what is that?
Why is the links variable, which holds a collection of Elements, declared as a single Element, while link is one member of that collection? Isn't that confusing?
Second, keep to the Java naming convention and start variable names with a lowercase letter, so change LinkList to linkList. Even the syntax highlighter got confused by it.
Third:
"I tried to make a String[] from the ArrayList, but it doesn't work."
Where are you trying to do that? I don't see it anywhere in the code.
Fourth, to create an array out of a List you have to do something like this:
String[] links = linksList.toArray(new String[linksList.size()]);
Fifth, change the title to something more appropriate, as the present one is very misleading (you have no trouble with Jsoup here).
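The List-to-array conversion itself is plain Java and can be checked in isolation. Printing the array reference directly gives the [Ljava.lang.String;@... form from the question, which is just Object.toString(), not a failed conversion; Arrays.toString shows the contents:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ToArrayDemo {
    static String[] toUrlArray(List<String> linkList) {
        return linkList.toArray(new String[linkList.size()]);
    }

    public static void main(String[] args) {
        List<String> linkList = new ArrayList<>();
        linkList.add("http://forurl.com/link1/");
        linkList.add("http://forurl.com/link2/");
        String[] url = toUrlArray(linkList);
        // System.out.println(url) would print [Ljava.lang.String;@<hash>.
        System.out.println(Arrays.toString(url)); // prints the actual links
    }
}
```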

JSoup core web text extraction

I am new to Jsoup; sorry if my question is too trivial.
I am trying to extract article text from http://www.nytimes.com/ but when printing the parsed document
I am not able to see any articles in the parsed output.
public class App
{
    public static void main(String[] args)
    {
        String url = "http://www.nytimes.com/";
        Document document;
        try {
            document = Jsoup.connect(url).get();
            System.out.println(document.html()); // Articles not getting printed
            //System.out.println(document.toString()); // Same here
            String title = document.title();
            System.out.println("title : " + title); // Title is fine
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
I have also tried to parse "http://en.wikipedia.org/wiki/Big_data" to retrieve the wiki data; same issue there as well, I am not getting the wiki content in the output.
Any help or hint will be much appreciated.
Thanks.
Here's how to get all <p class="summary"> text:
final String url = "http://www.nytimes.com/";
Document doc = Jsoup.connect(url).get();
for (Element element : doc.select("p.summary"))
{
    if (element.hasText()) // Skip those tags without text
    {
        System.out.println(element.text());
    }
}
If you need all <p> tags, without any filtering, you can use doc.select("p") instead. But in most cases it's better to select only those you need (see here for Jsoup Selector documentation).
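The same selection can be tried offline against an inline snippet (assuming Jsoup on the classpath; the sample markup is made up), which makes the filtering behaviour easy to see:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

public class SummaryDemo {
    static List<String> summaries(String html) {
        Document doc = Jsoup.parse(html);
        List<String> out = new ArrayList<>();
        for (Element element : doc.select("p.summary")) {
            if (element.hasText()) { // skip empty tags
                out.add(element.text());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<p class=\"summary\">First story</p>"
                + "<p class=\"summary\"></p>"
                + "<p>Not a summary</p>";
        System.out.println(summaries(html)); // only the non-empty summary
    }
}
```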

HTML Parser fetch link text

I'm using HTML Parser to fetch links from a web page. I need to store the URL, the link text and the URL of the parent page containing the link. I have managed to get the link URL as well as the parent URL.
I still need to get the link text.
Unfortunately I'm having a hard time figuring it out; any help would be greatly appreciated.
public static List<LinkContainer> findUrls(String resource) {
    String[] tagNames = {"A", "AREA"};
    List<LinkContainer> urls = new ArrayList<LinkContainer>();
    Tag tag;
    String url;
    String sourceUrl;
    try {
        for (String tagName : tagNames) {
            Parser parser = new Parser(resource);
            NodeList nodes = parser.parse(new TagNameFilter(tagName));
            NodeIterator i = nodes.elements();
            while (i.hasMoreNodes()) {
                tag = (Tag) i.nextNode();
                url = tag.getAttribute("href");
                sourceUrl = tag.getPage().getUrl();
                if (RegexUtil.verifyUrl(url)) {
                    urls.add(new LinkContainer(url, null, sourceUrl));
                }
            }
        }
    } catch (ParserException pe) {
        pe.printStackTrace();
    }
    return urls;
}
Have you tried ((LinkTag) tag).getLinkText()? Personally I prefer an HTML parser which produces XML according to a well-used standard, e.g. Xerces or similar. This is what you get from using e.g. http://nekohtml.sourceforge.net/.
You would need to check the children of each A tag. If you assume that your A tags only have a single child (the text itself), you can use the getFirstChild() method. This should be an instance of TextNode, and you can call getText() on it to get the link text.
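Combining both suggestions, a hedged sketch of how the while-loop body in findUrls could capture the link text as well (org.htmlparser types assumed; this is a fragment meant to slot into the existing loop, not a standalone program):

```java
// Fragment for the while-loop in findUrls; variables as declared there.
tag = (Tag) i.nextNode();
url = tag.getAttribute("href");
sourceUrl = tag.getPage().getUrl();

String linkText = "";
if (tag instanceof LinkTag) {
    // LinkTag exposes the text between <a> and </a> directly.
    linkText = ((LinkTag) tag).getLinkText();
} else if (tag.getFirstChild() instanceof TextNode) {
    // Fallback: assume a single text child, as suggested above.
    linkText = ((TextNode) tag.getFirstChild()).getText();
}

if (RegexUtil.verifyUrl(url)) {
    urls.add(new LinkContainer(url, linkText, sourceUrl));
}
```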
