Jsoup -- iterate over multiple elements simultaneously? - java

I am attempting to convert an HTML page with entries that have multiple types of details (e.g. name, phone number, and address) into a spreadsheet. I am able to isolate each of these details as Elements, but I cannot find a way to iterate over multiple Elements at once, so that names and phone numbers are printed next to one another rather than all the names being printed first and then all the phone numbers.
// Set the timeout on the same connection that performs the request
Document doc = Jsoup.connect(page).timeout(999999).get();
String title = doc.title();
System.out.println(title);
Elements names = doc.select("li a");
Elements ratings = doc.select("li img");
for (Element name : names) {
    if (name.attr("href").startsWith("/biz/")) {
        System.out.println(name.text());
    }
}
for (Element rating : ratings) {
    System.out.println(rating.attr("alt"));
}

Assuming the two lists line up index for index, this would work fine:
for (int i = 0; i < names.size() && i < ratings.size(); i++) {
    System.out.println("Name: " + names.get(i).text() + " Rating: " + ratings.get(i).attr("alt"));
}
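The index-based approach only lines up if both lists hold exactly one entry per listing, so any filtering (such as the "/biz/" check in the question) has to happen before pairing. Here is a minimal sketch of the pairing step in plain Java, assuming the names and ratings have already been extracted as strings. The class name PairRows, the method pairRows, and the sample values are hypothetical, and the comma-separated output is only one possible spreadsheet format (it does not escape commas inside values).

```java
import java.util.ArrayList;
import java.util.List;

public class PairRows {
    // Pairs names.get(i) with ratings.get(i), stopping at the shorter list
    // so a missing entry never causes an IndexOutOfBoundsException.
    static List<String> pairRows(List<String> names, List<String> ratings) {
        List<String> rows = new ArrayList<>();
        int n = Math.min(names.size(), ratings.size());
        for (int i = 0; i < n; i++) {
            rows.add(names.get(i) + "," + ratings.get(i));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the scraped values
        List<String> names = List.of("Cafe A", "Diner B");
        List<String> ratings = List.of("4.5 star rating", "3.0 star rating");
        pairRows(names, ratings).forEach(System.out::println);
    }
}
```

If the two lists can differ in length, the rows after the first gap will be misaligned; selecting each listing's container first and reading both details from inside it avoids that.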

Related

arraylist loop not displaying

I need to make a program that lets you add CD titles, remove them, etc.
I need to use an Arraylist for my program (to store the songs)
Code:
ArrayList<String> songlist = new ArrayList<>();
Collections.addAll(songlist, "Something", "Hard Days Night", "I am the Walrus", "Yesterday", "All the Lonely People");
Collections.sort(songlist);
int songlistsize = songlist.size();
for (int i = 0; i < songlistsize; i++) {
    outputField.setText(i + ": " + songlist.get(i));
}
The problem is that the program will only display "Yesterday", and not anything else.
outputField.setText(i + ": " + songlist.get(i));
Because you are setting the last value and not appending. Do something like this:
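StringBuilder string = new StringBuilder();
for (int i = 0; i < songlistsize; i++) {
    // Append each title instead of overwriting the previous one
    string.append(i).append(": ").append(songlist.get(i)).append("\n");
}
// toString() works whether setText takes a String or a CharSequence
outputField.setText(string.toString());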
There are many other problems with the code but I am sticking to the point.
If you try to print your output on the console you will see that the part that deals with the collection works fine.
But since setText() replaces the current String with the latest song name, you only see "Yesterday" because it's at the end of your collection.
That's why you should append() the next song name to your String, or copy your current String, add the next item, and finally call setText().
For example:
String string = "";
for (int i = 0; i < songlistsize; i++)
{
string = outputField.getText() + songlist.get(i);
outputField.setText(string);
}
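Both answers come down to building the whole text first and calling setText() once. Here is a minimal, UI-free sketch of that idea, assuming the result is later passed to a component that can show several lines. The class name SongListText and the method formatSongs are hypothetical, and the type of outputField is not stated in the question.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SongListText {
    // Returns the sorted titles as "index: title" lines joined by newlines.
    static String formatSongs(List<String> songs) {
        List<String> sorted = new ArrayList<>(songs); // leave the caller's list untouched
        Collections.sort(sorted);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < sorted.size(); i++) {
            if (i > 0) {
                sb.append('\n');
            }
            sb.append(i).append(": ").append(sorted.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> songs = List.of("Something", "Hard Days Night", "Yesterday");
        // In the program from the question this would be outputField.setText(...)
        System.out.println(formatSongs(songs));
    }
}
```

A single-line text field will not display the line breaks, so a multi-line component is probably the better fit for this output.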

Jsoup iterate over Elements causes duplicated output

I have a page from which I want to extract some data (specifically, some attributes of the tables' td elements).
I used a for loop to iterate over the elements and extract the attributes, but I get duplicated output.
The output should look like the image at the end of my post.
Document doc = Jsoup.connect("http://www.saudisale.com/SS_a_mpg.aspx").get();
Elements elements = doc.select("table").select("tbody").select("tr").select("td") ;
for(Element e:elements) {
System.out.println(e.select("span[id~=Label4]").text() +
"\t" + e.select("input[id$=ImageButton1]").attr("src") +
"\t" + "" + e.select("span[id~=Label13]").text());
}
This is the output I get; the rows are duplicated:
The output should look like this:
Would you please try below code?
Elements description = doc.select("tbody");
doc=Jsoup.parse(description.html());
description = doc.select("td");
for(int j = 0; j < description.size(); ++ j)
{
String bodytext = description.eq(j).text(); // bodytext is the text of each TD
}
I used a for loop with an incrementing counter instead, and that solved the problem.
where 31 is the number of items on that page
The following code gives the desired output.
for(int i=1;i<description.size();i++)
{
    System.out.println(
        elements.select("td").select("span[id~=Label4]").get(i).text()
        + ""
        + elements.select("td").select("input[id$=ImageButton1]").get(i).attr("src"));
}

JSoup get specific data from webpage

I've been trying to get data from: http://www.betvictor.com/sports/en/to-lead-anytime, where I would like to get the list of matches using JSoup.
For example:
Caen v AS Saint Etienne
Celtic v Rangers
and so on...
My current code is:
String couponPage = "http://www.betvictor.com/sports/en/to-lead-anytime";
Document doc1 = Jsoup.connect(couponPage).get();
String match = doc1.select("#coupon_143751140 > table:nth-child(3) > tbody > tr:nth-child(2) > td.event_description").text();
System.out.println("match:" + match);
Once I can figure out how to get one item of data, I will put it in a for loop to loop through the whole table, but first I need to get one item of data.
Currently, the output is "match: " so it looks like the "match" variable is empty.
Any help is most appreciated,
I worked out the answer to my question after a few hours of experimenting. It turns out the page didn't load properly straight away, and I had to use the timeout method.
Document doc;
try {
    // The page can be slow to respond, so allow up to 10 seconds
    doc = Jsoup.connect("http://www.betvictor.com/sports/en/football/coupons/100/0/0/43438/0/100/0/0/0/0/1").timeout(10000).get();
    // Select the link inside each event description cell
    Elements matches = doc.select("td.event_description a");
    // Same selector as matches; not used below
    Elements odds = doc.select("td.event_description a");
    for (Element match : matches) {
        // The link text has the form "Team1 v Team2"
        String matchEvent = match.text();
        String[] parts = matchEvent.split(" v ");
        String team1 = parts[0];
        String team2 = parts[1];
        System.out.println("text : " + team1 + " v " + team2);
    }
} catch (IOException e) {
    // Jsoup's get() throws IOException on network or HTTP errors
    e.printStackTrace();
}
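The split on " v " throws an ArrayIndexOutOfBoundsException if a link's text does not contain the separator. Here is a minimal sketch of a more defensive version in plain Java, assuming the text has already been read from the element. The class name MatchNames and the method splitTeams are hypothetical.

```java
import java.util.Optional;

public class MatchNames {
    // Splits "Team1 v Team2" into its two team names.
    // Returns Optional.empty() when the separator or either name is missing.
    static Optional<String[]> splitTeams(String matchEvent) {
        String[] parts = matchEvent.split(" v ", 2);
        if (parts.length != 2
                || parts[0].trim().isEmpty()
                || parts[1].trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { parts[0].trim(), parts[1].trim() });
    }

    public static void main(String[] args) {
        splitTeams("Caen v AS Saint Etienne")
            .ifPresent(t -> System.out.println("text : " + t[0] + " v " + t[1]));
    }
}
```

Rows that are not head-to-head fixtures can then be skipped instead of stopping the whole loop.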

using a regex in jsoup

I'm trying my first serious project in jsoup and I've got stuck in this matter-
I'm trying to get zipcodes from a site. There is a list of zipcodes.
Here is one of the lines that presents the zipcode-
<td align="center">33011</td>
So the idea I've got is to go through the page and get all the strings that consist of exactly 5 digits from 0-9. The regex is ^[0-9]{5,5}$
code was -
doc.select("td:matchesOwn(^[0-9]{5,5}$)");
but nothing came out. I can't find the way to get these zipcodes out of that site....
Does anyone know how to do it?
The real question here is: how do I get the numbers that are not in any tags, but just written out in the open? (I guess there is a term for that, but I'm not that good with XML terms.)
I solved it using Element#getElementsMatchingOwnText:
public static void main(String[] args) {
final String html = "<td align=\"center\">33011</td> ";
final Elements elements = Jsoup.parse(html).getElementsMatchingOwnText("^[0-9]{5,5}$");
for (final Element element : elements) {
System.out.println("element = [" + element + "]");
System.out.println("zip = [" + element.text() + "]");
}
}
Output:
element = [33011]
zip = [33011]
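For numbers that sit in running text rather than in a cell of their own, an anchored pattern such as ^[0-9]{5,5}$ will not match, because the text contains more than the number. Here is a minimal sketch using java.util.regex on plain text, assuming the text has already been extracted from the page. The class name ZipCodes, the method extractZips, and the sample string are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ZipCodes {
    // Exactly five digits, not preceded or followed by another digit,
    // so longer numbers such as 123456 are not partially matched.
    private static final Pattern FIVE_DIGITS =
        Pattern.compile("(?<!\\d)\\d{5}(?!\\d)");

    static List<String> extractZips(String text) {
        List<String> zips = new ArrayList<>();
        Matcher m = FIVE_DIGITS.matcher(text);
        while (m.find()) {
            zips.add(m.group());
        }
        return zips;
    }

    public static void main(String[] args) {
        System.out.println(extractZips("Miami 33011, id 123456, LA 90210"));
    }
}
```

This treats any standalone five-digit number as a zip code, so other five-digit values on the page would need to be filtered out separately.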

WordNet(JWI MIT) : Antonyms of a word?

I am using the open source WordNet dictionary (www.wordnet.princeton.edu) with the JWI library (www.projects.csail.mit.edu/jwi/api/edu/mit/jwi).
I am unable to find out antonyms of a word. People claim that this is a very good dictionary but I could not find my words in it. I need Antonyms and other related words. Good descriptions and other vocabulary info but I am unable to find what I need.
Here is my code:
List<IWordID> wordIDList = indexWordList.get(0).getWordIDs();
for(int idIndex = 0; idIndex < wordIDList.size(); idIndex++)
{
IWordID wordID = wordIDList.get(idIndex);
IWord word = m_Dict.getWord(wordID);
System.out.println("Id = " + wordID);
System.out.println(" Lemma = " + word.getLemma());
System.out.println(" Gloss = " + word.getSynset().getGloss());
ISynset synset = word.getSynset();
String LexFileName = synset.getLexicalFile().getName();
System.out.println("Lexical Name : " + LexFileName);
/** Finding stem for the word. */
WordnetStemmer stem = new WordnetStemmer(m_Dict);
//System.out.println("test" + stem.findStems(key, POS.NOUN));
ArrayList<String> antonymsList = new ArrayList<String>();
List<IWordID> relatedWords = word.getRelatedWords();
Map<IPointer, List<IWordID>> map = word.getRelatedMap();
AdjMarker marker = word.getAdjectiveMarker();
for (IWordID antonym : word.getRelatedWords()) {
String meaning = m_Dict.getWord(antonym).getLemma();
antonymsList.add(meaning);
System.out.println("Antonym: " + meaning);
System.out.println("Antonym POS: " + m_Dict.getWord(antonym).getPOS());
}
}
What I actually need:
I need suggestions on how I can get that relevant information from WordNet. Also, I am open to any other API or library that provides an up-to-date dictionary with antonyms, synonyms, and well-written descriptions. Every suggestion is appreciated.
Use IWord#getRelatedMap to get a java.util.Map<IPointer, java.util.List<IWordID>>. This map holds the relations between the current word and other words, keyed by pointer type.
Check whether the map contains Pointer#ANTONYM.
Take a look at Artha, a WordNet interface, to check the correctness of your dictionary lookup results.
There is no direct way to get a list of all words. I have used a hack:
sed 's/^ *//' index.adj | cut -f1 -d' '
Do this for all index files: index.adj, index.adv, index.noun, index.sense, index.verb
