I am using the open-source WordNet dictionary (www.wordnet.princeton.edu) with the JWI library (www.projects.csail.mit.edu/jwi/api/edu/mit/jwi).
I am unable to find the antonyms of a word. People say this is a very good dictionary, but I could not find what I need in it. It gives good descriptions and other vocabulary information, but I need antonyms and other related words, and I cannot work out how to get them.
Here is my code:
List<IWordID> wordIDList = indexWordList.get(0).getWordIDs();
for(int idIndex = 0; idIndex < wordIDList.size(); idIndex++)
{
IWordID wordID = wordIDList.get(idIndex);
IWord word = m_Dict.getWord(wordID);
System.out.println("Id = " + wordID);
System.out.println(" Lemma = " + word.getLemma());
System.out.println(" Gloss = " + word.getSynset().getGloss());
ISynset synset = word.getSynset();
String LexFileName = synset.getLexicalFile().getName();
System.out.println("Lexical Name : " + LexFileName);
/** Finding stem for the word. */
WordnetStemmer stem = new WordnetStemmer(m_Dict);
//System.out.println("test" + stem.findStems(key, POS.NOUN));
ArrayList<String> antonymsList = new ArrayList<String>();
List<IWordID> relatedWords = word.getRelatedWords();
Map<IPointer, List<IWordID>> map = word.getRelatedMap();
AdjMarker marker = word.getAdjectiveMarker();
for (IWordID antonym : word.getRelatedWords()) {
String meaning = m_Dict.getWord(antonym).getLemma();
antonymsList.add(meaning);
System.out.println("Antonym: " + meaning);
System.out.println("Antonym POS: " + m_Dict.getWord(antonym).getPOS());
}
}
What I actually need:
I need suggestions on how to get that information from WordNet. Also, **I am open to any other API or library that provides an up-to-date dictionary with antonyms, synonyms and well-written descriptions.** Every suggestion is appreciated.
Use IWord#getRelatedMap to get a java.util.Map<IPointer, java.util.List<IWordID>>. This map holds the relations between the current word (lemma) and other words, keyed by pointer type.
Check whether the map contains the key Pointer#ANTONYM.
Take a look at Artha, a WordNet interface, to check your dictionary lookup results against.
There is no direct way to get a list of all words. I used a hack:
sed 's/^ *//' index.adj | cut -f1 -d' '
Do this for all the index files: index.adj, index.adv, index.noun, index.sense, index.verb
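The shell hack above can also be done from Java. This is only a sketch under stated assumptions: it assumes that entry lines in the index files start with the lemma followed by a space, and that header/license lines start with a space and should be skipped. The class name, method name and sample lines are made up for illustration. index.sense uses a different layout, so its first field is not a plain lemma.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class IndexLemmas {

    // Returns the first space-separated field of every entry line.
    // Assumption: lines that are empty or start with a space are headers.
    static List<String> extractLemmas(List<String> lines) {
        List<String> lemmas = new ArrayList<>();
        for (String line : lines) {
            if (line.isEmpty() || line.charAt(0) == ' ') {
                continue; // skip header/license lines
            }
            int space = line.indexOf(' ');
            lemmas.add(space < 0 ? line : line.substring(0, space));
        }
        return lemmas;
    }

    public static void main(String[] args) {
        // Made-up lines in the assumed shape of an index file, not real data.
        List<String> sample = Arrays.asList(
                "  1 license text line",
                "good a 1 0 1 0 00000001",
                "well_known a 1 0 1 0 00000002");
        System.out.println(extractLemmas(sample));
    }
}
```

To run it on a real file, pass the result of java.nio.file.Files.readAllLines for index.adj and the other index files instead of the sample list.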
If I have the following URL:
http://www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0
How can I get the name of the plugin (simply named wordpressplugin in the URL) and the version so the output will be - wordpressplugin ver 1.0?
I am posting my comment as an answer
String s = "http://www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0";
String[] ary = s.split("/");
System.out.println(ary[5] + " " + ary[7]);
This is the easiest way for the URL in your question;
for more flexible matching you would have to use a regex.
You may do it like so, using Regex support in Java.
String url = "http://www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0";
Pattern pattern = Pattern.compile("(.*plugins/)(.*)(/\\d{3}/)(ver.*)");
Matcher matcher = pattern.matcher(url);
if (matcher.matches()) {
System.out.println("Plugin: " + matcher.group(2));
System.out.println("Version: " + matcher.group(4));
}
Notice the use of capture groups. Here's the output.
Plugin: wordpressplugin
Version: ver=1.0
You should have a look into Regular Expressions (in Oracle tutorials), which are the general tool in any programming language to get/match sub-strings out of a larger string (which follows some more or less fixed format).
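As a minimal sketch of that regex approach, the pattern below anchors on the literal "/plugins/" and "ver=" parts of the URL instead of on fixed positions, and prints the output in the form the question asks for. The class name, method name and pattern are my own assumptions about how the URL is structured, not something given in the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PluginUrl {

    // Group 1: the path segment right after "/plugins/".
    // Group 2: whatever follows "ver=" up to the next '/' or '&'.
    private static final Pattern PLUGIN =
            Pattern.compile("/plugins/([^/]+)/.*?ver=([^/&]+)");

    // Returns "<plugin> ver <version>", or null if the URL does not match.
    static String describe(String url) {
        Matcher m = PLUGIN.matcher(url);
        if (!m.find()) {
            return null;
        }
        return m.group(1) + " ver " + m.group(2);
    }

    public static void main(String[] args) {
        String url = "http://www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0";
        System.out.println(describe(url)); // wordpressplugin ver 1.0
    }
}
```

Because the pattern does not depend on the number of path segments, it keeps working if the numeric part of the URL changes length or the host gains a sub-path.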
Because you say you are new to Java, here is a very simple answer that should suit your skill level:
String url = "http://www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0";
String search = "plugins/";
int index = url.indexOf(search);
String pluginName, version;
if (index > -1)
{
    index += search.length(); // move past "plugins/"
    pluginName = url.substring(index, url.indexOf("/", index + 1));
    search = "ver=";
    index = url.indexOf(search);
    if (index > -1)
    {
        version = url.substring(index + search.length()); // everything after "ver="
        System.out.println(pluginName + " " + version);
    }
}
PS: This works only if your URL format always stays the same!
The fastest way to solve this problem is to take advantage of the split method of Strings. Just study the method below carefully, it's basic.
public String getVersionNumber(String url){
String[] arr0 = url.split("//");
//The code above returns an array of two strings: "http:" and "www.example.com/wordpress/plugins/wordpressplugin/123/ver=1.0"
String[] arr1 = arr0[1].split("/");
//The code above returns an array of six strings: "www.example.com", "wordpress", "plugins", "wordpressplugin", "123" and "ver=1.0".
return String.format("%s %s", arr1[3], arr1[5]);
//OUTPUT: wordpressplugin ver=1.0
//I simply returned what I needed.
}
I hope this helps.. merry coding!
Here are some lines from a file and I'm not sure how to parse it to extract 4 pieces of information.
11::American President, The (1995)::Comedy|Drama|Romance
12::Dracula: Dead and Loving It (1995)::Comedy|Horror
13::Balto (1995)::Animation|Children's
14::Nixon (1995)::Drama
I would like to get the number, title, release date and genre.
Genre has multiple genres so I would like to save each one in a variable as well.
I'm using the .split("::|\\|"); method to parse it but I'm not able to parse out the release date.
Can anyone help me?
The easiest would be matching by regex, something like this
String x = "11::Title (2016)::Category";
Pattern p = Pattern.compile("^([0-9]+)::([a-zA-Z ]+)\\(([0-9]{4})\\)::([a-zA-Z]+)$");
Matcher m = p.matcher(x);
if (m.find()) {
System.out.println("Number: " + m.group(1) + " Title: " + m.group(2) + " Year: " + m.group(3) + " Categories: " + m.group(4));
}
(Please don't hold me to the exact syntax; this is off the top of my head.)
The first capture group will be the number, the second the title, the third the year and the fourth the set of categories, which you can then split by '|'.
You may need to adjust the valid characters for the title and categories, but you should get the idea.
If you have multiple lines, split them into an ArrayList first and handle each one separately in a loop.
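Here is one possible adjustment of that pattern for the sample lines in the question, whose titles contain commas and colons and whose genres contain '|' and apostrophes. It is a sketch only: the class and method names are made up, and it assumes every line ends its title with a four-digit year in parentheses.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MovieLineParser {

    // number :: title (year) :: genre|genre|...
    private static final Pattern LINE =
            Pattern.compile("^(\\d+)::(.+) \\((\\d{4})\\)::(.+)$");

    // Returns {number, title, year, genres}, or null if the line does not match.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    public static void main(String[] args) {
        String[] lines = {
            "11::American President, The (1995)::Comedy|Drama|Romance",
            "12::Dracula: Dead and Loving It (1995)::Comedy|Horror",
        };
        for (String line : lines) {
            String[] fields = parse(line);
            if (fields == null) {
                continue; // skip lines that do not follow the format
            }
            String[] genres = fields[3].split("\\|"); // '|' must be escaped in a regex
            System.out.println(fields[0] + " / " + fields[1] + " / " + fields[2]
                    + " / " + Arrays.toString(genres));
        }
    }
}
```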
Try this
String[] s = {
"11::American President, The (1995)::Comedy|Drama|Romance",
"12::Dracula: Dead and Loving It (1995)::Comedy|Horror",
"13::Balto (1995)::Animation|Children's",
"14::Nixon (1995)::Drama",
};
for (String e : s) {
String[] infos = e.split("::|\\s*\\(|\\)::");
String number = infos[0];
String title = infos[1];
String releaseDate = infos[2];
String[] genres = infos[3].split("\\|");
System.out.printf("number=%s title=%s releaseDate=%s genres=%s%n",
number, title, releaseDate, Arrays.toString(genres));
}
output
number=11 title=American President, The releaseDate=1995 genres=[Comedy, Drama, Romance]
number=12 title=Dracula: Dead and Loving It releaseDate=1995 genres=[Comedy, Horror]
number=13 title=Balto releaseDate=1995 genres=[Animation, Children's]
number=14 title=Nixon releaseDate=1995 genres=[Drama]
I've been trying to get data from: http://www.betvictor.com/sports/en/to-lead-anytime, where I would like to get the list of matches using JSoup.
For example:
Caen v AS Saint Etienne
Celtic v Rangers
and so on...
My current code is:
String couponPage = "http://www.betvictor.com/sports/en/to-lead-anytime";
Document doc1 = Jsoup.connect(couponPage).get();
String match = doc1.select("#coupon_143751140 > table:nth-child(3) > tbody > tr:nth-child(2) > td.event_description").text();
System.out.println("match:" + match);
Once I can figure out how to get one item of data, I will put it in a for loop to loop through the whole table, but first I need to get one item of data.
Currently, the output is "match: " so it looks like the "match" variable is empty.
Any help is most appreciated.
I worked out the answer to my question after a few hours of experimenting. It turned out the page did not load properly straight away, so I had to set a timeout with the "timeout" method.
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

Document doc;
try {
    // need http protocol; wait up to 10 seconds for the page
    doc = Jsoup.connect("http://www.betvictor.com/sports/en/football/coupons/100/0/0/43438/0/100/0/0/0/0/1").timeout(10000).get();
    // get all match links
    Elements matches = doc.select("td.event_description a");
    for (Element match : matches) {
        // get the link text, e.g. "Caen v AS Saint Etienne"
        String matchEvent = match.text();
        String[] parts = matchEvent.split(" v ");
        String team1 = parts[0];
        String team2 = parts[1];
        System.out.println("text : " + team1 + " v " + team2);
    }
} catch (IOException e) {
    e.printStackTrace();
}
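The loop above reads parts[1] without checking that the link text really contains " v ", which would throw an ArrayIndexOutOfBoundsException on any other link. A small guard, sketched here with a made-up helper name and no Jsoup dependency, avoids that:

```java
public class MatchNames {

    // Splits "Team A v Team B" into {"Team A", "Team B"}.
    // Returns null when the text does not contain a " v " separator.
    static String[] splitTeams(String matchEvent) {
        String[] parts = matchEvent.trim().split("\\s+v\\s+", 2);
        return parts.length == 2 ? parts : null;
    }

    public static void main(String[] args) {
        String[] teams = splitTeams("Caen v AS Saint Etienne");
        if (teams != null) {
            System.out.println(teams[0] + " | " + teams[1]);
        }
    }
}
```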
I am parsing a site's usernames and other information, and each one has a bunch of spaces after it (as well as single spaces between the words, which I want to keep).
For example: "Bob the Builder " or "Sam the welder ". The number of spaces varies from name to name. I figured I'd just use .trim(), since I've used this before.
However, it's giving me trouble. My code looks like this:
for (int i = 0; i < splitSource3.size(); i++) {
splitSource3.set(i, splitSource3.get(i).trim());
}
The result is just the same; no spaces are removed at the end.
Thank you in advance for your excellent answers!
UPDATE:
The full code is a bit more complicated, since there are HTML tags that are parsed out first. It goes exactly like this:
for (String s : splitSource2) {
if (s.length() > "<td class=\"dddefault\">".length() && s.substring(0, "<td class=\"dddefault\">".length()).equals("<td class=\"dddefault\">")) {
splitSource3.add(s.substring("<td class=\"dddefault\">".length()));
}
}
System.out.println("\n");
for (int i = 0; i < splitSource3.size(); i++) {
splitSource3.set(i, splitSource3.get(i).substring(0, splitSource3.get(i).length() - 5));
splitSource3.set(i, splitSource3.get(i).trim());
System.out.println(i + ": " + splitSource3.get(i));
}
}
UPDATE:
Calm down. I never said the fault lay with Java, and I never said it was a bug or broken or anything. I simply said I was having trouble with it and posted my code for you to collaborate on and help solve my issue. Note the phrase "my issue" and not "java's issue". I have actually had the code printing out
System.out.println(i + ": " + splitSource3.get(i) + "*");
in a for each loop afterward.
This is how I knew I had a problem.
By the way, the problem has still not been fixed.
UPDATE:
Sample output (minus single quotes):
'0: Olin D. Kirkland '
'1: Sophomore '
'2: Someplace, Virginia 12345<br />VA SomeCity<br />'
'3: Undergraduate '
EDIT the OP rephrased his question at Query about the trim() method in Java, where the issue was found to be Unicode whitespace characters which are not matched by String.trim().
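To illustrate that finding: String.trim() only strips characters up to U+0020, so a no-break space (U+00A0, which is what &nbsp; decodes to) survives it. The sketch below uses a hypothetical helper, trimUnicode, that also strips Unicode separator characters; the sample name is taken from the question, and the assumption that the trailing characters are no-break spaces is mine.

```java
public class UnicodeTrim {

    // Strips leading and trailing ASCII whitespace and Unicode separators
    // (category Z, which includes the no-break space U+00A0).
    static String trimUnicode(String s) {
        return s.replaceAll("^[\\s\\p{Z}]+|[\\s\\p{Z}]+$", "");
    }

    public static void main(String[] args) {
        String name = "Bob the Builder\u00A0\u00A0";
        System.out.println("[" + name.trim() + "]");       // trailing U+00A0 still there
        System.out.println("[" + trimUnicode(name) + "]"); // [Bob the Builder]
    }
}
```

Spaces inside the name are untouched because the pattern is anchored to the start and end of the string.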
It just occurred to me that I used to have this sort of issue when I worked on a screen-scraping project. The key is that the downloaded HTML source sometimes contains non-printable characters that are not whitespace characters either. These are very difficult to copy and paste into a browser. I assume that this could have happened to you.
If my assumption is correct then you've got two choices:
Use a binary reader and figure out what those characters are - and delete them with String.replace(); E.g.:
private static String cutCharacters(String fromHtml) {
    String result = fromHtml;
    char[] problematicCharacters = {'\000', '\001', '\003'}; // this could be a private static final constant too
    for (char ch : problematicCharacters) {
        // replace(CharSequence, CharSequence) removes every occurrence of the character
        result = result.replace(String.valueOf(ch), "");
    }
    return result;
}
If you find some sort of recurring pattern in the HTML to be parsed, then you can use regexes and substrings to cut out the unwanted parts. E.g.:
private String getImportantParts(String fromHtml) {
Pattern p = Pattern.compile("(\\w*\\s*)"); //this could be a private static final constant as well.
Matcher m = p.matcher(fromHtml);
StringBuilder buff = new StringBuilder();
while (m.find()) {
buff.append(m.group(1));
}
return buff.toString().trim();
}
Works without a problem for me.
Here is your code, refactored a bit and (maybe) more readable:
final String openingTag = "<td class=\"dddefault\">";
final String closingTag = "</td>";
List<String> splitSource2 = new ArrayList<String>();
splitSource2.add(openingTag + "Bob the Builder " + closingTag);
splitSource2.add(openingTag + "Sam the welder " + closingTag);
for (String string : splitSource2) {
System.out.println("|" + string + "|");
}
List<String> splitSource3 = new ArrayList<String>();
for (String s : splitSource2) {
if (s.length() > openingTag.length() && s.startsWith(openingTag)) {
String nameWithoutOpeningTag = s.substring(openingTag.length());
splitSource3.add(nameWithoutOpeningTag);
}
}
System.out.println("\n");
for (int i = 0; i < splitSource3.size(); i++) {
String name = splitSource3.get(i);
int closingTagBegin = splitSource3.get(i).length() - closingTag.length();
String nameWithoutClosingTag = name.substring(0, closingTagBegin);
String nameTrimmed = nameWithoutClosingTag.trim();
splitSource3.set(i, nameTrimmed);
System.out.println("|" + splitSource3.get(i) + "|");
}
I know that's not a real answer, but I cannot post comments and this code would not fit in a comment, so I made it an answer so that Olin Kirkland can check his code.
I am having an issue where Java reads a list of numbers or strings from a YAML file and interprets a number as octal if it has a leading 0 and contains no digit 8 or 9.
Is there a way to force Java to read the YAML field as a string?
code:
ArrayList recordrarray = (ArrayList) sect.get("recordnum");
if (recordrarray != null) {
recno = join (recordrarray, " ");
}
HAVE ALSO TRIED:
Iterator<String> iter = recordrarray.iterator();
if (iter.hasNext()) recno = " " +String.valueOf(iter.next());
System.out.println(" this recnum:" + recno);
while (iter.hasNext()){
recno += "" + String.valueOf(iter.next());
System.out.println(" done recnum:" + String.valueOf(iter.next()));
}
the input is such:
061456 changes to 25390
061506 changes to 25414
061559 -> FINE
It took a while to figure out what it was doing, and apparently this is a common issue in Java.
Any ideas?
Thanks
edit: using jvyaml
yaml:
22:
  country_code: ' '
  description: ''
  insection: 1
  recordnum:
  - 061264
  type: misc
yaml loading:
import org.jvyaml.YAML;
Map structure = new HashMap();
structure = (Map) YAML.load(new FileReader(structurefn)); // load the structure file
Where are you reading the file? The problem lies in where the file contents are being read. Most likely the recordrarray list contains integers, i.e. they have already been parsed. Find the place where the records are being read. Maybe you are doing something like this:
int val = Integer.decode(record); // decode() treats a leading 0 as an octal prefix
Use this instead:
int val = Integer.parseInt(record, 10);
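For reference, this sketch shows how the JDK's own parsers treat a leading zero, using the first value from the question. Integer.decode applies the octal rule, while Integer.parseInt always uses the radix it is given (10 when none is passed). It says nothing about what jvyaml does internally. If the conversion happens inside the YAML loader, quoting the scalar in the file (for example - '061264') should make it load as a string, which is an assumption worth checking against your loader.

```java
public class OctalDemo {

    public static void main(String[] args) {
        String record = "061456";

        // decode() follows Java literal rules: a leading 0 means octal.
        System.out.println(Integer.decode(record));       // 25390

        // parseInt() without a radix is always decimal.
        System.out.println(Integer.parseInt(record));     // 61456

        // Explicit radix 8 reproduces the value seen in the question.
        System.out.println(Integer.parseInt(record, 8));  // 25390
    }
}
```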