I'm trying to implement the Lesk Algorithm for word sense disambiguation using WordNet and its Java API, JWI. One of the steps requires building a bag of words from the gloss and example sentences of the target word. I can easily get the gloss from the getGloss() method in the ISynset class, but I don't see a method to get the example sentences. I'm sure I'm missing something obvious since JWI is described as "full-featured" on WordNet's site, but I can't find anything useful in the documentation or on the internet. How do I get those sentences?
It may not be there. Examples are attached to synsets (e.g. they are a sibling function to getting lemmas and definitions in the NLTK API), but the 2.4.0 JWI docs for synset only have functions for getGloss() and getWords().
(If it turns out there is a way to get them from JWI, can someone leave me a comment, and I'll then delete this answer.)
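In the meantime, one possible workaround: in the WordNet data, the gloss string packs the definition and the quoted example sentences together, separated by semicolons, so you can try to recover the examples by splitting the string that getGloss() returns. A rough sketch of that idea (the parsing is my own assumption about the gloss format, not a JWI feature):

```java
import java.util.ArrayList;
import java.util.List;

import edu.mit.jwi.item.ISynset;

public class GlossExamples {

    // Sketch: WordNet glosses look like `definition; "example"; "example"`,
    // so any semicolon-separated part wrapped in quotes is treated as an example.
    static List<String> exampleSentences(ISynset synset) {
        List<String> examples = new ArrayList<String>();
        for (String part : synset.getGloss().split(";")) {
            part = part.trim();
            if (part.length() > 1 && part.startsWith("\"") && part.endsWith("\"")) {
                examples.add(part.substring(1, part.length() - 1)); // strip surrounding quotes
            }
        }
        return examples;
    }
}
```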
I am currently working on the Android platform with a Parse server backend. I am writing a function in which, if I enter a character, it shows all the entries that contain that letter.
For example, I would get all the names starting with or containing the character "p".
I would really appreciate it if you could suggest or point me to some examples.
You should use the documentation as your first resource. Parse is pretty well documented, and the ParseQuery class has a function specifically for this.
http://parseplatform.org/Parse-SDK-Android/api/com/parse/ParseQuery.html#whereContains(java.lang.String,%20java.lang.String)
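A rough sketch of how that could look ("Names" and the "name" column are placeholders for your own class and field):

```java
import java.util.List;

import android.util.Log;

import com.parse.FindCallback;
import com.parse.ParseException;
import com.parse.ParseObject;
import com.parse.ParseQuery;

public class NameSearch {

    // Finds all entries whose "name" column contains the given letter.
    void findEntriesContaining(String letter) {
        ParseQuery<ParseObject> query = ParseQuery.getQuery("Names");
        query.whereContains("name", letter);         // names containing the letter anywhere
        // query.whereStartsWith("name", letter);    // use this instead for "starting with"
        query.findInBackground(new FindCallback<ParseObject>() {
            @Override
            public void done(List<ParseObject> results, ParseException e) {
                if (e == null) {
                    for (ParseObject entry : results) {
                        Log.d("ParseDemo", entry.getString("name"));
                    }
                } else {
                    Log.e("ParseDemo", "Query failed", e);
                }
            }
        });
    }
}
```

Note that, as far as I remember, whereContains does a case-sensitive substring match and cannot use an index, so it can be slow on large classes; whereStartsWith covers the "starting with" case more efficiently.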
Not sure if I should ask this here, but I can't figure it out.
I saw the issue first on Wikipedia's "Meme" article (https://en.wikipedia.org/wiki/Meme). There are several special characters for pronunciation that don't appear in the extract queried with the MediaWiki API (https://en.wikipedia.org/w/api.php?format=jsonfm&action=query&prop=revisions|extracts&redirects=true&titles=meme).
I couldn't find a solution in the MediaWiki API documentation or in alternatives (I tried jsoup to parse the entire page, but I couldn't reliably extract the article content I need the way the extract query does).
The extracts API tries to sanitize the text in various ways to make it more readable (you might have noticed that the sentences in italics preceding the pronunciation do not show either). Part of that is removing everything with the noexcerpt class, which includes the pronunciation spelling. (In the future, text in parentheses might be removed completely to handle metadata creep.)
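If you do need the pronunciation, one workaround is to go back to jsoup but only grab the lead paragraph of the rendered article instead of the whole page. A minimal sketch, with the caveat that the CSS selector is a guess at Wikipedia's current markup, not a stable API:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MemeLead {
    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("https://en.wikipedia.org/wiki/Meme").get();
        Element lead = doc.select("#mw-content-text p").first(); // first paragraph of the article body
        System.out.println(lead.text());                          // keeps the pronunciation the extract drops
    }
}
```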
I have a Java (Lucene 4) based application and a set of keywords fed into the application as a search query (the terms may include more than one word, e.g. "memory", "old house", "European Union law", etc.).
I need a way to get the list of matched keywords out of an indexed document and possibly also get keyword positions in the document (also for the multi-word keywords).
I tried the Lucene highlight package, but I need only the keywords themselves, without any surrounding text. It also returns multi-word keywords as separate fragments.
I would greatly appreciate any help.
There's a similar (possibly same) question here:
Get matched terms from Lucene query
Did you see this?
The solution suggested there is to disassemble a complicated query into simpler queries, until you get down to a TermQuery, and then check it via searcher.explain(query, docId) (because if it matches, you know that term occurs in the document).
I don't think it's very efficient, but it worked for me until I ran into SpanQueries. It might be enough for you.
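A rough sketch of that approach for Lucene 4: break the query into its terms and keep the ones that explain() confirms for the document. It assumes the query can be rewritten against the reader so that extractTerms() is supported; getting positions for the matched (multi-word) keywords would take extra work, e.g. reading the postings or term vectors for those terms.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

import org.apache.lucene.index.Term;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class MatchedTerms {

    // Returns the individual terms of the query that actually match the given document.
    static List<String> matchedTerms(IndexSearcher searcher, Query query, int docId) throws Exception {
        Set<Term> terms = new HashSet<Term>();
        query.rewrite(searcher.getIndexReader()).extractTerms(terms);
        List<String> matched = new ArrayList<String>();
        for (Term term : terms) {
            Explanation explanation = searcher.explain(new TermQuery(term), docId);
            if (explanation.isMatch()) {
                matched.add(term.field() + ":" + term.text());
            }
        }
        return matched;
    }
}
```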
I want to identify all the names in any text; currently I am using IMDB movie reviews.
I am using the Stanford POS tagger and analysing all the proper nouns (as proper nouns are the names of people, things, and places), but this is slow.
First I tag all the input lines, then I check for all the words whose tag ends in NNP, which is a slow process.
Is there a more efficient way to achieve this task? Any library (preferably in Java)?
Thanks.
Do you know the input language? If so, you could match each word against a dictionary and flag the word as a proper noun if it is not in the dictionary. This would require a complete dictionary with all the declensions of each word of the language, and you would have to pay attention to numbers and other special cases.
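A rough sketch of that idea ("dictionary.txt" is a placeholder for a word list covering every inflected form of the language):

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NameCandidates {

    // Flags capitalized tokens that are absent from the dictionary as likely names.
    static List<String> candidateNames(String text) throws Exception {
        Set<String> dictionary = new HashSet<String>(
                Files.readAllLines(Paths.get("dictionary.txt"), StandardCharsets.UTF_8));
        List<String> names = new ArrayList<String>();
        for (String token : text.split("\\s+")) {
            String word = token.replaceAll("[^\\p{L}]", ""); // strip punctuation and digits
            if (!word.isEmpty() && Character.isUpperCase(word.charAt(0))
                    && !dictionary.contains(word.toLowerCase())) {
                names.add(word); // capitalized and unknown to the dictionary: likely a name
            }
        }
        return names;
    }
}
```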
EDIT: See also this answer in the official FAQ: have you tried to change the model used?
A (paid) web service called GlobalNLP can do it in multiple languages: https://nlp.linguasys.com/docs/services/54131f001c78d802f0f2b28f/operations/5429f9591c78d80a3cd66926
Currently, I am using Lucene version 3.0.2 to create a search application that is similar to a dictionary. One of the things I want to display is a sort of "example", where Lucene looks for a word in a book and then displays the sentences in which that word is used.
I've been reading the Lucene in Action book and it mentions something like this, but looking through it I can't find other mentions. Is this something you can do with Lucene? If it is, how can you do it?
I believe what you are looking for is a Highlighter.
One possibility is to use the lucene.search.highlight package, specifically the Highlighter.
Another option is to use the lucene.search.vectorhighlight package, specifically the FastVectorHighlighter.
Both classes search a text document, choose relevant snippets and display them with the matching terms highlighted. I have only used the first one, which worked fine for my use-case. If you can pre-divide the book into shorter parts, it would make highlighting faster.
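A minimal sketch with the first option for Lucene 3.0.x ("contents" and the analyzer are whatever you indexed the book text with, and the fragment size and count are arbitrary choices):

```java
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.search.highlight.SimpleSpanFragmenter;

public class ExampleSnippets {

    // Returns the best passages from the book text in which the queried word occurs.
    static String[] exampleSnippets(Query query, Analyzer analyzer, String bookText) throws Exception {
        QueryScorer scorer = new QueryScorer(query);
        Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter("<b>", "</b>"), scorer);
        highlighter.setTextFragmenter(new SimpleSpanFragmenter(scorer, 120)); // roughly sentence-sized
        TokenStream tokens = analyzer.tokenStream("contents", new StringReader(bookText));
        return highlighter.getBestFragments(tokens, bookText, 5);             // best 5 example passages
    }
}
```

If you want the plain sentences without any markup, passing empty pre/post tags to SimpleHTMLFormatter gives you the raw fragment text.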