I am using the Java JWI API to search WordNet for the synonyms of a word. The problem is that it only gives me one result: the word whose synonyms I am looking for, itself. Please guide me. Is it possible to get the list of all possible synonyms of a given word? My code is:
public void searcher() {
    try {
        url = new URL("file", null, path);
        dict = new Dictionary(url);
        try {
            dict.open();
        } catch (IOException ex) {
            JOptionPane.showMessageDialog(null, "Dictionary directory does not exist\n" + ex + "\nClass:Meaning Thread", "Dictionary Not Found Error", JOptionPane.ERROR_MESSAGE);
        }
        IIndexWord idxWord = dict.getIndexWord("capacity", POS.NOUN);
        IWordID wordID = idxWord.getWordIDs().get(0);
        IWord word = dict.getWord(wordID);
        // Adding related words to the list of related words
        ISynset synset = word.getSynset();
        for (IWord w : synset.getWords()) {
            System.out.println(w.getLemma());
        }
    } catch (Exception e) {
        // Note: exceptions are silently swallowed here
    }
}
The output is only:
capacity
itself! The expected synonyms are:
capability
capacitance
content
electrical capacitance
mental ability...(so on)
So is there anything I missed in the code, or can somebody give me any ideas what the real problem is?
Thanks in advance
So, here comes the answer: I used Java JAWS for WordNet searching. The steps are:
1- Download the WordNet dictionary from the WordNet website
2- Install WordNet
3- Go to the installation directory and copy the WordNet folder (in my case the WordNet folder was under C:\Program Files (x86))
4- Paste it into your Java project (under MyProject>WordNet)
5- Make a path to the directory:
File f = new File("WordNet\\2.1\\dict");
System.setProperty("wordnet.database.dir", f.toString());
6- Get the synonyms:
import java.io.File;
import java.util.LinkedHashSet;
import java.util.Set;
import edu.smu.tspell.wordnet.Synset;
import edu.smu.tspell.wordnet.WordNetDatabase;

public class TestJAWS {
    public static void main(String[] args) {
        String wordForm = "capacity";
        // Setting the path for the WordNet directory
        File f = new File("WordNet\\2.1\\dict");
        System.setProperty("wordnet.database.dir", f.toString());
        WordNetDatabase database = WordNetDatabase.getFileInstance();
        // Get the synsets containing the word form "capacity"
        Synset[] synsets = database.getSynsets(wordForm);
        // Display the word forms for the synsets retrieved
        if (synsets.length > 0) {
            // A LinkedHashSet keeps insertion order while removing duplicates
            Set<String> synonyms = new LinkedHashSet<String>();
            for (int i = 0; i < synsets.length; i++) {
                String[] wordForms = synsets[i].getWordForms();
                for (int j = 0; j < wordForms.length; j++) {
                    synonyms.add(wordForms[j]);
                }
            }
            // Showing all collected synonyms
            for (String synonym : synonyms) {
                System.out.println(synonym);
            }
        } else {
            System.err.println("No synsets exist that contain the word form '" + wordForm + "'");
        }
    }
}
The thing is, you must have jaws-bin.jar on your classpath.
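For example, when compiling and running from the command line on Windows (assuming TestJAWS.java and jaws-bin.jar sit in the current directory; this invocation is mine, not part of the original answer):
javac -cp .;jaws-bin.jar TestJAWS.java
java -cp .;jaws-bin.jar TestJAWS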
What you are getting is "capacity#1", which has the meaning of "capability to perform or produce", and it does indeed only have one synonym. (Play around with the PWN search page to get a feel for how WordNet organizes the words into synsets.)
It sounds like what you are after is the union of all synonyms in all the synsets? I think you can either use getSenseEntryIterator(), or simply put a loop around idxWord.getWordIDs().get(0), replacing the 0 with the loop counter, so that you are not only ever getting the first item in the list.
If you want to use JWI and want to fetch more than one synonym, then change your code from this exact spot:
IIndexWord idxWord = dict.getIndexWord(inputWord, POS.NOUN);
try {
    int x = idxWord.getTagSenseCount();
    for (int i = 0; i < x; i++) {
        IWordID wordID = idxWord.getWordIDs().get(i);
        IWord word = dict.getWord(wordID);
        // Adding related words to the list of related words
        ISynset synset = word.getSynset();
        for (IWord w : synset.getWords()) {
            System.out.println(w.getLemma());
            // output.add(w.getLemma());
        }
    }
} catch (Exception ex) {
    System.out.println("No synonym found!");
}
It works perfectly fine.
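For reuse, the same loop can be wrapped in a helper method. This is only a sketch: the name getSynonyms, the null check, and the duplicate filtering are my additions, not part of the original answer.
public List<String> getSynonyms(IDictionary dict, String inputWord, POS pos) {
    List<String> output = new ArrayList<String>();
    IIndexWord idxWord = dict.getIndexWord(inputWord, pos);
    if (idxWord == null) {
        return output; // the word is not in WordNet for this part of speech
    }
    for (IWordID wordID : idxWord.getWordIDs()) {
        ISynset synset = dict.getWord(wordID).getSynset();
        for (IWord w : synset.getWords()) {
            if (!output.contains(w.getLemma())) {
                output.add(w.getLemma()); // collect each lemma only once
            }
        }
    }
    return output;
}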
This is what I have so far, and I am having trouble downloading comics 1-100 starting at https://xkcd.com/1/. I know I am supposed to be going to the source code of each page: for example https://xkcd.com/1/, then https://xkcd.com/2/, and so on all the way up to comic 100. I know the img src is at line 50 of the page source, but I can't seem to figure out how to extract it, or how to get all of the first 100 comics into the designated file I set the program to save to.
public static void main(String[] args) {
    URL imgURL = null;
    for (int web = 1; web <= 100; web++) {
        try {
            imgURL = new URL("https://imgs.xkcd.com/comics/barrel_cropped_(1).jpg");
            InputStream stream = imgURL.openStream();
            Files.copy(stream, Paths.get("file/WebComics" + web + ".png"));
            System.out.println("Done!");
        } catch (Exception e) {
            e.printStackTrace();
            System.out.println("Error!");
        }
    }
}
Add the jsoup library jar to your project, and then try this:
static void do_page(int id) throws IOException {
    Document doc = Jsoup.connect("https://xkcd.com/" + id).get();
    Elements imgs = doc.select("#comic img");
    for (Element e : imgs) {
        System.out.println(e.attr("src"));
    }
}
Then call the do_page function in a loop:
for (int i = 1; i <= 100; i++) {
    do_page(i);
}
Now, instead of printing the URLs, you can use jsoup again to download the images as you see fit.
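A minimal sketch of that download step, assuming jsoup plus java.net/java.nio imports; the comics output directory and the use of abs:src are my choices, not from the original answer:
static void download_page(int id) throws IOException {
    Document doc = Jsoup.connect("https://xkcd.com/" + id).get();
    Files.createDirectories(Paths.get("comics")); // make sure the target folder exists
    for (Element e : doc.select("#comic img")) {
        // "abs:src" resolves the protocol-relative src against the page URL
        String imgUrl = e.attr("abs:src");
        String fileName = imgUrl.substring(imgUrl.lastIndexOf('/') + 1);
        try (InputStream in = new URL(imgUrl).openStream()) {
            Files.copy(in, Paths.get("comics", fileName), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}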
I'm looking for a snippet of code that does the following:
Given two lists of strings representing two files
For example,
FILE1 = {"SSome" , "SSimple", "TText", "FFile"}
FILE2 = {"AAnother", "TText", "FFile", "WWith", "AAdditional", "LLines"}
If I call diff(file1,file2)
The output would be the diff between FILE1 and FILE2:
*Some|Another
-Simple
Text
File
+With
+Additional
+Lines
Many thanks!
I gather from your question the following:
*word1|word2 - means the word from file 1 was changed in file 2
-word - means the word from file 1 was removed in file 2
word - means the word from file 1 remained the same in file 2
+word - means the word wasn't originally in file 1, but was added to file 2
I figured file 1 is the "source" file and file 2 is the "destination" file whose differences from the source we are showing. Having said that, try this algorithm (it's not as polished as DiffNow, but it's pretty close):
public static void main(String[] args) throws Exception {
    List<String> file1 = new ArrayList<String>(Arrays.asList("Some", "Simple", "Text", "File"));
    List<String> file2 = new ArrayList<String>(Arrays.asList("Another", "Text", "File", "With", "Additional", "Lines"));
    boolean diff = false;
    int file2Index = 0;
    for (int file1Index = 0; file1Index < file1.size();) {
        if (!file1.get(file1Index).equals(file2.get(file2Index)) && !diff) {
            diff = true;
            // The word from file 1 was changed
            System.out.println("*" + file1.get(file1Index) + "|" + file2.get(file2Index));
            file1Index++;
            file2Index++;
        } else if (!file1.get(file1Index).equals(file2.get(file2Index)) && diff) {
            // This word was removed from file 1
            System.out.println("-" + file1.get(file1Index));
            file1Index++;
        } else {
            System.out.println(file1.get(file1Index));
            diff = false;
            file1Index++;
            file2Index++;
        }
    }
    // Print what's left from file 2
    for (; file2Index < file2.size(); file2Index++) {
        System.out.println("+" + file2.get(file2Index));
    }
}
Results:
*Some|Another
-Simple
Text
File
+With
+Additional
+Lines
Here is what I tried.
import java.util.*;

public class SetDemo
{
    public static void main(String[] args) {
        String[] file1 = new String[]{"Some", "Simple", "Text", "File"};
        String[] file2 = new String[]{"Another", "Text", "File", "With", "Additional", "Lines"};
        Set<String> set1 = new HashSet<String>();
        Set<String> set2 = new HashSet<String>();
        for (String s : file1) {
            set1.add(s);
        }
        for (String s2 : file2) {
            set2.add(s2);
        }
        Set<String> s1intercopy = new HashSet<String>(set1);
        Set<String> s2intercopy = new HashSet<String>(set2);
        s1intercopy.retainAll(s2intercopy); // Finds the intersection
        Set<String> s1symdiffcopy = new HashSet<String>(set1);
        Set<String> s2symdiffcopy = new HashSet<String>(set2);
        s1symdiffcopy.removeAll(set2);
        s2symdiffcopy.removeAll(set1);
        int count = 0;
        for (String s7 : s1intercopy) {
            count++;
            System.out.println(Integer.toString(count) + '.' + s7);
        }
        if (set1.size() > set2.size()) {
            for (String s3 : s1symdiffcopy) {
                count++;
                System.out.println(Integer.toString(count) + '.' + '+' + s3);
            }
            for (String s4 : s2symdiffcopy) {
                count++;
                System.out.println(Integer.toString(count) + '.' + '-' + s4);
            }
        } else if (set2.size() > set1.size()) {
            for (String s5 : s2symdiffcopy) {
                count++;
                System.out.println(Integer.toString(count) + '.' + '+' + s5);
            }
            for (String s6 : s1symdiffcopy) {
                count++;
                System.out.println(Integer.toString(count) + '.' + '-' + s6);
            }
        }
    }
}
Output:
1.Text
2.File
3.+Lines
4.+Additional
5.+Another
6.+With
7.-Some
8.-Simple
I wasn't sure what you meant by *Some|Another, but what the above code does is simply find the intersection and the symmetric differences between the sets, determine which set is bigger, and assign '+' to the values which are only in the bigger set and '-' to those only in the smaller set. I didn't read in from a file, to save time, but that part is easy and you can look it up. It seems, based on your output, that you were searching through one file and, for each string in it, searching through the other file. That is pretty inefficient for large files, so I believe the above solution improves on it by loading the contents into sets and performing set operations.
I am in the middle of creating an app that allows users to apply for job positions and upload their CVs. I'm currently stuck trying to make a search box for the admin to be able to search for keywords. The app will then look through all the CVs, and if it finds such keywords it will show a list of CVs that contain them. I am fairly new to GUI design and app creation, so I'm not sure how to go about doing it. I wish to do it in Java and am using the Eclipse WindowBuilder to help me design it. Any help will be greatly appreciated: hints, advice, anything. Thank you.
Well, this is not the right design approach, as real-time search of words across all files in a given folder will be slow and not sustainable in the long run. Ideally you should have indexed all CVs for keywords. The search should run on the index and then fetch the CVs associated with the matching index entries (think of indexes as similar to tags). There are many options for indexing - simple DB indexing, using Apache Lucene, or following these steps to create an index using maps and referring to this index for searches:
- Create a map Map<String, List<File>> for keeping the association of keywords to files
- Iterate through all files and, for each word in each file, add that file to the list corresponding to that word in your index map (see the sketch below)
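A minimal sketch of that index, assuming plain-text files; the method name indexFile and the tokenizing regex are illustrative, not from the original answer:
Map<String, List<File>> index = new HashMap<String, List<File>>();

void indexFile(File f) throws IOException {
    String text = new String(Files.readAllBytes(f.toPath()));
    // Split on non-word characters; each lower-cased token becomes an index key
    for (String word : text.split("\\W+")) {
        String key = word.toLowerCase();
        List<File> files = index.get(key);
        if (files == null) {
            files = new ArrayList<File>();
            index.put(key, files);
        }
        if (!files.contains(f)) {
            files.add(f); // each file is listed once per keyword
        }
    }
}

// A search is then a single map lookup instead of a folder scan:
List<File> matches = index.get("java");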
Here is Java code that will work for you, but I would still suggest changing your design approach and using indexes.
File dir = new File("Folder for CV's");
if (dir.exists()) {
    Pattern p = Pattern.compile("Java");
    ArrayList<String> list = new ArrayList<String>(); // list of CVs
    for (File f : dir.listFiles()) {
        if (!f.isFile()) continue;
        try {
            FileInputStream fis = new FileInputStream(f);
            byte[] data = new byte[fis.available()];
            fis.read(data);
            String text = new String(data);
            Matcher m = p.matcher(text);
            if (m.find()) {
                list.add(f.getName()); // add file to found-keyword list
            }
            fis.close();
        } catch (Exception e) {
            System.out.print("\n\t Error processing file : " + f.getName());
        }
    }
    System.out.print("\n\t List : " + list); // list of files containing the keyword
} else { // only process if the directory exists
    System.out.print("\n Directory doesn't exist.");
}
Here you get the list of files to show, in this case for "Java". As I said, use indexes :)
Thanks for taking your time to look into my problem.
I have actually come up with a solution of my own. It is probably quite amateurish, but it works for me.
JButton btnSearch = new JButton("Search");
btnSearch.addActionListener(new ActionListener() {
    public void actionPerformed(ActionEvent arg0) {
        list.clear();
        String s = SearchBox.getText();
        int i = 0, present = 0;
        int id;
        try {
            Class.forName(driver).newInstance();
            Connection conn = DriverManager.getConnection(url + dbName, userName, password);
            Statement st = conn.createStatement();
            ResultSet res = st.executeQuery("SELECT * FROM javaapp.test");
            while (res.next()) {
                i = 0;
                present = 0;
                // search[] holds the names of the columns to check for the keyword
                while (i < 9) {
                    String out = res.getString(search[i]);
                    if (out.toLowerCase().contains(s.toLowerCase())) {
                        present = 1;
                        break;
                    }
                    i++;
                }
                if (tglbtnNormalshortlist.isSelected()) {
                    if (present == 1 && res.getInt("Shortlist") == 1) {
                        id = res.getInt("Candidate");
                        String print = res.getString("Name");
                        list.addElement(print + " " + id);
                    }
                } else {
                    if (present == 1 && res.getInt("Shortlist") == 0) {
                        id = res.getInt("Candidate");
                        String print = res.getString("Name");
                        list.addElement(print + " " + id);
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
});
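As a possible refinement (a sketch with assumed column names - Skills and Education are hypothetical), the keyword filtering could be pushed into the database with a PreparedStatement and LIKE, instead of fetching every row and scanning it in Java:
String sql = "SELECT Candidate, Name, Shortlist FROM javaapp.test "
           + "WHERE LOWER(CONCAT_WS(' ', Name, Skills, Education)) LIKE ?";
PreparedStatement ps = conn.prepareStatement(sql);
ps.setString(1, "%" + s.toLowerCase() + "%"); // match the keyword anywhere in the row
ResultSet res = ps.executeQuery();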
This is my first question on Stack Overflow, so wish me luck.
I am doing a classification process over a Lucene index with Java, and I need to update a document field named category. I have been using Lucene 4.2 with the index writer's updateDocument() function for that purpose, and it is working very well, except for the deletion part. Even if I use the forceMergeDeletes() function after the update, the index shows me some already deleted documents. For example, if I run the classification over an index with 1000 documents, the final amount of documents in the index remains the same and works as expected, but when I increase the index to 10000 documents, the index shows some already deleted documents, though not all. So, how can I actually erase those deleted documents from the index?
Here are some snippets of my code:
public static void main(String[] args) throws IOException, ParseException {
    /////////////////////// Preparing config data ////////////////////////////
    File indexDir = new File("/indexDir");
    Directory fsDir = FSDirectory.open(indexDir);
    IndexWriterConfig iwConf = new IndexWriterConfig(Version.LUCENE_42, new WhitespaceSpanishAnalyzer());
    iwConf.setOpenMode(IndexWriterConfig.OpenMode.APPEND);
    IndexWriter indexWriter = new IndexWriter(fsDir, iwConf);
    IndexReader reader = DirectoryReader.open(fsDir);
    IndexSearcher indexSearcher = new IndexSearcher(reader);
    KNearestNeighborClassifier classifier = new KNearestNeighborClassifier(100);
    AtomicReader ar = new SlowCompositeReaderWrapper((CompositeReader) reader);
    classifier.train(ar, "text", "category", new WhitespaceSpanishAnalyzer());
    System.out.println("***Before***");
    showIndexedDocuments(reader);
    System.out.println("***Before***");
    int maxdoc = reader.maxDoc();
    int j = 0;
    for (int i = 0; i < maxdoc; i++) {
        Document doc = reader.document(i);
        String clusterClasif = doc.get("category");
        String text = doc.get("text");
        String docid = doc.get("doc_id");
        ClassificationResult<BytesRef> result = classifier.assignClass(text);
        String classified = result.getAssignedClass().utf8ToString();
        if (!classified.isEmpty() && clusterClasif.compareTo(classified) != 0) {
            Term term = new Term("doc_id", docid);
            doc.removeField("category");
            doc.add(new StringField("category", classified, Field.Store.YES));
            indexWriter.updateDocument(term, doc);
            j++;
        }
    }
    indexWriter.forceMergeDeletes(true);
    indexWriter.close();
    System.out.println("Classified documents count: " + j);
    System.out.println();
    reader.close();
    reader = DirectoryReader.open(fsDir);
    System.out.println("Deleted docs: " + reader.numDeletedDocs());
    System.out.println("***After***");
    showIndexedDocuments(reader);
}

private static void showIndexedDocuments(IndexReader reader) throws IOException {
    int maxdoc = reader.maxDoc();
    for (int i = 0; i < maxdoc; i++) {
        Document doc = reader.document(i);
        String idDoc = doc.get("doc_id");
        String text = doc.get("text");
        String category = doc.get("category");
        System.out.println("Id Doc: " + idDoc);
        System.out.println("Category: " + category);
        System.out.println("Text: " + text);
        System.out.println();
    }
    System.out.println("Total: " + maxdoc);
}
I have spent many hours looking for a solution to this. Some say that the deleted documents in the index are not important and that eventually they will be erased as we keep adding documents to the index, but I need to control that process in such a way that I can iterate over the index documents at any time, and that the documents I retrieve are actually the live ones. Lucene versions prior to 4.0 had a function in the IndexReader class named isDeleted(docId) that tells whether a document has been marked as deleted; that could be just half of the solution to my problem, but I have not found a way to do it with version 4.2 of Lucene. If you know how, I would really appreciate it if you shared it.
You can check whether a document is deleted via the MultiFields class, like:
Bits liveDocs = MultiFields.getLiveDocs(reader);
if (!liveDocs.get(docID)) ...
So, working this into your code, perhaps something like:
int maxdoc = reader.maxDoc();
Bits liveDocs = MultiFields.getLiveDocs(reader);
for (int i = 0; i < maxdoc; i++) {
    // getLiveDocs returns null when the index has no deletions
    if (liveDocs != null && !liveDocs.get(i)) continue;
    Document doc = reader.document(i);
    String idDoc = doc.get("doc_id");
    ....
}
By the way, it sounds like you have previously been working with 3.X and are now on 4.X. The Lucene Migration Guide is very helpful for understanding these sorts of changes between versions, and how to resolve them.
The SWT file dialog will give me an empty result array if I select too many files (approx. >2500 files). The listing shows you how I use the dialog. If I select too many sound files, the println will show 0. Debugging tells me that the files array is empty in this case. Is there any way to get this to work?
FileDialog fileDialog = new FileDialog(mainView.getShell(), SWT.MULTI);
fileDialog.setText("Choose sound files");
fileDialog.setFilterExtensions(new String[] { "*.wav" });
Vector<String> result = new Vector<String>();
fileDialog.open();
String[] files = fileDialog.getFileNames();
for (int i = 0, n = files.length; i < n; i++) {
    if (!files[i].contains(".wav")) {
        System.out.println(files[i]);
    }
    StringBuffer stringBuffer = new StringBuffer();
    stringBuffer.append(fileDialog.getFilterPath());
    if (stringBuffer.charAt(stringBuffer.length() - 1) != File.separatorChar) {
        stringBuffer.append(File.separatorChar);
    }
    stringBuffer.append(files[i]);
    String finalName = stringBuffer.toString();
    if (!finalName.contains(".wav")) {
        System.out.println(finalName);
    }
    result.add(finalName);
}
System.out.println(result.size());
I've looked at the FileDialog source code, and I'm afraid there is an upper bound: a 32kB byte buffer for all 0-terminated filenames (if I understood it correctly).
So, calculating with your values: if the average length of your filename strings is around 12 characters, then you've hit exactly that upper bound.
So the only way out is to select the files in two or more steps.
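In code, that could look like the following sketch (untested against the 32kB limit; shell stands in for mainView.getShell()): reopen the dialog until the user cancels, and accumulate the batches.
List<String> allFiles = new ArrayList<String>();
while (true) {
    FileDialog dialog = new FileDialog(shell, SWT.MULTI);
    dialog.setText("Choose sound files (press Cancel when done)");
    dialog.setFilterExtensions(new String[] { "*.wav" });
    if (dialog.open() == null) {
        break; // the user cancelled: no more batches to add
    }
    String filterPath = dialog.getFilterPath();
    for (String name : dialog.getFileNames()) {
        allFiles.add(filterPath + File.separator + name); // build the absolute path
    }
}
System.out.println(allFiles.size());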