Common Hypernym between two words using WordNet (JWI) - JAVA

Common Hypernym between two words using WordNet (JWI) - JAVA - java

Does anyone know how a good way to retrieve the first common hypernym between two words? I can access the first level (immediate parent) from a given word, but I'm stuck on how to retrieve all hypernyms ("going up") from this word until it matches another word. The idea is to identify where/when/which two words can be considered "the same" through WordNet according with their root (if not found it should continue until the end of the words in wordnet). I found some topics here but for Python and Perl, nothing specific for this problem in JAVA
I'm using JWI (2.4.0) to access SynsetID, WordID and other information from WordNet. If there is a simpler API that does the job is also welcome. Here below is the method that provides the hypernym I mentioned.
public void getHypernyms(IDictionary dict_param, String lemma_param) throws IOException {
dict_param.open();
// get the synset
IIndexWord idxWord = dict_param.getIndexWord(lemma_param, POS.NOUN);
// 1st meaning
IWordID wordIDb = idxWord.getWordIDs().get(0);
IWord word = dict_param.getWord(wordIDb);
ISynset synset = word.getSynset();
System.out.println("Synset = " + synset);
// get the hypernyms by pointing a list of <types> in the words
List<ISynsetID> hypernyms = synset.getRelatedSynsets(Pointer.HYPERNYM);
// print out each h y p e r n y m s id and synonyms
List<IWord> words, wordsb;
for (ISynsetID sid : hypernyms) {
words = dict_param.getSynset(sid).getWords();
System.out.println("Lemma: " + word.getLemma());
System.out.print("Hypernonyms = " + sid + " {");
for (Iterator<IWord> i = words.iterator(); i.hasNext();) {
System.out.print(i.next().getLemma());
if (i.hasNext()) {
System.out.print(", ");
}
}
System.out.println("}");
}
}
Providing a dictionary and the word "dog" we have as a result (as you can see I'm just usingn the first meaning to execute this method):
Synset = SYNSET{SID-02084071-N : Words[W-02084071-N-1-dog, W-02084071-N-2 domestic_dog, W-02084071-N-3-Canis_familiaris]}
Lemma: dog Hypernonyms = SID-02083346-N {canine, canid}
Lemma: dog Hypernonyms = SID-01317541-N {domestic_animal, domesticated_animal}

For those who might be interested. After some time I figured it out.
public List<ISynsetID> getListHypernym(ISynsetID sid_pa) throws IOException {
IDictionary dict = openDictionary();
dict.open(); //Open the dictionary to start looking for LEMMA
List<ISynsetID> hypernym_list = new ArrayList<>();
boolean end = false;
while (!end) {
hypernym_list.add(sid_pa);
List<ISynsetID> hypernym_tmp = dict.getSynset(sid_pa).getRelatedSynsets(Pointer.HYPERNYM);
if (hypernym_tmp.isEmpty()) {
end = true;
} else {
sid_pa = hypernym_tmp.get(0);//we will stick with the first hypernym
}
}
//for(int i =0; i< hypernym_list.size();i++){
// System.out.println(hypernym_list.get(i));
//}
return hyp;
}

Related

Checking if item lore contains contains string (loren.contains("§eSigned from "))

I only want to check for:
if (lore.contains("§eSigned of ")) {
but it doesn't get that it does contain "§eSigned of "
I wrote a Minecraft Command /sign you can add a lore to an item ("Signed of playerrank | playername").
Then i wanted to add an /unsign command to remove this lore.
ItemStack is = p.getItemInHand();
ItemMeta im = is.getItemMeta();
List<String> lore = im.hasLore() ? im.getLore() : new ArrayList<String>();
if (lore.contains("§eSigned of " + getChatName(p))) { // this line is important!
for (int i = 0; i < 3; i++) {
int size = lore.size();
lore.remove(size - 1);
}
im.setLore(lore);
is.setItemMeta(im);
p.setItemInHand(is);
sendMessage(p, "§aThis item is no longer signed");
} else {
sendMessage(p, "§aThis item is not signed!");
}
return CommandResult.None;
Everything works fine until you e.g. change your name. than you can't remove the sign because getChatName(p) has changed.
To fix this i only want to check
if (lore.contains("§eSigned of ")) {
but than it doesn't get it and returns false. (it says lore does not contain "§eSigned of ")
I tried a lot but it only works with the string "§eSigned of " and getChatName(p).
As the documentation "contains" searches for the specific string so it should work as I thought right?
Add:
getChatName(p) returns the rank of the player and the playername like: "Member | domi"
sendMessage(p, "") sends a simple message in the Minecraft chat

The problem you run into is that contains(String) looks for a matching string. What you search for is a check if any string in the list starts with "§eSigned of ".
I would suggest adding a function isSignedItem like this:
private boolean isSignedItem(List<String> lore) {
for (String st : lore)
if (st.startsWith("§eSigned of "))
return true;
return false;
}
and then to use this function to check if the item is signed or not:
[...]
List<String> lore = im.hasLore() ? im.getLore() : new ArrayList<String>();
if (isSignedItem(lore)) { // this line is important!
for (int i = 0; i < 3; i++) {
int size = lore.size();
lore.remove(size - 1);
}
[...]

Parsing input from file, delimiters, loops, java

The overall project is creating a system manager for airports. It keeps track of airports, flights, seating sections, seats and other relevent info for each of those catagories. The initial system is set up by importing from a file that's formatted a certain way. I'm having problems parsing the file properly to set up the initial system. the data is parsed from the file and used as method parameters to create the objects: Airport, Airline, Flight, FlightSection, and Seat.
the formatting is:
[list-of-airport-codes] {list-of-airlines}
list-of-airport-codes ::= comma-separated strings
list-of-airlines ::= airline-name1[flightinfo-list1], airline-name2[flightinfo-list2], airlinename3[flightinfo-list3], …
flightinfo-list ::= flightID1|flightdate1|originAirportCode1|destinationAirportCode1[flightsectionlist1], flightID2|flightdate2|originAirportCode2|destinationAirportCode2[flightsection-list2], …
flightdate ::= year, month, day-of-month, hours, minutes
flightsection-list ::= sectionclass: seat-price: layout: number-of-rows, …
sectionclass ::= F, B, E (for first class, business class, economy class)
layout ::= S, M, W (different known seating layouts)
example:
[DEN,NYC,SEA,LAX]{AMER[AA1|2018,10,8,16,30|DEN|LAX[E:200:S:4,F:500:S:2],
AA2|2018,8,9,7,30|LAX|DEN[E:200:S:5,F:500:S:3], …], UNTD[UA21|2018,11,8,12,30|NYC|SEA[E:300:S:6,F:800:S:3], UA12|2018,8,9,7,30|SEA|DEN[B:700:S:5, F:1200:S:2], …], FRONT[…], USAIR[…]}
I tried brute forcing it using a combination of delimiters and while loops. The code successfully creates the Airports, first Airline and Flighsections, but when it gets to creating the second airline it crashes, because i'm not looping properly, and having a hard time getting it right. My code for it as of now, is a mess, and if you're willing to look at it, I would appreciate any constructive input. My question is what would be a better way to approach this? A different design approach? Maybe a smarter way to use the delimiters?
Thanks in advance for your help!!
here's what i've tried.
private void readFile(File file){
System.out.println("reading file");
Scanner tempScan;
String result;
String temp = "";
scan.useDelimiter("\\[|\\{");
try{
// AIRPORTS
result = scan.next();
tempScan = new Scanner(result);
tempScan.useDelimiter(",|\\]");
while(tempScan.hasNext()){
temp = tempScan.next();
sysMan.createAirport(temp);
}
tempScan.close();
/* AIRLINE
* FLIGHT
* FLIGHTSECTION
*/
do{
// AIRLINE (loop<flight & fsection>)
result = scan.next();
sysMan.createAirline(result);
// FLIGHT
result = scan.next();
tempScan = new Scanner(result);
do{
tempScan.useDelimiter(",|\\|");
ArrayList flightInfo = new ArrayList();
while(tempScan.hasNext()){
if(tempScan.hasNextInt()){
flightInfo.add(tempScan.nextInt());
} else {
flightInfo.add(tempScan.next());
}
}
tempScan.close();
sysMan.createFlight(sysMan.getLastAddedAirline(),(String)flightInfo.get(0), (int)flightInfo.get(1), (int)flightInfo.get(2), (int)flightInfo.get(3), (int)flightInfo.get(4), (int)flightInfo.get(5), (String)flightInfo.get(6), (String)flightInfo.get(7));
// FLIGHTSECTION (loop<itself>)
result = scan.next();
tempScan = new Scanner(result);
tempScan.useDelimiter(",|:|\\]");
ArrayList sectInfo = new ArrayList();
int i = 1;
while(!temp.contains("|")){
if(tempScan.hasNextInt()){
sectInfo.add(tempScan.nextInt());
} else {
temp = tempScan.next();
if(temp.equals(""))
break;
char c = temp.charAt(0);
sectInfo.add(c);
}
if(i == 4){
sysMan.createSection(sysMan.getLastAddedAirline(), sysMan.getLastAddedFlightID(), (char)sectInfo.get(0), (int)sectInfo.get(1), (char)sectInfo.get(2), (int)sectInfo.get(3));
i = 1;
sectInfo = null;
sectInfo = new ArrayList();
continue;
}
i++;
}
}while(!temp.equals("\\s+"));
}while(!temp.contains("\\s+"));
}catch(NullPointerException e){
System.err.println(e);
}
}

I'd rather chunk it down by regexp mathing the outer bounds, have a look, I took it a couple of levels broken.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public class Tokeni {
static String yolo = "[DEN,NYC,SEA,LAX]{AMER["
+ "AA1|2018,10,8,16,30|DEN|LAX[E:200:S:4,F:500:S:2],"
+ "AA2|2018,8,9,7,30|LAX|DEN[E:200:S:5,F:500:S:3]],"
+ "UNTD[UA21|2018,11,8,12,30|NYC|SEA[E:300:S:6,F:800:S:3],"
+ "UA12|2018,8,9,7,30|SEA|DEN[B:700:S:5, F:1200:S:2]]}";
public static void main(String[] args) {
Matcher airportCodesMatcher = Pattern.compile("\\[(.*?)\\]").matcher(yolo);
airportCodesMatcher.find();
String[] airportCodes = airportCodesMatcher.group(1).split(",");
Matcher airLinesMatcher = Pattern.compile("\\{(.*?)\\}").matcher(yolo);
airLinesMatcher.find();
String airLinesStr = airLinesMatcher.group(1);
System.out.println(airLinesStr);
Pattern airLinePattern = Pattern.compile("\\D+\\[(.*?)\\]\\]");
Matcher airLineMatcher = airLinePattern.matcher(airLinesStr);
while( airLineMatcher.find() ) {
String airLineStr = airLineMatcher.group(0).trim();
if(airLineStr.startsWith(",")) {
airLineStr = airLineStr.substring(1, airLineStr.length()).trim();
}
System.out.println(airLineStr);
Matcher airLineNameMatcher = Pattern.compile("[A-Z]+").matcher(airLineStr);
airLineNameMatcher.find();
String airLineName = airLineNameMatcher.group(0).trim();
System.out.println(airLineName);
airLineStr = airLineStr.substring(airLineStr.indexOf("[")+1, airLineStr.length());
Matcher airLineInfoMatcher = Pattern.compile("\\D+(.*?)\\]").matcher(airLineStr);
while(airLineInfoMatcher.find()) {
String airLineInfoStr = airLineInfoMatcher.group(0).trim();
if(airLineInfoStr.startsWith(",")) {
airLineInfoStr = airLineInfoStr.substring(1, airLineInfoStr.length()).trim();
}
System.out.println(airLineInfoStr);
}
}
}
}

Removing duplicate key-value pairs in a map with values being in a list

Below is my code to detect abbreviations and their long forms. The code loops over a line in a document, loops over each word of that line and identifies an acronym candidate. It then again loops over each line of the document to find an appropriate long form for the abbreviation. My issue is if an acronym occurs multiple times in a document my output contains multiple instances of it. I just want to print an acronym only once with all its possible long forms. Here's my code:
public static void main(String[] args) throws FileNotFoundException
{
BufferedReader in = new BufferedReader(new FileReader("D:\\Workspace\\resource\\SampleSentences.txt"));
String str=null;
ArrayList<String> lines = new ArrayList<String>();
String matchingLongForm;
List <String> matchingLongForms = new ArrayList<String>() ;
List <String> shortForm = new ArrayList<String>() ;
Map<String, List<String>> abbreviationPairs = new HashMap<String, List<String>>();
try
{
while((str = in.readLine()) != null){
lines.add(str);
}
}
catch (IOException e)
{
// TODO Auto-generated catch block
e.printStackTrace();
}
String[] linesArray = lines.toArray(new String[lines.size()]);
// document wide search for abbreviation long form and identifying several appropriate matches
for (String line : linesArray){
for (String word : (Tokenizer.getTokenizer().tokenize(line))){
if (isValidShortForm(word)){
for (int i = 0; i < linesArray.length; i++){
matchingLongForm = extractBestLongForm(word, linesArray[i]);
//shortForm.add(word);
if (matchingLongForm != null && !(matchingLongForms.contains(matchingLongForm))){
matchingLongForms.add(matchingLongForm);
//System.out.println(matchingLongForm);
abbreviationPairs.put(word, matchingLongForms);
//matchingLongForms.clear();
}
}
if (abbreviationPairs != null){
//for(abbreviationPairs.)
System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairs);
abbreviationPairs.clear();
matchingLongForms.clear();
//System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairsNew);
}
else
continue;
}
}
}
}
Here's the current output:
Abbreviation Pair: {GLBA=[Gramm Leach Bliley act]}
Abbreviation Pair: {NCUA=[National credit union administration]}
Abbreviation Pair: {FFIEC=[Federal Financial Institutions Examination Council]}
Abbreviation Pair: {CFR=[comments for the Report]}
Abbreviation Pair: {CFR=[comments for the Report]}
Abbreviation Pair: {CFR=[comments for the Report]}
Abbreviation Pair: {CFR=[comments for the Report]}
Abbreviation Pair: {OFAC=[Office of Foreign Assets Control]}

Try to use java.util.Set to store your matching short forms and long forms. From the javadoc of the class:
... If this set already contains the element, the call leaves the set unchanged and returns false. In combination with the restriction on constructors, this ensures that sets never contain duplicate elements...

You want a key value pair for abbreviation and text. So you should use Map.
A map cannot contain duplicate keys; each key can map to at most one value.
The Problem is in the position of the output and not in the map.
You try to output in the loop, so the Map is shown multiple time.
Move the code outside the loop:
if (abbreviationPairs != null){
//for(abbreviationPairs.)
System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairs);
abbreviationPairs.clear();
matchingLongForms.clear();
//System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairsNew);
}

Here's the solution
Thanks to code_angel and Holger
Move the printing code outside the loop and create a new list for every matchingLongForm.
for (String line : linesArray){
for (String word : (Tokenizer.getTokenizer().tokenize(line))){
if (isValidShortForm(word)){
for (int i = 0; i < linesArray.length; i++){
matchingLongForm = extractBestLongForm(word, linesArray[i]);
List <String> matchingLongForms = new ArrayList<String>() ;
if (matchingLongForm != null && !(matchingLongForms.contains(matchingLongForm))&& !(abbreviationPairs.containsKey(word))){
matchingLongForms.add(matchingLongForm);
//System.out.println(matchingLongForm);
abbreviationPairs.put(word, matchingLongForms);
//matchingLongForms.clear();
}
}
}
}
}
if (abbreviationPairs != null){
System.out.println("Abbreviation Pair:" + "\t" + abbreviationPairs);
//abbreviationPairs.clear();
//matchingLongForms.clear();
}
}
The new output:
Abbreviation Pair: {NCUA=[National credit union administration], FFIEC=[Federal Financial Institutions Examination Council], OFAC=[Office of Foreign Assets Control], MSSP=[Managed Security Service Providers], IS=[Information Systems], SLA=[Service level agreements], CFR=[comments for the Report], MIS=[Management Information Systems], IDS=[Intrusion detection systems], TSP=[Technology Service Providers], RFI=[risk that FIs], EIC=[Examples of in the cloud], TIER=[The institution should ensure], BCP=[Business continuity planning], GLBA=[Gramm Leach Bliley act], III=[It is important], FI=[Financial Institutions], RFP=[Request for proposal]}

How to use Stanford CoreNLP POS-tags to retrieve synsets from WordNet?

I'm working with Java in Eclipse, & Stanford CoreNLP. I wanna know how to use the POS tags generated by Stanford CoreNLP, to retrieve the synsets of tagged words from WordNet. (I tokenized the sentence before POS-tagging)
Since WordNet has POS-tagged synsets, I'm assuming it can be done. I looked up a few tutorials as well, but there's nothing much helpful.

the following code gets all synsets of searchWord+POS tag, then from them prints their glosses. you can change POS.NOUN to seach other forms of the searchword:
public int stackoverflow(String searchWord) throws IOException {
String s = null;
IDictionary dict = dicitionaryFactory();
try {
IIndexWord idxWord = dict.getIndexWord(searchWord, POS.NOUN);
for (int i = 0; i < idxWord.getWordIDs().size(); i++) {
IWordID wordID = idxWord.getWordIDs().get(i); // ist meaning
edu.mit.jwi.item.IWord word = dict.getWord(wordID);
edu.mit.jwi.item.ISynset synset = word.getSynset();
List<edu.mit.jwi.item.IWord> words;
System.out.println(synset.getGloss());
}
return -1;
} catch (Exception e) {
return -2;
}
}

How to check if the subdomain is also from same domain using java

i have a list of url's i need to filter specific domain and subdomain. say i have some domains like
http://www.example.com
http://test.example.com
http://test2.example.com
I need to extract urls which from domain example.com.

Working on project that required me to determine if two URLs are from the same sub domain (even when there are nested domains). I worked up a modification from the guide above. This holds out pretty well thus far:
public static boolean isOneSubdomainOfTheOther(String a, String b) {
try {
URL first = new URL(a);
String firstHost = first.getHost();
firstHost = firstHost.startsWith("www.") ? firstHost.substring(4) : firstHost;
URL second = new URL(b);
String secondHost = second.getHost();
secondHost = secondHost.startsWith("www.") ? secondHost.substring(4) : secondHost;
/*
Test if one is a substring of the other
*/
if (firstHost.contains(secondHost) || secondHost.contains(firstHost)) {
String[] firstPieces = firstHost.split("\\.");
String[] secondPieces = secondHost.split("\\.");
String[] longerHost = {""};
String[] shorterHost = {""};
if (firstPieces.length >= secondPieces.length) {
longerHost = firstPieces;
shorterHost = secondPieces;
} else {
longerHost = secondPieces;
shorterHost = firstPieces;
}
//int longLength = longURL.length;
int minLength = shorterHost.length;
int i = 1;
/*
Compare from the tail of both host and work backwards
*/
while (minLength > 0) {
String tail1 = longerHost[longerHost.length - i];
String tail2 = shorterHost[shorterHost.length - i];
if (tail1.equalsIgnoreCase(tail2)) {
//move up one place to the left
minLength--;
} else {
//domains do not match
return false;
}
i++;
}
if (minLength == 0) //shorter host exhausted. Is a sub domain
return true;
}
} catch (MalformedURLException ex) {
ex.printStackTrace();
}
return false;
}
Figure I'd leave it here for future reference of a similar problem.

I understand you are probably looking for a fancy solution using URL class or something but it is not required. Simply think of a way to extract "example.com" from each of the urls.
Note: example.com is essentially a different domain than say example.net. Thus extracting just "example" is technically the wrong thing to do.
We can divide a sample url say:
http://sub.example.com/page1.html
Step 1: Split the url with delimiter " / " to extract the part containing the domain.
Each such part may be looked at in form of the following blocks (which may be empty)
[www][subdomain][basedomain]
Step 2: Discard "www" (if present). We are left with [subdomain][basedomain]
Step 3: Split the string with delimiter " . "
Step 4: Find the total number of strings generated from the split. If there are 2 strings, both of them are the target domain (example and com). If there are >=3 strings, get the last 3 strings. If the length of last string is 3, then the last 2 strings comprise the domain (example and com). If the length of last string is 2, then the last 3 strings comprise the domain (example and co and uk)
I think this should do the trick (I do hope this wasn't a homework :D )
//You may clean this method to make it more optimum / better
private String getRootDomain(String url){
String[] domainKeys = url.split("/")[2].split("\\.");
int length = domainKeys.length;
int dummy = domainKeys[0].equals("www")?1:0;
if(length-dummy == 2)
return domainKeys[length-2] + "." + domainKeys[length-1];
else{
if(domainKeys[length-1].length == 2) {
return domainKeys[length-3] + "." + domainKeys[length-2] + "." + domainKeys[length-1];
}
else{
return domainKeys[length-2] + "." + domainKeys[length-1];
}
}
}

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Common Hypernym between two words using WordNet (JWI) - JAVA - java

Related

Checking if item lore contains contains string (loren.contains("§eSigned from "))

Parsing input from file, delimiters, loops, java

Removing duplicate key-value pairs in a map with values being in a list

How to use Stanford CoreNLP POS-tags to retrieve synsets from WordNet?

How to check if the subdomain is also from same domain using java

Categories

Resources