How to remove duplicate words using Java

I have a text file from which I want to remove duplicate words. The file contains words like:
அந்தப்
சத்தம்
அந்த
இந்தத்
பாப்பா
இந்த
கனவுத்
அந்த
கனவு
I can remove exact duplicates, but words ending in 'ப்' or 'த்' are treated as separate words and are not removed as duplicates. If I strip 'ப்' and 'த்' everywhere, they are also removed from other words such as பாப்பா and சத்தம். Please suggest any ideas to solve this problem using Java. Thanks in advance.

I think I would use a Set with a custom comparator (such as a TreeSet). That way you can define equality any way you like.
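A minimal sketch of the comparator idea, assuming the only rule is "ignore one trailing 'ப்' or 'த்'" as described in the question. The class name SuffixInsensitiveWords and the suffix list are made up for illustration, and the real rules of the language are almost certainly richer than this:

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class SuffixInsensitiveWords {
    // Assumed suffixes, taken from the question; adjust to the real rules.
    private static final String[] SUFFIXES = {"ப்", "த்"};

    // Strips one assumed suffix, but only at the very end of the word,
    // so letters in the middle of a word are left alone.
    static String normalize(String word) {
        for (String suffix : SUFFIXES) {
            if (word.length() > suffix.length() && word.endsWith(suffix)) {
                return word.substring(0, word.length() - suffix.length());
            }
        }
        return word;
    }

    // Two words count as duplicates when their normalized forms are equal.
    // The TreeSet keeps the first spelling it sees for each form.
    static Set<String> dedupe(List<String> words) {
        Comparator<String> byNormalizedForm =
                Comparator.comparing(SuffixInsensitiveWords::normalize);
        Set<String> unique = new TreeSet<>(byNormalizedForm);
        unique.addAll(words);
        return unique;
    }
}
```

Because the suffix is only checked with endsWith, words like பாப்பா and சத்தம் are untouched. The catch is that any word that legitimately ends in one of these suffixes would also be merged with its shorter form, so treat this as a starting point only.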

I don't understand the given language (Google Translate's guess is Tamil), but from your question I gather that there are special rules for 'equality' of words written in that language: words can be equal even if they're written differently (e.g. with different endings).
So you may want to wrap the strings containing words of that language in a special object where you can define a custom 'equals' method, like this:
public class TamilWord {
String writtenWord = null;
public TamilWord(String writtenWord) {
this.writtenWord = writtenWord;
}
public String getWrittenWord() {
return writtenWord;
}
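@Override
public boolean equals(Object other) {
// Define your custom rules here, so that two words that
// are written differently may be considered as equal.
// Placeholder: plain string equality.
return other instanceof TamilWord
&& writtenWord.equals(((TamilWord) other).writtenWord);
}
@Override
public int hashCode() {
// Must agree with equals(): equal words need equal hash codes.
return writtenWord.hashCode();
}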
}
Then you can create TamilWord objects for all parsed Strings and drop them into a Set. So if we have the words abcd and abcD, which are written differently but considered equal according to your rules, only one of them will be added to the set.

Use a Scanner to read each line as a String into a Set, then write the strings in the Set to a file.
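Roughly, that could look like the sketch below. It assumes one word per line, and the class name LineDeduplicator is made up. Note that it only removes exact duplicates, so on its own it does not address the 'ப்' / 'த்' endings from the question:

```java
import java.io.PrintWriter;
import java.io.Writer;
import java.util.LinkedHashSet;
import java.util.Scanner;
import java.util.Set;

class LineDeduplicator {
    // Reads one word per line and keeps each distinct line once,
    // in the order it was first seen.
    static Set<String> readUnique(Scanner in) {
        Set<String> unique = new LinkedHashSet<>();
        while (in.hasNextLine()) {
            String line = in.nextLine().trim();
            if (!line.isEmpty()) {
                unique.add(line);
            }
        }
        return unique;
    }

    // Writes each remaining word on its own line.
    static void writeAll(Set<String> words, Writer out) {
        PrintWriter writer = new PrintWriter(out);
        for (String word : words) {
            writer.println(word);
        }
        writer.flush();
    }
}
```

For a real file you would build the Scanner from a path with an explicit charset (UTF-8 is the likely choice for Tamil text) and pass a file writer opened with the same charset.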

First, you should explain how you parse your file, as it seems that your tokenization is not working properly. Then, to my mind, the obvious suggestion for deduplication is to use a Set (perhaps a TreeSet), which ensures uniqueness of its elements according to that Set's own equality rules (equals/hashCode for a HashSet, the comparator for a TreeSet).

My way to solve this:
Read word by word and put each word into a java.util.Set<TheWord>. In the end you will have a Set with no duplicates. You should also define the TheWord class:
class TheWord {
String word;
public TheWord() {}
public String getWord() {
return word;
}
public void setWord(String word) {
this.word = word;
}
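@Override
public boolean equals(Object o) {
// put here your specific way to compare words
// taking into account your language rules and considerations
// (placeholder: plain string equality)
return o instanceof TheWord && java.util.Objects.equals(word, ((TheWord) o).word);
}
@Override
public int hashCode() {
// must stay consistent with equals()
return java.util.Objects.hashCode(word);
}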
}

Related

LinkedList Nodes

I have a question about an assignment that I am required to complete.
Write a menu driven program that either accepts words and their meanings, or displays the list of words in lexicographical order (i.e. as in a dictionary). When an entry is to be added to the dictionary you must first enter the word as one string, and then enter the meaning as separate string. A word may have more than one meaning, and may be entered at separate times. When this occurs, place each successive meaning on a separate line. This new meaning must be preceded by a dash. For example, if you enter the following words and with their meanings in the following order: Library, School, Cup, and School, then your output should be a display as shown in the Figure 1.
Use the concept of a linked list to carry out this exercise. You will need at minimum the following classes:
• A WordMeaning class that holds the name of a word and its meaning.
• A WordMeaningNode class that creates the node of information and its link field.
• A WordList class that creates and maintains a linked list of words and their meanings.
• A Dictionary class that tests your classes.
The question concerns the second bullet. I am not sure about how to go about creating a constructor for WordMeaningNode. This is what I have so far:
import java.lang.*;
import java.util.*;
public class WordMeaningNode
{
String information;
WordMeaningNode locale;
public WordMeaningNode(WordMeaning data)
//WordMeaning is a class that takes two strings(Word and definition) and stores it.
//Entry is a method that returns the strings stored in WordMeaning.
{
information = data.Entry();
locale = null;
}
public String getMeaning()
{
return information;
}
public WordMeaningNode getNext()
{
return locale.getNext();
}
}
A couple of things:
First, I think you would be better off passing in the String that results from data.Entry(). That way you can create your constructor in the following manner:
public WordMeaningNode(String entry){
int comma = entry.indexOf(','); //Assuming your String is comma delimited.
if(comma < 0){
// No delimiter left: this is the last node.
// (Without this check a comma-free entry would recurse forever.)
information = entry;
locale = null;
} else {
information = entry.substring(0, comma);
locale = new WordMeaningNode(entry.substring(comma + 1));
}
}
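Second, is your getNext() method working correctly? It calls locale.getNext(), so it throws a NullPointerException whenever locale is null, which it always is right after your constructor runs. It should probably just return locale.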
Good luck!
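For comparison, here is a bare-bones sketch of a node that stores the WordMeaning itself and a link to the next node. The fields and getters of WordMeaning are assumptions, since the question does not show that class, and setNext is an extra method the list class would need in order to link nodes:

```java
// Assumed shape of WordMeaning; the real class from the assignment may differ.
class WordMeaning {
    private final String word;
    private final String meaning;

    WordMeaning(String word, String meaning) {
        this.word = word;
        this.meaning = meaning;
    }

    String getWord() { return word; }
    String getMeaning() { return meaning; }
}

class WordMeaningNode {
    private final WordMeaning data;
    private WordMeaningNode next; // null marks the end of the list

    // A new node holds its data and is not linked to anything yet.
    WordMeaningNode(WordMeaning data) {
        this.data = data;
        this.next = null;
    }

    WordMeaning getData() { return data; }

    // Returns the link itself, not the link's successor.
    WordMeaningNode getNext() { return next; }

    void setNext(WordMeaningNode next) { this.next = next; }
}
```

The WordList class would then walk the chain with getNext() and call setNext() to insert a node at the right lexicographical position.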

Dictionary Program using Arraylist and Text File

Currently I am writing a program, for learning purposes, that is more or less a user-defined dictionary. You are given a word, you input the corresponding word in English, and the program should tell you whether your input is right or wrong, depending on what was stored as the answer previously.
Now, the trouble I am having is that I cannot figure out how to check whether the user input is correct, as I do not know how to compare the 2 values within the ArrayList.
Currently I have this:
public void add(String question, String answer) throws FileNotFoundException, IOException
{
wordlist.add(new WordPair(question, answer));
}
to add new elements to the list (each element is just a pair of strings), and then I am stuck on this:
public boolean checkGuess(String question, String quess)
{
}
which is where I want to compare the 2 strings. Any help is much appreciated.
Override the equals method in your WordPair class and simply do:
public boolean checkGuess(String question, String quess)
{
return wordlist.contains(new WordPair(question, quess));
}
I assumed that each question is unique.
Note that a Map would be a better implementation.
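A rough sketch of the Map variant, assuming each question has exactly one answer. The class name WordQuiz is made up, and the method names mirror the ones in the question:

```java
import java.util.HashMap;
import java.util.Map;

class WordQuiz {
    // question -> expected answer
    private final Map<String, String> answers = new HashMap<>();

    void add(String question, String answer) {
        answers.put(question, answer);
    }

    // True only if the question is known and the guess matches its answer.
    boolean checkGuess(String question, String guess) {
        String expected = answers.get(question);
        return expected != null && expected.equals(guess);
    }
}
```

The lookup by question is direct, so there is no need to build a throwaway WordPair or to scan the whole list for every guess.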
To compare two strings you would use the equals() method:
String a = "a";
String b = "a";
if(a.equals(b)){
//TRUE
}
To cycle through the ArrayList you can use a for-each loop:
for(WordPair pair : wordlist){
if(guess.equals(pair.getAnswer())){
// Equals the word
}
}
Alternatively, equalsIgnoreCase() can be used if the capitalisation of the word is not considered important:
String a = "A";
String b = "a";
if(a.equalsIgnoreCase(b)){
// TRUE
}

Determine Subclass Type of an Object of the Parent Class

I have several places in my code where I check whether certain conditions are met, so that I can tell whether the thing I am checking is of one type or another. This ends up as large if/else trees, because I make lots of checks, the same checks in each method, and the thing I am checking can be one of several different types. I know this can be solved using objects!
Specifically, the things I am checking are 4 string values from a file. Based on these values, the 4 strings together make up one of 3 types. Rather than making the same checks every time I need the type, I am wondering if I can create a general object from these 4 strings and then determine whether that object is an instanceof specific class 1, 2, or 3. Then I would be able to cast the general object to the specific one.
Say I call the general object that the 4 strings create Sign. I would take those 4 strings and create a new Sign object:
Sign unkownType = new Sign(string1, string2, string3, string4);
I need to check which specific type of sign this sign is.
EDIT:
For more detail: the signs I am checking are not symbols like "+" or "-"; they are signs with text, like you would see on the road. There are 4 lines on each sign, and they need to be checked to see whether each line matches a specific type of sign.
The first line of SignType1 will be different from the first line of SignType2, and I want to take those 4 lines (Strings), pass them to an object, and use that object throughout my code to get the values from it rather than making the same checks in each method.
If you want me to show some code, I can, but it won't make much sense.
What you seem to be asking for is a factory pattern:
public interface ISign {
public void operation1();
public void operation2();
}
and a factory class to create objects based on the input:
public class SignGenerator {
public static ISign getSignObject(String str1,String str2, String str3, String str4) {
if(str1.equals("blah blah"))
return new FirstType();
if(str1.equals("blah blah2") && str2.equals("lorem ipsum"))
return new SecondType();
return new ThirdType();
}
}
public class FirstType implements ISign {
}
public class SecondType implements ISign {
}
public class ThirdType implements ISign {
}
Implement all type-specific logic in these classes (each one has to implement operation1() and operation2()), so you can call them without first checking with tons of if..else clauses.
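Here is a compilable sketch of the same idea, using an abstract Sign base class (the name from the question) instead of the ISign interface. The concrete types and the "STOP" / "SPEED LIMIT" rules are invented purely for illustration, since the question does not say what the real rules are:

```java
abstract class Sign {
    private final String[] lines;

    protected Sign(String... lines) {
        this.lines = lines.clone();
    }

    String line(int index) {
        return lines[index];
    }

    // The only place that inspects the four lines to pick a type.
    // Both rules below are hypothetical examples.
    static Sign from(String l1, String l2, String l3, String l4) {
        if ("STOP".equalsIgnoreCase(l1)) {
            return new StopSign(l1, l2, l3, l4);
        }
        if ("SPEED LIMIT".equalsIgnoreCase(l1) && l2 != null && l2.matches("\\d+")) {
            return new SpeedLimitSign(l1, l2, l3, l4);
        }
        return new PlainSign(l1, l2, l3, l4);
    }
}

final class StopSign extends Sign {
    StopSign(String... lines) { super(lines); }
}

final class SpeedLimitSign extends Sign {
    SpeedLimitSign(String... lines) { super(lines); }

    // Type-specific behaviour lives in the subclass.
    int limit() {
        return Integer.parseInt(line(1));
    }
}

final class PlainSign extends Sign {
    PlainSign(String... lines) { super(lines); }
}
```

Callers would write Sign sign = Sign.from(a, b, c, d); once and then either use instanceof or, better, call methods declared on Sign that each subclass overrides.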
From what I gathered from your question:
create a method that returns a certain object when the given string is equal to whatever value you specify.
//provided the objects to be returned are subtypes of Sign
public Sign getInstance(String first, String second, String third, String fourth)
{
// compare1..compare4 are the expected values you define for each line
if(first==null || second==null || third==null || fourth==null)
return null;
if(compare1.equals(first))
return new SignType1();
if(compare2.equals(second))
return new SignType2();
if(compare3.equals(third))
return new SignType3();
if(compare4.equals(fourth))
return new SignType4();
// no rule matched
return null;
}
The code above checks the strings and returns the appropriate instance for the strings passed in.
I hope that addresses your concern.

Using a list inside a map (Java)

I'm using a HashMap in which I use an ArrayList as a value.
Like this:
Map<Movie, List<Grade>> gradedMovies = new HashMap<>();
I'm trying to create a method with which I could iterate through the values to see if a key(movie) already exists. If it does, I would like to add a new value(grade) into the list that is assigned to the particular key(movie). Something like this:
public void addGrade(Movie movie, Grade grade) {
// stuff here }
Ultimately I want to be able to print a Map which would display the movies and their grades after they've been added to the map.
How is this accomplished? Or is my whole approach (using a Map) totally wrong?
Thanks for any assistance. (This is homework)
I think you're on the right path; just make sure your Movie class implements equals and hashCode so it can work as a proper key for the hash map.
If you want pretty printing just implement the toString method.
public void addGrade(Movie movie, Grade grade) {
if (!gradedMovies.containsKey(movie)) {
gradedMovies.put(movie, new ArrayList<Grade>());
}
gradedMovies.get(movie).add(grade);
}
hope this helps, Cheers!
You can use something like this:
public void addGrade(Movie movie, Grade grade) {
if (!gradedMovies.containsKey(movie)) {
gradedMovies.put(movie, new ArrayList<Grade>());
}
gradedMovies.get(movie).add(grade);
}
You need to override the equals method (and hashCode) in Movie.
I don't know why you're looking for an index particularly - the point of a Map is that you can look up entries by their keys.
So as a starting point, the first line of your addGrade method could look like
List<Grade> grades = gradedMovies.get(movie);
and you can hopefully take it from there. (Remember to look at the documentation to see what happens if the map doesn't contain the given movie yet...)
"I could iterate through the values to see if a key(movie) already exists"
You don't need to iterate through the map, just call gradedMovies.containsKey( movieToCheck ).
Note that when using Movie as a key you should provide a sensible implementation of equals() and hashCode().
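To make that concrete, here is a small sketch. It assumes a movie is identified by its title alone and uses plain integers as grades, since neither Movie nor Grade is shown in the question. GradeBook is a made-up wrapper class, and computeIfAbsent requires Java 8 or later:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class Movie {
    private final String title; // assumed to identify a movie uniquely

    Movie(String title) {
        this.title = title;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof Movie && title.equals(((Movie) o).title);
    }

    @Override
    public int hashCode() {
        // Same field as equals(), so equal movies land in the same bucket.
        return title.hashCode();
    }

    @Override
    public String toString() {
        return title;
    }
}

class GradeBook {
    private final Map<Movie, List<Integer>> gradedMovies = new HashMap<>();

    // Creates the list on first use, then appends the grade.
    void addGrade(Movie movie, int grade) {
        gradedMovies.computeIfAbsent(movie, key -> new ArrayList<>()).add(grade);
    }

    List<Integer> gradesFor(Movie movie) {
        return gradedMovies.getOrDefault(movie, Collections.<Integer>emptyList());
    }
}
```

With equals and hashCode in place, two separately created Movie objects with the same title address the same map entry, which is what makes containsKey and get behave as expected.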
You're doing OK! But you should consider a couple of things:
When looking up a key in the map, your Movie object has to override equals and hashCode. Java collections use equals for comparisons, notably the automatic ones (like checking whether a list contains an item or, in this case, whether a key is equal to a given one); hash-based collections such as HashMap also use hashCode to find the right bucket. Remember that equals defines the uniqueness of an item, so you should base the comparison on a unique attribute, like an identification number or (in this case) its name.
To print the map, iterate over the keySet, either with an enhanced "for" loop or with an iterator (obtained through the .iterator() method). For each movie, print the list of grades in a similar fashion.
I don't know if you're familiar with String formatting, but some special character combinations (escape sequences) can be added to a String to format it. For example:
\n will insert a line break
\t is a tabulation
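As an illustration of the printing step, the sketch below formats a map with one movie per line and one tab-indented grade per line under it. To stay self-contained it uses String keys and Integer grades in place of the question's Movie and Grade classes, and it iterates over entrySet rather than keySet to avoid a second lookup per movie. GradePrinter is a made-up name:

```java
import java.util.List;
import java.util.Map;

class GradePrinter {
    // Builds the text instead of printing directly, so it is easy to check.
    static String format(Map<String, List<Integer>> gradedMovies) {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, List<Integer>> entry : gradedMovies.entrySet()) {
            out.append(entry.getKey()).append('\n');
            for (Integer grade : entry.getValue()) {
                out.append('\t').append(grade).append('\n');
            }
        }
        return out.toString();
    }
}
```

Calling System.out.print(GradePrinter.format(map)) would then write the result to the console. A HashMap gives no ordering guarantee, so use a LinkedHashMap or TreeMap if the order of the movies matters.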
Hope this helps to clear up some doubts. Good luck!
Check out Guava's Multimap. That is exactly what it does.
private Multimap<Movie, Grade> map = ArrayListMultimap.create();
public void addGrade(Movie movie, Grade grade){
map.put(movie, grade);
}
It will take care of creating the list for you.
public void addGrade(Movie movie, Grade grade) {
boolean found = false;
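for(Movie m : gradedMovies.keySet()) {
// compare the movies
if(m.equals(movie)) {
gradedMovies.get(m).add(grade);
found = true;
}
}
if(!found) {
// add() returns a boolean, so build the list first and then put it
List<Grade> grades = new ArrayList<Grade>();
grades.add(grade);
gradedMovies.put(movie, grades);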
}
}
gradedMovies.containsKey(movie);
for(Map.Entry<Movie,List<Grade>> entry : gradedMovies.entrySet()){
Movie key = entry.getKey();
}

Java using contains function to match string object ignore capital case?

I want the contains function to return true even if the extension is written in capital letters:
List<String> pformats= Arrays.asList("odt","ott","oth","odm","sxw","stw","sxg","doc","dot","xml","docx","docm","dotx","dotm","doc","wpd","wps","rtf","txt","csv","sdw","sgl","vor","uot","uof","jtd","jtt","hwp","602","pdb","psw","ods","ots","sxc","stc","xls","xlw","xlt","xlsx","xlsm","xltx","xltm","xlsb","wk1","wks","123","dif","sdc","vor","dbf","slk","uos","pxl","wb2","odp","odg","otp","sxi","sti","ppt","pps","pot","pptx","pptm","potx","potm","sda","sdd","sdp","vor","uop","cgm","bmp","dxf","emf","eps","met","pbm","pct","pcd","pcx","pgm","plt","ppm","psd","ras","sda","sdd","sgf","sgv","svm","tgs","tif","tiff","vor","wmf","xbm","xpm","jpg","jpeg","gif","png","pdf","log");
if(pformats.contains(extension)){
// do stuff
}
A Set is a better choice for a lookup.
private static final Set<String> P_FORMATS = new HashSet<String>(Arrays.asList(
"odt,ott,oth,odm,sxw,stw,sxg,doc,dot,xml,docx,docm,dotx,dotm,doc,wpd,wps,rtf,txt,csv,sdw,sgl,vor,uot,uof,jtd,jtt,hwp,602,pdb,psw,ods,ots,sxc,stc,xls,xlw,xlt,xlsx,xlsm,xltx,xltm,xlsb,wk1,wks,123,dif,sdc,vor,dbf,slk,uos,pxl,wb2,odp,odg,otp,sxi,sti,ppt,pps,pot,pptx,pptm,potx,potm,sda,sdd,sdp,vor,uop,cgm,bmp,dxf,emf,eps,met,pbm,pct,pcd,pcx,pgm,plt,ppm,psd,ras,sda,sdd,sgf,sgv,svm,tgs,tif,tiff,vor,wmf,xbm,xpm,jpg,jpeg,gif,png,pdf,log".split(",")));
if(P_FORMATS.contains(extension.toLowerCase())){
// do stuff
}
Short answer: it will not work as is. You can't override contains on that list, BUT you can use the following code:
List<String> pformats= Arrays.asList("odt","ott","oth","odm","sxw","stw","sxg","doc","dot","xml","docx","docm","dotx","dotm","doc","wpd","wps","rtf","txt","csv","sdw","sgl","vor","uot","uof","jtd","jtt","hwp","602","pdb","psw","ods","ots","sxc","stc","xls","xlw","xlt","xlsx","xlsm","xltx","xltm","xlsb","wk1","wks","123","dif","sdc","vor","dbf","slk","uos","pxl","wb2","odp","odg","otp","sxi","sti","ppt","pps","pot","pptx","pptm","potx","potm","sda","sdd","sdp","vor","uop","cgm","bmp","dxf","emf","eps","met","pbm","pct","pcd","pcx","pgm","plt","ppm","psd","ras","sda","sdd","sgf","sgv","svm","tgs","tif","tiff","vor","wmf","xbm","xpm","jpg","jpeg","gif","png","pdf","log");
if(pformats.contains(extension.toLowerCase())){
}
This converts your extension to lowercase; if all the extensions in your list are already lowercase, it'll work.
Convert your List of extensions into a regular expression, compile it with the CASE_INSENSITIVE flag, and use that.
import java.util.regex.Matcher;
import java.util.regex.Pattern;
public final class Foo {
public static void main(final String... args) {
final Pattern p = Pattern.compile("odt|ott|oth|odm|sxw|stw|sxg|doc|dot|xml|docx|docm|dotx|dotm|doc|wpd|wps|rtf|txt|csv|sdw|sgl|vor|uot|uof|jtd|jtt|hwp|602|pdb|psw|ods|ots|sxc|stc|xls|xlw|xlt|xlsx|xlsm|xltx|xltm|xlsb|wk1|wks|123|dif|sdc|vor|dbf|slk|uos|pxl|wb2|odp|odg|otp|sxi|sti|ppt|pps|pot|pptx|pptm|potx|potm|sda|sdd|sdp|vor|uop|cgm|bmp|dxf|emf|eps|met|pbm|pct|pcd|pcx|pgm|plt|ppm|psd|ras|sda|sdd|sgf|sgv|svm|tgs|tif|tiff|vor|wmf|xbm|xpm|jpg|jpeg|gif|png|pdf|log", Pattern.CASE_INSENSITIVE);
// Will be true
System.out.println(p.matcher("bmp").matches());
// Will be false
System.out.println(p.matcher("quasar").matches());
}
}
This would probably be easier to read/maintain if you build the regex programmatically, but I've left that as an exercise for the reader.
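One possible way to do that exercise, as a sketch: quote each extension, join them with "|", and compile the result case-insensitively. The class and method names are made up, and the code needs Java 8 for streams:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class ExtensionPattern {
    // Builds "ext1|ext2|..." from the list. Pattern.quote keeps any regex
    // metacharacters in an extension from being interpreted.
    static Pattern fromExtensions(List<String> extensions) {
        String alternation = extensions.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile(alternation, Pattern.CASE_INSENSITIVE);
    }
}
```

As in the answer above, use matcher(extension).matches() rather than find(), so that only a whole-string match counts.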
How about:
extension.toLowerCase()
?
Although I'm not 100% sure what the contains() method will do in this example. You might need to stick your extensions into a Set.
Edit: No, it won't work, as the contains method checks for the existence of a particular Object. Your string, even with the same value, is a different Object. So yes, either a) override the contains method, e.g. loop through the array and do a string comparison, or b) simpler, use a Set.
Edit 2: Apparently it will work, per the comments below, because ArrayList.contains() checks for equality with equals() (so you will get a string match), but this seems to disagree with the top-voted answer, which says it won't.
If all your formats are lower case, then toLowerCase combined with a HashSet is the preferred solution.
If your formats are in mixed case (and shall stay this way, as you are using them for other things, too) you need a real case-insensitive comparison.
Then a TreeSet (or other SortedSet) with a case-insensitive collator as the comparator will do. (It is not as fast as a HashSet, but will still be faster than the ArrayList, except for really small lists.)
Alternatively a HashSet variant using a custom hashCode and equals (or simply a normal HashSet on wrapper objects with a case insensitive implementation of equals and hashCode) would do fine.
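A short sketch of the TreeSet variant. It uses String.CASE_INSENSITIVE_ORDER, which is a plain Comparator from the JDK rather than a locale-aware Collator, so it is an approximation of what is described above. For file extensions that is probably enough. The class name is made up:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class CaseInsensitiveFormats {
    // The set keeps the original spelling of each entry, but contains()
    // compares through the comparator and therefore ignores case.
    static Set<String> of(String... formats) {
        Set<String> set = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        set.addAll(Arrays.asList(formats));
        return set;
    }
}
```

If locale-sensitive comparison really matters, a java.text.Collator could be passed to the TreeSet constructor instead, since Collator also implements Comparator.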
Add this extended List class:
private static class ListIgnoreCase extends java.util.LinkedList<String> {
public ListIgnoreCase(java.util.Collection<String> c) {
super();
addAll(c);
}
// Linear scan that compares every element ignoring case
public boolean containsIgnoreCase(String toSearch) {
for (String element : this)
if (String.valueOf(element).equalsIgnoreCase(toSearch))
return true;
return false;
}
}
}
Now you can wrap the result of asList like this:
if(new ListIgnoreCase(Arrays.asList("odt","ott","oth","odm"))
.containsIgnoreCase(extension)) {
...
You can use IterableUtils and Predicate from Apache Commons Collections 4.
List<String> pformats= Arrays.asList("odt","ott","oth","odm","sxw","stw","sxg","doc","dot","xml","docx","docm","dotx","dotm","doc","wpd","wps","rtf","txt","csv","sdw","sgl","vor","uot","uof","jtd","jtt","hwp","602","pdb","psw","ods","ots","sxc","stc","xls","xlw","xlt","xlsx","xlsm","xltx","xltm","xlsb","wk1","wks","123","dif","sdc","vor","dbf","slk","uos","pxl","wb2","odp","odg","otp","sxi","sti","ppt","pps","pot","pptx","pptm","potx","potm","sda","sdd","sdp","vor","uop","cgm","bmp","dxf","emf","eps","met","pbm","pct","pcd","pcx","pgm","plt","ppm","psd","ras","sda","sdd","sgf","sgv","svm","tgs","tif","tiff","vor","wmf","xbm","xpm","jpg","jpeg","gif","png","pdf","log");
Predicate<String> predicate = (s) -> StringUtils.equalsIgnoreCase(s, "JPG");
if(IterableUtils.matchesAny(pformats, predicate)) {
// do stuff
}
org.apache.commons.collections4.IterableUtils
