Sorting array for Binary Search - java

I have an array of objects (telephone directory entries, stored in the form Entry( surname,initials,extension) ) which I would like to search efficiently. In order to do this I'm trying to use Arrays.binarySearch(). I have two separate methods for searching the array, one using names and the other using numbers. The array is sorted by Surname in alphabetical order as I insert each element in the correct place in my addEntry() method. I can use binarySearch() when searching by name as the array is sorted in alphabetical order, but the problem I have is the array is not sorted when I search by number. I have overridden compareTo() in my Entry class for comparing surnames, but when I search by number I need to sort my array in ascending order of numbers, I am unsure how to do this?
public int lookupNumberByName(String surname, String initials) {
int index = 0;
if (countElements() == directory.length) {
Entry lookup = new Entry(surname, initials);
index = Arrays.binarySearch(directory, lookup);
}
else if (countElements() != directory.length) {
Entry[] origArray = directory;
Entry[] cutArray = Arrays
.copyOfRange(directory, 0, countElements());
directory = cutArray;
Entry lookup = new Entry(surname, initials);
index = Arrays.binarySearch(directory, lookup);
directory = origArray;
}
return index;
}
I would like to do something like this for my LookupByNumber() method -
public int LookupByNumber(int extension) {
Entry[] origArray1 = directory;
Entry[] cutArray1 = Arrays.copyOfRange(directory, 0, countElements());
directory = cutArray1;
Arrays.sort(directory); //sort in ascending order of numbers
Entry lookup1 = new Entry(extension);
int index1 = Arrays.binarySearch(directory, lookup1);
String surname1 = directory[index1].getSurname();
String initials1 = directory[index1].getInitials();
directory = origArray1;
int arrayPos = lookupNumberByName(surname1,initials1);
return arrayPos;
}
My compareTo method -
public int compareTo(Entry other) {
return this.surname.compareTo(other.getSurname());
}
Help very much appreciated
edit - I realize arrays are not the best data structure for this, but I have been specifically asked to use an array for this task.
Update - How exactly does sort(T[] a, Comparator<? super T> c) work? when I try writing my own Comparator -
public class numberSorter implements Comparator<Entry> {
@Override
public int compare(Entry o1, Entry o2) {
if (o1.getExtension() > o2.getExtension()) {
return 1;
}
if (o1.getExtension() == o2.getExtension()) {
return 0;
}
if (o1.getExtension() < o2.getExtension()) {
return -1;
}
return -1;
}
}
And calling Arrays.sort(directory,new numberSorter()); I get the following exception -
java.lang.NullPointerException
at java.lang.String.compareTo(Unknown Source)
at project.Entry.compareTo(Entry.java:45)
at project.Entry.compareTo(Entry.java:1)
at java.util.Arrays.binarySearch0(Unknown Source)
at java.util.Arrays.binarySearch(Unknown Source)
at project.ArrayDirectory.LookupByNumber(ArrayDirectory.java:128)
at project.test.main(test.java:29)
What exactly am I doing wrong?
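A note on the stack trace, offered as a sketch rather than a definitive diagnosis: the trace runs through Arrays.binarySearch and Entry.compareTo, not through Arrays.sort or the comparator. That suggests the search was done without the comparator, so it fell back to the surname-based compareTo() on a lookup Entry whose surname is null. Passing the same comparator to both sort and binarySearch avoids that. PhoneEntry, NumberLookup and BY_EXTENSION below are hypothetical stand-ins, not the asker's real classes.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for the asker's Entry class.
class PhoneEntry {
    final String surname;
    final String initials;
    final int extension;

    PhoneEntry(String surname, String initials, int extension) {
        this.surname = surname;
        this.initials = initials;
        this.extension = extension;
    }
}

class NumberLookup {
    // Orders entries by extension only; the surname is never touched.
    static final Comparator<PhoneEntry> BY_EXTENSION =
            Comparator.comparingInt(e -> e.extension);

    // Returns the entry with the given extension, or null if there is none.
    // 'count' is the number of filled slots at the front of 'directory'.
    static PhoneEntry lookupByNumber(PhoneEntry[] directory, int count, int extension) {
        // Work on a copy so the surname-sorted directory is left alone.
        PhoneEntry[] byNumber = Arrays.copyOfRange(directory, 0, count);
        Arrays.sort(byNumber, BY_EXTENSION);

        // The probe only needs the field the comparator looks at.
        PhoneEntry probe = new PhoneEntry(null, null, extension);
        int index = Arrays.binarySearch(byNumber, probe, BY_EXTENSION);
        return index >= 0 ? byNumber[index] : null;
    }
}
```

Sorting on every lookup costs O(n log n), which cancels the benefit of the binary search; keeping a second array that is maintained in extension order would avoid that.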

Rather than keeping the Entry objects in Arrays, keep them in Maps. For example, you'd have one Map that mapped the Surname to the Entry, and another that mapped the Extension to the Entry. You can then efficiently look up the entry by Surname or Extension by calling the get() method on the appropriate Map.
If the Map is a TreeMap, the lookup is about the same speed as a binary search, O(log n). If you use a HashMap, it can be even faster once you get a large number of entries.
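A minimal sketch of the two-map idea, assuming surnames and extensions are each unique (a real directory would probably need a composite key of surname and initials). DirectoryEntry and MapDirectory are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the Entry class in the question.
class DirectoryEntry {
    final String surname;
    final String initials;
    final int extension;

    DirectoryEntry(String surname, String initials, int extension) {
        this.surname = surname;
        this.initials = initials;
        this.extension = extension;
    }
}

class MapDirectory {
    // TreeMap keeps surnames sorted; lookups cost O(log n).
    private final Map<String, DirectoryEntry> bySurname = new TreeMap<>();
    // HashMap lookups are O(1) on average.
    private final Map<Integer, DirectoryEntry> byExtension = new HashMap<>();

    void addEntry(DirectoryEntry e) {
        // Both maps point at the same entry object.
        bySurname.put(e.surname, e);
        byExtension.put(e.extension, e);
    }

    // Returns the extension for a surname, or -1 if it is not present.
    int lookupNumberByName(String surname) {
        DirectoryEntry e = bySurname.get(surname);
        return e == null ? -1 : e.extension;
    }

    // Returns the surname for an extension, or null if it is not present.
    String lookupNameByNumber(int extension) {
        DirectoryEntry e = byExtension.get(extension);
        return e == null ? null : e.surname;
    }
}
```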


Should I sort a hashmap that contains frequency with bucketsort or heapsort?

I have a hashmap in Java in this form HashMap<String, Integer> frequency. The key is a string where I hold the name of a movie and the value is the frequency of the said movie.
My program takes input from users so whenever someone is adding a video to favorite I go in the hashmap and I increment its frequency.
Now the problem is that at some point I need to take the k most frequent movies. I've found that I could use bucketsort or heapsort in this leetcode problem (check the first comment), however I am not sure which is more efficient in my case. My hashmap constantly updates, therefore I need to call the sorting algorithm again every time a frequency changes.
From my understanding, it takes O(N) time to build the map, where 'N' is the number of movies even with duplicates as it needs to add to the frequency, which gets me 'M' unique movie titles. Would that mean that heapsort will result in O(M * log(k)) and bucketsort O(M) for any given k?
Having a map that sorts on values (the thing you map to) isn't a thing, unfortunately. You could instead have a set whose keys sort themselves on frequency, but given that frequency is the key at that point, you couldn't look up entries in this set without knowing the frequency beforehand which eliminates the point of the exercise.
One strategy that comes to mind is to have 2 separate data structures. One serves to let you look up the actual object based on the name of the movie, the other is to be self-sorting:
@Data
public class MovieFrequencyTuple implements Comparable<MovieFrequencyTuple> {
@NonNull private final String name;
private int frequency;
public void incrementFrequency() {
frequency++;
}
@Override public int compareTo(MovieFrequencyTuple other) {
int c = Integer.compare(frequency, other.frequency);
if (c != 0) return -c;
return name.compareTo(other.name);
}
}
and with that available to you:
SortedSet<MovieFrequencyTuple> frequencies = new TreeSet<>();
Map<String, MovieFrequencyTuple> movies = new HashMap<>();
public int increment(String movieName) {
MovieFrequencyTuple tuple = movies.get(movieName);
if (tuple == null) {
tuple = new MovieFrequencyTuple(movieName);
movies.put(movieName, tuple);
}
// Self-sorting data structures will just fail
// to do the job if you modify a sorting order on
// an object already in the collection. Thus,
// we take it out, modify, put it back in.
frequencies.remove(tuple);
tuple.incrementFrequency();
frequencies.add(tuple);
return tuple.getFrequency();
}
public int get(String movieName) {
MovieFrequencyTuple tuple = movies.get(movieName);
if (tuple == null) return 0;
return tuple.getFrequency();
}
public List<String> getTop10() {
var out = new ArrayList<String>();
for (MovieFrequencyTuple tuple : frequencies) {
out.add(tuple.getName());
if (out.size() == 10) break;
}
return out;
}
Each operation is amortized O(1) or O(log n), even the top10 operation. So, if you run 'increment a movie's frequency, then obtain the top 10' n times, the worst case scenario is O(n log n) performance.
NB: Uses lombok for constructors, getters, etc - if you don't like that, have your IDE generate these things.
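For comparison, the heap-based approach asked about in the question can be sketched on its own: a min-heap capped at k entries gives the O(M log k) behaviour mentioned there, at the cost of rescanning the map on every query. TopK and topK are hypothetical names; this is a sketch, not a tuned implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

class TopK {
    // Returns up to k keys with the highest counts, most frequent first.
    static List<String> topK(Map<String, Integer> frequency, int k) {
        // Min-heap on the count: the head is the weakest current candidate.
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(a.getValue(), b.getValue()));

        for (Map.Entry<String, Integer> e : frequency.entrySet()) {
            heap.offer(e);
            if (heap.size() > k) {
                heap.poll(); // drop the smallest so the heap never exceeds k
            }
        }

        // Polling yields ascending counts, so reverse at the end.
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        Collections.reverse(result);
        return result;
    }
}
```

Ties between equal counts come out in no particular order here; a secondary comparison on the name would make the result deterministic.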

How to match the exact string value in the list of comma separated string [duplicate]

I have a String[] with values like so:
public static final String[] VALUES = new String[] {"AB","BC","CD","AE"};
Given String s, is there a good way of testing whether VALUES contains s?
Arrays.asList(yourArray).contains(yourValue)
Warning: this doesn't work for arrays of primitives (see the comments).
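To make the warning concrete, here is a small sketch of what goes wrong with a primitive array. The class and method names are made up for the example.

```java
import java.util.Arrays;

class PrimitivePitfall {
    // Compiles, but does not do what it looks like it does.
    static boolean containsViaAsList(int[] values, int target) {
        // Arrays.asList(int[]) produces a List<int[]> holding ONE element,
        // the array itself, so the boxed Integer target never matches.
        return Arrays.asList(values).contains(target);
    }

    // Works for primitives: streams the ints and compares each one.
    static boolean containsViaStream(int[] values, int target) {
        return Arrays.stream(values).anyMatch(x -> x == target);
    }
}
```

With an Integer[] instead of an int[], Arrays.asList(...).contains(...) behaves as expected, because the elements are then objects.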
Since java-8 you can now use Streams.
String[] values = {"AB","BC","CD","AE"};
boolean contains = Arrays.stream(values).anyMatch("s"::equals);
To check whether an array of int, double or long contains a value use IntStream, DoubleStream or LongStream respectively.
Example
int[] a = {1,2,3,4};
boolean contains = IntStream.of(a).anyMatch(x -> x == 4);
Concise update for Java SE 9
Reference arrays are bad. For this case we are after a set. Since Java SE 9 we have Set.of.
private static final Set<String> VALUES = Set.of(
"AB","BC","CD","AE"
);
"Given String s, is there a good way of testing whether VALUES contains s?"
VALUES.contains(s)
O(1).
The right type, immutable, O(1) and concise. Beautiful.*
Original answer details
Just to clear the code up to start with. We have (corrected):
public static final String[] VALUES = new String[] {"AB","BC","CD","AE"};
This is a mutable static which FindBugs will tell you is very naughty. Do not modify statics and do not allow other code to do so also. At an absolute minimum, the field should be private:
private static final String[] VALUES = new String[] {"AB","BC","CD","AE"};
(Note, you can actually drop the new String[]; bit.)
Reference arrays are still bad and we want a set:
private static final Set<String> VALUES = new HashSet<String>(Arrays.asList(
new String[] {"AB","BC","CD","AE"}
));
(Paranoid people, such as myself, may feel more at ease if this was wrapped in Collections.unmodifiableSet - it could then even be made public.)
(*To be a little more on brand, the collections API is predictably still missing immutable collection types and the syntax is still far too verbose, for my tastes.)
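A sketch of the pre-Java 9 variant wrapped in Collections.unmodifiableSet, as suggested above. AllowedValues and isAllowed are hypothetical names.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

class AllowedValues {
    // The wrapper rejects add/remove, so exposing the field publicly is safe.
    public static final Set<String> VALUES = Collections.unmodifiableSet(
            new HashSet<>(Arrays.asList("AB", "BC", "CD", "AE")));

    static boolean isAllowed(String s) {
        return VALUES.contains(s);
    }
}
```

Any attempt to modify VALUES throws UnsupportedOperationException, and because the backing HashSet is never exposed, nothing else can change it either.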
You can use ArrayUtils.contains from Apache Commons Lang
public static boolean contains(Object[] array, Object objectToFind)
Note that this method returns false if the passed array is null.
There are also methods available for primitive arrays of all kinds.
Example:
String[] fieldsToInclude = { "id", "name", "location" };
if ( ArrayUtils.contains( fieldsToInclude, "id" ) ) {
// Do some stuff.
}
Just simply implement it by hand:
public static <T> boolean contains(final T[] array, final T v) {
for (final T e : array)
if (e == v || v != null && v.equals(e))
return true;
return false;
}
Improvement:
The v != null condition is constant inside the method. It always evaluates to the same Boolean value during the method call. So if the input array is big, it is more efficient to evaluate this condition only once, and we can use a simplified/faster condition inside the for loop based on the result. The improved contains() method:
public static <T> boolean contains2(final T[] array, final T v) {
if (v == null) {
for (final T e : array)
if (e == null)
return true;
}
else {
for (final T e : array)
if (e == v || v.equals(e))
return true;
}
return false;
}
Four Different Ways to Check If an Array Contains a Value
Using List:
public static boolean useList(String[] arr, String targetValue) {
return Arrays.asList(arr).contains(targetValue);
}
Using Set:
public static boolean useSet(String[] arr, String targetValue) {
Set<String> set = new HashSet<String>(Arrays.asList(arr));
return set.contains(targetValue);
}
Using a simple loop:
public static boolean useLoop(String[] arr, String targetValue) {
for (String s: arr) {
if (s.equals(targetValue))
return true;
}
return false;
}
Using Arrays.binarySearch():
The code below is only correct when the array is already sorted; it is listed here for completeness. binarySearch() can ONLY be used on sorted arrays, and on an unsorted array the result is undefined. When the array is sorted, this is the best option.
public static boolean binarySearch(String[] arr, String targetValue) {
return Arrays.binarySearch(arr, targetValue) >= 0;
}
Quick Example:
String testValue="test";
String newValueNotInList="newValue";
String[] valueArray = { "this", "is", "java" , "test" };
Arrays.asList(valueArray).contains(testValue); // returns true
Arrays.asList(valueArray).contains(newValueNotInList); // returns false
If the array is not sorted, you will have to iterate over everything and make a call to equals on each.
If the array is sorted, you can do a binary search, there's one in the Arrays class.
Generally speaking, if you are going to do a lot of membership checks, you may want to store everything in a Set, not in an array.
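The sorted-array route can be sketched as follows, assuming the array is sorted once up front and then queried many times. SortedLookup is a hypothetical helper name.

```java
import java.util.Arrays;

class SortedLookup {
    private final String[] sorted;

    SortedLookup(String[] values) {
        // Sort a copy once so the caller's array keeps its original order.
        sorted = values.clone();
        Arrays.sort(sorted);
    }

    boolean contains(String target) {
        // binarySearch returns a negative number when the key is absent.
        return Arrays.binarySearch(sorted, target) >= 0;
    }
}
```

The one-off sort costs O(n log n) and each lookup O(log n), so this only pays off when there are many lookups; a HashSet is usually the simpler choice.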
For what it's worth I ran a test comparing the 3 suggestions for speed. I generated random integers, converted them to a String and added them to an array. I then searched for the highest possible number/string, which would be a worst case scenario for the asList().contains().
When using a 10K array size the results were:
Sort & Search : 15
Binary Search : 0
asList.contains : 0
When using a 100K array the results were:
Sort & Search : 156
Binary Search : 0
asList.contains : 32
So if the array is created in sorted order the binary search is the fastest, otherwise the asList().contains would be the way to go. If you have many searches, then it may be worthwhile to sort the array so you can use the binary search. It all depends on your application.
I would think those are the results most people would expect. Here is the test code:
import java.util.*;
public class Test {
public static void main(String args[]) {
long start = 0;
int size = 100000;
String[] strings = new String[size];
Random random = new Random();
for (int i = 0; i < size; i++)
strings[i] = "" + random.nextInt(size);
start = System.currentTimeMillis();
Arrays.sort(strings);
System.out.println(Arrays.binarySearch(strings, "" + (size - 1)));
System.out.println("Sort & Search : "
+ (System.currentTimeMillis() - start));
start = System.currentTimeMillis();
System.out.println(Arrays.binarySearch(strings, "" + (size - 1)));
System.out.println("Search : "
+ (System.currentTimeMillis() - start));
start = System.currentTimeMillis();
System.out.println(Arrays.asList(strings).contains("" + (size - 1)));
System.out.println("Contains : "
+ (System.currentTimeMillis() - start));
}
}
Instead of using the quick array initialisation syntax too, you could just initialise it as a List straight away in a similar manner using the Arrays.asList method, e.g.:
public static final List<String> STRINGS = Arrays.asList("firstString", "secondString" ...., "lastString");
Then you can do (like above):
STRINGS.contains("the string you want to find");
With Java 8 you can create a stream and check if any entries in the stream matches "s":
String[] values = {"AB","BC","CD","AE"};
boolean sInArray = Arrays.stream(values).anyMatch("s"::equals);
Or as a generic method:
public static <T> boolean arrayContains(T[] array, T value) {
return Arrays.stream(array).anyMatch(value::equals);
}
You can use the Arrays class to perform a binary search for the value. If your array is not sorted, you will have to use the sort functions in the same class to sort the array, then search through it.
ObStupidAnswer (but I think there's a lesson in here somewhere):
enum Values {
AB, BC, CD, AE
}
try {
Values.valueOf(s);
return true;
} catch (IllegalArgumentException exc) {
return false;
}
Actually, if you use HashSet<String> as Tom Hawtin proposed you don't need to worry about sorting, and your speed is the same as with binary search on a presorted array, probably even faster.
It all depends on how your code is set up, obviously, but from where I stand, the order would be:
On an unsorted array:
HashSet
asList
sort & binary
On a sorted array:
HashSet
Binary
asList
So either way, HashSet for the win.
Developers often do:
Set<String> set = new HashSet<String>(Arrays.asList(arr));
return set.contains(targetValue);
The above code works, but there is no need to convert the list to a set first. Converting a list to a set requires extra time. It can be as simple as:
Arrays.asList(arr).contains(targetValue);
or
for (String s : arr) {
if (s.equals(targetValue))
return true;
}
return false;
The first one is more readable than the second one.
If you have the google collections library, Tom's answer can be simplified a lot by using ImmutableSet (http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/ImmutableSet.html)
This really removes a lot of clutter from the initialization proposed
private static final Set<String> VALUES = ImmutableSet.of("AB","BC","CD","AE");
In Java 8 use Streams.
List<String> myList =
Arrays.asList("a1", "a2", "b1", "c2", "c1");
myList.stream()
.filter(s -> s.startsWith("c"))
.map(String::toUpperCase)
.sorted()
.forEach(System.out::println);
One possible solution:
import java.util.Arrays;
import java.util.List;
public class ArrayContainsElement {
public static final List<String> VALUES = Arrays.asList("AB", "BC", "CD", "AE");
public static void main(String args[]) {
if (VALUES.contains("AB")) {
System.out.println("Contains");
} else {
System.out.println("Not contains");
}
}
}
Using a simple loop is the most efficient way of doing this.
boolean useLoop(String[] arr, String targetValue) {
for(String s: arr){
if(s.equals(targetValue))
return true;
}
return false;
}
Courtesy to Programcreek
the shortest solution
the array VALUES may contain duplicates
since Java 9
List.of(VALUES).contains(s);
Use the following (the contains() method is ArrayUtils.in() in this code):
ObjectUtils.java
public class ObjectUtils {
/**
* A null safe method to detect if two objects are equal.
* @param object1
* @param object2
* @return true if either both objects are null, or equal, else returns false.
*/
public static boolean equals(Object object1, Object object2) {
return object1 == null ? object2 == null : object1.equals(object2);
}
}
ArrayUtils.java
public class ArrayUtils {
/**
* Find the index of an object in the given array,
* starting from the given inclusive index.
* @param ts Array to be searched in.
* @param t Object to be searched.
* @param start The index from where the search must start.
* @return Index of the given object in the array if it is there, else -1.
*/
public static <T> int indexOf(final T[] ts, final T t, int start) {
for (int i = start; i < ts.length; ++i)
if (ObjectUtils.equals(ts[i], t))
return i;
return -1;
}
/**
* Find the index of an object in the given array, starting from 0.
* @param ts Array to be searched in.
* @param t Object to be searched.
* @return indexOf(ts, t, 0)
*/
public static <T> int indexOf(final T[] ts, final T t) {
return indexOf(ts, t, 0);
}
/**
* Detect if the given object is in the given array.
* @param ts Array to be searched in.
* @param t Object to be searched.
* @return If indexOf(ts, t) is greater than -1.
*/
public static <T> boolean in(final T[] ts, final T t) {
return indexOf(ts, t) > -1;
}
}
As you can see in the code above, there are other utility methods, ObjectUtils.equals() and ArrayUtils.indexOf(), that were used in other places as well.
For arrays of limited length use the following (as given by camickr). This is slow for repeated checks, especially for longer arrays (linear search).
Arrays.asList(...).contains(...)
For fast performance if you repeatedly check against a larger set of elements
An array is the wrong structure. Use a TreeSet and add each element to it. It sorts elements and has a fast contains() method (a tree search costing O(log n), comparable to a binary search).
If the elements implement Comparable & you want the TreeSet sorted accordingly:
ElementClass.compareTo() method must be compatible with ElementClass.equals(): see Triads not showing up to fight? (Java Set missing an item)
TreeSet myElements = new TreeSet();
// Do this for each element (implementing *Comparable*)
myElements.add(nextElement);
// *Alternatively*, if an array is forcibly provided from other code:
myElements.addAll(Arrays.asList(myArray));
Otherwise, use your own Comparator:
class MyComparator implements Comparator<ElementClass> {
public int compare(ElementClass element1, ElementClass element2) {
// Your comparison of elements
// Should be consistent with object equality
}
boolean equals(Object otherComparator) {
// Your equality of comparators
}
}
// construct TreeSet with the comparator
TreeSet myElements = new TreeSet(new MyComparator());
// Do this for each element (implementing *Comparable*)
myElements.add(nextElement);
The payoff: check existence of some element:
// Fast binary search through sorted elements (performance ~ log(size)):
boolean containsElement = myElements.contains(someElement);
If you don't want it to be case sensitive
Arrays.stream(VALUES).anyMatch(s::equalsIgnoreCase);
Try this:
ArrayList<Integer> arrlist = new ArrayList<Integer>(8);
// use add() method to add elements in the list
arrlist.add(20);
arrlist.add(25);
arrlist.add(10);
arrlist.add(15);
boolean retval = arrlist.contains(10);
if (retval == true) {
System.out.println("10 is contained in the list");
}
else {
System.out.println("10 is not contained in the list");
}
Check this
String[] VALUES = new String[]{"AB", "BC", "CD", "AE"};
String s;
for (int i = 0; i < VALUES.length; i++) {
if (VALUES[i].equals(s)) {
// do your stuff
} else {
//do your stuff
}
}
Arrays.asList() -> then calling the contains() method will always work, but a search algorithm is much better since you don't need to create a lightweight list wrapper around the array, which is what Arrays.asList() does.
public boolean findString(String[] strings, String desired){
for (String str : strings){
if (desired.equals(str)) {
return true;
}
}
return false; //if we get here… there is no desired String, return false.
}
Use below -
String[] values = {"AB","BC","CD","AE"};
String s = "A";
boolean contains = Arrays.stream(values).anyMatch(v -> v.contains(s));
Use Arrays.binarySearch(array, obj) to check whether the given object is in the array. The array must be sorted first.
Example:
if (Arrays.binarySearch(str, i) > -1) → true (exists)
otherwise → false (does not exist)
Try using Java 8 predicate test method
Here is a full example of it.
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
public class Test {
public static final List<String> VALUES =
Arrays.asList("AA", "AB", "BC", "CD", "AE");
public static void main(String args[]) {
Predicate<String> containsLetterA = VALUES -> VALUES.contains("AB");
for (String i : VALUES) {
System.out.println(containsLetterA.test(i));
}
}
}
http://mytechnologythought.blogspot.com/2019/10/java-8-predicate-test-method-example.html
https://github.com/VipulGulhane1/java8/blob/master/Test.java
Create a boolean initially set to false. Run a loop to check every value in the array and compare to the value you are checking against. If you ever get a match, set boolean to true and stop the looping. Then assert that the boolean is true.
As I'm dealing with low level Java using primitive types byte and byte[], the best so far I got is from bytes-java https://github.com/patrickfav/bytes-java seems a fine piece of work
You can check it by two methods
A) By converting the array into string and then check the required string by .contains method
String a = Arrays.toString(VALUES);
System.out.println(a.contains("AB"));
System.out.println(a.contains("BC"));
System.out.println(a.contains("CD"));
System.out.println(a.contains("AE"));
B) This is a more efficient method
Scanner s = new Scanner(System.in);
String u = s.next();
boolean d = true;
for (int i = 0; i < VAL.length; i++) {
if (VAL[i].equals(u) == d)
System.out.println(VAL[i] + " " + u + VAL[i].equals(u));
}

compare Object attribute value inside list

class Order
{
String name;
Order(String n)
{ name = n; }
//setter and getters of name
}
Order a = new Order("same");
Order b = new Order("same");
Order c = new Order("diff");
List<Order> nameList; // a, b, c
I want to separate the list of Orders into:
List<Order> dupList; // a, b
List<Order> nondupList; // c
I want to check whether the same name appears in multiple orders of "nameList".
I achieved that by taking each Order by its index and comparing it with the Orders at every other index.
Is there a better way to achieve this?
One other way could be to override the hashCode() and equals() methods, computing the hash code from the name string.
public class Order {
String name;
public Order(String n) {
name = n;
}
// setter and getters of name
@Override
public int hashCode() {
int h = 0;
int len = name.length();
for (int i = 0; i < len; i++)
h = 31 * h + name.charAt(i);
return h;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (!(obj instanceof Order))
return false;
// Compare the names themselves; equal hash codes alone do not prove equality.
return name.equals(((Order) obj).name);
}
}
...
List<Order> nameList = ...;// a,b,c
Set<Order> nonDuplicate= new HashSet<Order>(nameList);
If you want to use pure Java, add the elements to a List and sort it with the appropriate comparator. Then iterate over the list, keeping track of the previous element, doing a control break; in other words, if the element is the same as the previous one, both are duplicates. If they are not the same (or it is the first element), it is a candidate and you need to wait for the next check to find out whether it is a duplicate.
If you don't want to sort, you can add the elements to a Set as they appear; if before adding an element it is already in the set, you can add it to the duplicate set. You can do the check on both sets removing as you go, or remove from the complete set the duplicates at the end. You can use any collection, but Set is more efficient since it has a fast contains method.
If you can use libraries, you can just use Guava and add everything to a multiset (http://google-collections.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Multiset.html ) Then iterate over the multiset and you have the count per element.
You could use a Map<String, List<Order>>: get the list for the given name; if it is null, create it and put it in the map; then add the current order to that list.
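The map-of-lists idea could look roughly like this. NamedOrder and OrderSplitter are hypothetical stand-ins for the Order class and the surrounding code in the question, and computeIfAbsent assumes Java 8 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the Order class in the question.
class NamedOrder {
    private final String name;

    NamedOrder(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

class OrderSplitter {
    final List<NamedOrder> dupList = new ArrayList<>();
    final List<NamedOrder> nondupList = new ArrayList<>();

    OrderSplitter(List<NamedOrder> orders) {
        // Group orders by name; LinkedHashMap keeps first-seen order of names.
        Map<String, List<NamedOrder>> byName = new LinkedHashMap<>();
        for (NamedOrder o : orders) {
            byName.computeIfAbsent(o.getName(), k -> new ArrayList<>()).add(o);
        }
        // A group with more than one member means the name is shared.
        for (List<NamedOrder> group : byName.values()) {
            if (group.size() > 1) {
                dupList.addAll(group);
            } else {
                nondupList.addAll(group);
            }
        }
    }
}
```

This makes a single pass over the orders plus a pass over the groups, and it does not require Order to override equals() or hashCode().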

how to get duplicated and non duplicated element of arrayList?

I have an object Riziv with three variables: id, cnk and product. I search a databank for these objects and add them to an ArrayList, ArrayList<Riziv> list.
Now I need to check whether all objects in this list have the same cnk and return true if so; otherwise I should return all objects that do not have the same cnk, with an error message.
public class Riziv{ String id, cnk, product; }
ArrayList<Riziv> list = getArrayListFromDatabank(id);
public void getDuplicatedWhichHasTheSameCnk(){
}
}
Using standard JVM structures (MultiMap is provided by guava), you can do that:
public List<Riziv> getDuplicates(final List<Riziv> l)
{
final HashMap<String, List<Riziv>> m = new HashMap<String, List<Riziv>>();
final List<Riziv> ret = new ArrayList<Riziv>();
String cnk;
for (final Riziv r: l) {
cnk = r.getCnk();
if (!m.containsKey(cnk))
m.put(cnk, new ArrayList<Riziv>());
m.get(cnk).add(r);
}
List<Riziv> tmp;
for (final Map.Entry<String, List<Riziv>> entry: m.entrySet()) {
tmp = entry.getValue();
if (tmp.size() == 1) // no dups
continue;
ret.addAll(tmp);
}
return ret;
}
ret will contain the duplicates. You can change that function to return a Map<String, List<Riziv>> instead, and filter out entries where the list size is only one. You'll then get a map with the conflicting cnks as keys and a list of dups as values.
I am not clear exactly what you want however I suspect you want something like this.
MultiMap<Key, Riziv> multiMap =
List<Riziv> list =
for(Riziv r: list)
multiMap.put(r.getCnk(), r);
for(Key cnk: multiMap.keySet()) {
Collection<Riziv> sameCnk = multiMap.get(cnk);
// check size and compare entries
}
The multi-map will have the list of Riziv objects for each Cnk.
One way to do it is to write a comparator that sorts the list by the cnk String and then compare each consecutive cnk String to the next. If there is a duplicate, the two entries will be right next to each other.
1.) Sort the list using a comparator by sorting on the cnk variable.
2.) Compare each element in the list to the next for duplicates.
There's probably many other ways to solve this, this is just the first that came to mind.
I did not test this so you have been forewarned lol:
ArrayList<Riziv> rizArray = new ArrayList<Riziv>();
//Sort the array by the CNK variable.
Collections.sort(rizArray, new Comparator<Riziv>(){
@Override
public int compare(Riziv arg0, Riziv arg1) {
//Return the comparison of the Strings.
//Use .compareToIgnoreCase if you want to ignore upper/lower case.
return arg0.getCnk().compareTo(arg1.getCnk());
}
});
//List should be in alphabetical order at this point.
List<Riziv> duplicates = new ArrayList<Riziv>();
Riziv rizPrevious = null;
for(Riziv riz: rizArray){
if(rizPrevious == null){
rizPrevious = riz;
continue;
}
if(riz.getCnk().compareTo(rizPrevious.getCnk()) == 0){
duplicates.add(riz);
}
rizPrevious = riz;
}

Find element position in a Java TreeMap

I am working with a TreeMap of Strings, TreeMap<String, String>, and using it to implement a dictionary of words.
I then have a collection of files, and would like to create a representation of each file in the vector space (space of words) defined by the dictionary.
Each file should have a vector representing it with following properties:
vector should have same size as dictionary
for each word contained in the file the vector should have a 1 in the position corresponding to the word position in dictionary
for each word not contained in the file the vector should have a -1 in the position corresponding to the word position in dictionary
So my idea is to use a Vector<Boolean> to implement these vectors. (This way of representing documents in a collection is called Boolean Model - http://www.site.uottawa.ca/~diana/csi4107/L3.pdf)
The problem I am facing in the procedure to create this vector is that I need a way to find position of a word in the dictionary, something like this:
String key;
int i = get_position_of_key_in_Treemap(key); <--- purely invented method...
1) Is there any method like this I can use on a TreeMap? If not, could you provide some code to help me implement it myself?
2) Is there an iterator on TreeMap (it's alphabetically ordered on keys) from which I can get the position?
3) Should I use another class to implement the dictionary (if you think that TreeMap can't do what I need)? If yes, which?
Thanks in advance.
ADDED PART:
Solution proposed by dasblinkenlight looks fine but has the problem of complexity (linear with dimension of dictionary due to copying keys into an array), and the idea of doing it for each file is not acceptable.
Any other ideas for my questions?
Once you have constructed your tree map, copy its sorted keys into an array, and use Arrays.binarySearch to look up the index in O(logN) time. If you need the value, do a lookup on the original map too.
Edit: this is how you copy keys into an array
String[] mapKeys = new String[treeMap.size()];
int pos = 0;
for (String key : treeMap.keySet()) {
mapKeys[pos++] = key;
}
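Putting the two steps together, a sketch might look like this, assuming the TreeMap uses natural String ordering (with a custom comparator, the same comparator would have to be passed to Arrays.binarySearch) and that the map is not modified after the array is built. KeyIndex and positionOf are hypothetical names.

```java
import java.util.Arrays;
import java.util.TreeMap;

class KeyIndex {
    private final String[] sortedKeys;

    KeyIndex(TreeMap<String, ?> dictionary) {
        // A TreeMap's keySet() iterates in ascending key order,
        // so the resulting array is already sorted.
        sortedKeys = dictionary.keySet().toArray(new String[0]);
    }

    // Returns the 0-based position of the word in the dictionary, or -1.
    int positionOf(String word) {
        int index = Arrays.binarySearch(sortedKeys, word);
        return index >= 0 ? index : -1;
    }
}
```

The array is built once in O(n); after that each lookup is O(log n), so the copy does not need to be repeated per file as long as the dictionary stays fixed.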
An alternative solution would be to use TreeMap's headMap method. If the word exists in the TreeMap, then the size() of its head map is equal to the index of the word in the dictionary. It may be a bit wasteful compared to my other answer, though.
Here is how you code it in Java:
import java.util.*;
class Test {
public static void main(String[] args) {
TreeMap<String,String> tm = new TreeMap<String,String>();
tm.put("quick", "one");
tm.put("brown", "two");
tm.put("fox", "three");
tm.put("jumps", "four");
tm.put("over", "five");
tm.put("the", "six");
tm.put("lazy", "seven");
tm.put("dog", "eight");
for (String s : new String[] {
"quick", "brown", "fox", "jumps", "over",
"the", "lazy", "dog", "before", "way_after"}
) {
if (tm.containsKey(s)) {
// Here is the operation you are looking for.
// It does not work for items not in the dictionary.
int pos = tm.headMap(s).size();
System.out.println("Key '"+s+"' is at the position "+pos);
} else {
System.out.println("Key '"+s+"' is not found");
}
}
}
}
Here is the output produced by the program:
Key 'quick' is at the position 6
Key 'brown' is at the position 0
Key 'fox' is at the position 2
Key 'jumps' is at the position 3
Key 'over' is at the position 5
Key 'the' is at the position 7
Key 'lazy' is at the position 4
Key 'dog' is at the position 1
Key 'before' is not found
Key 'way_after' is not found
https://github.com/geniot/indexed-tree-map
I had the same problem. So I took the source code of java.util.TreeMap and wrote IndexedTreeMap. It implements my own IndexedNavigableMap:
public interface IndexedNavigableMap<K, V> extends NavigableMap<K, V> {
K exactKey(int index);
Entry<K, V> exactEntry(int index);
int keyIndex(K k);
}
The implementation is based on updating node weights in the red-black tree when it is changed. Weight is the number of child nodes beneath a given node, plus one - self. For example when a tree is rotated to the left:
private void rotateLeft(Entry<K, V> p) {
if (p != null) {
Entry<K, V> r = p.right;
int delta = getWeight(r.left) - getWeight(p.right);
p.right = r.left;
p.updateWeight(delta);
if (r.left != null) {
r.left.parent = p;
}
r.parent = p.parent;
if (p.parent == null) {
root = r;
} else if (p.parent.left == p) {
delta = getWeight(r) - getWeight(p.parent.left);
p.parent.left = r;
p.parent.updateWeight(delta);
} else {
delta = getWeight(r) - getWeight(p.parent.right);
p.parent.right = r;
p.parent.updateWeight(delta);
}
delta = getWeight(p) - getWeight(r.left);
r.left = p;
r.updateWeight(delta);
p.parent = r;
}
}
updateWeight simply updates weights up to the root:
void updateWeight(int delta) {
weight += delta;
Entry<K, V> p = parent;
while (p != null) {
p.weight += delta;
p = p.parent;
}
}
And when we need to find the element by index here is the implementation that uses weights:
public K exactKey(int index) {
if (index < 0 || index > size() - 1) {
throw new ArrayIndexOutOfBoundsException();
}
return getExactKey(root, index);
}
private K getExactKey(Entry<K, V> e, int index) {
if (e.left == null && index == 0) {
return e.key;
}
if (e.left == null && e.right == null) {
return e.key;
}
if (e.left != null && e.left.weight > index) {
return getExactKey(e.left, index);
}
if (e.left != null && e.left.weight == index) {
return e.key;
}
return getExactKey(e.right, index - (e.left == null ? 0 : e.left.weight) - 1);
}
The weights also come in very handy for finding the index of a key:
public int keyIndex(K key) {
if (key == null) {
throw new NullPointerException();
}
Entry<K, V> e = getEntry(key);
if (e == null) {
throw new NullPointerException();
}
if (e == root) {
return getWeight(e) - getWeight(e.right) - 1;//index to return
}
int index = 0;
int cmp;
if (e.left != null) {
index += getWeight(e.left);
}
Entry<K, V> p = e.parent;
// split comparator and comparable paths
Comparator<? super K> cpr = comparator;
if (cpr != null) {
while (p != null) {
cmp = cpr.compare(key, p.key);
if (cmp > 0) {
index += getWeight(p.left) + 1;
}
p = p.parent;
}
} else {
Comparable<? super K> k = (Comparable<? super K>) key;
while (p != null) {
if (k.compareTo(p.key) > 0) {
index += getWeight(p.left) + 1;
}
p = p.parent;
}
}
return index;
}
You can find the result of this work at https://github.com/geniot/indexed-tree-map
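To see the weight idea in isolation, here is a minimal sketch on a plain, unbalanced binary search tree. It is not the IndexedTreeMap code: the class name SizedBst and its methods are hypothetical, there is no red-black rebalancing, and keys are assumed to be Strings. Each node stores the size of its subtree, which is enough to answer both "key at index" and "index of key" by walking a single path from the root.

```java
// Hypothetical illustration class; not part of IndexedTreeMap or the JDK.
class SizedBst {
    private static final class Node {
        final String key;
        Node left, right;
        int weight = 1; // number of nodes in this subtree, including this node
        Node(String key) { this.key = key; }
    }

    private Node root;

    private static int weight(Node n) { return n == null ? 0 : n.weight; }

    // Plain unbalanced insert; duplicate keys are ignored.
    public void add(String key) { root = add(root, key); }

    private Node add(Node n, String key) {
        if (n == null) return new Node(key);
        int cmp = key.compareTo(n.key);
        if (cmp < 0) n.left = add(n.left, key);
        else if (cmp > 0) n.right = add(n.right, key);
        // Recompute the weight on the way back up the insertion path.
        n.weight = 1 + weight(n.left) + weight(n.right);
        return n;
    }

    // Key at the given 0-based position in sorted order.
    public String exactKey(int index) {
        if (index < 0 || index >= weight(root)) throw new IndexOutOfBoundsException();
        Node n = root;
        while (true) {
            int leftWeight = weight(n.left);
            if (index < leftWeight) {
                n = n.left;
            } else if (index == leftWeight) {
                return n.key;
            } else {
                index -= leftWeight + 1; // skip the left subtree and this node
                n = n.right;
            }
        }
    }

    // 0-based position of key in sorted order, or -1 if the key is absent.
    public int keyIndex(String key) {
        int index = 0;
        Node n = root;
        while (n != null) {
            int cmp = key.compareTo(n.key);
            if (cmp < 0) {
                n = n.left;
            } else if (cmp > 0) {
                index += weight(n.left) + 1; // everything here sorts before key
                n = n.right;
            } else {
                return index + weight(n.left);
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        SizedBst t = new SizedBst();
        for (String s : new String[] {"quick", "brown", "fox", "jumps"}) t.add(s);
        System.out.println(t.keyIndex("fox")); // 1
        System.out.println(t.exactKey(3));     // quick
    }
}
```

Without balancing, the path length (and so the cost) can degrade to O(n) for sorted input; keeping the weights correct across rotations is exactly the extra work the red-black version has to do.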
There's no such implementation in the JDK itself. Although TreeMap iterates in natural key ordering, its internal data structures are all based on trees and not arrays (remember that Maps, by definition, do not order their keys, even though ordered keys are a very common use case).
That said, you have to make a choice, because with your comparison criteria it is not possible to have O(1) computation time both for insertion into the Map and for the indexOf(key) calculation. This is because lexicographical order is not stable in a mutable data structure (as opposed to insertion order, for instance). For example, once you insert the first key-value pair (entry) into the map, its position is the first one. However, depending on the second key inserted, that position might change, as the new key may be "greater" or "lower" than the one already in the Map. You can certainly implement this by maintaining and updating an indexed list of keys during the insertion operation, but then each insert costs O(n) to shift elements into place (or O(n log(n)) if you re-sort the array every time). That might or might not be acceptable, depending on your data access patterns.
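As a rough sketch of the "indexed list of keys" option, the class below keeps the keys in a sorted ArrayList and uses Collections.binarySearch both to find the insertion point and to answer index queries. The class name SortedKeyList is made up for illustration, and String keys are assumed. Lookups are O(log n), while each insert is O(n) because the list has to shift elements.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, named here only for illustration.
class SortedKeyList {
    private final List<String> keys = new ArrayList<>();

    // Inserts key at its sorted position; returns false if it is already present.
    public boolean add(String key) {
        int pos = Collections.binarySearch(keys, key);
        if (pos >= 0) return false;
        // For a missing key, binarySearch returns -(insertionPoint) - 1.
        keys.add(-pos - 1, key);
        return true;
    }

    // 0-based index of key in sorted order, or -1 if absent.
    public int indexOf(String key) {
        int pos = Collections.binarySearch(keys, key);
        return pos >= 0 ? pos : -1;
    }

    public String get(int index) { return keys.get(index); }

    public static void main(String[] args) {
        SortedKeyList list = new SortedKeyList();
        for (String s : new String[] {"quick", "brown", "fox"}) list.add(s);
        System.out.println(list.indexOf("fox"));   // 1
        System.out.println(list.indexOf("quick")); // 2
    }
}
```

Note that the index of an existing key can change after a later insert, which is the instability described above.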
ListOrderedMap and LinkedMap in Apache Commons both come close to what you need, but they rely on insertion order. You can check out their implementations and develop your own solution to the problem with little to moderate effort, I believe (it should just be a matter of replacing ListOrderedMap's internal backing array with a sorted list, such as TreeList in Apache Commons).
You can also calculate the index yourself by counting the number of elements that are lower than the given key (which should be faster than iterating through the list searching for your element in the most frequent case, as you're not comparing anything).
I agree with Isolvieira. Perhaps the best approach would be to use a different structure than TreeMap.
However, if you still want to go with computing the index of the keys, a solution would be to count how many keys are lower than the key you are looking for.
Here is a code snippet:
java.util.SortedMap<String, String> treeMap = new java.util.TreeMap<String, String>();
treeMap.put("d", "content 4");
treeMap.put("b", "content 2");
treeMap.put("c", "content 3");
treeMap.put("a", "content 1");
String key = "d"; // key to get the index for
System.out.println( treeMap.keySet() );
final String firstKey = treeMap.firstKey(); // assuming treeMap structure doesn't change in the mean time
System.out.format( "Index of %s is %d %n", key, treeMap.subMap(firstKey, key).size() );
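The same count can be wrapped in a small helper. This is only a sketch: the class and method names are invented for illustration, and it uses TreeMap.headMap(key), which returns the view of keys strictly less than the given key, so there is no need to fetch firstKey() first. It returns -1 for keys that are not in the map.

```java
import java.util.TreeMap;

// Hypothetical helper class, for illustration only.
class HeadMapIndex {
    // Returns the 0-based rank of key among the map's keys, or -1 if the key is absent.
    public static <K, V> int indexOf(TreeMap<K, V> map, K key) {
        if (!map.containsKey(key)) {
            return -1;
        }
        // headMap(key) is a view of all entries whose keys are strictly less than key.
        return map.headMap(key).size();
    }

    public static void main(String[] args) {
        TreeMap<String, String> treeMap = new TreeMap<>();
        treeMap.put("d", "content 4");
        treeMap.put("b", "content 2");
        treeMap.put("c", "content 3");
        treeMap.put("a", "content 1");
        System.out.println(indexOf(treeMap, "d")); // 3
        System.out.println(indexOf(treeMap, "x")); // -1
    }
}
```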
I'd like to thank all of you for the effort you put into answering my question. The answers were all very useful, and taking the best from each of them led me to the solution I actually implemented in my project.
What I believe to be the best answers to my individual questions are:
2) There is no Iterator defined on TreeMap itself. As @Isoliveira says:
There's no such implementation in the JDK itself.
Although TreeMap iterates in natural key ordering,
its internal data structures are all based on trees and not arrays
(remember that Maps do not order keys, by definition,
in spite of that the very common use case).
As I found in the SO answer "How to iterate over a TreeMap?", the only way to iterate over the elements of a Map is to use map.entrySet() and the Iterators defined on Set (or on some other class that provides Iterators).
3) It's possible to use a TreeMap to implement the Dictionary, but this gives a complexity of O(log N) for finding the index of a contained word (the cost of a lookup in a tree data structure).
Using a HashMap with the same procedure will instead have complexity O(1).
1) No such method exists. The only solution is to implement it yourself.
As @Paul stated:
Assumes that once getPosition() has been called, the dictionary is not changed.
The solution assumes that once the Dictionary is created it will not be changed afterwards; that way the position of a word will always be the same.
Given this assumption, I found a solution that builds the Dictionary in O(N log N) time (dominated by a single sort) and afterwards guarantees that the index of a contained word can be looked up in constant time, O(1).
I defined the Dictionary as a HashMap like this:
public HashMap<String, WordStruct> dictionary = new HashMap<String, WordStruct>();
key --> the String representing the word contained in Dictionary
value --> an Object of a created class WordStruct
where WordStruct class is defined like this:
public class WordStruct {
// Position of the word in the dictionary once it is alphabetically ordered.
private int DictionaryPosition;
public WordStruct(){
}
// Setter needs an explicit void return type to compile.
public void SetWordPosition(int pos){
this.DictionaryPosition = pos;
}
}
and it allows me to store any other attribute I would like to associate with a word's entry in the Dictionary.
Now I fill the dictionary by iterating over all the words contained in all the files of my collection:
THE FOLLOWING IS PSEUDOCODE
for (int i = 0; i < number_of_files; i++) {
get_file(i);
while (file_contains_words) {
dictionary.put(next_word, new WordStruct());
}
}
Once the HashMap is filled, in whatever order, I use the procedure indicated by @dasblinkenlight to order it once and for all. The sort costs O(N log N):
Object[] dictionaryArray = dictionary.keySet().toArray();
Arrays.sort(dictionaryArray);
for(int i = 0; i < dictionaryArray.length; i++){
String word = (String) dictionaryArray[i];
dictionary.get(word).SetWordPosition(i);
}
From now on, to get the position of a word in alphabetical order, the only thing needed is to access its DictionaryPosition variable:
since the word is known, you just need to look it up, and this has constant cost in a HashMap.
Thanks again, and I wish you all a Merry Christmas!!
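The whole procedure can be condensed into one runnable sketch. It simplifies the approach above by storing the position directly as an Integer value instead of inside a WordStruct; the class name PositionIndex and the method build are hypothetical. Building costs one sort, O(N log N), and each later lookup is a single HashMap get, O(1) expected.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch; stores the position directly instead of in a WordStruct.
class PositionIndex {
    // Builds a word -> alphabetical position map from an arbitrary sequence of words.
    public static Map<String, Integer> build(Iterable<String> words) {
        Map<String, Integer> positions = new HashMap<>();
        for (String w : words) {
            positions.put(w, -1); // placeholder value; the map also drops duplicates
        }
        String[] sorted = positions.keySet().toArray(new String[0]);
        Arrays.sort(sorted); // the only O(N log N) step
        for (int i = 0; i < sorted.length; i++) {
            positions.put(sorted[i], i); // overwrite the placeholder with the final index
        }
        return positions;
    }

    public static void main(String[] args) {
        Map<String, Integer> positions = build(Arrays.asList("pear", "apple", "fig", "apple"));
        System.out.println(positions.get("apple")); // 0
        System.out.println(positions.get("pear"));  // 2
    }
}
```

As with the original, the positions are only valid as long as no word is added or removed after the build step.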
Have you thought to make the values in your TreeMap contain the position in your dictionary? I am using a BitSet here for my file details.
This doesn't work nearly as well as my other idea below.
Map<String,Integer> dictionary = new TreeMap<String,Integer> ();
private void test () {
// Construct my dictionary.
buildDictionary();
// Make my file data.
String [] file1 = new String[] {
"1", "3", "5"
};
BitSet fileDetails = getFileDetails(file1, dictionary);
printFileDetails("File1", fileDetails);
}
private void printFileDetails(String fileName, BitSet details) {
System.out.println("File: "+fileName);
for ( int i = 0; i < details.length(); i++ ) {
System.out.print ( details.get(i) ? 1: -1 );
if ( i < details.length() - 1 ) {
System.out.print ( "," );
}
}
}
private BitSet getFileDetails(String [] file, Map<String, Integer> dictionary ) {
BitSet details = new BitSet();
for ( String word : file ) {
// The value in the dictionary is the index of the word in the dictionary.
details.set(dictionary.get(word));
}
return details;
}
String [] dictionaryWords = new String[] {
"1", "2", "3", "4", "5"
};
private void buildDictionary () {
for ( String word : dictionaryWords ) {
// Initially make the value 0. We will change that later.
dictionary.put(word, 0);
}
// Make the indexes.
int wordNum = 0;
for ( String word : dictionary.keySet() ) {
dictionary.put(word, wordNum++);
}
}
Here the building of the file details consists of a single lookup in the TreeMap for each word in the file.
If you were planning to use the value in the dictionary TreeMap for something else you could always compose it with an Integer.
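One way to read "compose it with an Integer" is a small value class that carries the position alongside whatever else the map needs to hold. The sketch below is an assumption about what that could look like; IndexedValue, payload and build are made-up names. It relies on the documented behaviour that TreeMap.values() iterates in ascending key order.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of a map value that carries its own dictionary position.
class IndexedValueDemo {
    static final class IndexedValue<V> {
        int position = -1; // filled in once the dictionary is complete
        final V payload;   // whatever else the value was meant to hold
        IndexedValue(V payload) { this.payload = payload; }
    }

    public static Map<String, IndexedValue<String>> build(String[] words) {
        Map<String, IndexedValue<String>> dict = new TreeMap<>();
        for (String w : words) {
            dict.put(w, new IndexedValue<>("info about " + w));
        }
        // values() follows ascending key order, so a running counter yields the index.
        int i = 0;
        for (IndexedValue<String> v : dict.values()) {
            v.position = i++;
        }
        return dict;
    }

    public static void main(String[] args) {
        Map<String, IndexedValue<String>> dict = build(new String[] {"c", "a", "b"});
        System.out.println(dict.get("b").position); // 1
        System.out.println(dict.get("b").payload);  // info about b
    }
}
```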
Added
Thinking about it further, if the value field of the Map is earmarked for something you could always use special keys that calculate their own position in the Map and act just like Strings for comparison.
private void test () {
// Dictionary
Map<PosKey, String> dictionary = new TreeMap<PosKey, String> ();
// Fill it with words.
String[] dictWords = new String[] {
"0", "1", "2", "3", "4", "5"};
for ( String word : dictWords ) {
dictionary.put( new PosKey( dictionary, word ), word );
}
// File
String[] fileWords = new String[] {
"0", "2", "3", "5"};
int[] file = new int[dictionary.size()];
// Initially all -1.
for ( int i = 0; i < file.length; i++ ) {
file[i] = -1;
}
// Temp file words set.
Set<String> fileSet = new HashSet<String>( Arrays.asList( fileWords ) );
for ( PosKey key : dictionary.keySet() ) {
if ( fileSet.contains( key.getKey() ) ) {
file[key.getPosition()] = 1;
}
}
// Print out.
System.out.println( Arrays.toString( file ) );
// Prints: [1, -1, 1, 1, -1, 1]
}
class PosKey
implements Comparable<PosKey> {
final String key;
// Initially -1
int position = -1;
// The map I am keying on.
Map<PosKey, ?> map;
public PosKey ( Map<PosKey, ?> map, String word ) {
this.key = word;
this.map = map;
}
public int getPosition () {
if ( position == -1 ) {
// First access to the key.
int pos = 0;
// Calculate all positions in one loop.
for ( PosKey k : map.keySet() ) {
k.position = pos++;
}
}
return position;
}
public String getKey () {
return key;
}
@Override
public int compareTo ( PosKey it ) {
return key.compareTo( it.key );
}
// Keep equals consistent with compareTo and hashCode.
@Override
public boolean equals ( Object o ) {
return o instanceof PosKey && key.equals( ( ( PosKey )o ).key );
}
@Override
public int hashCode () {
return key.hashCode();
}
}
NB: Assumes that once getPosition() has been called, the dictionary is not changed.
I would suggest that you write a SkipList to store your dictionary, since this will still offer O(log N) lookups, insertion and removal while also being able to provide an index (tree implementations generally cannot return an index, since the nodes don't know it and there would be a cost to keeping them updated). Unfortunately, Java's ConcurrentSkipListMap does not provide an index, so you would need to implement your own version.
Getting the index of an item would be O(log N). If you wanted both the index and the value without doing two lookups, you would need to return a wrapper object holding both.
