Best big set data structure in Java - java

I need to find gaps in a big Integer Set populated by a read loop through files, and I want to know whether something already exists for this purpose, to avoid using a simple Set object with the risk of heap overflow.
To better explain my question, I have to tell you how my Java ticketing software works.
Every ticket has a global progressive number stored in a daily log file along with other information. I have to write a check procedure to verify whether there are gaps in the numbers inside the daily log files.
The first idea was to loop over all the log files, read each line, get the ticket number, store it in an Integer TreeSet, and then find the gaps in this Set.
The problem is that the ticket numbers can be very high and could saturate the heap space, and I want a solution that still works if I have to switch to Long objects.
The Set solution wastes a lot of memory: if I find that there are no gaps in the first 100 numbers, it makes no sense to keep them in the Set.
How can I solve this? Is there an existing data structure for this purpose?

I'm assuming that (A) the gaps you are looking for are the exception and not the rule and (B) the log files you are processing are mostly sorted by ticket number (though some out-of-sequence entries are OK).
If so, then I'd think about rolling your own data structure for this. Here's a quick example of what I mean (with a lot left to the reader).
Basically what it does is implement Set but actually store it as a Map, with each entry representing a range of contiguous values in the set.
The add method is overridden to maintain the backing Map appropriately. E.g., if you add 5 to the set and already have a range containing 4, then it just extends that range instead of adding a new entry.
Note that the reason for the "mostly sorted" assumption is that, for totally unsorted data, this approach will still use a lot of memory: the backing map will grow large (as unsorted entries get added all over the place) before growing smaller (as additional entries fill in the gaps, allowing contiguous entries to be combined).
Here's the code:
package com.matt.tester;
import java.util.Collection;
import java.util.Comparator;
import java.util.Iterator;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
public class SE {
public class RangeSet<T extends Long> implements SortedSet<T> {
private final TreeMap<T, T> backingMap = new TreeMap<T,T>();
@Override
public int size() {
// TODO Auto-generated method stub
return 0;
}
@Override
public boolean isEmpty() {
// TODO Auto-generated method stub
return false;
}
@Override
public boolean contains(Object o) {
if ( ! ( o instanceof Number ) ) {
throw new IllegalArgumentException();
}
T n = (T) o;
// Find the backingMap entry with the greatest key less than or equal to n
Map.Entry<T,T> floorEntry = backingMap.floorEntry(n);
if ( floorEntry == null ) {
return false;
}
final Long endOfRange = floorEntry.getValue();
if ( endOfRange >= n) {
return true;
}
return false;
}
@Override
public Iterator<T> iterator() {
throw new IllegalAccessError("Method not implemented. Left for the reader. (You'd need a custom Iterator class, I think)");
}
@Override
public Object[] toArray() {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public <T> T[] toArray(T[] a) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public boolean add(T e) {
if ( (Long) e < 1L ) {
throw new IllegalArgumentException("This example only supports counting numbers, mainly because it simplifies printGaps() later on");
}
if ( this.contains(e) ) {
// Already in the set, so there is nothing to add.
return false;
}
final T eMinusOne = (T) (Long) (e-1L);
final T nextEntryKey = (T) (Long) (e+1L);
if ( this.contains(eMinusOne ) ) {
// Find the greatest backingSet entry less than e
Map.Entry<T,T> floorEntry = backingMap.floorEntry(e);
final T startOfPrecedingRange;
startOfPrecedingRange = floorEntry.getKey();
if ( this.contains(nextEntryKey) ) {
// This addition will join two previously separated ranges
T endOfRange = backingMap.get(nextEntryKey);
backingMap.remove(nextEntryKey);
// Extend the prior entry to include the whole range
backingMap.put(startOfPrecedingRange, endOfRange);
return true;
} else {
// This addition will extend the range immediately preceding
backingMap.put(startOfPrecedingRange, e);
return true;
}
} else if ( this.backingMap.containsKey(nextEntryKey) ) {
// This addition will extend the range immediately following
T endOfRange = backingMap.get(nextEntryKey);
backingMap.remove(nextEntryKey);
// Extend the prior entry to include the whole range
backingMap.put(e, endOfRange);
return true;
} else {
// This addition is a new range, it doesn't touch any others
backingMap.put(e,e);
return true;
}
}
@Override
public boolean remove(Object o) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public boolean containsAll(Collection<?> c) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public boolean addAll(Collection<? extends T> c) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public boolean retainAll(Collection<?> c) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public boolean removeAll(Collection<?> c) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public void clear() {
this.backingMap.clear();
}
@Override
public Comparator<? super T> comparator() {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public SortedSet<T> subSet(T fromElement, T toElement) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public SortedSet<T> headSet(T toElement) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public SortedSet<T> tailSet(T fromElement) {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public T first() {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
@Override
public T last() {
throw new IllegalAccessError("Method not implemented. Left for the reader.");
}
public void printGaps() {
Long lastContiguousNumber = 0L;
for ( Map.Entry<T, T> entry : backingMap.entrySet() ) {
Long startOfNextRange = (Long) entry.getKey();
Long endOfNextRange = (Long) entry.getValue();
if ( startOfNextRange > lastContiguousNumber + 1 ) {
System.out.println( String.valueOf(lastContiguousNumber+1) + ".." + String.valueOf(startOfNextRange - 1) );
}
lastContiguousNumber = endOfNextRange;
}
System.out.println( String.valueOf(lastContiguousNumber+1) + "..infinity");
System.out.println("Backing map size is " + this.backingMap.size());
System.out.println(backingMap.toString());
}
}
public static void main(String[] args) {
SE se = new SE();
RangeSet<Long> testRangeSet = se.new RangeSet<Long>();
// Start by putting 1,000,000 entries into the map with a few, pre-determined, hardcoded gaps
for ( long i = 1; i <= 1000000; i++ ) {
// Our pre-defined gaps...
if ( i == 58349 || ( i >= 87333 && i <= 87777 ) || i == 303998 ) {
// Do not put these numbers in the set
} else {
testRangeSet.add(i);
}
}
testRangeSet.printGaps();
}
}
And the output is:
58349..58349
87333..87777
303998..303998
1000001..infinity
Backing map size is 4
{1=58348, 58350=87332, 87778=303997, 303999=1000000}

I believe it's a perfect moment to get familiar with the Bloom filter. It's a wonderful probabilistic data structure which can be used to prove immediately that an element isn't in the set.
How does it work? The idea is pretty simple, the boost is more complicated, and an implementation can be found in Guava.
The idea
Initialize the filter as an array of bits long enough to hold the maximum value of the hash function used. When adding an element to the set, calculate its hash, determine which bits of the hash are 1s, and make sure that all of them are set to 1 in the filter (array). When you want to check whether an element is in the set, calculate its hash and check that every bit that is 1 in the hash is also 1 in the filter. If any of those bits is 0 in the filter, the element definitely isn't in the set. If all of them are set to 1, the element might be in the set, so you have to fall back to an exact check, such as looping through all of the elements.
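To make the idea concrete, here is a minimal hand-rolled sketch built on java.util.BitSet. It is not Guava's implementation: the class name TinyBloomFilter, the choice of two index functions, and the mixing constants are all assumptions made for illustration, and a production filter would use stronger hashing and a size derived from the expected number of insertions.

```java
import java.util.BitSet;

// Illustrative only: a tiny Bloom filter for long keys (hypothetical class, not Guava's).
class TinyBloomFilter {
    private final BitSet bits;
    private final int numBits;

    TinyBloomFilter(int numBits) {
        this.numBits = numBits;
        this.bits = new BitSet(numBits);
    }

    // Two cheap, arbitrary mixing functions that map a key to a bit position.
    private int index1(long key) {
        return (int) Math.floorMod(key * 0x9E3779B97F4A7C15L, (long) numBits);
    }

    private int index2(long key) {
        long h = key ^ (key >>> 33);
        return (int) Math.floorMod(h * 0xC2B2AE3D27D4EB4FL, (long) numBits);
    }

    // Adding an element sets both of its bit positions.
    void add(long key) {
        bits.set(index1(key));
        bits.set(index2(key));
    }

    // false means "definitely absent"; true only means "possibly present".
    boolean mightContain(long key) {
        return bits.get(index1(key)) && bits.get(index2(key));
    }
}
```

A false answer can be trusted without further work; a true answer still needs an exact check against the real data.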
The Boost
A simple probabilistic model tells you how big the filter (and the range of the hash function) should be to keep the chance of a false positive acceptably low. A false positive is the situation where all the bits are 1s but the element isn't in the set.
Implementation
The Guava implementation provides the following static factory method for the Bloom filter: BloomFilter.create(Funnel funnel, int expectedInsertions, double falsePositiveProbability). You can configure the filter yourself through expectedInsertions and falsePositiveProbability.
False positive
Some people are wary of Bloom filters because of the possibility of false positives. A Bloom filter can be used in a way that doesn't rely on a positive answer alone: if the filter says the element might be present, you loop through the elements and check one by one whether it really is in the set.
Possible usage
In your case, I'd create the filter for the set, then after all tickets are added simply loop through all the numbers (as you have to loop anyway) and check whether filter.mightContain(number) says each one might be in the set. If you set falsePositiveProbability to 3%, you'll achieve complexity around O(n^2-0.03m*n) where m stands for the number of gaps. Correct me if I'm wrong with the complexity estimation.

Well either you store everything in memory, and you risk overflowing the heap, or you don't store it in memory and you need to do a lot of computing.
I would suggest something in between: store only the minimum information needed during processing. You could store the endpoints of each known gap-free sequence in a class with two Long fields, and keep all these sequence objects in a sorted list. When you find a new number, iterate through the list to see if it is adjacent to one of the endpoints. If so, change the endpoint to the new number and check whether you can merge the adjacent sequence objects (and hence remove one of them). If not, create a new sequence object at the properly sorted place.
This will end up being O(n) in memory usage and O(n) in CPU usage. But using any data structure which stores information about all numbers will also be O(n) in memory usage, and O(n*lookuptime) in CPU if lookup is not done in constant time.

Read as many ticket numbers as you can fit into available memory.
Sort them, and write the sorted list to a temporary file. Depending on the expected number of gaps, it might save time and space to use a run-length–encoding scheme when writing the sorted numbers.
After all the ticket numbers have been sorted into temporary files, you can merge them into a single, sorted stream of ticket numbers, looking for gaps.
If this would result in too many temporary files to open at once for merging, groups of files can be merged into intermediate files, and so on, maintaining the total number below a workable limit. However, this extra copying can slow the process significantly.
The old tape-drive algorithms are still relevant.
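The merge-and-scan step can be sketched as follows. This is only an illustration under stated assumptions: each sorted run is represented by an Iterator<Long> (in practice it would read one temporary file), and the class and method names MergeGapScan and findGaps are made up for this example.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative sketch: k-way merge of sorted runs, reporting gaps along the way.
class MergeGapScan {
    // Returns each gap as a two-element array {firstMissing, lastMissing}.
    static List<long[]> findGaps(List<Iterator<Long>> sortedRuns) {
        // Each queue entry is {currentValue, runIndex}; the smallest value is polled first.
        PriorityQueue<long[]> heads = new PriorityQueue<>((x, y) -> Long.compare(x[0], y[0]));
        for (int i = 0; i < sortedRuns.size(); i++) {
            Iterator<Long> run = sortedRuns.get(i);
            if (run.hasNext()) heads.add(new long[] {run.next(), i});
        }
        List<long[]> gaps = new ArrayList<>();
        boolean first = true;
        long previous = 0;
        while (!heads.isEmpty()) {
            long[] head = heads.poll();
            long value = head[0];
            // A jump of more than one between consecutive merged values is a gap.
            if (!first && value > previous + 1) {
                gaps.add(new long[] {previous + 1, value - 1});
            }
            previous = value;
            first = false;
            // Refill the queue from the run that supplied this value.
            Iterator<Long> run = sortedRuns.get((int) head[1]);
            if (run.hasNext()) heads.add(new long[] {run.next(), head[1]});
        }
        return gaps;
    }
}
```

Only one value per run is held in memory at a time, so the memory cost depends on the number of runs rather than the number of tickets.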

Here is an idea: if you know in advance the range of your numbers, then
1. Pre-calculate the sum of all the numbers that you expect to be there.
2. Then keep reading your numbers and produce the sum of all read numbers as well as the count of your numbers.
3. If the sum you come up with is the same as the pre-calculated one, then there are no gaps.
4. If the sum is different and the count of your numbers is short of the expected count by just one, then pre-calculated sum - actual sum will give you your missing number.
5. If the count of your numbers is short by more than one, then you will know how many numbers are missing and what their sum is.
The best part is that you will not need to store the collection of your numbers in memory.
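The steps above can be sketched roughly like this. The sketch assumes the numbers contain no duplicates and all fall within the known range; the names SumGapCheck and findSingleMissing are hypothetical, and the long[] parameter stands in for the loop that would read the log files.

```java
// Illustrative sketch: find a single missing number in [first, last]
// from a running sum and count, without storing the numbers themselves.
class SumGapCheck {
    // Returns the missing number if exactly one is missing, or -1 if none are missing.
    // Throws if more than one is missing, since the sum alone cannot identify them.
    static long findSingleMissing(long first, long last, long[] seen) {
        long expectedCount = last - first + 1;
        // Arithmetic series sum; note that this can overflow for very large ranges.
        long expectedSum = (first + last) * expectedCount / 2;
        long actualSum = 0;
        long actualCount = 0;
        for (long n : seen) {
            actualSum += n;
            actualCount++;
        }
        long missingCount = expectedCount - actualCount;
        if (missingCount == 0) return -1;
        if (missingCount == 1) return expectedSum - actualSum;
        throw new IllegalStateException(
                missingCount + " numbers missing, summing to " + (expectedSum - actualSum));
    }
}
```

As step 5 says, with more than one number missing this only yields how many are missing and their sum, not the individual values.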

Related

Binary search not detecting duplicates?

I have an array of items, cards, all with String names, so
Card c1 = new Card("TheCard");
Card c2 = new Card("TheOtherCard");
And then I am using a quicksort to sort the list and then trying a binary search to see if cards already exist before adding more
So,
if(cards.contains(c3)==true)
//do nothing
else
cards.add(c3)
And my cards.contains method is
Comparator<Card> c = new Comparator<Card>() {
@Override
public int compare(Card u1, Card u2) {
return u1.getName().compareTo(u2.getName());
}
};
int index;
index = Collections.binarySearch(cards, it, c);
if (index == -1) {
return false;
} else {
return true;
}
But the problem is that when it searches the cards list, it reports cards that aren't in the list as present and cards that are in the list as absent.
I am trying to add 10,000 cards, 8,000 of them being unique, but the contains method is returning 2,000 unique cards, and when I check the list they're not even unique: https://i.imgur.com/N9kQtms.png
I've tried running the code unsorted, and that just returns about 4,000 results with the same problem of repeated cards. When I brute-force it and just use the base .contains, it works, but it is super slow.
(Also sorry if I messed up something in my post, it is my first time posting here)
The javadoc states the following:
Searches the specified list for the specified object using the binary search algorithm. The list must be sorted into ascending order according to the specified comparator (as by the sort(List, Comparator) method), prior to making this call. If it is not sorted, the results are undefined. If the list contains multiple elements equal to the specified object, there is no guarantee which one will be found.
It also states that it returns:
the index of the search key, if it is contained in the list; otherwise, (-(insertion point) - 1). The insertion point is defined as the point at which the key would be inserted into the list: the index of the first element greater than the key, or list.size() if all elements in the list are less than the specified key. Note that this guarantees that the return value will be >= 0 if and only if the key is found.
Your list must therefore be sorted beforehand, or the result won't make sense. Also note that the method returns either the index of the element or an encoded insertion point. Beware of this technicality: a return value other than -1 does not mean the element was found, so you should check that the index is non-negative and that the element at that index is in fact the one you are looking for.
You could use this test to see if it is your card:
// Test whether the card at the index found has the same name as the card you are looking for.
return index >= 0 && cards.get(index).getName().equals(it.getName());
You could also override equals to have something closer to:
return index >= 0 && cards.get(index).equals(it);
In both cases, checking index >= 0 first ensures we won't get an IndexOutOfBoundsException when the element is absent and a negative value is returned.
The binarySearch gives a non-negative index when it finds an item.
It gives the complement of the insert position: ~index == -index-1 when it is not found.
Search d in a b d e gives 2.
Search d in a b e g gives ~2 == -3, the insert position being 2.
So the check is:
int index = Collections.binarySearch(cards, it, c);
return index >= 0;
Furthermore Card should have a correct equality:
public class Card implements Comparable<Card> {
...
@Override
public int compareTo(Card other) {
return name.compareTo(other.name);
}
@Override
public boolean equals(Object obj) {
if (!(obj instanceof Card)) {
return false;
}
Card other = (Card) obj;
return name.equals(other.name);
}
@Override
public int hashCode() {
return name.hashCode();
}
}
In this case, instead of a Comparator you can implement Comparable<Card>, as the name is the real identification of a card. A Comparator is more for sorting persons by last name + first name, or first name + last name, or by city.
The hashCode allows using HashMap<Card, ...>.
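The negative return value is also useful for keeping the list sorted while adding. The following is a minimal sketch of that, using plain Strings instead of Card to stay self-contained; SortedInsert and addIfAbsent are hypothetical names.

```java
import java.util.Collections;
import java.util.List;

// Illustrative sketch: add an element only if absent, keeping the list sorted.
class SortedInsert {
    static boolean addIfAbsent(List<String> sorted, String name) {
        int index = Collections.binarySearch(sorted, name);
        if (index >= 0) {
            return false; // already present
        }
        // A negative result encodes the insertion point as -(insertionPoint) - 1.
        int insertionPoint = -index - 1;
        sorted.add(insertionPoint, name);
        return true;
    }
}
```

Because every insert lands at the decoded insertion point, the list never needs to be re-sorted before the next search. Inserting into the middle of an ArrayList still shifts elements, so for many insertions a TreeSet or HashSet may be a better fit.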

How to write an implementation class for a Bag (or Multiset) ADT?

I have an assignment in which I need to write an implementation class for a Bag (or Multiset) ADT. Problem is, the assignment is worded in a way that's hard to follow and I'm not sure what exactly I need to do.
Here is the assignment description and here is the interface I was provided. This is my implementation class so far. I haven't written any of my methods yet because I'm not sure where to go from here, especially in regards to the 3 different constructors.
package Bags;
import java.io.*;
public class ConBag implements Bag, Serializable {
private String[] items; // The items in the bag
private int itemCount; // The number of items
private int size; // The size of the bag
// This constructor creates a new empty bag able to hold 100 items.
public ConBag ( ) {
this(100);
}; // Constructor
// This constructor creates a new bag with a specified capacity.
public ConBag ( int size ) {
items = new String[size];
}; // Constructor
// This constructor takes an array of Strings and copies them into a bag of 100 or fewer items.
public ConBag ( String[] items ) {
}; // Constructor
public void add ( String item ) {
try{
if(!contains(item) && (!(size == items.length))){
items[itemCount] = item;
itemCount++;
}
}catch (NoSpaceException exception) {
System.out.println("Bag is full.");
}
}; // Add
public void remove ( String item ) {
for (int i=0; i<size; i++) {
if (contains(item)) {
items[i] = items[itemCount-1];
}else {
NoItemException exception;
System.out.println("Item not in bag.");
}
}
};
public int cardinality ( ) {
return itemCount;
};
public boolean contains ( String item ) {
for (int i=0; i<itemCount; i++) {
if(items[i].equals(item))
return true;
}
return false;
};
public int count ( String item ) {
int count;
return count;
};
public String draw ( ) {
};
}
I feel like I'm missing something important, but I don't know what. I already have NoItemException and NoSpaceException, but I don't think I need to include them in this post as they're pretty basic. Any help or a nudge in the right direction would be great. Thanks!
You need to allow duplication, therefore using String array as a data structure makes things difficult. It's better to use a map where the key is a String and the value is an Integer.
It's unclear what the capacity limit is, so for now you can define a private int member called room, and whenever you intend to add a String, check cardinality against room. If it's smaller, then increment the value of the map entry if it exists. If it does not exist, create it with a value of 1.
remove should check for contains. If the Map you have does not contain the item, throw an exception. Otherwise decrement the value of the map entry if it's higher than 1. If it is 1, then just remove it from the map.
To calculate cardinality traverse the map and calculate the sum of the values.
contains should be simple, you will just have to call a method of your map. count should be simple as well.
draw is interesting. First, calculate the cardinality and use it as the exclusive upper bound of your random number. Initialize a sum to 0 and start traversing the map. On each iteration, increase sum by the value of the map entry. If the random number is smaller than sum, call remove with the key of that entry, return the key, and exit the loop.
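Put together, the map-based approach might look something like the sketch below. MapBag and its method signatures are assumptions, since the assignment's Bag interface is not shown here. The capacity check (room) is left out, IllegalArgumentException stands in for the asker's NoItemException, and the cardinality is kept in a counter rather than recomputed by traversing the map.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Illustrative sketch of a map-backed bag (hypothetical class, not the assignment's interface).
class MapBag {
    private final Map<String, Integer> counts = new HashMap<>();
    private final Random random = new Random();
    private int cardinality = 0; // total number of items, duplicates included

    void add(String item) {
        counts.merge(item, 1, Integer::sum); // create with 1, or increment
        cardinality++;
    }

    void remove(String item) {
        Integer n = counts.get(item);
        if (n == null) throw new IllegalArgumentException("Item not in bag: " + item);
        if (n == 1) counts.remove(item); else counts.put(item, n - 1);
        cardinality--;
    }

    int cardinality() { return cardinality; }
    boolean contains(String item) { return counts.containsKey(item); }
    int count(String item) { return counts.getOrDefault(item, 0); }

    // Removes and returns a random item, weighted by how many copies are in the bag.
    String draw() {
        if (cardinality == 0) throw new IllegalStateException("Bag is empty");
        int target = random.nextInt(cardinality); // 0 <= target < cardinality
        int sum = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            sum += e.getValue();
            if (target < sum) {
                String item = e.getKey();
                remove(item); // safe: we return immediately and do not keep iterating
                return item;
            }
        }
        throw new AssertionError("unreachable: counts and cardinality out of sync");
    }
}
```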
EDIT
If you need to do this with an array of String items, then you can do so, but you will also need to store an integer for each String, that would be another array and the easiest representation would be to ensure that every item in the String array would be associated to the int value in the int array at the same index. Not too elegant, but can be used. Now, in this case you could not use Map methods, but will need to implement stuff yourself.

Can you have collections without storing the values in Java?

I have a question about Java collections such as Set or List, and more generally objects that you can use in a for-each loop. Is there any requirement that their elements actually have to be stored somewhere in a data structure, or can they be described only by some sort of requirement and calculated on the fly when you need them? It feels like this should be possible, but I don't see any of the Java standard collection classes doing anything like this. Am I breaking any sort of contract here?
The thing I'm thinking about using these for is mainly mathematics. Say for example I want to have a set representing all prime numbers under 1 000 000. It might not be a good idea to save these in memory but to instead have a method check if a particular number is in the collection or not.
I'm also not at all an expert at java streams, but I feel like these should be usable in java 8 streams since the objects have very minimal state (the objects in the collection doesn't even exist until you try to iterate over them or check if a particular object exists in the collection).
Is it possible to have Collections or Iterators with virtually infinitely many elements, for example "all numbers of the form 6*k+1", "all primes above 10" or "all vectors spanned by this basis"? One other thing I'm thinking about is combining two sets, like taking the intersection of all primes below 1 000 000 and all integers of the form 2^n-1 to list the Mersenne primes below 1 000 000. I feel like it would be easier to reason about certain mathematical objects if it was done this way and the elements weren't created explicitly until they are actually needed. Maybe I'm wrong.
Here's two mockup classes I wrote to try to illustrate what I want to do. They don't act exactly as I would expect (see output) which make me think I am breaking some kind of contract here with the iterable interface or implementing it wrong. Feel free to point out what I'm doing wrong here if you see it or if this kind of code is even allowed under the collections framework.
import java.util.AbstractSet;
import java.util.Iterator;
public class PrimesBelow extends AbstractSet<Integer>{
int max;
int size;
public PrimesBelow(int max) {
this.max = max;
}
@Override
public Iterator<Integer> iterator() {
return new SetIterator<Integer>(this);
}
@Override
public int size() {
if(this.size == -1){
System.out.println("Calculating size");
size = calculateSize();
}else{
System.out.println("Accessing calculated size");
}
return size;
}
private int calculateSize() {
int c = 0;
for(Integer p: this)
c++;
return c;
}
public static void main(String[] args){
PrimesBelow primesBelow10 = new PrimesBelow(10);
for(int i: primesBelow10)
System.out.println(i);
System.out.println(primesBelow10);
}
}
.
import java.util.Iterator;
import java.util.NoSuchElementException;
public class SetIterator<T> implements Iterator<Integer> {
int max;
int current;
public SetIterator(PrimesBelow pb) {
this.max= pb.max;
current = 1;
}
@Override
public boolean hasNext() {
if(current < max) return true;
else return false;
}
@Override
public Integer next() {
while(hasNext()){
current++;
if(isPrime(current)){
System.out.println("returning "+current);
return current;
}
}
throw new NoSuchElementException();
}
private boolean isPrime(int a) {
if(a<2) return false;
for(int i = 2; i < a; i++) if((a%i)==0) return false;
return true;
}
}
Main function gives the output
returning 2
2
returning 3
3
returning 5
5
returning 7
7
Exception in thread "main" java.util.NoSuchElementException
at SetIterator.next(SetIterator.java:27)
at SetIterator.next(SetIterator.java:1)
at PrimesBelow.main(PrimesBelow.java:38)
edit: spotted an error in the next() method. Corrected it and changed the output to the new one.
Well, as you see with your (now fixed) example, you can easily do it with Iterables/Iterators. Instead of having a backing collection, the example would've been nicer with just an Iterable that takes the max number you wish to calculate primes to. You just need to make sure that you handle the hasNext() method properly so you don't have to throw an exception unnecessarily from next().
Java 8 streams make these kinds of things easier nowadays, but there's no reason you can't have a "virtual collection" that's just an Iterable. If you start implementing Collection it becomes harder, but even then it wouldn't be completely impossible, depending on the use cases: e.g. you could implement contains() that checks for primes, but you'd have to calculate it and it would be slow for large numbers.
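As a rough sketch of such an Iterable, the version below looks ahead to the next prime so that hasNext() is only true when an element really exists. The class name LazyPrimes is an assumption for illustration, and the primality test is the naive trial division, only bounded by the square root.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

// Illustrative sketch: primes below a bound, computed lazily; no values are stored.
class LazyPrimes implements Iterable<Integer> {
    private final int max;

    LazyPrimes(int max) { this.max = max; }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            // The next prime to hand out, or a value >= max once exhausted.
            private int upcoming = advance(2);

            private int advance(int from) {
                int n = from;
                while (n < max && !isPrime(n)) n++;
                return n;
            }

            @Override
            public boolean hasNext() { return upcoming < max; }

            @Override
            public Integer next() {
                if (!hasNext()) throw new NoSuchElementException();
                int result = upcoming;
                upcoming = advance(result + 1);
                return result;
            }
        };
    }

    static boolean isPrime(int a) {
        if (a < 2) return false;
        for (int i = 2; (long) i * i <= a; i++) if (a % i == 0) return false;
        return true;
    }
}
```

Since the look-ahead happens before hasNext() answers, a for-each loop over new LazyPrimes(10) ends cleanly after 7 instead of throwing from next().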
A (somewhat convoluted) example of a semi-infinite set of odd numbers that is immutable and stores no values.
public class OddSet implements Set<Integer> {
public boolean contains(Object o) {
return o instanceof Integer && ((Integer) o) % 2 == 1;
}
public int size() {
return Integer.MAX_VALUE;
}
public boolean add(Integer i) {
throw new UnsupportedOperationException();
}
public boolean equals(Object o) {
return o instanceof OddSet;
}
// etc. etc.
}
As DwB stated, this is not possible to do with Java's Collections API, as every element must be stored in memory. However, there is an alternative: this is precisely why Java's Stream API was implemented!
Streams allow you to iterate across an infinite amount of objects that are not stored in memory unless you explicitly collect them into a Collection.
From the documentation of IntStream#iterate:
Returns an infinite sequential ordered IntStream produced by iterative application of a function f to an initial element seed, producing a Stream consisting of seed, f(seed), f(f(seed)), etc.
The first element (position 0) in the IntStream will be the provided seed. For n > 0, the element at position n, will be the result of applying the function f to the element at position n - 1.
Here are some examples that you proposed in your question:
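import java.util.stream.IntStream;
public class Test {
public static void main(String[] args) {
// Numbers of the form 6*k+1: 1, 7, 13, ...
IntStream.iterate(1, k -> k + 6);
// All primes above 10
IntStream.iterate(10, i -> i + 1).filter(Test::isPrime);
// Numbers of the form 2^n-1 below 1 000 000: 1, 3, 7, ..., 524287.
// limit() is used because filter() alone never terminates on an infinite stream.
IntStream.iterate(1, n -> 2 * n + 1).limit(19);
}
private static boolean isPrime(int a) {
if (a < 2) {
return false;
}
for(int i = 2; i < a; i++) {
if ((a % i) == 0) {
return false;
}
}
return true;
}
}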

Why java concurrentmap gives slight difference in calculating doubles using multithread

I am doing matrix multiplication using a multi-threaded approach, but the results of the double calculations are not always the same for the same matrices.
Here is the code:
For the matrix:
private ConcurrentMap<Position, Double> matrix = new ConcurrentHashMap<>();
public Matrix_2() {}
public double get(int row, int column) {
Position p = new Position(row, column);
return matrix.getOrDefault(p, 0.0);
}
public void set(int row, int column, double num) {
Position p = new Position(row, column);
if(matrix.containsKey(p)){
double a = matrix.get(p);
a += num;
matrix.put(p, a);
}else {
matrix.put(p, num);
}
}
for multiplication
public static Matrix multiply(Matrix a, Matrix b) {
List<Thread> threads = new ArrayList<>();
Matrix c = new Matrix_2();
IntStream.range(0, a.getNumRows()).forEach(r ->
IntStream.range(0, a.getNumColumns()).forEach(t ->
IntStream.range(0, b.getNumColumns())
.forEach(
v ->
threads.add(new Thread(() -> c.set(r, v, b.get(t, v) * a.get(r, t)))))
));
threads.forEach(Thread::start);
threads.forEach(r -> {
try {
r.join();
} catch (InterruptedException e) {
System.out.println("bad");
}
}
);
return c;
}
where get method get the double at specific row and column, get(row, column), and the set method add the given number to the double at that row and column.
This code works fine at the integer level, but when it comes to doubles with a lot of precision, it gives different answers for the multiplication of the same two matrices; the difference can sometimes be as large as 0.5 to 1.5 for a number. Why is that?
While I haven't fully analyzed your code for multiply, and John Bollinger makes a good point (in the comments) regarding the rounding-error inherent to floating-point primitives, your set method would seem to have a possible race condition.
Namely, while your use of java.util.ConcurrentHashMap guarantees thread safety within Map API calls, it does nothing to ensure that the mappings could not have changed in between invocations, such as between the time that you invoke containsKey and the time that you invoke put, or between the time that you invoke get and the time that you invoke put.
As of Java 8 (which your use of lambdas and streams indicates you are using), one option to rectify this problem is to make the check-existing + get + set sequence atomic via the compute API call. compute allows you to provide a key and a lambda (or method reference) specifying how to mutate the value mapped to that key, and ConcurrentHashMap guarantees that the lambda, which encompasses your full check-and-set logic, will be executed atomically. Using that approach, your code would look something like:
public void set(int row, int column, double num) {
Position p = new Position(row, column);
matrix.compute(p, (Position positionKey, Double doubleValue)->{
if (doubleValue == null) {
return num;
} else {
return num + doubleValue;
}
});
}
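For this particular "insert or add" pattern, Map.merge (also new in Java 8) is a more compact alternative with the same effect as the compute() call above. The sketch below is an assumption-laden illustration: it uses a String key instead of the Position class, which is not shown in the question, and AtomicAccumulate and sumConcurrently are made-up names.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Illustrative sketch: many threads accumulating into one map entry without lost updates.
class AtomicAccumulate {
    static double sumConcurrently(int threads, int addsPerThread) {
        ConcurrentMap<String, Double> cells = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < addsPerThread; i++) {
                    // Atomic on ConcurrentHashMap: inserts 1.0 if absent, otherwise adds 1.0.
                    cells.merge("cell", 1.0, Double::sum);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) w.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return cells.get("cell");
    }
}
```

With the original containsKey/get/put sequence the total would usually come out short because updates get lost; with merge every increment is applied.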
A concurrent collection can help you write thread-safe code, but it does not make your code automatically thread-safe. And your code -- principally your set() method -- is not thread safe.
In fact, your set() method does not even make sense, at least not under that name. If the implementation is indeed what you intend, then it seems to be more of an increment() method.
My first suggestion would be to simplify your approach by eliminating the middle IntStream, or at least moving it into the threads. The objective here would be to avoid any two threads ever manipulating the same element of the map. You could then also use a bona fide set() method in conjunction, as there would be no contention for individual matrix elements. The ConcurrentMap would still be helpful in that case. Most likely that would run faster, too.
If you must keep the structure of the computation the same, however, then you need a better set() method, one that accounts for the possibility that another thread updates an element between your get() and put(). If you don't account for that then updates can be lost. Something along these lines would be an improvement:
public void increment(int row, int column, double num) {
Position p = new Position(row, column);
Double current = matrix.putIfAbsent(p, num);
if (current != null) {
// there was already a value at the designated position
double newval = current + num;
while (!matrix.replace(p, current, newval)) {
// Failed to replace -- indicates that the value at the designated
// position is no longer the one we most recently read
current = matrix.get(p);
newval = current + num;
}
} // else successfully set the first value for this position
}
That approach will work with any version of Java that provides ConcurrentMap, but of course, other parts of your code already rely on Java 8. The solution offered by @Sumitsu leverages API features that were new in Java 8; it is more elegant than the above, but less illustrative.
Note also that it would not be surprising to see small differences in your result in any case, because reordering floating-point operations can cause different rounding.

Efficiently determining whether or not two collections have any items in common in Java

I know that, in Java, I can manually determine if two collections have any overlap by turning one of them into a set then iterating over the other doing contains checks:
<T> boolean anyInCommon(Iterable<T> collection1, Set<T> collection2) {
for (T item : collection1)
if (collection2.contains(item))
return true;
return false;
}
or alternatively:
<T> boolean anyInCommon(Collection<T> collection1, Set<T> collection2) {
return collection1.stream().anyMatch(collection2::contains);
}
But are there existing utility methods that do this and intelligently choose which collection to iterate, which to turn into a set, take advantage of one already being a set, etc? I know that Guava has Sets.intersection, but it computes the entire intersection instead of just whether or not it's empty.
Note that I'd prefer the comparison to short-circuit as soon as any common item is found. Checking if two huge collections have overlap should take time proportional to the number of non-overlapping items (or better), instead of the total number of items.
Partial answer for when the collections are already Sets.
Sets.intersection is actually closer to what I wanted than I thought because its result is not precomputed. Instead, it's a view of the intersection that's computed on the fly.
Take a look at the anonymous class returned by intersection:
final Predicate<Object> inSet2 = Predicates.in(set2);
return new SetView<E>() {
@Override public Iterator<E> iterator() {
return Iterators.filter(set1.iterator(), inSet2);
}
@Override public int size() {
return Iterators.size(iterator());
}
@Override public boolean isEmpty() {
return !iterator().hasNext();
}
@Override public boolean contains(Object object) {
return set1.contains(object) && set2.contains(object);
}
@Override public boolean containsAll(Collection<?> collection) {
return set1.containsAll(collection)
&& set2.containsAll(collection);
}
};
The isEmpty method doesn't go over every item. Instead, it iterates over the first set while checking if items are in the second set. As soon as it finds one, it returns false (the intersection is not empty). If you're unlucky you'll iterate all the items in set1 that aren't in set2 first, but that's probably unavoidable and better than always iterating all items.
In other words, if you already have sets, an efficient solution that short-circuits appropriately is just:
boolean overlaps = !Sets.intersection(set1, set2).isEmpty();
This won't iterate over the smaller set instead of the larger set, or deal with non-set collections, but it's often useful.
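For arbitrary collections, the JDK's own Collections.disjoint may also be worth a look. It takes two Collection arguments and returns true when they have no elements in common, so negating it gives the overlap check. The wrapper below is a minimal sketch; OverlapCheck is a hypothetical name, and how cleverly the method picks which collection to iterate is an implementation detail that should be checked against the JDK version in use.

```java
import java.util.Collection;
import java.util.Collections;

// Illustrative sketch: overlap test built on the standard library.
class OverlapCheck {
    // True if the two collections share at least one element.
    static boolean anyInCommon(Collection<?> a, Collection<?> b) {
        return !Collections.disjoint(a, b);
    }
}
```

It does not build a set for you, so if neither argument supports a fast contains, converting one of them to a HashSet first is probably still worthwhile.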
