Why do max and min in Guava throw IllegalArgumentException for empty arrays? - java

I'm reading the code of Guava's Ints.max(int... array) (and similarly, Ints.min, Longs.min, etc.) They throw an IllegalArgumentException if array.length == 0 (This is Guava 15.0).
I wonder why they do not return the "identity element" in this case, instead of throwing an exception. By "identity element" I mean the element behaving like 1 for product, or 0 for sum.
That is, I would expect Ints.min() to be Integer.MAX_VALUE, Ints.max() to be Integer.MIN_VALUE, and so on.
The rationale behind this is that if you split an array in two, the min of the whole array must be the min between the mins of both sub arrays. Or, for the mathematically inclined, the sum over an empty set of real numbers is 0, the product is 1, the union of an empty collection of sets is the empty set, and so on.
Since Guava libraries tend to be carefully produced, I guess there must be an explanation for throwing an exception here. So the question is: why?
Edit: I understand that most people expect max and min of an array to be an element of the array, but this is because max/min of two elements is always one of them. On the other hand, if one regards max/min just as (commutative) binary operations, returning the identity element makes more sense. To me.

Because, IMHO, in 99.99% of the cases, when you ask for the minimum element of an array, you want to get an element of this array, and not some arbitrary large value. Thus, most of the time, an empty array is a special condition that needs specific treatment. Not handling this special condition is therefore a bug, signalled by an exception.

You said it yourself -
The rationale behind this is that if you split an array in two, the min of the whole array must be the min between the mins of both sub arrays. Or, for the mathematically inclined, the sum over an empty set of real numbers is 0, the product is 1, the union of an empty collection of sets is the empty set, and so on.
So [-1] = [-1] union [] but max([-1]) != max([-1] union []). I agree that for product or sum it makes more sense to return the respective identity, but not max/min.
I also prefer the property that max/min(S) be an element of S. Not some element having no relevance with respect to less than and greater than.
In particular, if I'm working in a domain with a lot of negative numbers, say temperatures in Northern Canada, then a day where my sample of temperatures is empty because the thermometer broke should not randomly show up as a relatively very warm day.

The minimum/maximum of array values must come from that array. If the array is empty, then there is no value to take. Returning Integer.MAX_VALUE or Integer.MIN_VALUE here would be wrong, because those values aren't in the array. Nothing is in the array. Mathematically, the answer is the empty set, but that isn't a valid value among possible int values. There is no possible int correct answer, so the only possible correct course of action is to throw an Exception.
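To make the two positions in this thread concrete, here is a minimal sketch using only the JDK (not Guava's code). The class and method names are hypothetical. The first method makes "no elements" visible in the return type, the second applies the identity-element convention proposed in the question.

```java
import java.util.OptionalInt;
import java.util.stream.IntStream;

// Hypothetical demo class; this is not how Guava implements Ints.max.
final class EmptyMaxDemo {
    // Option 1: an empty input yields an empty OptionalInt instead of a made-up value.
    static OptionalInt maxOrEmpty(int... values) {
        return IntStream.of(values).max();
    }

    // Option 2: the identity-element convention, where max of nothing is Integer.MIN_VALUE.
    static int maxWithIdentity(int... values) {
        return IntStream.of(values).reduce(Integer.MIN_VALUE, Math::max);
    }

    public static void main(String[] args) {
        System.out.println(maxOrEmpty().isPresent());
        System.out.println(maxWithIdentity());
        System.out.println(maxWithIdentity(-40, -35));
    }
}
```

With the second variant, a caller cannot tell an empty sample apart from a sample that really contains Integer.MIN_VALUE, which is the concern raised in the answers above.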

Related

How to find the highest pair from 2 arrays that isn't above a specified budget

Let's say you have two lists of ints: [1, 2, 8] and [3, 6, 7]. If the budget is 6, the int returned has to be 5 (2+3). I am struggling to find a solution faster than n^2.
The second part of the question is "how would you change your function if the number of lists is not fixed?" as in, there are multiple lists with the lists being stored in a 2d array. Any guidance or help would be appreciated.
I think my first approach would be to use for statements and add the elements to each of the next array, then take whatever is close to the budget but not exceeding it. I am new to java and I don't get your solution though, so I don't know if this would help.
For the second part of your question, are you referring to the indefinite length of your array? If so, you can use the ArrayList for that.
I will provide an answer for the case of 2 sequences. You will need to ask a separate question for the extended case. I am assuming the entries are natural numbers (i.e. no negatives).
Store your values in a NavigableSet. The implementations of this interface (e.g. TreeSet) allow you to efficiently find (for example) the largest value less than an amount.
So if you have each of your 2 sets in a NavigableSet then you could use:
set1.headSet(total).stream()
    .map(v1 -> Optional.ofNullable(set2.floor(total - v1)).map(v2 -> v1 + v2))
    .flatMap(Optional::stream)
    .max(Integer::compareTo);
Essentially this streams all elements of the first set that are less than the total, then for each one finds the largest element of the second set that is less than or equal to total - element (if any), adds the two, and takes the largest sum. It returns an Optional, which is empty if there is no such pair.
You wouldn't necessarily need the first set to be a NavigableSet - you could use any sorted structure and then use Stream.takeWhile to only look at elements smaller than the target.
This should be O(n * log n) once the sets are constructed. You could perhaps do even better by navigating the second set rather than using floor.
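The closing remark about navigating the second set instead of calling floor can be sketched with two indexes over sorted copies of the inputs. This is a different technique from the stream above, offered as an illustration only. The names BudgetPair and bestPairSum are hypothetical, and the sketch assumes non-negative values, as the answer does.

```java
import java.util.Arrays;
import java.util.OptionalInt;

// Hypothetical helper, assuming non-negative ints as in the answer above.
final class BudgetPair {
    // Largest a + b with a from first, b from second, and a + b <= budget.
    static OptionalInt bestPairSum(int[] first, int[] second, int budget) {
        int[] a = first.clone();
        int[] b = second.clone();
        Arrays.sort(a);
        Arrays.sort(b);

        int i = 0;             // walks a from the smallest value upward
        int j = b.length - 1;  // walks b from the largest value downward
        boolean found = false;
        long best = 0;
        while (i < a.length && j >= 0) {
            long sum = (long) a[i] + b[j]; // long avoids int overflow
            if (sum > budget) {
                j--; // b[j] is too large for a[i] and for every larger a
            } else {
                if (!found || sum > best) {
                    best = sum;
                    found = true;
                }
                i++; // b[j] is the best partner for a[i]; try the next a
            }
        }
        return found ? OptionalInt.of((int) best) : OptionalInt.empty();
    }
}
```

After sorting, the scan itself is linear in the combined length of the two arrays, because each step advances one of the two indexes.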
The best way to approach the problem if there are multiple lists, would be to use a hash table to store the pair or sequence of values that sum to the number you want (the budget).
You would simply print out the hash map key and values (key being an integer, values: a list of integers) that sum to the number.
Time complexity O(N) where N is the size of the largest list, Space Complexity of O(N).

How to compare a number in LinkedList using Iterator

I have two linked lists, each of which represents a number.
For example, list 1 represents the number 29 and the second represents the number 7.
I want to implement a less-or-equal operator for these two linked lists that works the following way:
If the number represented by the first linked list is less than or equal to the second, return true; otherwise return false.
The most important thing is to go over each linked list just once, so the size and get methods defined in LinkedList are not useful.
The issue I face is when the two numbers represented by the linked lists have different lengths.
For example, 10 ≤ 1 should return false.
1 ≤ 10 should return true, and so should 10 ≤ 10. Each number is represented by a linked list.
I want to go over one or both linked lists with an iterator to determine whether the number represented by the first linked list is less than or equal to the other.
I wrote code that works only when the numbers have the same length, for example 29 ≤ 45 or 7 ≤ 6.
Iterator<T> iter1 = a1.iterator();
Iterator<T> iter2 = a2.iterator();
while (iter1.hasNext()) {
    // if iter2 has no next element, iter2.next() throws an exception, e.g. for 10 ≤ 1
    if (!(iter1.next().lessEqual(iter2.next())))
        return false;
}
return true;
How can I fix it for numbers of different lengths? Please note that every digit of the number is stored in its own link. For example, for 294 the digit 2 is in the first link, 9 in the second, and 4 in the third link of the linked list.
To avoid an exception being thrown, you need to change the loop condition:
while (iter1.hasNext() && iter2.hasNext()) {
However, this is not enough to make your program give the correct answer; for that you need a bit more work.
Assuming that there are no leading zeros, notice that:
If one number has fewer digits than the other, the shorter number is the smaller
If both numbers have the same number of digits, then the first different digit decides which one is smaller
The above should be fairly easy to implement if you are using LinkedList from the standard library, because it has a size() method to check the length without iterating over elements.
If you cannot use the LinkedList and you cannot get the size of the list in constant time, then you need to work a bit harder.
Consider these implementation steps:
Iterate both lists until reaching the end of any of the lists (I already gave you the condition for that).
In each iteration, compare the digits: if you find a difference, then save that for later use, and do not overwrite it again in the loop. You could use for example a Boolean firstIsSmaller, initialized to null, and set to true or false when the first different digit is found, otherwise stay null until the end of the loop.
At the end of the loop, if one of the lists did not reach the end, that list is the bigger number.
If both lists reached the end at the same time, then use firstIsSmaller to decide which number is smaller. When firstIsSmaller is null, that means no difference was found, the numbers are equal. Otherwise the boolean value decides if the first is smaller or not.
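The steps above can be sketched as follows. This is one possible reading, assuming each list holds decimal digits as Integer values, most significant digit first, with no leading zeros. The class name DigitListCompare is hypothetical, and plain int comparison stands in for the lessEqual method on T from the question.

```java
import java.util.Iterator;

// Hypothetical helper: digits are stored most significant first, no leading zeros.
final class DigitListCompare {
    static boolean lessEqual(Iterable<Integer> first, Iterable<Integer> second) {
        Iterator<Integer> it1 = first.iterator();
        Iterator<Integer> it2 = second.iterator();
        Boolean firstIsSmaller = null; // null until the first differing digit is seen

        // Walk both lists together, each exactly once.
        while (it1.hasNext() && it2.hasNext()) {
            int d1 = it1.next();
            int d2 = it2.next();
            if (firstIsSmaller == null && d1 != d2) {
                firstIsSmaller = d1 < d2; // remember only the first difference
            }
        }

        if (it1.hasNext()) {
            return false; // first has more digits, so it is the bigger number
        }
        if (it2.hasNext()) {
            return true;  // second has more digits, so first is smaller
        }
        // Same length: equal numbers (null) or decided by the first differing digit.
        return firstIsSmaller == null || firstIsSmaller;
    }
}
```

Any Iterable works here, including java.util.LinkedList, since only iterator() is used and neither size() nor get() is called.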

Element type of SortedSet allows for calculation of the successor of a given value

From SortedSet documentation:
several methods return subsets with restricted ranges. Such ranges
are half-open, that is, they include their low endpoint but not their
high endpoint (where applicable). If you need a closed range (which
includes both endpoints), and the element type allows for calculation
of the successor of a given value, merely request the subrange from
lowEndpoint to successor(highEndpoint).
Can you explain what "the element type allows for calculation of the successor of a given value" means?
What types allow for calculation of the successors in Java?
the element type allows for calculation of the successor of a given value
It all depends on the sorting method
It means that, for the sorting method on your elements, you can calculate what sorted value would come directly after your given value, with nothing possibly between them.
From the docs:
For example, suppose that s is a sorted set of strings. The following
idiom obtains a view containing all of the strings in s from low to
high, inclusive: SortedSet<String> sub = s.subSet(low, high+"\0");
For Strings: (natural sort) high + "\0" is the successor to high
For Integers: (natural sort) high + 1 is the successor to high. But if your Integers were sorted from high to low, then the successor would be high - 1.
For some values computing the successor is slightly more complicated...
For Doubles: (natural sort) Math.nextAfter(high, Double.POSITIVE_INFINITY) is the successor to high since nextAfter gets the adjacent value after high such that nothing could come between high and nextAfter(high..). Note that you might run into trouble with max/min values or neg/pos infinity values for doubles, so you would probably want to check high first
With real world floating point numbers this would not work (unless you set some limit to the precision).
This only works here because in computers floating point numbers always without exception have limited precision and thus you can calculate the next possible value in that precision (which is what nextAfter does).
Allowing calculation of successors requires your type to have discrete values (although that is not sufficient).
Integer is a good example of this - the successor of 2 is 3. The successor of 3 is 4.
For example, if the whole set contains 1 and 3 and you want the Integers in the closed range [1, 3], then if you directly call
s.subSet(1, 3);
then 3 will not be in the subset.
In this situation, you can calculate the next element after 3 by 3 + 1 = 4 and call:
s.subSet(1, 4);
then 3 will be in the subset.
The calculation mechanism might differ from class to class. With numeric elements or Strings, you can calculate the successor with + directly. If you are working with another type, you can write your own successor calculation, and it should be consistent with the compare method.
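The successor idiom for the three types mentioned in these answers can be sketched like this. The helper names are hypothetical, the sets are assumed to use natural ordering, and the Integer version assumes high is below Integer.MAX_VALUE so that high + 1 does not overflow.

```java
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical helpers that turn the half-open subSet into a closed range.
final class ClosedRangeDemo {
    // Integer successor under natural ordering: high + 1 (assumes no overflow).
    static SortedSet<Integer> closedInts(SortedSet<Integer> s, int low, int high) {
        return s.subSet(low, high + 1);
    }

    // String successor under natural ordering: append the smallest char, '\0'.
    static SortedSet<String> closedStrings(SortedSet<String> s, String low, String high) {
        return s.subSet(low, high + "\0");
    }

    // Double successor: the adjacent representable value above high.
    static SortedSet<Double> closedDoubles(SortedSet<Double> s, double low, double high) {
        return s.subSet(low, Math.nextAfter(high, Double.POSITIVE_INFINITY));
    }

    public static void main(String[] args) {
        SortedSet<Integer> s = new TreeSet<>();
        s.add(1);
        s.add(3);
        System.out.println(s.subSet(1, 3));      // half-open: 3 is excluded
        System.out.println(closedInts(s, 1, 3)); // closed: 3 is included
    }
}
```

For a set sorted with a custom Comparator, the successor has to be computed with respect to that ordering instead.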

Array Constraint Satisfaction

Given a range of numbers 1 through N, where N >= 3, you have to take an
array of length 2N and place each number (from the range 1 to N)
twice, such that the distance between the two indexes of a number is
equal to the number. Example:
N=3
( 3, 1, 2, 1, 3, 2 )
The solution I'm thinking of is as follows:
Generate every permutation array of the range of numbers, ex: {1,2,3}, {3,2,1}, {2,1,3}, etc.
For each permutation array, run the following functions:
for (int number : permutationArray) {
    addToResultArray(number);
}
void addToResultArray(int toBeAdded) {
    for (int i = 0; i < resultArray.length; i++) {
        // -1 implies the resultArray is empty at that index
        if (resultArray[i] == -1 && resultArray[i + toBeAdded + 1] == -1) {
            resultArray[i] = toBeAdded;
            resultArray[i + toBeAdded + 1] = toBeAdded; // place the second copy too
            return; // each number is placed exactly once
        }
    }
}
If the above functions do not cause an out of bounds exception, you have a valid solution.
I do not think this solution is very good. Does anyone have something better?
This can be viewed as a constrained problem: you have 2*N variables V={v1,v2,...,v2n} (representing the array indices) each variable has [1,2,..,N] possible values under the constraints:
Every value is assigned to exactly two variables.
The distance between variables with the same value equals the value itself.
An assignment is a mapping from the set of variables to their possible values. For example, [v1=1,v2=3,v3=5] is an assignment for v1, v2 and v3. A given assignment is consistent if it satisfies the constraints. An example of an inconsistent assignment is [v1=3,v2=1,v3=3].
An assignment is complete if its cardinality equals the number of variables (i.e. 2*N). Otherwise it is a partial assignment. Clearly, our goal is to have one or more complete consistent assignments (a.k.a. solutions).
So the problem is basically a backtracking search problem. In particular, we have an ordering over the variables. Each time we assign a value to the current variable. If the value makes the current assignment inconsistent, we backtrack.
Your approach is actually generate-and-test, which is inefficient. Generating all solutions and counting them is a hard problem in general. In most cases, we are looking for one solution.
Here is the interesting part: there is a much more efficient way to do this by propagating the values and detecting backtracking sooner (see constraint propagation).
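A plain backtracking search for this problem, without any constraint propagation, might look like the sketch below. It follows the example in the question, where the two copies of k sit at indexes i and i + k + 1. The class and method names are hypothetical, and instead of assigning a value to each index in turn it picks a slot pair for each number, which is just one possible variable ordering.

```java
// Hypothetical backtracking sketch; no constraint propagation is done here.
final class PairPlacement {
    /** Returns one valid array of length 2 * n, or null if none exists. */
    static int[] solve(int n) {
        int[] result = new int[2 * n]; // 0 marks an empty slot
        return place(n, result) ? result : null;
    }

    // Places both copies of `number`, then recurses on number - 1.
    private static boolean place(int number, int[] result) {
        if (number == 0) {
            return true; // every number has been placed
        }
        for (int i = 0; i + number + 1 < result.length; i++) {
            int j = i + number + 1; // index of the second copy
            if (result[i] == 0 && result[j] == 0) {
                result[i] = number;
                result[j] = number;
                if (place(number - 1, result)) {
                    return true;
                }
                result[i] = 0; // undo and try the next position
                result[j] = 0;
            }
        }
        return false; // no position works: the caller must backtrack
    }
}
```

Placing the largest number first keeps the branching small at the top of the search, since large numbers have the fewest legal positions.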

How to optimize checking if one of the given arrays is contained in yet another array

I have an array of integers which is updated every set interval of time with a new value (let's call it data). When that happens I want to check if that array contains any other array of integers from specified set (let's call that collection).
I do it like this:
separate a sub-array from the end of data of length X (arrays in the collection have a set max length of X);
iterate through the collection and check if any array in it is contained in the separated data chunk;
It works, though it doesn't seem optimal. But every other idea I have involves creating more collections (e.g. create a collection of all the arrays from the original collection that end with the same integer as data, repeat). And that seems even more complex (on the other hand, it looks like the only way to deal with arrays in collections without limited max length).
Are there any standard algorithms to deal with such a problem? If not, are there any worthwhile optimizations I can apply to my approach?
EDIT:
To be precise, I:
separate a sub-array from the end of data of length X (arrays in the collection have a set max length of X and if they don't it's just the length of the longest one in the collection);
iterate through the collection and for every array in it:
separate sub-array from the previous sub-array with length matching current array in collection;
use Java's List.equals to compare the arrays;
EDIT 2:
Thanks for all the replies, surely they'll come in handy some day. In this case I decided to drop the last steps and just compare the arrays in my own loop. That eliminates creating yet another sub-array and it's already O(N), so in this specific case will do.
Take a look at the KMP algorithm. It's been designed with String matching in mind, but it really comes down to matching subsequences of arrays to given sequences. Since that algorithm has linear complexity (O(n)), it can be said that it's pretty optimal. It's also a basic staple in standard algorithms.
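For reference, a sketch of KMP adapted to int arrays follows. The class and method names are hypothetical. It returns the first index at which pattern occurs in data, or -1 if it does not occur.

```java
// Hypothetical sketch of Knuth-Morris-Pratt matching over int arrays.
final class IntKmp {
    static int indexOf(int[] data, int[] pattern) {
        if (pattern.length == 0) {
            return 0; // the empty pattern matches at the start
        }
        int[] fail = failureTable(pattern);
        int matched = 0; // number of pattern elements matched so far
        for (int i = 0; i < data.length; i++) {
            // On a mismatch, fall back to the longest border of the matched prefix.
            while (matched > 0 && data[i] != pattern[matched]) {
                matched = fail[matched - 1];
            }
            if (data[i] == pattern[matched]) {
                matched++;
            }
            if (matched == pattern.length) {
                return i - matched + 1;
            }
        }
        return -1;
    }

    // fail[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    private static int[] failureTable(int[] pattern) {
        int[] fail = new int[pattern.length];
        int k = 0;
        for (int i = 1; i < pattern.length; i++) {
            while (k > 0 && pattern[i] != pattern[k]) {
                k = fail[k - 1];
            }
            if (pattern[i] == pattern[k]) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }
}
```

For the use case in the question, the failure table of each array in the collection could be built once and reused on every update, though that caching is not shown here.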
dfens' proposal is smart in that it incurs no significant extra complexity iff you keep the current product along with the main array, and can be checked in O(1), but it is also quite fragile and produces many false positives and negatives. Just imagine a target array [1, 1, ..., 1], which will always produce a positive test for all non-trivial main arrays. It also breaks down when one bucket contains a 0. That means that a successful check against his test is always a necessary condition for a hit (0s aside), but is never sufficient - aka with that method alone, you can never be sure of the validity of that result.
Look at the rsync algorithm. If I understand it correctly, you could go about it like this:
You've got an immense array of data [length L].
At the end of that data, you've got N bytes of data, and you want to know whether those N bytes ever appeared before.
precalculate:
for every offset in the array, calculate the checksum over the next N data elements.
Hold that checksum in a separate array.
Using a rolling checksum like rsync does, you can do this step in O(N) time for all elements.
Whenever new data arrives:
Calculate the checksum over the last N elements. Using a rolling checksum, this could be O(1)
Check that checksum against all the precalculated checksums. If it matches, check equality of the subarrays (subslices , whatever...). If that matches too, you've got a match.
I think, in essence this is the same as dfen's approach with the product of all numbers.
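A rolling checksum of this kind can be sketched as a polynomial hash that is updated in O(1) as the window slides. This is a Rabin-Karp style illustration, not rsync's actual checksum. The class name, the method names, and the BASE constant are all assumptions. Every hash hit is verified element by element, because different windows can share a hash.

```java
// Hypothetical rolling-hash search; arithmetic wraps modulo 2^64 on purpose.
final class RollingHashSearch {
    private static final long BASE = 31; // arbitrary multiplier, an assumption

    static int indexOf(int[] data, int[] pattern) {
        int n = pattern.length;
        if (n == 0) {
            return 0;
        }
        if (n > data.length) {
            return -1;
        }
        long target = 0;    // hash of the pattern
        long window = 0;    // hash of the current window of data
        long highPower = 1; // BASE^(n-1), the weight of the oldest element
        for (int i = 0; i < n; i++) {
            target = target * BASE + pattern[i];
            window = window * BASE + data[i];
            if (i > 0) {
                highPower *= BASE;
            }
        }
        for (int start = 0; ; start++) {
            // A hash match is only a candidate; confirm it by comparing elements.
            if (window == target && matchesAt(data, start, pattern)) {
                return start;
            }
            if (start + n >= data.length) {
                return -1; // no further window fits
            }
            // Slide by one: drop data[start], append data[start + n].
            window = (window - data[start] * highPower) * BASE + data[start + n];
        }
    }

    private static boolean matchesAt(int[] data, int start, int[] pattern) {
        for (int i = 0; i < pattern.length; i++) {
            if (data[start + i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }
}
```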
I think you can keep the product of the array for immediate rejections.
So if your array is [n_1,n_2,n_3,...], you can say that it is not a subarray of [m_1,m_2,m_3,...] if the product m_1*m_2*... = M is not divisible by the product n_1*n_2*... = N.
Example
Let's say you have array
[6,7]
And comparing with:
[6,10,12]
[4,6,7]
The product of your array is 6*7 = 42
6 * 10 * 12 = 720, which is not divisible by 42, so you can reject the first array immediately
The product of [4, 6, 7] is divisible by 42, so you cannot reject it (though it could just have other multipliers)
In each interval of time you can just multiply product by new number to avoid computing whole product everytime.
Note that you don't have to allocate anything if you simulate List's equals yourself. Just one more loop.
Similar to dfens' answer, I'd offer other criteria:
As the product is too big to be handled efficiently, compute the GCD instead. It produces many more false positives, but surely fits in long or int or whatever your original datatype is.
Compute the total number of trailing zeros, i.e., the "product" ignoring all factors but powers of 2. Also pretty weak, but fast. Use this criterion before the more time-consuming ones.
But... this is a special case of DThought's answer. Use rolling hash.
