Linked lists as matrix and efficiency - java

When an algorithm needs to work on a big matrix, we were told to use linked lists to improve efficiency if the matrix is sparse, meaning that if the data is mostly one value we only store the entries that differ from it.
But how do we identify the point where using a sparse representation is no longer useful?
For a square matrix of side length n, how do we calculate the point at which the matrix has too much non-zero data to be worth storing in a linked list?
I imagine we need the memory sizes of an object and of a link between two objects, and then our density factor. But what is the calculation that lets us safely say "this matrix has x% non-zero data, so it is better to use a linked list"?

The answer to your question depends on what you optimize for. Do you optimize for space or time?
Let's say you optimize for space. To keep the data of a square matrix of side length n, you need n*n numbers (to simplify, let's say each value is an integer). With a linked list, you need to store the actual value, the coordinates of the value in the matrix (row and column), and the pointer to the next non-zero value. To simplify, let's say each of those fields is integer-sized. So a linked list needs 4 integers per stored value (plus a little extra, like the head of the linked list).
IMHO, once fewer than 1/4 of the values in the matrix are non-zero, a linked list becomes more space-efficient than an array of arrays.
Obviously, there are other ways to store the matrix values; with those, the ratio can be different.
To optimize for time, again, it depends on which operations you want to run...
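To make the 1/4 estimate concrete, here is a minimal sketch of the node layout and the break-even check, assuming (as the answer does) that the next pointer is counted as one more integer-sized field; the class and method names are mine:

    // One node per non-zero entry: the value, its coordinates, and a link to the next node.
    class SparseNode {
        int row, col, value;   // 3 integer-sized fields of payload
        SparseNode next;       // 1 more field, counted as integer-sized here
    }

    // Dense storage costs n*n values; the list costs about 4 fields per non-zero entry.
    static boolean linkedListIsSmaller(int n, int nonZeroCount) {
        long dense  = (long) n * n;       // n*n integers
        long sparse = 4L * nonZeroCount;  // value + row + col + next per entry
        return sparse < dense;            // i.e. nonZeroCount < n*n / 4
    }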

Related

How to find the highest pair from 2 arrays that isn't above a specified budget

Let's say you have two lists of ints: [1, 2, 8] and [3, 6, 7]. If the budget is 6, the int returned has to be 5 (2 + 3). I am struggling to find a solution faster than n^2.
The second part of the question is "how would you change your function if the number of lists is not fixed?" as in, there are multiple lists with the lists being stored in a 2d array. Any guidance or help would be appreciated.
I think my first approach would be to use nested for loops, add each element to each element of the next array, and then take whatever is closest to the budget without exceeding it. I am new to Java and I don't fully understand your solution, so I don't know if this would help.
For the second part of your question, are you referring to the indefinite length of your array? If so, you can use an ArrayList for that.
I will provide an answer for the case of 2 sequences. You will need to ask a separate question for the extended case. I am assuming the entries are natural numbers (i.e. no negatives).
Store your values in a NavigableSet. The implementations of this interface (e.g. TreeSet) allow you to efficiently find (for example) the largest value less than an amount.
So if you have each of your 2 sets in a NavigableSet then you could use:
    set1.headSet(total).stream()
        .map(v1 -> Optional.ofNullable(set2.floor(total - v1)).map(v2 -> v1 + v2))
        .flatMap(Optional::stream)
        .max(Integer::compareTo);
Essentially this streams all elements of the first set that are less than the total, finds the largest element of the second set that is no greater than total - element (if any), adds the two, and takes the maximum. It returns an Optional which is empty if there is no such pair.
You wouldn't necessarily need the first set to be a NavigableSet - you could use any sorted structure and then use Stream.takeWhile to only look at elements smaller than the target.
This should be O(n * log n) once the sets are constructed. You could perhaps do even better by navigating the second set rather than using floor.
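For anyone who wants to run it, here is a self-contained version of that idea using the numbers from the question (the class name and variable names are just for illustration; it needs Java 9+ for Optional::stream and List.of):

    import java.util.List;
    import java.util.Optional;
    import java.util.TreeSet;

    public class BudgetPair {
        public static void main(String[] args) {
            int budget = 6;
            TreeSet<Integer> set1 = new TreeSet<>(List.of(1, 2, 8));
            TreeSet<Integer> set2 = new TreeSet<>(List.of(3, 6, 7));

            // For each v1 below the budget, find the largest v2 with v1 + v2 <= budget.
            Optional<Integer> best = set1.headSet(budget).stream()
                    .map(v1 -> Optional.ofNullable(set2.floor(budget - v1)).map(v2 -> v1 + v2))
                    .flatMap(Optional::stream)
                    .max(Integer::compareTo);

            best.ifPresent(System.out::println); // prints 5 (2 + 3)
        }
    }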
The best way to approach the problem if there are multiple lists would be to use a hash table to store the pair or sequence of values that sums to the number you want (the budget).
You would then simply print out the hash map key and values (the key being an integer, the values a list of integers) that sum to that number.
Time complexity is O(N), where N is the size of the largest list; space complexity is O(N).
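For what it's worth, the hash-based idea is easiest to see in the exact-sum variant (find one value from each array that adds up to the target exactly); the original question asks for the closest sum not exceeding the budget, which a plain hash set does not answer directly. A minimal sketch of the exact-sum check, with an illustrative method name:

    import java.util.HashSet;
    import java.util.Set;

    // Returns true if some value from 'first' plus some value from 'second'
    // equals 'target' exactly. Expected O(N) time, O(N) extra space.
    static boolean hasExactPair(int[] first, int[] second, int target) {
        Set<Integer> seen = new HashSet<>();
        for (int v : first) {
            seen.add(v);
        }
        for (int v : second) {
            if (seen.contains(target - v)) {
                return true;
            }
        }
        return false;
    }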

How to partition a list so that the sums of the sublists are roughly equal

I have a sorted list of 2000 or fewer objects, each with a numerical value. I'm wondering how I can write (in Java) a way to split this list into sublists, each of roughly 200 objects (with fair leeway), such that the sums of the values of the sublists are roughly equal.
Even if the full list has fewer than 2000 objects, I still want the sublists to be roughly 200 objects each. Thank you!
Here is a quick and dirty greedy approach that should work well.
First, decide how many lists you will wind up with. Call that m.
Break your objects into groups of m, with the one possibly smaller group being the values closest to 0.
Order your groups by descending difference between the biggest and the smallest.
Assign each group's objects to your lists, with the largest object going into the list with the lowest total, the next largest into the next lowest, and so on.
When you are done, you will have lists of the right size, with relatively small differences.
(You can do better than this with dynamic programming, but it will also be harder to write. The question "How to constrain items with multiple randomly selected positions so that the average position for each is within a certain range" may give you some ideas about how to do that.)
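A rough Java sketch of that greedy dealing, assuming the values are positive and already sorted ascending, with m being the number of sublists (roughly size/200 per the question); the method and variable names are mine:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Comparator;
    import java.util.List;

    // Splits 'values' (sorted ascending) into m lists with roughly equal sums.
    static List<List<Double>> partition(List<Double> values, int m) {
        // 1. Break the sorted values into consecutive groups of m, so the one
        //    possibly smaller group holds the values closest to 0.
        List<List<Double>> groups = new ArrayList<>();
        for (int end = values.size(); end > 0; end -= m) {
            groups.add(new ArrayList<>(values.subList(Math.max(0, end - m), end)));
        }
        // 2. Order the groups by descending spread (biggest minus smallest).
        groups.sort(Comparator.comparingDouble(
                (List<Double> g) -> g.get(g.size() - 1) - g.get(0)).reversed());

        // 3. Deal each group out: its biggest value goes to the list with the
        //    lowest running total, the next biggest to the next lowest, and so on.
        List<List<Double>> lists = new ArrayList<>();
        double[] totals = new double[m];
        for (int i = 0; i < m; i++) {
            lists.add(new ArrayList<>());
        }
        Integer[] order = new Integer[m];
        for (List<Double> group : groups) {
            group.sort(Comparator.reverseOrder());          // biggest first
            for (int i = 0; i < m; i++) {
                order[i] = i;
            }
            Arrays.sort(order, Comparator.comparingDouble((Integer i) -> totals[i]));  // lowest total first
            for (int i = 0; i < group.size(); i++) {
                lists.get(order[i]).add(group.get(i));
                totals[order[i]] += group.get(i);
            }
        }
        return lists;
    }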

Efficient alternative to Map<Integer, Integer> in Java, with regards to autoboxing?

I'm using a LinkedHashMap<Integer, Integer> to store values of layers on a tile in a 2D game. Higher numbers are drawn over the lower numbers.
In my draw function, I iterate through the value set and draw each one. This means I'm unboxing values (width * height * numLayers) times. I'm planning to port to Android so I want to be as efficient as possible, and I'm thinking this is too much?
The reason I'm using a Map is that the layer number (the key) matters: keys above 4 are drawn over players, etc. So I'll frequently need to skip over a bunch of keys.
I could probably just use an int[10] since I won't need that many layers, but then every unused layer would take up 32 bits for nothing, compared to my current map, which can hold just the keys 0 and 9 and only take up 64 bits.
Is there an efficient alternative to Map?
SparseIntArray is more efficient than HashMap<Integer, Integer>. According to the documentation:
SparseIntArrays map integers to integers. Unlike a normal array of integers, there can be gaps in the indices. It is intended to be more memory efficient than using a HashMap to map Integers to Integers, both because it avoids auto-boxing keys and values and its data structure doesn't rely on an extra entry object for each mapping. For containers holding up to hundreds of items, the performance difference is not significant, less than 50%.
See the Android documentation for SparseIntArray for more details.
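A minimal usage sketch, assuming you are on Android (android.util.SparseIntArray) and that the layer number is the key and the tile id is the value; the numbers and the draw call are placeholders:

    import android.util.SparseIntArray;

    SparseIntArray layers = new SparseIntArray();
    layers.put(0, 17);   // layer 0 -> some tile id
    layers.put(9, 42);   // layer 9 -- the gap between 0 and 9 costs nothing

    // keyAt/valueAt walk the entries in ascending key order, with no boxing,
    // so lower layers are visited before higher ones.
    for (int i = 0; i < layers.size(); i++) {
        int layer = layers.keyAt(i);
        int tile  = layers.valueAt(i);
        // draw(tile, layer);   // hypothetical draw call
    }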
For non-Android environments:
Write your own hash-based map class (not implementing java.util.Map). It is relatively simple using linear probing in an array of cells -- the other technique is a linked list, which (again) will be as large as the 'direct array' option.
GNU Trove has primitive maps that will do what you want. But if you are not trying to eke out every byte of memory, I'd second Thomas's suggestion to just use an array.

How to optimize checking if one of the given arrays is contained in yet another array

I have an array of integers which is updated every set interval of time with a new value (let's call it data). When that happens I want to check if that array contains any other array of integers from specified set (let's call that collection).
I do it like this:
separate a sub-array from the end of data of length X (arrays in the collection have a set max length of X);
iterate through the collection and check if any array in it is contained in the separated data chunk;
It works, though it doesn't seem optimal. Every other idea I have involves creating more collections (e.g. create a collection of all the arrays from the original collection that end with the same integer as data, and repeat), and that seems even more complex (on the other hand, it looks like the only way to deal with arrays in the collection when they have no set max length).
Are there any standard algorithms to deal with such a problem? If not, are there any worthwhile optimizations I can apply to my approach?
EDIT:
To be precise, I:
separate a sub-array from the end of data of length X (arrays in the collection have a set max length of X, and if they don't, X is just the length of the longest one in the collection);
iterate through the collection and for every array in it:
separate sub-array from the previous sub-array with length matching current array in collection;
use Java's List.equals to compare the arrays;
EDIT 2:
Thanks for all the replies; I'm sure they'll come in handy some day. In this case I decided to drop the last steps and just compare the arrays in my own loop. That eliminates creating yet another sub-array, and it's already O(N), so for this specific case it will do.
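For reference, the allocation-free comparison described in that edit can be as simple as the following; the method name is mine:

    // Returns true if 'data' ends with 'pattern', without copying any sub-array.
    static boolean endsWith(int[] data, int[] pattern) {
        if (pattern.length > data.length) {
            return false;
        }
        int offset = data.length - pattern.length;
        for (int i = 0; i < pattern.length; i++) {
            if (data[offset + i] != pattern[i]) {
                return false;
            }
        }
        return true;
    }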
Take a look at the KMP algorithm. It's been designed with String matching in mind, but it really comes down to matching subsequences of arrays to given sequences. Since that algorithm has linear complexity (O(n)), it can be said that it's pretty optimal. It's also a basic staple in standard algorithms.
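For completeness, here is a plain-int KMP sketch that can be run against each pattern in the collection; the method name is mine:

    // Knuth-Morris-Pratt: returns the first index at which 'pattern' occurs in
    // 'text', or -1 if it never occurs. Runs in O(text.length + pattern.length).
    static int indexOf(int[] text, int[] pattern) {
        if (pattern.length == 0) {
            return 0;
        }
        // failure[i] = length of the longest proper prefix of pattern[0..i]
        // that is also a suffix of pattern[0..i].
        int[] failure = new int[pattern.length];
        for (int i = 1, k = 0; i < pattern.length; i++) {
            while (k > 0 && pattern[i] != pattern[k]) {
                k = failure[k - 1];
            }
            if (pattern[i] == pattern[k]) {
                k++;
            }
            failure[i] = k;
        }
        // Scan the text, reusing the failure table on mismatches.
        for (int i = 0, k = 0; i < text.length; i++) {
            while (k > 0 && text[i] != pattern[k]) {
                k = failure[k - 1];
            }
            if (text[i] == pattern[k]) {
                k++;
            }
            if (k == pattern.length) {
                return i - pattern.length + 1;
            }
        }
        return -1;
    }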
dfens' proposal is smart in that it incurs no significant extra cost, provided you keep the current product along with the main array, and the check itself is O(1). But it is also quite fragile and produces many false positives. Just imagine a target array [1, 1, ..., 1], which will always produce a positive test for all non-trivial main arrays. It also breaks down when one bucket contains a 0. In other words, passing that test is a necessary condition for a hit (0s aside), but it is never sufficient -- with that method alone, you can never be sure of the result.
Look at the rsync algorithm... If I understand it correctly, you could go about it like this:
You've got an immense array of data [length L].
At the end of that data, you've got N elements, and you want to know whether those N elements ever appeared before.
Precalculate:
For every offset in the array, calculate the checksum over the next N data elements.
Hold those checksums in a separate array.
Using a rolling checksum like rsync does, you can do this precalculation in linear time over all offsets.
Whenever new data arrives:
Calculate the checksum over the last N elements. Using a rolling checksum, this could be O(1)
Check that checksum against all the precalculated checksums. If it matches, check equality of the subarrays (subslices , whatever...). If that matches too, you've got a match.
I think, in essence, this is the same as dfens' approach with the product of all numbers.
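A rolling-checksum sketch in that spirit; the base B, the update rule, and the method names are my own illustrative choices, not rsync's actual checksum. Equal windows always get equal checksums, so a checksum match still has to be confirmed with a real element-by-element comparison, as described above:

    // Polynomial checksum of data[from .. from + n) in 64-bit arithmetic.
    static final long B = 1_000_003L;

    static long checksum(int[] data, int from, int n) {
        long h = 0;
        for (int i = 0; i < n; i++) {
            h = h * B + data[from + i];
        }
        return h;
    }

    // O(1) update when the window slides by one element: remove the outgoing
    // element's contribution, shift, and add the incoming element.
    // 'pow' must be B^(n-1), computed once for the window size n.
    static long roll(long h, int outgoing, int incoming, long pow) {
        return (h - outgoing * pow) * B + incoming;
    }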
I think you can keep the product of the array for immediate rejections.
So if your array is [n_1, n_2, n_3, ...], you can say that it is not a subarray of [m_1, m_2, m_3, ...] if the product m_1*m_2*... = M is not divisible by the product n_1*n_2*... = N.
Example
Let's say you have array
[6,7]
And comparing with:
[6,10,12]
[4,6,7]
The product of your array is 6*7 = 42.
6 * 10 * 12 = 720, which is not divisible by 42, so you can reject the first array immediately.
4 * 6 * 7 = 168 is divisible by 42, so you cannot reject [4, 6, 7] (though divisibility alone does not confirm a match - the factors can come from other elements).
In each interval of time you can just multiply the product by the new number to avoid recomputing the whole product every time.
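A sketch of that filter as code; the class and method names are mine, and I use BigInteger to sidestep overflow, which the answer doesn't address. As discussed in the other answers, this is only a necessary condition: zeros and coincidental factors give false positives, so a "maybe" still needs a real element-by-element comparison:

    import java.math.BigInteger;

    class ProductFilter {
        private BigInteger product = BigInteger.ONE;

        // Called whenever a new value is appended to the data array.
        void onNewValue(int value) {
            product = product.multiply(BigInteger.valueOf(value));
        }

        // false = the candidate certainly does not occur; true = "maybe".
        boolean mightContain(int[] candidate) {
            BigInteger p = BigInteger.ONE;
            for (int v : candidate) {
                p = p.multiply(BigInteger.valueOf(v));
            }
            if (p.signum() == 0) {
                return true;   // a zero in the candidate defeats the test entirely
            }
            return product.remainder(p).signum() == 0;
        }
    }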
Note that you don't have to allocate anything if you simulate List's equals yourself. Just one more loop.
Similar to dfens' answer, I'd offer other criteria:
As the product is too big to be handled efficiently, compute the GCD instead. It produces many more false positives, but it surely fits in a long or an int or whatever your original datatype is.
Compute the total number of trailing zeros, i.e., the "product" ignoring all factors but powers of 2. Also pretty weak, but fast. Use this criterion before the more time-consuming ones.
But... this is a special case of DThought's answer. Use rolling hash.

Sampling data from a large array

Basically I have a very large array of objects and I need to remove 10% of the least fit objects.
Each object has a fitness variable (a double) associated with it. I don't have a threshold that determines whether or not an object is fit; I just want the least fit of the bunch.
What is the best way of retrieving (sampling) the least fit objects?
One way could be to randomly select, let's say, 20%, sort that data, and then remove 10%. But I think this is not a very smart way of doing it.
The other way is to keep the array sorted at all times and then remove the first 10%. But I don't think this is very good either, because you would have to keep re-sorting the array while inserting/updating, which is a big overhead.
Let k be yourCollection.length() * 0.1 and n be yourCollection.length().
Find the k-th smallest element (QuickSelect, or median of medians for a guaranteed worst case), where the key is your fitness. Let's call it p. This can be done in O(n).
Then traverse through the collection and remove all the items with fitness less than p.fitness. We've got an O(n) solution.
Or, you can create a heap in O(n) with key=fitness and remove k elements from it in O(k * log(n)).
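A sketch of the heap option in Java; 'Individual' and its fitness accessor are stand-ins for whatever the real object type is:

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;
    import java.util.PriorityQueue;

    // Placeholder type for the objects being ranked (Java 16+ record for brevity).
    record Individual(double fitness) {}

    static List<Individual> dropLeastFit(List<Individual> population) {
        int k = (int) (population.size() * 0.1);

        // Min-heap keyed on fitness. addAll builds it in O(n log n); the O(n)
        // figure above assumes a bulk heapify, but either is cheap in practice.
        PriorityQueue<Individual> byFitness =
                new PriorityQueue<>(Comparator.comparingDouble(Individual::fitness));
        byFitness.addAll(population);

        // Polling k times removes the k least-fit individuals in O(k * log n).
        for (int i = 0; i < k; i++) {
            byFitness.poll();
        }
        return new ArrayList<>(byFitness);   // the survivors, in no particular order
    }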
