Optimise MySQL reading pattern - java

I have an Integer-collection with filtered row-IDs in which I am trying to search for sequences/ranges to optimise a MySQL select-query. To give you an example:
The Integer-Collection can be either very fragmented:
[1,2,88,101,200] = Sequence(1-2,88,101,200)
Or very entirely:
[1,2,3,4,..,198,199,200] = Sequence(1-200)
Is there any Java-algorithm to find a sequence in the collection or to improve my reading-pattern in general?

How long are your collections? Unless you have millions of items, it is probably fastest to load a collection entirely into memory, sort it and then scan for ranges.
In a sorted list, finding ranges is trivial. Just scan it sequentially; if the next element is not previous element + 1, one range has just ended, and another began.

Related

Fastest way to find an item in a sorted ArrayList

In Java, I have an ArrayList with a list of objects. Each object has a date field that is just a long data type. The ArrayList is sorted by the date field. I want to insert a new object into the ArrayList so that it appears in the correct position with regard to its date. The only solution I can see is to iterate through all the items and compare the date field of the object being inserted to the objects being iterated on and then insert it once I reach the correct position. This will be a performance issue if I have to insert a lot of records.
What are some possible ways to improve this performance? Maybe an ArrayList is not the best solution?
I would say that you are correct in making the statement:
Maybe an ArrayList is not the best solution
Personally, I think that a tree structure would be better suited for this. Specifically Binary Search Tree, which is sorted on the object's date time. Once you have the tree created, you can use binary search which would take O(log n) time.
Whether or not binary search + O(n) insertion is bad for you depends on at least these things:
size of the list,
access pattern (mostly insert or mostly read),
space considerations (ArrayList is far more compact than the alternatives).
Given the existence of these factors and their quite complex interactions you should not switch over to a binary search tree-based solution until you find out how bad your current design is—through measurements. The switch might even make things worse for you.
I would consider using TreeSet and make your item Comparable. Then you get everything out of the box.
If this is not possible I would search for the index via Collections.binarySearch(...).
EDIT: Make sure performance is an issue before you start optimizing
first you should sort ArrayList Using:
ArrayList<Integer> arr = new ArrayList<>();
...
Collections.sort(arr);
Then Your Answer is:
int index = Collections.binarySearch(arr , 5);

Data structure for random access linked list

I have a need for a data structure that will be able to give preceding and following neighbors for a given int that is part of the structure.
Some criteria I've set for myself:
write once, read many times
contain 100 to 1000 int
be efficient: order of magnitude O(1)
be memory efficient (size of the ints + some housekeeping bits ideally)
implemented in pure Java (no libraries for this, as I want to learn)
items are unique
no concurrency requirements
ints are ordered externally, that order will most likely not be a natural ordering, and that order must be preserved (ie. there is no contract whatsoever regarding the difference in value between two neighboring ints - any int may be greater or smaller than the int it preceeds in the order).
This is in Java, and is mostly theoretical, as I've started using the solution described below.
Things I've considered:
LinkedHashSet: very quick to find an item, order of O(1), and very quick to retrieve next neighbor. No apparent way to get previous neighbor without reverse sorting the set. Boxed Integer objects only.
int[]: very easy on memory because no boxing required, very quick to get previous and next neighbor, retrieval of an item is O(n) though because index is not known and array traversal is required, and that is not acceptable.
What I'm using now is a combination of int[] and HashMap:
HashMap for retrieving index of a specific int in the int[]
int[] for retrieving the neighbors of that int
What I like:
neighbor lookup is ideally O(2)
int[] does not do boxing
performance is theoretically very good
What I dislike:
HashMap does boxing twice (key and value)
the ints are stored twice (in both the map and the array)
theoretical memory use could be improved quite a bit
I'd be curious to hear of better solutions.
One solution is to sort the array when you add elements. That way, the previous element is always i-1 and to locate a value, you can use a binary search which is O(log(N)).
The next obvious candidate is a balanced binary tree. For this structure, insert is somewhat expensive but lookup is again O(log(N)).
If the values aren't 32bit, then you can make the lookup faster by having a second array where each value is the index in the first and the index is the value you're looking for.
More options: You could look at bit sets but that again depends on the range which the values can have.
Commons Lang has a hash map which uses primitive int as keys: http://grepcode.com/file/repo1.maven.org/maven2/commons-lang/commons-lang/2.6/org/apache/commons/lang/IntHashMap.java
but the type is internal, so you'd have to copy the code to use it.
That means you don't need to autobox anything (unboxing is cheap).
Related:
http://java-performance.info/implementing-world-fastest-java-int-to-int-hash-map/
HashMap and int as key
ints are ordered externally, that order will most likely not be a natural ordering, and that order must be preserved (ie. there is no contract whatsoever regarding the difference in value between two neighboring ints).
This says "Tree" to me. Like Aaron said, expensive insert but efficient lookup, which is what you want if you have write once, read many.
EDIT: Thinking about this a bit more, if a value can only ever have one child and one parent, and given all your other requirements, I think ArrayList will work just fine. It's simple and very fast, even though it's O(n). But if the data set grows, you'll probably be better off using a Map-List combo.
Keep in mind when working with these structures that the theoretical performance in terms of O() doesn't always correspond to real-word performance. You need to take into account your dataset size and overall environment. One example: ArrayList and HashMap. In theory, List is O(n) for unsorted lookup, while Map is O(1). However, there's a lot of overhead in creating and managing entries for a map, which actually gives worse performance on smaller sets than a List.
Since you say you don't have to worry about memory, I'd stay away from array. The complexity of managing the size isn't worth it on your specified data set size.

Data structure in Java that supports quick search and remove in array with duplicates

More specifically, suppose I have an array with duplicates:
{3,2,3,4,2,2,1,4}
I want to have a data structure that supports search and remove the first occurrence of some value faster than O(n), say if the value is 4, then it becomes:
{3,2,3,2,2,1,4}
I also need to iterate the list from head according to the same order. Other operations like get(index) or insert are not needed.
You can use O(n) time to record the original data(say it's an int[]) in your data structure, I just need the later search and remove faster than O(n).
"Search and remove" is considered as ONE operation as shown above.
If I have to make it myself, I would use a LinkedList to store the data, and HashMap to map every key to a list of all occurrence of nodes together with their previous and next ones.
Is it a right approach? Are there any better choices already there in Java?
The data structure you describe, essentially a hybrid linked list and map, I think is the most efficient way of handling your stated problem. You'll have to keep track of the nodes yourself, since Java's LinkedList doesn't provide access to the actual nodes. The AbstractSequentialList may be helpful here.
The index structure you'll need is a map from an element value to the appearances of that element in the list. I recommend a hash table from hashCode % modulus to a linked list of (value, list of main-list nodes).
Note that this approach is still O(n) in the worst case, when you have universal hash collisions; this applies whether you use open or closed hashing. In the average case it should be something closer to O(ln(n)), but I'm not prepared to prove that.
Consider also whether the overhead of keeping track of all of this is really worth the gains. Unless you've actually profiled running code and determined that a LinkedList is causing problems because remove is O(n), stick with that until you do.
Since your requirement is that the first occurrence of the element should be removed and the remaining occurrences retained, there would be no way to do it faster than O(n) as you would definitely have to move through to the end of the list to find out if there is another occurrence. There is no standard api from Oracle in the java package that does this.

which data structure in java can add key/value pair at constant time and maintain sorted by value?

basically i'm looking for a best data structure in java which i can store pairs and retrieve top N number of element by the value. i'd like to do this in O(n) time where n is number of entires in the data structure.
example input would be,
<"john", 32>
<"dave", 3>
<"brian", 15>
<"jenna", 23>
<"rachael", 41>
and if N=3, i should be able to return rachael, john, jenna if i wanted descending order.
if i use some kind of hashMap, insertion is fast, but retrieving them by order gets expensive.
if i use some data structure that keeps things ordered, then insertion becomes expensive while retrieving is cheaper. i was not able to find the best data structure that can do both very well and very fast.
any input is appreciated. thanks.
[updated]
let me ask the question in other way if that make it clearer.
i know i can insert at constant time O(1) into hashMap.
now, how can i retrieve elements from sorted order by value in O(n) time where n=number of entires in the data structure? hope it makes sense.
If you want to sort, you have to give up constant O(1) time.
That is because unlike inserting an unsorted key / value pair, sorting will minimally require you to compare the new entry to something, and odds are to a number of somethings. Once you have an algorithm that will require more time with more entries (due to more comparisons) you have overshot "constant" time.
If you can do better, then by all means, do so! There is a Dijkstra prize awaiting for you, if not a Fields Medal to boot.
Don't dispair, you can still do the key part as a HashMap, and the sorting part with a Tree like implementation, that will give you O(log n). TreeMap is probably what you desire.
--- Update to match your update ---
No, you cannot iterate over a hashmap in O(n) time. To do so would assume that you had a list; but, that list would have to already be sorted. With a raw HashMap, you would have to search the entire map for the next "lower" value. Searching part of the map would not do, because the one element you didn't check would possibly be the correct value.
Now, there are some data structures that make a lot of trade offs which might get you closer. If you want to roll your own, perhaps a custom Fibonacci heap can give you an amortized performance close to what you wish, but it cannot guarantee a worst-case performance. In any case, some operations (like extract-min) will still require O(log n) performance.

storing sets of integers to check if a certain set has already been mentioned

I've come across an interesting problem which I would love to get some input on.
I have a program that generates a set of numbers (based on some predefined conditions). Each set contains up to 6 numbers that do not have to be unique with integers that ranges from 1 to 100).
I would like to somehow store every set that is created so that I can quickly check if a certain set with the exact same numbers (order doesn't matter) has previously been generated.
Speed is a priority in this case as there might be up to 100k sets stored before the program stops (maybe more, but most the time probably less)! Would anyone have any recommendations as to what data structures I should use and how I should approach this problem?
What I have currently is this:
Sort each set before storing it into a HashSet of Strings. The string is simply each number in the sorted set with some separator.
For example, the set {4, 23, 67, 67, 71} would get encoded as the string "4-23-67-67-71" and stored into the HashSet. Then for every new set generated, sort it, encode it and check if it exists in the HashSet.
Thanks!
if you break it into pieces it seems to me that
creating a set (generate 6 numbers, sort, stringify) runs in O(1)
checking if this string exists in the hashset is O(1)
inserting into the hashset is O(1)
you do this n times, which gives you O(n).
this is already optimal as you have to touch every element once anyways :)
you might run into problems depending on the range of your random numbers.
e.g. assume you generate only numbers between one and one, then there's obviously only one possible outcome ("1-1-1-1-1-1") and you'll have only collisions from there on. however, as long as the number of possible sequences is much larger than the number of elements you generate i don't see a problem.
one tip: if you know the number of generated elements beforehand it would be wise to initialize the hashset with the correct number of elements (i.e. new HashSet<String>( 100000 ) );
p.s. now with other answers popping up i'd like to note that while there may be room for improvement on a microscopic level (i.e. using language specific tricks), your overal approach can't be improved.
Create a class SetOfIntegers
Implement a hashCode() method that will generate reasonably unique hash values
Use HashMap to store your elements like put(hashValue,instance)
Use containsKey(hashValue) to check if the same hashValue already present
This way you will avoid sorting and conversion/formatting of your sets.
Just use a java.util.BitSet for each set, adding integers to the set with the set(int bitIndex) method, you don't have to sort anything, and check a HashMap for already existing BitSet before adding a new BitSet to it, it will be really very fast. Don't use sorting of value and toString for that purpose ever if speed is important.

Categories

Resources