What to use for this requirement: Array, List, or Map? - Java

While writing my program I have come across a requirement to assign unique ids to some objects that I create. I am creating the objects dynamically in the GUI, and initially I used a simple counter to assign an int value to each created node, which worked just fine.
However, the problem with this approach is that if a node has to be deleted while building the GUI, its id is removed and never used again. Every new node takes the latest counter value, so if nodes are deleted along the way I end up with a lot of missing int values.
I want to reuse those missing ids when new nodes are created, but I am not sure which approach to adopt.
My ideas:
1. Use an ArrayList that holds the available values: when a node is deleted, its id is added to this list; I sort the list and use the minimum value for the next new node. That is fine, but when I use a value and remove it from the list, the indexing does not work out cleanly and this causes problems.
2. Use a HashMap: similarly, I add the available ids and remove the ones that get used, but I am not sure how to sort a HashMap.
Can you suggest how I should go about this? Maybe I need some kind of stack where I can push values, sort them, and take the minimum; once that value is used, it is removed from the stack. Please give me some ideas on how to accomplish this.

Keep a list of the deleted IDs, and when you create a new node, check that list for an ID to re-use (doesn't matter which you take); if the list is empty (as it will be initially), get a new ID "the old way". Even more clever: make the list an object that will generate a new ID if there aren't any deleted ones in it, so the caller doesn't have to worry about HOW the ID was arrived at.
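A minimal sketch of that idea (class and method names here are just illustrative):

import java.util.ArrayDeque;
import java.util.Deque;

/** Hands out ids, re-using any that have been released. */
class IdPool {
    private final Deque<Integer> freed = new ArrayDeque<>();
    private int nextFresh = 0;

    /** Returns a recycled id if one is available, otherwise a brand new one. */
    int acquire() {
        return freed.isEmpty() ? nextFresh++ : freed.pop();
    }

    /** Call this when a node is deleted so its id can be handed out again. */
    void release(int id) {
        freed.push(id);
    }
}

The caller just asks the pool for an id and never needs to know whether it was recycled or freshly generated.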

You could use a TreeSet (which automatically keeps its entries sorted from least to greatest) to store the deleted ids (myTreeSet.add(old_id)). That way, when you go to create a new instance, you would first check whether there are any entries in the TreeSet. To grab the lowest value, you would use myTreeSet.first() (an O(log n) operation on the underlying red-black tree). If the TreeSet is empty, meaning all known ids are currently in use, you would go ahead and use the next available id as normal.
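A rough sketch of that approach, assuming ids start at 0 (names are made up):

import java.util.TreeSet;

class SortedIdPool {
    private final TreeSet<Integer> freed = new TreeSet<>(); // kept in ascending order
    private int nextFresh = 0;

    int acquire() {
        // Re-use the smallest released id if there is one; otherwise mint a new one.
        Integer recycled = freed.pollFirst(); // returns and removes the lowest entry, or null if empty
        return recycled != null ? recycled : nextFresh++;
    }

    void release(int id) {
        freed.add(id);
    }
}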

How about a TreeSet to store the used IDs? You could then use higher(0) to find the lowest free ID. If it returns null, then you know that you have no used IDs.

The first solution only works well if you have few nodes; imagine an application with thousands of nodes and the memory consumption that implies.
The HashMap solution is better suited to your aims and needs fewer checks.

Related

How can I avoid growing my hashmaps to 2 keys until I try to add a second key?

I have a situation where I have tons of hashmaps (millions and millions) in my server, and 95% of them have literally only one key, ever.
The server runs out of memory, which probably has something to do with the fact that the default initial HashMap capacity is 16, so for every HashMap object I'm wasting a ton of memory.
Short of re-designing the server with a new data structure that flexibly stores the data as one element OR an entire hashmap (which I'd prefer to avoid), I'm first trying to optimize this by changing the initial size to 1:
Map<String, Type2> myMiniMap = new HashMap<>(1);
However, my concern is that due to the default load factor of 0.75 in HashMaps, the capacity would immediately be increased to 2 the moment I add the first key to the map (since 1 * 0.75 < 1, which is how I understand the hash sizing logic in Java).
Assuming my understanding above is correct (that by default, Java will create space for 2 keys as soon as I add the first key to the map), is there a way to prevent this from happening until I actually try to insert a second key?
E.g., should I set the loadFactor to zero or one?
If they're truly only ever going to be singletons, why not use Collections.singletonMap to create them? The downside of this is that the map that is created is immutable.
Alternatively, you could create your own class implementing Map that will store a key and value in class fields and then, if an attempt is made to add a second key value, it will switch to using a HashMap as its default backing store. It would be more tedious than difficult to accomplish.
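A rough, hedged sketch of that second idea; it is not a full java.util.Map implementation (and ignores null keys), just enough to show the promotion step, with made-up names:

import java.util.HashMap;
import java.util.Map;

/** Holds one entry inline; switches to a real HashMap once a second distinct key arrives. */
class MiniMap<K, V> {
    private K soloKey;              // the single inline key (null keys are not supported here)
    private V soloValue;
    private Map<K, V> overflow;     // stays null until a second distinct key shows up

    V put(K key, V value) {
        if (overflow != null) {
            return overflow.put(key, value);
        }
        if (soloKey == null || soloKey.equals(key)) {
            V old = soloValue;
            soloKey = key;
            soloValue = value;
            return old;
        }
        // Second distinct key: promote to a real HashMap.
        overflow = new HashMap<>();
        overflow.put(soloKey, soloValue);
        soloKey = null;
        soloValue = null;
        return overflow.put(key, value);
    }

    V get(K key) {
        if (overflow != null) {
            return overflow.get(key);
        }
        return key.equals(soloKey) ? soloValue : null;
    }
}

Only the maps that actually receive a second key ever pay for a real HashMap.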

Realm: Order of records was changed

I'm trying to develop my Android app with Realm database.
Today I got below problem:
I added a list of records to a table and then tried to delete one of them.
After deleting, the order of the remaining records changed (it is different from the order before the deletion).
Please see the images below for details.
Before deleting
After delete the 3rd item
My question is: is this intended behavior or a bug? And how can I keep the order of the records?
I know that I can easily get the order I want by adding a new field such as createTime, but I would prefer a very simple solution, such as some configuration option for Realm.
Items in a Realm are not sorted by default, so you should think of any query result as an unordered set unless you explicitly sorted it.
Generally the items will come out in the order you inserted them, but that is not guaranteed. The underlying technical reason is that we compact the data on disk, so if you delete an item in the middle of a list, the last item is moved into its place.
So the answer is: It is working as intended, and you should use a sorting method if you want your results to be sorted.
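For example, something along these lines, assuming an Item model class with a long createdAt field that you set when inserting the row (the exact sort API depends on your Realm version):

import io.realm.Realm;
import io.realm.RealmResults;
import io.realm.Sort;

public class ItemQueries {
    // Illustrative only: returns the items explicitly ordered by creation time
    // instead of relying on the (unspecified) on-disk order.
    public static RealmResults<Item> itemsInInsertionOrder(Realm realm) {
        return realm.where(Item.class)
                .findAll()
                .sort("createdAt", Sort.ASCENDING);
    }
}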

Hashing with Linked Lists

I am trying to do a form of coalesced hashing, and to do so, I need to maintain multiple linked lists that get created when you try to insert something into the table and it collides with another object. How would I go about creating multiple linked lists inside of an add(object x) function and then be able to call the same list again in a find(object x) function?
For example, if my hash value is 5 and bucket 5 is occupied, I create a linked list with bucket 5 as its head and then create a new node where the object I tried to put into 5 ends up getting stored. This way, when I try to find the object later, rather than probing the table I can just follow the linked list that starts at slot 5 to reach my object.
My issue is that I cannot figure out how to maintain multiple linked lists for the different collisions, and then retrieve the appropriate list later when I try to find the object. Any help is greatly appreciated.
If you're trying to replicate something like HashMap (and it sounds very much like you are), you'll want to keep the linked lists in a search tree, so that you can find the right list for inserting and for finding an object in reasonable time.
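Separately from the search-tree suggestion, the coalesced-hashing idea from the question can be sketched roughly like this (illustrative only: no deletions, no resizing, made-up names):

/**
 * Bare-bones sketch of coalesced hashing: collisions are chained through the
 * table itself via a "next" index stored alongside each slot.
 */
class CoalescedTable {
    private final Object[] slots;
    private final int[] next;   // next[i] = index of the next entry on i's chain, or -1
    private int freeSlot;       // scan cursor used to find an empty slot for collisions

    CoalescedTable(int capacity) {
        slots = new Object[capacity];
        next = new int[capacity];
        java.util.Arrays.fill(next, -1);
        freeSlot = capacity - 1;
    }

    private int home(Object x) {
        return Math.abs(x.hashCode() % slots.length);
    }

    boolean add(Object x) {
        int i = home(x);
        if (slots[i] == null) {          // home bucket is free: no chaining needed
            slots[i] = x;
            return true;
        }
        // Walk to the end of the chain that passes through the home bucket.
        while (true) {
            if (slots[i].equals(x)) return false;   // already present
            if (next[i] == -1) break;
            i = next[i];
        }
        // Find a free slot, scanning from the end of the table (the classic cellar-less variant).
        while (freeSlot >= 0 && slots[freeSlot] != null) freeSlot--;
        if (freeSlot < 0) return false;             // table is full
        slots[freeSlot] = x;
        next[i] = freeSlot;                         // link the new entry onto the chain
        return true;
    }

    boolean find(Object x) {
        int i = home(x);
        while (i != -1 && slots[i] != null) {
            if (slots[i].equals(x)) return true;
            i = next[i];
        }
        return false;
    }
}

find() starts at the home bucket and simply walks the chain of next indices, which is exactly the "follow the list referencing slot 5" behaviour described in the question.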

Efficiently update an element in a DelayQueue

I am facing a similar problem as the author in:
DelayQueue with higher speed remove()?
The problem:
I need to process continuously incoming data and check whether the data has been seen in a certain timeframe before. Therefore I calculate a unique ID for incoming data and add this data indexed by the ID to a map. At the same time I store the ID and the timeout timestamp in a PriorityQueue, giving me the ability to efficiently check for the latest ID to time out. Unfortunately if the data comes in again before the specified timeout, I need to update the timeout stored in the PriorityQueue. So far I just removed the old ID and re-added the ID along with the new timeout. This works well, except for the time consuming remove method if my PriorityQueue grows over 300k elements.
Possible Solution:
I just thought about using a DelayQueue instead, which would make it easier to wait for the first data to time out, unfortunately I have not found an efficient way to update a timeout element stored in such a DelayQueue, without facing the same problem as with the PriorityQueue: the remove method!
Any ideas on how to solve this problem in an efficient way even for a huge Queue?
This actually sounds a lot like a Guava Cache, which is a concurrent on-heap cache supporting "expire this long after the most recent lookup for this entry." It might be simplest just to reuse that, if you can use third-party libraries.
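If Guava is an option, a hedged sketch might look like this (assumes com.google.guava on the classpath; class and method names are made up):

import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.util.concurrent.TimeUnit;

/** Remembers ids and forgets them a fixed time after they were last touched. */
public class RecentlySeen {
    private final Cache<String, Boolean> seen = CacheBuilder.newBuilder()
            .expireAfterAccess(10, TimeUnit.MINUTES) // the timeout window
            .build();

    /** Returns true if the id was seen within the window; touching it restarts its timer. */
    public boolean checkAndMark(String id) {
        boolean wasSeen = seen.getIfPresent(id) != null;
        seen.put(id, Boolean.TRUE); // a write also counts as access for expireAfterAccess
        return wasSeen;
    }
}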
Failing that, the approach that implementation uses looks something like this: it has a hash table, so entries can be efficiently looked up by their key, but the entries are also in a concurrent, custom linked list -- you can't do this with the built-in libraries. The linked list is in the order of "least recently accessed first." When an entry is accessed, it gets moved to the end of the linked list. Every so often, you look at the beginning of the list -- where all the least recently accessed entries live -- and delete the ones that are older than your threshold.
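For a rough, non-concurrent approximation of that "least recently accessed first" list using only the JDK, something like the following could work (illustrative names; a concurrent version needs much more care, as noted above):

import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Non-concurrent sketch: tracks ids and expires them a fixed time after their last access. */
class ExpiringIdSet<K> {
    private final long ttlMillis;
    // accessOrder=true: iteration goes from least recently accessed to most recently accessed.
    private final LinkedHashMap<K, Long> lastSeen = new LinkedHashMap<>(16, 0.75f, true);

    ExpiringIdSet(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    /** Returns true if the id was seen within the timeout window; also refreshes its timestamp. */
    boolean checkAndTouch(K id) {
        long now = System.currentTimeMillis();
        evictExpired(now);
        Long previous = lastSeen.put(id, now); // put() counts as an access, moving the id to the tail
        return previous != null && now - previous <= ttlMillis;
    }

    private void evictExpired(long now) {
        // The head of the map holds the least recently accessed entries.
        Iterator<Map.Entry<K, Long>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<K, Long> entry = it.next();
            if (now - entry.getValue() > ttlMillis) {
                it.remove();
            } else {
                break; // everything after this point is newer
            }
        }
    }
}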

java efficient de-duplication

Let's say you have a large text file. Each row contains an email id and some other information (say a product id). Assume there are millions of rows in the file. You have to load this data into a database. How would you efficiently de-dup the data (i.e. eliminate duplicates)?
Insane number of rows
Use a MapReduce framework (e.g. Hadoop). This is full-blown distributed computing, so it's overkill unless you have TBs of data, though. (j/k :) )
Unable to fit all rows in memory
Even the result won't fit: use merge sort, persisting intermediate data to disk. As you merge, you can discard duplicates (this sample probably helps). This can be multi-threaded if you want.
The result will fit: instead of reading everything into memory and then putting it into a HashSet (see below), you can use a line iterator or similar and keep adding to the HashSet. You can use a ConcurrentHashMap with more than one thread reading files and adding to this map. Another multi-threaded option is a ConcurrentSkipListSet; in that case you implement compareTo() instead of equals()/hashCode() (compareTo() == 0 means duplicate) and keep adding to this SortedSet.
Fits in memory
Design an object that holds your data, implement a good equals()/hashCode() method and put them all in a HashSet.
Or use the methods given above (you probably don't want to persist to disk though).
Oh, and if I were you, I would put a unique constraint on the DB anyway...
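As a rough sketch of the "fits in memory" case, assuming a row is identified by its email and product id (the field names are assumptions):

import java.util.Objects;

/** Illustrative row type: two rows are duplicates when email and productId match. */
final class Row {
    final String email;
    final String productId;

    Row(String email, String productId) {
        this.email = email;
        this.productId = productId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Row)) return false;
        Row other = (Row) o;
        return email.equals(other.email) && productId.equals(other.productId);
    }

    @Override
    public int hashCode() {
        return Objects.hash(email, productId);
    }
}

Adding every Row to a HashSet<Row> then silently drops the duplicates.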
I will start with the obvious answer: make a HashMap with the email id as the key and the rest of the information as the value (or make an object to hold all the information). When you get to a new line, check whether the key already exists; if it does, move on to the next line. At the end, write out all your SQL statements using the HashMap. I do agree with eqbridges that memory constraints will matter if you have a "gazillion" rows.
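A minimal sketch of that approach, assuming comma-separated rows that start with the email id (the file layout, class, and variable names are all assumptions):

import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class Dedup {
    public static void main(String[] args) throws IOException {
        // Assumes rows of the form "email,productId,..." in the file given as args[0].
        Map<String, String> byEmail = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(Paths.get(args[0]))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String email = line.split(",", 2)[0];
                byEmail.putIfAbsent(email, line); // keep only the first row per email id
            }
        }
        // byEmail.values() now holds one row per email id, ready to be loaded.
        byEmail.values().forEach(System.out::println);
    }
}

Generating the SQL from byEmail.values() instead of printing is left out of the sketch.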
You have two options:
do it in Java: you could put together something like a HashSet for the check, adding an email id for each incoming item if it doesn't already exist in the set.
do it in the database: put a unique constraint on the table so that duplicates will not be added. An added bonus is that you can repeat the process and remove duplicates from previous runs.
Take a look at Duke (https://github.com/larsga/Duke), a fast dedupe and record-linkage engine written in Java. It uses Lucene to index the data and reduce the number of comparisons (avoiding an unacceptable Cartesian product of comparisons). It supports the most common algorithms (edit distance, Jaro-Winkler, etc.) and is extremely extensible and configurable.
Can you not index the table by email and product id? Then reading by index should make duplicates of either email or email+productId readily identifiable via sequential reads, simply by matching against the previous record.
Your problem can be solved with an Extract, Transform, Load (ETL) approach:
Load your data into an import schema;
do whatever transformations you like on the data;
then load it into the target database schema.
You can do this manually or use an ETL tool.
