Merge Sorted Sublists to Sorted Superlist - java

I have a List of n sorted Lists with m elements (Strings) each. Those elements originate from a List with a fixed order that I don't know. What I do know is that every sublist preserves the elements' global order. The lists are not disjoint, and the union of the lists is a subset of the original list.
Now, I'm struggling to find an algorithm that would efficiently combine them back into a List (of Lists) with maximum sorting accuracy.
Is there a known solution for such a problem?
I am using Java, here is some sample code:
List<List<String>> elements = new ArrayList<>();
elements.add(Lists.newArrayList("A","D","F"));
elements.add(Lists.newArrayList("B","D","E"));
elements.add(Lists.newArrayList("A","B","G"));
elements.add(Lists.newArrayList("C","D","H"));
// the required method
List<List<String>> sorted = sortElements(elements);
/* expected output:
 * [["A"],["B"],["C"],["D"],["G","F","E","H"]]
*/

You are looking for a topological sort.
Your initial lists represent arcs of a directed graph (A->D, D->F, etc.).
P.S.
The special kind of topological sort that explicitly splits the nodes into levels is called the "Demoucron algorithm" in Russian literature, but I failed to find a proper English description (I only found links to articles on planar-graph drawing).
Example of its work:
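The original example image isn't reproduced here. As a stand-in, this is a minimal Java sketch (not the poster's code) of the idea: build arcs from consecutive pairs in each sublist, then repeatedly peel off all nodes with no incoming arcs, one level at a time (Kahn's algorithm, which is essentially the leveled sort described above). Only the method name sortElements comes from the question; everything else is illustrative. Elements with no ordering constraint between them can land on the same level, so the grouping may differ from the sample output while still respecting every sublist.

import java.util.*;

public class TopoMerge {
    // Sketch: derive arcs from consecutive pairs in each sublist, then emit
    // nodes level by level, a level being all nodes whose in-degree is 0.
    static List<List<String>> sortElements(List<List<String>> lists) {
        Map<String, Set<String>> succ = new HashMap<>();
        Map<String, Integer> inDegree = new HashMap<>();
        for (List<String> list : lists) {
            for (String x : list) {
                succ.putIfAbsent(x, new HashSet<>());
                inDegree.putIfAbsent(x, 0);
            }
            for (int i = 0; i + 1 < list.size(); i++) {
                String from = list.get(i), to = list.get(i + 1);
                if (succ.get(from).add(to)) {              // count each arc only once
                    inDegree.merge(to, 1, Integer::sum);
                }
            }
        }
        List<List<String>> levels = new ArrayList<>();
        List<String> current = new ArrayList<>();
        for (Map.Entry<String, Integer> e : inDegree.entrySet()) {
            if (e.getValue() == 0) current.add(e.getKey());
        }
        while (!current.isEmpty()) {
            levels.add(current);
            List<String> next = new ArrayList<>();
            for (String x : current) {
                for (String y : succ.get(x)) {
                    if (inDegree.merge(y, -1, Integer::sum) == 0) next.add(y);
                }
            }
            current = next;   // nodes caught in contradictory orderings (cycles) are simply dropped
        }
        return levels;
    }
}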

Related

How to efficiently find similar documents

I have lots of documents that I have clustered with a clustering algorithm, where each document may belong to more than one cluster. I've created a table storing the document-cluster assignment and another one that stores the cluster-document info. When I look for the documents similar to a given document (say d_i), I first retrieve the list of clusters to which it belongs (from the document-cluster table), and then for each cluster c_j in that list I retrieve the documents that belong to c_j from the cluster-document table. There is more than one c_j, so the result is multiple lists. Each list has many documents, and there may be overlaps among the lists.
In the next phase, in order to find the documents most similar to d_i, I rank the candidates by the number of clusters they have in common with d_i.
My question is about this last phase. A naive solution is to build a sorted HashMap-like structure with the document as the key and the number of common clusters as the value. However, since each list may contain very many documents, this may not be the best solution. Is there any other way to rank the similar items? Any preprocessing or ..?
Assuming that the number of arrays is relatively small compared to the number of elements (in particular, that the number of arrays is in o(log n)), you can do it with a modification of bucket sort:
Let m be the number of arrays.

create a list of m buckets buckets[], where each bucket[i] is a hash set
for each array arr:
    for each element x in arr:
        if x is already in some bucket, let that bucket's id be i:
            remove x from bucket i
            i <- i + 1
        if no such bucket exists, set i = 1
        add x to bucket i
for each bucket i = m, m-1, ..., 1 in descending order:
    for each element x in bucket[i]:
        yield x
The above runs in O(m^2*n):
Iterating over each array
Iterating over all elements in each array
Finding the relevant bucket.
Note that the last step can be done in O(1) by adding a map element -> bucket_id backed by a hash table, so the total improves to O(m*n).
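A minimal Java sketch of that improved O(m*n) variant, assuming String elements; the names here are illustrative, not from the original posts, and buckets are 0-based rather than 1-based.

import java.util.*;

public class BucketRank {
    // Bucket approach with an element -> bucket-index map, so the
    // "find the bucket containing x" step costs O(1).
    static List<String> rank(List<List<String>> arrays) {
        int m = arrays.size();
        List<Set<String>> buckets = new ArrayList<>();
        for (int i = 0; i < m; i++) buckets.add(new HashSet<>());
        Map<String, Integer> bucketOf = new HashMap<>();

        for (List<String> arr : arrays) {
            for (String x : arr) {
                Integer i = bucketOf.get(x);
                if (i == null) {
                    i = 0;                        // first occurrence goes to bucket 0
                } else {
                    buckets.get(i).remove(x);     // move x one bucket up
                    i = i + 1;
                }
                buckets.get(i).add(x);
                bucketOf.put(x, i);
            }
        }

        // Emit elements from the highest bucket (most occurrences) downwards.
        List<String> result = new ArrayList<>();
        for (int i = m - 1; i >= 0; i--) {
            result.addAll(buckets.get(i));
        }
        return result;
    }
}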
An alternative is to use a hash map as a histogram that maps each element to its number of occurrences, and then sort the combined elements by that histogram. The benefit of this approach is that it distributes very nicely with map-reduce:
map(partial list of elements l):
    for each element x:
        emit(x, 1)
reduce(x, list<number>):
    s = sum{list}
    emit(x, s)
combine(x, list<number>):
    s = sum{list}   // or size{list} when used as a combiner
    emit(x, s)
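Outside of map-reduce, the same histogram idea is only a few lines of plain Java. This is a sketch under the assumption that the candidate lists hold document ids as Strings; the names are illustrative.

import java.util.*;

public class Histogram {
    // Count how often each document appears across the candidate lists,
    // then sort documents by that count, highest first.
    static List<String> rankByOccurrences(List<List<String>> candidateLists) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> list : candidateLists) {
            for (String doc : list) {
                counts.merge(doc, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        ranked.sort((a, b) -> counts.get(b) - counts.get(a));
        return ranked;
    }
}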

Best Way to Implement an Edge Weighted Graph in Java

First of all, I'm dealing with graphs of more than 1000 edges, and I traverse adjacency lists as well as vertices more than 100 times per second. Therefore, I really need an efficient implementation that fits my goals.
My vertices are integers and my edges are undirected and weighted.
I've seen this code.
However, it models the adjacency lists using edge objects, which means I have to spend O(|adj|) time whenever I want the neighbors of a vertex, where |adj| is the number of its neighbors.
On the other hand, I'm considering modeling my adjacency lists as Map<Integer, Double>[] adj.
With this method I would just look up adj[v], v being the vertex, and get its neighbors to iterate over.
The other method requires something like:
public Set<Integer> adj(int v)
{
    Set<Integer> adjacents = new HashSet<>();
    for (Edge e : adj[v])
        adjacents.add(e.other(v));
    return adjacents;
}
My goals are:
I want to sort a subset of vertices by their degree (number of neighbors) at any time.
Also, I need to sort the neighbors of a vertex by the weights of the edges connecting it to them.
I want to do this without using so much space that it slows the operations down. Should I consider using an adjacency matrix?
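A minimal sketch of the map-based representation described in the question, using an array of HashMaps from neighbor to edge weight; the sorting helpers are illustrative assumptions, not part of the original post.

import java.util.*;

public class WeightedGraph {
    // adj[v] maps each neighbor of v to the weight of the connecting edge.
    private final Map<Integer, Double>[] adj;

    @SuppressWarnings("unchecked")
    WeightedGraph(int vertexCount) {
        adj = new HashMap[vertexCount];
        for (int v = 0; v < vertexCount; v++) adj[v] = new HashMap<>();
    }

    void addEdge(int u, int v, double weight) {   // undirected: store both directions
        adj[u].put(v, weight);
        adj[v].put(u, weight);
    }

    Set<Integer> neighbors(int v) {               // direct O(1) access to the adjacency map
        return adj[v].keySet();
    }

    // Neighbors of v ordered by the weight of the connecting edge.
    List<Integer> neighborsByWeight(int v) {
        List<Integer> ns = new ArrayList<>(adj[v].keySet());
        ns.sort((a, b) -> Double.compare(adj[v].get(a), adj[v].get(b)));
        return ns;
    }

    // Sort a subset of vertices by degree (number of neighbors), descending.
    List<Integer> byDegree(Collection<Integer> vertices) {
        List<Integer> vs = new ArrayList<>(vertices);
        vs.sort((a, b) -> adj[b].size() - adj[a].size());
        return vs;
    }
}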
I've used the JGraphT library for a variety of my own graph representations. They have a weighted graph implementation here: http://jgrapht.org/javadoc/org/jgrapht/graph/SimpleWeightedGraph.html
That seems to handle a lot of what you are looking for. I've used it to represent graphs with up to around 2000 vertices, and it performs reasonably well for my needs, though I don't remember my access rate.
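For illustration, minimal use of JGraphT's SimpleWeightedGraph might look like the snippet below. This is a sketch from memory; the exact constructors and factory methods vary between JGraphT versions, so check the linked javadoc.

import org.jgrapht.Graphs;
import org.jgrapht.graph.DefaultWeightedEdge;
import org.jgrapht.graph.SimpleWeightedGraph;

public class JGraphTExample {
    public static void main(String[] args) {
        // Undirected weighted graph with Integer vertices.
        SimpleWeightedGraph<Integer, DefaultWeightedEdge> g =
                new SimpleWeightedGraph<>(DefaultWeightedEdge.class);
        g.addVertex(1);
        g.addVertex(2);
        DefaultWeightedEdge e = g.addEdge(1, 2);
        g.setEdgeWeight(e, 0.5);

        // Neighbors of a vertex and the weight of a specific edge.
        System.out.println(Graphs.neighborListOf(g, 1));   // [2]
        System.out.println(g.getEdgeWeight(e));            // 0.5
    }
}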

Better methods to compare corresponding values in multiple arrays in Java

I have an ArrayList containing M lists, each of which is sorted and has the same size N. Now I want to compare the first (N-1) corresponding values of each list with the others and find the lists whose first (N-1) values are equal. Intuitively this can be done with two nested for-loops over the lists, but the complexity is then as high as M*M*N. I was wondering whether there is a better algorithm. By the way, M may be very large while N tends to be small.
Sorry, I might not have been clear: I want the final output to be the pairs of lists that have the same first (N-1) values.
Use a good hashing algorithm to calculate a hash code of the N-1 items in each row. Organize rows by their hash code, and do a full compare only when the hash codes match.
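A small sketch of that idea, assuming String elements: group the rows by the hash code of their first (N-1) items and do the full comparison only inside each hash group. All names here are illustrative.

import java.util.*;

public class PrefixMatcher {
    // Returns index pairs (i, j) of rows whose first (N-1) elements are equal.
    static List<int[]> samePrefixPairs(List<List<String>> rows) {
        Map<Integer, List<Integer>> byHash = new HashMap<>();
        for (int i = 0; i < rows.size(); i++) {
            List<String> prefix = rows.get(i).subList(0, rows.get(i).size() - 1);
            byHash.computeIfAbsent(prefix.hashCode(), h -> new ArrayList<>()).add(i);
        }
        List<int[]> pairs = new ArrayList<>();
        for (List<Integer> group : byHash.values()) {
            for (int a = 0; a < group.size(); a++) {
                for (int b = a + 1; b < group.size(); b++) {
                    int i = group.get(a), j = group.get(b);
                    List<String> pi = rows.get(i).subList(0, rows.get(i).size() - 1);
                    List<String> pj = rows.get(j).subList(0, rows.get(j).size() - 1);
                    if (pi.equals(pj)) pairs.add(new int[] {i, j});   // guard against hash collisions
                }
            }
        }
        return pairs;
    }
}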
Sort the list of lists.
Sorting them is O(N * M log M), assuming that a single comparison is O(N).
If you take a radix-sort approach, it should actually be more on the lines of O(N * M), or even O(M log M) total (assuming the lists are not identical).
Lists with the same prefix will then be adjacent in the sorted order.
Assuming that you are trying to reimplement APRIORI: yes, do keep a sorted list of candidate itemsets. This is exactly what Apriori-Gen needs for building the next round candidates. Keeping them organized as a sorted tree is quite neat, as this is also fast when scanning the database for counting itemsets.

Find Element that exists in each list in List of Lists

I have a List of Lists in Java (Grails) and I am trying to find the elements that exist in every list within the List of Lists. Does anyone have a quick way to do this? Thank you!
If the lists have unique elements you can do it like this (by unique elements I mean that an element can appear in several lists, but only once per list; otherwise, if the first list contains [1,2,2,3] and another contains [x,2,y], the output will be [2,2] rather than [2]):
List tmpList = new ArrayList<>(lists.get(0));
for (int i = 1; i < lists.size(); i++)
    tmpList.retainAll(lists.get(i));
System.out.println(tmpList);
Take one linked list, copy it, and then check that copy against all of the other linked lists.
If a list doesn't contain an element, then remove that element from your newly made copy.
You can implement the copy as a map or a hash table to reduce the time complexity a bit.
Either way, unless your lists are sorted or something, your algorithm can't be faster than O(n), where n is the total number of elements in all lists.
The algorithm I outlined is O(nm), where m is the size of your smallest list.
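A sketch of the hash-based variant, assuming the element type is String; wrapping each list in a HashSet makes the membership test during retainAll O(1) on average. The names are illustrative.

import java.util.*;

public class CommonElements {
    // Intersect all lists via a HashSet copy of the first one;
    // each retainAll pass keeps only elements also present in the next list.
    static Set<String> common(List<List<String>> lists) {
        Set<String> result = new HashSet<>(lists.get(0));
        for (int i = 1; i < lists.size(); i++) {
            result.retainAll(new HashSet<>(lists.get(i)));
        }
        return result;
    }
}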

Is This an Efficient Way to Sort Multiple Linked Lists?

You have n sorted linked lists, each of size n. The references to the linked
lists are stored in an array. What is an efficient algorithm to merge the n
linked lists into a single sorted linked list?
Since they are all sorted:
Use a loop.
Compare the first nodes of all the sorted linked lists and pick the smallest each time.
Proceed to the next node and repeat until every list hits null.
Is this the most efficient way of doing this?
Just link them all together (or dump them into a single list) and use a general sort. With N = n^2 total elements, that gives you O(N log N) performance; scanning the heads of all n lists for every element you output, as you propose, costs O(N * n).
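A minimal sketch of that suggestion, assuming the lists hold Integers and that java.util lists are an acceptable stand-in for the hand-rolled linked lists in the question; the names are illustrative.

import java.util.*;

public class MergeLists {
    // Dump every list into one list and sort it once: O(N log N) for N total elements.
    static List<Integer> mergeAll(List<Integer>[] lists) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> list : lists) {
            all.addAll(list);
        }
        Collections.sort(all);
        return new LinkedList<>(all);
    }
}

If you want to exploit the fact that the inputs are already sorted, a k-way merge driven by a PriorityQueue of the current list heads would give O(N log n) instead, but the simple concatenate-and-sort above is usually good enough.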
