Find the top N most popular elements

Find the top N most popular elements - java

I have a List of TrackDay objects for a runner going around a track field on different days. Each pair of start/finish times signal a single lap run by the runner. We are guaranteed that there is a matching start/finish date (in the order in which they appear in the appropriate lists) :
TrackDay() {
List<DateTime> startTimes
List<DateTime> finishTimes
}
I would like to find the top N days (lets say 3) that runner ran the most. This translates to finding the N longest total start/finish times per TrackDay object. The naive way would be to do the following:
for (TrackDay td : listOftrackDays) {
// loop through each start/finish lists and find out the finish-start time for each pair.
// Add the delta times (finish-start) up for each pair of start/finish objects.
// Create a map to store the time for each TrackDay
// sort the map and get the first N entries
}
Is there a better, more clean/efficient way to do the above?

The problem you're trying to solve is well-known as Selection algorithm, in particular - Quick select. While sorting in general works good, for large collections it would be better to consider this approach, since it will give you linear time instead of N*log(N).

This solution should be linear time. I have assumed that startTimes and finishTimes support random access. I don't know what API your DateTime is part of, so have used java.time.LocalDateTime.
public List<TrackDay> findTop(List<TrackDay> trackDays, int limit) {
limit = Math.min(limit, trackDays.size());
List<Duration> durations = new ArrayList<>(Collections.nCopies(limit, Duration.ZERO));
List<TrackDay> result = new ArrayList<>(Collections.nCopies(limit, null));
int lastIndex = limit - 1;
for (TrackDay trackDay : trackDays) {
Duration duration = Duration.ZERO;
for (int i = 0, n = trackDay.startTimes.size(); i < n; i++) {
duration = duration.plus(Duration.between(trackDay.startTimes.get(i), trackDay.finishTimes.get(i)));
}
Integer destinationIndex = null;
for (int i = lastIndex; i >= 0; i--) {
if (durations.get(i).compareTo(duration) >= 0) {
break;
}
destinationIndex = i;
}
if (destinationIndex != null) {
durations.remove(lastIndex);
result.remove(lastIndex);
durations.add(destinationIndex, duration);
result.add(destinationIndex, trackDay);
}
}
return result;
}

Related

Hackerrank: Frequency Queries Question Getting Timeout Error, How to optimize the code further?

I am getting timeout error for my code which I wrote using hashmap functions in java 8.When I submitted my answer 5 test cases failed due to timeout error out of 14 test cases on hackerrank platform.
Below is the question
You are given queries. Each query is of the form two integers described below:
x : Insert x in your data structure.
y : Delete one occurence of y from your data structure, if present.
z : Check if any integer is present whose frequency is exactly z. If yes, print 1 else 0.
The queries are given in the form of a 2-D array of where queries[i][0] contains the operation, and queries[i][1] contains the data element.
How should I optimize this code further ?
static HashMap<Integer,Integer> buffer = new HashMap<Integer,Integer>();
// Complete the freqQuery function below.
static List<Integer> freqQuery(List<List<Integer>> queries) {
List<Integer> output = new ArrayList<>();
output = queries.stream().map(query -> {return performQuery(query);}).filter(v -> v!=-1).collect(Collectors.toList());
//get the output array iterate over each query and perform operation
return output;
}
private static Integer performQuery(List<Integer> query) {
if(query.get(0) == 1){
buffer.put(query.get(1), buffer.getOrDefault(query.get(1), 0) + 1);
}
else if(query.get(0) == 2){
if(buffer.containsKey(query.get(1)) && buffer.get(query.get(1))>0 ){
buffer.put(query.get(1), buffer.get(query.get(1)) - 1);
}
}
else{
if(buffer.containsValue(query.get(1))){
return 1;
}
else{
return 0;
}
}
return -1;
}
public static void main(String[] args) {
List<List<Integer>> queries = Arrays.asList(
Arrays.asList(1,5),
Arrays.asList(1,6),
Arrays.asList(3,2),
Arrays.asList(1,10),
Arrays.asList(1,10),
Arrays.asList(1,6),
Arrays.asList(2,5),
Arrays.asList(3,2)
);
long start = System.currentTimeMillis();
System.out.println(freqQuery(queries));
long end = System.currentTimeMillis();
//finding the time difference and converting it into seconds
float sec = (end - start) / 1000F;
System.out.println("FreqQuery function Took "+sec + " s");
}
}

The problem with your code is the z operation. Sepecifically, the method containsValue has linear time complexty, making the whole complexity of the algorithm in the order of O(n*n). Here is a hint: add another hashmap on top of the one that you have which counts the occurences of occurences by value of the other map. In that way you can query directly this second one by the value (because it will be the key in this case).

Compartmentalizing loops over a large iteration

The Goal of my question is to enhance the performance of my algorithm by splitting the range of my loop iterations over a large array list.
For example: I have an Array list with a size of about 10 billion entries of long values, the goal I am trying to achieve is to start the loop from 0 to 100 million entries, output the result for the 100 million entries of whatever calculations inside the loop; then begin and 100 million to 200 million doing the previous and outputting the result, then 300-400million,400-500million and so on and so forth.
after I get all the 100 billion/100 million results, then I can sum them up outside of the loop collecting the results from the loop outputs parallel.
I have tried to use a range that might be able to achieve something similar by trying to use a dynamic range shift method but I cant seem to have the logic fully implemented like I would like to.
public static void tt4() {
long essir2 = 0;
long essir3 = 0;
List cc = new ArrayList<>();
List<Long> range = new ArrayList<>();
// break point is a method that returns list values, it was converted to
// string because of some concatenations and would be converted back to long here
for (String ari1 : Breakpoint()) {
cc.add(Long.valueOf(ari1));
}
// the size of the List is huge about 1 trillion entries at the minimum
long hy = cc.size() - 1;
for (long k = 0; k < hy; k++) {
long t1 = (long) cc.get((int) k);
long t2 = (long) cc.get((int) (k + 1));
// My main question: I am trying to iterate the entire list in a dynamic way
// which would exclude repeated endpoints on each iteration.
range = LongStream.rangeClosed(t1 + 1, t2)
.boxed()
.collect(Collectors.toList());
for (long i : range) {
// Hard is another method call on the iteration
// complexcalc is a method as well
essir2 = complexcalc((int) i, (int) Hard(i));
essir3 += essir2;
}
}
System.out.println("\n" + essir3);
}
I don't have any errors, I am just looking for a way to enhance performance and time. I can do a million entries in under a second directly, but when I put the size I require it runs forever. The size I'm giving are abstracts to illustrate size magnitudes, I don't want opinions like a 100 billion is not much, if I can do a million under a second, I'm talking massively huge numbers I need to iterate over doing complex tasks and calls, I just need help with the logic I'm trying to achieve if I can.

One thing I would suggest right off the bat would be to store your Breakpoint return value inside a simple array rather then using a List. This should improve your execution time significantly:
List<Long> cc = new ArrayList<>();
for (String ari1 : Breakpoint()) {
cc.add(Long.valueOf(ari1));
}
Long[] ccArray = cc.toArray(new Long[0]);
I believe what you're looking for is to split your tasks across multiple threads. You can do this with ExecutorService "which simplifies the execution of tasks in asynchronous mode".
Note that I am not overly familiar with this whole concept but have experimented with it a bit recently and give you a quick draft of how you could implement this.
I welcome those more experienced with multi-threading to either correct this post or provide additional information in the comments to help improve this answer.
Runnable Task class
public class CompartmentalizationTask implements Runnable {
private final ArrayList<Long> cc;
private final long index;
public CompartmentalizationTask(ArrayList<Long> list, long index) {
this.cc = list;
this.index = index;
}
#Override
public void run() {
Main.compartmentalize(cc, index);
}
}
Main class
private static ExecutorService exeService = Executors.newCachedThreadPool();
private static List<Future> futureTasks = new ArrayList<>();
public static void tt4() throws ExecutionException, InterruptedException
{
long essir2 = 0;
long essir3 = 0;
ArrayList<Long> cc = new ArrayList<>();
List<Long> range = new ArrayList<>();
// break point is a method that returns list values, it was converted to
// string because of some concatenations and would be converted back to long here
for (String ari1 : Breakpoint()) {
cc.add(Long.valueOf(ari1));
}
// the size of the List is huge about 1 trillion entries at the minimum
long hy = cc.size() - 1;
for (long k = 0; k < hy; k++) {
futureTasks.add(Main.exeService.submit(new CompartmentalizationTask(cc, k)));
}
for (int i = 0; i < futureTasks.size(); i++) {
futureTasks.get(i).get();
}
exeService.shutdown();
}
public static void compartmentalize(ArrayList<Long> cc, long index)
{
long t1 = (long) cc.get((int) index);
long t2 = (long) cc.get((int) (index + 1));
// My main question: I am trying to iterate the entire list in a dynamic way
// which would exclude repeated endpoints on each iteration.
range = LongStream.rangeClosed(t1 + 1, t2)
.boxed()
.collect(Collectors.toList());
for (long i : range) {
// Hard is another method call on the iteration
// complexcalc is a method as well
essir2 = complexcalc((int) i, (int) Hard(i));
essir3 += essir2;
}
}

Iterator between two given hours

I was in a job interview and got this question: " Write a function that gets 2 strings s,t that represents 2 hours ( in format HH: MM: SS ). It's known that s is earlier than t.
The function needs to calculate how many hours between the two given hours contains at most 2 digits.
For example- s- 10:59:00, t- 11:00:59 -
Answer- 11:00:00, 11:00:01,11:00:10, 11:00:11.
I tried to do while loops and got really stuck. Unfortunately, I didn't pass the interview.
How can I go over all the hours (every second is a new time) between 2 given hours in java as explained above? Thanks a lot

Java 8 allows you to use LocalTime.
LocalTime time1 = LocalTime.parse(t1);
LocalTime time2 = LocalTime.parse(t2);
The logic would require you to count the amount of different digits in a LocalTime, something like
boolean isWinner(LocalTime current) {
String onlyDigits = DateTimeFormatter.ofPattern("HHmmss").format(current);
Set<Character> set = new HashSet<>();
for (int index = 0; index < onlyDigits.length(); index++) {
set.add(onlyDigits.charAt(index));
}
return set.size() <= 2;
}
You can loop between the times like this
int count = 0;
for (LocalTime current = time1; current.isBefore(time2); current = current.plusSeconds(1)) {
if (isWinner(current)) {
count++;
}
}
That's it.
The question is really more geared towards getting a feel of how you'd approach the problem, and if you know about LocalTime API etc.

Genetic Algorithm for Process Allocation

I have the following uni assignment that's been puzzling me. I have to implement a genetic algorithm that allocates processes into processors. More specifically the problem is the following:
"You have a program that is computed in parallel processor system. The program is made up of a N number of processes that need to be allocated on a n number of processors (where n is way smaller than N). The communication of processes during this whole process can be quite time consuming, so the best practice would be to assign processes that need intercommunication with one another to same processor.
In order to reduce the communication time between processes you could allocate of these processes to the same processor but this would negate the parallel processing idea that every processor needs to contribute to the whole process.
Consider the following: Let's say that Cij is the total amount of communication between process i and process j. Assume that every process needs the same amount of computing power so that the limitations of the processing process can be handled by assigning the same amount of processes to a processor. Use a genetic algorithm to assign N processes to n processors."
The above is roughly translated the description of the problem. Now I have the following question that puzzle me.
1) What would be the best viable solution in order to for the genetic algorithm to run. I have the theory behind them and I have deduced that you need a best possible solution in order to check each generation of the produced population.
2) How can I properly design the whole problem in order to be handled by a program.
I am planning to implement this in Java but any other recommendations for other programming languages would be welcome.

The Dude abides. Or El Duderino if you're not into the whole brevity thing.
What you're asking is really a two part question, but the Genetic Algorithm part can be easily illustrated in concept. I find that giving a basic start can be helpful, but this problem as a "whole" is too complicated to address here.
Genetic Algorithms (GA) can be used as an optimizer, as you note. In order to apply a GA to a process execution plan, you need to be able to score an execution plan, then clone and mutate the best plans. A GA works by running several plans, cloning the best, and then mutating some of them slightly to see if the offspring (cloned) plans are improved or worsened.
In this example, I created a array of 100 random Integers. Each Integer is a "process" to be run and the value of the Integer is the "cost" of running that individual process.
List<Integer> processes = new ArrayList<Integer>();
The processes are then added to an ExecutionPlan, which is a List<List<Integer>>. This List of List of Integers will be used to represent 4 processors doing 25 rounds of processing:
class ExecutionPlan implements Comparable<ExecutionPlan> {
List<List<Integer>> plan;
int cost;
The total cost of an execution plan will be computed by taking the highest process cost per round (the greatest Integer) and summing the costs of all the rounds. Thus, the goal of the optimizer is to distribute the initial 100 integers (processes) into 25 rounds of "processing" on 4 "processors" such that total cost is as low as possible.
// make 10 execution plans of 25 execution rounds running on 4 processors;
List<ExecutionPlan> executionPlans = createAndIntializePlans(processes);
// Loop on generationCount
for ( int generation = 0; generation < GENERATIONCOUNT; ++generation) {
computeCostOfPlans(executionPlans);
// sort plans by cost
Collections.sort(executionPlans);
// print execution plan costs
System.out.println(generation + " = " + executionPlans);
// clone 5 better plans over 5 worse plans
// i.e., kill off the least fit and reproduce the best fit.
cloneBetterPlansOverWorsePlans(executionPlans);
// mutate 5 cloned plans
mutateClones(executionPlans);
}
When the program is run, the cost is initially randomly determined, but with each generation it improves. If you run it for 1000 generations and plot the results, a typical run will look like this:
The purpose of the GA is to Optimize or attempt to determine the best possible solution. The reason it can be applied to you problem is that your ExecutionPlan can be scored, cloned and mutated. The path to success, therefore, is to separate the problems in your mind. First, figure out how you can make an execution plan that can be scored as to what the cost will be to actually run it on an assumed set of hardware. Add rountines to clone and mutate an ExecutionPlan. Once you have that plug it into this GA example. Good luck and stay cool dude.
public class Optimize {
private static int GENERATIONCOUNT = 1000;
private static int PROCESSCOUNT = 100;
private static int MUTATIONCOUNT = PROCESSCOUNT/10;
public static void main(String...strings) {
new Optimize().run();
}
// define an execution plan as 25 runs on 4 processors
class ExecutionPlan implements Comparable<ExecutionPlan> {
List<List<Integer>> plan;
int cost;
public ExecutionPlan(List<List<Integer>> plan) {
this.plan = plan;
}
#Override
public int compareTo(ExecutionPlan o) {
return cost-o.cost;
}
#Override
public String toString() {
return Integer.toString(cost);
}
}
private void run() {
// make 100 processes to be completed
List<Integer> processes = new ArrayList<Integer>();
// assign them a random cost between 1 and 100;
for ( int index = 0; index < PROCESSCOUNT; ++index) {
processes.add( new Integer((int)(Math.random() * 99.0)+1));
}
// make 10 execution plans of 25 execution rounds running on 4 processors;
List<ExecutionPlan> executionPlans = createAndIntializePlans(processes);
// Loop on generationCount
for ( int generation = 0; generation < GENERATIONCOUNT; ++generation) {
computeCostOfPlans(executionPlans);
// sort plans by cost
Collections.sort(executionPlans);
// print execution plan costs
System.out.println(generation + " = " + executionPlans);
// clone 5 better plans over 5 worse plans
cloneBetterPlansOverWorsePlans(executionPlans);
// mutate 5 cloned plans
mutateClones(executionPlans);
}
}
private void mutateClones(List<ExecutionPlan> executionPlans) {
for ( int index = 0; index < executionPlans.size()/2; ++index) {
ExecutionPlan execution = executionPlans.get(index);
// mutate 10 different location swaps, maybe the plan improves
for ( int mutationCount = 0; mutationCount < MUTATIONCOUNT ; ++mutationCount) {
int location1 = (int)(Math.random() * 100.0);
int location2 = (int)(Math.random() * 100.0);
// swap two locations
Integer processCostTemp = execution.plan.get(location1/4).get(location1%4);
execution.plan.get(location1/4).set(location1%4, execution.plan.get(location2/4).get(location2%4));
execution.plan.get(location2/4).set(location2%4, processCostTemp);
}
}
}
private void cloneBetterPlansOverWorsePlans(List<ExecutionPlan> executionPlans) {
for ( int index = 0; index < executionPlans.size()/2; ++index) {
ExecutionPlan execution = executionPlans.get(index);
List<List<Integer>> clonePlan = new ArrayList<List<Integer>>();
for ( int roundNumber = 0; roundNumber < 25; ++roundNumber) {
clonePlan.add( new ArrayList<Integer>(execution.plan.get(roundNumber)) );
}
executionPlans.set( index + executionPlans.size()/2, new ExecutionPlan(clonePlan) );
}
}
private void computeCostOfPlans(List<ExecutionPlan> executionPlans) {
for ( ExecutionPlan execution: executionPlans) {
execution.cost = 0;
for ( int roundNumber = 0; roundNumber < 25; ++roundNumber) {
// cost of a round is greatest "communication time".
List<Integer> round = execution.plan.get(roundNumber);
int roundCost = round.get(0)>round.get(1)?round.get(0):round.get(1);
roundCost = execution.cost>round.get(2)?roundCost:round.get(2);
roundCost = execution.cost>round.get(3)?roundCost:round.get(3);
// add all the round costs' to determine total plan cost
execution.cost += roundCost;
}
}
}
private List<ExecutionPlan> createAndIntializePlans(List<Integer> processes) {
List<ExecutionPlan> executionPlans = new ArrayList<ExecutionPlan>();
for ( int planNumber = 0; planNumber < 10; ++planNumber) {
// randomize the processes for this plan
Collections.shuffle(processes);
// and make the plan
List<List<Integer>> currentPlan = new ArrayList<List<Integer>>();
for ( int roundNumber = 0; roundNumber < 25; ++roundNumber) {
List<Integer> round = new ArrayList<Integer>();
round.add(processes.get(4*roundNumber+0));
round.add(processes.get(4*roundNumber+1));
round.add(processes.get(4*roundNumber+2));
round.add(processes.get(4*roundNumber+3));
currentPlan.add(round);
}
executionPlans.add(new ExecutionPlan(currentPlan));
}
return executionPlans;
}
}

Efficient way to implement 'events since x' in Java

I want to be-able to ask an object 'how many events have occurred in the last x seconds' where the x is an argument.
e.g. how many events have occurred in the last 120 seconds..
How I approached is linear based on the number of events occurring but was wanting to see what the most efficient way (space & time) to achieve this requirement?;
public class TimeSinceStat {
private List<DateTime> eventTimes = new ArrayList<>();
public void apply() {
eventTimes.add(DateTime.now());
}
public int eventsSince(int seconds) {
DateTime startTime = DateTime.now().minus(Seconds.seconds(seconds));
for (int i = 0; i < orderTimes.size(); i++) {
DateTime dateTime = eventTimes.get(i);
if (dateTime.compareTo(startTime) > 0)
return eventTimes.subList(i, eventTimes.size()).size();
}
return 0;
}
(PS - i'm using JodaTime for the date/time representation)
Edit:
The key of this algorithm to find all events that have happened in the last x seconds; the exact start time (e.g. now - 30 seconds) is may or maynot be in the collection

Store the DateTime in a TreeSet and then use tailSet to get the most recent events. This saves you from having to find the starting point by iteration (which is O(n)) and instead by searching (which is O (log n)).
TreeSet<DateTime> eventTimes;
public int eventsSince(int seconds) {
return eventTimes.tailSet(DateTime.now().minus(Seconds.seconds(seconds)), true).size();
}
Of course, you could also binary search on your sorted list, but this does the work for you.
Edit
If it's a concern that multiple events could occur at the same DateTime, you can take the exact same approach with a SortedMultiset from Guava:
TreeMultiset<DateTime> eventTimes;
public int eventsSince(int seconds) {
return eventTimes.tailMultiset(
DateTime.now().minus(Seconds.seconds(seconds)),
BoundType.CLOSED
).size();
}
Edit x2
Here's a much more efficient approach that leverages the fact that you only log events that happened after all other events. With each event, store the number of events up to that date:
SortedMap<DateTime, Integer> eventCounts = initEventMap();
public SortedMap<DateTime, Integer> initEventMap() {
TreeMap<DateTime, Integer> map = new TreeMap<>();
//prime the map to make subsequent operations much cleaner
map.put(DateTime.now().minus(Seconds.seconds(1)), 0);
return map;
}
private long totalCount() {
//you can handle the edge condition here
return eventCounts.getLastEntry().getValue();
}
public void logEvent() {
eventCounts.put(DateTime.now(), totalCount() + 1);
}
Then getting the count since a date is super efficient, just take the total and subtract the count of events that occurred before that date.
public int eventsSince(int seconds) {
DateTime startTime = DateTime.now().minus(Seconds.seconds(seconds));
return totalCount() - eventCounts.lowerEntry(startTime).getValue();
}
This eliminates the inefficient iteration. It's a constant time lookup and an O(log n) lookup.

If you were implementing a data structure from scratch, and the data are not in sorted order, you'd want to construct a balanced order statistic tree (also see code here). This is just a regular balanced tree with the size of the tree rooted at each node maintained in the node itself.
The size fields enable efficient calcualtion of the "rank" of any key in the tree. You can do the desired range query by making two O(log n) probes into the tree for the rank of the min and max range value, finally taking their difference.
The proposed tree and set tail operations are great except the tail views will need time to construct, even though all you need is their size. The asymptotic complexity is the same as the OST, but the OST avoids this overhead completely. The difference could be meaningful if performance is very criticial.
Of course I'd definitely use the standard library solution first and consider the OST only if the speed turned out to be inadequate.

Since DateTime already implements Comparable interface, I would recommend storing the data in a TreeMap instead, and you could use TreeMap#tailMap to get a subtree of the DateTime's that occurs in the desired time.
Based on your code:
public class TimeSinceStat {
//just in case two or more events start at the "same time"
private NavigableMap<DateTime, Integer> eventTimes = new TreeMap<>();
//if this class needs to be used in multiple threads, use ConcurrentSkipListMap instead of TreeMap
public void apply() {
DateTime dateTime = DateTime.now();
Integer times = eventTimes.contains(dateTime) != null ? 0 : (eventTimes.get(dateTime) + 1);
eventTimes.put(dateTime, times);
}
public int eventsSince(int seconds) {
DateTime startTime = DateTime.now().minus(Seconds.seconds(seconds));
NavigableMap<DateTime, Integer> eventsInRange = eventTimes.tailMap(startTime, true);
int counter = 0;
for (Integer time : eventsInRange.values()) {
counter += time;
}
return counter;
}
}

Assuming the list is sorted, you could do a binary search. Java Collections already provides Collections.binarySearch, and DateTime implements Comparable (according to the JodaTime JavaDoc). binarySearch will return the index of the value you want, if it exists in the list, otherwise it returns the index of the greatest value less than the one you want (with the sign flipped). So, all you need to do is (in your eventsSince method):
// find the time you want.
int index=Collections.binarySearch(eventTimes, startTime);
if(index < 0) index = -(index+1)-1; // make sure we get the right index if startTime isn't found
// check for dupes
while(index != eventTimes.size() - 1 && eventTimes.get(index).equals(eventTimes.get(index+1))){
index++;
}
// return the number of events after the index
return eventTimes.size() - index; // this works because indices start at 0
This should be a faster way to do what you want.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.