I have a collection of time stamps, e.g. 10:18:07.490, 11:50:18.251, where the first is the start time and the second is the end time for an event. I need to find the range within a 24-hour day where the maximum number of events are happening. These events have millisecond precision.
What I am doing is dividing the 24 hours into a millisecond scale, attaching events to every millisecond, and then finding a range where the maximum number of events are happening.
LocalTime start = LocalTime.parse("00:00");
LocalTime end = LocalTime.parse("23:59");

for (LocalTime x = start; x.isBefore(end); x = x.plus(Duration.ofMillis(1))) {
    for (int i = 0; i < startTime.size(); i++) {
        // event i is running at instant x
        if (!startTime.get(i).isAfter(x) && !endTime.get(i).isBefore(x)) {
            // add it to the list
        }
    }
}
Certainly this is not a good approach; it takes too much memory. How can I do it in a proper way? Any suggestions?
A solution finding the first period of maximum concurrent events:
If you're willing to use a third party library, this can be implemented "relatively easily" in a SQL style with jOOλ's window functions. The idea is the same as explained in amit's answer:
System.out.println(
    Seq.of(tuple(LocalTime.parse("10:18:07.490"), LocalTime.parse("11:50:18.251")),
           tuple(LocalTime.parse("09:37:03.100"), LocalTime.parse("16:57:13.938")),
           tuple(LocalTime.parse("08:15:11.201"), LocalTime.parse("10:33:17.019")),
           tuple(LocalTime.parse("10:37:03.100"), LocalTime.parse("11:00:15.123")),
           tuple(LocalTime.parse("11:20:55.037"), LocalTime.parse("14:37:25.188")),
           tuple(LocalTime.parse("12:15:00.000"), LocalTime.parse("14:13:11.456")))
       .flatMap(t -> Seq.of(tuple(t.v1, 1), tuple(t.v2, -1)))
       .sorted(Comparator.comparing(t -> t.v1))
       .window(Long.MIN_VALUE, 0)
       .map(w -> tuple(
           w.value().v1,
           w.lead().map(t -> t.v1).orElse(null),
           w.sum(t -> t.v2).orElse(0)))
       .maxBy(t -> t.v3)
);
The above prints:
Optional[(10:18:07.490, 10:33:17.019, 3)]
So, during the period between 10:18... and 10:33..., there were 3 events, which is the largest number of events that overlap at any time during the day.
Finding all periods of maximum concurrent events:
Note that there are several periods when there are 3 concurrent events in the sample data. maxBy() returns only the first such period. In order to return all such periods, use maxAllBy() instead (added to jOOλ 0.9.11):
.maxAllBy(t -> t.v3)
.toList()
Yielding then:
[(10:18:07.490, 10:33:17.019, 3),
 (10:37:03.100, 11:00:15.123, 3),
 (11:20:55.037, 11:50:18.251, 3),
 (12:15, 14:13:11.456, 3)]
Or, as a graphical representation:
3             /-----\     /-----\     /-----\     /-----\
2       /-----/     \-----/     \-----/     \-----/     \-----\
1 /-----/                                                     \-----\
0                                                                   \--
  08:15 09:37 10:18 10:33 10:37 11:00 11:20 11:50 12:15 14:13 14:37 16:57
Explanations:
Here's the original solution again with comments:
// This is your input data
Seq.of(tuple(LocalTime.parse("10:18:07.490"), LocalTime.parse("11:50:18.251")),
       tuple(LocalTime.parse("09:37:03.100"), LocalTime.parse("16:57:13.938")),
       tuple(LocalTime.parse("08:15:11.201"), LocalTime.parse("10:33:17.019")),
       tuple(LocalTime.parse("10:37:03.100"), LocalTime.parse("11:00:15.123")),
       tuple(LocalTime.parse("11:20:55.037"), LocalTime.parse("14:37:25.188")),
       tuple(LocalTime.parse("12:15:00.000"), LocalTime.parse("14:13:11.456")))

   // Flatten "start" and "end" times into a single sequence, with start times being
   // accompanied by a "+1" event, and end times by a "-1" event, which can then be summed
   .flatMap(t -> Seq.of(tuple(t.v1, 1), tuple(t.v2, -1)))

   // Sort the "start" and "end" times according to the time
   .sorted(Comparator.comparing(t -> t.v1))

   // Create a "window" between the first time and the current time in the sequence
   .window(Long.MIN_VALUE, 0)

   // Map each time value to a tuple containing
   // (1) the time value itself
   // (2) the subsequent time value (lead)
   // (3) the "running total" of the +1 / -1 values
   .map(w -> tuple(
       w.value().v1,
       w.lead().map(t -> t.v1).orElse(null),
       w.sum(t -> t.v2).orElse(0)))

   // Now, find the tuple that has the maximum "running total" value
   .maxBy(t -> t.v3)
I have written up more about window functions and how to implement them in Java in this blog post.
(disclaimer: I work for the company behind jOOλ)
It can be done significantly better in terms of memory (well, assuming O(n) is considered good for you, and you don't regard 24*60*60*1000 as a tolerable constant):
Create a list of items [time, type] (where time is the time, and type is either start or end).
Sort the list by time.
Iterate the list; when you see a "start", increment a counter, and when you see an "end", decrement it.
By storing a "maximum seen so far", you can easily identify the single point where the maximal number of events occurs.
If you want to get the interval containing this point, simply find the time where the first maximum occurs and where it ends (which is the next [time, type] pair; or, if you allow a start and an end to coincide without being counted, just scan linearly from this point until the counter decreases and the time has moved; this is done only once and does not change the total complexity of the algorithm).
It is really easy to modify this approach to get the interval from the point.
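For what it's worth, here is a minimal plain-Java sketch of this sweep. It assumes the question's parallel startTime/endTime lists of LocalTime; the Boundary record and the sample data are my own additions, not part of either answer:

import java.time.LocalTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MaxOverlap {

    // One boundary of an event: +1 for a start, -1 for an end.
    record Boundary(LocalTime time, int delta) {}

    public static void main(String[] args) {
        List<LocalTime> startTime = List.of(
                LocalTime.parse("10:18:07.490"), LocalTime.parse("09:37:03.100"));
        List<LocalTime> endTime = List.of(
                LocalTime.parse("11:50:18.251"), LocalTime.parse("16:57:13.938"));

        List<Boundary> boundaries = new ArrayList<>();
        for (int i = 0; i < startTime.size(); i++) {
            boundaries.add(new Boundary(startTime.get(i), +1));
            boundaries.add(new Boundary(endTime.get(i), -1));
        }
        // Sort by time; at equal times, process ends before starts so that
        // touching intervals do not count as overlapping.
        boundaries.sort(Comparator.comparing(Boundary::time)
                .thenComparingInt(Boundary::delta));

        int current = 0;
        int max = 0;
        LocalTime bestStart = null;
        for (Boundary b : boundaries) {
            current += b.delta();
            if (current > max) {
                max = current;
                bestStart = b.time(); // first instant where the maximum is reached
            }
        }
        System.out.println(max + " concurrent events, starting at " + bestStart);
    }
}

The interval of maximum overlap then runs from bestStart to the next boundary after it, exactly as described above.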
Related
I have an employee rostering application. Some of the rules are quite similar to those in the Nurse Rostering example.
In the last two months I had to convert around 30 rules written in DRL to ConstraintStreams. After struggling at the beginning, I started to like it more and more, and in the end I really liked it. Thanks a lot for this awesome work!
I want to mention and ask a few things:
ConsecutiveWorkingDays: Here I use org.optaplanner.examples.common.experimental.ExperimentalConstraintCollectors to solve it. It works perfectly, but compared to most of the other rules (I benchmarked them one by one as described in the documentation), it does not perform that well (12,150/s), compared to dayOffRequest (27,384/s) or fairDistributionGroup (21,864/s). But I imagine that's down to the complexity of the problem.
Consecutive Working Days 2: What is the best way to include org.optaplanner.examples.common.experimental.ExperimentalConstraintCollectors and these classes in the project? I copied them from the example into my project.
SleepTime: The law in Switzerland has several rules about how much sleep time is legal. The description here is minimally simplified:
Less than 8 hours of sleep is always illegal.
Having less than 11 hours of sleep twice in one week is also always illegal.
Having less than 11 hours of sleep once is illegal if the average sleep time in that week is less than 11 hours.
With the DRL I did an insertLogical of Freetime objects. A Freetime starts at the end of the last shift of a day and ends with the beginning of the next shift, where nextShift.dayIndex > firstShift.dayIndex. This calculation is quite intensive, because an employee can have more than one shift per day.
I also used these Freetime objects to calculate the rule TwoDaysOfInARowPerWeek.
With ConstraintStreams, the selection of the relevant shifts is done for each of the four rules. This decreased the calculation speed quite a lot: with the DRL it was between 3,000/s and 4,000/s; with ConstraintStreams it decreased to 1,500/s to 2,000/s.
Now I have managed to do all the sleep time rules in one rule, so the selection of the shifts does not have to be done four times, but only twice, and now the speed is okay (2,700/s to 3,500/s). But still: is there really no way to do something like insertLogical? Or what alternatives are there?
Here is the code showing how I select these shifts:
private BiConstraintStream<Shift, Shift> getEmployeeFreetimeRelevantShifts(ConstraintFactory constraintFactory) {
    // Shift (1): the employee works this shift
    return constraintFactory.forEach(Shift.class)
            .filter(shift -> shift.isNotEmptyShift() && !"last".equals(shift.getHelperShiftForFreetime()))
            // No later shift on the same day as Shift (1)
            .ifNotExistsOther(
                    Shift.class,
                    Joiners.equal(Shift::getEmployee),
                    Joiners.equal(Shift::getDayIndex),
                    Joiners.filtering((shift_1, shift_2) -> shift_2.isNotHelperShiftForFreetime()),
                    Joiners.filtering((shift_1, later_shift_on_same_day) ->
                            shift_1.getEndDatetimeInMinutesOfDay() < later_shift_on_same_day.getEndDatetimeInMinutesOfDay()))
            // Shift (2): a shift on a later day than Shift (1)
            .join(
                    Shift.class,
                    Joiners.equal(Shift::getEmployee),
                    Joiners.filtering((shift_1, shift_2) -> !"first".equals(shift_2.getHelperShiftForFreetime())),
                    Joiners.filtering((shift_1, shift_2) -> shift_1.getDayIndex() < shift_2.getDayIndex()))
            // There is no shift on a day between Shift (1) and Shift (2)
            .ifNotExists(
                    Shift.class,
                    Joiners.equal((shift_1, shift_2) -> shift_1.getEmployee().getId(),
                            not_existing_shift -> not_existing_shift.getEmployee().getId()),
                    Joiners.filtering((shift_1, shift_2, shift_between_1_and_2) ->
                            shift_between_1_and_2.isNotHelperShiftForFreetime()),
                    Joiners.filtering((shift_1, shift_2, shift_between_1_and_2) ->
                            shift_between_1_and_2.getDayIndex() > shift_1.getDayIndex()
                                    && shift_between_1_and_2.getDayIndex() < shift_2.getDayIndex()))
            // ... and there is no earlier shift on the same day as Shift (2)
            .ifNotExists(
                    Shift.class,
                    Joiners.equal((shift_1, shift_2) -> shift_1.getEmployee().getId(),
                            not_existing_shift -> not_existing_shift.getEmployee().getId()),
                    Joiners.equal((shift_1, shift_2) -> shift_2.getDayIndex(),
                            not_existing_shift -> not_existing_shift.getDayIndex()),
                    Joiners.filtering((shift_1, shift_2, not_existing_shift_before_shift_2) ->
                            not_existing_shift_before_shift_2.isNotHelperShiftForFreetime()),
                    Joiners.filtering((shift_1, shift_2, not_existing_shift_before_shift_2) ->
                            shift_2.getStartDatetimeInMinutesOfDay() > not_existing_shift_before_shift_2.getStartDatetimeInMinutesOfDay()));
}
Execution of rules with 0 score: In my case a user can select which rules he wants to be executed, whether each should be a hard or a soft constraint, and he can change the penalty value. I solve this with a @ConstraintConfiguration. But as far as I can see, the rules with a penalty value of 0 are also executed (just not penalized). So if I disable all rules except one, the speed is no higher than when I select all rules. Is that correct? And is there a possibility to do that in a different way?
Again, thanks a lot for this awesome project!
First of all, thank you for your kind words, we appreciate that. If I may make one suggestion for your next question - ask your questions separately, as this "aggregate question" will make the answer needlessly hard to read and search.
Wrt. the experimental constraint collector - indeed, the performance of it is not ideal. It does a lot of things to give you a nice and useful API, at the expense of runtime performance. Wrt. using it in your own project - until we decide to make it a public API, copying it is how the collector is intended to be used.
Wrt. insertLogical - you are right that there is no such thing in Constraint Streams, and likely never will be. It may be a natural concept to people coming from Drools, and pretty much no one else. :-) The use case you describe (counting hours of sleep) may possibly be accomplished with shadow variables; the line between what should be done in shadow variables and in constraints is somewhat blurry.
Finally, when it comes to disabled constraints - you are right. We have a JIRA filed to eventually address that shortcoming.
I have a model where each Course has a list of available TimeSlots, from which one TimeSlot gets selected by OptaPlanner. Each TimeSlot has a dayOfWeek property. The days are numbered from 1, starting with Monday.
Let's say the TimeSlots are allocated such that they occupy days 1, 3, and 5. This should be penalized by 2 since there's one free day between Monday and Wednesday, and one free day between Wednesday and Friday. By using groupBy(course -> course.getSelectedTimeslot().getDayOfWeek().getValue()), we can get a list of occupied days.
One idea is to use a collector like sum(), for example, and write something like sum((day1, day2) -> day2 - day1 - 1), but sum(), of course, works with only one argument. More generally, maybe this could be done with a custom constraint collector; however, I do not know whether these collectors can perform such a specific action.
Another idea is that instead of summing up the differences directly, we could simply map each consecutive pair of days (assuming they're ordered) to the difference from the upcoming one. Penalizing with the weight of the value would then perform the summing for us. For example, 1, 4, 5 would map onto 2, 0, and we could then penalize each item with the weight of its value.
If I had the weeks in an array, the code would look like this:
public static int penalize(int[] weeks) {
    Arrays.sort(weeks);
    int sumOfDifferences = 0;
    for (int i = 1; i < weeks.length; i++) {
        sumOfDifferences += weeks[i] - weeks[i - 1] - 1;
    }
    return sumOfDifferences;
}
How can we perform penalization of gaps between days using constraint collectors?
An approach using a constraint collector is certainly possible; see ExperimentalCollectors in the optaplanner-examples module, and its use in the Nurse Rostering example.
However, for this case I think that would be overkill. Instead, think about "two days with a gap in between" as "two days at least 1 day apart, with no day in between". Once you reformulate your problem like that, ifNotExists(...) is your friend.
forEachUniquePair(TimeSlot.class,
        Joiners.greaterThan(slot -> slot.dayOfWeek + 1))
    .ifNotExists(TimeSlot.class,
        Joiners.lessThan((slot1, slot2) -> slot1.dayOfWeek, TimeSlot::dayOfWeek),
        Joiners.greaterThan((slot1, slot2) -> slot2.dayOfWeek, TimeSlot::dayOfWeek))
    ...
Obviously this is just pseudo-code, you will have to adapt it to your particular situation, but it should give you an idea for how to approach the problem.
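For illustration only, here is roughly how that reformulation might look when filled in. The Course and getSelectedTimeslot() names come from the question; the enclosing MyConstraintProvider class, the constraint name, and the HardSoftScore soft weight are my assumptions, and the Joiners calls follow the question's own usage rather than a verified API contract:

private static int day(Course course) {
    return course.getSelectedTimeslot().getDayOfWeek().getValue();
}

Constraint freeDayGaps(ConstraintFactory factory) {
    return factory.forEachUniquePair(Course.class,
                    // the first course is scheduled on an earlier day than the second
                    Joiners.lessThan(MyConstraintProvider::day))
            // at least one free day lies between the two
            .filter((c1, c2) -> day(c2) - day(c1) >= 2)
            // ... and no course occupies any day in between
            .ifNotExists(Course.class,
                    Joiners.lessThan((c1, c2) -> day(c1), MyConstraintProvider::day),
                    Joiners.greaterThan((c1, c2) -> day(c2), MyConstraintProvider::day))
            // penalize by the size of the gap, mirroring the array-based penalize() above
            .penalize("Free day gaps", HardSoftScore.ONE_SOFT,
                    (c1, c2) -> day(c2) - day(c1) - 1);
}

Note that if several courses can share a day, you would first need to reduce the stream to distinct days (e.g. with the groupBy from the question) so that each gap is counted once rather than once per pair.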
In my model I have 9 different service blocks, and each service can produce 9 different features. Each combination has a different delay time and standard deviation. For example, feature 3 needs 5 minutes in service block 8 with a deviation of 0.05, but only needs 3 minutes with a deviation of 0.1 in service block 4.
How can I permanently track the last 5 needed times of each combination and calculate their average (like a moving average)? I want to use the average to let the products decide which service block to choose for the respective feature, according to the shortest time, comparing the past times of all of the machines for that feature. The product agents already have a parameter for the time entering the service and one calculating the processing time by subtracting the entering time from the time leaving the service block.
Thank you for your support!
I am not sure if I understand what you are asking, but this may be an answer:
To track the last 5 needed times you can use a dataset from the Analysis palette, limiting the number of samples to 5.
You will update the dataset using dataset.add(yourTimeVariable);, so you can leave the vertical axis value of the dataset empty.
I assume you would need one dataset per service block/feature combination.
Then you can calculate your moving average doing:
dataset.getYMean();
If you need 81 datasets, you can create a collection as an ArrayList with element type DataSet.
Then, in the Main agent's properties, under "On startup" you can add the following code and it will have the same effect:
for (int i = 0; i < 81; i++) {
    collection.add(new DataSet(5, new DataUpdater_xjal() {
        double _lastUpdateX = Double.NaN;

        @Override
        public void update(DataSet _d) {
            if (time() == _lastUpdateX) { return; }
            _d.add(time(), 0);
            _lastUpdateX = time();
        }

        @Override
        public double getDataXValue() {
            return time();
        }
    }));
}
You will only need to remember which index corresponds to which service block and feature, and then you can just do
collection.get(4).getYMean();
and to add a new value to the dataset:
collection.get(2).add(yourTimeVariable);
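If it helps, the same bookkeeping can also be sketched in plain Java, independent of AnyLogic's DataSet (all names here are hypothetical):

import java.util.ArrayDeque;
import java.util.Deque;

// Keeps the last N observed processing times and exposes their average.
class MovingAverage {
    private final int capacity;
    private final Deque<Double> window = new ArrayDeque<>();
    private double sum = 0;

    MovingAverage(int capacity) {
        this.capacity = capacity;
    }

    void add(double value) {
        window.addLast(value);
        sum += value;
        if (window.size() > capacity) {
            sum -= window.removeFirst(); // drop the oldest sample
        }
    }

    double average() {
        return window.isEmpty() ? 0 : sum / window.size();
    }
}

One MovingAverage(5) instance per (service block, feature) combination, e.g. in a 9x9 array, gives the same moving average as the 81 datasets above.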
I've a requirement to capture the execution time of some code in iterations. I've decided to use a Map<Integer,Long> for capturing this data where Integer(key) is the iteration number and Long(value) is the time consumed by that iteration in milliseconds.
I've written the Java code below to compute the time taken for each iteration. I want to ensure that the time taken by all iterations is zero before invoking the actual code. Surprisingly, the code behaves differently on every execution.
Sometimes I get the desired output (zero milliseconds for all iterations), but at times I get positive and even negative values for some random iterations.
I've tried replacing System.currentTimeMillis(); with the following:
new java.util.Date().getTime();
System.nanoTime();
org.apache.commons.lang.time.StopWatch
but still no luck.
Any suggestions as why some iterations take additional time and how to eliminate it?
package com.stackoverflow.programmer;

import java.util.HashMap;
import java.util.Map;

public class TestTimeConsumption {

    public static void main(String[] args) {
        Integer totalIterations = 100000;
        Integer nonZeroMilliSecondsCounter = 0;
        Map<Integer, Long> timeTakenMap = new HashMap<>();
        for (Integer iteration = 1; iteration <= totalIterations; iteration++) {
            timeTakenMap.put(iteration, getTimeConsumed(iteration));
            if (timeTakenMap.get(iteration) != 0) {
                nonZeroMilliSecondsCounter++;
                System.out.format("Iteration %6d has taken %d millisecond(s).\n", iteration,
                        timeTakenMap.get(iteration));
            }
        }
        System.out.format("Total non zero entries : %d", nonZeroMilliSecondsCounter);
    }

    private static Long getTimeConsumed(Integer iteration) {
        long startTime = System.currentTimeMillis();
        // Execute code for which execution time needs to be captured
        long endTime = System.currentTimeMillis();
        return (endTime - startTime);
    }
}
Here's the sample output from 5 different executions of the same code:
Execution #1 (NOT OK)
Iteration 42970 has taken 1 millisecond(s).
Total non zero entries : 1
Execution #2 (OK)
Total non zero entries : 0
Execution #3 (OK)
Total non zero entries : 0
Execution #4 (NOT OK)
Iteration 65769 has taken -1 millisecond(s).
Total non zero entries : 1
Execution #5 (NOT OK)
Iteration 424 has taken 1 millisecond(s).
Iteration 33053 has taken 1 millisecond(s).
Iteration 76755 has taken -1 millisecond(s).
Total non zero entries : 3
I am looking for a Java based solution that ensures that all
iterations consume zero milliseconds consistently. I prefer to
accomplish this using pure Java code without using a profiler.
Note: I was also able to accomplish this through C code.
Your HashMap performance may be dropping if it is resizing. The default capacity is 16, which you are exceeding. If you know the expected capacity up front, create the HashMap with the appropriate size, taking into account the default load factor of 0.75.
If you rerun iterations without creating a new map, and the Integer keys do not start again from zero, you will need to size the map taking into account the total of all possible iterations.
int capacity = (int) ((100000/0.75)+1);
Map<Integer, Long> timeTakenMap = new HashMap<>(capacity);
As you are starting to learn here, writing microbenchmarks in Java is not as easy as one would first assume. Everybody gets bitten at some point, even the hardened performance experts who have been doing it for years.
A lot is going on within the JVM and the OS that skews the results, such as GC, HotSpot's on-the-fly optimisations, recompilations, clock corrections, thread contention/scheduling, memory contention and cache misses, to name just a few. And sadly these skews are not consistent, and they can very easily dominate a microbenchmark.
To answer your immediate question of why the timings can sometimes go negative: it is because currentTimeMillis is designed to capture wall clock time and not elapsed time. No wall clock on a computer is accurate, and there are times when the clock will be adjusted, very possibly backwards. More detail on Java's clocks can be read in the Oracle blog post Inside the Oracle Hotspot VM clocks.
Further details on, and support for, nanoTime versus currentTimeMillis can be read here.
Before continuing with your own benchmark, I strongly recommend that you read how to write a correct microbenchmark in Java. The quick synopsis is to 1) warm up the JVM before taking results, 2) jump through hoops to avoid dead code elimination, 3) ensure that nothing else is running on the same machine, but accept that there will be thread scheduling going on (you may even want to pin threads to cores, depending on how far you want to take this), and 4) use a framework specifically designed for microbenchmarking, such as JMH, or, for quick lightweight spikes, JUnitMosaic gives good results.
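To make that last point concrete, a minimal JMH benchmark looks roughly like this (the measured body is a placeholder for the code under test, and the class name is mine):

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
@State(Scope.Thread)
public class IterationBenchmark {

    @Benchmark
    public long measure() {
        // Execute the code whose execution time needs to be captured, and
        // return a result so that dead code elimination cannot remove it.
        return System.nanoTime();
    }
}

JMH then takes care of warmup iterations, forked JVMs and statistical aggregation for you.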
I'm not sure if I understand your question.
You're trying to execute a certain set of statements S, and expect the execution time to be zero. You then test this premise by executing it a number of times and verifying the result.
That is a strange expectation to have: anything consumes some time, possibly more than you expect. Hence, although the test could pass, that would not prove that no time has been used, since your program is save_time(); execute(S); compare_time(). Even if execute(S) does nothing, your timing is discrete, and as such it is possible that the 'tick' of your wall clock just happens to fall between save_time and compare_time, making some time visibly pass.
As such, I'd expect your C program to behave exactly the same. Have you run that multiple times? What happens when you increase the iterations to over a million? If it still does not occur, then apparently your C compiler has optimized the code in such a way that no time is measured, and apparently Java doesn't.
Or am I understanding you wrong?
You hint at it yourself: System.currentTimeMillis(); is the way to go in this case.
There is no guarantee, on any system, that incrementing an integer corresponds to a millisecond or to a fixed cycle time.
You should take System.currentTimeMillis() and calculate the elapsed time.
Example:
public static void main(String[] args) {
    long startTime = System.currentTimeMillis();
    doFoo();
    long elapsedTime = System.currentTimeMillis() - startTime;
    System.out.println("Time: " + elapsedTime);
}
I am also not sure I understand exactly: you are trying to execute certain code and measure the execution time of each iteration.
If I understand correctly, then I would suggest using System.nanoTime() instead of System.currentTimeMillis(), because if your block of statements is small enough you will always get zero milliseconds.
A simple example could be:
public static void main(String[] args) {
    long startTime = System.nanoTime();
    // do your stuff here
    long elapsedTime = System.nanoTime() - startTime;
    System.out.println("Time taken: " + elapsedTime);
}
System.nanoTime() and System.currentTimeMillis() are not much different; it is just a question of how accurate a result you need. With millisecond resolution you may get zero if the statements in each iteration do not take long enough.
I have updated this question (the previous version was not clear; if you want to refer to it, check out the revision history). The current answers so far do not work because I failed to explain my question clearly (sorry, second attempt).
Goal:
I am trying to take a set of numbers (positive or negative, so bounds are needed to limit the growth of specific variables) and find the linear combinations that can be used to reach a specific sum. For example, to get to a sum of 10 using [2,4,5] we get:
5*2 + 0*4 + 0*5 = 10
3*2 + 1*4 + 0*5 = 10
1*2 + 2*4 + 0*5 = 10
0*2 + 0*4 + 2*5 = 10
How can I create an algorithm that is scalable to a large number of variables and target sums? I can write the code on my own if an algorithm is given, but if there's a library available I'm fine with any library, though I prefer Java.
One idea would be to break out of the loop once you set T[z][i] to true, since you are basically only modifying T[z][i] here, and once it becomes true it will never be modified again.
for i = 1 to k:
    for z = 0 to sum:
        for j = z - x_i down to 0:
            if T[j][i-1]:
                T[z][i] = true
                break
EDIT 2: Additionally, if I am getting it right, T[z][i] depends on the array T[z-x_i..0][i-1], and T[z+1][i] depends on T[z+1-x_i..0][i-1]. So once you know whether T[z][i] is true, you only need to check one additional element (T[z+1-x_i][i-1]) to know whether T[z+1][i] will be true.
Say a variable changed holds whether that additional element is true. Then you can simply say that T[z][i] = changed || T[z-1][i]. So you should be done in two loops instead of three, which should make it much faster.
Now, to scale it: since T[z,i] depends only on T[z-1,i] and T[z-1-x_i,i-1], you do not need to wait until the whole (i-1)-th column is populated before populating T[z,i]. You can start working on T[z,i] as soon as the required values are populated. I can't implement it without knowing the details, but you can try this approach.
I take it this is something like unbounded knapsack? You can dispense with the loop over c entirely.
for i = 1 to k:
    for z = 0 to sum:
        T[z][i] = T[z][i - 1] or (z >= x_i cand T[z - x_i][i])
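A minimal Java sketch of that reachability table (variable names are mine; nonnegative values and coefficients assumed, as in the recurrence above):

// t[i][z] is true when some nonnegative combination of the first i values sums to z.
static boolean[][] reachable(int[] values, int targetSum) {
    int k = values.length;
    boolean[][] t = new boolean[k + 1][targetSum + 1];
    t[0][0] = true; // the empty combination sums to 0
    for (int i = 1; i <= k; i++) {
        for (int z = 0; z <= targetSum; z++) {
            // Either x_i is unused (t[i - 1][z]) or used at least once (t[i][z - x_i]).
            t[i][z] = t[i - 1][z]
                    || (z >= values[i - 1] && t[i][z - values[i - 1]]);
        }
    }
    return t;
}

For the question's example, reachable(new int[]{2, 4, 5}, 10)[3][10] is true, and the table can be walked backwards to enumerate the actual combinations.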
Based on the original example data you gave (a linear combination of terms) and your answer to my question in the comments section (there are bounds), would a brute force approach not work?
c0*x0 + c1*x1 + c2*x2 + ... + cn*xn = SUM
I'm guessing I'm missing something important but here it is anyway:
Brute Force Divide and Conquer:
the main controller generates coefficients for, say, half of the terms (or however many make sense)
it then sends each partial set of fixed coefficients to a work queue
a worker picks up a partial set of fixed coefficients and proceeds to brute-force its way through the remaining combinations
it doesn't use much memory at all, as it works sequentially on each valid set of coefficients
it could be optimized to ignore equivalent combinations, and probably in many other ways
Pseudocode for Multiprocessing
class Controller
    work_queue = Queue
    solution_queue = Queue
    solution_sets = []
    create x number of workers with access to work_queue and solution_queue

    # say, for 2000 terms:
    for partial_set in coefficient_generator(start_term=0, end_term=999):
        if worker_available():  # generate just in time
            push partial_set onto work_queue

    while solution_queue:
        add any solutions to solution_sets
        # there is an efficient way to do this type of polling but I forget

class Worker
    while true:  # actually stops when a stop-work token is received
        get partial_set from the work_queue
        for remaining_set in coefficient_generator(start_term=1000, end_term=1999):
            combine the two sets (partial_set.extend(remaining_set))
            if is_solution(full_set):
                push full_set onto the solution_queue
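For completeness, here is a plain-Java sketch of the sequential brute force a single worker would run. The coefficient bound and all names are my assumptions, and the pruning test is only valid for nonnegative values and target (the signed case needs a different bound):

import java.util.ArrayDeque;
import java.util.Deque;

public class CombinationSearch {

    // Prints every coefficient vector c with 0 <= c[i] <= bound
    // such that sum(c[i] * x[i]) == target.
    static void search(int[] x, int target, int bound, int i, int partial, Deque<Integer> chosen) {
        if (i == x.length) {
            if (partial == target) {
                System.out.println(chosen);
            }
            return;
        }
        for (int c = 0; c <= bound; c++) {
            int next = partial + c * x[i];
            if (next > target) {
                break; // valid prune only because all values are nonnegative
            }
            chosen.addLast(c);
            search(x, target, bound, i + 1, next, chosen);
            chosen.removeLast();
        }
    }

    public static void main(String[] args) {
        // The question's example: reach 10 from [2, 4, 5]
        search(new int[]{2, 4, 5}, 10, 5, 0, 0, new ArrayDeque<>());
    }
}

This prints the question's four solutions ([0, 0, 2], [1, 2, 0], [3, 1, 0], [5, 0, 0]); a controller would fix the first coefficients and hand the suffix search to workers exactly as in the pseudocode above.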