AnyLogic moving average of processing times - Java

In my model I have 9 different service blocks, and each service can produce 9 different features. Each combination has a different delay time and standard deviation. For example, feature 3 needs 5 minutes in service block 8 with a deviation of 0.05, but only needs 3 minutes with a deviation of 0.1 in service block 4.
How can I permanently track the last 5 needed times of each combination and calculate their average (like a moving average)? I want the products to use this average to decide which service block to choose for the respective feature, picking the shortest time when comparing the past times of all machines for that feature. The product agents already have a parameter for the time they enter the service and one that calculates the processing time by subtracting the entering time from the time leaving the service block.
Thank you for your support!

I am not sure I understand what you are asking, but this may be an answer:
To track the last 5 needed times you can use a dataset from the Analysis palette, limiting the number of samples to 5.
You update the dataset using dataset.add(yourTimeVariable); so you can leave the vertical axis value of the dataset empty.
I assume you would need one dataset per service block/feature combination.
Then you can calculate your moving average with:
dataset.getYMean();
If you need 81 datasets, you can create a collection as an ArrayList with element type DataSet.
On the Main properties, under "On startup", you can add the following code, which has the same effect:
for (int i = 0; i < 81; i++) {
    collection.add(new DataSet(5, new DataUpdater_xjal() {
        double _lastUpdateX = Double.NaN;

        @Override
        public void update(DataSet _d) {
            if (time() == _lastUpdateX) { return; }
            _d.add(time(), 0);
            _lastUpdateX = time();
        }

        @Override
        public double getDataXValue() {
            return time();
        }
    }));
}
You will only need to remember which index corresponds to which service block and feature combination; then you can just do
collection.get(4).getYMean();
and to add a new value to a dataset:
collection.get(2).add(yourTimeVariable);
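To tie the flat collection back to the 9 x 9 combinations, a simple index convention helps. Here is a minimal sketch, assuming service blocks and features are numbered 0-8; indexOf and processingTime are hypothetical names, not part of your model:

int indexOf(int serviceBlock, int feature) {
    // flat index into the 81-element collection
    return serviceBlock * 9 + feature;
}

// record a processing time for feature 3 on service block 8
collection.get(indexOf(8, 3)).add(processingTime);

// pick the service block with the shortest recent average for feature 3
int bestBlock = 0;
double bestMean = Double.MAX_VALUE;
for (int s = 0; s < 9; s++) {
    double mean = collection.get(indexOf(s, 3)).getYMean();
    if (mean < bestMean) {
        bestMean = mean;
        bestBlock = s;
    }
}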

Related

How to count speed based on steps

I have a step counter app in which I am running a service which counts steps taken and then sends a broadcast to the fragment which is then updated on the fragment. The step counting is working fine but I want to calculate speed based on the steps. Here is what I am trying right now.
The receiver to get step count:
receiver = new BroadcastReceiver() {
    @Override
    public void onReceive(Context context, Intent intent) {
        int steps = intent.getIntExtra(StepCounterService.STEP_INCREMENT_KEY, 0);
        if (firstStepTime.equals("0")) {
            firstStepTime = intent.getStringExtra(StepCounterService.TIME_STAMP_KEY);
        } else if (secondStepTime.equals("0")) {
            secondStepTime = intent.getStringExtra(StepCounterService.TIME_STAMP_KEY);
        } else {
            firstStepTime = secondStepTime;
            secondStepTime = intent.getStringExtra(StepCounterService.TIME_STAMP_KEY);
        }
        updateAllUI(steps);
    }
};
So what I am doing is: as soon as I start getting steps, I check whether the variable firstStepTime is empty. If it is, I save the time in firstStepTime. Then on the next step I check whether secondStepTime is empty, and if it is, I save that time in secondStepTime.
For every subsequent step, both of these are updated.
public void updateAllUI(int numberOfSteps) {
    if (!(firstStepTime.equals("0")) && !(secondStepTime.equals("0"))) {
        try {
            SimpleDateFormat timeFormat = new SimpleDateFormat("HH:mm:ss.SSS");
            timeDifference = timeFormat.parse(secondStepTime).getTime()
                    - timeFormat.parse(firstStepTime).getTime();
            // distance per step (m) / step interval (s), converted to km/h
            speed = (float) ((0.0254 / (timeDifference * 0.001)) * 3.6);
        } catch (Exception e) {
            timeDifference = 0;
            speed = 0;
        }
        textview.setText(speed + " km/h");
    }
}
So here I just check that both are not empty; then I take the values and calculate the difference between the times. The problem is that sometimes it doesn't calculate the speed properly. A bigger problem is that if the user stops, the speed remains constant and doesn't drop to zero.
Is there a better way to do the speed calculation?
As you're only counting steps and would like to calculate speed from the steps taken alone, I am assuming that you either don't want to use GPS for the speed, or want a fallback speed counter for when GPS is unavailable.
You can schedule a periodic check with a Handler:
int interval = 1000; // interval in milliseconds at which to check steps
handler.postDelayed(speedChecker, interval);
Here handler is a Handler instance and speedChecker is a Runnable wrapping the check below; it must re-post itself so the check runs every interval.
public void mySpeedChecker() {
    // Assuming the number of steps taken is stored in a global variable,
    // keep a second global variable holding the previous count so this
    // function can compute the number of steps taken during the interval.
    // If steps were taken in the interval, the speed is non-zero and you
    // know how many steps were taken; if no steps were taken, the speed is zero.
    // This also gives you the number of steps taken in the last second,
    // which you can use to calculate and update the speed.
    // Increase the interval for higher accuracy at a lower update frequency.
}
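A minimal sketch of that idea as a self-re-posting Runnable (using android.os.Handler; the field names, the stride length and the updateSpeedUI hook are assumptions, not from your code):

private final Handler handler = new Handler();
private int stepCount = 0;       // updated by the step counter broadcast
private int lastStepCount = 0;   // snapshot from the previous check
private static final double STRIDE_METERS = 0.7; // assumed stride length
private static final long INTERVAL_MS = 1000;

private final Runnable speedChecker = new Runnable() {
    @Override
    public void run() {
        int stepsInInterval = stepCount - lastStepCount;
        lastStepCount = stepCount;
        // steps * stride / seconds gives m/s; * 3.6 converts to km/h
        double speedKmh = (stepsInInterval * STRIDE_METERS) / (INTERVAL_MS / 1000.0) * 3.6;
        updateSpeedUI(speedKmh); // hypothetical UI hook; zero steps yields zero speed
        handler.postDelayed(this, INTERVAL_MS); // re-post for the next interval
    }
};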
Also, it's a good idea to calculate the user's stride length when GPS is available: whenever the user has a GPS fix, the stride length can be estimated as the distance covered divided by the number of steps taken. This will help your system convert a step count into an accurate distance.
The usual way to calculate speed would ideally be Location.getSpeed(), or some analysis of accelerometer values, but I'm assuming you want it based on counted steps.
To solve one of the problems, "if the user stops, the speed remains constant and doesn't drop to zero":
You can use the Android Activity Recognition API to see the user's current activity. Use
ActivityRecognitionResult.getMostProbableActivity()
If the DetectedActivity is of type STILL, you can set your speed to zero.
You can also check the confidence with DetectedActivity.getConfidence() to be sure.
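A rough sketch of that check inside an intent handler, assuming you have already subscribed to activity recognition updates (classes are from com.google.android.gms.location; the confidence threshold of 75 is an arbitrary choice):

if (ActivityRecognitionResult.hasResult(intent)) {
    DetectedActivity activity =
            ActivityRecognitionResult.extractResult(intent).getMostProbableActivity();
    if (activity.getType() == DetectedActivity.STILL && activity.getConfidence() > 75) {
        speed = 0; // the user is standing still, so force the displayed speed to zero
    }
}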
For the other problem, when you say it doesn't count speed properly, could you elaborate more on that?
All information coming from sensors carries some error, and data can even be lost.
You should store the received values in a queue or database, including the timestamp.
Periodically, you can pick up the data and analyze it, including filtering, to get more accurate information.
Hope it helps.

Track multiple moving averages with Apache Commons Math DescriptiveStatistics

I am using DescriptiveStatistics to track the moving average of some metrics. I have a thread that submits the metric value every minute, and I track the 10 minute moving average of the metric by using the setWindowSize(10) method on DescriptiveStatistics.
This works fine for tracking a single moving average but I actually need to track multiple moving averages, i.e. the 1 minute average, the 5 minute average, and the 10 minute average.
Currently I have the following options:
Have 3 different DescriptiveStatistics instances with 3 different windows. However, this means we store the raw metrics multiple times, which is not ideal.
Have 1 instance of DescriptiveStatistics and do something like the following when querying for a moving average:
int minutes = <set from parameter>;
DescriptiveStatistics stats = <class variable>;
if (minutes == stats.getN()) return stats.getMean();

SummaryStatistics subsetStats = new SummaryStatistics();
for (int i = 0; i < minutes; i++) {
    subsetStats.addValue(stats.getElement((int) stats.getN() - i - 1));
}
return subsetStats.getMean();
However, option 2 means that I have to re-compute a bunch of averages every time I query for a moving average whose window is smaller than the DescriptiveStats window size.
Is there a way to do this better? I want to store 1 copy of the metrics data and continually calculate N moving averages of it with different intervals. This might be getting into the land of Codahale Metrics or Netflix Servo, but I don't want to have to use a heavyweight library just for this.
You could use the StatUtils utility class and manage the array yourself when adding new values. One alternative is to use the CircularFifoQueue from Apache Commons Collections with a size of 10, and ArrayUtils from Apache Commons Lang to simplify the conversion to an array of primitive values.
You can find an example of StatUtils in the User Guide; the following would be something similar to your use case.
CircularFifoQueue<Double> queue = new CircularFifoQueue<>(10);
// add your values with queue.add(value); the oldest entry is evicted first
double[] values = ArrayUtils.toPrimitive(queue.toArray(new Double[0]));
// the queue iterates from oldest to newest, so the most recent k values
// sit at the end of the array (assuming the queue is full)
double mean1 = StatUtils.mean(values, values.length - 1, 1);
double mean5 = StatUtils.mean(values, values.length - 5, 5);
double mean10 = StatUtils.mean(values, 0, 10);
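Putting it together, a small wrapper along these lines keeps one copy of the data and serves all the windows. This is a sketch; the class and method names are made up:

import org.apache.commons.collections4.queue.CircularFifoQueue;
import org.apache.commons.lang3.ArrayUtils;
import org.apache.commons.math3.stat.StatUtils;

class MultiWindowAverage {
    private final CircularFifoQueue<Double> queue = new CircularFifoQueue<>(10);

    void submit(double value) {
        queue.add(value); // evicts the oldest value once 10 are stored
    }

    double mean(int window) {
        double[] values = ArrayUtils.toPrimitive(queue.toArray(new Double[0]));
        int length = Math.min(window, values.length);
        // the most recent values are at the tail of the iteration order
        return StatUtils.mean(values, values.length - length, length);
    }
}

Submitting a value each minute and calling mean(1), mean(5), or mean(10) then gives all three moving averages from a single stored window.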

Java: Issue with capturing execution time per iteration in a Map

I have a requirement to capture the execution time of some code over iterations. I've decided to use a Map<Integer, Long> for capturing this data, where the Integer (key) is the iteration number and the Long (value) is the time consumed by that iteration in milliseconds.
I've written the Java code below to compute the time taken by each iteration. I want to ensure that the time taken by all iterations is zero before invoking the actual code. Surprisingly, the code behaves differently on every execution.
Sometimes I get the desired output (zero milliseconds for all iterations), but at times I get positive and even negative values for some random iterations.
I've tried replacing System.currentTimeMillis(); with below code:
new java.util.Date().getTime();
System.nanoTime();
org.apache.commons.lang.time.StopWatch
but still no luck.
Any suggestions as why some iterations take additional time and how to eliminate it?
package com.stackoverflow.programmer;

import java.util.HashMap;
import java.util.Map;

public class TestTimeConsumption {

    public static void main(String[] args) {
        Integer totalIterations = 100000;
        Integer nonZeroMilliSecondsCounter = 0;
        Map<Integer, Long> timeTakenMap = new HashMap<>();
        for (Integer iteration = 1; iteration <= totalIterations; iteration++) {
            timeTakenMap.put(iteration, getTimeConsumed(iteration));
            if (timeTakenMap.get(iteration) != 0) {
                nonZeroMilliSecondsCounter++;
                System.out.format("Iteration %6d has taken %d millisecond(s).\n", iteration,
                        timeTakenMap.get(iteration));
            }
        }
        System.out.format("Total non zero entries : %d", nonZeroMilliSecondsCounter);
    }

    private static Long getTimeConsumed(Integer iteration) {
        long startTime = System.currentTimeMillis();
        // Execute code for which execution time needs to be captured
        long endTime = System.currentTimeMillis();
        return (endTime - startTime);
    }
}
Here's the sample output from 5 different executions of the same code:
Execution #1 (NOT OK)
Iteration 42970 has taken 1 millisecond(s).
Total non zero entries : 1
Execution #2 (OK)
Total non zero entries : 0
Execution #3 (OK)
Total non zero entries : 0
Execution #4 (NOT OK)
Iteration 65769 has taken -1 millisecond(s).
Total non zero entries : 1
Execution #5 (NOT OK)
Iteration 424 has taken 1 millisecond(s).
Iteration 33053 has taken 1 millisecond(s).
Iteration 76755 has taken -1 millisecond(s).
Total non zero entries : 3
I am looking for a Java based solution that ensures that all
iterations consume zero milliseconds consistently. I prefer to
accomplish this using pure Java code without using a profiler.
Note: I was also able to accomplish this through C code.
Your HashMap performance may be dropping when it resizes. The default capacity is 16, which you are exceeding. If you know the expected capacity up front, create the HashMap with the appropriate size, taking into account the default load factor of 0.75.
If you rerun iterations without creating a new map, and the Integer keys do not start again from zero, you will need to size the map for the total of all possible iterations.
int capacity = (int) ((100000/0.75)+1);
Map<Integer, Long> timeTakenMap = new HashMap<>(capacity);
As you are starting to learn here, writing microbenchmarks in Java is not as easy as one would first assume. Everybody gets bitten at some point, even the hardened performance experts who have been doing it for years.
A lot is going on within the JVM and the OS that skews the results, such as GC, HotSpot on-the-fly optimisations, recompilations, clock corrections, thread contention/scheduling, memory contention and cache misses, to name just a few. And sadly these skews are not consistent, and they can very easily dominate a microbenchmark.
To answer your immediate question of why the timings can sometimes go negative: currentTimeMillis is designed to capture wall-clock time, not elapsed time. No wall clock on a computer is accurate, and there are times when the clock will be adjusted, very possibly backwards. More detail on Java's clocks can be found in the Oracle blog post Inside the Oracle Hotspot VM clocks.
Further details and support for nanoTime versus currentTimeMillis can be read here.
Before continuing with your own benchmark, I strongly recommend that you read how to write a correct microbenchmark in Java. The quick synopsis: 1) warm up the JVM before taking results, 2) jump through hoops to avoid dead-code elimination, 3) ensure that nothing else is running on the same machine, but accept that there will be thread scheduling going on (you may even want to pin threads to cores, depending on how far you want to take this), and 4) use a framework specifically designed for microbenchmarking, such as JMH, or for quick lightweight spikes, JUnitMosaic gives good results.
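For illustration, a minimal JMH benchmark looks roughly like this; it is a sketch, and the measured loop is just a stand-in for your code:

import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

public class IterationBenchmark {

    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public long measureWork() {
        // JMH handles JVM warmup and consumes the return value,
        // which defeats dead-code elimination.
        long sum = 0;
        for (int i = 0; i < 100; i++) {
            sum += i;
        }
        return sum;
    }
}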
I'm not sure I understand your question.
You're trying to execute a certain set of statements S and expect the execution time to be zero. You then test this premise by executing it a number of times and verifying the result.
That is a strange expectation to have: anything consumes some time, possibly a lot. Hence, although the test could succeed, that would not prove that no time has been used, since your program is save_time(); execute(S); compare_time(). Even if execute(S) is nothing, your timing is discrete, and as such it is possible that the 'tick' of your wall clock happens to fall right between save_time and compare_time, making some time visibly pass.
As such, I'd expect your C program to behave exactly the same way. Have you run it multiple times? What happens when you increase the iterations into the millions? If it still does not occur, then apparently your C compiler has optimized the code in such a way that no time is measured, and apparently Java doesn't.
Or am I understanding you wrong?
You hint at it yourself: System.currentTimeMillis() is the way to go in this case.
There is no guarantee on any system that incrementing the Integer object i corresponds to either a millisecond or a cycle time.
You should take System.currentTimeMillis() before and after the work and calculate the elapsed time.
Example:
public static void main(String[] args) {
    long start = System.currentTimeMillis();
    doFoo();
    long elapsed = System.currentTimeMillis() - start;
    System.out.println("Time: " + elapsed);
}
I'm also not sure I understand exactly: you're trying to execute certain code and get the execution time for each iteration.
If I understand correctly, I would suggest using System.nanoTime() instead of System.currentTimeMillis(), because if your block of statements is small enough you will always get zero milliseconds.
A simple example:
public static void main(String[] args) {
    long start = System.nanoTime();
    // do your stuff here
    long elapsed = System.nanoTime() - start;
    System.out.println("Time taken: " + elapsed);
}
Otherwise there is not much difference between System.nanoTime() and System.currentTimeMillis(); it is just a question of how accurate a result you need. With millisecond resolution you may get zero whenever the set of statements in an iteration does not do much.

Maximum occurrence of any event in time range

I have a collection of time stamps, e.g. 10:18:07.490,11:50:18.251, where the first is the start time and the second is the end time of an event. I need to find the range within a 24-hour day where the most events are happening. The events have millisecond precision.
What I am doing is dividing the 24 hours on a millisecond scale, attaching events to every millisecond, and then finding the range where the most events are happening.
LocalTime start = LocalTime.parse("00:00");
LocalTime end = LocalTime.parse("23:59");
for (LocalTime x = start; x.isBefore(end); x = x.plus(Duration.ofMillis(1))) {
    for (int i = 0; i < startTime.size(); i++) {
        // an event covers x when it starts before x and ends after x
        if (startTime.get(i).isBefore(x) && endTime.get(i).isAfter(x)) {
            // add them to the list
        }
    }
}
Certainly this is not a good approach; it takes too much memory. How can I do it properly? Any suggestions?
A solution finding the first period of maximum concurrent events:
If you're willing to use a third-party library, this can be implemented "relatively easily" in a SQL style with jOOλ's window functions. The idea is the same as the one explained in amit's answer:
System.out.println(
Seq.of(tuple(LocalTime.parse("10:18:07.490"), LocalTime.parse("11:50:18.251")),
tuple(LocalTime.parse("09:37:03.100"), LocalTime.parse("16:57:13.938")),
tuple(LocalTime.parse("08:15:11.201"), LocalTime.parse("10:33:17.019")),
tuple(LocalTime.parse("10:37:03.100"), LocalTime.parse("11:00:15.123")),
tuple(LocalTime.parse("11:20:55.037"), LocalTime.parse("14:37:25.188")),
tuple(LocalTime.parse("12:15:00.000"), LocalTime.parse("14:13:11.456")))
.flatMap(t -> Seq.of(tuple(t.v1, 1), tuple(t.v2, -1)))
.sorted(Comparator.comparing(t -> t.v1))
.window(Long.MIN_VALUE, 0)
.map(w -> tuple(
w.value().v1,
w.lead().map(t -> t.v1).orElse(null),
w.sum(t -> t.v2).orElse(0)))
.maxBy(t -> t.v3)
);
The above prints:
Optional[(10:18:07.490, 10:33:17.019, 3)]
So, during the period between 10:18... and 10:33..., there had been 3 events, which is the most number of events that overlap at any time during the day.
Finding all periods of maximum concurrent events:
Note that there are several periods when there are 3 concurrent events in the sample data. maxBy() returns only the first such period. In order to return all such periods, use maxAllBy() instead (added to jOOλ 0.9.11):
.maxAllBy(t -> t.v3)
.toList()
Yielding then:
[(10:18:07.490, 10:33:17.019, 3),
(10:37:03.100, 11:00:15.123, 3),
(11:20:55.037, 11:50:18.251, 3),
(12:15 , 14:13:11.456, 3)]
Or, a graphical representation
3 /-----\ /-----\ /-----\ /-----\
2 /-----/ \-----/ \-----/ \-----/ \-----\
1 -----/ \-----\
0 \--
08:15 09:37 10:18 10:33 10:37 11:00 11:20 11:50 12:15 14:13 14:37 16:57
Explanations:
Here's the original solution again with comments:
// This is your input data
Seq.of(tuple(LocalTime.parse("10:18:07.490"), LocalTime.parse("11:50:18.251")),
tuple(LocalTime.parse("09:37:03.100"), LocalTime.parse("16:57:13.938")),
tuple(LocalTime.parse("08:15:11.201"), LocalTime.parse("10:33:17.019")),
tuple(LocalTime.parse("10:37:03.100"), LocalTime.parse("11:00:15.123")),
tuple(LocalTime.parse("11:20:55.037"), LocalTime.parse("14:37:25.188")),
tuple(LocalTime.parse("12:15:00.000"), LocalTime.parse("14:13:11.456")))
// Flatten "start" and "end" times into a single sequence, with start times being
// accompanied by a "+1" event, and end times by a "-1" event, which can then be summed
.flatMap(t -> Seq.of(tuple(t.v1, 1), tuple(t.v2, -1)))
// Sort the "start" and "end" times according to the time
.sorted(Comparator.comparing(t -> t.v1))
// Create a "window" between the first time and the current time in the sequence
.window(Long.MIN_VALUE, 0)
// Map each time value to a tuple containing
// (1) the time value itself
// (2) the subsequent time value (lead)
// (3) the "running total" of the +1 / -1 values
.map(w -> tuple(
w.value().v1,
w.lead().map(t -> t.v1).orElse(null),
w.sum(t -> t.v2).orElse(0)))
// Now, find the tuple that has the maximum "running total" value
.maxBy(t -> t.v3)
I have written up more about window functions and how to implement them in Java in this blog post.
(disclaimer: I work for the company behind jOOλ)
It can be done significantly better in terms of memory (well, assuming O(n) is considered good for you, and you don't regard 24*60*60*1000 as a tolerable constant):
Create a list of items [time, type] (where time is the time, and type is either start or end).
Sort the list by time.
Iterate the list; when you see a "start", increment a counter, and when you see an "end", decrement it.
By storing the maximum seen so far, you can easily identify the single point where the maximal number of events occur.
If you want the interval containing this point, simply find the time where the first maximum occurs and where it ends, which is the next [time, type] pair. If you allow a start and an end at the same instant to coincide without counting as overlap, just scan linearly from this point until the counter decreases and the time has moved. This scan is done only once and does not change the total complexity of the algorithm, and it makes it easy to recover the whole interval rather than just the point.
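A plain-Java sketch of that sweep line, reusing the question's startTime/endTime lists of LocalTime values:

import java.time.LocalTime;
import java.util.ArrayList;
import java.util.List;

class Sweep {
    // returns the moment at which the maximum number of events overlap
    static LocalTime busiestMoment(List<LocalTime> startTime, List<LocalTime> endTime) {
        List<long[]> points = new ArrayList<>(); // {nanoOfDay, +1 for start / -1 for end}
        for (LocalTime t : startTime) points.add(new long[] { t.toNanoOfDay(), 1 });
        for (LocalTime t : endTime) points.add(new long[] { t.toNanoOfDay(), -1 });
        // sort by time; at equal times, process ends before starts so that
        // touching intervals do not count as overlapping
        points.sort((a, b) -> a[0] != b[0] ? Long.compare(a[0], b[0])
                                           : Long.compare(a[1], b[1]));
        int counter = 0, max = 0;
        long bestNano = 0;
        for (long[] p : points) {
            counter += p[1];
            if (counter > max) {
                max = counter;
                bestNano = p[0];
            }
        }
        return LocalTime.ofNanoOfDay(bestNano);
    }
}

Sorting dominates, so this runs in O(n log n) time with O(n) memory, instead of one pass per millisecond of the day.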

What determines the number of reducers and how to avoid bottlenecks regarding reducers?

Suppose I have a big tsv file with this kind of information:
2012-09-22 00:00:01.0 249342258346881024 47268866 0 0 0 bo
2012-09-22 00:00:02.0 249342260934746115 1344951 0 0 4 ot
2012-09-22 00:00:02.0 249342261098336257 346095334 1 0 0 ot
2012-09-22 00:05:02.0 249342261500977152 254785340 0 1 0 ot
I want to implement a MapReduce job that enumerates time intervals of five minutes and filter some information of the tsv inputs. The output file would look like this:
0 47268866 bo
0 134495 ot
0 346095334 ot
1 254785340 ot
The key is the number of the interval; e.g., 0 refers to the interval from 2012-09-22 00:00:00.0 to 2012-09-22 00:04:59.
I don't know whether this problem doesn't fit the MapReduce approach or whether I'm not thinking about it the right way. In the map function I'm just passing the timestamp as key and the filtered information as value. In the reduce function I count the intervals using global variables and produce the output mentioned above.
i. Does the framework determine the number of reducers automatically, or is it user defined? With one reducer I think there is no problem with my approach, but I'm wondering if a single reducer can become a bottleneck when dealing with really large files. Can it?
ii. How can I solve this problem with multiple reducers?
Any suggestions would be really appreciated!
Thanks in advance!
EDIT:
The first question was answered by @Olaf, but the second still leaves me with some doubts about parallelism. The output of my map function is currently this (I'm just passing the timestamp with minute precision):
2012-09-22 00:00 47268866 bo
2012-09-22 00:00 344951 ot
2012-09-22 00:00 346095334 ot
2012-09-22 00:05 254785340 ot
So in the reduce function I receive inputs where the key represents the minute when the information was collected and the values are the information itself, and I want to enumerate five-minute intervals beginning at 0. I'm currently using a global variable to store the beginning of the current interval, and when a key goes past it I increment the interval counter (also a global variable).
Here is the code:
private long stepRange = TimeUnit.MINUTES.toMillis(5);
private long stepInitialMillis = 0;
private int stepCounter = 0;

@Override
public void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
    long millis = Long.valueOf(key.toString());
    if (stepInitialMillis == 0) {
        stepInitialMillis = millis;
    } else {
        if (millis - stepInitialMillis > stepRange) {
            stepCounter = stepCounter + 1;
            stepInitialMillis = millis;
        }
    }
    for (Text value : values) {
        context.write(new Text(String.valueOf(stepCounter)),
                new Text(key.toString() + "\t" + value));
    }
}
So, with multiple reducers, my reduce function will run on two or more nodes, in two or more JVMs; I will lose the control the global variables give me, and I can't think of a workaround for my case.
The number of reducers depends on the configuration of the cluster, although you can limit the number of reducers used by your MapReduce job.
A single reducer would indeed become a bottleneck in your MapReduce job if you are dealing with any significant amount of data.
The Hadoop MapReduce engine guarantees that all values associated with the same key are sent to the same reducer, so your approach should work with multiple reducers. See the Yahoo! tutorial for details: http://developer.yahoo.com/hadoop/tutorial/module4.html#listreducing
EDIT: To guarantee that all values for the same time interval go to the same reducer, you would have to use some unique identifier of the time interval as the key, and you would have to do that in the mapper. Reading your question again: unless you want to somehow aggregate the data between the records corresponding to the same time interval, you don't need any reducer at all.
EDIT: As @SeanOwen pointed out, the number of reducers depends on the configuration of the cluster. Usually it is configured to between 0.95 and 1.75 times the number of maximum tasks per node times the number of data nodes. If the mapred.reduce.tasks value is not set in the cluster configuration, the default number of reducers is 1.
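For completeness, the per-job override in the newer MapReduce API looks like this (a sketch assuming a standard Job setup):

// import org.apache.hadoop.conf.Configuration;
// import org.apache.hadoop.mapreduce.Job;
Job job = Job.getInstance(new Configuration(), "five-minute-intervals");
job.setNumReduceTasks(10); // overrides the cluster default for this job only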
It looks like you want to aggregate some data by five-minute blocks, and MapReduce with Hadoop works great for that sort of thing! There should be no reason to use any "global variables". Here is how I would set it up:
The mapper reads one line of the TSV. It grabs the timestamp and computes which five-minute bucket it belongs in. Make that into a string and emit it as the key, something like "20120922:0000", "20120922:0005", "20120922:0010", etc. As for the value that is emitted along with that key, just keep it simple to start with: send the whole tab-delimited line as another Text object.
Now that the mapper has determined how the data needs to be organized, it is the reducer's job to do the aggregation. Each reducer will get a key (one of the five-minute buckets) along with the list of all the lines that fit into that bucket. It can iterate over that list and extract whatever it wants from it, writing output to the context as needed.
As for mappers, just let Hadoop figure that part out. Set the number of reducers to the number of nodes you have in the cluster, as a starting point; a sketch of such a mapper follows below. It should run just fine.
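A sketch of that mapper, assuming tab-separated input with the timestamp in the first column (the date format is inferred from the sample data, and a numeric bucket id is used instead of the formatted string key; either works):

import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class FiveMinuteBucketMapper extends Mapper<LongWritable, Text, Text, Text> {

    private final SimpleDateFormat parser = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.S");

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String[] fields = line.toString().split("\t");
        try {
            Date ts = parser.parse(fields[0]);
            long bucket = ts.getTime() / (5 * 60 * 1000); // five-minute bucket id
            // all records from the same interval meet at the same reducer
            context.write(new Text(String.valueOf(bucket)), line);
        } catch (ParseException e) {
            // skip malformed lines
        }
    }
}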
Hope this helps.
