Number of visitors per hour - java

I have a server log formatted like this:
128.33.100.1 2011-03-03 15:25 test.html
I need to extract several things from it but I am mostly stuck on how to get the total number of visits per hour as well as the number of unique visitors per page. Any pointers would be appreciated.

Assuming your log file contains only lines like this, here is how I think you should go about it. (This assumes no database is involved.)
Create a class that represents each line (a model) with fields for IP, date, time and file.
You can add a method on this model that returns a Java timestamp built from the date and time.
Then create a HashMap that stores file names as keys and lists of objects of the above class as values.
Start reading the file one line at a time.
For each line:
a. Use StringTokenizer to get the IP, date, time and file as tokens.
b. Populate an object of the above class.
c. Append this object to the list matching the file name in the hash map (create a new list if one doesn't exist).
Now you have all the data in a usable data structure.
To get the number of unique visitors for each page:
1. Retrieve the list for the file name from the hash map, then count the distinct IP addresses in it. You can do this with a simple loop or with the Java Collections classes (for example, by adding each IP to a Set).
To get the number of visits per hour for each page:
1. Again retrieve the list as above, and find the minimum and maximum timestamps.
2. Work out the time span in hours, then divide the total number of entries in the list by that number of hours.
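The steps above could be sketched roughly as follows. This is only a minimal illustration, not a definitive implementation: the class names LogEntry and PageStats are made up, java.time.LocalDateTime stands in for the "Java timestamp", and the per-hour figure is an average over the span between the first and last visit (treated as at least one hour).

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringTokenizer;

// Hypothetical model class: one object per log line.
final class LogEntry {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    final String ip;
    final LocalDateTime timestamp;
    final String file;

    LogEntry(String ip, LocalDateTime timestamp, String file) {
        this.ip = ip;
        this.timestamp = timestamp;
        this.file = file;
    }

    // Parses a line such as "128.33.100.1 2011-03-03 15:25 test.html".
    static LogEntry parse(String line) {
        StringTokenizer tokens = new StringTokenizer(line);
        String ip = tokens.nextToken();
        String date = tokens.nextToken();
        String time = tokens.nextToken();
        String file = tokens.nextToken();
        return new LogEntry(ip, LocalDateTime.parse(date + " " + time, FORMAT), file);
    }
}

// Hypothetical helper holding the grouping and counting steps.
final class PageStats {
    // File name -> all entries for that file; the list is created on first use.
    static Map<String, List<LogEntry>> groupByPage(List<String> lines) {
        Map<String, List<LogEntry>> byPage = new HashMap<>();
        for (String line : lines) {
            LogEntry entry = LogEntry.parse(line);
            byPage.computeIfAbsent(entry.file, k -> new ArrayList<>()).add(entry);
        }
        return byPage;
    }

    // Counts distinct IP addresses by collecting them in a Set.
    static int uniqueVisitors(List<LogEntry> entries) {
        Set<String> ips = new HashSet<>();
        for (LogEntry entry : entries) {
            ips.add(entry.ip);
        }
        return ips.size();
    }

    // Entries divided by the hours between the earliest and latest timestamp.
    // Assumes a non-empty list; spans shorter than one hour count as one hour.
    static double averageVisitsPerHour(List<LogEntry> entries) {
        LocalDateTime min = entries.get(0).timestamp;
        LocalDateTime max = min;
        for (LogEntry entry : entries) {
            if (entry.timestamp.isBefore(min)) min = entry.timestamp;
            if (entry.timestamp.isAfter(max)) max = entry.timestamp;
        }
        double hours = Math.max(1.0, Duration.between(min, max).toMinutes() / 60.0);
        return entries.size() / hours;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "128.33.100.1 2011-03-03 15:25 test.html",
                "128.33.100.2 2011-03-03 15:40 test.html");
        List<LogEntry> entries = groupByPage(lines).get("test.html");
        System.out.println(uniqueVisitors(entries) + " unique visitors");
    }
}
```

In a real program the lines would come from a BufferedReader over the log file rather than a hard-coded list.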
Hope that helps.

If you are splitting each line into an array, I would suggest taking the hour out of the 3rd element and checking each following line the same way: from the first time you see hour 15 until the first time you see hour 16, use a counter to store the number of hits in that hour.
Splitting a String can be done like this:
String[] temp;
String str = "firstElement secondElement thirdElement";
String delimiter = " ";
temp = str.split(delimiter); // temp is now filled with three elements.
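Applied to the log format from the question, the hour counter could look something like this. It is a rough sketch that assumes the lines are in chronological order; the class name HourRuns and the "date hour -> count" output format are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: counts hits per hour by walking the log in order.
final class HourRuns {
    // Returns one "yyyy-MM-dd HH -> count" string per run of lines that share
    // the same date and hour. Assumes the log is sorted by time.
    static List<String> hitsPerHour(List<String> lines) {
        List<String> result = new ArrayList<>();
        String currentHour = null;
        int counter = 0;
        for (String line : lines) {
            String[] temp = line.trim().split("\\s+");
            // temp[1] is the date, temp[2] is the time; keep the date so that
            // 15:xx on two different days is not merged into one hour.
            String hour = temp[1] + " " + temp[2].substring(0, temp[2].indexOf(':'));
            if (!hour.equals(currentHour)) {
                if (currentHour != null) {
                    result.add(currentHour + " -> " + counter);
                }
                currentHour = hour;
                counter = 0;
            }
            counter++;
        }
        if (currentHour != null) {
            result.add(currentHour + " -> " + counter);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "128.33.100.1 2011-03-03 15:25 test.html",
                "128.33.100.2 2011-03-03 15:40 test.html",
                "128.33.100.1 2011-03-03 16:05 test.html");
        for (String row : hitsPerHour(lines)) {
            System.out.println(row);
        }
    }
}
```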
As for the unique visitors per page, take the 1st element of the split array (the IP) and put it in a HashMap with the IP as the key and the page visited as the value. For every IP that comes in, check the HashMap; if it's not in there, insert it. By the end you will have a HashMap filled with unique IPs. For per-page counts, keep one such map for each page.
Hope that gives you some help.

Convert the log entries to java.util.Calendar objects and then do your calculations per unique IP address.
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

public class Visit
{
    public static void main(String[] args) throws Exception
    {
        String[] stats = "128.33.100.1 2011-03-03 15:25 test.html".split("\\s+");
        System.out.println("IP Address: " + stats[0]);
        // HH is the 24-hour field; hh is the 12-hour clock and does not fit "15:25"
        SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd HH:mm");
        Date date = formatter.parse(stats[1] + " " + stats[2]);
        Calendar cal = Calendar.getInstance();
        cal.setTime(date);
        // Calendar.MONTH is zero-based, so add 1 for display
        System.out.println("On Date: " + cal.get(Calendar.DATE) + "/" + (cal.get(Calendar.MONTH) + 1) + "/" + cal.get(Calendar.YEAR));
        System.out.println("At time: " + cal.get(Calendar.HOUR_OF_DAY) + ":" + cal.get(Calendar.MINUTE));
        System.out.println("Visited page: " + stats[3]);
        /*
         * You have the Calendar object; now perform your maths
         */
    }
}

While parsing the line from the log do:
Allocate a Dictionary (this should be done just once at the start of the program)
Extract the date time part
Convert it to DateTime object (.NET) or similar object for your programming language
Set the minutes and seconds of the date time object to 0
Put the date time in Dictionary object if it doesn't already exist.
Increment the value in the dictionary item where Date time is your current parsed date time
In the end this dictionary will have hourly hits
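In Java, the dictionary approach above might be sketched like this. A TreeMap plays the role of the Dictionary and LocalDateTime the role of DateTime; the class name HourlyHits is made up, and the log format is the one from the question.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: hour (minutes and seconds zeroed) -> number of hits.
final class HourlyHits {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    static Map<LocalDateTime, Integer> count(List<String> lines) {
        // Allocated once; TreeMap keeps the hours in chronological order.
        Map<LocalDateTime, Integer> hits = new TreeMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            LocalDateTime hour = LocalDateTime
                    .parse(parts[1] + " " + parts[2], FORMAT)
                    .truncatedTo(ChronoUnit.HOURS); // drop minutes and seconds
            // Inserts 1 for a new hour, otherwise increments the stored count.
            hits.merge(hour, 1, Integer::sum);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "128.33.100.1 2011-03-03 15:25 test.html",
                "128.33.100.2 2011-03-03 15:40 test.html",
                "128.33.100.1 2011-03-03 16:05 test.html");
        count(lines).forEach((hour, n) -> System.out.println(hour + " " + n));
    }
}
```

Unlike a running counter, this does not depend on the log being sorted, since each line goes straight to its own hour bucket.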

Related

JavaScript formatting time 24hours format to 12hours

I was having some trouble formatting times from 24-hour format to 12-hour format. Here are some examples of my times in string format:
0:00, 9:00, 12:00, 15:00
I wonder how I should substr the hour in JavaScript, because some times have a one-digit hour and some have two. The output should be in 12-hour format, like:
12:00AM, 9:00AM, 12:00PM, 3:00PM
Any guides? Thanks in advance.
In comments you clarified that each string you process will have only a single time in it (i.e., you are not processing a single string with four comma-separated times in it). So essentially you have input as follows:
var input = "9:00";
The easiest way to extract the hour and minute is using the String .split() method. This splits up the string at a specified character - in your case you'd use ":" - and returns an array with the pieces:
var parts = input.split(":"),
hour = parts[0],
minute = parts[1];
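Once the hour and minute are separated, the rest is plain arithmetic. The sketch below is written in Java only to match the other examples on this page (the class and method names are invented); the same split, modulo and suffix steps carry over to JavaScript almost line for line.

```java
// Hypothetical helper: converts "H:mm" or "HH:mm" (24-hour) to 12-hour text.
final class TwelveHour {
    static String to12Hour(String input) {
        String[] parts = input.split(":");
        int hour = Integer.parseInt(parts[0]);
        String minute = parts[1];
        // Hours 0-11 are AM, 12-23 are PM.
        String suffix = hour < 12 ? "AM" : "PM";
        // 0 and 12 both display as 12; everything else is hour mod 12.
        int hour12 = (hour % 12 == 0) ? 12 : hour % 12;
        return hour12 + ":" + minute + suffix;
    }

    public static void main(String[] args) {
        for (String time : new String[] {"0:00", "9:00", "12:00", "15:00"}) {
            System.out.println(to12Hour(time));
        }
        // prints 12:00AM, 9:00AM, 12:00PM, 3:00PM (one per line)
    }
}
```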
The obvious answer would be to use regular expressions (but remember jwz's rule: if you have a problem and decide it can be solved with REs, then you now have two problems).
However, save yourself a whole helluva lot of trouble and get moment.js

Convert timestamp to weekNumber mapReduce

I am looking to preprocess timestamps to obtain the corresponding weeknumbers using mapreduce as the dataset has hundreds of millions of instances that need to be processed. I have so far figured out that the first MR job needs to preprocess and sort each line according to timestamp as the key and the rest of the line as value.
The second job then appends the corresponding date to each timestamp object.
However, I do not know how to perform the third task, which is to create a continuous timeline of week numbers. That is, if my minimum timestamp corresponds to the date 03/10/2000, I would like to tag it with the number 10 (meaning the 10th week of the year 2000; let's assume that is correct even if it is not in this case). Then, if the next timestamp corresponds to 02/01/2011, and we assume 52 weeks in the year 2000 and that 02/01/2011 falls in the 5th week of 2011, I would like to tag this date as week 57 and not as week 5. I would like to know how to achieve this last step in MapReduce. Assume I have the following input file:
sorted_timestamp1::date::vals....
sorted_timestamp2::date::vals...
...
...
...
sorted_timestampn::date::vals.....
Simple pseudocode with map and reduce in java would suffice for my case, actual code would be great also.
Thanks in advance for your help!
I think you can separate the two problems:
1) map reduce logic:
What do you really want to calculate with MapReduce? The choice of keys depends on this.
Just a guess from my side: if you want to do some aggregations at a weekly level, the mapper should take each line of input (think of the line number as the key) and write out the data with a new key representing the week (I give some remarks on this in point 2).
The reducer will then have access to all records with the same week key, and you can aggregate them however you want and write the results out.
2) Week calculations:
Using a java.util.Calendar object you can easily calculate the week of a Timestamp/Date. To get a continuous week value you can calculate the week offset from a minimum reference date. To keep things simple I propose using January 1 of a sensible reference year. To calculate the difference in weeks you can, for example, use
Joda package static method Weeks.weeksBetween
If the concrete value of the "week" key is not of special interest you can also use a composite key like
year*100+week
which is much simpler to evaluate and therefore is faster. If you really need the special week timeline think about using the simple key first (just used for aggregations in map reduce) and do the more expensive week timeline evaluations later after the reducer has generated its result with much less data.
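Both key variants could be computed along these lines. This is a sketch only: it uses java.time in place of Joda's Weeks.weeksBetween, assumes the timestamps are epoch milliseconds interpreted in UTC, and uses ISO-8601 week numbering for the composite key. The class name WeekKeys is made up, and a mapper would simply emit one of these values as its output key.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;
import java.time.temporal.IsoFields;

// Hypothetical helper for deriving week keys from timestamps.
final class WeekKeys {
    // Converts an epoch-millisecond timestamp to a date (UTC is an assumption).
    static LocalDate toDate(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    // Continuous week index: whole weeks elapsed since the reference date,
    // so dates in the first seven days map to 0, the next seven to 1, and so on.
    static long continuousWeek(LocalDate reference, LocalDate date) {
        return ChronoUnit.WEEKS.between(reference, date);
    }

    // Cheaper composite key: year * 100 + week, with ISO-8601 weeks.
    static int compositeKey(LocalDate date) {
        return date.get(IsoFields.WEEK_BASED_YEAR) * 100
                + date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
    }

    public static void main(String[] args) {
        LocalDate reference = LocalDate.of(2000, 1, 1);
        LocalDate date = LocalDate.of(2011, 1, 5);
        System.out.println(continuousWeek(reference, date));
        System.out.println(compositeKey(date)); // prints 201101
    }
}
```

Note that the composite key is not continuous (it jumps from the last week of one year to week 1 of the next), which is fine for grouping but not for measuring distances between weeks.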
Good luck + regards
Martin

How to plot a graph if only event points are given?

I am trying to plot availability of node (machine). In order to save storage on data collected, instead of recording data on fixed interval, I record them based on events (ADDED, REMOVED). ADDED means "up", REMOVED means "down/unreachable"
Here's the sample data I have:
2012-11-25 11:11:11.1234 - node added.
2012-11-25 15:01:20.1234 - node removed.
2012-11-25 18:12:12.1234 - node added.
Let's say, I want to plot a graph from time range: 2012-11-24 to 2012-11-25 (x-axis), Up/Down (y-axis) , how do I plot the graph?
I think there are some examples (I can't remember which ones) in the d3 tools (http://d3js.org).
If you look through the examples you can choose the type of visualisation you want to use.
I think your data set would suit what you are trying to do (you may need to write a small function to convert the up/down event to an integer).
If you have all your data stored in an array, simply use JavaScript's built in Array.filter method, and use JavaScript's Date object to convert the timestamp into milliseconds (note that it'll round to the nearest thousandth of a second - only 3 decimal places).
var startTime = Date.parse("2012-11-24"),
endTime = Date.parse("2012-11-25") + 86400000; /* Add one day */
filteredData = data.filter(function(d) {
var time = Date.parse(d.time);
return (time >= startTime && time < endTime);
});
You may need to play around with the date ranges; I'm assuming you want data that falls between 2012-11-24 and 2012-11-25 inclusive.
If your data is in a database, another way would be just to simply query the database to only display data which exists within the time range (you could call some PHP script - using d3.text - which outputs JSON and accepts two GET parameters, startTime and endTime).
d3.text("getNodes.php?startTime=" + startTime + "&endTime=" + endTime, function(json) {
filteredData = JSON.parse(json);
});

How can I separate instances of a class based on their time of creation using Java?

Suppose I have 100 instances of a class. I want to build arrays of those instances according to their time of creation. For example, there are 40 objects created in September 2011 and 60 objects created in October 2011. Every instance stores its time of creation as a long. How can I tell my Java program to make one array of all instances created in September and another array of all instances created in October? I set the time of creation using these lines of code:
Date currentDate = new Date();
long timeOfCreation = currentDate.getTime();
Thanks in advance.
Well, you iterate over the instances, check their creation time and put them into the appropriate list (I wouldn't use an array here; you can convert the lists to arrays later if you need to).
Basically, it's just a matter of getting the month from the date (you can create a Date object from the timestamp), which should be doable using Calendar (or better yet, use Joda Time).
If the intervals are non-standard (e.g. from 15th to 15th) you might need a start and end value to compare against.
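For the plain calendar-month case, the grouping could be sketched as follows. The Item class is a stand-in for whatever class holds timeOfCreation, MonthGrouping is an invented name, and java.time.YearMonth is used here instead of Calendar or Joda Time.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the class from the question.
final class Item {
    final long timeOfCreation; // epoch milliseconds, as from Date.getTime()

    Item(long timeOfCreation) {
        this.timeOfCreation = timeOfCreation;
    }
}

final class MonthGrouping {
    // Puts every item into the list for the month in which it was created.
    static Map<YearMonth, List<Item>> byMonth(List<Item> items, ZoneId zone) {
        Map<YearMonth, List<Item>> groups = new TreeMap<>();
        for (Item item : items) {
            YearMonth month =
                    YearMonth.from(Instant.ofEpochMilli(item.timeOfCreation).atZone(zone));
            groups.computeIfAbsent(month, k -> new ArrayList<>()).add(item);
        }
        return groups;
    }

    public static void main(String[] args) {
        long september = LocalDate.of(2011, 9, 15)
                .atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        long october = LocalDate.of(2011, 10, 2)
                .atStartOfDay(ZoneOffset.UTC).toInstant().toEpochMilli();
        List<Item> items = Arrays.asList(new Item(september), new Item(october));
        Map<YearMonth, List<Item>> groups = byMonth(items, ZoneOffset.UTC);
        // Convert one group to an array only if an array is really needed.
        Item[] septemberItems = groups.get(YearMonth.of(2011, 9)).toArray(new Item[0]);
        System.out.println(septemberItems.length);
    }
}
```

The time zone matters for instances created close to midnight at a month boundary, so it is passed in explicitly rather than taken from the system default.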
Edit:
If you store those timestamps in your database, you could just create a query to get all the instances between start and end date of each interval (... WHERE timeOfCreation BETWEEN <start> and <end>, note that <start> and <end> are just placeholders for your parameters). Then call that query for every interval you are interested in, e.g. start = September 1st 00:00:00,000 and end = September 30th 23:59:59,999.

Sorting a text file by date - Date looks like DD/MM/YYYY

I am trying to sort the dates from the earliest to the latest.
I was thinking about using a BufferedReader and reading the first 2 characters of each line, then the 4th and 5th characters, and finally the 7th and 8th characters, ignoring the slashes.
The following is an example of the text file I have:
04/24/2010 - 2000.0 (Deposit)
09/05/2010 - 20.0 (Fees)
02/30/2007 - 600.0 (Deposit)
06/15/2009 - 200.0 (Fees)
08/23/2010 - 300.0 (Deposit)
06/05/2006 - 500.0 (Fees)
How do I sort records in a text file using Java?
This, combined with converting your dates to the desired format using SimpleDateFormat in getField(String line), should get you going.
Change your dates to the desired format using SimpleDateFormat, and sort on that.
How big is the file? I would just read in every line, create a date object for each of the lines, and then call Collections.sort(List<MyObjectWithDate>).
Date implements Comparable, so you could very easily store everything in memory, sort it, and then write it back to the file.
import java.util.Date;

class LineAndDate implements Comparable<LineAndDate> {
    private Date date;
    private String line;

    // Orders entries by date, earliest first
    @Override
    public int compareTo(LineAndDate other) {
        return date.compareTo(other.date);
    }
}
Store a List<LineAndDate> in memory, and then you should just be able to call Collections.sort(myList) and write that.
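A compact variant of the same idea, sketched with java.time instead of Date, might look like this. It assumes each line starts with a 10-character date in month/day/year order, which is what the sample data suggests even though the title says DD/MM/YYYY; the class name SortByDate is made up, and how an impossible date such as 02/30/2007 should be treated is left open here.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: sorts whole lines by the date they start with.
final class SortByDate {
    // Pattern is an assumption based on the sample lines (month first).
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("MM/dd/yyyy");

    // Parses the first 10 characters of a line, e.g. "04/24/2010".
    static LocalDate dateOf(String line) {
        return LocalDate.parse(line.substring(0, 10), FORMAT);
    }

    // Returns a new list ordered from the earliest date to the latest.
    static List<String> sorted(List<String> lines) {
        List<String> copy = new ArrayList<>(lines);
        copy.sort(Comparator.comparing(SortByDate::dateOf));
        return copy;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "04/24/2010 - 2000.0 (Deposit)",
                "09/05/2010 - 20.0 (Fees)",
                "06/15/2009 - 200.0 (Fees)");
        for (String line : sorted(lines)) {
            System.out.println(line);
        }
    }
}
```

In a full program the lines would be read with a BufferedReader and the sorted list written back out with a writer.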
