I'm looking for a library or helper class in Java that would allow me to add and subtract date intervals.
For example, let's say I have the following date intervals:
A = ["2015-01-01 00:00", "2015-01-20 00:00"]
B = ["2015-01-05 00:00", "2015-01-10 00:00"]
C = ["2015-01-11 00:00", "2015-01-14 00:00"]
D = ["2015-01-19 00:00", "2015-01-25 00:00"]
1                  A                 20
|-------------------------------------|
        |---------| |-----|         |-----------|
        5    B   10 11 C 14        19     D    25
And let's say I'd like to calculate the following:
A - B - C + D = { ["2015-01-01 00:00", "2015-01-05 00:00"[,
]"2015-01-10 00:00", "2015-01-11 00:00"[,
]"2015-01-14 00:00", "2015-01-25 00:00"] }
1       5        10 11   14                    25
|-------|         |-|     |---------------------|
I know I can build my own logic using pure Java, but I'd rather not reinvent the wheel...
I was looking into Joda-Time, but I couldn't figure out how to perform such operations using it.
Thanks a lot!
I found exactly what I needed: Range and RangeSet, from the guava-libraries (Guava).
Works like this:
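import com.google.common.collect.Range;
import com.google.common.collect.RangeSet;
import com.google.common.collect.TreeRangeSet;
import java.util.Date;
import java.util.GregorianCalendar;

// GregorianCalendar months are 0-based, so 0 means January.
Range<Date> a = Range.closed(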
new GregorianCalendar(2015, 0, 1).getTime(),
new GregorianCalendar(2015, 0, 20).getTime());
Range<Date> b = Range.closed(
new GregorianCalendar(2015, 0, 5).getTime(),
new GregorianCalendar(2015, 0, 10).getTime());
Range<Date> c = Range.closed(
new GregorianCalendar(2015, 0, 11).getTime(),
new GregorianCalendar(2015, 0, 14).getTime());
Range<Date> d = Range.closed(
new GregorianCalendar(2015, 0, 19).getTime(),
new GregorianCalendar(2015, 0, 25).getTime());
RangeSet<Date> result = TreeRangeSet.create();
result.add(a);
result.remove(b);
result.remove(c);
result.add(d);
System.out.println(result);
The code above prints:
[
[Thu Jan 01 00:00:00 BRST 2015‥Mon Jan 05 00:00:00 BRST 2015),
(Sat Jan 10 00:00:00 BRST 2015‥Sun Jan 11 00:00:00 BRST 2015),
(Wed Jan 14 00:00:00 BRST 2015‥Sun Jan 25 00:00:00 BRST 2015]
]
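If adding Guava is not an option, the same add/remove semantics can be sketched with only the JDK. This is a minimal illustration, not Guava's implementation: the IntervalSet class and its methods are hypothetical names, it uses java.time.LocalDateTime instead of Date, and it treats every interval as half-open [start, end) rather than closed, so the endpoints differ slightly from the Guava output above. Disjoint intervals are kept in a TreeMap keyed by start, and floorEntry/lowerEntry/ceilingEntry find the neighbours to merge or split.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Disjoint half-open intervals [start, end), keyed by start (a sketch, not Guava's code). */
class IntervalSet {
    private final TreeMap<LocalDateTime, LocalDateTime> map = new TreeMap<>();

    /** Adds [start, end), merging with overlapping or touching intervals. */
    public void add(LocalDateTime start, LocalDateTime end) {
        if (!start.isBefore(end)) return; // ignore empty intervals
        // If an interval starting earlier reaches `start`, grow the new one leftwards.
        Map.Entry<LocalDateTime, LocalDateTime> lower = map.floorEntry(start);
        if (lower != null && !lower.getValue().isBefore(start)) {
            start = lower.getKey();
        }
        // Absorb every interval that starts inside [start, end].
        Map.Entry<LocalDateTime, LocalDateTime> next = map.ceilingEntry(start);
        while (next != null && !next.getKey().isAfter(end)) {
            if (next.getValue().isAfter(end)) end = next.getValue();
            map.remove(next.getKey());
            next = map.ceilingEntry(start);
        }
        map.put(start, end);
    }

    /** Removes [start, end), splitting any interval that straddles an edge. */
    public void remove(LocalDateTime start, LocalDateTime end) {
        if (!start.isBefore(end)) return;
        // An interval that starts before `start` may run into the removed region.
        Map.Entry<LocalDateTime, LocalDateTime> lower = map.lowerEntry(start);
        if (lower != null && lower.getValue().isAfter(start)) {
            map.put(lower.getKey(), start);                                      // left piece
            if (lower.getValue().isAfter(end)) map.put(end, lower.getValue()); // right piece
        }
        // Drop intervals that start inside [start, end), keeping any tail past `end`.
        Map.Entry<LocalDateTime, LocalDateTime> next = map.ceilingEntry(start);
        while (next != null && next.getKey().isBefore(end)) {
            map.remove(next.getKey());
            if (next.getValue().isAfter(end)) map.put(end, next.getValue());
            next = map.ceilingEntry(start);
        }
    }

    /** Renders each interval as "[start, end)". */
    public List<String> describe() {
        List<String> out = new ArrayList<>();
        for (Map.Entry<LocalDateTime, LocalDateTime> e : map.entrySet()) {
            out.add("[" + e.getKey() + ", " + e.getValue() + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        IntervalSet result = new IntervalSet();
        result.add(day(1), day(20));     // A
        result.remove(day(5), day(10));  // - B
        result.remove(day(11), day(14)); // - C
        result.add(day(19), day(25));    // + D
        result.describe().forEach(System.out::println);
    }

    private static LocalDateTime day(int dayOfMonth) {
        return LocalDateTime.of(2015, 1, dayOfMonth, 0, 0);
    }
}
```

Because TreeMap's navigation methods return snapshot entries, it is safe to read lower.getValue() after overwriting that key with put().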
I think it can be done with Joda-Time plus some custom code. It is assumed that A is the interval to which all other intervals relate.
While this code should give the expected results (and should work for other values accordingly), I strongly suggest testing it with very different data, especially for three cases: a) an interval not intersecting A at all, b) an interval intersecting A at its beginning, and c) an interval which itself intersects B, C or D.
Despite these caveats, it may serve as a starting point for further tests.
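import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.joda.time.DateTime;
import org.joda.time.Instant;
import org.joda.time.Interval;

Interval a = new Interval(Instant.parse("2015-01-01T00:00Z"), Instant.parse("2015-01-20T00:00Z"));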
List<Interval> l = Arrays.asList(
/* b */ new Interval(Instant.parse("2015-01-05T00:00Z"), Instant.parse("2015-01-10T00:00Z")),
/* c */ new Interval(Instant.parse("2015-01-11T00:00Z"), Instant.parse("2015-01-14T00:00Z")),
/* d */ new Interval(Instant.parse("2015-01-19T00:00Z"), Instant.parse("2015-01-25T00:00Z"))
);
List<Interval> results = new ArrayList<Interval>();
for (Interval i : l) {
if (a.contains(i)) {
// if i is completely inside a, then calculate the first part and the remaining part
// whereas the first part will be added to the result
Interval firstPart = new Interval(a.getStart(), i.getStart());
results.add(firstPart);
// followed by i itself (skipped)
// part after i, inside a
Interval remainingPart = new Interval(i.getEnd(), a.getEnd());
a = remainingPart;
} else if (i.overlaps(a)) {
// if the intervals only overlap, then we take the earliest beginning and the latest ending as a result part
DateTime overlapMin = (a.getStart().isBefore(i.getStart())) ? a.getStart() : i.getStart();
DateTime overlapMax = (a.getEnd().isAfter(i.getEnd())) ? a.getEnd() : i.getEnd();
Interval overlapAndBothParts = new Interval(overlapMin, overlapMax);
results.add(overlapAndBothParts);
// if the checked interval i is at the beginning, then a will become the part after this "overlap"
if (i.getStartMillis() < a.getStartMillis()) {
Interval whatsLeft = new Interval(i.getEndMillis(), a.getEndMillis());
a = whatsLeft;
}
}
}
// print result
for (Interval i : results) {
System.out.println("result part: " + i);
}
Environment: Java 1.8, VM Cloudera Quickstart.
I have data in Hadoop HDFS, loaded from a CSV file. Each row represents a bus route.
id vendor start_datetime end_datetime trip_duration_in_sec
17534 A 1/1/2013 12:00 1/1/2013 12:14 840
68346 A 1/1/2013 12:13 1/1/2013 12:18 300
09967 B 1/1/2013 12:34 1/1/2013 12:39 300
09967 B 1/1/2013 12:44 1/1/2013 12:51 420
09967 A 1/1/2013 12:54 1/1/2013 12:56 120
.........
.........
So, for every day, I want to find the hour in which each vendor (A and B) has the most bus routes, using Java and Spark.
A result could be:
1/1/2013 (Day 1) - Vendor A has 3 bus routes at 12:00-13:00. (Between 12:00 and 13:00, vendor A had the most bus routes.)
1/1/2013 (Day 1) - Vendor B has 2 bus routes at 12:00-13:00. (Between 12:00 and 13:00, vendor B had the most bus routes.)
....
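My Java code is:
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.window;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
Dataset<Row> ds; // loaded from the CSV in HDFS; window() expects start_datetime to be a timestamp column
ds.groupBy(window(col("start_datetime"), "1 hour")).count().show();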
But I can't figure out how to find the hour with the most routes for each day.
I'm not very familiar with Java, so I'll explain it in Scala.
The key to finding the hour with the most routes per day per vendor is to count by (vendor, day, hour), then aggregate by (vendor, day) to find the hour with the maximum cnt in each group. The day and hour of each record can be parsed from start_datetime.
import spark.implicits._ // provides the Encoder needed by createDataset
val df = spark.createDataset(Seq(
("17534","A","1/1/2013 12:00","1/1/2013 12:14",840),
("68346","A","1/1/2013 12:13","1/1/2013 12:18",300),
("09967","B","1/1/2013 12:34","1/1/2013 12:39",300),
("09967","B","1/1/2013 12:44","1/1/2013 12:51",420),
("09967","A","1/1/2013 12:54","1/1/2013 12:56",120)
)).toDF("id","vendor","start_datetime","end_datetime","trip_duration_in_sec")
df.rdd.map(t => {
val vendor = t(1)
val day = t(2).toString.split(" ")(0)
val hour = t(2).toString.split(" ")(1).split(":")(0)
((vendor, day, hour), 1)
})
// count by key
.aggregateByKey(0)((x: Int, y: Int) =>x+y, (x: Int, y: Int) =>x+y)
.map(t => {
val ((vendor, day, hour), cnt) = t;
((vendor, day), (hour, cnt))
})
// solve the max cnt by key (vendor, day)
.foldByKey(("", 0))((z: (String, Int), i: (String, Int)) => if (i._2 > z._2) i else z)
.foreach(t => println(s"${t._1._2} - Vendor ${t._1._1} has ${t._2._2} bus routes from ${t._2._1}:00 hour."))
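Since the question asks for Java, here is the same two-step logic (count per (vendor, day, hour), then keep the hour with the maximum count per (vendor, day)) sketched with plain Java collections on the sample rows. It is not Spark code; the BusRoutePeak class, the "vendor|day" key format and the returned strings are illustrative assumptions. In Spark's Java API the same shape would be a groupBy/count on derived (vendor, day, hour) columns followed by a per-(vendor, day) maximum, which is not shown here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

class BusRoutePeak {
    /**
     * Each row is {id, vendor, start_datetime, end_datetime, trip_duration_in_sec},
     * with start_datetime formatted like "1/1/2013 12:00" as in the sample data.
     * Returns "vendor|day" -> "hour=HH, count=N" for the busiest hour.
     */
    static Map<String, String> peakHourPerVendorDay(String[][] rows) {
        // Step 1: count routes per (vendor, day, hour), like aggregateByKey in the Scala code.
        Map<String, Integer> counts = new HashMap<>();
        for (String[] r : rows) {
            String[] dateTime = r[2].split(" ");
            String hour = dateTime[1].split(":")[0];
            counts.merge(r[1] + "|" + dateTime[0] + "|" + hour, 1, Integer::sum);
        }
        // Step 2: keep the hour with the largest count per (vendor, day), like foldByKey.
        // Ties are resolved by HashMap iteration order, i.e. arbitrarily.
        Map<String, String> bestHour = new TreeMap<>();
        Map<String, Integer> bestCount = new HashMap<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            String[] k = e.getKey().split("\\|");
            String vendorDay = k[0] + "|" + k[1];
            if (e.getValue() > bestCount.getOrDefault(vendorDay, 0)) {
                bestCount.put(vendorDay, e.getValue());
                bestHour.put(vendorDay, "hour=" + k[2] + ", count=" + e.getValue());
            }
        }
        return bestHour;
    }

    public static void main(String[] args) {
        String[][] rows = {
            {"17534", "A", "1/1/2013 12:00", "1/1/2013 12:14", "840"},
            {"68346", "A", "1/1/2013 12:13", "1/1/2013 12:18", "300"},
            {"09967", "B", "1/1/2013 12:34", "1/1/2013 12:39", "300"},
            {"09967", "B", "1/1/2013 12:44", "1/1/2013 12:51", "420"},
            {"09967", "A", "1/1/2013 12:54", "1/1/2013 12:56", "120"}
        };
        peakHourPerVendorDay(rows).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```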
I want to get the maximum value of the date field across the whole collection 4programmers.
In mongo shell I can write:
db.getCollection("4programmers").aggregate([
{
$group:
{
_id: null,
max : {$max: "$date"}
}
}
])
and it returns a document with the date ISODate("2017-10-20T17:12:37.000+02:00"), but when I write in Java:
Date d = collection.aggregate(
Arrays.asList(
Aggregates.group("$date", Accumulators.max("maxx", "$date"))
)
).first().getDate("maxx");
System.out.println(d);
as a result I get: Fri Oct 20 00:44:50 CEST 2017
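Could something be wrong with first()?
The first argument of Aggregates.group should be null instead of "$date" (it corresponds to _id: null in the shell query). With "$date" as the group key, every distinct date becomes its own group whose max is just that date, so first() returns whichever group happens to come first.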
So code should look like:
Date d = collection.aggregate(
Arrays.asList(
Aggregates.group(null, Accumulators.max("maxx", "$date"))
)
).first().getDate("maxx");
or you can do the same without Aggregates class:
collection.aggregate(asList(new Document("$group", new Document("_id", null)
.append("max", new Document("$max", "$date")))))
.first().getDate("max");
I have an Interval object:
Interval firstInterval =
new Interval(new DateTime(2017, 6, 26, 7, 55, 30), new DateTime(2017, 6, 26, 22, 55, 30));
and:
DateTime nightToDay = new DateTime(2017, 6, 26, 8, 0, 0);
DateTime dayToNight = new DateTime(2017, 6, 26, 22, 0, 0);
I want to get an Interval[]:
[
[2017-06-26 07:55:30 ~ 2017-06-26 08:00:00],
[2017-06-26 08:00:00 ~ 2017-06-26 22:00:00],
[2017-06-26 22:00:00 ~ 2017-06-26 22:55:30]
]
Of course, these parameters aren't fixed, and it is just an example.
Assuming that your general case is:
- you have a first interval with a start and an end
- you receive nightToDay and dayToNight dates
- the output must contain 3 intervals:
  - start to nightToDay
  - nightToDay to dayToNight
  - dayToNight to end
And for each case above, you also need to check if the start of the interval is before the end.
If that's what you need, just do:
List<Interval> list = new ArrayList<Interval>();
if (firstInterval.getStart().isBefore(nightToDay)) {
list.add(new Interval(firstInterval.getStart(), nightToDay));
}
if (nightToDay.isBefore(dayToNight)) {
list.add(new Interval(nightToDay, dayToNight));
}
if (dayToNight.isBefore(firstInterval.getEnd())) {
list.add(new Interval(dayToNight, firstInterval.getEnd()));
}
The list will contain all the Interval objects you need.
If you need an array, it's easy to convert the list:
Interval[] intervals = new Interval[list.size()];
intervals = list.toArray(intervals);
The intervals array will have all the intervals created.
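A slightly more general variant, sketched here with java.time instead of Joda-Time (a swapped-in API, not the answer's code; IntervalSplitter and split are hypothetical names): split the interval at every cut point that falls strictly inside it. Cut points outside the interval are ignored, so for an interval such as 09:00-10:00 you get a single piece instead of an 08:00-22:00 piece that extends beyond it.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IntervalSplitter {
    /**
     * Splits [start, end] at every cut point strictly between start and end.
     * Each piece is returned as {pieceStart, pieceEnd}; assumes start is before end.
     */
    static List<LocalDateTime[]> split(LocalDateTime start, LocalDateTime end, LocalDateTime... cuts) {
        LocalDateTime[] sorted = cuts.clone();
        Arrays.sort(sorted); // natural chronological order
        List<LocalDateTime[]> pieces = new ArrayList<>();
        LocalDateTime pieceStart = start;
        for (LocalDateTime cut : sorted) {
            // Skip cuts outside the interval and duplicates of the previous cut.
            if (cut.isAfter(pieceStart) && cut.isBefore(end)) {
                pieces.add(new LocalDateTime[] {pieceStart, cut});
                pieceStart = cut;
            }
        }
        pieces.add(new LocalDateTime[] {pieceStart, end});
        return pieces;
    }

    public static void main(String[] args) {
        LocalDateTime start = LocalDateTime.of(2017, 6, 26, 7, 55, 30);
        LocalDateTime end = LocalDateTime.of(2017, 6, 26, 22, 55, 30);
        LocalDateTime nightToDay = LocalDateTime.of(2017, 6, 26, 8, 0, 0);
        LocalDateTime dayToNight = LocalDateTime.of(2017, 6, 26, 22, 0, 0);
        for (LocalDateTime[] p : split(start, end, nightToDay, dayToNight)) {
            System.out.println(p[0] + " ~ " + p[1]);
        }
    }
}
```

If an array is needed, pieces.toArray(new LocalDateTime[0][]) converts the list, as in the answer.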
I am encountering a scenario like this:
My project has a servlet that handles a request from a Perl client. The request, which is a multipart request, asks to download a file.
@RequestMapping(value = "/*", method = RequestMethod.POST)
public void tdRequest(@RequestHeader("Authorization") String authenticate,
HttpServletResponse response,
HttpServletRequest request) throws Exception
{
if (ServletFileUpload.isMultipartContent(request))
{
ServletFileUpload sfu = new ServletFileUpload();
FileItemIterator items = sfu.getItemIterator(request);
while (items.hasNext())
{
FileItemStream item = items.next();
if (("action").equals(item.getFieldName()))
{
InputStream stream = item.openStream();
String value = Streams.asString(stream);
if (("upload").equals(value))
{
uploadRequest(items, response);
return;
}
else if (("download").equals(value))
{
downloadRequest(items, response);
return;
}
The problem is not here; it appears in the downloadRequest() function.
void downloadRequest(FileItemIterator items,
HttpServletResponse response) throws Exception
{
log.info("Start downloadRequest.......");
OutputStream os = response.getOutputStream();
File file = new File("D:\\clip.mp4");
FileInputStream fileIn = new FileInputStream(file);
//while ((datablock = dataOutputStreamServiceImpl.readBlock()) != null)
byte[] outputByte = new byte[ONE_MEGABYE];
while (fileIn.read(outputByte) != -1)
{
System.out.println("--------" + (i = i + 1) + "--------");
System.out.println(new Date());
//dataContent = datablock.getContent();
System.out.println("Start write " + new Date());
os.write(outputByte, 0,outputByte.length);
System.out.println("End write " + new Date());
//System.out.println("----------------------");
}
os.close();
}
}
I read and write the file in blocks of 1 MB. However, downloading the whole file takes far too long (in my case, 20 minutes for a 100 MB file).
I added some System.out.println calls and saw output like this:
The first few blocks are read and written really fast:
--------1--------
Mon Dec 07 16:24:20 ICT 2015
Start write Mon Dec 07 16:24:20 ICT 2015
End write Mon Dec 07 16:24:21 ICT 2015
--------2--------
Mon Dec 07 16:24:21 ICT 2015
Start write Mon Dec 07 16:24:21 ICT 2015
End write Mon Dec 07 16:24:21 ICT 2015
--------3--------
Mon Dec 07 16:24:21 ICT 2015
Start write Mon Dec 07 16:24:21 ICT 2015
End write Mon Dec 07 16:24:21 ICT 2015
But each subsequent block is slower than the previous one:
--------72--------
Mon Dec 07 16:29:22 ICT 2015
Start write Mon Dec 07 16:29:22 ICT 2015
End write Mon Dec 07 16:29:29 ICT 2015
--------73--------
Mon Dec 07 16:29:29 ICT 2015
Start write Mon Dec 07 16:29:29 ICT 2015
End write Mon Dec 07 16:29:37 ICT 2015
--------124--------
Mon Dec 07 16:38:22 ICT 2015
Start write Mon Dec 07 16:38:22 ICT 2015
End write Mon Dec 07 16:38:35 ICT 2015
--------125--------
Mon Dec 07 16:38:35 ICT 2015
Start write Mon Dec 07 16:38:35 ICT 2015
End write Mon Dec 07 16:38:48 ICT 2015
The problem is in os.write().
I really cannot understand why OutputStream.write() takes so long. Or did I make some mistake?
Sorry for my bad English. I really need your help. Thanks in advance!
This is the Perl code on the client side:
# ----- get connected to download the file
#
$Response = $ua->request(POST $remoteHost ,
Content_Type => 'form-data',
Authorization => $Authorization,
'Proxy-Authorization' => $Proxy_Authorization ,
Content => [ DOS => 1 ,
action => 'download' ,
first_run => 0 ,
dl_filename => $dl_filename ,
delivery_dir => $delivery_dir ,
verbose => $Verbose ,
debug => $debug ,
version => $VERSION
]
);
unless ($Response->is_success) {
my $Msg = $Response->error_as_HTML;
# Remove HTML tags - we're in a DOS shell!
$Msg =~ s/<[^>]+>//g;
print "ERROR! SERVER RESPONSE:\n$Msg\n";
print "$remoteHost\n\n" if $Options{'v'};
Error "Could not connect to " . $remoteHost ;
}
my $Result2 = $Response->content();
Error "Abnormal termination...\n$Result2" if $Result2 =~ /_APP_ERROR_/;
open(F, ">$dl_filename") or Error "Could not open '$dl_filename'!";
binmode F; # unless $dl_filename =~ /\.txt$|\.htm$/;
print F $Result2;
close F;
print "received.\n";
}
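One problem is that fileIn.read(outputByte) can return fewer bytes than the buffer holds; it does not necessarily fill outputByte. If you read a few KB but then write the full 1 MB, you send stale bytes along with the real data, so the output grows much faster than the file and you can quickly run out of disk space. Try this, and note the "readed" variable: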
void downloadRequest(FileItemIterator items,
HttpServletResponse response) throws Exception
{
log.info("Start downloadRequest.......");
OutputStream os = response.getOutputStream();
File file = new File("D:\\clip.mp4");
FileInputStream fileIn = new FileInputStream(file);
//while ((datablock = dataOutputStreamServiceImpl.readBlock()) != null)
byte[] outputByte = new byte[ONE_MEGABYE];
int readed =0;
while ((readed =fileIn.read(outputByte)) != -1)
{
System.out.println("--------" + (i = i + 1) + "--------");
System.out.println(new Date());
//dataContent = datablock.getContent();
System.out.println("Start write " + new Date());
os.write(outputByte, 0,readed );
System.out.println("End write " + new Date());
//System.out.println("----------------------");
}
os.close();
}
}
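A minimal, self-contained sketch of the corrected copy loop, exercised against in-memory streams rather than a servlet response. The StreamCopy class name and the 64 KB buffer size are assumptions for illustration; the essential point, as in the answer, is to write only the n bytes that read() actually returned.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class StreamCopy {
    private static final int BUFFER_SIZE = 64 * 1024; // assumed size; 1 MB would also work

    /** Copies everything from in to out, writing only the bytes each read() returned. */
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[BUFFER_SIZE];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // n, not buffer.length
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[2_500_000]; // deliberately not a multiple of BUFFER_SIZE
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new ByteArrayInputStream(data)) {
            System.out.println("copied " + copy(in, out) + " bytes, output size " + out.size());
        }
    }
}
```

In the servlet this would be called inside try (InputStream in = new FileInputStream(file)) { ... } with response.getOutputStream() as the target, so the file is closed even if the client disconnects. On Java 9 and later, InputStream.transferTo(OutputStream) performs the same loop.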
It looks like your download gets slower and slower the further into the file it gets. You start out at one second or less per block; by block 72 it is 7+ seconds per block, and by block 124 it is 13 seconds per block.
There is nothing on the server side to explain this. Rather, it has the "smell" of the client side doing something wrong. My guess is that the client side is reading the data from the socket into an in-memory data structure, and that data structure (maybe just a String or StringBuffer or StringBuilder) is getting larger and larger. Either the time taken to expand it keeps growing, or your memory footprint is growing and the GC is taking longer and longer. (Or both.)
If you showed us the client-side code .....
UPDATE
As I suspected, this line of code will be reading the entire content into the Perl equivalent of a string builder before turning it into a string.
my $Result2 = $Response->content();
Depending on how it is implemented under the hood, this will lead to repeated copying of the data as the builder runs out of buffer space and needs to be expanded. Depending on the buffer expansion strategy that Perl employs for this, it could give O(N^2) behavior, where N is the size of the file you are transferring. (The evidence is that you are not getting O(N) behavior ...)
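To illustrate why the growth strategy matters, here is a toy cost model (not a model of Perl's actual internals, which are not shown here): data arrives in fixed-size chunks and is appended to a buffer that is reallocated, copying every byte already stored, whenever it is full. Growing by a fixed increment copies O(N^2) bytes in total, while doubling the capacity copies fewer than N. GrowthCost and bytesCopied are hypothetical names, and the 64 KB chunk size is an assumption.

```java
class GrowthCost {
    /**
     * Total bytes copied while appending `total` bytes in `chunk`-sized pieces.
     * Capacity starts at initialCapacity and grows either by initialCapacity each
     * step (fixed increment) or by doubling; each reallocation copies the current contents.
     */
    static long bytesCopied(long total, long chunk, long initialCapacity, boolean doubling) {
        long size = 0, capacity = initialCapacity, copied = 0;
        while (size < total) {
            long add = Math.min(chunk, total - size);
            if (size + add > capacity) {
                while (size + add > capacity) {
                    capacity = doubling ? capacity * 2 : capacity + initialCapacity;
                }
                copied += size; // reallocation copies every byte already stored
            }
            size += add;
        }
        return copied;
    }

    public static void main(String[] args) {
        long chunk = 64 * 1024; // assumed network read size
        for (long mb = 25; mb <= 100; mb += 25) {
            long total = mb * 1024 * 1024;
            System.out.printf("%3d MB: fixed growth copies %,d bytes; doubling copies %,d bytes%n",
                    mb, bytesCopied(total, chunk, chunk, false), bytesCopied(total, chunk, chunk, true));
        }
    }
}
```

Doubling the file size roughly quadruples the copying under fixed-increment growth, which matches the pattern of each block taking longer than the last.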
If you want faster downloads, you need to stream the data on the client side: read the response content in chunks and write them to the output file. (I'm not a Perl expert, so I can't offer you code.) This will also reduce the memory footprint on the client side, which could be important if your file sizes increase.
I have multiple text files that contain information about the popularity of different programming languages in different countries, based on Google searches. I have one text file for each year from 2004 to 2015. I also have a text file that breaks this down by week (called iot.txt), but this file does not include the country.
Example data from 2004.txt:
Region java c++ c# python JavaScript
Argentina 13 14 10 0 17
Australia 22 20 22 64 26
Austria 23 21 19 31 21
Belgium 20 14 17 34 25
Bolivia 25 0 0 0 0
etc
Example from iot.txt:
Week java c++ c# python JavaScript
2004-01-04 - 2004-01-10 88 23 12 8 34
2004-01-11 - 2004-01-17 88 25 12 8 36
2004-01-18 - 2004-01-24 91 24 12 8 36
2004-01-25 - 2004-01-31 88 26 11 7 36
2004-02-01 - 2004-02-07 93 26 12 7 37
My problem is that I am trying to write code that will output the number of countries that have shown 0 interest in Python.
This is my current code that I use to read the text files. But I'm not sure of the best way to count the regions that have 0 interest in Python across all the years 2004-2015. At first I thought the best way would be to create a list from all the text files (not including iot.txt) and then search it for any entries that have 0 interest in Python, but I have no idea how to do that.
Can anyone suggest a way to do this?
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.*;
public class Starter{
public static void main(String[] args) throws Exception {
BufferedReader fh =
new BufferedReader(new FileReader("iot.txt"));
//First line contains the language names
String s = fh.readLine();
List<String> langs =
new ArrayList<>(Arrays.asList(s.split("\t")));
langs.remove(0); //Throw away the first word - "week"
Map<String,HashMap<String,Integer>> iot = new TreeMap<>();
while ((s=fh.readLine())!=null)
{
String [] wrds = s.split("\t");
HashMap<String,Integer> interest = new HashMap<>();
for(int i=0;i<langs.size();i++)
interest.put(langs.get(i), Integer.parseInt(wrds[i+1]));
iot.put(wrds[0], interest);
}
fh.close();
HashMap<Integer,HashMap<String,HashMap<String,Integer>>>
regionsByYear = new HashMap<>();
for (int i=2004;i<2016;i++)
{
BufferedReader fh1 =
new BufferedReader(new FileReader(i+".txt"));
String s1 = fh1.readLine(); //Throw away the first line
HashMap<String,HashMap<String,Integer>> year = new HashMap<>();
while ((s1=fh1.readLine())!=null)
{
String [] wrds = s1.split("\t");
HashMap<String,Integer>langMap = new HashMap<>();
for(int j=1;j<wrds.length;j++){
langMap.put(langs.get(j-1), Integer.parseInt(wrds[j]));
}
year.put(wrds[0],langMap);
}
regionsByYear.put(i,year);
fh1.close();
}
}
}
Create a Map<String, Integer> using a HashMap, and each time you find a new country while scanning the incoming data, add it to the map as country -> 0. Each time you find non-zero Python interest for that country, increment the value.
At the end, loop through the map's entrySet() and, for each entry e whose e.getValue() is zero, output e.getKey().
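A minimal sketch of that approach applied to the regionsByYear structure built in the question (year -> region -> language -> interest). Instead of incrementing, it sums each region's Python interest over all years; assuming interest values are never negative, a total of 0 means 0 in every year the region appears. ZeroInterest, regionsWithZeroInterest and the python(...) helper are hypothetical names, and the wildcard parameter type lets the question's nested HashMap be passed in directly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ZeroInterest {
    /** Returns the regions (sorted) whose interest in `language` sums to 0 over all years. */
    static List<String> regionsWithZeroInterest(
            Map<Integer, ? extends Map<String, ? extends Map<String, Integer>>> regionsByYear,
            String language) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map<String, ? extends Map<String, Integer>> year : regionsByYear.values()) {
            for (Map.Entry<String, ? extends Map<String, Integer>> region : year.entrySet()) {
                int interest = region.getValue().getOrDefault(language, 0);
                totals.merge(region.getKey(), interest, Integer::sum); // first sighting adds the region
            }
        }
        List<String> zero = new ArrayList<>();
        for (Map.Entry<String, Integer> e : totals.entrySet()) {
            if (e.getValue() == 0) zero.add(e.getKey());
        }
        return zero;
    }

    /** Builds a language map holding only the Python column (illustrative helper). */
    static Map<String, Integer> python(int interest) {
        Map<String, Integer> m = new HashMap<>();
        m.put("python", interest);
        return m;
    }

    public static void main(String[] args) {
        // Python column of the 2004 sample from the question; other languages omitted.
        Map<String, Map<String, Integer>> y2004 = new HashMap<>();
        y2004.put("Argentina", python(0));
        y2004.put("Australia", python(64));
        y2004.put("Austria", python(31));
        y2004.put("Belgium", python(34));
        y2004.put("Bolivia", python(0));
        Map<Integer, Map<String, Map<String, Integer>>> regionsByYear = new HashMap<>();
        regionsByYear.put(2004, y2004);
        List<String> zero = regionsWithZeroInterest(regionsByYear, "python");
        System.out.println(zero.size() + " regions: " + zero);
    }
}
```

The number of countries the question asks for is simply the size of the returned list.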