I'm just wondering what would be the most appropriate/most efficient way to remove spaces in a data file (.txt) and save the results as a list of objects?
Here is a snippet of the data:
2014-03-24 19:11:42.838 7611.668 UDP 192.168.0.15:5353 -> 224.0.0.251:5353 53 5353 12
2014-03-24 19:03:30.710 8061.709 UDP 192.168.0.12:137 -> 192.168.0.255:137 374 30432 9
2014-03-24 19:13:55.651 7246.821 UDP 192.168.0.21:1024 -> 255.255.255.255:1900 24 9640 8
Just looking to save them as a List of Flows
Just use:
String s = data.replaceAll("\\s", "");
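That collapses everything into one string, though. If the end goal is a List of Flow objects, a sketch that splits each line on runs of whitespace might look like the following (the field names after the addresses are assumptions, since the sample doesn't label its columns):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

// The last three column names (bytes, packets, flows) are guesses from the sample.
public record Flow(String date, String time, double duration, String protocol,
                   String source, String destination, long bytes, long packets,
                   int flows) {

    // Splits one line on runs of whitespace; index 5 is the "->" arrow and is skipped.
    public static Flow fromLine(String line) {
        String[] f = line.trim().split("\\s+");
        return new Flow(f[0], f[1], Double.parseDouble(f[2]), f[3],
                f[4], f[6], Long.parseLong(f[7]), Long.parseLong(f[8]),
                Integer.parseInt(f[9]));
    }

    public static List<Flow> readAll(Path file) throws IOException {
        try (var lines = Files.lines(file)) {
            return lines.filter(l -> !l.isBlank()).map(Flow::fromLine).toList();
        }
    }
}
```

Calling Flow.readAll(Path.of("data.txt")) then yields the whole list in one pass.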
Query:
To filter the data below to find the last date of each month in the list. Note that, in this context, the last date of a month in the data may or may not match the last date of the calendar month. The expected output is shown in the second list.
Research:
I believe TemporalAdjusters.lastDayOfMonth() will not help in this case, as the last date in the list may or may not match the calendar month's last date.
I checked several questions on Stack Overflow and Googled as well, but I was unable to find anything similar to my need. I hope the issue is clear; please point me in the direction of how this can be done with streams, as I don't want to use a for loop.
Sample Data:
Date        Model  Start  End
27-11-1995  ABC    241    621
27-11-1995  XYZ    3456   7878
28-11-1995  ABC    242    624
28-11-1995  XYZ    3457   7879
29-11-1995  ABC    243    627
29-11-1995  XYZ    3458   7880
30-11-1995  ABC    244    630
30-11-1995  XYZ    3459   7881
01-12-1995  ABC    245    633
01-12-1995  XYZ    3460   7882
04-12-1995  ABC    246    636
04-12-1995  XYZ    3461   7883
27-12-1995  ABC    247    639
27-12-1995  XYZ    3462   7884
28-12-1995  ABC    248    642
28-12-1995  XYZ    3463   7885
29-12-1995  ABC    249    645
29-12-1995  XYZ    3464   7886
01-01-1996  ABC    250    648
01-01-1996  XYZ    3465   7887
02-01-1996  ABC    251    651
02-01-1996  XYZ    3466   7888
29-01-1996  ABC    252    654
29-01-1996  XYZ    3467   7889
30-01-1996  ABC    253    657
30-01-1996  XYZ    3468   7890
31-01-1996  ABC    254    660
31-01-1996  XYZ    3469   7891
Output required:
Date        Model  Start  End
30-11-1995  ABC    244    630
30-11-1995  XYZ    3459   7881
29-12-1995  ABC    249    645
29-12-1995  XYZ    3464   7886
31-01-1996  ABC    254    660
31-01-1996  XYZ    3469   7891
Well, a combination of groupingBy and maxBy will probably do.
I assume each record of the table to be of type Event:
record Event(LocalDate date, String model, int start, int end) { }
To get the last days of the month which are within the table, we could utilize groupingBy. In order to group this, we could first create a grouping type. Below, I created an EventGrouping record1, with a static method to convert an Event to an EventGrouping. Your desired output suggests that you want to group by each year-month-model combination, so we just picked those two properties:
public record EventGrouping(YearMonth yearMonth, String model) {
    public static EventGrouping fromEvent(Event event) {
        return new EventGrouping(YearMonth.from(event.date()), event.model());
    }
}
Then, we could get our desired result like this:
events.stream()
    .collect(Collectors.groupingBy(
        EventGrouping::fromEvent,
        Collectors.maxBy(Comparator.comparing(Event::date))
    ));
What happens here is that all stream elements are grouped by our EventGrouping, and then the "maximum value" of each of the event groups is picked. The maximum value is, of course, the most recent date of that certain month.
Note that maxBy returns an Optional, for the case when a group is empty. Also note that the resulting Map is unordered.
We could fix both of these issues by using collectingAndThen and a map factory respectively:
Map<EventGrouping, Event> map = events.stream()
    .collect(groupingBy(
        EventGrouping::fromEvent,
        () -> new TreeMap<>(Comparator.comparing(EventGrouping::yearMonth)
                .thenComparing(EventGrouping::model)),
        collectingAndThen(maxBy(Comparator.comparing(Event::date)), Optional::get)
    ));
Note: groupingBy, collectingAndThen and maxBy are all static imports from java.util.stream.Collectors.
We added a Supplier of a TreeMap. A TreeMap is a Map implementation with a predictable order by a given comparator. This allows us to iterate over the resulting entries ordered by year–month–model.
collectingAndThen allows us to apply a function to the result of the given Collector. As already mentioned, maxBy returns an Optional, because maxBy is not applicable if there are no elements in the source stream. However, in our case, this can never happen. So we can safely map the Optional to its contained value.
1 Instead of writing a custom type, you could also use an existing class holding two arbitrary values, such as a Map.Entry, a Pair or even a List<Object>.
I would suggest creating a Map<YearMonth, List<LocalDate>>, parsing all your dates, and filling the map. After that, sort each list; the last (or first, depending on sort order) value in each list will be your desired value.
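A minimal sketch of that suggestion, assuming the dates arrive as dd-MM-uuuu strings as in the sample data:

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.*;
import java.util.stream.Collectors;

public class LastDates {
    // Groups the parsed dates by YearMonth, then picks the last date of each group.
    public static Map<YearMonth, LocalDate> lastDatePerMonth(List<String> rawDates) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern("dd-MM-uuuu");
        Map<YearMonth, List<LocalDate>> byMonth = rawDates.stream()
                .map(s -> LocalDate.parse(s, fmt))
                .collect(Collectors.groupingBy(YearMonth::from));

        Map<YearMonth, LocalDate> result = new TreeMap<>();
        byMonth.forEach((ym, dates) -> {
            dates.sort(Comparator.naturalOrder());
            result.put(ym, dates.get(dates.size() - 1)); // last element after sorting
        });
        return result;
    }
}
```

The TreeMap keeps the result ordered by month, so iterating over it visits the months chronologically.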
I posted earlier today about an error I was getting with using the predict function. I was able to get that corrected, and thought I was on the right path.
I have a number of observations (actuals) and I have a few data points that I want to extrapolate or predict. I used lm to create a model, then I tried to use predict with the actual value that will serve as the predictor input.
This code is all repeated from my previous post, but here it is:
df <- read.table(text = '
Quarter Coupon Total
1 "Dec 06" 25027.072 132450574
2 "Dec 07" 76386.820 194154767
3 "Dec 08" 79622.147 221571135
4 "Dec 09" 74114.416 205880072
5 "Dec 10" 70993.058 188666980
6 "Jun 06" 12048.162 139137919
7 "Jun 07" 46889.369 165276325
8 "Jun 08" 84732.537 207074374
9 "Jun 09" 83240.084 221945162
10 "Jun 10" 81970.143 236954249
11 "Mar 06" 3451.248 116811392
12 "Mar 07" 34201.197 155190418
13 "Mar 08" 73232.900 212492488
14 "Mar 09" 70644.948 203663201
15 "Mar 10" 72314.945 203427892
16 "Mar 11" 88708.663 214061240
17 "Sep 06" 15027.252 121285335
18 "Sep 07" 60228.793 195428991
19 "Sep 08" 85507.062 257651399
20 "Sep 09" 77763.365 215048147
21 "Sep 10" 62259.691 168862119', header=TRUE)
str(df)
'data.frame': 21 obs. of 3 variables:
$ Quarter : Factor w/ 24 levels "Dec 06","Dec 07",..: 1 2 3 4 5 7 8 9 10 11 ...
$ Coupon: num 25027 76387 79622 74114 70993 ...
$ Total: num 132450574 194154767 221571135 205880072 188666980 ...
Code:
model <- lm(df$Total ~ df$Coupon, data=df)
> model
Call:
lm(formula = df$Total ~ df$Coupon)
Coefficients:
(Intercept) df$Coupon
107286259 1349
Predict code (based on previous help):
(These are the predictor values I want to use to get the predicted value)
Quarter = c("Jun 11", "Sep 11", "Dec 11")
Total = c(79037022, 83100656, 104299800)
Coupon = data.frame(Quarter, Total)
Coupon$estimate <- predict(model, newdate = Coupon$Total)
Now, when I run that, I get this error message:
Error in `$<-.data.frame`(`*tmp*`, "estimate", value = c(60980.3823396919, :
replacement has 21 rows, data has 3
My original data frame that I used to build the model had 21 observations in it. I am now trying to predict 3 values based on the model.
I either don't truly understand this function, or have an error in my code.
Help would be appreciated.
Thanks
First, you want to use
model <- lm(Total ~ Coupon, data=df)
not model <- lm(df$Total ~ df$Coupon, data=df).
Second, by saying lm(Total ~ Coupon), you are fitting a model that uses Total as the response variable, with Coupon as the predictor. That is, your model is of the form Total = a + b*Coupon, with a and b the coefficients to be estimated. Note that the response goes on the left side of the ~, and the predictor(s) on the right.
Because of this, when you ask R to give you predicted values for the model, you have to provide a set of new predictor values, ie new values of Coupon, not Total.
Third, judging by your specification of newdata, it looks like you're actually after a model to fit Coupon as a function of Total, not the other way around. To do this:
model <- lm(Coupon ~ Total, data=df)
new.df <- data.frame(Total=c(79037022, 83100656, 104299800))
predict(model, new.df)
Thanks Hong, that was exactly the problem I was running into. The error you get suggests that the number of rows is wrong, but the problem is actually that the model was trained using a command that ends up with the wrong names for the predictors.
This is a really critical detail that is entirely non-obvious for lm and friends. Some tutorials use lines like lm(olive$Area ~ olive$Palmitic), which ends up with a variable name of olive$Area, NOT Area, so creating an entry using anewdata <- data.frame(Palmitic=2) can't then be used. If you use lm(Area ~ Palmitic, data=olive), then the variable names are right and prediction works.
The real problem is that the error message does not indicate the problem at all:
Warning message: 'anewdata' had 1 rows but variable(s) found to have X rows
You are using newdate instead of newdata in your predict code; verify it. Then just use Coupon$estimate <- predict(model, Coupon).
It will work.
To avoid the error, an important point about the new dataset is the name of the independent variable: it must be the same as the one used in the model. Another way is to nest the two functions without creating a new dataset:
model <- lm(Coupon ~ Total, data=df)
predict(model, data.frame(Total=c(79037022, 83100656, 104299800)))
Pay attention to how the model is specified. The next two commands look similar, but for the predict function the first works and the second doesn't:
model <- lm(Coupon ~ Total, data=df)  # works
model <- lm(df$Coupon ~ df$Total)     # doesn't work with predict
I am a newbie in Druid. My problem is how to store and query a HashMap in Druid, using Java to interact with it.
I have a network table as follows:
Network  f1  f2  f3  ...  fn
value    1   3   2   ...  2
Additionally, I have a range-time table:
time           impression
2016-08-10-00  1000
2016-08-10-00  3000
2016-08-10-00  4000
2016-08-10-00  2000
2016-08-10-00  8000
In Druid, can I store the range-time table as a HashMap and query both of the tables above with a statement like:
Filter f1 = 1 and f2 = 1 and range-time between [t1, t2]?
Can anyone help me? Thanks so much.
@VanThaoNguye,
Yes, you can store the hashmaps in Druid, and you can query them with bound filters.
You can read more about bound filters here: http://druid.io/docs/latest/querying/filters.html#bound-filter
I am in a difficult situation now where I need to make a parser to parse a formatted document from Tekla, to be processed in the database.
In the .CSV I have this:
,SMS-PW-BM31,,1,,,,287.9
,,SMS-PW-BM31,1,H350*175*7*11,SS400,5805,287.9
,------------,--------------,----,---------------,--------,------------,---------
,SMS-PW-BM32,,1,,,,405.8
,,SMSPW-H707,1,H350*175*7*11,SS400,6697,332.2
,,SMSPW-EN12,1,PLT12x175,SS400,500,8.2
,,SMSPW-EN14,1,PLT16x175,SS400,500,11
,------------,--------------,----,---------------,--------,------------,---------
That is the document generated by the Tekla software. What I expect as output is something like this:
HEAD-MARK    COMPONENT-TYPE  QUANTITY  PROFILE        GRADE  LENGTH  WEIGHT
SMS-PW-BM31                  1                                       287.9
SMS-PW-BM31  SMS-PW-BM31     1         H350*175*7*11  SS400  5805    287.9
SMS-PW-BM32                  1                                       405.8
SMS-PW-BM32  SMSPW-H707      1         H350*175*7*11  SS400  6697    332.2
SMS-PW-BM32  SMSPW-EN12      1         PLT12X175      SS400  500     8.2
SMS-PW-BM32  SMSPW-EN14      1         PLT16X175      SS400  500     11
How do I start in Java? The most complicated thing is distributing the head mark across the rows of its group, which are separated by the '-' lines.
The CSV format is quite simple: there is a column delimiter, a comma (,), and a row delimiter, a new line (\n). Some columns will be surrounded by quotes (") to contain column data, but it looks like you won't have to worry about that given your current file.
Look at String.split and you will find your answer after a bit of pondering.
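A minimal sketch of that approach, assuming the layout shown in the question: a group header carries its mark in column 1, detail lines carry theirs in column 2, and dashed lines separate groups (the Row field names are guesses from the expected output):

```java
import java.util.ArrayList;
import java.util.List;

public class TeklaParser {
    // One output row: the head mark carried down plus the line's own columns.
    public record Row(String headMark, String componentType, String quantity,
                      String profile, String grade, String length, String weight) { }

    public static List<Row> parse(List<String> lines) {
        List<Row> rows = new ArrayList<>();
        String headMark = "";
        for (String line : lines) {
            String[] c = line.split(",", -1); // -1 keeps trailing empty columns
            if (c.length < 8 || c[1].startsWith("---")) {
                continue; // skip separator rows and anything malformed
            }
            if (!c[1].isEmpty()) {
                headMark = c[1]; // a new group starts: remember its head mark
            }
            rows.add(new Row(headMark, c[2], c[3], c[4], c[5], c[6], c[7]));
        }
        return rows;
    }
}
```

Carrying headMark in a variable across iterations is what "distributes" the mark onto every detail row of its group.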
Hi, I would like to know how I can parse data from a non-structured file. The data is laid out like a table, but there are just spaces between values. Here is an example:
DEST ULD ULD XXX NEXT NEXT XXX XXX
XXX/ XXX TYPE/ XXX XXX PCS WGT
BULK NBR SUBTYPE NBR DEST
XXX BULK BF 0
XXX BULK BB 39
XXX BULK BB 1
XXX BULK BF 0
XXX BULK BB 0
I can't use a delimiter such as useDelimiter("\\s{2,9}") because the number of spaces changes between columns...
Any idea?
What you have is called fixed-width format. In some ways it is easier; see What's the best way of parsing a fixed-width formatted file in Java?
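As a sketch, fixed-width columns can be cut out with substring; the offsets below are illustrative and would have to be measured from the real file:

```java
import java.util.ArrayList;
import java.util.List;

public class FixedWidthParser {
    // Illustrative column boundaries (start, end); measure the real ones from the file.
    private static final int[][] COLUMNS = { {0, 8}, {8, 16}, {16, 26}, {26, 34} };

    public static List<String[]> parse(List<String> lines) {
        List<String[]> rows = new ArrayList<>();
        for (String line : lines) {
            String[] row = new String[COLUMNS.length];
            for (int i = 0; i < COLUMNS.length; i++) {
                // Clamp to the line length so short lines don't throw.
                int start = Math.min(COLUMNS[i][0], line.length());
                int end = Math.min(COLUMNS[i][1], line.length());
                row[i] = line.substring(start, end).trim();
            }
            rows.add(row);
        }
        return rows;
    }
}
```

Unlike a delimiter, this works no matter how many spaces pad each column, because the positions, not the separators, define the fields.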