Java Parsing a String to Extract Data - java

I have to write a program but I have no idea where to start. Can anyone help me with an outline of how I should go about it? please excuse my novice level at programming. I have provided the input and output of the program.
The trouble that I'm facing is how do I handle the input text? How should I store the input text to extract the data that I need to produce the output commands? Any guidance would be so helpful.
A little explanation of the input:
The output will start with APPLE1: CT= (whatever number is there for CT in line 4)
The following lines of the output will begin with "APPLES:"
I must include and extract the values for CR, PLANTING and RW in the output.
Wherever there is a non-zero or not null in the DATA portion, it will appear in the output.
When the program reads END, "APP;APPLER:CT=(whatever number);" will be the last two commands
INPUT:
<apple:ct=12;
FARM DATA
INPUT DATA
CT CH CR PLANTING RW DATA
12 YES PG -0 FA=1 R=CODE1 MM2 COA COB CI COC COD
0 0 1 0
COE RN COF COG COH
4 00 0
COI COJ D
0
FA=2 R=CODE2 112 COA COB CI COC COD
0 0 0 0
COE RN COF COG COH
4 00 0
COI COJ D
7
END
OUPUT:
APPLE1:CT=12;
APPLES:CR=PG-0,FA=1,R=CODE1,RW=MM2,COC=1,COE=4;
APPLES:FA=2,R=CODE2,RW=112,COE=4,COI=7;
APP;
APPLER:CT=12;

Related

Talend : capture value from row1 and replace it in the entire column

I want to take uk from 1st row and replace it in the entire country column without changing the values in zones. I have tried regex expression from expression builder but failed.
COUNTRY
ZONE
UK
12
AU
44
FR
21
GER
20
FR
02
Your job design will look like this
Second , using a tSampleRow you will get the range of lines (in your case you would like first line )
Third , stock your wanted line in a global variable like this
Finally , in the tmap just get your global variable as such
Here is the output (I have 201 lignes i will have 201 UK printed ):
.--------.
|tLogRow_1|
|=------=|
|mystring|
|=------=|
|UK|
|UK |
|UK |
|UK |
|UK |
'--------'
[statistics] disconnected
Job operation ended at 14:00 21/02/2022. [exit code = 0]

Predict function R returns 0.0 [duplicate]

I posted earlier today about an error I was getting with using the predict function. I was able to get that corrected, and thought I was on the right path.
I have a number of observations (actuals) and I have a few data points that I want to extrapolate or predict. I used lm to create a model, then I tried to use predict with the actual value that will serve as the predictor input.
This code is all repeated from my previous post, but here it is:
df <- read.table(text = '
Quarter Coupon Total
1 "Dec 06" 25027.072 132450574
2 "Dec 07" 76386.820 194154767
3 "Dec 08" 79622.147 221571135
4 "Dec 09" 74114.416 205880072
5 "Dec 10" 70993.058 188666980
6 "Jun 06" 12048.162 139137919
7 "Jun 07" 46889.369 165276325
8 "Jun 08" 84732.537 207074374
9 "Jun 09" 83240.084 221945162
10 "Jun 10" 81970.143 236954249
11 "Mar 06" 3451.248 116811392
12 "Mar 07" 34201.197 155190418
13 "Mar 08" 73232.900 212492488
14 "Mar 09" 70644.948 203663201
15 "Mar 10" 72314.945 203427892
16 "Mar 11" 88708.663 214061240
17 "Sep 06" 15027.252 121285335
18 "Sep 07" 60228.793 195428991
19 "Sep 08" 85507.062 257651399
20 "Sep 09" 77763.365 215048147
21 "Sep 10" 62259.691 168862119', header=TRUE)
str(df)
'data.frame': 21 obs. of 3 variables:
$ Quarter : Factor w/ 24 levels "Dec 06","Dec 07",..: 1 2 3 4 5 7 8 9 10 11 ...
$ Coupon: num 25027 76387 79622 74114 70993 ...
$ Total: num 132450574 194154767 221571135 205880072 188666980 ...
Code:
model <- lm(df$Total ~ df$Coupon, data=df)
> model
Call:
lm(formula = df$Total ~ df$Coupon)
Coefficients:
(Intercept) df$Coupon
107286259 1349
Predict code (based on previous help):
(These are the predictor values I want to use to get the predicted value)
Quarter = c("Jun 11", "Sep 11", "Dec 11")
Total = c(79037022, 83100656, 104299800)
Coupon = data.frame(Quarter, Total)
Coupon$estimate <- predict(model, newdate = Coupon$Total)
Now, when I run that, I get this error message:
Error in `$<-.data.frame`(`*tmp*`, "estimate", value = c(60980.3823396919, :
replacement has 21 rows, data has 3
My original data frame that I used to build the model had 21 observations in it. I am now trying to predict 3 values based on the model.
I either don't truly understand this function, or have an error in my code.
Help would be appreciated.
Thanks
First, you want to use
model <- lm(Total ~ Coupon, data=df)
not model <-lm(df$Total ~ df$Coupon, data=df).
Second, by saying lm(Total ~ Coupon), you are fitting a model that uses Total as the response variable, with Coupon as the predictor. That is, your model is of the form Total = a + b*Coupon, with a and b the coefficients to be estimated. Note that the response goes on the left side of the ~, and the predictor(s) on the right.
Because of this, when you ask R to give you predicted values for the model, you have to provide a set of new predictor values, ie new values of Coupon, not Total.
Third, judging by your specification of newdata, it looks like you're actually after a model to fit Coupon as a function of Total, not the other way around. To do this:
model <- lm(Coupon ~ Total, data=df)
new.df <- data.frame(Total=c(79037022, 83100656, 104299800))
predict(model, new.df)
Thanks Hong, that was exactly the problem I was running into. The error you get suggests that the number of rows is wrong, but the problem is actually that the model has been trained using a command that ends up with the wrong names for parameters.
This is really a critical detail that is entirely non-obvious for lm and so on. Some of the tutorial make reference to doing lines like lm(olive$Area#olive$Palmitic) - ending up with variable names of olive$Area NOT Area, so creating an entry using anewdata<-data.frame(Palmitic=2) can't then be used. If you use lm(Area#Palmitic,data=olive) then the variable names are right and prediction works.
The real problem is that the error message does not indicate the problem at all:
Warning message: 'anewdata' had 1 rows but variable(s) found to have X
rows
instead of newdata you are using newdate in your predict code, verify once. and just use Coupon$estimate <- predict(model, Coupon)
It will work.
To avoid error, an important point about the new dataset is the name of independent variable. It must be the same as reported in the model. Another way is to nest the two function without creating a new dataset
model <- lm(Coupon ~ Total, data=df)
predict(model, data.frame(Total=c(79037022, 83100656, 104299800)))
Pay attention on the model. The next two commands are similar, but for predict function, the first work the second don't work.
model <- lm(Coupon ~ Total, data=df) #Ok
model <- lm(df$Coupon ~ df$Total) #Ko

Origin Destination matrix with Open Trip Planner scripting in Jython / Python

I'm trying to write a Python script to build an Origin Destination matrix using Open Trip Planner (OTP) but I'm very new to Python and OTP.
I am trying to use OTP scripting in Jython / Python to build an origin-destination matrix with travel times between pairs of locations. In short, the idea is to launch a Jython jar file to call the test.py python script but I'm struggling to get the the python script to do what I want.
A light and simple reproducible example is provided here And here is the python Script I've tried.
Python Script
#!/usr/bin/jython
from org.opentripplanner.scripting.api import *
# Instantiate an OtpsEntryPoint
otp = OtpsEntryPoint.fromArgs(['--graphs', 'C:/Users/rafa/Desktop/jython_portland',
'--router', 'portland'])
# Get the default router
router = otp.getRouter('portland')
# Create a default request for a given time
req = otp.createRequest()
req.setDateTime(2015, 9, 15, 10, 00, 00)
req.setMaxTimeSec(1800)
req.setModes('WALK,BUS,RAIL')
# Read Points of Origin
points = otp.loadCSVPopulation('points.csv', 'Y', 'X')
# Read Points of Destination
dests = otp.loadCSVPopulation('points.csv', 'Y', 'X')
# Create a CSV output
matrixCsv = otp.createCSVOutput()
matrixCsv.setHeader([ 'Origin', 'Destination', 'min_time' ])
# Start Loop
for origin in points:
print "Processing: ", origin
req.setOrigin(origin)
spt = router.plan(req)
if spt is None: continue
# Evaluate the SPT for all points
result = spt.eval(dests)
# Find the time to other points
if len(result) == 0: minTime = -1
else: minTime = min([ r.getTime() for r in result ])
# Add a new row of result in the CSV output
matrixCsv.addRow([ origin.getStringData('GEOID'), r.getIndividual().getStringData('GEOID'), minTime ])
# Save the result
matrixCsv.save('traveltime_matrix.csv')
The output should look something like this:
GEOID GEOID travel_time
1 1 0
1 2 7
1 3 6
2 1 10
2 2 0
2 3 12
3 1 5
3 2 10
3 3 0
ps. I've tried to create a new tag opentripplanner in this question but I don't have enough reputation for doing that.
Laurent Grégoire has kindly answered the quesiton on Github, so I reproduce here his solution.
This code works but still it would take a long time to compute large OD matrices (say more than 1 million pairs). Hence, any alternative answers that improve the speed/efficiency of the code would be welcomed!
#!/usr/bin/jython
from org.opentripplanner.scripting.api import OtpsEntryPoint
# Instantiate an OtpsEntryPoint
otp = OtpsEntryPoint.fromArgs(['--graphs', '.',
'--router', 'portland'])
# Start timing the code
import time
start_time = time.time()
# Get the default router
# Could also be called: router = otp.getRouter('paris')
router = otp.getRouter('portland')
# Create a default request for a given time
req = otp.createRequest()
req.setDateTime(2015, 9, 15, 10, 00, 00)
req.setMaxTimeSec(7200)
req.setModes('WALK,BUS,RAIL')
# The file points.csv contains the columns GEOID, X and Y.
points = otp.loadCSVPopulation('points.csv', 'Y', 'X')
dests = otp.loadCSVPopulation('points.csv', 'Y', 'X')
# Create a CSV output
matrixCsv = otp.createCSVOutput()
matrixCsv.setHeader([ 'Origin', 'Destination', 'Walk_distance', 'Travel_time' ])
# Start Loop
for origin in points:
print "Processing origin: ", origin
req.setOrigin(origin)
spt = router.plan(req)
if spt is None: continue
# Evaluate the SPT for all points
result = spt.eval(dests)
# Add a new row of result in the CSV output
for r in result:
matrixCsv.addRow([ origin.getStringData('GEOID'), r.getIndividual().getStringData('GEOID'), r.getWalkDistance() , r.getTime()])
# Save the result
matrixCsv.save('traveltime_matrix.csv')
# Stop timing the code
print("Elapsed time was %g seconds" % (time.time() - start_time))

Java - get parent process

I think Java doesn't provide much in their API about getting processes, is there a way that you could get a parent's process PID/ID in Java?
If you're running on Linux you can check procfs using /proc/self/stat.
Following from #Imz's answer, on Linux, grab the output of /proc/self/stat (it's a one-line file, so just read it like a normal file)
43732 (java) S 43725 43725 11210 34822 43725 4202496 127791 387073
4055 0 3188 79 4597 253 20 0 53 0 16217706 39231705088 188764
18446744073709551615 4194304 4196452 140735605394256 140735605376816
274479481597 0 0 2 16800973 18446744073709551615 0 0 17 13 0 0 0 0 0
The 4th field (in bold above) is your parent process id

How to parse custom text file for analyize

I'm having a logfile like this:
1352711989.822313 SENDING
SR packet
SSRC 3760482201
NTP timestamp: 1352711989.822293
RTP timestamp: 30163617
Packets sent: 17
Octets sent: 85
RR block 1
SSRC 2520738017
Fraction lost: 0
Packets lost: 0
Ext. high. seq. nr: 64175
Jitter: 2947
LSR: 1035041236
DLSR: 284839
RR block 2
SSRC 2158728709
Fraction lost: 14
Packets lost: 43
Ext. high. seq. nr: 54178
Jitter: 394
LSR: 1035176766
DLSR: 149303
RR block 3
SSRC 100700967
Fraction lost: 36
Packets lost: 120
Ext. high. seq. nr: 45647
Jitter: 2365
LSR: 1035002733
DLSR: 323342
SDES Chunk:
SSRC: 3760482201
And I would like to parse every block into an object, So I made some classes in java.
Now is there a way to smoothly go through this text and put everyting in the right var and do that for the whole text file?
So at the end I have a list with objects that contains there information.
Since it looks like everything here is name value pairs the easiest thing to do would be to iterate over the lines of the file and add each line to a map. If the name blocks are important (ex. RR block 2) you can create a list that you add the current name piece to when you don't have a value, then pop it off once you get to the next piece with the same indentation.

Categories

Resources