Math problem bleg
Can you math wizards answer this one plz?
Year 0 16.83
Year 5 18.57
Year 10 21.38
Year 15 29.36
Year 20 36.05
What does this series extrapolate out to in years 25, 30, and 35?
TIA
Simple linear regression indicates the values should be:
25 yrs: 39.21
30 yrs: 44.13
35 yrs: 49.05
I haven’t looked at the possibility this is a curve rather than linear, so take that with a grain of salt.
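For anyone who wants to check the straight-line numbers, here is a minimal sketch of an ordinary least-squares fit in plain Python. The variable and function names are my own, not anything from the original post, and it assumes the five posted values are the whole data set.

```python
# Ordinary least-squares line through the five posted points (pure Python).
years = [0, 5, 10, 15, 20]
values = [16.83, 18.57, 21.38, 29.36, 36.05]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


intercept, slope = fit_line(years, values)

# Extrapolate the fitted line to the three requested years.
linear_forecast = {yr: round(intercept + slope * yr, 2) for yr in (25, 30, 35)}
print(linear_forecast)
```

The fitted slope is a little under one unit per year, so each five-year step adds about 4.9 to the forecast.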
I got the same. Campos’s data fits a quadratic very nicely, but I am wary of over-fitting given that we have such a small data set.
I agree – one data point every five years is enough to make me squirm in my chair. :o)
It’s parabolic. I can’t tell for sure from the data points, but the rate of increase goes up through Year 15 and then declines back toward roughly the previous increment. So it should either stabilize in year 20 or even start to decline.
A quadratic equation fits the points better (r^2 for linear is 0.930, for quadratic it’s 0.992, and the improvement in fit is almost significant, P=0.06). With the quadratic, the predicted values are
25 yrs: 46.74
30 yrs: 59.20
35 yrs: 73.81
The difference in predictions is a good illustration of why extrapolation is scary.
Adding: Given that we have no idea where this data comes from, it makes no sense to resort to P-values.
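A rough sketch of the linear-versus-quadratic comparison, for readers who want to reproduce it without a stats package. It solves the least-squares normal equations with a small hand-written Gaussian elimination. The helper names (`solve`, `fit_poly`, `r_squared`) are hypothetical, and this is one way to do the fit, not necessarily how the numbers above were produced.

```python
years = [0, 5, 10, 15, 20]
values = [16.83, 18.57, 21.38, 29.36, 36.05]


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_poly(xs, ys, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    size = degree + 1
    a = [[sum(x ** (i + j) for x in xs) for j in range(size)] for i in range(size)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(size)]
    return solve(a, b)


def predict(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


def r_squared(xs, ys, coeffs):
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


line = fit_poly(years, values, 1)
quad = fit_poly(years, values, 2)
r2_line = r_squared(years, values, line)
r2_quad = r_squared(years, values, quad)
quad_forecast = {yr: round(predict(quad, yr), 2) for yr in (25, 30, 35)}
print(round(r2_line, 3), round(r2_quad, 3), quad_forecast)
```

With only five points the quadratic has just two residual degrees of freedom, so its higher R^2 (about 0.992 against 0.930) says little about how far its curve can be trusted beyond year 20.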
It’s an institutional budget, with values in constant dollars.
Well, I no do math gud like the people answering you but holy shit that looks bad.
Depending on your audience, I might do a comparison of projections based on all 20 years vs. the last 10, or the first 10 versus the last 10.
Especially if you want to use linear curves, and really hammer them on how unsustainable the second decade is in comparison to the first.
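One way to sketch that first-decade versus second-decade comparison is to fit a separate straight line to years 0-10 and to years 10-20, then carry each out to year 35. Sharing year 10 between both halves is my assumption, as are the names below.

```python
years = [0, 5, 10, 15, 20]
values = [16.83, 18.57, 21.38, 29.36, 36.05]


def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through (xs, ys)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Year 10 is included in both halves (an assumption; other splits are possible).
first_icpt, first_slope = fit_line(years[:3], values[:3])
last_icpt, last_slope = fit_line(years[2:], values[2:])

first_decade_at_35 = first_icpt + first_slope * 35
last_decade_at_35 = last_icpt + last_slope * 35

print(f"first decade: {first_slope:.3f}/yr -> year 35 = {first_decade_at_35:.2f}")
print(f"last decade:  {last_slope:.3f}/yr -> year 35 = {last_decade_at_35:.2f}")
```

The second-decade slope comes out a bit more than three times the first-decade slope, which is the contrast the comparison is meant to show.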
If you have values in constant dollars, you can also normalize for inflation, given the starting year. That might give a better clue as to what curve fits better.
I’ll grant that future inflation is more difficult to predict, but it’s probably a good assumption that over the next 15 years, the Fed will still try to keep inflation lower than it should be.
$36.05 isn’t much of a budget. No wonder you want free modelling.
“It’s an institutional budget, with values in constant dollars.”
Then toss out anything after the 10-year mark. Projections that far out are worthless as predictors: too many unknowns.
Agreed–the choice of whether to fit a linear, quadratic, or some other equation to these data has to depend on an understanding of what the data mean, not just the numbers themselves. That the choice of model, which could be based on a subjective hunch about what’s going on, has such a large effect on the predictions, is sobering.
Yep.
If a significant and predictable portion of the budget is dedicated to fundraising, you could even justify fitting it to an exponential.
In the absence of information about what kind of curve … quadratic? harmonic? exponential? hyperbolic? … or how those numbers were generated … it could literally be anything at all. You would do as well to plot the points and draw a curve that is pleasing to you. That’s how it’s usually done, I suppose.
Looks like any of growth, exponential or logistic will fit the best. I would argue the quadratic overfits and likely is a model misspecification (i.e., backcasting would also get increases from year 0). The predictions for growth would be:
25yr: 42.86
30yr: 52.35
35yr: 63.94
It seems fitting that what drives me to stop lurking and make an account would be a stats question.
Results using R (lm function)
modeling y = x^2 gives an R^2 value of 0.9295
modeling log(y) = x, R^2 is 0.9613
for a linear fit, R^2 = 0.9295
Just looking at the fits, log(y) = x seems to work best with your data, in which case the values for years 25, 30, and 35 should be 42.47, 51.78, and 63.13.
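The log(y) = x model is an exponential-growth fit: regress the natural log of each value on the year, then exponentiate the predictions. A minimal plain-Python sketch follows (not the R session itself; the variable names are mine):

```python
import math

years = [0, 5, 10, 15, 20]
values = [16.83, 18.57, 21.38, 29.36, 36.05]
logs = [math.log(v) for v in values]

# Least-squares line through (year, log(value)).
n = len(years)
x_mean = sum(years) / n
l_mean = sum(logs) / n
sxx = sum((x - x_mean) ** 2 for x in years)
sxy = sum((x - x_mean) * (l - l_mean) for x, l in zip(years, logs))
rate = sxy / sxx                    # continuous growth rate per year
log_start = l_mean - rate * x_mean  # fitted log(value) at year 0

# R^2 on the log scale, which is what a fit of log(y) on x reports.
ss_tot = sum((l - l_mean) ** 2 for l in logs)
r2_log = (rate * sxy) / ss_tot

exp_forecast = {yr: round(math.exp(log_start + rate * yr), 2) for yr in (25, 30, 35)}
annual_growth_pct = (math.exp(rate) - 1) * 100

print(exp_forecast, round(r2_log, 4), round(annual_growth_pct, 2))
```

One caveat: this R^2 is computed on log(y), so it is not directly comparable with the R^2 of the linear and quadratic fits, which are computed on y itself.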
Whomp. I modeled the quadratic wrong. Modeling it properly, with lm(y ~ poly(x, 2)), gives the same results (and best fit) as John McDonald.
I made a graph, but the link thing isn’t working? https://drive.google.com/file/d/0B-AHksfDF9IXMzVuTkR2Nl9MRFE/view?usp=sharing
The footnote that Paul will write to attach to the extrapolated figure should be rather amusing.
*See flyingsquidwithgoggles, delazur, John McDonald, and cthulhu . . .
Exponential doesn’t look great. Rate jumps around from 2%/year to almost 7%/year.
Isn’t that like saying linear is a bad fit because the slope between adjacent data points jumps around?
I agree with Whirllaway. Depends on what is generating the growth in this data. The growth rates in each 5-year period are:
0-5: 10.3%
5-10: 15.1%
10-15: 37.3%
15-20: 22.8%
So, was year 10-15 just a bigger than expected bump in a continuing pattern of increasing growth in which case we might expect 27% growth to year 25 and 31% growth to year 30 (yielding 45.8 in year 25 and 60.0 in year 30)?
Or was year 10-15 a local peak in growth rates so that we might expect growth down to, say 16% in 20-25 and 11% in 25-30 (yielding 41.8 in year 25 and 46.4 in year 30)?
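The arithmetic behind those two scenarios can be sketched as below. The future growth rates (27% and 31%, or 16% and 11%) are the guesses from the comment above, not fitted values, and `project` is a hypothetical helper.

```python
values = [16.83, 18.57, 21.38, 29.36, 36.05]

# Percentage growth over each five-year period.
growth_pct = [round((later / earlier - 1) * 100, 1)
              for earlier, later in zip(values, values[1:])]


def project(start, rates_pct):
    """Compound a starting value through successive five-year growth rates."""
    path = []
    current = start
    for pct in rates_pct:
        current *= 1 + pct / 100
        path.append(round(current, 1))
    return path


accelerating = project(values[-1], [27, 31])  # growth keeps rising
slowing = project(values[-1], [16, 11])       # years 10-15 were a local peak

print(growth_pct)    # [10.3, 15.1, 37.3, 22.8]
print(accelerating)  # [45.8, 60.0]
print(slowing)       # [41.8, 46.4]
```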
I’d do this problem with a Bayesian model, with an AR(1) process for growth. Since those numbers look like percentages I’d do a logit transform first, and I’d also get some kind of estimate of measurement error, and put an informative empirical Bayesian prior to shrink the estimates.
I guess that this would give a point estimate in the range of 39-41 for year 25 and 41-45 for year 30. I expect that the error around that point estimate, assuming (ridiculously) that you had the right model and prior, would be about 2-3 points in year 25 and 3-5 in year 30. I’d then mentally scale that error by 1.25-1.5, to account for misspecification.
If you really wanted me to do it, I’d need to know what the data set was.
That error I suggested was for 1 standard error. If you want a 95% interval you double it. Meaning that you can’t rule out the series starting to shrink in years 20-25, though the odds are against it.
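To make the "can't rule out shrinking" point concrete, here is the interval arithmetic for year 25. Every input is a guess from the comments above, and taking the midpoint of each guessed range is my own assumption, so treat the result as illustrative only.

```python
# All inputs are guessed ranges from the thread; midpoints are an assumption.
point_estimate = 40.0   # midpoint of the guessed 39-41 for year 25
standard_error = 2.5    # midpoint of the guessed 2-3 points
misspec_factor = 1.375  # midpoint of the 1.25-1.5 misspecification scaling
year_20_value = 36.05   # last observed value

# "Double it" to turn one standard error into a rough 95% half-width.
half_width = 2 * standard_error * misspec_factor
interval = (point_estimate - half_width, point_estimate + half_width)

# If the lower bound sits below the year-20 value, a decline is not excluded.
could_shrink = interval[0] < year_20_value

print(interval, could_shrink)
```

Under these midpoints the lower bound falls a few points below 36.05, which is why a decline over years 20-25 stays inside the interval.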
Why I am an engineer and not a statistician. My answer would be a piece-wise linear interpolation: perfect fit!
Sure, it can’t predict out any further than the next data point with any confidence that would be interesting, but then again nothing really can.
So year 25, answer is around 42-43.
Years past that, it’s an unreasonable question, unless you’re just trying to make a rhetorical point.
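A quick sketch of where "around 42-43" comes from: extend the last piecewise-linear segment (years 15 to 20) one more step. For contrast it also extends each of the earlier segment slopes from the year-20 value, which is my own addition to show how much the answer depends on which segment you trust.

```python
years = [0, 5, 10, 15, 20]
values = [16.83, 18.57, 21.38, 29.36, 36.05]

# Slope of each segment between adjacent data points (units per year).
segment_slopes = [
    (y1 - y0) / (x1 - x0)
    for x0, x1, y0, y1 in zip(years, years[1:], values, values[1:])
]

# Carry the most recent slope forward five years from the last point.
year_25 = values[-1] + segment_slopes[-1] * 5

# Range of year-25 values if any one of the four historical slopes applied.
year_25_range = (
    values[-1] + min(segment_slopes) * 5,
    values[-1] + max(segment_slopes) * 5,
)

print(round(year_25, 2), [round(v, 2) for v in year_25_range])
```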
Agreed. Piece-wise linear. The question is: What happened in year 10 and will that continue?
They’re all the same order of magnitude, basically flat! /computer scientist
After the singularity, none of them will matter.
Ahem. The wingularity.
Piecewise linear makes sense – but given the data, I’d go with exponential. Running this for 25 and 30?
Year 25 … 42.47
Year 30 … 51.77
Year 35 … 63.12
Year 40 … 76.96
Characteristically, you asked for a regression of five data points, and got about half a dozen different answers. Highly instructive.
As others have stated it takes knowledge of the underlying process to know what curve to fit, and if fitting a curve is even appropriate. Things like shifts in costs, political climate, legal environment, personnel changes etc, could undermine any estimate.
On the other hand, if you’re using math to make a rhetorical point about something that appears to be growing without bound, fitting an exponential might provide the most bang for buck.
I can’t be fooled – these are the reciprocal probabilities of Hillary winning the presidency in 2016, 2020, 2024, etc as calculated by Sam Wang.
And the curve is really a flatline
Wait a minute, did they stop projections merely because they only have a mandate to do it until 20 years out, or because of the ‘2038 problem’ of systems that use a signed 32-bit int for time_t?
This is all too nerdy. Pick any curve fit from Excel and then tell everybody “I have proved with my spreadsheet that it’s going to be x.” That’s what most consultants do. Not the EIA: when forecasting the trend in renewable energy, they always go for linear, because Tradition.