
Math problem bleg

February 17, 2017

Can you math wizards answer this one plz?

Year 0    16.83
Year 5    18.57
Year 10   21.38
Year 15   29.36
Year 20   36.05

What does this series extrapolate out to in years 25, 30, and 35?


  • Simple linear regression indicates the values should be:

    25 yrs: 39.21
    30 yrs: 44.13
    35 yrs: 49.05

    I haven’t looked at the possibility this is a curve rather than linear, so take that with a grain of salt.
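
    For anyone who wants to check these figures, here is a minimal sketch of an ordinary least-squares line in plain Python. The helper name `fit_line` is made up for this illustration, and it assumes the five values in the post are the entire data set.

```python
# Ordinary least squares for y = intercept + slope * x, standard library only.
years = [0, 5, 10, 15, 20]
values = [16.83, 18.57, 21.38, 29.36, 36.05]

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = fit_line(years, values)
linear_forecast = {t: round(intercept + slope * t, 2) for t in (25, 30, 35)}
print(linear_forecast)
```

    The fitted slope works out to a bit under one unit per year, which is where the roughly 4.9-per-period steps in the forecast come from.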

    • delazeur

      I got the same. Campos’s data fits a quadratic very nicely, but I am wary of over-fitting given that we have such a small data set.

      • I agree – one data point every five years is enough to make me squirm in my chair. :o)

    • It’s parabolic — can’t tell for sure from the data points but rate of increase goes up through Year 15 then declines back toward approximately previous increment. So it should either stabilize in year 20 or even start to decline.

  • John McDonald

    A quadratic equation fits the points better (r^2 for linear is 0.930, for quadratic it’s 0.992, and the improvement in fit is almost significant, P=0.06). With the quadratic, the predicted values are

    25 yrs: 46.74
    30 yrs: 59.20
    35 yrs: 73.81

    The difference in predictions is a good illustration of why extrapolation is scary.
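
    A rough sketch for checking the r^2 comparison: it fits polynomials of degree 1 and 2 by solving the normal equations directly. The helper names (`polyfit`, `predict`, `r_squared`) are invented for this illustration, and the significance test is not reproduced here.

```python
years = [0, 5, 10, 15, 20]
values = [16.83, 18.57, 21.38, 29.36, 36.05]

def polyfit(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ...] via the normal equations."""
    m = degree + 1
    # Augmented matrix [X'X | X'y] for the polynomial basis 1, x, x^2, ...
    a = [[sum(x ** (i + j) for x in xs) for j in range(m)]
         + [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    for col in range(m):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * p for v, p in zip(a[r], a[col])]
    return [a[i][m] / a[i][i] for i in range(m)]

def predict(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))

def r_squared(coeffs, xs, ys):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

linear = polyfit(years, values, 1)
quadratic = polyfit(years, values, 2)
r2_linear = r_squared(linear, years, values)
r2_quadratic = r_squared(quadratic, years, values)
quadratic_forecast = {t: round(predict(quadratic, t), 2) for t in (25, 30, 35)}
print(round(r2_linear, 3), round(r2_quadratic, 3), quadratic_forecast)
```

    With only five points, the quadratic leaves just two residual degrees of freedom, which is part of why the improvement in fit is hard to call significant.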

    • mikenmar

      Adding: Given that we have no idea where this data comes from, it makes no sense to resort to P-values.

      • Paul Campos

        It’s an institutional budget, with values in constant dollars.

        • Rob in CT

          Well, I no do math gud like the people answering you but holy shit that looks bad.

        • Aaron Morrow

          Depending on your audience, I might do a comparison of projections based on all 20 years vs. the last 10, or the first 10 versus the last 10.

          Especially if you want to use linear curves, and really hammer them on how unsustainable the second decade is in comparison to the first.
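
          One way to sketch that comparison, assuming "first 10" means years 0 through 10 and "last 10" means years 10 through 20 (so year 10 is shared by both fits). The helpers `fit_line` and `project` are hypothetical names for this illustration.

```python
years = [0, 5, 10, 15, 20]
values = [16.83, 18.57, 21.38, 29.36, 36.05]

def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def project(xs, ys, targets=(25, 30, 35)):
    """Fit a line to the given window and extend it to the target years."""
    intercept, slope = fit_line(xs, ys)
    return slope, [round(intercept + slope * t, 2) for t in targets]

first_slope, first_proj = project(years[:3], values[:3])  # years 0-10
last_slope, last_proj = project(years[2:], values[2:])    # years 10-20
print(f"first decade: {first_slope:.3f}/yr -> {first_proj}")
print(f"last decade:  {last_slope:.3f}/yr -> {last_proj}")
```

          On these numbers the second-decade slope is roughly three times the first-decade slope, which is the contrast the comparison is meant to bring out.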

        • SIWOTI

          If you have values in constant dollars, you can also normalize for inflation, given the starting year. That might give a better clue as to what curve fits better.

          I’ll grant that future inflation is more difficult to predict, but it’s probably a good assumption that over the next 15 years, the Fed will still try to keep inflation lower than it should be.

        • njorl

          $36.05 isn’t much of a budget. No wonder you want free modelling.

        • cpinva

          “It’s an institutional budget, with values in constant dollars.”

          then toss out anything after the 10 year mark, they’re worthless as predictors that far out, too many unknowns.

      • John McDonald

        Agreed–the choice of whether to fit a linear, quadratic, or some other equation to these data has to depend on an understanding of what the data mean, not just the numbers themselves. That the choice of model, which could be based on a subjective hunch about what’s going on, has such a large effect on the predictions, is sobering.

        • njorl

          If a significant and predictable portion of the budget is dedicated to fundraising, you could even justify fitting it to an exponential.

  • Whirrlaway

    In the absence of information about what kind of curve … quadratic? harmonic? exponential? hyperbolic? … or how those numbers were generated … it could literally be anything at all. You would do as well to plot the points and draw a curve that is pleasing to you. That’s how it’s usually done, I suppose.

  • cthulhu

    Looks like any of growth, exponential or logistic will fit the best. I would argue the quadratic overfits and likely is a model misspecification (i.e., backcasting would also get increases from year 0). The predictions for growth would be:

    25yr: 42.86
    30yr: 52.35
    35yr: 63.94

  • shrinni

    It seems fitting that what drives me to stop lurking and make an account would be a stats question.

    Results using R (lm function)

    modeling y = x^2 gives an R^2 value of 0.9295
    modeling log(y) = x, R^2 is 0.9613
    for a linear fit, R^2 = 0.9295

    Just looking at the fits, log(y) = x seems to work best with your data, in which case the values for years 25, 30, and 35 should be 42.47, 51.78, and 63.13.
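
    A hedged cross-check in Python rather than R. One possible reason the first and third R^2 values above are identical, assuming the call was something like `lm(y ~ x^2)`: in R's formula syntax `^` means term crossing, so `x^2` reduces to plain `x` unless it is wrapped as `I(x^2)`. The sketch below regresses on an explicit squared column instead; `simple_fit` is a made-up helper name.

```python
from math import exp, log

years = [0, 5, 10, 15, 20]
values = [16.83, 18.57, 21.38, 29.36, 36.05]

def simple_fit(xs, ys):
    """Return (intercept, slope, r_squared) for ys = intercept + slope * xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy ** 2 / (sxx * syy)

r2_linear = simple_fit(years, values)[2]                    # y against x
r2_square = simple_fit([x ** 2 for x in years], values)[2]  # y against x squared
a, b, r2_log = simple_fit(years, [log(y) for y in values])  # log(y) against x

# Back-transform the log-linear fit to get forecasts on the original scale.
exp_forecast = {t: round(exp(a + b * t), 2) for t in (25, 30, 35)}
print(round(r2_linear, 4), round(r2_square, 4), round(r2_log, 4), exp_forecast)
```

    Note that the log-model R^2 is measured on the log scale, so it is not strictly comparable with the other two.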

  • Peterr

    The footnote that Paul will write to attach to the extrapolated figure should be rather amusing.

    *See flyingsquidwithgoggles, delazeur, John McDonald, and cthulhu . . .

  • Joe

    Exponential doesn’t look great. Rate jumps around from 2%/year to almost 7%/year.

    • were-witch

      Isn’t that like saying linear is a bad fit because the slope between adjacent data points jumps around?

  • Victor Matheson

    I agree with Whirrlaway. It depends on what is generating the growth in this data. The growth rates in each 5-year period are:

    0-5: 10.3%
    5-10: 15.1%
    10-15: 37.3%
    15-20: 22.8%

    So, was year 10-15 just a bigger than expected bump in a continuing pattern of increasing growth in which case we might expect 27% growth to year 25 and 31% growth to year 30 (yielding 45.8 in year 25 and 60.0 in year 30)?

    Or was year 10-15 a local peak in growth rates so that we might expect growth down to, say 16% in 20-25 and 11% in 25-30 (yielding 41.8 in year 25 and 46.4 in year 30)?
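
    The arithmetic behind both scenarios can be sketched as follows. The scenario growth rates are the ones assumed in the comment above, and `compound` is a hypothetical helper name.

```python
values = {0: 16.83, 5: 18.57, 10: 21.38, 15: 29.36, 20: 36.05}
years = sorted(values)

# Percentage growth over each five-year period.
growth_pct = {
    f"{start}-{end}": round(100 * (values[end] / values[start] - 1), 1)
    for start, end in zip(years, years[1:])
}

def compound(start_value, rates):
    """Apply successive five-year growth rates and return each resulting level."""
    levels, level = [], start_value
    for rate in rates:
        level *= 1 + rate
        levels.append(round(level, 1))
    return levels

accelerating = compound(values[20], [0.27, 0.31])  # growth keeps rising
slowing = compound(values[20], [0.16, 0.11])       # years 10-15 were a peak
print(growth_pct, accelerating, slowing)
```

    By year 30 the two scenarios are already about 14 units apart, so which growth story is right matters more than the fitting method.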

  • Gregor Sansa

    I’d do this problem with a Bayesian model, with an AR(1) process for growth. Since those numbers look like percentages I’d do a logit transform first, and I’d also get some kind of estimate of measurement error, and put an informative empirical Bayesian prior to shrink the estimates.

    I guess that this would give a point estimate in the range of 39-41 for year 25 and 41-45 for year 30. I expect that the error around that point estimate, assuming (ridiculously) that you had the right model and prior, would be about 2-3 points in year 25 and 3-5 in year 30. I’d then mentally scale that error by 1.25-1.5, to account for misspecification.

    If you really wanted me to do it, I’d need to know what the data set was.

    • Gregor Sansa

      That error I suggested was for 1 standard error. If you want a 95% interval you double it. Meaning that you can’t rule out the series starting to shrink in years 20-25, though the odds are against it.

    • Jon_H11

      Why I am an engineer and not a statistician. My answer would be a piece-wise linear interpolation: perfect fit!

      Sure, it can’t predict out any further than the next data point with any confidence that would be interesting, but then again nothing really can.

      So year 25, answer is around 42-43.

      Years past that, it’s an unreasonable question, unless you’re just trying to make a rhetorical point.
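
      A minimal sketch of the year-25 figure, assuming "piece-wise linear" here means carrying the slope of the last segment (years 15 to 20) one step forward. `extend_last_segment` is a made-up name for this illustration.

```python
years = [0, 5, 10, 15, 20]
values = [16.83, 18.57, 21.38, 29.36, 36.05]

def extend_last_segment(xs, ys, target):
    """Extrapolate along the straight line through the final two points."""
    slope = (ys[-1] - ys[-2]) / (xs[-1] - xs[-2])
    return ys[-1] + slope * (target - xs[-1])

year_25 = extend_last_segment(years, values, 25)
print(round(year_25, 2))
```

      The result depends entirely on the last two data points, which is the caveat about not trusting it beyond one step.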

      • PeteW

        Agreed. Piece-wise linear. The question is: What happened in year 10 and will that continue?

  • Philip

    They’re all the same order of magnitude, basically flat! /computer scientist

  • N__B

    After the singularity, none of them will matter.

    • Rob in CT

      Ahem. The wingularity.

  • sk7326

    Piecewise linear makes sense – but given the data, I’d go with exponential. Running this for 25 and 30?

    Year 25 … 42.47
    Year 30 … 51.77
    Year 35 … 63.12
    Year 40 … 76.96

  • sigaba

    Characteristically, you asked for a regression of five data points, and got about half a dozen different answers. Highly instructive.

  • weirdnoise

    As others have stated, it takes knowledge of the underlying process to know what curve to fit, and whether fitting a curve is even appropriate. Things like shifts in costs, political climate, legal environment, personnel changes, etc., could undermine any estimate.

    On the other hand, if you’re using math to make a rhetorical point about something that appears to be growing without bound, fitting an exponential might provide the most bang for buck.

  • LosGatosCA

    I can’t be fooled – these are the reciprocal probabilities of Hillary winning the presidency in 2016, 2020, 2024, etc as calculated by Sam Wang.

    And the curve is really a flatline

  • bilditup1

    Wait a minute, did they stop projections merely because they only have a mandate to do it until 20 years out, or because of the ‘2038 problem’ of systems that use a signed 32-bit int for time_t?
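
    For reference, the rollover moment can be computed directly. This sketch only assumes the usual Unix convention of counting seconds from 1970-01-01 UTC.

```python
from datetime import datetime, timezone

# Largest value a signed 32-bit time_t can hold, in seconds since the epoch.
MAX_INT32 = 2 ** 31 - 1

rollover = datetime.fromtimestamp(MAX_INT32, tz=timezone.utc)
print(rollover.isoformat())  # 2038-01-19T03:14:07+00:00
```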

  • This is all too nerdy. Pick any curve fit from Excel and then tell everybody “I have proved with my spreadsheet that it’s going to be x.” That’s what most consultants do. Not the EIA: when forecasting the trend in renewable energy, they always go for linear, because Tradition.