Yes, I’m still on vacation. But I couldn’t resist a quick response to this comment (and the subsequent debate):
BBC: Do you agree that from 1995 to the present there has been no statistically-significant global warming
Phil Jones: Yes, but only just.
Both trends are clearly upwards.
Phil Jones was referring to the CRU data, so let’s start with that. If you fit a linear least-squares regression (or a generalised linear model with a gaussian distribution and identity link function, using maximum likelihood), you get the follow results (from Program R):
glm(formula = as.formula(mod.vec), family = gaussian(link = "identity"), data = dat.2009) Deviance Residuals: Min 1Q Median 3Q Max -0.175952 -0.040652 0.001190 0.051519 0.192276 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -21.412933 11.079377 -1.933 0.0754 . Year 0.010886 0.005534 1.967 0.0709 . --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for gaussian family taken to be 0.008575483) Null deviance: 0.14466 on 14 degrees of freedom Residual deviance: 0.11148 on 13 degrees of freedom AIC: -24.961
Two particularly relevant things to note here. First, the Year estimate is 0.010886. This means that the regression slope is +0.011 degrees C per year (or 0.11 C/decade or 1.1 C/century). The second is that the “Pr” or p-value is 0.0709, which, according to the codes, is “not significant” at Fisher’s alpha = 0.05.
What does this mean? Well, in essence it says that if there was NO trend in the data (and it met the other assumptions of this test), you would expect to observe a slope at least that large in 7.1% or replicated samples. That is, if you could replay the temperature series on Earth, or replicate Earths, say 1,000 times, you would, by chance, see that trend or larger in 71 of them. According to classical ‘frequentist’ statistical convention (which is rather silly, IMHO), that’s not significant. However, if you only observed this is 50 of 1,000 replicate Earths, that WOULD be significant.
Crazy stuff, eh? Yeah, many people agree.
Alternatively, and more sensibly, we can fit two models: a ‘null’ with no slope, and a TREND model with a slope, and then compare how well they fit the data (after bias corrections). A useful way to do this comparison is via the Akaike Information Criterion – in particular, the AICc evidence ratio (ER). The ER is the model probability of slope model divided by that of the intercept-only model, and is, in concept, akin to Bayesian odds ratios. The ER is preferable to a classic null-hypothesis significance test because the likelihood of the alternative model is explicitly evaluated (not just the null). Read more about it in this free chapter that Corey Bradshaw and I wrote.
Here is what we get:
k -LogL AICc dAICc wAIC pcdev CRU ~ Year 3 15.48054 -22.77926 0.0000000 0.5897932 22.93616 CRU ~ 1 2 13.52652 -22.05304 0.7262213 0.4102068 0.00000
The key thing to look at here is the wAIC values. The ER in this case is 0.5897932/0.4102068 = 1.44. So, under this test, the model that says there IS a trend in the data is 1.44 times better supported by the data than the model that says there isn’t. The best supported model is the TREND model, but really, it’s too hard with this data to separate the alternative hypotheses.
With more data comes more statistical power, however. Say we add the results of 2010 to the mix (well, the average anomaly so far). Then, for the null hypothesis test, we get:
glm(formula = as.formula(mod.vec), family = gaussian(link = "identity"), data = dat) Deviance Residuals: Min 1Q Median 3Q Max -0.174040 -0.041956 0.008072 0.044350 0.193146 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -22.456037 9.709805 -2.313 0.0365 * Year 0.011407 0.004849 2.353 0.0338 * --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 (Dispersion parameter for gaussian family taken to be 0.007993787) Null deviance: 0.15616 on 15 degrees of freedom Residual deviance: 0.11191 on 14 degrees of freedom AIC: -27.996
For the ER test, we get:
k -LogL AICc dAICc wAIC pcdev CRU ~ Year 3 16.99796 -25.99592 0.000000 0.7552163 28.33275 CRU ~ 1 2 14.33287 -23.74266 2.253259 0.2447837 0.00000
The ER = 3.1.
So, the “significance test” suddenly (almost magically….) goes from being non-significant to significant at p = 0.05 (because Pr is now 0.0338), or 38 times out of 1,000 by chance.
Whereas although the ER test is strengthened, the previous, result that the TREND is the best model (of these two alternatives), doesn’t change. This test is a little more robust, and certainly less arbitrary because no matter what the data, we are always evaluating the strength of our evidence rather than whether some pre-defined threshold is crossed.
You can do the same exercise with GISTEMP, but it’s less ‘interesting’, because GISTEMP shows a stronger trend, due largely to its inclusion of Arctic areas.
For GISTEMP, the 1995-2009 data yield a slope of 0.0163 C/year, a p-value = 0.0082, and an ER = 13.4 (that is, the TREND model is >10 times better supported by this data). The 1995-2010 (December-November averages) for GISTEMP gives a slope of 0.0174 C/year, a p-value = 0.0021, and an ER = 57.8 (TREND now >50 times better supported!).
You can see that for relatively short time series like this, adding extra years can make a reasonable difference, so longer series are preferable in noisy systems like this.
Okay, does that answer the top quote? I think so, but I’m happy to answer questions on details. Otherwise, it’s back to my holidays (but I’ve got a guest post from DV82XL that I’ll put up on 8 Jan, to re-invigorate our debate out carbon prices)…