## On the use of error bars

Dave Munger over at Cognitive Daily just wrote this post about people’s lack of understanding of error bars in the graphical representation of data. The post is very interesting and I encourage people to take the quiz that he has posted on the correct interpretation of error bars.

A particular comment on that post concerns me, and I am going to use this post to give my two cents on error bars and their importance in understanding data. Specifically, I will try to address some misconceptions and problems with how people use and read error bars.

The comment that concerns me is:

> I may, in the future, forget the exact definition of what the error bars mean, but I will still be capable of saying “Whoo, small error bar, that figure is probably pretty accurate” and “Whoa, look at that huge error bar, I’ll use a bigger grain of salt to look at that figure”.

This comment frightens me. I can’t help but think of the book How to Lie With Statistics (link to book at powells). The main problem with this reasoning is that there are many ‘types’ of error bars that are often included in scientific graphics, with most researchers choosing some multiple of either the standard error or the standard deviation. One cannot just look at the length of the error bars and assume that short bars mean accurate data. (To get a bit picky on semantics: error bars do not reflect the accuracy of the data; rather, they reflect the precision with which you can measure the data.) It all depends on which error measurement is being plotted, and that is highly variable among scientific papers. I tend to use error bars that are the length of 2 × the standard error (SE), for reasons I will get to in a bit, and thus relative to other graphics that usually plot 1SE my data may seem ‘less accurate’ to the reader – and that would be a shame and completely incorrect.

### The appropriate use of error bars

Data plotted without error bars cannot be put into relevant scientific context. Are two means the same? What is the measurement error on the observations? Is there a pattern of variability among groups? These are all incredibly important scientific questions that cannot be addressed without estimates of error of one form or another. As such, error bars should ALWAYS be included in scientific graphics. The lack of error bars in a figure immediately raises suspicion in my mind as to the appropriateness of any conclusions drawn from the data. There are a few exceptions where a complex graphic would lose all meaning if the error bars were included (say, there are so many points that including error bars would obscure the data), but under these conditions the text associated with the figure should make very clear the level of error on the data.

In my publications, I tend to use error bars representing two standard errors (2SE) around a mean. This is because for the standard two-group t-test (or F-test), the 95% confidence interval around a mean is approximately ±2SE. You can therefore use such bars to directly estimate the significance of a difference in means, rather than having to visually double the length of the 1SE error bars that most people use (mostly because they make readers like the one quoted above more trusting of their data, rather than for any worthwhile reason). With 2SE error bars, one can check whether the mean of one group falls within the confidence interval of the other group – if so, then there is likely no difference between the groups. Note that what matters is not whether the error bars ‘overlap’ but whether the mean of one group falls within the error bars of the other.
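The 2SE rule of thumb described above can be sketched in a few lines. This is a minimal illustration with made-up data; the helper names (`se`, `outside_2se`) are hypothetical, not from any library:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# two hypothetical groups of 100 observations each
a = rng.normal(loc=10.0, scale=2.0, size=100)
b = rng.normal(loc=11.0, scale=2.0, size=100)

def se(x):
    # standard error of the mean: sample SD / sqrt(n)
    return x.std(ddof=1) / np.sqrt(len(x))

def outside_2se(x, y):
    # the 2SE rule of thumb: is the mean of y outside
    # the interval mean(x) +/- 2SE(x)?
    lo, hi = x.mean() - 2 * se(x), x.mean() + 2 * se(x)
    return not (lo <= y.mean() <= hi)

# compare the visual rule against a formal two-sample t-test
t, p = stats.ttest_ind(a, b)
print(outside_2se(a, b), outside_2se(b, a), round(p, 4))
```

The rule is approximate – the formal test uses the pooled standard error of the difference – but for roughly equal group sizes and variances it agrees well with the t-test, which is the whole point of plotting 2SE bars.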

Here is a fictitious example with some randomly generated data.

In this case the two groups are significantly different using a Student’s t-test (t = 3.59, df = 198, p = 0.0004). I have plotted the same data twice to show that the two samples are different, with the left plot having 1SE error bars and the right plot having 2SE error bars. There is not much difference in the interpretation of these graphs: in either case the groups look different (admittedly, for this example I chose two groups that are highly different, to make this easier to visualize). The 2SE error bars do not make the data look ‘less accurate’, but they do make it easier to see what is going on. The mean of either sample is not included within the error bars of the other sample – thus the two samples are different. This is easier and more appropriate to interpret than the left plot, where you must first visually double the length of the error bars before interpreting them correctly. The person quoted above may have less trust in the ‘accuracy’ of the data on the right, even though it is the same data, just with a different choice of error bar.
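A figure along these lines can be generated as follows. This is a sketch with freshly simulated data (two groups of n = 100, so df = 198 as above); the exact t and p values will differ from the ones quoted, since the draws are random:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
from scipy import stats

rng = np.random.default_rng(0)
# two hypothetical groups, n = 100 each (so df = 198)
g1 = rng.normal(10.0, 2.0, 100)
g2 = rng.normal(11.0, 2.0, 100)

t, p = stats.ttest_ind(g1, g2)

means = [g1.mean(), g2.mean()]
ses = [g1.std(ddof=1) / np.sqrt(len(g1)),
       g2.std(ddof=1) / np.sqrt(len(g2))]

# same data twice: 1SE bars on the left, 2SE bars on the right
fig, (ax1, ax2) = plt.subplots(1, 2, sharey=True)
for ax, mult, title in [(ax1, 1, "1 SE"), (ax2, 2, "2 SE")]:
    ax.errorbar([1, 2], means, yerr=[mult * s for s in ses],
                fmt="o", capsize=4)
    ax.set_xlim(0.5, 2.5)
    ax.set_title(title)
fig.savefig("error_bars.png")
```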

The following example again uses randomly generated data, but in this case there is no significant difference between the groups (t = 0.96, df = 198, p = 0.336).

In this case the difference between the left plot (with 1SE error bars) and the right plot (with 2SE error bars) is clear. The right figure yields the most appropriate interpretation of the data: with 2SE error bars, the mean of each group falls within the error bars of the other group, suggesting that there is no difference between the groups, which is the case here. If someone applied the same reasoning to the plot with 1SE error bars, they would incorrectly conclude that the means of the two groups were different.

My main conclusions are the following:

1. Error bars should ALWAYS be included in scientific graphics or at least have associated text describing the error measurements.

2. Do not just look at the width of error bars as an estimate of the ‘accuracy’ of the data – the interpretation depends on what the data are and which type of error bar the author has decided to use.

3. I encourage the use of 2SE error bars in the majority of cases, to improve the clarity of the relationships in the data and to minimize misinterpretation of the error bars, even though it may make your data look ‘more noisy’.

4. Teach others what error bars really mean so that they can accurately read scientific figures.

Hey, since you are comparing means, why not directly plot the 95% CIs in the graphs rather than 2SE? After all, the latter is an approximation for the comparison being made in the post.

Greg – Absolutely true – that would also be good. 2SE is very close to the 95% CI, so that would work great.
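For concreteness, a quick sketch of just how close the approximation is (assuming scipy for the critical values; the sample size of 100 matches the examples above):

```python
from scipy import stats

n = 100  # per-group sample size, as in the examples above
# exact multiplier for a 95% CI on a mean: the t critical value
t_crit = stats.t.ppf(0.975, df=n - 1)
# large-sample (normal) multiplier, ~1.96
z_crit = stats.norm.ppf(0.975)
print(t_crit, z_crit)
```

For samples of this size both multipliers are within a few percent of 2, so bars of ±2SE and exact 95% CIs are nearly indistinguishable on a plot; the distinction only matters for small samples, where the t critical value is noticeably larger than 2.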