MrMitchWeaver t1_izfknkm wrote

1 is perfectly OK for when you need to zoom in to see the difference. Perhaps the chart could flag more clearly that the axis doesn't start at zero.

2 is perfectly OK when you want to show correlation between two series that don't necessarily have the same unit or magnitude.

3 is the most questionable one because three years is a very short time frame (for some things). You can address that by adding a previous trend line.
I don't know if it qualifies as cherry picking though, or at least it's not what people mean when they use that term.

All in all these are not deceptive if you know how to look at a chart and if there's a modicum of context to the chart.

I appreciate the effort but not necessarily the execution.

52

draypresct t1_izflrr7 wrote

>1 is perfectly OK for when you need to zoom in to see the difference.

Agreed. There are lots of examples where you really shouldn't start the Y axis at zero, e.g. if zero is not a reasonable value of whatever measure you're displaying. If I want to display the past few years' average temperatures in Miami, I should not start either the X-axis (year) or the Y-axis (temperature) at zero.

42

bosschucker t1_izg17rg wrote

I have to disagree with #2. I'm a fan of this blog post by datawrapper, which features this graphic (and has more arguments against dual axis charts besides being misleading). you can manipulate the axes to show literally any correlation that you want, which is a pretty fatal flaw imo for any data visualization

21

MrMitchWeaver t1_izgmap3 wrote

Of course it can be manipulated. As I said, it can be OK if the units are different or if the series have different standard deviations.

In every case it's important for the reader to look at the axes and draw their own conclusions.

I guess the larger lesson is Do Your Own Research.

4

Stannic50 t1_iziugzl wrote

If the units are different, then you can't plot the two series with only one vertical axis, so of course two different axes are ok.

But this example is in percent, so the units are not different. If the purpose is to compare the magnitude of series A to the magnitude of series B, then they should use the same axis. Using different axes would be acceptable if the purpose were to compare change over time (or whatever horizontal axis is) within A to change over time within B (as you might with, say, % of state budget spent on education vs % graduation rate). In this case, it's useful to zoom in on each series independently so the change over time is maximized.

2

MrMitchWeaver t1_iziwq8f wrote

If the unit is the same but the magnitude is very different it does not make sense to use the same axis.

Take housing growth YoY, unemployment, loan delinquency, labor force participation rate, yield curve.

These are all expressed in percentage points but they have wildly different ranges and magnitudes. It would make no sense to use one single axis for two or more of those.

As I said in my first comment, if the series justify the double axis chart it makes sense to use it.

Creator needs to be honest and consumer needs to be vigilant. Same as it ever was.

4

marsman t1_izj00gg wrote

>These are all expressed in percentage points but they have wildly different ranges and magnitudes. It would make no sense to use one single axis for two or more of those.

And importantly, there is the potential for trends to be highlighted by that sort of chart that wouldn't otherwise be visible, and that are accurately reflected in the data (so it's not a manipulation).

2

Stannic50 t1_izjhx9m wrote

I agree. That's what I meant by "change over time within A/B." If the purpose of a graph is to show whether dogs or cats are preferred, then there should be a single % of households containing [pet] axis so the magnitude of the values can be directly compared. Whereas if the purpose is to show the effect of the 2008 recession on pet ownership, it may be more appropriate to have two separate axes so the magnitude of the change in values can be compared.

1

MrMitchWeaver t1_iziz6qy wrote

2

bosschucker t1_izkebe4 wrote

I don't really love this example tbh. look at where the lines cross at 82.5% - what does that tell you? the viz is clearly saying that there is some significance to 82.5% of workers being full time by nature of having that be where the lines meet - but what does it actually mean? you could move the axes so that the lines cross at whatever arbitrary point you want. if your viz is going to imply that a certain data point is significant, I think it actually should be
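A minimal sketch of the "move the axes so the lines cross wherever you want" point. The two series, the left-axis limits and the function names are all made up for illustration; nothing here comes from the chart being discussed. Given one fixed left axis, it picks right-axis limits so the lines meet at any chosen data point.

```python
def height(value, limits):
    """Fraction of the plot height (0 = bottom, 1 = top) at which value is drawn."""
    low, high = limits
    return (value - low) / (high - low)


def right_axis_limits(left_value, left_limits, right_value, right_span):
    """Right-axis limits that draw right_value at the same height as left_value."""
    frac = height(left_value, left_limits)
    right_min = right_value - frac * right_span
    return right_min, right_min + right_span


# Hypothetical data: one rising series (left axis), one falling series (right axis).
left_series = [80.0, 81.0, 82.0, 83.0, 84.0]
right_series = [6.0, 5.5, 5.0, 4.5, 4.0]
left_limits = (78.0, 86.0)


def crossing_indices(right_limits):
    """Indices where both lines are drawn at the same height."""
    return [
        i
        for i, (a, b) in enumerate(zip(left_series, right_series))
        if abs(height(a, left_limits) - height(b, right_limits)) < 1e-9
    ]


# Choose any index k and the lines can be made to cross exactly there.
for k in range(len(left_series)):
    limits = right_axis_limits(left_series[k], left_limits, right_series[k], 4.0)
    print(k, limits, crossing_indices(limits))
```

The data never changes between iterations; only the right-axis limits do, which is why the crossing point by itself carries no meaning.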

1

MrMitchWeaver t1_izklsm9 wrote

I think it's a good example insofar as it shows two series that need different axes of the same unit and are absolutely correlated. I'm not talking about the data itself. It's more a response to the other person's points.

1

spiral8888 t1_izfurx3 wrote

  1. As someone commented, if you make the Y-axis such that the left one is 10% of the top and the right one 90%, you can make any change, big or small, look exactly the same on the graph. In those cases the graph conveys zero information. You might as well give the values as numbers.

The only situations where it could make sense to suppress the zero are those where the absolute value of the plotted thing has no meaning, such as air temperature. So, most likely you would never want to plot air temperatures starting from 0K. In most cases the absolute values have meaning, which is why the suppression of the zero just misleads the reader.
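A rough sketch of that effect, reading the comment as "fit the axis so the smallest value sits at 10% of the plot height and the largest at 90%". The function names and both series are invented for illustration. Once the axis is fitted this way, a 1% change and a 100% change land on identical positions.

```python
def fit_axis(values, low_frac=0.10, high_frac=0.90):
    """Axis limits that put min(values) at low_frac and max(values) at high_frac."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("constant series has no span to fit")
    span = (hi - lo) / (high_frac - low_frac)
    axis_min = lo - low_frac * span
    return axis_min, axis_min + span


def plot_positions(values, limits):
    """Height of each value as a fraction of the plot height, rounded for display."""
    axis_min, axis_max = limits
    return [round((v - axis_min) / (axis_max - axis_min), 6) for v in values]


small_change = [100.0, 100.5, 101.0]  # hypothetical: +1% overall
big_change = [100.0, 150.0, 200.0]    # hypothetical: +100% overall

small_pos = plot_positions(small_change, fit_axis(small_change))
big_pos = plot_positions(big_change, fit_axis(big_change))
print(small_pos, big_pos)
```

Only the tick labels tell the two charts apart, which is the commenter's argument for why the picture itself stops carrying information about magnitude.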

9

MrMitchWeaver t1_izgn5wo wrote

I agree that it can be used to mislead but that isn't always the case.

Take disposable income. Straight from Fred. https://fred.stlouisfed.org/series/DSPIC96

If you click on "view last 5 years" your Y axis is going to start way above zero. It just makes sense. If you click on "view max" you will get Y axis closer to zero because the range of values justifies it.

7

spiral8888 t1_izi4gl0 wrote

First, I have to say that there is something wrong with the data behind the graph. I can't believe the yearly disposable income could have 20%+ jumps in a month.

Second, yes the 5 year graph is misleading as it makes it look like the disposable income doubled in a month and then fell back to the old level.

−1

MrMitchWeaver t1_izim1f2 wrote

First, that's because of the stimulus payments. It's an anomaly. We're not here to talk about the data itself though.

Second, if you actually look at the y axis it's not even a little bit misleading. This is the default setting for all Fred graphs. If you're showing a value that starts at 15.000.000.000 you are not going to start the Y axis at zero...

2

spiral8888 t1_izivw2d wrote

Yes, you can look at the Y-axis. But if you think that just by having the Y-axis values available removes all misleading, then no suppression of zero is ever misleading. For instance, by your logic the OP's first graph is not misleading as the values are there.

Regarding the Fred graph, the thing that you named as an anomaly is amplified when you suppress the zero. When you don't, the effect of the stimulus is put in more context, showing how much effect it actually had on people's disposable income.

0

Skulltown_Jelly t1_izgdzp4 wrote

That's not the only situation. Trend lines are graphs that are used to show...well.. the trends, and the absolute quantities are not as important in many cases.

Stock prices from a certain year are a good example. It's not that it doesn't have meaning, the price of the stock is valuable information, it's just not as important as the trend and depending on the amounts it could make the trend hard to read

1

spiral8888 t1_izi3wng wrote

Two things. First, stock prices are a bit like temperature in the sense that the absolute value of the share price has very little meaning. A share price of $10/share doesn't really tell you anything. It only tells you something in relation to the past.

Second, the relative change of the share price does matter. So, 50% drop in price is a different thing than a 1% drop. If you suppress the zero, they look the same on the graph.

2

MrMitchWeaver t1_izgoma0 wrote

In OP's chart the problem is more the scale than the start point, but it's always about context.

1

MeltBanana t1_izgdpfb wrote

I use 2 all the damn time, because it's very frequently necessary.

Like, I'm trying to show the strong correlation between Current(A) and Motor RPM. My Current values range from 8-15, and my rpm ranges from 10,000-18,000. I'm absolutely scaling or normalizing them so the correlation between the two is visually clear.
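A small sketch of why rescaling is harmless for this use: min-max normalising each series changes where it is drawn but not how strongly the two are correlated. The readings below are invented to match the stated ranges (8-15 A, 10,000-18,000 rpm), and the helper names are hypothetical.

```python
from math import sqrt


def min_max(values):
    """Rescale values linearly onto the range 0..1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Hypothetical readings, not real measurements.
current_a = [8.0, 9.5, 11.0, 12.5, 15.0]
rpm = [10000, 12000, 13500, 15500, 18000]

r_raw = pearson(current_a, rpm)
r_scaled = pearson(min_max(current_a), min_max(rpm))
print(r_raw, r_scaled)
```

Both calls give the same coefficient up to floating-point error, so the normalised plot shows the relationship without inventing one.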

9

ellWatully t1_izj0kk4 wrote

I was thinking the same thing. Having two y axis scales left and right is only misleading if the two sets of data are displaying the same information for different groups. If they're displaying two different attributes of a system, different axes are often the only way to make the plot useful.

2

TownAfterTown t1_izg7dzj wrote

This is a good point in that these presentations CAN be used to mislead but can also be used to highlight useful information. But they should be transparent and provide that context.

7

bruff9 t1_izfye8o wrote

I have an issue with 3. It very much depends on the data set and what is actually being portrayed/the context. Who is to say that 6 years is enough vs 2? We need to know a lot more in order to say xyz is bad because it’s 3 years.

4

Andoverian t1_iziw4pk wrote

Part of the point with 3 is that it assumes whoever made the chart has access to the data going back much further, meaning they knew the last few years are not representative of the longer trend. By only showing the last few years anyway, they're deliberately misleading people.
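To make that concrete, here is a toy sketch with invented yearly values (a long rise followed by a short dip). It fits a least-squares slope to the full history and to the last three points only; the `slope` helper is hypothetical and just implements ordinary least squares against the index.

```python
def slope(ys):
    """Least-squares slope of ys against 0, 1, 2, ... (units per period)."""
    n = len(ys)
    mean_x = (n - 1) / 2
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(ys))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den


# Hypothetical series: seven years of growth, then three years of decline.
yearly = [10, 12, 14, 16, 18, 20, 22, 21, 20, 19]

full_trend = slope(yearly)         # trend over the whole history
recent_trend = slope(yearly[-3:])  # trend over the last three years only
print(full_trend, recent_trend)
```

The two slopes have opposite signs, so showing only the short window tells the opposite story from the full series.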

1

dark_o3 OP t1_izfpdjr wrote

I made a separate comment explaining the idea of the infographic, and yes sometimes it is OK to do it but

#1 is for me the most common way people lie and it's not ok in the majority of cases.

#2 I would say it's only ok for correlation but even here it can mislead users.

#3 maybe there is a better example, the idea is that users should know the full story.

3

farsh19 t1_izfswa7 wrote

I agree with both points, depending on the context; although, I would caution against phrases like, "majority of cases" unless you have the data to support such a claim.

These are responsible rules for graphs aimed towards the general public. However, these are not good rules to follow in, for example, scientific literature. Hence, the context and intent of a graph are also important.

11

shmerham t1_izg5udz wrote

I’m not sure I’d agree that 1 is not ok in most instances. It’s okay if you’re comparing values against a reference, particularly if you’re trying to show outliers.

Take, for example, 100 meter dash times. There’s a huge difference between 10.0 and 9.9 seconds (a body length). …and if you’re trying to compare Usain Bolt’s record against the other fastest times, you would need to truncate the axis to see that his fastest stands out against the next 9 fastest runners which are clustered together.

That's just one example but there are plenty of others.

3

[deleted] t1_izg74y0 wrote

[deleted]

1

shmerham t1_izggf1c wrote

I agree with you and those scenarios are probably more common, but it seems like it would be incredibly hard to quantify that, so it’s susceptible to cognitive biases.

1

marsman t1_izizrmv wrote

3 is fine if the period covered is the relevant period, it's not fine if you are trying to display a continuous trend. It could be problematic, or fine if you are showing a point of change where the previous period isn't relevant (so you aren't after a change in trend from a previous period).

2

TheProf t1_izh28jx wrote

To show differences, you use a line graph. To show magnitude you use a bar graph (as a general rule).

The principle of proportional ink states that sizes should be relative, meaning bar graphs should all start at zero.
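A quick way to check a bar chart against that principle is to compare the ratio of the drawn bar lengths with the ratio of the values themselves. The sketch below uses made-up numbers and hypothetical helper names; a result of 1.0 means the ink is proportional to the data.

```python
def ink_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar lengths for values a and b when bars start at baseline."""
    if a <= baseline or b <= baseline:
        raise ValueError("both values must lie above the baseline")
    return (a - baseline) / (b - baseline)


def distortion(a, b, baseline):
    """Drawn ratio divided by the true ratio; 1.0 means no exaggeration."""
    return ink_ratio(a, b, baseline) / (a / b)


# Hypothetical values: 52 vs 50, a true difference of 4%.
print(ink_ratio(52, 50))         # bars starting at zero
print(ink_ratio(52, 50, 48))     # bars starting at 48: one bar is drawn twice as long
print(distortion(52, 50, 48))    # how much the truncated baseline exaggerates
```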

If you wish to demonstrate the change in a variable, use a line graph.

Units matter as well. If zero means a lack of quantity for the variable, zero is a valid starting point. If zero does NOT represent a lack of quantity, you do not have to start at zero.

Think temperatures: zero degrees does not mean a lack of degrees. Also, we typically consider the change in temperature over time. Hence, temperatures should be represented in a line graph.

1