Briney, Kristin

Data Visualization 101

This is a recording from the Data Visualization 101 workshop held on 2020-05-21 over Zoom. This workshop covered how to choose the right chart type for your data and how good design choices will make your chart easier to understand. The workshop focused on visualization best practices, independent of any specific visualization software, and consisted of lecture and hands-on activities.

2020-05-27

DOI for this page: https://doi.org/10.7907/7q90-b029
CaltechDATA Record: https://doi.org/10.22002/D1.1440

Transcript

[Slide 1] Okay, so welcome to the data visualization 101 workshop. We're going to be talking today about making better charts. And I just love this XKCD because there's an XKCD for everything. And this one talks about one way to improve your visualizations. I'm not sure it's the best way to improve your visualizations.

So to introduce myself, my name is Kristin Briney. I am the BBE librarian. So if you are in the Biology and Biological Engineering division and you have questions about the library and resources, I am the person that you will probably be hearing from. We also have a team of librarians, one corresponding to each of the divisions. And a lot of us have science backgrounds, so we're always happy to help you deal with any literature issues, finding information, publishing information, questions that you have. You can always email library@caltech.edu.

So that's my plug for the library as I get started. But let's get into data visualization and talk about making better charts.

[Slide 2] So what I want to do today is talk about how to visualize your data, because it's not enough to collect data. You also need to effectively convey it. And if you don't have nice charts in your articles, if they're too complex, if they're too messy, people will just skip right over them because it takes so much mental effort to engage with them and learn from them.

[Slide 3] So we're going to be doing two things today in this session. Oh, I should also say that if you have questions, please pop them into the chat and I will try to answer them. Okay. Back to the two things we're going to talk about today. One, in the first half, we're going to be talking about matching up what you're trying to do in your visualization, the message of your visual, with the correct type of data and the correct type of chart. A useful type of chart that really helps you convey that message. So really taking it from message to type of data, to choosing different types of charts that convey different messages better or worse.

In the second part of the workshop, we're going to be talking about making design decisions to highlight what's important in your chart and eliminating excess information. So you notice neither of these learning outcomes talks about any particular data visualization software. We're going to be talking more about design because I have definitely seen people make very bad data visualizations in Tableau, which is a very high-end data visualization software.

[Slide 4] So design goes a long way. I make a lot of visualizations in Excel. And because I think about their design, I make really good visualizations using probably the most generic charting tool there is. So as I said, we're going to focus on the very basics for a good foundation.

[Slide 5] So part one, let's dig into choosing the right chart for your data. So as I kind of talked about already and highlighted, and gave you a hint at, we're going to be talking about choosing the right chart based upon what you're trying to say with your data. [Slide 6] And this approach comes from a book called "Effective Data Visualization" by Stephanie Evergreen. We have it in print in the library. We don't have an ebook copy, apologies for that right now. So when we get back on campus, you can check it out.

And she really advocates for starting with the concept of what are you actually trying to visualize, what are you trying to convey in your visualization and making all of your decisions from there? [Slide 7] And I want to give you an example from this blog that I really like called FlowingData. And he took the same data set and visualized it in 25 different ways. I'm not going to show you all 25 of them today. It's too much. But I want to give you just a little flavor of what this means in terms of how we think about visualization.

So we're looking at life expectancy across countries over time. So the first graph here is literally the life expectancy for each country graphed on top of each other over the years. And it's a lot of data here. It's really a mess. And really all that we can say from this chart is that life expectancies are going up, and there's some jitter in some of this data. And it might be interesting to engage with those individual countries and see why for example, we see that big dip around 2010 for one country.

[Slide 8] A different way to visualize this data is called a small multiple where you basically have the same chart repeated again and again for each individual not data set, but each line or each bar chart. You break it out and you basically make a bunch of copies of your data. And that way you can compare across charts because they have the same scales.

So here we're looking at small multiples for different countries over time. And now we can actually start to see why are certain countries, maybe is that Haiti that has that big dip in the middle? What happened there? Why are some countries more flat? And we can actually start engaging around individual countries to see what their general paths are.

[Slide 9] Another way to visualize this data, and sorry it's very tiny, but I kind of just want to give you an overall sense. We're looking at male to female. And we ranked this from longest life expectancies to shortest. So now we can say why does Japan have so much longer life expectancies than Sierra Leone? And why are there some gaps between male and female? Why are some of them so large, and some of them are so small?

[Slide 10] This is I believe number four, we're looking at histograms now. And it's also a small multiple, where we're looking at the distribution of life expectancies over time. And you can see in 2000, we might have a little bit of a bimodal distribution. And things have really shifted up to 2015 where life expectancies are longer. And we've lost some of that bimodal distribution.

[Slide 11] And finally example five is comparison to average. So some countries have higher life expectancy than average, and lower. And what makes a country on top of this list, and what makes a country on the bottom of this list?

[Slide 12] So five very different charts using the same data. And I think this really illustrates how important the chart type is in thinking about how we visualize and how we interpret the data. Because looking at each of these charts, we start to interact with that data in a different way. And we start to engage with it and ask different questions because it's charted differently. It draws your eye to different things.

So this is why it's so important to really think about why are you visualizing something, and go from there. Is it important to talk about comparison to average? Is it important to talk about the change overall over time? Is it important to talk about breakdown by country? [Slide 13] And one thing that can really reinforce this decision around what your message is, is just to make it the title of your chart. So that's a little hot tip for you.

[Slide 14] So once you start thinking about what message you want to convey with your chart, the next step is to start thinking about what that actually represents in terms of the type of data that you're using. And I'm pulling from Stephanie Evergreen's book "Effective Data Visualization" here, and she breaks things down by eight different data types. Single number, comparison, beating a benchmark, survey results, parts of a whole, correlations, change over time.

Oh, and I should say that I'm making the slides for this presentation. They're already available. I'm putting the link into the notes [https://resolver.caltech.edu/CaltechAUTHORS:20200519-101809774]. So you can go there. And if you want to download a copy of slides yourself, you can do that in CaltechAUTHORS. So you don't actually have to take notes right now. It will be helpful if you have a pad of paper and a pencil later. We're going to be doing some sketching. So put that aside, and you can also download the slides right now and follow along. All right. Where was I? Survey results, parts of a whole, correlations, change over time, and qualitative data, which I'm not going to be talking about today.

So the point of breaking your message down by data type is that there are better strategies for graphing particular data types, that are more effective and easier for people to understand. [Slide 15] And to really talk about that, I need to take a step back and talk about Cleveland and McGill's research, because people have actually done research on how people perceive visual information. And it's a really brilliant, brilliant study.

So Cleveland and McGill found that people most easily understand and can accurately interpret position on a common scale. And second easiest to interpret is position on non-aligned scales. So the scales are the same. They're just two graphs. So this is why small multiples work because you have two scales. They're the same scale, they're just two different graphs, and you can make a comparison back and forth.

After that, people are pretty good at visualizing and properly quantifying length. They're okay at direction and they're not as good at angle. So this is why bar charts, where you're interpreting quantitative information by length, are easier for people to understand than pie charts, where you're interpreting the quantitative information due to the angle of the pie pieces.

Finally at the bottom, the hardest things for us to interpret are area, followed by volume, followed by curvature. This is a reason why area plots are less good and donut plots are the worst. So donut plots are basically pie charts with the center cut out. So instead of being able to interpret the data by angle, you switch to interpreting the data by curvature. And that is actually really hard for us to quantitatively understand as compared to angle, as compared to length.
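As a rough illustration of the ranking just described, the ordering can be written down as a small lookup. This is only a sketch: the names `ENCODING_RANK`, `CHART_ENCODING`, and `easier_to_read` are hypothetical, and the chart-to-encoding mapping is shorthand for what is said in the talk, not a complete taxonomy.

```python
# Visual encodings in the order given in the talk, easiest to interpret first.
ENCODING_RANK = [
    "position on a common scale",
    "position on non-aligned scales",
    "length",
    "direction",
    "angle",
    "area",
    "volume",
    "curvature",
]

# The encoding each chart type mainly relies on, as described in the talk.
# Illustrative shorthand only (hypothetical mapping).
CHART_ENCODING = {
    "bar chart": "length",
    "pie chart": "angle",
    "area plot": "area",
    "donut plot": "curvature",
    "small multiples": "position on non-aligned scales",
}


def easier_to_read(chart_a, chart_b):
    """Return whichever chart type uses the easier-to-interpret encoding."""
    rank_a = ENCODING_RANK.index(CHART_ENCODING[chart_a])
    rank_b = ENCODING_RANK.index(CHART_ENCODING[chart_b])
    return chart_a if rank_a <= rank_b else chart_b


print(easier_to_read("bar chart", "pie chart"))
print(easier_to_read("donut plot", "pie chart"))
```

The point of the lookup is only to make the comparison explicit: length ranks above angle, and angle ranks above curvature.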

So we can use this to start thinking about the ways that we want to quantify and to visualize our data. Because points and lines are effective. They work. We don't need to get fancy because the fancier we get or the more complex we get, people have a harder time understanding some of those ways to encode information.

[Slide 16] Okay. So let's hop back into our main message here. So we've talked about what our message is, we talked about data type, and now we want to talk about chart type. And here again, we're drawing heavily from Stephanie Evergreen. Thank you. I cribbed this little table. And you can see for our different data types on the left here, we have a number of different chart types on the top of this table. We've got the big number, icon array, pie chart, bar or column chart, side-by-side column chart, slope graph, back-to-back bar chart, dot plot, small multiples, column chart with benchmark, a combo chart, a stacked bar/column chart, number and icon, histogram, map, scatterplot, diagram, line chart, deviating bar chart, and sometimes do not visualize (that might actually be the case where it's an option, maybe a table would be better).

So I'm not going to go over all of these, but there are numbers next to some of these that you may or may not have heard of. [Slide 17] And I want to go over some of these right now to give you some exposure to different types of charts and where they might be effective.

[Slide 18] So these nonstandard charts, I'm going to go through nine of them very quickly. The big number. The icon array, slope graph, back-to-back bar chart, dot plot, small multiples. We've seen that one already. The combo chart with benchmark line. A column chart with benchmark line, sorry. A combo chart and a histogram.

[Slide 19] So let's start with the big number. So the big number is actually really handy if you're trying to convey one single number. You could use a pie chart. But honestly, the best way for people to understand this information is just to make that number really big. And if it's a single number, if it's a big text number, people actually will get that information and they'll retain that information. They understand this information.

So here's just a simple visualization about the number of measles cases in Brooklyn, New York and in the United States. And they made the numbers really big so people could quickly grasp that information and interpret it.

[Slide 20] The next type of chart is one that I really like, is very simple for a single number, is the icon array. And this is probably out of date. It's the U.S. Senate. And it's a very simple way to say, is the Senate red? Is it blue? Is it Republican or Democrat? And you visualize every individual in the Senate and you give them a color, red for Republican, blue for Democrat. And then there's two independents that caucus with the Democrats so they're that light blue color. And it's very easy for people to quickly understand and grasp that information. You just use the same icon over and over again. Four out of five dentists recommend this toothpaste, and you have five toothpaste icons. And one is gray and the other four are bright green. And people grasp the information very quickly. Very simple. It doesn't have to be charted, anything complex. But it conveys the point very well.
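To show how little is needed for an icon array, here is a minimal text-only sketch. The function name `icon_array` and the `#` and `.` stand-in icons are assumptions made for illustration; in a real figure you would use an actual icon and color.

```python
def icon_array(highlighted, total, per_row=10, on="#", off="."):
    """Build a text icon array: `highlighted` filled icons out of `total`."""
    if not 0 <= highlighted <= total:
        raise ValueError("highlighted must be between 0 and total")
    # One icon per individual: filled icons first, then the rest.
    icons = [on] * highlighted + [off] * (total - highlighted)
    # Break the icons into rows of at most `per_row`.
    rows = [icons[i:i + per_row] for i in range(0, total, per_row)]
    return "\n".join(" ".join(row) for row in rows)


# Four out of five dentists, as in the toothpaste example.
print(icon_array(4, 5))
```

The same call with different arguments, for example `icon_array(9, 10)`, gives a "nine out of 10" figure.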

[Slide 21] Number three, slope graph. So the slope graph is nice when you're making a comparison. And is nice when you're making the comparison where most of the information is either increasing or decreasing except for one or two lines. Because the slope graph, what it does is it really highlights the change in direction. And here is a really good example of the slope graph because all of the prices are going up between East and West except for strawberries. So now we can actually engage around this data around strawberries. Why is the price of strawberries lower in the West than it is in the East? And the slope graph really draws your eye to the fact that everything is going up except for the strawberries. So it's another option for doing comparisons you might not have seen before.

[Slide 22] The back-to-back bar chart. I love the back-to-back bar chart. This is really nice for doing kind of a head to head or tail to tail comparison. You basically take two bar charts and you stick them back to back. And this allows you to compare two data sets with similar categories.

And you can actually do this in Excel. Let me see if I can explain this clearly. So to do it in Excel, you plot, you end up seeing two sets of data, but you really plot four sets of data. So you plot the data on the left plus a buffer, and that equals 100%. And you plot the data on the right. And that plus the extra, the fourth data ... let me start over. So there's four data sets. So data set one ends up being clear. You end up setting it to being clear on your graph, plus data two which you see.

Data one plus data two equals 100%. On the right you have data three which is the lighter blue here. And then data four, and data three and data four add up to be 100%. But you end up making data four clear or you basically suppress it. So in the end, you end up having everything line up in the middle because data one and data two add up to 100% and data three and data four add up to 100% meaning they align in the middle. But you end up having hidden data on both sides of your chart.
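The four-series trick can be sketched in a few lines. This is a sketch under stated assumptions: the function name `back_to_back_series`, the series labels, and the example percentages are all hypothetical, and the code only computes the numbers you would then plot as a stacked bar chart, with series 1 and 4 formatted as invisible.

```python
def back_to_back_series(left, right, total=100):
    """Return the four stacked series for a back-to-back bar chart.

    `left` and `right` are lists of percentages, one value per category.
    Series 1 and 4 are padding to be made clear; series 2 and 3 are shown.
    """
    if len(left) != len(right):
        raise ValueError("left and right need one value per category")
    for value in list(left) + list(right):
        if not 0 <= value <= total:
            raise ValueError("values must be between 0 and total")
    return {
        "1_hidden_left": [total - v for v in left],    # clear padding
        "2_visible_left": list(left),                  # the left bars
        "3_visible_right": list(right),                # the right bars
        "4_hidden_right": [total - v for v in right],  # clear padding
    }


# Hypothetical percentages for three categories.
series = back_to_back_series([40, 65, 20], [55, 30, 70])
for name, values in series.items():
    print(name, values)
```

Because series 1 plus series 2 always equals the total, every visible left bar ends at the same midpoint, and every visible right bar starts there.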

Hopefully I explained that clearly. If you have questions, throw them in the chat. Because it's a little confusing and once you get it, it's like, "Aha, of course I can add hidden data and make it invisible in my final chart."

So that is one way to make a back-to-back bar chart. And I think it just is a nice way to do direct comparisons between two data sets, two related data sets. Not seeing any questions in the chat. So hopefully you guys got that. Or if you have other questions, just let me know.

[Slide 23] Okay. Dot plots, number five that we're going to talk about. Dot plot is a nice way to interpret progress, but really progress that's improving. So numbers that are getting higher. Because it's drawing your eye to numbers that are moving to the right. So here we're seeing kindergarten readiness for fall as compared to spring. And we're really seeing that the spring numbers are better than the fall.

So this is actually a scatter plot where you just artificially make those lines. So creative arts is line one, science is line two, mathematics is line three, etc. And you're artificially adding Y values to separate out the data.
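The artificial Y values can be generated rather than typed by hand. In this sketch the function name `dot_plot_points` and the fall and spring numbers are hypothetical; only the category names come from the example on the slide.

```python
def dot_plot_points(categories, before, after,
                    before_label="fall", after_label="spring"):
    """Return (x, y, label) scatter points for a dot plot.

    Each category gets an artificial y value (1, 2, 3, ...) so that its
    two dots sit on the same horizontal line.
    """
    points = []
    rows = zip(categories, before, after)
    for y, (name, b, a) in enumerate(rows, start=1):
        points.append((b, y, f"{name} ({before_label})"))
        points.append((a, y, f"{name} ({after_label})"))
    return points


# Category names from the slide; the percentages are made up.
points = dot_plot_points(
    ["Creative arts", "Science", "Mathematics"],
    before=[40, 35, 30],
    after=[70, 60, 65],
)
for point in points:
    print(point)
```

Plotting these as an ordinary scatter plot, with a line joining each pair that shares a y value, gives the dot plot.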

[Slide 24] Small multiples we saw before. Basically you make one plot, stick in data set A, you copy it, you change out the data set for B, you copy it again, you change out the data set for C. And all of a sudden we're able to compare across different data directly between multiple graphs.

Small multiples are really useful if you are plotting a ton of lines on a graph. Think about doing a small multiple because otherwise it can be overwhelming. The key to a small multiple as I said before, your axes have to be identical between all of the graphs.
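One way to keep the axes identical is to compute the limits once, across all of the panels, and apply them to every copy of the chart. The function name `shared_axis_limits` and the data below are hypothetical; this is a sketch of the idea, not a charting recipe.

```python
def shared_axis_limits(panels):
    """Return ((x_min, x_max), (y_min, y_max)) across every panel.

    `panels` maps a panel name to a list of (x, y) points.
    """
    xs = [x for pts in panels.values() for x, _ in pts]
    ys = [y for pts in panels.values() for _, y in pts]
    return (min(xs), max(xs)), (min(ys), max(ys))


# Made-up life expectancy values for two panels.
panels = {
    "Country A": [(2000, 70.1), (2015, 74.3)],
    "Country B": [(2000, 55.0), (2015, 61.2)],
}
x_limits, y_limits = shared_axis_limits(panels)
print(x_limits, y_limits)
```

Every small multiple would then be drawn with these same x and y ranges instead of letting each panel pick its own.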

[Slide 25] Number seven, bar chart with benchmark line. So this is really useful for when you want to plot something as compared to a benchmark such as an average. So we're looking at religion in the United States as compared to higher education, how much higher education people have by different religions. And there's that nice line that literally you can just throw a line on your graph. And it has to be in the right place of course. But there, you have a way to compare against that average, that benchmark. This might be good for toxicity. What is the EPA standard, and where is your measurement of toxicity of a particular compound in the water as compared to the EPA standard?

[Slide 26] All right, number eight is the combo chart. It's very similar to the bar chart with benchmark line, but that benchmark might change over time. The key with the combo chart is don't do a dual axis. And I'm not really fond of this combo chart in particular. I think it has some issues. But now you can actually compare two things on here, and that's why it's called a combination. But as I said, sometimes people do combination charts and they do two axes. And that just makes it really, really, really hard to understand. So if you're going to do a combo chart, use only one Y axis and have all of the data be on the same scale.

But sometimes it's necessary. You want to say, maybe your benchmarks change over time. Maybe you're doing something with data over time. And every year, your benchmark is different and you need to show that. A combo chart might work.

[Slide 27] All right, finally a histogram. A histogram is just a column chart without any gaps between the bars. And that basically tells you that all of this data is on a spread. It's distributed and there's no gaps in it. So it's really just a way for you to take a column chart and make it visually show how everything is really just a spread of information. In Excel if you're going to do this, you basically set 'gap width' to be zero.
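Before the bars can be drawn, the data has to be counted into equal-width bins. Here is a minimal sketch of that step; the function name `histogram_counts`, the bin settings, and the sample values are all hypothetical. Empty bins in the middle are kept with a count of zero so the columns stay contiguous.

```python
def histogram_counts(values, bin_width, start=0):
    """Count values into equal-width bins beginning at `start`.

    Returns a list of (bin_left_edge, count), including empty bins
    between the first and last occupied bin.
    """
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = {}
    for value in values:
        index = int((value - start) // bin_width)
        counts[index] = counts.get(index, 0) + 1
    if not counts:
        return []
    first, last = min(counts), max(counts)
    return [(start + i * bin_width, counts.get(i, 0))
            for i in range(first, last + 1)]


# Made-up values, binned by decade starting at 50.
ages = [52, 58, 61, 64, 67, 71, 72, 74, 78, 79, 81, 83]
bins = histogram_counts(ages, bin_width=10, start=50)
print(bins)
```

Each pair is one column: the left edge of the bin and how many values fell into it.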

[Slide 28] Okay. So I just went through a lot of information. We're going to take a little break for a five to 10 minute activity here. It would help if you have a pencil and some scratch paper, or you can just kind of take notes on your computer.

So we are going to talk about this data set and we're going to walk through that process that I just talked about in terms of choosing your message. Figuring out what data type that is, and kind of sketching out a little chart, picking a chart type and sketching out a little chart.

So I pulled some data from Pew on Americans and privacy. And here is the data set. It's the percent of people who feel that they have, a lot, a little, or no control over who can access the following types of their information. I apologize, this slide is very busy. And if I was doing this in person, I wouldn't do this to you. But I want to make sure you have everything on one slide on your computer so you can see.

[Slide 29] So the data is on the top right. I'm bringing back, sort of. You can sort of see that. Do you see all this, I'll make sure you can see that last row. The change over time is on the bottom of that chart type table. So what I want you to do for this activity is just take a couple minutes, look at the data in the top right. And there's no right or wrong answer here. What interests you? What do you want to say about this data? What message are you going to pull out?

When you decide on a message, think about what type of data that you have based on that message. And then pick a type of chart and draw a preliminary sketch. So what I'm going to do is going to give you five minutes to kind of sketch through this, and then we'll come back and we'll do the second half on thinking about how to make design decisions about your chart. Please do let me know if you have any questions by popping them in the chat.

We'll do one more minute here and then we'll reconvene.

All right, well thank you to everyone who kind of stuck through this. Thank you for kind of working through this activity. I think it really helps to reinforce this process. If we were doing this in person, I would ask people to share, but I won't make people try to figure that out over Zoom today. So let me at least give you an example of how I might interpret this.

[Slide 30] So for me being a librarian, I'm really interested in how many people feel that they don't have control over the search terms they use online. So my message is that nine out of 10 Americans say they have little to no control over who can access the search terms they use online, which is really staggering to me. And that's what I really want to highlight out of all this data.

So looking back at this overwhelming slide here. So this is really just a single number. And I think given my passion for icon arrays, I think I'm actually going to visualize it as an icon array. So here's my little icon array. Literally just did this in PowerPoint. It's nothing fancy. But if, those people are really big on my screen, wow. But kind of gets the point across very simply that nine out of 10 Americans say they have little to no control over who can access their search terms they use online. So nothing complicated. Just a little figure that tells my message. So I went from message, to data type, to chart type, to this figure.

[Slide 31] So this figure, the data came from Pew. I also want to show what Pew does with data, and it's actually worth looking at how Pew visualizes their data. Because their figures are quite good. And we can pull a lot of good things from it. So here is the figure that I pulled a lot of the data from. They're doing a couple of really good things. So you can see right off the bat that they have a really nice title. They say about half of Americans feel as if they have no control over who can access their online searches. So this is the message, and that really determines how they have visualized this information.

Something else to look at. You can see that they are using a simplified color scheme. They've also sorted their information from the most people who say they have a lot of control to the fewest people who say they have a lot of control. And this really helps you contextualize and interpret that data better because you can make side by side comparisons. Another thing I like about Pew here is that they give you the source right on the bottom, and they give you some clarity, and they add annotations.

So these are just a few things I like about this figure. You can see they are visualizing all the information. But they've given you some structure in which to interpret it. I think it's very clean and it's very easy to scan and to understand.

[Slide 32] So we're going to take this as a good pivot point to talking about chart design. Last call for whatever questions you have on thinking about chart types because we're going to move into thinking about chart design. Throw them in the chat if you have them. Otherwise, we're going to go forward.

All right, let's talk about chart design. [Slide 33] And I've redone this section. I've taught data visualization for several years now, and I've redone the section to really focus on something that I've been reading about a lot recently, which is cognitive load. And I hope this helps you understand design. Because this really helped clarify design choices for me.

So I've been reading, literally have it right behind me, Stephen Few's "Show Me the Numbers". And he talks about cognitive load. And basically, the average person can hold only about three things in their working memory at one time. So we can liken this to a computer. You have your hard drive, which is long-term memory. It stores a lot of information. But you only have a limited amount of RAM, which you can actually do your immediate computations using. So you can move between, you're processing something in RAM, you can save it to the hard drive. You can pull things off the hard drive and put them in RAM. But RAM is limited. And people's brains operate in the same way.

So we can really only hold three pieces of information, roughly, based upon research, in our working memory at one time. We can process that information and put it into long-term memory. But when we're coming into contact with new information such as that in a figure, we really need to think about how we can reduce the cognitive load.

Say we have five different data points in our figure, and there are five different colors, and you have a legend. And all of a sudden you're asking somebody to interpret five different colors and remember what each color stands for. And that's on top of trying to interpret this information. So you've really put a lot of cognitive load on your viewer.

So when we talk about making good design decisions, I'm going to be talking about things that will reduce cognitive load for a person looking at your chart so they can easily understand that chart and engage with it, and get to the point more quickly.

[Slide 34] So when we talk about good chart design, we want to highlight key information. So really draw the eye. We want to suppress or delete extra information, extra lines on our figures. So things that distract the viewer and we want to make it clean and visually easy to scan. And easy does not necessarily mean simple. You could have a complex chart. But if it's easy to scan, somebody's going to say, "That's a very lovely chart. I'm going to engage with it." So you can actually draw the viewer in by having a clean chart. But if you have a messy chart, it turns people away immediately. Because you're adding cognitive load, it makes it harder for them to engage with because they have to do extra work.

One rule of thumb is, at least in Western societies, we tend to read in a Z pattern. So we start at the top left, read across, jump down to the next row to the left again, and go across. So one way to think about how people might be interpreting your charts is to say what's most important? We give a good title, we start in the top left. And we work our way down to the bottom right.

[Slide 35] So these are several strategies for reducing cognitive load. This is how I think about it, but these aren't set in stone. And we're going to be talking about each of these strategies in the second half here.

So number one, identify what is important. Number two, remove chart junk. And this is a term that comes from Edward Tufte, who wrote a very well known book in the '80s. The name is escaping me right now. Number three, draw the viewer's eye, and number four, be consistent. And all of these strategies will help reduce the cognitive load and make your charts visually appealing.

[Slide 36] We're going to be using this example figure. This was published recently. I'm kind of in love with this figure because the 2014 data in this figure is actually my data. And I published a paper in 2015, shared the data. And somebody went and took my open data set and used the same methodology and combined and collected that data about libraries in 2019. And did a comparison and published it.

Now this chart is perfectly fine. You can kind of tell it uses the default Excel style, and colors, and output. So it's not bad. I'm just wondering if we can improve it.

[Slide 37] So what we're going to do is we're going to start with a message. And this message talks about university library data support because I'm interested in that. And I want to show that it increased from 2014 to 2019. So basically, the orange bars in this graph are bigger than the blue bars for three of the four categories. And that's what I want to talk about.

[Slide 38] So I'm going to be using a slope graph here because I want to talk about the change, the increase over time. And this, when I do a slope graph and make sure my data is accurately in the right place here, is the default output in Excel. So I'm doing all of this in Excel, and I'm going to walk you through all of these four steps that I've outlined for charting this data in Excel. So the original graph on the left, and here's my starting point in Excel on the right.

[Slide 39] So the first thing I want to do is figure out what's important. So you can see that I gave it a helpful title to really reinforce what's important. So data support at the AAU libraries increased from 2014 to 2019. And that increase is really what I'm talking about. So we're talking about those three lines out of the four that increase from left to right from 2014 to 2019. And those are really what I'm going to highlight and make decisions around in the next three steps.

[Slide 40] Remove chart junk. You can see going back and forth here. I've reduced a lot of stuff on this chart, and I want to walk you through it. So basically, this comes from Edward Tufte again. He really thinks that any extra ink is a waste. And I'm not going to 100% agree with him, but extra ink can lead to extra cognitive load.

And you'll notice I've done a couple things. So on the background, I've gotten rid of those lines, those horizontal lines. And I've actually gotten rid of that Y axis and just literally labeled the data instead. This is really depending on your audience, you might choose to label the data. You might choose to keep your axes. Some of these are really design decisions at this point, depending on what you want to do and how you think your audience is going to interpret it. So your numbers might be really important to put on the chart. It might be better to have an axis. And this is where it really comes down to understanding what you're trying to visualize, and who you're trying to visualize it for.

The other thing I've done is actually deleted the legend. And I am not 100% anti-legend, but legends can add cognitive load. Because you're going back and forth between the legend and the lines on your chart. But here, I'm actually literally labeling the lines on my chart. And you can say, "Okay, this line is data services on the top, and the line on the bottom is data repository." And it actually eases the cognitive load for somebody who's trying to understand it, because they're not going back and forth across the chart.

So all of these decisions, I'm kind of keeping what's necessary for the structure of my chart. But I'm really trying to reduce the amount of ink, reduce the extra information that I'm putting on here. Reduce the cognitive load for somebody coming to this figure. So that was step two, remove chart junk.

[Slide 41] Step three, draw the viewer's eye. And this is where we talk about some of the design things that people tend to think of when they think of data visualization, mainly color. So you can see I, between the previous step and this step, I've added those dots. This was a visual design decision you may not agree with. I kind of like the weight of the dots to really show where those data points are. I've also made that third line gray.

So gray is a great color to use when you're visualizing things. Particularly if you have information that you want to show on your graph as kind of a baseline, as other information, as information that's not central to your message but needs to be there. Gray is really helpful. And then you can use color and use it very sparingly. But color is really the thing that draws your eye, that is for emphasis.

So you can use several different shades of gray and a color or a couple different colors, or different shades of blue for example. To really be strategic about your use of color. Because color is one of those things that's really going to draw the viewer's eye.
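A minimal sketch of this gray-plus-one-accent idea, not taken from the workshop itself. The helper name `assign_colors`, the two hex values, and the series names are all assumptions made for illustration; the point is only that color is reserved for the series you want to emphasize and everything else falls back to gray.

```python
# Hypothetical helper: one accent color for emphasized series, gray for the rest.
GRAY = "#9e9e9e"    # assumed neutral gray for context series
ACCENT = "#1f77b4"  # assumed accent color (a blue)


def assign_colors(series_names, emphasized):
    """Return {series name: hex color}, using color only for emphasis."""
    return {
        name: ACCENT if name in emphasized else GRAY
        for name in series_names
    }


# Placeholder series names, loosely modeled on the lines discussed above.
series = ["Data services", "Data in IRs", "Data repository"]
colors = assign_colors(series, emphasized={"Data services", "Data repository"})

for name, color in colors.items():
    print(f"{name}: {color}")
```

The resulting dictionary can be passed to whatever charting tool you use, so the emphasis decision lives in one place instead of being repeated per line.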

A couple of things you do want to avoid though. Pairing red and green, and other color combinations that are commonly a problem for people who are colorblind. Red and green in particular seems to be the one that people use a lot that is impossible for, not every colorblind person, but a lot of colorblind people to see. Also pink for women, blue for men. Gender is not binary. And pink and blue, that's old. Pick different colors.

One thing you do want to make sure you do, and I've definitely run into this before. Please, please print out your charts in black and white. People are not going to pay the money to print all of your articles in color. So it's worth printing everything out in black and white to make sure that you have enough contrast between the colors in your figure, so that people can still understand them if they're printing them out in black and white. Because this will crop up, people. People are just going to print in black and white. You want to make sure your figures still hold up.
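A rough way to screen colors before printing, offered as a sketch rather than a replacement for an actual test print. It converts each hex color to an approximate gray value using the common 0.299 / 0.587 / 0.114 luma weights and compares the results. The function names and the `MIN_DIFFERENCE` threshold are assumptions chosen for illustration, and real printers and PDF viewers may convert to grayscale differently.

```python
def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints in 0-255."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def approx_gray(hex_color):
    """Approximate gray level (0-255) using common luma weights."""
    r, g, b = hex_to_rgb(hex_color)
    return 0.299 * r + 0.587 * g + 0.114 * b


def gray_contrast(color_a, color_b):
    """Absolute difference between the two colors' approximate gray levels."""
    return abs(approx_gray(color_a) - approx_gray(color_b))


MIN_DIFFERENCE = 50  # arbitrary threshold, an assumption for this sketch

# A pure red / mid green pair, and a blue / light gray pair.
pairs = [("#ff0000", "#008000"), ("#1f77b4", "#d9d9d9")]
for a, b in pairs:
    diff = gray_contrast(a, b)
    verdict = "ok" if diff >= MIN_DIFFERENCE else "too similar in grayscale"
    print(f"{a} vs {b}: difference {diff:.1f} ({verdict})")
```

Under these weights the red and green pair lands on almost the same gray, which is one more reason to avoid it, while the blue and light gray pair stays clearly distinguishable.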

One other thing that you can do to draw the viewer's eye, and it's not something that I can do on this figure here. But something I showed you before. [Slide 42] So let's hop back to the combo chart. And I'm sorry that this is kind of visually jumping out of things here. But in the combo chart, you can see those gray lines are reordered. And I really have an issue with these labels on the bottom. That's a whole separate thing. You can't see all the labels for all of the different columns. But if you could see all the labels, you could actually do some pretty good comparison and interpretation between the different categories.

So if your X axis for example, doesn't have a particular order to it. If it's not time, for example. Time has a particular order, you want to keep that order. If you're just using categories, really think about reordering your data from most to least, or least to most. Because when you do that, you're putting those bars right next to each other and letting the viewer be able to make comparisons between bars that are very, very, very similar in length. That might be hard to compare if they were on different sides of the figure.
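A small sketch of reordering unordered categories by value before plotting. The category names and counts are made up, and `order_by_value` is a hypothetical helper; the text bars are only there so the effect is visible without any charting library.

```python
# Made-up counts for categories with no inherent order.
counts = {"Category A": 12, "Category B": 30, "Category C": 29, "Category D": 7}


def order_by_value(values, descending=True):
    """Return (label, value) pairs sorted by value, largest first by default."""
    return sorted(values.items(), key=lambda item: item[1], reverse=descending)


# Similar values (B and C here) end up next to each other, which makes
# small differences in bar length much easier to compare.
for label, value in order_by_value(counts):
    print(f"{label:<12} {'#' * value} {value}")
```

This only applies when the axis has no natural order; time series and other ordered categories should keep their original sequence.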

So hopefully this is a good example. I know I just kind of took us out of this main thread here. But you need to see an example of it to understand how effective it can be. [Slide 43] So if needed, rearrange your data based on the data values, if that makes sense. It doesn't always make sense. It wouldn't make sense to reorganize this figure for example, and take the chronology out of it and mix it up.

[Slide 44] Okay. Step four. Be consistent. This is where it really all comes together. And you can see between the drawing the viewer's eye step and the being consistent step, all of a sudden, things got really nice and easy for us to scan. And I did a couple of things. You can see I aligned everything on the left or the right. So all that text on the left is aligned. And it just reduces the cognitive load. My brain looks at that and says, "Ahhh." Because if you have four objects in a row and one of them is not in line, your eye is immediately drawn to that object. So doing something as simple as aligning content can really help reduce cognitive load.

The other thing that I'm doing is limiting my color palette, and I'm limiting font types and sizes. So I've probably got two or three different colors of gray in here. The title and the 2014 and 2019 are the darkest. The data itself is a gray, and I'm trying to remember if I did the numbers, the labels as a different color of gray. But it's all consistent. The other thing you can see between the previous slide and this one is I've made the labels consistent in color with the data. So I'm really reinforcing that this label goes with this data. So I think this is a much easier visual to look at, because everything is consistent.

The other thing I want to say here about consistency is wherever possible, make color assignments consistent across multiple figures. I've definitely read articles where I'm reading it and green means good and red means bad for example (and that's a bad color choice to begin with) for three figures. And then I get to figure four, and all of a sudden red means good and green means bad. And it's just so confusing for me as somebody who's trying to interpret this information because you've just added cognitive load. Because I've got my brain used to, I've put in permanent memory now, okay green is good. Red is bad. And you get to figure four. And now I have to think about that again. And now I have to keep that piece of information in my working memory as I understand these plots. So you've taken away one item that I can keep in my working memory. So these are just strategies for being consistent.
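One way to keep color assignments from drifting between figures is to define the mapping once and have every figure look colors up from it. This is a sketch under assumptions: the meanings, hex values, and the `color_for` helper are all hypothetical, and the palette deliberately avoids the red and green pairing.

```python
# Single source of truth for what each color means, shared by every figure.
MEANING_COLORS = {
    "improved": "#1f77b4",   # assumed blue
    "declined": "#e66101",   # assumed orange
    "unchanged": "#9e9e9e",  # assumed gray
}


def color_for(meaning):
    """Look up the shared color; fail loudly instead of improvising a new one."""
    try:
        return MEANING_COLORS[meaning]
    except KeyError:
        raise ValueError(
            f"No color assigned for {meaning!r}; add it to MEANING_COLORS"
        ) from None


# Two hypothetical figures that show the meanings in different orders.
figure_1 = {meaning: color_for(meaning) for meaning in ["improved", "declined"]}
figure_4 = {meaning: color_for(meaning) for meaning in ["declined", "unchanged", "improved"]}

print(figure_1)
print(figure_4)
```

Because both figures read from the same mapping, a given meaning cannot silently change color between figure one and figure four.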

[Slide 45] One more thing. I've gone through those four steps. We've talked about figuring out what's important, reducing extra information, drawing the viewer's eye, and making things consistent.

Finally, it's okay to annotate data on the graph. This might be your benchmark line. This might be adding numbers. For here, I want to actually talk about why that gray line is going down as opposed to going up. And the answer is that data in IRs is going down because data in repositories is going up. So that big increase in the bottom is actually causing that decrease in the middle.

So that's another thing you can do. You can literally explain things on a graph. May or may not be appropriate to your situation. And again, it's a design decision.

[Slide 46] Okay. So our original image is on the left, our new image is on the right. I think they're both fine, honestly. I think the one on the right is a little bit more consistent. I think to me it looks a little bit cleaner and it just makes my brain feel a little bit better because I think it reduces the cognitive load and helps me engage with that figure much more easily.

Some of this does come down to design. You might make different design decisions than me. But I hope that you still take some of those principles away in terms of making things consistent, in terms of using color strategically, in terms of reducing extra information on your chart that may or may not be necessary.

[Slide 47] Okay. We've gotten to the end of this process. We're getting to the end of this workshop. And I want to wrap up. We have another exercise we'll do just very briefly at the end, before we wrap up. I want to say one thing that you can do to figure out if your figures are working is just to test them. You can do this with yourself. You can take a break, walk away, come back. And think through how you interpret the figure after you haven't seen it for a while. What is your eye drawn to? How do you process it? Or you can give your figure to a friend and ask them to walk through step-by-step, what they see and what they understand. What their eye is drawn to, what information they see first, how they process that information, what they're having trouble with. And this is another really good tool in your toolbox for improving your data visualizations. Okay.

[Slide 48] The last thing I want to do, and I'm not even sure we're going to do all of this. Is to talk through improving a bad data visualization. Sorry to the Institutional Research Office. I found this on your website, and it is pretty bad. Basically talking about the different cohorts at Caltech and underrepresented minorities. Let's see, this is graduation rates for, I can't see that because my Zoom bar is on top here. Give me a second. Six year graduation rates. Okay. Thank you for letting me make that small, make that big again. So six year graduation rates for freshmen cohorts based upon different demographics. And this is just a lot to look at.

[Slide 49] So if I were thinking about how to improve this, I would start all the way at the beginning. And I would say, what am I really trying to do with this figure? Perhaps I do want to include all this information, but I want to actually say why am I showing all this information? What does that mean in terms of chart type?

For example, I want to talk about how different demographics look over time between 2006 and 2010, and the shift over time. So perhaps that means I'm going to use a small multiple for each different demographic category.

So here, the change over time is important. And actually here, we're going to be talking about each group individually and then having multiple figures for each group. And if I were doing this in person, I would draw a little, actually let's see. [Slide 50] Let's see if I can get my little, let's try drawing here. I'm going to have little small multiples here. Hello Zoom. So we've got little small multiples. And here we go.

And I'm just going to make up this data. So I'm going to be talking about how this data changes over time. Let's do orange, because Caltech is orange. I'm going to try to minimize this information. So we might have some information that goes like this, and some like this, and some like this. And we're basically just doing a comparison here.

So those orange lines are really what's important. That's what I want to compare. We're going to do things that make it easy to understand. So we might have, don't make that blue. That's awful. I'm going to make the underrepresented minority here. This is really interesting to do this over Zoom. And I might make these labels on the text so as to reduce cognitive load, instead of having a legend. I'm going to draw the viewer's eye. Perhaps I actually do want to talk about one particular graph, and talk about why this graph down here is so high. So maybe I want to talk about this one, to talk about why these numbers are so high. So let's do that with, let's put some text back on here to annotate here. These numbers are high.

So different design decisions about why I might go through this. This is really quite challenging to do over Zoom. But this is the kind of activity that I might do. I might start from scratch thinking about what my message is, how that breaks down to the chart type. Really think through what I'm trying to do. I really want to show these patterns over time. And therefore, the lines themselves are really important. And that's what I'm trying to highlight. And within that, I want to highlight this middle bottom chart. So I might do something extra to draw the viewer's eye to that one. Or I might put that at the top. I might use order to draw the eye, instead of putting it in the middle of the bottom, which happens to be where it is. I might rearrange them. And then finally, going through and making all of these graphs look very consistent, and making sure everything looks good.

That was really a very quick run through of what in person would be a group activity, but just kind of giving you a sense of how I might go through and approach this and change, this is going to be really interesting because I'm going to go back, change this clown barf. I use that term sometimes, it's clown barf. And change it to something that might be a little bit more streamlined and a little bit easier to understand. All right, we're going to clear this. Clear all my drawings. Okay. We're going to wrap up.

[Slide 51] I've mentioned a few readings here today. I've mentioned Stephanie Evergreen. Her book is really helpful for choosing the right chart for the right data. And she gives a lot of pointers for making non-sucky charts in Excel. Number two, the Nussbaumer book "Storytelling with Data" talks about using visualizations to augment the message overall. So using visualizations in context with text to support a story.

And then Stephen Few has a really nice overview of thinking about cognitive load. If you want more information about that, I really recommend that you look at Stephen Few. Unfortunately, "Storytelling with Data" is the only one the library has electronically. The other two, we have in print.

[Slide 52] Okay. I am going to open up the chat once again and paste the link [https://resolver.caltech.edu/CaltechAUTHORS:20200519-101809774] to the workshop that has the activities that we kind of did via Zoom, but has them on an activities handout. And it also has the slides for this session, and apparently the link is also on this slide here. Please, if you have questions, let me know at briney@caltech.edu. You can also, in the last couple of minutes here, post any questions that you have in the chat. And I really appreciate all of you sitting through this presentation and I hope you have learned something.