[MUSIC] In this lecture, we'll talk about the concept of effect sizes. I'll explain what they're good for, how you can use them, and why you should report them. In the next lectures, we'll dive into more detail on two families of effect sizes: Cohen's d and correlations.

There are some people who think that effect sizes are, in essence, the main thing we're interested in when we perform a study. There might be a lot of attention to the specific level of the p-value, but Cohen was one of the first to point out the importance of effect sizes in the scientific literature, and he calls them the primary product of a research inquiry: basically the main thing you want to know. So it's important not just to report effect sizes, but also to correctly interpret them.

Effect sizes serve three main goals. First of all, they allow you to communicate the practical significance of a result. An effect might be statistically significant, but does it actually matter in practice? Second, effect sizes allow you to draw meta-analytic conclusions. If there are a number of studies in the published literature, you can group all these studies together, calculate the meta-analytic effect size, and get the best estimate of the true effect size in the population.

If we take a look at simulated studies in which we perform independent t-tests with 25 participants in each group, you can see that there's quite a lot of variation in the effect size estimates. In a forest plot, each square marks one study's effect size estimate, and you can see that the squares jump around on the axis, which represents the effect size. In this case there is no true effect in the population, so the true effect size is zero. But because we've drawn random samples from this population, the effect size estimates in the individual samples range from reasonably high to reasonably low, even though the true effect size is zero.

Finally, effect sizes allow you to perform a power analysis. When we talk about controlling Type II errors, it's important to think about what the true effect in the population might be. An effect size tells you this, and you can use it in a power analysis to determine the number of participants a study would require to achieve a certain probability of finding a true effect if it's really there. G*Power is free software for these calculations, and I'm not going to explain exactly how you should use it; we'll talk about this in an assignment. I just want to highlight the field where the effect size, Cohen's d, needs to be entered if you want to perform the power calculation.

There are different types of effect sizes. Unstandardized effect sizes are basically just the difference that you've observed between two groups. Let's say you measure how quickly I can run down the corridor, you measure how quickly someone else runs down the corridor, you do this multiple times, and you want to see which of us is faster. In this case, there's no need to calculate any special type of effect size. You can just use the time as an unstandardized effect size. Everyone understands how to interpret time, so time is a very logical scale to use: if I'm two seconds faster, you know how to interpret this.
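To make the forest plot example from a moment ago concrete, here is a minimal sketch (not code from the lecture; the random seed and number of simulated studies are just illustrative) that repeatedly draws two groups of 25 participants from the same population, so the true effect size is zero, and computes Cohen's d and a p-value for each simulated study:

```python
# Simulate the situation behind the forest plot: many two-group studies
# (n = 25 per group) drawn from one population with no true effect,
# yet each sample yields a different Cohen's d.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)  # illustrative seed

def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) +
                  (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

n_per_group, n_studies = 25, 20
for study in range(1, n_studies + 1):
    a = rng.normal(loc=0.0, scale=1.0, size=n_per_group)  # both groups come from
    b = rng.normal(loc=0.0, scale=1.0, size=n_per_group)  # the same population
    t, p = stats.ttest_ind(a, b)
    print(f"study {study:2d}: d = {cohens_d(a, b):+.2f}, p = {p:.3f}")
```

Plotting these d values with their confidence intervals would reproduce the kind of forest plot described above: estimates scattered around zero purely because of sampling variation.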
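The same idea drives the power analysis step. As a rough illustration (this is not G*Power itself; it assumes the statsmodels library, and the effect size of d = 0.5 is just an example value, not one from the lecture), you can solve for the required sample size per group like this:

```python
# Solve for the sample size per group of an independent-samples t-test,
# given an assumed true effect size, alpha level, and desired power.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,        # assumed true Cohen's d
                                   alpha=0.05,             # Type I error rate
                                   power=0.80,             # desired power
                                   alternative='two-sided')
print(f"required participants per group: {n_per_group:.1f}")
```

For d = 0.5, an alpha of .05, and 80% power, this returns roughly 64 participants per group; assuming a smaller true effect size drives the required sample size up quickly, which is why the effect size field matters so much in these calculations.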
Now, not all research allows you to use unstandardized effect sizes. For example, think about people in the literature who study happiness. They might measure happiness by asking people, "How happy are you at this moment?", on a scale ranging from 1, not happy at all, to 7, very happy. Other people in the literature might use a five-point scale to measure happiness. We can immediately see that a happiness score of 4 does not have any fixed meaning; it really depends on the scale that was used. So if we want to compare the size of the difference between these studies, we need to find a way to standardize these differences, and that's what a standardized effect size does.

It's very important to interpret the effect size of a study, because a statistically significant effect alone might not be a real reason to worry. Here is a good example. There was a study performed by Facebook in which they influenced how happy people would feel. They did this either by removing some very positive posts from people's timelines or by removing some very negative posts. They then measured how positive or negative the posts that people themselves typed on Facebook were. The study received quite a lot of attention, because the results showed that when people saw less positive information, they were slightly less positive in the Facebook messages they posted themselves. But if we really look at the data, not at whether the result was statistically significant but at the effect size, we can see that the effect size here was a Cohen's d of 0.001. That's a very tiny effect size. It was statistically significant because Facebook has a huge amount of data; they have millions and millions of Facebook posts to analyze. But the difference that they observed was so small that it was, for all practical purposes, almost meaningless. The effect on the emotional connotation of the words that people typed was so small that it amounts to one extra negative word for every 3,570 words typed in the condition where people did not see positive posts. Unless you type really, really long Facebook posts, this is not a noticeable effect at the individual level; it is only statistically significant over a huge number of people.

Still, people were really worried, and there was even a blog post arguing that, since we know mood is correlated with suicide, Facebook's experiment might actually have caused more people to commit suicide. But if you interpret the effect size, you can see that the effect is so tiny that it could hardly have any real-life consequences. So it's important to remember that effects can be statistically significant but practically insignificant. Just because there is a true effect doesn't mean that we need to care or worry about it.

On the other hand, it's also important to realize that small effects can have real-world consequences. In a meta-analysis of interventions for juvenile delinquency, researchers found that these interventions had a reliable but small effect size, a Cohen's d of 0.11. Even though we consider this a small effect, it has real-life consequences that change people's lives after the intervention: instead of about 50% of young people who leave jail going on to commit more crimes, this drops to about 45%. That might be a small difference, but it's a very impactful difference for these people.
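Returning to the happiness-scale example from earlier, here is a small sketch of what standardization buys you. The ratings below are simulated, made-up numbers, not data from any study mentioned in the lecture; the point is only that raw differences on a 7-point and a 5-point scale are hard to compare directly, while both can be expressed as a Cohen's d:

```python
# Express group differences measured on different rating scales as a
# standardized mean difference (Cohen's d), so they become comparable.
import numpy as np

def cohens_d(x, y):
    """Standardized mean difference using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) +
                  (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

rng = np.random.default_rng(1)  # illustrative seed
# Hypothetical happiness ratings on a 7-point scale (two conditions).
study_7pt_a = np.clip(rng.normal(5.0, 1.2, 50), 1, 7)
study_7pt_b = np.clip(rng.normal(4.4, 1.2, 50), 1, 7)
# Hypothetical happiness ratings on a 5-point scale (two conditions).
study_5pt_a = np.clip(rng.normal(3.6, 0.9, 50), 1, 5)
study_5pt_b = np.clip(rng.normal(3.1, 0.9, 50), 1, 5)

print(f"7-point scale: d = {cohens_d(study_7pt_a, study_7pt_b):.2f}")
print(f"5-point scale: d = {cohens_d(study_5pt_a, study_5pt_b):.2f}")
```

The raw mean differences live on different scales, but the two d values are on the same standardized metric, which is what makes comparison and meta-analysis across studies possible.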
It's also important to interpret effect sizes because they can be implausibly large. We just talked about small effect sizes that may not be practically relevant, but big effects should also make you wonder what is really being studied and whether the result makes sense. Let's look at one example. In this study, researchers found a correlation between the suicide rate of white individuals and the amount of country music that was played on the radio. This study won an Ig Nobel prize, which is a prize for studies that first make you laugh and then make you think. In this case, I think the study should make you think about effect sizes, and about how implausibly large effect sizes might indicate that this is not a reliable effect but just a fluke. Indeed, it turns out that follow-up research was not able to confirm this result. If you look at the effect size, there is a correlation of .54, which is a large correlation: a sizeable, noticeable effect. It should make you wonder whether it's likely that listening to country music on the radio really has such a huge effect on the likelihood that people will commit suicide. If you see an effect size this large, you should say, "I don't think that's very plausible," and keep some doubts about the reliability of the finding.

There are two main families of effect sizes. On the one hand, there is the Cohen's d family, which consists of standardized mean differences. On the other hand, there is the r family, which consists of measures of the strength of association, such as correlations. In the next two lectures, we'll go into more detail about both of these families.

In this lecture, we talked about the importance of reporting and interpreting effect sizes. Effect sizes are arguably the most important outcome of a study. The p-value tells you something about whether there is a signal in the noise, but even when there is only noise, you might still want to accurately estimate that the true effect is very close to zero. So you should always report effect sizes in your results section, and, more importantly, you should also always interpret the size of the effect. [MUSIC]