Statistics in I CAN Network’s 2023 Social Impact Report

In the last blog post, I briefly explained what I CAN Network’s peer mentoring programs were aiming to achieve and described how I have presented internal evaluation results in I CAN Network’s 2023 Social Impact Report. I represented changes in outcomes during the peer mentoring programs by comparing the distribution of responses before and after the program and using 100% stacked bar charts. From these results, we can calculate the difference in the proportion of positive responses before and after the program and test whether the result is significant and relevant. This is important for assessing whether the changes we are seeing in the peer mentoring programs are real or have arisen through chance.

In this blog post, I would like to provide some information on how to interpret the statistical results in the 2023 Social Impact Report so that you have a better understanding of how we have drawn the conclusions of the report.

Why we conduct statistical analyses

I CAN Network runs surveys and polls with Autistic young people (aged 5-20 years) before and after participating in the peer mentoring programs. In these surveys and polls, mentees rate how much they agree with statements relating to the program outcomes. We then compile these survey responses to look at how they are distributed before and after the program and calculate percentage changes in positive responses. We use these results to explain changes in outcomes among Autistic young people to different stakeholders.

A limitation of these surveys and polls is that they only sample a proportion of the Autistic young people who have come to I CAN Network’s peer mentoring programs and were willing to provide feedback. They do not encompass all Autistic young people attending these peer mentoring programs, let alone all Autistic young people across Australia. Statistics allows us to extend these survey results to describe what the general effect of the peer mentoring programs would be for all Autistic young people across Australia. It does this by testing how significant and relevant the improvements in outcomes seen during the peer mentoring programs are among Autistic young people.

The below figure summarises the purpose of statistics.

Statistics visual show the sample being a sub-set of the population, and an arrow going from sample to population to represent statistics.
How statistics can be used to generalise survey responses from the sample (Autistic young people surveyed) to the population (Autistic young people across Australia)

A range of statistical techniques can be used to test the significance and relevance of findings. The 2023 Social Impact Report uses three of them:

  1. Hypothesis testing: Hypothesis testing allows us to judge whether a difference seen in the sample is likely to reflect a real difference in the population or could plausibly have arisen through chance. This is represented by the p-value. If the difference is unlikely to have arisen through chance, then we say that the peer mentoring programs have a statistically significant effect on the outcome.
  2. 95% confidence intervals: Hypothesis testing does not give us the range of possible values that could be experienced by the general population. 95% confidence intervals not only give us that range, but also describe how confident we can be of the result.  
  3. Effect size: The effect size gives us a measure of how relevant our results are. This is done by standardising the changes in outcomes.

These statistical techniques produce numerical outputs that accompany the percentage increases. As an example, in the I CAN Imagination Club® mentoring program (the peer mentoring program for primary schools), we reported an 11% increase in positive responses towards the self-confidence statement. This is accompanied by a p-value of 0.006, a 95% confidence interval of [3%, 18%] and an effect size of 0.22.

What do these numbers mean, and how are they calculated? The next few sections will explain how these statistical outputs are calculated and interpreted.

Hypothesis testing

Distributing surveys to Autistic young people and analysing their responses allows us to calculate percentage increases in positive responses that indicate how much they have changed during the program. However, we are unsure of whether this result would have arisen through chance (i.e., we get a different result if we run the survey again), or if it is an effect that can be generalised to other Autistic young people across Australia. Hypothesis testing allows us to answer this question.

In hypothesis testing, we want to see whether there is a higher proportion of positive responses after the program compared to before the program. In other words, is p_after > p_before? To see whether that is the case, I used a two-proportion z-test. This statistical technique allows us to convert the difference in proportions into a standard z-value that can be used to determine the p-value. The p-value can be used to see whether the difference in proportions would have arisen through chance or not.

To calculate the z-value, we need two things: the percentage change in positive responses and the standard error (SE). The percentage change in positive responses is derived by calculating the proportion of positive responses before and after the program and taking the difference between the two. We can take the difference between the two proportions because we can rearrange the inequality from p_after > p_before to p_after – p_before > 0. If the difference between the two proportions is positive, it means there is a higher proportion of positive responses after the program compared to before.

At the same time, we calculate the standard error of the difference between two proportions. The standard error describes how much the calculated difference would vary if we hypothetically ran the survey many times with different samples. From there, we divide the difference between two proportions by its standard error to get the z-value.

We match the z-value to the z-distribution curve to get the area underneath the tail on one side of the curve. The area underneath one side of the curve is our one-tailed p-value: the probability that, if the program truly made no difference, chance alone would produce an increase at least as big as the one we observed. If we multiply the p-value by 2, we get a two-tailed p-value, which describes the probability that chance alone would produce a difference (positive or negative) at least as big as the one we observed. We use the two-tailed p-value in our hypothesis testing to account for both increases and decreases in the proportion of positive responses after the program.

Shading of areas underneath one or both sides of the curve to represent one-tailed and two-tailed p-values respectively.
One-tailed and two-tailed p-values in a standard normal distribution
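
The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used for the report: the function name and the survey counts are hypothetical, and it assumes the pooled form of the standard error, which is a common choice for this test but is not stated in the report.

```python
import math

def two_proportion_z_test(pos_before, n_before, pos_after, n_after):
    """Return (difference, z-value, two-tailed p-value) for p_after - p_before."""
    p_before = pos_before / n_before
    p_after = pos_after / n_after
    diff = p_after - p_before

    # Pooled proportion, i.e. the shared proportion we would expect if the
    # program made no difference (an assumption of this sketch).
    pooled = (pos_before + pos_after) / (n_before + n_after)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_before + 1 / n_after))

    z = diff / se

    # Area in one tail of the standard normal curve beyond |z|,
    # then doubled to cover both increases and decreases.
    one_tailed = 0.5 * math.erfc(abs(z) / math.sqrt(2))
    two_tailed = 2 * one_tailed
    return diff, z, two_tailed

# Hypothetical counts (NOT taken from the report): 150 of 300 positive
# responses before the program and 183 of 300 after it.
diff, z, p = two_proportion_z_test(150, 300, 183, 300)
print(f"difference = {diff:.2f}, z = {z:.2f}, two-tailed p = {p:.4f}")
```

Because the counts are made up, the output will not match the report's figures exactly; the point is only to show how the difference, the standard error, the z-value and the two-tailed p-value fit together.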

If the two-tailed p-value is 0.05 or below, the result is considered to be statistically significant. In other words, it is unlikely that the increased proportion of positive responses arose randomly. This result indicates that the peer mentoring programs may have contributed to positive outcomes among Autistic young people, something that is evident in our analyses of comments and mentee creations for the 2023 Social Impact Report.  

Going back to our sample self-confidence result, we reported a p-value of 0.006. This means that, if the program had no real effect, there would be only a 0.6% chance of seeing a change in self-confidence of 11% or more (in either direction) through chance alone. Given the small chance of that happening, we would likely conclude that the I CAN Imagination Club® mentoring program had an effect on students’ self-confidence.

Confidence intervals

A hypothesis test might indicate that a specific result is statistically significant as indicated by a low p-value. However, it does not give us the range of possibilities that could be experienced by the general population as a result of the program. This is where the 95% confidence interval comes in. The 95% confidence interval indicates a range of possible values where we are 95% sure the true percentage increase lies. The size of the confidence interval can tell us how confident we can be of the result:

  1. A small 95% confidence interval indicates a small margin of error, indicating a high level of confidence in the result.
  2. In contrast, a large 95% confidence interval makes us less certain of the result as the true percentage increase could take a wide range of values.

To calculate the lower and upper bounds of the 95% confidence interval, we multiply the standard error by 1.96 and subtract or add it to the observed percentage increase respectively. The lower and upper bounds of the 95% confidence interval are encased in square brackets to indicate the range of possible values of the true percentage increase.

Showing the 95% confidence interval of [3%, 18%], with end-points shaded, on a number line.
A visual of the 95% confidence interval for increased self-confidence
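
As a rough illustration of the calculation described above, the Python sketch below multiplies the standard error by 1.96 and subtracts it from, or adds it to, the observed increase. The function name and counts are hypothetical, and the sketch assumes the unpooled standard error that is conventional for confidence intervals; the report does not say which form it used.

```python
import math

def difference_ci_95(pos_before, n_before, pos_after, n_after, z_crit=1.96):
    """Return (lower, upper) bounds of the 95% CI for p_after - p_before."""
    p_before = pos_before / n_before
    p_after = pos_after / n_after
    diff = p_after - p_before

    # Unpooled standard error of the difference (an assumption of this sketch).
    se = math.sqrt(
        p_before * (1 - p_before) / n_before
        + p_after * (1 - p_after) / n_after
    )

    margin = z_crit * se  # the margin of error on either side of the estimate
    return diff - margin, diff + margin

# Hypothetical counts (NOT taken from the report).
lower, upper = difference_ci_95(150, 300, 183, 300)
print(f"95% confidence interval: [{lower:.1%}, {upper:.1%}]")
```

With larger samples the standard error shrinks, so the interval narrows around the observed increase.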

In our sample self-confidence results, we had a 95% confidence interval of [3%, 18%]. This means we are 95% confident that the true increase in self-confidence among students attending the I CAN Imagination Club® mentoring program lies somewhere between 3% and 18%. This confidence interval gives us the range of improvements in self-confidence that the I CAN Imagination Club® mentoring program could plausibly deliver among Autistic young people in Australia.

Effect size 

A result could be statistically significant as indicated by a low p-value. However, this result might not be relevant in real life as the change is too small to have an impact on the population of interest. This is particularly true when we survey hundreds or even thousands of people.

This is where effect size comes in. The effect size describes whether the increase in the proportion of positive responses has an impact on real life. To do this, we use Cohen’s h which describes how far off the proportion of positive responses after the program is from before the program. A larger h value indicates bigger changes in outcomes. We use cut-offs of h = 0.2, h = 0.5 and h = 0.8 to indicate small, medium and large effect sizes respectively.

In the 2023 Social Impact Report, Cohen’s h for most outcome statements ranges from 0.15 to 0.35, representing negligible to small effect sizes. However, given that we survey around 300 to 500 students per time point, getting a small effect size is still good in a policy setting, as it describes a sizeable change that is happening in a large population.

Effect size labels with Cohen's h cut-off values.
The effect sizes, along with thresholds of Cohen’s h
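
For readers who want to try this themselves, here is a small Python sketch of Cohen’s h, which is defined as the difference between the arcsine-transformed proportions. The function names and the example proportions are hypothetical, and the "negligible" label for values under 0.2 is this sketch's own wording; the other labels follow the cut-offs given above.

```python
import math

def cohens_h(p_before, p_after):
    """Cohen's h: difference between arcsine-transformed proportions."""
    return 2 * math.asin(math.sqrt(p_after)) - 2 * math.asin(math.sqrt(p_before))

def effect_size_label(h):
    """Label an h value using the cut-offs of 0.2, 0.5 and 0.8."""
    size = abs(h)  # the direction of the change does not affect its size
    if size < 0.2:
        return "negligible"  # label chosen for this sketch
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"

# Hypothetical proportions (NOT taken from the report): 50% positive
# responses before the program and 61% after it.
h = cohens_h(0.50, 0.61)
print(f"h = {h:.2f} ({effect_size_label(h)})")
```

The same labelling function can be applied directly to a reported value, for example `effect_size_label(0.22)`.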

Going back to our sample self-confidence result one last time, the effect size (Cohen’s h) is 0.22. As it falls between the Cohen’s h thresholds of 0.2 and 0.5, the effect size is considered to be small. However, given our 11% increase in self-confidence over 300 students, we still have a notable improvement in self-confidence that would be relevant in the real world.

Conclusion

This blog post provides an overview of the statistical techniques that were used in the 2023 Social Impact Report. The result is a series of numbers that accompany the percentage changes to underline the significance and relevance of the findings. These analyses are important in providing an evidence base for the effectiveness of I CAN Network’s peer mentoring programs for Autistic young people across Australia.

Personal disclaimer

This blog post was written by James Ong in his personal capacity. The content, views and opinions represented in this blog post are solely my own and do not reflect those of I CAN Network Ltd. 
