Graphing in I CAN Network’s 2023 Social Impact Report

I CAN Network is an Autistic-led social enterprise that aims “to prove what Autistics CAN do”. It is working towards this aim by running a variety of programs that engage Autistic people and their supporters (for example, parents, carers and teachers). In particular, I CAN Network runs peer mentoring programs in school and online environments for Autistic young people aged 5-20 years to come together and build their social connections, improve their self-esteem and develop life skills. It also runs professional development programs for schools to create safe environments for Autistic young people to be themselves and engage in school.

Over the past 10 years, I have been working on internal evaluations for I CAN Network. I planned and conducted evaluations measuring outcomes of I CAN Network’s programs and wrote numerous reports describing the results of these evaluations. These reports describe the outcomes I CAN Network has achieved in its programs, particularly peer mentoring programs for Autistic young people. While most reports are delivered to state governments and funders, I also wrote Social Impact Reports that are publicly available on I CAN Network’s website, with the 2023 edition being recently released.

With the 2023 Social Impact Report being the last evaluation report I will write for I CAN Network, in this blog post I will describe why I have designed the graphs in a particular way to make them easy for the reader to interpret and understand. 

What are the aims of I CAN Network’s programs?

I CAN Network runs school and online peer mentoring programs for Autistic young people aged 5-20 years. At school, I CAN Network runs I CAN Imagination Club® and I CAN School® mentoring programs for Autistic young people in primary and secondary schools respectively. These programs are run during school hours. In comparison, I CAN Network runs I CAN Online, delivering online mentoring programs after school for Autistic young people across Australia.

All peer mentoring programs aim to achieve outcomes in the following three areas:

  • Self-esteem: The peer mentoring programs aim to improve Autistic young people’s views of themselves and their Autism.
  • Social connection: Autistic young people in the peer mentoring programs interact with each other to establish social connections and friendships, as well as with their parents, teachers and mentors to develop support networks.
  • Skill development: This relates to Autistic young people acquiring and improving life skills such as communication and stress management so that they are more likely to advocate for themselves and manage their own lives.

These outcome areas are supported by a positive program environment, where Autistic young people feel safe to be themselves without feeling judged. This allows them to open up and interact with other people.

The below figure summarises the outcomes that I CAN Network’s peer mentoring programs aim to achieve.

Venn diagram of the three outcome areas of I CAN Network's peer mentoring programs, encompassed by a bigger circle representing 'positive program environment'.
A visual summarising the aims of I CAN Network’s peer mentoring programs

These aims provide a structure in which the peer mentoring programs can be assessed. One way this can be done is by running internal evaluations within the organisation.

Why I have not used mean scores in the report

I CAN Network runs internal evaluations by distributing surveys and polls to Autistic young people before and after the program. Each survey or poll contains statements relating to the outcomes of the program, where mentees rate on a 3- or 5-point Likert scale how much they agree with the statements. We then compare responses before and after the program to see whether mentees have changed during the program. These can be shown both numerically and graphically.

One way to compare responses is to convert them into a score, calculate the mean score before and after the program and compare them to see whether that has changed. For I CAN Imagination Club® mentoring programs, I can convert the yes, maybe and no responses into scores of 1, 0.5 and 0 respectively and calculate a mean score out of 1. Similarly, for I CAN School® mentoring programs, I can convert the responses into scores ranging from 1 (for strongly disagree) to 5 (for strongly agree) and calculate a mean score out of 5.
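
To make the scoring concrete, here is a minimal Python sketch of that conversion (the column names and response values are invented for illustration and are not I CAN Network’s actual data):

```python
import pandas as pd

# Illustrative survey responses only — not real program data.
responses = pd.DataFrame({
    "timepoint": ["before", "before", "before", "after", "after", "after"],
    "answer":    ["yes", "maybe", "no", "yes", "yes", "maybe"],
})

# Map the 3-point scale used in the I CAN Imagination Club® surveys to scores out of 1.
scores = {"yes": 1.0, "maybe": 0.5, "no": 0.0}
responses["score"] = responses["answer"].map(scores)

# Mean score, spread (SD) and number of responses before and after the program.
summary = responses.groupby("timepoint")["score"].agg(["mean", "std", "count"])
print(summary)
```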

We can compare the mean scores before and after the program in a table, along with the standard deviation (SD, which measures the spread of scores) and the total number of responses received.

Statement | Before: mean score (SD), total responses | After: mean score (SD), total responses
I can try new things | 0.74 (0.29), 331 | 0.80 (0.27), 303
I know what makes me special | 0.70 (0.37), 323 | 0.79 (0.34), 302
I think my brain is awesome | 0.69 (0.36), 327 | 0.76 (0.34), 298

We can also visualise the scores in a bar chart, placing the mean scores before and after the program alongside each other. We can rule black solid lines to separate the statements, making it easier to compare mean scores before and after the program within a statement.

Bar chart comparing mean scores before and after I CAN Imagination Club program over three outcomes
Mean scores before and after the program when the scale starts from 0

There are two main problems with using mean scores to compare outcomes. Numerically, calculating mean scores simplifies the responses too much. This makes it difficult to explain a specific mean score (“What does a mean score of 0.74 mean?”) or changes in a mean score (“What does a mean increase of X units mean?”) and relate them to the outcome. Furthermore, because we design our own statements instead of using a standardised tool, it is not possible to compare the changes to a specific standard to explain how relevant they are. The only way to explain the changes is to use statistics to assess whether the changes are statistically significant and relevant. This makes it difficult to explain to a lay audience how much outcomes have changed.

Graphically, it is hard to accurately visualise changes in mean scores. Scaling the scores from 0 makes it very hard to differentiate mean scores before and after the program. In contrast, scaling the scores from a higher base value distorts the differences before and after the program, deceiving the reader. For example, in the ‘I can try new things’ statement, the mean score only increased by 0.06 after the program. That difference is quite small when scaling the scores from 0, but it is magnified when scaling the scores from a higher base value (specifically 0.62), distorting the difference. 
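
The effect of the baseline is easy to demonstrate. The sketch below plots the same two mean scores from the table above (0.74 before, 0.80 after) twice, once with the axis starting at 0 and once starting at 0.62; matplotlib is used here purely for illustration:

```python
import matplotlib.pyplot as plt

# 'I can try new things': mean scores reported above (0.74 before, 0.80 after).
labels = ["Before", "After"]
means = [0.74, 0.80]

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(8, 3))

# Left: axis scaled from 0 — the 0.06 difference correctly looks small.
ax0.bar(labels, means)
ax0.set_ylim(0, 1)
ax0.set_title("Scaled from 0")

# Right: axis scaled from 0.62 — the same difference looks much larger than it is.
ax1.bar(labels, means)
ax1.set_ylim(0.62, 0.82)
ax1.set_title("Scaled from 0.62")

fig.suptitle("Same data, different baselines")
plt.tight_layout()
plt.show()
```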

For the above reasons, I decided not to describe changes in outcomes using mean scores. Instead, I represented the changes in a different way. 

Why I have used 100% stacked bar charts in the report

To represent changes in outcomes among mentees attending the peer mentoring programs, I decided to compare the distribution of responses before and after the program. For each timepoint, I calculate the proportion of responses belonging to a particular category out of the total number of responses received (excluding not sure and missing responses). I then bring the categories together to form 100% stacked bar charts. I use black solid lines to separate the statements and place the distribution of responses before and after the program next to each other to make it easier to compare them. 

100% stacked bar chart comparing the distribution of responses before and after I CAN Imagination Club over 3 outcomes
Changes in I CAN Imagination Club® outcomes, expressed as proportions of the total number of responses
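
For readers who want to build a similar chart, here is a minimal pandas/matplotlib sketch; the response counts are made up for illustration rather than taken from the report:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Illustrative response counts only — not the report's actual data.
counts = pd.DataFrame(
    {"yes": [210, 245], "maybe": [90, 50], "no": [31, 8]},
    index=["Before", "After"],
)

# Convert counts into proportions of the total responses at each timepoint.
proportions = counts.div(counts.sum(axis=1), axis=0) * 100

# stacked=True draws one 100% stacked bar per timepoint.
ax = proportions.plot(kind="bar", stacked=True, edgecolor="black", rot=0)
ax.set_ylabel("% of responses")
ax.set_ylim(0, 100)
ax.set_title("'I can try new things' (illustrative data)")
ax.legend(title="Response", bbox_to_anchor=(1.02, 1), loc="upper left")
plt.tight_layout()
plt.show()
```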

Using the distribution of responses is a better way to represent changes in outcomes compared to using mean scores. Numerically, it is easier to explain changes in outcomes by saying how much responses in a particular category have increased or decreased after the program. For instance, in the I CAN Imagination Club® graph above, we can compare the proportion of ‘yes’ responses before and after the program and calculate how much it has increased. We can see that, compared to before the program, mentees finishing the program show:

  1. an 11% increase in trying new things (‘self-confidence’);
  2. a 14% increase in knowing what makes them feel special (‘self-acceptance’); and
  3. a 9% increase in thinking their brain is awesome (‘neurodiversity acceptance’).

Representing outcomes as percentage changes makes it easy to explain to the reader how Autistic young people have changed in the peer mentoring program.

100% stacked bar chart comparing the distribution of responses before and after I CAN School over 4 outcomes
Changes in I CAN School® outcomes, expressed as proportions of the total number of responses

Graphically, showing the distribution of responses makes it easier to visualise how different responses shift over time. For instance, in the I CAN School® graph above, we can see that the proportions of ‘strongly disagree’ and ‘disagree’ responses have fallen, while the proportions of ‘agree’ and ‘strongly agree’ responses have risen. Visualising the distribution of responses in this way makes it easy for the reader to interpret the result to see whether Autistic young people are improving as a result of the program. It also does not deceive the reader as the graph is scaled from 0% to 100%, ensuring that the changes are properly represented.

Hence, using 100% stacked bar charts to show the distribution of responses allows us to retain all the information we have collected from our surveys and makes it easier to explain changes in outcomes to a lay audience. 

Visualising mentee demographics using different graphs

In the 2023 Social Impact Report, I have also visualised some demographic data to describe who participates in I CAN Network’s peer mentoring programs. I used two different graphs to visualise demographic data: pie charts and bar charts.

There has been some debate about whether pie charts are bad visualisation tools and whether bar charts should always be used when looking at proportions. In my opinion, pie charts are still useful for visualising proportions. However, there are some conditions that need to be met:

  1. It should show as few categories as possible (five at most).
  2. It should be as visually simple as possible, with the parts clearly signposted.
  3. The pie chart should be in 2-D instead of 3-D to prevent distortion of the data.
Pie chart showing the proportion of I CAN Online mentees belonging to different genders
Gender among I CAN Online mentees as a pie chart

For gender, I used a pie chart as it only has three categories (male, female, and trans and gender diverse), making it easy to show each gender’s share of the whole. I have also kept the pie chart as visually simple as possible.

  1. I have outlined each section in solid black and used a different colour to represent each gender.
  2. I have also included some data labels indicating what each section represents, along with the number of mentees and the proportion out of the total number of mentees.

Using a pie chart provides a clear picture of how mentees are split up over the three gender categories.
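
As a rough illustration of those design choices, the sketch below draws a three-category pie chart with black outlines and labels combining the category name, count and proportion; the mentee numbers are invented, not the report’s figures:

```python
import matplotlib.pyplot as plt

# Invented mentee counts purely for illustration.
genders = ["Male", "Female", "Trans and gender diverse"]
counts = [180, 140, 40]

fig, ax = plt.subplots()
ax.pie(
    counts,
    labels=[f"{g}\n(n={n}, {n / sum(counts):.0%})" for g, n in zip(genders, counts)],
    wedgeprops={"edgecolor": "black"},  # solid black outline around each section
    startangle=90,
)
ax.set_title("Gender among I CAN Online mentees (illustrative data)")
plt.show()
```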

Pie chart showing the proportion of I CAN Online mentees belonging to different Australian states and territories
States and territories among I CAN Online mentees as a pie chart

In contrast, I used a bar chart instead of a pie chart to represent Australian states and territories where I CAN Online mentees live. A pie chart is not appropriate as there are eight states and territories in Australia, resulting in numerous sections being crammed in one space. Using different colours to represent each section and data labels to identify different sections further increase the visual complexity of the graph. Hence, it is hard for the reader to comprehend what proportion of mentees live in a specific state or territory.

Bar chart showing the proportion of I CAN Online mentees belonging to different Australian states and territories
States and territories among I CAN Online mentees as a bar chart

In comparison, using a bar chart provides more room to show different categories. It is also easier to look at the length of each bar and align it to the y-axis to see what proportion of mentees live in a specific state or territory. The limitation of a bar chart is that it is hard to see how different states and territories can be pieced together to form a whole like what can be done in a pie chart.
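
A bar chart version is just as simple to sketch; again, the proportions below are invented for illustration:

```python
import matplotlib.pyplot as plt

# Invented proportions for the eight Australian states and territories.
states = ["VIC", "NSW", "QLD", "WA", "SA", "TAS", "ACT", "NT"]
pct = [38, 22, 14, 9, 8, 4, 3, 2]

fig, ax = plt.subplots()
ax.bar(states, pct, edgecolor="black")  # bars start at 0 by default, so lengths stay comparable
ax.set_ylabel("% of I CAN Online mentees")
ax.set_title("States and territories among mentees (illustrative data)")
plt.show()
```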

Hence, pie charts have their uses, but their benefits and drawbacks compared to bar charts need to be considered when deciding how to visualise the results.

Conclusion

In this blog post, I explained how I have designed the graphs that appear in the 2023 Social Impact Report. People may have different opinions on how data should be analysed and visualised. From my perspective, I place great importance on presenting the results as simply as possible so that everyone can easily grasp what they are seeing. This facilitates their understanding of how I CAN Network’s programs are making an impact on the Autistic young people attending them.

Personal disclaimer

This blog post was written by James Ong in his personal capacity. The content, views and opinions represented in this blog post are solely my own and do not reflect those of I CAN Network Ltd. 

Applying the value for investment framework to evaluate NDIS reform

As problems around the world mount, it is important to use resources in a way that maximises outcomes and minimises costs. How we assess that can be tricky. Thankfully, we can use the value for investment (VFI) framework to assess the value for money (VFM) of a program or policy. I was first exposed to the VFI framework while reading about it as a research assistant at the University of Melbourne. I was also lucky enough to attend a half-day workshop on the framework presented by Julian King, its creator, during the 2023 AES conference, and I did a bit of further research in preparation for this blog post.

In this blog post, I would like to give an overview of what I have learnt about the VFI framework, and apply it to a hypothetical evaluation of a reform that is currently happening.

Defining value for money (VFM)

Value for money (VFM) is commonly used when designing and assessing programs to measure how much value is produced and how much resources are consumed. However, different organisations have various ways of defining VFM. In the context of the VFI framework, VFM measures good resource use in terms of how well resources are used to produce outcomes. Essentially, VFM boils down to three key questions:

  1. How well are we using resources?
  2. Is the resource use creating enough value?
  3. How can we create more value from available resources?

Value in VFM does not just encompass money. It can also cover other aspects of a program, such as social, environmental and economic value; social objectives such as equity, sustainability and human rights; and resources invested such as people, power and inspiration.

An overview of the value for investment framework

The value for investment (VFI) framework is an evaluation framework that can help evaluators assess the VFM of a currently running program or policy. Rather than being an economic evaluation framework, the VFI framework is a process in which evaluators use evidence generated from mixed methods research and evaluative reasoning to judge the VFM of a program. Central to the VFI framework is its participatory nature, where various stakeholders, including end-users, are involved in defining how a program or policy will be judged. This involves meeting stakeholders frequently to hear and understand what they have to say.

Step-by-step flowchart showing (from left to right) the 8 steps in conducting an evaluation using the VFI framework.
A flowchart of the 8 steps in the value for investment (VFI) framework. Source

The VFI framework consists of 8 steps: the first four relate to formulating the evaluation design while the final four relate to conducting the evaluation itself.

  1. Step 1 is background research on the program, which often involves collecting and reading program documents.
  2. Steps 2 and 3 engage stakeholders to co-design rubrics that can be used to judge a program or policy. Each rubric consists of two elements:
    • Criteria: What stakeholders value; and
    • Standards: What evidence is needed to meet a certain level of performance for each criterion.
  3. Step 4 identifies what evidence needs to be collected to judge a program or policy and how that evidence will be collected. 
  4. Steps 5 and 6 collate the evidence behind a program, covering data collection and data analysis respectively, often employing both qualitative and quantitative research methods to understand what the evidence is saying.
  5. Step 7 makes sense of the evidence to render a judgement for each criterion, leading to an overall assessment of the VFM of a program or policy.
  6. Step 8 reports on the evaluation results and disseminates them to relevant stakeholders.

The result of using the VFI framework is an overall judgement of the VFM of a program or policy based on reasoning and evidence. 

The background behind the NDIS reforms

In this blog post, I would like to apply the VFI framework to plan a hypothetical evaluation of the National Disability Insurance Scheme (NDIS) reforms, a set of reforms that is only just beginning to be implemented. Before doing so, I would like to offer some background information on the NDIS and its reforms.

The National Disability Insurance Scheme (NDIS) is a social insurance scheme run by the National Disability Insurance Agency (NDIA) for people with disability in Australia. Since its launch on 1st July 2013, it has provided people with significant and permanent disability with individual funding from governments to purchase supports that meet their reasonable and necessary needs. Although the NDIS has made a difference to the lives of people with disability and their families and carers, a number of problems emerged while implementing the NDIS, such as cost blowouts, inconsistency in support and funding decisions made among different NDIS participants, and accompanying community and mainstream supports not being developed for people with disability, irrespective of their eligibility for the NDIS.

Hence, on 18th October 2022, the Hon Bill Shorten MP, Commonwealth Minister for the NDIS, launched an Independent Review of the National Disability Insurance Scheme (NDIS Review). The NDIS Review panel was asked to examine the design, operations and sustainability of the NDIS and report back to federal and state disability ministers. What distinguished the NDIS Review from other government inquiries was its deep engagement with the disability community to hear their views on the NDIS and its consideration of a number of challenges the NDIS was facing. 

Front cover of the final report of the NDIS review
The front cover of the final report of the NDIS review.

In tabling its final report in October 2023, the NDIS Review panel put forward 26 recommendations and 139 supporting actions to put people with disability back at the centre of the NDIS and to bring federal and state governments together to make the NDIS and disability services accessible and inclusive for people with disability. One of the key issues explored in the NDIS Review was the sustainability of the NDIS, which has been cast into doubt by the rising costs associated with the scheme. In its final report, the NDIS Review panel defined sustainability as follows:

“Where the NDIS provides supports that are reasonable and necessary, demonstrably net-beneficial, and cost-effective. Governance arrangements provide clear accountabilities for managing lifecycle costs and financial risks. Scheme expenditure is predictable and provides benefits to participants, carers and the broader community, ensuring that Australians remain willing to contribute to it in an enduring manner.” (emphasis added)

In other words, the panel stated that sustainability does not just cover the costs, but also the benefits of the NDIS. Improving sustainability of the NDIS is important to provide certainty that the NDIS will be available to present and future generations and to maintain the trust that Australians have towards the NDIS. Given that governments will be held accountable for achieving this goal, sustainability of the NDIS is something that can be evaluated using the VFI framework. 

Applying the VFI framework to NDIS reform

There are a number of tools that we can use to evaluate the sustainability of the NDIS. Cost-effectiveness analysis involves presenting the changes in costs and benefits as a result of NDIS reform before calculating the net change in cost per unit of benefit. In contrast, cost-benefit analysis converts all costs and benefits to monetary values before subtracting costs from benefits to measure how much is gained from implementing the NDIS reforms. The limitation of these techniques is that they are reductionist and overly simplistic, providing an incomplete picture of the key benefits of NDIS reform and how costs are being reduced.
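
To make the distinction concrete, here is a small worked example with hypothetical, rounded figures; none of these numbers are real NDIS costings:

```python
# Hypothetical, rounded figures purely to show the arithmetic — not real NDIS costings.
cost_before, cost_after = 42e9, 44e9            # annual scheme cost ($)
benefit_before, benefit_after = 50e9, 55e9      # monetised annual benefits ($)
participants_gaining = 60_000                   # additional participants reporting benefit

# Cost-effectiveness analysis: net change in cost per unit of benefit gained.
cost_per_extra_participant = (cost_after - cost_before) / participants_gaining
print(f"Extra cost per additional participant benefiting: ${cost_per_extra_participant:,.0f}")

# Cost-benefit analysis: everything in dollars, then benefits minus costs.
net_benefit_change = (benefit_after - benefit_before) - (cost_after - cost_before)
print(f"Change in net benefit: ${net_benefit_change:,.0f}")
```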

This is where the VFI framework can be used to evaluate NDIS reforms. By evaluating NDIS reform over a number of criteria, we can assess whether the NDIS is becoming more sustainable not only in terms of reduced financial costs and improved efficiencies, but also in terms of increased benefits delivered to all stakeholders. The participatory nature of the VFI framework also allows all stakeholders, particularly people with disability, to participate in the evaluation and to identify what they value the most. This is important given the call from the NDIS Review panel to put people with disability at the centre of the NDIS.

The VFI framework could be applied to evaluate the sustainability of the NDIS over the following steps: 

Step 1 (Understand the program): Initially, documents relating to the NDIS before and during the NDIS review would be collected. This would not only include the final reports of the NDIS review and other government inquiries, but also documents that are publicly available such as webpages and newspaper articles. Information from these documents would provide a bird’s-eye view of what needs to be considered when planning the evaluation, such as the context around the NDIS reforms and who we need to engage.

Step 2 (Criteria and standards): Numerous criteria can be used to evaluate NDIS reform. Relevant criteria would be identified by engaging stakeholders who are involved in or affected by the NDIS. These would include, respectively, policymakers and public servants in the NDIA, and people with disability who are eligible or ineligible for the NDIS along with their families and carers. It is important that the views of both groups are considered when planning and conducting an evaluation of NDIS reform.

These groups would meet separately over two engagement sessions. For each group, the first engagement session would involve identifying criteria to rate the NDIS reforms, while the second engagement session would come up with standards to assess each criterion. The end-result of these engagement sessions would be a set of rubrics that would allow NDIS reforms to be judged over a number of areas. For example, here are two rubrics on cost-effectiveness and equity that could be developed as a result of consultations with stakeholders. 

Cost-effectiveness

  • Excellent: The costs of running the NDIS are only increasing by 8% per annum or below. All people who are on the NDIS are benefiting from being on the scheme.
  • Good: The costs of running the NDIS are starting to decrease towards 8% per annum. Most people who are on the NDIS are benefiting from being on the scheme.
  • Adequate: The costs of running the NDIS are continuing to increase unsustainably at more than 8% per annum. The benefits of the NDIS are increasing for more eligible people with disability.
  • Poor: The costs of running the NDIS are continuing to increase unsustainably at more than 8% per annum. The benefits of the NDIS for eligible people with disability have not increased or have even decreased.

A hypothetical rubric for the cost-effectiveness criterion

Equity

  • Excellent: The NDIS is accessible to all people with disability eligible for the scheme, including everyone from underrepresented groups (for example, Aboriginal and Torres Strait Islanders, rural and regional areas). Mainstream and community disability services are accessible to all people with disability, regardless of their NDIS eligibility and location.
  • Good: Most people with disability can access the NDIS, with only a small number of eligible people with disability finding it difficult to access the scheme. Most people with disability can access mainstream and community disability services, regardless of their NDIS eligibility, though there are areas where these services remain inaccessible.
  • Adequate: People with disability can access the NDIS, but it remains inaccessible for people with disability from underrepresented groups (for example, Aboriginal and Torres Strait Islanders, rural and regional areas). Some people with disability can access mainstream and community disability services, but they remain inaccessible for most people with disability.
  • Poor: The NDIS remains inaccessible to all people with disability. Mainstream and community disability services remain inaccessible to all people with disability ineligible for the NDIS.

A hypothetical rubric for the equity criterion
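
As a sketch of how such rubrics might be carried into the later synthesis step, the criteria and standards could be encoded in a simple data structure; the descriptors below are paraphrased from the hypothetical rubrics above and are not an official rubric:

```python
# Encode the draft rubrics so the standard behind each criterion-level judgement
# can be looked up at the synthesis step. Descriptors are paraphrased, not official.
RUBRICS = {
    "cost-effectiveness": {
        "excellent": "Cost growth at or below 8% p.a.; all participants benefiting.",
        "good": "Cost growth falling towards 8% p.a.; most participants benefiting.",
        "adequate": "Cost growth above 8% p.a.; benefits increasing for more participants.",
        "poor": "Cost growth above 8% p.a.; benefits flat or decreasing.",
    },
    "equity": {
        "excellent": "NDIS and mainstream supports accessible to all, including underrepresented groups.",
        "good": "Most people can access the NDIS and mainstream supports, with some gaps remaining.",
        "adequate": "NDIS accessible, but not for underrepresented groups; mainstream supports patchy.",
        "poor": "NDIS and mainstream supports remain inaccessible to many people with disability.",
    },
}

def standard_for(criterion: str, level: str) -> str:
    """Return the standard that evidence must meet for a given criterion and level."""
    return RUBRICS[criterion][level]

print(standard_for("equity", "good"))
```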

Step 3 (Evidence needed): Once rubrics are developed for each criterion, both groups would be engaged to brainstorm evidence that needs to be collected. This evidence would encompass both qualitative and quantitative data that directly assesses the impacts of the NDIS reforms and the experiences of implementing them. The end-result would be a data collection matrix that would guide data collection and analysis during the evaluation.

    Step 4 (Gather and analyse evidence): Over a number of years, a variety of evidence would be collected on different parts of NDIS reform. This would include, but is not limited to, surveys to end-users, interviews and focus groups with NDIA staff and people with disability and observations of NDIS reform in action. These pieces of evidence would be analysed to describe what is happening in the NDIS reforms. 

    Step 5 (Synthesis and judgement): From the evidence collected, evaluators, ideally in consultation with different stakeholders, would use the rubrics to reach an evaluative judgement on each criterion, as well as the overall VFM of the NDIS reforms.  

    Step 6 (Reporting): An evaluation report on the NDIS reforms would be produced summarising the VFM and criteria judgements that are reached, and the evidence and reasoning behind these judgements. Additionally, a one-page infographic would be produced summarising the evaluation results for busy policymakers and people with disability and their families and carers. 

    Conclusion

The VFI framework is a simple but effective tool for making evaluative judgements on a program or policy. It provides a number of steps that can be followed to come up with a set of relevant criteria, and then to collect evidence to judge a program or policy based on evaluative reasoning. As someone who normally dislikes judging a program or policy, I find the VFI framework helpful because it gives a step-by-step guide to making judgements on a program. Additionally, the VFI framework stipulates a participatory approach to the evaluation to ensure that everyone’s views are considered. In an age where underrepresented voices need to be considered when planning and implementing programs, running an evaluation based on what people value the most is paramount to ensuring the evaluation is relevant and useful in improving the implementation and outcomes of programs and policies.

    References

    King, J., Crocket, A., & Field, A. (2023). Value for investment: Application and insights. Dovetail Consulting. https://www.julianking.co.nz/wp-content/uploads/2023/09/YPMHA-exemplar-report-230901-1.pdf

    For more resources on the VFI framework, see the “Value for Investment Resources” from Julian King & Associates at https://www.julianking.co.nz/vfi/resources/ and Julian King’s blog on https://juliankingnz.substack.com.

    Tracing the evolution of evaluation using the evaluation tree

    Anyone can do program evaluation. It can be as simple as distributing a survey to participants to get their feelings towards a program, or as complicated as tracking the implementation and outcomes of a program over many years. However, to become an evaluator, one needs to understand evaluation theories.

    An evaluation theory is a set of guidelines that describe what a good evaluation is. Different evaluation theories provide different perspectives of how evaluation should be conducted. Selecting one or multiple evaluation theories can guide us on what methods and processes are appropriate for a specific evaluation. One framework which has helped me sort these evaluation theories is the evaluation tree by Marvin C. Alkin.

    In this blog post, I will explain what the evaluation tree is and how it is useful for learning how to conduct an evaluation.

    Introducing the evaluation tree

    Marvin C. Alkin is a Professor Emeritus of Education at the University of California, Los Angeles (UCLA). After receiving his doctorate in education from Stanford University in 1964, he moved to UCLA where he has been working in the Graduate School of Education and Information Studies ever since. He is best known for his research on the use of evaluations and comparing different evaluation theories. He, along with Christina A. Christie, first introduced the evaluation tree as part of the book Evaluation Roots in 2004. The evaluation tree has been updated numerous times, with the latest iteration published in 2023 as part of the third edition of Evaluation Roots. 

    The third edition of the evaluation tree (Alkin, 2023).

    The evaluation tree is a conceptual framework that highlights the foundations of evaluation and sorts different evaluation theories into three groups (represented by the three branches) depending on which part of program evaluation they emphasise the most: 

    1. Methods: The study design describing how the evaluation will be conducted.
2. Valuing: How, and by whom, judgements are made about a program.
    3. Use: How the evaluation will be utilised.

These groups are interrelated, with each evaluation theory being placed on a primary branch and aligned to a neighbouring branch. For example, evaluation theories that are placed to the left of the valuing branch are also closely aligned to the methods branch.

    The foundations of evaluation

    The foundations of the evaluation tree (Alkin, 2023).

    The evaluation tree is supported by a trunk describing three foundations of evaluation. These foundations provide the “why” and the “how” of conducting any program evaluation.

    Evaluation foundation #1: Social accountability

    The first foundation is social accountability. Evaluation provides information on what has been accomplished as a result of implementing the program. Policymakers and program managers can use this information to make decisions on their programs, while advocates and end-users can use the same information to hold decision-makers accountable. According to Alkin (1972), there are three types of accountability, each of them covered by an evaluation type:

    1. Goal accountability which assesses whether appropriate goals and objectives of the program were set. This is associated with formative evaluation which is designed to generate a program theory, a description of how a program should work to produce outcomes. 
    2. Program accountability which assesses whether appropriate procedures have been implemented as intended to work towards the program goals and objectives. This is associated with process evaluation which assesses how the program is being implemented and whether that is following the plan. 
    3. Outcome accountability which assesses whether and how much the goals and objectives of the program have been achieved. This is associated with summative evaluation which measures the short-, medium- and long-term outcomes of a program.

    Evaluation foundation #2: Social inquiry

    The second foundation is social inquiry. This describes the application of social research philosophies and methods to study how people behave in social groups and settings. This can range from quantitative studies that assess whether a program is effective in producing the desired outcomes to qualitative studies that explore the experiences of people participating in a program. 

    Evaluation foundation #3: Epistemology

    Relating to social inquiry is epistemology, the third foundation of evaluation. Epistemology describes what counts as knowledge and how reality can be interpreted. There are three main schools of thought in epistemology:

    1. Post-positivism: The goal of post-positivists is to measure truth within a single reality with some degree of uncertainty. Post-positivism is associated with identifying and controlling values and biases to study the link between causes and outcomes as cleanly as possible.
    2. Constructivism: Constructivists believe that there is not one single reality, but rather multiple realities that are ‘constructed’ from the subjective beliefs of individual people. In contrast to post-positivists who try to control bias, constructivists embrace bias to consider how different people may view a program.
    3. Pragmatism: Pragmatists sit in the middle between post-positivists and constructivists. While they agree with constructivists in that there are multiple views of reality, they also align with post-positivists in identifying a view that is most aligned to what is happening in reality. 

    The next few sections will describe each branch in detail and highlight, in my opinion, some of the most prominent evaluation theories in each branch, along with the principal people behind the theories.

    The methods branch

    The methods branch of the evaluation tree (Alkin, 2023).

    Emerging from the social inquiry foundation of evaluation, the methods branch is concerned with applying appropriate research methods to evaluation in order to generate knowledge. Most evaluation theories in this branch use or adapt experimental methods that were pioneered in the social sciences and applied research. Prominent evaluation theories in the methods branch include:

    1. Experimental methods (Donald Campbell): Campbell is best known for introducing experimental and quasi-experimental designs into social science research and evaluation. These study designs aim to reduce bias when studying the effects of programs on outcomes. The difference between the two study designs is that while participants are randomised into a control and treatment group in experimental designs, no such randomisation occurs in quasi-experimental designs. 
    2. Quasi-experiments (Thomas Cook): Cook extended the concept of quasi-experimental designs in evaluation, arguing that evaluators need to take the context of the program and the views of stakeholders into consideration when planning quasi-experimental designs (which can range from pre/post-tests to comparing groups without randomisation).
    3. Tailored evaluation (Peter Rossi): Tailored evaluation contends that research methods should be adapted to what is currently happening in a program. For instance, if a program was currently running, the evaluation would focus less on designing the program and more on measuring what is currently happening. 
    4. Theory-driven evaluation (Huey-Tsyh Chen): Rossi and Chen developed the idea of theory-driven evaluation, where a program theory is designed first describing how a program should work. This identifies potential areas of investigation for an evaluation to measure intended outcomes, as well as identify unexpected consequences. 
5. Evidence-based policy use (Carol Weiss): This theory stresses the importance of running a methodologically sound evaluation to produce results that are reliable and generalisable. It also introduces the notion that evaluation is a political activity, where the political context of a program, including vested interests, negotiation, supporters and critics, influences how evaluation results are interpreted and received.

    The use branch

    The use branch of the evaluation tree (Alkin, 2023).

The use branch is concerned with producing evaluations that are not only used to inform decisions about a program, but also make an impact on the decisions and changes seen in a program. Evaluation theories within this branch emphasise who will use evaluation results and how, to ensure evaluation is utilised in a meaningful way, as well as empowering individuals to conduct evaluations themselves. Prominent theories in the use branch include:

    1. CIPP model (Daniel Stufflebeam): Standing for Context, Input, Process and Product, the CIPP model incorporates evaluation in the program design process to continually provide decision makers with information to improve their programs. The CIPP model is also associated with the establishment of a representative stakeholder panel that works with the evaluator to tailor the evaluation to their needs. 
2. Utilisation-focused evaluation (Michael Patton): One of the most prominent evaluation theories in the use branch, utilisation-focused evaluation stresses the need to identify primary intended users who are likely to use the evaluation or to have a stake in the results generated. From there, the evaluator engages primary intended users at all stages of the evaluation to foster buy-in to use the evaluation results. Evaluations in utilisation-focused evaluation are adaptive, with evaluation questions and designs altered in the face of changing conditions and the results generated.
    3. Learning-oriented evaluation (Hallie Preskill and Rosalie Torres): The goal of learning-oriented evaluation is to motivate individuals, teams and organisations to learn from the evaluation. The evaluator acts as a facilitator to identify needs within the organisation and to guide staff to learn from the evaluation, given the organisation’s capacity for learning.  
    4. Interactive evaluation (Jean King): Interactive evaluation guides the evaluator to continually engage stakeholders, create a participatory environment and foster community leaders in order to build trust, increasing the chances of evaluation use.  

    The valuing branch

    The valuing branch of the evaluation tree (Alkin, 2023).

    Deriving from the epistemology foundation of evaluation, the valuing branch is concerned with how we judge programs. The valuing branch can be split into two sub-branches, with different epistemologies influencing each one. 

    The first sub-branch is objectivist-influenced valuing which is aligned to the methods branch of the evaluation tree. Theories in this sub-branch focus on the evaluators being able to make objective judgements about a program based on what the evaluation has found. Prominent theories in the objectivist-influenced valuing sub-branch include:

    1. Goal-free evaluation (Michael Scriven): This theory emphasises the role of the evaluator in deciding which program outcomes to examine and presenting a single value judgement of whether a program is good or bad based on their own set of criteria. 
    2. Educational connoisseurship (Elliott Eisner): The evaluator is positioned as a connoisseur, similar to an art critic. As a connoisseur, evaluators use their area of expertise to identify important areas of the program to evaluate and describe and make judgements on a program.
3. Responsive evaluation (Robert Stake): Responsive evaluation advocates the use of case studies that are tied to specific contexts. Here, the evaluators themselves collect and interpret the beliefs and values of stakeholders to provide a thick description of a program.

    The second sub-branch is subjectivist-influenced valuing which is aligned to the use branch of the evaluation tree. Theories in this sub-branch reject the notion of a single reality, instead arguing that reality can be interpreted differently by individual people. Applied to evaluation, these theories focus on the stakeholders to understand what they value and how this affects the evaluation. Prominent theories in the subjectivist-influenced valuing sub-branch include:

    1. Deliberative democratic evaluation (Ernest House): Placed in the perspective of social justice, this theory emphasises that evaluators have to be responsive to the needs of stakeholders and inclusive of those who would be powerless and disadvantaged during program evaluation.
    2. Values-engaged evaluation (Jennifer C. Greene): Values-engaged evaluation highlights the idea of engaging with multiple stakeholders and the program context to define a set of stakeholder-driven criteria that determine the value of a program.
    3. Fourth-generation evaluation (Egon Guba and Yvonna Lincoln): This theory pushes the responsibility of valuing a program to stakeholders who have different perspectives on the program. The evaluator considers the values of different stakeholders when planning and conducting the evaluation.
    4. Transformative evaluation (Donna Mertens): Transformative evaluation emphasises the need to involve a diverse range of people in evaluation, particularly those from marginalised communities, and to tailor evaluations to challenge the status quo by being inclusive of, building positive relationships with and improving the lives of marginalised communities.

    Which branch do I belong to?

Out of the three evaluation branches, I am most closely associated with the methods branch. That is because I place a huge emphasis on planning and conducting rigorous, ethical evaluations to produce results that end-users can trust to make decisions on programs. This comes from my science and public health training, through which I gained not only an understanding of experimental designs and qualitative and quantitative research methods, but also a commitment to generate evidence to support my conclusions. As an evaluator, I am aligned to theory-driven evaluation as I create program logic maps to design evaluation frameworks and tools that measure how programs are being implemented and how effective they are in producing outcomes. I am also a fan of quasi-experimental designs due to their flexibility in designing evaluations in public health and education, where randomisation is either impractical or unethical. Specifically, I use pre/post-test designs within a group to more easily track changes in outcomes during the program.
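
As an aside, a pre/post comparison within a single group can be sketched in a few lines, assuming paired scores for the same mentees are available; the data below are simulated, and the Wilcoxon signed-rank test is just one non-parametric option for paired, ordinal-style scores:

```python
import numpy as np
from scipy import stats

# Simulated paired scores (the same participants rated before and after a program).
rng = np.random.default_rng(0)
before = rng.normal(3.4, 0.8, size=60).clip(1, 5)
after = (before + rng.normal(0.3, 0.5, size=60)).clip(1, 5)

# Wilcoxon signed-rank test: a non-parametric check on paired data, useful when
# randomising participants into control and treatment groups is not possible.
result = stats.wilcoxon(after, before)
print(f"Median change: {np.median(after - before):.2f}, p = {result.pvalue:.3f}")
```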

    Looking at the use branch, I am gradually thinking more about how my evaluations can be utilised more by various stakeholders to make changes to their programs. In my work, I have started to engage with primary intended users to learn more about the programs they run and how my evaluations can be tailored to deliver useful information to them. I am also starting to think about how my evaluations can be widely disseminated to other people and groups and how I can be involved in implementing the recommendations that I have written to ensure that the evaluation is used.

Turning to the valuing branch, I am not a huge fan of evaluation theories in the objectivist-influenced valuing sub-branch. Even though valuing programs is a central activity for evaluators, judging programs against my own set of criteria can be harsh. Furthermore, not involving stakeholders in planning and conducting the evaluation can reduce the chances of them being receptive to the evaluation findings and recommendations, instead increasing their resistance. In some cases, commissioners or program managers may not appreciate the judgements or recommendations I pass down on a program, particularly if they are heavily invested in the program and do not want to change course.

On the other hand, I can see how evaluation has evolved under the evaluation theories placed in the subjectivist-influenced valuing sub-branch. These theories emphasise the importance of getting stakeholders involved in the evaluation in order to understand their views. Often, the end-product is a set of criteria that is aligned to the values of stakeholders, leading to conclusions that would be considered fair by stakeholders. Consequently, these theories have made evaluation more inclusive of stakeholders, particularly those who have historically been underrepresented, such as Indigenous people.

As evaluation continues to evolve, I see the increasing importance of engaging and working with stakeholders to design evaluations that not only balance rigour with context, but are also useful to stakeholders. That is something I am developing in my line of work.

    Conclusion

Learning different evaluation theories is important because it provides guidelines and principles for conducting a good evaluation. These theories provide the foundations for designing evaluations that are trustworthy, adapted to context and useful to stakeholders. The evaluation tree provides one framework in which various evaluation theories can be sorted into different groups. The parts of the evaluation tree work together to outline what makes a good evaluation.

    In the process of doing research for this post, I can see how evaluation has evolved to be grounded in strong foundations, yet adaptable to the changing demands of stakeholders. I learnt that while I am most closely associated with the theories in the methods branch, there are other theories in the valuing and use branches that closely resonate with me. Writing this post to summarise the evaluation tree has sparked my interest in learning more about evaluation theories and how I can adapt them in my work. I may showcase some evaluation books and theories that I find interesting, and summarise them in my future blog posts.

    Resources

Alkin, M.C., & Christie, C.A. (Eds.). (2023). Evaluation roots: Theory influencing practice (3rd ed.). Guilford Press.

    Defining evaluation in my words 


    In late 2023, I graduated with a Master of Public Health (MPH) from the University of Melbourne. During my degree, I specialised in both infectious disease epidemiology, as a way to link back to my previous vocation in immunology, and health program evaluation as I was already doing evaluation work in my two jobs. By completing coursework and a capstone project in the area, I gained a theoretical perspective of how evaluation works which has helped inform my work.

Now, I am working as an evaluator across two jobs to build my evaluation skills, experience and credentials. Eventually, I would like to work as an evaluator in the public sector, a non-government organisation or philanthropy. In that role, I would provide information and data to organisations so that they can improve programs that have a positive impact on society.

    When I tell people that I work as an evaluator, many people ask what kind of work I do. I give a brief explanation of what an evaluator does and how the work I do makes a difference. This blog post is an opportunity for me to provide a detailed explanation of what evaluation is and why I have decided to pursue a career as an evaluator.

    The typical definition of evaluation

The most common definition of evaluation comes from Michael Scriven, who was one of the foremost evaluation theorists. In his words:

    “Evaluation is the systematic process to determine merit, worth, value, or significance [of a program]”. 

    In other words, evaluation involves assessing how much a program has contributed to society and whether the outcomes produced are worthwhile. This is typically done by collecting data and evidence behind a program which can range from distributing surveys and collecting data to interviewing program staff and end-users. 

I do not agree with that definition as it is more applicable to external evaluators who evaluate programs independently of the organisation running them. Many people are resistant to the idea of someone externally evaluating their work as the process can be invasive and disruptive. This is complicated by the fact that they may not be involved in, or have any control over, the evaluation process.

    Furthermore, they may not accept the results or recommendations of an evaluation. They may continue to run a program as is even when there is evidence that it is producing little value to society. I feel that this definition is a harsh one, and not necessarily reflective of the work that I do.

    My definition of evaluation

    In contrast, most of the evaluation work I have done so far has been internal. I work closely with organisations and program managers to conduct evaluations of their programs. This involves:

    1. Planning the program and evaluation;
    2. Collecting data (which can range from surveys to interviews and more); 
    3. Analysing the data to generate results; and
    4. Reporting on the results internally within the organisation as well as externally to funders and governments. 
    Evaluation cycle showing 4 key steps of an evaluation
    The evaluation cycle

    When I report on the results, I highlight the evidence base we have built up to show that our programs are working to achieve outcomes. At the same time, I identify what we are doing well and what we need to improve to make our programs better. Both aspects are important to organisations as they face increasing calls to be held accountable by funders and governments. They not only need to show that their programs are producing outcomes for end-users, but they also need information to continue improving their programs. Without that information, organisations run the risk of not obtaining or maintaining enough funding to run their programs.

    With that background, I would define evaluation as: 

“The process of collecting data to provide evidence behind the effectiveness [or ineffectiveness] of a program and to identify its strengths and areas for improvement”.

    To me, that definition softens the blow of evaluation as it does not place an all-or-nothing judgement on the program. Instead, it provides a health check to see how well a program is doing and what needs to be done to make it better. This provides some breathing space for organisations and program managers to abandon, pivot or refine a program based on the results of an evaluation. 

    Viewing evaluation from this perspective improves organisations’ and program managers’ perceptions towards evaluation as they see a purpose behind this activity. They can use the results to highlight the evidence base behind a program, increasing the chances of receiving funding, and to identify areas of the program to improve. Both of these tasks are important for ensuring the sustainability of the program and the organisation.

    Why am I pursuing a career as an evaluator?

    I am pursuing a career as an evaluator as my existing background aligns well with my chosen career. I am someone who thinks through things rationally, who synthesises information and who mostly acts on decisions based on facts and data. I also have a background in science which has allowed me to take accurate measurements, generate and describe results and link them back to what we already know. 

    These two elements of myself are what drove me to pursue a career as an evaluator. I feel that being an evaluator is a natural progression from a science career. That is because I am able to use my rationality and my existing skillset in science to generate and report on results that would be helpful to organisations running programs. The cross-over between science and evaluation also makes it easier for me to transition to being an evaluator, instead of moving to a career where my previous experience could be less useful. 

Evaluation is also a way to make a difference to society. I am driven to work on things that would benefit all of humanity by improving their lives and living standards. Even though I am not someone who generates novel ideas, I am someone who is willing to take on a supporting role to help people refine their ideas. Being an internal evaluator within an organisation allows me to take on a role that is and will be increasingly important to an organisation’s future. The demand for evaluation will only grow, improving my chances of securing long-term work that benefits society.

Lastly, my previous work is what drove me to pursue an evaluation career. While I was doing my PhD in translational research, I was also doing evaluation as a volunteer. In that volunteer role, my work had a more immediate impact on the organisation where I was volunteering, compared to my PhD work, where the impacts of my research might either not occur or not be felt for years. Hence, I moved into a career in evaluation so that I can make a more immediate impact on society; my work has since been included in a number of reports and policy submissions.

    Conclusion

    In this post, I have provided my views of what evaluation is and the work that I do. I also outlined the reasons why I have decided to pursue a career as an evaluator. I hope that by reading this post, you will understand the work that I do and how valuable my work is to organisations in improving their programs and to governments and funders in supporting programs that have been shown to work.

    Personal disclaimer

    Any opinions I post in this blog post are solely my own and do not relate to those of my employer or any organisations or clients I work for or with.