The decline of SIFAS (Part 1): How KLab lost the trust of its players

What’s most important to a business isn’t money. It’s trust… There’s no future for a business that’s lost its trust.

Masato Sanada, High School Prodigies Have It Easy Even In Another World

Love Live! School Idol Festival ALL STARS (SIFAS) is a mobile game that combines the rhythm game and RPG genres. Players collect and train cards representing their favourite Love Live! school idols and use them to clear songs. Unlike conventional rhythm games, where stamina is lost only when a player misses a note or hits one too early or too late, in SIFAS stamina is depleted throughout the song regardless of how accurately the notes are hit. Players therefore have to employ unique tactics to score well while maintaining enough stamina to finish the song. Personally, I enjoy the game, as building teams to clear songs demands intellect and strategy. 

However, since its launch on 26 September 2019 in Japan (JP), the game has been on a slow but steady decline in revenue and player counts. This decline is the result of the game’s many problems, which the developer KLab has not sufficiently addressed. These unresolved problems have given SIFAS a bad reputation and broken the trust of both casual and serious gamers, who have quit SIFAS to play other games. SIFAS’ decline has also motivated KLab to move the game’s development to MyNet Games, a developer with a bad reputation for shutting down mobile games. This post will present the declines in SIFAS’ revenues and player counts, and explain how KLab’s missteps during SIFAS’ second year contributed to them. 

Tracking declines in revenue and player counts over SIFAS’ lifespan

Over 2021, I tracked the Japanese (JP) revenues of both SIFAS and SIF on game-i, a Japanese forecasting website that estimates revenue for all JP mobile apps based on data from the iOS App Store and Google Play. Revenue data can be used to judge the performance of mobile apps in Japan, distinguishing strong mobile games (e.g., Fate/Grand Order and Genshin Impact) from weak ones (e.g., 22/7 Ongaku no Jikan, which was shut down on 22 December 2021 due to persistently low revenues). Hence, I used revenue data from game-i to track SIFAS’ financial performance over its lifespan. 

SIFAS revenues over 2019-2021

Looking at SIFAS alone, we can see a decreasing trend in revenue. Initial revenue during year 1 of the game was high, averaging around 300-400 million Japanese yen per month. However, revenue dropped steadily during year 2, particularly from January 2021 onwards. By July 2021, average monthly revenue had halved to 200 million yen. The decline continued into year 3, including a 26% drop in November 2021 to 135 million yen. This dip motivated KLab to run guaranteed UR gachas throughout December 2021 in an effort to deplete players’ free star gems (the premium currency of the game), pushing them to pay for more star gems to roll for stronger cards. 

Player counts between July 2021 and November 2021 over two events

The decrease in revenues is mirrored by the decrease in the number of players participating in SIFAS events in JP. In item exchange events, the number of participating players fell from 82,324 in July 2021 to 70,032 by November 2021. Similarly, for story events, 77,772 players participated in July 2021, but only 64,813 by November 2021. In both event types, the decreases are equivalent to losing around 3,000 players per month in JP, or around 4% of the player base each month.
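The monthly loss rate above can be sanity-checked with a quick back-of-the-envelope calculation, using the item exchange event counts quoted above (July and November 2021 are four months apart):

```python
# Item exchange event participation, four months apart
july_players = 82_324      # July 2021
november_players = 70_032  # November 2021

monthly_loss = (july_players - november_players) / 4
monthly_loss_rate = monthly_loss / july_players

print(f"~{monthly_loss:.0f} players lost per month")        # ~3073
print(f"~{monthly_loss_rate:.1%} of the July player base")  # ~3.7%
```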

Given that revenues held roughly steady (until the November dip) while player counts fell between July and November 2021, these results indicate two things:

  • First, casual players that are free-to-play (i.e., they do not pay money to play the game) are dropping out of SIFAS because they are disengaged. This represents a loss of potential customers who might have paid for the game and spread positive word about SIFAS.
  • Second, KLab has not done enough to attract and retain new players, owing to various problems that I will explain later. Instead, they are relying on the existing player base, particularly the whales (players who spend a lot of money on the game), to keep SIFAS afloat. While this can maintain constant revenues, it precludes further growth and exposes the game to dips in monthly revenue (as seen in November 2021). 

How do SIFAS’ revenues compare to those of other KLab games?

In contrast to SIFAS, Love Live! School Idol Festival (SIF) is a conventional rhythm game, where players tap notes in time with the music. Stamina is only depleted upon missing a note or hitting a note badly. How does SIFAS’ revenue compare with SIF’s? 

Revenues between the two games SIFAS and SIF over 2019-2021.

During SIFAS’ lifespan, SIF has maintained consistent, albeit lower, levels of revenue. SIF averaged around 150 million yen per month in 2020 before dropping to 100 million yen per month in 2021. Other than that, there was no general decreasing trend in SIF’s revenues. 

What is more interesting, though, is the revenues of both games during their anniversary periods (represented by the green points in the above graph). Anniversary periods in SIF are associated with a doubling of revenue to 200 million Japanese yen in April 2020 and April 2021. These increases are fuelled by players buying paid sets to roll for limited edition cards and/or obtain level-up materials. In contrast, there was no ‘anniversary bump’ for SIFAS’ first or second anniversary; neither increased monthly revenue. Worse, SIFAS’ revenues slightly decreased between September and October 2021. 

The absence of an ‘anniversary bump’ for SIFAS’ 2nd anniversary is a result of its underwhelming rewards, as lamented by most players. Compared to the 1st year anniversary, the 2nd year anniversary:

  • Nerfed the rates of free pulls to 1% and 2% for UR and SR cards respectively (vs 5% and 10% in the 1st year anniversary), making it harder to obtain stronger cards; 
  • Did not give out a free UR ticket on anniversary day, unlike the 1st year anniversary;
  • Reduced log-in rewards; and
  • Replaced sparkable gachas, where you are guaranteed a UR card after doing a certain number of pulls, with step-up gachas whose pulls are entirely random. 
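To put the rate nerf in perspective, here is a quick expected-value sketch. The 1% and 5% UR rates are the ones quoted above; the 100-pull count is hypothetical, chosen only for illustration:

```python
def expected_urs(pulls: int, ur_rate: float) -> float:
    """Expected number of UR cards obtained from a given number of pulls."""
    return pulls * ur_rate

free_pulls = 100  # hypothetical number of free anniversary pulls

# 1st anniversary rate (5%): 5 URs expected per 100 pulls
print(expected_urs(free_pulls, 0.05))
# 2nd anniversary rate (1%): only 1 UR expected per 100 pulls
print(expected_urs(free_pulls, 0.01))
```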

SIFAS’ anniversary rewards pale in comparison to the anniversary celebrations of other games that were run at the same time:

  • BanG Dream!, a rhythm game that was halfway through its 4th year, made announcements that exceeded players’ expectations. In summary, they re-introduced a gacha that players had long demanded, introduced a new game mode and set out the roadmap for the next few months. These announcements kept players engaged, assured them that the game would keep running and told them what to expect in the future. This livestream set the standard that SIFAS’ 2nd anniversary needed to meet, and KLab did not meet it.
  • Additionally, Project SEKAI COLORFUL STAGE!, another rhythm game, held its 1st year anniversary in JP. As part of the celebrations, they ran a competition livestream at the same time as SIFAS’ anniversary livestream. That livestream attracted more concurrent viewers than SIFAS’ anniversary livestream, highlighting the contrast in reputation and interest between the two games.  

Combined with the general decline in revenue and player counts, these results show how out-of-touch the developers were in keeping SIFAS competitive in the JP mobile gaming market, both in engaging existing players as well as attracting and retaining new players.

What has fuelled the declines in revenues and player counts in SIFAS’ 2nd year? 

Given the results in this post, one question has to be asked: what has contributed to SIFAS’ decline? I argue that it is mostly KLab’s fault that SIFAS is in its current state. Although this question has been discussed at length, the responses can be condensed into three main elements. 

Controversial game story

Season 2 of the game story, which ran during SIFAS’ second year, was controversial and negatively received by most players. In brief, the first chapter of season 2 introduced a new character who shut down the old school idol club and replaced it with a new one, splitting the existing characters of the game into two factions. Due to the controversial nature of the first chapter, the game writers had to write on the fly, dropping unexpected events into the story that retconned previous chapters and were left unresolved. What resulted was a game story with a pointless conflict that was never satisfactorily resolved. The negative reception of the game story can be backed up by data.

A sentiment analysis of discussion posts for each chapter of SIFAS season 2

Firstly, redditors in the SIFAS subreddit generally reacted negatively towards each chapter of season 2. Sentiment analysis of the subreddit’s discussion posts for each season 2 chapter indicated that as many as twice as many negative words as positive words were used in the comments. In particular, chapters 21, 23 and 27 were received very negatively (chapter 21 due to the fallout of chapter 20, chapter 23 due to the many retcons it introduced and chapter 27 due to the reaction against Lanzhu’s actions in that chapter).

Secondly, Reddit polling data show that redditors were split on SIFAS’ season 2 story. While around half of the respondents did not care about the story, the remainder were divided between those who loved it and those who hated it. This is in stark contrast to RAISE A SUILEN’s band story 1 in BanG Dream!, which touched on similar events: there, the story was received far more positively, with 52% loving it and only 2% hating it. These results show how out-of-touch the writers were in planning and writing SIFAS’ season 2 story, which angered many players and caused them to disengage from the game. 

Stagnant gameplay

Not only is the game unfriendly to new players, the gameplay is also repetitive, further disengaging players. First, SIFAS is not beginner-friendly. It takes time for new players to learn its unique mechanics, particularly the fact that you can fail a song even if you play it perfectly. This is not helped by the game offering scant detail about its mechanics, leaving players to work them out themselves. Consequently, players have to resort to external teambuilding guides to play and enjoy the game. Additionally, new players face an overwhelming amount of content to work through, such as two seasons’ worth of story to collect star gems. As a result, they are likely to drop the game very quickly. 

Secondly, there is little flexibility in how one plays SIFAS. The introduction of new skills has not dislodged the core formations of pairing two scorers with one defender to clear songs, or running three scorers to score high. With little room to deviate from these formations, the gameplay grows stale, as you can simply use the same team to clear every song. Additionally, the harder songs, particularly those on Challenge difficulty, require specific cards to clear. Players without the right cards are locked out of clearing them, which is demotivating. 

Lastly, the game cycle is repetitive. Each month, the same game modes run in the same order: SBL (SIFAS Big Live Show), item exchange event, DLP (Dream Live Parade) and story event. Each mode has problems that disengage players:

  • Item exchange and story events require little engagement, as players can skip songs for the whole period and still accumulate enough event points and currency to obtain the desired items.
  • SBL is time-consuming. Players must play the same song three times each day to acquire SBL medals and rank highly in the voltage ranking. It has come to the point where some players forget to play the songs on a particular day, which massively hurts their voltage ranking and the rewards they can receive. 

These problems are compounded by KLab not introducing any new game modes that would keep players invested or let them use their cards differently. School Idol Channels, a mode introduced in mid-2021, quickly became stale, with a repetitive cycle of collecting shouts, skipping songs and playing the weekly song. It has become an afterthought for some players and holds little excitement for the overall player base.  

Outperformed by other games

SIFAS covers both the RPG and rhythm game genres, but it masters neither. Consequently, SIFAS is being outperformed in revenue and player counts by games that specialise in one genre. Here are some examples of similar games that do better than SIFAS.

An obvious RPG that is similar in some aspects but better than SIFAS is Uma Musume Pretty Derby. It is safe to say that Uma Musume has been a raging success for Cygames, not only generating a lot of attention and revenue but also regularly placing in the top 10 of monthly game rankings. Much of its success comes down to two things. First is an addictive gameplay cycle of training horse girls to race and passing their traits on to other horse girls. This keeps players invested as they strive to build the best horse girl to compete against other players. Second is the widespread promotion of the franchise through celebrity endorsements (including VTubers) and other media such as anime and manga. These all funnel into the game, translating into new downloads, new players and increased revenue. 

In contrast, SIFAS has a repetitive gameplay cycle that does not keep players engaged. In particular, once a player obtains the most powerful cards, they can clear most of the game, leaving them bored with no accessible challenge. Additionally, no voice actresses, celebrities or VTubers were promoting SIFAS and its best elements, hurting both its reach and its reputation. This contrasts with games such as Genshin Impact and BanG Dream!, whose voice actresses actively play them and even pay money to pull for their desired characters.

Idolm@ster is another franchise whose games excel in either the rhythm game or raising simulation genres. Some Idolm@ster games, such as Idolm@ster Cinderella Girls Starlight Stage, are pure rhythm games in which players tap notes in time with the music; the rhythm game takes centre stage, with the gacha and card-raising elements merely accompanying it. Others, such as Idolm@ster Shiny Colors, are pure raising simulations: players collect idols (in the form of cards) and raise them through different activities to perform in lives and festivals. By separating the rhythm game and raising simulation genres into different games, Idolm@ster appeals to different types of players, both those who like rhythm games and those who just want to know more about their favourite idols.

In contrast, SIFAS tries to do too much by balancing rhythm game and RPG elements in one game. The rhythm game is paradoxically both easy and difficult: easy in that there are only two buttons to tap in time with the music, and difficult in that two side arrows are used to swap subunits. Swapping to the correct subunit while hitting dense note patterns and paying attention to the song requirements can be difficult in some songs (for example, Daisuki Dattara Daijoubu!). Meanwhile, the RPG elements feel tacked on, with a lot of moving parts. In addition to levelling up cards, players have to unlock nodes and limit break cards to make them more powerful, and manage bond levels and bond boards to strengthen school idols in general. These all contribute to a steep learning curve that KLab does nothing to flatten, demotivating players from continuing with the game. 

One additional thing to note is that Idolm@ster is also famous for its excellent character stories. The storytelling in Idolm@ster games is top-notch, with detailed backstories for each character. As a result, players become invested in their favourite characters and want to learn more about them, increasing engagement with the games. In contrast, SIFAS’ storytelling is weak, capped off by the highly controversial season 2. The game also introduces characters that the fandom does not universally like. Additionally, the character side stories are mostly not linked to the main story, making it difficult to get players invested in these characters.

How do these results explain SIFAS’ move to MyNet Games? 

The Growth Share Matrix from Boston Consulting Group

The growth share matrix is a tool that companies use to decide where to invest their resources based on the current market share and growth potential of each part of the business. We can use it to plot where SIFAS sits in KLab’s strategy. From the data presented, KLab did not see potential in SIFAS:

  • SIFAS’ market share is low. SIFAS is being outperformed by other mobile games that do a better job in either the RPG or rhythm game aspects. This results in lower revenues and player counts in SIFAS.
  • SIFAS’ growth is, I would argue, also low. Although SIFAS earns more revenue than SIF, its future growth is either non-existent or negative. This signals SIFAS’ unattractiveness both to players, who will avoid the game, and to KLab, who will invest less in it.

Taken together, SIFAS belongs to the “pet” category of the matrix, meaning that KLab should liquidate, divest or reposition it. It is this assessment that motivated KLab to sell SIFAS off to MyNet Games. Of course, the growth share matrix is not the only deciding factor; there are other factors behind the move, which I will cover in the next post. 

Conclusion

Stagnant or decreasing revenue and player counts highlight the huge problems SIFAS faced in both its story and gameplay, and how it was outperformed by other mobile games. There is enough data to explain why KLab had to sell SIFAS off to MyNet Games: KLab saw SIFAS as a sinking ship that could not be saved with its financial and human resources, so it ditched SIFAS and re-invested those resources in other mobile games. Future posts will explain further why KLab handed SIFAS off to MyNet Games, and its further decline from there.

Fes Setsuna: how difficult is it to get her in the gacha?

You surrounded by six Fes Setsuna heads

I have been playing Love Live! School Idol Festival ALL STARS (SIFAS) for the past couple of months. SIFAS is a mobile rhythm game where players build teams of cards and tap notes during songs to score highly. These cards have different skills, attributes and stats, and come in different rarities: R (rare) cards are common but weak, while SR (super rare) and UR (ultra rare) cards are rarer but more powerful. Cards are picked up (or pulled) from the gacha, where a player spends star gems, the in-game currency, to receive random cards. Rate-up cards are new cards added to the gacha that receive a slightly increased chance of being pulled for a limited period of time.
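A first-pass answer to “how difficult is it?” uses the at-least-once probability 1 − (1 − p)^n. The 1% rate and 100-pull budget below are hypothetical figures for illustration, not SIFAS’ actual rates:

```python
def p_at_least_one(rate: float, pulls: int) -> float:
    """Chance of pulling a specific card at least once in `pulls` tries."""
    return 1 - (1 - rate) ** pulls

# Hypothetical numbers: 1% chance per pull, budget of 100 pulls
print(f"{p_at_least_one(0.01, 100):.1%}")  # 63.4%
```

Even with a hundred pulls at a 1% rate, roughly one player in three would still miss the card entirely.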


Applying hypothesis testing and confidence intervals to assess the 2016 EU referendum polling results

In the last blog post, I discussed how polls do not account for variations that arise when sample results are extended to the population. I proposed that hypothesis testing and confidence intervals should be used in polls to enable stakeholders and the general public to assess the decisiveness of the results. While hypothesis testing gives a clear-cut answer on whether a poll can conclusively favour one side over the other, confidence intervals reveal the possibilities that might arise from a poll.

In this blog post, I used hypothesis testing and confidence intervals to analyse results from 2016 EU referendum polls. Using these techniques, I drew some interesting insights: I identified polls that show a significant result and evaluated the usefulness of telephone polls compared to online ones.

Methodology

I collected from Wikipedia a list of online and telephone British polls from 2016 that surveyed people on how they would answer the EU referendum question “Should the United Kingdom remain a member of the European Union or leave the European Union?”. I only included polls that reported the raw numbers (not proportions) of people who would vote Remain or Leave. The raw numbers were required to calculate the proportion of the sample voting Remain or Leave to a high level of precision (two decimal places) and the sample size (the total number of voters participating in the poll), both of which feed into the confidence intervals and p-values (see my previous blog post for more details). I used the weighted values from these polls, as these are adjusted to create a nationally representative sample of the UK.

From these polls, I excluded voters who were undecided or who would refuse to vote, for two reasons. Firstly, this reduces the outcomes to two mutually exclusive options (Remain or Leave), allowing hypothesis testing and confidence intervals to be applied to one specific side. Secondly, on the day of the referendum, the count excluded anyone who was undecided or did not vote, and dropping these voters from the polls simulates that count. I then calculated p-values and confidence intervals for each poll according to the formulae from the last blog post.
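As a sketch of these calculations (my own Python translation, not the code used for the original analysis), here is how the Leave proportion, z-value, p-value and 95% confidence interval can be derived from a poll's raw counts, using the first ICM poll in the data (901 Remain, 778 Leave) as an example:

```python
from math import erf, sqrt

def poll_stats(num_remain: int, num_leave: int):
    """Leave proportion, z-value, two-sided p-value and 95% CI for one poll."""
    n = num_remain + num_leave  # sample size after exclusions
    p_leave = num_leave / n     # proportion voting Leave
    # z-value against the null of a 50:50 split (null variance = 0.25 / n)
    z = (p_leave - 0.5) / sqrt(0.25 / n)
    # two-sided p-value from the standard normal distribution
    p_value = 1 - erf(abs(z) / sqrt(2))
    # 95% CI uses the observed proportion's standard error
    error = 1.96 * sqrt(p_leave * (1 - p_leave) / n)
    return p_leave, z, p_value, (p_leave - error, p_leave + error)

# ICM, 8-10 January 2016: 901 Remain vs 778 Leave
p_leave, z, p_value, ci = poll_stats(901, 778)
print(round(p_leave, 4), round(z, 2), round(p_value, 4))  # 0.4634 -3.0 0.0027
print(round(ci[0], 4), round(ci[1], 4))                   # 0.4395 0.4872
```

The rounded values agree with the first row of the data set (prop_leave ≈ 0.4634, z_value ≈ −3.00, p_value ≈ 0.0027, CI ≈ [0.4395, 0.4872]).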

Comparing statistical significance via z-values and confidence intervals

## Observations: 93
## Variables: 15
## $ poll_start <date> 2016-01-08, 2016-01-08, 2016-01-15, 2016-01-15, 20...
## $ poll_end   <date> 2016-01-10, 2016-01-14, 2016-01-16, 2016-01-17, 20...
## $ pollster   <chr> "ICM", "Panelbase", "Survation", "ICM", "ORB", "Com...
## $ poll_type  <chr> "Online", "Online", "Online", "Online", "Online", "...
## $ num_remain <dbl> 901, 704, 368, 857, 1050, 544, 821, 289, 589, 664, ...
## $ num_leave  <dbl> 778, 757, 392, 815, 965, 362, 826, 186, 569, 724, 7...
## $ total      <dbl> 1679, 1461, 760, 1672, 2015, 906, 1647, 475, 1158, ...
## $ prop_leave <dbl> 0.4633711, 0.5181383, 0.5157895, 0.4874402, 0.47890...
## $ z_value    <dbl> -3.00178625, 1.38659861, 0.87057150, -1.02714357, -...
## $ p_value    <dbl> 2.684006e-03, 1.655642e-01, 3.839882e-01, 3.043529e...
## $ z_sig      <chr> "Yes", "No", "No", "No", "Maybe", "Yes", "No", "Yes...
## $ error      <dbl> 0.02385241, 0.02562212, 0.03553061, 0.02395912, 0.0...
## $ low_ci     <dbl> 0.4395186, 0.4925161, 0.4802589, 0.4634811, 0.45709...
## $ high_ci    <dbl> 0.4872235, 0.5437604, 0.5513201, 0.5113993, 0.50072...
## $ ci_sig     <chr> "Yes", "No", "No", "No", "No", "Yes", "No", "Yes", ...

In total, 93 telephone and online polls were included in the analysis. Statistical significance can be determined either by the z-value, which measures the deviation of the proportion of Leave voters from the null value of 50% (representing equal numbers of Remain and Leave voters), or by confidence intervals, where a poll is significant if its confidence interval does not cross the 50% threshold. I first compared z-values and confidence intervals to see whether both techniques detect statistical significance at the p < 0.05 level in the same polls.

##        
##         No Yes
##   Maybe  4   1
##   No    54   0
##   Yes    0  34

The rows and columns of the table represent statistical significance detected by z-values and confidence intervals respectively. Overall, the two techniques delineate statistically significant and non-significant polls identically. I also included a “maybe” category for the z-value hypothesis tests, representing polls of borderline statistical significance (i.e., p-values between 0.05 and 0.10). Of these five polls, four had a non-significant result as derived from confidence intervals.

We can visualise the comparison of statistical significance from z-values and confidence intervals:

In general, polls that are statistically significant via z-values have confidence intervals that do not cross over the 50% threshold. In contrast, polls that are not statistically significant via z-values have confidence intervals that cross over the 50% threshold. Polls that are borderline statistically significant (represented by orange) had one end of the confidence interval “touching” or slightly crossing over the 50% threshold. This visualisation shows the ability of statistical techniques to distinguish polls that decisively favour one side from those that show a balance of votes between the two sides.

##        
##           No  Yes
##   Maybe 0.04 0.01
##   No    0.58 0.00
##   Yes   0.00 0.37

Most polls (58%) are not statistically significant, with the proportion of Leave voters not deviating significantly from the 50% null value. This suggests that these polls cannot decisively favour Leave over Remain or vice versa. The remaining polls have a Leave proportion that differs from the 50% null value significantly (37%) or borderline-significantly (5%).

Overall, these results underline that statistical significance derived from z-values and confidence intervals are equivalent to each other. Therefore, in subsequent analyses, I used statistical significance derived from z-values to investigate polling results further.

How can statistical significance assist in interpretability of polls?

I firstly counted the number of statistically significant and non-significant results from online and telephone polls.

From the graph, most non-significant results come from online polls, while most statistically significant results come from telephone polls. Having identified this interesting pattern, I investigated the characteristics of the online and telephone EU referendum polls further.

Surprisingly, all telephone polls with a statistically significant result favoured Remain, as indicated by a Leave proportion below 50%. They also had relatively small sample sizes, surveying fewer than 1,000 people. In contrast, most online polls, which survey more than 1,000 people, do not show a statistically significant result, describing a 50:50 split between Remain and Leave. Of the handful of online polls that are statistically significant, twice as many favoured Leave (indicated by a Leave proportion above 50%) as favoured Remain.

I also investigated the margins of error of online and telephone polls. Telephone polls had higher margins of error (median = 3.5%) than online polls (median = 2.4%) due to their smaller sample sizes. Given that the 2016 EU referendum favoured Leave over Remain, these results suggest that while online polls are more robust because they survey larger samples, telephone polls are more likely to declare a decisive result but also more prone to favouring the wrong side. This might be one of the myriad factors contributing to why phone polls were more likely than online polls to get the EU referendum result wrong.

How do EU referendum polls track over time?

I next looked at the polling results over time and how they are affected by statistical significance, taking into consideration poll type and sample size.

Most polls are not statistically significant, straddling the 50% threshold. These polls suggest a 50:50 split between Remain and Leave voters, favouring neither side. Of the polls that showed statistical significance, up until June nearly all were telephone polls that favoured Remain (indicated by a Leave proportion below 50%). Only four online polls showed statistically significant results, with two favouring Remain and two favouring Leave. From June onwards, however, nearly all statistically significant results came from online polls. Most of them favoured Leave, until just before the referendum, when the last two statistically significant online polls favoured Remain.

These final online polls differed from the referendum result, which favoured Leave over Remain. This is because there are other sources of error, such as voter turnout, that the statistical techniques used here do not account for. Nevertheless, hypothesis testing is a very powerful tool for separating polls that show a decisive result from those that do not, giving us a subset of the data from which insightful conclusions can be drawn.

Conclusion

Hypothesis testing is very useful for selecting polls that show a significant deviation from a 50:50 split between Remain and Leave voters. Combined with confidence intervals, these statistical techniques have unveiled some interesting results. For instance, while telephone polls are more likely to declare a decisive result, they are also more prone to favouring the wrong side, as seen in the 2016 EU referendum. In contrast, online polls are less likely to declare a decisive result, but more likely to favour the correct side because they survey more people. This is why nearly all Brexit polls conducted after the 2016 EU referendum have been online rather than by phone.

In summary, the use of hypothesis testing and confidence intervals clarifies the usefulness of polls: whether they show a decisive result on a particular issue, and what range of outcomes is possible. Given the explosion of data and information in the modern age, it is more important than ever that people are equipped with the tools to interpret and assess the legitimacy and accuracy of facts. Teaching people how to use and interpret hypothesis tests and confidence intervals will serve them well, not only for deciding whether they should care about a poll, but also for assessing and debating findings from different sources of information.

Appreciating statistical variation to improve the interpretability of polls

The 2016 European Union (EU) referendum produced one of the biggest surprises of the 21st century. Most polls leading up to the referendum suggested that the majority of the UK would vote to Remain in the EU. However, the referendum produced a different result, with 51.9% of voters wanting to Leave. Since then, there have been chaotic scenes over whether and how Brexit would be enforced. The polling industry has also come under attack, with debate over whether polls are still useful for predicting how the population will vote on important issues such as Brexit.

The conflict between Remaining and Leaving the EU still rages on in the UK.

What the mass media and the general public do not appreciate is that a poll, whose result is taken as the overall view of the population, only surveys one small part of a population that might vote differently from the sample. This introduces variation into the polling result, which can mean the poll does not genuinely favour one side over the other. Hypothesis testing and confidence intervals can be used to decide whether the public should care about a polling result, and what range of referendum outcomes is consistent with a poll. If communicated simply to politicians and the public, more informed decisions can be made about what, if anything, one should do to influence people’s views towards voting Remain or Leave in the EU referendum.

How do opinion polls report variation in results?

Opinion polls are conducted on people drawn randomly from a population to gauge the population’s views on an issue. It is like tasting a small sample of a meal such as soup and using our impression of the sample to make a general judgement of the meal. In our case, we use statistics to extend the results of the sample to draw conclusions about a population. The problem with this method is that people outside the sample may vote differently from those in it, causing population results to differ from a poll.

Hence, in statistics, it is important to account for the variation in polling results to capture the true value of the population. This is encapsulated by the margin of error, which is added to or subtracted from the sample value obtained in a poll. Mathematically, this can be defined as:

Population value = sample value ± margin of error (± means add or subtract)

In the case of an EU referendum poll, the sample value would be the proportion of the sample that votes Remain or Leave, while the margin of error provides the space to capture the proportion of the population that would vote Remain or Leave. The margin of error is around 3% in most polls, a figure rarely reported by the mass media. Hence, readers might erroneously assume that the polling results represent the true proportions of the population voting a particular way. Although the margin of error is an essential tool for accounting for how the population value might differ from a poll, on its own it does not help readers comprehend the different outcomes that might be generated. This can be resolved by using confidence intervals.

A better tool for reporting variation in results: confidence intervals!

We can subtract or add the margin of error from the sample value to produce the lower and upper bounds of a population value respectively. Combining these bounds produces a confidence interval: the range of values that we are almost certain captures the true population value. Although confidence intervals can be produced at varying levels of confidence, they are usually set at 95% confidence, so that we can be almost certain that what we conclude from a poll generalises to the whole population. This makes them very useful for thinking about the different outcomes of the EU referendum that might be generated from conducting a poll.

Calculating the confidence interval of a polling result

1. Find the sample value of a poll. In our case, we want to calculate the proportion of the sample that vote Leave. This can be calculated as:

prop_{Leave} = \frac{Total(Leave)}{Total(voters)}

2. Calculate the margin of error (MOE) to measure variation in the poll results. In our case, to calculate the MOE for our 95% confidence interval, we use the formula:

MOE_{Leave} = 1.96 \times \sqrt{\frac{prop_{Leave} \times (1 - prop_{Leave})}{Total(voters)}}

The MOE is influenced by the sample size, the total number of people that voted in the poll (Total(voters)): the larger the sample, the smaller the MOE.
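The margin-of-error formula can be sketched in a few lines of Python (the function name `margin_of_error` is my own). At prop = 0.5, the worst case for variability, a poll of roughly 1,000 people gives the ~3% margin of error that most polls quote, and quadrupling the sample only halves the MOE:

```python
import math

def margin_of_error(prop, n, z=1.96):
    """95% margin of error for a sample proportion from n voters."""
    return z * math.sqrt(prop * (1 - prop) / n)

# MOE shrinks with the square root of the sample size.
for n in (500, 1000, 2000):
    print(n, round(margin_of_error(0.5, n) * 100, 1), "%")
# → 500 4.4 %
# → 1000 3.1 %
# → 2000 2.2 %
```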

3. Subtract or add the MOE from the sample value to get the lower and upper bounds of the confidence interval respectively.

Lower bound = sample value – margin of error

Upper bound = sample value + margin of error

4. Combine the lower and upper bound values to generate the 95% confidence interval. This describes the range of values that we are 95% sure captures the true population value (in our case, the proportion of the population that vote Leave).  

Confidence interval = (lower bound, upper bound)
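Steps 1 to 4 can be wrapped into a single function, sketched here in Python (the name `proportion_ci` and the poll counts are my own, purely for illustration):

```python
import math

def proportion_ci(leave_votes, total_voters, z=1.96):
    """95% confidence interval for the population proportion voting Leave."""
    prop = leave_votes / total_voters                      # step 1: sample value
    moe = z * math.sqrt(prop * (1 - prop) / total_voters)  # step 2: margin of error
    return prop - moe, prop + moe                          # steps 3-4: bounds

# Hypothetical poll: 520 of 1,000 respondents vote Leave.
low, high = proportion_ci(520, 1000)
print(round(low, 3), round(high, 3))   # → 0.489 0.551
```

Note that the interval straddles 0.5, so this hypothetical poll could not rule out a Remain majority in the population.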

The basics of hypothesis testing

Although confidence intervals can describe the different outcomes of the EU referendum, they do not give a clear-cut answer on whether we should care about a poll. This is where hypothesis testing comes in.

In hypothesis testing, we assess assumptions about a population value against data from a random sample. It is analogous to a criminal trial, where a defendant is presumed innocent until enough evidence is collected to prove guilt. In the same sense, we assume that the null hypothesis (H0) is true until proven otherwise. The null hypothesis proposes that there is no deviation of the population value from some set value; we retain it when the polling result could easily have arisen by chance. Conversely, if we have a polling result that is so rare and unusual that it is very unlikely to have arisen by chance, then we have enough evidence to reject the null hypothesis and accept the alternative hypothesis (Ha). The alternative hypothesis describes a deviation of the population value from the set value.

In our case, we want to assess whether a poll can decisively conclude that most of the population would vote Remain or Leave. We write our two hypotheses as follows (prop0,Leave is the proportion of the population that would vote Leave from the null hypothesis):

H0: there is an even split of Remain and Leave voters in the population.  prop0,Leave = 0.5

Ha: the population decisively favours Remain or Leave. prop0,Leave ≠ 0.5

Should we care about a polling result? Let’s use a hypothesis test to find out!

There are many statistical tests that can be used depending on the kind of data that we are analysing. As we are analysing the proportion of people that vote Remain or Leave in a poll, we convert the proportion to a standardised z-value that can be used to calculate probabilities on a normal z-distribution (better known as a “bell curve”).

What a normal z-distribution looks like. Source

The z-value can be calculated by the formula:

z-value = \frac{prop_{Leave} - prop_{0,Leave}}{\sqrt{\frac{prop_{0,Leave} \times (1-prop_{0,Leave})}{Total(voters)}}}

If we set prop0,Leave = 0.5 (meaning an even split of Remain and Leave voters in the population), we can simplify the z-value to:

z-value = \frac{prop_{Leave} - 0.5}{\sqrt{\frac{0.5 \times (1-0.5)}{Total(voters)}}} = 2 \times \sqrt{Total(voters)} \times (prop_{Leave}-0.5)

This z-distribution (Z) can be used to calculate the probability (the p-value) that we generate a random result that is just as or more extreme than the polling result, given the set value from the null hypothesis. This is represented mathematically as:

p-value = Pr(Z \leq -z-value \hspace{2mm} OR \hspace{2mm} Z \geq +z-value)

The p-value can be calculated using normal tables, a calculator or a computer. We compare the p-value to an alpha value, the threshold below which the p-value must fall to reject the null hypothesis. Although we can set different alpha values between 0 and 1, the alpha value is usually set to 0.05 (describing a 5% chance of getting a random result that is just as or more extreme than the polling result, given the null value).

  • If the p-value is more than the alpha value (i.e., p > 0.05), then we fail to reject the null hypothesis. We conclude that the poll cannot decide whether most of the population would vote Remain or Leave in the EU referendum.
  • If the p-value falls below the alpha value (i.e., p < 0.05), then we reject the null hypothesis and accept the alternative hypothesis. We conclude that the poll decisively favours Remain or Leave among the population.
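The whole test can be sketched as one small Python function (the helper name `two_sided_p_value` and the poll figures are my own). It converts a sample proportion into a z-value against the null value prop0 and returns Pr(Z ≤ −|z| OR Z ≥ +|z|) using the standard normal CDF, available via `math.erf`:

```python
import math

def two_sided_p_value(prop, n, prop0=0.5):
    """Two-sided z-test p-value for a sample proportion against prop0."""
    z = (prop - prop0) / math.sqrt(prop0 * (1 - prop0) / n)
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))   # Pr(Z <= |z|)
    return 2 * (1 - phi)

# A hypothetical 55:45 split in a 400-person poll is decisive at alpha = 0.05:
print(two_sided_p_value(0.55, 400) < 0.05)   # → True
```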

Hypothesis testing is useful for deciding whether the public and stakeholders should care about a polling result, facilitating informed decisions on how campaigning needs to be done.

Applying hypothesis testing and confidence intervals to a real-life EU referendum poll

Let’s look at an online poll run from 27th to 29th May 2016 by the polling company ICM. Out of 1753 people, 848 (48.37%) voted Remain and 905 (51.63%) voted Leave. Should we care about the ICM poll?

First, let’s use hypothesis testing to decide whether the ICM poll is decisive. We declare two hypotheses:

H0: There is an even split of Remain and Leave voters in the population. prop0,Leave = 0.5

Ha: The population decisively favours Remain or Leave. prop0,Leave ≠ 0.5

Since we have propLeave = 905/1753 = 0.5163 (converted from percentage to decimal), we calculate the z-value as follows:

z-value = \frac{905/1753 - 0.5}{\sqrt{\frac{0.5 \times (1-0.5)}{1753}}} = 1.3614

And calculate its p-value:

p-value = Pr(Z \leq -1.3614 \hspace{2mm} OR \hspace{2mm} Z \geq +1.3614) = 0.1734

The p-value of 0.1734 exceeds the alpha value of 0.05, so we fail to reject the null hypothesis. The ICM poll cannot decisively favour Remain or Leave; the result is consistent with an even split between the two sides among voters in the population.
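These figures can be recomputed directly from the raw counts (small differences from hand-rounded intermediate values may appear in the last decimal place):

```python
import math

# ICM poll: 905 of 1,753 respondents voted Leave; test against prop0 = 0.5.
prop_leave = 905 / 1753
z = (prop_leave - 0.5) / math.sqrt(0.5 * 0.5 / 1753)
p = 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))
print(round(z, 4), round(p, 4))   # → 1.3614 0.1734
```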

How can we visualise the indecisiveness of this poll? We can use confidence intervals to do this.

First, calculate the margin of error (MOE). The MOE will be the same regardless of whether the proportions of Remain or Leave voters are used.

MOE_{Leave} = 1.96 \times \sqrt{\frac{0.5163 \times (1-0.5163)}{1753}} = 2.34\%

This is consistent with the 3% MOE typically quoted by polls.

We use the MOE to calculate the confidence intervals of Leave and Remain voters.

Leave confidence interval = 51.63% ± 2.34% = (49.29%, 53.97%). This confidence interval states that we are 95% sure that the true proportion of the population that would vote Leave is between 49.29% and 53.97%.

Remain confidence interval = 48.37% ± 2.34% = (46.03%, 50.71%). This confidence interval states that we are 95% sure that the true proportion of the population that would vote Remain is between 46.03% and 50.71%.
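Both intervals can be recomputed from the raw ICM counts in a short loop:

```python
import math

# 95% confidence intervals for the ICM poll:
# 905 Leave and 848 Remain voters out of 1,753 respondents.
n = 1753
intervals = {}
for side, count in (("Leave", 905), ("Remain", 848)):
    prop = count / n
    moe = 1.96 * math.sqrt(prop * (1 - prop) / n)   # margin of error
    intervals[side] = (round((prop - moe) * 100, 2),
                       round((prop + moe) * 100, 2))

print(intervals)
# → {'Leave': (49.29, 53.97), 'Remain': (46.03, 50.71)}
```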

These confidence intervals acknowledge that the proportion of the population voting Remain or Leave might differ from the polling results. The real power of confidence intervals, though, comes when we visualise them on a number line.

The number lines above show the ICM poll results (indicated by the middle point of the line) along with the Leave and Remain confidence intervals. Two things can be observed from the number line:

  1. A 50:50 split between Leave and Remain voters is possible in an EU referendum (indicated by a dashed line) because the confidence intervals of both the Leave and Remain sides contain the 50% proportion. This result would not provide a clear indication of which side would win, something the mass media does not appreciate when hyping up a particular result.
  2. A referendum involving the population might produce a different result from a poll. Although the poll had a higher proportion of Leave than Remain voters in the sample, it is possible that in a referendum over the population, there might be more Remain than Leave voters. Hence, the poll cannot conclusively favour one side over the other.

These two points open up the possibility that the poll might not capture the views of the population. Readers overlook this not only because the mass media excludes the margin of error, but also because they do not realise that polling results may not reflect the views of the whole population. If the confidence intervals of two groups in a sample overlap, the referendum results of the population might be very different from the polling results of the sample.

Conclusion

The way polling results are reported by the mass media today covers up the dangers of extending results from a sample to conclusions about a population. Even citing the margin of error does not paint a true picture of the range of possibilities that might arise from a poll. In contrast, hypothesis testing and confidence intervals offer real insight into how we should interpret polls. While hypothesis testing can tell us whether we should care about a polling result, confidence intervals can reveal the variability produced when polling results are extended to the overall population.

Ideally, the mass media would adopt hypothesis testing and confidence intervals as tools to correctly interpret polls and to responsibly extend results to the population. Given the mass media’s interest in hyping up polling results whether or not it is warranted, this is unlikely to happen. Hence, independent bodies should be set up to analyse polling results and provide a truthful interpretation of the polls to the public, so that people can decide whether to act on a poll. Holding the polling industry accountable to these statistical measures will ensure that polls remain viable for painting a truthful picture of how the population thinks on various issues of the country.

What are the themes and sentiments of Poppin’Party’s and SILENT SIREN’s songs? A look into the bands of the “NO GIRL NO CRY” band battle

Poppin’Party and SILENT SIREN are two all-female Japanese bands that play similar styles of music. Poppin’Party is one of the bands in the BanG Dream! franchise established by Bushiroad. Spanning multiple forms of media, the band consists of anime characters whose voice actresses also perform their own instruments in live shows. SILENT SIREN was established in 2010 by a group of amateur female models. They have released many albums and have also performed in various live shows.

The participants of the “NO GIRL NO CRY” band battle. Left is Poppin’Party and right is SILENT SIREN. Source: https://bandori.fandom.com/wiki/File:NGNC_Main_Visual.jpg

These two bands will perform in the band battle event “NO GIRL NO CRY” in Japan on May 18th and 19th. In celebration of this event, I looked at the lyrics of Poppin’Party’s and SILENT SIREN’s songs to identify the themes and sentiments between the two bands. This was done using a methodology established in my last blog post. Additional analyses were also conducted to glean more insights from the songs of both bands.

Exploratory Data Analysis of lyrics

## # A tibble: 2 x 3
##   band         num_songs num_words
##   <chr>            <int>     <dbl>
## 1 Poppin'Party        30      3255
## 2 Silent Siren        38      3659

SILENT SIREN has released many songs over its nearly ten years of existence. Luckily, I was able to find enough English translations of their songs to match the number of English translations of Poppin’Party songs. This enabled comparable text and sentiment analyses to be conducted between the two bands.

Both bands had a word that appeared two to three times more frequently in their lyrics than any other word. For Poppin’Party, that word was “dream”, which appeared twice as frequently as other words. For SILENT SIREN, the word “love” appeared three times more frequently than other words. These observations may underline the predominant themes of each band’s songs, which will be explained later in the blog post.

Commonality cloud: which words are common across both bands’ lyrics?

A commonality cloud visualises the frequency of words that appear in the lyrics of both bands. The size of each word in a commonality cloud is based on how frequently the word appears in both groups of lyrics. Note that a word that appears very frequently in both groups of songs will be bigger in the commonality cloud than a word that appears frequently in one group of songs but not the other.

According to the commonality cloud, both Poppin’Party’s and SILENT SIREN’s songs have words that were associated with feelings and experiences. In particular, love was a common word found in the lyrics of both bands’ songs. Other words relating to experiences that appeared in both groups of songs included “time”, “world” and “summer”.

Comparison cloud: which words are more frequent in one band’s songs over the other?

In contrast to a commonality cloud, a comparison cloud plots words based on whether the word appears more frequently in the lyrics of a band’s songs compared to the other. A difference is taken between the word frequencies of both groups of lyrics. Following this, the word is plotted to one side of the comparison cloud and the size is varied based on the magnitude of the difference. A comparison cloud allows us to identify words and potential themes that are prevalent in a band’s songs over another group of songs.
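The frequency-difference idea behind a comparison cloud can be sketched in a few lines. The word counts below are invented for illustration (the original analysis was presumably done with R's wordcloud package):

```python
from collections import Counter

# Hypothetical word counts for two lyric corpora.
band_a = Counter({"dream": 60, "love": 20, "future": 25})
band_b = Counter({"dream": 15, "love": 70, "kiss": 30})

# A comparison cloud assigns each word to the side where it is more frequent,
# sized by the magnitude of the frequency difference.
words = set(band_a) | set(band_b)
diffs = {w: band_a[w] - band_b[w] for w in words}
for word, diff in sorted(diffs.items(), key=lambda kv: -abs(kv[1])):
    side = "band A" if diff > 0 else "band B"
    print(f"{word}: {side}, size {abs(diff)}")
```

(In practice the counts would be normalised to relative frequencies first, so a band with more songs does not dominate the cloud.)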

From the comparison cloud, “dream” appeared more frequently in Poppin’Party’s songs than in SILENT SIREN’s songs. Also on Poppin’Party’s side of the comparison cloud are words related to experiences and the future such as “song”, “future” and “tomorrow”. The appearance of these words indicates that Poppin’Party’s songs tend to touch on achieving goals for the future and creating experiences along the way.

In contrast, SILENT SIREN’s songs tend to touch on romance. “Love” appeared frequently in SILENT SIREN’s songs, more so than in Poppin’Party’s songs. On SILENT SIREN’s side are other words associated with romance such as “sweet”, “darling” and “kiss”. The appearance of these words indicates that their songs tend to deal with love and romance and how people react to them.

Bing sentiment analysis of the bands’ songs

I conducted a “bing” sentiment analysis of the songs to measure the proportion of positive and negative words of each band’s lyrics. Overall, Poppin’Party’s songs had a higher proportion of positive-associated words compared to SILENT SIREN’s songs.
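The essence of a bing-style tally is counting lyric words against positive and negative word lists. As a toy sketch, with tiny invented stand-in lists rather than the actual bing lexicon, and `positive_share` as my own helper name:

```python
# Tiny illustrative lexicons, NOT the real bing sentiment lexicon.
positive = {"happy", "sparkle", "courage", "strong", "gentle"}
negative = {"cry", "lonely", "painful", "shake", "throb"}

def positive_share(words):
    """Proportion of sentiment-bearing words that are positive."""
    pos = sum(w in positive for w in words)
    neg = sum(w in negative for w in words)
    return pos / (pos + neg) if pos + neg else 0.0

print(positive_share(["happy", "cry", "sparkle", "gentle"]))   # → 0.75
```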

Half of the negative- and positive-associated words are similar across Poppin’Party’s and SILENT SIREN’s songs. For Poppin’Party’s songs, most of the negative-associated words are linked to sensations relating to negative emotions such as “throb”, “shake” and “painful”. They also have positive-associated words that describe a person’s internal strengths such as “courage”, “strong” and “gentle”. SILENT SIREN’s songs tend to have negative-associated words linked to loneliness such as “cry”, “lonely” and “ambiguous”. They also have positive-associated words that describe sensations such as “happy”, “sparkle” and “flutter”. These words might appear more frequently in SILENT SIREN’s songs due to their focus on romance.

NRC sentiment analysis of the bands’ songs

I conducted an NRC sentiment analysis to measure the proportion of words belonging to specific emotions. The proportion of words in six of the eight emotions are similar between the two bands. However, SILENT SIREN’s songs had a lower proportion of words associated with anticipation and a higher proportion of words belonging to fear compared to Poppin’Party’s songs.

Some of the most frequent words associated with each emotion, such as “feeling” and “smile”, were similar across both bands. There were, however, some words unique to each band that may represent the overall themes of their songs. For Poppin’Party, “sing” is a word that appears across many emotions, namely anticipation, joy, sadness and trust. These emotions can be found in their songs, which touch on many themes, particularly the idea of playing together as a band.

In contrast, SILENT SIREN’s songs can be split into two broad areas. The words “sweet” and “kiss” can be found in many positive emotions such as anticipation, joy, surprise and trust. These words relate to the romantic theme of their songs. Another area touched on in SILENT SIREN’s songs could be the feeling of loneliness when losing friends or breaking up. This can be seen in the words “lonely” in anger and disgust, “disappear” and “escape” in fear and “leave” and “cry” in sadness.

Conclusion

Conducting text and sentiment analyses of the songs has uncovered some interesting insights about both bands. Poppin’Party’s songs tend to talk about setting and achieving future goals while creating memories and experiences along the way. Their songs tend to be quite positive, but they also touch on a variety of sensations and emotions. On the other hand, SILENT SIREN’s songs tend to talk about romance and the various emotions it elicits, both positive (in the case of joy) and negative (in the case of loneliness). It is surprising, then, that the similar music styles of the two bands cover up such different subject matter. Based on these results, it will be interesting to see how the two bands clash when they meet in the band battle this weekend.

Acknowledgements

I would like to acknowledge the following people who have translated the Poppin’Party and SILENT SIREN songs from Japanese to English:

  • Arislation
  • BlaZofgold
  • Eureka
  • Gessami
  • Kei
  • Kikkikokki
  • Komichi
  • LuciaHunter
  • Maki
  • ManekiKoneko
  • MijukuNine
  • Misa
  • NellieFeathers
  • Ohyododesu
  • Starlogakemi
  • Thaerin
  • Tsushimayohane
  • UnBound
  • Youraim

I may have missed other people who have translated songs for this analysis, but I thank you all the same.