A blog showcasing my evaluation knowledge, analysis skills and useful information
Category: Side projects
These are old blog posts that I wrote before I decided to become a program evaluator. They include posts on my previous interest in human health as well as data analysis posts on my other interests.
In my last blog post, I built a model to predict how difficult a song is to play (measured by actual song level) in the BanG Dream! Girls Band Party! mobile game by its number of notes or notes/min. The number of notes gives an indicator of how many beats a player has to hit to complete (or full combo) a song while notes/min measures how quickly notes appear on the screen. I then grouped the expert difficulty songs into three categories based on the difference between the predicted and actual levels:
“Higher” group contains songs whose predicted levels are higher than the actual level from the game. These songs might be harder to play than expected, either due to the sheer number of notes in the song or notes appearing very rapidly on the screen.
“Lower” group contains songs whose predicted levels are lower than the actual level from the game. These songs might be easier to play than expected, either due to the low number of notes in the song or fewer notes appearing on the screen at once.
“Same” group contains songs whose predicted levels matched the actual level from the game. In other words, the actual level is a good indicator of how difficult it is to play the song.
In this blog post, I would like to list the songs that are under the “higher” and “lower” groups based on their number of notes or notes/min. This will make it easier for a player who is just starting to play “expert” difficulty songs to decide which songs to play next and work towards a full combo.
For each song in the list, the actual level from the game is shown along with the predicted levels from the models that use the number of notes or notes/min as the independent variable. If a - appears under a predicted level column of a song, it means that model has not identified the song as easier or harder to play than expected (i.e., the song is not in that model's “lower” or “higher” group respectively). However, if both predicted level columns have a value, it means that both models have identified the song as easier or harder to play than expected.
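As a rough illustration of how such a combined table could be assembled, the sketch below joins two per-model lists and prints "-" where a model did not flag a song. The song names, column names and predicted levels are hypothetical placeholders, not values from the dataset, and this is not necessarily how the lists in this post were produced.

```r
# Hypothetical "lower" lists from each model (placeholder songs and levels)
lower_notes <- data.frame(song = c("song_a", "song_b"), notes_pred = c(24, 24))
lower_rate  <- data.frame(song = c("song_b", "song_c"), rate_pred = c(24, 25))

# Full outer join keeps every song flagged by at least one model
combined <- merge(lower_notes, lower_rate, by = "song", all = TRUE)

# Show "-" where a model did not place the song in its "lower" group
combined$notes_pred <- ifelse(is.na(combined$notes_pred), "-",
                              as.character(combined$notes_pred))
combined$rate_pred <- ifelse(is.na(combined$rate_pred), "-",
                             as.character(combined$rate_pred))
combined
```

A song flagged by both models (song_b here) ends up with a value in both predicted level columns.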
Now, a few caveats to the lists:
First, the number of notes and notes/min do not take into account the types of notes present in the song or the note patterns. The results are only based on the number of notes that appear in the song (total number of notes) or how quickly they appear on the screen (measured by notes/min). Just because a song appears in the “higher” list does not necessarily mean it is hard; it may have a note pattern that is easy to play.
Second, the “lower” list includes some high-level songs (level 27 or above) that the model(s) identified as easier to play than expected. Songs at level 27 or above are very difficult to play, let alone full combo, so they should not be treated as songs that a novice player can play first.
Third, if a song does not appear in either the lower or higher groups, it means it has a predicted level that is the same as the actual level (i.e., it belongs in the “same” group).
Lastly, individual experience will vary from song to song. The following lists should only be used as a guide to deciding which song to play next or to work towards a full combo.
Without further ado, here are the songs in the “lower” group. These are songs that are easier to play than expected and hence can be chosen first to work towards a full combo:
Lower songs list
| Song name | English name | Actual level | Notes model predicted level | Notes/min model predicted level | # notes | Notes/min |
|---|---|---|---|---|---|---|
| Egao no Orchestra! | Orchestra of Smiles! | 23 | 22 | - | 427 | 246 |
| BLACK SHOUT | BLACK SHOUT | 25 | 24 | 24 | 506 | 279 |
| fantastic dreamer | fantastic dreamer | 25 | 24 | - | 553 | 335 |
| Fuwa Fuwa Time | Fluffy Time | 25 | 24 | - | 544 | 344 |
| Hacking to the Gate | Hacking to the Gate | 25 | 23 | 24 | 487 | 273 |
| Happiness! Happy Magical | Happiness! Happy Magical | 25 | 24 | - | 569 | 338 |
| Hashiri Hajimeta Bakari no Kimi ni | On Your New Journey | 25 | 24 | 24 | 531 | 270 |
| Kimi ga Inakucha! | It's Got To Be You! | 25 | 24 | - | 569 | 322 |
| Kimi no Kioku | Memories of You | 25 | - | 24 | 651 | 300 |
| Little Busters! | Little Busters! | 25 | 24 | - | 575 | 314 |
| Pasupa Revolutions | Pasupa Revolutions | 25 | 24 | 24 | 511 | 292 |
| Romeo | Romeo | 25 | - | 24 | 620 | 291 |
| secret base ~kimi ga kureta mono~ | secret base ~What You Gave Me~ | 25 | 24 | 23 | 560 | 235 |
| STAR BEAT! ~Hoshi no Kodou~ | STAR BEAT! ~The Heartbeat of the Stars~ | 25 | - | 24 | 642 | 292 |
| Tamashii no Refrain | Soul's Refrain | 25 | - | 24 | 698 | 276 |
| Tokimeki Experience! | Tokimeki Experience! | 25 | 24 | 24 | 508 | 288 |
| True color | True color | 25 | 24 | 24 | 574 | 280 |
| Yumemiru Sunflower | Sunflower Dreams | 25 | - | 24 | 628 | 292 |
| 1, 2 Fanclub | 1, 2 Fanclub | 26 | 25 | 25 | 652 | 346 |
| 1000-kai Urunda Sora | 1000 Crying Skies | 26 | - | 25 | 868 | 343 |
| Alchemy | Alchemy | 26 | 25 | 24 | 603 | 304 |
| Believe in my existence | Believe in my existence | 26 | 25 | - | 589 | 380 |
| DISCOTHEQUE | DISCOTHEQUE | 26 | 25 | - | 626 | 379 |
| Dream Parade | Dream Parade | 26 | 24 | 25 | 575 | 322 |
| GLAMOROUS SKY | GLAMOROUS SKY | 26 | - | 25 | 780 | 332 |
| great escape | great escape | 26 | 25 | - | 676 | 446 |
| Karma | Karma | 26 | 25 | 25 | 582 | 346 |
| Mae e Susume! | Keep On Moving! | 26 | - | 25 | 723 | 321 |
| Miku Miku ni Shite Ageru (Shite Yan Yo) | I'll Miku-Miku You (For Reals) | 26 | 25 | 25 | 639 | 358 |
| Neo-Aspect | Neo-Aspect | 26 | - | 25 | 707 | 356 |
| Nesshoku Starmine | Passionate Starmine | 26 | - | 25 | 691 | 348 |
| Saa Ikou! | Saa Ikou! | 26 | 25 | 25 | 666 | 351 |
| Sorairo Days | Sky Blue Days | 26 | 25 | - | 626 | 361 |
| Taiyou Iwaku Moeyo Chaos | Burning Chaos According to the Sun | 26 | 25 | - | 652 | 399 |
| Time Lapse | Time Lapse | 26 | - | 25 | 728 | 352 |
| Tsunagu, Soramoyou | The Look Of The Sky, Connected | 26 | 25 | - | 646 | 366 |
| Goka! Gokai!? Phantom Thief! | Goka! Gokai!? Phantom Thief! | 27 | 26 | 26 | 698 | 415 |
| Guren no Yumiya | Crimson Bow and Arrow | 27 | 26 | 26 | 707 | 433 |
| LOUDER | LOUDER | 27 | 26 | - | 828 | 456 |
| ONENESS | ONENESS | 27 | 26 | 26 | 746 | 367 |
| Redo | Redo | 27 | 26 | 26 | 712 | 440 |
| This game | This game | 27 | - | 26 | 907 | 446 |
| God knows… | God knows… | 28 | 27 | 27 | 1081 | 507 |
| Hey-day Capriccio | Hey-day Capriccio | 28 | 27 | 27 | 878 | 479 |
| Opera of the wasteland | Opera of the wasteland | 28 | 26 | 26 | 696 | 373 |
| Re:birth day | Re:birth day | 28 | 26 | 26 | 848 | 410 |
| Sugar Song to Bitter Step | Sugar Song to Bitter Step | 28 | 26 | 26 | 811 | 438 |
| Roku-chou Nen to Ichiya Monogatari | Six Trillion Years and Overnight Story | 29 | 27 | 27 | 895 | 548 |
And here are the songs in the “higher” group. These songs are harder to play than expected and hence should be put off until the player has mastered other songs of the same actual level:
BanG Dream! is a Japanese multi-media franchise by Bushiroad where different girl bands play songs. These girl bands include:
Poppin’Party, a girl band in pursuit of a sparkling, heart-pounding beat;
Afterglow, a rock band of childhood friends;
Pastel*Palettes, an idol band that sings and plays instruments;
Roselia, a gothic rock band aiming to reach the top; and
Hello, Happy World!, a band aiming to make the world happy.
The franchise spans multiple modes of media including music, anime and a rhythm mobile game called “BanG Dream! Girls Band Party” (which I will shorten to Bandori from now on). The game was initially launched in Japan on March 16th 2017 and was later made available in Taiwan and Korea. The game was launched to the rest of the world one year later on April 4th 2018.
The game involves the player hitting different kinds of notes while playing a song. These notes can range from simple tap notes to more complicated hold and swipe notes. Each song has four difficulty settings: easy, normal, hard and expert, with higher difficulty settings presenting more numerous and various note types.
After the song finishes, the player receives a score based on how well they played the song and the cards the player has in their team. These cards are received randomly from gacha events and vary in many factors such as rarity, type and ability. These markedly influence the score a player receives after playing a song. On the other hand, everyone plays the same note pattern or beat map for a specific song in a particular difficulty. In this blog post, I investigated whether the number of notes as well as the related variable notes/min can explain how difficult it is to play a song, measured by the dependent variable “song level”. This data is contained in the band_tidy dataset which is imported below.
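# Load the packages used in this post: tidyverse for read_csv(), filter() and %>%, broom for augment(), splines for ns()
library(tidyverse)
library(broom)
library(splines)
# Import CSV file of Bandori dataset
band_tidy <- read_csv("song_list_csv_rate_051118.csv")
# Convert the difficulty column into a factor whose levels run from easiest to hardest, with "easy" as the base (reference) group
band_tidy$difficulty <- factor(x = band_tidy$difficulty,
                               levels = c("easy", "normal", "hard", "expert"))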
Plotting the relationship between song level and number of notes or notes/min
Different songs have various numbers of notes that need to be hit to achieve a full combo, a situation where no notes are missed or hit too early or too late. The number of notes that need to be hit in a song increases as higher difficulties are selected. The number of notes in a song can also be standardised by its duration to notes/min which measures the rate at which notes appear on the screen. The higher the notes/min for a song, the more quickly the player has to react to notes on a screen.
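The standardisation described above amounts to dividing the note count by the song length in minutes. The sketch below shows the arithmetic; the function name `notes_per_minute` and the note counts and durations are made up for illustration and are not taken from the dataset.

```r
# Hypothetical helper: standardise a note count by song duration
notes_per_minute <- function(notes, duration_sec) {
  notes / (duration_sec / 60)
}

# Two made-up songs with the same number of notes but different lengths
short_song <- notes_per_minute(notes = 600, duration_sec = 120)
long_song  <- notes_per_minute(notes = 600, duration_sec = 150)

c(short_song = short_song, long_song = long_song)
```

The longer song has the lower notes/min even though both have the same total, which is why the two measurements can rank songs differently.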
These two measurements measure contrasting elements of a song which may differentially influence song level. Hence, I plotted song level based on the total number of notes or notes/min on separate graphs to look at the relationship between the variables. These graphs were further subsetted by song difficulty to see whether the relationship changes as the difficulty setting is adjusted.
Song level is positively associated with both the number of notes and notes/min. This agrees with the principle that a song will become harder to play as the number of notes increases. However, the rate at which song level increases is reduced as higher difficulties are selected. While the easy difficulty songs form the steepest slope in their relationship between song level and the number of notes or notes/min, the expert difficulty songs form the flattest slope due to their wider range of song level, number of notes and notes/min.
For the rest of the blog post, I will use the number of notes and song difficulty as independent variables to predict song level. It should be noted, however, that similar results were found when the number of notes was replaced by notes/min to predict song level.
Building a model of song level vs number of notes
A linear model has two components: a gradient that represents the rate of change and the y-intercept that represents the initial value. Given that the relationship between song level and the number of notes varies for each difficulty, it makes sense to change both the gradient and the y-intercept parts of the model. Hence, I incorporated an interaction term into the model that allows the song difficulty to influence the relationship between song level and the number of notes.
#Create a model with an interaction term between difficulty and number of notes
level_diff_int <- lm(level ~ notes*difficulty, data = band_tidy)
summary(level_diff_int)
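$$\text{level}_{\text{pred}} = (0.027 - 0.012 \times \text{difficulty}_{\text{normal}} - 0.015 \times \text{difficulty}_{\text{hard}} - 0.020 \times \text{difficulty}_{\text{expert}}) \times \text{notes}_{\text{pred}} + (3.96 + 5.49 \times \text{difficulty}_{\text{normal}} + 8.68 \times \text{difficulty}_{\text{hard}} + 16.09 \times \text{difficulty}_{\text{expert}})$$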
The “difficulty” parts of the model can be defined as 0 or 1 depending on which difficulty the model is covering. For example, if we wanted to model the “expert” difficulty songs, we can set $\text{difficulty}_{\text{expert}} = 1$ and $\text{difficulty}_{\text{normal}} = \text{difficulty}_{\text{hard}} = 0$. This allows the gradient and the y-intercept of the model to be adjusted for each difficulty. The performance of the model is very good, with an R² value of 0.984 and a relatively low residual standard error (RSE) of 0.8302.
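To make the 0/1 switching concrete, here is a small worked sketch that plugs the rounded coefficients from the fitted equation above into a helper function. The function name `predict_level` and the example note counts are my own illustrative choices; because the coefficients are rounded, the results are only approximate.

```r
# Rounded coefficients copied from the fitted equation ("easy" is the base group)
base_slope <- 0.027
base_intercept <- 3.96
slope_adj <- c(easy = 0, normal = -0.012, hard = -0.015, expert = -0.020)
intercept_adj <- c(easy = 0, normal = 5.49, hard = 8.68, expert = 16.09)

# Hypothetical helper: switch on one difficulty's adjustments, then apply the line
predict_level <- function(notes, difficulty) {
  slope <- base_slope + slope_adj[[difficulty]]
  intercept <- base_intercept + intercept_adj[[difficulty]]
  slope * notes + intercept
}

predict_level(700, "expert")  # slope 0.007, intercept 20.05
predict_level(100, "easy")    # slope 0.027, intercept 3.96
```

For an expert song with 700 notes this works out to roughly 0.007 x 700 + 20.05, or about 24.95, which rounds to level 25.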
#Predict levels from data points in band_tidy using model
band_tidy_int <- augment(level_diff_int, data = band_tidy)
For each difficulty, plotting the residuals formed a random distribution of points around residual = 0. This indicates that it is appropriate to incorporate an interaction term into the model to predict song level.
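The residual check can be sketched on synthetic data. The data below are simulated, not the Bandori dataset, and the slopes, intercepts and noise level are arbitrary assumptions chosen so that two difficulties have clearly different lines.

```r
set.seed(42)

# Synthetic data (NOT the Bandori dataset): two difficulties with different slopes
toy <- data.frame(
  notes = rep(seq(100, 500, length.out = 30), times = 2),
  difficulty = factor(rep(c("easy", "expert"), each = 30),
                      levels = c("easy", "expert"))
)
true_slope <- ifelse(toy$difficulty == "easy", 0.03, 0.01)
true_intercept <- ifelse(toy$difficulty == "easy", 4, 20)
toy$level <- true_intercept + true_slope * toy$notes + rnorm(60, sd = 0.5)

# notes * difficulty expands to notes + difficulty + notes:difficulty
fit <- lm(level ~ notes * difficulty, data = toy)

# With its own slope and intercept, each difficulty's residuals centre on zero
toy$resid <- residuals(fit)
resid_means <- tapply(toy$resid, toy$difficulty, mean)
resid_means

# The interaction coefficient estimates the change in slope for "expert"
coef(fit)[["notes:difficultyexpert"]]
```

Calling `plot(toy$notes, toy$resid)` on this toy data would give the kind of scatter around residual = 0 described above.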
For each difficulty, the model closely fits with the data points because the interaction term is present to change both the gradient and y-intercept of the model for each difficulty. However, for the expert difficulty songs, the model fails to account for the sharp drop-off in song level as the number of notes decreases.
To fix this problem, I fitted a natural cubic spline to the expert difficulty songs. A natural cubic spline is a piecewise curve made of cubic functions that join smoothly at points called knots. I kept a linear relationship for the other song difficulties.
#Place the expert difficulty songs in a natural cubic spline and maintain a linear relationship for the other difficulties
level_piece <- lm(level ~ notes + difficulty + notes:difficulty + ns(notes, 2):I(difficulty == "expert"), data = band_tidy)
#Look at performance of model
summary(level_piece)
Although it is difficult to describe the relationship between song level and the number of notes with a natural cubic spline, the model performance improved. The R² value increased from 0.984 in the previous model to 0.9848 with a natural cubic spline. The RSE was also reduced from 0.8302 in the previous model to 0.8096 with a natural cubic spline.
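The same kind of comparison can be tried on synthetic data where the level rises quickly and then flattens out. This is only a sketch: the data are simulated with an arbitrary curve and noise level, not taken from the Bandori dataset, and `ns()` is used with 2 degrees of freedom to mirror the call in the model above.

```r
library(splines)  # ships with R; provides ns()

set.seed(1)

# Synthetic data (NOT the Bandori dataset): a curve that levels off
notes <- seq(300, 1000, length.out = 80)
level <- 28 - 8 * exp(-(notes - 300) / 200) + rnorm(80, sd = 0.3)
toy <- data.frame(notes, level)

# Straight line versus natural cubic spline with 2 degrees of freedom
fit_linear <- lm(level ~ notes, data = toy)
fit_spline <- lm(level ~ ns(notes, 2), data = toy)

# Collect R-squared and residual standard error for each fit
compare <- data.frame(
  model = c("linear", "spline"),
  r_squared = c(summary(fit_linear)$r.squared, summary(fit_spline)$r.squared),
  rse = c(summary(fit_linear)$sigma, summary(fit_spline)$sigma)
)
compare
```

Because a straight line is a special case of a natural cubic spline, the spline fit can never have a lower R² than the linear fit on the same data; the RSE is the more informative comparison since it accounts for the extra parameter.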
#Predict song levels in band_tidy dataset using level_piece model
band_tidy_piece <- augment(level_piece, data = band_tidy)
#Change column name of ".fitted" to "predicted_level"
colnames(band_tidy_piece)[colnames(band_tidy_piece) == ".fitted"] <-
"predicted_level"
Compared to the residual plot of the previous model, incorporating a natural cubic spline into the model resulted in the expert difficulty songs showing a more concave relationship in their residuals. This has increased the randomness of the residuals, indicating improvements in the model.
The natural cubic spline more closely fits the data points on the expert difficulty songs, particularly as the number of notes decreases. A linear relationship between song level and the number of notes is also maintained for the other song difficulties.
In summary, the models show a significant relationship between song level and the number of notes when the difficulty setting is incorporated. This could be used to predict song level based on the total number of notes or notes/min.
From the augmented band_tidy_piece data frame, I rounded the predicted song levels to the nearest whole value and subtracted it from the actual level, creating a “difference” column. The difference column can be thought of as the song level residual: it measures how far off the predicted level is from the actual level. From there, I sorted the songs into the following categories based on the sign of the difference value:
If the difference value was positive (i.e., the predicted level is lower than the actual level), the song was placed under the “lower” category;
If the difference value was negative (i.e., the predicted level is higher than the actual level), the song was placed under the “higher” category; and
If the difference value was equal to 0 (i.e., the predicted and actual levels matched), it was placed under the “same” category.
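The three rules above can be sketched in a few lines. The songs and levels here are hypothetical placeholders; only the column names `level` and `predicted_level` follow the data frame used in this post.

```r
# Hypothetical songs: these levels are illustrative, not taken from the dataset
songs <- data.frame(
  song = c("song_a", "song_b", "song_c"),
  level = c(25, 25, 26),
  predicted_level = c(24.3, 25.6, 26.2)
)

# Difference = actual level minus the rounded predicted level
songs$difference <- songs$level - round(songs$predicted_level)

# The sign of the difference decides the group
songs$group <- ifelse(songs$difference > 0, "lower",
                      ifelse(songs$difference < 0, "higher", "same"))
songs
```

Note that R's `round()` rounds exact halves to the nearest even number, so a predicted level of exactly 24.5 would become 24.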
#Filter rows to only include the "expert" difficulty data of each song
level_rate_diff_expert <- band_tidy_piece %>%
filter(difficulty == "expert")
I then filtered the data set to only include the expert difficulty data of each song. I did this because there is a wider variance in players' ability to play and complete an expert difficulty song compared to the other difficulties.
Identifying the easier or harder expert difficulty songs
The songs that were placed under the “lower” group tended to be easier to play and complete than expected. This is due to them having a lower number of notes, allowing the player to more easily complete the song. On the other hand, songs under the “higher” group tended to be more difficult to play than expected due to the higher number of notes in the song.
In addition to the model where the number of notes is used to predict song level, I also generated a separate model where notes/min was used as an independent variable of song level. Both of these models produced similar “lower”, “higher” and “same” lists. Comparing the “higher” lists of the two models, however, revealed songs that were identified as more difficult than expected in one model but not the other. For instance, Natsuzora SUN! SUN! SEVEN! (Summer Skies & SUN! SUN! SEVEN!) by Poppin’Party was seen as a difficult song to complete when the number of notes was used as the independent variable of the model. That song was not in the “higher” group when notes/min was used as the independent variable, instead appearing in the “same” group. That is because the song is one of the longest in the game, offsetting the high number of notes for a level 25 song. This results in a notes/min that is similar to other level 25 songs. This suggests that the total number of notes and notes/min differentially influence song level, so they could be treated as separate independent variables if the model continues to be refined.
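One simple way to find songs that only one model flags is with set operations. The song names below are hypothetical placeholders standing in for the two “higher” lists, and the vector names are my own.

```r
# Hypothetical "higher" lists from each model (placeholder song names)
higher_notes <- c("song_a", "song_b", "song_c")
higher_rate  <- c("song_b", "song_c", "song_d")

# Songs flagged as harder than expected by both models
both <- intersect(higher_notes, higher_rate)

# Songs flagged by only one of the two models
only_notes <- setdiff(higher_notes, higher_rate)
only_rate  <- setdiff(higher_rate, higher_notes)

list(both = both, only_notes = only_notes, only_rate = only_rate)
```

A long song with many notes, as in the example above, would show up in `only_notes`.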
Comparing the model results with real-life surveys
The Japanese YouTuber Mihaya Gaming conducted some surveys on what people thought were the most difficult expert songs. In these surveys, he asked thousands of people which level 25, 26 and 27 songs were the most difficult in the game. He then compiled a top 10 list of the most difficult songs for each song level. I wanted to compare the model and survey results to see whether the models were able to identify the most difficult songs from the surveys.
The notes/min model was able to identify half of the most difficult level 25 songs from the survey, including two of the top three most difficult songs: “Kimi ja Nakya Dame Mitai” and “Su-Suki Nanka Janai!”. In contrast, the notes model could only identify two of the most difficult level 25 songs. However, the notes model was able to identify the most difficult level 25 song in the survey, which was not picked up by the notes/min model: “Teardrops”. This is because the song has the highest number of notes of any level 25 song. It was not identified by the notes/min model as a difficult song because it is also one of the longest songs in the game. This resulted in a notes/min that was comparable to other level 25 songs.
Moving onto the most difficult level 26 songs, the notes/min model identified three of the most difficult level 26 songs, including the most difficult one “Tenka Toitsu A to Z”. However, the notes/min model did not pick up the third most difficult song “Happy Synthesizer” which was identified by the notes model. This result reinforces the notion that the total number of notes and notes/min measure different aspects of a song, distinguishing them as different independent variables.
Both the notes and notes/min models were able to identify some of the most difficult level 25 and 26 songs. However, they were unable to identify the most difficult level 27 songs, categorising half of them as easier than the actual song level. This is because the natural cubic spline flattens out at around level 27 as the number of notes or notes/min increases, making it nearly impossible to predict higher song levels. This highlights a limitation of the current models: song difficulty and the number of notes or notes/min may not be enough to predict song levels beyond level 26. Other independent variables could be incorporated into the model, such as song duration and tempo. In addition, the number of notes can be split into its constituent parts such as off-beats, holds and swipes. Incorporating more independent variables into the model may therefore allow higher song levels to be predicted for the expert difficulty songs.
Conclusion
In summary, both the number of notes and notes/min can be used to predict song level, a measure of how difficult it is to complete or full combo a song. This can be used to identify songs that are easier or harder to play than expected, allowing one to choose songs that are easy to play at a specific song level. The models are also robust in identifying the most difficult songs according to the surveys but also highlight improvements that need to be made to predict higher song levels, perhaps by incorporating other independent variables such as duration and tempo.