Using the number of notes to predict the most difficult songs in the BanG Dream! rhythm game

BanG Dream! is a Japanese multi-media franchise by Bushiroad where different girl bands play songs. These girl bands include:

  • Poppin’Party, a girl band in pursuit of a sparkling, heart-pounding beat;
  • Afterglow, a rock band of childhood friends;
  • Pastel*Palettes, an idol band whose members sing and play instruments;
  • Roselia, a gothic rock band aiming to reach the top; and
  • Hello, Happy World!, a band aiming to make the world happy.

The franchise spans multiple media, including music, anime and a mobile rhythm game called “BanG Dream! Girls Band Party” (which I will shorten to Bandori from now on). The game initially launched in Japan on March 16th 2017 and was later made available in Taiwan and Korea. It launched in the rest of the world one year later, on April 4th 2018.

The game involves the player hitting different kinds of notes while playing a song. These notes range from simple tap notes to more complicated hold and swipe notes. Each song has four difficulty settings: easy, normal, hard and expert, with higher difficulty settings presenting more notes and a greater variety of note types.

After the song finishes, the player receives a score based on how well they played the song and the cards the player has in their team. These cards are received randomly from gacha events and vary in many factors such as rarity, type and ability. These markedly influence the score a player receives after playing a song. On the other hand, everyone plays the same note pattern or beat map for a specific song in a particular difficulty. In this blog post, I investigated whether the number of notes as well as the related variable notes/min can explain how difficult it is to play a song, measured by the dependent variable “song level”. This data is contained in the band_tidy dataset which is imported below.

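# Load the packages used in this post
library(tidyverse) # read_csv(), dplyr verbs and the pipe
library(broom)     # augment()
library(splines)   # ns()

# Import CSV file of Bandori dataset
band_tidy <- read_csv("song_list_csv_rate_051118.csv")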

# Convert the difficulty column into a factor whose levels run from easy to expert, with "easy" as the base (reference) group
band_tidy$difficulty <- factor(x = band_tidy$difficulty,
                               levels = c("easy", "normal", "hard", "expert"))

Plotting the relationship between song level and number of notes or notes/min

Different songs have various numbers of notes that need to be hit to achieve a full combo, a situation where no notes are missed or hit too early or too late. The number of notes that need to be hit in a song increases as higher difficulties are selected. The number of notes in a song can also be standardised by its duration to notes/min which measures the rate at which notes appear on the screen. The higher the notes/min for a song, the more quickly the player has to react to notes on a screen.
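As a rough sketch of that standardisation, the snippet below divides the note count by the song length in minutes. The data frame, its two placeholder songs and the duration_sec column are assumptions made for illustration; the band_tidy dataset may store duration differently.

```r
# Hypothetical example: the duration_sec column (song length in seconds)
# is an assumption, not necessarily how band_tidy stores song length
songs <- data.frame(
  song = c("Song A", "Song B"),
  notes = c(600, 600),
  duration_sec = c(120, 150)
)

# Standardise the note count by duration to get notes per minute
songs$notes_per_min <- songs$notes / (songs$duration_sec / 60)
print(songs)
```

Both placeholder songs have the same number of notes, but the shorter one works out to a higher notes/min, so its notes arrive faster.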

These two measurements capture different aspects of a song, which may influence song level in different ways. Hence, I plotted song level against the total number of notes and against notes/min on separate graphs to look at the relationship between the variables. These graphs were further split by song difficulty to see whether the relationship changes as the difficulty setting is adjusted.

Song level was found to be positively associated with both the number of notes and notes/min. This agrees with the principle that a song becomes harder to play as the number of notes increases. However, the rate at which song level increases falls as higher difficulties are selected. The easy difficulty songs form the steepest slope in their relationship between song level and the number of notes or notes/min, while the expert difficulty songs form the flattest slope due to their wider range of song level, number of notes and notes/min.

For the rest of the blog post, I will use the number of notes and song difficulty as independent variables to predict song level. It should be noted, however, that similar results were found when the number of notes was replaced by notes/min.

Building a model of song level vs number of notes

A linear model has two components: a gradient that represents the rate of change and the y-intercept that represents the initial value. Given that the relationship between song level and the number of notes varies for each difficulty, it makes sense to change both the gradient and the y-intercept parts of the model. Hence, I incorporated an interaction term into the model that allows the song difficulty to influence the relationship between song level and the number of notes.

#Create a model with an interaction term between difficulty and number of notes
level_diff_int <- lm(level ~ notes*difficulty, data = band_tidy)
summary(level_diff_int)
## 
## Call:
## lm(formula = level ~ notes * difficulty, data = band_tidy)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5671 -0.5275  0.0224  0.5320  2.6145 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             3.959007   0.257040  15.402  < 2e-16 ***
## notes                   0.027317   0.001784  15.311  < 2e-16 ***
## difficultynormal        5.487872   0.427006  12.852  < 2e-16 ***
## difficultyhard          8.676859   0.478188  18.145  < 2e-16 ***
## difficultyexpert       16.086420   0.460639  34.922  < 2e-16 ***
## notes:difficultynormal -0.012280   0.002228  -5.512 5.39e-08 ***
## notes:difficultyhard   -0.015418   0.001986  -7.762 3.95e-14 ***
## notes:difficultyexpert -0.019645   0.001868 -10.514  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8302 on 564 degrees of freedom
## Multiple R-squared:  0.9842, Adjusted R-squared:  0.984 
## F-statistic:  5012 on 7 and 564 DF,  p-value: < 2.2e-16

The song level can be predicted by the equation:

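$$
\begin{aligned}
\text{level}_{\text{pred}} ={}& (0.027 - 0.012\,\text{difficulty}_{\text{normal}} - 0.015\,\text{difficulty}_{\text{hard}} - 0.020\,\text{difficulty}_{\text{expert}}) \times \text{notes} \\
&+ (3.96 + 5.49\,\text{difficulty}_{\text{normal}} + 8.68\,\text{difficulty}_{\text{hard}} + 16.09\,\text{difficulty}_{\text{expert}})
\end{aligned}
$$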

The “difficulty” parts of the model are set to 0 or 1 depending on which difficulty the model is covering. For example, to model the “expert” difficulty songs, we set $\text{difficulty}_{\text{expert}} = 1$ and $\text{difficulty}_{\text{normal}} = \text{difficulty}_{\text{hard}} = 0$. For the “easy” difficulty, which is the base group, all three are set to 0. This allows the gradient and the y-intercept of the model to be adjusted for each difficulty. The performance of the model is very good, with an adjusted R² value of 0.984 and a relatively low residual standard error (RSE) of 0.8302.
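To make the 0/1 switching concrete, the sketch below works out the intercept and gradient for each difficulty from the coefficients reported in the summary output. The helper objects (coefs, lines_by_difficulty) and the 700-note expert song are made up for illustration and are not part of the original analysis.

```r
# Coefficients copied from the summary(level_diff_int) output
coefs <- c(
  "(Intercept)" = 3.959007, "notes" = 0.027317,
  "difficultynormal" = 5.487872, "difficultyhard" = 8.676859,
  "difficultyexpert" = 16.086420,
  "notes:difficultynormal" = -0.012280,
  "notes:difficultyhard" = -0.015418,
  "notes:difficultyexpert" = -0.019645
)

difficulties <- c("easy", "normal", "hard", "expert")

# "easy" is the reference level, so its adjustments are zero
intercept_adj <- c(0, coefs[paste0("difficulty", difficulties[-1])])
slope_adj <- c(0, coefs[paste0("notes:difficulty", difficulties[-1])])

lines_by_difficulty <- data.frame(
  difficulty = difficulties,
  intercept = unname(coefs["(Intercept)"] + intercept_adj),
  slope = unname(coefs["notes"] + slope_adj)
)
print(lines_by_difficulty)

# Predicted level for a hypothetical expert song with 700 notes
expert <- lines_by_difficulty[lines_by_difficulty$difficulty == "expert", ]
predicted_700 <- expert$intercept + expert$slope * 700
print(round(predicted_700, 1))
```

The gradient shrinks from easy to expert while the intercept grows, which matches the flatter slope seen for the higher difficulties.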

#Predict levels from data points in band_tidy using model
band_tidy_int <- augment(level_diff_int, data = band_tidy)

For each difficulty, plotting the residuals formed a random distribution of points around residual = 0. This indicates that it is appropriate to incorporate an interaction term into the model to predict song level.

For each difficulty, the model closely fits with the data points because the interaction term is present to change both the gradient and y-intercept of the model for each difficulty. However, for the expert difficulty songs, the model fails to account for the sharp drop-off in song level as the number of notes decreases.

To fix this problem, I fitted a natural cubic spline, a piecewise curve made up of cubic functions, to the expert difficulty songs. I kept a linear relationship for the other song difficulties.

#Place the expert difficulty songs in a natural cubic spline and maintain a linear relationship for the other difficulties
level_piece <- lm(level ~ notes + difficulty + notes:difficulty + ns(notes, 2):I(difficulty == "expert"), data = band_tidy)

#Look at performance of model
summary(level_piece)  
## 
## Call:
## lm(formula = level ~ notes + difficulty + notes:difficulty + 
##     ns(notes, 2):I(difficulty == "expert"), data = band_tidy)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2737 -0.5101 -0.0035  0.5050  2.4150 
## 
## Coefficients: (2 not defined because of singularities)
##                                               Estimate Std. Error t value
## (Intercept)                                   4.127861   0.428830   9.626
## notes                                         0.024572   0.005919   4.151
## difficultynormal                              5.386206   0.466126  11.555
## difficultyhard                                8.053912   1.365749   5.897
## difficultyexpert                              6.819089   1.733146   3.935
## notes:difficultynormal                       -0.011774   0.002410  -4.886
## notes:difficultyhard                         -0.013446   0.004502  -2.987
## notes:difficultyexpert                       -0.013044   0.005984  -2.180
## ns(notes, 2)1:I(difficulty == "expert")FALSE  1.539715   3.172823   0.485
## ns(notes, 2)2:I(difficulty == "expert")FALSE        NA         NA      NA
## ns(notes, 2)1:I(difficulty == "expert")TRUE  12.060487   2.170404   5.557
## ns(notes, 2)2:I(difficulty == "expert")TRUE         NA         NA      NA
##                                              Pr(>|t|)    
## (Intercept)                                   < 2e-16 ***
## notes                                        3.82e-05 ***
## difficultynormal                              < 2e-16 ***
## difficultyhard                               6.39e-09 ***
## difficultyexpert                             9.38e-05 ***
## notes:difficultynormal                       1.34e-06 ***
## notes:difficultyhard                          0.00294 ** 
## notes:difficultyexpert                        0.02969 *  
## ns(notes, 2)1:I(difficulty == "expert")FALSE  0.62767    
## ns(notes, 2)2:I(difficulty == "expert")FALSE       NA    
## ns(notes, 2)1:I(difficulty == "expert")TRUE  4.25e-08 ***
## ns(notes, 2)2:I(difficulty == "expert")TRUE        NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8096 on 562 degrees of freedom
## Multiple R-squared:  0.985,  Adjusted R-squared:  0.9848 
## F-statistic:  4103 on 9 and 562 DF,  p-value: < 2.2e-16

Although the coefficients of a natural cubic spline are difficult to interpret directly, the model performance improved. The adjusted R² value increased from 0.984 in the previous model to 0.9848 with a natural cubic spline. The RSE was also reduced from 0.8302 in the previous model to 0.8096.
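For readers who want to try this kind of comparison themselves, here is a minimal sketch on simulated data. The toy data frame and its saturating curve are assumptions and are not the Bandori dataset; only the ns(notes, 2) call mirrors the model in this post.

```r
library(splines)  # ships with R; provides ns()

# Simulated stand-in data (NOT the Bandori dataset): level flattens out
# as the number of notes grows
set.seed(1)
toy <- data.frame(notes = seq(300, 1000, length.out = 80))
toy$level <- 28 - 6 * exp(-(toy$notes - 300) / 150) + rnorm(80, sd = 0.3)

# A straight line versus a natural cubic spline with 2 degrees of freedom
fit_linear <- lm(level ~ notes, data = toy)
fit_spline <- lm(level ~ ns(notes, 2), data = toy)

# Residual standard error and adjusted R-squared for each model
compare <- data.frame(
  model = c("linear", "spline"),
  rse = c(summary(fit_linear)$sigma, summary(fit_spline)$sigma),
  adj_r2 = c(summary(fit_linear)$adj.r.squared,
             summary(fit_spline)$adj.r.squared)
)
print(compare)
```

A lower RSE together with a higher adjusted R² is the same pattern used here to judge that the spline model fits better.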

#Predict song levels in band_tidy dataset using level_piece model
band_tidy_piece <- augment(level_piece, data = band_tidy)

#Change column name of ".fitted" to "predicted_level"
colnames(band_tidy_piece)[colnames(band_tidy_piece) == ".fitted"] <-
  "predicted_level"

Compared to the residual plot of the previous model, incorporating a natural cubic spline into the model resulted in the expert difficulty songs showing a more concave relationship in their residuals. This has increased the randomness of the residuals, indicating improvements in the model.

The natural cubic spline more closely fits the data points on the expert difficulty songs, particularly as the number of notes decreases. A linear relationship between song level and the number of notes is also maintained for the other song difficulties.

In summary, the models show a significant relationship between song level and the number of notes when the difficulty setting is incorporated. This could be used to predict song level based on the total number of notes or notes/min.

A methodology to predict song level

#Calculate the difference between observed and predicted levels
band_tidy_piece <- 
  band_tidy_piece %>%
  mutate(predicted_level = round(predicted_level), 
         difference = level - predicted_level, 
         verdict = case_when(difference > 0 ~ "lower", 
                             difference == 0 ~ "same", 
                             difference < 0 ~ "higher"))

From the augmented band_tidy_piece data frame, I rounded the predicted song levels to the nearest whole value and subtracted them from the actual levels, creating a “difference” column. The difference column can be thought of as the song level residual: it measures how far off the predicted level is from the actual level. From there, I sorted the songs into the following categories based on the sign of the difference value:

  • If the difference value was positive (i.e., the predicted level is lower than the actual level), the song was placed under the “lower” category;
  • If the difference value was negative (i.e., the predicted level is higher than the actual level), the song was placed under the “higher” category; and
  • If the difference value was equal to 0 (i.e., the predicted and actual levels matched), it was placed under the “same” category.

#Filter rows to only include the "expert" difficulty data of each song
level_rate_diff_expert <- band_tidy_piece %>%
  filter(difficulty == "expert")

I then filtered the data set to only include the expert difficulty data of each song. I did this because players' ability to play and complete a song varies more widely on the expert difficulty than on the other difficulties.

Identifying the easier or harder expert difficulty songs

The songs that were placed under the “lower” group tended to be easier to play and complete than expected. This is due to them having a lower number of notes, allowing the player to more easily complete the song. On the other hand, songs under the “higher” group tended to be more difficult to play than expected due to the higher number of notes in the song.

In addition to the model where the number of notes is used to predict song level, I also generated a separate model where notes/min was used as an independent variable of song level. Both of these models produced similar “lower”, “higher” and “same” lists. Comparing the “higher” lists of the two models, however, revealed songs that were identified as more difficult than expected in one model but not the other. For instance, Natsuzora SUN! SUN! SEVEN! (Summer Skies & SUN! SUN! SEVEN!) by Poppin’Party was seen as a difficult song to complete when the number of notes was used as the independent variable of the model. That song was not in the “higher” group when notes/min was used as the independent variable, instead appearing in the “same” group. That is because the song is one of the longest in the game, offsetting the high number of notes for a level 25 song. This results in a notes/min that is similar to other level 25 songs. This suggests that the total number of notes and notes/min influence song level in different ways, so they could be treated as separate independent variables if the model continues to be refined.
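A comparison like this can be done with R's set operations. In the sketch below, the two “higher” lists are hypothetical: apart from Natsuzora SUN! SUN! SEVEN!, the song names are placeholders rather than actual results from either model.

```r
# Hypothetical "higher" lists from the two models; apart from
# Natsuzora SUN! SUN! SEVEN!, the song names are placeholders
higher_notes <- c("Natsuzora SUN! SUN! SEVEN!", "Song A", "Song B")
higher_rate <- c("Song A", "Song B", "Song C")

# Flagged by both models
both <- intersect(higher_notes, higher_rate)

# Flagged only when the number of notes is the predictor
only_notes <- setdiff(higher_notes, higher_rate)

# Flagged only when notes/min is the predictor
only_rate <- setdiff(higher_rate, higher_notes)

print(list(both = both, only_notes = only_notes, only_rate = only_rate))
```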

Comparing the model results with real-life surveys

The Japanese YouTuber Mihaya Gaming conducted surveys on what people thought were the most difficult expert songs. In these surveys, he asked thousands of people which level 25, 26 and 27 songs in the game were the most difficult. He then compiled a top 10 list of the most difficult songs for each song level. I wanted to compare the model and survey results to see whether the model was able to identify the most difficult songs from the surveys.

The most difficult level 25 songs

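| Rank | Song | English translation | % votes | Identified by notes model | Identified by notes/min model |
|------|------|---------------------|---------|---------------------------|-------------------------------|
| 1 | Teardrops | - | 58% | Yes | No |
| 2 | Kimi ja Nakya Dame Mitai | It Looks Like It Has To Be You | 16% | No | Yes |
| 3 | Su-Suki Nanka Janai! | I Never Said Love! | 12% | No | Yes |
| 4 | Hidamari Rhodonite | Sunkissed Rhodonite | 6% | No | No |
| 5 | Zankoku na Tenshi no Thesis | A Cruel Angel’s Thesis | 0.36% | No | No |
| 6 | Pride Kakumei | Pride Revolution | 0.267% | No | Yes |
| 7 | Circling | - | 0.2% | Yes | Yes |
| 8 | Yuriyurarararayuruyuri Daijiken | The Great YuriYurarararaYuruYuri Incident | 0.187% | No | Yes |
| 9 | Romeo | - | 0.18% | No | No |
| 10 | Alien Alien | - | 0.147% | No | No |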

Source: https://youtube.com/watch?v=FGzn-PUsa-4

The notes/min model was able to identify half of the most difficult level 25 songs from the survey, including two of the top three most difficult songs: “Kimi ja Nakya Dame Mitai” and “Su-Suki Nanka Janai!”. In contrast, the notes model could only identify two of the most difficult level 25 songs. However, the notes model was able to identify the most difficult level 25 song in the survey, which was not picked up by the notes/min model: “Teardrops”. This is due to the song having the highest number of notes of any level 25 song. It was not identified by the notes/min model as a difficult song because it is also one of the longest songs in the game. This resulted in a notes/min that was comparable to other level 25 songs.
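The tallies for the level 25 survey can be reproduced by transcribing the two identification columns from the table. The vector names below are my own; the Yes/No values are taken from the table, in rank order.

```r
# Identification columns transcribed from the level 25 table (ranks 1 to 10)
notes_model <- c("Yes", "No", "No", "No", "No", "No", "Yes", "No", "No", "No")
rate_model <- c("No", "Yes", "Yes", "No", "No", "Yes", "Yes", "Yes", "No", "No")

# Number of survey songs flagged by each model
n_notes <- sum(notes_model == "Yes")
n_rate <- sum(rate_model == "Yes")

# Number of survey songs flagged by at least one of the two models
n_either <- sum(notes_model == "Yes" | rate_model == "Yes")

print(c(notes = n_notes, notes_per_min = n_rate, either = n_either))
```

Taken together, the two models flag six of the ten survey songs, more than either model manages alone.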

The most difficult level 26 songs

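| Rank | Song | English translation | % votes | Identified by notes model | Identified by notes/min model |
|------|------|---------------------|---------|---------------------------|-------------------------------|
| 1 | Tenka Toitsu A to Z | The World Stands As One | 37% | No | Yes |
| 2 | Y.O.L.O!!!!! (You Only Live Once) | - | 31% | No | No |
| 3 | Happy Synthesizer | - | 20% | Yes | No |
| 4 | Go! Go! Maniac | - | 10% | No | Yes |
| 5 | Imagination | - | 0.141% | No | No |
| 6 | Light Delight | - | 0.125% | No | No |
| 7 | Asu no Yozora Shoukaihan | Night Sky Patrol of Tomorrow | 0.108% | No | No |
| 8 | Lost One no Goukoku | The Lost One’s Weeping | 0.1% | Yes | Yes |
| 9 | Tsunagu, Soramoyou | The Look Of The Sky, Connected | 0.091% | No | No |
| 10 | MOON PRIDE/R (tie) | -/- | 0.066% | No | No |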

Source: https://www.youtube.com/watch?v=28eocJj3y8I

Moving on to the most difficult level 26 songs, the notes/min model identified three of them, including the most difficult one, “Tenka Toitsu A to Z”. However, the notes/min model did not pick up the third most difficult song, “Happy Synthesizer”, which was identified by the notes model. This result reinforces the notion that the total number of notes and notes/min measure different aspects of a song, distinguishing them as different independent variables.

The most difficult level 27 songs

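| Rank | Song | English translation | % votes | Identified by notes model | Identified by notes/min model |
|------|------|---------------------|---------|---------------------------|-------------------------------|
| 1 | Zettai Sengen Recital | Absolute Declaration ~Recital~ | 38.48% | No | No |
| 2 | Goka Gokai Phantom Thief | - | 27.48% | No | No |
| 3 | Passionate Anthem | - | 12.17% | No | No |
| 4 | Determination Symphony | - | 5.82% | No | No |
| 5 | Teardrops (special) | Teardrops (special) | 5.75% | No | No |
| 6 | Oneness | - | 4.60% | No | No |
| 7 | Louder | - | 3.59% | No | No |
| 8 | This Game | - | 1.74% | No | No |
| 9 | Guren no Yumiya | Crimson Bow and Arrow | 1.12% | No | No |
| 10 | Redo | - | 0.63% | No | No |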

Source: https://www.youtube.com/watch?v=5lA0zX01rtk

Both the notes and notes/min models were able to identify some of the most difficult level 25 and 26 songs. However, they were unable to identify the most difficult level 27 songs, categorising half of them as easier than the actual song level. This is because the natural cubic spline flattens out at around level 27 as the number of notes or notes/min increases, making it nearly impossible to predict higher song levels. This highlights a limitation of the current models: song difficulty and the number of notes or notes/min may not be enough to predict song levels beyond level 26. Other independent variables could be incorporated into the model, such as song duration and tempo. In addition, the number of notes could be split into its constituent parts, such as off-beats, holds and swipes. Incorporating more independent variables into the model may therefore allow higher song levels to be predicted for the expert difficulty songs.

Conclusion

In summary, both the number of notes and notes/min can be used to predict song level, a measure of how difficult it is to complete or full combo a song. This can be used to identify songs that are easier or harder to play than expected, allowing one to choose songs that are easy to play at a specific song level. The models also picked out several of the most difficult level 25 and 26 songs from the surveys, but they need to be improved to predict higher song levels, perhaps by incorporating other independent variables such as duration and tempo.

2 thoughts on “Using the number of notes to predict the most difficult songs in the BanG Dream! rhythm game”

  1. Is there a list anywhere where I can visually see each song and how many notes it has at each difficulty level?

    1. Thank you for your question. Unfortunately, there isn’t a specific list that lists the number of notes for every song and difficulty (without fetching that data via an API). The best place to start is Bestdori (bestdori.com) – you can search for a specific song and see how many notes each difficulty has (under the “meta” table). For this post, I noted down the number of notes for each song individually. Whether you want to do that or not is up to you.
