What are the themes and sentiments of Poppin’Party’s and SILENT SIREN’s songs? A look into the bands of the “NO GIRL NO CRY” band battle

Poppin’Party and SILENT SIREN are two all-female Japanese bands that play similar styles of music. Poppin’Party is one of the bands in the BanG Dream! franchise established by Bushiroad. Spanning multiple forms of media, the band consists of anime characters whose voice actresses also play their own instruments in live shows. SILENT SIREN was formed in 2010 by a group of amateur female models and has since released many albums and performed in various live shows.

The participants of the “NO GIRL NO CRY” band battle. Left is Poppin’Party and right is SILENT SIREN. Source: https://bandori.fandom.com/wiki/File:NGNC_Main_Visual.jpg

These two bands will perform in the band battle event “NO GIRL NO CRY” in Japan on May 18th and 19th. In celebration of this event, I looked at the lyrics of Poppin’Party’s and SILENT SIREN’s songs to compare the themes and sentiments of the two bands. This was done using the methodology established in my last blog post. Additional analyses were also conducted to glean more insights from the songs of both bands.

Exploratory Data Analysis of lyrics

## # A tibble: 2 x 3
##   band         num_songs num_words
##   <chr>            <int>     <dbl>
## 1 Poppin'Party        30      3255
## 2 Silent Siren        38      3659

SILENT SIREN have released many songs over their nearly ten years of existence. Luckily, I was able to find enough English translations of their songs to match the number available for Poppin’Party, which enabled comparable text and sentiment analyses to be conducted between the two bands.

Both bands had a word that appeared two to three times more frequently in their lyrics than any other word. For Poppin’Party, that word was “dream”, which appeared twice as often as any other word. For SILENT SIREN, the word “love” appeared three times more frequently than any other word. These observations may point to the predominant themes of each band’s songs, which are explored later in this blog post.

Commonality cloud: which words are common across both bands’ lyrics?

A commonality cloud visualises the frequency of words that appear in the lyrics of both bands. The size of each word in a commonality cloud is based on how frequently the word appears in both groups of lyrics. Note that a word that appears very frequently in both groups of songs will be bigger in the commonality cloud than a word that appears frequently in one group of songs but not the other.
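The selection and sizing logic described above can be sketched in a few lines of base R. The frequencies below are invented for illustration (the real analysis uses the per-band word frequency tables built in the methodology post), and sizing by the minimum of the two frequencies is one common convention consistent with the note above; the actual plot can be drawn with the wordcloud package’s commonality.cloud() function.

```r
# Toy word frequencies for each band (invented numbers, for illustration only)
freqs <- data.frame(
  word   = c("dream", "love", "time", "future"),
  poppin = c(20, 15, 8, 6),
  siren  = c(5, 30, 7, 0)
)

# Keep only words that appear in BOTH bands' lyrics
shared <- subset(freqs, poppin > 0 & siren > 0)

# Size by the smaller of the two frequencies, so a word must be frequent
# in both groups to appear large in the cloud
shared$size <- pmin(shared$poppin, shared$siren)

# "future" drops out (absent from one band); "love" ends up largest
shared[order(-shared$size), c("word", "size")]
```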

According to the commonality cloud, both Poppin’Party’s and SILENT SIREN’s songs contain words associated with feelings and experiences. In particular, “love” was a common word found in the lyrics of both bands’ songs. Other words relating to experiences that appeared in both groups of songs included “time”, “world” and “summer”.

Comparison cloud: which words are more frequent in one band’s songs over the other?

In contrast to a commonality cloud, a comparison cloud plots words based on whether they appear more frequently in one band’s lyrics than in the other’s. A difference is taken between the word frequencies of the two groups of lyrics. The word is then plotted on one side of the comparison cloud, with its size varying with the magnitude of the difference. A comparison cloud allows us to identify words, and potential themes, that are more prevalent in one band’s songs than in another’s.
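As a rough sketch of that procedure (with invented frequencies, not data from the actual analysis; the wordcloud package’s comparison.cloud() draws the real plot):

```r
# Toy word frequencies for each band (invented numbers)
freqs <- data.frame(
  word   = c("dream", "love", "kiss", "tomorrow"),
  poppin = c(20, 10, 0, 8),
  siren  = c(4, 30, 12, 1)
)

# Difference in frequency decides the side; its magnitude decides the size
diff <- freqs$poppin - freqs$siren
freqs$side <- ifelse(diff > 0, "Poppin'Party", "SILENT SIREN")
freqs$size <- abs(diff)

# "love" and "kiss" land on SILENT SIREN's side;
# "dream" and "tomorrow" on Poppin'Party's
freqs[order(-freqs$size), c("word", "side", "size")]
```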

From the comparison cloud, “dream” appeared more frequently in Poppin’Party’s songs compared to SILENT SIREN’s songs. Also in Poppin’Party’s side of the comparison cloud are words that are related to experiences and the future such as “song”, “future” and “tomorrow”. The appearance of these words in the comparison cloud indicates that Poppin’Party’s songs tend to touch on achieving goals for the future and creating experiences while doing so.

In contrast, SILENT SIREN’s songs tend to touch on romance. “Love” appeared frequently in SILENT SIREN’s songs, more so than in Poppin’Party’s. On SILENT SIREN’s side are other words associated with romance such as “sweet”, “darling” and “kiss”. The appearance of these words indicates that their songs tend to deal with love and romance and how people react to them.

Bing sentiment analysis of the bands’ songs

I conducted a “bing” sentiment analysis of the songs to measure the proportion of positive and negative words of each band’s lyrics. Overall, Poppin’Party’s songs had a higher proportion of positive-associated words compared to SILENT SIREN’s songs.

Half of the negative- and positive-associated words are similar across Poppin’Party’s and SILENT SIREN’s songs. For Poppin’Party, most of the negative-associated words are linked to sensations relating to negative emotions such as “throb”, “shake” and “painful”. They also have positive-associated words that describe a person’s internal strengths such as “courage”, “strong” and “gentle”. SILENT SIREN’s songs tend to have negative-associated words linked to loneliness such as “cry”, “lonely” and “ambiguous”. They also have positive-associated words that describe sensations such as “happy”, “sparkle” and “flutter”. These words might appear more frequently in SILENT SIREN’s songs due to their focus on romance.

NRC sentiment analysis of the bands’ songs

I conducted an NRC sentiment analysis to measure the proportion of words belonging to specific emotions. The proportions of words in six of the eight emotions are similar between the two bands. However, SILENT SIREN’s songs had a lower proportion of words associated with anticipation and a higher proportion of words associated with fear compared to Poppin’Party’s songs.

Some of the most frequent words associated with each emotion, such as “feeling” and “smile”, were similar across both bands. There were, however, some words unique to each band that may represent the overall themes of their songs. For Poppin’Party, “sing” appears across many emotions, namely anticipation, joy, sadness and trust. These emotions can be found in songs of theirs that touch on many themes, particularly the idea of playing together as a band.

In contrast, SILENT SIREN’s songs can be split into two broad areas. The words “sweet” and “kiss” can be found in many positive emotions such as anticipation, joy, surprise and trust. These words relate to the romantic theme of their songs. Another area touched on in SILENT SIREN’s songs could be the feeling of loneliness when losing friends or breaking up. This can be seen in the words “lonely” in anger and disgust, “disappear” and “escape” in fear and “leave” and “cry” in sadness.

Conclusion

Conducting text and sentiment analyses of the songs has uncovered some interesting insights about both bands. Poppin’Party’s songs tend to talk about setting and achieving future goals while creating memories and experiences along the way. Their songs tend to be quite positive, but they also touch on a variety of sensations and emotions. On the other hand, SILENT SIREN’s songs tend to talk about romance and the various emotions it elicits, both positive (in the case of joy) and negative (in the case of loneliness). It is surprising, then, that the similar music styles of both bands can mask such different subject matter. Based on these results, it will be interesting to see how these two bands clash when they meet in the band battle this weekend.

Acknowledgements

I would like to acknowledge the following people who have translated the Poppin’Party and SILENT SIREN songs from Japanese to English:

  • Arislation
  • BlaZofgold
  • Eureka
  • Gessami
  • Kei
  • Kikkikokki
  • Komichi
  • LuciaHunter
  • Maki
  • ManekiKoneko
  • MijukuNine
  • Misa
  • NellieFeathers
  • Ohyododesu
  • Starlogakemi
  • Thaerin
  • Tsushimayohane
  • UnBound
  • Youraim

I may have missed other people who have translated songs for this analysis, but I thank you all the same.

A methodology of conducting sentiment analysis of Bandori lyrics

This blog post was written with the intention of describing how I conducted my sentiment analyses so that others can replicate what I did and possibly adapt it to their projects. The analysis was conducted in R using the tidyverse series of packages.

Setting up

I started by loading the following packages I needed to conduct sentiment analysis into R:

  • tidyverse, a suite of packages that makes it easy to manipulate tibbles (a type of table or dataframe) and generate graphs using the dplyr and ggplot2 packages respectively;
  • stringi as I needed the stri_isempty() function to remove empty lines;
  • corpus for the text_tokens() function to stem and complete words while cleaning the lyrics;
  • qdap and tm for cleaning lyrics and conducting initial text mining analyses;
  • tidytext to access the “bing” and “nrc” sentiment lexicons I needed to conduct sentiment analyses;
  • wordcloud2 to visualise word frequencies within Bandori songs; and
  • broom to convert the test results into a table so that parts of the test results can be easily extracted.
#Load the required packages
library(tidyverse)
library(stringi) 
library(corpus) 
library(qdap)
library(tm)
library(tidytext)
library(wordcloud2)
library(broom)

I created an empty tibble “lyricsTbl” containing four columns: “doc_id” for numerically identifying the lyrics; “text” for holding lyric data; “title” for the song name; and “band” for the band name. This empty tibble was used to import lyrics and associated data. Note that the columns were set in this exact order because, when the corpus is created, the tibble is split in two, with the last two columns defined as metadata for the lyrics data held in the first two columns.

#Create an empty tibble to import lyrics
lyricsTbl <- tibble(doc_id = numeric(), text = character(), title = character(), band = character())

Function building

I created three functions to make it easier to run all importing and cleaning steps in one go with fewer inputs.

I first built the lyric_import() function to import lyrics into the lyricsTbl tibble. This function takes a string containing the name of the lyrics document (doc) and a number (id) for the doc_id and returns a lyricsTbl tibble containing the doc_id and lyrics data as well as its associated metadata (i.e., song and band names). Note that the lyrics documents have to be set up in a specific layout so that parts of the lyrics document are correctly imported into lyricsTbl:

  • The first line contains the song and band titles, separated with a “ by ” separator.
  • The second line contains the URL to the lyrics.
  • The third line is a credits line, acknowledging the person who translated the lyrics and the date on which the translated lyrics were first uploaded.
  • The fourth line onwards contains the lyrics.
#Build a function to import lyrics into lyricsTbl
lyric_import <- function(doc, id) {

  #Import raw lyrics document into R
  document <- readLines(doc)

  #Collect song information and lyrics from the raw lyrics document
  first_line <- document[1]
  title <- str_split(first_line, " by ")[[1]][1]
  band <- str_split(first_line, " by ")[[1]][2]
  lyrics <- document[4:length(document)]

  #Remove empty lines in lyrics
  lyrics <- lyrics[!stri_isempty(lyrics)]

  #Combine lyrics into one line
  line <- str_c(lyrics, collapse = " ")

  #Add lyrics and song information into lyricsTbl table
  lyricsTbl <- add_row(lyricsTbl, doc_id = id, text = line, title = title, band = band)

  #Return the updated lyricsTbl table
  return (lyricsTbl)
}

The second function I built is the corpus_clean() function. This function is designed to run the cleaning steps of each lyrics document so that punctuation and common stopwords (commonly-used words that do not add meaning to text analyses) are removed. This function takes a corpus of lyrics documents and a vector of additional stopwords to remove and returns a corpus where punctuation and stopwords are removed.

#Build a function containing a pipe to clean the corpus of lyrics documents
corpus_clean <- function(corpus, stopword = "") {
  #Define stopwords first to remove common and user-defined stopwords
  stopwords <- c(stopwords("en"), stopword)  

  #Build a stemmer dictionary from lexiconista.com
  stem_list <- read_tsv("C:\\D\\2015\\PhD\\data science blog\\bandori lyric analysis\\lemmatization-en.txt")
  names(stem_list) <- c("stem", "term")
  stem_list2 <- new_stemmer(stem_list$term, stem_list$stem)
  stemmer <- function (x) text_tokens(x, stemmer = stem_list2)

  #Replace all mid-dots (present in some lyrics) with a space; this helper is needed because removePunctuation cannot remove mid-dots
  remove_mid_dot <- function (x) str_replace_all(x, "·", " ")

  #Replace curly apostrophes in the lyrics with straight apostrophes, as retaining curly apostrophes prevents replace_contraction from working
  replace_apos <- function(x) str_replace_all(x, "’", "'")

  #Clean corpus through a pipe
  corpus <- corpus %>%
    tm_map(content_transformer(tolower)) %>%
    tm_map(content_transformer(remove_mid_dot)) %>% 
    tm_map(content_transformer(replace_apos)) %>%
    tm_map(content_transformer(replace_abbreviation)) %>%
    tm_map(content_transformer(replace_contraction), sent.cap = FALSE) %>%
    tm_map(content_transformer(replace_symbol)) %>%
    tm_map(content_transformer(replace_number)) %>%
    tm_map(content_transformer(stemmer)) %>% 
    tm_map(removeWords, stopwords) %>%     
    tm_map(removePunctuation) %>%      
    tm_map(stripWhitespace)

  return(corpus)
}  

Lastly, I built the word_freq() function which generates a word frequency table of each song in tidy format (with each row defining a word-song pair and each column representing a variable). This function takes a corpus of cleaned lyrics documents and returns a tidied tibble containing columns for words, song names and their frequencies.

#Create a function showing frequencies of each word in a song in tidy tibble format
word_freq <- function(corpus) {
  #Generate a Term Document Matrix (TDM) and convert it into a matrix
  tdm <- TermDocumentMatrix(corpus) 
  matrix <- as.matrix(tdm) 

  #Name columns (i.e., the songs) in the matrix
  colnames(matrix) <- meta(corpus)$title 

  #Convert matrix into a tibble and add a column containing the words
  tdm_song <- as_tibble(matrix) %>% mutate(word = rownames(matrix))

  #Swap columns so that word column is moved from the last to the first column
  tdm_song <- tdm_song[, c(ncol(tdm_song), 1:(ncol(tdm_song) - 1))] 

  #Tidy the table so that all song names are placed in one column and remove any rows with 0 frequency 
  tdm_tidy <- tdm_song %>% 
    gather(key = "song", value = "freq", -word) %>%
    filter(freq != 0) 

  return(tdm_tidy)
}

Writing these three functions made it easier to generate the tables of data that were required to conduct sentiment analyses.

Importing and cleaning lyrics data

English-translated lyrics were copied from the Bandori Wikia (https://bandori.wikia.com/wiki/BanG_Dream!_Wikia) and pasted into separate .txt files in Notepad. These .txt files were then saved into one folder containing all the English-translated lyrics of Bandori original songs. From there, the lyric_import() function was used to import translated lyrics from all bands into lyricsTbl. Each lyrics document was assigned a unique doc_id according to the order in the folder.

#Import all lyrics into lyricsTbl
files <- dir()
for (i in seq_along(files)) {
  lyricsTbl <- lyric_import(files[i], i)
}

The lyricsTbl containing the lyrics was converted into a volatile corpus. In the creation of the corpus, the lyricsTbl was split into content (containing the “doc_id” and “text” columns) and metadata (containing “title” and “band” columns) tables. The lyrics were then cleaned with the corpus_clean() function. Note that no stopwords were added to the corpus_clean() function alongside the most common stopwords.

#Create a volatile corpus containing the Bandori lyrics
bandori_corpus <- VCorpus(DataframeSource(lyricsTbl))

#Clean bandori_corpus with corpus_clean() function
bandori_corpus_clean <- corpus_clean(bandori_corpus)

Band-specific stopwords were instead removed at this later stage because each band had words that were overused in a small number of songs (typically one or two) without adding much context. Examples include words used consecutively (e.g., “fight”), exclamations (e.g., “cha”) and sound effects (e.g., “nippa”). These words were identified for each band using term frequency-inverse document frequency (Tf-Idf), which highlights words that appear frequently but only within a few documents. These words, along with their bands, were stored in a CSV file that was loaded into R as “stopwords”.
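To illustrate why Tf-Idf surfaces these band-specific words, here is a toy example with invented counts. The actual analysis would more likely use a helper such as tidytext’s bind_tf_idf() on the word frequency table; this sketch uses a simplified raw-count term frequency.

```r
# Invented counts: "cha" is used heavily but only in one song,
# while "dream" is spread across every song
word_counts <- data.frame(
  song = c("A", "A", "B", "B", "C"),
  word = c("cha", "dream", "dream", "love", "dream"),
  freq = c(15, 3, 4, 5, 2)
)

# Inverse document frequency: log of (number of songs / songs containing the word)
n_songs <- length(unique(word_counts$song))
doc_freq <- tapply(word_counts$word, word_counts$word, length)
word_counts$tf_idf <- word_counts$freq * log(n_songs / doc_freq[word_counts$word])

# "cha" gets the top score (frequent in few songs); "dream" scores 0
# because it appears in every song
word_counts[order(-word_counts$tf_idf), c("word", "song", "tf_idf")]
```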

#Load CSV file containing band-specific stopwords
stopwords <- read_csv("C:\\D\\2015\\PhD\\data science blog\\bandori lyric analysis\\compiled lyrics\\band_stopwords.csv")

A tidied word frequency table containing frequencies of all words for each song (minus common stopwords) was generated with the word_freq() function. From there, two anti-joins were conducted to remove the stopwords: the common stopwords that were not initially removed in the corpus_clean() function (under stop_words) and the band-specific stopwords (contained in stopwords). As the word frequency tibble contained the song names but not the band names, the column containing the band names was also added via an inner_join() function between the word frequency table and the meta table of the corpus.

#Create a tidied word frequency table, removing more common stopwords and band-specific stopwords.
bandori_noStop <- word_freq(bandori_corpus_clean) %>%
                  anti_join(stop_words) %>%
                  inner_join(meta(bandori_corpus), by = c("song" = "title")) %>%
                  anti_join(stopwords, by = c("word", "band"))

Exploratory data analysis

I initially generated a table which had counts for the number of songs and words for each band. Song counts were derived from the original lyricsTbl tibble while word counts were obtained from the bandori_noStop tibble. They were then combined into one table so that the number of songs and words were matched to their bands.

#Count the number of songs for each band
band_count <- lyricsTbl %>%
                group_by(band) %>%
                summarise(num_songs = n())

#Count the total number of words for each band
word_count <- bandori_noStop %>%
                group_by(band) %>%
                summarise(num_words = sum(freq))

#Combine the total number of songs and words into one table
(band_summary <- band_count %>%
                  left_join(word_count, by = "band"))

## # A tibble: 6 x 3
##   band                num_songs num_words
##   <chr>                   <int>     <dbl>
## 1 Afterglow                   8       900
## 2 Hello, Happy World!         7       653
## 3 Pastel*Palettes             8       690
## 4 Poppin'Party               30      3255
## 5 RAISE A SUILEN              4       462
## 6 Roselia                    18      1939

From the word frequency table of Bandori songs, I generated a wordcloud of the 100 most frequently used words using the wordcloud2 package. A colour gradient was used with gray, yellow and red representing increasing word frequencies.

#Include only 100 most frequently used words in Bandori songs
top_100_nostop <- bandori_noStop[, c(1, 3)] %>%
  group_by(word) %>%
  summarise(total = sum(freq)) %>%
  arrange(desc(total)) %>%
  head(100)

#Define colour gradient for wordcloud
cloud_colour3 <- ifelse(top_100_nostop$total > 66, "#E50050", 
                        ifelse(top_100_nostop$total >= 40, "#F2B141", 
                               "#808080"))

#Generate the wordcloud
wordcloud2(top_100_nostop,
           size = 0.25, 
           shape = "star", 
           shuffle = FALSE, 
           color = cloud_colour3)

“Bing” sentiment analysis of lyrics

Below is the ggplot2 theme that I used in most graphs in this blog post.

#Define a theme to be used across all graphs
bandori_theme <- theme(legend.position = "bottom", 
                       plot.title = element_text(hjust = 0.5, size = 15, 
                                                 face = "bold"),
                       axis.title = element_text(size = 10, face = "bold"
                                                 ),
                       axis.text.x = element_text(size = 8),
                       legend.text = element_text(size = 10))

To do the “bing” sentiment analysis of the lyrics, I matched the words from the word frequency table with the table of known words and their “bing” sentiments via an inner-join. The resultant table bandori_bing contains words that are identified as “positive” or “negative” under the “bing” sentiment lexicon.

#Match bandori_noStop with the "bing" sentiment lexicon
bandori_bing <- bandori_noStop %>%
                inner_join(get_sentiments("bing"), by = "word")

For each band, I counted the number of words that were either “positive” or “negative” and generated a table with separate “positive” and “negative” count columns. I did further calculations to count the total number of sentiment words (total), the difference between the number of positive and negative words (polarity) and the proportion of positive words (prop_pos).

#Count the total number of positive and negative words from "bing" sentiment lexicon
bandori_bing_total <- bandori_bing %>%
                        group_by(band, sentiment) %>%
                        summarise(count = sum(freq)) %>%
                        spread(sentiment, count) %>%
                        mutate(total = positive + negative, 
                               polarity = positive - negative,
                               prop_pos = positive / total)

It is possible that the proportion of positive words deviates from 0.50 simply by chance. Hence, to test whether there were significantly more positive than negative words, I tested each band’s proportion of positive words against a null proportion of 0.50 (an exact binomial test via binom.test()). From each test, I collected the p-value as well as the lower and upper bounds of the 95% confidence interval. Along with the number of songs, these were added to the bandori_bing_total table.

#Define empty numeric vectors to store Equal Proportion test results
p_values <- vector(mode = "numeric")
conf_low <- vector(mode = "numeric")
conf_high <- vector(mode = "numeric")

#Conduct an exact binomial test for each band to see whether it has more positive than negative words (null proportion = 0.5)
for (i in 1:nrow(bandori_bing_total)) {
  binom_result <- binom.test(bandori_bing_total$positive[i], 
                             bandori_bing_total$total[i], 
                             alternative = "two.sided")
  tidied <- tidy(binom_result)
  p_values[i] <- tidied$p.value
  conf_low[i] <- tidied$conf.low
  conf_high[i] <- tidied$conf.high
}

#Add test results onto bandori_bing_total
bandori_bing_total$conf_low <- conf_low
bandori_bing_total$conf_high <- conf_high
bandori_bing_total$p_value <- p_values
bandori_bing_total$num_songs <- band_count$num_songs

#Rearrange bandori_bing_total so that the number of songs is next to the band names, then round values to 3 decimal places
bandori_bing_total2 <- bandori_bing_total %>%
                        select(band, num_songs, negative:p_value) %>%
                        mutate(prop_pos = round(prop_pos, 3), 
                               conf_low = round(conf_low, 3),
                               conf_high = round(conf_high, 3),
                               p_value = round(p_value, 3))

From the bing sentiment analysis, I also generated a graph visualising the proportion of positive and negative words for each band. Given that there were different word counts among the bands, I normalised the number of positive and negative words as proportions so that they can be compared across bands.

#Graph the proportions of positive and negative words for each band
bandori_bing %>%
  group_by(band, sentiment) %>%
  summarise(count = sum(freq)) %>%
  ggplot(aes(x = band, y = count, fill = factor(sentiment))) +
  geom_col(position = "fill") + 
  geom_hline(yintercept = 0.50, colour = "black", linetype = 2) +
  scale_fill_manual(values = c("red", "green")) + 
  labs(x = "Band", y = "Proportion", fill = "Sentiment", title = "Proportion of positive/negative words in Bandori songs") + 
  bandori_theme

I also counted the number of songs that were positive or negative overall according to bing sentiment analysis. Positive and negative songs were defined as songs where the polarity (the difference between the number of positive and negative words) is more than 2 or less than -2 respectively. Songs whose polarities were between -2 and 2 inclusive were defined as neutral because these differences were too small to conclusively group that song as positive or negative.

#Count the number of positive and negative sentiment words for each song (while keeping the band name)
bandori_bing_songSent <- bandori_bing %>%
                      group_by(band, song, sentiment) %>%
                      summarise(total = sum(freq)) %>%
                      ungroup() %>%
                      spread(sentiment, total)

#Replace NAs in bandori_bing_songSent with 0
bandori_bing_songSent[is.na(bandori_bing_songSent)] <- 0

#Continue grouping songs into different sentiment categories
bandori_bing_sort <- bandori_bing_songSent %>%
                      mutate(polarity = positive - negative, 
                             result = 
                               case_when(polarity > 2 ~ "positive", 
                                        (polarity <= 2) & (polarity >= -2) ~ "neutral", 
                                        polarity < -2 ~ "negative")) %>%
                      group_by(band, result) %>%
                      summarise(num_song = n()) %>%
                      spread(result, num_song)

#Replace NAs in bandori_bing_sort with 0
bandori_bing_sort[is.na(bandori_bing_sort)] <- 0

“NRC” sentiment analysis of lyrics

Similar to the “bing” sentiment analysis, I initially matched the words from the word frequency table to the table of known words and their emotions via an inner-join. Then for each band, the number of words under a specific emotion were counted. As positive and negative sentiments were already analysed during the “bing” sentiment analysis, data relating to the two sentiments were excluded for the “NRC” sentiment analysis. The table was then modified so that emotions appear as separate columns.

#Match bandori_noStop with the "nrc" sentiment lexicon
bandori_nrc <- bandori_noStop %>%
  inner_join(get_sentiments("nrc"), by = "word")

#For each band, count the number of words under each emotion and exclude positive and negative sentiments
bandori_nrc_total <- bandori_nrc %>%
  group_by(band, sentiment) %>%
  summarise(count = sum(freq)) %>%
  filter(!sentiment %in% c("positive", "negative"))

#Spread the bandori_nrc_total table so that emotions appear as separate columns
bandori_nrc_spread <- spread(bandori_nrc_total, sentiment, count) 

Proportions of words under specific emotions were also calculated so that they can be compared across bands. This was done by generating a proportional or marginal table.

#Convert bandori_nrc_spread into a matrix
bandori_nrc_matrix <- as.matrix(bandori_nrc_spread[, 2:9])
rownames(bandori_nrc_matrix) <- bandori_nrc_spread$band

#Calculate proportions for bandori_nrc_total
bandori_nrc_prop <- round(prop.table(bandori_nrc_matrix, 1), 2)

Following this, I generated a graph visualising the proportion of words that appeared under a specific emotion for each band. Again, the heights of the bars were normalised so that proportions could be calculated and compared across different bands.

#Define a named vector of colours attached to specific emotions
emotion_colour <- c("red", "green4", "lawngreen", "black", "yellow1", "navy", "purple", "lightskyblue")
names(emotion_colour) <- c("anger", "anticipation", "disgust", "fear", "joy", "sadness", "surprise", "trust")

#Graph the proportion of words belonging to each emotion for each band
bandori_nrc_total %>%
  ggplot(aes(x = band, y = count, fill = sentiment)) + 
  geom_col(position = "fill") + 
  scale_fill_manual(values = emotion_colour) +
  labs(x = "Band", y = "Proportion", fill = "Emotion", title = "\"NRC\" emotion words in Bandori songs") + 
  bandori_theme

Acknowledgements

I would like to acknowledge the following people who have translated the original Bandori songs from Japanese to English:

  • AERIN
  • Aletheia
  • Arislation
  • Betasaihara
  • BlaZofgold
  • bluepenguin
  • Choocolatiah
  • Eureka
  • Hikari
  • Komichi
  • Leephysic
  • Leoutoeisen
  • LuciaHunter
  • lunaamatista
  • MijukuNine
  • Ohoyododesu
  • PocketLink
  • Rolling
  • Starlogakemi
  • Thaerin
  • Tsushimayohane
  • UnBound
  • vaniiah
  • Youraim

I may have missed other people who have translated songs for this analysis. If I have missed you, I would also like to thank you all the same.

What sentiments and emotions are contained in RAISE A SUILEN’S songs? A sentiment analysis of Bandori lyrics

RAISE A SUILEN (shortened as RAS) is a band in the BanG Dream! franchise (shortened as Bandori) consisting of five female musicians. They were initially formed as THE THIRD to perform songs from Bandori bands that do not play their own instruments (those being Afterglow, Pastel*Palettes and Hello, Happy World!). However, they have evolved to become their own separate live band where they now also perform their own songs. RAS also appears in the second season of the BanG Dream! anime series, where the band members play as characters that form a band to compete against Roselia, another Bandori live band.

I first became interested in RAS’ songs when I noticed that their lyrics tended to have a rebellious theme. From there, I wanted to find out the predominant sentiments and emotions contained in RAS’ songs and how they compared to songs from other Bandori bands. This blog post describes how that was achieved with sentiment analysis: the process of identifying an author’s sentiments and emotions from text.

Brief methodology

A full methodology covering the importing and cleaning of lyrics and the downstream analyses is contained in a subsequent methodology blog post (https://activeevaluator.com/bandori-sentiment-method/). In brief, English translations of original songs for each band were copied from the Bandori fandom website (https://bandori.wikia.com/wiki/BanG_Dream!_Wikia) and pasted into separate .txt files in one folder. The .txt files of the English-translated songs were later imported into R. The lyrics were then cleaned to remove any punctuation, abbreviations, contractions and common and band-specific stopwords (defined as commonly-used words that do not add meaning to text analyses). Once the lyrics were cleaned, I generated a word frequency table counting the number of times a word appeared in each song. The word frequency table was then matched to known sentiment and emotion words from the “bing” and “nrc” sentiment lexicons respectively to calculate the proportion of words under a specific sentiment or emotion for each band.

Exploratory Data Analysis

## # A tibble: 6 x 3
##   band                num_songs num_words
##   <chr>                   <int>     <dbl>
## 1 Afterglow                   8       900
## 2 Hello, Happy World!         7       653
## 3 Pastel*Palettes             8       690
## 4 Poppin'Party               30      3255
## 5 RAISE A SUILEN              4       462
## 6 Roselia                    18      1939

Poppin’Party and Roselia, two well-established live bands in the Bandori franchise, have the most songs and consequently the highest word counts in their lyrics. In contrast, given that RAS was only established last year, it is not surprising that they have only four songs so far. Hence, the results of sentiment analyses for RAS may change as English translations of new RAS songs are released.

The wordcloud above gives a visualisation of which words appear most often in Bandori songs with grey, yellow and red words representing increasing word frequencies. The most frequent words tend to relate to positive nouns such as smile, love and dream. In particular, dream was used very often in Bandori songs, particularly the lyrics of Poppin’Party.
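A wordcloud like the one above can be generated with the wordcloud package from an overall word-frequency table; the `word_freq` data frame and its column names are assumptions for illustration.

```r
library(wordcloud)

# word_freq: a data frame with one row per word and its total count `n`
wordcloud(words = word_freq$word, freq = word_freq$n,
          max.words = 100, random.order = FALSE,
          colors = c("grey70", "gold", "red"))  # low to high frequency
```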

Bing sentiment analysis

The “bing” sentiment lexicon is a dictionary of words that are grouped into either “positive” or “negative” sentiments. Initially conceived by Bing Liu and collaborators (Hu & Liu, 2004), this dictionary was built from a small group of adjectives with known sentiments that were used to predict the sentiments of other adjectives and nouns. The end-result is a dictionary containing words that are either labelled “positive” or “negative”. I used the “bing” sentiment lexicon to calculate the proportion of words in each band that were either positive or negative.
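As a sketch, the per-band proportions can be computed by joining the word counts against the lexicon; the `word_counts` table with `band`, `word` and `n` columns is an assumption carried over from the cleaning step.

```r
library(tidyverse)
library(tidytext)

bing_props <- word_counts %>%
  inner_join(get_sentiments("bing"), by = "word") %>%  # keep only lexicon words
  count(band, sentiment, wt = n) %>%                   # words per band and sentiment
  pivot_wider(names_from = sentiment, values_from = n) %>%
  mutate(total = negative + positive,
         prop_pos = positive / total)
```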

## # A tibble: 6 x 7
##   band                num_songs negative positive total prop_pos p_value
##   <chr>                   <dbl>    <dbl>    <dbl> <dbl>    <dbl>   <dbl>
## 1 Afterglow                   8      126      104   230    0.452   0.166
## 2 Hello, Happy World!         7       41      138   179    0.771   0    
## 3 Pastel*Palettes             8       79      115   194    0.593   0.012
## 4 Poppin'Party               30      170      359   529    0.679   0    
## 5 RAISE A SUILEN              4       74       64   138    0.464   0.444
## 6 Roselia                    18      221      268   489    0.548   0.037

Similar to Afterglow’s songs, RAS’ songs contain an approximately equal number of positive- and negative-associated words. For both bands, there was no statistically significant deviation from the 50% positive-sentiment null result (p > 0.05 via Test of Equal Proportions). Notably, Roselia’s songs had significantly more positive words than negative words, though the difference in proportions between the two groups is quite small (p = 0.037 via Test of Equal Proportions, 95% confidence interval = (50.3%, 59.3%)). In contrast, the lyrics of the other three bands (Hello, Happy World!, Pastel*Palettes and Poppin’Party) contain significantly and substantially more positive-associated than negative-associated words (p < 0.05 via Test of Equal Proportions).
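The Roselia result, for example, can be reproduced with R’s built-in Test of Equal Proportions:

```r
# 268 positive words out of 489 sentiment words; null hypothesis p = 0.5
prop.test(x = 268, n = 489, p = 0.5)
# Matches the reported p = 0.037 and 95% CI of (50.3%, 59.3%)
```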

## # A tibble: 6 x 4
##   band                positive neutral negative
##   <chr>                  <dbl>   <dbl>    <dbl>
## 1 Afterglow                  4       0        4
## 2 Hello, Happy World!        6       0        1
## 3 Pastel*Palettes            7       0        1
## 4 Poppin'Party              22       5        3
## 5 RAISE A SUILEN             1       1        2
## 6 Roselia                   10       5        3

I also used the “bing” sentiment lexicon to count the number of songs for each band with a “negative”, “neutral” or “positive” overall sentiment. A song with a “positive” overall sentiment was defined as a song containing more positive words than negative words; the reverse defined a “negative” song, while songs with equal counts were classed as “neutral”. From these analyses, four of the six Bandori bands had more positive songs overall, while Afterglow had an equal number of positive and negative songs. In contrast, RAS had more negative songs overall than positive ones.
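A sketch of this per-song classification, again assuming a `word_counts` table with `band`, `song`, `word` and `n` columns:

```r
library(tidyverse)
library(tidytext)

song_overall <- word_counts %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(band, song, sentiment, wt = n) %>%
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(overall = case_when(positive > negative ~ "positive",
                             positive < negative ~ "negative",
                             TRUE                ~ "neutral")) %>%
  count(band, overall)  # number of songs per band and overall sentiment
```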

These results indicate that RAS’ songs are less positive overall than songs from other bands. The limitation of the “bing” sentiment lexicon is that it is only able to distinguish “positive” and “negative” sentiments; it is unable to identify specific emotions contained within the lyrics. An alternative sentiment lexicon would need to be used to probe this result further.

NRC sentiment analysis

The “nrc” sentiment lexicon was conceived by Saif Mohammad and Peter Turney to group words into positive and negative sentiments and eight primary emotions: anger, anticipation, disgust, fear, joy, sadness, surprise and trust (Mohammad and Turney, 2010). The dictionary was built through crowdsourcing, with participants sorting each term into a sentiment and one or more primary emotions. I used the NRC sentiment lexicon in my analysis to identify the predominant emotions contained within each band’s songs.
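In tidytext, swapping lexicons is a small change; here the eight emotions are kept and the positive/negative labels filtered out (a sketch under the same assumed `word_counts` table; note that one word can map to several emotions, so the join is many-to-many).

```r
library(tidyverse)
library(tidytext)

nrc_emotions <- word_counts %>%
  inner_join(get_sentiments("nrc"), by = "word") %>%
  filter(!sentiment %in% c("positive", "negative")) %>%  # keep the eight emotions
  count(band, sentiment, wt = n) %>%
  group_by(band) %>%
  mutate(prop = n / sum(n)) %>%  # proportion of emotion words per band
  ungroup()
```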

The NRC sentiment analysis results are interesting. RAS’ songs tended to have a higher proportion of words associated with negative emotions, namely anger and fear, and a lower proportion of words associated with positive emotions such as joy and trust. Taken together, these results show that RAS’ songs lean towards negative emotions, which conforms to RAS’ themes of mistrust and rebellion.

How do sentiment analysis results relate to the overall themes of songs?

In the Bandori universe, Afterglow, RAS and Roselia are three rock bands that were formed under different circumstances. Afterglow was formed by a group of friends as a way to stay together while Roselia was formed with dreams of making it to the “Future World Festival”, a high-level rock music festival. In contrast to these two bands, RAS was formed as a rival band to Roselia with ambitions to beat them. The sentiment analysis results match the bands’ ambitions and themes which are reflected in the lyrics of their songs.

A predominant theme in Afterglow’s and Roselia’s songs is the positive influence friendship brings to a group. Both bands have lyrics with a lot of words relating to joy and trust which can be driven by friendship. For Afterglow, friendship keeps the band members together which allows them to create memories and influence each other positively. These can be seen in some of their songs such as COMIC PANIC and Jamboree! Journey! where the band members enjoy being together to create new experiences such as creating a manga or going out.

The power of friendship is also evident in Roselia’s songs. Some of their songs take a thankful tone towards a specific person. This can be seen in their song Kiseki, where the singer thanks one of their band members for what they have contributed to the band. Despite going down separate paths, they promise to keep in touch as they move forward. This reciprocity of support among band members is less prevalent in RAS’ songs, which tend to be more self-centred. This can be seen in their song UNSTOPPABLE, where the singer merely vents their negative thoughts at the other person without consideration. This is reflected in sentiment analyses of their songs, which show a reduced association with trust and an increased association with anger.

How the bands go about changing themselves is another point of difference among the three. In Roselia’s songs, the power of friendship can also positively change someone’s perspective on life. A prime example of this is Re:Birth Day, where the singer initially expresses her despair at being left alone. However, she is not only soothed by the support of the other band members but also encourages them to move forward. Personal change is also explored in Afterglow’s songs, with a prime example being That is How I Roll!. In this song, the need for personal change is driven by the realisation that doing nothing will change nothing. This realisation is what drives the singer to become a better person.

In contrast, RAS’s songs tend to express a need to push oneself to breaking point in order to change as quickly as possible, driven by a fear of not standing out without change. Little joy comes out of this process. This can be seen in their song EXPOSE “Burn Out!!!”, where the need to change is driven by fear, and the listener is pushed to change out of anger and impulse and to let out their emotions. This theme appears in sentiment analyses of their songs as a higher proportion of words associated with fear and anger compared to other bands.

Conclusion

Sentiment and text analyses of the Bandori songs both show that RAS’ songs tend to be more negative than those from other bands. Their songs convey a sense of anger and mistrust towards other people, something at odds with the power of friendship shown in the lyrics of other bands. Given that RAS will release new songs in mid-June 2019, it would be interesting to see whether they maintain the same sentiments and emotions of their earlier songs. Nevertheless, sentiment analysis of song lyrics can be used to delve into the sentiments and emotions of a band’s songs, which can then be matched to the songs’ overall themes.

Bibliography

Hu, M., and Liu, B. (2004). Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining (Seattle, WA, USA, ACM), pp. 168-177.

Mohammad, S.M., and Turney, P.D. (2010). Emotions evoked by common words and phrases: using mechanical turk to create an emotion lexicon. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text (Los Angeles, California, Association for Computational Linguistics), pp. 26-34.

Acknowledgements

I would like to acknowledge the following people who have translated the original Bandori songs from Japanese to English:

  • AERIN
  • Aletheia
  • Arislation
  • Betasaihara
  • BlaZofgold
  • bluepenguin
  • Choocolatiah
  • Eureka
  • Hikari
  • Komichi
  • Leephysic
  • Leoutoeisen
  • LuciaHunter
  • lunaamatista
  • MijukuNine
  • Ohoyododesu
  • PocketLink
  • Rolling
  • Starlogakemi
  • Thaerin
  • Tsushimayohane
  • UnBound
  • vaniiah
  • Youraim

I may have missed other people who have translated songs for this analysis, but I thank you all the same.

Listing the easier and harder songs in the BanG Dream! mobile game

In my last blog post, I built a model to predict how difficult a song is to play (measured by actual song level) in the BanG Dream! Girls Band Party! mobile game by its number of notes or notes/min. The number of notes gives an indicator of how many beats a player has to hit to complete (or full combo) a song while notes/min measures how quickly notes appear on the screen. I then grouped the expert difficulty songs into three categories based on the difference between the predicted and actual levels:

  • “Higher” group contains songs whose predicted levels are higher than the actual level from the game. These songs might be harder to play than expected, either due to the sheer number of notes in the song or notes appearing very rapidly on the screen.
  • “Lower” group contains songs whose predicted levels are lower than the actual level from the game. These songs might be easier to play than expected due to the low number of notes in the song or the slower rate at which notes appear on the screen.
  • “Same” group contains songs whose predicted levels matched the actual level from the game. In other words, the actual level is a good indicator of how difficult it is to play the song.

In this blog post, I would like to list the songs that are under the “higher” and “lower” groups based on their number of notes or notes/min. This will make it easier for a player who is just starting to play “expert” difficulty songs to decide which songs to play next and work towards a full combo.

For each song in the list, the actual level from the game is shown alongside the predicted levels from the models that use the number of notes or notes/min as the independent variable. If a – appears under a predicted level column of a song, it means that model has not identified the song as easier or harder to play than expected (i.e., it is not in that model’s “lower” or “higher” group). If both predicted level columns have a value, it means that both models have identified the song as easier or harder to play than expected.

Now, a few caveats to the lists:

  • First, the number of notes and notes/min do not take into account the types of notes present in a song or its note patterns. The results are based only on the number of notes in the song (total number of notes) or how quickly they appear on the screen (measured by notes/min). Just because a song appears in the “higher” list does not necessarily mean it is hard; it may have a note pattern that is easy to play.
  • Second, the “lower” list contains some high-level songs (level 27 or above) that the model(s) identified as easier to play than expected. Songs at level 27 or above are very difficult to play, let alone full combo, so they should not be taken as songs a novice player can play first.
  • Third, if a song does not appear in either the lower or higher groups, it means it has a predicted level that is the same as the actual level (i.e., it belongs in the “same” group).
  • Lastly, individual experience will vary from song to song. The following lists should only be used as a guide to deciding which song to play next or to work towards a full combo.

Without further ado, here are the songs in the “lower” group. These are songs that are easier to play than expected and hence can be chosen first to work towards a full combo:

Lower songs list

| Song name | English name | Actual level | Notes model predicted level | Notes/min model predicted level | # notes | Notes/min |
| --- | --- | --- | --- | --- | --- | --- |
| Egao no Orchestra! | Orchestra of Smiles! | 23 | 22 | – | 427 | 246 |
| BLACK SHOUT | BLACK SHOUT | 25 | 24 | 24 | 506 | 279 |
| fantastic dreamer | fantastic dreamer | 25 | 24 | – | 553 | 335 |
| Fuwa Fuwa Time | Fluffy Time | 25 | 24 | – | 544 | 344 |
| Hacking to the Gate | Hacking to the Gate | 25 | 23 | 24 | 487 | 273 |
| Happiness! Happy Magical | Happiness! Happy Magical | 25 | 24 | – | 569 | 338 |
| Hashiri Hajimeta Bakari no Kimi ni | On Your New Journey | 25 | 24 | 24 | 531 | 270 |
| Kimi ga Inakucha! | It's Got To Be You! | 25 | 24 | – | 569 | 322 |
| Kimi no Kioku | Memories of You | 25 | – | 24 | 651 | 300 |
| Little Busters! | Little Busters! | 25 | 24 | – | 575 | 314 |
| Pasupa Revolutions | Pasupa Revolutions | 25 | 24 | 24 | 511 | 292 |
| Romeo | Romeo | 25 | – | 24 | 620 | 291 |
| secret base ~kimi ga kureta mono~ | secret base ~What You Gave Me~ | 25 | 24 | 23 | 560 | 235 |
| STAR BEAT! ~Hoshi no Kodou~ | STAR BEAT! ~The Heartbeat of the Stars~ | 25 | – | 24 | 642 | 292 |
| Tamashii no Refrain | Soul's Refrain | 25 | – | 24 | 698 | 276 |
| Tokimeki Experience! | Tokimeki Experience! | 25 | 24 | 24 | 508 | 288 |
| True color | True color | 25 | 24 | 24 | 574 | 280 |
| Yumemiru Sunflower | Sunflower Dreams | 25 | – | 24 | 628 | 292 |
| 1, 2 Fanclub | 1, 2 Fanclub | 26 | 25 | 25 | 652 | 346 |
| 1000-kai Urunda Sora | 1000 Crying Skies | 26 | – | 25 | 868 | 343 |
| Alchemy | Alchemy | 26 | 25 | 24 | 603 | 304 |
| Believe in my existence | Believe in my existence | 26 | 25 | – | 589 | 380 |
| DISCOTHEQUE | DISCOTHEQUE | 26 | 25 | – | 626 | 379 |
| Dream Parade | Dream Parade | 26 | 24 | 25 | 575 | 322 |
| GLAMOROUS SKY | GLAMOROUS SKY | 26 | – | 25 | 780 | 332 |
| great escape | great escape | 26 | 25 | – | 676 | 446 |
| Karma | Karma | 26 | 25 | 25 | 582 | 346 |
| Mae e Susume! | Keep On Moving! | 26 | – | 25 | 723 | 321 |
| Miku Miku ni Shite Ageru (Shite Yan Yo) | I'll Miku-Miku You (For Reals) | 26 | 25 | 25 | 639 | 358 |
| Neo-Aspect | Neo-Aspect | 26 | – | 25 | 707 | 356 |
| Nesshoku Starmine | Passionate Starmine | 26 | – | 25 | 691 | 348 |
| Saa Ikou! | Saa Ikou! | 26 | 25 | 25 | 666 | 351 |
| Sorairo Days | Sky Blue Days | 26 | 25 | – | 626 | 361 |
| Taiyou Iwaku Moeyo Chaos | Burning Chaos According to the Sun | 26 | 25 | – | 652 | 399 |
| Time Lapse | Time Lapse | 26 | – | 25 | 728 | 352 |
| Tsunagu, Soramoyou | The Look Of The Sky, Connected | 26 | 25 | – | 646 | 366 |
| Goka! Gokai!? Phantom Thief! | Goka! Gokai!? Phantom Thief! | 27 | 26 | 26 | 698 | 415 |
| Guren no Yumiya | Crimson Bow and Arrow | 27 | 26 | 26 | 707 | 433 |
| LOUDER | LOUDER | 27 | 26 | – | 828 | 456 |
| ONENESS | ONENESS | 27 | 26 | 26 | 746 | 367 |
| Redo | Redo | 27 | 26 | 26 | 712 | 440 |
| This game | This game | 27 | – | 26 | 907 | 446 |
| God knows… | God knows… | 28 | 27 | 27 | 1081 | 507 |
| Hey-day Capriccio | Hey-day Capriccio | 28 | 27 | 27 | 878 | 479 |
| Opera of the wasteland | Opera of the wasteland | 28 | 26 | 26 | 696 | 373 |
| Re:birth day | Re:birth day | 28 | 26 | 26 | 848 | 410 |
| Sugar Song to Bitter Step | Sugar Song to Bitter Step | 28 | 26 | 26 | 811 | 438 |
| Roku-chou Nen to Ichiya Monogatari | Six Trillion Years and Overnight Story | 29 | 27 | 27 | 895 | 548 |

And here are the songs in the “higher” group. These songs are harder to play than expected and hence should be put off until the player has mastered other songs of the same actual level:

Higher songs list

| Song name | English name | Actual level | Notes model predicted level | Notes/min model predicted level | # notes | Notes/min |
| --- | --- | --- | --- | --- | --- | --- |
| Yes! BanG_Dream! | Yes! BanG_Dream! | 20 | 23 | 22 | 459 | 230 |
| Himawari no Yakusoku | Sunflower's Promise | 21 | 23 | 22 | 481 | 222 |
| Kimi ni Moratta Mono | Your Gift to Me | 21 | – | 22 | 371 | 212 |
| Kiseki | Trajectory | 21 | 23 | – | 449 | 200 |
| Poppin' Shuffle | Poppin' Shuffle | 22 | 25 | 24 | 651 | 294 |
| Watashi no Kokoro wa Choco Cornet | My Heart is a Chocolate Cornet | 22 | 23 | 23 | 482 | 256 |
| Dragon Night | Dragon Night | 23 | – | 24 | 500 | 270 |
| Happy Happy Party! | Happy Happy Party! | 23 | 24 | – | 549 | 266 |
| Hikaru Nara | If You Will Shine | 23 | 24 | 24 | 531 | 277 |
| Sekai wa Koi ni Ochiteiru | The World has Fallen in Love | 23 | 24 | 24 | 527 | 273 |
| Butter-Fly | Butter-Fly | 24 | 25 | 25 | 615 | 310 |
| Dokidoki SING OUT! | Dokidoki SING OUT! | 24 | 25 | 25 | 605 | 318 |
| Fuwa Fuwa Yumeiro Sandwich | Fluffy Dream-Color Sandwich | 24 | 25 | 25 | 613 | 309 |
| Girl's Code | Girl's Code | 24 | 25 | – | 638 | 304 |
| Hachigatsu no if | If In August | 24 | 25 | 25 | 609 | 318 |
| Kimagure Romantic | Kimagure Romantic | 24 | 25 | 25 | 608 | 326 |
| Melancholic | Melancholic | 24 | 25 | 25 | 622 | 322 |
| Natsunodon! | Boom Through Summer! | 24 | 25 | 25 | 594 | 324 |
| READY STEADY GO | READY STEADY GO | 24 | 25 | 25 | 584 | 347 |
| Shin Takarajima | New Treasure Island | 24 | 25 | 25 | 658 | 311 |
| That is How I Roll! | That is How I Roll! | 24 | – | 25 | 547 | 319 |
| Yura-Yura Ring-Dong-Dance | Gently Swaying Ring-Dong-Dance | 24 | – | 25 | 546 | 312 |
| B.O.F (Believe Our Future) | B.O.F (Believe Our Future) | 25 | – | 26 | 612 | 391 |
| Charles | Charles | 25 | – | 26 | 602 | 372 |
| Christmas no Uta | The Song of Christmas | 25 | 26 | – | 683 | 318 |
| CIRCLING | CIRCLING | 25 | 26 | 26 | 823 | 412 |
| DAYS | DAYS | 25 | 26 | – | 725 | 340 |
| Hanamaru Pippi wa Yoiko Dake | Hanamaru Pippis are Just for Good Little Kids | 25 | 26 | 26 | 729 | 417 |
| Kimi Ja Nakya Dame Mitai | It Looks Like It Has To Be You | 25 | – | 26 | 620 | 368 |
| Kimi ni Todoke | From Me To You | 25 | 26 | 26 | 732 | 369 |
| Natsumatsuri | Summer Festival | 25 | 26 | 26 | 823 | 395 |
| Natsuzora SUN! SUN! SEVEN! | Summer Skies & SUN! SUN! SEVEN! | 25 | 26 | – | 750 | 308 |
| Pride Kakumei | Pride Revolution | 25 | – | 26 | 663 | 372 |
| Romeo and Cinderella | Romeo and Cinderella | 25 | 26 | 26 | 744 | 395 |
| SAKURA Skip | SAKURA Skip | 25 | – | 26 | 650 | 375 |
| Sanctuary | Sanctuary | 25 | 26 | 26 | 768 | 369 |
| Shin Ai | Deep Love | 25 | – | 26 | 666 | 363 |
| Su-Suki Nanka Janai! | I-I Never Said Love! | 25 | – | 26 | 606 | 375 |
| Tamashii no Refrain | Soul's Refrain | 25 | 26 | – | 698 | 276 |
| Teardrops | Teardrops | 25 | 26 | – | 774 | 310 |
| Tooi Ongaku ~Heartbeat~ | A Distant Heartbeat | 25 | 26 | 26 | 687 | 368 |
| YAPPY! SCHOOL CARNIVAL | YAPPY! SCHOOL CARNIVAL | 25 | – | 26 | 663 | 386 |
| Yuriyurarararayuruyuri Daijiken | The Great YuriYurarararaYuruYuri Incident | 25 | – | 26 | 667 | 389 |
| 1000-kai Urunda Sora | 1000 Crying Skies | 26 | 27 | – | 868 | 343 |
| Don't say "lazy" | Don't say "lazy" | 26 | 27 | – | 925 | 440 |
| GO! GO! MANIAC | GO! GO! MANIAC | 26 | – | 27 | 799 | 465 |
| Happy Synthesizer | Happy Synthesizer | 26 | 27 | – | 933 | 397 |
| Hare Hare Yukai | Sunny Sunny Happiness | 26 | 27 | – | 972 | 420 |
| Koi wa Chaos no Shimobe Nari | Love is the Servant of Chaos | 26 | – | 27 | 815 | 494 |
| Lost One no Goukoku | The Lost One's Weeping | 26 | 27 | 27 | 1026 | 513 |
| SURVIVOR Never Give Up! | SURVIVOR Never Give Up! | 26 | 27 | 27 | 868 | 482 |
| Tenka Toitsu A to Z | The World Stands As One | 26 | – | 27 | 713 | 470 |

Using the number of notes to predict the most difficult songs in the BanG Dream! rhythm game

BanG Dream! is a Japanese multi-media franchise by Bushiroad where different girl bands play songs. These girl bands include:

  • Poppin’ Party, a girl band in pursuit of a sparkling, heart-pounding beat;
  • Afterglow, a rock band of childhood friends;
  • Pastel*Palettes, an idol band that sing and play instruments;
  • Roselia, a gothic rock band aiming to reach the top; and
  • Hello, Happy World!, a band aiming to make the world happy.

The franchise spans multiple modes of media including music, anime and a rhythm mobile game called “BanG Dream! Girls Band Party” (which I will shorten to Bandori from now on). The game was initially launched in Japan on March 16th 2017 and was later made available in Taiwan and Korea. The game was launched to the rest of the world one year later on April 4th 2018.

The game involves the player hitting different kinds of notes while playing a song. These notes can range from simple tap notes to more complicated hold and swipe notes. Each song has four difficulty settings: easy, normal, hard and expert, with higher difficulty settings presenting more numerous and various note types.

After the song finishes, the player receives a score based on how well they played the song and the cards the player has in their team. These cards are received randomly from gacha events and vary in many factors such as rarity, type and ability. These markedly influence the score a player receives after playing a song. On the other hand, everyone plays the same note pattern or beat map for a specific song in a particular difficulty. In this blog post, I investigated whether the number of notes as well as the related variable notes/min can explain how difficult it is to play a song, measured by the dependent variable “song level”. This data is contained in the band_tidy dataset which is imported below.

# Load packages and import CSV file of Bandori dataset
library(tidyverse)  # read_csv() comes from the readr package
band_tidy <- read_csv("song_list_csv_rate_051118.csv")

# Convert the difficulty column into a factor with "easy" as the base level
band_tidy$difficulty <- factor(x = band_tidy$difficulty,
                               levels = c("easy", "normal", "hard", "expert"))

Plotting the relationship between song level and number of notes or notes/min

Different songs have various numbers of notes that need to be hit to achieve a full combo, a situation where no notes are missed or hit too early or too late. The number of notes that need to be hit in a song increases as higher difficulties are selected. The number of notes in a song can also be standardised by its duration to notes/min which measures the rate at which notes appear on the screen. The higher the notes/min for a song, the more quickly the player has to react to notes on a screen.
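For instance, if the dataset recorded each song's duration in seconds (a hypothetical duration_sec column; band_tidy may store this differently), notes/min would be derived as:

```r
library(dplyr)

band_tidy <- band_tidy %>%
  mutate(notes_per_min = notes / (duration_sec / 60))
```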

These two measurements measure contrasting elements of a song which may differentially influence song level. Hence, I plotted song level based on the total number of notes or notes/min on separate graphs to look at the relationship between the variables. These graphs were further subsetted by song difficulty to see whether the relationship changes as the difficulty setting is adjusted.
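These plots can be sketched with ggplot2; the column names follow the band_tidy dataset, and the choice of a per-facet linear trend line is mine.

```r
library(ggplot2)

ggplot(band_tidy, aes(x = notes, y = level)) +
  geom_point(alpha = 0.4) +
  geom_smooth(method = "lm", se = FALSE) +   # per-difficulty trend line
  facet_wrap(~ difficulty, scales = "free") +
  labs(x = "Number of notes", y = "Song level")
```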

Song level is found to be positively associated with both the number of notes and notes/min. This agrees with the principle that a song will become harder to play as the number of notes increases. However, the rate at which song level increases is reduced as higher difficulties are selected. While the easy difficulty songs form the steepest slope in its relationship between song level and the number of notes or notes/min, the expert difficulty songs form the flattest slope due to the wider range of song level, number of notes and notes/min.

For the rest of the blog post, I will use the number of notes and song difficulty as independent variables to predict song level. It should be noted, however, that similar results were found when the number of notes was replaced by notes/min to predict song level.

Building a model of song level vs number of notes

A linear model has two components: a gradient that represents the rate of change and the y-intercept that represents the initial value. Given that the relationship between song level and the number of notes varies for each difficulty, it makes sense to change both the gradient and the y-intercept parts of the model. Hence, I incorporated an interaction term into the model that allows the song difficulty to influence the relationship between song level and the number of notes.

#Create a model with an interaction term between difficulty and number of notes
level_diff_int <- lm(level ~ notes*difficulty, data = band_tidy)
summary(level_diff_int)
## 
## Call:
## lm(formula = level ~ notes * difficulty, data = band_tidy)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5671 -0.5275  0.0224  0.5320  2.6145 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)             3.959007   0.257040  15.402  < 2e-16 ***
## notes                   0.027317   0.001784  15.311  < 2e-16 ***
## difficultynormal        5.487872   0.427006  12.852  < 2e-16 ***
## difficultyhard          8.676859   0.478188  18.145  < 2e-16 ***
## difficultyexpert       16.086420   0.460639  34.922  < 2e-16 ***
## notes:difficultynormal -0.012280   0.002228  -5.512 5.39e-08 ***
## notes:difficultyhard   -0.015418   0.001986  -7.762 3.95e-14 ***
## notes:difficultyexpert -0.019645   0.001868 -10.514  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8302 on 564 degrees of freedom
## Multiple R-squared:  0.9842, Adjusted R-squared:  0.984 
## F-statistic:  5012 on 7 and 564 DF,  p-value: < 2.2e-16

The song level can be predicted by the equation:

level~pred~ = (0.027 − 0.012 × difficulty~normal~ − 0.015 × difficulty~hard~ − 0.020 × difficulty~expert~) × notes + (3.96 + 5.49 × difficulty~normal~ + 8.68 × difficulty~hard~ + 16.09 × difficulty~expert~)

The “difficulty” terms of the model are set to 0 or 1 depending on which difficulty is being modelled. For example, to model the “expert” difficulty songs, we set difficulty~expert~ = 1 and difficulty~normal~ = difficulty~hard~ = 0. This allows both the gradient and the y-intercept of the model to be adjusted for each difficulty. The performance of the model is very good, with an R² value of 0.984 and a relatively low residual standard error (RSE) of 0.8302.
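As a worked example under this model, a hypothetical expert-difficulty song with 700 notes would be predicted at roughly level 25:

```r
new_song <- data.frame(
  notes = 700,
  difficulty = factor("expert", levels = c("easy", "normal", "hard", "expert"))
)
predict(level_diff_int, newdata = new_song)
# (0.027317 - 0.019645) * 700 + (3.959007 + 16.086420), approximately 25.4
```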

#Predict levels from data points in band_tidy using model
library(broom)  # augment() comes from the broom package
band_tidy_int <- augment(level_diff_int, data = band_tidy)

For each difficulty, plotting the residuals formed a random distribution of points around residual = 0. This indicates that it is appropriate to incorporate an interaction term into the model to predict song level.
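Since broom's augment() stores residuals in the .resid column, the residual plots can be sketched as:

```r
library(ggplot2)

ggplot(band_tidy_int, aes(x = notes, y = .resid)) +
  geom_point(alpha = 0.4) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  facet_wrap(~ difficulty)
```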

For each difficulty, the model closely fits with the data points because the interaction term is present to change both the gradient and y-intercept of the model for each difficulty. However, for the expert difficulty songs, the model fails to account for the sharp drop-off in song level as the number of notes decreases.

To fix this problem, I fitted the expert difficulty songs onto a natural cubic spline, a piecewise graph that consists of cubic functions. I fitted a linear relationship for the other song difficulties.

#Place the expert difficulty songs in a natural cubic spline (ns() from the
#splines package) and maintain a linear relationship for the other difficulties
library(splines)
level_piece <- lm(level ~ notes + difficulty + notes:difficulty + ns(notes, 2):I(difficulty == "expert"), data = band_tidy)

#Look at performance of model
summary(level_piece)  
## 
## Call:
## lm(formula = level ~ notes + difficulty + notes:difficulty + 
##     ns(notes, 2):I(difficulty == "expert"), data = band_tidy)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2737 -0.5101 -0.0035  0.5050  2.4150 
## 
## Coefficients: (2 not defined because of singularities)
##                                               Estimate Std. Error t value
## (Intercept)                                   4.127861   0.428830   9.626
## notes                                         0.024572   0.005919   4.151
## difficultynormal                              5.386206   0.466126  11.555
## difficultyhard                                8.053912   1.365749   5.897
## difficultyexpert                              6.819089   1.733146   3.935
## notes:difficultynormal                       -0.011774   0.002410  -4.886
## notes:difficultyhard                         -0.013446   0.004502  -2.987
## notes:difficultyexpert                       -0.013044   0.005984  -2.180
## ns(notes, 2)1:I(difficulty == "expert")FALSE  1.539715   3.172823   0.485
## ns(notes, 2)2:I(difficulty == "expert")FALSE        NA         NA      NA
## ns(notes, 2)1:I(difficulty == "expert")TRUE  12.060487   2.170404   5.557
## ns(notes, 2)2:I(difficulty == "expert")TRUE         NA         NA      NA
##                                              Pr(>|t|)    
## (Intercept)                                   < 2e-16 ***
## notes                                        3.82e-05 ***
## difficultynormal                              < 2e-16 ***
## difficultyhard                               6.39e-09 ***
## difficultyexpert                             9.38e-05 ***
## notes:difficultynormal                       1.34e-06 ***
## notes:difficultyhard                          0.00294 ** 
## notes:difficultyexpert                        0.02969 *  
## ns(notes, 2)1:I(difficulty == "expert")FALSE  0.62767    
## ns(notes, 2)2:I(difficulty == "expert")FALSE       NA    
## ns(notes, 2)1:I(difficulty == "expert")TRUE  4.25e-08 ***
## ns(notes, 2)2:I(difficulty == "expert")TRUE        NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8096 on 562 degrees of freedom
## Multiple R-squared:  0.985,  Adjusted R-squared:  0.9848 
## F-statistic:  4103 on 9 and 562 DF,  p-value: < 2.2e-16

Although a natural cubic spline makes the relationship between song level and the number of notes harder to describe, the model performance improved: the R² value increased from 0.984 in the previous model to 0.9848, and the RSE fell from 0.8302 to 0.8096.

#Predict song levels in band_tidy dataset using level_piece model
band_tidy_piece <- augment(level_piece, data = band_tidy)

#Change column name of ".fitted" to "predicted_level"
colnames(band_tidy_piece)[colnames(band_tidy_piece) == ".fitted"] <-
  "predicted_level"

Compared to the residual plot of the previous model, incorporating a natural cubic spline reduced the concave pattern in the residuals of the expert difficulty songs. The residuals are now more randomly distributed, indicating an improved model.

The natural cubic spline more closely fits the data points on the expert difficulty songs, particularly as the number of notes decreases. A linear relationship between song level and the number of notes is also maintained for the other song difficulties.

In summary, the models show a significant relationship between song level and the number of notes when the difficulty setting is incorporated. This could be used to predict song level based on the total number of notes or notes/min.

A methodology to predict song level

#Calculate the difference between observed and predicted levels
band_tidy_piece <- 
  band_tidy_piece %>%
  mutate(predicted_level = round(predicted_level), 
         difference = level - predicted_level, 
         verdict = case_when(difference > 0 ~ "lower", 
                             difference == 0 ~ "same", 
                             difference < 0 ~ "higher"))

From the augmented band_tidy_piece data frame, I rounded the predicted song levels to the nearest whole value and subtracted it from the actual level, creating a “difference” column. The difference column can be thought of as the song level residual: it measures how far off the predicted level is from the actual level. From there, I sorted the songs into the following categories based on the sign of the difference value:

  • If the difference value was positive (i.e., the predicted level is lower than the actual level), the song was placed under the “lower” category;
  • If the difference value was negative (i.e., the predicted level is higher than the actual level), the song was placed under the “higher” category; and
  • If the difference value was equal to 0 (i.e., the predicted and actual levels matched), it was placed under the “same” category.
#Filter rows to only include the "expert" difficulty data of each song
level_rate_diff_expert <- band_tidy_piece %>%
  filter(difficulty == "expert")

I then filtered the data set to only include the expert difficulty data of each song. I did this because there is a wider variance on the ability to play and complete an expert difficulty song compared to the other difficulties.

Identifying the easier or harder expert difficulty songs

The songs placed under the “lower” group tended to be easier to play and complete than expected, as their lower number of notes makes the song easier to complete. On the other hand, songs under the “higher” group tended to be more difficult to play than expected due to their higher number of notes.

In addition to the model where the number of notes is used to predict song level, I also generated a separate model where notes/min was used as the independent variable. Both models produced similar “lower”, “higher” and “same” lists. Comparing the “higher” lists of the two models, however, revealed songs that were identified as more difficult than expected by one model but not the other. For instance, Natsuzora SUN! SUN! SEVEN! (Summer Skies & SUN! SUN! SEVEN!) by Poppin’Party was flagged as a difficult song to complete when the number of notes was used as the independent variable. The song did not appear in the “higher” group when notes/min was the independent variable, instead landing in the “same” group. That is because the song is one of the longest in the game, which offsets its high note count for a level 25 song and yields a notes/min similar to other level 25 songs. This suggests that the total number of notes and notes/min influence song level differently, so they could be treated as separate independent variables if the model continues to be refined.
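As a sketch of that comparison, the two “higher” lists can be diffed with base R’s setdiff(). The song titles below are hypothetical stand-ins, not the actual model output, which would come from the verdict columns of the two fitted models:

```r
#Hypothetical "higher" lists; the real ones would be pulled from
#level_rate_diff_expert for each of the two models
higher_notes     <- c("Natsuzora SUN! SUN! SEVEN!", "Teardrops", "Circling")
higher_notes_min <- c("Circling", "Kimi ja Nakya Dame Mitai")

#Songs flagged as harder than expected by only one of the two models
only_notes_model     <- setdiff(higher_notes, higher_notes_min)
only_notes_min_model <- setdiff(higher_notes_min, higher_notes)
```

Songs appearing in only_notes_model but not only_notes_min_model (or vice versa) are exactly the cases where the two predictors disagree, like Natsuzora SUN! SUN! SEVEN! above.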

Comparing the model results with real-life surveys

The Japanese YouTuber Mihaya Gaming surveyed players on what they thought were the most difficult expert songs. In these surveys, he asked thousands of people to name the most difficult level 25, 26 and 27 songs in the game, then compiled a top 10 list for each song level. I wanted to compare the model and survey results to see whether the models were able to identify the most difficult songs from the surveys.

The most difficult level 25 songs

| Rank | Song | English translation | % votes | ID by notes model | ID by notes/min model |
|------|------|---------------------|---------|-------------------|-----------------------|
| 1 | Teardrops | - | 58% | Yes | No |
| 2 | Kimi ja Nakya Dame Mitai | It Looks Like It Has To Be You | 16% | No | Yes |
| 3 | Su-Suki Nanka Janai! | I Never Said Love! | 12% | No | Yes |
| 4 | Hidamari Rhodonite | Sunkissed Rhodonite | 6% | No | No |
| 5 | Zankoku na Tenshi no Thesis | A Cruel Angel’s Thesis | 0.36% | No | No |
| 6 | Pride Kakumei | Pride Revolution | 0.267% | No | Yes |
| 7 | Circling | - | 0.2% | Yes | Yes |
| 8 | Yuriyurarararayuruyuri Daijiken | The Great YuriYurarararaYuruYuri Incident | 0.187% | No | Yes |
| 9 | Romeo | - | 0.18% | No | No |
| 10 | Alien Alien | - | 0.147% | No | No |

Source: https://youtube.com/watch?v=FGzn-PUsa-4

The notes/min model identified half of the most difficult level 25 songs from the survey, including two of the top three: “Kimi ja Nakya Dame Mitai” and “Su-Suki Nanka Janai!”. In contrast, the notes model identified only two of the songs. However, it did pick up the most difficult level 25 song in the survey, “Teardrops”, which the notes/min model missed. This is because the song has the highest number of notes of any level 25 song. The notes/min model did not flag it as difficult because it is also one of the longest songs in the game, resulting in a notes/min comparable to other level 25 songs.
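The comparison above can be summarised numerically. This short base R sketch transcribes the Yes/No columns of the level 25 table as logical vectors (in rank order) and computes each model’s hit rate against the surveyed top 10:

```r
#Yes/No columns from the level 25 table, transcribed in rank order
id_notes     <- c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
id_notes_min <- c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)

#Proportion of surveyed songs each model identified
mean(id_notes)      #notes model: 2 of 10
mean(id_notes_min)  #notes/min model: 5 of 10
```

The same vectors could be extended with the level 26 and 27 tables to compare the models across all three song levels at once.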

The most difficult level 26 songs

| Rank | Song | English translation | % votes | ID by notes model | ID by notes/min model |
|------|------|---------------------|---------|-------------------|-----------------------|
| 1 | Tenka Toitsu A to Z | The World Stands As One | 37% | No | Yes |
| 2 | Y.O.L.O!!!!! (You Only Live Once) | - | 31% | No | No |
| 3 | Happy Synthesizer | - | 20% | Yes | No |
| 4 | Go! Go! Maniac | - | 10% | No | Yes |
| 5 | Imagination | - | 0.141% | No | No |
| 6 | Light Delight | - | 0.125% | No | No |
| 7 | Asu no Yozora Shoukaihan | Night Sky Patrol of Tomorrow | 0.108% | No | No |
| 8 | Lost One no Goukoku | The Lost One’s Weeping | 0.1% | Yes | Yes |
| 9 | Tsunagu, Soramoyou | The Look Of The Sky, Connected | 0.091% | No | No |
| 10 | MOON PRIDE/R (tie) | -/- | 0.066% | No | No |

Source: https://www.youtube.com/watch?v=28eocJj3y8I

Moving on to the most difficult level 26 songs, the notes/min model identified three of them, including the most difficult one, “Tenka Toitsu A to Z”. However, it did not pick up the third most difficult song, “Happy Synthesizer”, which the notes model did identify. This result reinforces the notion that the total number of notes and notes/min measure different aspects of a song, distinguishing them as separate independent variables.

The most difficult level 27 songs

| Rank | Song | English translation | % votes | ID by notes model | ID by notes/min model |
|------|------|---------------------|---------|-------------------|-----------------------|
| 1 | Zettai Sengen Recital | Absolute Declaration ~Recital~ | 38.48% | No | No |
| 2 | Goka Gokai Phantom Thief | - | 27.48% | No | No |
| 3 | Passionate Anthem | - | 12.17% | No | No |
| 4 | Determination Symphony | - | 5.82% | No | No |
| 5 | Teardrops (special) | - | 5.75% | No | No |
| 6 | Oneness | - | 4.60% | No | No |
| 7 | Louder | - | 3.59% | No | No |
| 8 | This Game | - | 1.74% | No | No |
| 9 | Guren no Yumiya | Crimson Bow and Arrow | 1.12% | No | No |
| 10 | Redo | - | 0.63% | No | No |

Source: https://www.youtube.com/watch?v=5lA0zX01rtk

Both the notes and notes/min models were able to identify some of the most difficult level 25 and 26 songs. However, they were unable to identify any of the most difficult level 27 songs, categorising half of them as easier than their actual song level. This is because the natural cubic spline flattens out at around level 27 as the number of notes or notes/min increases, making it nearly impossible to predict higher song levels. This highlights a limitation of the current models: the number of notes or notes/min alone may not be enough to predict song levels beyond level 26. Other independent variables could be incorporated into the model, such as song duration and tempo. The number of notes could also be split into its constituent parts, such as off-beats, holds and swipes. Incorporating more independent variables may therefore allow higher song levels to be predicted for the expert difficulty songs.
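A multi-predictor version of the model could be specified as below. This is only a sketch: the column names duration_sec and bpm are assumptions, and the data here is simulated as a stand-in for the real band_tidy_piece:

```r
library(splines)

#Simulated stand-in data; the real band_tidy_piece would be used instead.
#duration_sec and bpm are hypothetical column names for song duration and tempo
set.seed(1)
sim <- data.frame(
  notes        = sample(200:1200, 100, replace = TRUE),
  duration_sec = runif(100, 90, 150),
  bpm          = sample(120:210, 100, replace = TRUE)
)
sim$notes_per_min <- sim$notes / (sim$duration_sec / 60)
sim$level <- round(18 + 9 * (sim$notes - 200) / 1000 + rnorm(100, 0, 0.5))

#Natural cubic splines on both note-based predictors, with duration and
#tempo added as extra independent variables
model_multi <- lm(level ~ ns(notes, df = 3) + ns(notes_per_min, df = 3) +
                    duration_sec + bpm,
                  data = sim)
```

Because notes and notes/min enter as separate spline terms, the fitted model can capture the differential influence the comparisons above suggest, although with correlated predictors the individual coefficients should be interpreted cautiously.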

Conclusion

In summary, both the number of notes and notes/min can be used to predict song level, a measure of how difficult it is to complete or full combo a song. The models can identify songs that are easier or harder to play than expected, helping players choose easy songs at a given song level. They also picked out many of the most difficult songs according to the surveys, but the surveys equally highlight improvements that need to be made to predict higher song levels, perhaps by incorporating other independent variables such as duration and tempo.