Friday, May 9, 2014

The change in the lexical complexity of Van Halen songs


by Csikós Mátyás

Van Halen is an American hard rock band that was formed in Pasadena, California, in 1972. The band had two distinct eras, defined by its lead singers, David Lee Roth and Sammy Hagar. Roth was replaced by Hagar because Edward Van Halen, the leader of the band, wanted more lyrical depth in the band's songs. This essay uses corpus linguistic tools to see whether the change lived up to Edward's expectations: it analyses how the complexity of the lyrics changed with the new singer and explores whether there is a significant contrast between the lyrics of the two eras. To find this out, it performs lexical complexity and readability analyses on corpora built from sixteen songs, eight from each era, drawn from the band's albums to provide a wide enough array of lyrics. The lexical complexity and readability indices of the two eras are then compared to draw a conclusion. The expectation is that the Hagar era has more complex lyrics, a larger vocabulary, greater verb variation and higher lexical sophistication, because Hagar was invited into the band to write more complex and deeper lyrics. The research was based on Tim Murphey's similar study of pop songs, conducted in the early nineties.
This research used the online lexical complexity analysis tools made by Lu Xiaofei (2012), available at aihaiyang.com/synlex/lexical (currently being migrated to another server), as well as a number of readability formulas. The first lexical complexity measure used was lexical density, the ratio of lexical (as opposed to grammatical) words to the total number of words in a text; a higher lexical density value means that the text is more complex (Xiaofei, 2012, p. 190). The next measure was lexical sophistication (also known as lexical rareness), which is "the proportion of relatively unusual or advanced words in the learner's text" (as cited in Xiaofei, 2012, p. 194). Sophisticated words are defined as English words that are introduced at Grade 9 or later in the Swedish educational system (Xiaofei, p. 191).
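As a rough illustration of the lexical density ratio described above, the following minimal sketch counts every word that is not in a small function-word list as lexical. The list `FUNCTION_WORDS` and the helper names are assumptions made for this example; the analysis tool itself relies on part-of-speech tagging, which this sketch does not reproduce.

```python
import re

# Illustrative stand-in for part-of-speech tagging: a small, hand-picked set
# of grammatical (function) words. This list is an assumption of the sketch.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "of", "in", "on", "at", "to",
    "with", "for", "from", "by", "i", "me", "my", "you", "your", "he", "she",
    "it", "we", "they", "is", "am", "are", "was", "were", "be", "been", "do",
    "does", "did", "have", "has", "had", "will", "would", "can", "could",
    "not", "no", "this", "that",
}


def tokenize(text):
    """Lowercase the text and return runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def lexical_density(text):
    """Return the share of tokens that are not in FUNCTION_WORDS."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    lexical = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(lexical) / len(tokens)
```

A real analysis would replace the word list with a tagger, so values from this sketch are not comparable to the figures reported in this essay.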
The number of different words (NDW), the number of different words in the first fifty words (NDW-50) and in fifty randomly chosen words (NDWR-50) were also calculated. These are standard measures of lexical complexity: a higher number of different words means more word variation, and therefore a denser and more complex text. Type-token ratio (TTR), the ratio of the number of word types to the number of words in a text, was also calculated (Xiaofei, p. 193). However, the NDW and TTR measures are heavily dependent on text length; TTR in particular falls for longer texts, since the chance of repetition increases with text length (Xiaofei, p. 197). The chosen lyrics were relatively short (about 200 words on average) to limit this length effect in the NDW and TTR analyses; even so, other lexical variation formulae were also used to get more convincing results.
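The counts described above can be sketched in a few lines of Python. This is a simplified illustration, not the analysis tool's implementation: the tokenizer is naive, no lemmatization is done, and `ndw_random_50` draws a single random sample with a fixed seed, all of which are assumptions of this example.

```python
import random
import re


def tokenize(text):
    """Lowercase the text and return runs of letters/apostrophes as tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ndw(tokens):
    """Number of different words (word types)."""
    return len(set(tokens))


def ttr(tokens):
    """Type-token ratio: word types divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def ndw_first_50(tokens):
    """Number of different words among the first fifty tokens."""
    return len(set(tokens[:50]))


def ndw_random_50(tokens, seed=0):
    """Number of different words among fifty randomly chosen tokens."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    sample = rng.sample(tokens, min(50, len(tokens)))
    return len(set(sample))
```

Repeating the same tokens three times leaves `ndw` unchanged but divides `ttr` by three, which is the length effect on TTR mentioned above.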
The additional lexical variation measures were verb variation and corrected verb variation, noun variation, adverb variation, adjective variation and modifier variation. These are essentially the part-of-speech subcategories of lexical variation, which is the range of vocabulary a writer uses in a text (Xiaofei, p. 195). The Uber index was also used to describe lexical richness in the lyrics: U = (log N)² / (log N − log V), where N is the total number of tokens and V is the number of different words.
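The Uber index formula can be written as a short function. This is a minimal sketch; the use of base-10 logarithms is an assumption, since the formula as given does not state the base, and a different base rescales the result.

```python
import math


def uber_index(n_tokens, n_types):
    """Uber index U = (log N)^2 / (log N - log V), using base-10 logs.

    n_tokens is N (total tokens) and n_types is V (different words).
    """
    if n_tokens <= 0 or n_types <= 0:
        raise ValueError("token and type counts must be positive")
    if n_types >= n_tokens:
        # log N - log V would be zero (or negative), so U is undefined here.
        raise ValueError("the index needs fewer types than tokens")
    log_n = math.log10(n_tokens)
    log_v = math.log10(n_types)
    return log_n ** 2 / (log_n - log_v)
```

For a fixed number of tokens, a larger number of different words shrinks the denominator and so raises U.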
Finally, the research used a number of readability formulas to determine the complexity and density of the lyrics. These were the Flesch score, the Gunning fog index, the Flesch-Kincaid grade level, the Coleman-Liau index and the SMOG index, all of them available at readabilityformulas.com. Each formula yields a number that can be placed on a scale to show how complex the analysed text is.
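For orientation, the commonly published versions of these five formulas are sketched below as functions of simple counts. Two things are assumptions here: that the "Flesch score" is the Flesch Reading Ease score, and that the website uses these standard coefficients. Counting syllables, complex words and polysyllabic words is left to the caller, since that step is where tools tend to differ.

```python
import math


def flesch_reading_ease(words, sentences, syllables):
    """Flesch Reading Ease: higher scores mean easier text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Flesch-Kincaid grade level (approximate US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def gunning_fog(words, sentences, complex_words):
    """Gunning fog index; complex_words are words of three or more syllables."""
    return 0.4 * ((words / sentences) + 100 * (complex_words / words))


def coleman_liau(letters, words, sentences):
    """Coleman-Liau index from letters and sentences per 100 words."""
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8


def smog(polysyllables, sentences):
    """SMOG index; polysyllables are words of three or more syllables."""
    return 1.0430 * math.sqrt(polysyllables * 30 / sentences) + 3.1291
```

All five depend on sentence counts, which is one reason lyrics have to be punctuated before they can be scored.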
The corpus for the research was built from eight songs chosen from each of the David Lee Roth and Sammy Hagar eras. The chosen songs were Panama, Runnin' with the Devil, Little Dreamer, Light Up the Sky, And the Cradle Will Rock, Unchained, Little Guitars, and Jump from the David Lee Roth era; and Feelin', Aftershock, Seventh Seal, The Dream is Over, Runaround, Cabo Wabo, Love Walks In, and Summer Nights from the Sammy Hagar era.
All song lyrics were punctuated and transformed into continuous text. This was necessary so that they could be analysed by the various lexical complexity tools, which only work on continuous, punctuated bodies of text. Lyric length was also an important factor in composing the corpus, as sample songs of similar length give more comparable results and limit the length effect on the NDW and TTR measures mentioned earlier.
After the punctuated, formatted lyrics had been entered into each analysis tool individually, the data were summarized in a spreadsheet. A few additional but crucially important values were then calculated. These were the mean value of each index for each era (e.g. the mean NDW of each era), and a paired t-test was conducted to obtain a significance value using Student's t-distribution. The t-test was necessary to know whether the difference between the two compared groups is statistically significant. The significance threshold was set at .05, which means that all indices that yielded a p (significance) value below .05 were considered statistically significant. Consequently, from this point on the essay deals only with the indices that yielded a low enough p value.
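The paired t statistic described above can be computed with the standard library alone. This is a minimal sketch, not the spreadsheet's own calculation: it pairs the two lists by position, which is an assumption, and instead of a p value it compares |t| with 2.365, the two-tailed critical value at the .05 level for 7 degrees of freedom (eight pairs).

```python
import math
from statistics import mean, stdev

# Two-tailed critical value of Student's t at alpha = .05 with df = 7,
# i.e. for eight paired observations.
T_CRITICAL_DF7 = 2.365


def paired_t(first, second):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equally long lists with at least two values")
    diffs = [a - b for a, b in zip(first, second)]
    spread = stdev(diffs)  # sample standard deviation of the differences
    if spread == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = mean(diffs) / (spread / math.sqrt(len(diffs)))
    return t_stat, len(diffs) - 1


def significant_for_eight_pairs(t_stat):
    """True if |t| exceeds the .05 two-tailed critical value for df = 7."""
    return abs(t_stat) > T_CRITICAL_DF7
```

With eight per-song values of one index for each era as the two lists, an absolute t above the critical value corresponds to p below .05.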
The overall results of the analysis show that the expectations described in the introduction, namely that the lyrics from the Hagar era would be much more complex and dense, were fulfilled, i.e. Edward Van Halen made the right choice when he wanted more lyrical depth and replaced David Lee Roth.
The results of the NDW analysis show that the number of different words used is significantly higher in the lyrics of the Hagar era, which had a mean NDW value of 105.88 compared to 85.38 for the David Lee Roth era. The number of different words in fifty randomly chosen words (NDWR-50) yielded similar results, again favouring the Hagar era.
Of the word variation subcategories, verb and noun variation reached statistical significance, and both favour the Hagar era as well: verb variation was 0.24 compared to the earlier era's 0.19, and noun variation was 0.69 compared to David Lee Roth's 0.50. Within word variation, then, the difference was largest for nouns and fairly small for verbs. This is probably because the band's early lyrics contain a lot more repetition than those of the Hagar era. In the song Panama, for example, the one-word sentence "Panama!" is repeated numerous times, which contributes a lot to the low noun variation value of the David Lee Roth era.
Verb sophistication for the second era of the band was also much higher, with a value of 0.61, about two-thirds higher than the 0.37 of the David Lee Roth era. This shows that with Hagar joining the band, the lyrics became a lot more complex, using more sophisticated, rarer words.
All lyrics yielded fairly similar results in terms of readability. The Coleman-Liau index yielded the most similar results, averaging at the level of a fifth-grader. However, the Gunning fog index revealed that the David Lee Roth era had much simpler lyrics than the Hagar era: the readability score of the Hagar lyrics is 6.26 compared to 4.85 for the David Lee Roth lyrics on a scale of 1 to 10.
As lexical richness is multidimensional (Xiaofei, 2012, p. 190), the analysis should yield consistent results on all dimensions: lexical density, lexical sophistication, lexical variation, and number of errors in vocabulary use. The last of these can be excluded, since lyrics are expected to have correct word usage. As all four lexical sophistication formulae, three out of five density indices and all ten variation indices show higher values for the later lyrics of Van Halen, it can be concluded that the Hagar lyrics are lexically richer than the band's early lyrics.
Although the differences in type-token ratio turned out not to be significant in the t-test, they deserve a mention here because the word-frequency count yielded fairly positive results: the type-token ratio generally ranged between 0.43 and 0.47. This means that each word type occurs a little more than twice on average within a song, which is fairly impressive; in Murphey's (1992) research, for example, this value was 0.29, much lower than the TTR of the Van Halen songs (p. 773). However, it must be noted that the songs analysed were chosen to be fairly short in order to have worthwhile results with the NDW analysis, so this TTR value is not far above average levels.
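The link between a TTR value and average repetition is simple arithmetic: tokens divided by types is the reciprocal of the TTR. The helper name below is hypothetical and only serves to illustrate the step.

```python
def mean_occurrences_per_type(ttr):
    """Average number of times each word type occurs: tokens / types = 1 / TTR."""
    if not 0 < ttr <= 1:
        raise ValueError("a type-token ratio lies between 0 (exclusive) and 1")
    return 1 / ttr


# TTR values quoted in the text: the Van Halen range and Murphey's (1992) figure.
for value in (0.43, 0.47, 0.29):
    print(value, round(mean_occurrences_per_type(value), 2))
```

A TTR of 0.43 to 0.47 corresponds to about 2.1 to 2.3 occurrences per word type, while 0.29 corresponds to roughly 3.4.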
It must be mentioned, though, that lyrics are considered spoken texts (although they are pre-written). Spoken texts have a much lower lexical density than their written counterparts, and they may be affected by factors such as degree of interactiveness (Xiaofei, p. 195).
To conclude, the analysis showed that the expectations of the essay (and of Edward Van Halen) were fulfilled: comparing the lexical complexity and readability values of the two distinct eras of the band showed that the second era, with Sammy Hagar, had a lot more lyrical depth, with more sophisticated words, less repetition and greater lexical density.


Analysis spreadsheet here: https://docs.google.com/spreadsheets/d/1VQgHpyz_xCV8uyvI0qaroa2uG_Xrb1zo4-6y_zJseiU/edit?usp=sharing 





References

Murphey, Tim. (1992). The Discourse of Pop Songs. TESOL Quarterly, 26, 770-74.
My Byline Media. (2012). Free Readability Formulas [online computer software].
readabilityformulas.com/free-readability-formula-tests.php
Xiaofei, Lu. (2013). Lexical Complexity Analyser [online computer software].
aihaiyang.com/synlex/lexical
Xiaofei, Lu. (2012). The Relationship of Lexical Richness to the Quality of ESL Learners’ Oral Narratives. The Modern Language Journal, 96, 190-208.

