Perplexitaal 1: Can you guess the text from the keywords?
Keyword analysis has become a widely used tool in applied linguistics because it is relatively easy to do and often provides useful insights into the content and style of the target texts. Basically, keyword analysis compares the proportional frequency of each word in the target text with its proportional frequency in a benchmark corpus. Words with high keyness values are likely to be important in some way in the target text.
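As a rough sketch of that comparison, the Python snippet below computes proportional frequencies for a tiny target text and a tiny stand-in benchmark. Both texts, the whitespace tokenisation, and the helper name `proportional_freqs` are assumptions made purely for illustration; real tools compare against a full reference corpus and use a proper keyness statistic rather than eyeballing the shares.

```python
from collections import Counter

def proportional_freqs(text):
    """Return each word's share of all tokens in the text."""
    tokens = text.lower().split()  # naive whitespace tokenisation (assumption)
    counts = Counter(tokens)
    total = len(tokens)
    return {word: n / total for word, n in counts.items()}

# Hypothetical miniature texts, purely for illustration.
target = "the party watched him and the party waited"
benchmark = "the cat sat on the mat and the dog sat by the party door"

target_freqs = proportional_freqs(target)
benchmark_freqs = proportional_freqs(benchmark)

# List words that take up a larger share of the target than of the benchmark.
for word, share in sorted(target_freqs.items()):
    ref_share = benchmark_freqs.get(word, 0.0)
    if share > ref_share:
        print(f"{word}: target {share:.3f} vs benchmark {ref_share:.3f}")
```

In this toy case "party" makes up a quarter of the target but only a small share of the benchmark, which is the kind of imbalance a keyness statistic is designed to pick up.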
There are several free tools available for conducting keyword analyses (AntConc, for example, includes this function). For this Perplexitaal, I used KeyBNC (https://key-bnc.tfiaa.com/), which compares your text against the British National Corpus. I used the most common keyness statistic, log likelihood (LL), excluded words that occur fewer than 3 times, and deleted proper nouns to make it a bit more challenging.
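For readers curious about what the log likelihood statistic involves, here is a minimal Python sketch using the formulation commonly found in corpus linguistics (observed versus expected frequencies across the two corpora). It is not KeyBNC's actual code: the function names, the `min_freq` parameter, and the toy counts are all hypothetical, and the sketch keeps only words that are proportionally more frequent in the target text.

```python
import math

def log_likelihood(a, b, c, d):
    """LL keyness for a word occurring a times in a target of c tokens
    and b times in a reference corpus of d tokens."""
    expected_target = c * (a + b) / (c + d)
    expected_ref = d * (a + b) / (c + d)
    ll = 0.0
    if a > 0:
        ll += a * math.log(a / expected_target)
    if b > 0:
        ll += b * math.log(b / expected_ref)
    return 2 * ll

def top_keywords(target_counts, ref_counts, ref_size, min_freq=3, n=10):
    """Rank words overused in the target by LL, skipping rare words."""
    target_size = sum(target_counts.values())
    scored = []
    for word, a in target_counts.items():
        if a < min_freq:
            continue  # drop words occurring fewer than min_freq times
        b = ref_counts.get(word, 0)
        if a / target_size > b / ref_size:  # positive keywords only
            scored.append((word, log_likelihood(a, b, target_size, ref_size)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:n]

# Hypothetical counts, not taken from any real text or from the BNC.
target_counts = {"nation": 5, "we": 10, "the": 11, "rare": 1}
ref_counts = {"nation": 50, "we": 3000, "the": 60000}
keywords = top_keywords(target_counts, ref_counts, ref_size=1_000_000)
for word, score in keywords:
    print(f"{word} | {score:.2f}")
```

A word whose proportional frequency is identical in both corpora scores 0, and the score grows as the observed counts move further from what the combined corpora would predict.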
Here are the top 10 keywords for 3 well-known texts. Can you guess the text from the keywords? Hint: LL values are affected by the size of the corpora so, in this case, they should give you some indication of how long the target text is.
Text A
Top 10 keywords
Word | Keyness (LL) |
let | 93.91 |
me | 93.19 |
go | 91.70 |
ooo | 52.43 |
blows | 36.45 |
matters | 30.29 |
no | 29.91 |
just | 29.62 |
gotta | 26.11 |
poor | 26.10 |
Text B
Top 10 keywords
Word | Keyness (LL) |
here | 52.81 |
nation | 50.04 |
dedicated | 45.29 |
we | 24.75 |
dead | 21.24 |
shall | 18.48 |
cannot | 17.92 |
great | 13.22 |
that | 10.26 |
people | 8.03 |
Text C
Top 10 keywords
Word | Keyness (LL) |
he | 1,616.93 |
was | 1,510.00 |
had | 1,209.20 |
his | 690.33 |
proles | 606.80 |
party | 600.71 |
it | 478.73 |
him | 478.53 |
seemed | 393.26 |
face | 346.83 |
…….