Every year, Richard Watson Todd shares a short, fun and ridiculous analysis of Christmas using methods in Applied Linguistics.


With the growth in influence of artificial intelligence in society, for this year’s Christmas message I decided to focus on one of the main linguistic underpinnings of AI. If you’ve used any of the recent natural language processing applications, you’ve probably seen that they’re based on word2vec. Created by Google in 2013, word2vec used a neural network to analyse a very large corpus. Each word was assigned values on several hundred dimensions based on its associations, producing a vector for each word. This allows basic word mathematics to be done, which produced some surprising results. For example, the equation shopkeeper – man + woman produced the answer housewife. Google was criticized for its inherent sexism, but the fault really lies in wider society and the associations in the corpus. I therefore decided to see what light word2vec could shed on how society views the words we associate with Christmas.
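To make the word mathematics concrete, here is a minimal sketch in Python. The three-dimensional vectors are invented purely for illustration (real word2vec vectors are learned from a corpus and have several hundred dimensions), and the helper names `cosine` and `analogy` are hypothetical rather than part of any word2vec library. The sketch adds and subtracts vectors, then picks the remaining word whose vector points in the most similar direction.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# They are NOT real word2vec output.
VECTORS = {
    "shopkeeper": [0.9, 0.8, 0.1],
    "housewife":  [0.9, 0.1, 0.8],
    "man":        [0.1, 0.9, 0.1],
    "woman":      [0.1, 0.1, 0.9],
    "reindeer":   [0.5, 0.5, 0.5],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def analogy(a, b, c, vectors=VECTORS):
    """Return the word closest to vector(a) - vector(b) + vector(c)."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the three input words so the answer is a new word.
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

print(analogy("shopkeeper", "man", "woman"))
```

With these made-up numbers the answer mirrors the example in the text, but only because the toy vectors were chosen that way; a real model's answer depends entirely on the associations in its training corpus.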

I started by choosing 15 words with broad associations with Christmas: carols, Christmas, elves, festive, holly, jolly, merry, mistletoe, presents, reindeer, Santa, Scrooge, sled, snowman and tidings. I then came up with some dimensions on which these words could be rated. Clearly, if you’re hoping that Santa will bring you some presents, good and bad need to be considered. As a second dimension, I chose love and hate. This allows us to produce a graph of the 15 Christmassy words:

By far the most positive aspect of Christmas is the presents; they’re good and we love them. We love being merry even more, but realise it’s not necessarily good for us. The bad things we’re close to hating are mistletoe (perhaps being forced to kiss someone isn’t a good idea), tidings (which might not always be for the best) and, strangely, snowmen (possibly as one of the last holdouts of male-dominated language). The outliers here are reindeer, who are sadly bereft of love, and Scrooge, well-loved despite being so bad.
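One plausible way to rate a word on a dimension such as good and bad is to compare how similar its vector is to each pole. The sketch below assumes that approach; the message does not say exactly how the ratings were computed, and the two-dimensional vectors and the `axis_score` helper are invented for illustration.

```python
import math

# Invented 2-dimensional toy vectors; not real word2vec output.
VECTORS = {
    "good":      [1.0, 0.2],
    "bad":       [-1.0, 0.1],
    "love":      [0.3, 1.0],
    "hate":      [0.2, -1.0],
    "presents":  [0.8, 0.7],
    "mistletoe": [-0.6, -0.5],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def axis_score(word, positive, negative, vectors=VECTORS):
    """Positive when the word is closer to the positive pole than the negative one."""
    return (cosine(vectors[word], vectors[positive])
            - cosine(vectors[word], vectors[negative]))

for word in ("presents", "mistletoe"):
    x = axis_score(word, "good", "bad")    # horizontal position on the graph
    y = axis_score(word, "love", "hate")   # vertical position on the graph
    print(f"{word}: good-bad {x:+.2f}, love-hate {y:+.2f}")
```

Each word then gets one coordinate per dimension, which is all that is needed to place it on a two-dimensional graph.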

While I gained some insights from this, I realized that the graph didn’t account for either the religious or the commercial sides of Christmas, so I created a second graph where a religious birth – death dimension was twinned with a more commercial give – receive dimension:

Christmas traditionally is associated with birth, but this graph suggests that may be a misapprehension as Christmas is pretty close to death – perhaps all that over-eating has dangerous effects. Apparently, it’s the reindeer who are being born, while all the kissing under the mistletoe may lead to more than is expected. It’s also clear that we prefer receiving visits from carol singers, rather than doing the singing ourselves. The most notable point, though, is that apparently the Christmas gifts are now being delivered by elves, a sad state of affairs probably caused by the tragic death of Santa.

Have a great Christmas – enjoy the presents you love so much even if they’re delivered by elves, and watch out for the imminent population explosion of hateful reindeer.


For this year’s Christmas message, I’m using the new (at least I’ve never noticed it before) Word function in COCA (the Corpus of Contemporary American English). Among other useful resources, the Word function gives the highest ranked collocates of the word you enter. I started with Christmas, which unsurprisingly gives collocates like tree, Eve, presents and gift. Taking each of these collocates, I entered them into the Word function to obtain their collocates, chose one of these and repeated the process until, for each of the original collocates, I ended up with a full sentence. So the most meaningful Christmas messages I have this year are:

  • Christmas tree grows tired of fighting a bloody nose
  • Christmas Eve tragically dies of a preventable disease
  • Christmas presents brightly color your hair
  • Christmas gift is precious and rare genetic material
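The chaining procedure behind these messages can be sketched in a few lines of Python. The collocate lists below are invented stand-ins, not COCA output, and the hypothetical `collocate_chain` helper always takes the first unused collocate, whereas the original procedure involved choosing one by hand.

```python
# Hypothetical top-collocate lists, invented for illustration;
# they are not taken from COCA.
COLLOCATES = {
    "Christmas": ["tree", "Eve", "presents", "gift"],
    "tree": ["grows", "branches"],
    "grows": ["tired", "rapidly"],
    "tired": ["of", "eyes"],
}

def collocate_chain(start, collocates, max_words=8):
    """Build a phrase by repeatedly following a collocate of the last word."""
    chain = [start]
    while len(chain) < max_words:
        # Skip words already used so the chain cannot loop forever.
        options = [w for w in collocates.get(chain[-1], []) if w not in chain]
        if not options:
            break
        chain.append(options[0])
    return " ".join(chain)

print("Christmas " + collocate_chain("tree", COLLOCATES))
```

The chain stops when a word has no listed collocates or the length limit is reached, so a fuller collocate table would be needed to produce complete sentences like those above.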

Best wishes for Christmas and the New Year and hope you enjoy the precious and rare genetic material you receive.



For this year’s Christmas message, I’m being a bit more serious than usual and analysing the Queen’s annual Christmas speech using multiple keyword analyses.

First, the methodological technicalities. From the royal website I downloaded two corpora of Queen’s speech transcripts – one of speeches from the 1950s and early 1960s, and one of speeches from this century. I used 2 programs to analyse the corpora – KeyBNC (which compares each corpus to the BNC and identifies keywords either by log likelihood or odds ratio) and AntConc (which compares the corpora to each other using log likelihood).

So what did I find? First, the shared keywords which are indicative of the Queen’s speeches in general are predictable:  Christmas, happy, message, family, year, friendship. They also confirm the Queen’s use of the royal first person plural: our, we, us.

More interesting is the contrast between the 1950s and the 2000s. The early speeches are more self-centred (my, we, I), focus on global harmony (brotherhood, earth, goodwill, men, nations, peoples), and have a positive outlook (encouragement, kindness, tolerance). The recent speeches are more religious (Bethlehem, Jesus, Christ, bible), include some unexpected activities (games, sport), and are bleaker in their outlook (adversity, forgiveness, reflection).

What we can conclude from this is that staying in the same job for 67 years changes you from being a positive idealist to someone who relieves their depression by playing religious games.

Best wishes for Christmas and the New Year.



In last year’s Christmas message, I substituted some words in a well-known Christmas song using etymologies and thesauruses. This year, I’ve done something similar but I’ve taken a different song, and I’ve used word associations as the basis for substitution. To do this, I’ve used the database from the word association research available at smallworldofwords.org. For each content word in the song, I’ve chosen one of the top 5 associations to replace it. Hopefully you’re familiar with the song and can work out why the new improved version has come out the way it has.

On the twelfth day of Christmas my true love sent to me

Twelve boys a-banging

Eleven plumbers smoking

Ten gods a-hopping

Nine toilets singing

Eight virgins drinking

Seven necks a-sinking

Six eggs a-sleeping

Five yellow fingers

Four vocation songs

Three kissing cocks

Two green soaps and

A pear tree in a partridge

Best wishes for Christmas and the New Year, and may your gods always hop and your toilets always sing.



For this year’s Christmas message, I have etymologised and thesaurusised a well-known Christmas song for your entertainment.

First, I looked at the etymologies of the content words of the song and chose an original meaning of the word in root languages, such as Latin, Proto-Indo-European and even Northumbrian. Substituting these meanings into the song, we get:

I’m deceiving for a radiant dismissal by Christ,

Punctually corresponding to the ones I used to be able to distinguish,

Where the solid tufts shine

And the fetuses obey

To judge sleigh roars in the snow.

Then, I turned to Roget and replaced the content words with another word from the same paragraph in the thesaurus. In this case, we get:

I’m in ecstasy for a candid routine,

A quasi-twin of the ones I used to possess

Where the herb climax shoots out beams

And elves hark

To give an audience to ambulance rattles in the iceberg.

I’m glad that Bing Crosby realized the importance of getting fetuses to obey and elves to hark at Christmas time.

Best wishes.



This year’s Christmas message looks at the topics and interdiscursivity of the Bible stories about Christmas. As most of you are aware, the Christmas stories appear in Matthew and Luke only. To see what other kinds of texts these are related to, I used the following procedures. First, keywords were identified in two ways: each story against the British National Corpus and each story against the other. Since keywords are usually associated with aboutness, combining the top few keywords into a phrase should tell us what each text is about. Doing this, we find:

The Matthew Christmas story is about Herod’s mother’s dream.

The Luke Christmas story is about shepherds’ fear of angels.

(As a piece of self-promotion, in the introduction to my new book, ‘Discourse Topics’, I showed how problematic this approach to identifying topics can be.)

I then selected the 6 highest-ranking content words not specifically associated with Christmas (so proper nouns and words like ‘manger’ were not counted). I put these as a selection of search terms into Google to identify what other registers use the same terms (focusing on the first 10 returns only). So what came out of this?

The Matthew Christmas story is related to parenting manuals and Aboriginal dreamtime.

The Luke Christmas story is related to baby clothing, Masonic laws and news about Justin Bieber’s indiscretions.

Other than confirming teen fans’ beliefs that Justin Bieber is the Messiah and conspiracy theorists’ views that the Masons control everything (apparently including Jesus’ birth), I’m not sure what this means, but hope you enjoyed it anyway.

Best wishes for Christmas and the New Year.





In my Christmas message a few years ago, I used a keyword analysis to conclusively prove that the religious meaning of Christmas is best summed up by the single word la, and the secular meaning by pum. This year I will follow up on this groundbreaking research, albeit with a somewhat technical (I’m afraid) analysis.

In my original research, I applied a standard keyword analysis to collections of Christmas carols (e.g. Away in a manger) for the religious meanings, and Christmas songs (e.g. Jingle bells) for the secular meanings. To identify keywords, or words appearing with a greater than normal frequency, I used the log likelihood statistic (LL), comparing each of my musical collections against the British National Corpus. LL-based keywords highlight what the corpus under investigation is about.
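For the technically curious, a commonly used two-corpus formulation of log likelihood can be sketched as follows. This is an assumption about the form of the statistic, not a record of what any particular keyword tool computes internally, and the function name and the counts in the example are invented.

```python
import math

def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Log-likelihood keyness of a word in a study corpus against a reference corpus.

    freq_* are the word's frequencies; size_* are the corpus sizes in words.
    """
    total_freq = freq_study + freq_ref
    total_size = size_study + size_ref
    # Frequencies expected if the word were equally common in both corpora.
    expected_study = size_study * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    ll = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Invented counts: a word occurring 120 times in a 10,000-word collection
# and 500 times in a 100,000,000-word reference corpus.
print(round(log_likelihood(120, 10_000, 500, 100_000_000), 1))
```

A word with the same relative frequency in both corpora scores zero, and the score grows as the observed frequencies move away from the expected ones.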

Recently, one of my students has proposed an alternative – odds ratio or OR (for those who are interested – very few, I guess – OR is an effect size statistic, whereas LL is a probability statistic). In contrast to LL, OR-based keywords highlight critical issues in a corpus which might otherwise be hidden.
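As a rough illustration, an odds ratio for a single word can be computed from the same four numbers used for LL. The sketch below uses the textbook definition of an odds ratio; the function name, the handling of zero counts and the example figures are all assumptions made for illustration.

```python
import math

def odds_ratio(freq_study, size_study, freq_ref, size_ref):
    """Odds of meeting the word in the study corpus relative to the reference corpus."""
    if freq_ref == 0:
        # Assumption: treat a word absent from the reference corpus as infinitely key.
        return math.inf
    odds_study = freq_study / (size_study - freq_study)
    odds_ref = freq_ref / (size_ref - freq_ref)
    return odds_study / odds_ref

# Invented counts: 10 occurrences in 1,010 words against 10 in 10,010 words.
print(odds_ratio(10, 1010, 10, 10010))
```

Because the odds ratio depends only on the proportions and not on the sheer amount of evidence, a rare word that is almost absent from the reference corpus can rank very highly, which is why the two statistics can produce such different keyword lists.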

In my original work using LL-based keywords, the Christmas carols (in addition to the dominant keyword, la) focused on warm feelings: tidings, joy, merry, comfort, rejoice, sweet and heavenly; and the Christmas songs (in addition to the dominant keyword, pum) concerned the traditional trappings of Christmas: rum, jingle, bells, reindeer, Santa Claus, sleigh and mistletoe.

If, however, we switch methods to look at OR-based keywords, a different pattern emerges. About half of the keywords in the carols have Hebrew roots. We find, for instance, alleluia, hosanna, myrrh, Bethlehem and Emmanuel. This suggests that Christmas carols may actually be Hebrew-as-a-foreign-language training materials. In the songs, on the other hand, animals dominate the OR-based keyword list: reindeer, Rudolph, Blitzen, Prancer, Dasher, Donner, Snoopy and one-horsed.

In line with the substantial psychology literature on the pernicious effects of hidden subliminal messages, we are forced to conclude that Christmas carols were actually written by a group of teachers of Hebrew to facilitate the learning of that language, and the Christmas songs by animal lovers to create an association between the positive feelings evoked by Christmas and animals.

Have a happy Hebrew and animal-oriented Christmas and New Year.





For this year’s Christmas message, I’ve looked at how often Christmas is referred to and specific phrases associated with Christmas in 14 different countries where English is widely spoken using the Corpus of Global Web-based English (GloWbE).

To start with, how often is Christmas referred to in these countries? This is a straightforward frequency count, but the results need to be presented as a proportion of all words in the corpus for each country. To avoid lots of decimal places, I’ve multiplied the proportions by 100,000, with higher proportion ratings indicating that Christmas is more common in a country:

Country | Frequency of Christmas | Proportion ratings
Sri Lanka
South Africa

Although the raw count of Christmas is highest for the UK, the UK dataset is larger than that for most countries, so as a proportion the Philippines comes out on top. There appears to be some relationship between how Christian a country is and how often Christmas is used, with traditionally Christian countries (Philippines and Ireland) coming out on top, and Islamic countries (Pakistan and Bangladesh) at the bottom. There are, however, a couple of exceptions to this pattern – the strongly Christian USA doesn’t score that highly, while the famously apatheist UK scores highly, probably for the commercial implications of Christmas (see below).
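The difference between a raw count and a proportion rating is easy to see with a small calculation. The counts below are invented for two hypothetical country sub-corpora and are not GloWbE figures; `per_100k` is an illustrative helper name.

```python
def per_100k(count, corpus_size):
    """Occurrences per 100,000 words of running text."""
    return count * 100_000 / corpus_size

# Invented counts for two hypothetical country sub-corpora.
SUBCORPORA = {
    "Country A": {"christmas": 40_000, "words": 400_000_000},
    "Country B": {"christmas": 6_000, "words": 40_000_000},
}

for country, data in SUBCORPORA.items():
    rating = per_100k(data["christmas"], data["words"])
    print(f"{country}: {data['christmas']} occurrences, "
          f"{rating:.1f} per 100,000 words")
```

Country A has far more occurrences in absolute terms, but Country B has the higher proportion rating because its sub-corpus is a tenth of the size.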

A potentially more important issue than simply frequency of reference is how each of the countries characterises Christmas. To investigate this, we can look at the collocates for Christmas, or the words most frequently appearing immediately before or after the word Christmas. We need to focus, however, on relative, not absolute, frequency. For instance, a word frequently appearing immediately before Christmas is at (in the phrase, at Christmas), but at is fairly frequent throughout the whole corpus so its frequent co-occurrence with Christmas may, in part, reflect its high overall frequency. Rather than frequency, I have used Mutual Information (MI) scores to identify collocates, since these account for overall frequency. The words with the highest MI scores immediately preceding Christmas include merry and celebrate, and those immediately succeeding Christmas include eve, carols and decorations. These collocations are fairly common for all countries. To identify those which characterise how a particular country perceives Christmas, we need to identify those collocating words with high MI scores which are far more common for a particular country than for the other countries. For instance, 1831 immediately preceding Christmas occurs 15 times in total in GloWbE, with 14 of these being in Jamaican texts (referring to the 1831 Christmas Rebellion). Thus, 1831 immediately preceding Christmas is peculiar to Jamaican English.
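To show why MI favours merry over at, here is a minimal sketch using the basic form of the statistic: the base-2 logarithm of the observed co-occurrence frequency divided by the frequency expected by chance. Corpus interfaces may apply further adjustments (for example, for the size of the search window), so treat this as an approximation; all the counts are invented.

```python
import math

def mutual_information(pair_freq, node_freq, collocate_freq, corpus_size):
    """Basic MI: log2 of observed over expected co-occurrence frequency."""
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(pair_freq / expected)

# Invented counts for a hypothetical 1,000,000-word corpus
# in which Christmas occurs 1,000 times.
CORPUS_SIZE = 1_000_000
CHRISTMAS_FREQ = 1_000

# "at" is very frequent overall (20,000) and precedes Christmas 200 times.
mi_at = mutual_information(200, CHRISTMAS_FREQ, 20_000, CORPUS_SIZE)
# "merry" is rare overall (100) but precedes Christmas 80 times.
mi_merry = mutual_information(80, CHRISTMAS_FREQ, 100, CORPUS_SIZE)

print(f"at Christmas: MI = {mi_at:.2f}")
print(f"merry Christmas: MI = {mi_merry:.2f}")
```

Although at Christmas is the more frequent phrase in this toy corpus, merry gets the higher MI score because nearly all of its occurrences are next to Christmas.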

For each of the 14 countries, I have identified the highest MI-ranking word (with a minimum frequency of 5) that is far more highly ranked for that country than for the other countries in the 2 positions (immediately preceding and immediately succeeding Christmas). These can then be put together into 3-word phrases which illustrate how each country characterises Christmas, as follows:

Sri Lanka
South Africa
Muppet Christmas cactus
Coca-Cola Christmas pressies
continental Christmas cracker
moody Christmas pudding
homemade Christmas hamper
special Christmas cake
best Christmas wreath
artificial Christmas shopping
crazy Christmas moss
stole Christmas decors
perfect Christmas cheer
Father Christmas bonus
last Christmas festivities
1831 Christmas extravaganza

For anyone planning a last-minute holiday over the Christmas period, I hope that this list can provide some guidance. Gourmands may want to visit Australia, India or Sri Lanka (though gourmets should probably avoid the moody pudding); materialists might be most interested in going to the UK, Bangladesh, or Nigeria; hedonists might favour Ghana and South Africa; and for those who love the weird, the USA and Singapore are perfect destinations.

Have a great Christmas and New Year.



In the current age of reduced attention spans and the need for short sound-bites to convey messages, for this year’s Christmas message I decided to do my part by looking at how Christmas carols and songs could be reduced to their core essence to get their message across in a form fitting for the times. To achieve this goal, I used Microsoft Word’s Autosummarize function (sadly not included in recent versions of Word – although it was completely useless at summarising texts, it was good fun to play with).

Starting with the traditional religious Christmas carols, the Autosummarize function worked well for some (e.g. ‘Silent night, holy night’ was identified as the key summarising line for the eponymous carol), but more randomly for others (e.g. the key summarising line for Good King Wenceslas apparently is ‘Tread thou in them boldly’). Taking the key lines from 15 carols and combining them, I have come up with a combination carol that contains the essence of the other carols:

Silent night, holy night

Comfort and joy

Above thy deep and dreamless sleep

Star with royal beauty bright

The blessed angels sing

Appeared a shining throng

Tread thou in them boldly

Sweet singing of the choir


Rejoice! Rejoice!

Gloria, hosanna in excelsis!

Noel, noel, noel, noel

Repeat, repeat, the sounding joy

Strike the harp and join the chorus

Merry, merry, merry, merry Christmas!

For those for whom this combined carol is still too long, putting this through the Autosummarize function identifies the key line as:

Noel, noel, noel, noel

I then realised that this approach was over-emphasising the Christian viewpoint of Christmas, when, as we all know, the main purpose of Christmas is to maximise department store sales. I therefore decided to rewrite the combined carol by including key summarising lines from well-known secular Christmas songs. I am therefore proud to announce the ultimate Christmas song which includes the essence of 7 carols and 8 songs:


Hark! There’s something stuck up in the chimney. Rejoice! Rejoice!

Mommy kissing Santa Claus last night!

Thumpetty thump thump!

It’s not Christmas without Grandma rocking around the Christmas tree

Christmas bells those Christmas bells – friends with tired eyes, tread thou in them boldly.

The blessed angels sing right down Santa Claus Lane!

My two front teeth strike the harp and join the chorus.

Merry, merry, merry, merry Christmas!

Again, for those for whom this is still too long, the key summarising line in The Ultimate Christmas Song is:

Merry, merry, merry, merry Christmas!

As well as being short enough for Twitter, this nicely sums up my message.



Regular recipients of my Christmas messages may remember that a couple of years ago I conclusively proved that the word that best sums up the spirit of Christmas is ‘la’ through a keyword analysis of Christmas carols. This year, I’m returning to the theme of Christmas carols, but comparing them to the top ten Christmas songs (as compiled by WCBS FM with ‘White Christmas’ as number one) as a way of comparing the religious and secular meanings of Christmas.

I started by doing a keyword analysis comparing the carols against the songs. The top 5 keywords in the carols are (unsurprisingly) ‘la’, followed by ‘o’, ‘of’, ‘joy’ and ‘us’, so we can now conclude that the religious message of Christmas is best summed up as “La o”.

Doing the reverse comparison, the top 5 keywords in the songs are ‘pum’, ‘Christmas’, ‘I’, ‘Pa’ and ‘rum’. Clearly, the Christmassy aspect of Christmas is more emphasised secularly than religiously, and the secular message of Christmas is “Pum”.

I then tagged the two data sets semantically using the USAS category system (for those of you who care about these things), and then replicated the keyness comparisons but for categories this time, rather than words. The top 5 semantic categories in the carols are 1. Power and authority; 2. Geographical names; 3. Helping; 4. Light; and 5. Sensory: Taste. So we can conclude that a religious Christmas is a geographically limited, powerful, helping light and taste sensation.

The top 5 semantic categories in the songs are 1. Drinks; 2. Colour; 3. Entertainment; 4. Living creatures; and 5. Sensory: Sound. A typical secular Christmas celebration therefore may involve colourful animals barking and howling to entertain us while we drink.

Having fully clarified the meanings of Christmas, I wish you, depending on your religious persuasions, either a powerful ‘La o’ or a drunken ‘Pum’.



Christmas can be merry or happy, but rarely both. Merry Christmas, first recorded in 1699, has overtones of drinking and frivolous enjoyment. After all, merry’s etymological roots concern pleasure and amusement to the extent that, from 1790, merry-bout meant an incident of casual sex. Happy Christmas, on the other hand, with happy originally meaning lucky, is more neutral (apparently Queen Elizabeth II’s reason for preferring Happy to Merry). This Christmas’ research investigation concerns whether Christmas wishes are hedonistic or neutral.

Starting with the British National Corpus, merry and happy are two of the most common collocates to the left of Christmas. In the BNC, Happy Christmas (N = 78) is slightly more frequent than Merry Christmas (N = 68), although, given its lower overall frequency, merry as a collocate has a much higher z-score. The incidence of Merry Christmas being 0.87 times as frequent as Happy Christmas in the BNC, however, is unusual. In all other sources, Merry Christmas is more frequent.

Historically, using the Corpus of Historical American English, Merry Christmas is between 9 (from 1930 to 1950) and 64 (from 1850 to 1870) times as frequent as Happy Christmas. These figures reflect the American preference for wishing a Merry Christmas, but what about other countries?

Using Google and controlling for country domain names, I compared the frequency of Merry Christmas and Happy Christmas in websites for 16 countries. Below are the ratios for how much more frequent Merry Christmas is than Happy Christmas, so for Egypt you are nearly 3 times more likely to be wished Merry Christmas than Happy Christmas.

Hong Kong

Living in Thailand, it looks like we have a much greater chance of having a fun, rather than neutral, Christmas than if we were living in other countries. Hope your Christmas is as merry as my Christmas is likely to be.
