Collocations, or words that frequently occur together, have become a key component in both applied linguistics research and language teaching since automated corpus analysis made them easy to identify. To identify collocations, statistics such as Mutual Information (MI) or z-score are used, rather than raw frequency. To see why, let’s think about the word “nightshade”. What word is its most important collocate at one left (1L, or the word immediately preceding the target word)? In the Corpus of Contemporary American English (COCA), only 2 words frequently come immediately before “nightshade”: “the” and “deadly”, both at roughly the same co-occurrence frequency. However, the MI score for “deadly” is 13.3, while the MI score for “the” is 1.8. The difference in these values reflects the overall frequencies of the 2 words: “the” is so common everywhere that finding it before “nightshade” is unsurprising, whereas “deadly” is far rarer overall, so its appearing there that often is much more than chance would predict.
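For the curious, here is a minimal Python sketch of why equal co-occurrence counts can give such different MI scores. It assumes the common textbook definition, MI = log2(observed / expected), where the expected count is what chance alone would predict from the two words’ overall frequencies. Corpus tools may add adjustments (for span size, for example), so this is not necessarily exactly how COCA computes its scores. The function name and every count below are invented for illustration; they are not real COCA figures.

```python
import math


def mutual_information(pair_freq, node_freq, collocate_freq, corpus_size):
    """Textbook MI: log2 of observed co-occurrences over the count expected by chance."""
    # Expected co-occurrences if the two words were distributed independently.
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(pair_freq / expected)


# Hypothetical counts, NOT taken from COCA.
CORPUS_SIZE = 1_000_000_000   # total words in the made-up corpus
NIGHTSHADE_FREQ = 500         # overall frequency of the target word
PAIR_FREQ = 100               # each collocate precedes the target 100 times

mi_the = mutual_information(PAIR_FREQ, NIGHTSHADE_FREQ, 50_000_000, CORPUS_SIZE)
mi_deadly = mutual_information(PAIR_FREQ, NIGHTSHADE_FREQ, 20_000, CORPUS_SIZE)

print(f"the:    {mi_the:.1f}")
print(f"deadly: {mi_deadly:.1f}")
```

With these made-up numbers, the very frequent word ends up with a low score and the rare word with a high one, even though both precede the target equally often.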
In this Perplexitaal, we’re focusing on 1L collocations. The highest-ranked 1L collocate for “collocation” is “improved” (all examples come from the phrase “improved collocation meshless method”). Here are 4 sets of 5 words; I’ve sequenced them from fairly easy to guess to pretty difficult. Each set contains the highest-ranked 1L collocates for a certain word (for nerds, I used min F = 25, ranked by MI). For each set, can you identify the target word that they precede?