Perplexitaal 6

Improved collocation

Collocations, or words that frequently occur together, have become a key component of both applied linguistics research and language teaching since automated corpus analysis made them easy to identify. To identify collocations, researchers use association statistics such as Mutual Information (MI) or z-score rather than raw frequency. To see why, let’s think about the word nightshade. Which word is its most important collocate in the one-left position (1L, the word immediately preceding the target word)? In the Corpus of Contemporary American English (COCA), only two words frequently appear immediately before nightshade: the and deadly, both at roughly the same co-occurrence frequency. However, the MI score for deadly is 13.3, while the MI score for the is 1.8. The gap reflects the overall frequencies of the two words: the is so common that its appearance before nightshade is unremarkable, whereas deadly is much rarer overall, so its frequent pairing with nightshade is far more than chance would predict.
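To make the frequency effect concrete, here is a minimal sketch of pointwise MI: the log (base 2) of how often a pair actually occurs divided by how often it would occur by chance given each word's overall frequency. The counts below are hypothetical round numbers, not real COCA figures, and COCA's own interface may apply adjustments (for example for window span) that this sketch leaves out. Both candidate collocates are given the same co-occurrence count, so only their overall frequencies differ.

```python
import math

def mutual_information(pair_freq, node_freq, collocate_freq, corpus_size):
    """Pointwise MI: log2 of observed co-occurrence over chance-expected co-occurrence."""
    expected = node_freq * collocate_freq / corpus_size
    return math.log2(pair_freq / expected)

# Hypothetical counts for illustration only (NOT real COCA data).
CORPUS_SIZE = 1_000_000_000
NODE_FREQ = 400        # hypothetical overall frequency of "nightshade"
PAIR_FREQ = 100        # same co-occurrence count for both collocates

for word, word_freq in [("deadly", 20_000), ("the", 50_000_000)]:
    mi = mutual_information(PAIR_FREQ, NODE_FREQ, word_freq, CORPUS_SIZE)
    print(f"{word}: MI = {mi:.1f}")
```

With identical co-occurrence counts, the very frequent word ends up with a much lower score, which is the same pattern as deadly versus the above.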

In this Perplexitaal, we’re focusing on 1L collocations. The highest-ranked 1L collocate of collocation is improved (every example comes from the phrase improved collocation meshless method). Below are 4 sets of 5 words, ordered from fairly easy to guess to pretty difficult. Each set contains the five highest-ranked 1L collocates of a particular word (for nerds: I used a minimum frequency of 25 and ranked by MI). For each set, can you identify the target word that they precede?
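For readers curious about the ranking procedure, the following is a rough sketch of how 1L collocates could be filtered by a minimum frequency and then ranked by MI. It assumes that the minimum frequency applies to the co-occurrence count of each pair, and it uses a tiny made-up corpus; the function name rank_1l_collocates is hypothetical, and this is not COCA's actual implementation.

```python
import math
from collections import Counter

def rank_1l_collocates(tokens, node, min_freq=25, top_n=5):
    """Rank words immediately preceding `node` (1L) by MI, dropping pairs seen fewer than min_freq times."""
    corpus_size = len(tokens)
    word_freq = Counter(tokens)
    # Count each word that occurs directly before the node word.
    left_freq = Counter(tokens[i - 1] for i in range(1, corpus_size) if tokens[i] == node)
    scored = []
    for word, pair_freq in left_freq.items():
        if pair_freq < min_freq:
            continue  # frequency floor: rare pairs can get inflated MI scores
        expected = word_freq[node] * word_freq[word] / corpus_size
        scored.append((word, math.log2(pair_freq / expected)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_n]

# Toy corpus (made up). min_freq is lowered to 1 because the corpus is tiny.
text = ("the deadly nightshade and the nightshade plant . a deadly snake bit "
        "the dog . the cat saw the nightshade .")
tokens = text.split()
print(rank_1l_collocates(tokens, "nightshade", min_freq=1))
```

In this toy corpus, the precedes nightshade twice and deadly only once, yet deadly ranks first because the is much more frequent overall. Raising min_freq to 2 removes deadly entirely, which shows how the frequency floor trades off against MI's preference for rarer words.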

Set 1

              ulterior

              profit

              underlying

              primary

              apparent

Set 2

              third-person

              constructivist

              first-person

              sociocultural

              comparative

Set 3

              dubious

              crucial

              arbitrary

              sharp

              subtle

Set 4

              logical

              obvious

              practical

              clear

              important
