Examples of the usefulness of a corpus

This section gives two examples of how a corpus can be used to answer questions about how words, phrases and sentences are used in English. The examples are meant to inspire you to use corpora as tools in your own academic writing in English.

Example 1: Should I write 'fill in a form' or 'fill out a form'?

When we need to describe someone entering all the necessary information on a form, we can typically choose between two phrasal verbs: fill in a form or fill out a form. A dictionary such as the Longman Dictionary of Contemporary English (LDOCE) does not indicate which of the two is more common, or whether either one is preferred in British or American English. Since the dictionary offers no guidance on this point, we can turn to a corpus instead.

In this example, we will look at the Corpus of Contemporary American English (COCA) to attempt to find out whether one of the two phrasal verbs is used more frequently than the other in American English. We can access COCA by going to the following website:

This website hosts a number of different corpora, including COCA. The COCA search interface allows us to carry out both simple and advanced searches.

Simple searches for verbatim phrases in COCA

1. Click on the link above and then on the COCA link; this will take you to the COCA web page. Alternatively, click on the following link:

2. Click on 'ENTER' visible on the landing page.

3. You are now on the page where searches can be made. In the search window in the top left corner, insert the following search string and press the search button:

fill in a form

4. On the results display in the top right corner, we learn that the exact phrase fill in a form occurs only twice in the whole corpus.

5. Make a new search, this time with the following search string:

fill out a form

6. On the results display, we are told that the exact phrase fill out a form occurs 43 times in the whole corpus.

Thus, fill out a form appears to be the preferred phrase in American English. However, as explained in the next section, a simple search like this one has its drawbacks.
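A verbatim search essentially counts word-for-word matches of the search string. As a rough illustration of that idea (not of how COCA is actually implemented), the Python sketch below counts exact, whole-word matches of a phrase in a short invented sample text. Note that filled out a form is not counted as an instance of fill out a form.

```python
import re


def count_exact_phrase(text: str, phrase: str) -> int:
    """Count case-insensitive, whole-word occurrences of an exact phrase."""
    # \b anchors the match at word boundaries, so "fill" will not match
    # the beginning of "filled" or "filling"
    pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
    return len(re.findall(pattern, text.lower()))


# Invented sample text, for illustration only (not corpus data)
sample = (
    "Please fill out a form at the desk. "
    "She filled out a form yesterday. "
    "He is filling out the forms now. "
    "You can also fill in a form online."
)

print(count_exact_phrase(sample, "fill out a form"))  # 1
print(count_exact_phrase(sample, "fill in a form"))   # 1
```

Although the sample contains three sentences about filling out forms, only one of them matches the exact string, which is the kind of undercounting discussed in the next section.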

Advanced searches for different forms of phrases in COCA

Even though a simple verbatim search is quick to carry out and gives us an indication of how frequent a word or phrase is, there is a risk that the figure substantially underestimates how often the expression actually occurs, because closely related forms are not counted.

This is because the exact string fill out a form is only one of several possible realisations of the expression. For example, the verb fill can occur in its past tense or past participle form filled, its third person singular present form fills, or its present participle form filling. Consequently, we would expect the following forms, among others, to occur as well:

fills out a form

filled out a form

filling out a form
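To get a sense of how quickly such variants add up, the Python sketch below combines the verb forms listed above with a few determiners and with the singular and plural noun. The word lists are illustrative assumptions, not an exhaustive inventory of what occurs in English.

```python
from itertools import product

# Illustrative word lists (assumptions for this sketch, not exhaustive)
verb_forms = ["fill", "fills", "filled", "filling"]
determiners = ["a", "the", "my", "your", "their"]
noun_forms = ["form", "forms"]

variants = [
    f"{verb} out {det} {noun}"
    for verb, det, noun in product(verb_forms, determiners, noun_forms)
    if not (det == "a" and noun == "forms")  # skip ungrammatical "a forms"
]

print(len(variants))  # 36
print(variants[:3])   # ['fill out a form', 'fill out the form', 'fill out the forms']
```

Each of these strings would need its own verbatim search, and the same number again for fill in, which is why a search syntax that covers them all at once is useful.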

To capture forms like these, we can carry out a slightly more advanced search, which admittedly requires a bit more linguistic knowledge, time and patience. Such searches use a special search-string syntax, which is described on the COCA website.

Without going into too much detail, the search strings used in the following instructions mean that any verb form of fill is allowed, followed by the adverb in (first search) or out (second search), in turn followed by either an article (a, an or the) or a possessive pronoun (e.g. my, your, their), and finally ending with a form of the noun form (e.g. form or forms).
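The logic of such a pattern can be sketched in Python. The idea is that each word in the corpus is stored together with its lemma (base form) and a part-of-speech tag, and that the search string is matched against these layers slot by slot. In the sketch below, the small tagged text is invented, the tags are modelled on the CLAWS7 tagset (which, as far as we know, COCA uses), and the function names are hypothetical; it is a simplified model of how the search strings behave, not COCA's actual implementation.

```python
from fnmatch import fnmatchcase

# Invented, hand-tagged sample text: each token is (word, lemma, tag).
# Tags imitate the CLAWS7 style (VV* = lexical verb, AT/AT1 = article,
# APPGE = possessive pronoun, RP = adverbial particle, NN1/NN2 = noun);
# "PUNC" is a placeholder tag for punctuation.
tokens = [
    ("she", "she", "PPHS1"), ("filled", "fill", "VVD"), ("out", "out", "RP"),
    ("the", "the", "AT"), ("forms", "form", "NN2"), (".", ".", "PUNC"),
    ("we", "we", "PPIS2"), ("fill", "fill", "VV0"), ("out", "out", "RP"),
    ("our", "our", "APPGE"), ("form", "form", "NN1"), (".", ".", "PUNC"),
    ("he", "he", "PPHS1"), ("fills", "fill", "VVZ"), ("in", "in", "RP"),
    ("a", "a", "AT1"), ("form", "form", "NN1"), (".", ".", "PUNC"),
]


def slot_matches(token, word=None, lemma=None, tag=None):
    """Check one token against one slot of a simplified COCA-style pattern."""
    w, lem, t = token
    if word is not None and w != word:
        return False
    if lemma is not None and lem != lemma:
        return False
    # Tag wildcards such as "v*" match any tag beginning with that letter
    if tag is not None and not fnmatchcase(t.lower(), tag):
        return False
    return True


def count_pattern(tokens, particle):
    """Count matches of [fill].[v*] <particle> [a*] [form] in a token list."""
    slots = [
        {"lemma": "fill", "tag": "v*"},  # [fill].[v*]: any verb form of fill
        {"word": particle},              # the exact word "in" or "out"
        {"tag": "a*"},                   # [a*]: article or possessive pronoun
        {"lemma": "form"},               # [form]: any form of form
    ]
    n = len(slots)
    return sum(
        all(slot_matches(tokens[i + k], **slots[k]) for k in range(n))
        for i in range(len(tokens) - n + 1)
    )


print(count_pattern(tokens, "out"))  # 2 (filled out the forms, fill out our form)
print(count_pattern(tokens, "in"))   # 1 (fills in a form)
```

Unlike an exact-string search, this pattern counts filled out the forms and fill out our form as instances of the same expression.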

1. Go to the COCA web page as described above.

2. Click on 'ENTER' visible on the landing page.

3. You are now on the page where searches can be made. In the search window in the top left corner, insert the following search string and press the search button:

[fill].[v*] in [a*] [form]

4. On the results display in the top right corner, we learn that the phrase fill in a form, in all its variants, occurs 10 times in the whole corpus.

5. Make a new search, this time with the following search string:

[fill].[v*] out [a*] [form]

6. On the results display, we are told that the phrase fill out a form, in all its variants, occurs as many as 222 times in the whole corpus.

Like the simple search, the advanced search, which uses a special search syntax, indicates that fill out a form is considerably more frequent in American English than fill in a form.
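To compare the two searches more directly, the counts reported above can be turned into a ratio and a percentage share. The Python sketch below does this arithmetic; the helper name `compare` is hypothetical and introduced only for illustration.

```python
def compare(preferred: int, other: int) -> tuple[float, float]:
    """Return (ratio of preferred to other, share of preferred in percent)."""
    ratio = preferred / other
    share = 100 * preferred / (preferred + other)
    return round(ratio, 1), round(share, 1)


# Counts reported above for COCA: fill out a form vs fill in a form
print(compare(43, 2))    # simple search:   (21.5, 95.6)
print(compare(222, 10))  # advanced search: (22.2, 95.7)
```

Both searches point to a ratio of roughly 20:1 in favour of fill out. The advanced search mainly makes the estimate more robust, since a ratio based on only two occurrences of fill in a form is fragile.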

Example 2: Should I write 'the government has' or 'the government have' made a decision?

This example is based on a situation where we want to report that a government has made a certain decision. The word government is a special kind of noun normally referred to as a collective noun. This type of noun refers to a group, typically of people, hence the name collective. Other examples of collective nouns are crowd, team and audience.

In English, collective nouns can be treated as either singular or plural. This means that we can use either a singular or a plural verb form with them (see the section on subject-verb agreement in the grammar section of AWELU). For example, we can write the government has decided to... or the government have decided to...

In this example, we will use both the Corpus of Contemporary American English (COCA) and the British National Corpus (BNC) to find out which version is used more frequently. Both corpora can be accessed on this website:

We can start by searching for the two alternative phrases in American English. For this we will use COCA.

Simple searches for verbatim phrases in COCA

1. Click on the COCA link; this will take you to the COCA web page.

2. Click on 'ENTER' visible on the landing page.

3. You are now on the page where searches can be made. In the search window in the top left corner, insert the following search string and press the search button:

the government has

4. On the results display in the top right corner, we learn that the exact phrase the government has occurs 2,431 times in the whole American English corpus.

5. Make a new search, this time with the following search string:

the government have

6. On the results display, we are told that the exact phrase the government have occurs 121 times in the whole American English corpus.

The tentative conclusion is that the construction the government has is much more common in American English than the government have. The ratio is approximately 20:1.

Let us now see whether this is also the case in British English. We will do a search in the BNC.

Simple searches for verbatim phrases in the BNC

Go back to the following page:

1. Click on the British National Corpus link (BYU-BNC: British National Corpus); this will take you to the BNC web page.

2. Click on 'ENTER' on the page that appears.

3. You are now on the page where searches can be made. In the search window in the top left corner, insert the following search string and press the search button:

the government has

4. On the results display in the top right corner, we learn that the exact phrase the government has occurs 973 times in the whole British English corpus.

5. Make a new search, this time with the following search string:

the government have

6. On the results display, we are told that the exact phrase the government have occurs 448 times in the whole British English corpus.

The tentative conclusion this time is that the construction the government has is more common in British English than the government have, with an approximate ratio of 2:1.

Overall, we may conclude that a singular verb form with the collective noun government is preferred in both American and British English, as evidenced by the data from COCA and the BNC. However, the preference is much stronger in American English (roughly 20:1) than in British English (roughly 2:1).
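The ratios quoted above can be checked with a few lines of Python using the reported counts. The sketch also expresses each result as the percentage of singular verb forms. Since the two corpora differ in size, their raw counts are not directly comparable, which is why the ratios within each corpus are the more useful figures.

```python
# Exact-phrase counts reported above for "the government has/have"
counts = {
    "COCA": {"has": 2431, "have": 121},
    "BNC": {"has": 973, "have": 448},
}

results = {}
for corpus, c in counts.items():
    ratio = c["has"] / c["have"]
    singular_share = 100 * c["has"] / (c["has"] + c["have"])
    results[corpus] = (round(ratio, 1), round(singular_share))
    print(f"{corpus}: {ratio:.1f}:1, {singular_share:.0f}% singular")

# COCA: 20.1:1, 95% singular
# BNC: 2.2:1, 68% singular
```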