CRAL
Centre for Research in Applied Linguistics

Teenage Health Freak Corpus: Overview

The Teenage Health Freak (THF) corpus comprises messages sent via the 'Ask Doctor Ann' facility on the Teenage Health Freak website. This facility enables users to submit health questions anonymously to the online GP persona, Doctor Ann. When submitting a message, users can optionally specify their gender and age, each by selecting a value from a drop-down list. Because this information is not compulsory, messages can also be submitted with no age or gender specified. Below is a detailed overview of the corpus, broken down by the gender and age of the sender and by the year the message was sent.


Full Corpus

Word and Message Counts in the THF Corpus

No. of Messages  No. of Words  Median Length
113,480          2,217,919     10

Breakdown by Gender

Percentage of words by Gender

Full Details of the THF Corpus by Gender
Gender       No. of Messages  No. of Words  Median Length
Male         41,830           667,277       8
Female       59,884           1,442,784     13
Unspecified  11,766           107,858
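
The percentage of words contributed by each gender group can be derived directly from the counts in the table above. The following is a minimal sketch in Python; the names `GENDER_COUNTS`, `word_shares` and `mean_lengths` are illustrative and not part of the corpus or any CRAL tooling. The mean words per message is a derived figure and should not be confused with the median length reported in the table.

```python
# Counts copied from the "by Gender" table above.
GENDER_COUNTS = {
    "Male": {"messages": 41_830, "words": 667_277},
    "Female": {"messages": 59_884, "words": 1_442_784},
    "Unspecified": {"messages": 11_766, "words": 107_858},
}


def word_shares(counts):
    """Return each group's percentage share of the total word count."""
    total_words = sum(group["words"] for group in counts.values())
    return {
        name: 100 * group["words"] / total_words
        for name, group in counts.items()
    }


def mean_lengths(counts):
    """Return mean words per message (derived; not the reported median)."""
    return {
        name: group["words"] / group["messages"]
        for name, group in counts.items()
    }


shares = word_shares(GENDER_COUNTS)
means = mean_lengths(GENDER_COUNTS)

for name in GENDER_COUNTS:
    print(
        f"{name}: {shares[name]:.1f}% of words, "
        f"{means[name]:.1f} words per message on average"
    )
```

On these figures, messages marked Female account for roughly two thirds of all words in the corpus. The derived means come out higher than the reported medians for Male and Female, which would be consistent with a minority of messages being much longer than the typical one.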

Breakdown by Age

Number of words by Age

Full Details of the THF Corpus by Age
Age            No. of Messages  No. of Words  Median Length
10 or younger  5,237            70,528
11             4,803            81,355        10
12             12,244           204,598
13             21,354           381,468
14             21,304           425,504       10
15             13,613           340,534       13
16             8,792            258,696       16
17             13,021           313,046       11
Unspecified    13,112           142,190       6

Breakdown by Year

Number of Words by Year

Full Details of the THF Corpus by Year

Year  No. of Messages  No. of Words  Median Length
2004  24,622           502,793       10
2005  24,809           502,666       10
2006  21,362           410,927       10
2007  20,573           393,096
2008  13,518           246,807
2009  8,596            161,630
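
As a consistency check, the message and word counts in each breakdown should add up to the full-corpus totals. The sketch below, in Python, copies the counts from the tables above and compares the sums; the names `BREAKDOWNS`, `totals` and `check_breakdowns` are illustrative only. With the figures as given, all three breakdowns (gender, age and year) sum to 113,480 messages and 2,217,919 words.

```python
# Full-corpus totals from the "Word and Message Counts" table.
TOTAL_MESSAGES = 113_480
TOTAL_WORDS = 2_217_919

# (messages, words) pairs copied row by row from the breakdown tables.
BREAKDOWNS = {
    "gender": [
        (41_830, 667_277),    # Male
        (59_884, 1_442_784),  # Female
        (11_766, 107_858),    # Unspecified
    ],
    "age": [
        (5_237, 70_528),      # 10 or younger
        (4_803, 81_355),      # 11
        (12_244, 204_598),    # 12
        (21_354, 381_468),    # 13
        (21_304, 425_504),    # 14
        (13_613, 340_534),    # 15
        (8_792, 258_696),     # 16
        (13_021, 313_046),    # 17
        (13_112, 142_190),    # Unspecified
    ],
    "year": [
        (24_622, 502_793),    # 2004
        (24_809, 502_666),    # 2005
        (21_362, 410_927),    # 2006
        (20_573, 393_096),    # 2007
        (13_518, 246_807),    # 2008
        (8_596, 161_630),     # 2009
    ],
}


def totals(rows):
    """Sum the message and word columns of one breakdown."""
    messages = sum(m for m, _ in rows)
    words = sum(w for _, w in rows)
    return messages, words


def check_breakdowns(breakdowns, expected):
    """Map each breakdown name to True if its sums match the expected totals."""
    return {name: totals(rows) == expected for name, rows in breakdowns.items()}


results = check_breakdowns(BREAKDOWNS, (TOTAL_MESSAGES, TOTAL_WORDS))
for name, ok in results.items():
    print(f"{name}: {'matches' if ok else 'does not match'} the corpus totals")
```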



Centre for Research in Applied Linguistics

The University of Nottingham
Nottingham
NG7 2RD

telephone: +44 (0) 115 951 5900
fax: +44 (0) 115 951 5924
email: cral@nottingham.ac.uk