CRAL
Centre for Research in Applied Linguistics

Corpus Linguistics Workshop Spring Opening Event: Talk by Andrew Kehoe

Date(s)
Friday 20th February 2015 (15:30-17:00)
Description
Corpus Linguistics Workshop

The Corpus Linguistics Workshop is pleased to host a special event with Dr Andrew Kehoe to open the spring term.

The event will take place on Friday 20th February in Trent A35 from 3:30 pm and will be followed by a small wine reception. If you would like to attend, please let us know by emailing Lorenzo Mastropierro or Viola Wiegand.

Andrew is Director of the Research & Development Unit for English Studies (RDUES) at Birmingham City University. In recent years, the RDUES team has developed the WebCorp suite of online search tools for linguistic study and the eMargin collaborative text annotation system. His research interests span all aspects of corpus linguistics, including the development of software tools for identifying and visualising language change over time and the use of the web as a source of natural language data. The title of his talk is “Reader comments on online news articles: a corpus-based analysis”.

 

Abstract:

Reader comments on online news articles: a corpus-based analysis

Andrew Kehoe, Birmingham City University

Launched in March 2006, ‘Comment is Free’ is a section on The Guardian website where non-journalists can, by invitation, write a blog post on any subject of their choosing (http://www.theguardian.com/commentisfree/). Readers are encouraged to comment on these blog posts and take part in discussions, with some posts generating more than 1,000 comments. A fortnight after the launch of Comment is Free, The Guardian began to allow reader comments on conventional news articles across all sections of its website. Hermida & Thurman (2008: 6) report that five other UK newspaper websites were allowing reader comments on news articles by the end of 2006. The integration of blogs and reader comments – so-called ‘user-generated content’ – across such websites has led to a blurring of the boundaries between opinion and hard news, and between professional and non-professional writing.

This paper presents a corpus linguistic analysis of comments across The Guardian website since their introduction in 2006, based on a corpus of over 500,000 articles and blog posts. The first half of the paper adopts a ‘keywords’ approach to explore the differences between Comment is Free and the other sections of the website, and whether these differences are becoming less pronounced over time.

The second half of the paper explores the distribution of reader comments across blog posts and articles (henceforth referred to collectively as ‘articles’). Our initial analysis suggests that comments are permitted on around 40% of articles and that, where commenting is permitted, the vast majority of articles (85%) have at least one comment. The Guardian’s commenting policy is rather vague, stating only that comments are not allowed on ‘stories about particularly divisive or emotional issues’ (http://www.guardian.co.uk/community-faqs). In this paper, we identify the sub-sections of the newspaper’s website where commenting is most prevalent and those where it is most likely to be banned outright. Taking the analysis further, we extract keywords to identify the specific topics most likely to generate debate, which often relate to politics, religion and social issues. We also identify specific words indicative of the particular styles of writing that encourage the most reader discussion.

Overall, this paper offers insights into changing newspaper practices and reader behaviour through lexical analyses of a large corpus of articles and comments. With the continued growth of user-generated content, the work is of potential interest across disciplines. From a practical perspective, the work offers suggestions for the refinement of automated spam detection and moderation procedures.

References
Hermida, A. & N. Thurman (2008) ‘A clash of cultures: the integration of user-generated content within professional journalistic frameworks at British newspaper websites’. Journalism Practice 2(3), 343-356.

 

Centre for Research in Applied Linguistics

The University of Nottingham
Nottingham
NG7 2RD

telephone: +44 (0) 115 951 5900
fax: +44 (0) 115 951 5924
email: cral@nottingham.ac.uk