CRAL
Centre for Research in Applied Linguistics

Second language speech fluency (SLSF) and the role of pauses in automatically extracted multi-word units (MWUs)

Funded by the Engineering and Physical Sciences Research Council (EPSRC grant number EP/C548191/1).

Project details

This is a collaborative project at the University of Nottingham between the School of English Studies/CRAL and the School of Computer Science/MRL.

Grant Period:

October 2005 – September 2008

Project Overview

The SLSF project explored the role of pauses in automatically extracted multi-word units (MWUs) in the spontaneous speech of native speakers of English and of learners of English. It sought to explore an empirical approach to the study of multi-word units in spoken corpora and their relationship to second language speech fluency.

The study was based on psycholinguistic theory concerning the characteristics of holistic storage of multi-word units. Holistic storage cannot be measured directly. However, it has been proposed that prosodic cues and pauses are indirect indicators of prefabricated language and holistic storage, as MWUs in speech exhibit more phonological coherence.

Objectives

Our project aimed to explore the feasibility of an interdisciplinary approach and sought to enhance our understanding of:

  • Second language speech fluency
  • The difference between native and non-native speaker use of MWUs
  • The juncture profiles of automatically extracted MWUs and thus the overlap between the psycholinguistic conceptualisation of MWUs and their statistically-based extraction

Methodology

The research combined theory and methodology from the areas of psycholinguistics and computational linguistics within a demonstrator project of MWUs in two existing native speaker and non-native speaker corpora of English, investigating the placement of pauses within and around such units.
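As a rough illustration of what investigating pause placement within and around MWUs might involve, the following Python sketch counts how often a pause occurs immediately before, inside, or immediately after each occurrence of a given unit in a tokenised transcript. It is a minimal sketch and not the project's actual tooling: the pause marker "<p>", the function name pause_profile, and the sample utterance are all assumptions made for this example, and the MWU is simply given rather than extracted statistically.

```python
from collections import Counter

PAUSE = "<p>"  # hypothetical pause marker; real corpora use their own annotation


def pause_profile(tokens, mwu):
    """Count where pauses fall relative to each occurrence of `mwu`.

    Returns a Counter with the keys 'occurrences', 'before', 'within'
    and 'after'. Pauses are ignored when matching the MWU, so a unit
    interrupted by a pause is still found and counted under 'within'.
    """
    mwu = list(mwu)
    profile = Counter()
    # Positions of the word tokens only (pause markers excluded).
    word_idx = [i for i, t in enumerate(tokens) if t != PAUSE]
    words = [tokens[i] for i in word_idx]
    n = len(mwu)
    for k in range(len(words) - n + 1):
        if words[k:k + n] != mwu:
            continue
        # Map the match back to positions in the full token stream.
        start, end = word_idx[k], word_idx[k + n - 1]
        profile["occurrences"] += 1
        if start > 0 and tokens[start - 1] == PAUSE:
            profile["before"] += 1
        if PAUSE in tokens[start:end + 1]:
            profile["within"] += 1
        if end + 1 < len(tokens) and tokens[end + 1] == PAUSE:
            profile["after"] += 1
    return profile


# Invented sample utterance, for illustration only.
sample = ["<p>", "i", "don't", "know", "<p>", "but", "i", "<p>", "don't", "know"]
print(dict(pause_profile(sample, ("i", "don't", "know"))))
```

Comparing such counts for the same unit across a native speaker corpus and a learner corpus would be one simple way to contrast juncture profiles, under the assumption that both corpora mark pauses consistently.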

MWUs

The aim was to gain a better understanding of the way in which different types of automatically extracted multi-word units are stored by native and non-native speakers.

Background information

       Read a description of the extraction methods used (pdf)

Materials/Results

Project Publications

  • Irina Dahlmann and Svenja Adolphs (2009). “Spoken Corpus Analysis: Multimodal Approaches to Language Description”. In: Baker, Paul (ed) Contemporary Approaches to Corpus Linguistics. London: Continuum Press.
  • Irina Dahlmann and Svenja Adolphs (2007). Pauses as an Indicator of Psycholinguistically Valid Multi-Word Expressions (MWEs)? In: Proceedings of the Workshop on A Broader Perspective on Multiword Expressions, at ACL 2007, 45th Annual Meeting of the Association for Computational Linguistics, Prague, 28 June 2007, 49-56.

Conference Presentations

  • Phoebe Min-sum Lin and Irina Dahlmann. “The role of prosodic features in the identification of formulaic sequences”. 41st BAAL Annual Meeting, Swansea, September 11-13, 2008.
  • Irina Dahlmann, Svenja Adolphs and Tom Rodden. “Patterns in learner English - Use and storage of formulaic language”. 15th World Congress of Applied Linguistics (AILA), Essen, Germany, August 24-29, 2008.
  • Irina Dahlmann. “Multi-modal spoken corpus analysis and its relevance for key issues in language description: the case of multi-word expressions”. Postgraduate Conference in Corpus Linguistics, Aston University, Birmingham, May 22, 2008.
  • Irina Dahlmann and Svenja Adolphs (2007b). “Designing multi-modal corpora to support the study of spoken language – a case study”. Poster presented at E-Social Science Conference, Ann Arbor, MI, USA, Oct 7-9, 2007.
  • Irina Dahlmann (2007b). “What do pauses tell us about learners’ use and storage of formulaic language? – Methodological challenges”. Paper delivered at EUROSLA, Interfaces in SLA research, Newcastle, Sept 11-14, 2007.
  • Svenja Adolphs, Irina Dahlmann and Tom Rodden (2007). “Multi-word units, fluency and pause annotation in spoken corpora”. Paper delivered at the 40th BAAL Annual Meeting, Edinburgh, Sept 6-8, 2007.
  • Irina Dahlmann (2007a). “How Big is Big Enough? Methodological Considerations on the Determination of Corpus Size for the Study of Frequent Multi-Word Units (MWUs) in Spoken Language”. Paper delivered at Corpus Linguistics, Birmingham, July 27-30, 2007.
  • Irina Dahlmann and Svenja Adolphs (2007a). Pauses as an Indicator of Psycholinguistically Valid Multi-Word Expressions (MWEs)? In: Proceedings of the Workshop on A Broader Perspective on Multiword Expressions, at ACL 2007, 45th Annual Meeting of the Association for Computational Linguistics, Prague, 28 June 2007, 49-56.
  • Svenja Adolphs, Irina Dahlmann and Tom Rodden (2006). “Investigating the use of pauses as an indicator of holistic storage of multi-word units in spoken learner language”. Paper delivered at the 39th joint Annual Meeting of BAAL and IRAL, Cork, Ireland, 7-9 September 2006.
  • Irina Dahlmann (2006). “Different Methods for the extraction of multi-word units (MWUs) and the effect on pause phenomena.” Paper delivered at the 3rd Inter-Varietal Applied Corpus Studies conference (IVACS), Nottingham, 23-24 June 2006.
  • Svenja Adolphs (2005). “Second Language Speech Fluency: The role of pauses in the automatic extraction of multi-word units.” Presentation held at Phraseology 2005, The Many Faces of Phraseology. Louvain-la-Neuve, Belgium, 13-15 October 2005.

Investigators

 


Centre for Research in Applied Linguistics

The University of Nottingham
Nottingham
NG7 2RD

telephone: +44 (0) 115 951 5900
fax: +44 (0) 115 951 5924
email: cral@nottingham.ac.uk