A Rich Domain of ELF – the ELFA Corpus of Academic Discourse

The academic field has long made wide use of English as a lingua franca, and is a good choice for an ELF corpus. It is useful to restrict the scope of exploratory research in one way or another, mode and domain offering themselves as clear and reasonably unproblematic limiters in language sampling. What makes the academic domain suitable for ELF research is that it is demanding, deeply influential, and largely constituted by its own specific genres. This paper discusses the compilation principles of the ELFA corpus (English as a Lingua Franca in Academic Settings) of spoken language, and outlines the research it has already produced and the work currently in progress with the data.


Introduction
Research on English as a lingua franca is still in its early stages. The situation has changed very fast over the last three or four years, and there is a notable difference between the time of writing this text and the turn of the millennium, when English as a lingua franca could boast only a handful of mainly small-scale exploratory studies (e.g. Firth 1996; Firth and Wagner 1997; Knapp 1987; Meierkord 1998). Jenkins's (2000) book-length study on linguistic features in ELF was a landmark, which has been followed by a few other studies. The field is nevertheless still very new, and as might be expected in an incipient research field, studies tend to assume an exploratory character and tackle a restricted domain in one way or another. Jenkins's seminal work on ELF phonology (2000) limited itself in terms of level of language, and other studies have restricted their scope in other ways, mostly to speaking in particular situations (e.g. Lesnyak 2004), or in terms of domain, such as Euro-English (Mollin 2006). Even the VOICE corpus (Breiteneder et al., this volume; Seidlhofer 2001), despite seeking to address ELF on a very broad front, restricts itself in terms of mode (spoken), region (mainly European) and domains, which, although broad and varied, are predetermined and do not cover arbitrarily anything that happens to come in handy. It obviously makes sense to start somewhere rather than everywhere at once in a new field of study, but if several research projects start from different angles, the field stands a good chance of making rapid progress; the more researchers get involved, the more interesting the field becomes.
ELF studies have largely been concerned with the spoken mode. Speech undoubtedly lends itself more readily to observing change than writing, which in its published form is heavily monitored and tends to be conservative. Writing has undergone major changes since the Internet revolution, and obviously the web is a fruitful source of all kinds of English in an unrestricted mixture; as ELF research develops, we can expect interesting results on new forms of writing. It is to be hoped that large Internet-based corpora will be available to us in the near future; when this is the case, ELF will certainly be making its mark on the enormous, varied whole which we call 'English', mixing happily with other uses and varieties to produce a global language unlike any seen before. While these interesting times are still ahead, it is wise to make more modest beginnings in corpus building, especially where resources are limited, and to suit the focus of our databases to our exploratory interests. The ELFA corpus has chosen academia as its specific domain, and it consists of only spoken data at the present stage.
Academia is one of the domains which have most eagerly adopted English as their common language in international communication. The development has been particularly fast since the Second World War, after which English has increasingly dominated research publishing. Although academic mobility and the existence of an academic lingua franca are not new phenomena, the present scale of mobility and the global rule of English, which has spread to degree programmes in non-English-speaking countries, are unprecedented (see Mauranen in press b). The worldwide demand for learning English for academic purposes has not passed unnoticed in the linguistic professions; it has resulted in a large teaching business, and in its wake a burgeoning research field which has developed far beyond the needs of immediate applications: much of the published research is descriptive, historical or primarily theoretical. Most of the work that has been done on academic English has been concerned with written discourse, but a distinct change has taken place since the compilation of the MICASE corpus (www.hti.umich.edu/m/micase/) was begun in 1997 at the University of Michigan, and publications and presentations started appearing from this database. Simultaneously with the MICASE corpus, another American corpus project, the T2K-SWAL in Northern Arizona, began to collect both spoken and written university discourses, and in their wake, the BASE corpus (www2.warwick.ac.uk/fac/soc/celte/research/base/) was compiled in the UK, to provide a British point of comparison. The existence of these corpora, and the general accessibility of MICASE in particular, has stimulated a great amount of research into the intricacies of spoken academic English.
All three above-mentioned corpora of academic speaking are essentially based on native speakers, just like general reference corpora of English, such as the Bank of English or the BNC. This is clearly a limitation to a general understanding of what English actually is like in the academic world today. Although we have no reliable estimates at hand on what the proportions of non-native and native English speaking might be, we might surmise that non-natives are likely to outnumber native users, as is so clearly the case in the world at large. And be the numbers what they may, it is in the interests of the research world to encourage international discussion, dissemination and exchange of knowledge, findings and criticism as widely as possible. Thus, if the goal of investigating academic English is to understand its use in today's world, ELF must be one of the central concerns in this line of research. It is to meet this need that the ELFA corpus has been compiled.
The ELFA corpus serves a two-way purpose: on the one hand, it helps us understand how academic discourses work at a time when so much of teaching and research is carried out in different countries using English as a lingua franca. On the other hand, the corpus offers a clearly delimited database of ELF in situations which are linguistically and intellectually demanding, and which therefore go well beyond simple routines or rudimentary exchanges. This paper gives a brief description of the nature and structure of the ELFA corpus, and outlines the research that has been done with it up to now, finally taking a look at the ways in which it is planned to develop.

Why study spoken academic ELF?
As any linguist will agree, speech is central to the study of language. It is also generally accepted that it plays a central role in language change, because speakers influence each other's language use in face-to-face interaction. The strong tendency of speakers to cooperate puts pressures on speakers to adapt to each other's ways of speaking. Such negotiation of both meaning and form is largely lost in writing, particularly in published varieties, where a chain of gatekeepers will iron out a good deal of unconventional and non-standard forms. Norms of the standard language carry a lot of weight in academic writing.
For lingua franca research, then, the adaptive mechanisms are best seen in interaction; it is therefore not surprising that spoken discourses have occupied centre stage in ELF studies, and this choice seems equally natural as the basis of the first ELF corpora. Why, then, academic speaking? As already pointed out above, a domain-specific beginning is a sensible solution for a database which is to answer exploratory questions. From this point of view, academia is as good a choice as any. But we can also argue that it is a particularly good choice.
First of all, academic language is influential and on the whole enjoys high social prestige. Its influence on society as a whole does not emanate from direct contact with all corners of society as much as through the indirect influence of university education. A considerable proportion of influential people in many societies acquire a university education; typical groups are media journalists, economic experts, teachers and politicians. As larger age cohorts participate in tertiary education, the position of university styles and registers in society grows even stronger, while at the same time such trends may accelerate changes in these registers as a consequence. Academic language norms also exert a strong influence on standard varieties, which even tend to be labelled as 'educated' varieties. Universities thus transmit a fair proportion of language norms.
Academic discourses are also comparatively demanding for participants; they require the simultaneous handling of high-level intellectual content and real-time speaking. They provide more sophisticated data than more stereotypical interaction in, say, routine sales transactions or typical tourist encounters, and show notable variation in degrees of formality and familiarity. In this way, they offer more interesting and rewarding material for research than simpler exchanges.
A third point in favour of academic discourses is that, as already mentioned above, they are inherently international, and have a long history of employing lingua francas for the needs of research communication within the international research community. Since research discourses do not belong to any national community alone, they need not follow the norms of a particular national language very closely either, even if they adopt the language for vehicular purposes. Internationalism and academic mobility at all levels, which were fundamental properties of mediaeval European scholarship, are on the increase again, and in addition to research publications, spoken university discourses make use of English worldwide, including countries where English has no official status (Kachru's 'expanding circle'; see Kachru 1985).
Finally, academic communities have their own particular genres, which to a large extent constitute the communities as a set of discourse communities or communities of practice. The community's discourses, its ways of speaking, serve a gatekeeping function, and need to be acquired by novices before they can regard themselves as full members. At the same time, shared discourses contribute to the cohesion of the community and mark its identity. The genres and rhetoric of the discourse communities that we participate in need to be acquired by all novices, and from this perspective we could argue that there are no native speakers of academic English: the English of academic genres is a new use to all its practitioners at the beginning.
Domain-specific linguistic research has great potential in throwing light on the discourse communities and their practices, as for example Swales (1998) has shown in his insightful study of three very different university departments. We can see academia as a discourse community or a community of practice. From either perspective, making sense of its practices requires attention to the speaking that constructs and maintains the institution and its structures. Academic institutions depend crucially on spoken discourses, such as conferences, lectures, seminars, financial negotiations, and faculty meetings, which structure the practices, thereby constituting the institutions themselves in a Giddensian sense (see Giddens 1984). Talk is also the chief mode for socialising new generations into academia and beyond, through seminars, lectures, supervision, consultations and so on.
The international academic culture is a global subculture which is a cultural hybrid, and its English is the language of an 'interculture', not that of one or a few national cultural formations.

The corpus
The ELFA corpus (English as a Lingua Franca in Academic Settings; www.uta.fi/laitokset/kielet/engf/research/elfa/) has reached its initial target for size, and stands now at 0.7 million transcribed words. It has been compiled mainly at the University of Tampere, as the original idea was to reflect the discourses of one university, along the lines of MICASE or T2K-SWAL. The one-university approach was to be supplemented with technological data, because Tampere does not have a science faculty. The unit of one university was thus expanded somewhat from the start, and the corpus also includes events recorded at the Tampere University of Technology and the Helsinki University of Technology. At present, other disciplinary domains, particularly sciences (physical, chemical, biosciences, forestry), are being added in order to make a more well-rounded whole in terms of covering a wide variety of disciplines. This data is being compiled at the University of Helsinki.
All of these universities offer a number of degree programmes run entirely in English. They are available for international as well as Finnish students, and the students come from a wide variety of countries, although most are European. Many among the teaching staff also come from abroad. The majority of the programmes operate at the master's level, but international doctoral programmes are also on the increase, and included in the corpus. The recordings do not cover undergraduate programmes because these are not normally run in English. As this is a corpus of ELF, it is important that English is in the position of a vehicular language in all events included in the data. This means that it is not the object of study and therefore no language classes have been recorded.
As a general principle, all data in the ELFA corpus is authentic in the sense that it is not elicited for research purposes but has been recorded in natural situations, where the speakers are engaged in activities of their own concern. The speech events are 'complete' in that the individual sessions have been recorded in their entire duration, without truncating them or sampling mere extracts. Clearly, academic events are often heavily interlinked, so that one might question the idea of a 'whole' or 'complete' session in the case of one in a series of lectures or seminars, but most recordings concern single events even though they might be one in a series. On occasion, two seminar sessions of the same series have been recorded at different times; the participants in these cases are the same, but presenter roles shift, and the familiarity of the seminar group members with each other changes as well. Variation along the familiarity parameter can be captured by consulting the recording dates: the point of the term at which recordings have been made is a fairly reliable indicator of how long the group has been together.
Compilation criteria have been 'external' throughout, that is, they are not determined on the basis of linguistic register features, but by socially based definitions of the prominent genres in the discourse community. This has meant compiling the corpus mainly on the basis of 'folk genres', i.e. the distinctions and labels that the university community uses of its own discourses and genres. At the same time, the aim has been to cover as many different kinds of discourses carried out in English as possible, focusing on those which are regarded as prototypical, and shared and named by many disciplines. As a result, the speech events cover discourse types such as lectures, seminars, thesis defences, and conference presentations.
The basic unit of sampling is the 'speech event type', in which we follow MICASE. This is a looser term than 'genre', and therefore preferable for the present purpose, because the discourses represent a variety of events, some of which are much more firmly established as genres (e.g. lectures) than others (e.g. workshops or panels). Thus, many of the widely recognised event types are indeed genres, but 'event type' is used as a cover term. Another important sampling criterion is discipline, where balance is an important consideration. As already mentioned, the University of Tampere does not have a science faculty, which prompted a search for science and technology data, because a picture of academic ELF without the hard sciences would be deficient; it is the hard sciences that have been the most eager to adopt English as their lingua franca. Disciplines can be considered at different levels: broad disciplinary domain ('arts', 'technology'), a single discipline ('political history', 'electrical engineering'), and subdisciplines ('organic chemistry', 'educational psychology'). Experience from corpus work, including ELFA, shows that the most useful level is the highest one, disciplinary domain, and although we code lower levels in the file headers, it is the broad domain that best lends itself to comparisons; otherwise the search results remain too meagre (see Mauranen 2006c). The broad domains also give a clear picture of the overall balance of the corpus.
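The point about disciplinary levels can be illustrated with a minimal sketch. It assumes, purely for illustration, that each search hit carries the header codes of the file it came from as a dictionary with the hypothetical keys `domain` and `discipline`; these field names and the sample hits are invented here and do not reproduce the actual ELFA header format or any corpus results.

```python
from collections import Counter

# Invented search hits; "domain" and "discipline" are hypothetical
# header fields, not the actual ELFA coding scheme.
hits = [
    {"domain": "technology", "discipline": "electrical engineering"},
    {"domain": "technology", "discipline": "information technology"},
    {"domain": "technology", "discipline": "automation engineering"},
    {"domain": "arts", "discipline": "political history"},
    {"domain": "arts", "discipline": "translation studies"},
]


def count_by(hits, level):
    """Count search hits grouped by one level of the header coding."""
    return Counter(hit[level] for hit in hits)


by_domain = count_by(hits, "domain")
by_discipline = count_by(hits, "discipline")

print(by_domain)       # hits pooled at the broad-domain level
print(by_discipline)   # the same hits spread thinly over single disciplines
```

With so few hits per single discipline, comparisons at that level rest on one or two occurrences each, whereas pooling by broad domain gives groups large enough to compare.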
Balanced coverage has also been sought in other, essentially socially based parameters like the participants' relative social position: both symmetrical relations (as in conference presentations or student groups) and asymmetrical relations (as in a lecture, seminar session or thesis defence) have been included.
The main selection criteria for event types are related to their perceived importance in one way or another: (1) typicality, or the extent to which event types or genres are shared and named by many disciplines, for example lectures, seminars, thesis defences, conference presentations; (2) influence: genres that affect a large number of participants, for example introductory lecture courses; (3) prestige: genres with high status in the discourse community, such as guest lectures or plenary conference presentations.
An important consideration is the speaker's mother tongue. Any events where all speakers share a first language have been excluded. This has not normally presented a problem, since in situations where everyone is a speaker of Finnish, participants have simply switched to Finnish even if the course or event has originally been intended for an international group. Native speakers of English have not been excluded from the data, although their role has been minimal, because the primary objective is to discover patterns of language use among non-natives. Native speakers have therefore never been recorded in monologues, such as lectures or conference presentations, or in dialogues where their role would have been dominant, such as thesis defences (which in Finland are public occasions lasting at least two hours). In polylogic situations native speakers have not been avoided; thus they appear as participants in multi-party discussions, where their presence is coded, so that it is possible to exclude their usage from the analyses when necessary.
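As a rough illustration of how such coding allows native-speaker usage to be left out of an analysis, the sketch below filters utterances by a speaker flag before counting words. The `Utterance` structure, the `native_speaker` field and the sample utterances are all invented for the example; they are assumptions and do not represent the actual ELFA transcription format or data.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker_id: str
    native_speaker: bool  # hypothetical flag standing in for the speaker coding
    text: str


def word_counts(utterances, include_native=True):
    """Count word forms, optionally skipping native speakers' turns."""
    counts = Counter()
    for utt in utterances:
        if not include_native and utt.native_speaker:
            continue
        counts.update(utt.text.lower().split())
    return counts


# Invented multi-party discussion, not corpus data.
sample = [
    Utterance("S1", False, "so we can discuss about this"),
    Utterance("NS2", True, "we can discuss this later"),
    Utterance("S3", False, "yes we discuss about it"),
]

all_counts = word_counts(sample)
elf_counts = word_counts(sample, include_native=False)

print(all_counts["discuss"], elf_counts["discuss"])
```

Keeping the flag on each utterance, rather than removing native speakers' turns from the transcript, leaves the interaction intact for discourse-level analysis while still allowing counts restricted to non-native usage.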
Despite an overall orientation to external compilation criteria, one language-internal category distinction was utilised in sampling: the distinction between monologic and dialogic speech, that is, whether there are one or more active participants. While both kinds are included, the emphasis is on dialogic events, or polylogic, to be precise. In all, about two thirds of the data are from dialogic, multi-party discussions. There is thus a clear bias, which is intentional, because it is in dialogic interaction that language primarily and most naturally gets negotiated.
The transcription is broad, with spelling normalised to Standard (British) English as far as possible, to facilitate computer searches. To offset this, the sound files will be made available to researchers who wish to consult them along with the corpus. Basic background information, such as context and speaker age, gender, and nationality, is coded along with recording and transcription information.
Despite the similarities between ELFA and other corpora on academic speaking, it is worth keeping in mind that academic genres are highly context-dependent and surprisingly local. Even when the same labels ('seminar', 'lecture') are used, the interpretations are culture-bound and are differently positioned in relation to other genres in the same context (see, for example, Mauranen 1994). Moreover, the disciplinary selection and balance is unique to universities, and in contexts where English is not the main language of the university, the choices are narrowed further.

The speakers in ELFA
It is important to note that the speakers in the ELFA corpus are never learners of English, and none of the speech events have been recorded in situations where English is the object of study. The ELFA speakers all have an educational background which includes formal learning of English, and they all have subsequent experience of speaking English. They are all well educated in that the majority has at least one university degree already.
The language backgrounds vary widely, with forty first languages represented, although Finnish L1 speakers are proportionally especially well represented. The diverse first languages can be expected to make their mark on speakers' idiolects in the general manner of what is commonly termed 'interference'. These idiolects in turn can roughly be grouped according to the speakers' first language, and the groupings, which sometimes get humorous folk names such as 'Swenglish' for the English of Swedish speakers, 'Finglish' for Finns, or 'Dunglish' for the Dutch, can be likened to dialects in that they bear perceptible group affinities. There is nevertheless one important difference between such 'dialects' and dialects as normally thought of, namely varieties spoken by regionally defined speech communities: these L1-based usage groups do not evolve in the normal internal use of a speech community. Their similarities are shared linguistic features among speakers who do not use this language to communicate with each other. Dialect (or even language) boundaries are often drawn for social and political reasons, and not on the basis of linguistic features alone, and in the case of ELF we should also be wary of trying to determine speaker groups on the basis of the latter only. The jocular labels applied to speaker groups for whom English is a foreign language refer to typical deviations from the norms of Standard English varieties, and some of the characteristic features are systematic enough to distinguish the groups, as corpus studies of learner language show (see e.g. Granger et al. 2002). The speakers can thus be grouped according to linguistic similarity, but the groups do not constitute speech communities.
ELF speakers' proficiency in English appears to vary a good deal, as we might expect: some are very fluent and deviate very little from Standard English when they speak, while others communicate more hesitantly and do not adhere to so many rules of received native speaker grammar. This is the situation in most ELF encounters, and in fact university contexts presumably have a narrower range of variation than many other contexts, given their specific demands.

Research on ELFA
The ELFA research group has begun to use parts of the corpus even before the whole is finished. The first publications were concerned with research perspectives on the ELFA corpus (Mauranen 2003), and with promoting an ELF perspective as an alternative to traditional SLA research in order to understand what speakers in fact do with a foreign language when they use it in real life (Mauranen 2006a). In addition, background research on student and teacher attitudes towards ELF (Ranta 2004) as well as features of informal student conversations (Lappalainen 2001) and the presence of other than British or American accents in textbooks (Kivistö 2005) was carried out as master's theses. One of the first studies actually based on the ELFA data was concerned with the occurrence and prevention of misunderstandings among the speakers (Mauranen 2006b). The findings showed that ELF speakers quite successfully managed to prevent linguistic misunderstandings, apparently by resorting to explicitation strategies, repetitions, and a number of collaborative tactics. A variety of other forms of explicitation manifest in the corpus data have since been attested (Mauranen in press b).
More specific attention to the linguistic aspects of ELFA has been enabled by the accumulating corpus data. One line of research focuses on syntactic features in ELF, linking the findings with similar and divergent uses in other kinds of English (Ranta in this volume) and questioning received wisdom, for example on the use of -ing forms by second language speakers. Another finding is that phraseological units are widely used by ELF speakers (Mauranen 2005b, 2006c), despite common conceptions that they tend to be absent or incorrect. Their use seems to be largely conventional but also creative in very similar ways to native speakers, which points to an essential similarity in the processing mechanisms, contrary to Wray's (2002) suggestion, which is essentially based on findings from classroom learning. The creativity nevertheless manifests itself in some novel functions of phraseological units, while other functions are ignored or downplayed.
Investigations into discourse features are also providing interesting evidence on similarities and differences in comparison to native speakers: expressions of vagueness, which have been alternately seen as a problem of underuse or overuse among non-natives, are employed quite appropriately in ELF discourse (Metsä-Ketelä in this volume), although they also show preferences for functions which are minor or nonexistent in native speech. Discourse reflexivity, or metadiscourse, is also present in ELF use, but preferences for certain expressions are not identical to those of native speakers (Mauranen 2005b). Disciplinary domain is a powerful factor in determining academic language practices even when obvious things like terminology are excluded, and again ELF speakers show preferences specific to their own use across first language boundaries (Mauranen 2006c). Discourse organisation as a rhetorical device shows mainly similarities between native and ELF lecturers, which may point to the strength of genre conventions over language barriers (Mauranen in press a), and the same is true of organising arguments in spontaneous, dialogic situations (Mauranen 2005a).
The research on ELFA has reached a very active phase; there are findings awaiting publication and new hypotheses based on the commonalities across the results up to the present. On the whole, ELFA findings lend support to the perception that lingua franca English has its own specific characteristics. At the same time, many affinities to other kinds of nonstandard English, both native and non-native, are clearly emerging as well.

Prospects
As has been discussed in this paper and as the results from the ELFA project so far show, research on the ELFA corpus can help discover linguistic features of complex language contact as well as understand mechanisms of language change and describe variation in contemporary English.It can also help understand situated foreign language use in the real world, outside the confines of the classroom and the demeaning construction of all second-language speakers as 'learners'.
The project continues to investigate ELF language with university discourses in focus, seeking answers to the question of how the multilingual and multicultural settings shape and change the language.
An applicational offshoot of the ELFA project is its daughter project SELF (Studying in English as a Lingua Franca) at the University of Helsinki, which aims to produce useful research-based recommendations for practitioners. This is intended to benefit both students and staff involved in international study programmes: how can we help participants avoid and overcome commonly occurring communication problems and ensure the quality of teaching, research and study in English? In this project the cooperation of instructors in English for Academic Purposes at Helsinki is crucial, and has started smoothly on account of common interests.
At this stage our main focus is on spoken discourse. However, we have also begun to collect written data, largely in connection with project SELF: at the outset, it will consist of reports, essays and term papers in master's programmes.
The general, descriptive research perspective will remain in the foreground for the ELFA team: as the database is large enough for meaningful research into many aspects of ELF, including syntax, phraseology, pragmatics and discourse, the research which is in progress is expected to yield new results within the next couple of years. The work has just begun, and we are looking forward to new discoveries from our own database as well as the other important ELF projects and individual research enterprises, which are in progress in many places and some of which are exemplified in the present issue.