The PolyU Language Bank

What is the PolyU Language Bank?

The PolyU Language Bank, developed in the Department of English at The Hong Kong Polytechnic University (PolyU), is a large archive of language corpora made up of a wide range of written and spoken texts totalling over 20.5 million words. Corpus searches can be performed using the Bank's built-in Web-based concordancer, which makes corpus resources easy to use in language teaching and research.
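For readers unfamiliar with concordancers, the following is a minimal sketch of a keyword-in-context (KWIC) search, the kind of display a concordancer typically produces. It is not the Bank's own implementation; the function name `kwic`, the `width` parameter and the sample text are all hypothetical and chosen only for illustration.

```python
import re


def kwic(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword.

    Hypothetical helper for illustration; not part of the PolyU Language Bank.
    """
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(text):
        # Take up to `width` characters of context on each side of the match.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Pad both contexts so the keyword lines up in a single column.
        lines.append(f"{left:>{width}} [{match.group(0)}] {right:<{width}}")
    return lines


# Invented sample text standing in for a corpus file.
sample = (
    "The bank approved the loan. A corpus is a bank of texts. "
    "She sat on the river bank."
)

for line in kwic(sample, "bank", width=20):
    print(line)
```

Each printed line shows one occurrence of the search word with its left and right context aligned, so that patterns of use (for example, "river bank" versus "bank of texts") can be compared at a glance.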

What is in the Bank?

The Bank contains both departmental corpora and external corpora. Departmental corpora are those compiled by staff and researchers in the Department of English as part of past or ongoing research projects. External corpora are standard and commercially available corpora acquired from external sources, such as the BNC Sampler and the Brown Corpus. Because of licensing agreements, the external corpora can be accessed only by internal users in the Department of English.

Written English texts form the bulk of the Bank's corpora, while a small number of texts in other languages (Chinese, Japanese, French) are included as components of larger collections of parallel or comparable corpora. Different disciplines and text types are represented, including academic, business, journalistic, legal and literary texts. Both native-speaker and learner data are available in the Bank, but native-speaker data predominate.

Why was the Bank created?

The Bank was created primarily to encourage the use of corpus resources among staff and students in the Department of English. This is seen as a way of incorporating new methods and information technology into the department's teaching and research activities. A secondary goal is to promote the sharing of existing collections of data that stem from different departmental research projects.

Uses of the Bank

Language teaching and learning

The corpora are useful as reference sources in language teaching and learning, as supplements to dictionaries and grammar books. They provide teachers with samples of empirical language data that can be included in or adapted for teaching materials (e.g., vocabulary lists and cloze tests). They also expose learners to authentic instances of language use, which can be particularly beneficial to those studying language for specific purposes (e.g., academic and business writing).

Linguistics and corpus linguistics research

The corpora are sources of raw data that can be used for research on various aspects of language and linguistics, including lexical semantics, syntax, morphology, grammar, pragmatics, discourse, and second language acquisition.

Translation and translation studies

Texts from the comparable or parallel corpora can be compared and studied in order to identify suitable translation equivalents of terms. The monolingual, domain-specific corpora can also be used to create glossaries of domain-specific terms, which can be of practical use in translation and language learning.