ISLE Summer School - 2016 details

Corpus-based approaches to the study of language (Martin Hilpert)

Martin Hilpert is an Assistant Professor of English Linguistics at the University of Neuchâtel. He holds a PhD from Rice University and did postdoctoral research at the International Computer Science Institute in Berkeley and at the Freiburg Institute for Advanced Studies. He is interested in cognitive linguistics, language change, construction grammar, and corpus linguistics.

This workshop is a gentle introduction to corpus linguistics that assumes no prior experience with it. Why should you learn how to use corpora in your research? The short answer is that corpus-linguistic tools are incredibly empowering: they allow you to ask (and answer) a broad range of research questions that you could not tackle otherwise. At the same time, a lot of current linguistic research is corpus-based, so you need to know about corpus methods to better understand what other linguists are doing.

The three days of the workshop will be structured as follows. Each day will consist of a two-hour presentation and two hours of hands-on exercises. Day 1 will be dedicated to a basic corpus-linguistic tool: concordancing software. We will examine what kinds of question can be answered through the simple retrieval of key words in context. You will also get to know a few tricks for effective search, including so-called regular expressions.
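
As a rough illustration of what "key words in context" retrieval with a regular expression looks like, here is a minimal sketch. It is written in Python purely for illustration; it is not the concordancing software used in the workshop, and the function name kwic, the sample text and the search pattern are all hypothetical.

```python
import re

def kwic(text, pattern, width=30):
    """Return one key-word-in-context line per regex match in text."""
    lines = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        # Take up to `width` characters of context on either side of the match.
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        # Right-align the left context so the key words line up in a column.
        lines.append(f"{left:>{width}} [{m.group(0)}] {right:<{width}}")
    return lines

# Hypothetical sample text standing in for a corpus.
sample = ("She runs every morning. He ran yesterday, and they are running now. "
          "Nobody has run this far before.")

# One pattern retrieves several word forms: run, runs, running, ran.
for line in kwic(sample, r"\b(?:run|ran)\w*\b", width=20):
    print(line)
```

A single pattern here finds four different forms of the same verb, which is the kind of search that plain string matching cannot express in one query.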

Days 2 and 3 will examine 'what concordance programs can't do'. For a considerable period of time, concordance programs have defined the scope of corpus-linguistic work: whatever the software could do, researchers could do; whatever it couldn't, researchers couldn't. Recent years have seen a very liberating development: instead of using ready-made programs, more and more corpus linguists turn to tools that they can flexibly adapt. One such tool that I will present is R, a software environment that not only allows you to perform corpus-linguistic tasks but can also be used for visualization and for the statistical analysis of linguistic data. On days 2 and 3, we will use R to answer a few basic linguistic questions that would be quite hard to investigate with a regular concordance program.

Participants should bring their own laptop computers. There is no need to pre-install software. We will do that directly in the workshop.


Experimental approaches to the study of language (Sible Andringa)

This workshop is a hands-on introduction to designing powerful linguistic experiments. The goal of this workshop is to help you set up and conduct your own empirical research and to learn to read and critically evaluate scientific research. In the morning sessions, all aspects of scientific research will be covered: how your research design may help or hinder you in providing adequate answers to your research questions, how to choose a method of observation/measurement (such as eye-tracking, lexical decision, etc.) and what exactly such methods measure, how to exercise control over possibly intervening variables, and how to report results. The afternoon sessions will allow you to practice designing experiments hands-on, either through focused assignments or by developing your own research ideas into full-blown experiments (or both). If you are planning an experimental study, then this workshop is for you. Participants should bring their own laptop computers.


Acoustic and statistical analysis of speech with Praat (Paul Boersma)

Paul Boersma is Professor of Phonetic Sciences at the University of Amsterdam. His research programme consists of figuring out, by computational modelling, how humans produce, comprehend, acquire and diachronically change their phonology and phonetics. He is also the main author of Praat, the world's most used computer program for speech research.

Day 1 of this workshop introduces Praat from a beginner's perspective, i.e. from the perspective of a beginner in Praat, not a beginner in phonetics, so some knowledge obtained in a first course in phonetics, for instance from Ladefoged's "Vowels and consonants", is assumed. You will probably learn something new even about phonetics on this day, though, so please don't skip it. You should bring your own laptop computer.

Day 2 discusses how you can work with corpora in Praat: making multiple measurements on your annotated data by using Praat scripts, running the right kinds of statistical analyses, and interfacing Praat with statistical software such as R.

On day 3 you are supposed to bring your own cases, which we will discuss on the fly. If you have data, we can look at them and propose lines of attack. If you have a research idea, we can look at it and propose an experiment design. You can also bring up any kind of technical question regarding Praat, statistics, and their interaction. You will probably learn helpful things from this day even if you don't present your own experiment or data, because of the many methodological issues that will be dealt with.


T. Florian Jaeger: "Progress in the study of language universals"

Languages across the world vary along many dimensions, but also share many commonalities. A common assumption, whether in generative or functional linguistics, is that at least some of the commonalities cannot be reduced to historical or geographic dependencies between languages. Rather, these cross-linguistic patterns (e.g., gradient or absolute "universals") are taken to be due to biases that originate in properties of the human brain. Much of the ongoing debate thus focuses on the specific nature of the biases: whether they are specific to linguistic systems and in this sense cognitively arbitrary, or whether they are shared with other cognitive systems and/or follow from general constraints on communicative systems.

Traditionally, research on the causes of linguistic universals, a question that has continued to intrigue scientists and the public alike, has primarily originated in linguistics departments. Increasingly, though, it has been researchers from other disciplines who have introduced novel methods and theories from other fields (statistics, biology, physics, information theory, psychology, etc.) to address this question. Some recent research, for example, has called into question whether there is any evidence for Greenbergian word order universals (Dunn et al., 2011, Nature; though see replies, including Croft et al., 2011, Linguistic Typology).

While any given example of this trend can be subject to criticism, these new types of studies raise both challenges and opportunities for linguistics: linguistics has much to offer to this endeavor; however, in order to participate in this research, linguistic scholars will also have to embrace this new world and acquire the training to partake in it.

I will give an overview of recent research in this vein, focusing on work from my lab. Specifically, I will present cross-linguistic computational studies and behavioral experiments that jointly suggest i) that there are universal biases affecting word order preferences, ii) that these biases originate in language processing, iii) that the biases are strong enough to cause learners to produce different outputs from the inputs they receive, and iv) that the effect of this cause can be observed cross-linguistically. Crucially, these biases do not predict absolute universals, but rather gradient relational patterns. The framework I present can capture both diversity and commonalities across languages, while making predictions about what types of languages one should be unlikely to observe.

With this work in mind, I hope to stimulate discussion of how it can be extended, corrected, improved, etc.


Alison Wray: "Can Linguists Predict who will get Alzheimer's Disease?"

What scope is there for linguists to assist in establishing the likelihood of someone developing Alzheimer's later in life? What would we look for? How could patterns of language be appropriately and reliably measured? If we found indicative patterns, what would we interpret them to signify? What would the ethical issues be of categorising people as at higher or lower risk of Alzheimer's many years before symptoms appeared?

We know quite a lot about the changes in language that occur in different types of dementia, including Alzheimer's. We know that some words are harder to retrieve than others – for example, one study found that natural objects were harder to name than manufactured ones; another found that verbs of cognition were more difficult to understand than verbs of motion. Research also found that people who later developed Alzheimer's already had certain patterns in their language in early adulthood that were different from those of people who didn't develop it. What does all this mean? What insights can linguistic theory give us about what exactly is being affected, when, and why?

There are many tantalising observations, but in fact, we don't always even know whether a linguistic pattern is part of the problem or part of the solution. Take formulaic language, for instance. Should we construe the use of vague fillers like 'something' and 'those people' as indicative of not knowing more information about the referents, or as an attempt to rescue a potential gap in the output arising from naming difficulties? When is the repetition of questions and statements a sign of having forgotten what was said previously, and when is it a success, in enabling the person to say something, when novel output is difficult to generate?

In this talk, we will get up close to some of the measures that are used in capturing patterns in Alzheimer's talk, and examine them with a linguist's eye. A primary focus will be two current studies that I am conducting with colleagues, to profile the language of people at greater or lesser genetic risk of future Alzheimer's. I'll reflect on the potential social risks of over-interpreting observations, and consider what linguistic patterns might really tell us about variation in the population.


Rens Bod: "Hierarchical versus Sequential Processing in Language: Towards a Linguistics without Structure?"

It is generally assumed that hierarchical phrase structure plays a central role in human language. However, recent computational and behavioral studies show that sequential sentence structure has considerable explanatory power and that hierarchical processing is often not involved. In this talk, which is partly based on joint work with Stefan Frank and Morten Christiansen, I review evidence supporting the hypothesis that sequential structure may be fundamental to language acquisition and use. I also outline a non-hierarchical model of language use, based on ideas from the Data-Oriented Parsing approach, and discuss its implications and predictions in modelling language acquisition and comprehension. Hierarchical structure need only be invoked in cases where simpler explanations, based on sequential structure, do not suffice.