Why study language?

Published: April 11, 2008
Tags: cogsci language linguistics psycholinguistics phd

I'm more or less settled in at the university now and working four days a week on my PhD. The room that houses my office has only recently had its renovations finished and isn't completely set up yet. It's also so far below ground level that there is absolutely zero mobile phone reception, which might be something of a pain, but which is also pretty hard to do anything about. Anyway, expect the entries in this blog to start revolving around what I've tentatively decided to call "computational psycholinguistics" in the near future. And in that vein...

Why study human language? Three reasons stand out in particular for me.

  1. Language is something that is reasonably tractable by mathematical and scientific methodology. A lot of what goes on in psychology verges, in my opinion, on being pseudo-scientific rubbish. Any study, for instance, which revolves around things like one's perception of oneself, or feelings or anything like that is immediately confronted with the fairly insurmountable problem that we can't even precisely define these things, let alone measure them or model them. We don't even properly understand conceptually simpler things, like memory, on which these grandiose ideas must surely depend. These psychologists are, metaphorically speaking, trying to fly to the moon before they've fully learned Newton's laws of motion.

    I think language is in a different situation. It's fairly easy to define what language, at its heart, is all about. We have two finite sets - one of words and one of concepts - and language is about mapping back and forward between finite sequences of words (more commonly known as "sentences" in the written case and, apparently, "utterances" in the spoken case) and logical relations between these concepts (which we might well call "ideas"). That's what it is. Learning a language is nothing more than learning this mapping. This is perhaps an oversimplification - from a language perspective, we've side-stepped the issue of building words up from heard phonemes or seen morphemes, and from a mathematical perspective it's true that we're not really concerned with a mapping, but rather with a relation, because one sentence can conceivably have more than one possible interpretation - but it certainly captures the essence of the problem and puts it in an entirely tractable form: finite sequences of elements from finite sets are not mysterious, ephemeral, intuitive things - they're rigorously defined and well studied entities. We can do statistical analysis on them, we can define equivalence classes on them and we can generate them using stochastic or deterministic processes. Logical relations between concepts are nothing new or "squishy" either, and we can use things like predicate calculus to model them. (There's a toy sketch of this way of looking at things just after the list below.)

    In short, the study of language is firmly grounded in objective reality, thus letting one investigate the human mind - certainly an appealing area of study - without sacrificing one's scientific integrity.

  2. Language is surprisingly fundamental to human cognition. It's not obvious on casual consideration, but I think it becomes an inescapable conclusion, once you think about it, that language is inherently tied up - and very deeply so - with how humans form and internally represent arbitrary and often quite abstract concepts and categories. After all, we're mapping back and forward between sentences or utterances and relations between concepts. The nature of these concepts and their initial formation, internal representation and long term storage can hardly be irrelevant. Sometimes when we map from a linguistic input into the conceptual "idea space", the resulting idea has the long-term effect of modifying the way we perform these mappings in future - for example, when we are explicitly taught a new word.
  3. Language has some really cool applications in areas that I'm interested in. A better understanding of how humans understand and generate natural language can lead directly (thanks largely to point 1, i.e. that it's an understanding of something tangible) to giving computers a better ability to do things like:
    • translate between human languages,
    • search the web,
    • automatically generate RDF triples for the semantic web,
    • intelligently aggregate related items from the overwhelming forest of online news sources and/or blogs, and
    • communicate with users in a more natural manner using speech and/or language recognition and synthesis.
    These sorts of applications - especially the web-related ones - are, I think, likely to influence the direction of my research fairly strongly.
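
To make that first point a bit more concrete, here's a minimal sketch of language-as-a-relation in Python. Everything in it is hypothetical and hand-made for illustration - the little lexicon, the two example sentences, the string-encoded logical forms and the `interpretations` helper have nothing to do with any real system or with my actual research - but it shows the shape of the object of study: finite word sequences on one side, predicate-calculus-style ideas on the other.

```python
# A toy illustration only: language as a relation between finite sequences of
# words and predicate-calculus-style "ideas". Nothing here is a real parser;
# the lexicon, the sentences and the logical forms are all made up by hand.

# Finite set of words.
WORDS = {"i", "saw", "the", "man", "with", "a", "telescope",
         "dogs", "chase", "cats"}

# The relation itself: each sentence (a tuple of words) is paired with one or
# more logical forms. Plain strings stand in for real predicate calculus.
MEANINGS = {
    ("dogs", "chase", "cats"): [
        "forall x. dog(x) -> exists y. cat(y) & chase(x, y)",
    ],
    # One sentence, two readings - which is why this is a relation rather
    # than a function: either the seeing was done with a telescope, or the
    # man that was seen had a telescope.
    ("i", "saw", "the", "man", "with", "a", "telescope"): [
        "exists e. see(e, me, the_man) & instrument(e, a_telescope)",
        "exists e. see(e, me, the_man) & have(the_man, a_telescope)",
    ],
}

def interpretations(sentence):
    """Return every logical form the relation pairs with a word sequence."""
    assert all(w in WORDS for w in sentence), "word not in the finite lexicon"
    return MEANINGS.get(tuple(sentence), [])

if __name__ == "__main__":
    for s in MEANINGS:
        print(" ".join(s))
        for logical_form in interpretations(s):
            print("   ->", logical_form)
```

The telescope sentence is in there precisely because of the caveat above: one sequence of words pairing with two logical forms is what makes this a relation rather than a function, yet it remains a perfectly well-defined, finite, analysable object.
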
So that's it! Some of my reasoning behind devoting the next 3 years of my life largely to the study of human natural language processing.
