Proceedings of the first workshop on algorithms and resources for modelling of dialects and language varieties

Jeremy Jancsary, Friedrich Neubarth, and Harald Trost (editors)
Workshop held in conjunction with EMNLP 2011
2011, Association for Computational Linguistics (ACL), Brunswick, NJ, USA

Language varieties (and specifically dialects) are a primary means of expressing a person’s social affiliation and identity. Hence, computer systems that can adapt to the user by displaying a familiar socio-cultural identity are expected to raise acceptance dramatically within certain contexts and target groups. Although the currently prevailing statistical paradigm has made possible major achievements in many areas of natural language processing, the applicability of the available methods is generally limited to major languages and standard varieties, to the exclusion of dialects or varieties that differ substantially from the standard.

While there are considerable initiatives dealing with the development of language resources for minor languages, as well as reliable methods for handling accents of a given language (i.e., for applications such as speech synthesis or recognition), the situation for dialects still calls for novel approaches, methods and techniques. These are needed both to overcome or circumvent the problem of data scarcity and to strengthen the standing that language varieties and dialects have in natural language processing technologies, as well as in the interaction technologies that build upon them.

What made us think that such a workshop would be a fruitful enterprise was our conviction that only the joint efforts of researchers with expertise in various disciplines can bring about progress in this field. In our call, we therefore aimed to bring together colleagues who work on topics ranging from machine learning algorithms and active learning, through machine translation between language varieties or dialects and speech synthesis and recognition, to issues of orthography, annotation and linguistic modelling.

The 2011 Workshop on Algorithms and Resources for Modelling of Dialects and Language Varieties (DIALECTS 2011) is the first workshop to be held on this rather interdisciplinary topic. The workshop received seventeen submissions, of which six were accepted as oral presentations (long papers) and three as posters (short papers). These papers represent interesting work from almost all of the scientific fields that the call named as necessary contributors to the common goal.

In addition to the submitted papers, we are happy to welcome Burr Settles as our invited speaker, who will give a keynote talk on using multiple machine learning strategies to facilitate the rapid development of NLP tools for new or rare languages and dialects. We hope that this gathering and the proceedings will help to promote and advance the topic around which this workshop is centered. We would like to thank all the authors who submitted their work for consideration. We are also especially grateful to the members of the program committee and the additional reviewers for their insightful and detailed reviews.






