------------------
SIGGEN Newsletter
www.siggen.org
Date: 05 June 2005
------------------

======================================================================
TOPICS:
 1. INF: Recent SIGGEN updates; board election, state of SIGGEN
 2. CfB: Call for Bids for INLG'06
 3. CfP: EWNLG'05 in Aberdeen [Early Registration by June 17]
 4. CfP: Symposium on Dialogue Modelling and Generation [July 7]
 5. CfP: Using Corpora for Natural Language Generation [July 14]
 6. TUT: Statistical Machine Translation and Generation [Aug. 11]
 7. JOB: Postdoctoral Position in Adaptive Spoken Language, NY
 8. JOB: Research Fellow/PostDoc, Aberdeen
 9. STU: Funded Studentship, Aberdeen
10. ANN: Surge 2.3 now available
======================================================================
SIGGEN Board Members:
Tilman Becker              Tilman.Becker@dfki.de
Charles Callaway           ccallawa@inf.ed.ac.uk
Irene Langkilde-Geary      irenelg@cs.byu.edu
David McDonald             dmcdonald@bbn.com
David Reitter              dreitter@inf.ed.ac.uk
======================================================================
TOPIC 1: Recent SIGGEN updates; board election, state of SIGGEN

Dear SIGGEN members,

Tilman Becker, the last remaining member of the previous SIGGEN board, has been joined by the four new board members elected this winter: Charles Callaway, Irene Langkilde-Geary, David McDonald, and student representative Dave Reitter.

As mentioned before, the website has moved to www.siggen.org (change your bookmarks!), where it is hosted at DFKI. It has also been updated, most significantly the membership listing on the Who's Who page. This page, which now lists 145 members, has been revamped to ensure that all links are valid. If you, or someone you know, would like to be added to this list, please don't hesitate to email us.

The mailing list has been similarly checked to ensure valid email addresses, and now contains 209 members.
This means that anyone who does not receive this newsletter is not currently on the mailing list, is not considered a member, and thus cannot vote in future elections. (A copy of the current SIGGEN constitution is located at: http://www.siggen.org/discussion/constitution/constitution_v2.html)

Your board members will be attending a wide array of conferences this summer, so if you see us, please don't hesitate to talk to us, or of course send us email. We will respond quickly to any suggestions or comments you may have.

-- The SIGGEN Board

======================================================================
TOPIC 2: SIGGEN: Call for Bids to Host INLG-2006
http://www.siggen.org/event/bidinlg06.html

SIGGEN (Special Interest Group in Generation of the Association for Computational Linguistics) invites proposals to host the International Natural Language Generation (INLG) Conference in 2006. INLG conferences are usually held in the summer, and are sometimes co-located with other NLP events, such as ACL. INLG attendance is usually on the order of 80 people (that is, more than 50 and fewer than 120). As INLG-2004 was in the ACL European region, we especially welcome, and will prefer, proposals for holding INLG-2006 in the ACL Americas or Asia/Pacific regions.

Draft proposals should be emailed to ccallawa@inf.ed.ac.uk by 30 Sept 2005. These proposals should outline:

* conference location and practicalities (venue, accommodation, meals). Note that INLGs have traditionally been held in places which are secluded but easily accessible (within a few hours' drive of a major international airport), such as Brighton, UK (2004); mid-state New York (2002); Mitzpe Ramon, Israel (2000); Niagara-on-the-Lake, Canada (1998); Herstmonceux Castle, UK (1996); and Kennebunkport, USA (1994).

* approximate conference date.
  Will it be possible for INLG attendees to combine attendance at INLG with attendance at other conferences of interest to the NLG community? (For example, INLG-02 immediately preceded ACL-02, INLG-98 immediately preceded ACL-98, and INLG-92 immediately followed ANLP-92.)

* rough budget and expected sponsorship. Approximately how much will participants need to pay to attend, including accommodation and meals as well as conference registration? Note that attendance cost for previous INLGs has generally been US$500 or less.

* local arrangements. Who will be in charge of organising the conference, and how will finances be handled (e.g., can participants pay by credit card)?

Draft proposals will be considered by a committee that includes some SIGGEN board members and previous INLG chairs. This committee may contact proposers and request additional information.

For more information, see:
* http://www.itri.brighton.ac.uk/inlg04/ for information about INLG-2004.
* http://inlg02.cs.columbia.edu/ for information about INLG-2002.
* http://www.dfki.de/~wahlster/bids/ for draft bids for ACL-01 (a bit different from INLG draft bids, but useful as examples).

======================================================================
TOPIC 3: EWNLG'05 in Aberdeen [Early Registration by June 17]

Call for Participation
8-10 August 2005
Aberdeen, Scotland (following IJCAI-2005 in Edinburgh)
http://www.ling.helsinki.fi/~gwilcock/ENLG-05/

Natural language generation (NLG) is a subfield of natural language processing that focuses on the generation of written texts in natural languages from some underlying non-linguistic representation of information, generally from databases or knowledge sources. NLG can serve a number of different purposes, including standardized and/or multi-lingual reports, summaries, machine translation, dialogue applications, and embedding in multi-media and hypertext environments.
Consequently, the automated production of language is associated with a large number of highly diverse tasks, whose orchestration into high-quality output poses a variety of theoretical and practical problems. Relevant issues include content selection, text organization, production of referring expressions, aggregation, lexicalization, and surface realization, as well as coordination with other media.

The workshop continues a biennial series of workshops on natural language generation that has been running since 1987. Previous European workshops have been held at Royaumont, Edinburgh, Judenstein, Pisa, Leiden, Duisburg, Toulouse (2001) and Budapest (2003). The series provides a regular forum for presentation of research in this area, both for NLG specialists and for researchers who may not think of themselves as part of the NLG community.

The 2005 workshop will span the interest areas of natural language generation and artificial intelligence, with a special focus on research that integrates NLG with AI, including vision, robotics, intelligent agents, and knowledge discovery. We also encourage papers that investigate the use of state-of-the-art generation technology in real-world applications handling both spoken and text output, and papers that apply language generation techniques to interactive AI systems such as communicating robots, allowing the user to enter into short conversations with the system in search of information.

There will be demonstrations of working NLG systems, and special sessions for posters describing real-world applications and advanced language technology systems.

Papers will be presented on formal, corpus-based, implementational and analytical work on conventional NLG topics (realisation, microplanning, etc.), with particular emphasis on the following themes:

* Embodied agents and robot communication (special track)
* NLG for real-world applications
* Use of ontologies in NLG
* Statistical methods for NLG
* Information organization for planning and NLG
* Robust methods and techniques for NLG
* Evaluation of NLG systems

Invited Speaker: Kevin Knight (Information Sciences Institute, University of Southern California) will give an invited talk on Tree Transducers for Machine Translation and Generation.

======================================================================
TOPIC 4: Symposium on Dialogue Modelling and Generation

Call for Participation
July 7, Amsterdam, The Netherlands
http://lubitsch.lili.uni-bielefeld.de/DMG/

This symposium is intended to tackle issues in the semantics and pragmatics of dialogue and dialogue generation. It aims to bring together the dialogue modelling and language generation/production communities, and will provide an opportunity for researchers from a variety of disciplines, including linguistics, computer science and psycholinguistics, to exchange ideas.

We invited talks elaborating on important theoretical notions in dialogue modelling, such as constraints (Asher & Lascarides, 2003, and many other recent papers), the role of domain knowledge (e.g., Ludwig, 2003, and, again, many more), and the influence of social relations between interlocutors on dialogue behaviour (going back to the seminal work by Brown and Levinson, 1978). We asked presenters to shed light on these or other theoretically fruitful notions in dialogue modelling by:

* relating them to issues in language generation/production, or
* drawing out similarities and differences between applications of such notions in discourse generation versus interpretation, or
* describing computational/implemented models, in particular for generation/production, or
* comparing psycholinguistic with linguistic or engineering approaches to dialogue modelling.

The symposium will thus be a natural complement to events that deal with natural language interpretation or structural properties of discourse.

======================================================================
TOPIC 5: Using Corpora for Natural Language Generation

Call for Participation
July 14, Birmingham, England (preceding Corpus Linguistics 2005)
http://www.itri.brighton.ac.uk/ucnlg/

We aim to bring together researchers who use corpora for NLG research, either in the traditional, manual way or automatically, using machine learning and statistical methods. The goal of the workshop is to present and discuss current research, to compare manual and automatic corpus exploitation, to evaluate achievements, and to identify challenges for the future.

Registration is open at the Corpus Linguistics 2005 website. Please note that Using Corpora for NLG is a full-day workshop, and that you do not need to register for the main conference; simply select the appropriate options in the registration form. The workshop registration fee is 70 pounds.

Papers will be presented on all aspects of using corpora for natural language generation, including, but not limited to:

* (Partial) automation of traditional corpus analysis for NLG
* Issues in annotating corpora for NLG
* Statistical approaches to deep and/or surface generation
* Machine learning methods for deep and/or surface generation
* Role of corpora in the evaluation of NLG systems
* Reuse of resources developed for NLU (e.g. treebanks) in NLG
* Domain-specific vs. general purpose corpora for NLG

We would like to emphasise that by `NLG' we mean to include the language generation components of machine translation and dialogue systems.

Invited Speaker: Irene Langkilde-Geary (Brigham Young University, Provo, USA) will give an invited talk with the provisional title: Constraint programming as a Whiteboard Architecture for Probabilistic NLG.

Panel on Exploiting Corpora for NLG: We will hold a panel discussion on the topics of the workshop. The panel members are:

  Chris Brew, Linguistics, Ohio State University, USA
  Irene Langkilde-Geary, Brigham Young University, USA
  Ehud Reiter, Computing Science, University of Aberdeen, UK
  Donia Scott, CRC, Open University, UK
  Bonnie Webber, Informatics, University of Edinburgh, UK

======================================================================
TOPIC 6: Tutorial: Statistical Machine Translation and Generation

August 11, Aberdeen, Scotland (immediately following EWNLG'05)
http://www.csd.abdn.ac.uk/~cmellish/knight.html

Kevin Knight, USC/Information Sciences Institute, USA

The statistical approach to machine translation provides a set of techniques for (1) automatically learning translation knowledge from bilingual data, and (2) applying that knowledge to translate previously unseen sentences. When it was first introduced, statistical MT was far too slow and inaccurate to be useful; it was an interesting lab experiment.
In 2005, we see statistical MT significantly outperforming other methods in many language pairs and domains, at speeds permitting commercial applications such as foreign news broadcast translation. What made this possible? This tutorial will cover the basic theory and the major technical advances of the past few years. Of course, there is a long way to go! The tutorial will also cover known limitations of current MT models and describe current research trends. We will also discuss problems in natural language generation, where the input is typically more abstract than foreign text, and describe how statistical MT research is currently exploiting linguistic categories.

This tutorial is free of charge. It is hosted by the Natural Language Generation group at the University of Aberdeen. We are grateful for the support of EPSRC grant EP/C523156/1, which has made this tutorial possible.

If you are interested in attending this tutorial, please send an email to ccameron@csd.abdn.ac.uk so that you can be allocated a place and informed of any further developments. For more information, contact Chris Mellish (cmellish@csd.abdn.ac.uk).

======================================================================
TOPIC 7: Postdoctoral Position in Adaptive Spoken Language

Stony Brook, NY

The Psychology, Linguistics, and Computer Science Departments at Stony Brook University are collaborating on an innovative project, funded by the National Science Foundation: "Adaptive Spoken Dialog with Human and Computer Partners." We seek a postdoctoral associate to collaborate with us. The successful applicant will have a Ph.D. in Psychology, Linguistics, Computer Science, or a relevant interdisciplinary field.

Preferred Qualifications: Experience in one or more of the following: experiment design, statistics, linguistic phonetics, computational linguistics, speech processing, and psycholinguistic techniques such as eyetracking.

Depending on the candidate's background and qualifications, duties will include:

1) Contributing to empirical (laboratory and corpus-based) studies of language use (both comprehension and production). This involves working with human subjects, designing experiments, collecting data, and conducting detailed analyses of text and spoken corpora.
2) Contributing to our efforts to model human language behavior and to test computational models using data.
3) Generating independent sub-projects relevant to the project's research questions.
4) Supervising graduate and undergraduate student researchers in day-to-day activities across one or more projects conducted within the PI's laboratories.
5) Assisting with management of laboratory resources, such as ordering equipment, installing software, etc.
6) Writing up results for publication.
7) Traveling to conferences and workshops as appropriate.
8) Developing expertise in relevant techniques and procedures that span Psychology, Linguistics, and Computer Science.

This is a full-time position. The Research Foundation of SUNY is a private educational corporation. Employment is subject to the Research Foundation policies and procedures, sponsor guidelines, and availability of funding.

Projected start date: January 1, 2006 (flexible)

Application Procedure: Applications will be accepted until the position is filled. More details about the project can be found at http://www.cs.sunysb.edu/~adaptation/. Applications may be submitted online at http://naples.cc.sunysb.edu/Admin/CampusJob.nsf via the "Postdoctoral positions" link; alternatively, submit a cover letter and resume to:

  Prof. Susan E. Brennan
  Department of Psychology
  Stony Brook University
  Stony Brook, New York 11794-2500
  Fax: 631-632-7876

Stony Brook University, flagship campus of the S.U.N.Y. system, is a world-class, student-centered research university located 60 miles from New York City.

======================================================================
TOPIC 8: Research Fellow/PostDoc: Towards a Unified Algorithm for the Generation of Referring Expressions

University of Aberdeen, Scotland
Applications due: July 15, 2005
Contact: Dr Kees van Deemter kvdeemte@csd.abdn.ac.uk

Background: Natural Language Generation programs generate text from an underlying Knowledge Base. It can be difficult to find a mapping from the information in the Knowledge Base to the words in a sentence. Difficulties arise, for example, when the Knowledge Base uses `names' (i.e., database keys) that a hearer/reader does not understand. This can happen, for instance, if the Knowledge Base contains an artificial name like `#Jones083', because `Jones' alone is not uniquely distinguishing; it also happens if the Knowledge Base deals with entities for which no names at all are in common usage (e.g., a specific tree or a chair). In all such cases, the program has to "invent" a description that enables the reader to identify the referent. In the case of Mr. Jones, for example, the program could give his name and address; in the case of a tree, some longer description may be necessary (e.g., `the green oak on the corner of ... and ...'). The technical term for this set of problems is Generation of Referring Expressions (GRE). GRE is a key aspect of almost any Natural Language Generation system.

Aims: Existing GRE algorithms tend to focus on one particular class of referring expressions, for example conjunctions of atomic or relational properties (e.g., `the black dog', `the book on the table'). Our research is aimed at designing and implementing a new algorithm for the generation of referring expressions that generates appropriate descriptions in a far greater variety of situations than any of its predecessors.
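The simplest class mentioned above is a conjunction of atomic properties, as in `the black dog'. Building one can be illustrated with a minimal Python sketch. It shows a greedy selection procedure, loosely a simplified variant of Dale and Reiter's Incremental Algorithm. It is not the project's proposed algorithm. The toy knowledge base `KB`, the attribute preference order `PREFERRED`, and the function name `describe` are hypothetical illustrations. The procedure walks through attributes in a preferred order. It keeps a property only if that property rules out at least one distractor, and it stops once the target is the only entity left.

```python
from typing import Dict, List, Optional, Set, Tuple

Property = Tuple[str, str]  # (attribute, value), e.g. ("colour", "black")

# Hypothetical toy knowledge base: entity key -> set of atomic properties.
KB: Dict[str, Set[Property]] = {
    "d1": {("type", "dog"), ("colour", "black"), ("size", "small")},
    "d2": {("type", "dog"), ("colour", "white"), ("size", "small")},
    "c1": {("type", "cat"), ("colour", "black"), ("size", "small")},
}

# Assumed preference order over attributes (an illustrative choice).
PREFERRED: List[str] = ["type", "colour", "size"]


def describe(target: str, kb: Dict[str, Set[Property]],
             preferred: List[str]) -> Optional[List[Property]]:
    """Greedily build a conjunction of properties that singles out target.

    Returns the chosen properties, or None if no distinguishing
    conjunction exists among the preferred attributes.
    """
    distractors = {e for e in kb if e != target}
    target_values = dict(kb[target])  # attribute -> value for the target
    description: List[Property] = []
    for attr in preferred:
        if not distractors:
            break
        if attr not in target_values:
            continue
        prop = (attr, target_values[attr])
        # Distractors that lack this property are ruled out by it.
        ruled_out = {d for d in distractors if prop not in kb[d]}
        if ruled_out:  # keep only properties that actually help
            description.append(prop)
            distractors -= ruled_out
    return description if not distractors else None


print(describe("d1", KB, PREFERRED))  # prints [('type', 'dog'), ('colour', 'black')]
```

The result corresponds to `the black dog'. The greedy, preference-ordered design avoids searching the full space of descriptions. It can therefore produce descriptions that are not minimal, and it covers none of the negations, disjunctions, relations, or vagueness the project targets.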

The algorithm will be more complete than its predecessors because it will be able to construct a greater variety of descriptions (involving negations, disjunctions, relations, vagueness, etc.). The descriptions generated should also be more appropriate (i.e., more natural in the eyes of a human hearer/reader), because the algorithm will be based on empirical studies involving corpora and controlled experiments. Among other things, these empirical studies will address the question of under what circumstances descriptions should be logically under- or overspecific; they will also allow us to prune the search space (i.e., the space of all descriptions), which would otherwise threaten to make the problem intractable. The project combines (psycho)linguistic, computational and logical challenges and should be of interest to people whose intellectual home is in any of these areas.

General Info: http://www.itri.brighton.ac.uk/projects/tuna/TUNA-index.html

======================================================================
TOPIC 9: Funded Studentship: Managing Ambiguity in Generated Text

University of Aberdeen, Scotland
Applications due: July 15, 2005
Contact: Dr Kees van Deemter kvdeemte@csd.abdn.ac.uk

General Info for prospective students at Aberdeen: http://www.abdn.ac.uk/sras/postgraduate/apply5

======================================================================
TOPIC 10: Surge 2.3 now available for download

Contact: Charles Callaway ccallawa@inf.ed.ac.uk

Surge 2.3, the latest version of the SURGE English grammar, has been packaged for download at the location below. It improves support for written and spoken dialogue, XML and LaTeX formatting, and punctuation, and adds coverage rules derived from the Penn Treebank. For use with FUF5.3.

http://homepages.inf.ed.ac.uk/ccallawa/index-c.html

======================================================================
eof
======================================================================