TaCoS29

Program

This schedule is subject to change without further notice.

Friday

Time Title Speaker Room
12:00-14:00 Registration TaCoS Team Foyer of C7.2
    You can pick up your conference material.
14:00-14:45 Computational Linguistics in the Era of Deep Learning Prof. Alexander Koller Conference room in C7.4
    Over the past few years, deep learning methods have revolutionized our field. Many problems that everyone used to think required symbolic knowledge of some kind (e.g. grammars, whether written by hand or learned from corpora) can now be solved more accurately and more robustly using neural networks. This raises the question of whether it is even still useful for computational linguistics students to learn about linguistic foundations - shouldn't we all just do machine learning instead? In my talk, I will discuss some recent research (from Saarbrücken and elsewhere) on the successes and limitations of pure deep learning methods and propose some partial answers on what it means to combine deep learning with linguistic insights.
14:45-15:30 ADVISER - an open source dialog system framework Moritz Völkel Conference room in C7.4
    We present ADVISER - an open-source dialog system framework for education and research purposes. The system supports multi-domain task-oriented conversations in two languages. It additionally provides a flexible, modular design in which modules can be arbitrarily combined or exchanged, allowing for easy switching between rule-based and neural-network-based implementations. Furthermore, ADVISER offers a transparent, user-friendly framework designed for interdisciplinary collaboration: from a flexible back end, which allows easy integration of new features, to an intuitive graphical user interface suitable for non-technical users.
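    As a rough illustration of this modular idea (a minimal Python sketch, not ADVISER's actual API - the class and function names here are invented for the example), each pipeline stage can implement a common interface so that a rule-based module and a neural one are interchangeable:

        # Sketch of a swappable dialog-pipeline module; all names are hypothetical.
        from typing import Protocol

        class NLUModule(Protocol):
            def parse(self, utterance: str) -> dict: ...

        class RuleBasedNLU:
            def parse(self, utterance: str) -> dict:
                # toy keyword rule standing in for a real grammar
                return {"intent": "inform" if "want" in utterance else "request"}

        class NeuralNLU:
            def parse(self, utterance: str) -> dict:
                # a trained classifier would be called here
                return {"intent": "inform"}

        def run_turn(nlu: NLUModule, utterance: str) -> dict:
            return nlu.parse(utterance)

        run_turn(RuleBasedNLU(), "I want a cheap restaurant")  # swap in NeuralNLU() freely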
15:30-15:45 Coffee break
15:45-16:10 Fact Checking in Community Forums Dominik Stammbach Conference room in C7.4
    We describe our system developed for the SemEval 2019 Task 8 on automated fact checking. We fine-tuned a BERT checkpoint on the Qatar Living forum dump and used this checkpoint to train a number of models. Our submission for subtask A is a classifier fine-tuned from this BERT checkpoint. For subtask B, a first classifier decides whether a comment is factual or non-factual; if it is factual, we retrieve intra-forum evidence and, using this evidence, a second classifier decides the comment's veracity. We trained this classifier on ratings that we crawled from qatarliving.com.
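    The fine-tuning step described above can be sketched roughly as follows (an illustrative Python fragment using the Hugging Face transformers library, not the authors' actual code; the label scheme and example comment are invented):

        import torch
        from transformers import BertForSequenceClassification, BertTokenizer

        tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
        model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
        optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

        comment = "You can find cheap furniture at the Friday market."  # invented example
        batch = tokenizer(comment, return_tensors="pt", truncation=True)
        labels = torch.tensor([1])  # hypothetical scheme: 1 = factual, 0 = non-factual

        model.train()
        loss = model(**batch, labels=labels).loss  # cross-entropy over the two classes
        loss.backward()
        optimizer.step()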
16:15-16:30 paragon semvox GmbH Dr. Norbert Pfleger Conference room in C7.4
16:30-17:15 Compositional neural semantic graph parsing Matthias Lindemann Conference room in C7.4
    With the availability of semantic graphbanks, there has been growing interest in parsing sentences into graphs that represent sentence-level meaning (e.g. Abstract Meaning Representation). These semantic graphs come in many "flavors" that differ in what they represent and how they relate to their sentences. I present our semantic parser, which treats these diverse graphbanks in a single compositional framework and sets a new state of the art on some of them.
Barbecue

Saturday

Time Title Speaker Room
9:00-10:00 Breakfast and Poster Session Foyer of C7.2
    Posters:
    • Tillmann Dönicke: Representation of (non-)Unicode Chinese characters in HTML and LaTeX using SVGs
      In 2017, Unicode introduced the sixth extension for CJK (Chinese, Japanese, Korean) ideographs, containing 7,494 new characters and bringing the total number of CJK ideographs to 87,882. However, there are still characters that remain unencoded: dialect words, proper names and neologisms, as well as characters from Vietnamese, Japanese and earlier Chinese variants. The tool presented here revisits the generation of characters out of components for use on web pages, and surpasses the existing methods and tools in many respects.
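      To make the encoding problem concrete, here is a small Python check (standard library only; the chosen code points are just examples) of whether a character carries a name in the Unicode version a given Python build ships with:

          import unicodedata

          def describe(char):
              try:
                  return unicodedata.name(char)
              except ValueError:
                  return "no name in this Unicode version (or unencoded)"

          print(describe("\u6C49"))      # CJK UNIFIED IDEOGRAPH-6C49
          print(describe("\U0002CEB0"))  # first code point of CJK Extension F (2017)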
10:00-12:00 Workshop: Graphemic Standardisation and Human Writing Systems Victor Zimmermann -1.05 in C7.2
    Writing is one of the oldest human inventions. For most of its history, the only constraints on what could be written were one's own wrist and writing utensil. This changed with the printing press, when people had to agree on what would be part of a character set and what would not. For computers, this task has mostly been taken over by the Unicode standard, just one in a long line of international rulebooks for graphemic standardisation. But what makes a good international standard when it comes to writing systems? Is Unicode the be-all and end-all of what we can expect of alphabets in the digital age, or is there still more to come? If we were to create a new standard, could Emojis, of all things, be the way forward? In this workshop, we will dive deep into some very diverse alphabets, explore the cultural and historic significance of the Unicode standard, and examine its advantages and shortcomings.
12:00-12:30 CRC 1102 "Information Density and Linguistic Encoding" Dr. Maria Staudte Conference room in C7.4
    Language provides speakers with a multitude of choices regarding how they may encode their messages — from the duration of syllables, to the choice of words, structuring of syntactic elements, and arrangement of sentences in discourse. While variation has traditionally been addressed at each of these levels separately — with each appealing to very different kinds of explanations — the aim of CRC 1102 is to investigate the extent to which the notion of information (Shannon, 1948) can contribute to a unifying model of language use and variation. Here, language use is viewed from the perspective of (bounded) rational communication, in which speakers seek to optimize the encoding of their utterances so as to both (i) successfully convey their intended message, and (ii) optimize the cognitive effort expended by both the speaker and the comprehender.
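    The central quantity here is Shannon's information content, or surprisal: a word's negative log-probability in its context. A minimal Python illustration (the probability value is invented; real studies estimate it with language models):

        import math

        def surprisal(p):
            """Information content, in bits, of an event with probability p."""
            return -math.log2(p)

        # A word assigned probability 1/8 in its context carries 3 bits:
        print(surprisal(1 / 8))  # 3.0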
12:30-14:00 Lunch
14:00-14:45 Alexa, schreiben Sie! - Erfahrungen mit Sprachsteuerung Alexander Frey Conference room in C7.4
    I wrote an Amazon Alexa skill for a web shop. I will report on the subset of my experiences that may help others with their own projects, beginning with corpus collection and ending with the interaction model.
14:45-15:30 Surfing Through Audio-Semantic Latent Space Maximilian Müller-Eberstein Conference room in C7.4
    In this talk, we will explore what happens when autoencoding machine-learning models are trained to translate information across sensory modalities. Based on the ongoing master's thesis "Synesthetic Variational Autoencoders", which aims to coherently generate music based on visual art, we will attempt to extend the same methodology to a small NLP task in order to hear what sarcastic music sounds like.
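    The basic cross-modal autoencoding setup can be caricatured in a few lines of PyTorch (a toy sketch with invented feature sizes, not the thesis' actual architecture):

        import torch
        import torch.nn as nn

        image_dim, audio_dim, latent_dim = 512, 128, 32  # hypothetical sizes

        encoder = nn.Sequential(nn.Linear(image_dim, 64), nn.ReLU(), nn.Linear(64, latent_dim))
        decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, audio_dim))

        image_features = torch.randn(1, image_dim)         # stand-in for encoded visual art
        audio_features = decoder(encoder(image_features))  # decoded into the audio domain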
15:30-16:00 Coffee break
16:00-16:45 Intertextual allusion detection Tillmann Dönicke Conference room in C7.4
    The study of source material is an integral part of literary theory and is concerned with finding the sources of expressions, paragraphs and ideas in a given text. Sometimes the source is obvious, e.g. in the case of citations; in many cases, however, the backreferences (allusions) are indirect, which is why researchers still have to carry out this step manually. We will look at intertextual allusions from an information-theoretic perspective and discuss computational approaches to allusion detection.
16:45-17:10 How can readability indices be used for plagiarism analysis? Maja Toebs Conference room in C7.4
    The demand for effective plagiarism analysis has increased significantly in recent years. Applications of plagiarism analysis include forensics, education, journalism and many other fields. The state-of-the-art approach to plagiarism analysis is intrinsic, meaning that style changes within a document are detected through automatic stylometric analysis. Readability is a measure of authorial style that can be calculated easily and used to identify plagiarised passages. In this presentation, a simple and widely used readability formula, the so-called Gunning Fog Index, is analysed with regard to its robustness. For well-functioning intrinsic plagiarism analysis, the measures used need to be very stable; they should also be insensitive to the type, topic and length of a document. However, tests have shown that Gunning Fog values tend to vary considerably within a single document, so I tried several approaches to stabilising these values. I present the results of my analysis and discuss how such readability measures can be used for plagiarism analysis.
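    For reference, the Gunning Fog Index is 0.4 * (average sentence length + percentage of "complex" words, i.e. words with three or more syllables). A small Python sketch (syllables are approximated by counting vowel groups, a common rough heuristic):

        import re

        def syllables(word):
            return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

        def gunning_fog(text):
            sentences = max(1, len(re.findall(r"[.!?]+", text)))
            words = re.findall(r"[A-Za-z']+", text)
            if not words:
                return 0.0
            complex_words = [w for w in words if syllables(w) >= 3]
            return 0.4 * (len(words) / sentences + 100 * len(complex_words) / len(words))

        print(gunning_fog("The demand for effective plagiarism analysis has increased."))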
17:10-17:35 Relation Prediction with BERT Jannis Rautenstrauch Conference room in C7.4
    On social media platforms like Twitter, millions of users discuss hundreds of topics every day. If it were possible to automatically predict the relations between those posts, a lot could be learned about public opinion and the dynamics of discussion. This work analyses whether pre-trained language models such as Google's BERT, which achieve state-of-the-art results in many NLP tasks, can be used for relation prediction in argument mining. The task is very challenging due to the small size of existing training datasets and the fact that it involves high-level knowledge representation and reasoning. First results show that BERT outperforms currently used solutions based on manual feature engineering.
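    One plausible way to apply BERT here (an illustrative Python sketch using the Hugging Face transformers library, not necessarily the talk's setup; the label set and example posts are invented) is to encode two posts as a sentence pair and classify their relation:

        from transformers import BertForSequenceClassification, BertTokenizer

        tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
        model = BertForSequenceClassification.from_pretrained(
            "bert-base-uncased", num_labels=3)  # e.g. support / attack / no relation

        post_a = "We should ban single-use plastics."
        post_b = "Paper alternatives are often worse for the climate."
        inputs = tokenizer(post_a, post_b, return_tensors="pt")  # [CLS] a [SEP] b [SEP]
        relation = model(**inputs).logits.argmax(-1)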
18:30-19:30 Guided tour through the city
19:30 Dinner

Sunday

Time Title Speaker Room
9:00-10:00 Breakfast Foyer of C7.2
10:00-10:25 The NRC emotion/sentiment lexicons and what we can do with them Michael Vrazitulis Conference room in C7.4
    During my six-week internship at Acrolinx, I was given the opportunity to scratch the surface of emotion and sentiment analysis tasks. Given one or multiple documents, the goal was to use and meaningfully visualize data from English lexicons that had been annotated with emotion tags and/or sentiment values through crowd-sourcing by the National Research Council Canada. In my brief talk, I will describe the challenges that arose throughout the process and touch on the general discussion of the extent to which one can evaluate methods and algorithms for applied tasks in NLP and the broader field of Digital Humanities.
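    A minimal sketch of how such a lexicon can be used (this assumes the tab-separated "word<TAB>emotion<TAB>0/1" layout of the NRC Emotion Lexicon; the file name and example call are hypothetical):

        from collections import Counter

        def load_lexicon(path):
            lexicon = {}
            with open(path, encoding="utf-8") as f:
                for line in f:
                    parts = line.rstrip("\n").split("\t")
                    if len(parts) == 3 and parts[2] == "1":
                        lexicon.setdefault(parts[0], set()).add(parts[1])
            return lexicon

        def emotion_profile(text, lexicon):
            counts = Counter()
            for word in text.lower().split():
                counts.update(lexicon.get(word, ()))
            return counts

        # lexicon = load_lexicon("NRC-Emotion-Lexicon.txt")  # hypothetical file name
        # emotion_profile("the abysmal hotel made us angry", lexicon)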
10:25-11:15 "Estonian NER in One Week" - Machine Learning in Low Resource Settings Michael Hedderich Conference room in C7.4
    Taking into account that over 300 languages have more than one million L1 speakers, even popular languages need to be counted as low-resource for many NLP tasks. Similarly, even in English, the available resources often cover only specific domains. While unlabeled, raw text is usually available in these scenarios, only a few labeled corpora exist for training machine learning algorithms, e.g. to perform named entity recognition. Distant and weak supervision techniques have been proposed to overcome this issue. They make it possible to annotate raw text automatically, using techniques like self-training or external resources. While these approaches can be used to obtain labeled data cheaply and quickly, the labels often contain errors, and training on noisy labels can actually decrease performance. Noise-handling techniques are able to model the noise in the data and thereby leverage the additional, cheaply obtained labeled data. In this talk, I'll present different ideas on how to automatically annotate text and explain one of our noise-handling techniques with an application in named entity recognition.
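    The gazetteer variant of distant supervision is simple enough to sketch directly (a toy Python example; the entity list and sentence are invented, and real pipelines add the noise handling discussed in the talk):

        LOCATIONS = {"Tallinn", "Tartu", "Narva"}  # tiny illustrative gazetteer

        def distant_labels(tokens):
            # cheap, automatic, and noisy: every gazetteer hit becomes an entity
            return ["B-LOC" if tok in LOCATIONS else "O" for tok in tokens]

        print(distant_labels("The train from Tallinn to Tartu was late".split()))
        # ['O', 'O', 'O', 'B-LOC', 'O', 'B-LOC', 'O', 'O']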
Closing Remarks & Next TaCoS TaCoS Team Conference room in C7.4