Linked Data for Language Technology Roadmapping Workshop, 21st March 2014, Athens, Greece

The Linked Data for Language Technology community is organising a roadmapping workshop on 21st March in Athens to build a better understanding of the potential synergies and co-evolution paths between linked data and language technologies such as machine translation, information extraction and sentiment analysis. Language technologies are key to extracting information from unstructured content in different languages to form linked data, while linked data can aid the discovery and sharing of the language resources that underpin language technologies.

Who should attend? Any organisation interested in automated extraction of data from unstructured digital content, especially content in more than one language and including multimedia as well as textual content. Organisations engaged in the market for language technologies applied beyond English-language content and data. All these can benefit from more open access to linked language resources.

How can you participate? You can register for the event here. If you wish to present a position statement, you can indicate this on your registration form. The event will then proceed in a structured open format to identify and capture from participants their use case priorities and the interoperability, best-practice and technology gaps they face. An online survey is currently open for gathering industry views on use case prioritisation. You can also contribute directly by joining the Linked Data for Language Technology community at the W3C.

Programme and Topics: The workshop will open with keynotes from Hans Uszkoreit, Scientific Director at DFKI; Nicoletta Calzolari, Director of Research at CNR; Phil Archer, who is leading the W3C Data Activity; and Asun Gomez-Perez of UPM, who is leading the LIDER coordination action on linguistic linked data. These will be followed by short briefings from four existing international communities working in this area, by position statements from companies about existing use cases, and by an open workshop session to establish use case priorities.

The language resource community has already made a concerted attempt to catalogue different data sets through the META-SHARE initiative. It has tackled the need for common meta-data for linguistic corpora of various types and has paid particular attention to encoding the different usage rights that exist across governmental, academic and commercial data sources. This initiative is therefore well primed to exploit linked data technologies being standardised by the W3C Data Activity to further open the cataloguing and discovery of language resources.

This is particularly timely as the European Commission has launched its new H2020 funding programme, with strong support available for innovation and research in the open data and language technology space. In April 2014 it will also launch its Connecting Europe Facility programme, with €1 billion for funding new pan-European digital services, including open data exchange and automated translation services. In both these initiatives, strong, open solutions for the interoperability of language resources as open web data will be key.

The workshop will take a use-case-driven approach to key questions around the synergies possible between the W3C’s open web data standards and existing approaches to sharing language resources and applying them to train language technologies:

  • How can language resource sharing infrastructure, such as META-SHARE, migrate to a linked data approach so as to benefit from more robust, decentralised and scalable publication and search features?

  • How well can existing linked data vocabularies, such as the Creative Commons Rights Expression Language and Linked Data Rights, support the usage rights models established for language resources?

  • How far can language resource meta-data be supported by the Data Catalogue Vocabulary or the Vocabulary of Interlinked Datasets?

  • How can emerging onto-lexical resources such as BabelNet be usefully interlinked with individual terms in existing language resources?

  • How can the process of locating and managing language resources to train language technologies be eased and optimised by vocabularies such as the Provenance Ontology or the Provenance and Plans Ontology for repeatable data workflows?

However, these are just a sample of the many issues and viewpoints that will have a bearing on the future of Linked Data for Language Technologies, and we hope you will be able to join us in Athens to share yours.

Regards,

Dave Lewis