Session on Multilingual Data Value Chains

Call for contributions

If you are interested in giving a short presentation, please send your contribution to asun@fi.upm.es by 12 June 2015.

Title: Session on Multilingual Data Value Chains

Event: Big Data Value Association Summit

Date of the session: Friday 19 June 2015 at 11:45.

In Europe, linguistic and cultural diversity is a reality. This invaluable richness, however, affects the free flow of digital data and services across national borders because of current language barriers. Many stakeholders across Europe (companies, citizens, governments and cultural organizations) are hampered in their daily operations by the simple fact that information and services are expressed in different languages. All of these stakeholders would benefit from a much more transparent flow of information and data, enabling them to find, access, evaluate and consume any source of digital information regardless of their native language. A European data economy with language barriers is far from integrated, connected or universal; such barriers lead to further commercial fragmentation of European markets, restrictions on cultural exchange across borders, less efficient public administration procedures between countries, and so on. New technologies, products and services able to handle multilingual data value chains are needed to foster the efficient use of multilingual data across digital services and national borders.

This session will explore the various facets of multilingualism in the data value chain and identify the most urgent topics and challenges to be solved. Addressing these and other language-related issues can help to identify existing problems and to propose new mechanisms and guidelines, or adapt the ones in use, for analysing, publishing and using multilingual data and for creating multilingual data value chains.

There is a growing number of data sets in different natural languages, and guidelines and mechanisms are needed to ensure the quality and organic growth of this emerging multilingual big data network. In this session, participants will explore:

  • business challenges and opportunities related to multilingual data value chains
  • the lifecycle of multilingual data in the data value chain
  • representation of language-related features in data sets at the metadata and data levels
  • data management across different languages
  • methods, techniques and tools for multilingual content analytics
  • methods, techniques and tools for enhanced and more sophisticated navigation through multilingual data sets
  • methods, techniques and tools for enriching monolingual data sets into multilingual data sets
  • methods, techniques and tools for cross-lingual question answering (a user could pose a query in one language, retrieve the data stored in data sets in other languages, and receive the answer in the user's native language)
  • how language resources and language technologies help to deal with multilingual data
  • the role of selected standards in creating multilingual data sets and multilingual data value chains
  • the role of Linguistic Linked Data in multilingual data value chains
  • what Linguistic Linked Data can contribute to cross-media multilingual content analytics
  • how multilingual (linked) data could help to provide metrics to evaluate quality aspects of a data set

Addressing these and other language-related issues can help to identify existing problems, discuss opportunities for multilingual data sets in different big data application domains, propose new mechanisms and guidelines (or adapt the ones in use) for analysing, publishing, aggregating and navigating multilingual data, and, ultimately, provide metrics to evaluate quality aspects.

Agenda

  1. Introduction (A. Gómez-Pérez)
  2. Multilingualism in the Data Value Chain
    • Technology challenges: lifecycle
    • Business challenges and opportunities
  3. Lightning talks to elicit requirements
    • Companies state their requirements for dealing with multilingual data in the data value chain in 2 or 3 slides. Company presentations are not allowed.
  4. Lightning talks on technologies and existing resources for multilingual data value chains
    • Language Resources
    • Linguistic Linked Data
    • BabelNet
    • Guidelines and best practices for multilingual data
    • Technologies
      • Machine Translation
      • Multilingual content analytics
      • Navigation through multilingual data sets
      • Enriching monolingual data sets into multilingual data sets
      • Cross-lingual question-answering
      • Quality metrics
  5. Role of standardization bodies
    • W3C (Felix Sasaki)
    • ISO
  6. Discussion