Wednesday, 13 December 2017

Conference on Semantics, Data and Analytics

Last week I attended the Bayer-hosted conference on Semantics, Data and Analytics. It was a high-profile event with many interesting invited speakers, including Harald Sack from the Karlsruhe Institute of Technology (giving a nice intro to the Semantic Web), Steffen Lohmann from Fraunhofer IAIS on visual analytics, Martin Hoffmann-Apitius on data integration for biology, and myself on vertical knowledge graphs.

The question of how to create knowledge graphs cost-effectively for the purpose of data integration was in the air throughout. I was struck by how prominent the topic of data integration was, not only at Bayer but at all the pharmaceutical companies represented there. In my talk, titled 'Domain-specific knowledge graphs for knowledge management and knowledge discovery', I emphasized that every data integration activity requires clear use cases and competency questions to scope the project adequately and get the most out of the data. But cost remains an issue. Semantics and ontologies are key to data integration, providing the following five principles of semantic data integration:


  • Normalization: Semantics is inherently reductionist, abstracting from details and focusing on commonalities. In practice, this is achieved by mapping data to an agreed-upon set of IDs and vocabulary elements.
  • Reuse: Normalization is achieved by reusing vocabulary and IDs. Do not invent your own IDs or vocabulary elements if suitable ones already exist; otherwise, integration will simply not happen.
  • Commitment: Commitment means agreeing to understand a concept in the same way as the stakeholder who introduced it. This works without formal axiomatic definitions, just as it does when we speak: we exchange messages in natural language without formally agreeing on the definition of every single word.
  • Grouping: Normalization and typing allow us to group different entities together at a certain level of abstraction. This is key for aggregation (see below), that is, computing summaries over data grouped according to some criterion.
  • Aggregation: The ultimate goal of any semantic data integration exercise. At the end of the day, we are less interested in single data points than in aggregating all data points or entities that share certain characteristics or features, and in providing informative summaries and statistics for the aggregated elements.
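To make normalization, reuse, grouping and aggregation a bit more tangible, here is a minimal sketch in Python. The IDs (`drug:0001`, ...), the type assignments and the numeric values are all hypothetical, chosen only for illustration; commitment is a social agreement and therefore has no counterpart in the code beyond the shared mapping table itself.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical shared vocabulary: each local label used by a source is mapped
# to one agreed-upon ID (normalization by reusing existing IDs).
SHARED_IDS = {
    "aspirin": "drug:0001",
    "acetylsalicylic acid": "drug:0001",
    "ibuprofen": "drug:0002",
}

# Hypothetical typing of the shared IDs; types let us group entities.
TYPES = {
    "drug:0001": "NSAID",
    "drug:0002": "NSAID",
}


def normalize(records):
    """Map (local label, value) pairs to (shared ID, value); drop unmapped labels."""
    normalized = []
    for label, value in records:
        shared_id = SHARED_IDS.get(label.lower())
        if shared_id is not None:
            normalized.append((shared_id, value))
    return normalized


def aggregate(records, key):
    """Group (shared ID, value) pairs by key(shared ID) and summarize each group."""
    groups = defaultdict(list)
    for shared_id, value in records:
        groups[key(shared_id)].append(value)
    return {k: {"n": len(vs), "mean": mean(vs)} for k, vs in groups.items()}


# Two hypothetical sources that name the same substance differently.
source_a = [("Aspirin", 2.0), ("Ibuprofen", 4.0)]
source_b = [("acetylsalicylic acid", 3.0)]

integrated = normalize(source_a) + normalize(source_b)
by_id = aggregate(integrated, key=lambda shared_id: shared_id)
by_type = aggregate(integrated, key=TYPES.get)
print(by_id)    # {'drug:0001': {'n': 2, 'mean': 2.5}, 'drug:0002': {'n': 1, 'mean': 4.0}}
print(by_type)  # {'NSAID': {'n': 3, 'mean': 3.0}}
```

Without the mapping step, "Aspirin" and "acetylsalicylic acid" would be counted as two unrelated entities, and the type-level summary would not be possible at all.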


I also talked about the challenges of incorporating unstructured, textual information into knowledge graphs via text mining. Errors are unavoidable when using machine reading and information extraction techniques. On the other hand, machine reading lets us ingest information from text at a speed and scale that no human could ever match.

So where is the sweet spot in the trade-off between being able to "machine read" a large number of documents and having to live with errors?
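One common way to explore this trade-off is to filter machine-read facts by the extractor's confidence score and check the result against a small manually verified sample. The sketch below assumes such confidence scores and such a sample exist; the facts and labels are invented toy data, not results from any real system.

```python
# Hypothetical machine-read facts: (fact, extractor confidence, manually verified?)
SAMPLE = [
    ("drugA treats diseaseX", 0.95, True),
    ("drugA inhibits geneY", 0.90, True),
    ("drugB treats diseaseX", 0.80, True),
    ("drugB causes diseaseZ", 0.70, False),
    ("drugC treats diseaseY", 0.60, True),
    ("drugC inhibits geneY", 0.40, False),
]


def tradeoff(sample, threshold):
    """Return (number of facts kept, fraction of kept facts that are correct)."""
    kept = [ok for _, conf, ok in sample if conf >= threshold]
    if not kept:
        return 0, None
    return len(kept), sum(kept) / len(kept)


for t in (0.9, 0.75, 0.5, 0.0):
    n, precision = tradeoff(SAMPLE, t)
    shown = "n/a" if precision is None else f"{precision:.2f}"
    print(f"threshold={t:.2f}  kept={n}  precision={shown}")
# threshold=0.90  kept=2  precision=1.00
# threshold=0.75  kept=3  precision=1.00
# threshold=0.50  kept=5  precision=0.80
# threshold=0.00  kept=6  precision=0.67
```

Lowering the threshold ingests more facts but admits more errors; where to stop depends on how costly an error is for the use case at hand.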

The panel after the morning talks on 7 December was very informative and lively. There were very interesting discussions on the role of foundational/upper ontologies in data integration; the cost of integrating data with knowledge graphs compared to a standard data warehouse approach; the challenge of dealing with datasets and vocabularies that constantly evolve; how to implement quality assurance and quality control over an evolving knowledge graph; and how to involve users effectively in this process. Big questions!

It was a great conference, and a pleasure to speak to a very interesting and knowledgeable audience. The Post-its everywhere and all the brainstorming going on were really inspiring and fruitful. When is the next edition?

Sunday, 1 October 2017

The impact of AI on customer relationship management

In a recent report, the International Data Corporation (IDC) estimates that artificial intelligence (AI) applied to customer relationship management (CRM) could boost global business revenues by around $1.1 trillion between 2017 and 2021.

In particular, AI-driven CRM could lead to the creation of 800,000 direct jobs and 2 million indirect jobs. 2018 is likely to turn out to be the major year for AI adoption.

Significant share of work activities replaceable by analytics and machine learning

In its 2015 study "Four Fundamentals of Workplace Automation", McKinsey found that 45% of 2,000 work activities performed across all occupations in the economy, associated with $14.6 trillion in wages, have the potential to be automated on the basis of machine learning technology.

Potential of analytics remains high, but progress is slow

In their report "The age of Analytics: competing in a data-driven world" from December 2016, McKinsey concludes that the potential of analytic technologies remains as high as identified in their 2011 report "Big data: The next frontier for innovation, competition, and productivity". Nevertheless, progress and adoption have been slower than anticipated in their 2011 report.

Thursday, 25 May 2017

Detecting right-wing extremism online

Hate speech, fake news and right-wing agitation: anyone who looks at discussions on social networks or in comment sections quickly comes across questionable posts. Right-wing extremists discovered Web 2.0 as a tool for their propaganda long ago. Social media helps them to network and to recruit new followers.

For our cooperation partners and clients at the Kompetenzzentrum Rechtsextremismus (KomRex, Competence Centre for Right-Wing Extremism) at the University of Jena, social networks are therefore an interesting field of research.

Wednesday, 22 February 2017

Cognitive Computing: How intelligent are machines?

The EU Parliament is currently debating whether robots should in future count as "electronic persons": "persons" who have rights and obligations and who might even be held liable when they make a mistake [1]. The proposal is intended to create a legal basis for a future in which self-driving cars make decisions on their own and in which machines might develop consciousness, perhaps without our even noticing.

But what does that mean: making decisions on their own? And what constitutes consciousness?

Monday, 6 February 2017

Customer opinions in real time: the Social Media Kompass provides orientation


At a glance
  • With social media analytics, companies can find out in real time what people think about their products. To do this, computers analyze posts on social networks.
  • So far, it has been difficult for computers to understand ambiguous, context-dependent or domain-specific statements.
  • Semalytix therefore adapts its Social Media Kompass individually for each client. The algorithms are tailored to the respective use case and are therefore more accurate.

"Cool Gucci street style!" "My new Mustang… really fast!" "Damn, my rain jacket isn't waterproof!" Every second, users around the world post thousands of new tweets on Twitter [1]. They exchange views on private and political topics, on TV shows and newspaper articles, and also on products, brands and trends. For companies, this is a great opportunity to learn more about their customers. Without technological support, however, this is hardly possible. That is why we at Semalytix work on social media analytics: analyzing the huge volumes of data from social networks automatically and presenting the results to decision-makers in a useful form.
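Why tailoring to the use case matters can be illustrated with a deliberately simple lexicon-based sentiment scorer applied to English versions of the example posts. This is not Semalytix's actual method; the word lists and the one-word negation rule are hypothetical assumptions made for the sketch.

```python
import re

# Hypothetical general-purpose sentiment lexicon (word -> polarity).
GENERAL_LEXICON = {"cool": 1, "great": 1, "damn": -1, "bad": -1}

# Hypothetical domain-specific additions for a client selling cars and outdoor
# clothing: in that context "fast" and "waterproof" are positive.
DOMAIN_LEXICON = {"fast": 1, "waterproof": 1}
MERGED_LEXICON = {**GENERAL_LEXICON, **DOMAIN_LEXICON}

NEGATIONS = {"not", "isn't", "no", "never"}


def score(post, lexicon):
    """Sum word polarities; a negation flips the next word found in the lexicon."""
    total, negate = 0, False
    for token in re.findall(r"[a-z']+", post.lower()):
        if token in NEGATIONS:
            negate = True
        elif token in lexicon:
            total += -lexicon[token] if negate else lexicon[token]
            negate = False
    return total


POSTS = [
    "Cool Gucci street style!",
    "My new Mustang… really fast!",
    "Damn, my rain jacket isn't waterproof!",
]

for post in POSTS:
    print(score(post, GENERAL_LEXICON), score(post, MERGED_LEXICON), post)
# 1 1 Cool Gucci street style!
# 0 1 My new Mustang… really fast!
# -1 -2 Damn, my rain jacket isn't waterproof!
```

With the general lexicon alone, the Mustang post looks neutral; only the domain-specific entry for "fast" recognizes it as praise, and the negated "waterproof" turns the jacket post more clearly negative.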