Data, Knowledge, and the Web

The advent of large-scale data on the Web and elsewhere poses new challenges and opportunities. Concepts, models, and algorithms from several fields, including database systems, information retrieval, natural language processing, statistical learning, and data mining, can help us analyze and learn from this data.

Groups and Researchers in this Field

Text + Time Search & Analytics

Klaus Berberich coordinates the Text+Time Search and Analytics research area in the Databases and Information Systems Department at the Max Planck Institute for Informatics, focusing on developing efficient, effective methods to search and analyze natural language texts that come with associated temporal information. This may include temporal expressions, which convey time periods a text refers to, as well as publication timestamps. Data of interest include web archives, newspaper corpora, and other collections of born-digital or now-digital documents. Implementing and experimentally evaluating methods on real-world data is integral to the group’s approach. Recent and ongoing efforts include time-travel text search, algorithms to compute n-gram statistics at large scale, and redundancy-aware retrieval models.

Klaus Berberich

MPI-INF, Senior Researcher


Machine Learning and Large-scale Data Mining Methods

Manuel Gomez Rodriguez is a research group leader at the Max Planck Institute for Software Systems. He is interested in developing machine learning and large-scale data mining methods for the analysis and modeling of large real-world networks and the processes that take place over them. His research comprises several dimensions: developing models of these networks and processes and assessing their theoretical properties and limitations; developing machine learning algorithms to fit the models and computational methods to influence processes over networks; and validating models and methods on gigabyte- and terabyte-scale real-world datasets. Ultimately, he aims to provide computational tools with applications in a variety of domains, e.g., social and information sciences, economics, decision theory, causality, and epidemiology.

Manuel Gomez Rodriguez

MPI-SWS, Faculty


Knowledge Base Construction and Quality

Simon Razniewski leads the Knowledge Base Construction and Quality area at the Databases and Information Systems Department of the Max Planck Institute for Informatics. A main objective of his research is to construct domain-specific knowledge bases, incorporating all relevant stages such as entity recognition, taxonomy construction, and fact extraction, and to extend traditional knowledge base models, for instance by adding counting quantifiers, negative information, or quality metadata. He is also interested in applications of knowledge bases, in particular in areas such as question answering and image enrichment, and in the extraction and consolidation of common-sense knowledge, e.g., the appearance and properties of everyday objects.

Simon Razniewski

MPI-INF, Senior Researcher


Question Answering

Rishiraj Saha Roy leads the research group on “Question Answering” in the Databases and Information Systems Department at the Max Planck Institute for Informatics. Research on question answering (QA) aims to provide direct answers to natural language utterances over curated knowledge graphs, structured databases, unstructured Web text, or a combination of the above. In our group, we have tried to push the state of the art in QA along multiple dimensions. The key driving criteria have been handling diversity in question formulations and complexity in information needs, and providing unsupervised, interpretable, and robust solutions that are not constrained to specific settings and benchmarks.

Rishiraj Saha Roy

MPI-INF, Senior Researcher


Bridging AI and Neuroscience

Mariya Toneva’s research is at the intersection of Machine Learning, Natural Language Processing, and Neuroscience. Her group bridges language in machines with language in the brain, with a focus on building computational models of language processing in the brain that can also improve natural language processing systems. Prior to joining MPI-SWS, she conducted research as a C.V. Starr Fellow at the Princeton Neuroscience Institute. She received her Ph.D. in a joint program between Machine Learning and Neural Computation from Carnegie Mellon University.

Mariya Toneva

MPI-SWS, Faculty


Exploratory Data Analysis

Jilles Vreeken is a senior researcher in the Databases and Information Systems Department at the Max Planck Institute for Informatics, and leads the Exploratory Data Analysis independent research group at the Cluster of Excellence on Multimodal Computing and Interaction. His research focuses on exploratory data mining: developing theory and algorithms to identify interesting structures within given data. Of particular value here are statistical methods, such as the information-theoretic principles of minimum description length and maximum entropy. He also develops efficient algorithms to extract these structures from large and complex data, and investigates how they can be used in a range of applications, such as identifying rare diseases, e-health, bioinformatics, market analysis, and product recommendation.

Jilles Vreeken

MPI-INF, Senior Researcher


Knowledge Harvesting

Gerhard Weikum is a Research Director at the Max Planck Institute for Informatics, where he leads the Databases and Information Systems Department. He is also an adjunct professor in the Department of Computer Science of Saarland University, and a Principal Investigator of the Cluster of Excellence on Multimodal Computing and Interaction. The long-term objective of his research is to develop methodology for knowledge discovery: collecting, organizing, searching, exploring, and ranking facts from a wide array of structured, semistructured, and textual information sources, which may exhibit varying levels of credibility. His group’s approach towards this goal combines concepts, models, and algorithms from several fields, including database systems, information retrieval, statistical learning, and data mining.

Gerhard Weikum

MPI-INF, Scientific Director


Searching, Mining, and Learning with Informal Text

Andrew Yates is a senior researcher in the Databases and Information Systems Department at the Max Planck Institute for Informatics, where he leads the Searching, Mining, and Learning with Informal Text research group. In contrast to authoritative information sources, like encyclopedias, news articles, and academic papers, much of the information available on the Web is contained in informal text that requires different strategies to interpret. His research group aims to develop methods for searching, mining, and learning with such text so that it may be integrated with other knowledge. This goal spans both information retrieval and natural language processing tasks, such as mining health-related claims from social media, extracting information from dialogue, and learning to identify relevant spans of text. On the information retrieval side, the group is particularly interested in leveraging recent advances in deep learning to develop more powerful retrieval models and to learn fine-grained types of relevance, including task-specific and passage-level relevance.

Andrew Yates

MPI-INF, Senior Researcher


Research at Partner Universities

Data Engineering Group

Database Systems Research Group

Big Data Research Area

Database and Information Systems Group