After successful completion of the course, students are able to:
(1) Assess how data from different domains can be viewed as a graph, compare graph-structured representations of data, and translate such representations to and from related data modeling formalisms like relational databases and RDF triple stores.
(2) Write, read, and understand specifications of domain knowledge for a given graph data set using RDF-S, SHACL, and different variants of the OWL profiles.
(3) Formulate information needs as queries over knowledge graphs using adequate query languages: conjunctive queries, unions of conjunctive queries, and their extensions with regular path expressions.
(4) Identify which query languages can express a given query, and transform queries between representations.
(5) Choose which of the mentioned specification formalisms is adequate for a given setting, taking into account the completeness of data, information needs, and expected computational efficiency of access.
(6) Describe an algorithm for validating a knowledge graph comprising knowledge in any of these languages, and for accessing the data. For example inputs of moderate size, the student will be able to answer queries or validate targets manually.
(7) Compare the studied querying and validation algorithms, argue which techniques are adequate for which formalisms, and provide an explanation of their computational complexity.
(8) Given an inconsistent knowledge graph, list its repairs and compute the answers to a given query over standard variations of the repair semantics.
(9) Given a description of a domain and of some possibly heterogeneous incomplete data sources, write an OBDI specification to construct a virtual knowledge graph.
(10) Given a query and an OBDI specification, explain how to compute the answers over the represented virtual knowledge graphs.
The course studies several semantic technologies and the way they can be used for integrating and accessing data, especially data that cannot be easily handled with legacy techniques because it may be incomplete, inconsistent, or heterogeneous, and expensive to integrate and maintain. We will study specification languages like RDF-S, SHACL, and the OWL profiles, as well as the SPARQL query language. These formalisms are studied in some detail, comparing their abstract syntax, their semantic assumptions, and their core algorithms for validation and query evaluation. We will see how RDF-S, SHACL, and OWL can be used to validate graph data, and to obtain useful knowledge graphs from data that may be incomplete and heterogeneous. We will study how these graphs can be queried in (fragments of) the SPARQL query language, some of the algorithmic and computational challenges that result from different choices of formalisms, and solutions for querying both virtual and inconsistent knowledge graphs.
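To give a first impression of what "querying a graph" means here, the following is a minimal sketch of conjunctive query answering over a tiny graph stored as triples. It is an illustration only, not the algorithm presented in the lectures: the data, the names `TRIPLES`, `match_atom`, and `answer`, and the convention that variables start with `?` are all assumptions made for this example.

```python
# Toy graph as a set of (subject, predicate, object) triples.
# All node and edge names are invented for illustration.
TRIPLES = {
    ("alice", "teaches", "kg_course"),
    ("bob", "attends", "kg_course"),
    ("carol", "attends", "kg_course"),
    ("carol", "attends", "logic_course"),
}


def is_var(term):
    """Assumed convention: terms starting with '?' are variables."""
    return term.startswith("?")


def match_atom(atom, triple, binding):
    """Extend `binding` so that `atom` matches `triple`; return None on failure."""
    new = dict(binding)
    for term, value in zip(atom, triple):
        if is_var(term):
            # Bind the variable, or check consistency with an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def answer(query_atoms, answer_vars, triples=TRIPLES):
    """Naive evaluation: join the atoms one by one over all triples."""
    bindings = [{}]
    for atom in query_atoms:
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := match_atom(atom, t, b)) is not None
        ]
    # Project each complete binding onto the answer variables.
    return {tuple(b[v] for v in answer_vars) for b in bindings}


# Conjunctive query: who attends a course taught by alice?
QUERY = [("alice", "teaches", "?c"), ("?s", "attends", "?c")]
print(sorted(answer(QUERY, ["?s"])))  # [('bob',), ('carol',)]
```

The sketch tries every triple for every atom, so it is only suitable for examples of the size that can also be worked out by hand.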
Topic 1: Data Model for Graph-structured Data (GSD)
1.1 The basic model of GSD
1.2 Property graphs
Topic 2: Querying graphs
2.1 Query answering
2.2 Complexity of query answering
Topic 3: Querying RDF graphs
Topic 4: Querying with Knowledge
4.1 Inconsistency
Topic 5: OWL Ontologies
Topic 6: SHACL
Topic 7: Ontology-based data integration and virtual KGs
The course has three components.
(1) *Lecture videos* (asynchronous) Video presentations of the lecture materials will be posted online every week for asynchronous watching.
(2) *Online discussion* (attendance optional; synchronous; tentative schedule Tue 9:00-10:00) In a weekly Zoom meeting, I will comment on the materials posted the week before, answer questions, and discuss exercises. Some of these meetings will include online quizzes, which I will announce a week in advance. The quizzes are optional and serve as an alternative to the exam.
(3) *Exercises* Optional exercises will be posted, to be solved independently. I will provide feedback on the solutions I receive. Correct solutions can also contribute to the mark, as an alternative to the exam.
ECTS breakdown:
Lectures: 18 hours (9 lectures of 2 hours each)
Exercises: 30 hours (3 exercise sheets, each 9 h homework + 1 h discussion)
Quizzes: 27 hours (3 quizzes, each 8 h study + 1 h quiz)
---- Total: 75 hours
*Grades* You can choose between two forms of grading: (A) You can earn your grade during the semester by showing, through exercises and quizzes, that you have mastered the material of all course units. (B) An exam at the end of the semester (written + oral). In option (A) you can retake quizzes and improve your grades for previous units.
More details will be announced in the introductory lecture on Tuesday, 12.10., 14:00 on Zoom.