After successful completion of the course, students are able to implement advanced and state-of-the-art concepts of Information Retrieval. More specifically, the students should:
- Gain a fundamental understanding of how (web) search engines (such as Google, Bing, Lucene, Elasticsearch, …) work
- Learn how to efficiently search a large number of text documents and rank them according to their relevance with respect to a given query
- Learn how we create and analyze datasets to confidently evaluate search results
- Learn about Deep Neural Networks and how they can be used to create sequential text representations
i.e. Word Embeddings, CNNs, RNNs, and contextualized language models (Transformers, BERT, etc.)
- Learn how neural networks can be used in the IR context, in web search and other domains, to:
  - Re-rank passages and documents in a search pipeline
  - Embed and retrieve passages with a vector-based nearest neighbor index (a minimal sketch follows this list)
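To give a flavor of the dense retrieval goal above, here is a minimal sketch of vector-based nearest neighbor search. The random "embeddings" and the placeholder encoder are assumptions for illustration only; in the course a trained neural encoder produces the vectors, and a proper approximate nearest neighbor index would replace the brute-force cosine similarity.

```python
import numpy as np

# Toy stand-in for a neural passage encoder: in practice the vectors come from
# a trained model (e.g. a BERT-based encoder); here we use random placeholders.
rng = np.random.default_rng(42)
dim = 8
passages = ["vienna is the capital of austria",
            "bm25 is a classic ranking function",
            "transformers power modern re-rankers"]
passage_vecs = rng.normal(size=(len(passages), dim))

def encode(text: str) -> np.ndarray:
    # Placeholder query encoder: a pseudo-random vector derived from the text.
    seed = abs(hash(text)) % (2**32)
    return np.random.default_rng(seed).normal(size=dim)

def search(query: str, k: int = 2):
    """Brute-force nearest neighbor search with cosine similarity."""
    q = encode(query)
    sims = passage_vecs @ q / (np.linalg.norm(passage_vecs, axis=1) * np.linalg.norm(q))
    top = np.argsort(-sims)[:k]
    return [(passages[i], float(sims[i])) for i in top]

print(search("what is the capital of austria"))
```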
The course content is based on the lecture that received the 🏆 Best Distance Learning Award 2021 & 🏆 Best Teacher Award 2021.
Information Retrieval is the science behind search technology. The most visible instances are certainly the large web search engines, such as Google and Bing, but information retrieval appears wherever we have to deal with unstructured data (e.g. free text).
A paradigm shift. Starting in 2019, the Information Retrieval research field underwent an enormous paradigm shift towards utilizing BERT-based language models (start here) in various forms, which, combined with large-scale training data, led to huge leaps in search result quality. This course aims to showcase a slice of these advances in state-of-the-art IR research towards the next generation of search engines (a great & (almost) up-to-date overview is here).
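As a small illustration of this shift (not part of the official course material), the following is a hedged sketch of BERT-based re-ranking with the sentence-transformers library; the checkpoint name is one publicly available MS MARCO cross-encoder and stands in for whatever model a real pipeline would use.

```python
# Sketch of BERT-based re-ranking with a cross-encoder.
# Assumes: pip install sentence-transformers; the checkpoint below is one
# publicly available example, not necessarily the one used in the course.
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "how do neural re-rankers work"
candidates = [
    "Cross-encoders score a query and a passage jointly with a BERT-style model.",
    "Vienna is the capital of Austria.",
]

# One relevance score per (query, passage) pair; higher means more relevant.
scores = model.predict([(query, passage) for passage in candidates])
ranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
print(ranked)
```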
Differences to the Grundlagen des IR course (188.977). The basic concepts of IR (inverted index, text pre-processing, etc.) are taught in detail in the Grundlagen course and will only be briefly refreshed in the advanced course. Our main focus will be Machine Learning, Deep Learning, and Contextualized Language Models, which are not covered in the Grundlagen course.
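For readers who have not taken the Grundlagen course, here is a minimal sketch of the inverted index concept that gets refreshed here. The toy corpus and the whitespace tokenizer are simplifying assumptions; real systems apply much richer text pre-processing before indexing.

```python
from collections import defaultdict

# Toy corpus; real pipelines add stemming, stop-word removal, etc.
docs = {
    0: "information retrieval is the science behind search",
    1: "an inverted index maps terms to the documents containing them",
    2: "neural models re-rank the documents an inverted index returns",
}

# Build the inverted index: term -> set of document ids.
index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

def boolean_and(*terms):
    """Return the ids of documents containing all query terms."""
    postings = [index[t] for t in terms]
    return set.intersection(*postings) if postings else set()

print(sorted(boolean_and("inverted", "index")))  # -> [1, 2]
```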
Our Online Format
- Communication & Materials via TUWEL & GitHub
- YouTube uploads of recorded lectures
- 45 minutes to 1 hour each
- Additionally: PDF slides + automatic closed caption text
- Online office hours for exercises and lectures
- Exam (3 dates offered)
ECTS Breakdown
- Lectures & background reading: 25 h
- Exercises: 45 h
  - Exercise #1 (Data annotation): 5 h
  - Exercise #2 (Neural re-ranking in PyTorch): 40 h
- Exam (incl. preparation): 5 h
- Total: 75 h