188.980 Advanced Information Retrieval

2021S, VU, 2.0h, 3.0EC
TUWEL

Properties

  • Semester hours: 2.0
  • Credits: 3.0
  • Type: VU Lecture and Exercise
  • Format: Online

Learning outcomes

After successful completion of the course, students are able to implement advanced and state-of-the-art concepts of Information Retrieval. More specifically, the students should:

  • Gain a fundamental understanding of how (web) search engines (like Google, Bing, Lucene, Elasticsearch, …) work
  • Learn how to efficiently search a large number of text documents and rank them according to their relevance with respect to a given query
  • Learn how we create and analyze datasets to confidently evaluate search results

  • Learn about Deep Neural Networks and how they can be used to create sequential text representations,
    i.e. Word Embeddings, CNNs, RNNs, and contextualized language models (Transformers, BERT, etc.)
  • Learn how neural nets can be used in the IR context in web and other domains to:
    • Re-rank passages and documents in a search pipeline
    • Embed and retrieve passages with a vector-based nearest neighbor index

Information Retrieval is the science behind search technology. Certainly, the most visible instances are the large Web Search engines, the likes of Google and Bing, but information retrieval appears everywhere we have to deal with unstructured data (e.g. free text).

A paradigm shift. Starting in 2019, the Information Retrieval research field began an enormous paradigm shift towards BERT-based language models. Used in various forms and trained on large-scale data, these models have brought huge leaps in search result quality. This course aims to showcase a slice of these advances in state-of-the-art IR research, which point towards the next generation of search engines.

Differences to the Grundlagen des IR course (188.977). The basic concepts of IR (inverted index, text pre-processing, etc.) are taught in detail in the Grundlagen course and will only be briefly refreshed in the advanced course. Our main focus is Machine Learning, Deep Learning, and Contextualized Language Models, topics that the Grundlagen course does not cover.

Subject of course

Our Online Format

  • Communication & Materials via TUWEL

  • Weekly YouTube uploads of recorded lectures
    • 45min to 1 hour each 🙌
    • Additionally: PDF slides + automatic closed caption text 🎉
  • Flexible grading structure 👌

  • Online office hours for exercises & lectures

  • 24h take-home exam (2 dates offered)

ECTS-Breakdown

Lectures & Background Reading (20 h)

  • Introduction
  • 2x Crash course IR

  • 2x Machine learning & data annotation

  • 4x NLP & Neural ranking

Exercises (50 h)

  • Exercise #1 (Data annotation): 5 h
  • Exercise #2 (Neural re-ranking in PyTorch): 45 h

Exam (1-4 h)

  • Exam: 1-4 h

Total (75 h)

Teaching methods

Programming Neural Networks in PyTorch

Mode of examination

Immanent

Lecturers

Institute

Examination modalities

Exercise and Exam (24h take-home exam)

Exams

Day  Time           Date        Room                       Mode of examination  Application time                     Application mode  Exam
Tue  17:00 - 19:00  04.06.2024  GM 1 Audi. Max.- ARCH-INF  written              12.03.2024 00:00 - 03.06.2024 12:00  TISS              Exam (1st date)
Mon  14:00 - 16:00  17.06.2024  EI 7 Hörsaal - ETIT        written              12.03.2024 00:00 - 16.06.2024 12:00  TISS              Exam (2nd date)

Course registration

Begin             End               Deregistration end
30.01.2021 00:00  10.03.2021 23:59  29.03.2021 23:59

Curricula

Study Code                                         Obligation          Semester  Precon.  Info
066 645 Data Science                               Not specified
066 926 Business Informatics                       Mandatory elective
066 932 Visual Computing                           Mandatory elective
066 935 Media and Human-Centered Computing         Not specified
066 937 Software Engineering & Internet Computing  Mandatory elective

Literature

No lecture notes are available.

Preceding courses

Miscellaneous

Language

English