20802125 - BIG DATA

The goal of the course is to present modern solutions for managing big data, that is, very large repositories of unstructured data. Starting from the requirements of modern database applications, the course will illustrate the hardware and software architectures recently proposed for the management and analysis of big data. Topics include cluster architectures, the MapReduce paradigm, cloud computing, NoSQL systems, and tools and languages for data analysis. Both theoretical and practical aspects will be covered. Students will experiment with the technologies discussed during practical classes and through assigned projects.

Programme

- Infrastructures and programming paradigms for big data
- The Hadoop Ecosystem
- Cloud computing
- Big data processing (MapReduce, Hive, Spark)
- NoSQL systems
- Big data analytics
- Data lakes
- Systems and applications
- Business seminars

Core Documentation

Pramod J. Sadalage, Martin Fowler. "NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence".
Lecture slides (available on the course website)

Type of delivery of the course

- Lectures
- Exercises
- Seminars
- Teamwork
- Case studies

Type of evaluation

Students are assessed through a written test at the end of the course and through projects assigned during the course, which are carried out in pairs. The test consists of a big data problem designed to verify students' understanding of the concepts and their ability to apply them in real contexts.