Master's level
- CD122A Databases (7.5 credits)
- CD631E Artificial Intelligence for Data Science (15 credits)
- CM661E Exploratory Data Analysis, Visualization and Storytelling (7.5 credits)
CTDVA / Computer Science
A1F / Second cycle, has second-cycle course/s as entry requirements
The course is part of the degree requirements for a Master of Science in Engineering in Computer Science and Engineering (specialisation Applied Data Science)
The aim of this course is for the student to develop an in-depth understanding of big data analytics on cloud computing infrastructures and of how software is made available in cloud services. In group projects, the student will also develop their ability to handle big data processing using tools such as Apache Spark.
The course contains the following elements:
- Ecosystem for big data processing
- Large-scale data storage (including cloud file systems, cloud object stores, archival storage)
- Data analytics with Apache Spark
- Spark’s programming model with RDD
- Spark applications with Hadoop/AWS
- Spark SQL
- Alternatives to SQL-based databases for big data
- Streaming with Spark
- Machine learning with Spark MLlib
- Advanced real-world applications with Spark
Knowledge and understanding
Upon completion of the course, the student shall be able to:
1. demonstrate in writing an in-depth understanding of the data flow programming model for distributed computations for Big Data applications,
2. distinguish between traditional and large-scale database management systems, and
3. describe components and programming models used in building big data analysis systems.
Competence and skills
Upon completion of the course, the student shall be able to:
4. use cloud-based platforms and implement techniques for large-scale data management,
5. analyse large-scale data management problems and construct data-driven models,
6. integrate trained models with cloud-based services,
7. develop applications of big data analytics by working in groups, and
8. present work within Big Data Analytics on Cloud Computing Infrastructures orally and in writing.
Judgement and approach
Upon completion of the course, the student shall be able to:
9. assess the characteristics of large-scale data frameworks and determine when such frameworks are applicable or not.
Lectures, computer laboratories, seminars, project work and self-study
The following are required to pass the course:
- passing grade on report and oral presentation in group project (7 credits, Pass/Fail) (Intended learning outcomes 4–9)
- passing grades on lab session assignments (3 credits, Pass/Fail) (Intended learning outcomes 4, 6)
- passing grade on written examination (5 credits, UA) (Intended learning outcomes 1–3, 9)
For all assessments, the materials must be presented in a manner that makes it possible to discern individual performance.
The final grade corresponds to the grade of the written examination.
- Amirgodshi, S., Rajendran, M., Hall, B. & Mei, S. (2017). Mastering Machine Learning with Apache Spark 2.x. Packt Publishing.
- Teller, A., Pumperla, M. & Malohlava, M. (2015). Advanced Analytics with Spark: Patterns for Learning from Data at Scale. O’Reilly.
- A collection of scientific articles will be used in addition to the above literature.
Malmö University provides students who participate in, or who have completed, a course with the opportunity to express their opinions and describe their experiences of the course by completing a course evaluation administered by the University. The University will compile and summarise the results of course evaluations. The University will also inform participants of the results and any decisions relating to measures taken in response to the course evaluations. The results will be made available to the students (HF 1:14).
If a course is no longer offered, or has undergone significant changes, the students must be offered two opportunities for re-examination based on the syllabus that applied at the time of registration, for a period of one year after the changes have been implemented. The syllabus is a translation of a Swedish source text.
If a student has a learning support decision, the examiner has the right to provide the student with an adapted test, or to allow the student to take the exam in a different format.