Operating Systems Design and Implementation

CPSC 436A 2024

In this course, students will gain a thorough understanding of the challenges and issues related to the design and implementation of modern multicore operating systems. They will apply and extend their knowledge of systems, software engineering, project management, and teamwork.

The course covers the design and implementation of various operating systems concepts such as memory management, scheduling, inter-process communication, inter-core synchronization, protection, device drivers, file systems, and networking. Moreover, the course pays particular attention to the design of system software architectures that differ from the traditional monolithic arrangements of Unix/Linux and Windows.

During the course, the students will work together in small groups to build a fairly complete operating system.

Credit: This course is based on 263-3800-00L @ ETH Zurich taught by Prof. Timothy Roscoe. Huge additional thanks to Reto Achermann, who made delivering this course at UBC possible.

Goals

Teach general operating systems principles, using a real research operating system to illustrate them and by reading the research papers which propose some of the ideas that the particular OS builds on.
Give a broader perspective on operating systems which do not look like Linux, Unix, or Windows.
Provide exposure to the practical experience of working on OS code on real “metal”, including debugging, hardware access, etc. This kind of experience is hard to gain merely from reading books or papers.
Introduce a sense of the complexity of a real OS, rather than simplified teaching OSes often used in more basic courses.

Prerequisites

This course is about the design and implementation of operating systems, and thus requires a strong background in computer systems, software engineering, and programming.

Students should have taken undergraduate systems and software engineering courses. Ideally this means the following:

CPSC 313 (or equivalent) with 85% or above.
CPSC 310 (or equivalent) with 85% or above.
CPSC 317 (or equivalent) with 85% or above.

(See the course catalogue for hard requirements)

The following additional skills are helpful for successfully completing the course:

Proficiency in C programming is assumed
Knowledge of computer architecture: caches, MMU, interrupts, devices
Not afraid to read assembly code
Navigate technical documentation and manuals
Familiarity with OS concepts: concurrency, asynchrony, process management, memory management, virtual memory, networking, filesystem
Linux development tools: shell, editors, make, gcc, gdb, …
General interest in system software and low-level hacking

Schedule

CPSC 436A requires regular, in-person oral examinations of course work during the Monday 1-4PM lab period. Before the course drop deadline, students must complete a scheduling survey with the course staff. Anyone who is unable to schedule their oral examination time will be denied permission to take the course and have their registration removed. Within the Monday 1-4PM period, we expect to have the flexibility to accommodate students’ scheduling constraints as long as they have at least one contiguous 30-minute slot available. Most oral examinations will be conducted in groups. It is the students’ responsibility to join a group with a compatible schedule. The oral examinations are a central element of this course, and no accommodation can be made.

The lectures and labs are designed around the class project. They are designed to help the students to succeed in completing each milestone and progress through the construction of their operating system.

Classes

Monday (1pm-4pm ICICS 005) – Tuesday (9am-11am MCML 166) – Thursday (9am–11am MCML 166)

Optional (office hours): Wednesday (2pm ICICS 238) – Friday (1pm ICICS 104)

Fill in the group registration form before September 17th.

This may seem like a lot of hours, but unlike a lot of courses, this course focuses on hands-on experience. You will spend a significant amount of those hours working as a team or in discussion with the instruction team. The activities during the term are divided between the following categories:

Lecture: During a lecture slot, we will cover the necessary background information, present an overview of the up-coming milestone, and provide hints and tips for the next project milestone. The students are expected to have at least looked at the next milestone. The students are encouraged to ask questions regarding the project during lectures.

Presence is strongly encouraged.

Lab: This is an office-hours-like lab session. Course staff will be present to answer project-related questions. During this slot, the students are expected to meet with their team and work on the project. Some groups will be asked to present the progress they have made on the current milestone.

Presence is encouraged. Presence is mandatory if your group is presenting. Failure to show up without giving notice will result in a zero in the associated assignment.

Tutorial: This is similar to the Lab sessions, but we’ll have a short tutorial with some additional information that may be useful for the project. The remainder of the session will be available to work on the project and ask questions.

Presence is encouraged.

Milestone demonstrations: During this slot the students will present and explain their milestone solutions to the course staff.

Presence is mandatory. Failure to show up without giving notice will result in a zero in the associated assignment.

Office Hours: During this slot you can ask questions about the course and/or your project.

Presence is optional.

Quizzes: Short knowledge tests held in the CBTF rooms.

Presence is mandatory. Failure to show up without giving notice will result in a zero in the associated assignment.

Week	Week of	Monday	Tuesday	Wednesday	Thursday	Friday
1	02 Sept				Lecture: Introduction
2	09 Sept (Reserve CBTF)	Lab: Environment Setup	Tutorial: Barrelfish and Tools		Lecture: Capabilities
3	16 Sept	M0 - Presentation; Drop deadline	Tutorial: Capability Operations; Group Formation Deadline	Office hours	Lecture: Memory Management	Office hours
4	23 Sept (Quiz 1)	Lab	Tutorial: Debugging and Autograder	Office hours	Lecture: Virtual Memory/Paging I	Office hours
5	30 Sept		Tutorial: Heap	Office hours	Lecture: Virtual Memory/Paging II	Office hours
6	07 Oct	M1 - Presentation	Lab	Office hours	Lecture: Process I	Office hours
7	14 Oct (Reserve CBTF)		Lab	Office hours	Lecture: Process II	Office hours; Self-reflection report
8	21 Oct	M2 - Presentation	Lecture: Process III	Office hours	Lecture: IPC	Office hours
9	28 Oct (Quiz 2)	Lab	Tutorial: IPC/LRPC	Office hours	Lab	Office hours
10	04 Nov (Reserve CBTF)	M3 - Presentation	Tutorial: LMP	Office hours	Lab	Office hours
11	11 Nov				Lecture: Multicore	Office hours; Mid-term report deadline
12	18 Nov (Quiz 3)	M4 - Presentation	Tutorial: Booting Core	Office hours	Lecture: User-level Message Passing	Office hours
13	25 Nov	M5 - Presentation	Tutorial: UMP	Office hours	Lecture: Research Talk	Office hours
14	02 Dec	M6 - Presentation	Student Presentations	Office hours	Student Presentations	Final report deadline; Final self-reflection report

Presenters

Hugo Lefeuvre

Postdoc

Praveen Gupta

Master's Student

Rut Vora

Master's Student

Shaurya Patel

PhD Student

Sid Agrawal

PhD Student

Yayu Wang

PhD Student

Graduate Student Presentations

Cache Side-Channel Attacks on Language Runtimes

Presenter: Yayu Wang

Abstract: Modern high-performance processors use a hierarchical memory system with multiple levels of caches, which store instructions and data for running applications and can be either private to or shared among different processors. Such sharing of the cache across different cores leads to a type of attack known as cache side-channel attacks, where the adversary monitors the victim’s usage of the shared cache to infer sensitive information, such as encryption keys.

Last-level cache side-channel attacks exploit the shared nature of the cache across multiple processors, and the requirement for hardware sharing makes cloud environments an ideal setting for such attacks. Many cache side-channel attacks assume a cloud environment, and focus on cryptography libraries compiled into native binary runtime. Language runtimes, despite being the most widely used runtimes in cloud platforms, have rarely been studied as potential targets for cache side-channel attacks. We present a novel attack vector that utilizes language runtimes as an intermediary to infer confidential data of the victim program. Our attack can overcome the noise introduced by the interpretation of the language runtime, and achieve bytecode-level observation. We evaluate our attack on QuickJS, a lightweight JavaScript engine, and we show that even a timing-balanced implementation of the RSA algorithm is vulnerable to our attack.

OSmosis: Modeling and Building Flexible Isolation Mechanisms

Presenter: Sid Agrawal

Abstract: Operating systems provide an abstraction layer between the hardware and higher-level software. Many abstractions, such as threads, processes, containers, and virtual machines, are mechanisms to provide isolation. New application scenarios frequently introduce new isolation mechanisms. Implementing each isolation mechanism as an independent abstraction makes it difficult to reason about the state and resources shared among different tasks, leading to security vulnerabilities and performance interference.

We present OSmosis, an isolation model that expresses the precise level of resource sharing, a framework in which to implement isolation mechanisms based on the model, and an implementation of the framework on seL4. The OSmosis model lets the user determine the degree of isolation guarantee that they need from the system. This determination empowers developers to make informed decisions about isolation and performance trade-offs, and the framework enables them to create mechanisms with the desired degree of isolation.

Towards safer and faster systems with compartmentalization and specialization

Presenter: Hugo Lefeuvre

Abstract: This short talk will give an overview of my research. At a high level, I aim to make systems - broadly construed as we like to say in Systopia - safer and faster. To that aim, I research two fundamental techniques: 1) compartmentalization, where we split programs into isolated and distrusting components to constrain potential attackers; and 2) specialization, where we optimize the design of systems software for a given application use-case or metric. Often mingling the two topics, I am interested in theoretical as well as engineering-heavy aspects. In this presentation, I will quickly go over my work on unikernels (Unikraft, EuroSys'21), flexible OS isolation (FlexOS, ASPLOS'22), interface vulnerabilities (NDSS'23), OS compatibility layer development (Loupe, ASPLOS'24), and systematizing compartmentalization (SoK, S&P'25). I am looking forward to a discussion with the audience.

Side Channels in CXL Memory Pooling Solutions

Presenter: Rut Vora

Abstract: Compute eXpress Link (or CXL) is a new interconnect standard explicitly designed for heterogeneous computing. The CXL 3.0 standard allows for up to 4096 hosts to connect to and access a part of a shared memory pool (CXL 3.0 Type-3 device). This would inevitably lead to contention at any of the common communication points (e.g. CXL switches, CXL controllers) leading to the CXL device. This contention can be used for a side-channel attack. Despite the CXL spec being in its 3rd major revision, no CXL devices are yet in the commercial market. However, the contention (and hence the side-channel attack) would also exist in the PCIe controller.

In this work, we develop a side channel using contention on the PCIe interconnect between a CPU and a GPU. We utilise MMIO on the GPU VRAM to have a behaviour similar to a CXL Type-3 (Memory Pool) device. We demonstrate that we can observe the contention on PCIe when two CPU cores independently write to (different areas of) the GPU memory. We outline a (work in progress) side-channel attack that leverages this contention to exfiltrate a victim process’s memory access or data transfer pattern. We also outline how this exfiltrated memory access pattern can be used to obtain the layer size of individual layers in a large neural network model.

A Developer-Centric Compliance Tool for Serverless Applications

Presenter: Praveen Gupta

Abstract: Serverless computing has emerged as a new paradigm that offers developers a streamlined approach to building and deploying cloud-native applications. These applications are characterized by ephemeral, stateless functions written in heterogeneous programming languages and relying on diverse cloud services for storage and communication. Although serverless computing reduces the burden of managing and scaling the infrastructure for cloud tenants, it makes it challenging to protect the application data from inadvertent leaks due to bugs, misconfigurations, and human errors. Existing cloud security tools, such as Identity and Access Management (IAM), lack observability into application-level data flows, while state-of-the-art dataflow tracking tools often require extensive platform modifications and impose substantial runtime overheads.

This work presents Growlithe, a developer-centric tool for serverless applications to enable continuous compliance with data policies by design. Growlithe allows declarative specification of access and data flow control policies over a language- and platform-independent dataflow graph abstraction of a serverless application. Growlithe enforces these policies efficiently using a hybrid approach which combines static checks with deferred runtime checks when necessary.

ExtMem: Enabling Application-Aware Virtual Memory Management for Data-Intensive Applications

Presenter: Shaurya Patel

Abstract: For over forty years, researchers have demonstrated that operating system memory managers often fall short in supporting memory-hungry applications. The problem is even more critical today, with disaggregated memory and new memory technologies and in the presence of tera-scale machine learning models, large-scale graph processing, and other memory-intensive applications. Past attempts to provide application-specific memory management either required significant in-kernel changes or suffered from high overhead. We present ExtMem, a flexible framework for providing application-specific memory management. It differs from prior solutions in three ways: (1) It is compatible with today’s Linux deployments, (2) it is a general-purpose substrate for addressing various memory and storage backends, and (3) it is performant in multithreaded environments. ExtMem allows for easy and rapid prototyping of new memory management algorithms, easy collection of memory patterns and statistics, and immediate deployment of isolated custom memory management.

Grading

Individual Grades (35%)

15% Quizzes. The grade for the three quizzes.

5% Milestone 0 Demonstration. These are the points awarded during the individual milestone presentation. The students are expected to demonstrate the required milestone functionality, and explain their design and implementation.

5% Early self-reflection. You will reflect on what you expect from the course, what you want to learn and how you want to grow.

10% Final self-reflection. You will look back at what happened during the class, how things diverged from your expectations and, more importantly, why.

Group Grades (65%)

25% Milestone Demonstration. These are the points awarded during the group milestone presentations. The students are expected to demonstrate the required milestone functionality, and explain their design and implementation.

5% Intermediate Group Report. This includes the quality and completeness of the intermediate group report.

10% Final Group Report. This includes the quality and completeness of the final group report.

20% Integration Test. These are the results of the integration tests run on the final submission.

5% Class Presentation. How well the group did during their presentation(s).
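
As a sanity check on the breakdown above, the individual and group weights add up to 100%. The following is a minimal sketch, assuming each component is scored as a fraction between 0 and 1; the component names and the `final_grade` helper are hypothetical labels chosen for illustration, not part of any course tooling, and extra-challenge points are not modelled.

```python
# Weights copied from the grading breakdown above (percent of the final grade).
WEIGHTS = {
    # Individual grades (35%)
    "quizzes": 15,
    "m0_demo": 5,
    "early_reflection": 5,
    "final_reflection": 10,
    # Group grades (65%)
    "milestone_demos": 25,
    "intermediate_report": 5,
    "final_report": 10,
    "integration_test": 20,
    "class_presentation": 5,
}


def final_grade(scores):
    """Combine per-component scores (fractions in [0, 1]) into a percentage.

    Hypothetical helper: assumes every component is scored independently
    and simply weighted as listed above.
    """
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Example: full marks everywhere except half of the integration test points.
example = {name: 1.0 for name in WEIGHTS}
example["integration_test"] = 0.5
print(sum(WEIGHTS.values()), final_grade(example))
```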

Further Information

Extra Challenges. Milestones come with extra challenges that award the student extra points and additionally provide the basis for a good reference letter. The completion of the extra challenges is not required for obtaining a maximum grade. To receive the extra points, the extra challenge must be presented before the normal deadline. No points will be awarded for completed extra challenges on late submissions, or if the base or target tasks have not been met.

All milestones must be completed.

This means presenting the stated base functionality for each of the milestones. This is extremely important as the milestones are cumulative. The base functionality is the minimum required to implement subsequent milestones.

Late submissions. Generally, submissions must be completed by the stated deadline. Late submissions will be graded at 75% (one week delay) and 50% (two weeks delay) of the base points respectively. To help the students prioritize the tasks, the course materials indicate the base functionality for each milestone. Due to the schedule and to ensure enough time for grading, there is no late hand-in for milestone M0 and the final submission.
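
The late-submission rule above amounts to a simple multiplier on the base points. The following is a minimal sketch, assuming delays are counted in whole weeks; the `late_base_points` helper and its `allows_late` flag are hypothetical names for illustration. Delays that the policy does not mention raise an error rather than guessing a value.

```python
# Multipliers taken from the late-submission policy above.
LATE_FACTORS = {0: 1.0, 1: 0.75, 2: 0.5}


def late_base_points(base_points, weeks_late, allows_late=True):
    """Return the base points credited for a submission `weeks_late` weeks late.

    Assumptions (not stated in the policy): delays are whole weeks, and
    `allows_late` is False for milestone M0 and the final submission.
    """
    if weeks_late > 0 and not allows_late:
        raise ValueError("no late hand-in for this submission (M0 and final)")
    if weeks_late not in LATE_FACTORS:
        raise ValueError("delay not covered by the stated policy")
    return base_points * LATE_FACTORS[weeks_late]


# Example: a milestone worth 20 base points, handed in one week late.
print(late_base_points(20, 1))
```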

Joker Cards. Life happens, and sometimes it’s good to get a little extra time.

Each group will receive two joker cards that extend the deadline of group milestones, no questions asked. The instructor will set the date for the postponed demonstration. There are no joker cards for the individual milestone at the beginning of the course.
The final hand-in deadline is a hard deadline and cannot be extended.

Joker cards must be redeemed the day before the deadline as posted on the schedule.

Why are there so many deadlines?

The goal is to cut the overall project into digestible chunks. The course staff can rapidly identify struggling students and provide adequate support. Furthermore, each individual submission is relatively low-stakes and you can easily course-correct when performing below your expectations. While this course is challenging and has a significant workload, you are extremely unlikely to fail if you put in the effort.

Use of third-party libraries, code and tools

This is not an algorithms course, or a coding interview. There won’t be any points awarded for correctly implementing a self-balancing tree, or a solution based on dynamic programming. The students are allowed to make reasonable use of third-party libraries.

The use of third-party code needs to be approved.

The use of third-party code (e.g., libraries, data structures, …) must be approved by the instructor and properly cited in the report. Standard plagiarism rules apply.

You may use third-party code in the project, or discuss with other students in the class. However, you must attribute and cite any third-party code that you use in your project. Moreover, you must implement a non-trivial fraction of the code on your own (or as a team). Using code from other students (teams) is not accepted. In any case, all members of the groups must be able to explain the design and implementation.

Code sharing.

The students agree not to publish their code on the Internet, or distribute it in other ways.

Generative AI. The course aims to keep up with a changing landscape, and generative AI (e.g., ChatGPT or Copilot) is likely to become one of the many tools available to developers. As such, the use of such tools is not prohibited. However, you should be able to understand all code you submit and be able to explain it. If you use generative AI during your project, you should explain clearly how you used it (e.g., prompt and any other setup) as well as the strategy you adopted to ensure the correctness of the generated code.

Please be aware that such tools are unlikely to lead to good results.

I would be happy to be proved wrong, especially if you report in detail how you got it to work well. However, please do not expect to use those tools at the last minute and to get your deliverable miraculously done. Getting such tools to work in the context of this project is likely to require as much effort as doing the actual work would have taken.

How to do well

Here we summarize some advice that may be useful to successfully complete the project.

Lectures

To get the most out of the lectures, we highly recommend reading the next chapter in the book before class. This enables an engaged discussion during class and helps you ask relevant questions.

The lectures are also your chance to ask questions about the next milestone early on in the week. So it’s good to think about the upcoming tasks before the lecture.

Project

Think before you code. Think about the functionality that needs to be implemented. Write down your architecture and keep design notes (this is helpful for your presentations and reports). This is especially important when you work as a team.

Read ahead. Each milestone builds upon the functionality of the previous one. It is a good idea to read a bit ahead to get an overview of the entire project.

Read the milestone descriptions carefully.

Are there any hints and explanations? (your answer should be: yes, there are)

What is the interface to be implemented?

What are the required functionalities?

You are significantly more likely to succeed if you read the instructions, follow the suggested task order, and identify the hints you are given.

Write tests. This is one way to make sure your code works as intended. This is especially important, as later milestones build upon earlier ones. Also, those tests are a fantastic way to demonstrate your milestones.

Git. Try to use version control wisely. Use meaningful and descriptive commit messages, and clean patches to help your teammates understand what you have done. Be strategic in how you use branches: we have seen students construct diverging branches that soon became impossible to merge. A significant portion of the difficulty inherent to this course is about working effectively with others.

Milestone Presentations

Each milestone will be demonstrated to one of the course staff. This will take about 15 minutes. The students will need to show that they have implemented all required functionality, and be prepared to answer questions about their implementation. Any member of a group should be able to explain what is going on.

Reports

Write the report as you go, or at least keep notes. You will need to submit 1-2 pages of notes on your design decisions, implementation details, performance evaluation, etc. with every group milestone.

Presentation

Make it entertaining and interesting, and do a demo. What has worked well? What have you learned? What turned out to be a bad decision? The class should be a supportive environment. You are going through the same (difficult?!) experience and it is a good opportunity to share! This class is not a zero-sum game and grades are unscaled: if everyone learns and finishes the class with an A+, it will be a resounding success.

Reading Material

Source code and other materials (the course book, manuals, specifications, etc.) will be distributed through a git repository. The students will use the provided git repository to submit their code and reports.

What do they say about the course?

Feedback from 2023 UBC students

This course is extremely interesting, and allows us to actually apply concepts we’ve learned to something practical, which I believe will stick with me for much longer than the standard lecture/exam format of other courses. I also enjoy that we get exposed to leading edge research in systems, rather than learning about ancient systems that will have to be phased out in the future.

Practical application of previous courses. Learn how to approach design problems and evaluate the advantages and disadvantages of designs.

By far the most I’ve learned in any one class at UBC. Pushed me to my limits and made me learn a lot about myself.

Out of all my courses at UBC I think this is the one I learned the most in! The focus on design really made me think carefully about the consequences of operating system decisions in a much more effective way than a lecture–focused class would have. I also liked that this course includes grading not just for how well the code works, but also for having an insightful analysis in the report/presentation. While other courses have gone almost entirely to autograded code assessments where code quality is not a priority, this course emphasized the importance of readable and maintainable code as my group had to often look back at the previous milestone work that we did. I think that this course reflects much more of a difficult real–world experience where many things are unclear and certain decisions may not necessarily be the best for future milestones.

Very practical course which is different to almost all other cpsc courses. gives a different perspective to what is taught in 313/213

The content is amazing – no other course at UBC like this. Its a struggle, but satisfying when you manage to figure things out. The difficulty is a good thing, this class should not be made easier at all.

Amazingly practical and useful content. Lots of office hours and tutorials to ask questions. Approachable prof and teaching assistants.

It has been a great pleasure taking this course with the professor. The content was amazing and the hands–on components of this class has been a great help at learning how operating systems work in modern contexts. The professor is very approachable and has open to questions during office hours. I would definitely take this course again.

Feedback from 2022 UBC students

It’s a self learning style course. You learn a ton by jumping into the deep end. The beginning of the course is extremely painful but gets better as time goes on.

This is one of the most interesting courses that I have taken. It has a very unique approach that allows you to think through new solutions. I am very happy that this course was added and it’s a lot better than the old OS Course.

The project, its complexities, and challenges were a highlight. While many students struggled, and many of us complained, I think we’ve grown so much as engineers and computer scientists.

I really liked the approach of this course since it is very different from what we have experienced in other classes at UBC, it teaches us what it takes to do real design and implementation of a fairly large system without just being told what to do at every stage. As a 4th year course, it also required us to use our knowledge from other courses which I found to be a great strength, especially when many other CS courses seem very disjoint.

I did have completed 310 and 313 but I still feel that I am not well–prepared for this course. To be honest, the workload for this course is much larger than other courses I have taken.

Best CS course I’ve taken! With the vast majority of my courses, I come out feeling unsatisfied, like the assignments are trivial toy problems, that I didn’t accomplish or learn much, or that I could’ve read a textbook and picked it up on my own in a few weeks. But with 436A I feel like I learned a ridiculous amount in just a couple of months and it made me feel like doing a degree in CS was worth it, and that this was an experience I wouldn’t have gotten by messing around with computers in my free time. Overall, I was really happy with the class :)

It’s too much work, it’s about 3 course load worth of work. I spend at most 4 hours finishing assignments from CPSC 314 and CPSC 322 every other week. For this course, I’m spending 30+ hours per week (all tracked via wakatime extension).

This is definitely the most difficult course I’ve taken, the project is very time consuming and the concepts are difficult. The course is centered around a project which forces you to experiment and focus on design rather than just following a recipe. As a result, you are rewarded with a much deeper understanding of OS concepts that were introduced in CPSC 313.

The course was paced quite aggressively. For most milestones I found that it took me about half the allotted time for the content to simply sink in and to figure out what exactly I was supposed to do. By that point, there was very little time left so my code was often messier than I would have liked.

Weekly feedback and answered questions well without giving away answers so that we could work through the problems ourselves. Probed us with questions to guide us in the right direction. Also provided good lectures.

At Monad I got to learn and work with Rust. I built their RPC server, which CPSC 436A was really handy for ;-)

Past Grade Distribution
