Overview: This deep-dive explains how operating systems and virtualization work from the hardware level upward. It covers how a CPU executes processes, how memory is virtualized using paging, and how user mode and kernel mode enforce protection and isolation. It extends these concepts to virtualization, showing how hypervisors create virtual machines by multiplexing CPU, memory, and I/O using hardware support such as Intel VT-x and AMD-V.

1. OS Recap: What is a Process?

Definition (core idea): A process is an executing instance of a program: it bundles an address space (code + data + stack), CPU context (registers, PC), OS resources (open files, sockets), and metadata (PID, owner, scheduling priority).

“Virtualization creates virtual processes inside virtualized operating systems. So understanding process structure is necessary to understand how a VM works.”
“A process is not a program. A program is a passive file on disk. A process is a running, executing, resource-owning entity created by the OS.”

CPU Reality: The CPU can only execute machine instructions. Everything else (files, windows, networks) is an abstraction built on top of them.

Process Equation: Process = Code + Data + Heap + Stack + CPU context + OS resources.
Executable: A file of machine code and initial data that the OS loads to create a process.
OS: Allocates memory image, creates PCB, loads program, manages execution.
Virtualization: Creates a fake machine on which a real OS creates real processes.
Understanding process internals is the foundation of hypervisors, VMs, containers, and cloud systems.

Key Structures & Mechanisms

  • Process Control Block (PCB): PID, registers, program counter, state, open file descriptors, memory map, parent/children, credentials.
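  • Address Space Layout: text (code), data (initialized globals), BSS (zero-initialized globals), heap, stack, mapped files.
  • States: New → Ready → Running → Terminated, with Running → Waiting (I/O) → Ready on blocking and Running → Ready on preemption.
  • Context Switch: The OS saves the current process’s context (registers, PC) into its PCB and loads the next process’s context from its PCB. Cost: register save/restore, TLB flushes (avoidable when TLB entries are tagged with an address-space ID), cache effects.
  • Fork/Exec (UNIX): fork() duplicates the calling process (using copy-on-write to avoid copying memory up front); exec() replaces the address space with a new program.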
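
The PCB, state model, and context switch described above can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions: the names PCB, CPU, State, TRANSITIONS, and context_switch are hypothetical, and a real kernel stores far more per process and performs the switch in architecture-specific assembly.

```python
from dataclasses import dataclass, field
from enum import Enum


class State(Enum):
    NEW = "new"
    READY = "ready"
    RUNNING = "running"
    WAITING = "waiting"
    TERMINATED = "terminated"


# Legal transitions in the classic five-state process model.
TRANSITIONS = {
    State.NEW: {State.READY},
    State.READY: {State.RUNNING},
    State.RUNNING: {State.READY, State.WAITING, State.TERMINATED},
    State.WAITING: {State.READY},
    State.TERMINATED: set(),
}


@dataclass
class PCB:
    """Hypothetical, heavily simplified process control block."""
    pid: int
    state: State = State.NEW
    pc: int = 0
    registers: dict = field(default_factory=dict)

    def move_to(self, new_state):
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state


@dataclass
class CPU:
    pc: int = 0
    registers: dict = field(default_factory=dict)


def context_switch(cpu, current, nxt):
    """Save the CPU context into `current`, then load `nxt` onto the CPU."""
    current.pc, current.registers = cpu.pc, dict(cpu.registers)
    current.move_to(State.READY)
    cpu.pc, cpu.registers = nxt.pc, dict(nxt.registers)
    nxt.move_to(State.RUNNING)


cpu = CPU()
a, b = PCB(pid=1), PCB(pid=2, pc=0x400)
a.move_to(State.READY)
b.move_to(State.READY)
a.move_to(State.RUNNING)
cpu.pc, cpu.registers = 0x104, {"rax": 7}  # process A has been running for a while
context_switch(cpu, a, b)
print(a.state.name, hex(a.pc), b.state.name, hex(cpu.pc))  # READY 0x104 RUNNING 0x400
```

The transition table rejects moves such as Waiting → Running, which mirrors the rule that a process must pass through the ready queue before it is scheduled again.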

Analogy: The Virtual Office

Virtual Memory (The Desk): Imagine a worker (Process) with a desk that seems infinitely long and clean. They just reach for items. This is the “Virtual Address Space”.

Physical Memory (The Warehouse): In reality, items are stored in boxes in a chaotic warehouse. The worker doesn’t see this complexity.

Page Table (The Index Book): Maps “Item on Desk” to “Box #42 in Warehouse”.

TLB (Sticky Notes): Quick notes for frequently used items to avoid looking up the big index book every time.

Page Fault: If an item isn’t on the desk or nearby, work stops (Fault) until support staff brings it from long-term storage.


2. Types of Hypervisors

Type 1 Hypervisor (Bare-Metal)

Runs directly on physical hardware without an intermediate host operating system. It provides a minimal, high-performance control layer.

  • Core Characteristics: Installed on bare-metal, no host OS, optimized for security, scalability, and performance.
  • Common Implementations: VMware ESXi, Microsoft Hyper-V, KVM, Xen.
  • Use Cases: Enterprise data centers, Public Cloud (AWS Nitro, Azure, GCP).
  • Why Cloud? Near bare-metal performance, strong tenant isolation, support for hardware virtualization extensions (Intel VT-x, AMD-V).
Figure 1: Type 1 Hypervisor running directly on hardware.

Type 2 Hypervisor (Hosted)

Runs as an application on top of an existing host OS (like Windows or macOS). Relies on the host for hardware management.

  • Core Characteristics: Installed on top of Host OS, hardware access mediated by host, optimized for convenience.
  • Common Implementations: Oracle VirtualBox, VMware Workstation, Parallels Desktop.
  • Use Cases: Local development, learning/experimentation, application compatibility.

3. Virtual Memory & Paging

Virtual memory provides each process with a private, contiguous address space while the OS/MMU transparently maps virtual addresses to physical memory using paging structures.

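  • Pages: Virtual memory is divided into fixed-size pages and physical memory into frames of the same size (usually 4 KB).
  • Page Tables: Hierarchical data structures map virtual pages to physical frames.
  • TLB (Translation Lookaside Buffer): Hardware cache of recent translations, so most accesses avoid a page-table walk.
  • Page Faults: Occur when a process accesses a page that is unmapped or not present in memory. The kernel handles the fault by allocating a frame or loading the page from disk, or terminates the process if the access is invalid.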
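
To make the interaction between pages, the page table, the TLB, and page faults concrete, here is a minimal sketch in Python. It assumes a flat (single-level) page table, a tiny FIFO TLB, and a fault handler that simply allocates the next free frame. ToyMMU and its fields are hypothetical names, and real hardware uses multi-level tables and different replacement policies.

```python
PAGE_SIZE = 4096   # 4 KB pages, as in the text
OFFSET_BITS = 12   # log2(PAGE_SIZE)


class ToyMMU:
    def __init__(self, tlb_capacity=2):
        self.page_table = {}   # virtual page number -> physical frame number
        self.tlb = {}          # small cache of recent page_table entries
        self.tlb_capacity = tlb_capacity
        self.next_free_frame = 0
        self.stats = {"tlb_hits": 0, "tlb_misses": 0, "page_faults": 0}

    def _handle_page_fault(self, vpn):
        # Demand paging: allocate a frame on first touch.
        # A real kernel might instead load the page from disk.
        self.stats["page_faults"] += 1
        self.page_table[vpn] = self.next_free_frame
        self.next_free_frame += 1

    def translate(self, vaddr):
        vpn, offset = vaddr >> OFFSET_BITS, vaddr & (PAGE_SIZE - 1)
        if vpn in self.tlb:
            self.stats["tlb_hits"] += 1
        else:
            self.stats["tlb_misses"] += 1
            if vpn not in self.page_table:
                self._handle_page_fault(vpn)
            if len(self.tlb) >= self.tlb_capacity:
                self.tlb.pop(next(iter(self.tlb)))  # evict the oldest entry (FIFO)
            self.tlb[vpn] = self.page_table[vpn]
        return (self.tlb[vpn] << OFFSET_BITS) | offset


mmu = ToyMMU()
first = mmu.translate(0x5010)    # page 5, offset 0x10: TLB miss and page fault
second = mmu.translate(0x5FFF)   # same page: TLB hit, no fault
print(hex(first), hex(second), mmu.stats)
# 0x10 0xfff {'tlb_hits': 1, 'tlb_misses': 1, 'page_faults': 1}
```

The offset passes through translation unchanged; only the page number is replaced by a frame number.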
Figure 2: Mapping Virtual to Physical Addresses.
Figure 3: Address Translation Mechanism.
Paging Detail
Figure 4: Demand Paging Workflow.

Virtualization Considerations

  • Two-Level Address Translation: Guest Virtual Address (GVA) → Guest Physical Address (GPA) → Host Physical Address (HPA).
  • Hardware-Assisted Nested Paging: EPT (Intel) and NPT (AMD) allow the MMU to perform both translation stages in hardware, improving performance.
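
The GVA → GPA → HPA chain can be illustrated with two flat page tables, one owned by the guest OS and one by the hypervisor. This is a sketch under simplifying assumptions: walk, translate_gva, and nested_walk_refs are hypothetical helper names, and the tables are plain dictionaries rather than multi-level structures. The reference count uses the commonly quoted worst case for a nested walk with no TLB or walk-cache hits.

```python
PAGE_SIZE = 4096


def walk(page_table, addr):
    """Translate one address through a flat page table (page number -> frame number)."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in page_table:
        raise KeyError(f"fault: page {page:#x} is not mapped")
    return page_table[page] * PAGE_SIZE + offset


def translate_gva(guest_page_table, nested_page_table, gva):
    gpa = walk(guest_page_table, gva)    # stage 1: mapping managed by the guest OS
    hpa = walk(nested_page_table, gpa)   # stage 2: mapping managed by the hypervisor
    return gpa, hpa


def nested_walk_refs(guest_levels, host_levels):
    """Worst-case memory references for a nested walk with no cached translations.

    Each guest page-table pointer is itself a guest physical address that needs
    a full host walk, which gives (g + 1) * (h + 1) - 1 references.
    """
    return (guest_levels + 1) * (host_levels + 1) - 1


guest_pt = {0x10: 0x2}     # guest virtual page 0x10 -> guest "physical" page 0x2
nested_pt = {0x2: 0x7A}    # guest physical page 0x2 -> host physical frame 0x7A
gpa, hpa = translate_gva(guest_pt, nested_pt, 0x10ABC)
print(hex(gpa), hex(hpa), nested_walk_refs(4, 4))  # 0x2abc 0x7aabc 24
```

With four-level tables on both sides the worst case is 24 memory references, which is why TLBs and page-walk caches matter even more under nested paging.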

4. User Mode vs. Kernel Mode

Modern CPUs enforce multiple privilege levels (rings) to separate user applications from the OS kernel.

  • User Mode (Ring 3): Unprivileged. Restricted instruction set. No direct hardware access.
  • Kernel Mode (Ring 0): Privileged. Full access to hardware, memory, and system resources.

The Trap: Transitions from user to kernel mode happen via System Calls, Faults, or Interrupts. The CPU switches privilege levels, jumps to the OS handler, and then returns.
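
The following toy model sketches the mode switch described above. It is not how a real CPU is programmed: ToyCPU, out_port, and the one-entry syscall table are hypothetical, and the mode field stands in for the hardware privilege level.

```python
USER, KERNEL = 3, 0  # ring numbers from the text


class PrivilegeFault(Exception):
    pass


class ToyCPU:
    def __init__(self):
        self.mode = USER
        self.log = []
        self.syscall_table = {"write": self._sys_write}

    def out_port(self, data):
        """Stand-in for a privileged instruction that touches hardware."""
        if self.mode != KERNEL:
            raise PrivilegeFault("privileged instruction in user mode")
        self.log.append(data)

    def _sys_write(self, data):
        self.out_port(data)
        return len(data)

    def syscall(self, name, *args):
        """Trap: raise privilege, run the kernel handler, then return to user mode."""
        saved_mode = self.mode
        self.mode = KERNEL
        try:
            return self.syscall_table[name](*args)
        finally:
            self.mode = saved_mode


cpu = ToyCPU()
try:
    cpu.out_port("hi")   # direct hardware access from user mode is refused
except PrivilegeFault as exc:
    print("fault:", exc)
print(cpu.syscall("write", "hi"), cpu.mode, cpu.log)  # 2 3 ['hi']
```

User code never sets the mode itself; the only way into kernel mode is through an entry point the kernel registered, which is the property that makes system calls a controlled gate.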

Figure 5: User Mode vs Kernel Mode Code Location.

5. Interrupt and Trap Handling in Virtualized Systems

Interrupts (Asynchronous): Hardware events (timer, network) independent of the current instruction.
Traps (Synchronous): Caused by the instruction itself (system call, page fault, divide by zero).

Figure 6: What happens upon a trap.
Figure 7: Trap Handler Execution.

Virtualization-Specific Design

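  • Trap-and-Emulate: The guest kernel runs deprivileged, so privileged instructions trap to the hypervisor (a “VM Exit” on hardware with virtualization extensions), which emulates them against the VM’s virtual state and resumes the guest. Each round trip saves and restores CPU state, so frequent exits are expensive.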
  • Interrupt Virtualization: Hypervisor injects virtual interrupts into the Guest VM.
  • Paravirtualization: Guest OS is modified to use “Hypercalls” instead of sensitive instructions, reducing overhead.
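
A toy interpreter can show why hypercalls reduce overhead compared with trapping on every privileged instruction. Everything here is an assumption made for illustration: the instruction names, the ToyHypervisor class, and the idea that one hypercall carries a batch of requests (batching is one common way paravirtual interfaces cut exits, not the only one).

```python
PRIVILEGED = {"CLI", "STI", "OUT"}   # hypothetical instruction names for the sketch


class ToyHypervisor:
    def __init__(self):
        self.vm_exits = 0
        self.virtual_if = True    # the guest's virtual interrupt-enable flag
        self.device_output = []

    def emulate(self, op, arg=None):
        """Apply a privileged instruction's effect to virtual state, not real hardware."""
        if op == "CLI":
            self.virtual_if = False
        elif op == "STI":
            self.virtual_if = True
        elif op == "OUT":
            self.device_output.append(arg)

    def run_guest(self, program):
        for op, arg in program:
            if op in PRIVILEGED:
                self.vm_exits += 1       # trap: control transfers to the hypervisor
                self.emulate(op, arg)
            elif op == "HYPERCALL":
                self.vm_exits += 1       # one explicit call carrying a batch of requests
                for batched_op, batched_arg in arg:
                    self.emulate(batched_op, batched_arg)
            # Unprivileged instructions run directly on the CPU with no exit.


unmodified = [("ADD", None), ("CLI", None), ("OUT", "a"), ("OUT", "b"), ("STI", None)]
paravirt = [
    ("ADD", None),
    ("HYPERCALL", [("CLI", None), ("OUT", "a"), ("OUT", "b"), ("STI", None)]),
]

hv1, hv2 = ToyHypervisor(), ToyHypervisor()
hv1.run_guest(unmodified)
hv2.run_guest(paravirt)
print(hv1.vm_exits, hv2.vm_exits, hv1.device_output == hv2.device_output)  # 4 1 True
```

Both guests end in the same virtual state, but the modified guest reaches it with one transition into the hypervisor instead of four.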
Figure 8: Segmentation.
Figure 9: Use of Segmentation.

Analogy: Interrupts & Traps

Interrupt (Phone Ringing): You are working. The phone rings (external event). You pause, place a bookmark, answer, then resume.

Trap (Mistake/Help): You make a calculation error or need approval. You stop work and go to the supervisor (Kernel) to fix it.

Figure 10: Segmentation in modern systems.
Segmentation Detail

6. The Computing Stack & VMM

Figure 11: The Computing Stack.

The Stack

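  • API: Source-level interface that applications use to call libraries and OS services.
  • ABI: Binary-level interface between compiled programs and the OS (system call numbers, calling conventions, executable format).
  • ISA: Instruction Set Architecture, the interface between software and the hardware.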
Figure 12: Interfaces (API, ABI, ISA).
Figure 13: Security Rings and Privilege Modes.

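VMM (Virtual Machine Monitor) & Popek-Goldberg Theorem

Theorem: A VMM can be built efficiently via trap-and-emulate if the sensitive instructions (those that read or change privileged machine state) are a subset of the privileged instructions (those that trap when executed in user mode).

x86 Issue: Historically, x86 did not satisfy this. Some sensitive instructions did not trap in user mode (for example, POPF silently ignores changes to the interrupt flag instead of faulting), making pure trap-and-emulate impossible.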
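
The subset condition is easy to express with sets. The sketch below uses small illustrative samples of historical x86 instructions, not a complete classification of the ISA, and the function names are hypothetical.

```python
def is_classically_virtualizable(sensitive, privileged):
    """Popek-Goldberg condition: every sensitive instruction must also be privileged."""
    return sensitive <= privileged


def non_trapping(sensitive, privileged):
    """Sensitive instructions that would run in user mode without trapping."""
    return sensitive - privileged


# Small illustrative subsets, not a complete classification of the x86 ISA.
x86_privileged = {"LGDT", "LIDT", "HLT", "MOV_TO_CR3"}
x86_sensitive = x86_privileged | {"POPF", "SGDT", "SIDT", "SMSW"}

print(is_classically_virtualizable(x86_sensitive, x86_privileged))  # False
print(sorted(non_trapping(x86_sensitive, x86_privileged)))
# ['POPF', 'SGDT', 'SIDT', 'SMSW']
```

The instructions in the difference set are exactly the ones a hypervisor cannot intercept by trapping, which is what the techniques in the next list work around.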

Techniques to Virtualize x86

  • Paravirtualization (e.g., Xen): Rewrite Guest OS to use Hypercalls. High performance, requires modified OS.
  • Full Virtualization (e.g., VMware): Binary translation of non-virtualizable instructions. Works with unmodified OS but higher overhead.
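  • Hardware Assisted (e.g., KVM/QEMU): The CPU adds VMX operation with Root and Non-Root modes (Intel VT-x; AMD-V provides an analogous host/guest split). The guest runs in Non-Root Ring 0 and the hypervisor in Root mode, so sensitive guest instructions trigger VM Exits without binary translation or guest changes.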
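
As a practical aside, on an x86 Linux host the hardware extensions show up as CPU flags in /proc/cpuinfo: vmx for Intel VT-x and svm for AMD-V. The helper below is a small sketch that parses such text. The function name and the sample string are made up for illustration, and other architectures report their features differently.

```python
def virtualization_support(cpuinfo_text):
    """Return 'VT-x', 'AMD-V', or None based on the CPU flags line(s)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())
            if "vmx" in flags:
                return "VT-x"
            if "svm" in flags:
                return "AMD-V"
    return None


# Hypothetical sample in the style of /proc/cpuinfo on x86 Linux.
sample = "model name\t: Example CPU\nflags\t\t: fpu vme pae vmx ept\n"
print(virtualization_support(sample))  # VT-x

# On a Linux host, the real data could be read like this:
# with open("/proc/cpuinfo") as f:
#     print(virtualization_support(f.read()))
```

A missing flag does not always mean the CPU lacks the feature; it can also be disabled in firmware or hidden from a guest by an outer hypervisor.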
Figure 14: Hardware Assisted Virtualization (VMX).
Figure 15: Memory Virtualization techniques.

7. Popular Virtualization Technologies Comparison

Technology         Type             Primary Use     Unique Features
VMware ESXi        Type 1           Enterprise      Mature, vMotion, HA.
Microsoft Hyper-V  Type 1           Azure/Windows   Deep Windows integration.
AWS Nitro          Type 1 (Light)   AWS EC2         Offloads virtualization to hardware cards.
KVM                Type 1 (Kernel)  Linux/Cloud     Built into Linux, open-source.
Docker             Container        App Deployment  Shares host kernel, fast startup.
Kubernetes         Orchestrator     Cloud Native    Automated scaling/healing.
Firecracker        MicroVM          Serverless      Secure like VM, fast like container.

References & Further Reading