General Functions - Maple Circuit

# "New" mounting API

Mostly adds better logging and the possibility of future upgrades.

[Finishing the conversion to the "new" mount API [LWN.net]](https://lwn.net/Articles/979166/)

# Lockdep

Lockdep (short for Lock Dependency) is a debugging tool built into the Linux kernel. It is designed to detect potential deadlocks that could occur due to inconsistent lock ordering. If Lockdep detects a circular dependency in lock acquisition order, it reports this as a potential deadlock situation.

Source for the big brains: [Runtime locking correctness validator — The Linux Kernel documentation](https://www.kernel.org/doc/html/latest/locking/lockdep-design.html)

# Atomic write

Atomic writes refer to the ability of a system call to write a contiguous block of data to a file without interruption.

[Understanding Atomic Writes with Syscalls in Linux (onexception.dev)](https://onexception.dev/news/1306560/atomic-writes-in-linux-syscalls)

# Block Device

/dev/sda is a block device, meaning a device that is accessed in blocks. The block subsystem takes care of the communication between the device and the code using it (often FS code).

PS: a block device's block size cannot be bigger than the page size (normally 4K).

[Block Device Drivers — The Linux Kernel documentation (linux-kernel-labs.github.io)](https://linux-kernel-labs.github.io/refs/heads/master/labs/block_device_drivers.html)

# Secure VM Service Module (SVSM)

**AMD ONLY.** Uses functions like those provided by [[AMD/Terms#Secure Encrypted Virtualization (SEV)|SEV]]. As the host cannot be trusted, SVSM acts as a pipe between the firmware and the guest VM that bypasses the host.

[The Linux SVSM project [LWN.net]](https://lwn.net/Articles/921266/)

[Secure VM Service Module for SEV-SNP Guests (amd.com)](https://www.amd.com/content/dam/amd/en/documents/epyc-technical-docs/specifications/58019.pdf)

# Fast CPPC

Fast Collaborative Processor Performance Control is a driver that permits optimizing the frequency on a per-core basis to get more performance from the same amount of power.
[AMD Fast CPPC To Be Merged For Linux 6.11 - Phoronix](https://www.phoronix.com/news/AMD-Fast-CPPC-For-Linux-6.11)

# SLAB

SLAB allocators are a type of memory allocator used for objects smaller than a page. They aim to:
- reduce fragmentation
- optimize through caching of commonly used objects
- align objects with the CPU cache

[Slab Allocator (kernel.org)](https://www.kernel.org/doc/gorman/html/understand/understand011.html)

Once upon a time there were 3 SLAB allocators in Linux:
- SLOB, dropped in 6.4
- SLAB, dropped in 6.8
- SLUB, the last one remaining

[What's next for the SLUB allocator [LWN.net]](https://lwn.net/Articles/974138/)

# FineIBT

Fine Indirect Branch Tracking is a hardware-enhanced version of [[General Functions#Control Flow Integrity (CFI)|CFI]]. It works with Intel CPUs; the ARM64 counterpart is called Branch Target Identification (BTI).

[Indirect branch tracking - Wikipedia](https://en.wikipedia.org/wiki/Indirect_branch_tracking)

Enabled by default in 6.2.

[Linux Moving Ahead With Enabling Kernel IBT By Default - Phoronix](https://www.phoronix.com/news/Linux-IBT-By-Default-Tip)

# Control Flow Integrity (CFI)

Using a smart compiler (LLVM), the kernel can be compiled with a check (jump table) that confirms a function is jumping/returning to a valid address and not into an infected program.

[Control-flow integrity - Wikipedia](https://en.wikipedia.org/wiki/Control-flow_integrity)

# Pages

One unit of memory for the Linux kernel. Useful info: [Page Tables — The Linux Kernel documentation](https://docs.kernel.org/mm/page_tables.html)

# Page writeback

# Dirty page

If data is written, it is first written to the Page Cache and managed as one of its _dirty pages_. _Dirty_ means that the data is stored in the Page Cache, but still needs to be written to the underlying storage device.

Writeback throttling is the act of delaying writes to the storage device so as not to overload it. It is enabled by default.
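As a rough illustration of the dirty pages described above, the sketch below reads the `Dirty` and `Writeback` counters that Linux exposes in `/proc/meminfo`. The helper names `parse_meminfo` and `dirty_summary` and the `SAMPLE` text are made up for this example; the sample is used as a fallback when `/proc/meminfo` is not available.

```python
from pathlib import Path

# Hypothetical sample in /proc/meminfo format, used when the real file is absent.
SAMPLE = (
    "MemTotal:       16384000 kB\n"
    "Dirty:               312 kB\n"
    "Writeback:             0 kB\n"
)


def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def dirty_summary(text: str) -> dict[str, int]:
    """Return the amount of dirty data and data currently being written back."""
    info = parse_meminfo(text)
    return {key: info.get(key, 0) for key in ("Dirty", "Writeback")}


if __name__ == "__main__":
    path = Path("/proc/meminfo")
    text = path.read_text() if path.exists() else SAMPLE
    print(dirty_summary(text))
```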
[Linux Page Cache Basics - Thomas-Krenn-Wiki-en](https://www.thomas-krenn.com/en/wiki/Linux_Page_Cache_Basics)

# Page fault

A page fault is an exception raised by the memory management unit when a process accesses data within its address space that is not currently loaded (or mapped) in physical memory.

Minor page fault:
1. The page is already in physical memory (for example, shared memory has already loaded it) but is not yet mapped for this process.

Major page fault:
1. The page is not in physical memory and has to be read from storage first. A HW error that prevents reading, like a bad HDD sector, makes this step fail.

Invalid page fault:
1. Attempt to read/write an address that you don't have the permission to read/write.

[Understanding and troubleshooting page faults and memory swapping: Site24x7](https://www.site24x7.com/learn/linux/page-faults-memory-swapping.html)

# Transparent Huge Pages (THP)

System to allocate huge pages automatically; the only other way is to ask the kernel directly for them.

- Standard page size: 4K
- Possible huge page sizes: 2M, 4M, 1G (depends on the CPU)

Made to optimize the use of the [[General Functions#Translation Lookaside Buffer (TLB)|TLB]]. Added in Linux 2.6.38.

[Linux_2_6_38 - Linux Kernel Newbies](https://kernelnewbies.org/Linux_2_6_38#Transparent_huge_pages)

mTHP: Multi-size THP gives the ability to declare huge pages that are:
1. Bigger than a simple page (4K)
2. Smaller than a standard huge page
3. A power of 2

Added in Linux 6.8.

[Linux_6.8 - Linux Kernel Newbies](https://kernelnewbies.org/Linux_6.8#Memory_management)

[Transparent Hugepage Support — The Linux Kernel documentation](https://www.kernel.org/doc/html/next/admin-guide/mm/transhuge.html)

# Translation Lookaside Buffer (TLB)

With virtual memory in place, **every** memory access requires a lookup (performed directly in the _Memory Management Unit_ (MMU) in the CPU) from virtual to physical address. The page table can become large and is usually a multi-layered data structure. Performing a lookup in this large table for every single memory access would be prohibitively slow.
Therefore, the MMU keeps a _Translation Lookaside Buffer_ (TLB), which is essentially a cache of recently-used entries from the page tables. Modern CPUs usually have a multi-level TLB (similar to data caches), so one can’t simply state a size of “the TLB”. As an example: the top-level data TLB in a Skylake CPU has 64 entries [1]. Thus, memory from the 64 last-accessed pages is readily available; all other memory accesses will either fall back to a lower-level TLB cache or, in the worst case, have the MMU traverse the large page table structure.

1. [Skylake (server) - Microarchitectures - Intel - WikiChip](https://en.wikichip.org/wiki/intel/microarchitectures/skylake_%28server%29) Look at DTLB (Data TLB).

[Allocating Huge Pages on Linux | Lukas Barth (lukas-barth.net)](https://www.lukas-barth.net/blog/linux-allocating-huge-pages/)

# Anonymous memory

Memory associated with program data and not backed/mapped by a file on disk.

[Large folios for anonymous memory [LWN.net]](https://lwn.net/Articles/937239/)

# Folios

A new type in memory management, used _only_ for anonymous pages and FS (file memory). They are:
1. At least one page in size
2. A power of 2 (in number of pages)
3. A special construct that makes sure you are not hitting a tail page (no clue how that works)

[MatthewWilcox/Folios - Linux Kernel Newbies](https://kernelnewbies.org/MatthewWilcox/Folios)

[Large folios for anonymous memory [LWN.net]](https://lwn.net/Articles/937239/)

Added in Linux 5.16.

[Folios merged for 5.16 [LWN.net]](https://lwn.net/Articles/874684/)

Most benchmarks of folios put the performance benefit in the 0~10% region.

[Folio Improvements For Linux 5.17, Large Folio Patches Posted - Phoronix](https://www.phoronix.com/news/Linux-5.17-Folios)

# DAMON

Data Access MONitor (DAMON) is a data access monitoring framework for DRAM.

[DAMON: Data Access MONitor — The Linux Kernel documentation](https://www.kernel.org/doc/html/v5.17/vm/damon/index.html)

Ex.
of a userspace implementation: [GitHub - awslabs/damo: DAMON user-space tool](https://github.com/awslabs/damo)

# Kernel Memory Sanitizer

KMSAN is a dynamic error detector aimed at finding uses of uninitialized values. It is based on compiler instrumentation (Clang only). KMSAN is not intended for production use.

[Kernel Memory Sanitizer (KMSAN) — The Linux Kernel documentation](https://www.kernel.org/doc/html/latest/dev-tools/kmsan.html)

# Perf

Also called perf_events, it can instrument CPU performance counters, tracepoints, kprobes, and uprobes (dynamic tracing). It is capable of lightweight profiling.

Performance counters are CPU hardware registers that count hardware events such as instructions executed, cache-misses suffered, or branches mispredicted. They form a basis for profiling applications to trace dynamic control flow and identify hotspots. perf provides rich generalized abstractions over hardware specific capabilities. Among others, it provides per task, per CPU and per-workload counters, sampling on top of these and source code event annotation.

[Perf Wiki (kernel.org)](https://perf.wiki.kernel.org/index.php/Main_Page)

# Direct Rendering Manager (DRM)

DRM is the layer between the GPU driver and programs like X. It ensures that no process blocks another from accessing the GPU, and it standardizes communication with the GPU.

[Direct Rendering Manager - Wikipedia](https://en.wikipedia.org/wiki/Direct_Rendering_Manager)

# ROCm

AMD's software stack that enables their GPUs to do general-purpose computing on graphics processing units (GPGPU), high performance computing (HPC), and heterogeneous computing. It offers several programming models: HIP (GPU-kernel-based programming), OpenMP (directive-based programming), and OpenCL. Works with the [[#AMDKFD]] driver.

# AMDKFD

AMD Kernel Fusion Driver is the driver that makes [[#ROCm]] and OpenCL work. In other words, it is the kernel driver for computing on the GPU.
# Panfrost

GPU [[#Direct Rendering Manager (DRM)]] & Mesa driver for some Arm (Mali) GPUs.

[Panfrost — The Mesa 3D Graphics Library latest documentation](https://docs.mesa3d.org/drivers/panfrost.html?highlight=panfrost)

# SYSFS

SYSFS is a pseudo FS made to expose info and configuration of Linux subsystems and FS. It is mounted at /sys.

[sysfs - Wikipedia](https://en.wikipedia.org/wiki/Sysfs)

# IDMAPPED Mounts

Different mounts can expose the same file or directory with different ownership. This helps when you want to use your home dir on multiple computers, or manage a FS that doesn't have permissions (like FAT & exFAT) with multiple users.

[IDMAPPED Mounts Aim For Linux 5.12 - Many New Use-Cases From Containers To Systemd-Homed - Phoronix](https://www.phoronix.com/news/IDMAPPED-Mounts-Linux-5.12)

# vDSO

The "vDSO" (virtual dynamic shared object) is a small shared library that the kernel automatically maps into the address space of all user-space applications. There are some system calls the kernel provides that user-space code ends up using frequently, to the point that such calls can dominate overall performance.

**TLDR:** Switching into kernel mode costs more performance than exposing a lib to all apps.

[vdso(7) - Linux manual page (man7.org)](https://www.man7.org/linux/man-pages/man7/vdso.7.html)

# User Name-space

User namespaces isolate security-related identifiers and attributes, in particular, user IDs and group IDs. User namespaces can be nested; that is, each user namespace except the _initial ("root")_ namespace has a parent user namespace, and can have zero or more child user namespaces.

[user_namespaces(7) - Linux manual page (man7.org)](https://www.man7.org/linux/man-pages/man7/user_namespaces.7.html)

# VFS

Relocated to [[VFS]].

# NUMA

Non-uniform memory access (NUMA) is a computer memory design used in multiprocessing, where the memory access time depends on the memory location relative to the processor. Under NUMA, a processor can access its own local memory faster than non-local memory.

**IN LINUX**, Linux divides the system’s hardware resources into multiple software abstractions called “nodes”.
Linux maps the nodes onto the physical cells of the hardware platform, abstracting away some of the details for some architectures. As with physical cells, software nodes may contain 0 or more CPUs, memory and/or IO buses. And, again, memory accesses to memory on “closer” nodes–nodes that map to closer cells–will generally experience faster access times and higher effective bandwidth than accesses to more remote cells.

**TLDR:** Linux bundles resources that are closer (faster) together to enhance performance.

[What is NUMA? — The Linux Kernel documentation](https://www.kernel.org/doc/html/v4.18/vm/numa.html)

# Landlock

The goal of Landlock is to make it possible to restrict ambient rights (e.g. global filesystem or network access) for a set of processes. Added in 5.13.

[Landlock: unprivileged access control — The Linux Kernel documentation](https://docs.kernel.org/userspace-api/landlock.html)

# Scatter-Gather I/O (S/G)

Some applications may need to read or write data to multiple buffers, which are separated in memory. Although this can be done easily enough with multiple calls to `read` and `write`, it is inefficient because there is overhead associated with each kernel call. Instead, many platforms provide special high-speed primitives to perform these _scatter-gather_ operations in a single kernel call.

[Scatter-Gather (The GNU C Library)](https://www.gnu.org/software/libc/manual/html_node/Scatter_002dGather.html)

# Page Attribute Table (PAT)

The x86 Page Attribute Table (PAT) allows setting the memory attribute at page-level granularity.

| Abbreviation | Memory type |
| --- | --- |
| WB | Write-back |
| UC | Uncached |
| WC | Write-combined: collects multiple small writes and sends them out later in one burst |
| WT | Write-through |
| UC- | Uncached Minus |

[13. PAT (Page Attribute Table) — The Linux Kernel documentation](https://www.kernel.org/doc/html/v5.19/x86/pat.html)

# IOMMU

Input–output memory management unit (IOMMU) is a memory management unit (MMU) connecting a direct-memory-access–capable (DMA-capable) I/O bus to the main memory.
Like a traditional MMU, which translates CPU-visible virtual addresses to physical addresses, the IOMMU maps device-visible virtual addresses (also called device addresses or memory mapped I/O addresses in this context) to physical addresses.

[Input–output memory management unit - Wikipedia](https://en.wikipedia.org/wiki/Input%E2%80%93output_memory_management_unit)

# DeviceTree

An operating system uses the Device Tree to discover the topology of the hardware at runtime, and thereby supports a majority of available hardware without hard-coded information. The DT is often stored in the firmware on a board.

[Linux and the Devicetree — The Linux Kernel documentation](https://www.kernel.org/doc/html/latest/devicetree/usage-model.html)

# KPTI

Kernel page-table isolation fixes the Meltdown leaks by separating user-space and kernel-space page tables entirely.

[Kernel page-table isolation - Wikipedia](https://en.wikipedia.org/wiki/Kernel_page-table_isolation)

# BPF

[What is eBPF? An Introduction and Deep Dive into the eBPF Technology](https://ebpf.io/what-is-ebpf/)

## Struct_ops

A plug-in point in kernel code that allows userspace to inject BPF programs to run part of that code. The first implementation was a TCP congestion control program, but it is now used a little bit everywhere.

[Kernel operations structures in BPF [LWN.net]](https://lwn.net/Articles/811631/)

# StackLeak

StackLeak is a subsystem made to add more security to kernel memory management. It helps protect against stack depth overflow (CWE-674), uninitialized variables (CWE-457) and information exposure (CWE-200).

VERY SIMPLIFIED: it works by making sure that the memory is always cleared to a special "poison" value at the end of each syscall and on all uninitialized values.
https://youtu.be/5wIniiWSgUc?si=BdbvjGuuuQrXAGl6&t=233

# Sched_ext

[[SCHED_EXT]]

# Generic CPU Vulnerabilities Reporting

The generic CPU vulnerabilities support reports the various vulnerabilities, whether the running system/CPU is affected by them and, if so, the mitigation status. This is conveniently exposed under _/sys/devices/system/cpu/vulnerabilities_ across x86/x86_64, ARM, AArch64, and other architectures. RISC-V and LoongArch were added in 6.12.

[RISC-V Enabling Generic CPU Vulnerabilities Reporting - Phoronix](https://www.phoronix.com/news/RISCV-CPU-Vulnerabilities-sysfs)

# Kernel address space layout randomization (KASLR)

Added in 3.14. Enables address space randomization for the Linux kernel image by randomizing where the kernel code is placed at boot time.

[Address space layout randomization - Wikipedia](https://en.wikipedia.org/wiki/Address_space_layout_randomization#Kernel_address_space_layout_randomization)

# Bus lock

A split lock is any atomic operation whose operand crosses two (CPU) cache lines. Since the operand spans two cache lines and the operation must be atomic, the system locks the bus while the CPU accesses the two cache lines.

A bus lock is acquired through either split locked access to writeback (WB) memory or any locked access to non-WB memory. This is typically thousands of cycles slower than an atomic operation within a cache line. It also disrupts performance on other cores and brings the whole system to its knees.

[22. Bus lock detection and handling — The Linux Kernel documentation](https://www.kernel.org/doc/html/v5.15/x86/buslock.html)

# Error Detection And Correction (EDAC)

Subsystem to manage errors with PCI devices and ECC memory.

# System Management Mode (SMM)

System management mode is an execution mode in x86 processors that can only be entered via a [[#System management interrupt (SMI)]]. It is called Ring -2 or "Black box" because this code cannot be debugged or observed while it executes.
[realtime:documentation:howto:debugging:smi-latency:start [Wiki]](https://wiki.linuxfoundation.org/realtime/documentation/howto/debugging/smi-latency/start)

# System management interrupt (SMI)

System management interrupts are high priority unmaskable hardware interrupts which cause the CPU to immediately suspend all other activities, including the operating system, and go into a special execution mode called [[#System Management Mode (SMM)]]. Once the system is in SMM, the interrupt is handled by firmware code.

[realtime:documentation:howto:debugging:smi-latency:start [Wiki]](https://wiki.linuxfoundation.org/realtime/documentation/howto/debugging/smi-latency/start)

# VirtIO

Subsystem that gives KVM guests access to virtualized (paravirtual) devices, and also provides data sockets: [[#VirtIO Vsock]].

## VirtIO Vsock

Vsock are data sockets meant to replace network sockets with better performance, by bypassing iptables, netfilter... all the network stuff that isn't needed in a host (hypervisor)-guest context.

[VSOCK: From Convenience to Performant VirtIO Communication](https://lpc.events/event/17/contributions/1626/attachments/1334/2674/VSOCK_%20From%20Convenience%20to%20Performant%20VirtIO%20Communication.pdf)

# Integrity Policy Enforcement (IPE)

Integrity Policy Enforcement (IPE) relies on immutable security properties of the system component and is engineered for fixed-function systems like network firewall devices, IoT platforms, etc, that are only ever running certain application-targeted code.

**TLDR:** only execute what is immutable, to remove the possibility of foreign code breaking your system.

[Linux 6.12 Landing Integrity Policy Enforcement "IPE" Module - Phoronix](https://www.phoronix.com/news/Linux-6.12-IPE-LSM-Security)

# Replay Protected Memory Block (RPMB)

RPMB is a several year old specification for having a portion of memory be more secure and accessed via a hidden security key.
The RPMB block in eMMC can be used for matters like storing DRM protection keys, OEM security keys, and other information that, for whatever legal or security reasons, can't be stored via normal storage. RPMB aims to be tamper resistant and requires authentication for reads/writes.

[Replay Protected Memory Block "RPMB" Subsystem Submitted For Linux 6.12 - Phoronix](https://www.phoronix.com/news/Linux-6.12-RPMB-MMC)

# XZ Embedded

Code used to decompress the kernel at boot time.

[XZ data compression in Linux — The Linux Kernel documentation](https://www.kernel.org/doc/html/next//staging/xz.html)

# Protected KVM (pKVM)

The Arm confidential computing work adds support for booting an ARM64 kernel as a protected guest under Android's Protected KVM "pKVM" hypervisor.

**History:** Android was always a fragmented mess, but it turns out the kernel and hypervisor world of Android was even worse: every model had a different kernel and might have a hypervisor without any standard. Some initiatives were created to fix that, such as GKI (Generic Kernel Image), which standardizes what a kernel for Android should be and how a vendor can add to it. pKVM is the logical extension of that effort: the hypervisor needs to be standardized too, so let's add one. There are security implications as well, read >>

[Linux 6.12 To Support Arm's Permission Overlay Extension - Phoronix](https://www.phoronix.com/news/Linux-6.12-ARM64-Changes)

[KVM for Android [LWN.net]](https://lwn.net/Articles/836693/)

# Big Kernel Lock (BKL)

Removed in 2.6.39 (2011) in favor of finer-grained locking. The BKL was a kernel-wide lock; in other words, only one thread at a time was able to operate in kernel space.

[Giant lock - Wikipedia](https://en.wikipedia.org/wiki/Giant_lock)
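To make the contrast between a kernel-wide lock and finer-grained locking more concrete, here is a minimal user-space sketch in Python. It is only an analogy, not kernel code: the class names `GiantLockCounters` and `FineGrainedCounters`, the `hammer` helper, and the counter names are all invented for this example.

```python
import threading


class GiantLockCounters:
    """Every counter is guarded by the same lock (kernel-wide lock analogue)."""

    def __init__(self, names):
        self._lock = threading.Lock()
        self._values = {name: 0 for name in names}

    def increment(self, name):
        # Any update, to any counter, serializes on the single shared lock.
        with self._lock:
            self._values[name] += 1

    def value(self, name):
        with self._lock:
            return self._values[name]


class FineGrainedCounters:
    """Each counter has its own lock, so unrelated updates do not contend."""

    def __init__(self, names):
        self._locks = {name: threading.Lock() for name in names}
        self._values = {name: 0 for name in names}

    def increment(self, name):
        # Only threads touching the same counter wait for each other.
        with self._locks[name]:
            self._values[name] += 1

    def value(self, name):
        with self._locks[name]:
            return self._values[name]


def hammer(counters, names, rounds=1000):
    """Start one thread per name; each thread increments its own counter."""

    def work(name):
        for _ in range(rounds):
            counters.increment(name)

    threads = [threading.Thread(target=work, args=(name,)) for name in names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return {name: counters.value(name) for name in names}


if __name__ == "__main__":
    subsystems = ["fs", "net", "mm"]  # hypothetical subsystem names
    print(hammer(GiantLockCounters(subsystems), subsystems))
    print(hammer(FineGrainedCounters(subsystems), subsystems))
```

Both variants end with the same totals; what changes is which updates have to wait for each other. With the single lock, work on "fs" blocks work on "net", which is the kind of contention that finer-grained locking avoids.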