virtual memory:concept

address space

linear address space:有序连续非负整数集
virtual address space:set of N=2^n virtual addresses
physical address space:set of M=2^m physical addresses
why VM:
- use dram as a cache for parts of the virtual address space
- 简化内存管理
- 隔离地址空间

vm as a tool

for caching

概念上讲:vm就是n个连续的在磁盘上的字节数组
硬盘上的数组内容缓存在物理内存(dram cache)中
- 这些缓存块称为page(P=2^p),通常比高级的缓存大得多
- 需要一个复杂得多的映射函数
- 总是使用write-back rather than write-through

page miss(就是缓存未命中)cause page fault(an exception)
- 解决:page fault handler 选择victim换掉了之后重新执行指令
- waiting until the miss copied to the physical memory is known as demand paging
看起来很低效但是locality
- 在任何时间程序倾向于访问一个活跃的页集合:working set
- 如果sizeof working set<main memory:性能相当好
- 多线程情况下sum>main memory:thrashing(抖动)页来回切换变差

vm as a tool for memory management

每个进程有自己的虚拟内存,可以将内存视为简单的线性地址空间
可以将不同进程的虚拟内存映射到同一个物理内存(sharing code and data)
链接的时候可以认为代码总是从相同的地址开始
loading:allocate VM for .code and .data,create PTE and set all invalid

for memory protection

extends permission bit :x86-64系统中的高位会用于设置权限(可读,可执行之类的),MMU will check the bit on each process
这实际上就可以防护类似code injection 之类的攻击

adress translation

speed up translation with TLB(translation lookaside buffer)
- PTE(page table entry)在l1缓存中
- small set-associative hardware cache in MMU
然而：一个页表全存起来太大了,solution:multi-level page table
- 可以搞k级页表,第i级页表指向下一级某个页表的表头地址
- 一级表全部映射,数组的访问方式也快

virtual memory:systems

simple memory system example

1.MMU取VPN
2.查看TLB中对应idx位是否缓存命中(查看TLBI位并检测TLBT)
3.若命中,直接返回相应的PPN,否则,从主存中加载对应的PTE
4.链接得到物理地址发送给缓存

5.缓存检测是否命中

corei7

翻译加速小技巧:

设置高速缓存的索引位和偏移位之和与VPO相同
地址翻译的同时就把VPO发给缓存,让缓存去找
virtually indexed,physically taged

linux将虚拟内存组织为区域(area 也叫segment):已存在的虚拟内存连续片
每个虚拟页都在某个区域中

memory mapping

通过将虚拟内存区域与磁盘上的一个对象相关联以初始化该区域

可以被映射到普通文件或者匿名文件(内核创建,全二进制0,一旦被writen，就和别的一样了)
一旦被初始化就在swap files之间换来换去
使用内存映射的角度理解共享和私有区域

私有对象写时复制(copy-on-write):

私有区域的页PTE标记为只读
该区域标记为private copy-on-write
写私有页的指令会触发异常(物理内存上的一个私有区域可能被映射到不同进程的不同位置)
handler会在物理内存中复制页并更新页表条目

fork函数

创建相同的mm_struct,vm_area_struct和页表等
将两个进程中的所有页都标记为只读
所有区域都标志位copy-on-write

user-level memory mapping

void* mmap(void* start,int len,int prot,int flags,int fd,int offset)

memory allocation(basic)

basic concept

use dynamic memory allocator to access virtual memory
the allocator manage the vm of a process known as heap
explicit allocator(malloc and free),implicit(garbage collector)
some constraints:
- allocator must response immediately
- can’t touch the memory once allocated
- must be aligned(8 bytes on x86 or 16 bytes on x86-64)
the goal(performence):maximize the throughput and the peak utilization(经典时间和空间的权衡)
fragmentation:
- internal:payload < blocksize(cause:overhead of maintaining datastructure,align)
- external:when the whole size is enough but no single block satisfied
implementation
- knowing how much to free:多分配一小块存该空间的大小(header)

implicit free list

对齐:只要有效载荷的第一位是x的倍数就行,整个块的前面可以放信息位
find a free block:first fit(从头往后找第一个),next fit(从上次找的地方开始往下找),best fit(找最size最接近的)
free的时候:仅仅擦除有效位可能导致空闲块连续,这个时候就需要查看其前后能否合并(类双向链表,把头全部复制到尾部一份)
改进:allocated blockes don’t need the tail, to ensure just set the second bit of free block to be the indicator

memory allocation(advanced)

explicit list

(maintain only the linked list of free blocks)
所以你的空闲块至少需要包含头尾和两个指针(free不能是更小的空间)
free insertion policy:
- LIFO(last in first out):每次新free出来的块都放在free list的链表头(simple and constant time,bad fragmentation)
  - 有优化,当free掉的块在某free block的后面可以直接不用更新指针
- address ordered:按地址排序插入,需要查找但是better fragmentation

segregated free lists

each size class has its own linked free list(class,can be single value or a range)

它在不牺牲吞吐量性能的同时提高了内存利用率(如果全是单个值的class那么等价于best fit without overhead)

garbage collection

some assumption:
- when no pointer point to the object,it surely can be freed
- the collector must be able to distinguish the pointer and non-pointer
mark and sweep algorithm
- allocate until out of space
- use extra mark bit in the head of each block,start at root,traverse and mark all the node,then scan and delete non-marked node
the conservative collector in C
complete understanding of C pointer(need to know the priority of operator(方括号和括号的优先级高于星))
deal with memory bug:
- gdb(debugger for simple)
- datastructure invarient consistency checker
- binary translator:valgrind
- glibc malloc contains checking code

malloc lab

macro and inline:
macro:pre-compile time but no type checking and may have some side effect
inline_function:compile time do type checking
pointers:
pointer arithmetic is undefined on void*
idea
- 计算地址偏移量通常将指针转换为char*操作(这样以byte操作)
- 读写unsigned int*
- 堆中的指针默认是指向有效载荷的首地址
- 通常维护一个序言块和结尾块,哨兵的作用