87 lines
6.5 KiB
Markdown
87 lines
6.5 KiB
Markdown
# Pez plane manager
|
|
|
|
## Introduction
|
|
|
|
Pez is the allocator that manages physical pages allocation and virtual pages ranges allocation. It's a custom design, (normally) never seen before. It's defined into `shelter/lib/include/memory/pez/pez.h` and implemented into `shelter/lib/src/memory/pez/pez.c`.
|
|
|
|
## Planes
|
|
|
|
Pez manages memory using planes. It can be as granular as a 4KB pages. Since Pez uses `sh_uint32` for representating pages index and regions sizes, a plane can technically manages up to 16 terabytes of memory.
|
|
|
|
In the implementation, Pez separate the physical plane from virtual planes. They work the same in their logic and algorithm but use differents source of truth and the virtual plane can provide an offset to each address it allocate.
|
|
|
|
The physical plane uses the physical pages bitmap that the Page subsystem maintain as source of truth. While the source of truth for virtual planes are the corresponding pages mapping into the virtual pages range the virtual plane manages, for the moment, only empty (not allocated at all) virtual planes can be created.
|
|
|
|
## Metadatas
|
|
|
|
### Regions objects
|
|
|
|
The Pez allocator uses regions objects allocated from their dedicated (physical or virtual) slab allocators.
|
|
|
|
The physical version is:
|
|
``` C
|
|
#pragma pack(1)
|
|
typedef struct {
|
|
sh_uint32 start_page_index;
|
|
sh_uint32 region_size_pages;
|
|
sh_uint8 next_region_index[3];
|
|
sh_uint8 flags;
|
|
} sh_pez_REGION_PHYSICAL_OBJECT;
|
|
#pragma pack()
|
|
```
|
|
|
|
The virtual version is:
|
|
``` C
|
|
#pragma pack(1)
|
|
typedef struct {
|
|
sh_uint32 start_page_index;
|
|
sh_uint32 region_size_pages;
|
|
sh_uint32 next_region_index;
|
|
} sh_pez_REGION_VIRTUAL_OBJECT;
|
|
#pragma pack()
|
|
```
|
|
|
|
The `start_page_index` field is the index of the first page of the region, starting at 0 from the start of the plane. The `region_size_pages` field is the size of the region in pages. The `next_region_index` field is the index of the next region in the size list. It's one byte longer on virtual region object to account for the 29 bits index. The `flags` field, while not being used in this implementation, can be used to store flags.
|
|
|
|
These regions objects represente free regions that are ready to be allocated.
|
|
|
|
### Size lists radix tree
|
|
|
|
Each plane maintain lists of free regions by their sizes. These are call size lists. They are linked through the `next_region_index` field, using their object index inside their respective slab allocator. These lists only contain regions of exact size. To signal the end of a size list, the last element in the list have their `next_region_index` field set to 0. This prevent the first slot of any of the slab inside the slab allocator to be allocated.
|
|
|
|
Each plane maintain a radix tree that link sizes of free regions (in page count) to indexes of the first region in this size list. This radix tree is called the size lists radix tree and is 8 levels deep.
|
|
|
|
This architecture allow for two allocations methods that are stricly time-constant or nearly time-constant:
|
|
|
|
Each time the allocator need to allocate a region, it search for this size into the corresponding size list. Two possibles results:
|
|
- if a free region of exactly this size is found, it's popped of the size list and allocated
|
|
- if no free region of exactly this size is found, the algorithm search for another region immediatly above in size of the requested size using the backtracking algorithm. This free region is popped of the size list and splitted in two. The first part is allocated and the second part is inserted into the corresponding size list.
|
|
|
|
This allocation scheme allow for zero internal fragmentation at the granularity of 4KB pages.
|
|
|
|
### Boundary radix tree
|
|
|
|
Each plane maintain what we call a boundary radix tree. This radix tree map the start and end pages index in the plane of each free regions to a boundary entry. For one-page free region, there is only one boundary. The two boundaries for free regions with more than one page are strictly identical.
|
|
|
|
A boundary is structured like this:
|
|
- The lower 32 bits of the boundary are the region object index of the region to which the boundary belong
|
|
- The upper 32 bits of the boundary are the previous region object index in the size list of the region to which the boundary belong
|
|
|
|
This radix tree allow for two things:
|
|
- constant-time passive fusion of free regions with deallocated occupied regions, at each free of occupied region. The external fragmentation is repaired as soon as the allocated region that cause it is deallocated.
|
|
- instant element suppression of any free region in any size lists
|
|
|
|
## API
|
|
|
|
The Pez allocator implementation inside the Shelter kernel provide the following functions:
|
|
- `void sh_pez_set_available()`: executed once to signal that Pez has taken over the role of Page for allocating pages
|
|
- `sh_bool sh_pez_is_available()`: return the current status of availability of Pez
|
|
- `sh_pez_PHYSICAL_PLANE* sh_pez_get_reference_phys_plane()`: return the reference physical plane that is used to manage physical memory
|
|
- `SH_STATUS sh_pez_init_physical_plane(sh_uint8 *physical_bitmap,sh_uint64 physical_page_count,sh_slab_reg_phys_SLAB_ALLOCATOR *slab_reg_phys,struct sh_slab_radix_node_SLAB_ALLOCATOR *slab_radix_node,sh_page_PAGE_TABLE_POOL *kernel_ptp,sh_pez_PHYSICAL_PLANE *phys_plane)`: initialize the physical plane by scanning the physical pages bitmap
|
|
- `SH_STATUS sh_pez_alloc_physical_pages(sh_pez_PHYSICAL_PLANE *phys_plane,sh_uint32 pages_count,sh_page_PHYSICAL_ADDRESS *address)`: allocate pages range from the physical plane. Doesn't map the pages at all
|
|
- `SH_STATUS sh_pez_free_physical_pages(sh_pez_PHYSICAL_PLANE *phys_plane,sh_page_PHYSICAL_ADDRESS *address,sh_uint32 pages_count)`: free a pages range and return them to the physical plane. Doesn't unmap the pages at all
|
|
- `SH_STATUS sh_pez_debug_physical(sh_pez_PHYSICAL_PLANE *phys_plane)`: print debugging information about Pez physical plane, intented for debug
|
|
- `SH_STATUS sh_pez_init_virtual_plane(sh_page_VIRTUAL_ADDRESS plane_offset,sh_slab_reg_virt_SLAB_ALLOCATOR *slab_reg_virt,struct sh_slab_radix_node_SLAB_ALLOCATOR *slab_radix_node,sh_page_PAGE_TABLE_POOL *kernel_ptp,sh_page_PAGE_TABLE_POOL *reference_ptp,sh_pez_VIRTUAL_PLANE *virt_plane)`: initialize an empty virtual plane
|
|
- `SH_STATUS sh_pez_alloc_virtual_pages(sh_pez_VIRTUAL_PLANE *virt_plane,sh_uint32 pages_count,sh_page_VIRTUAL_ADDRESS *address)`: allocate virtual pages range from the provided virtual plane
|
|
- `SH_STATUS sh_pez_free_virtual_pages(sh_pez_VIRTUAL_PLANE *virt_plane,sh_page_VIRTUAL_ADDRESS *address,sh_uint32 pages_count)`: free virtual pages range and return them to the provided virtual plane
|