Runtime Overview¤

Overview¤

A typical runtime consists of the following parts:

Compiled
Allocator
Program
Compiler

Compiled¤

The Compiled class is responsible for initializing and managing a device.

Compiled ¤

Compiled(
    device: str,
    allocator: Allocator,
    renderer: Renderer | None,
    compiler: Compiler | None,
    runtime,
    graph=None,
    group_id=None,
)

Methods:

synchronize –

Synchronize all pending operations on the device.

synchronize ¤

synchronize()

Synchronize all pending operations on the device.

This method ensures that all previously queued operations on the device have been completed before proceeding.
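To illustrate the contract described above, here is a minimal sketch of a device that queues work and only guarantees completion once `synchronize()` returns. `ToyDevice`, `enqueue`, `pending`, and `results` are hypothetical names invented for this example; they are not part of tinygrad, and real backends wait on hardware queues rather than a Python deque.

```python
from collections import deque

class ToyDevice:
    """Hypothetical stand-in for a device; NOT tinygrad's Compiled class."""
    def __init__(self):
        self.pending = deque()   # operations queued but not yet executed
        self.results = []        # results of operations that have completed

    def enqueue(self, fn, *args):
        # Queue the work instead of running it immediately.
        self.pending.append((fn, args))

    def synchronize(self):
        # Drain the queue so every previously queued operation has completed.
        while self.pending:
            fn, args = self.pending.popleft()
            self.results.append(fn(*args))

dev = ToyDevice()
dev.enqueue(pow, 2, 10)
dev.enqueue(sum, [1, 2, 3])
dev.synchronize()  # after this call, nothing is left pending
```

In this sketch, reading `dev.results` before calling `synchronize()` would show an empty list, which is the kind of hazard the method exists to rule out.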

Allocator¤

The Allocator class is responsible for managing memory on the device. A subclass, LRUAllocator, additionally caches buffers so that they can be reused instead of being reallocated.

Allocator ¤

Allocator(dev: DeviceType)

Bases: Generic[DeviceType]

Methods:

_alloc –
_copyin –
_copyout –
_free –
alloc –
free –

Attributes:

default_buffer_spec (BufferSpec) –
dev (DeviceType) –

default_buffer_spec `instance-attribute` ¤

default_buffer_spec: BufferSpec = BufferSpec()

dev `instance-attribute` ¤

dev: DeviceType = dev

_alloc ¤

_alloc(size: int, options: BufferSpec)

_copyin ¤

_copyin(dest, src: memoryview)

_copyout ¤

_copyout(dest: memoryview, src)

_free ¤

_free(opaque, options: BufferSpec)

alloc ¤

alloc(size: int, options: BufferSpec | None = None)

free ¤

free(opaque, size: int, options: BufferSpec | None = None)
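The signatures above suggest a split between public entry points (`alloc`, `free`) and backend hooks (`_alloc`, `_free`, `_copyin`, `_copyout`). The following self-contained sketch shows one way such an interface could be filled in for host memory. `ToyBufferSpec`, `ToyAllocator`, and `BytearrayAllocator` are hypothetical names, and the assumption that the public methods only substitute default options and then delegate is made for illustration, not taken from the tinygrad source.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToyBufferSpec:
    """Hypothetical placeholder for BufferSpec; the real fields may differ."""
    host: bool = False

class ToyAllocator:
    """Illustrative base class mirroring the documented method names."""
    def __init__(self, dev):
        self.dev = dev
        self.default_buffer_spec = ToyBufferSpec()

    # Public entry points: fill in default options, then delegate to the hooks.
    def alloc(self, size, options=None):
        assert size > 0, "allocation size must be positive"
        return self._alloc(size, options if options is not None else self.default_buffer_spec)

    def free(self, opaque, size, options=None):
        self._free(opaque, options if options is not None else self.default_buffer_spec)

    # Backend hooks that a concrete allocator overrides.
    def _alloc(self, size, options): raise NotImplementedError
    def _free(self, opaque, options): pass  # nothing to release by default
    def _copyin(self, dest, src): raise NotImplementedError
    def _copyout(self, dest, src): raise NotImplementedError

class BytearrayAllocator(ToyAllocator):
    """Backs every buffer with a plain Python bytearray."""
    def _alloc(self, size, options): return bytearray(size)
    def _copyin(self, dest, src: memoryview): dest[:] = src    # host -> buffer
    def _copyout(self, dest: memoryview, src): dest[:] = src   # buffer -> host

allocator = BytearrayAllocator("TOY")
buf = allocator.alloc(4)
allocator._copyin(buf, memoryview(b"\x01\x02\x03\x04"))
out = bytearray(4)
allocator._copyout(memoryview(out), buf)
allocator.free(buf, 4)
```

Keeping the hooks separate from the public methods means a caching layer can wrap `alloc` and `free` without touching the backend-specific code.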

LRUAllocator ¤

LRUAllocator(dev: DeviceType)

Bases: Allocator, Generic[DeviceType]

The LRU Allocator is responsible for caching buffers. It ensures that buffers are not freed until it is absolutely necessary, optimizing performance.

Methods:

alloc –
free –
free_cache –

Attributes:

cache (dict[tuple[int, BufferSpec | None], Any]) –

cache `instance-attribute` ¤

cache: dict[tuple[int, BufferSpec | None], Any] = (
    defaultdict(list)
)

alloc ¤

alloc(size: int, options: BufferSpec | None = None)

free ¤

free(
    opaque: Any,
    size: int,
    options: BufferSpec | None = None,
)

free_cache ¤

free_cache()
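The caching behaviour described for LRUAllocator can be sketched as follows: `free` parks a buffer in a cache keyed by `(size, options)`, `alloc` reuses a parked buffer when one matches, and `free_cache` finally releases everything. `ToyLRUAllocator` and its two counters are hypothetical and exist only to make the reuse visible; the sketch does not model out-of-memory handling or any real device.

```python
from collections import defaultdict

class ToyLRUAllocator:
    """Illustrative caching allocator; NOT the tinygrad implementation."""
    def __init__(self):
        self.cache = defaultdict(list)  # (size, options) -> list of parked buffers
        self.backend_allocs = 0         # how often real memory was requested
        self.backend_frees = 0          # how often real memory was released

    def _alloc(self, size, options):
        self.backend_allocs += 1
        return bytearray(size)

    def _free(self, opaque, options):
        self.backend_frees += 1

    def alloc(self, size, options=None):
        cached = self.cache[(size, options)]
        if cached:
            return cached.pop()         # reuse a previously freed buffer
        return self._alloc(size, options)

    def free(self, opaque, size, options=None):
        # Do not release yet; keep the buffer around for a later alloc.
        self.cache[(size, options)].append(opaque)

    def free_cache(self):
        # Release every parked buffer and empty the cache.
        for (size, options), bufs in self.cache.items():
            for buf in bufs:
                self._free(buf, options)
        self.cache.clear()

lru = ToyLRUAllocator()
a = lru.alloc(1024)
lru.free(a, 1024)      # parked, not released
b = lru.alloc(1024)    # served from the cache, so b is the same object as a
lru.free(b, 1024)
lru.free_cache()       # now the buffer is actually released
```

Here two `alloc` calls cost only one backend allocation, which is the performance benefit the description refers to.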

Program¤

A Program object is created for each loaded program and is responsible for executing that program on the device. As an example, here is the CPUProgram implementation, which loads a compiled program into executable memory and runs it.

CPUProgram ¤

CPUProgram(dev, name: str, lib: bytes)

Bases: HCQProgram

Methods:

__del__ –

Attributes:

fxn –
mem –
rt_lib –

Source code in tinygrad/runtime/ops_cpu.py

def __init__(self, dev, name:str, lib:bytes):
  if sys.platform == "win32":
    PAGE_EXECUTE_READWRITE, MEM_COMMIT, MEM_RESERVE = 0x40, 0x1000, 0x2000
    ctypes.windll.kernel32.VirtualAlloc.restype = ctypes.c_void_p
    self.mem = ctypes.windll.kernel32.VirtualAlloc(ctypes.c_void_p(0), ctypes.c_size_t(len(lib)), MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE)
    ctypes.memmove(self.mem, lib, len(lib))
    ctypes.windll.kernel32.GetCurrentProcess.restype = ctypes.c_void_p
    proc = ctypes.windll.kernel32.GetCurrentProcess()
    ctypes.windll.kernel32.FlushInstructionCache(ctypes.c_void_p(proc), ctypes.c_void_p(self.mem), ctypes.c_size_t(len(lib)))
    self.fxn = ctypes.CFUNCTYPE(None)(self.mem)
  else:
    # On apple silicon with SPRR enabled (it always is in macos) RWX pages are unrepresentable: https://blog.svenpeter.dev/posts/m1_sprr_gxf/
    # MAP_JIT allows us to easily flip pages from RW- to R-X and vice versa. It is a noop on intel cpus. (man pthread_jit_write_protect_np)
    self.mem = mmap.mmap(-1, len(lib), mmap.MAP_ANON|mmap.MAP_PRIVATE|(MAP_JIT if OSX else 0), mmap.PROT_READ|mmap.PROT_WRITE|mmap.PROT_EXEC)

    if OSX: CPUProgram.rt_lib.pthread_jit_write_protect_np(False)
    self.mem.write(lib)
    if OSX: CPUProgram.rt_lib.pthread_jit_write_protect_np(True)

    # __clear_cache isn't a normal libc function, but a compiler support routine found in libgcc_s for gcc and compiler-rt for clang.
    # libgcc_s comes as shared library but compiler-rt is only a bunch of static library archives which we can't directly load, but fortunately
    # it somehow found its way into libSystem on macos (likely because it used __builtin_clear_cache) and libgcc_s is ~always present on linux
    # Using ["name"] instead of .name because otherwise name is getting mangled: https://docs.python.org/3.12/reference/expressions.html#index-5
    CPUProgram.rt_lib["__clear_cache"](ctypes.c_void_p(mv_address(self.mem)), ctypes.c_void_p(mv_address(self.mem) + len(lib)))

    self.fxn = ctypes.CFUNCTYPE(None)(mv_address(self.mem))

  super().__init__(HCQArgsState, dev, name, kernargs_alloc_size=0)

fxn `instance-attribute` ¤

fxn = CFUNCTYPE(None)(mem)

mem `instance-attribute` ¤

mem = VirtualAlloc(
    c_void_p(0),
    c_size_t(len(lib)),
    MEM_COMMIT | MEM_RESERVE,
    PAGE_EXECUTE_READWRITE,
)

rt_lib `class-attribute` `instance-attribute` ¤

rt_lib = CDLL(
    find_library("System" if OSX else "kernel32")
    if (OSX or platform == "win32")
    else "libgcc_s.so.1"
)

__del__ ¤

__del__()

Source code in tinygrad/runtime/ops_cpu.py

def __del__(self):
  if getattr(sys, 'is_finalizing', lambda: True)(): return
  if sys.platform == 'win32': ctypes.windll.kernel32.VirtualFree(ctypes.c_void_p(self.mem), ctypes.c_size_t(0), 0x8000) #0x8000 - MEM_RELEASE

Compiler¤

The Compiler class compiles the source code produced by the Renderer into a device-specific format.

Compiler ¤

Compiler(cachekey: str | None = None)

Methods:

compile –
compile_cached –
disassemble –

Attributes:

cachekey –

Source code in tinygrad/device.py

def __init__(self, cachekey:str|None=None): self.cachekey = None if DISABLE_COMPILER_CACHE else cachekey

cachekey `instance-attribute` ¤

cachekey = None if DISABLE_COMPILER_CACHE else cachekey

compile ¤

compile(src: str) -> bytes

Source code in tinygrad/device.py

def compile(self, src:str) -> bytes: return src.encode()   # NOTE: empty compiler is the default

compile_cached ¤

compile_cached(src: str) -> bytes

Source code in tinygrad/device.py

def compile_cached(self, src:str) -> bytes:
  if self.cachekey is None or (lib := diskcache_get(self.cachekey, src)) is None:
    assert not getenv("ASSERT_COMPILE"), f"tried to compile with ASSERT_COMPILE set\n{src}"
    lib = self.compile(src)
    if self.cachekey is not None: diskcache_put(self.cachekey, src, lib)
  return lib
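The control flow of `compile_cached` can be reproduced in isolation. The sketch below swaps the on-disk cache (`diskcache_get` / `diskcache_put`) for a plain dictionary and omits the `ASSERT_COMPILE` check; `ToyCompiler`, `_store`, and `compile_calls` are hypothetical names used only for illustration.

```python
class ToyCompiler:
    """Illustrative compiler with an in-memory cache; NOT tinygrad's Compiler."""
    _store = {}  # stand-in for the disk cache, shared by all instances

    def __init__(self, cachekey=None):
        self.cachekey = cachekey
        self.compile_calls = 0  # counts real compilations for demonstration

    def compile(self, src: str) -> bytes:
        # Mirrors the default shown above: the "binary" is just the encoded source.
        self.compile_calls += 1
        return src.encode()

    def compile_cached(self, src: str) -> bytes:
        key = (self.cachekey, src)
        # Compile when caching is disabled (no cachekey) or on a cache miss.
        if self.cachekey is None or (lib := self._store.get(key)) is None:
            lib = self.compile(src)
            if self.cachekey is not None:
                self._store[key] = lib
        return lib

cached = ToyCompiler(cachekey="toy_backend")
first = cached.compile_cached("void kernel() {}")
second = cached.compile_cached("void kernel() {}")   # cache hit, no recompile

uncached = ToyCompiler()                             # cachekey=None disables caching
uncached.compile_cached("void kernel() {}")
uncached.compile_cached("void kernel() {}")          # compiled again
```

With a cache key the second call is a lookup; without one every call compiles, which matches the effect of `cachekey` being forced to `None` when `DISABLE_COMPILER_CACHE` is set.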

disassemble ¤

disassemble(lib: bytes)

Source code in tinygrad/device.py

def disassemble(self, lib:bytes): pass
