Runtime Overview

Overview

A typical runtime consists of the following parts:

  • Compiled
  • Allocator
  • Program
  • Compiler

Compiled

The Compiled class is responsible for initializing and managing a device.

Compiled

Compiled(
    device: str,
    allocator: Allocator,
    renderer: Optional[Renderer],
    compiler: Optional[Compiler],
    runtime,
    graph=None,
)

Methods:

synchronize

synchronize()

Synchronize all pending operations on the device.

This method ensures that all previously queued operations on the device have been completed before proceeding.
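To make the division of labour concrete, here is a minimal sketch of a device object that holds the same kinds of parts as Compiled and drains a work queue in synchronize. Everything in it (ToyDevice, ToyCompiler, ToyProgram, enqueue, and the idea that launches are queued until synchronize runs) is hypothetical and only illustrates the contract described above; it is not tinygrad's implementation, and the renderer is left out.

```python
from collections import deque
from typing import Callable

class ToyCompiler:
  # Plays the Compiler role: turns rendered source into device-specific bytes.
  def compile(self, src: str) -> bytes: return src.encode()

class ToyProgram:
  # Plays the runtime (Program) role: built from (name, lib), then called with buffers.
  def __init__(self, name: str, lib: bytes): self.name, self.lib = name, lib
  def __call__(self, *bufs, vals=(), wait=False): return (self.name, len(bufs), tuple(vals))

class ToyDevice:
  """Hypothetical stand-in for Compiled: bundles the parts and owns a work queue."""
  def __init__(self, device: str, allocator, compiler, runtime: Callable, graph=None):
    self.device, self.allocator, self.compiler, self.runtime, self.graph = device, allocator, compiler, runtime, graph
    self.pending = deque()  # launches that have been queued but not yet run
    self.completed = []     # results of launches that have finished

  def enqueue(self, name: str, src: str, *bufs) -> None:
    # Compile and load now, but defer execution (an assumption of this sketch).
    prg = self.runtime(name, self.compiler.compile(src))
    self.pending.append(lambda: self.completed.append(prg(*bufs)))

  def synchronize(self) -> None:
    # Return only after every previously queued operation has completed.
    while self.pending: self.pending.popleft()()
```

For example, after `dev = ToyDevice("TOY", None, ToyCompiler(), ToyProgram)` and `dev.enqueue("add", "src", "a", "b")`, `dev.completed` stays empty until `dev.synchronize()` runs the queued launch.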

Allocator

The Allocator class is responsible for managing memory on the device. A subclass, LRUAllocator, caches freed buffers so that later allocations with the same size and options can reuse them instead of going back to the device.

Allocator

Methods:

_alloc

_alloc(size: int, options: BufferSpec)

_copyin

_copyin(dest, src: memoryview)

_copyout

_copyout(dest: memoryview, src)

_free

_free(opaque, options: BufferSpec)

alloc

alloc(size: int, options: Optional[BufferSpec] = None)

free

free(
    opaque, size: int, options: Optional[BufferSpec] = None
)
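The split between the public methods and the underscore methods can be sketched as follows, under the assumption that alloc and free fill in a default BufferSpec and forward to _alloc and _free, while a backend supplies the four underscore hooks. BufferSpec here is a hypothetical frozen dataclass and HostAllocator is a hypothetical backend whose "device memory" is a host bytearray; neither is tinygrad's code.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class BufferSpec:
  # Hypothetical stand-in for tinygrad's BufferSpec; only used as a hashable options value here.
  host: bool = False

class Allocator:
  # Public entry points (assumed behaviour): fill in default options, then defer to the hooks.
  def alloc(self, size: int, options: Optional[BufferSpec] = None):
    assert size > 0, f"alloc size must be positive, got {size}"
    return self._alloc(size, options if options is not None else BufferSpec())
  def free(self, opaque, size: int, options: Optional[BufferSpec] = None):
    self._free(opaque, options if options is not None else BufferSpec())
  # Device-specific hooks that a backend overrides.
  def _alloc(self, size: int, options: BufferSpec): raise NotImplementedError
  def _free(self, opaque, options: BufferSpec): pass
  def _copyin(self, dest, src: memoryview): raise NotImplementedError
  def _copyout(self, dest: memoryview, src): raise NotImplementedError

class HostAllocator(Allocator):
  """Hypothetical backend whose 'device memory' is a plain host bytearray."""
  def _alloc(self, size: int, options: BufferSpec): return bytearray(size)
  # Host -> device: copy the caller's bytes into the opaque buffer.
  def _copyin(self, dest, src: memoryview): dest[:len(src)] = src
  # Device -> host: fill the caller's memoryview from the opaque buffer.
  def _copyout(self, dest: memoryview, src): dest[:] = src[:len(dest)]
```

A round trip is `buf = HostAllocator().alloc(4)`, then `_copyin(buf, memoryview(data))` followed by `_copyout(memoryview(out), buf)`; a real backend would replace the bytearray with a device handle and the slice copies with device transfers.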

LRUAllocator

LRUAllocator()

Bases: Allocator

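The LRUAllocator caches buffers: instead of releasing a freed buffer to the device right away, it keeps it in cache, keyed by (size, options), so that a later alloc with the same size and options can reuse it without a new device allocation. Cached buffers are released to the device by free_cache.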

Attributes:

cache instance-attribute

cache: dict[tuple[int, Optional[BufferSpec]], Any] = (
    defaultdict(list)
)

Methods:

alloc

alloc(size: int, options: Optional[BufferSpec] = None)

free

free(
    opaque: Any,
    size: int,
    options: Optional[BufferSpec] = None,
)

free_cache

free_cache()
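A minimal sketch of the caching behaviour, assuming a base allocator whose free really releases memory: freed buffers are parked in a list per (size, options) key, alloc pops from that list before asking the base allocator, and free_cache hands everything back. CountingAllocator is hypothetical, and the sketch omits any eviction limit, opt-out flag, or out-of-memory handling the real class may have.

```python
from collections import defaultdict
from typing import Any, Optional

class CountingAllocator:
  """Hypothetical base allocator that counts real allocations and frees."""
  def __init__(self): self.allocs, self.frees = 0, 0
  def alloc(self, size: int, options: Optional[Any] = None):
    self.allocs += 1
    return bytearray(size)
  def free(self, opaque: Any, size: int, options: Optional[Any] = None): self.frees += 1

class LRUAllocator(CountingAllocator):
  """Sketch: freed buffers are kept per (size, options) key and handed out again."""
  def __init__(self):
    super().__init__()
    self.cache: dict[tuple[int, Optional[Any]], list] = defaultdict(list)
  def alloc(self, size: int, options: Optional[Any] = None):
    # Reuse a cached buffer with the same size and options if one is available.
    if (cached := self.cache[(size, options)]): return cached.pop()
    return super().alloc(size, options)
  def free(self, opaque: Any, size: int, options: Optional[Any] = None):
    # Do not release memory yet; keep the buffer for a later alloc.
    self.cache[(size, options)].append(opaque)
  def free_cache(self):
    # Release every cached buffer through the base allocator and empty the cache.
    for (size, options), opaques in self.cache.items():
      for opaque in opaques: super().free(opaque, size, options)
      opaques.clear()
```

The key includes options as well as size because two buffers of equal size but different options are not interchangeable.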

Program

A Program object is created for each loaded program and is responsible for executing that program on the device. As an example, here is the CPUProgram implementation, which loads a compiled program into executable memory and runs it.

CPUProgram

CPUProgram(name: str, lib: bytes)

Source code in tinygrad/device.py
def __init__(self, name:str, lib:bytes):
  # On apple silicon with SPRR enabled (it always is in macos) RWX pages are unrepresentable: https://blog.svenpeter.dev/posts/m1_sprr_gxf/
  # MAP_JIT allows us to easily flip pages from RW- to R-X and vice versa. It is a noop on intel cpus. (man pthread_jit_write_protect_np)
  self.mem = mmap(-1, len(lib), MAP_ANON | MAP_PRIVATE | (MAP_JIT if OSX else 0), PROT_READ | PROT_WRITE | PROT_EXEC)

  if OSX: CPUProgram.helper_handle.pthread_jit_write_protect_np(False)
  self.mem.write(lib)
  if OSX: CPUProgram.helper_handle.pthread_jit_write_protect_np(True)

  # __clear_cache isn't a normal libc function, but a compiler support routine found in libgcc_s for gcc and compiler-rt for clang.
  # libgcc_s comes as shared library but compiler-rt is only a bunch of static library archives which we can't directly load, but fortunately
  # it somehow found its way into libSystem on macos (likely because it used __builtin_clear_cache) and libgcc_s is ~always present on linux
  # Using ["name"] instead of .name because otherwise name is getting mangled: https://docs.python.org/3.12/reference/expressions.html#index-5
  CPUProgram.helper_handle["__clear_cache"](ctypes.c_void_p(mv_address(self.mem)), ctypes.c_void_p(mv_address(self.mem) + len(lib)))

  self.fxn = ctypes.CFUNCTYPE(None)(mv_address(self.mem))
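The memory setup in __init__ can be tried without executing anything. This sketch maps an anonymous read/write region, copies placeholder bytes into it, and takes the address that CFUNCTYPE would be given. The mv_address helper shown here is an assumption about how such an address can be obtained with ctypes; PROT_EXEC and MAP_JIT are deliberately left out, and the bytes are never called.

```python
import ctypes
import mmap

def mv_address(buf) -> int:
  # Hypothetical helper: address of the first byte of a writable buffer (mmap, bytearray, ...).
  return ctypes.addressof(ctypes.c_char.from_buffer(buf))

lib = b"\x90\x90\xc3"          # placeholder bytes standing in for compiled code; never executed here
mem = mmap.mmap(-1, len(lib))  # anonymous mapping, readable and writable by default
mem.write(lib)                 # copy the "program" into the mapping
addr = mv_address(mem)
# Reading back through the raw address shows it points at the copied bytes.
print(ctypes.string_at(addr, len(lib)) == lib)  # prints True
```

In the real constructor the mapping is also executable and the instruction cache is flushed with __clear_cache before the address is wrapped in a function pointer.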

fxn instance-attribute

fxn = CFUNCTYPE(None)(mv_address(mem))

helper_handle class-attribute instance-attribute

helper_handle = CDLL(
    find_library("System") if OSX else "libgcc_s.so.1"
)

mem instance-attribute

mem = mmap(
    -1,
    len(lib),
    MAP_ANON | MAP_PRIVATE | (MAP_JIT if OSX else 0),
    PROT_READ | PROT_WRITE | PROT_EXEC,
)

__call__

__call__(*bufs, vals=(), wait=False)
Source code in tinygrad/device.py
def __call__(self, *bufs, vals=(), wait=False):
  args = list(bufs) + list(vals)
  # NOTE: replace this by --target={host's triple}-elf in clang args once we only support macos sequoia and later.
  # Apple relaxes abi requirement for stack arguments to always be at least 8 byte aligned on arm64
  # https://developer.apple.com/documentation/xcode/writing-arm64-code-for-apple-platforms
  # This hack is required because clang/llvm bug doesn't allow us to just use {host's triple}+'-elf' (relocation failures)
  # The bug was fixed in https://github.com/llvm/llvm-project/commit/454cc36630296262cdb6360b60f90a64a97f7f1a but was only backported to xcode 16+
  if platform.machine() == "arm64" and OSX: args = args[:8] + [ctypes.c_int64(a) if isinstance(a, int) else a for a in args[8:]]
  return cpu_time_execution(lambda: self.fxn(*args), enable=wait)
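The argument handling in __call__ can be isolated as a pure function. In this sketch, pack_args is a hypothetical name and the apple_arm64 flag replaces the platform check; it shows that only Python ints after the eighth argument are widened to ctypes.c_int64, so each occupies an 8-byte stack slot, while anything that is not an int passes through unchanged.

```python
import ctypes

def pack_args(bufs, vals, apple_arm64: bool):
  """Hypothetical helper mirroring how CPUProgram.__call__ builds its argument list."""
  args = list(bufs) + list(vals)
  if apple_arm64:
    # The first 8 integer/pointer arguments go in registers; later ones go on the stack.
    # ctypes passes a plain int as a 4-byte C int, which Apple's ABI may pack tightly,
    # so widening it to int64 gives each stack argument a full 8-byte slot.
    args = args[:8] + [ctypes.c_int64(a) if isinstance(a, int) else a for a in args[8:]]
  return args
```

On other platforms the list is simply buffers followed by values, which is what `pack_args(bufs, vals, apple_arm64=False)` returns.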

Compiler

The Compiler class compiles the source code produced by the Renderer into a device-specific binary format, returned as bytes.

Compiler

Compiler(cachekey: Optional[str] = None)

Source code in tinygrad/device.py
def __init__(self, cachekey:Optional[str]=None): self.cachekey = None if getenv("DISABLE_COMPILER_CACHE") else cachekey

cachekey instance-attribute

cachekey = (
    None if getenv("DISABLE_COMPILER_CACHE") else cachekey
)

compile

compile(src: str) -> bytes
Source code in tinygrad/device.py
def compile(self, src:str) -> bytes: return src.encode()   # NOTE: empty compiler is the default

compile_cached

compile_cached(src: str) -> bytes
Source code in tinygrad/device.py
def compile_cached(self, src:str) -> bytes:
  if self.cachekey is None or (lib := diskcache_get(self.cachekey, src)) is None:
    assert not getenv("ASSERT_COMPILE"), f"tried to compile with ASSERT_COMPILE set\n{src}"
    lib = self.compile(src)
    if self.cachekey is not None: diskcache_put(self.cachekey, src, lib)
  return lib
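compile_cached consults the cache only when cachekey is set, and DISABLE_COMPILER_CACHE clears cachekey in the constructor. The sketch below reproduces that control flow with an in-memory dict standing in for diskcache_get and diskcache_put (both are hypothetical stand-ins here, not tinygrad's on-disk cache), parses the variable as an integer on the assumption that getenv does the same, and leaves out the ASSERT_COMPILE check. CountingCompiler is a hypothetical subclass used to count real compilations.

```python
import os
from typing import Optional

_CACHE: dict[tuple[str, str], bytes] = {}  # hypothetical in-memory stand-in for the disk cache

def diskcache_get(table: str, key: str) -> Optional[bytes]: return _CACHE.get((table, key))
def diskcache_put(table: str, key: str, val: bytes) -> None: _CACHE[(table, key)] = val

class Compiler:
  def __init__(self, cachekey: Optional[str] = None):
    # A nonzero DISABLE_COMPILER_CACHE turns caching off by discarding the cache key.
    self.cachekey = None if int(os.getenv("DISABLE_COMPILER_CACHE", "0")) else cachekey
  def compile(self, src: str) -> bytes: return src.encode()  # default: no real compilation
  def compile_cached(self, src: str) -> bytes:
    # Compile only on a cache miss (or when caching is disabled), then store the result.
    if self.cachekey is None or (lib := diskcache_get(self.cachekey, src)) is None:
      lib = self.compile(src)
      if self.cachekey is not None: diskcache_put(self.cachekey, src, lib)
    return lib

class CountingCompiler(Compiler):
  """Hypothetical subclass that counts how often compile() really runs."""
  def __init__(self, cachekey: Optional[str] = None):
    super().__init__(cachekey)
    self.calls = 0
  def compile(self, src: str) -> bytes:
    self.calls += 1
    return super().compile(src)
```

With a cache key, compiling the same source twice runs compile once; with cachekey=None every call compiles again.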

disassemble

disassemble(lib: bytes)
Source code in tinygrad/device.py
def disassemble(self, lib:bytes): pass