
Complex Ops

Reduce ¤

sum ¤

sum(
    axis: Optional[Union[int, Sequence[int]]] = None,
    keepdim=False,
    acc_dtype: Optional[DTypeLike] = None,
)

Returns the sum of the elements of the tensor along the specified axis or axes.

You can pass in axis and keepdim keyword arguments to control the axis along which the sum is computed and whether the reduced dimensions are retained.

You can pass in acc_dtype keyword argument to control the data type of the accumulation. If not specified, the accumulation data type is chosen based on the input tensor's data type.

t = Tensor.arange(6).reshape(2, 3)
print(t.numpy())
[[0 1 2]
 [3 4 5]]
print(t.sum().numpy())
15
print(t.sum(axis=0).numpy())
[3 5 7]
print(t.sum(axis=1).numpy())
[ 3 12]
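The examples above use the defaults; the sketch below (not part of the generated docs) illustrates `keepdim` and `acc_dtype`, assuming `Tensor` and `dtypes` are importable from the `tinygrad` package root.

```python
from tinygrad import Tensor, dtypes

t = Tensor.arange(6).reshape(2, 3)
# keepdim=True keeps the reduced axis as size 1, so the result has shape (2, 1) instead of (2,)
print(t.sum(axis=1, keepdim=True).numpy())
# acc_dtype forces the accumulation dtype; the int32 input is summed (and returned) as float32
print(t.sum(acc_dtype=dtypes.float32).numpy())
```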

Source code in tinygrad/tensor.py
def sum(self, axis:Optional[Union[int, Sequence[int]]]=None, keepdim=False, acc_dtype:Optional[DTypeLike]=None):
  """
  Returns the sum of the elements of the tensor along the specified axis or axes.

  You can pass in `axis` and `keepdim` keyword arguments to control the axis along
  which the sum is computed and whether the reduced dimensions are retained.

  You can pass in `acc_dtype` keyword argument to control the data type of the accumulation.
  If not specified, the accumulation data type is chosen based on the input tensor's data type.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor.arange(6).reshape(2, 3)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.sum().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.sum(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.sum(axis=1).numpy())
  ```
  """
  ret = self.cast(sum_acc_dtype(self.dtype) if acc_dtype is None else acc_dtype)._reduce(F.Sum, axis, keepdim)
  return ret.cast(self.dtype) if acc_dtype is None and self.dtype in (dtypes.float16, dtypes.bfloat16) else ret

prod ¤

prod(
    axis: Optional[Union[int, Sequence[int]]] = None,
    keepdim=False,
    acc_dtype: Optional[DTypeLike] = None,
)

Returns the product of the elements of the tensor along the specified axis or axes.

You can pass in axis and keepdim keyword arguments to control the axis along which the product is computed and whether the reduced dimensions are retained.

You can pass in acc_dtype keyword argument to control the data type of the accumulation. If not specified, the accumulation data type is chosen based on the input tensor's data type.

t = Tensor([-1, -2, -3, 1, 2, 3]).reshape(2, 3)
print(t.numpy())
[[-1 -2 -3]
 [ 1  2  3]]
print(t.prod().numpy())
-36
print(t.prod(axis=0).numpy())
[-1 -4 -9]
print(t.prod(axis=1).numpy())
[-6  6]

Source code in tinygrad/tensor.py
def prod(self, axis:Optional[Union[int, Sequence[int]]]=None, keepdim=False, acc_dtype:Optional[DTypeLike]=None):
  """
  Returns the product of the elements of the tensor along the specified axis or axes.

  You can pass in `axis` and `keepdim` keyword arguments to control the axis along
  which the product is computed and whether the reduced dimensions are retained.

  You can pass in `acc_dtype` keyword argument to control the data type of the accumulation.
  If not specified, the accumulation data type is chosen based on the input tensor's data type.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([-1, -2, -3, 1, 2, 3]).reshape(2, 3)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.prod().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.prod(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.prod(axis=1).numpy())
  ```
  """
  return self.cast(acc_dtype if acc_dtype is not None else self.dtype)._reduce(F.Prod, axis, keepdim)

max ¤

max(
    axis: Optional[Union[int, Sequence[int]]] = None,
    keepdim=False,
)

Returns the maximum value of the tensor along the specified axis or axes.

You can pass in axis and keepdim keyword arguments to control the axis along which the maximum is computed and whether the reduced dimensions are retained.

t = Tensor([[1, 0, 2], [5, 4, 3]])
print(t.numpy())
[[1 0 2]
 [5 4 3]]
print(t.max().numpy())
5
print(t.max(axis=0).numpy())
[5 4 3]
print(t.max(axis=1, keepdim=True).numpy())
[[2]
 [5]]

Source code in tinygrad/tensor.py
def max(self, axis:Optional[Union[int, Sequence[int]]]=None, keepdim=False):
  """
  Returns the maximum value of the tensor along the specified axis or axes.

  You can pass in `axis` and `keepdim` keyword arguments to control the axis along
  which the maximum is computed and whether the reduced dimensions are retained.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[1, 0, 2], [5, 4, 3]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.max().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.max(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.max(axis=1, keepdim=True).numpy())
  ```
  """
  return self._reduce(F.Max, axis, keepdim)

min ¤

min(
    axis: Optional[Union[int, Sequence[int]]] = None,
    keepdim=False,
)

Returns the minimum value of the tensor along the specified axis or axes.

You can pass in axis and keepdim keyword arguments to control the axis along which the minimum is computed and whether the reduced dimensions are retained.

t = Tensor([[1, 0, 2], [5, 4, 3]])
print(t.numpy())
[[1 0 2]
 [5 4 3]]
print(t.min().numpy())
0
print(t.min(axis=0).numpy())
[1 0 2]
print(t.min(axis=1, keepdim=True).numpy())
[[0]
 [3]]

Source code in tinygrad/tensor.py
def min(self, axis:Optional[Union[int, Sequence[int]]]=None, keepdim=False):
  """
  Returns the minimum value of the tensor along the specified axis or axes.

  You can pass in `axis` and `keepdim` keyword arguments to control the axis along
  which the minimum is computed and whether the reduced dimensions are retained.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[1, 0, 2], [5, 4, 3]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.min().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.min(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.min(axis=1, keepdim=True).numpy())
  ```
  """
  return self._inverse().max(axis=axis, keepdim=keepdim)._inverse()

any ¤

any(
    axis: Optional[Union[int, Sequence[int]]] = None,
    keepdim=False,
)

Tests if any element evaluates to True along the specified axis or axes.

You can pass in axis and keepdim keyword arguments to control the reduce axis and whether the reduced dimensions are retained.

t = Tensor([[True, True], [True, False], [False, False]])
print(t.numpy())
[[ True  True]
 [ True False]
 [False False]]
print(t.any().numpy())
True
print(t.any(axis=0).numpy())
[ True  True]
print(t.any(axis=1, keepdim=True).numpy())
[[ True]
 [ True]
 [False]]

Source code in tinygrad/tensor.py
def any(self, axis:Optional[Union[int, Sequence[int]]]=None, keepdim=False):
  """
  Tests if any element evaluates to `True` along the specified axis or axes.

  You can pass in `axis` and `keepdim` keyword arguments to control the reduce axis and whether the reduced dimensions are retained.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[True, True], [True, False], [False, False]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.any().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.any(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.any(axis=1, keepdim=True).numpy())
  ```
  """
  return self.bool().max(axis, keepdim)

all ¤

all(
    axis: Optional[Union[int, Sequence[int]]] = None,
    keepdim=False,
)

Tests if all elements evaluate to True along the specified axis or axes.

You can pass in axis and keepdim keyword arguments to control the reduce axis and whether the reduced dimensions are retained.

t = Tensor([[True, True], [True, False], [False, False]])
print(t.numpy())
[[ True  True]
 [ True False]
 [False False]]
print(t.all().numpy())
False
print(t.all(axis=0).numpy())
[False False]
print(t.all(axis=1, keepdim=True).numpy())
[[ True]
 [False]
 [False]]

Source code in tinygrad/tensor.py
def all(self, axis:Optional[Union[int, Sequence[int]]]=None, keepdim=False):
  """
  Tests if all elements evaluate to `True` along the specified axis or axes.

  You can pass in `axis` and `keepdim` keyword arguments to control the reduce axis and whether the reduced dimensions are retained.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[True, True], [True, False], [False, False]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.all().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.all(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.all(axis=1, keepdim=True).numpy())
  ```
  """
  return self.logical_not().any(axis, keepdim).logical_not()

mean ¤

mean(
    axis: Optional[Union[int, Sequence[int]]] = None,
    keepdim=False,
)

Returns the mean value of the tensor along the specified axis or axes.

You can pass in axis and keepdim keyword arguments to control the axis along which the mean is computed and whether the reduced dimensions are retained.

Tensor.manual_seed(42)
t = Tensor.normal(2, 3, mean=2.5, std=0.5)
print(t.numpy())
[[2.9889 2.7339 2.7763]
 [2.3356 2.0722 2.6376]]
print(t.mean().numpy())
2.5907671
print(t.mean(axis=0).numpy())
[2.6623 2.4031 2.707 ]
print(t.mean(axis=1).numpy())
[2.833  2.3485]

Source code in tinygrad/tensor.py
def mean(self, axis:Optional[Union[int, Sequence[int]]]=None, keepdim=False):
  """
  Returns the mean value of the tensor along the specified axis or axes.

  You can pass in `axis` and `keepdim` keyword arguments to control the axis along
  which the mean is computed and whether the reduced dimensions are retained.

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.normal(2, 3, mean=2.5, std=0.5)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.mean().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.mean(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.mean(axis=1).numpy())
  ```
  """
  output_dtype = self.dtype if dtypes.is_float(self.dtype) else dtypes.float32
  numerator = self.cast(sum_acc_dtype(self.dtype)).sum(axis=axis, keepdim=keepdim)
  return numerator.div(prod([si for si, so in zip(self.shape, self.sum(axis=axis, keepdim=True).shape) if resolve(si != so)])).cast(output_dtype)

var ¤

var(
    axis: Optional[Union[int, Sequence[int]]] = None,
    keepdim=False,
    correction=1,
)

Returns the variance of the tensor along the specified axis or axes.

You can pass in axis, keepdim, and correction keyword arguments to control the axis along which the variance is computed, whether the reduced dimensions are retained, and the Bessel's correction applied.

Tensor.manual_seed(42)
t = Tensor.normal(2, 3, mean=2.5, std=0.5)
print(t.numpy())
[[2.9889 2.7339 2.7763]
 [2.3356 2.0722 2.6376]]
print(t.var().numpy())
0.109925404
print(t.var(axis=0).numpy())
[0.2134 0.2189 0.0096]
print(t.var(axis=1).numpy())
[0.0187 0.08  ]
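The examples above use the default `correction=1` (Bessel's correction); a small sketch contrasting it with `correction=0`, the population variance:

```python
from tinygrad import Tensor

t = Tensor([1.0, 2.0, 3.0, 4.0])
print(t.var().numpy())              # divides by n-1: ~1.6667
print(t.var(correction=0).numpy())  # divides by n:    1.25
```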

Source code in tinygrad/tensor.py
def var(self, axis:Optional[Union[int, Sequence[int]]]=None, keepdim=False, correction=1):
  """
  Returns the variance of the tensor along the specified axis or axes.

  You can pass in `axis`, `keepdim`, and `correction` keyword arguments to control the axis along
  which the variance is computed, whether the reduced dimensions are retained, and the Bessel's correction applied.

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.normal(2, 3, mean=2.5, std=0.5)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.var().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.var(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.var(axis=1).numpy())
  ```
  """
  squares = (self - self.mean(axis=axis, keepdim=True)).square()
  n = prod([si for si, so in zip(self.shape, squares.sum(axis=axis, keepdim=True).shape) if resolve(si != so)])
  return squares.sum(axis=axis, keepdim=keepdim).div(smax([0, n-correction]))

std ¤

std(
    axis: Optional[Union[int, Sequence[int]]] = None,
    keepdim=False,
    correction=1,
)

Returns the standard deviation of the tensor along the specified axis or axes.

You can pass in axis, keepdim, and correction keyword arguments to control the axis along which the standard deviation is computed, whether the reduced dimensions are retained, and the Bessel's correction applied.

Tensor.manual_seed(42)
t = Tensor.normal(2, 3, mean=2.5, std=0.5)
print(t.numpy())
[[2.9889 2.7339 2.7763]
 [2.3356 2.0722 2.6376]]
print(t.std().numpy())
0.33155
print(t.std(axis=0).numpy())
[0.462  0.4679 0.0981]
print(t.std(axis=1).numpy())
[0.1367 0.2829]

Source code in tinygrad/tensor.py
def std(self, axis:Optional[Union[int, Sequence[int]]]=None, keepdim=False, correction=1):
  """
  Returns the standard deviation of the tensor along the specified axis or axes.

  You can pass in `axis`, `keepdim`, and `correction` keyword arguments to control the axis along
  which the standard deviation is computed, whether the reduced dimensions are retained, and the Bessel's correction applied.

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.normal(2, 3, mean=2.5, std=0.5)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.std().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.std(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.std(axis=1).numpy())
  ```
  """
  return self.var(axis, keepdim, correction).sqrt()

std_mean ¤

std_mean(
    axis: Optional[Union[int, Sequence[int]]] = None,
    keepdim=False,
    correction=1,
)

Calculates the standard deviation and mean over the dimensions specified by axis. Syntactic sugar around Tensor.std and Tensor.mean to match torch.std_mean.

Tensor.manual_seed(42)
t = Tensor.normal(2, 3, mean=2.5, std=0.5)
print(t.numpy())
[[2.9889 2.7339 2.7763]
 [2.3356 2.0722 2.6376]]
std, mean = t.std_mean()
print(std.numpy(), mean.numpy())
0.33155 2.5907671

Source code in tinygrad/tensor.py
def std_mean(self, axis:Optional[Union[int, Sequence[int]]]=None, keepdim=False, correction=1):
  """
  Calculates the standard deviation and mean over the dimensions specified by `axis`.
  Syntactic sugar around `Tensor.std` and `Tensor.mean` to match `torch.std_mean`.

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.normal(2, 3, mean=2.5, std=0.5)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  std, mean = t.std_mean()
  print(std.numpy(), mean.numpy())
  ```
  """
  return self.std(axis, keepdim, correction), self.mean(axis, keepdim)

softmax ¤

softmax(axis=-1, dtype: Optional[DTypeLike] = None)

Applies the softmax function to the tensor along the specified axis.

Rescales the elements of the tensor such that they lie in the range [0, 1] and sum to 1.

You can pass in the axis keyword argument to control the axis along which the softmax is computed.

Tensor.manual_seed(42)
t = Tensor.randn(2, 3)
print(t.numpy())
[[ 0.9779  0.4678  0.5526]
 [-0.3288 -0.8555  0.2753]]
print(t.softmax().numpy())
[[0.4436 0.2664 0.29  ]
 [0.2924 0.1727 0.5349]]
print(t.softmax(axis=0).numpy())
[[0.787  0.7897 0.5689]
 [0.213  0.2103 0.4311]]
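Since softmax rescales each slice along `axis` to sum to 1, a quick sanity check (a sketch, not from the library docs):

```python
from tinygrad import Tensor

Tensor.manual_seed(42)
t = Tensor.randn(2, 3)
# rows of the default (axis=-1) softmax sum to 1
print(t.softmax().sum(axis=1).numpy())
# columns sum to 1 when softmax is taken over axis=0
print(t.softmax(axis=0).sum(axis=0).numpy())
```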

Source code in tinygrad/tensor.py
def softmax(self, axis=-1, dtype:Optional[DTypeLike]=None):
  """
  Applies the softmax function to the tensor along the specified axis.

  Rescales the elements of the tensor such that they lie in the range [0, 1] and sum to 1.

  You can pass in the `axis` keyword argument to control the axis along which the softmax is computed.

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.randn(2, 3)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.softmax().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.softmax(axis=0).numpy())
  ```
  """
  _, e, ss = self._softmax(axis, dtype)
  return e.div(ss)

log_softmax ¤

log_softmax(axis=-1, dtype: Optional[DTypeLike] = None)

Applies the log-softmax function to the tensor along the specified axis.

The log-softmax function is a numerically stable alternative to the softmax function in log space.

You can pass in the axis keyword argument to control the axis along which the log-softmax is computed.

Tensor.manual_seed(42)
t = Tensor.randn(2, 3)
print(t.numpy())
[[ 0.9779  0.4678  0.5526]
 [-0.3288 -0.8555  0.2753]]
print(t.log_softmax().numpy())
[[-0.8127 -1.3228 -1.238 ]
 [-1.2297 -1.7564 -0.6256]]
print(t.log_softmax(axis=0).numpy())
[[-0.2396 -0.2361 -0.564 ]
 [-1.5463 -1.5594 -0.8414]]

Source code in tinygrad/tensor.py
def log_softmax(self, axis=-1, dtype:Optional[DTypeLike]=None):
  """
  Applies the log-softmax function to the tensor along the specified axis.

  The log-softmax function is a numerically stable alternative to the softmax function in log space.

  You can pass in the `axis` keyword argument to control the axis along which the log-softmax is computed.

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.randn(2, 3)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.log_softmax().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.log_softmax(axis=0).numpy())
  ```
  """
  m, _, ss = self._softmax(axis, dtype)
  return m - ss.log()

logsumexp ¤

logsumexp(axis=None, keepdim=False)

Computes the log-sum-exp of the tensor along the specified axis or axes.

The log-sum-exp function is a numerically stable way to compute the logarithm of the sum of exponentials.

You can pass in axis and keepdim keyword arguments to control the axis along which the log-sum-exp is computed and whether the reduced dimensions are retained.

Tensor.manual_seed(42)
t = Tensor.randn(2, 3)
print(t.numpy())
[[ 0.9779  0.4678  0.5526]
 [-0.3288 -0.8555  0.2753]]
print(t.logsumexp().numpy())
2.1347282
print(t.logsumexp(axis=0).numpy())
[1.2174 0.7039 1.1167]
print(t.logsumexp(axis=1).numpy())
[1.7906 0.9009]
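The stability claim can be seen directly: `logsumexp` subtracts the per-slice maximum before exponentiating, so it stays finite where the naive `log(sum(exp(x)))` overflows. A sketch (inputs chosen to overflow float32):

```python
from tinygrad import Tensor

t = Tensor([1000.0, 1000.0, 1000.0])
print(t.exp().sum().log().numpy())  # naive form: exp(1000) overflows to inf
print(t.logsumexp().numpy())        # stable form: 1000 + log(3) ~= 1001.0986
```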

Source code in tinygrad/tensor.py
def logsumexp(self, axis=None, keepdim=False):
  """
  Computes the log-sum-exp of the tensor along the specified axis or axes.

  The log-sum-exp function is a numerically stable way to compute the logarithm of the sum of exponentials.

  You can pass in `axis` and `keepdim` keyword arguments to control the axis along
  which the log-sum-exp is computed and whether the reduced dimensions are retained.

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.randn(2, 3)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.logsumexp().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.logsumexp(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.logsumexp(axis=1).numpy())
  ```
  """
  m = self.max(axis=axis, keepdim=True)
  return (self - m).exp().sum(axis=axis, keepdim=keepdim).log() + m.squeeze(axis)

logcumsumexp ¤

logcumsumexp(axis=0)

Computes the log-cumsum-exp of the tensor along the specified axis.

The log-cumsum-exp function is a numerically stable way to compute the logarithm of the cumulative sum of exponentials.

You can pass in the axis keyword argument to control the axis along which the log-cumsum-exp is computed.

Tensor.manual_seed(42)
t = Tensor.randn(2, 3)
print(t.numpy())
[[ 0.9779  0.4678  0.5526]
 [-0.3288 -0.8555  0.2753]]
print(t.logcumsumexp().numpy())
[[0.9779 0.4678 0.5526]
 [1.2174 0.7039 1.1167]]
print(t.logcumsumexp(axis=0).numpy())
[[0.9779 0.4678 0.5526]
 [1.2174 0.7039 1.1167]]
print(t.logcumsumexp(axis=1).numpy())
[[ 0.9779  1.4481  1.7906]
 [-0.3288  0.1353  0.9009]]

Source code in tinygrad/tensor.py
def logcumsumexp(self, axis=0):
  """
  Computes the log-cumsum-exp of the tensor along the specified axis.

  The log-cumsum-exp function is a numerically stable way to compute the logarithm of the cumulative sum of exponentials.

  You can pass in the `axis` keyword argument to control the axis along which
  the log-cumsum-exp is computed.

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.randn(2, 3)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.logcumsumexp().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.logcumsumexp(axis=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.logcumsumexp(axis=1).numpy())
  ```
  """
  m = self.max(axis=axis, keepdim=True)
  return (self - m).exp().cumsum(axis=axis).log() + m

argmax ¤

argmax(axis=None, keepdim=False)

Returns the indices of the maximum value of the tensor along the specified axis.

You can pass in axis and keepdim keyword arguments to control the axis along which the maximum is computed and whether the reduced dimensions are retained.

t = Tensor([[1, 0, 2], [5, 4, 3]])
print(t.numpy())
[[1 0 2]
 [5 4 3]]
print(t.argmax().numpy()) # Returns the index of the maximum value in the flattened tensor.
3
print(t.argmax(axis=0).numpy()) # Returns the indices of the maximum values along axis 0.
[1 1 1]
print(t.argmax(axis=1).numpy()) # Returns the indices of the maximum values along axis 1.
[2 0]
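`keepdim` behaves as in the other reductions; a short sketch:

```python
from tinygrad import Tensor

t = Tensor([[1, 0, 2], [5, 4, 3]])
out = t.argmax(axis=1, keepdim=True)
print(out.numpy())  # [[2], [0]]
print(out.shape)    # (2, 1) - the reduced axis is kept as size 1
```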

Source code in tinygrad/tensor.py
def argmax(self, axis=None, keepdim=False):
  """
  Returns the indices of the maximum value of the tensor along the specified axis.

  You can pass in `axis` and `keepdim` keyword arguments to control the axis along
  which the maximum is computed and whether the reduced dimensions are retained.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[1, 0, 2], [5, 4, 3]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.argmax().numpy()) # Returns the index of the maximum value in the flattened tensor.
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.argmax(axis=0).numpy()) # Returns the indices of the maximum values along axis 0.
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.argmax(axis=1).numpy()) # Returns the indices of the maximum values along axis 1.
  ```
  """
  if axis is None: return self.flatten().argmax(0)
  axis = self._resolve_dim(axis)
  m = self == self.max(axis=axis, keepdim=True)
  idx = m * Tensor.arange(self.shape[axis],0,-1, requires_grad=False, device=self.device).reshape(self.shape[axis], *[1]*(self.ndim-axis-1))
  return (self.shape[axis]-idx.max(axis=axis, keepdim=keepdim)).cast(dtypes.int32)

argmin ¤

argmin(axis=None, keepdim=False)

Returns the indices of the minimum value of the tensor along the specified axis.

You can pass in axis and keepdim keyword arguments to control the axis along which the minimum is computed and whether the reduced dimensions are retained.

t = Tensor([[1, 0, 2], [5, 4, 3]])
print(t.numpy())
[[1 0 2]
 [5 4 3]]
print(t.argmin().numpy()) # Returns the index of the minimum value in the flattened tensor.
1
print(t.argmin(axis=0).numpy()) # Returns the indices of the minimum values along axis 0.
[0 0 0]
print(t.argmin(axis=1).numpy()) # Returns the indices of the minimum values along axis 1.
[1 2]

Source code in tinygrad/tensor.py
def argmin(self, axis=None, keepdim=False):
  """
  Returns the indices of the minimum value of the tensor along the specified axis.

  You can pass in `axis` and `keepdim` keyword arguments to control the axis along
  which the minimum is computed and whether the reduced dimensions are retained.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[1, 0, 2], [5, 4, 3]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.argmin().numpy()) # Returns the index of the minimum value in the flattened tensor.
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.argmin(axis=0).numpy()) # Returns the indices of the minimum values along axis 0.
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.argmin(axis=1).numpy()) # Returns the indices of the minimum values along axis 1.
  ```
  """
  return self._inverse().argmax(axis=axis, keepdim=keepdim)

Processing ¤

avg_pool2d ¤

avg_pool2d(
    kernel_size=(2, 2),
    stride=None,
    dilation=1,
    padding=0,
    ceil_mode=False,
    count_include_pad=True,
)

Applies average pooling over a tensor.

This function supports three different types of padding

  1. int (single value): Applies the same padding value uniformly to all spatial dimensions.

  2. Tuple[int, ...] (length = number of spatial dimensions): Specifies a distinct padding value for each spatial dimension in the form (padding_height, padding_width, ...).

  3. Tuple[int, ...] (length = 2 * number of spatial dimensions): Specifies explicit padding for each side of each spatial dimension in the form (padding_left, padding_right, padding_top, padding_bottom, ...).

When ceil_mode is set to True, output shape will be determined using ceil division. When count_include_pad is set to False, zero padding will not be included in the averaging calculation.

Note: unlike PyTorch, this implementation is not limited to only 2d pooling and instead works for any number of dimensions.

See: https://paperswithcode.com/method/average-pooling

t = Tensor.arange(25).reshape(1, 1, 5, 5)
print(t.avg_pool2d().numpy())
[[[[ 3.  5.]
   [13. 15.]]]]
print(t.avg_pool2d(ceil_mode=True).numpy())
[[[[ 3.   5.   6.5]
   [13.  15.  16.5]
   [20.5 22.5 24. ]]]]
print(t.avg_pool2d(padding=1).numpy())
[[[[ 0.    0.75  1.75]
   [ 3.75  9.   11.  ]
   [ 8.75 19.   21.  ]]]]
print(t.avg_pool2d(padding=1, count_include_pad=False).numpy())
[[[[ 0.   1.5  3.5]
   [ 7.5  9.  11. ]
   [17.5 19.  21. ]]]]
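A shape-focused sketch of `kernel_size` and `stride` (illustrative values, not from the generated docs); `stride` defaults to the kernel size, giving non-overlapping windows:

```python
from tinygrad import Tensor

t = Tensor.arange(36).reshape(1, 1, 6, 6)
print(t.avg_pool2d().shape)                              # (1, 1, 3, 3): 2x2 windows, stride 2
print(t.avg_pool2d(kernel_size=(3, 3), stride=1).shape)  # (1, 1, 4, 4): overlapping 3x3 windows
```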

Source code in tinygrad/tensor.py
def avg_pool2d(self, kernel_size=(2,2), stride=None, dilation=1, padding=0, ceil_mode=False, count_include_pad=True):
  """
  Applies average pooling over a tensor.

  This function supports three different types of `padding`

  1. `int` (single value):
    Applies the same padding value uniformly to all spatial dimensions.

  2. `Tuple[int, ...]` (length = number of spatial dimensions):
    Specifies a distinct padding value for each spatial dimension in the form `(padding_height, padding_width, ...)`.

  3. `Tuple[int, ...]` (length = 2 * number of spatial dimensions):
    Specifies explicit padding for each side of each spatial dimension in the form
    `(padding_left, padding_right, padding_top, padding_bottom, ...)`.

  When `ceil_mode` is set to `True`, output shape will be determined using ceil division.
  When `count_include_pad` is set to `False`, zero padding will not be included in the averaging calculation.

  NOTE: unlike PyTorch, this implementation is not limited to only 2d pooling and instead works for any number of dimensions.

  See: https://paperswithcode.com/method/average-pooling

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor.arange(25).reshape(1, 1, 5, 5)
  print(t.avg_pool2d().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.avg_pool2d(ceil_mode=True).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.avg_pool2d(padding=1).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.avg_pool2d(padding=1, count_include_pad=False).numpy())
  ```
  """
  axis = tuple(range(-len(k_ := make_tuple(kernel_size, 2)), 0))
  def pool(x:Tensor, padding_:Sequence[int]) -> Tensor: return x.pad(padding_)._pool(k_, stride if stride is not None else k_, dilation)
  reg_pads = self._resolve_pool_pads(padding, len(k_))
  ceil_pads = self._apply_ceil_mode(reg_pads, k_, stride if stride is not None else k_, dilation)
  if not count_include_pad:
    pads = ceil_pads if ceil_mode else reg_pads
    return pool(self, pads).sum(axis) / pool(self.ones_like(), pads).sum(axis)
  if not ceil_mode: return pool(self, reg_pads).mean(axis)
  return pool(self, ceil_pads).sum(axis) / pool(self.pad(reg_pads).ones_like(), tuple(cp-rp for cp,rp in zip(ceil_pads, reg_pads))).sum(axis)

max_pool2d ¤

max_pool2d(
    kernel_size=(2, 2),
    stride=None,
    dilation=1,
    padding=0,
    ceil_mode=False,
)

Applies max pooling over a tensor.

This function supports three different types of padding

  1. int (single value): Applies the same padding value uniformly to all spatial dimensions.

  2. Tuple[int, ...] (length = number of spatial dimensions): Specifies a distinct padding value for each spatial dimension in the form (padding_height, padding_width, ...).

  3. Tuple[int, ...] (length = 2 * number of spatial dimensions): Specifies explicit padding for each side of each spatial dimension in the form (padding_left, padding_right, padding_top, padding_bottom, ...).

When ceil_mode is set to True, output shape will be determined using ceil division.

Note: unlike PyTorch, this implementation is not limited to only 2d pooling and instead works for any number of dimensions.

See: https://paperswithcode.com/method/max-pooling

t = Tensor.arange(25).reshape(1, 1, 5, 5)
print(t.max_pool2d().numpy())
[[[[ 6  8]
   [16 18]]]]
print(t.max_pool2d(ceil_mode=True).numpy())
[[[[ 6  8  9]
   [16 18 19]
   [21 23 24]]]]
print(t.max_pool2d(padding=1).numpy())
[[[[ 0  2  4]
   [10 12 14]
   [20 22 24]]]]

Source code in tinygrad/tensor.py
def max_pool2d(self, kernel_size=(2,2), stride=None, dilation=1, padding=0, ceil_mode=False):
  """
  Applies max pooling over a tensor.

  This function supports three different types of `padding`

  1. `int` (single value):
    Applies the same padding value uniformly to all spatial dimensions.

  2. `Tuple[int, ...]` (length = number of spatial dimensions):
    Specifies a distinct padding value for each spatial dimension in the form `(padding_height, padding_width, ...)`.

  3. `Tuple[int, ...]` (length = 2 * number of spatial dimensions):
    Specifies explicit padding for each side of each spatial dimension in the form
    `(padding_left, padding_right, padding_top, padding_bottom, ...)`.

  When `ceil_mode` is set to `True`, output shape will be determined using ceil division.

  NOTE: unlike PyTorch, this implementation is not limited to only 2d pooling and instead works for any number of dimensions.

  See: https://paperswithcode.com/method/max-pooling

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor.arange(25).reshape(1, 1, 5, 5)
  print(t.max_pool2d().numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.max_pool2d(ceil_mode=True).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.max_pool2d(padding=1).numpy())
  ```
  """
  pads = self._resolve_pool_pads(padding, len(k_ := make_tuple(kernel_size, 2)))
  if ceil_mode: pads = self._apply_ceil_mode(pads, k_, stride if stride is not None else k_, dilation)
  return self.pad(pads, value=dtypes.min(self.dtype))._pool(k_, stride if stride is not None else k_, dilation).max(tuple(range(-len(k_), 0)))

conv2d ¤

conv2d(
    weight: Tensor,
    bias: Optional[Tensor] = None,
    groups=1,
    stride=1,
    dilation=1,
    padding: int | tuple[int, ...] = 0,
    acc_dtype: Optional[DTypeLike] = None,
) -> Tensor

Applies a convolution over a tensor with a given weight and optional bias.

This function supports three different types of padding

  1. int (single value): Applies the same padding value uniformly to all spatial dimensions.

  2. Tuple[int, ...] (length = number of spatial dimensions): Specifies a distinct padding value for each spatial dimension in the form (padding_height, padding_width, ...).

  3. Tuple[int, ...] (length = 2 * number of spatial dimensions): Specifies explicit padding for each side of each spatial dimension in the form (padding_left, padding_right, padding_top, padding_bottom, ...).

Note: unlike PyTorch, this implementation is not limited to only 2d convolutions and instead works for any number of dimensions.

See: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html

t = Tensor.arange(9).reshape(1, 1, 3, 3)
w = Tensor.ones(1, 1, 2, 2)
print(t.conv2d(w).numpy())
[[[[ 8. 12.]
   [20. 24.]]]]
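A sketch (not from the generated docs) of how `stride`, `padding`, and `bias` affect the output; the shapes follow the usual convolution arithmetic:

```python
from tinygrad import Tensor

t = Tensor.arange(25).reshape(1, 1, 5, 5)
w = Tensor.ones(1, 1, 3, 3)
b = Tensor([1.0])
print(t.conv2d(w).shape)                     # (1, 1, 3, 3): valid convolution
print(t.conv2d(w, stride=2).shape)           # (1, 1, 2, 2)
print(t.conv2d(w, padding=1).shape)          # (1, 1, 5, 5): spatial size preserved
print(t.conv2d(w, bias=b, padding=1).shape)  # bias is broadcast over the output channel
```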
Source code in tinygrad/tensor.py
def conv2d(self, weight:Tensor, bias:Optional[Tensor]=None, groups=1, stride=1, dilation=1, padding:int|tuple[int, ...]=0,
           acc_dtype:Optional[DTypeLike]=None) -> Tensor:
  """
  Applies a convolution over a tensor with a given `weight` and optional `bias`.

  This function supports three different types of `padding`

  1. `int` (single value):
    Applies the same padding value uniformly to all spatial dimensions.

  2. `Tuple[int, ...]` (length = number of spatial dimensions):
    Specifies a distinct padding value for each spatial dimension in the form `(padding_height, padding_width, ...)`.

  3. `Tuple[int, ...]` (length = 2 * number of spatial dimensions):
    Specifies explicit padding for each side of each spatial dimension in the form
    `(padding_left, padding_right, padding_top, padding_bottom, ...)`.

  NOTE: unlike PyTorch, this implementation is not limited to only 2d convolutions and instead works for any number of dimensions.

  See: https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor.arange(9).reshape(1, 1, 3, 3)
  w = Tensor.ones(1, 1, 2, 2)
  print(t.conv2d(w).numpy())
  ```
  """
  if IMAGE: return self.image_conv2d(weight, bias, groups, stride, dilation, padding, acc_dtype)
  (bs,cin_), (cout,cin), HW = self.shape[:2], weight.shape[:2], weight.shape[2:]
  padding_ = self._resolve_pool_pads(padding, len(HW))
  assert groups*cin == cin_ and len(self.shape) == len(weight.shape), f"Input Tensor shape {self.shape} does not match the shape of the weights {weight.shape}. ({groups*cin} vs. {cin_})"  # noqa: E501

  # conv2d is a pooling op (with padding)
  x = self.pad(padding_)._pool(HW, stride, dilation)   # (bs, groups*cin, oy, ox, H, W)
  rcout, oyx = cout//groups, x.shape[2:-len(HW)]
  if not all(x == 3 for x in HW) or stride != 1 or dilation != 1 or not WINO:
    # normal conv
    x = x.reshape(bs, groups, cin, 1, *oyx, *HW).expand(bs, groups, cin, rcout, *oyx, *HW).permute(0,1,3,*[4+i for i in range(len(oyx))],2,*[4+len(oyx)+i for i in range(len(HW))])  # noqa: E501

    # conv! broadcasted to (bs, groups, rcout, *oyx, cin, *HW)
    ret = (x * weight.reshape(1, groups, rcout, *[1] * len(oyx), cin, *HW)).sum([-1-i for i in range(1+len(oyx))], keepdim=True, acc_dtype=acc_dtype).reshape(bs, cout, *oyx)  # noqa: E501
    return ret if bias is None else ret.add(bias.reshape(1, -1, *[1] * len(HW)))

  HWI, HWO = (6,) * len(HW), (4,) * len(HW)  # F(4x4,3x3) winograd tiles
  winograd_G = [[1/4, 0, 0], [-1/6, -1/6, -1/6], [-1/6, 1/6, -1/6], [1/24, 1/12, 1/6], [1/24, -1/12, 1/6], [0, 0, 1]]
  winograd_Bt = [[4, 0, -5, 0, 1, 0], [0, -4, -4, 1, 1, 0], [0, 4, -4, -1, 1, 0], [0, -2, -1, 2, 1, 0], [0, 2, -1, -2, 1, 0], [0, 4, 0, -5, 0, 1]]
  winograd_At = [[1, 1, 1, 1, 1, 0], [0, 1, -1, 2, -2, 0], [0, 1, 1, 4, 4, 0], [0, 1, -1, 8, -8, 1]] # applying At in pre-order doubles compile time

  # todo: stride == dilation
  # use padding to round up to 4x4 output tiles
  # (bs, cin_, tyx, HWI)
  d = self.pad(sum([[padding_[i*2], padding_[i*2+1] + (-(dim + sum(padding_[i * 2:(i + 1) * 2]) - 2) % 4)] for i, dim in enumerate(self.shape[-len(HW):])], []))._pool(HWI, HWO)  # noqa: E501
  # move HW to the front: # (HWI, bs, cin_, tyx)
  d = d.permute(*range(len(d.shape)-len(HW),len(d.shape)), *range(len(d.shape)-len(HW)))
  tyx = d.shape[-len(HWI):]  # dim of tiling

  g = weight.permute(*range(len(weight.shape)-len(HW),len(weight.shape)), *range(len(weight.shape)-len(HW)))  # move HW to the front

  # compute 6x6 winograd tiles: GgGt, BtdB
  # (HWI, groups * rcout, cin) -> (HWI, bs=1, groups, rcout, cin, tyx=(1,1))
  gfactors = _apply_winograd_matrix(winograd_G, g, len(HW)).reshape(*HWI, 1, groups, rcout, cin, *([1]*len(tyx)))
  # (HWI, bs, cin_, tyx) -> (HWI, bs, groups, 1 ,cin, *tyx)
  dfactors = _apply_winograd_matrix(winograd_Bt, d, len(HW)).reshape(*HWI, bs, groups, 1, cin, *tyx)

  # matmul; sum across cin: (HWI, bs, groups, rcout, *tyx); then HWI -> HWO: (HWO, bs, groups, rcout, *tyx)
  ret = _apply_winograd_matrix(winograd_At, (gfactors * dfactors).sum(axis=-1-len(HW), acc_dtype=acc_dtype), len(HW))

  # interleave tyx and HWO: (bs, groups, rcout, oy, HO, ox, WO)
  ret = ret.permute([*range(len(HW), len(ret.shape)-len(HW)), *[i+o for i in range(len(HW)) for o in [len(ret.shape)-len(HW),0]]])
  # merge groups and rcout, tyx and HWO: (bs, groups, cout, *yx), shrink to final
  ret = ret.reshape(bs, cout, *[c * HWO[i] for i, c in enumerate(tyx)]).shrink(tuple((0, s) for s in [bs, cout, *oyx]))

  return (ret if bias is None else ret.add(bias.reshape(1, -1, *[1 for _ in range(len(HW))]))).contiguous().contiguous_backward()

conv_transpose2d ¤

conv_transpose2d(
    weight: Tensor,
    bias: Optional[Tensor] = None,
    groups=1,
    stride=1,
    dilation=1,
    padding=0,
    output_padding=0,
) -> Tensor

Applies a transposed convolution over a tensor with a given weight and optional bias.

This function supports three different types of padding

  1. int (single value): Applies the same padding value uniformly to all spatial dimensions.

  2. Tuple[int, ...] (length = number of spatial dimensions): Specifies a distinct padding value for each spatial dimension in the form (padding_height, padding_width, ...).

  3. Tuple[int, ...] (length = 2 * number of spatial dimensions): Specifies explicit padding for each side of each spatial dimension in the form (padding_left, padding_right, padding_top, padding_bottom, ...).

Note: unlike PyTorch, this implementation is not limited to only 2d transposed convolutions and instead works for any number of dimensions.

See: https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html

t = Tensor.arange(9).reshape(1, 1, 3, 3)
w = Tensor.ones(1, 1, 2, 2)
print(t.conv_transpose2d(w).numpy())
[[[[ 0.  1.  3.  2.]
   [ 3.  8. 12.  7.]
   [ 9. 20. 24. 13.]
   [ 6. 13. 15.  8.]]]]
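Transposed convolution is commonly used for upsampling; a sketch of how `stride` and `output_padding` change the spatial size (shapes only, same toy tensors as above):

```python
from tinygrad import Tensor

t = Tensor.arange(9).reshape(1, 1, 3, 3)
w = Tensor.ones(1, 1, 2, 2)
print(t.conv_transpose2d(w).shape)                              # (1, 1, 4, 4)
print(t.conv_transpose2d(w, stride=2).shape)                    # (1, 1, 6, 6)
print(t.conv_transpose2d(w, stride=2, output_padding=1).shape)  # (1, 1, 7, 7)
```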
Source code in tinygrad/tensor.py
def conv_transpose2d(self, weight:Tensor, bias:Optional[Tensor]=None, groups=1, stride=1, dilation=1, padding=0, output_padding=0) -> Tensor:
  """
  Applies a transposed convolution over a tensor with a given `weight` and optional `bias`.

  This function supports three different types of `padding`

  1. `int` (single value):
    Applies the same padding value uniformly to all spatial dimensions.

  2. `Tuple[int, ...]` (length = number of spatial dimensions):
    Specifies a distinct padding value for each spatial dimension in the form `(padding_height, padding_width, ...)`.

  3. `Tuple[int, ...]` (length = 2 * number of spatial dimensions):
    Specifies explicit padding for each side of each spatial dimension in the form
    `(padding_left, padding_right, padding_top, padding_bottom, ...)`.

  NOTE: unlike PyTorch, this implementation is not limited to only 2d transposed convolutions and instead works for any number of dimensions.

  See: https://pytorch.org/docs/stable/generated/torch.nn.ConvTranspose2d.html

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor.arange(9).reshape(1, 1, 3, 3)
  w = Tensor.ones(1, 1, 2, 2)
  print(t.conv_transpose2d(w).numpy())
  ```
  """
  x, w = self, weight.unflatten(0, (groups, -1)).transpose(1, 2).flip(*range(3, len(weight.shape)+1))
  HW = weight.shape[2:]
  padding = _flat_to_grouped(self._resolve_pool_pads(padding, len(HW)))
  stride, dilation, output_padding = [make_tuple(x, len(HW)) for x in (stride, dilation, output_padding)]
  if any(s>1 for s in stride):
    # handle strides: (k) -> reshape -> (k,1) -> pad -> (k,s) -> reshape -> (k*s) -> shrink (k-(s-1))
    x = x.reshape(None, None, *flatten((k,1) for k in x.shape[2:]))
    x = x.pad((None, None, *flatten((None,(0,s-1)) for s in stride)))
    x = x.reshape(None, None, *[k*s for k,s in zip(x.shape[2::2], stride)])
    x = x.shrink((None, None, *[(0,k-(s-1)) for k,s in zip(x.shape[2:], stride)]))
  padding = flatten((((k-1)*d-pB,(k-1)*d-pA+op) for k,d,(pB,pA),op in reversed(list(zip(HW, dilation, padding, output_padding)))))
  return x.conv2d(w.flatten(end_dim=1), groups=groups, bias=bias, dilation=dilation, padding=padding)

dot ¤

dot(
    w: Tensor, acc_dtype: Optional[DTypeLike] = None
) -> Tensor

Performs dot product between two tensors. If w is 1-D, it's a sum product over the last axis of self and w. If w is N-D with N>=2, it's a sum product over the last axis of self and the second-to-last axis of w.

You can pass in the optional acc_dtype keyword argument to control the data type of the accumulation.

a = Tensor([1, 2, 3])
b = Tensor([1, 1, 0])
print(a.dot(b).numpy())
3
a = Tensor([[1, 2], [3, 4]])
b = Tensor([[5, 6], [7, 8]])
print(a.dot(b).numpy())
[[19 22]
 [43 50]]

Source code in tinygrad/tensor.py
def dot(self, w:Tensor, acc_dtype:Optional[DTypeLike]=None) -> Tensor:

  """
  Performs dot product between two tensors.
  If `w` is 1-D, it's a sum product over the last axis of `self` and `w`.
  If `w` is N-D with N>=2, it's a sum product over the last axis of `self` and the second-to-last axis of `w`.

  You can pass in the optional `acc_dtype` keyword argument to control the data type of the accumulation.

  ```python exec="true" source="above" session="tensor" result="python"
  a = Tensor([1, 2, 3])
  b = Tensor([1, 1, 0])
  print(a.dot(b).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  a = Tensor([[1, 2], [3, 4]])
  b = Tensor([[5, 6], [7, 8]])
  print(a.dot(b).numpy())
  ```
  """
  if IMAGE: return self.image_dot(w, acc_dtype)
  x, dx, dw = self, self.ndim, w.ndim
  if not (dx > 0 and dw > 0): raise RuntimeError(f"both tensors need to be at least 1D, got {dx}D and {dw}D")
  if x.shape[-1] != w.shape[axis_w:=-min(w.ndim,2)]: raise RuntimeError(f"cannot dot {x.shape} and {w.shape}")
  x = x.reshape(*x.shape[0:-1], *[1]*min(dx-1, dw-1, 1), x.shape[-1])
  w = w.reshape(*w.shape[0:-2], *[1]*min(dx-1, dw-1, 1), *w.shape[axis_w:]).transpose(-1, axis_w)
  return (x*w).sum(-1, acc_dtype=acc_dtype).cast(least_upper_dtype(x.dtype, w.dtype) if acc_dtype is None else acc_dtype)

matmul ¤

matmul(
    x: Tensor,
    reverse=False,
    acc_dtype: Optional[DTypeLike] = None,
) -> Tensor

Performs matrix multiplication between two tensors.

You can pass in the reverse keyword argument to control the order of the matrix multiplication. You can pass in the optional acc_dtype keyword argument to control the data type of the accumulation.

a = Tensor([[1, 2], [3, 4]])
b = Tensor([[5, 6], [7, 8]])
print(a.matmul(b).numpy())
[[19 22]
 [43 50]]
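`matmul` also backs the `@` operator, so the following are equivalent (a usage sketch):

```python
from tinygrad import Tensor

a = Tensor([[1, 2], [3, 4]])
b = Tensor([[5, 6], [7, 8]])
print((a @ b).numpy())                    # same result as a.matmul(b)
print(a.matmul(b, reverse=True).numpy())  # equivalent to b.matmul(a)
```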
Source code in tinygrad/tensor.py
def matmul(self, x:Tensor, reverse=False, acc_dtype:Optional[DTypeLike]=None) -> Tensor:
  """
  Performs matrix multiplication between two tensors.

  You can pass in the `reverse` keyword argument to control the order of the matrix multiplication.
  You can pass in the optional `acc_dtype` keyword argument to control the data type of the accumulation.

  ```python exec="true" source="above" session="tensor" result="python"
  a = Tensor([[1, 2], [3, 4]])
  b = Tensor([[5, 6], [7, 8]])
  print(a.matmul(b).numpy())
  ```
  """
  return x.dot(self, acc_dtype=acc_dtype) if reverse else self.dot(x, acc_dtype=acc_dtype)

einsum staticmethod ¤

einsum(
    formula: str,
    *operands: Tensor | Sequence[Tensor],
    acc_dtype: Optional[DTypeLike] = None
) -> Tensor

Sums the product of the elements of the input tensors according to a formula based on the Einstein summation convention.

See: https://pytorch.org/docs/stable/generated/torch.einsum.html

x = Tensor([[1, 2], [3, 4]])
y = Tensor([[5, 6], [7, 8]])
print(Tensor.einsum("ij,ij->", x, y).numpy())
70
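A few more formulas under the same convention (a sketch; the expected results are noted in comments):

```python
from tinygrad import Tensor

x = Tensor([[1, 2], [3, 4]])
y = Tensor([[5, 6], [7, 8]])
print(Tensor.einsum("ij,jk->ik", x, y).numpy())  # matrix multiply: [[19 22] [43 50]]
print(Tensor.einsum("ij->ji", x).numpy())        # transpose: [[1 3] [2 4]]
print(Tensor.einsum("ij->i", x).numpy())         # row sums: [3 7]
```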
Source code in tinygrad/tensor.py
@staticmethod
def einsum(formula:str, *operands:Tensor|Sequence[Tensor], acc_dtype:Optional[DTypeLike]=None) -> Tensor:
  """
  Sums the product of the elements of the input tensors according to a formula based on the Einstein summation convention.

  See: https://pytorch.org/docs/stable/generated/torch.einsum.html

  ```python exec="true" source="above" session="tensor" result="python"
  x = Tensor([[1, 2], [3, 4]])
  y = Tensor([[5, 6], [7, 8]])
  print(Tensor.einsum("ij,ij->", x, y).numpy())
  ```
  """
  def parse_formula(formula:str, *operands:Tensor):
    if "..." in (formula := formula.replace(" ", "")):
      ell_chars, ell_longest = "".join(set(string.ascii_letters) - set(formula)), 0
      for i, inp in enumerate(filter(lambda x: "..." in x, inputs := formula.split("->")[0].split(","))):
        if (ell_count := max(operands[i].ndim, 1) - (len(inp) - len("..."))) > ell_longest: ell_longest = ell_count
        inputs[i] = inp.replace("...", ell_chars[-ell_count:])
      inputs_str, out_ellipse = ",".join(inputs), ell_chars[-ell_longest:]
      return (inputs_str, formula.split("->")[1].replace("...", out_ellipse)) if "->" in formula else \
        (inputs_str, out_ellipse + ''.join(sorted(c for c in inputs_str if inputs_str.count(c) == 1 and c.isalpha() and c not in out_ellipse)))
    return formula.split("->") if "->" in formula else (formula, ''.join(c for c in sorted(formula) if formula.count(c) == 1 and c.isalpha()))

  xs:tuple[Tensor, ...] = argfix(*operands)
  inputs_str, output = parse_formula(formula, *xs)
  inputs = inputs_str.split(",")
  assert len(xs) == len(inputs), f"number of inputs doesn't match number of operands in formula, expected {len(inputs)}, got {len(xs)}"

  # map the value of each letter in the formula
  letter_val = sorted(merge_dicts([dict(zip(letters, tensor.shape)) for letters, tensor in zip(inputs, xs)]).items())

  xs_:list[Tensor] = []
  lhs = [sorted(enumerate(s), key=lambda e:e[1]) for s in inputs]
  for x,(order,letters) in zip(xs, [list(zip(*l)) for l in lhs]):
    # permute to the sorted letter order, then reshape/expand to create dimensions for the missing letters
    xs_.append(x.permute(order).reshape([val if letter in letters else 1 for letter,val in letter_val]).expand([val for _,val in letter_val]))

  # ordinal encode the output alphabet
  rhs_order = argsort(argsort(list(output)))

  # sum over all axes that's not in the output, then permute to the output order
  return functools.reduce(lambda a,b:a*b, xs_) \
    .sum(axis=[axis for axis,(letter,_) in enumerate(letter_val) if letter not in output], acc_dtype=acc_dtype).permute(rhs_order)

cumsum ¤

cumsum(axis: int = 0) -> Tensor

Computes the cumulative sum of the tensor along the specified axis.

t = Tensor.ones(2, 3)
print(t.numpy())
[[1. 1. 1.]
 [1. 1. 1.]]
print(t.cumsum(1).numpy())
[[1. 2. 3.]
 [1. 2. 3.]]

Source code in tinygrad/tensor.py
def cumsum(self, axis:int=0) -> Tensor:
  """
  Computes the cumulative sum of the tensor along the specified `axis`.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor.ones(2, 3)
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.cumsum(1).numpy())
  ```
  """
  return self._split_cumalu(axis, Ops.ADD)

cummax ¤

cummax(axis: int = 0) -> Tensor

Computes the cumulative max of the tensor along the specified axis.

t = Tensor([0, 1, -1, 2, -2, 3, -3])
print(t.numpy())
[ 0  1 -1  2 -2  3 -3]
print(t.cummax(0).numpy())
[0 1 1 2 2 3 3]

Source code in tinygrad/tensor.py
def cummax(self, axis:int=0) -> Tensor:
  """
  Computes the cumulative max of the tensor along the specified `axis`.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([0, 1, -1, 2, -2, 3, -3])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.cummax(0).numpy())
  ```
  """
  return self._split_cumalu(axis, Ops.MAX)

triu ¤

triu(diagonal: int = 0) -> Tensor

Returns the upper triangular part of the tensor, the other elements are set to 0.

The argument diagonal determines which diagonal is on the boundary. diagonal = 0 means the main diagonal. Positive diagonal means above the main diagonal, and negative diagonal means below the main diagonal.

t = Tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(t.numpy())
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
print(t.triu(diagonal=0).numpy())
[[ 1  2  3  4]
 [ 0  6  7  8]
 [ 0  0 11 12]]
print(t.triu(diagonal=1).numpy())
[[ 0  2  3  4]
 [ 0  0  7  8]
 [ 0  0  0 12]]
print(t.triu(diagonal=-1).numpy())
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 0 10 11 12]]

Source code in tinygrad/tensor.py
def triu(self, diagonal:int=0) -> Tensor:
  """
  Returns the upper triangular part of the tensor, the other elements are set to 0.

  The argument `diagonal` determines which diagonal is on the boundary. `diagonal = 0` means the main diagonal.
  Positive `diagonal` means above the main diagonal, and negative `diagonal` means below the main diagonal.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.triu(diagonal=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.triu(diagonal=1).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.triu(diagonal=-1).numpy())
  ```
  """
  return Tensor._tri(self.shape[-2], self.shape[-1], diagonal=diagonal, device=self.device, dtype=dtypes.bool).where(self, 0).cast(self.dtype)

tril ¤

tril(diagonal: int = 0) -> Tensor

Returns the lower triangular part of the tensor, the other elements are set to 0.

The argument diagonal determines which diagonal is on the boundary. diagonal = 0 means the main diagonal. Positive diagonal means above the main diagonal, and negative diagonal means below the main diagonal.

t = Tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
print(t.numpy())
[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]
print(t.tril(diagonal=0).numpy())
[[ 1  0  0  0]
 [ 5  6  0  0]
 [ 9 10 11  0]]
print(t.tril(diagonal=1).numpy())
[[ 1  2  0  0]
 [ 5  6  7  0]
 [ 9 10 11 12]]
print(t.tril(diagonal=-1).numpy())
[[ 0  0  0  0]
 [ 5  0  0  0]
 [ 9 10  0  0]]
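A common use of `tril` is building a causal (lower-triangular) attention mask; a hedged sketch, where the shapes and the `-inf` fill are illustrative choices rather than anything prescribed by this API:

```python
from tinygrad import Tensor

scores = Tensor.ones(1, 4, 4)    # stand-in attention scores
mask = Tensor.ones(4, 4).tril()  # ones on and below the main diagonal
# keep scores where the mask is set, push the rest to -inf so softmax zeroes them out
masked = (mask == 1).where(scores, float("-inf"))
print(masked.softmax(axis=-1).numpy())
```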

Source code in tinygrad/tensor.py
def tril(self, diagonal:int=0) -> Tensor:
  """
  Returns the lower triangular part of the tensor, the other elements are set to 0.

  The argument `diagonal` determines which diagonal is on the boundary. `diagonal = 0` means the main diagonal.
  Positive `diagonal` means above the main diagonal, and negative `diagonal` means below the main diagonal.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.tril(diagonal=0).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.tril(diagonal=1).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.tril(diagonal=-1).numpy())
  ```
  """
  return Tensor._tri(self.shape[-2], self.shape[-1], diagonal=diagonal+1, device=self.device, dtype=dtypes.bool).where(0, self).cast(self.dtype)

interpolate ¤

interpolate(
    size: tuple[int, ...],
    mode: str = "linear",
    align_corners: bool = False,
) -> Tensor

Downsamples or upsamples to the target size; accepts 0 to N batch dimensions.

The interpolation algorithm is selected with mode, which currently supports only linear, nearest and nearest-exact. To run bilinear or trilinear interpolation, pass in a 2D or 3D size.

t = Tensor([[1, 2, 3, 4], [21, 22, 23, 24], [41, 42, 43, 44]])
print(t.numpy())
[[ 1  2  3  4]
 [21 22 23 24]
 [41 42 43 44]]
print(t.interpolate(size=(2,3), mode="linear").numpy())
[[ 6  7  8]
 [36 37 38]]
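
As a sketch of the other options (not part of the original example set): nearest picks source elements without blending, and align_corners=True, which is only valid in linear mode, maps the first and last output samples exactly onto the first and last input samples.

```python
from tinygrad import Tensor

t = Tensor([[1, 2, 3, 4], [21, 22, 23, 24], [41, 42, 43, 44]])
# Nearest-neighbour: output values are copies of input elements, no blending.
print(t.interpolate(size=(2, 3), mode="nearest").numpy())
# align_corners=True (linear mode only): axis endpoints are preserved.
print(t.interpolate(size=(2, 3), mode="linear", align_corners=True).numpy())
```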

Source code in tinygrad/tensor.py
def interpolate(self, size:tuple[int, ...], mode:str="linear", align_corners:bool=False) -> Tensor:
  """
  Downsamples or Upsamples to the input `size`, accepts 0 to N batch dimensions.

  The interpolation algorithm is selected with `mode` which currently only supports `linear`, `nearest` and `nearest-exact`.
  To run `bilinear` or `trilinear`, pass in a 2D or 3D size.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[1, 2, 3, 4], [21, 22, 23, 24], [41, 42, 43, 44]])
  print(t.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(t.interpolate(size=(2,3), mode="linear").numpy())
  ```
  """
  assert isinstance(size, (tuple,list)) and all_int(size) and 0 < len(size) <= self.ndim, f"invalid {size=}"
  assert mode in ("linear", "nearest", "nearest-exact"), "only supports linear, nearest or nearest-exact interpolate"
  assert not (align_corners and mode != "linear"), "align_corners option can only be set with the interpolating mode linear"
  x, expand = self, list(self.shape)
  for i in range(-1,-len(size)-1,-1):
    scale = (self.shape[i] - int(align_corners)) / (size[i] - int(align_corners))
    arr, reshape = Tensor.arange(size[i], dtype=dtypes.float32, device=self.device), [1] * self.ndim
    reshape[i] = expand[i] = size[i]
    if mode == "linear":
      index = (scale*arr if align_corners else (scale*(arr+0.5))-0.5).clip(0, self.shape[i]-1)
      low, high, perc = [y.reshape(reshape).expand(expand) for y in (index.floor(), index.ceil(), index - index.floor())]
      x = x.gather(i, low).lerp(x.gather(i, high), perc)
    else:
      index = (scale*(arr+0.5) if mode=="nearest-exact" else scale*arr).cast(dtypes.int32).reshape(reshape).expand(expand)
      x = x.gather(i, index)
  return x.cast(self.dtype)

scatter ¤

scatter(
    dim: int,
    index: Tensor,
    src: Union[Tensor, ConstType],
    reduce: Union[
        None, Literal["multiply"], Literal["add"]
    ] = None,
) -> Tensor

Scatters src values along the axis specified by dim. Optionally applies an add or multiply reduction with reduce.

src = Tensor.arange(1, 11).reshape(2, 5)
print(src.numpy())
[[ 1  2  3  4  5]
 [ 6  7  8  9 10]]
index = Tensor([[0, 1, 2, 0]])
print(Tensor.zeros(3, 5, dtype=src.dtype).scatter(0, index, src).numpy())
[[1 0 0 4 0]
 [0 2 0 0 0]
 [0 0 3 0 0]]
index = Tensor([[0, 1, 2], [0, 1, 4]])
print(Tensor.zeros(3, 5, dtype=src.dtype).scatter(1, index, src).numpy())
[[1 2 3 0 0]
 [6 7 0 0 8]
 [0 0 0 0 0]]
print(Tensor.full((2, 4), 2.0).scatter(1, Tensor([[2], [3]]), 1.23, reduce='multiply').numpy())
[[2.   2.   2.46 2.  ]
 [2.   2.   2.   2.46]]
print(Tensor.full((2, 4), 2.0).scatter(1, Tensor([[2], [3]]), 1.23, reduce='add').numpy())
[[2.   2.   3.23 2.  ]
 [2.   2.   2.   3.23]]
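
A sketch of a common use (the labels tensor here is hypothetical example data): scattering a constant 1 along dim=1 at the label positions builds a one-hot style matrix.

```python
from tinygrad import Tensor

labels = Tensor([[1], [3]])
# Write 1.0 at column labels[i] of row i; everything else stays 0.
print(Tensor.zeros(2, 5).scatter(1, labels, 1.0).numpy())
```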

Source code in tinygrad/tensor.py
def scatter(self, dim:int, index:Tensor, src:Union[Tensor, ConstType], reduce:Union[None, Literal['multiply'], Literal['add']]=None) -> Tensor:
  """
  Scatters `src` values along an axis specified by `dim`.
  Apply `add` or `multiply` reduction operation with `reduce`.

  ```python exec="true" source="above" session="tensor" result="python"
  src = Tensor.arange(1, 11).reshape(2, 5)
  print(src.numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  index = Tensor([[0, 1, 2, 0]])
  print(Tensor.zeros(3, 5, dtype=src.dtype).scatter(0, index, src).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  index = Tensor([[0, 1, 2], [0, 1, 4]])
  print(Tensor.zeros(3, 5, dtype=src.dtype).scatter(1, index, src).numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(Tensor.full((2, 4), 2.0).scatter(1, Tensor([[2], [3]]), 1.23, reduce='multiply').numpy())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  print(Tensor.full((2, 4), 2.0).scatter(1, Tensor([[2], [3]]), 1.23, reduce='add').numpy())
  ```
  """
  if reduce not in {None, "add", "multiply"}: raise TypeError(f"{reduce=} must be one of None, 'multiply', or 'add'")
  index, dim = index.to(self.device), self._resolve_dim(dim)
  src = src.cast(self.dtype) if isinstance(src, Tensor) else Tensor(src, device=self.device, dtype=self.dtype)._broadcast_to(index.shape)
  assert index.ndim == self.ndim == src.ndim, f"self.ndim, index.ndim and src.dim must all equal, {self.ndim=} {index.ndim=} {src.ndim=}"
  assert all((d == dim or self_ >= index_) and src_ >= index_ for d,(self_,index_,src_) in enumerate(zip(self.shape, index.shape, src.shape))), \
    f"All dimensions of {index.shape=} should be <= to all dimensions of {src.shape=} and all dimensions except dimension {dim} of {self.shape=}"
  # shrink src to index shape to shrink away the unused values
  src = src.shrink(tuple((0,s) for s in index.shape))
  # prepare src and mask for reduce with respect to dim
  src = src.unsqueeze(-1).expand(*src.shape, self.shape[dim]).transpose(-1, dim)
  mask = index.unsqueeze(-1)._one_hot_along_dim(self.shape[dim]).transpose(-1, dim)
  # pad src and mask to self.shape so that reduce can be done with padded values as no-ops
  src, mask = (x.pad(tuple((0, self.shape[i] - x.shape[i]) if i != dim else None for i in range(self.ndim)) + (None,)) for x in (src, mask))
  if reduce == "add": return mask.where(src, 0).sum(-1, acc_dtype=self.dtype) + self
  if reduce == "multiply": return mask.where(src, 1).prod(-1, acc_dtype=self.dtype) * self
  return _masked_setitem(self, src, mask, (-1,))

Neural Network (functional)¤

linear ¤

linear(weight: Tensor, bias: Optional[Tensor] = None)

Applies a linear transformation to self using weight and bias.

See: https://pytorch.org/docs/stable/generated/torch.nn.Linear.html

t = Tensor([[1, 2], [3, 4]])
weight = Tensor([[1, 2], [3, 4]])
bias = Tensor([1, 2])
print(t.linear(weight, bias).numpy())
[[ 8 12]
 [16 24]]
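
One detail worth illustrating (a sketch, not from the original examples): when weight is 1-D, linear multiplies elementwise per feature instead of performing a matrix product, as the source below shows.

```python
from tinygrad import Tensor

t = Tensor([[1, 2], [3, 4]])
# 1-D weight: elementwise scale per feature, plus the optional bias.
print(t.linear(Tensor([10, 100]), Tensor([1, 2])).numpy())
```
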
Source code in tinygrad/tensor.py
def linear(self, weight:Tensor, bias:Optional[Tensor]=None):
  """
  Applies a linear transformation to `self` using `weight` and `bias`.

  See: https://pytorch.org/docs/stable/generated/torch.nn.Linear.html

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[1, 2], [3, 4]])
  weight = Tensor([[1, 2], [3, 4]])
  bias = Tensor([1, 2])
  print(t.linear(weight, bias).numpy())
  ```
  """
  x = self.mul(weight) if len(weight.shape) == 1 else self.dot(weight)
  return x.add(bias) if bias is not None else x

sequential ¤

sequential(ll: list[Callable[[Tensor], Tensor]])

Applies a sequence of functions to self, chaining the output of each function to the input of the next.

t = Tensor([1, 2, 3])
print(t.sequential([lambda x: x * 2, lambda x: x + 1]).numpy())
[3 5 7]
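
A sketch of the equivalence: sequential is a left fold over the list of functions, so the call above matches chaining them by hand.

```python
from tinygrad import Tensor

t = Tensor([1, 2, 3])
# Manual chaining gives the same result as t.sequential([lambda x: x * 2, lambda x: x + 1]).
print(((t * 2) + 1).numpy())
```
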
Source code in tinygrad/tensor.py
def sequential(self, ll:list[Callable[[Tensor], Tensor]]):
  """
  Applies a sequence of functions to `self` chaining the output of each function to the input of the next.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([1, 2, 3])
  print(t.sequential([lambda x: x * 2, lambda x: x + 1]).numpy())
  ```
  """
  return functools.reduce(lambda x,f: f(x), ll, self)

layernorm ¤

layernorm(
    axis: Union[int, tuple[int, ...]] = -1,
    eps: float = 1e-05,
) -> Tensor

Applies Layer Normalization over a mini-batch of inputs.

t = Tensor.randn(8, 10, 16) * 2 + 8
print(t.mean().item(), t.std().item())
7.923057556152344 2.0072731971740723
t = t.layernorm()
print(t.mean().item(), t.std().item())
-2.184478153921532e-09 1.0003893375396729
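
A quick sketch of what the normalization axis means: with the default axis=-1 each row is normalized independently, so per-row means are roughly zero (and the per-row scale is roughly one, up to the eps term).

```python
from tinygrad import Tensor

t = Tensor.randn(4, 16) * 3 + 5
# Per-row means after layernorm are ~0; each row is normalized on its own.
print(t.layernorm().mean(axis=-1).numpy())
```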

Source code in tinygrad/tensor.py
def layernorm(self, axis:Union[int,tuple[int,...]]=-1, eps:float=1e-5) -> Tensor:
  """
  Applies Layer Normalization over a mini-batch of inputs.

  - Described: https://paperswithcode.com/method/layer-normalization
  - Paper: https://arxiv.org/abs/1607.06450v1

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor.randn(8, 10, 16) * 2 + 8
  print(t.mean().item(), t.std().item())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  t = t.layernorm()
  print(t.mean().item(), t.std().item())
  ```
  """
  y = (self - self.mean(axis, keepdim=True))
  return y.mul((y*y).mean(axis, keepdim=True).add(eps).rsqrt())

batchnorm ¤

batchnorm(
    weight: Optional[Tensor],
    bias: Optional[Tensor],
    mean: Tensor,
    invstd: Tensor,
    axis: Union[int, tuple[int, ...]] = 1,
) -> Tensor

Applies Batch Normalization over a mini-batch of inputs.

t = Tensor.randn(8, 4, 16, 16) * 2 + 8
print(t.mean().item(), t.std().item())
8.030435562133789 1.9699469804763794
t = t.batchnorm(None, None, t.mean(axis=(0,2,3)), t.var(axis=(0,2,3)).add(1e-5).rsqrt())
print(t.mean().item(), t.std().item())
1.7121278688136954e-06 0.9998164176940918
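
A sketch with affine parameters (the gamma and beta values here are hypothetical): weight and bias rescale and shift each channel after normalization, so the output statistics move toward bias and weight.

```python
from tinygrad import Tensor

t = Tensor.randn(8, 4, 16, 16) * 2 + 8
gamma, beta = Tensor.full((4,), 2.0), Tensor.full((4,), 1.0)
out = t.batchnorm(gamma, beta, t.mean(axis=(0, 2, 3)), t.var(axis=(0, 2, 3)).add(1e-5).rsqrt())
# Mean moves toward beta (~1) and std toward gamma (~2).
print(out.mean().item(), out.std().item())
```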

Source code in tinygrad/tensor.py
def batchnorm(self, weight:Optional[Tensor], bias:Optional[Tensor], mean:Tensor, invstd:Tensor, axis:Union[int,tuple[int,...]]=1) -> Tensor:
  """
  Applies Batch Normalization over a mini-batch of inputs.

  - Described: https://paperswithcode.com/method/batch-normalization
  - Paper: https://arxiv.org/abs/1502.03167

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor.randn(8, 4, 16, 16) * 2 + 8
  print(t.mean().item(), t.std().item())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  t = t.batchnorm(None, None, t.mean(axis=(0,2,3)), t.var(axis=(0,2,3)).add(1e-5).rsqrt())
  print(t.mean().item(), t.std().item())
  ```
  """
  axis_ = argfix(axis)
  shape = tuple(s if ax in axis_ else 1 for ax, s in enumerate(self.shape))
  x = self - mean.reshape(shape)
  if weight is not None: x = x * weight.reshape(shape)
  ret = x.mul(invstd.reshape(shape) if len(invstd.shape) == len(axis_) else invstd)
  return (ret + bias.reshape(shape)) if bias is not None else ret

dropout ¤

dropout(p=0.5) -> Tensor

Applies dropout to self.

Note

dropout is only applied when Tensor.training is True.

Tensor.manual_seed(42)
t = Tensor.randn(2, 2)
with Tensor.train():
  print(t.dropout().numpy())
[[ 0.      2.17  ]
 [ 0.     -0.1682]]
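
A sketch of the scaling behaviour: surviving activations are rescaled by 1/(1-p), and outside of Tensor.train() dropout is a no-op.

```python
from tinygrad import Tensor

t = Tensor.ones(3, 3)
with Tensor.train():
  # Each element is either dropped to 0 or scaled to 1/(1-0.5) = 2.
  print(t.dropout(0.5).numpy())
# Training disabled: the tensor passes through unchanged.
print(t.dropout(0.5).numpy())
```
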
Source code in tinygrad/tensor.py
def dropout(self, p=0.5) -> Tensor:
  """
  Applies dropout to `self`.

  NOTE: dropout is only applied when `Tensor.training` is `True`.

  - Described: https://paperswithcode.com/method/dropout
  - Paper: https://jmlr.org/papers/v15/srivastava14a.html

  ```python exec="true" source="above" session="tensor" result="python"
  Tensor.manual_seed(42)
  t = Tensor.randn(2, 2)
  with Tensor.train():
    print(t.dropout().numpy())
  ```
  """
  if not Tensor.training or p == 0: return self
  return (Tensor.rand_like(self, requires_grad=False, dtype=dtypes.default_float, contiguous=False) >= p).contiguous().where(self, 0) / (1.0 - p)

one_hot ¤

one_hot(num_classes: int = -1) -> Tensor

Converts self to a one-hot tensor.

num_classes defaults to -1, which means num_classes will be inferred as max(self) + 1.

t = Tensor([0, 1, 3, 3, 4])
print(t.one_hot(5).numpy())
[[1 0 0 0 0]
 [0 1 0 0 0]
 [0 0 0 1 0]
 [0 0 0 1 0]
 [0 0 0 0 1]]
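
A sketch of the default: leaving num_classes at -1 infers the class count from the data, here max(t) + 1 = 5, so the result matches the explicit call above.

```python
from tinygrad import Tensor

t = Tensor([0, 1, 3, 3, 4])
# num_classes is inferred as max(t) + 1 = 5.
print(t.one_hot().numpy())
```
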
Source code in tinygrad/tensor.py
def one_hot(self, num_classes:int=-1) -> Tensor:
  """
  Converts `self` to a one-hot tensor.

  `num_classes` defaults to -1, which means num_classes will be inferred as max(self) + 1.

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([0, 1, 3, 3, 4])
  print(t.one_hot(5).numpy())
  ```
  """
  if num_classes == -1: num_classes = (self.max()+1).item()
  return self[..., None]._one_hot_along_dim(num_classes).where(1, 0)

scaled_dot_product_attention ¤

scaled_dot_product_attention(
    key: Tensor,
    value: Tensor,
    attn_mask: Tensor | None = None,
    dropout_p: float = 0.0,
    is_causal: bool = False,
) -> Tensor

Computes scaled dot-product attention. self is the query tensor, key is the key tensor, and value is the value tensor.

q = Tensor.randn(2, 4, 8)
k = Tensor.randn(2, 4, 8)
v = Tensor.randn(2, 4, 8)
print(q.scaled_dot_product_attention(k, v).numpy())
[[[-0.1425 -0.1433 -0.3625  0.8853 -0.3129  1.0271 -0.0019  0.2445]
  [-0.7137  0.2617  1.1393  0.692   0.0461  0.1132  0.391  -0.3563]
  [ 0.4718  0.6791  0.8956  0.9387 -0.7198  0.753   0.5702  0.2661]
  [-1.0183  0.005   0.9208  0.6447  0.2658  0.0411  0.2314 -0.4636]]

 [[ 0.2928 -0.3364 -0.1937 -0.0755 -0.6196 -0.7339  0.8431 -0.3794]
  [ 0.5915  0.3565 -0.6987  0.241   0.2624 -0.1074 -0.3026 -0.3574]
  [ 0.3176 -0.4436 -0.3136 -0.5334 -0.5756 -0.851   0.9595 -0.4201]
  [ 0.4378  0.0234 -0.0984  0.4847 -0.3579 -0.3998  0.3781 -0.2338]]]
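
A sketch of the causal variant: is_causal=True applies a lower-triangular mask so each query position only attends to itself and earlier key positions (attn_mask must then be left unset).

```python
from tinygrad import Tensor

q, k, v = Tensor.randn(2, 4, 8), Tensor.randn(2, 4, 8), Tensor.randn(2, 4, 8)
# Causal attention: position i ignores keys at positions > i; output shape is unchanged.
print(q.scaled_dot_product_attention(k, v, is_causal=True).numpy().shape)
```
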
Source code in tinygrad/tensor.py
def scaled_dot_product_attention(self, key:Tensor, value:Tensor, attn_mask:Tensor|None=None, dropout_p:float=0.0, is_causal:bool=False) -> Tensor:
  """
  Computes scaled dot-product attention.
  `self` is the query tensor, `key` is the key tensor, and `value` is the value tensor.

  - Described: https://paperswithcode.com/method/scaled
  - Paper: https://arxiv.org/abs/1706.03762v7

  ```python exec="true" source="above" session="tensor" result="python"
  q = Tensor.randn(2, 4, 8)
  k = Tensor.randn(2, 4, 8)
  v = Tensor.randn(2, 4, 8)
  print(q.scaled_dot_product_attention(k, v).numpy())
  ```
  """
  # NOTE: it also works when `key` and `value` have symbolic shape.
  assert all_int(self.shape), f"does not support symbolic shape {self.shape}"
  qk = self.matmul(key.transpose(-2,-1), acc_dtype=least_upper_dtype(self.dtype, key.dtype, dtypes.float32)) / math.sqrt(self.shape[-1])
  # handle attention mask
  if is_causal:
    if attn_mask is not None: raise RuntimeError("cannot set attn_mask when is_causal=True")
    attn_mask = qk.ones_like(requires_grad=False, device=self.device, dtype=dtypes.bool).tril()
  if attn_mask is not None:
    if attn_mask.dtype == dtypes.bool: attn_mask = attn_mask.where(0, -float("inf"))
    qk = qk + attn_mask
  return qk.softmax(-1).cast(self.dtype).dropout(dropout_p) @ value

binary_crossentropy ¤

binary_crossentropy(
    Y: Tensor, reduction: ReductionStr = "mean"
) -> Tensor

Computes the binary cross-entropy loss between self and Y.

See: https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html

t = Tensor([0.1, 0.9, 0.2])
Y = Tensor([0, 1, 0])
print(t.binary_crossentropy(Y).item())
0.14462155103683472
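
A sketch of the reduction argument: reduction='none' returns the per-element losses instead of averaging them.

```python
from tinygrad import Tensor

t = Tensor([0.1, 0.9, 0.2])
Y = Tensor([0, 1, 0])
# Per-element binary cross-entropy, no reduction applied.
print(t.binary_crossentropy(Y, reduction='none').numpy())
```
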
Source code in tinygrad/tensor.py
def binary_crossentropy(self, Y:Tensor, reduction:ReductionStr="mean") -> Tensor:
  """
  Computes the binary cross-entropy loss between `self` and `Y`.

  See: https://pytorch.org/docs/stable/generated/torch.nn.BCELoss.html

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([0.1, 0.9, 0.2])
  Y = Tensor([0, 1, 0])
  print(t.binary_crossentropy(Y).item())
  ```
  """
  return (-Y*self.log() - (1-Y)*(1-self).log())._do_reduction(reduction)

binary_crossentropy_logits ¤

binary_crossentropy_logits(
    Y: Tensor, reduction: ReductionStr = "mean"
) -> Tensor

Computes the binary cross-entropy loss between self and Y where self is logits.

See: https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html

t = Tensor([-1, 2, -3])
Y = Tensor([0, 1, 0])
print(t.binary_crossentropy_logits(Y).item())
0.16292567551136017
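
A sketch of the relationship to binary_crossentropy: applying sigmoid first and then the probability-space loss gives the same value up to floating-point differences, while the logits version is the numerically safer path.

```python
from tinygrad import Tensor

t = Tensor([-1, 2, -3])
Y = Tensor([0, 1, 0])
# Same loss computed via sigmoid + binary_crossentropy (up to fp error).
print(t.sigmoid().binary_crossentropy(Y).item())
```
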
Source code in tinygrad/tensor.py
def binary_crossentropy_logits(self, Y:Tensor, reduction:ReductionStr="mean") -> Tensor:
  """
  Computes the binary cross-entropy loss between `self` and `Y` where `self` is logits.

  See: https://pytorch.org/docs/stable/generated/torch.nn.BCEWithLogitsLoss.html

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([-1, 2, -3])
  Y = Tensor([0, 1, 0])
  print(t.binary_crossentropy_logits(Y).item())
  ```
  """
  return (self.maximum(0) - Y * self + (1 + self.abs().neg().exp()).log())._do_reduction(reduction)

sparse_categorical_crossentropy ¤

sparse_categorical_crossentropy(
    Y: Tensor,
    ignore_index: int = -1,
    label_smoothing=0.0,
    reduction: ReductionStr = "mean",
) -> Tensor

Computes the sparse categorical cross-entropy loss between self and Y.

Note

self is logits and Y is the target labels. Unlike PyTorch, this function expects the class axis to be -1.

See: https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html

t = Tensor([[-1, 2, -3], [1, -2, 3]])
Y = Tensor([1, 2])
print(t.sparse_categorical_crossentropy(Y).item())
0.09391524642705917
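
A sketch of ignore_index (using a hypothetical value of 2 here): samples whose label equals ignore_index are excluded from both the sum and the mean.

```python
from tinygrad import Tensor

t = Tensor([[-1, 2, -3], [1, -2, 3]])
Y = Tensor([1, 2])
# The second sample (label 2) is masked out, so only the first contributes.
print(t.sparse_categorical_crossentropy(Y, ignore_index=2).item())
```
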
Source code in tinygrad/tensor.py
def sparse_categorical_crossentropy(self, Y:Tensor, ignore_index:int=-1, label_smoothing=0.0, reduction:ReductionStr="mean") -> Tensor:
  """
  Computes the sparse categorical cross-entropy loss between `self` and `Y`.

  NOTE: `self` is logits and `Y` is the target labels.
  NOTE: unlike PyTorch, this function expects the class axis to be -1

  See: https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[-1, 2, -3], [1, -2, 3]])
  Y = Tensor([1, 2])
  print(t.sparse_categorical_crossentropy(Y).item())
  ```
  """
  assert 0.0 <= label_smoothing <= 1.0, "label_smoothing must be in [0.0, 1.0]"
  assert reduction in ("mean", "sum", "none"), "reduction must be one of ['mean', 'sum', 'none']"
  log_probs, loss_mask = self.log_softmax(), (Y != ignore_index) if ignore_index != -1 else Y.ones_like(dtype=dtypes.bool)
  y_counted = Y.to(self.device).flatten().reshape(-1, 1)._one_hot_along_dim(self.shape[-1])
  y = (y_counted * loss_mask.reshape(-1, 1)).reshape(*Y.shape, self.shape[-1])
  smoothing = label_smoothing * (log_probs.mean(-1) * loss_mask)
  unreduced = ((1 - label_smoothing) * (log_probs * y).sum(-1) + smoothing)
  # NOTE: because of ignore_index, we can't use Tensor.mean (so can't use `_do_reduction` here)
  return -(unreduced.sum() / loss_mask.sum() if reduction == "mean" else (unreduced.sum() if reduction == "sum" else unreduced))

cross_entropy ¤

cross_entropy(
    Y: Tensor,
    reduction: ReductionStr = "mean",
    label_smoothing: float = 0.0,
) -> Tensor

Compute the cross entropy loss between input logits and target.

Note

self is logits and Y is the target labels or class probabilities.

See: https://pytorch.org/docs/stable/generated/torch.nn.functional.cross_entropy.html

t = Tensor([[-1, 2, -3], [1, -2, 3]])
Y = Tensor([1, 2])
print(t.cross_entropy(Y).item())
0.09391524642705917
t = Tensor([[-1, 2, -3], [1, -2, 3]])
Y = Tensor([1, 2])
print(t.cross_entropy(Y, reduction='none').numpy())
[0.055  0.1328]
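
A sketch of probability targets: Y can also be a 2-D tensor of class probabilities, and one-hot rows reproduce the label-based result above.

```python
from tinygrad import Tensor

t = Tensor([[-1, 2, -3], [1, -2, 3]])
# One-hot probabilities for classes 1 and 2: same loss as Y = Tensor([1, 2]).
probs = Tensor([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(t.cross_entropy(probs).item())
```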

Source code in tinygrad/tensor.py
def cross_entropy(self, Y:Tensor, reduction:ReductionStr="mean", label_smoothing:float=0.0) -> Tensor:
  """
  Compute the cross entropy loss between input logits and target.

  NOTE: `self` are logits and `Y` are the target labels or class probabilities.

  See: https://pytorch.org/docs/stable/generated/torch.nn.functional.cross_entropy.html

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[-1, 2, -3], [1, -2, 3]])
  Y = Tensor([1, 2])
  print(t.cross_entropy(Y).item())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[-1, 2, -3], [1, -2, 3]])
  Y = Tensor([1, 2])
  print(t.cross_entropy(Y, reduction='none').numpy())
  ```
  """
  assert 0.0 <= label_smoothing <= 1.0, "label_smoothing must be in [0.0, 1.0]"
  Y = Y.one_hot(num_classes=cast(int, self.shape[1])) if Y.ndim < 2 else Y
  Y = (1 - label_smoothing)*Y + label_smoothing / cast(int, Y.shape[1])
  ret = -self.log_softmax(axis=1).mul(Y).sum(axis=1)
  return ret._do_reduction(reduction)

nll_loss ¤

nll_loss(
    Y: Tensor,
    weight: Optional[Tensor] = None,
    ignore_index: Optional[int] = None,
    reduction: ReductionStr = "mean",
) -> Tensor

Compute the negative log likelihood loss between log-probabilities and target labels.

Note

self is log-probabilities and Y is the target labels or class probabilities.

See: https://pytorch.org/docs/stable/generated/torch.nn.functional.nll_loss.html

t = Tensor([[-1, 2, -3], [1, -2, 3]])
Y = Tensor([1, 2])
print(t.log_softmax().nll_loss(Y).item())
0.09391524642705917
t = Tensor([[-1, 2, -3], [1, -2, 3]])
Y = Tensor([1, 2])
print(t.log_softmax().nll_loss(Y, reduction='none').numpy())
[0.055  0.1328]
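
A sketch of per-class weights (the weight values here are hypothetical): each sample's loss is scaled by the weight of its target class, and the mean divides by the sum of the applied weights.

```python
from tinygrad import Tensor

t = Tensor([[-1, 2, -3], [1, -2, 3]])
Y = Tensor([1, 2])
w = Tensor([1.0, 2.0, 0.5])
# Sample losses are scaled by w[Y]; the mean divides by w[Y].sum().
print(t.log_softmax().nll_loss(Y, weight=w).item())
```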

Source code in tinygrad/tensor.py
def nll_loss(self, Y:Tensor, weight:Optional[Tensor]=None, ignore_index:Optional[int]=None, reduction:ReductionStr="mean") -> Tensor:
  """
  Compute the negative log likelihood loss between log-probabilities and target labels.

  NOTE: `self` is log-probabilities and `Y` is the Y labels or class probabilities.

  See: https://pytorch.org/docs/stable/generated/torch.nn.functional.nll_loss.html

  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[-1, 2, -3], [1, -2, 3]])
  Y = Tensor([1, 2])
  print(t.log_softmax().nll_loss(Y).item())
  ```
  ```python exec="true" source="above" session="tensor" result="python"
  t = Tensor([[-1, 2, -3], [1, -2, 3]])
  Y = Tensor([1, 2])
  print(t.log_softmax().nll_loss(Y, reduction='none').numpy())
  ```
  """
  weight = Tensor.ones_like(Y, requires_grad=False) if weight is None else weight[Y]
  masked_weight = weight if ignore_index is None else weight * (Y != ignore_index)
  nll = -self.gather(1, Y.unsqueeze(1)).squeeze(1) * masked_weight
  return nll.sum() / masked_weight.sum() if reduction == "mean" else nll._do_reduction(reduction)