python-graphblas · jim22k · Jun 8, 2026
diff --git a/.github/workflows/test_and_build.yml b/.github/workflows/test_and_build.yml
@@ -169,9 +169,10 @@ jobs:
           pyver=$(python -c "v='${{ steps.pyver.outputs.selected }}'.split('.');print(f'{v[0]}.{v[1]}')")
           eval "$(python scripts/ci_pick_versions.py --python "$pyver" --source ${{ steps.sourcetype.outputs.selected }})"
 
-          # Set up psg conda package variable (conda-forge installs via conda, others via pip)
+          # Set up psg conda package variable (conda-forge installs via conda, others via pip).
+          # For conda-forge, psg must always be installed; an empty psgver just means "latest".
           psg=""
-          if [[ ${{ steps.sourcetype.outputs.selected}} == "conda-forge" && -n "$psgver" ]] ; then
+          if [[ ${{ steps.sourcetype.outputs.selected}} == "conda-forge" ]] ; then
             psg=python-suitesparse-graphblas${psgver}
           fi
 

diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -24,7 +24,7 @@ repos:
       - id: check-illegal-windows-names
       - id: check-merge-conflict
       - id: check-ast
-      - id: check-json # No json files yet
+      - id: check-json # no JSON files yet; enabled for the future
       - id: check-toml
       - id: check-yaml
       - id: check-executables-have-shebangs
@@ -53,6 +53,9 @@ repos:
   # We can probably remove `isort` if we come to trust `ruff --fix`,
   # but we'll need to figure out the configuration to do this in `ruff`
   - repo: https://github.com/pycqa/isort
+    # 9.0 is still in alpha (9.0.0a3) and a pre-release pin defeats
+    # pre-commit.ci's autoupdate "avoid prereleases" logic; stay on the last
+    # stable 8.x until 9.0 is released.
     rev: 8.0.1
     hooks:
       - id: isort
@@ -73,7 +76,7 @@ repos:
       - id: black
       - id: black-jupyter
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.15.7
+    rev: v0.15.13
     hooks:
       - id: ruff-check
         args: [--fix-only, --show-fixes]
@@ -98,10 +101,10 @@ repos:
           - tomli; python_version<'3.11'
         files: ^(graphblas|docs)/
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.15.7
+    rev: v0.15.13
     hooks:
       - id: ruff-check
-      # - id: ruff-format  # Prefer black, but may temporarily uncomment this to see
+      # - id: ruff-format  # we prefer black; uncomment to compare with ruff's formatter
   - repo: https://github.com/sphinx-contrib/sphinx-lint
     rev: v1.0.2
     hooks:
@@ -119,7 +122,7 @@ repos:
     hooks:
       - id: shellcheck
   - repo: https://github.com/rbubley/mirrors-prettier
-    rev: v3.8.1
+    rev: v3.8.3
     hooks:
       - id: prettier
         args: [--prose-wrap=preserve]
@@ -138,12 +141,13 @@ repos:
     rev: v0.9.3
     hooks:
       - id: taplo-format
+        args: ["--option", "column_width=100"]
   - repo: https://github.com/rhysd/actionlint
-    rev: v1.7.11
+    rev: v1.7.12
     hooks:
       - id: actionlint
   - repo: https://github.com/python-jsonschema/check-jsonschema
-    rev: 0.37.0
+    rev: 0.37.2
     hooks:
       - id: check-dependabot
       - id: check-github-workflows

diff --git a/docs/user_guide/index.rst b/docs/user_guide/index.rst
@@ -25,4 +25,5 @@ Topics
     init
     io
     udf
+    udt
     recorder
diff --git a/docs/user_guide/operators.rst b/docs/user_guide/operators.rst
@@ -221,6 +221,39 @@ IndexUnary operators are located in two places.
     (i.e. all except rowindex, colindex, and diagindex). Calling the operators in the
     select namespace will perform a ``select`` operation.
 
+IndexBinary Operators
+---------------------
+
+IndexBinary operators are the two-input analogue of IndexUnary operators. The function
+receives both values, the indices of *each* value, and a thunk parameter:
+``f(x, ix, jx, y, iy, jy, theta) -> z``, where ``ix, jx`` are the row/column indices of
+``x`` and ``iy, jy`` are those of ``y``.
+
+There are no built-in IndexBinary operators; all are user-defined (see
+:doc:`udf`). Binding a ``theta`` value to an IndexBinaryOp produces a
+regular BinaryOp usable directly in ``ewise_mult`` / ``ewise_add``, or as
+the multiplier of a Semiring for ``mxm`` / ``mxv`` / ``vxm``.
+
+Example usage:
+
+.. code-block:: python
+
+    def discounted_sum(x, ix, jx, y, iy, jy, theta):
+        return (x + y) * theta
+
+    gb.indexbinary.register_new("discounted_sum", discounted_sum)
+
+    binop = gb.indexbinary.discounted_sum[float](0.5)   # bind theta
+    C << A.ewise_mult(B, binop)
+
+    # For mxm, wrap in a Semiring; the additive monoid must accept the
+    # bound op's return type.
+    sr = gb.semiring.register_anonymous(gb.monoid.plus, binop)
+    D << A.mxm(B, sr)
+
+IndexBinary operators are located in the ``graphblas.indexbinary`` namespace.
+They require SuiteSparse:GraphBLAS 9.4 or newer.
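+
+To make the calling convention concrete, here is a plain-Python model (no
+GraphBLAS involved) of what ``A.ewise_mult(B, binop)`` computes with the bound
+op above. Matrices are modeled as ``{(row, col): value}`` dicts of stored
+entries, and ``ewise_mult_model`` is a hypothetical helper. The model assumes
+that for elementwise operations both index pairs passed to the function are
+the shared ``(i, j)`` of the intersected entry.
+
+.. code-block:: python
+
+    def discounted_sum(x, ix, jx, y, iy, jy, theta):
+        return (x + y) * theta
+
+    def ewise_mult_model(A, B, f, theta):
+        # ewise_mult keeps only positions stored in *both* operands, so
+        # x and y always sit at the same (i, j).
+        return {
+            (i, j): f(A[i, j], i, j, B[i, j], i, j, theta)
+            for (i, j) in A.keys() & B.keys()
+        }
+
+    A = {(0, 0): 1.0, (0, 1): 5.0, (1, 1): 2.0}
+    B = {(0, 0): 3.0, (1, 1): 4.0}
+    C = ewise_mult_model(A, B, discounted_sum, 0.5)
+    # C == {(0, 0): 2.0, (1, 1): 3.0}; (0, 1) is dropped because B lacks it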
+
 Aggregators
 -----------
 

diff --git a/docs/user_guide/udf.rst b/docs/user_guide/udf.rst
@@ -1,55 +1,190 @@
 
-User-defined Functions
-======================
+User-defined Functions (UDFs)
+=============================
 
-1. Show example of register_new
-2. Discuss commonality for other operators
-3. Discuss register_anonymous
+python-graphblas lets you write custom operators in Python. ``numba``
+JIT-compiles them to native machine code, so SuiteSparse:GraphBLAS calls them
+as compiled functions rather than going through the Python interpreter. This
+guide covers every operator type that accepts a user function: ``UnaryOp``,
+``BinaryOp``, ``IndexUnaryOp``, ``SelectOp``, and ``IndexBinaryOp``.
 
-Python-graphblas requires ``numba`` which enables compiling user-defined Python functions
-to native machine code for use by the GraphBLAS backend. This provides functions which are
-very performant.
-
-Example user-defined UnaryOp:
+A first example
+---------------
 
 .. code-block:: python
 
-    from graphblas import unary
+    from graphblas import unary, Vector
 
-    def force_odd_func(x):
+    def force_odd(x):
         if x % 2 == 0:
             return x + 1
         return x
 
-    unary.register_new("force_odd", force_odd_func)
+    unary.register_new("force_odd", force_odd)
 
     v = Vector.from_coo([0, 1, 3, 4, 5], [1, 2, 3, 8, 14])
     w = v.apply(unary.force_odd).new()
+    # w = [1, 3, _, 3, 9, 15]
+
+``register_new`` vs ``register_anonymous``
+------------------------------------------
+
+- ``register_new(name, func)`` puts the operator on ``gb.unary.{name}`` (or
+  ``gb.binary.{name}``, etc.). Use this for operators you'll reference by
+  name across files.
+- ``register_anonymous(func)`` returns the operator without adding it to a
+  namespace. Use this for one-off operators or operators created inside
+  another function.
+
+Lambdas are auto-registered as anonymous wherever a UnaryOp is expected:
+
+.. code-block:: python
 
-.. csv-table:: w
-    :class: matrix
-    :header: 0,1,2,3,4,5
+    v.apply(lambda x: x % 5 - 2).new()
+    # [-1, 0, _, 1, 1, 2]
 
-    1,3,,3,9,15
+Operator signatures
+-------------------
 
+The function signature depends on the operator type:
 
-Similar methods exist for BinaryOp and IndexUnaryOp. User-defined Monoids and Semirings are
-constructed out of previously defined and built-in UnaryOps and BinaryOps.
+==================  ====================================================  =====================================
+Operator            Function signature                                    Returns
+==================  ====================================================  =====================================
+``UnaryOp``         ``f(x) -> z``                                         a value
+``BinaryOp``        ``f(x, y) -> z``                                      a value
+``IndexUnaryOp``    ``f(x, ix, jx, theta) -> z``                          a value (often bool, for select)
+``SelectOp``        same as IndexUnaryOp                                  ``bool``
+``IndexBinaryOp``   ``f(x, ix, jx, y, iy, jy, theta) -> z``               a value (requires SuiteSparse >= 9.4)
+==================  ====================================================  =====================================
 
-Auto-registration of Lambdas
-----------------------------
+``ix, jx`` are the row and column indices of ``x``; same for ``iy, jy`` and ``y``.
+``theta`` is a scalar parameter bound when the operator is used (see
+:ref:`udf_indexbinary` below).
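+
+The following plain-Python model (no GraphBLAS or Numba involved) shows how
+the ``IndexUnaryOp`` and ``SelectOp`` rows are applied. The ``*_model``
+helpers, the sample functions, and the ``{(row, col): value}`` dict
+representation are hypothetical. ``select`` keeps each stored entry for which
+the function returns ``True``, while ``apply`` with an ``IndexUnaryOp``
+replaces each stored value with the function's result:
+
+.. code-block:: python
+
+    def on_or_above_diag(x, ix, jx, theta):
+        # SelectOp-style: returns a bool.
+        return jx - ix >= theta
+
+    def add_scaled_row(x, ix, jx, theta):
+        # IndexUnaryOp-style: returns a new value.
+        return x + theta * ix
+
+    def select_model(A, f, theta):
+        # Values are unchanged; only the sparsity pattern shrinks.
+        return {(i, j): x for (i, j), x in A.items() if f(x, i, j, theta)}
+
+    def apply_model(A, f, theta):
+        # The pattern is unchanged; every stored value is replaced.
+        return {(i, j): f(x, i, j, theta) for (i, j), x in A.items()}
+
+    A = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
+    upper = select_model(A, on_or_above_diag, 0)  # drops (1, 0)
+    shifted = apply_model(A, add_scaled_row, 10)  # row 1 values gain 10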
 
-As a convenience, any lambda expression used in place of a UnaryOp will be automatically
-compiled as registered anonymously.
+Parameterized UDFs
+------------------
 
-Example lambda usage:
+To parameterize a UDF (e.g., a scale factor known only at call time), write a
+factory function that returns the actual operator, and register it with
+``parameterized=True``:
 
 .. code-block:: python
 
-    v.apply(lambda x: x % 5 - 2).new()
+    from graphblas import binary
+
+    def make_scaled_add(scale):
+        def inner(x, y):
+            return scale * (x + y)
+        return inner
+
+    binary.register_new("scaled_add", make_scaled_add, parameterized=True)
+
+    # Bind the parameter, then use the resulting op:
+    op = binary.scaled_add(2.0)
+    result = v.ewise_mult(w, op).new()
+
+User-defined types (UDTs)
+-------------------------
+
+Pass ``is_udt=True`` when your UDF operates on user-defined record or array
+types. See :doc:`udt` for the full story. In short:
+
+.. code-block:: python
+
+    from graphblas import binary, dtypes
+    import numpy as np
+
+    edge_dtype = dtypes.register_anonymous(
+        np.dtype([("weight", np.float64), ("hops", np.int32)], align=True),
+        "Edge",
+    )
+
+    def combine_edges(x, y):
+        return (x["weight"] + y["weight"], x["hops"] + y["hops"])
+
+    binary.register_new("combine_edges", combine_edges, is_udt=True)
+
+The function can return a tuple matching the record's fields, an existing
+record value, or a numpy array for array UDTs.
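+
+Because fields are read by name but the returned tuple is matched to fields by
+position, it can help to sanity-check the function in plain NumPy before
+registering it. A minimal sketch, assuming NumPy records behave like the values
+Numba passes in (field access by name); ``edge_np`` and the sample values are
+illustrative:
+
+.. code-block:: python
+
+    import numpy as np
+
+    # Same layout as the "Edge" UDT registered above.
+    edge_np = np.dtype([("weight", np.float64), ("hops", np.int32)], align=True)
+
+    def combine_edges(x, y):
+        return (x["weight"] + y["weight"], x["hops"] + y["hops"])
+
+    edges = np.array([(1.5, 1), (2.0, 3)], dtype=edge_np)
+    z = combine_edges(edges[0], edges[1])  # a plain tuple holding 3.5 and 4
+
+    # Pack the tuple back into a record: element k fills field k.
+    packed = np.array([z], dtype=edge_np)[0]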
+
+.. _udf_indexbinary:
+
+IndexBinaryOps and the theta parameter
+--------------------------------------
+
+An IndexBinaryOp (IBO) receives row and column indices for both operands plus
+a scalar ``theta``. Binding ``theta`` produces a regular ``BinaryOp`` that
+works in ``ewise_mult``, ``ewise_add``, and other elementwise operations:
+
+.. code-block:: python
+
+    from graphblas import indexbinary, Matrix
+
+    def discounted_distance(x, ix, jx, y, iy, jy, theta):
+        return (x + y) * theta
+
+    indexbinary.register_new("discounted_dist", discounted_distance)
+
+    bound = indexbinary.discounted_dist[float](0.5)   # theta = 0.5
+    A = Matrix.from_coo([0, 1], [0, 1], [1.0, 2.0])
+    B = Matrix.from_coo([0, 1], [0, 1], [3.0, 4.0])
+    C = A.ewise_mult(B, bound).new()
+    # C[0,0] = (1+3)*0.5 = 2;  C[1,1] = (2+4)*0.5 = 3
+
+To use a bound IndexBinaryOp as the multiplier in ``mxm`` / ``mxv`` /
+``vxm``, wrap it in a Semiring. ``semiring.register_anonymous`` accepts a
+bound IndexBinaryOp directly:
+
+.. code-block:: python
+
+    from graphblas import monoid, semiring
+
+    sr = semiring.register_anonymous(monoid.plus, bound)
+    C = A.mxm(B, sr).new()
+
+The resulting Semiring is monomorphic in the bound IBO's input/output types
+(SuiteSparse builds exactly one ``GrB_Semiring`` for that type pair, rather
+than the type-polymorphic table the standard semirings carry). To reuse the
+same IBO at a different type, bind theta again under that type and build a
+new Semiring. Per SuiteSparse, monoids themselves cannot be built from an
+IndexBinaryOp; only the multiplier slot accepts one.
+
+IndexBinaryOps require SuiteSparse:GraphBLAS >= 9.4.
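+
+Assuming SuiteSparse calls the multiplier as
+``f(a_ik, i, k, b_kj, k, j, theta)``, i.e. with the ``(i, k)`` position of the
+``A`` entry and the ``(k, j)`` position of the ``B`` entry, the plain-Python
+model below (no GraphBLAS; the ``mxm_model`` helper and dict representation
+are hypothetical) shows what ``A.mxm(B, sr)`` computes with the ``plus``
+monoid:
+
+.. code-block:: python
+
+    def discounted_distance(x, ix, jx, y, iy, jy, theta):
+        return (x + y) * theta
+
+    def mxm_model(A, B, f, theta):
+        C = {}
+        for (i, k), a in A.items():
+            for (k2, j), b in B.items():
+                if k2 != k:
+                    continue  # only matching inner indices contribute
+                z = f(a, i, k, b, k2, j, theta)
+                # plus monoid: sum every product that lands on (i, j)
+                C[i, j] = C[i, j] + z if (i, j) in C else z
+        return C
+
+    A = {(0, 0): 1.0, (0, 1): 2.0}
+    B = {(0, 0): 3.0, (1, 0): 4.0}
+    C = mxm_model(A, B, discounted_distance, 0.5)
+    # C == {(0, 0): 5.0}, i.e. (1+3)*0.5 + (2+4)*0.5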
+
+What numba accepts
+------------------
+
+The function body is compiled by ``numba.njit`` and must be pure
+numerical Python, with these constraints:
+
+- No closures over Python objects (capture scalars, not lists or dicts).
+- Most ``numpy`` array operations and ``math`` functions such as ``math.sin``
+  work; general Python containers (sets, dicts) and arbitrary objects are
+  unsupported or heavily restricted in nopython mode.
+- Records (UDT fields) are accessed by name: ``x["weight"]``, not ``x.weight``.
+- Tuple returns work for UDTs (one tuple element per field).
+
+When compilation fails, you get a ``UdfParseError`` with the actionable
+diagnostic line pulled out of Numba's typing pass, rather than the full
+multi-hundred-line traceback. The most common causes:
+
+- Referencing a field that doesn't exist on the record UDT.
+- Returning a tuple whose length doesn't match the record's field count
+  (the error names the expected arity).
+- Calling a function Numba doesn't support in nopython mode.
+
+See :doc:`udt` for UDT-specific guidance.
+
+Lazy registration
+-----------------
+
+Pass ``lazy=True`` to defer Numba compilation until the operator is first
+used. This is useful for libraries that register many operators at import
+time:
+
+.. code-block:: python
 
-.. csv-table::
-    :class: matrix
-    :header: 0,1,2,3,4,5
+    unary.register_new("heavy_op", heavy_func, lazy=True)
+    # heavy_func isn't compiled yet; the operator object exists, but no
+    # numba.njit has run.
 
-    -1,0,,1,1,2
+The compile happens on first lookup (``unary.heavy_op[int]``).