5 changes: 3 additions & 2 deletions .github/workflows/test_and_build.yml
Original file line number Diff line number Diff line change
Expand Up @@ -169,9 +169,10 @@ jobs:
pyver=$(python -c "v='${{ steps.pyver.outputs.selected }}'.split('.');print(f'{v[0]}.{v[1]}')")
eval "$(python scripts/ci_pick_versions.py --python "$pyver" --source ${{ steps.sourcetype.outputs.selected }})"

# Set up psg conda package variable (conda-forge installs via conda, others via pip)
# Set up psg conda package variable (conda-forge installs via conda, others via pip).
# For conda-forge, psg must always be installed; an empty psgver just means "latest".
psg=""
if [[ ${{ steps.sourcetype.outputs.selected}} == "conda-forge" && -n "$psgver" ]] ; then
if [[ ${{ steps.sourcetype.outputs.selected}} == "conda-forge" ]] ; then
psg=python-suitesparse-graphblas${psgver}
fi

Expand Down
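The behavioral change in this hunk is easy to miss: previously, a conda-forge job whose `psgver` came back empty got an empty `psg`, i.e. no `python-suitesparse-graphblas` package spec at all. The sketch below mirrors the old and new bash conditions (plus the `pyver` major.minor trimming) in Python so the difference can be checked directly. It is illustrative only: the function names are hypothetical, `psgver` is assumed to be either empty or a constraint suffix such as `==9.3.1` (the real value comes from `scripts/ci_pick_versions.py`), and `"wheel"` is just an example of a non-conda-forge source type.

```python
# Python mirror of the bash logic in the workflow hunk (illustrative only).
# Assumption: psgver is "" or a version-constraint suffix such as "==9.3.1".


def major_minor(selected: str) -> str:
    """Trim e.g. '3.12.4' to '3.12', like the `pyver=` one-liner."""
    v = selected.split(".")
    return f"{v[0]}.{v[1]}"


def psg_old(source: str, psgver: str) -> str:
    # Old condition: conda-forge AND a non-empty psgver (`-n "$psgver"`).
    if source == "conda-forge" and psgver:
        return "python-suitesparse-graphblas" + psgver
    return ""


def psg_new(source: str, psgver: str) -> str:
    # New condition: conda-forge alone; an empty psgver means "latest".
    if source == "conda-forge":
        return "python-suitesparse-graphblas" + psgver
    return ""


print(major_minor("3.12.4"))               # 3.12
print(repr(psg_old("conda-forge", "")))    # '' (package spec silently dropped)
print(psg_new("conda-forge", ""))          # python-suitesparse-graphblas
print(psg_new("conda-forge", "==9.3.1"))   # python-suitesparse-graphblas==9.3.1
print(repr(psg_new("wheel", "==9.3.1")))   # '' (example non-conda-forge source)
```

Only the conda-forge branch changes; other source types still leave `psg` empty because they install the package via pip.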
18 changes: 11 additions & 7 deletions .pre-commit-config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ repos:
- id: check-illegal-windows-names
- id: check-merge-conflict
- id: check-ast
- id: check-json # No json files yet
- id: check-json # no JSON files yet; enabled for the future
- id: check-toml
- id: check-yaml
- id: check-executables-have-shebangs
Expand Down Expand Up @@ -53,6 +53,9 @@ repos:
# We can probably remove `isort` if we come to trust `ruff --fix`,
# but we'll need to figure out the configuration to do this in `ruff`
- repo: https://github.com/pycqa/isort
# 9.0 is still in alpha (9.0.0a3) and a pre-release pin defeats
# pre-commit.ci's autoupdate "avoid prereleases" logic; stay on the last
# stable 8.x until 9.0 GAs.
rev: 8.0.1
hooks:
- id: isort
Expand All @@ -73,7 +76,7 @@ repos:
- id: black
- id: black-jupyter
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.15.7
rev: v0.15.13
hooks:
- id: ruff-check
args: [--fix-only, --show-fixes]
Expand All @@ -98,10 +101,10 @@ repos:
- tomli; python_version<'3.11'
files: ^(graphblas|docs)/
- repo: https://github.com/astral-sh/ruff-pre-commit
rev: v0.15.7
rev: v0.15.13
hooks:
- id: ruff-check
# - id: ruff-format # Prefer black, but may temporarily uncomment this to see
# - id: ruff-format # we prefer black; uncomment to compare with ruff's formatter
- repo: https://github.com/sphinx-contrib/sphinx-lint
rev: v1.0.2
hooks:
Expand All @@ -119,7 +122,7 @@ repos:
hooks:
- id: shellcheck
- repo: https://github.com/rbubley/mirrors-prettier
rev: v3.8.1
rev: v3.8.3
hooks:
- id: prettier
args: [--prose-wrap=preserve]
Expand All @@ -138,12 +141,13 @@ repos:
rev: v0.9.3
hooks:
- id: taplo-format
args: ["--option", "column_width=100"]
- repo: https://github.com/rhysd/actionlint
rev: v1.7.11
rev: v1.7.12
hooks:
- id: actionlint
- repo: https://github.com/python-jsonschema/check-jsonschema
rev: 0.37.0
rev: 0.37.2
hooks:
- id: check-dependabot
- id: check-github-workflows
Expand Down
1 change: 1 addition & 0 deletions docs/user_guide/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -25,4 +25,5 @@ Topics
init
io
udf
udt
recorder
33 changes: 33 additions & 0 deletions docs/user_guide/operators.rst
Original file line number Diff line number Diff line change
Expand Up @@ -221,6 +221,39 @@ IndexUnary operators are located in two places.
(i.e. all except rowindex, colindex, and diagindex). Calling the operators in the
select namespace will perform a ``select`` operation.

IndexBinary Operators
---------------------

IndexBinary operators are the two-input analogue of IndexUnary operators. The function
receives both values, the indices of *each* value, and a thunk parameter:
``f(x, ix, jx, y, iy, jy, theta) -> z``, where ``ix, jx`` are the row/column indices of
``x`` and ``iy, jy`` are those of ``y``.

There are no built-in IndexBinary operators; all are user-defined (see
:doc:`udf`). Binding a ``theta`` value to an IndexBinaryOp produces a
regular BinaryOp usable directly in ``ewise_mult`` / ``ewise_add``, or as
the multiplier of a Semiring for ``mxm`` / ``mxv`` / ``vxm``.

Example usage:

.. code-block:: python

def discounted_sum(x, ix, jx, y, iy, jy, theta):
return (x + y) * theta

gb.indexbinary.register_new("discounted_sum", discounted_sum)

binop = gb.indexbinary.discounted_sum[float](0.5) # bind theta
C << A.ewise_mult(B, binop)

# For mxm, wrap in a Semiring; the additive monoid must accept the
# bound op's return type.
sr = gb.semiring.register_anonymous(gb.monoid.plus, binop)
D << A.mxm(B, sr)

IndexBinary operators are located in the ``graphblas.indexbinary`` namespace.
They require SuiteSparse:GraphBLAS 9.4 or newer.

Aggregators
-----------

Expand Down
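To make the index semantics concrete, here is a pure-Python model of what a bound IndexBinaryOp computes in `ewise_mult`. It is not python-graphblas code: matrices are plain `{(row, col): value}` dicts, and `bind_theta` is a hypothetical stand-in for `gb.indexbinary.discounted_sum[float](0.5)`. In `ewise_mult` both values sit at the same position, so `ix == iy` and `jx == jy`; the two index pairs only differ in other contexts, e.g. when the op is a semiring multiplier in `mxm`.

```python
# Pure-Python model (not python-graphblas) of a bound IndexBinaryOp in ewise_mult:
# f runs only where BOTH operands store a value, and it receives each value's
# own (row, col) indices plus the bound theta.
from functools import partial


def discounted_sum(x, ix, jx, y, iy, jy, theta):
    return (x + y) * theta


def bind_theta(ibo, theta):
    """Hypothetical model of `indexbinary.op[T](theta)`: fix theta only."""
    return partial(ibo, theta=theta)


def ewise_mult(A, B, op):
    # Intersection of the two sparsity patterns; here ix == iy and jx == jy.
    return {(i, j): op(A[i, j], i, j, B[i, j], i, j) for (i, j) in A.keys() & B.keys()}


A = {(0, 0): 1.0, (0, 1): 5.0, (1, 1): 2.0}
B = {(0, 0): 3.0, (1, 1): 4.0, (1, 0): 7.0}
C = ewise_mult(A, B, bind_theta(discounted_sum, 0.5))
print(sorted(C.items()))  # [((0, 0), 2.0), ((1, 1), 3.0)]
```

Only the shared positions `(0, 0)` and `(1, 1)` produce output; `A[0, 1]` and `B[1, 0]` are dropped, as with any `ewise_mult`.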
193 changes: 164 additions & 29 deletions docs/user_guide/udf.rst
Original file line number Diff line number Diff line change
@@ -1,55 +1,190 @@

User-defined Functions
======================
User-defined Functions (UDFs)
=============================

1. Show example of register_new
2. Discuss commonality for other operators
3. Discuss register_anonymous
python-graphblas lets you write custom operators in Python. ``numba``
JIT-compiles them to native machine code so they run inside
SuiteSparse:GraphBLAS at C speed. This guide covers every operator type
that accepts a user function: ``UnaryOp``, ``BinaryOp``, ``IndexUnaryOp``,
``SelectOp``, and ``IndexBinaryOp``.

Python-graphblas requires ``numba`` which enables compiling user-defined Python functions
to native machine code for use by the GraphBLAS backend. This provides functions which are
very performant.

Example user-defined UnaryOp:
A first example
---------------

.. code-block:: python

from graphblas import unary
from graphblas import unary, Vector

def force_odd_func(x):
def force_odd(x):
if x % 2 == 0:
return x + 1
return x

unary.register_new("force_odd", force_odd_func)
unary.register_new("force_odd", force_odd)

v = Vector.from_coo([0, 1, 3, 4, 5], [1, 2, 3, 8, 14])
w = v.apply(unary.force_odd).new()
# w = [1, 3, _, 3, 9, 15]

+``register_new`` vs ``register_anonymous``
+------------------------------------------
+
+- ``register_new(name, func)`` puts the operator on ``gb.unary.{name}`` (or
+  ``gb.binary.{name}``, etc.). Use this for operators you'll reference by
+  name across files.
+- ``register_anonymous(func)`` returns the operator without adding it to a
+  namespace. Use this for one-off operators or operators created inside
+  another function.
+
+Lambdas are auto-registered as anonymous wherever a UnaryOp is expected:
+
+.. code-block:: python
+
-.. csv-table:: w
-    :class: matrix
-    :header: 0,1,2,3,4,5
+    v.apply(lambda x: x % 5 - 2).new()  # [-1, 0, _, 1, 1, 2]

-    1,3,,3,9,15
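The two `apply` examples in this file can be checked without GraphBLAS using a plain-Python model of a sparse vector as an `{index: value}` dict. This sketches the semantics only (the function runs once per stored entry, and index 2 stays empty); it is not how python-graphblas executes the compiled operator. The `show` helper is hypothetical and just renders missing entries as `_`, like the doc comments.

```python
# Pure-Python model of `Vector.apply(unary_op)`: the function runs once per
# *stored* entry; missing entries stay missing. A vector is {index: value}.


def force_odd(x):
    if x % 2 == 0:
        return x + 1
    return x


def apply(v, f):
    return {i: f(x) for i, x in v.items()}


def show(v, size):
    # Render like the docs' comments: "_" marks an index with no stored value.
    return [v.get(i, "_") for i in range(size)]


v = dict(zip([0, 1, 3, 4, 5], [1, 2, 3, 8, 14]))
print(show(apply(v, force_odd), 6))            # [1, 3, '_', 3, 9, 15]
print(show(apply(v, lambda x: x % 5 - 2), 6))  # [-1, 0, '_', 1, 1, 2]
```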
+Operator signatures
+-------------------

+The function signature depends on the operator type:

-Similar methods exist for BinaryOp and IndexUnaryOp. User-defined Monoids and Semirings are
-constructed out of previously defined and built-in UnaryOps and BinaryOps.
+================== ==================================================== =====================================
+Operator           Function signature                                   Returns
+================== ==================================================== =====================================
+``UnaryOp``        ``f(x) -> z``                                        a value
+``BinaryOp``       ``f(x, y) -> z``                                     a value
+``IndexUnaryOp``   ``f(x, ix, jx, theta) -> z``                         a value (often bool, for select)
+``SelectOp``       same as IndexUnaryOp                                 ``bool``
+``IndexBinaryOp``  ``f(x, ix, jx, y, iy, jy, theta) -> z``              a value (requires SuiteSparse >= 9.4)
+================== ==================================================== =====================================

-Auto-registration of Lambdas
------------------------------
+``ix, jx`` are the row and column indices of ``x``; same for ``iy, jy`` and ``y``.
+``theta`` is a scalar parameter bound when the operator is used (see
+:ref:`udf_indexbinary` below).

-As a convenience, any lambda expression used in place of a UnaryOp will be automatically
-compiled as registered anonymously.
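An IndexUnaryOp used through `select` is where the `ix, jx, theta` arguments in the signature table earn their keep: the predicate can depend on position, not just value. The sketch below models that in plain Python with a hypothetical `triu_at_least` predicate (keep entries on or above the `theta`-th diagonal) over a `{(row, col): value}` dict. It illustrates the signature only and is not the library's implementation.

```python
# Pure-Python model of an IndexUnaryOp used via `select`: f(x, ix, jx, theta)
# returns a bool, and only entries where it is True are kept.
# `triu_at_least` is a hypothetical example predicate.


def triu_at_least(x, ix, jx, theta):
    # Keep entries whose column is at least `theta` past the row.
    return jx - ix >= theta


def select(A, f, theta):
    return {(i, j): x for (i, j), x in A.items() if f(x, i, j, theta)}


A = {(0, 0): 1, (0, 1): 2, (0, 2): 3, (1, 0): 4, (1, 1): 5, (2, 1): 6, (1, 2): 7}
print(sorted(select(A, triu_at_least, 1)))  # [(0, 1), (0, 2), (1, 2)]
print(sorted(select(A, triu_at_least, 0)))  # [(0, 0), (0, 1), (0, 2), (1, 1), (1, 2)]
```

The value `x` is unused here; a predicate may mix value and position, which is exactly what distinguishes an IndexUnaryOp from a plain UnaryOp.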
+Parameterized UDFs
+------------------

-Example lambda usage:
+To parameterize a UDF (e.g., a scale factor known only at call time), write a
+factory that takes the parameter and returns the function to compile, and
+register the factory with ``parameterized=True``:

 .. code-block:: python

-    v.apply(lambda x: x % 5 - 2).new()
+    from graphblas import binary
+
+    def make_scaled_add(scale):
+        def inner(x, y):
+            return scale * (x + y)
+        return inner
+
+    binary.register_new("scaled_add", make_scaled_add, parameterized=True)
+
+    # Bind the parameter, then use the op (v and w are from the first example):
+    op = binary.scaled_add(2.0)
+    result = v.ewise_mult(w, op).new()
+
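The factory pattern behind `parameterized=True` is ordinary Python closure semantics, so it can be checked on its own. In this sketch `make_scaled_add(2.0)` plays the role of `binary.scaled_add(2.0)`, and `ewise_mult` is a hypothetical dict-based stand-in for the GraphBLAS method. `scale` is captured as a plain float, in line with the "capture scalars" guidance for Numba.

```python
# Plain-Python sketch of the factory pattern used with `parameterized=True`:
# calling the factory with a parameter returns the real two-argument function.


def make_scaled_add(scale):
    def inner(x, y):
        return scale * (x + y)  # `scale` is a captured scalar
    return inner


def ewise_mult(u, v, f):
    # Hypothetical stand-in: sparse vectors as {index: value}; f runs on shared indices.
    return {i: f(u[i], v[i]) for i in u.keys() & v.keys()}


op = make_scaled_add(2.0)  # analogous to binary.scaled_add(2.0)
a = {0: 1, 1: 2, 3: 3}
b = {0: 10, 3: 20, 4: 30}
print(op(3, 4))                              # 14.0
print(sorted(ewise_mult(a, b, op).items()))  # [(0, 22.0), (3, 46.0)]
```

Each call to the factory yields an independent function, which is why a parameterized op can be bound to different values without re-registering it.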
+User-defined types (UDTs)
+-------------------------
+
+Pass ``is_udt=True`` when your UDF operates on user-defined record or array
+types. See :doc:`udt` for the full story. In short:
+
+.. code-block:: python
+
+    from graphblas import binary, dtypes
+    import numpy as np
+
+    edge_dtype = dtypes.register_anonymous(
+        np.dtype([("weight", np.float64), ("hops", np.int32)], align=True),
+        "Edge",
+    )
+
+    def combine_edges(x, y):
+        return (x["weight"] + y["weight"], x["hops"] + y["hops"])
+
+    binary.register_new("combine_edges", combine_edges, is_udt=True)
+
+The function can return a tuple matching the record's fields, an existing
+record value, or a numpy array for array UDTs.
+
+.. _udf_indexbinary:
+
+IndexBinaryOps and the theta parameter
+--------------------------------------
+
+An IndexBinaryOp receives row and column indices for both operands plus a
+scalar ``theta``. Binding ``theta`` produces a regular ``BinaryOp`` that
+works in ``ewise_mult``, ``ewise_add``, and the other elementwise paths:
+
+.. code-block:: python
+
+    from graphblas import indexbinary, Matrix
+
+    def discounted_distance(x, ix, jx, y, iy, jy, theta):
+        return (x + y) * theta
+
+    indexbinary.register_new("discounted_dist", discounted_distance)
+
+    bound = indexbinary.discounted_dist[float](0.5)  # theta = 0.5
+    A = Matrix.from_coo([0, 1], [0, 1], [1.0, 2.0])
+    B = Matrix.from_coo([0, 1], [0, 1], [3.0, 4.0])
+    C = A.ewise_mult(B, bound).new()
+    # C[0,0] = (1+3)*0.5 = 2; C[1,1] = (2+4)*0.5 = 3
+
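A bound IndexBinaryOp also works in `ewise_add`, where the difference from `ewise_mult` is the sparsity pattern: `ewise_add` takes the union, calls the function only where both operands store a value, and copies single-sided entries through unchanged. A plain-Python model of that, with matrices as `{(row, col): value}` dicts (a sketch of the semantics, not python-graphblas code):

```python
# Pure-Python model of ewise_add with a bound IndexBinaryOp. Where both operands
# store a value, f runs (with each value's indices and theta); where only one
# does, that value is copied through unchanged.


def discounted_distance(x, ix, jx, y, iy, jy, theta):
    return (x + y) * theta


def ewise_add(A, B, f, theta):
    C = {}
    for key in A.keys() | B.keys():
        if key in A and key in B:
            i, j = key
            C[key] = f(A[key], i, j, B[key], i, j, theta)
        else:
            C[key] = A[key] if key in A else B[key]
    return C


A = {(0, 0): 1.0, (1, 1): 2.0, (0, 1): 9.0}
B = {(0, 0): 3.0, (1, 1): 4.0}
print(sorted(ewise_add(A, B, discounted_distance, 0.5).items()))
# [((0, 0), 2.0), ((0, 1), 9.0), ((1, 1), 3.0)]
```

Note that the entry at `(0, 1)` keeps its value `9.0`: `theta` is never applied to it, because the function is not called for single-sided entries.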
+To use a bound IndexBinaryOp as the multiplier in ``mxm`` / ``mxv`` /
+``vxm``, wrap it in a Semiring. ``Semiring.register_anonymous`` accepts a
+bound IndexBinaryOp (IBO) directly:
+
+.. code-block:: python
+
+    from graphblas import monoid, semiring
+
+    sr = semiring.register_anonymous(monoid.plus, bound)
+    C = A.mxm(B, sr).new()  # C[0,0] = 2; C[1,1] = 3 (A and B are diagonal)
+
+The resulting Semiring is monomorphic in the bound IBO's input/output types
+(SuiteSparse builds exactly one ``GrB_Semiring`` for that type pair, rather
+than the type-polymorphic table the standard semirings carry). To reuse the
+same IBO at a different type, bind theta again under that type and build a
+new Semiring. Per SuiteSparse, monoids themselves cannot be built from an
+IndexBinaryOp; only the multiplier slot accepts one.
+
+IndexBinaryOps require SuiteSparse:GraphBLAS >= 9.4.
+
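For the semiring case, a plain-Python model of `A.mxm(B, sr)` with `sr = (plus, bound)` shows which indices the function receives. The calling convention `f(A[i,k], i, k, B[k,j], k, j, theta)` is an assumption based on how SuiteSparse:GraphBLAS describes IndexBinaryOp multipliers; check the SuiteSparse user guide before relying on it. The second example has two terms per output entry, so the `plus` reduction is visible.

```python
# Pure-Python model of C = A.mxm(B, semiring(plus, bound_ibo)).
# Assumption: the multiplier is called as f(A[i,k], i, k, B[k,j], k, j, theta),
# and the "plus" monoid sums the products for each (i, j).


def discounted_distance(x, ix, jx, y, iy, jy, theta):
    return (x + y) * theta


def mxm_plus(A, B, f, theta):
    C = {}
    for (i, k), a in A.items():
        for (k2, j), b in B.items():
            if k == k2:  # inner dimensions must match
                t = f(a, i, k, b, k, j, theta)
                C[i, j] = C[i, j] + t if (i, j) in C else t
    return C


A = {(0, 0): 1.0, (1, 1): 2.0}
B = {(0, 0): 3.0, (1, 1): 4.0}
print(sorted(mxm_plus(A, B, discounted_distance, 0.5).items()))
# [((0, 0), 2.0), ((1, 1), 3.0)]

A2 = {(0, 0): 1.0, (0, 1): 1.0}
B2 = {(0, 0): 1.0, (1, 0): 1.0}
print(mxm_plus(A2, B2, discounted_distance, 0.5))
# {(0, 0): 2.0}, i.e. (1+1)*0.5 via k=0 plus (1+1)*0.5 via k=1
```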
+What numba accepts
+------------------
+
+The function body is compiled by ``numba.njit`` and must be pure
+numerical Python, with these constraints:
+
+- No closures over Python objects (capture scalars, not lists or dicts).
+- ``numpy`` array operations and standard library calls like ``math.sin``
+  work; arbitrary Python objects do not, and container support is limited.
+- Records (UDT fields) are accessed by name: ``x["weight"]``, not ``x.weight``.
+- Tuple returns work for UDTs (one tuple element per field).
+
+When compilation fails, you get a ``UdfParseError`` with the actionable
+diagnostic line pulled out of Numba's typing pass, rather than the full
+multi-hundred-line traceback. The most common causes:
+
+- Referencing a field that doesn't exist on the record UDT.
+- Returning a tuple whose length doesn't match the record's field count
+  (the error names the expected arity).
+- Calling a function Numba doesn't support in nopython mode.
+
+See :doc:`udt` for UDT-specific guidance.
+
+Lazy registration
+-----------------
+
+Pass ``lazy=True`` to defer Numba compilation until the operator is first
+used. This is useful for libraries that register many operators at import
+time:
+
+.. code-block:: python

-.. csv-table::
-    :class: matrix
-    :header: 0,1,2,3,4,5
+    unary.register_new("heavy_op", heavy_func, lazy=True)
+    # heavy_func isn't compiled yet; the operator object exists, but no
+    # numba.njit has run.

--1,0,,1,1,2
+The compile happens on first lookup (``unary.heavy_op[int]``).
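A hypothetical sketch of the idea behind `lazy=True`: the namespace keeps the raw function and compiles it the first time the operator is looked up, then caches the result. This is not python-graphblas's implementation. `LazyNamespace` and `compile_count` are made-up names, a counter stands in for `numba.njit`, and the trigger is simplified to plain attribute access rather than the `[int]` type lookup described in the lazy-registration section.

```python
# Hypothetical model of lazy registration; NOT python-graphblas's code.
# `compile_count` stands in for the cost of running numba.njit.


class LazyNamespace:
    def __init__(self):
        self._delayed = {}  # name -> raw function, not compiled yet
        self._ops = {}  # name -> "compiled" operator (cached)
        self.compile_count = 0

    def register_new(self, name, func, lazy=False):
        if lazy:
            self._delayed[name] = func  # defer: no compile at registration
        else:
            self._ops[name] = self._compile(func)  # eager compile

    def _compile(self, func):
        self.compile_count += 1  # pretend this is the expensive JIT step
        return func

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails, i.e. for operator names.
        if name in self._ops:
            return self._ops[name]
        if name in self._delayed:
            self._ops[name] = self._compile(self._delayed.pop(name))
            return self._ops[name]
        raise AttributeError(name)


unary = LazyNamespace()
unary.register_new("heavy_op", lambda x: x * x, lazy=True)
print(unary.compile_count)  # 0: registered, not compiled
print(unary.heavy_op(3))  # 9: first lookup triggers the compile
print(unary.compile_count)  # 1
unary.heavy_op(4)
print(unary.compile_count)  # 1: cached after the first lookup
```

The trade-off is that compilation errors surface at first use instead of at import time, which matters for libraries that register many operators up front.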