spyx.experimental

Research-stage building blocks that are not part of the stable Spyx surface. Everything here is tested and usable, but the contract is different from the rest of the library.

Stability contract

The APIs in spyx.experimental — and in some cases their numerical behaviour — may change without a deprecation cycle as the underlying research matures. Anything you depend on for production or a long-lived experiment should come from the stable top-level modules (spyx.nn, spyx.ssm, spyx.phasor, spyx.nir, spyx.bench, spyx.quant, spyx.data, spyx.optimize).

The rule of thumb: import experimental things from spyx.experimental so the dependency is explicit; rely on the top-level modules for stable work. See Research with Spyx for how things graduate from here into the core.

What's here

Symbol	Kind	Notes
`spyx.experimental.PSU_LIF`	Neuron	Reset-free parallel LIF. Physically defined in `spyx.nn`, surfaced here as its supported experimental entry point.
`spyx.experimental.ResonateFire`	Neuron	Complex resonate-and-fire oscillatory neuron. Physically defined in `spyx.phasor`.
`spyx.experimental.raven`	Module	Routing-slot memory (`RavenRSM`), spiking sibling (`SpikingSlotMemory`), `SlotRouter`, and the `make_recall_batch` MQAR generator.
`spyx.experimental.compress`	Module	Bit-packed activation storage for memory-efficient BPTT.
`spyx.experimental.stochastic`	Module	Stochastic (Bernoulli-spiking) and parallelizable prototypes: `SPSN`, `StochasticAssociative{LIF,CuBaLIF}`, and the `sigmoid_bernoulli` activations.
`spyx.experimental.hybrid`	Module	The 0+1 hybrid trainer: surrogate gradient + antithetic-NES correction projected orthogonal to the surrogate (`hybrid_gradient`, `make_hybrid_train_step`, `es_gradient`, `hybrid_diagnostics`), plus the surrogate-steered Self-Guided ES variant (`sges_gradient`, `make_sges_hybrid_train_step`) — the surrogate direction is SGES's guiding subspace, so ES is spent on the orthogonal complement at several-fold lower variance.
`spyx.experimental.matfree`	Module	Matmul-free linear primitives — ternary (BitNet: `TernaryLinear`, `TernaryMLP`) and shift-add (DeepShift: `ShiftAddLinear`) layers that replace dense multiplies with accumulations / bit-shifts, plus `MatMulFreeBlock`, `MLGRU`, `RMSNorm`, and the `ternary_weights` / `power_of_two_weights` / `activation_quant` STE helpers. The native train-from-scratch counterpart to the post-training `spyx.quant.bitnet_ternary_rules` path.
`spyx.experimental.zoo`	Package	Runnable reference recipes keyed by application (control / classification / language) and tagged by training method × architecture (`REGISTRY`, `list_recipes`, `get`).
`spyx.experimental.onnx`	Module	Export a spiking model to ONNX — per-timestep step, or the whole `spyx.nn.run` loop as a native ONNX `Scan`/`Loop`. Conversion deps imported lazily.

Related research studies live under research/new/ in the repository.

Re-exported neurons

These two are physically defined in stable modules and re-exported here so the experimental surface is discoverable in one place.

`PSU_LIF`

Bases: Module

Parallel Spiking Unit LIF: a reset-free leaky integrate-and-fire neuron.

.. note:: Experimental. Its supported entry point is :class:spyx.experimental.PSU_LIF; the API may change without a deprecation cycle. It is defined here for locality with the other neurons.

A standard :class:LIF subtracts a reset term, spikes * threshold, from the membrane every step, which couples each timestep to the (nonlinear) spike of the previous step and forces a strictly sequential O(T) scan. Dropping the reset turns the membrane into a pure linear leaky integrator,

.. math:: V_t = \beta \, V_{t-1} + x_t ,

which is a first-order associative recurrence and can therefore be evaluated with :func:jax.lax.associative_scan in O(\log T) parallel depth on an accelerator. Spikes are a pointwise surrogate threshold applied to the whole membrane trace, :math:s_t = \sigma(V_t - \text{threshold}).
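The associativity claim can be spelled out in a few lines. This is a worked derivation using the symbols of the recurrence above; the pair notation (a, b) and the operator \bullet are introduced here for illustration only and are not names taken from the library.

```latex
\begin{aligned}
f_t(v) &= a_t\, v + b_t, \qquad a_t = \beta,\quad b_t = x_t \\
(f_2 \circ f_1)(v) &= a_2\,(a_1 v + b_1) + b_2 = (a_1 a_2)\, v + (a_2 b_1 + b_2) \\
(a_1, b_1) \bullet (a_2, b_2) &= (a_1 a_2,\; a_2 b_1 + b_2) \\
V_t &= \sum_{k=0}^{t} \beta^{\,t-k}\, x_k \qquad (V_{-1} = 0)
\end{aligned}
```

Each timestep is an affine map, composing two affine maps yields another affine map, and function composition is associative. That is the property a parallel prefix scan needs: partial compositions can be grouped in any order.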

Removing the reset is a deliberate accuracy/parallelism trade-off: the neuron never depresses after firing, so it can fire on consecutive steps while a well-tuned integration window keeps activity bounded. In exchange the sequence can be scored in logarithmic instead of linear depth.
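To make the trade-off concrete, here is a minimal dependency-free sketch that drives one scalar neuron with a constant input, once with a subtractive reset and once without. It assumes a hard Heaviside spike and a reset applied on the step after a spike; the function name `simulate` and the input values are hypothetical, and the library's surrogate and exact reset ordering may differ.

```python
def simulate(x, beta=0.9, threshold=1.0, reset=True):
    """Run a scalar leaky integrator and return its 0/1 spike train.

    Assumptions (not taken from the library): hard threshold spike, and a
    subtractive reset applied on the step after a spike.
    """
    V, prev_spike, spikes = 0.0, 0, []
    for x_t in x:
        V = beta * V + x_t
        if reset:
            V -= prev_spike * threshold  # subtract the previous step's spike
        prev_spike = 1 if V > threshold else 0
        spikes.append(prev_spike)
    return spikes


x = [0.3] * 30  # constant drive, illustrative value
with_reset = simulate(x, reset=True)
reset_free = simulate(x, reset=False)
print("with reset:", sum(with_reset), "spikes")
print("reset-free:", sum(reset_free), "spikes")
```

With this drive the reset-free membrane settles above threshold and the neuron keeps firing on every step, while the reset variant falls back below threshold after each spike.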

Two execution modes are provided and are numerically identical:

:meth:__call__ -- one reset-free timestep (x, V) -> (spikes, V) with V = beta * V + x; a drop-in for :func:spyx.nn.run, :class:Sequential, and NIR, exactly like :class:LIF.
:meth:parallel -- the whole time-major sequence at once via an associative scan over the leak, O(\log T) depth.

Because both modes use the same clipped beta and the same surrogate, and :meth:__call__ integrates the input before spiking, scanning :meth:__call__ over x reproduces :meth:parallel exactly.
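The equivalence of the two modes can be checked on the membrane trace without any JAX dependency. The sketch below is an illustration, not the library's code: `combine` is an assumed form of the associative operator (the page does not show the body of `_leaky_associative_op`), and `prefix_scan` is a plain-Python stand-in for what :func:jax.lax.associative_scan computes.

```python
import random


def combine(left, right):
    """Compose two affine steps v -> a*v + b, with the earlier step on the left."""
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential(beta, x):
    """Reference: one step at a time, V = beta * V + x, starting from V = 0."""
    V, trace = 0.0, []
    for x_t in x:
        V = beta * V + x_t
        trace.append(V)
    return trace


def prefix_scan(beta, x):
    """Inclusive scan by recursive doubling, ceil(log2(T)) rounds.

    Within a round every update is independent of the others, which is what
    an accelerator exploits; here the round is just a list comprehension.
    """
    elems = [(beta, x_t) for x_t in x]
    shift = 1
    while shift < len(elems):
        elems = [
            combine(elems[i - shift], elems[i]) if i >= shift else elems[i]
            for i in range(len(elems))
        ]
        shift *= 2
    return [b for _, b in elems]


random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(37)]
seq = sequential(0.9, x)
par = prefix_scan(0.9, x)
max_diff = max(abs(s - p) for s, p in zip(seq, par))
print(f"max |sequential - scan| = {max_diff:.2e}")
```

In this sketch the two traces agree to within floating-point rounding, since the scan groups the same multiplications and additions in a different order.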

Source code in spyx/nn.py

class PSU_LIF(nnx.Module):
    r"""Parallel Spiking Unit LIF: a reset-free leaky integrate-and-fire neuron.

    .. note::
       **Experimental.** Its supported entry point is
       :class:`spyx.experimental.PSU_LIF`; the API may change without a
       deprecation cycle. It is defined here for locality with the other neurons.

    A standard :class:`LIF` subtracts a reset ``spikes * threshold`` from the
    membrane every step, which couples each timestep to the (nonlinear) spike
    of the previous step and forces a strictly sequential ``O(T)`` scan.
    Dropping the reset turns the membrane into a pure linear leaky integrator,

    .. math::
        V_t = \beta \, V_{t-1} + x_t ,

    which is a first-order *associative* recurrence and can therefore be
    evaluated with :func:`jax.lax.associative_scan` in ``O(\log T)`` parallel
    depth on an accelerator. Spikes are a pointwise surrogate threshold applied
    to the whole membrane trace, :math:`s_t = \sigma(V_t - \text{threshold})`.

    Removing the reset is a deliberate accuracy/parallelism trade-off: the
    neuron never depresses after firing, so it can fire on consecutive steps
    while a well-tuned integration window keeps activity bounded. In exchange
    the sequence can be scored in logarithmic instead of linear depth.

    Two execution modes are provided and are numerically identical:

    * :meth:`__call__` -- one reset-free timestep ``(x, V) -> (spikes, V)``
      with ``V = beta * V + x``; a drop-in for :func:`spyx.nn.run`,
      :class:`Sequential`, and NIR, exactly like :class:`LIF`.
    * :meth:`parallel` -- the whole time-major sequence at once via an
      associative scan over the leak, ``O(\log T)`` depth.

    Because both modes use the *same* clipped ``beta`` and the *same* surrogate,
    and :meth:`__call__` integrates the input *before* spiking, scanning
    :meth:`__call__` over ``x`` reproduces :meth:`parallel` exactly.
    """

    def __init__(
        self,
        hidden_shape: tuple,
        beta=None,
        threshold=1.0,
        activation=None,
        *,
        rngs: nnx.Rngs,
    ):
        """
        :hidden_shape: Shape of the layer.
        :beta: decay rate. Scalar if provided, else learnable per-unit init.
        :threshold: firing threshold. Defaults to 1.
        :activation: spyx.axn.Axon object determining the surrogate spike.
        """
        self.hidden_shape = hidden_shape
        self.threshold = threshold
        self.spike = activation if activation is not None else _DEFAULT_ACTIVATION

        if beta is None:
            self.beta = nnx.Param(
                nnx.initializers.truncated_normal(stddev=0.25)(
                    rngs.params(), self.hidden_shape
                )
                + 0.5
            )
        else:
            self.beta = nnx.Param(jnp.full((), beta))

    def __call__(self, x, V):
        """One reset-free timestep.

        :x: input vector coming from previous layer.
        :V: neuron state tensor.

        Integrates the input into the membrane (``V = beta * V + x``, no
        reset), then emits a surrogate spike on the updated membrane so that
        scanning this method matches :meth:`parallel` exactly.
        """
        beta = jnp.clip(self.beta[...], 0, 1)
        V = beta * V + x
        spikes = self.spike(V - self.threshold)
        return spikes, V

    def parallel(self, x):
        r"""Score a whole time-major sequence with an associative scan.

        :x: input with shape ``[Time, Batch, ...]``.
        :return: spikes with shape ``[Time, Batch, ...]``.

        Computes the full membrane trace ``V_t = beta * V_{t-1} + x_t`` (with
        ``V_{-1} = 0``) via :func:`jax.lax.associative_scan` over the time axis
        in ``O(\log T)`` depth, then applies the surrogate spike pointwise.
        """
        beta = jnp.clip(self.beta[...], 0, 1)
        # Broadcast the (scalar or per-unit) leak to every (Time, Batch, ...)
        # element so the linear-recurrence coefficient A_t == beta everywhere.
        A = jnp.broadcast_to(beta, x.shape)
        _, V = jax.lax.associative_scan(_leaky_associative_op, (A, x), axis=0)
        return self.spike(V - self.threshold)

    def initial_state(self, batch_size):
        return jnp.zeros((batch_size,) + self.hidden_shape)

`__call__(x, V)`

One reset-free timestep.

:x: input vector coming from previous layer. :V: neuron state tensor.

Integrates the input into the membrane (V = beta * V + x, no reset), then emits a surrogate spike on the updated membrane so that scanning this method matches :meth:parallel exactly.

Source code in spyx/nn.py

def __call__(self, x, V):
    """One reset-free timestep.

    :x: input vector coming from previous layer.
    :V: neuron state tensor.

    Integrates the input into the membrane (``V = beta * V + x``, no
    reset), then emits a surrogate spike on the updated membrane so that
    scanning this method matches :meth:`parallel` exactly.
    """
    beta = jnp.clip(self.beta[...], 0, 1)
    V = beta * V + x
    spikes = self.spike(V - self.threshold)
    return spikes, V

`__init__(hidden_shape, beta=None, threshold=1.0, activation=None, *, rngs)`

:hidden_shape: Shape of the layer. :beta: decay rate. Scalar if provided, else learnable per-unit init. :threshold: firing threshold. Defaults to 1. :activation: spyx.axn.Axon object determining the surrogate spike.

Source code in spyx/nn.py

def __init__(
    self,
    hidden_shape: tuple,
    beta=None,
    threshold=1.0,
    activation=None,
    *,
    rngs: nnx.Rngs,
):
    """
    :hidden_shape: Shape of the layer.
    :beta: decay rate. Scalar if provided, else learnable per-unit init.
    :threshold: firing threshold. Defaults to 1.
    :activation: spyx.axn.Axon object determining the surrogate spike.
    """
    self.hidden_shape = hidden_shape
    self.threshold = threshold
    self.spike = activation if activation is not None else _DEFAULT_ACTIVATION

    if beta is None:
        self.beta = nnx.Param(
            nnx.initializers.truncated_normal(stddev=0.25)(
                rngs.params(), self.hidden_shape
            )
            + 0.5
        )
    else:
        self.beta = nnx.Param(jnp.full((), beta))

`parallel(x)`

Score a whole time-major sequence with an associative scan.

:x: input with shape [Time, Batch, ...]. :return: spikes with shape [Time, Batch, ...].

Computes the full membrane trace V_t = beta * V_{t-1} + x_t (with V_{-1} = 0) via :func:jax.lax.associative_scan over the time axis in O(\log T) depth, then applies the surrogate spike pointwise.

Source code in spyx/nn.py

def parallel(self, x):
    r"""Score a whole time-major sequence with an associative scan.

    :x: input with shape ``[Time, Batch, ...]``.
    :return: spikes with shape ``[Time, Batch, ...]``.

    Computes the full membrane trace ``V_t = beta * V_{t-1} + x_t`` (with
    ``V_{-1} = 0``) via :func:`jax.lax.associative_scan` over the time axis
    in ``O(\log T)`` depth, then applies the surrogate spike pointwise.
    """
    beta = jnp.clip(self.beta[...], 0, 1)
    # Broadcast the (scalar or per-unit) leak to every (Time, Batch, ...)
    # element so the linear-recurrence coefficient A_t == beta everywhere.
    A = jnp.broadcast_to(beta, x.shape)
    _, V = jax.lax.associative_scan(_leaky_associative_op, (A, x), axis=0)
    return self.spike(V - self.threshold)

`ResonateFire`

Bases: Module

Resonate-and-fire neuron: the complex/oscillatory sibling of PSU_LIF.

.. note:: Experimental. Its supported entry point is :class:spyx.experimental.ResonateFire; the API may change without a deprecation cycle. It is defined here for locality with the phasor layers.

A resonate-and-fire neuron carries a complex membrane that behaves as a damped harmonic oscillator. Written reset-free, its subthreshold dynamics are a complex linear recurrence

.. math:: z_t = a \, z_{t-1} + x_t , \qquad a = e^{\,\mathrm{dt}\,(-\lambda + i\,\omega)} ,

with per-unit decay :math:\lambda \ge 0 and angular frequency :math:\omega. The real input current x_t is injected into the real part of the membrane. Because there is no reset, the recurrence stays linear, so exactly like :class:spyx.nn.PSU_LIF it can be evaluated with :func:jax.lax.associative_scan in :math:O(\log T) parallel depth -- only now the scan runs over a complex pole a instead of a real leak.
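A quick way to see the damped-oscillator behaviour is the impulse response. The sketch below iterates the recurrence for one scalar unit using only the standard library; the function name `membrane_trace` and the values of lambda and omega are illustrative assumptions, not library defaults.

```python
import cmath
import math


def membrane_trace(x, lam, omega, dt=1.0):
    """Iterate z_t = a * z_{t-1} + x_t from z = 0, with a = exp(dt * (-lam + i*omega))."""
    a = cmath.exp(dt * complex(-lam, omega))
    z, trace = 0j, []
    for x_t in x:
        z = a * z + x_t  # a real input only changes the real part directly
        trace.append(z)
    return trace


lam, omega = 0.1, 0.8  # illustrative values
impulse = [1.0] + [0.0] * 19
trace = membrane_trace(impulse, lam, omega)

# For a unit impulse at t = 0 the state is z_t = a**t, so with dt = 1
# Re(z_t) = exp(-lam * t) * cos(omega * t): a decaying cosine.
for t in (0, 1, 5, 10):
    print(t, round(trace[t].real, 4))
```

The real part that the spike rule thresholds therefore rings at frequency omega while its envelope shrinks at rate lambda.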

Spikes are emitted by a pointwise surrogate threshold on the real part of the oscillator, :math:s_t = \sigma(\Re(z_t) - \text{threshold}). The rule is reset-free so the linear recurrence -- and therefore the parallel scan -- is preserved.

Stability: the pole magnitude is |a| = exp(-dt * lambda). Storing the decay through a softplus keeps :math:\lambda \ge 0, hence :math:|a| \le 1 and the oscillation never grows.
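The stability argument can be checked numerically. This is a small standard-library sketch under the definitions above; `softplus` and `pole` are hypothetical helper names written for this example, and the raw values are arbitrary.

```python
import cmath
import math


def softplus(raw):
    """Numerically stable softplus, log(1 + exp(raw))."""
    return max(raw, 0.0) + math.log1p(math.exp(-abs(raw)))


def pole(raw_lambda, omega, dt=1.0):
    """Complex pole a = exp(dt * (-softplus(raw_lambda) + i * omega))."""
    lam = softplus(raw_lambda)
    return cmath.exp(dt * complex(-lam, omega))


for raw in (-20.0, -1.0, 0.0, 0.5, 5.0):
    a = pole(raw, omega=1.0)
    print(f"raw={raw:6.1f}  lambda={softplus(raw):.4f}  |a|={abs(a):.4f}")
```

However negative the raw parameter becomes during training, softplus keeps lambda positive, so the magnitude of the pole stays at or below one; omega only rotates the pole and does not affect its magnitude.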

Parameters that enter the complex pole (lambda, omega) are stored as real float32 nnx.Param tensors, mirroring :class:PhasorLinear: the complex structure appears only in the forward pass, so a stock optax + jax.grad loop over a real loss trains them without the Wirtinger-conjugate surprise.

Two execution modes are provided and are numerically identical:

:meth:__call__ -- one reset-free timestep (x, z) -> (spikes, z) with z = a * z + x; a drop-in for :func:spyx.nn.run / :class:Sequential.
:meth:parallel -- the whole time-major sequence at once via an associative scan over the complex pole, :math:O(\log T) depth.

Because both modes use the same pole and surrogate and integrate the input before spiking, scanning :meth:__call__ over x reproduces :meth:parallel exactly.

Source code in spyx/phasor.py

class ResonateFire(nnx.Module):
    r"""Resonate-and-fire neuron: the complex/oscillatory sibling of ``PSU_LIF``.

    .. note::
       **Experimental.** Its supported entry point is
       :class:`spyx.experimental.ResonateFire`; the API may change without a
       deprecation cycle. It is defined here for locality with the phasor layers.


    A resonate-and-fire neuron carries a **complex** membrane that behaves as a
    damped harmonic oscillator. Written reset-free, its subthreshold dynamics
    are a *complex linear recurrence*

    .. math::
        z_t = a \, z_{t-1} + x_t , \qquad a = e^{\,\mathrm{dt}\,(-\lambda + i\,\omega)} ,

    with per-unit decay :math:`\lambda \ge 0` and angular frequency
    :math:`\omega`. The real input current ``x_t`` is injected into the *real*
    part of the membrane. Because there is no reset, the recurrence stays
    linear, so exactly like :class:`spyx.nn.PSU_LIF` it can be evaluated with
    :func:`jax.lax.associative_scan` in :math:`O(\log T)` parallel depth -- only
    now the scan runs over a *complex* pole ``a`` instead of a real leak.

    Spikes are emitted by a pointwise surrogate threshold on the real part of
    the oscillator, :math:`s_t = \sigma(\Re(z_t) - \text{threshold})`. The rule
    is reset-free so the linear recurrence -- and therefore the parallel scan --
    is preserved.

    Stability: the pole magnitude is ``|a| = exp(-dt * lambda)``. Storing the
    decay through a ``softplus`` keeps :math:`\lambda \ge 0`, hence
    :math:`|a| \le 1` and the oscillation never grows.

    Parameters that enter the complex pole (``lambda``, ``omega``) are stored as
    **real** ``float32`` ``nnx.Param`` tensors, mirroring :class:`PhasorLinear`:
    the complex structure appears only in the forward pass, so a stock
    ``optax`` + ``jax.grad`` loop over a real loss trains them without the
    Wirtinger-conjugate surprise.

    Two execution modes are provided and are numerically identical:

    * :meth:`__call__` -- one reset-free timestep ``(x, z) -> (spikes, z)`` with
      ``z = a * z + x``; a drop-in for :func:`spyx.nn.run` / :class:`Sequential`.
    * :meth:`parallel` -- the whole time-major sequence at once via an
      associative scan over the complex pole, :math:`O(\log T)` depth.

    Because both modes use the *same* pole and surrogate and integrate the input
    *before* spiking, scanning :meth:`__call__` over ``x`` reproduces
    :meth:`parallel` exactly.
    """

    def __init__(
        self,
        hidden_shape: tuple,
        lambda_init=None,
        omega_init=None,
        threshold: float = 1.0,
        dt: float = 1.0,
        activation=None,
        *,
        rngs: nnx.Rngs,
    ):
        """
        :hidden_shape: Per-unit shape of the layer.
        :lambda_init: Membrane decay ``>= 0``. Scalar constant if provided, else
            a learnable per-unit initialisation. Stored through ``softplus`` so
            the effective decay is always non-negative.
        :omega_init: Angular frequency of the oscillator. Scalar constant if
            provided, else a learnable per-unit initialisation.
        :threshold: Real firing threshold on ``Re(z)``. Defaults to 1.
        :dt: Integration timestep entering the pole ``exp(dt(-lambda+i*omega))``.
        :activation: :class:`spyx.axn.Axon` surrogate spike; defaults to
            ``superspike``.
        """
        if dt <= 0:
            raise ValueError(f"dt must be positive; got {dt}.")
        self.hidden_shape = hidden_shape
        self.threshold = threshold
        self.dt = dt
        self.spike = activation if activation is not None else _DEFAULT_ACTIVATION

        # Raw decay parameter; effective lambda = softplus(raw) >= 0 so |a| <= 1.
        if lambda_init is None:
            # Small positive decays: softplus(N(0.5, 0.25)) ~ light damping.
            raw = (
                nnx.initializers.truncated_normal(stddev=0.25)(
                    rngs.params(), self.hidden_shape
                )
                + 0.5
            )
            self.raw_lambda = nnx.Param(raw.astype(jnp.float32))
        else:
            self.raw_lambda = nnx.Param(
                _inverse_softplus(jnp.full((), float(lambda_init))).astype(jnp.float32)
            )

        if omega_init is None:
            # Spread frequencies around ~1 rad/step so units resonate distinctly.
            omega = (
                nnx.initializers.truncated_normal(stddev=0.5)(
                    rngs.params(), self.hidden_shape
                )
                + 1.0
            )
            self.omega = nnx.Param(omega.astype(jnp.float32))
        else:
            self.omega = nnx.Param(jnp.full((), float(omega_init)))

    @property
    def decay(self) -> jax.Array:
        """Effective non-negative decay ``lambda = softplus(raw_lambda)``."""
        return jax.nn.softplus(self.raw_lambda[...])

    @property
    def a(self) -> jax.Array:
        """Complex oscillator pole ``a = exp(dt(-lambda + i*omega))``.

        The magnitude ``|a| = exp(-dt * lambda) <= 1`` guarantees stability.
        """
        exponent = self.dt * (-self.decay + 1j * self.omega[...])
        return jnp.exp(exponent).astype(jnp.complex64)

    def __call__(self, x, z):
        """One reset-free timestep.

        :x: real input current from the previous layer, broadcastable to ``z``.
        :z: complex64 membrane state.

        Injects ``x`` into the real part of the membrane and advances the
        complex recurrence ``z = a * z + x`` (no reset), then emits a surrogate
        spike on ``Re(z)`` so that scanning this method matches :meth:`parallel`.
        """
        a = self.a
        z = a * z + x.astype(z.dtype)
        spikes = self.spike(jnp.real(z) - self.threshold)
        return spikes, z

    def parallel(self, x):
        r"""Score a whole time-major sequence with an associative scan.

        :x: real input with shape ``[Time, Batch, ...]``.
        :return: spikes with shape ``[Time, Batch, ...]``.

        Computes the full complex membrane trace ``z_t = a * z_{t-1} + x_t``
        (with ``z_{-1} = 0``) via :func:`jax.lax.associative_scan` over the time
        axis in :math:`O(\log T)` depth, then applies the surrogate spike
        pointwise on ``Re(z)``.
        """
        a = self.a
        xc = x.astype(jnp.complex64)
        # Broadcast the (scalar or per-unit) complex pole to every element so the
        # linear-recurrence coefficient a_t == a everywhere along the time axis.
        A = jnp.broadcast_to(a, xc.shape)
        _, z = jax.lax.associative_scan(_resonate_associative_op, (A, xc), axis=0)
        return self.spike(jnp.real(z) - self.threshold)

    def initial_state(self, batch_size):
        """Return complex64 zeros of shape ``(batch_size,) + hidden_shape``."""
        return jnp.zeros((batch_size,) + tuple(self.hidden_shape), dtype=jnp.complex64)

`a` `property`

Complex oscillator pole a = exp(dt(-lambda + i*omega)).

The magnitude |a| = exp(-dt * lambda) <= 1 guarantees stability.

`decay` `property`

Effective non-negative decay lambda = softplus(raw_lambda).

`__call__(x, z)`

One reset-free timestep.

:x: real input current from the previous layer, broadcastable to z. :z: complex64 membrane state.

Injects x into the real part of the membrane and advances the complex recurrence z = a * z + x (no reset), then emits a surrogate spike on Re(z) so that scanning this method matches :meth:parallel.

Source code in spyx/phasor.py

def __call__(self, x, z):
    """One reset-free timestep.

    :x: real input current from the previous layer, broadcastable to ``z``.
    :z: complex64 membrane state.

    Injects ``x`` into the real part of the membrane and advances the
    complex recurrence ``z = a * z + x`` (no reset), then emits a surrogate
    spike on ``Re(z)`` so that scanning this method matches :meth:`parallel`.
    """
    a = self.a
    z = a * z + x.astype(z.dtype)
    spikes = self.spike(jnp.real(z) - self.threshold)
    return spikes, z

`__init__(hidden_shape, lambda_init=None, omega_init=None, threshold=1.0, dt=1.0, activation=None, *, rngs)`

:hidden_shape: Per-unit shape of the layer. :lambda_init: Membrane decay >= 0. Scalar constant if provided, else a learnable per-unit initialisation. Stored through softplus so the effective decay is always non-negative. :omega_init: Angular frequency of the oscillator. Scalar constant if provided, else a learnable per-unit initialisation. :threshold: Real firing threshold on Re(z). Defaults to 1. :dt: Integration timestep entering the pole exp(dt(-lambda+i*omega)). :activation: :class:spyx.axn.Axon surrogate spike; defaults to superspike.
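Because the decay is stored through softplus, a user-supplied lambda_init has to be mapped back to a raw value first. The page does not show the body of the library's `_inverse_softplus` helper, so the following is only a sketch of what such an inverse can look like; both function names here are written for this example.

```python
import math


def softplus(raw):
    """Numerically stable softplus, log(1 + exp(raw))."""
    return max(raw, 0.0) + math.log1p(math.exp(-abs(raw)))


def inverse_softplus(y):
    """Solve softplus(raw) = y for raw; only defined for y > 0."""
    if y <= 0:
        raise ValueError("softplus only reaches strictly positive values")
    # Equal to log(exp(y) - 1), rewritten so large y does not overflow.
    return y + math.log(-math.expm1(-y))


for lambda_init in (0.01, 0.5, 2.0, 30.0):
    raw = inverse_softplus(lambda_init)
    print(f"lambda_init={lambda_init:5.2f}  raw={raw:9.4f}  softplus(raw)={softplus(raw):.4f}")
```

Note that a decay of exactly 0 has no finite pre-image under softplus, which is why this sketch rejects it; how the library's helper treats that boundary is not shown on this page.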

Source code in spyx/phasor.py

def __init__(
    self,
    hidden_shape: tuple,
    lambda_init=None,
    omega_init=None,
    threshold: float = 1.0,
    dt: float = 1.0,
    activation=None,
    *,
    rngs: nnx.Rngs,
):
    """
    :hidden_shape: Per-unit shape of the layer.
    :lambda_init: Membrane decay ``>= 0``. Scalar constant if provided, else
        a learnable per-unit initialisation. Stored through ``softplus`` so
        the effective decay is always non-negative.
    :omega_init: Angular frequency of the oscillator. Scalar constant if
        provided, else a learnable per-unit initialisation.
    :threshold: Real firing threshold on ``Re(z)``. Defaults to 1.
    :dt: Integration timestep entering the pole ``exp(dt(-lambda+i*omega))``.
    :activation: :class:`spyx.axn.Axon` surrogate spike; defaults to
        ``superspike``.
    """
    if dt <= 0:
        raise ValueError(f"dt must be positive; got {dt}.")
    self.hidden_shape = hidden_shape
    self.threshold = threshold
    self.dt = dt
    self.spike = activation if activation is not None else _DEFAULT_ACTIVATION

    # Raw decay parameter; effective lambda = softplus(raw) >= 0 so |a| <= 1.
    if lambda_init is None:
        # Small positive decays: softplus(N(0.5, 0.25)) ~ light damping.
        raw = (
            nnx.initializers.truncated_normal(stddev=0.25)(
                rngs.params(), self.hidden_shape
            )
            + 0.5
        )
        self.raw_lambda = nnx.Param(raw.astype(jnp.float32))
    else:
        self.raw_lambda = nnx.Param(
            _inverse_softplus(jnp.full((), float(lambda_init))).astype(jnp.float32)
        )

    if omega_init is None:
        # Spread frequencies around ~1 rad/step so units resonate distinctly.
        omega = (
            nnx.initializers.truncated_normal(stddev=0.5)(
                rngs.params(), self.hidden_shape
            )
            + 1.0
        )
        self.omega = nnx.Param(omega.astype(jnp.float32))
    else:
        self.omega = nnx.Param(jnp.full((), float(omega_init)))

`initial_state(batch_size)`

Return complex64 zeros of shape (batch_size,) + hidden_shape.

Source code in spyx/phasor.py

def initial_state(self, batch_size):
    """Return complex64 zeros of shape ``(batch_size,) + hidden_shape``."""
    return jnp.zeros((batch_size,) + tuple(self.hidden_shape), dtype=jnp.complex64)

`parallel(x)`

Score a whole time-major sequence with an associative scan.

:x: real input with shape [Time, Batch, ...]. :return: spikes with shape [Time, Batch, ...].

Computes the full complex membrane trace z_t = a * z_{t-1} + x_t (with z_{-1} = 0) via :func:jax.lax.associative_scan over the time axis in :math:O(\log T) depth, then applies the surrogate spike pointwise on Re(z).

Source code in spyx/phasor.py

def parallel(self, x):
    r"""Score a whole time-major sequence with an associative scan.

    :x: real input with shape ``[Time, Batch, ...]``.
    :return: spikes with shape ``[Time, Batch, ...]``.

    Computes the full complex membrane trace ``z_t = a * z_{t-1} + x_t``
    (with ``z_{-1} = 0``) via :func:`jax.lax.associative_scan` over the time
    axis in :math:`O(\log T)` depth, then applies the surrogate spike
    pointwise on ``Re(z)``.
    """
    a = self.a
    xc = x.astype(jnp.complex64)
    # Broadcast the (scalar or per-unit) complex pole to every element so the
    # linear-recurrence coefficient a_t == a everywhere along the time axis.
    A = jnp.broadcast_to(a, xc.shape)
    _, z = jax.lax.associative_scan(_resonate_associative_op, (A, xc), axis=0)
    return self.spike(jnp.real(z) - self.threshold)

spyx.experimental.raven

Raven Routing-Slot-Memory (RSM) block for Spyx.

A Flax NNX implementation of the Routing Slot Memory recurrence introduced by Raven (Afzal, Bick, Xing, Cevher, Gu, 2026; "High-recall sequence modeling with sparse memory routing"). Compressed-state recurrent models (a single SSM state with uniform decay) struggle with exact recall: every new token perturbs the whole state, so previously written associations interfere with each other.

Raven's fix is to partition the memory into M independent slots and use a learned sparse router r_t to write only the selected slots, leaving the rest untouched (shielded from interference). Writing slot m at step t:

.. math:: S_t = (1 - r_t) \odot S_{t-1} + r_t \odot ( D_t S_{t-1} A_t + U_t )

S_t: slot memory, shape (B, M, d_slot).
r_t \in [0, 1]^M: the per-slot router (ideally sparse). Unselected slots (r_t[m] ≈ 0) pass through unchanged; selected slots decay and are written.
U_t: the write (a projection of the current input).

The router is "a Mixture-of-Experts for memory". Two reductions are worth remembering (and are exercised by the tests):

a dense router (r_t all-ones) recovers a standard gated diagonal SSM,
a one-hot cyclic router recovers sliding-window attention.

Faithful-but-tractable simplification (documented in :class:RavenRSM): the per-slot transition is made diagonal. The full matrix sandwich D_t S_{t-1} A_t is replaced by a per-slot, per-dim decay a ⊙ S_{t-1}, so each slot is a gated diagonal recurrence; the full matrix-sandwich form is deferred. The recurrence is also run with a plain :func:jax.lax.scan as an honest reference baseline. Although the per-step transition is input-dependent through the router gate (1 - r_t), it remains a per-timestep diagonal linear recurrence, so an associative or chunked associative_scan form is possible in principle (the Raven authors defer it to a "Part 2"); it is not implemented here.
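The shielding behaviour and the dense reduction can both be seen in a single update step. Below is a minimal pure-Python sketch of the diagonal form of the recurrence for one sequence element, with no batch dimension; `rsm_step` and all numeric values are hypothetical and chosen only for illustration.

```python
def rsm_step(S, r, a, U):
    """Diagonal RSM update for one sequence element.

    S, a, U: M slots, each a list of d_slot floats. r: M router weights in [0, 1].
    Returns (1 - r) * S + r * (a * S + U), applied slot by slot.
    """
    return [
        [(1.0 - r_m) * s + r_m * (a_md * s + u) for s, a_md, u in zip(S_m, a_m, U_m)]
        for S_m, r_m, a_m, U_m in zip(S, r, a, U)
    ]


S = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # M = 3 slots, d_slot = 2
a = [[0.9, 0.9], [0.5, 0.5], [0.8, 0.8]]  # per-slot / per-dim decay
U = [[0.1, 0.1], [0.2, 0.2], [0.3, 0.3]]  # write for this step

sparse = rsm_step(S, [0.0, 1.0, 0.0], a, U)  # only slot 1 is selected
dense = rsm_step(S, [1.0, 1.0, 1.0], a, U)   # every slot is selected
print("sparse:", sparse)
print("dense: ", dense)
```

With the one-hot router, slots 0 and 2 come back unchanged while slot 1 decays and absorbs the write. With the all-ones router every slot follows a * S + U, which is the gated diagonal SSM update mentioned above.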

`RavenRSM`

Bases: Module

Routing-Slot-Memory recurrent block (diagonal simplification).

Sequence-in / sequence-out, matching the :mod:spyx.ssm interface: __call__(u: (T, B, d_model)) -> (T, B, d_model).

Per step t the block computes, from u_t:

a sparse write router r_t = SlotRouter(u_t) \in [0, 1]^{(B, M)},
the write U_t = reshape(W_u u_t) \in (B, M, d_slot),

and updates the slot memory with the diagonal RSM recurrence

.. math:: S_t = (1 - r_t) \odot S_{t-1} + r_t \odot (a \odot S_{t-1} + U_t)

where a = sigmoid(raw_decay) \in (0, 1)^{(M, d_slot)} is a static, learnable per-slot / per-dim decay (kept in (0, 1) for stability; an input-dependent / selective decay is a straightforward extension but is not used here so the dense reduction stays a clean gated diagonal SSM). The recurrence is evaluated with :func:jax.lax.scan over time.
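The decay parameterisation is a logit and sigmoid pair: the constructor stores the logit of decay_init, and the forward pass maps it back into (0, 1). A standard-library sketch of that round trip follows; the helper names are written for this example, and the small random jitter the constructor adds is left out.

```python
import math


def sigmoid(z):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Inverse of sigmoid on the open interval (0, 1)."""
    return math.log(p / (1.0 - p))


decay_init = 0.9
raw = logit(decay_init)
print("raw_decay:", raw, "-> decay:", sigmoid(raw))

# Whatever value the optimiser moves raw_decay to, the decay stays in (0, 1).
for raw_value in (-8.0, 0.0, 8.0):
    print(raw_value, "->", sigmoid(raw_value))
```

Optimising the unconstrained raw value is what lets the decay stay inside the stable interval without any explicit clipping.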

Readout (y_t): a query-gated read over slots. A learned query q_t = softmax(W_q u_t) \in (B, M) mixes the slots into a single read vector read_t = \sum_m q_t[m] S_t[m] \in (B, d_slot), which a linear map projects back to (B, d_model). This mirrors the routing idea on the read side: the query key selects which slot(s) to retrieve.
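The read side can be sketched in the same dependency-free style. The example below covers only the softmax mixing of slots for a single sequence element; the final projection back to d_model is omitted, and the names `softmax` and `read_slots` and the slot contents are assumptions made for illustration.

```python
import math


def softmax(logits):
    """Softmax with the usual max-subtraction for numerical stability."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def read_slots(query_logits, S):
    """Mix M slots of width d_slot into one read vector of width d_slot."""
    q = softmax(query_logits)
    d_slot = len(S[0])
    return [sum(q[m] * S[m][d] for m in range(len(S))) for d in range(d_slot)]


S = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]  # M = 3 slots, d_slot = 2
uniform_read = read_slots([0.0, 0.0, 0.0], S)   # equal weights: the mean of the slots
peaked_read = read_slots([0.0, 0.0, 50.0], S)   # sharply peaked: close to slot 2
print(uniform_read)
print(peaked_read)
```

Because the weights are a softmax, the read is always a convex combination of the slots: a flat query averages them, and a peaked query retrieves essentially one slot.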

Simplifications (deferred, per the module docstring): (1) the full matrix-sandwich transition D_t S_{t-1} A_t is replaced by the diagonal decay a; (2) only a sequential lax.scan is provided — a chunked / associative-scan form is possible but deferred.

Source code in spyx/experimental/raven.py

class RavenRSM(nnx.Module):
    r"""Routing-Slot-Memory recurrent block (diagonal simplification).

    Sequence-in / sequence-out, matching the :mod:`spyx.ssm` interface:
    ``__call__(u: (T, B, d_model)) -> (T, B, d_model)``.

    Per step ``t`` the block computes, from ``u_t``:

    * a sparse write router ``r_t = SlotRouter(u_t) \in [0, 1]^{(B, M)}``,
    * the write ``U_t = reshape(W_u u_t) \in (B, M, d_slot)``,

    and updates the slot memory with the diagonal RSM recurrence

    .. math::
        S_t = (1 - r_t) \odot S_{t-1} + r_t \odot (a \odot S_{t-1} + U_t)

    where ``a = sigmoid(raw_decay) \in (0, 1)^{(M, d_slot)}`` is a **static,
    learnable per-slot / per-dim decay** (kept in ``(0, 1)`` for stability; an
    input-dependent / selective decay is a straightforward extension but is not
    used here so the dense reduction stays a clean gated diagonal SSM). The
    recurrence is evaluated with :func:`jax.lax.scan` over time.

    **Readout** (``y_t``): a query-gated read over slots. A learned query
    ``q_t = softmax(W_q u_t) \in (B, M)`` mixes the slots into a single read
    vector ``read_t = \sum_m q_t[m] S_t[m] \in (B, d_slot)``, which a linear map
    projects back to ``(B, d_model)``. This mirrors the routing idea on the read
    side: the query key selects which slot(s) to retrieve.

    Simplifications (deferred, per the module docstring): (1) the full
    matrix-sandwich transition ``D_t S_{t-1} A_t`` is replaced by the diagonal
    decay ``a``; (2) only a sequential ``lax.scan`` is provided — a chunked /
    associative-scan form is possible but deferred.
    """

    def __init__(
        self,
        d_model: int,
        n_slots: int = 8,
        d_slot: int | None = None,
        *,
        hard_top_k: int | None = None,
        decay_init: float = 0.9,
        rngs: nnx.Rngs,
    ):
        if d_slot is None:
            d_slot = d_model
        if n_slots < 1:
            raise ValueError(f"n_slots must be >= 1; got {n_slots}.")
        if d_slot < 1:
            raise ValueError(f"d_slot must be >= 1; got {d_slot}.")
        if not 0.0 < decay_init < 1.0:
            raise ValueError(f"decay_init must be in (0, 1); got {decay_init}.")

        self.d_model = d_model
        self.n_slots = n_slots
        self.d_slot = d_slot

        self.router = SlotRouter(d_model, n_slots, hard_top_k=hard_top_k, rngs=rngs)
        # Write projection: u_t -> (M * d_slot), reshaped to (M, d_slot).
        self.write = nnx.Linear(d_model, n_slots * d_slot, rngs=rngs)
        # Read side: query over slots + projection back to d_model.
        self.readout_query = nnx.Linear(d_model, n_slots, rngs=rngs)
        self.out_proj = nnx.Linear(d_slot, d_model, rngs=rngs)

        # Static learnable per-slot / per-dim decay, stored as a raw logit so
        # that a = sigmoid(raw_decay) stays in (0, 1). Init near ``decay_init``
        # (slow decay -> long memory) with a little jitter.
        logit = float(jnp.log(decay_init / (1.0 - decay_init)))
        noise = 0.01 * jax.random.normal(rngs.params(), (n_slots, d_slot))
        self.raw_decay = nnx.Param(jnp.full((n_slots, d_slot), logit) + noise)

    @property
    def decay(self) -> jax.Array:
        """Effective per-slot / per-dim decay ``a = sigmoid(raw_decay)`` in ``(0, 1)``."""
        return jax.nn.sigmoid(self.raw_decay[...])

    def initial_state(self, batch_size: int) -> jax.Array:
        """Return zero slot memory of shape ``(batch_size, M, d_slot)``."""
        return jnp.zeros((batch_size, self.n_slots, self.d_slot), dtype=jnp.float32)

    def _route(self, u_t: jax.Array) -> jax.Array:
        """Expose the router for reuse: ``u_t (..., d_model) -> r (..., M)``."""
        return self.router(u_t)

    def step(self, state: jax.Array, u_t: jax.Array) -> tuple[jax.Array, jax.Array]:
        """One reset-free RSM timestep.

        :state: slot memory ``S_{t-1}``, shape ``(B, M, d_slot)``.
        :u_t: input ``(B, d_model)``.
        :return: ``(S_t, y_t)`` with ``y_t`` of shape ``(B, d_model)``.
        """
        r_t = self.router(u_t)  # (B, M)
        U_t = self.write(u_t).reshape(u_t.shape[0], self.n_slots, self.d_slot)
        a = self.decay[None]  # (1, M, d_slot)
        gated = a * state + U_t
        r_exp = r_t[..., None]  # (B, M, 1)
        s_new = (1.0 - r_exp) * state + r_exp * gated
        attn = jax.nn.softmax(self.readout_query(u_t), axis=-1)  # (B, M)
        read = jnp.einsum("bm,bmd->bd", attn, s_new)  # (B, d_slot)
        y_t = self.out_proj(read)
        return s_new, y_t

    def _run(self, u: jax.Array, r: jax.Array) -> jax.Array:
        """Core recurrence with a *precomputed* router ``r`` of shape ``(T, B, M)``.

        Factored out so tests (and the dense-router reduction) can force ``r``.
        """
        T, B, _ = u.shape
        U = self.write(u).reshape(T, B, self.n_slots, self.d_slot)
        attn = jax.nn.softmax(self.readout_query(u), axis=-1)  # (T, B, M)
        a = self.decay[None]  # (1, M, d_slot)

        def scan_step(state, inp):
            r_t, U_t, attn_t = inp
            r_exp = r_t[..., None]
            gated = a * state + U_t
            s_new = (1.0 - r_exp) * state + r_exp * gated
            read = jnp.einsum("bm,bmd->bd", attn_t, s_new)
            return s_new, read

        s0 = self.initial_state(B)
        _, read_seq = jax.lax.scan(scan_step, s0, (r, U, attn))  # (T, B, d_slot)
        return self.out_proj(read_seq)

    def __call__(self, u: jax.Array) -> jax.Array:
        """Apply the RSM block to a time-major input.

        :u: real array of shape ``(T, B, d_model)``.
        :return: real array of shape ``(T, B, d_model)``.
        """
        if u.ndim != 3 or u.shape[-1] != self.d_model:
            raise ValueError(
                f"RavenRSM expects [T, B, d_model={self.d_model}]; got {u.shape}."
            )
        r = self.router(u)  # (T, B, M)
        return self._run(u, r)

`decay` `property`

Effective per-slot / per-dim decay a = sigmoid(raw_decay) in (0, 1).

`__call__(u)`

Apply the RSM block to a time-major input.

:u: real array of shape (T, B, d_model).
:return: real array of shape (T, B, d_model).

Source code in spyx/experimental/raven.py

def __call__(self, u: jax.Array) -> jax.Array:
    """Apply the RSM block to a time-major input.

    :u: real array of shape ``(T, B, d_model)``.
    :return: real array of shape ``(T, B, d_model)``.
    """
    if u.ndim != 3 or u.shape[-1] != self.d_model:
        raise ValueError(
            f"RavenRSM expects [T, B, d_model={self.d_model}]; got {u.shape}."
        )
    r = self.router(u)  # (T, B, M)
    return self._run(u, r)

`initial_state(batch_size)`

Return zero slot memory of shape (batch_size, M, d_slot).

Source code in spyx/experimental/raven.py

def initial_state(self, batch_size: int) -> jax.Array:
    """Return zero slot memory of shape ``(batch_size, M, d_slot)``."""
    return jnp.zeros((batch_size, self.n_slots, self.d_slot), dtype=jnp.float32)

`step(state, u_t)`

One reset-free RSM timestep.

:state: slot memory S_{t-1}, shape (B, M, d_slot).
:u_t: input (B, d_model).
:return: (S_t, y_t) with y_t of shape (B, d_model).

Source code in spyx/experimental/raven.py

def step(self, state: jax.Array, u_t: jax.Array) -> tuple[jax.Array, jax.Array]:
    """One reset-free RSM timestep.

    :state: slot memory ``S_{t-1}``, shape ``(B, M, d_slot)``.
    :u_t: input ``(B, d_model)``.
    :return: ``(S_t, y_t)`` with ``y_t`` of shape ``(B, d_model)``.
    """
    r_t = self.router(u_t)  # (B, M)
    U_t = self.write(u_t).reshape(u_t.shape[0], self.n_slots, self.d_slot)
    a = self.decay[None]  # (1, M, d_slot)
    gated = a * state + U_t
    r_exp = r_t[..., None]  # (B, M, 1)
    s_new = (1.0 - r_exp) * state + r_exp * gated
    attn = jax.nn.softmax(self.readout_query(u_t), axis=-1)  # (B, M)
    read = jnp.einsum("bm,bmd->bd", attn, s_new)  # (B, d_slot)
    y_t = self.out_proj(read)
    return s_new, y_t

`SlotRouter`

Bases: Module

Learned per-slot write gate r_t = sigmoid(W_r u_t).

A small, reusable submodule (the spiking Raven variant reuses it). Maps an input of shape (..., d_model) to per-slot gates of shape (..., M) in [0, 1]. With hard_top_k set, the gate is additionally sparsified to the k most-active slots per row via a straight-through top-k (forward is sparse, gradients stay dense); the default (None) is a soft gate.

Design choice: a per-input sigmoid (independent per-slot Bernoulli logits) is used rather than a softmax so that several slots can be written at once (a multi-write MoE-for-memory), and so the dense all-ones reduction is reachable in the limit of large positive logits.
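The straight-through top-k mentioned above can be sketched as follows. The private helper `_straight_through_topk` is not shown in this section, so the function below is a hypothetical stand-in that only demonstrates the general trick (sparse forward value, identity gradient); the library's version may differ in detail.

```python
import jax
import jax.numpy as jnp


def straight_through_topk(r, k):
    """Keep the k largest gates per row in the forward pass only.

    Hypothetical helper for illustration; not the library implementation.
    """
    kth = jax.lax.top_k(r, k)[0][..., -1:]     # k-th largest value per row
    hard = jnp.where(r >= kth, r, 0.0)         # sparse forward value
    # Forward evaluates to `hard`; the backward pass sees the identity in `r`.
    return r + jax.lax.stop_gradient(hard - r)


r = jnp.array([[0.9, 0.2, 0.6, 0.1]])
sparse = straight_through_topk(r, 2)
grad = jax.grad(lambda x: straight_through_topk(x, 2).sum())(r)
```

Here `sparse` keeps only the two largest gates while `grad` is nonzero for all four slots, which is what "forward is sparse, gradients stay dense" means in practice.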

Source code in spyx/experimental/raven.py

class SlotRouter(nnx.Module):
    """Learned per-slot write gate ``r_t = sigmoid(W_r u_t)``.

    A small, reusable submodule (the spiking Raven variant reuses it). Maps an
    input of shape ``(..., d_model)`` to per-slot gates of shape ``(..., M)`` in
    ``[0, 1]``. With ``hard_top_k`` set, the gate is additionally sparsified to
    the ``k`` most-active slots per row via a straight-through top-``k`` (forward
    is sparse, gradients stay dense); the default (``None``) is a soft gate.

    Design choice: a per-input ``sigmoid`` (independent per-slot Bernoulli
    logits) is used rather than a ``softmax`` so that *several* slots can be
    written at once (a multi-write MoE-for-memory), and so the dense all-ones
    reduction is reachable in the limit of large positive logits.
    """

    def __init__(
        self,
        d_model: int,
        n_slots: int,
        *,
        hard_top_k: int | None = None,
        rngs: nnx.Rngs,
    ):
        if hard_top_k is not None and hard_top_k < 1:
            raise ValueError(f"hard_top_k must be >= 1 or None; got {hard_top_k}.")
        self.proj = nnx.Linear(d_model, n_slots, rngs=rngs)
        self.n_slots = n_slots
        self.hard_top_k = hard_top_k

    def __call__(self, u: jax.Array) -> jax.Array:
        """u: ``(..., d_model)`` -> gates ``(..., M)`` in ``[0, 1]``."""
        r = jax.nn.sigmoid(self.proj(u))
        if self.hard_top_k is not None:
            r = _straight_through_topk(r, self.hard_top_k)
        return r

`__call__(u)`

u: (..., d_model) -> gates (..., M) in [0, 1].

Source code in spyx/experimental/raven.py

def __call__(self, u: jax.Array) -> jax.Array:
    """u: ``(..., d_model)`` -> gates ``(..., M)`` in ``[0, 1]``."""
    r = jax.nn.sigmoid(self.proj(u))
    if self.hard_top_k is not None:
        r = _straight_through_topk(r, self.hard_top_k)
    return r

`SpikingSlotMemory`

Bases: Module

Spiking Routing-Slot Memory: a slot memory whose slots are spiking units.

This is the spiking sibling of RavenRSM. It keeps the two ideas that make Raven a high-recall memory -- a bank of M independent slots and the same sparse write router -- but replaces each slot's linear accumulator with the reset-free spiking membrane of spyx.nn.PSU_LIF: a leaky integrator V \leftarrow \beta V + x that emits a surrogate spike s = \sigma(V - \text{threshold}). The result is dual sparsity -- sparse in time (spikes) and sparse in slots (routing).

The slot membrane V_t has shape (B, M, d_slot). Per step t, from the input u_t:

* the write router r_t = SlotRouter(u_t) \in [0, 1]^{(B, M)} (the exact router type reused from RavenRSM -- self.router is a SlotRouter, not a fork), and
* the write U_t = reshape(W_u u_t) \in (B, M, d_slot).

The membrane is then advanced with the routed, reset-free spiking recurrence

.. math::
    V_t = (1 - r_t) \odot V_{t-1} + r_t \odot (\beta \odot V_{t-1} + U_t),
    \qquad s_t = \sigma(V_t - \text{threshold}),

where \beta = sigmoid(raw_beta) \in (0, 1)^{(M, d_slot)} is a static, learnable per-slot / per-dim leak. Shielding: where r_t[m] = 0 the update collapses to V_t[m] = V_{t-1}[m] -- the slot's membrane (and hence its spike) is passed through byte-for-byte unchanged, shielded from interference exactly as in RavenRSM. Where r_t[m] = 1 the slot runs a plain spyx.nn.PSU_LIF step V \leftarrow \beta V + U_t.

Output is the raw slot spike train of shape (T, B, M, d_slot) (no dense readout projection -- the block is a spiking memory; compose a linear head downstream if real-valued outputs are needed).

Reset-freeness is deliberate: the membrane recurrence stays a first-order linear map per slot, so -- exactly as documented for spyx.nn.PSU_LIF -- a chunked / jax.lax.associative_scan parallel form is possible. Because the per-step transition here is input-dependent through the router gate (1 - r_t), the associative element is the affine map V \mapsto A_t V + b_t with A_t = (1 - r_t) + r_t \beta and b_t = r_t U_t; only the sequential jax.lax.scan reference is implemented here (an honest baseline), matching RavenRSM.

Reductions (exercised by the tests): a dense router (r_t all-ones) turns every slot into an independent, always-written spyx.nn.PSU_LIF -- i.e. a plain bank of spiking leaky integrators driven by U_t; the routing is what makes it a memory.
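The affine-map form noted above (A_t = (1 - r_t) + r_t \beta, b_t = r_t U_t) can be checked against the sequential recurrence with a short experiment. This is only a sketch of the deferred parallel form, not something the module provides: the `combine` function, the random inputs, and the shapes are assumptions chosen for the example.

```python
import jax
import jax.numpy as jnp


def combine(left, right):
    """Compose two affine maps: apply `left` first, then `right`."""
    A_l, b_l = left
    A_r, b_r = right
    return A_r * A_l, A_r * b_l + b_r


T, B, M, d_slot = 6, 2, 3, 4
k_r, k_u, k_b = jax.random.split(jax.random.PRNGKey(0), 3)
r = jax.random.uniform(k_r, (T, B, M, 1))            # router gates in [0, 1)
U = jax.random.normal(k_u, (T, B, M, d_slot))        # per-step writes
beta = jax.nn.sigmoid(jax.random.normal(k_b, (M, d_slot)))

A = (1.0 - r) + r * beta                             # (T, B, M, d_slot)
b = r * U                                            # (T, B, M, d_slot)

# Parallel form: with V_0 = 0 the accumulated offset is V_t itself.
_, V_parallel = jax.lax.associative_scan(combine, (A, b), axis=0)


# Sequential reference: the same recurrence written as a lax.scan.
def scan_step(v, inp):
    A_t, b_t = inp
    v_new = A_t * v + b_t
    return v_new, v_new


_, V_sequential = jax.lax.scan(scan_step, jnp.zeros((B, M, d_slot)), (A, b))
```

Both paths produce the membrane trajectory V_1..V_T up to floating-point rounding; applying the surrogate spike to `V_parallel - threshold` afterwards would complete the forward pass.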

Source code in spyx/experimental/raven.py

class SpikingSlotMemory(nnx.Module):
    r"""Spiking Routing-Slot Memory: a slot memory whose slots are *spiking* units.

    This is the spiking sibling of :class:`RavenRSM`. It keeps the two ideas that
    make Raven a high-recall memory -- a bank of ``M`` independent **slots** and
    the *same* sparse write **router** -- but replaces each slot's linear
    accumulator with the **reset-free spiking membrane** of
    :class:`spyx.nn.PSU_LIF`: a leaky integrator ``V \leftarrow \beta V + x`` that
    emits a surrogate spike ``s = \sigma(V - \text{threshold})``. The result is
    **dual sparsity** -- sparse in *time* (spikes) *and* sparse in *slots*
    (routing).

    The slot membrane ``V_t`` has shape ``(B, M, d_slot)``. Per step ``t``, from
    the input ``u_t``:

    * the write router ``r_t = SlotRouter(u_t) \in [0, 1]^{(B, M)}`` (the **exact**
      router type reused from :class:`RavenRSM` -- ``self.router`` is a
      :class:`SlotRouter`, not a fork), and
    * the write ``U_t = reshape(W_u u_t) \in (B, M, d_slot)``.

    The membrane is then advanced with the routed, reset-free spiking recurrence

    .. math::
        V_t = (1 - r_t) \odot V_{t-1} + r_t \odot (\beta \odot V_{t-1} + U_t),
        \qquad s_t = \sigma(V_t - \text{threshold}),

    where ``\beta = sigmoid(raw_beta) \in (0, 1)^{(M, d_slot)}`` is a static,
    learnable per-slot / per-dim leak. **Shielding:** where ``r_t[m] = 0`` the
    update collapses to ``V_t[m] = V_{t-1}[m]`` -- the slot's membrane (and hence
    its spike) is passed through byte-for-byte unchanged, shielded from
    interference exactly as in :class:`RavenRSM`. Where ``r_t[m] = 1`` the slot
    runs a plain :class:`spyx.nn.PSU_LIF` step ``V \leftarrow \beta V + U_t``.

    **Output** is the raw slot spike train of shape ``(T, B, M, d_slot)`` (no
    dense readout projection -- the block *is* a spiking memory; compose a linear
    head downstream if real-valued outputs are needed).

    Reset-freeness is deliberate: the membrane recurrence stays a first-order
    linear map per slot, so -- exactly as documented for :class:`spyx.nn.PSU_LIF`
    -- a chunked / :func:`jax.lax.associative_scan` parallel form is *possible*.
    Because the per-step transition here is *input-dependent* through the router
    gate ``(1 - r_t)``, the associative element is the affine map
    ``V \mapsto A_t V + b_t`` with ``A_t = (1 - r_t) + r_t \beta`` and
    ``b_t = r_t U_t``; only the sequential :func:`jax.lax.scan` reference is
    implemented here (an honest baseline), matching :class:`RavenRSM`.

    Reductions (exercised by the tests): a **dense** router (``r_t`` all-ones)
    turns every slot into an independent, always-written
    :class:`spyx.nn.PSU_LIF` -- i.e. a plain bank of spiking leaky integrators
    driven by ``U_t``; the routing is what makes it a *memory*.
    """

    def __init__(
        self,
        d_model: int,
        n_slots: int = 8,
        d_slot: int | None = None,
        *,
        hard_top_k: int | None = None,
        beta_init: float = 0.9,
        threshold: float = 1.0,
        activation=None,
        rngs: nnx.Rngs,
    ):
        """
        :d_model: Input feature width.
        :n_slots: Number of independent memory slots ``M``.
        :d_slot: Per-slot membrane width (defaults to ``d_model``).
        :hard_top_k: If set, the router keeps only its ``k`` most-active slots per
            step (straight-through top-``k``); the default is a soft gate.
        :beta_init: Initial per-slot leak in ``(0, 1)`` (stored as a logit).
        :threshold: Firing threshold on the membrane.
        :activation: :class:`spyx.axn.Axon` surrogate spike; defaults to
            ``superspike`` (matching :class:`spyx.nn.PSU_LIF`).
        :rngs: NNX PRNG collection.
        """
        if d_slot is None:
            d_slot = d_model
        if n_slots < 1:
            raise ValueError(f"n_slots must be >= 1; got {n_slots}.")
        if d_slot < 1:
            raise ValueError(f"d_slot must be >= 1; got {d_slot}.")
        if not 0.0 < beta_init < 1.0:
            raise ValueError(f"beta_init must be in (0, 1); got {beta_init}.")

        self.d_model = d_model
        self.n_slots = n_slots
        self.d_slot = d_slot
        self.threshold = threshold
        self.spike = activation if activation is not None else _DEFAULT_SPIKE

        # Reuse the *exact* router mechanism from RavenRSM (same SlotRouter class).
        self.router = SlotRouter(d_model, n_slots, hard_top_k=hard_top_k, rngs=rngs)
        # Write projection: u_t -> (M * d_slot), reshaped to (M, d_slot).
        self.write = nnx.Linear(d_model, n_slots * d_slot, rngs=rngs)

        # Static learnable per-slot / per-dim leak, stored as a raw logit so that
        # beta = sigmoid(raw_beta) stays in (0, 1). Init near ``beta_init`` (slow
        # leak -> long membrane memory) with a little jitter.
        logit = float(jnp.log(beta_init / (1.0 - beta_init)))
        noise = 0.01 * jax.random.normal(rngs.params(), (n_slots, d_slot))
        self.raw_beta = nnx.Param(jnp.full((n_slots, d_slot), logit) + noise)

    @property
    def beta(self) -> jax.Array:
        """Effective per-slot / per-dim leak ``beta = sigmoid(raw_beta)`` in ``(0, 1)``."""
        return jax.nn.sigmoid(self.raw_beta[...])

    def initial_state(self, batch_size: int) -> jax.Array:
        """Return zero slot membrane of shape ``(batch_size, M, d_slot)``."""
        return jnp.zeros((batch_size, self.n_slots, self.d_slot), dtype=jnp.float32)

    def _route(self, u_t: jax.Array) -> jax.Array:
        """Expose the reused router: ``u_t (..., d_model) -> r (..., M)``."""
        return self.router(u_t)

    def step(self, state: jax.Array, u_t: jax.Array) -> tuple[jax.Array, jax.Array]:
        """One reset-free spiking-slot timestep.

        :state: slot membrane ``V_{t-1}``, shape ``(B, M, d_slot)``.
        :u_t: input ``(B, d_model)``.
        :return: ``(V_t, s_t)`` -- the new membrane and the slot spikes of shape
            ``(B, M, d_slot)``.
        """
        r_t = self.router(u_t)  # (B, M)
        U_t = self.write(u_t).reshape(u_t.shape[0], self.n_slots, self.d_slot)
        beta = self.beta[None]  # (1, M, d_slot)
        gated = beta * state + U_t
        r_exp = r_t[..., None]  # (B, M, 1)
        v_new = (1.0 - r_exp) * state + r_exp * gated
        spikes = self.spike(v_new - self.threshold)
        return v_new, spikes

    def _run(self, u: jax.Array, r: jax.Array) -> jax.Array:
        """Core recurrence with a *precomputed* router ``r`` of shape ``(T, B, M)``.

        Factored out so tests (and the dense-router reduction) can force ``r``.
        """
        T, B, _ = u.shape
        U = self.write(u).reshape(T, B, self.n_slots, self.d_slot)
        beta = self.beta[None]  # (1, M, d_slot)

        def scan_step(state, inp):
            r_t, U_t = inp
            r_exp = r_t[..., None]
            gated = beta * state + U_t
            v_new = (1.0 - r_exp) * state + r_exp * gated
            spikes = self.spike(v_new - self.threshold)
            return v_new, spikes

        v0 = self.initial_state(B)
        _, spikes = jax.lax.scan(scan_step, v0, (r, U))  # (T, B, M, d_slot)
        return spikes

    def __call__(self, u: jax.Array) -> jax.Array:
        """Apply the spiking slot memory to a time-major input.

        :u: real array of shape ``(T, B, d_model)``.
        :return: spike train of shape ``(T, B, M, d_slot)``.
        """
        if u.ndim != 3 or u.shape[-1] != self.d_model:
            raise ValueError(
                f"SpikingSlotMemory expects [T, B, d_model={self.d_model}]; "
                f"got {u.shape}."
            )
        r = self.router(u)  # (T, B, M)
        return self._run(u, r)

`beta` `property`

Effective per-slot / per-dim leak beta = sigmoid(raw_beta) in (0, 1).

`__call__(u)`

Apply the spiking slot memory to a time-major input.

:u: real array of shape (T, B, d_model).
:return: spike train of shape (T, B, M, d_slot).

Source code in spyx/experimental/raven.py

def __call__(self, u: jax.Array) -> jax.Array:
    """Apply the spiking slot memory to a time-major input.

    :u: real array of shape ``(T, B, d_model)``.
    :return: spike train of shape ``(T, B, M, d_slot)``.
    """
    if u.ndim != 3 or u.shape[-1] != self.d_model:
        raise ValueError(
            f"SpikingSlotMemory expects [T, B, d_model={self.d_model}]; "
            f"got {u.shape}."
        )
    r = self.router(u)  # (T, B, M)
    return self._run(u, r)

`__init__(d_model, n_slots=8, d_slot=None, *, hard_top_k=None, beta_init=0.9, threshold=1.0, activation=None, rngs)`

:d_model: Input feature width.
:n_slots: Number of independent memory slots M.
:d_slot: Per-slot membrane width (defaults to d_model).
:hard_top_k: If set, the router keeps only its k most-active slots per step (straight-through top-k); the default is a soft gate.
:beta_init: Initial per-slot leak in (0, 1) (stored as a logit).
:threshold: Firing threshold on the membrane.
:activation: spyx.axn.Axon surrogate spike; defaults to superspike (matching spyx.nn.PSU_LIF).
:rngs: NNX PRNG collection.

Source code in spyx/experimental/raven.py

def __init__(
    self,
    d_model: int,
    n_slots: int = 8,
    d_slot: int | None = None,
    *,
    hard_top_k: int | None = None,
    beta_init: float = 0.9,
    threshold: float = 1.0,
    activation=None,
    rngs: nnx.Rngs,
):
    """
    :d_model: Input feature width.
    :n_slots: Number of independent memory slots ``M``.
    :d_slot: Per-slot membrane width (defaults to ``d_model``).
    :hard_top_k: If set, the router keeps only its ``k`` most-active slots per
        step (straight-through top-``k``); the default is a soft gate.
    :beta_init: Initial per-slot leak in ``(0, 1)`` (stored as a logit).
    :threshold: Firing threshold on the membrane.
    :activation: :class:`spyx.axn.Axon` surrogate spike; defaults to
        ``superspike`` (matching :class:`spyx.nn.PSU_LIF`).
    :rngs: NNX PRNG collection.
    """
    if d_slot is None:
        d_slot = d_model
    if n_slots < 1:
        raise ValueError(f"n_slots must be >= 1; got {n_slots}.")
    if d_slot < 1:
        raise ValueError(f"d_slot must be >= 1; got {d_slot}.")
    if not 0.0 < beta_init < 1.0:
        raise ValueError(f"beta_init must be in (0, 1); got {beta_init}.")

    self.d_model = d_model
    self.n_slots = n_slots
    self.d_slot = d_slot
    self.threshold = threshold
    self.spike = activation if activation is not None else _DEFAULT_SPIKE

    # Reuse the *exact* router mechanism from RavenRSM (same SlotRouter class).
    self.router = SlotRouter(d_model, n_slots, hard_top_k=hard_top_k, rngs=rngs)
    # Write projection: u_t -> (M * d_slot), reshaped to (M, d_slot).
    self.write = nnx.Linear(d_model, n_slots * d_slot, rngs=rngs)

    # Static learnable per-slot / per-dim leak, stored as a raw logit so that
    # beta = sigmoid(raw_beta) stays in (0, 1). Init near ``beta_init`` (slow
    # leak -> long membrane memory) with a little jitter.
    logit = float(jnp.log(beta_init / (1.0 - beta_init)))
    noise = 0.01 * jax.random.normal(rngs.params(), (n_slots, d_slot))
    self.raw_beta = nnx.Param(jnp.full((n_slots, d_slot), logit) + noise)

`initial_state(batch_size)`

Return zero slot membrane of shape (batch_size, M, d_slot).

Source code in spyx/experimental/raven.py

def initial_state(self, batch_size: int) -> jax.Array:
    """Return zero slot membrane of shape ``(batch_size, M, d_slot)``."""
    return jnp.zeros((batch_size, self.n_slots, self.d_slot), dtype=jnp.float32)

`step(state, u_t)`

One reset-free spiking-slot timestep.

:state: slot membrane V_{t-1}, shape (B, M, d_slot).
:u_t: input (B, d_model).
:return: (V_t, s_t) -- the new membrane and the slot spikes of shape (B, M, d_slot).

Source code in spyx/experimental/raven.py

def step(self, state: jax.Array, u_t: jax.Array) -> tuple[jax.Array, jax.Array]:
    """One reset-free spiking-slot timestep.

    :state: slot membrane ``V_{t-1}``, shape ``(B, M, d_slot)``.
    :u_t: input ``(B, d_model)``.
    :return: ``(V_t, s_t)`` -- the new membrane and the slot spikes of shape
        ``(B, M, d_slot)``.
    """
    r_t = self.router(u_t)  # (B, M)
    U_t = self.write(u_t).reshape(u_t.shape[0], self.n_slots, self.d_slot)
    beta = self.beta[None]  # (1, M, d_slot)
    gated = beta * state + U_t
    r_exp = r_t[..., None]  # (B, M, 1)
    v_new = (1.0 - r_exp) * state + r_exp * gated
    spikes = self.spike(v_new - self.threshold)
    return v_new, spikes

`make_recall_batch(key, *, batch=8, n_pairs=3, n_keys=8, n_values=8)`

Generate a multi-query associative-recall (MQAR-style) batch.

Each example is a sequence of n_pairs (key, value) bindings followed by a single query token equal to one of the presented keys. The target is the value bound to the queried key — a task compressed-state SSMs fail at but slot-routed memories solve, because each binding can live in its own (interference-free) slot.

Tokens are one-hot encoded into d_model = n_keys + n_values dims: key i -> e_i; value j -> e_{n_keys + j}. The query token reuses its key's encoding. Sequence length is T = 2 * n_pairs + 1.

:key: PRNG key.
:batch: number of independent examples.
:n_pairs: key/value bindings per example (distinct keys, sampled without replacement).
:n_keys: key vocabulary size (must be >= n_pairs).
:n_values: value vocabulary size.
:return: (u, target) where u is (T, B, d_model) float one-hots and target is (B,) int32 value ids for the query.
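To make the token layout concrete, here is a small hand-built example of one sequence in the encoding described above. It uses NumPy and fixed ids rather than the PRNG-driven generator, so the specific pairs and the query are assumptions chosen for readability.

```python
import numpy as np

n_keys, n_values = 3, 2
d_model = n_keys + n_values            # 5 one-hot dims
pairs = [(2, 0), (0, 1)]               # hand-picked (key_id, value_id) bindings
query_key = 0                          # ask for the value bound to key 0
T = 2 * len(pairs) + 1                 # 5 timesteps

u = np.zeros((T, d_model), dtype=np.float32)
for p, (kid, vid) in enumerate(pairs):
    u[2 * p, kid] = 1.0                # key token at the even step
    u[2 * p + 1, n_keys + vid] = 1.0   # value token at the following odd step
u[T - 1, query_key] = 1.0              # final step repeats the queried key

target = dict(pairs)[query_key]        # value id the model should output
print(u.argmax(axis=-1), target)       # [2 3 0 4 0] 1
```

The active index per step reads key 2, value 0 (index 3), key 0, value 1 (index 4), then the query for key 0, whose bound value id is 1.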

Source code in spyx/experimental/raven.py

def make_recall_batch(
    key: jax.Array,
    *,
    batch: int = 8,
    n_pairs: int = 3,
    n_keys: int = 8,
    n_values: int = 8,
) -> tuple[jax.Array, jax.Array]:
    """Generate a multi-query associative-recall (MQAR-style) batch.

    Each example is a sequence of ``n_pairs`` ``(key, value)`` bindings followed
    by a single **query** token equal to one of the presented keys. The target
    is the value bound to the queried key — a task compressed-state SSMs fail at
    but slot-routed memories solve, because each binding can live in its own
    (interference-free) slot.

    Tokens are one-hot encoded into ``d_model = n_keys + n_values`` dims: key
    ``i`` -> ``e_i``; value ``j`` -> ``e_{n_keys + j}``. The query token reuses
    its key's encoding. Sequence length is ``T = 2 * n_pairs + 1``.

    :key: PRNG key.
    :batch: number of independent examples.
    :n_pairs: key/value bindings per example (distinct keys, sampled w/o repl.).
    :n_keys: key vocabulary size (must be ``>= n_pairs``).
    :n_values: value vocabulary size.
    :return: ``(u, target)`` where ``u`` is ``(T, B, d_model)`` float one-hots
        and ``target`` is ``(B,)`` int32 value ids for the query.
    """
    if n_keys < n_pairs:
        raise ValueError(f"n_keys ({n_keys}) must be >= n_pairs ({n_pairs}).")
    d_model = n_keys + n_values
    T = 2 * n_pairs + 1

    keys_out = jnp.zeros((T, batch, d_model), dtype=jnp.float32)
    targets = jnp.zeros((batch,), dtype=jnp.int32)

    for b in range(batch):
        key, k_perm, k_val, k_q = jax.random.split(key, 4)
        # Distinct keys for this example.
        key_ids = jax.random.permutation(k_perm, n_keys)[:n_pairs]
        value_ids = jax.random.randint(k_val, (n_pairs,), 0, n_values)

        for p in range(n_pairs):
            kid = int(key_ids[p])
            vid = int(value_ids[p])
            keys_out = keys_out.at[2 * p, b, kid].set(1.0)
            keys_out = keys_out.at[2 * p + 1, b, n_keys + vid].set(1.0)

        q = int(jax.random.randint(k_q, (), 0, n_pairs))
        qid = int(key_ids[q])
        keys_out = keys_out.at[T - 1, b, qid].set(1.0)
        targets = targets.at[b].set(int(value_ids[q]))

    return keys_out, targets

spyx.experimental.compress

Bit-packed activation storage for memory-efficient BPTT.

Training spiking networks with backpropagation-through-time is dominated, memory-wise, by the activations saved for the backward pass. In an SNN the activations feeding each linear layer are the spikes, which are exactly {0, 1} valued. A dense op spikes @ weight normally stashes the full floating-point spikes tensor as its backward residual so it can later form dW = spikes^T @ g. Storing a one-bit spike in an 8- to 32-bit element uses 8x-32x the memory it needs (16x for bf16, 32x for fp32).

This module bit-packs that residual with jax.numpy.packbits (8 spikes per uint8) and unpacks it lazily inside the backward pass. The forward output and both gradients (w.r.t. weight and spikes) are numerically identical to the naive spikes @ weight -- we only trade a cheap unpack-recompute for a large cut in the dominant activation residual.

Correctness relies on the input being exactly binary (values in {0, 1}); packed_spike_dense is only valid for spike tensors, not arbitrary floats.

The lower half of this module generalises the same idea to quantized and sparse activations (graded sigma-delta events, ternary, int-N): pack at bits bits with pack_nbit (bit-plane packing), use packed_quant_dense for the k-bit BPTT residual, or, when the tensor is also sparse, store a 1-bit occupancy mask plus only the nonzero codes with sparse_quant_pack. packing_footprint gives the byte counts and the density crossover between the dense-k-bit and sparse schemes.
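The density crossover between the two schemes can be estimated with simple arithmetic. The sketch below is a back-of-envelope model, not the output of packing_footprint: it assumes the sparse scheme costs exactly one mask bit per element plus `bits` bits per nonzero, with no index or alignment overhead, and all three function names are hypothetical.

```python
def dense_bits(n_elements, bits):
    """Dense k-bit packing: every element stores `bits` bits."""
    return n_elements * bits


def sparse_bits(n_elements, bits, density):
    """1-bit occupancy mask plus `bits` bits for each nonzero element."""
    return n_elements * (1 + density * bits)


def crossover_density(bits):
    """Density below which the sparse scheme is smaller (needs bits > 1)."""
    return (bits - 1) / bits


n, bits = 1000, 4
threshold = crossover_density(bits)            # 0.75 for 4-bit codes
sparse_at_10pct = sparse_bits(n, bits, 0.10)   # well below the crossover
sparse_at_threshold = sparse_bits(n, bits, threshold)
```

Under these assumptions a 4-bit tensor is cheaper in the sparse format whenever fewer than 75% of its entries are nonzero; for 1-bit data the mask alone already costs as much as dense packing, so the sparse scheme never wins.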

`pack_nbit(codes, bits, axis=-1)`

Bit-pack an integer-code tensor (values in [0, 2**bits)) along axis.

Generalises pack_spikes (bits=1) to any width by packing each of the bits bit-planes with jax.numpy.packbits and stacking them on a new leading axis. Storage is bits/8 bytes/element (a 32/bits x cut vs fp32).
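A NumPy sketch of the bit-plane idea, including a round trip, may help. `pack_planes` mirrors the packing described above; `unpack_planes` is a hypothetical inverse written for this example (the module's own unpacking function is not shown in this section), and both handle only the last axis.

```python
import numpy as np


def pack_planes(codes, bits):
    """Pack each bit-plane along the last axis, least-significant plane first."""
    codes = codes.astype(np.uint32)
    planes = [
        np.packbits(((codes >> b) & 1).astype(np.uint8), axis=-1)
        for b in range(bits)
    ]
    return np.stack(planes, axis=0)           # (bits, ..., ceil(len / 8))


def unpack_planes(packed, length):
    """Hypothetical inverse: rebuild the integer codes from the planes."""
    codes = np.zeros(packed.shape[1:-1] + (length,), dtype=np.uint32)
    for b in range(packed.shape[0]):
        plane = np.unpackbits(packed[b], axis=-1, count=length)
        codes |= plane.astype(np.uint32) << np.uint32(b)
    return codes


codes = np.array([[0, 1, 2, 3, 3, 2, 1, 0, 1, 2]])   # 2-bit codes, length 10
packed = pack_planes(codes, bits=2)                  # uint8, shape (2, 1, 2)
restored = unpack_planes(packed, length=codes.shape[-1])
```

The ten 2-bit codes occupy four bytes here (two planes of two bytes each, the second byte of each plane zero-padded), and the original length is needed to strip that padding on the way back.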

Source code in spyx/experimental/compress.py

def pack_nbit(codes, bits, axis=-1):
    """Bit-pack an integer-code tensor (values in ``[0, 2**bits)``) along ``axis``.

    Generalises :func:`pack_spikes` (``bits=1``) to any width by packing each of the
    ``bits`` bit-planes with :func:`jax.numpy.packbits` and stacking them on a new
    leading axis. Storage is ``bits/8`` bytes/element (a ``32/bits`` x cut vs fp32).
    """
    codes = codes.astype(jnp.uint32)
    planes = [
        jnp.packbits(((codes >> b) & 1).astype(jnp.uint8), axis=axis)
        for b in range(bits)
    ]
    return jnp.stack(planes, axis=0)

`pack_spikes(x, axis=-1)`

Bit-pack a binary spike tensor along axis.

Mirrors the np.packbits(..., axis=...) convention used by spyx.data (which packs along the time axis): every group of 8 consecutive {0, 1} values along axis is packed into a single uint8, big-endian bit order. If the axis length is not a multiple of 8 the final byte is zero-padded on the low bits, so the original length must be supplied to unpack_spikes to recover the exact tensor.

:param x: binary tensor (values in {0, 1}); cast to uint8.
:param axis: axis along which to pack (default last).
:return: uint8 tensor with ceil(len/8) entries along axis.
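The bit order and padding behaviour can be seen on a ten-element spike vector. This sketch uses NumPy's `packbits` / `unpackbits` directly, since that is the convention the function mirrors; the `count` argument of `np.unpackbits` stands in for the original length that unpack_spikes needs, whose exact signature is not shown here.

```python
import numpy as np

x = np.array([1, 0, 1, 1, 0, 0, 0, 0, 1, 1], dtype=np.uint8)   # 10 spikes
packed = np.packbits(x, axis=-1)             # 2 bytes; the last is zero-padded
restored = np.unpackbits(packed, axis=-1, count=x.shape[-1])

print(packed)                                # [176 192]
```

The first byte is 0b10110000 = 176 and the second is 0b11000000 = 192: the two trailing spikes sit in the high bits and the six low bits are padding. Without `count`, unpacking would return 16 values instead of 10.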

Source code in spyx/experimental/compress.py

def pack_spikes(x, axis=-1):
    """Bit-pack a binary spike tensor along ``axis``.

    Mirrors the ``np.packbits(..., axis=...)`` convention used by
    :mod:`spyx.data` (which packs along the time axis): every group of 8
    consecutive ``{0, 1}`` values along ``axis`` is packed into a single
    ``uint8``, big-endian bit order. If the axis length is not a multiple of
    8 the final byte is zero-padded on the low bits, so the original length
    must be supplied to :func:`unpack_spikes` to recover the exact tensor.

    :param x: binary tensor (values in ``{0, 1}``); cast to ``uint8``.
    :param axis: axis along which to pack (default last).
    :return: ``uint8`` tensor with ``ceil(len/8)`` entries along ``axis``.
    """
    return jnp.packbits(x.astype(jnp.uint8), axis=axis)

`packed_quant_dense(acts, weight, bits, step)`

acts @ weight with a k-bit-packed backward residual.

The k-bit generalisation of packed_spike_dense: for activations that are grid-quantised (symmetric uniform grid of spacing step representable in bits signed levels -- e.g. graded sigma-delta events, ternary, int-N), the backward saves the bits-bit codes instead of the fp residual (a 32/bits x cut), unpacking them to reform dW = acts^T @ g exactly. First-order VJP only; exact iff acts lie on the grid {(c - 2**(bits-1)) * step}.
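The grid condition above amounts to a simple mapping between activations and unsigned codes. The following NumPy sketch shows that mapping for ternary activations with bits=2; the rounding step used to derive the codes is an assumption for illustration and may not match how the library computes them.

```python
import numpy as np

bits, step = 2, 0.5
offset = 2 ** (bits - 1)                      # 2: shifts signed levels to >= 0

# On-grid activations: ternary levels {-1, 0, 1} times `step`.
acts = np.array([-0.5, 0.0, 0.5, 0.0], dtype=np.float32)
codes = np.rint(acts / step).astype(np.int32) + offset      # unsigned codes
restored = (codes - offset).astype(np.float32) * step       # back on the grid

# An off-grid value is snapped to the nearest level, so it is not recovered.
off_grid = np.float32(0.3)
snapped = (np.rint(off_grid / step) + offset - offset) * step
```

On-grid values survive the round trip exactly, which is why the reconstructed dW matches; the off-grid 0.3 comes back as 0.5, illustrating why the result is exact only when acts lie on the grid.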

Source code in spyx/experimental/compress.py

@jax.custom_vjp
def packed_quant_dense(acts, weight, bits, step):
    """``acts @ weight`` with a **k-bit-packed** backward residual.

    The k-bit generalisation of :func:`packed_spike_dense`: for activations that are
    grid-quantised (symmetric uniform grid of spacing ``step`` representable in ``bits``
    signed levels -- e.g. graded sigma-delta events, ternary, int-N), the backward saves
    the ``bits``-bit codes instead of the fp residual (a ``32/bits`` x cut), unpacking
    them to reform ``dW = acts^T @ g`` exactly. First-order VJP only; exact iff ``acts``
    lie on the grid ``{(c - 2**(bits-1)) * step}``.
    """
    return _dense(acts, weight)
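To make the exactness condition concrete, here is a small NumPy sketch of the grid `{(c - 2**(bits-1)) * step}`. The encode step is borrowed from `sparse_quant_pack` on this page; whether `packed_quant_dense` encodes its residual in exactly this way is an assumption.

```python
import numpy as np

bits, step = 2, 0.5                 # ternary activations on a 0.5-spaced grid
offset = 1 << (bits - 1)            # 2**(bits-1), the code of the zero level

acts = np.array([[-0.5, 0.0, 0.5], [0.5, 0.5, 0.0]], dtype=np.float32)

# Encode: nearest grid index, shifted to be non-negative, clipped to k bits.
codes = np.clip(np.round(acts / step).astype(np.int32) + offset, 0, (1 << bits) - 1)

# Decode: (c - 2**(bits-1)) * step recovers on-grid activations exactly.
decoded = (codes.astype(np.float32) - offset) * step

# An off-grid value is snapped to the nearest level, so a residual rebuilt
# from its code would no longer equal the activation used in the forward pass.
snapped = float(np.round(0.3 / step) * step)
```

The last line is the failure mode behind "exact iff": 0.3 would be stored as 0.5, and `dW` rebuilt from the codes would be off accordingly.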

`packed_spike_dense(spikes, weight)`

spikes @ weight with a bit-packed backward residual.

Forward numerics are a plain matmul over the trailing feature axis of spikes (shape (..., in)) against weight (shape (in, out)), yielding (..., out). The custom VJP saves packbits(spikes), a uint8 tensor with 8x fewer elements than spikes (1 bit per spike, so 16x fewer bytes than bf16 and 32x fewer than fp32), instead of the dense activations, unpacking it in the backward pass to form dW = spikes^T @ g and dspikes = g @ weight^T.

Both first-order gradients equal those of the naive spikes @ weight.

Limitations: valid only when spikes is exactly binary (values in {0, 1}). Packing a general float tensor silently binarizes the saved residual, so the forward stays exact but dW becomes wrong. Only the first-order VJP is correct; second-order derivatives (grad-of-grad) are not, since the packed residual is not itself differentiated. Neither limitation affects ordinary first-order BPTT on binary spikes, the intended use.

Source code in spyx/experimental/compress.py

@jax.custom_vjp
def packed_spike_dense(spikes, weight):
    """``spikes @ weight`` with a bit-packed backward residual.

    Forward numerics are a plain matmul over the trailing feature axis of
    ``spikes`` (shape ``(..., in)``) against ``weight`` (shape ``(in, out)``),
    yielding ``(..., out)``. The custom VJP saves ``packbits(spikes)`` -- a
    ``uint8`` tensor with 8x fewer elements than ``spikes`` (1 bit per spike,
    so 16x fewer bytes than bf16 and 32x fewer than fp32) -- instead
    of the dense activations, unpacking it in the backward pass to form
    ``dW = spikes^T @ g`` and ``dspikes = g @ weight^T``.

    Both first-order gradients equal those of the naive ``spikes @ weight``.

    Limitations: valid only when ``spikes`` is exactly binary (values in
    ``{0, 1}``) -- packing a general float tensor silently binarizes the saved
    residual, so the forward stays exact but ``dW`` becomes wrong. Only the
    first-order VJP is correct; second-order derivatives (grad-of-grad) are not,
    since the packed residual is not itself differentiated. Both are fine for
    ordinary first-order BPTT, the intended use.
    """
    return _dense(spikes, weight)
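The forward and backward rules are not shown in the listing above, so the following is only a sketch of how such a bit-packed residual could be wired with `jax.custom_vjp`. The names `packed_dense_sketch`, `_fwd` and `_bwd` are hypothetical and this is not the library's implementation.

```python
import jax
import jax.numpy as jnp

@jax.custom_vjp
def packed_dense_sketch(spikes, weight):
    return spikes @ weight

def _fwd(spikes, weight):
    # Save 1 bit per spike instead of the dense activations.
    packed = jnp.packbits(spikes.astype(jnp.uint8), axis=-1)
    return spikes @ weight, (packed, weight)

def _bwd(residual, g):
    packed, weight = residual
    n_in = weight.shape[0]
    # count trims the zero padding added when n_in is not a multiple of 8.
    spikes = jnp.unpackbits(packed, axis=-1, count=n_in).astype(g.dtype)
    d_spikes = g @ weight.T
    d_weight = spikes.reshape(-1, n_in).T @ g.reshape(-1, g.shape[-1])
    return d_spikes, d_weight

packed_dense_sketch.defvjp(_fwd, _bwd)

k1, k2 = jax.random.split(jax.random.PRNGKey(0))
spikes = jax.random.bernoulli(k1, 0.3, (4, 5, 10)).astype(jnp.float32)
weight = jax.random.normal(k2, (10, 3))

def loss(fn, s, w):
    return jnp.sum(fn(s, w) ** 2)

g_packed = jax.grad(lambda s, w: loss(packed_dense_sketch, s, w), argnums=(0, 1))(spikes, weight)
g_naive = jax.grad(lambda s, w: loss(jnp.matmul, s, w), argnums=(0, 1))(spikes, weight)
```

For binary inputs both gradient pairs agree, which is the property the docstring states; feeding non-binary `spikes` would break `d_weight` while leaving the forward result unchanged.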

`packing_footprint(n_elements, bits, density)`

Bytes to store n_elements grid-quantised activations at bits bits and the given nonzero density, under three schemes, plus which one wins.

Schemes: fp32 (4 B/elem), dense_kbit (N*bits/8), and sparse (mask N/8 + nonzero codes nnz*bits/8). The sparse scheme wins below the (bits-1)/bits density crossover.

Source code in spyx/experimental/compress.py

def packing_footprint(n_elements, bits, density):
    """Bytes to store ``n_elements`` grid-quantised activations at ``bits`` bits and the
    given nonzero ``density``, under three schemes, plus which one wins.

    Schemes: ``fp32`` (4 B/elem), ``dense_kbit`` (``N*bits/8``), and
    ``sparse`` (mask ``N/8`` + nonzero codes ``nnz*bits/8``). The sparse scheme wins below
    the ``(bits-1)/bits`` density crossover.
    """
    import math

    nnz = round(density * n_elements)
    schemes = {
        "fp32": 4 * n_elements,
        "dense_%dbit" % bits: math.ceil(n_elements * bits / 8),
        "sparse_mask+%dbit" % bits: math.ceil(n_elements / 8)
        + math.ceil(nnz * bits / 8),
    }
    best = min(schemes, key=lambda name: schemes[name])
    return {**schemes, "best": best, "crossover_density": (bits - 1) / bits}
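A worked example of the arithmetic may help. The helper below re-derives the three byte counts from the formulas in the docstring; `footprint_bytes` is a hypothetical stand-alone name so the snippet runs without the library.

```python
import math

def footprint_bytes(n_elements, bits, density):
    nnz = round(density * n_elements)
    return {
        "fp32": 4 * n_elements,
        "dense": math.ceil(n_elements * bits / 8),
        "sparse": math.ceil(n_elements / 8) + math.ceil(nnz * bits / 8),
    }

# 1000 activations at 4 bits: the crossover density is (4 - 1) / 4 = 0.75.
sparse_regime = footprint_bytes(1000, bits=4, density=0.10)
at_crossover = footprint_bytes(1000, bits=4, density=0.75)
dense_regime = footprint_bytes(1000, bits=4, density=0.90)
```

At 10 % density the mask costs 125 bytes and the codes 50, against 500 bytes for dense 4-bit storage; at 75 % the two schemes tie, and above that dense packing is smaller.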

`sparse_quant_pack(x, bits, step)`

Pack a sparse + quantised tensor as (mask_packed, codes_packed, meta).

A 1-bit occupancy mask (packbits of x != 0) plus the nonzero values' bits-bit codes (pack_nbit). Footprint ceil(N/8) + ceil(nnz*bits/8) bytes, which beats dense k-bit packing when density nnz/N < (bits-1)/bits. Exact for grid-quantised x. Eager (uses the dynamic nonzero count), so it is meant for storage / event transmission, not for use inside a jit loop.

Source code in spyx/experimental/compress.py

def sparse_quant_pack(x, bits, step):
    """Pack a **sparse + quantised** tensor as ``(mask_packed, codes_packed, meta)``.

    A 1-bit occupancy mask (``packbits`` of ``x != 0``) plus the nonzero values' ``bits``-bit
    codes (:func:`pack_nbit`). Footprint ``ceil(N/8) + ceil(nnz*bits/8)`` bytes, which beats
    dense k-bit packing when density ``nnz/N < (bits-1)/bits``. Exact for grid-quantised ``x``.
    Eager (uses the dynamic nonzero count) -- for storage / event transmission, not a jit loop.
    """
    flat = x.reshape(-1)
    mask = flat != 0
    mask_packed = jnp.packbits(mask.astype(jnp.uint8))
    nz = flat[mask]
    offset = 1 << (bits - 1)
    codes = jnp.clip(
        jnp.round(nz / step).astype(jnp.int32) + offset, 0, (1 << bits) - 1
    )
    codes_packed = pack_nbit(codes.astype(jnp.uint32), bits, axis=-1)
    meta = {
        "shape": tuple(x.shape),
        "bits": bits,
        "step": float(step),
        "nnz": int(nz.size),
    }
    return mask_packed, codes_packed, meta

`sparse_quant_unpack(mask_packed, codes_packed, meta)`

Invert sparse_quant_pack to the dense grid-quantised tensor.

Source code in spyx/experimental/compress.py

def sparse_quant_unpack(mask_packed, codes_packed, meta):
    """Invert :func:`sparse_quant_pack` to the dense grid-quantised tensor."""
    shape, bits, step, nnz = meta["shape"], meta["bits"], meta["step"], meta["nnz"]
    n = 1
    for d in shape:
        n *= d
    mask = jnp.unpackbits(mask_packed, count=n).astype(bool)
    codes = unpack_nbit(codes_packed, bits, nnz, axis=-1)
    offset = 1 << (bits - 1)
    vals = (codes.astype(jnp.float32) - offset) * step
    idx = jnp.nonzero(mask, size=nnz)[0]
    return jnp.zeros(n, jnp.float32).at[idx].set(vals).reshape(shape)
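The mask-plus-codes round trip can be followed end to end on a tiny tensor. This NumPy sketch keeps the codes as a plain integer array rather than bit-packing them (a simplification, since `pack_nbit` is not shown on this page).

```python
import numpy as np

bits, step = 4, 0.25
offset = 1 << (bits - 1)
x = np.zeros((2, 6), dtype=np.float32)
x[0, 1], x[1, 4], x[1, 5] = 0.5, -1.0, 0.25   # three on-grid nonzeros

# Pack: 1-bit occupancy mask plus one code per nonzero entry.
flat = x.reshape(-1)
mask = flat != 0
mask_packed = np.packbits(mask.astype(np.uint8))
codes = np.clip(np.round(flat[mask] / step).astype(np.int32) + offset, 0, (1 << bits) - 1)

# Unpack: rebuild the mask, decode the codes, scatter them back in order.
mask_back = np.unpackbits(mask_packed, count=flat.size).astype(bool)
restored = np.zeros(flat.size, dtype=np.float32)
restored[mask_back] = (codes.astype(np.float32) - offset) * step
restored = restored.reshape(x.shape)
```

Twelve elements need two mask bytes, and only the three nonzero values carry a code.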

`unpack_nbit(packed, bits, length, axis=-1)`

Invert pack_nbit, recovering integer codes length long on axis.

Source code in spyx/experimental/compress.py

def unpack_nbit(packed, bits, length, axis=-1):
    """Invert :func:`pack_nbit`, recovering integer codes ``length`` long on ``axis``."""
    out = None
    for b in range(bits):
        plane = jnp.unpackbits(packed[b], axis=axis, count=length).astype(jnp.uint32)
        shifted = plane << jnp.uint32(b)
        out = shifted if out is None else (out | shifted)
    return out
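The loop above reads one packed bit-plane per bit position from the leading axis. A matching packer is not shown here, so the NumPy sketch below assumes that layout; `pack_nbit_sketch` and `unpack_nbit_sketch` are hypothetical names.

```python
import numpy as np

def pack_nbit_sketch(codes, bits):
    # One packbits plane per bit position, stacked on a new leading axis.
    planes = [np.packbits(((codes >> b) & 1).astype(np.uint8), axis=-1) for b in range(bits)]
    return np.stack(planes)

def unpack_nbit_sketch(packed, bits, length):
    out = np.zeros(packed.shape[1:-1] + (length,), dtype=np.uint32)
    for b in range(bits):
        plane = np.unpackbits(packed[b], axis=-1, count=length).astype(np.uint32)
        out |= plane << np.uint32(b)   # put bit b back in its position
    return out

codes = np.array([0, 5, 7, 2, 6, 1, 3, 4, 7, 0], dtype=np.uint32)
packed = pack_nbit_sketch(codes, bits=3)
restored = unpack_nbit_sketch(packed, bits=3, length=codes.shape[-1])
```

Ten 3-bit codes become three planes of two bytes each under this layout.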

`unpack_spikes(packed, length, axis=-1)`

Invert pack_spikes, recovering length values along axis.

:param packed: uint8 tensor produced by pack_spikes.
:param length: original (pre-pack) size of axis; trims the zero padding introduced when length is not a multiple of 8.
:param axis: axis along which the tensor was packed (default last).
:return: uint8 tensor of {0, 1} values, length long on axis.

Source code in spyx/experimental/compress.py

def unpack_spikes(packed, length, axis=-1):
    """Invert :func:`pack_spikes`, recovering ``length`` values along ``axis``.

    :param packed: ``uint8`` tensor produced by :func:`pack_spikes`.
    :param length: original (pre-pack) size of ``axis``; trims the zero
        padding introduced when ``length`` is not a multiple of 8.
    :param axis: axis along which the tensor was packed (default last).
    :return: ``uint8`` tensor of ``{0, 1}`` values, ``length`` long on ``axis``.
    """
    return jnp.unpackbits(packed, axis=axis, count=length)
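As an illustration of the `length` argument on a non-trailing axis, this NumPy sketch packs a `(T, C)` spike raster along time, the axis the `pack_spikes` docstring says `spyx.data` uses. NumPy stands in for `jax.numpy` here as an assumption of convenience.

```python
import numpy as np

rng = np.random.default_rng(0)
T, C = 20, 3
spikes = (rng.random((T, C)) < 0.2).astype(np.uint8)

packed = np.packbits(spikes, axis=0)               # ceil(20 / 8) = 3 rows
untrimmed = np.unpackbits(packed, axis=0)          # 24 rows, 4 of them padding
restored = np.unpackbits(packed, axis=0, count=T)  # padding trimmed
```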

spyx.experimental.stochastic

Experimental stochastic / parallelizable spiking-neuron prototypes.

Stochastic (Bernoulli-spiking) neurons and the SPSN prototype. The StochasticAssociative neurons compute the membrane with the parallel prefix scan (_pscan); SPSN computes it with an FFT convolution instead. Research-stage; the promoted reset-free neuron is spyx.experimental.PSU_LIF (defined in spyx.nn). See SPSN (arXiv:2306.12666).

`SPSN`

Bases: Module

Prototype implementation of Stochastic Parallelizable Spiking Neuron:

https://doi.org/10.48550/arXiv.2306.12666

Source code in spyx/experimental/stochastic.py

class SPSN(nnx.Module):
    """
    Prototype implementation of Stochastic Parallelizable Spiking Neuron:

    https://doi.org/10.48550/arXiv.2306.12666
    """

    def __init__(self, hidden_shape: tuple, threshold=1, k=10, *, rngs: nnx.Rngs):
        self.hidden_shape = hidden_shape
        self.threshold = threshold
        self.spike = sigmoid_bernoulli(k, threshold)

        self.beta = nnx.Param(
            nnx.initializers.truncated_normal(stddev=0.25)(
                rngs.params(), self.hidden_shape
            )
            + 0.5
        )

    def __call__(self, key, x):
        beta = jnp.clip(self.beta[:], 0, 1)

        # per-neuron, per-timestep decay kernel B[t, c] = beta[c]**t * (1 - beta[c])
        T = x.shape[1]
        B = jnp.power(beta[None, :], jnp.arange(T)[:, None]) * (1 - beta[None, :])

        fft_B = jnp.fft.rfft(B, n=2 * T, axis=0)[None, :, :]
        fft_X = jnp.fft.rfft(x, n=2 * T, axis=1)

        V = jnp.fft.irfft(fft_X * fft_B, n=2 * T, axis=1)[:, :T, :]

        # sample Bernoulli spikes from the membrane potential (no reset is applied)
        spikes = self.spike(V, key)

        return spikes, V

`beta = nnx.Param(nnx.initializers.truncated_normal(stddev=0.25)(rngs.params(), self.hidden_shape) + 0.5)` `instance-attribute`

`hidden_shape = hidden_shape` `instance-attribute`

`spike = sigmoid_bernoulli(k, threshold)` `instance-attribute`

`threshold = threshold` `instance-attribute`

`__call__(key, x)`

Source code in spyx/experimental/stochastic.py

def __call__(self, key, x):
    beta = jnp.clip(self.beta[:], 0, 1)

    # per-neuron, per-timestep decay kernel B[t, c] = beta[c]**t * (1 - beta[c])
    T = x.shape[1]
    B = jnp.power(beta[None, :], jnp.arange(T)[:, None]) * (1 - beta[None, :])

    fft_B = jnp.fft.rfft(B, n=2 * T, axis=0)[None, :, :]
    fft_X = jnp.fft.rfft(x, n=2 * T, axis=1)

    V = jnp.fft.irfft(fft_X * fft_B, n=2 * T, axis=1)[:, :T, :]

    # sample Bernoulli spikes from the membrane potential (no reset is applied)
    spikes = self.spike(V, key)

    return spikes, V
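The FFT path above is a causal convolution with the kernel `B[t, c] = beta[c]**t * (1 - beta[c])`. The NumPy sketch below checks that it matches the sequential leaky recurrence `V[t] = beta * V[t-1] + (1 - beta) * x[t]` with `V[-1] = 0`; the recurrence form is derived from the kernel, not quoted from the library.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, T, C = 2, 16, 3
x = rng.normal(size=(batch, T, C))
beta = np.array([0.2, 0.5, 0.9])

# Same steps as __call__: kernel, zero-padded FFTs (length 2T avoids
# circular wrap-around), product, inverse FFT, keep the first T steps.
kernel = beta[None, :] ** np.arange(T)[:, None] * (1 - beta[None, :])
fft_k = np.fft.rfft(kernel, n=2 * T, axis=0)[None, :, :]
fft_x = np.fft.rfft(x, n=2 * T, axis=1)
V_fft = np.fft.irfft(fft_x * fft_k, n=2 * T, axis=1)[:, :T, :]

# Reference: the step-by-step recurrence.
V_loop = np.zeros_like(x)
v = np.zeros((batch, C))
for t in range(T):
    v = beta * v + (1 - beta) * x[:, t, :]
    V_loop[:, t, :] = v
```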

`__init__(hidden_shape, threshold=1, k=10, *, rngs)`

Source code in spyx/experimental/stochastic.py

def __init__(self, hidden_shape: tuple, threshold=1, k=10, *, rngs: nnx.Rngs):
    self.hidden_shape = hidden_shape
    self.threshold = threshold
    self.spike = sigmoid_bernoulli(k, threshold)

    self.beta = nnx.Param(
        nnx.initializers.truncated_normal(stddev=0.25)(
            rngs.params(), self.hidden_shape
        )
        + 0.5
    )

`StochasticAssociativeCuBaLIF`

Bases: Module

Source code in spyx/experimental/stochastic.py

class StochasticAssociativeCuBaLIF(nnx.Module):
    def __init__(self, hidden_shape, threshold=1, k=100, *, rngs: nnx.Rngs):
        self.hidden_shape = hidden_shape
        self.spike = refractory_sigmoid_bernoulli(k, threshold)

        self.alpha = nnx.Param(
            nnx.initializers.truncated_normal(stddev=0.25)(
                rngs.params(), self.hidden_shape
            )
            + 0.5
        )
        self.beta = nnx.Param(
            nnx.initializers.truncated_normal(stddev=0.25)(
                rngs.params(), self.hidden_shape
            )
            + 0.5
        )

    def __call__(self, key, u):
        alpha = jnp.clip(self.alpha[:], 0, 1)
        beta = jnp.clip(self.beta[:], 0, 1)

        # this can probably be condensed.
        _, x = jax.vmap(_pscan, in_axes=(None, 0))(alpha, u)
        _, V = jax.vmap(_pscan, in_axes=(None, 0))(beta, x)

        return self.spike(V, key)

`alpha = nnx.Param(nnx.initializers.truncated_normal(stddev=0.25)(rngs.params(), self.hidden_shape) + 0.5)` `instance-attribute`

`beta = nnx.Param(nnx.initializers.truncated_normal(stddev=0.25)(rngs.params(), self.hidden_shape) + 0.5)` `instance-attribute`

`hidden_shape = hidden_shape` `instance-attribute`

`spike = refractory_sigmoid_bernoulli(k, threshold)` `instance-attribute`

`__call__(key, u)`

Source code in spyx/experimental/stochastic.py

def __call__(self, key, u):
    alpha = jnp.clip(self.alpha[:], 0, 1)
    beta = jnp.clip(self.beta[:], 0, 1)

    # this can probably be condensed.
    _, x = jax.vmap(_pscan, in_axes=(None, 0))(alpha, u)
    _, V = jax.vmap(_pscan, in_axes=(None, 0))(beta, x)

    return self.spike(V, key)

`__init__(hidden_shape, threshold=1, k=100, *, rngs)`

Source code in spyx/experimental/stochastic.py

def __init__(self, hidden_shape, threshold=1, k=100, *, rngs: nnx.Rngs):
    self.hidden_shape = hidden_shape
    self.spike = refractory_sigmoid_bernoulli(k, threshold)

    self.alpha = nnx.Param(
        nnx.initializers.truncated_normal(stddev=0.25)(
            rngs.params(), self.hidden_shape
        )
        + 0.5
    )
    self.beta = nnx.Param(
        nnx.initializers.truncated_normal(stddev=0.25)(
            rngs.params(), self.hidden_shape
        )
        + 0.5
    )

`StochasticAssociativeLIF`

Bases: Module

Source code in spyx/experimental/stochastic.py

class StochasticAssociativeLIF(nnx.Module):
    def __init__(self, hidden_shape, threshold=1, k=100, spike=True, *, rngs: nnx.Rngs):
        self.hidden_shape = hidden_shape
        self.threshold = threshold
        if spike:
            self.spike = sigmoid_bernoulli(k, threshold)
        else:
            self.spike = lambda x, k: x

        self.beta = nnx.Param(
            nnx.initializers.truncated_normal(stddev=0.25)(
                rngs.params(), self.hidden_shape
            )
            + 0.5
        )

    # x.shape = B, T, C
    def __call__(self, key, x):
        beta = jnp.clip(self.beta[:], 0, 1)

        _, V = jax.vmap(_pscan, in_axes=(None, 0))(beta, x)

        return self.spike(V, key), V

`beta = nnx.Param(nnx.initializers.truncated_normal(stddev=0.25)(rngs.params(), self.hidden_shape) + 0.5)` `instance-attribute`

`hidden_shape = hidden_shape` `instance-attribute`

`spike = sigmoid_bernoulli(k, threshold)` `instance-attribute`

`threshold = threshold` `instance-attribute`

`__call__(key, x)`

Source code in spyx/experimental/stochastic.py

def __call__(self, key, x):
    beta = jnp.clip(self.beta[:], 0, 1)

    _, V = jax.vmap(_pscan, in_axes=(None, 0))(beta, x)

    return self.spike(V, key), V
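`_pscan` itself is not shown on this page. Assuming it evaluates a leaky linear recurrence of the form `h[t] = beta * h[t-1] + x[t]` (an assumption; the library's scaling may differ), the NumPy sketch below shows why such a recurrence can be computed with a prefix scan: each step is an affine map, and affine maps compose associatively.

```python
import numpy as np

def linear_scan(decay, x):
    """h[t] = decay * h[t-1] + x[t] along axis 0, in O(log T) combine rounds."""
    T = x.shape[0]
    A = np.broadcast_to(decay, x.shape).copy()  # multiplicative part of each step
    B = x.copy()                                # additive part of each step
    shift = 1
    while shift < T:
        # Neighbour `shift` steps earlier; identity map (1, 0) where none exists.
        A_prev = np.ones_like(A)
        B_prev = np.zeros_like(B)
        A_prev[shift:] = A[:-shift]
        B_prev[shift:] = B[:-shift]
        # Compose affine maps: apply the earlier block, then the later one.
        B = A * B_prev + B
        A = A * A_prev
        shift *= 2
    return B

rng = np.random.default_rng(0)
x = rng.normal(size=(12, 3))          # (T, C)
beta = np.array([0.3, 0.6, 0.9])
h_scan = linear_scan(beta, x)

# Sequential reference.
h_loop = np.zeros_like(x)
h = np.zeros(3)
for t in range(x.shape[0]):
    h = beta * h + x[t]
    h_loop[t] = h
```

Every round of the `while` loop is elementwise over all timesteps, which is what makes the scan parallel over time.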

`__init__(hidden_shape, threshold=1, k=100, spike=True, *, rngs)`

Source code in spyx/experimental/stochastic.py

def __init__(self, hidden_shape, threshold=1, k=100, spike=True, *, rngs: nnx.Rngs):
    self.hidden_shape = hidden_shape
    self.threshold = threshold
    if spike:
        self.spike = sigmoid_bernoulli(k, threshold)
    else:
        self.spike = lambda x, k: x

    self.beta = nnx.Param(
        nnx.initializers.truncated_normal(stddev=0.25)(
            rngs.params(), self.hidden_shape
        )
        + 0.5
    )

`refractory_sigmoid_bernoulli(k=50, threshold=1)`

Source code in spyx/experimental/stochastic.py

def refractory_sigmoid_bernoulli(k=50, threshold=1):
    freq = 2 * jnp.pi * threshold

    @jax.custom_gradient
    def activation(x, key):
        U = x - threshold
        r = jnp.cos(freq * U)
        s = jax.nn.sigmoid(k * U)
        p_n = jnp.maximum(r * s, 0)
        return jax.random.bernoulli(key, p_n).astype(U.dtype), lambda g: (g * p_n, None)

    return activation
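To see which inputs this activation lets fire, the sketch below evaluates the spike probability `p_n` from the listing at a few membrane values, using only the standard library. `refractory_prob` is a hypothetical helper; sampling and the custom gradient are left out.

```python
import math

def refractory_prob(x, k=50, threshold=1.0):
    u = x - threshold
    r = math.cos(2 * math.pi * threshold * u)    # periodic gate
    s = 1.0 / (1.0 + math.exp(-k * u))           # steep sigmoid around threshold
    return max(r * s, 0.0)

p_at_threshold = refractory_prob(1.0)   # cos(0) * sigmoid(0)
p_half_above = refractory_prob(1.5)     # cosine is negative, clamped to 0
p_one_above = refractory_prob(2.0)      # cosine back at 1, sigmoid saturated
p_below = refractory_prob(0.5)
```

With the default `threshold=1`, the cosine gate closes the probability to zero half a unit above threshold and reopens it one unit above.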

`sigmoid_bernoulli(k=10, threshold=1.0, max_prob=0.8)`

Source code in spyx/experimental/stochastic.py

def sigmoid_bernoulli(k=10, threshold=1.0, max_prob=0.8):
    @jax.custom_gradient
    def activation(x, key):
        U = x - threshold
        p_n = jax.nn.sigmoid(k * U) * max_prob
        return jax.random.bernoulli(key, p_n).astype(U.dtype), lambda g: (g * p_n, None)

    return activation
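A quick numerical check of the forward probability, using only the standard library. `spike_prob` is a hypothetical helper that reproduces `p_n = sigmoid(k * (x - threshold)) * max_prob` from the listing; the Bernoulli draw is omitted.

```python
import math

def spike_prob(x, k=10, threshold=1.0, max_prob=0.8):
    return max_prob / (1.0 + math.exp(-k * (x - threshold)))

p_at_threshold = spike_prob(1.0)   # half of max_prob
p_far_above = spike_prob(3.0)      # approaches, but never exceeds, max_prob
p_far_below = spike_prob(-1.0)     # effectively silent
```

In the listing the backward rule multiplies the incoming cotangent by this same `p_n`, so inputs far below threshold receive almost no gradient.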

spyx.experimental.hybrid

Surrogate-gradient descent corrected by an orthogonalised evolutionary term. See Surrogate gradients & Gaussian smoothing for the theory and Training methods for where it fits.

Hybrid surrogate-gradient / evolutionary training for spiking networks.

Note: Experimental. Unstable API that may change without a deprecation cycle. Import it as `from spyx.experimental.hybrid import hybrid_gradient` (the spyx.experimental.hybrid submodule is importable without touching the package __init__).

The idea

Surrogate-gradient descent through a spiking network is cheap but biased: the true forward objective uses a hard Heaviside spike whose gradient is zero almost everywhere, so we substitute a smooth surrogate (spyx.axn) in the backward pass. The resulting direction descends a related landscape, not the true one, and the mismatch is a systematic bias.

Evolutionary strategies (ES / NES) estimate the gradient of the true (hard-spike, non-differentiable) objective from forward evaluations alone, with no surrogate at all. Pure ES is unbiased but high-variance and slow to converge in high dimensions.

hybrid_gradient combines the two so that ES pays only for what the surrogate gets wrong:

1. g_s = ∇θ loss_surrogate(θ): the cheap, biased bulk descent direction (one jax.grad through the surrogate spikes).
2. g_es: an antithetic NES estimate of the gradient of the true loss, drawn over the full flattened parameter vector:

   g_es = 1/(2 σ K) Σ_k [loss_true(θ + σ ε_k) − loss_true(θ − σ ε_k)] ε_k, ε_k ~ N(0, I).

3. Global orthogonalisation (over the whole flattened vector, not per-leaf). Let ĝ_s = g_s / (‖g_s‖ + eps). Project the ES estimate onto the subspace the surrogate does not already cover:

   g_orth = g_es − ⟨g_es, ĝ_s⟩ ĝ_s.

4. Corrected gradient: g = g_s + λ · g_orth.

The surrogate supplies the bulk direction; ES supplies only the correction in the subspace where the surrogate is blind (its bias). Orthogonalising avoids double-counting directions the surrogate already handles. This is the exact complement of Guided-ES (Maheswaranathan et al. 2019), which restricts the ES search to the surrogate's subspace; here we restrict it to the orthogonal complement and add it as an error-correction term.
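The four steps can be sketched on a flat parameter vector with plain NumPy. This is an illustration on a toy quadratic, not the library's `_hybrid_flat`; the "surrogate" gradient here is simply the true gradient with half its coordinates zeroed, a purely illustrative stand-in for bias.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, K, sigma, lam, eps = 6, 64, 0.01, 0.5, 1e-8
target = rng.normal(size=dim)
theta = np.zeros(dim)

def loss_true(p):
    return 0.5 * np.sum((p - target) ** 2)

# Step 1: stand-in surrogate gradient (blind to the last three coordinates).
g_s = (theta - target) * np.array([1, 1, 1, 0, 0, 0])

# Step 2: antithetic NES estimate over the flat parameter vector.
noise = rng.normal(size=(K, dim))
diffs = np.array([loss_true(theta + sigma * e) - loss_true(theta - sigma * e) for e in noise])
g_es = (diffs[:, None] * noise).sum(axis=0) / (2 * sigma * K)

# Step 3: remove the component of g_es along the surrogate direction.
g_hat = g_s / (np.linalg.norm(g_s) + eps)
g_orth = g_es - np.dot(g_es, g_hat) * g_hat

# Step 4: corrected gradient.
g = g_s + lam * g_orth
```

Because `g_orth` has no component along `g_hat`, adding it leaves the step along the surrogate direction unchanged and only adds movement in directions the surrogate does not cover.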

Self-normalising `λ`

With a raw λ the correction magnitude λ·‖g_orth‖ depends on the ES smoothing σ and sample count K — when ‖g_orth‖ ≫ ‖g_s‖ (common with a high-variance estimate) even a modest λ lets the correction swamp the bulk direction and hurt. Passing normalize=True reinterprets λ as a dimensionless fraction of the surrogate step: the correction is rescaled by λ · ‖g_s‖ / ‖g_orth‖ so that ‖applied correction‖ = λ · ‖g_s‖ exactly, regardless of the ES scale. λ = 0.2 then means "nudge the surrogate step by at most 20 % in the direction it is blind to," which transfers across regimes and keeps the ES term from ever dominating.
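The rescaling is a one-liner on flat vectors. In this NumPy sketch `normalized_correction` is a hypothetical helper, and the `eps` guard in the denominator is an assumption, since the library's exact placement of `eps` is not shown here.

```python
import numpy as np

def normalized_correction(g_s, g_orth, lam, eps=1e-8):
    # Rescale so that the applied correction has norm lam * ||g_s||.
    lam_eff = lam * np.linalg.norm(g_s) / (np.linalg.norm(g_orth) + eps)
    return lam_eff, lam_eff * g_orth

g_s = np.array([3.0, 4.0, 0.0])         # norm 5
g_orth = np.array([0.0, 0.0, 250.0])    # 50x larger than g_s

lam_eff, correction = normalized_correction(g_s, g_orth, lam=0.2)

# Raw lambda would apply a correction 10x the surrogate step ...
raw_fraction = np.linalg.norm(0.2 * g_orth) / np.linalg.norm(g_s)
# ... while the normalised one stays at the requested 20 %.
applied_fraction = np.linalg.norm(correction) / np.linalg.norm(g_s)
```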

Surrogate-steered Self-Guided ES (variance reduction)

The orthogonal correction above lives in the high-dimensional complement, so it is high variance, which is why the raw λ blows up. sges_gradient takes the dual view of Self-Guided ES (Liu et al., IJCAI 2020): instead of adding ES only in the complement, it uses the surrogate direction as SGES's guiding subspace and stratifies the sampling. The along-guide directional derivative is measured exactly (one antithetic pair, no Monte-Carlo variance) and the ES budget is spent on the orthogonal complement. The result is an unbiased estimate of the true gradient with several-fold lower variance than isotropic ES at the same budget. Where Guided ES concentrates ES inside the surrogate subspace, and the orthogonal hybrid puts it entirely outside, SGES does both: cheap-and-exact in-subspace, sampled in the complement.
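One way to realise the stratification described above is sketched below in NumPy on a toy quadratic. This is an illustration under stated assumptions (a single guiding direction, Gaussian noise projected onto the complement), not the library's `_sges_flat`; the noisy `g_s` is a made-up stand-in for a biased surrogate gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, K, sigma = 8, 16, 0.05
target = rng.normal(size=dim)
theta = rng.normal(size=dim)

def loss_true(p):
    return 0.5 * np.sum((p - target) ** 2)

g_true = theta - target
g_s = g_true + 0.5 * rng.normal(size=dim)   # stand-in biased surrogate gradient
u = g_s / np.linalg.norm(g_s)               # guiding direction

# Along-guide slope from a single antithetic pair (no sampling noise).
along = (loss_true(theta + sigma * u) - loss_true(theta - sigma * u)) / (2 * sigma)

# ES budget spent on the orthogonal complement of the guide.
noise = rng.normal(size=(K, dim))
noise -= np.outer(noise @ u, u)             # project out the guide component
diffs = np.array([loss_true(theta + sigma * e) - loss_true(theta - sigma * e) for e in noise])
g_perp = (diffs[:, None] * noise).sum(axis=0) / (2 * sigma * K)

g_sges = along * u + g_perp
```

On this quadratic the central difference along `u` equals the true directional derivative, so all Monte-Carlo noise in `g_sges` sits in the complement.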

All the linear algebra happens on the flat parameter vector via jax.flatten_util.ravel_pytree, and perturbations are applied by nnx.split → perturb-flat → nnx.merge, so the machinery is agnostic to the model's pytree structure.
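The flatten, perturb, unflatten round trip looks like this on a plain dict of arrays. The sketch uses only `ravel_pytree`; the `nnx.split` / `nnx.merge` steps are omitted and the parameter names are made up for illustration.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

# Hypothetical parameter pytree with 6 + 2 + 2 = 10 scalars.
params = {"dense": {"w": jnp.ones((3, 2)), "b": jnp.zeros(2)}, "beta": jnp.full((2,), 0.5)}

flat, unravel = ravel_pytree(params)        # 1-D vector plus its inverse
eps = jax.random.normal(jax.random.PRNGKey(0), flat.shape)
perturbed = unravel(flat + 0.01 * eps)      # same pytree structure as params
```

Norms, dot products and projections can then be taken on `flat` without caring how many leaves the model has.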

`LossFn = Callable[..., jax.Array]` `module-attribute`

(model, *batch) -> scalar loss. Surrogate losses must be differentiable through the spyx.axn surrogate spikes; true losses need only be evaluable.

`es_gradient(model, loss_true, key, *, batch=(), num_samples=8, sigma=0.01)`

Pure antithetic-NES gradient of the true loss as a param pytree.

Gradient-free: only forward evaluations of loss_true are used, so the loss may be non-differentiable (hard Heaviside spikes, hard accuracy, …). Returned grads match model's Param structure and drop into optimizer.update(model, grads). This is the "pure ES" baseline arm; it is also the term hybrid_gradient orthogonalises against the surrogate.

:param model: the Spyx / Flax NNX module whose params are perturbed.
:param loss_true: (model, *batch) -> scalar true objective.
:param key: a jax.random.PRNGKey; antithetic pairs share ε.
:param batch: extra positional args forwarded to the loss (e.g. (x, y)).
:param num_samples: number K of antithetic perturbation pairs.
:param sigma: perturbation scale σ (smoothing radius of the estimate).
:return: an nnx.State of gradients matching the model's Param pytree.

Source code in spyx/experimental/hybrid.py

def es_gradient(
    model: nnx.Module,
    loss_true: LossFn,
    key: jax.Array,
    *,
    batch: tuple[Any, ...] = (),
    num_samples: int = 8,
    sigma: float = 0.01,
) -> nnx.State:
    """Pure antithetic-NES gradient of the *true* loss as a param pytree.

    Gradient-free: only forward evaluations of ``loss_true`` are used, so the
    loss may be non-differentiable (hard Heaviside spikes, hard accuracy, …).
    Returned grads match ``model``'s ``Param`` structure and drop into
    ``optimizer.update(model, grads)``. This is the "pure ES" baseline arm; it is
    also the term :func:`hybrid_gradient` orthogonalises against the surrogate.

    :param model: the Spyx / Flax NNX module whose params are perturbed.
    :param loss_true: ``(model, *batch) -> scalar`` true objective.
    :param key: a ``jax.random.PRNGKey``; antithetic pairs share ``ε``.
    :param batch: extra positional args forwarded to the loss (e.g. ``(x, y)``).
    :param num_samples: number ``K`` of antithetic perturbation pairs.
    :param sigma: perturbation scale ``σ`` (smoothing radius of the estimate).
    :return: an ``nnx.State`` of gradients matching the model's ``Param`` pytree.
    """
    graphdef, theta, unravel, rest = _split_and_ravel(model)

    def true_loss_flat(flat):
        return loss_true(nnx.merge(graphdef, unravel(flat), rest), *batch)

    g_es = _es_flat(theta, true_loss_flat, key, num_samples=num_samples, sigma=sigma)
    return unravel(g_es)

`hybrid_diagnostics(model, loss_surrogate, loss_true, key, *, batch=(), num_samples=8, sigma=0.01, lam=1.0, eps=1e-08, normalize=False)`

Diagnostics for a hybrid step without applying it.

Returns a dict describing the correction the ES term contributes:

- cosine: ⟨g_es, ĝ_s⟩ / ‖g_es‖, the alignment of the ES estimate with the surrogate direction. Near ±1 means ES mostly re-derives the surrogate (little to correct); near 0 means ES points somewhere the surrogate is blind (the regime where hybrid should help).
- g_orth_norm: ‖g_orth‖, the magnitude of the (raw) orthogonal correction.
- g_s_norm / g_es_norm: the two source magnitudes.
- proj: the scalar projection ⟨g_es, ĝ_s⟩.
- lam_eff: the weight actually applied to g_orth (equals lam unless normalize, where it is lam · ‖g_s‖ / ‖g_orth‖).
- correction_fraction: ‖lam_eff · g_orth‖ / ‖g_s‖ (equals lam in normalize mode); how big the applied correction is next to the surrogate.
- g_s / g_es / g_orth: the flat vectors themselves.

Same signature as hybrid_gradient (minus return_diagnostics).

Source code in spyx/experimental/hybrid.py

def hybrid_diagnostics(
    model: nnx.Module,
    loss_surrogate: LossFn,
    loss_true: LossFn,
    key: jax.Array,
    *,
    batch: tuple[Any, ...] = (),
    num_samples: int = 8,
    sigma: float = 0.01,
    lam: float = 1.0,
    eps: float = 1e-8,
    normalize: bool = False,
) -> dict[str, jax.Array]:
    r"""Diagnostics for a hybrid step *without* applying it.

    Returns a dict describing the correction the ES term contributes:

    - ``cosine`` — ``⟨g_es, ĝ_s⟩ / ‖g_es‖``: alignment of the ES estimate with the
      surrogate direction. Near ``±1`` means ES mostly re-derives the surrogate
      (little to correct); near ``0`` means ES points somewhere the surrogate is
      blind (the regime where hybrid should help).
    - ``g_orth_norm`` — ``‖g_orth‖``: magnitude of the (raw) orthogonal correction.
    - ``g_s_norm`` / ``g_es_norm`` — the two source magnitudes.
    - ``proj`` — the scalar projection ``⟨g_es, ĝ_s⟩``.
    - ``lam_eff`` — the weight actually applied to ``g_orth`` (equals ``lam`` unless
      ``normalize``, where it is ``lam · ‖g_s‖ / ‖g_orth‖``).
    - ``correction_fraction`` — ``‖lam_eff · g_orth‖ / ‖g_s‖`` (equals ``lam`` in
      ``normalize`` mode); how big the applied correction is next to the surrogate.
    - ``g_s`` / ``g_es`` / ``g_orth`` — the flat vectors themselves.

    Same signature as :func:`hybrid_gradient` (minus ``return_diagnostics``).
    """
    _, _, diagnostics = _hybrid_flat(
        model,
        loss_surrogate,
        loss_true,
        key,
        batch=batch,
        num_samples=num_samples,
        sigma=sigma,
        lam=lam,
        eps=eps,
        normalize=normalize,
    )
    return diagnostics

`hybrid_gradient(model, loss_surrogate, loss_true, key, *, batch=(), num_samples=8, sigma=0.01, lam=1.0, eps=1e-08, normalize=False, return_diagnostics=False)`

Surrogate gradient corrected by orthogonalised evolutionary strategies.

Computes g = g_s + λ · g_orth (see the module docstring for the full derivation), where g_s is the surrogate gradient and g_orth is the antithetic-NES estimate of the true gradient with its surrogate-aligned component projected out. The returned grads match model's Param pytree, so:

grads = hybrid_gradient(model, loss_surrogate, loss_true, key, batch=(x, y))
optimizer.update(model, grads)

:param model: the Spyx / Flax NNX module to differentiate.
:param loss_surrogate: differentiable (model, *batch) -> scalar (surrogate spikes). Supplies the cheap biased bulk direction g_s.
:param loss_true: (model, *batch) -> scalar true objective (may be non-differentiable / hard-spike); evaluated only in the forward pass.
:param key: a jax.random.PRNGKey for the ES perturbations.
:param batch: extra positional args forwarded to both losses (e.g. (x, y)).
:param num_samples: number K of antithetic perturbation pairs.
:param sigma: ES perturbation scale σ.
:param lam: weight λ on the orthogonal ES correction. λ = 0 recovers pure surrogate descent. With normalize=True it is a dimensionless fraction of the surrogate step rather than a raw scale.
:param eps: numerical floor for the normalisation of g_s.
:param normalize: if True, self-normalise the correction so its magnitude is exactly λ · ‖g_s‖; the ES term becomes a bounded fraction of the surrogate step, immune to the ES variance/σ scaling that otherwise lets ‖g_orth‖ swamp ‖g_s‖. Recommended when tuning λ across regimes.
:param return_diagnostics: if True, also return the diagnostics dict from hybrid_diagnostics (cosine, g_orth_norm, lam_eff, correction_fraction, the flat vectors, …).
:return: an nnx.State of grads, or (grads, diagnostics) if return_diagnostics.

Source code in spyx/experimental/hybrid.py

def hybrid_gradient(
    model: nnx.Module,
    loss_surrogate: LossFn,
    loss_true: LossFn,
    key: jax.Array,
    *,
    batch: tuple[Any, ...] = (),
    num_samples: int = 8,
    sigma: float = 0.01,
    lam: float = 1.0,
    eps: float = 1e-8,
    normalize: bool = False,
    return_diagnostics: bool = False,
):
    r"""Surrogate gradient corrected by orthogonalised evolutionary strategies.

    Computes ``g = g_s + λ · g_orth`` (see the module docstring for the full
    derivation), where ``g_s`` is the surrogate gradient and ``g_orth`` is the
    antithetic-NES estimate of the *true* gradient with its surrogate-aligned
    component projected out. The returned grads match ``model``'s ``Param``
    pytree, so::

        grads = hybrid_gradient(model, loss_surrogate, loss_true, key, batch=(x, y))
        optimizer.update(model, grads)

    :param model: the Spyx / Flax NNX module to differentiate.
    :param loss_surrogate: differentiable ``(model, *batch) -> scalar`` (surrogate
        spikes). Supplies the cheap biased bulk direction ``g_s``.
    :param loss_true: ``(model, *batch) -> scalar`` true objective (may be
        non-differentiable / hard-spike); evaluated only in the forward pass.
    :param key: a ``jax.random.PRNGKey`` for the ES perturbations.
    :param batch: extra positional args forwarded to both losses (e.g. ``(x, y)``).
    :param num_samples: number ``K`` of antithetic perturbation pairs.
    :param sigma: ES perturbation scale ``σ``.
    :param lam: weight ``λ`` on the orthogonal ES correction. ``λ = 0`` recovers
        pure surrogate descent. With ``normalize=True`` it is a dimensionless
        *fraction of the surrogate step* rather than a raw scale.
    :param eps: numerical floor for the normalisation of ``g_s``.
    :param normalize: if ``True``, self-normalise the correction so its magnitude
        is exactly ``λ · ‖g_s‖`` — the ES term becomes a bounded fraction of the
        surrogate step, immune to the ES variance/``σ`` scaling that otherwise lets
        ``‖g_orth‖`` swamp ``‖g_s‖``. Recommended when tuning ``λ`` across regimes.
    :param return_diagnostics: if ``True`` also return the diagnostics dict from
        :func:`hybrid_diagnostics` (``cosine``, ``g_orth_norm``, ``lam_eff``,
        ``correction_fraction``, the flat vectors, …).
    :return: an ``nnx.State`` of grads, or ``(grads, diagnostics)`` if
        ``return_diagnostics``.
    """
    unravel, g, diagnostics = _hybrid_flat(
        model,
        loss_surrogate,
        loss_true,
        key,
        batch=batch,
        num_samples=num_samples,
        sigma=sigma,
        lam=lam,
        eps=eps,
        normalize=normalize,
    )
    grads = unravel(g)
    if return_diagnostics:
        return grads, diagnostics
    return grads

`make_hybrid_train_step(loss_surrogate, loss_true, *, num_samples=8, sigma=0.01, lam=1.0, normalize=False)`

Build a single-step hybrid updater.

The returned callable has signature (model, optimizer, key, *batch) -> true_loss and mutates model / optimizer in place via NNX, mirroring spyx.optimize.make_train_step but using hybrid_gradient to build the update. The scalar returned is loss_true evaluated at the pre-update parameters (the objective the ES term actually targets).

:param loss_surrogate: differentiable (model, *batch) -> scalar.
:param loss_true: (model, *batch) -> scalar true objective.
:param num_samples: number K of antithetic perturbation pairs.
:param sigma: ES perturbation scale σ.
:param lam: weight λ on the orthogonal ES correction.
:param normalize: self-normalise the correction to λ · ‖g_s‖ (see hybrid_gradient).
:return: step(model, optimizer, key, *batch) -> true_loss.

Source code in spyx/experimental/hybrid.py

def make_hybrid_train_step(
    loss_surrogate: LossFn,
    loss_true: LossFn,
    *,
    num_samples: int = 8,
    sigma: float = 0.01,
    lam: float = 1.0,
    normalize: bool = False,
) -> Callable[..., jax.Array]:
    r"""Build a single-step hybrid updater.

    The returned callable has signature ``(model, optimizer, key, *batch) ->
    true_loss`` and mutates ``model`` / ``optimizer`` in place via NNX, mirroring
    :func:`spyx.optimize.make_train_step` but using :func:`hybrid_gradient` to
    build the update. The scalar returned is ``loss_true`` evaluated at the
    *pre-update* parameters (the objective the ES term actually targets).

    :param loss_surrogate: differentiable ``(model, *batch) -> scalar``.
    :param loss_true: ``(model, *batch) -> scalar`` true objective.
    :param num_samples: number ``K`` of antithetic perturbation pairs.
    :param sigma: ES perturbation scale ``σ``.
    :param lam: weight ``λ`` on the orthogonal ES correction.
    :param normalize: self-normalise the correction to ``λ · ‖g_s‖`` (see
        :func:`hybrid_gradient`).
    :return: ``step(model, optimizer, key, *batch) -> true_loss``.
    """

    def step(model, optimizer, key, *batch):
        grads = hybrid_gradient(
            model,
            loss_surrogate,
            loss_true,
            key,
            batch=tuple(batch),
            num_samples=num_samples,
            sigma=sigma,
            lam=lam,
            normalize=normalize,
        )
        loss = loss_true(model, *batch)
        optimizer.update(model, grads)
        return loss

    return step
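To make the num_samples, sigma, lam and normalize parameters concrete, here is a minimal NumPy sketch of the kind of update the step builds. It is an illustration under assumptions, not the library's code: hybrid_gradient is documented elsewhere, and the helper names es_gradient_sketch and hybrid_gradient_sketch, the quadratic stand-in loss, and the hand-picked surrogate gradient are all hypothetical.

```python
import numpy as np

def es_gradient_sketch(f, theta, rng, num_samples=8, sigma=0.01):
    """Antithetic ES estimate of grad f(theta) using forward evaluations only."""
    g = np.zeros_like(theta)
    for _ in range(num_samples):
        eps = rng.standard_normal(theta.shape)
        # One antithetic pair: central difference along the direction eps.
        slope = (f(theta + sigma * eps) - f(theta - sigma * eps)) / (2.0 * sigma)
        g += slope * eps
    return g / num_samples

def hybrid_gradient_sketch(g_s, g_es, lam=1.0, normalize=False, tiny=1e-8):
    """Add only the part of g_es that is orthogonal to the surrogate gradient g_s."""
    u = g_s / (np.linalg.norm(g_s) + tiny)
    g_perp = g_es - np.dot(g_es, u) * u
    if normalize:
        # Assumed reading of `normalize`: rescale the correction to norm ||g_s||.
        g_perp = g_perp * np.linalg.norm(g_s) / (np.linalg.norm(g_perp) + tiny)
    return g_s + lam * g_perp

def true_loss(t):
    # Hypothetical stand-in for loss_true.
    return float(np.sum(t ** 2))

rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0, 0.5, 3.0])
g_s = np.array([1.0, 0.0, 0.0, 0.0])   # hypothetical surrogate gradient
g_es = es_gradient_sketch(true_loss, theta, rng)
g = hybrid_gradient_sketch(g_s, g_es, lam=0.5, normalize=True)
```

In this sketch the added correction g - g_s is orthogonal to g_s in both modes, and with normalize=True its norm is close to λ · ‖g_s‖.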

`make_sges_hybrid_train_step(loss_surrogate, loss_true, *, num_samples=8, sigma=0.01, lam=1.0)`

Single-step updater using :func:sges_gradient (surrogate-steered SGES).

step(model, optimizer, key, *batch) -> true_loss — mirrors :func:make_hybrid_train_step but builds the update with the variance-reduced Self-Guided-ES gradient. Returns loss_true at the pre-update parameters.

Source code in spyx/experimental/hybrid.py

def make_sges_hybrid_train_step(
    loss_surrogate: LossFn,
    loss_true: LossFn,
    *,
    num_samples: int = 8,
    sigma: float = 0.01,
    lam: float = 1.0,
) -> Callable[..., jax.Array]:
    r"""Single-step updater using :func:`sges_gradient` (surrogate-steered SGES).

    ``step(model, optimizer, key, *batch) -> true_loss`` — mirrors
    :func:`make_hybrid_train_step` but builds the update with the variance-reduced
    Self-Guided-ES gradient. Returns ``loss_true`` at the pre-update parameters.
    """

    def step(model, optimizer, key, *batch):
        grads = sges_gradient(
            model,
            loss_surrogate,
            loss_true,
            key,
            batch=tuple(batch),
            num_samples=num_samples,
            sigma=sigma,
            lam=lam,
        )
        loss = loss_true(model, *batch)
        optimizer.update(model, grads)
        return loss

    return step

`sges_gradient(model, loss_surrogate, loss_true, key, *, batch=(), num_samples=8, sigma=0.01, lam=1.0, eps=1e-08, return_diagnostics=False)`

Surrogate-steered Self-Guided ES — a variance-reduced 0+1 gradient.

The surrogate gradient g_s steers a Self-Guided-ES estimate g_es of the true (hard-spike) loss gradient: g_s picks the guiding direction, the component along that guide is measured exactly with one antithetic pair, and ES spends the remaining K-1 pairs on the orthogonal complement (see :func:_sges_flat). Returns g = (1-λ)·g_s + λ·g_es: λ=1 descends on the variance-reduced true-gradient estimate (the surrogate only steers sampling), and λ=0 recovers pure surrogate descent.

This is the "0+1" method in its variance-reduced form: the 1st-order surrogate steers where the 0th-order ES samples land, buying Self-Guided ES's variance reduction (Liu et al., IJCAI 2020) with a surrogate-defined guiding subspace. Unlike :func:hybrid_gradient (which adds ES only in the orthogonal complement), ES here also corrects the surrogate's magnitude/sign along its own direction, via the exact directional derivative a.

:param model: the Spyx / Flax NNX module to differentiate.
:param loss_surrogate: differentiable (model, *batch) -> scalar (the guide).
:param loss_true: (model, *batch) -> scalar true objective (forward-only).
:param key: jax.random.PRNGKey for the orthogonal ES perturbations.
:param batch: extra positional args forwarded to both losses.
:param num_samples: total antithetic pairs K (1 along-guide, K-1 orth).
:param sigma: ES perturbation scale σ.
:param lam: blend g = (1-λ)g_s + λ g_es; λ=1 = pure SGES estimate.
:param eps: numerical floor for normalisation.
:param return_diagnostics: also return the diagnostics dict.
:return: an nnx.State of grads, or (grads, diagnostics).

Source code in spyx/experimental/hybrid.py

def sges_gradient(
    model: nnx.Module,
    loss_surrogate: LossFn,
    loss_true: LossFn,
    key: jax.Array,
    *,
    batch: tuple[Any, ...] = (),
    num_samples: int = 8,
    sigma: float = 0.01,
    lam: float = 1.0,
    eps: float = 1e-8,
    return_diagnostics: bool = False,
):
    r"""Surrogate-steered Self-Guided ES — a variance-reduced 0+1 gradient.

    The surrogate gradient ``g_s`` *steers* a Self-Guided-ES estimate ``g_es`` of
    the true (hard-spike) loss gradient: ``g_s`` picks the guiding direction, its
    along-guide component is measured exactly, and ES spends its whole budget on
    the orthogonal complement (see :func:`_sges_flat`). Returns
    ``g = (1-λ)·g_s + λ·g_es`` — ``λ=1`` descends on the variance-reduced true-
    gradient estimate (the surrogate only steers sampling), ``λ=0`` recovers pure
    surrogate descent.

    This is the "0+1" method in its variance-reduced form: the 1st-order surrogate
    steers where the 0th-order ES samples land, buying Self-Guided ES's variance
    reduction (Liu et al., IJCAI 2020) with a surrogate-defined guiding subspace.
    Unlike :func:`hybrid_gradient` (which adds ES *only* in the orthogonal
    complement), ES here also corrects the surrogate's magnitude/sign *along* its
    own direction, via the exact directional derivative ``a``.

    :param model: the Spyx / Flax NNX module to differentiate.
    :param loss_surrogate: differentiable ``(model, *batch) -> scalar`` (the guide).
    :param loss_true: ``(model, *batch) -> scalar`` true objective (forward-only).
    :param key: ``jax.random.PRNGKey`` for the orthogonal ES perturbations.
    :param batch: extra positional args forwarded to both losses.
    :param num_samples: total antithetic pairs ``K`` (1 along-guide, ``K-1`` orth).
    :param sigma: ES perturbation scale ``σ``.
    :param lam: blend ``g = (1-λ)g_s + λ g_es``; ``λ=1`` = pure SGES estimate.
    :param eps: numerical floor for normalisation.
    :param return_diagnostics: also return the diagnostics dict.
    :return: an ``nnx.State`` of grads, or ``(grads, diagnostics)``.
    """
    graphdef, theta, unravel, rest = _split_and_ravel(model)

    def surrogate_loss_flat(flat):
        return loss_surrogate(nnx.merge(graphdef, unravel(flat), rest), *batch)

    def true_loss_flat(flat):
        return loss_true(nnx.merge(graphdef, unravel(flat), rest), *batch)

    g_s = jax.grad(surrogate_loss_flat)(theta)
    g_es, a, u, g_orth = _sges_flat(
        theta, true_loss_flat, g_s, key, num_samples=num_samples, sigma=sigma, eps=eps
    )
    g = (1.0 - lam) * g_s + lam * g_es
    grads = unravel(g)
    if return_diagnostics:
        g_s_norm = jnp.linalg.norm(g_s)
        g_es_norm = jnp.linalg.norm(g_es)
        diagnostics = {
            "along_guide_deriv": a,  # exact ⟨∇f_true, û_s⟩ (0th-order, 1 pair)
            "surrogate_along_guide": jnp.dot(g_s, u),  # the surrogate's own magnitude
            "g_orth_norm": jnp.linalg.norm(g_orth),
            "g_s_norm": g_s_norm,
            "g_es_norm": g_es_norm,
            # cosine of the SGES true-gradient estimate with the surrogate.
            "cosine": jnp.dot(g_es, g_s) / (g_es_norm * g_s_norm + eps),
            "g_s": g_s,
            "g_es": g_es,
            "g_orth": g_orth,
        }
        return grads, diagnostics
    return grads
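The estimator described above can be sketched in NumPy as follows. This is a hedged reading of the docstring, not the contents of _sges_flat (which is not shown here): the function name sges_sketch, the way orthogonal perturbations are drawn, and the quadratic stand-in loss are assumptions made for illustration.

```python
import numpy as np

def sges_sketch(f, theta, g_s, rng, num_samples=8, sigma=0.01, lam=1.0, tiny=1e-8):
    """Surrogate-steered ES: 1 pair along the guide, K-1 pairs orthogonal to it."""
    u = g_s / (np.linalg.norm(g_s) + tiny)            # guiding direction
    # Directional derivative of the true loss along the guide (one antithetic pair).
    a = (f(theta + sigma * u) - f(theta - sigma * u)) / (2.0 * sigma)
    g_orth = np.zeros_like(theta)
    for _ in range(num_samples - 1):
        eps = rng.standard_normal(theta.shape)
        eps = eps - np.dot(eps, u) * u                # project out the guide
        slope = (f(theta + sigma * eps) - f(theta - sigma * eps)) / (2.0 * sigma)
        g_orth += slope * eps
    g_orth /= max(num_samples - 1, 1)
    g_es = a * u + g_orth
    return (1.0 - lam) * g_s + lam * g_es, a, u, g_orth

def true_loss(t):
    # Hypothetical stand-in for loss_true; its gradient is 2 * t.
    return float(np.sum(t ** 2))

rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0, 0.5])
g_s = np.array([1.0, -1.0, 0.0])                      # hypothetical surrogate gradient

g_blend, a, u, g_orth = sges_sketch(true_loss, theta, g_s, rng, lam=1.0)
g_pure_surrogate, _, _, _ = sges_sketch(true_loss, theta, g_s, rng, lam=0.0)
```

For this quadratic the central difference recovers the directional derivative 2·⟨θ, u⟩ up to rounding, g_orth stays orthogonal to the guide, and lam=0.0 returns g_s unchanged.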

spyx.experimental.zoo

Runnable recipes tagged by application × training method × architecture. Each Recipe exposes build / synthetic_batch / demo on synthetic data; browse with list_recipes(application=..., method=...).

The Spyx recipe zoo — runnable, synthetic-data SNN recipes.

Experimental. This whole subpackage lives under :mod:spyx.experimental; its API may change without a deprecation cycle.

Each recipe is a self-contained, download-free example of training a spiking / state-space model for one application, tagged by method × architecture:

application	method	architecture	module
control	evolutionary	LIF-MLP	:mod:.control
classification	surrogate	RSNN	:mod:.classification
language	surrogate	S5	:mod:.language

Every recipe exposes the same small surface via a :class:Recipe record:

build(rngs) -> nnx.Module — construct the model.
synthetic_batch(...) -> tuple — sample a download-free batch.
loss(model, *batch) -> scalar — a finite objective on that batch.
demo(steps=...) -> list[float] — run a few train/evolve steps and return the fitness/loss history.

The zoo is importable as its own subpackage — from spyx.experimental.zoo import REGISTRY, list_recipes, get — without touching spyx.experimental.__init__.

`Recipe` `dataclass`

A single runnable recipe, keyed by application and tagged by method × arch.

:name: unique registry key.
:application: one of 'control', 'classification', 'language'.
:method: training method, e.g. 'evolutionary', 'surrogate', 'conversion', 'hybrid'.
:architecture: model family, e.g. 'LIF-MLP', 'RSNN', 'S5'.
:build: (nnx.Rngs) -> nnx.Module model constructor.
:synthetic_batch: (...) -> tuple download-free batch sampler; the tuple is splatted into loss after the model.
:describe: one-line human-readable description.
:loss: (model, *batch) -> scalar finite objective (fitness for evolutionary recipes, training loss for gradient recipes).
:demo: (steps=...) -> list[float] short run returning a fitness/loss history.

Source code in spyx/experimental/zoo/__init__.py

@dataclass(frozen=True)
class Recipe:
    """A single runnable recipe, keyed by application and tagged by method × arch.

    :name: unique registry key.
    :application: one of ``'control'``, ``'classification'``, ``'language'``.
    :method: training method, e.g. ``'evolutionary'``, ``'surrogate'``,
        ``'conversion'``, ``'hybrid'``.
    :architecture: model family, e.g. ``'LIF-MLP'``, ``'RSNN'``, ``'S5'``.
    :build: ``(nnx.Rngs) -> nnx.Module`` model constructor.
    :synthetic_batch: ``(...) -> tuple`` download-free batch sampler; the tuple
        is splatted into ``loss`` after the model.
    :describe: one-line human-readable description.
    :loss: ``(model, *batch) -> scalar`` finite objective (fitness for
        evolutionary recipes, training loss for gradient recipes).
    :demo: ``(steps=...) -> list[float]`` short run returning a fitness/loss
        history.
    """

    name: str
    application: str
    method: str
    architecture: str
    build: Callable[[nnx.Rngs], nnx.Module]
    synthetic_batch: Callable[..., tuple]
    describe: str
    loss: Callable[..., object]
    demo: Callable[..., list[float]]

`get(name)`

Look up a recipe by name; an unknown name raises KeyError with a message listing the valid keys.

Source code in spyx/experimental/zoo/__init__.py

def get(name: str) -> Recipe:
    """Look up a recipe by name, raising ``KeyError`` with the valid keys."""
    if name not in REGISTRY:
        raise KeyError(f"unknown recipe {name!r}; available: {sorted(REGISTRY)}")
    return REGISTRY[name]

`list_recipes(application=None, method=None)`

Return recipes, optionally filtered by application and/or method.

:application: keep only recipes with this application (None = any).
:method: keep only recipes with this method (None = any).
:return: list of matching :class:Recipe records.

Source code in spyx/experimental/zoo/__init__.py

def list_recipes(
    application: str | None = None, method: str | None = None
) -> list[Recipe]:
    """Return recipes, optionally filtered by ``application`` and/or ``method``.

    :application: keep only recipes with this application (``None`` = any).
    :method: keep only recipes with this method (``None`` = any).
    :return: list of matching :class:`Recipe` records.
    """
    return [
        recipe
        for recipe in REGISTRY.values()
        if (application is None or recipe.application == application)
        and (method is None or recipe.method == method)
    ]
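The filtering and lookup behaviour can be tried without installing anything, using a cut-down stand-in for the registry. Everything named here (MiniRecipe, MINI_REGISTRY, list_mini, get_mini, and the three recipe keys) is hypothetical; the real registry keys are not listed on this page.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MiniRecipe:
    """Stand-in for Recipe carrying only the fields used for filtering."""
    name: str
    application: str
    method: str
    architecture: str

# Hypothetical entries that mirror the application / method / architecture table.
MINI_REGISTRY = {
    r.name: r
    for r in (
        MiniRecipe("control-demo", "control", "evolutionary", "LIF-MLP"),
        MiniRecipe("classification-demo", "classification", "surrogate", "RSNN"),
        MiniRecipe("language-demo", "language", "surrogate", "S5"),
    )
}

def list_mini(application=None, method=None):
    # None means "do not filter on this field".
    return [
        r for r in MINI_REGISTRY.values()
        if (application is None or r.application == application)
        and (method is None or r.method == method)
    ]

def get_mini(name):
    if name not in MINI_REGISTRY:
        raise KeyError(f"unknown recipe {name!r}; available: {sorted(MINI_REGISTRY)}")
    return MINI_REGISTRY[name]

surrogate_names = [r.name for r in list_mini(method="surrogate")]
try:
    get_mini("missing")
    error_message = None
except KeyError as err:
    error_message = str(err)   # the message lists the valid keys
```

Both filters combine with a logical and, so list_mini(application="language", method="surrogate") would return only the S5 entry.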

spyx.experimental.matfree

Multiplication-light layers you build with, rather than convert to: ternary (BitNet) weights collapse the matmul to signed accumulations, power-of-two (DeepShift) weights to bit-shifts. Trained from scratch / QAT via straight-through estimators. See Training methods for where this sits relative to post-training quantization (spyx.quant).

Matmul-free linear primitives — ternary (BitNet) and shift-add (DeepShift).

.. note:: Experimental / sketch. Unstable API. These are the native (train-from-scratch, QAT) counterpart to the post-training :func:spyx.quant.bitnet_ternary_rules path: layers whose forward pass replaces the expensive multiplies of a dense matmul with cheap accumulations (ternary) or bit-shifts (power-of-two), so you can build multiplication-light architectures rather than convert them.

The multiply-free idea

A dense layer y = x @ W costs in*out multiplies. Two ways to remove them:

Ternary (BitNet b1.58; Scalable MatMul-free LM, Zhu et al. 2024). Constrain W to {-1, 0, +1} times a per-tensor scale β. Then y = β · (Σ_{W=+1} x − Σ_{W=-1} x) — pure signed accumulation, plus one scale multiply per output.
Shift-add (DeepShift, Elhoushi et al. 2021; ShiftAddLLM, You et al. 2024). Constrain W to signed powers of two ±2^p. Then W·x = ± (x << p) — a bit-shift and a sign, no multiply, on fixed-point hardware.

Both are trained with a straight-through estimator (STE): the forward uses the quantised weight, the backward flows to a full-precision shadow weight.
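Both identities above can be checked on a single output column with integer activations. This is a small NumPy sketch with made-up numbers; the exponents are chosen non-negative so that plain left shifts apply (the layer defaults allow negative exponents, which correspond to right shifts on fixed-point hardware).

```python
import numpy as np

x = np.array([3, -1, 4, 2], dtype=np.int64)      # fixed-point activations

# Ternary: weights in {-1, 0, +1} plus one per-tensor scale beta.
w_t = np.array([1, 0, -1, 1])
beta = 0.5
# Signed accumulation: add inputs under +1 weights, subtract those under -1.
y_accumulate = beta * (x[w_t == 1].sum() - x[w_t == -1].sum())
y_matmul = float(x @ (beta * w_t))               # dense reference

# Shift-add: weights are sign * 2**p, so each product is a shift and a sign.
sign = np.array([1, -1, 1, -1])
p = np.array([0, 2, 1, 3])
y_shift = int(np.sum(sign * np.left_shift(x, p)))
y_dense = int(x @ (sign * 2 ** p))               # dense reference
```

The accumulation and shift forms give the same numbers as the dense products (0.5 and -1 here), with the only true multiply being the single scale by beta.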

The spiking synthesis

Spyx's real leverage here: a binary spike activation (s ∈ {0,1}, from any :mod:spyx.nn neuron) times a ternary weight (∈ {-1,0,+1}) is a fully add-only operation — no multiplies anywhere in the layer. Pair these layers with spiking neurons (or feed :func:spyx.experimental.compress.pack_spikes outputs) to get networks that are matmul-free in both operands. See the roadmap in the module for the matmul-free-LM block (ternary channel-mixer + an SSM / ternary-GRU token mixer) that this is the substrate for.
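As a quick check of the add-only claim, the sketch below evaluates a binary-spike times ternary-weight layer by counting, then compares against the dense product. The shapes and random draws are arbitrary illustration values, and NumPy stands in for JAX.

```python
import numpy as np

rng = np.random.default_rng(0)
spikes = rng.integers(0, 2, size=6)              # s in {0, 1}
w_ternary = rng.integers(-1, 2, size=(6, 3))     # W in {-1, 0, +1}

# Select the weight rows of the inputs that spiked (indexing, not multiplying).
active = w_ternary[spikes == 1]
# Each output is (number of +1 weights) minus (number of -1 weights) among them.
y_add_only = (active == 1).sum(axis=0) - (active == -1).sum(axis=0)

y_reference = spikes @ w_ternary                 # dense reference
```

The per-tensor scale β from the ternary layer would still be applied once per output afterwards.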

`MLGRU`

Bases: Module

MatMul-free Linear GRU token mixer (Zhu et al., 2024).

The multiply-free replacement for attention: instead of an O(T²) QKᵀ matmul, tokens are mixed by a causal element-wise linear recurrence

.. math::
    h_t = f_t \odot h_{t-1} + (1 - f_t) \odot c_t,\qquad y_t = W_o (g_t \odot h_t)

where the gate/candidate projections (f, c, g) and the output o are :class:TernaryLinear (accumulation-only) and everything else is element-wise. The recurrence is a first-order linear scan — parallelisable with jax.lax.associative_scan the same way :class:spyx.experimental.PSU_LIF is; here it uses jax.lax.scan for clarity.

Input/output are batch-major [B, T, D]; the mixing is strictly causal.

Source code in spyx/experimental/matfree.py

class MLGRU(nnx.Module):
    r"""MatMul-free Linear GRU token mixer (Zhu et al., 2024).

    The multiply-free replacement for attention: instead of an ``O(T²)`` ``QKᵀ``
    matmul, tokens are mixed by a **causal element-wise linear recurrence**

    .. math::
        h_t = f_t \odot h_{t-1} + (1 - f_t) \odot c_t,\qquad y_t = W_o (g_t \odot h_t)

    where the gate/candidate projections (``f``, ``c``, ``g``) and the output ``o``
    are :class:`TernaryLinear` (accumulation-only) and everything else is
    element-wise. The recurrence is a first-order linear scan — parallelisable with
    ``jax.lax.associative_scan`` the same way :class:`spyx.experimental.PSU_LIF` is;
    here it uses ``jax.lax.scan`` for clarity.

    Input/output are batch-major ``[B, T, D]``; the mixing is strictly causal.
    """

    def __init__(self, dim: int, hidden: int, *, rngs: nnx.Rngs, activation_bits=8):
        self.f = TernaryLinear(dim, hidden, rngs=rngs, activation_bits=activation_bits)
        self.c = TernaryLinear(dim, hidden, rngs=rngs, activation_bits=activation_bits)
        self.g = TernaryLinear(dim, hidden, rngs=rngs, activation_bits=activation_bits)
        self.o = TernaryLinear(hidden, dim, rngs=rngs, activation_bits=activation_bits)

    def __call__(self, x):
        f = jax.nn.sigmoid(self.f(x))  # forget gate      [B, T, H]
        c = jax.nn.silu(self.c(x))  # candidate state   [B, T, H]
        g = jax.nn.sigmoid(self.g(x))  # output gate       [B, T, H]
        i = (1.0 - f) * c  # input contribution

        f_t = jnp.moveaxis(f, 1, 0)  # time-major for the scan [T, B, H]
        i_t = jnp.moveaxis(i, 1, 0)

        def recur(h, fi):
            ft, it = fi
            h = ft * h + it  # element-wise linear recurrence
            return h, h

        h0 = jnp.zeros((x.shape[0], f.shape[-1]), x.dtype)
        _, h_t = jax.lax.scan(recur, h0, (f_t, i_t))
        h = jnp.moveaxis(h_t, 0, 1)  # back to [B, T, H]
        return self.o(g * h)
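The remark that the recurrence is parallelisable can be illustrated without JAX. The sketch below runs h_t = f_t · h_{t-1} + i_t sequentially and then as an inclusive scan by recursive doubling, using the associative combine (a_e, b_e) ∘ (a_l, b_l) = (a_e · a_l, a_l · b_e + b_l). The function names and test shapes are hypothetical; the same operator is what one would hand to jax.lax.associative_scan.

```python
import numpy as np

def linear_scan_sequential(f, i):
    """Reference: step through time, starting from h_0 = 0."""
    h = np.zeros_like(i[0])
    out = []
    for f_t, i_t in zip(f, i):
        h = f_t * h + i_t
        out.append(h)
    return np.stack(out)

def linear_scan_doubling(f, i):
    """Inclusive scan in about log2(T) vectorised passes (Hillis-Steele style)."""
    a, b = f.copy(), i.copy()
    shift = 1
    while shift < f.shape[0]:
        a_next, b_next = a.copy(), b.copy()
        # Combine the element `shift` steps earlier with the current one.
        b_next[shift:] = a[shift:] * b[:-shift] + b[shift:]
        a_next[shift:] = a[shift:] * a[:-shift]
        a, b = a_next, b_next
        shift *= 2
    return b                                     # equals h_t because h_0 = 0

rng = np.random.default_rng(0)
f = rng.uniform(0.1, 0.9, size=(7, 2, 3))        # forget gates, time-major [T, B, H]
c = rng.standard_normal((7, 2, 3))               # candidate states
i = (1.0 - f) * c                                # input contribution

h_seq = linear_scan_sequential(f, i)
h_par = linear_scan_doubling(f, i)
```

Both routes produce the same hidden states; the doubling form trades extra arithmetic for a dependency chain of logarithmic rather than linear length.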

`MatMulFreeBlock`

Bases: Module

A matmul-free transformer-style block: pre-norm, MLGRU mixer, ternary MLP.

x = x + MLGRU(RMSNorm(x)); x = x + TernaryMLP(RMSNorm(x)). Every dense operation is ternary (accumulation-only); the token mixing is an element-wise recurrence. Stack these for a matmul-free language model — swap it into research/new/ternary_llm in place of a Transformer block and read the efficiency off spyx.bench.

:param mlp_ratio: channel-mixer hidden width as a multiple of dim.

Source code in spyx/experimental/matfree.py

class MatMulFreeBlock(nnx.Module):
    """A matmul-free transformer-style block: pre-norm, MLGRU mixer, ternary MLP.

    ``x = x + MLGRU(RMSNorm(x));  x = x + TernaryMLP(RMSNorm(x))``. Every dense
    operation is ternary (accumulation-only); the token mixing is an element-wise
    recurrence. Stack these for a matmul-free language model — swap it into
    ``research/new/ternary_llm`` in place of a Transformer block and read the
    efficiency off ``spyx.bench``.

    :param mlp_ratio: channel-mixer hidden width as a multiple of ``dim``.
    """

    def __init__(
        self,
        dim: int,
        *,
        rngs: nnx.Rngs,
        hidden: int | None = None,
        mlp_ratio: int = 4,
        activation_bits: int | None = 8,
    ):
        self.norm1 = RMSNorm(dim, rngs=rngs)
        self.mixer = MLGRU(
            dim, hidden or dim, rngs=rngs, activation_bits=activation_bits
        )
        self.norm2 = RMSNorm(dim, rngs=rngs)
        self.mlp = TernaryMLP(
            dim, dim * mlp_ratio, rngs=rngs, activation_bits=activation_bits
        )

    def __call__(self, x):
        x = x + self.mixer(self.norm1(x))
        x = x + self.mlp(self.norm2(x))
        return x

`RMSNorm`

Bases: Module

Root-mean-square layer norm — a per-feature rescale, no matmul.

The only non-accumulation op in a matmul-free block: an element-wise normalisation (O(D) work), negligible next to a dense layer's O(D²).

Source code in spyx/experimental/matfree.py

class RMSNorm(nnx.Module):
    """Root-mean-square layer norm — a per-feature rescale, no matmul.

    The only non-accumulation op in a matmul-free block: an element-wise
    normalisation (O(D) work), negligible next to a dense layer's O(D²).
    """

    def __init__(self, dim: int, *, rngs: nnx.Rngs, eps: float = 1e-6):
        self.scale = nnx.Param(jnp.ones((dim,)))
        self.eps = eps

    def __call__(self, x):
        rms = jnp.sqrt(jnp.mean(x**2, axis=-1, keepdims=True) + self.eps)
        return (x / rms) * self.scale[...]
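A NumPy restatement of the same computation shows what the normalisation does to each feature vector. The function name rms_norm and the sample inputs are illustrative only.

```python
import numpy as np

def rms_norm(x, scale, eps=1e-6):
    # Divide each feature vector by its root-mean-square, then rescale per feature.
    rms = np.sqrt(np.mean(x ** 2, axis=-1, keepdims=True) + eps)
    return (x / rms) * scale

x = np.array([[3.0, 4.0], [0.5, -0.5]])
y = rms_norm(x, scale=np.ones(2))
row_rms = np.sqrt(np.mean(y ** 2, axis=-1))      # close to 1 for every row
```

With a unit scale, every row comes out with root-mean-square close to 1 regardless of its input magnitude.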

`ShiftAddLinear`

Bases: Module

Dense layer with signed-power-of-two weights — DeepShift / ShiftAdd.

Forward: y = x @ W_po2 + b with W_po2 = ±2^p; each product is a shift on fixed-point hardware. Trained via STE through a full-precision shadow weight.

:param min_exp/max_exp: clamp range for the exponents p.

Source code in spyx/experimental/matfree.py

class ShiftAddLinear(nnx.Module):
    """Dense layer with signed-power-of-two weights — DeepShift / ShiftAdd.

    Forward: ``y = x @ W_po2 + b`` with ``W_po2 = ±2^p``; each product is a shift on
    fixed-point hardware. Trained via STE through a full-precision shadow weight.

    :param min_exp/max_exp: clamp range for the exponents ``p``.
    """

    def __init__(
        self,
        in_features: int,
        out_features: int,
        *,
        rngs: nnx.Rngs,
        use_bias: bool = False,
        min_exp: int = -8,
        max_exp: int = 0,
    ):
        self.w = nnx.Param(
            nnx.initializers.lecun_normal()(rngs.params(), (in_features, out_features))
        )
        self.bias = nnx.Param(jnp.zeros((out_features,))) if use_bias else None
        self.min_exp = min_exp
        self.max_exp = max_exp

    def __call__(self, x):
        w = self.w[...]
        w_q = ste(w, power_of_two_weights(w, self.min_exp, self.max_exp))
        y = x @ w_q
        if self.bias is not None:
            y = y + self.bias[...]
        return y

`TernaryLinear`

Bases: Module

Dense layer with ternary {-1,0,+1} weights — a BitNet BitLinear.

Forward: y = β · (x_q @ W_ternary) + b. The x_q @ W_ternary product is accumulation-only. With activation_bits set, activations are absmax-quantised first (BitNet b1.58 + a8). Trained via STE through a full-precision shadow weight.

:param activation_bits: if set, quantise inputs to this many bits (e.g. 8).

Source code in spyx/experimental/matfree.py

class TernaryLinear(nnx.Module):
    """Dense layer with ternary ``{-1,0,+1}`` weights — a BitNet ``BitLinear``.

    Forward: ``y = β · (x_q @ W_ternary) + b``. The ``x_q @ W_ternary`` product is
    accumulation-only. With ``activation_bits`` set, activations are absmax-quantised
    first (BitNet b1.58 + a8). Trained via STE through a full-precision shadow weight.

    :param activation_bits: if set, quantise inputs to this many bits (e.g. 8).
    """

    def __init__(
        self,
        in_features: int,
        out_features: int,
        *,
        rngs: nnx.Rngs,
        use_bias: bool = False,
        activation_bits: int | None = None,
    ):
        self.w = nnx.Param(
            nnx.initializers.lecun_normal()(rngs.params(), (in_features, out_features))
        )
        self.bias = nnx.Param(jnp.zeros((out_features,))) if use_bias else None
        self.activation_bits = activation_bits

    def __call__(self, x):
        if self.activation_bits is not None:
            x = ste(x, activation_quant(x, self.activation_bits))
        w = self.w[...]
        w_ternary, scale = ternary_weights(w)
        w_q = ste(w, w_ternary)  # forward ternary, backward full-precision
        y = (x @ w_q) * scale  # x @ w_ternary is signed accumulation
        if self.bias is not None:
            y = y + self.bias[...]
        return y

`TernaryMLP`

Bases: Module

A matmul-free channel mixer: two :class:TernaryLinear with a nonlinearity.

The multiply-free counterpart of a Transformer/SSM feed-forward block; drop it in wherever a dense MLP sits. Pair with a matmul-free token mixer (an SSM from :mod:spyx.ssm, or a ternary GRU — see the module roadmap) for a matmul-free LM.

Source code in spyx/experimental/matfree.py

class TernaryMLP(nnx.Module):
    """A matmul-free channel mixer: two :class:`TernaryLinear` with a nonlinearity.

    The multiply-free counterpart of a Transformer/SSM feed-forward block; drop it in
    wherever a dense MLP sits. Pair with a matmul-free *token* mixer (an SSM from
    :mod:`spyx.ssm`, or a ternary GRU — see the module roadmap) for a matmul-free LM.
    """

    def __init__(
        self,
        features: int,
        hidden: int,
        *,
        rngs: nnx.Rngs,
        activation_bits: int | None = 8,
    ):
        self.up = TernaryLinear(
            features, hidden, rngs=rngs, activation_bits=activation_bits
        )
        self.down = TernaryLinear(
            hidden, features, rngs=rngs, activation_bits=activation_bits
        )

    def __call__(self, x):
        return self.down(jax.nn.gelu(self.up(x)))

`activation_quant(x, bits=8, eps=1e-05)`

Per-token absmax quantisation of activations to a symmetric grid of the given bits width (BitNet's a8).

Returns the dequantised value (STE-friendly); pass through :func:ste to train.

Source code in spyx/experimental/matfree.py

def activation_quant(x: jax.Array, bits: int = 8, eps: float = 1e-5) -> jax.Array:
    """Per-token absmax quantisation of activations to ``bits`` (BitNet's a8).

    Returns the dequantised value (STE-friendly); pass through :func:`ste` to train.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = qmax / (jnp.max(jnp.abs(x), axis=-1, keepdims=True) + eps)
    return jnp.round(jnp.clip(x * scale, -qmax, qmax)) / scale
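The sketch below mirrors the quantiser in NumPy so the per-token behaviour can be inspected. The name activation_quant_np and the sample inputs are assumptions for illustration; unlike the library function, it also returns the scale for inspection.

```python
import numpy as np

def activation_quant_np(x, bits=8, eps=1e-5):
    qmax = 2 ** (bits - 1) - 1                   # 127 for 8 bits
    # One scale per token (last axis), set by that token's largest magnitude.
    scale = qmax / (np.max(np.abs(x), axis=-1, keepdims=True) + eps)
    x_q = np.round(np.clip(x * scale, -qmax, qmax)) / scale
    return x_q, scale

x = np.array([[1.0, -0.5, 0.25], [10.0, 0.1, -3.0]])
x_q, scale = activation_quant_np(x)

codes = np.round(x_q * scale)                    # the integer levels actually used
max_err_in_steps = np.max(np.abs(x_q - x) * scale)
```

Each row gets its own scale, so a token with large activations does not coarsen the grid of a token with small ones, and the rounding error stays within half a quantisation step.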

`power_of_two_weights(w, min_exp=-8, max_exp=0, eps=1e-12)`

Round each weight to the nearest signed power of two ±2^p (DeepShift).

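p is clamped to [min_exp, max_exp]; the result multiplies as a bit-shift on fixed-point hardware. Rounding happens in the log2 domain, so near-zero weights snap to the smallest magnitude 2^min_exp, while exactly-zero weights stay zero because sign(0) = 0.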

Source code in spyx/experimental/matfree.py

def power_of_two_weights(
    w: jax.Array, min_exp: int = -8, max_exp: int = 0, eps: float = 1e-12
) -> jax.Array:
    """Round each weight to the nearest signed power of two ``±2^p`` (DeepShift).

    ``p`` is clamped to ``[min_exp, max_exp]``; the result multiplies as a bit-shift
    on fixed-point hardware. Near-zero weights round to the smallest magnitude.
    """
    sign = jnp.sign(w)
    exp = jnp.round(jnp.log2(jnp.abs(w) + eps))
    exp = jnp.clip(exp, min_exp, max_exp)
    return sign * (2.0**exp)
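A NumPy mirror of the rounding rule makes the clamping and the zero case easy to see. The name power_of_two_np and the sample weights are illustrative.

```python
import numpy as np

def power_of_two_np(w, min_exp=-8, max_exp=0, eps=1e-12):
    sign = np.sign(w)                            # sign(0) is 0, so zeros stay zero
    exp = np.round(np.log2(np.abs(w) + eps))     # nearest exponent in the log2 domain
    exp = np.clip(exp, min_exp, max_exp)
    return sign * (2.0 ** exp)

w = np.array([0.3, -0.6, 0.0, 1.7, 0.001])
w_po2 = power_of_two_np(w)
# 0.3 -> 2**-2, -0.6 -> -2**-1, 0.0 -> 0, 1.7 -> 2**0 (clamped), 0.001 -> 2**-8 (clamped)
```

Weights above 2^max_exp saturate at the top of the range, which is worth keeping in mind when choosing max_exp for a layer whose full-precision weights exceed 1.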

`ste(x, x_q)`

Straight-through estimator: forward is x_q, backward is identity in x.

ste(x, quantize(x)) evaluates to the quantised value but passes gradients to the full-precision x unchanged — the standard trick for training through a non-differentiable quantiser.

Source code in spyx/experimental/matfree.py

def ste(x: jax.Array, x_q: jax.Array) -> jax.Array:
    """Straight-through estimator: forward is ``x_q``, backward is identity in ``x``.

    ``ste(x, quantize(x))`` evaluates to the quantised value but passes gradients to
    the full-precision ``x`` unchanged — the standard trick for training through a
    non-differentiable quantiser.
    """
    return x + jax.lax.stop_gradient(x_q - x)
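The effect of the straight-through trick can be seen in a hand-written training loop, with the backward pass spelled out instead of using autodiff. This is a pure-Python toy with a hypothetical scalar weight, target, and learning rate; it is not how the layers call ste, only the same idea in miniature.

```python
def quantize(w):
    # Non-differentiable quantiser: its true derivative is zero almost everywhere.
    return float(round(w))

target = 3.0
w = 0.2                                  # full-precision shadow weight
lr = 0.3
history = []
for _ in range(5):
    w_q = quantize(w)                    # forward pass uses the quantised value
    grad_wq = 2.0 * (w_q - target)       # d loss / d w_q for loss = (w_q - target) ** 2
    grad_w = grad_wq * 1.0               # STE: treat d w_q / d w as the identity
    w -= lr * grad_w                     # update the shadow weight
    history.append((w_q, w))

final_quantised = quantize(w)
```

With the true derivative of round the shadow weight would never move; with the straight-through gradient it climbs until the quantised value reaches the target and the gradient vanishes.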

`ternary_weights(w, eps=1e-05)`

BitNet b1.58 absmean ternarisation. Returns (w_ternary, scale).

scale = mean(|w|) + eps; w_ternary = round(clip(w/scale, -1, 1)) ∈ {-1, 0, +1}. The reconstruction is scale · w_ternary.

Source code in spyx/experimental/matfree.py

def ternary_weights(w: jax.Array, eps: float = 1e-5) -> tuple[jax.Array, jax.Array]:
    """BitNet b1.58 absmean ternarisation. Returns ``(w_ternary, scale)``.

    ``scale = mean(|w|)``; ``w_ternary = round(clip(w/scale, -1, 1)) ∈ {-1, 0, +1}``.
    The reconstruction is ``scale · w_ternary``.
    """
    scale = jnp.mean(jnp.abs(w)) + eps
    w_ternary = jnp.round(jnp.clip(w / scale, -1.0, 1.0))
    return w_ternary, scale
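A worked instance of the absmean rule, restated in NumPy. The function name ternary_np and the 2×2 weight matrix are made up for illustration.

```python
import numpy as np

def ternary_np(w, eps=1e-5):
    scale = np.mean(np.abs(w)) + eps             # absmean scale, one per tensor
    w_ternary = np.round(np.clip(w / scale, -1.0, 1.0))
    return w_ternary, scale

w = np.array([[0.9, -0.05], [-0.4, 0.2]])
w_ternary, scale = ternary_np(w)
w_hat = scale * w_ternary                        # reconstruction used in the forward pass
```

Here the scale is 0.3875 + eps, the small weight -0.05 is zeroed, and the remaining weights keep only their sign.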

spyx.experimental.onnx

Export a spiking model to ONNX — single-timestep step, or a full temporal loop.

.. warning:: Experimental — unstable API. May change without a deprecation cycle.

A spyx neuron (or a :class:spyx.nn.Sequential of them) implements one timestep of the temporal loop::

(x_t, state) -> (out, new_state)

:func:spyx.nn.run scans this over the time axis with jax.lax.scan. There are two useful things to hand a general runtime (ONNX Runtime, ONNX Runtime Mobile on a phone, a browser, an embedded target):

Per-timestep (sequence_length=None, the default). Export the single feed-forward step above; the application runs the temporal loop, calling the ONNX graph once per timestep and threading the neuron state (membrane potentials, adaptive thresholds, …) itself. ONNX speaks flat tensor I/O, not pytrees, so the exported signature is the flattened state::

step(x_t, state_0, state_1, ...) -> (out, new_state_0, new_state_1, ...)

Full-sequence (sequence_length=T). Export :func:spyx.nn.run over T timesteps so the whole temporal loop lives inside the ONNX graph as a native Loop op::

run(x_seq, state_0, ...) -> (out_seq, final_state_0, ...)

with x_seq shaped (T, batch, *input_shape) and out_seq shaped (T, batch, *out). jax2onnx's scan plugin lowers the jax.lax.scan driving :func:spyx.nn.run straight to an ONNX Loop, so no host-side temporal loop is needed at all — a real advantage over runtimes that lack a clean scan primitive.

:func:step_signature returns a :class:ONNXStepSignature describing the flat layout (order, shapes and dtypes of every state tensor, plus the pytree structure needed to reassemble it) so callers know how to seed state (zeros of the given shapes) and thread new_state_i back into the next call. It needs only JAX, never the conversion stack.
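For the per-timestep export, the host-side loop amounts to seeding the flat state with zeros and feeding each new_state_i back in on the next call. The sketch below shows that threading pattern with a hand-written NumPy stand-in for the exported graph; fake_step, its leak and threshold constants, and the all-ones weights are hypothetical, and a real deployment would call an ONNX Runtime session at that point instead.

```python
import numpy as np

W_IN = np.full((8, 16), 0.05, dtype=np.float32)   # hypothetical weights
W_OUT = np.full((16, 4), 0.1, dtype=np.float32)

def fake_step(x_t, v_hidden, v_out):
    """Stand-in for step(x_t, state_0, state_1) -> (out, new_state_0, new_state_1)."""
    v_hidden = 0.9 * v_hidden + x_t @ W_IN
    spikes = (v_hidden > 1.0).astype(np.float32)  # forward Heaviside spike
    v_hidden = v_hidden * (1.0 - spikes)          # reset the neurons that fired
    v_out = 0.9 * v_out + spikes @ W_OUT          # leaky readout
    return v_out, v_hidden, v_out

# Flat state layout, as in the sig.state_shapes example: seed each with zeros.
state_shapes = [(1, 16), (1, 4)]
state = [np.zeros(shape, dtype=np.float32) for shape in state_shapes]

x_seq = np.ones((100, 1, 8), dtype=np.float32)    # (T, batch, *input_shape)
outputs = []
for x_t in x_seq:                                 # host-side temporal loop
    out, *state = fake_step(x_t, *state)          # thread new_state_i into the next call
    outputs.append(out)
out_seq = np.stack(outputs)                       # (T, batch, *out)
```

The full-sequence export removes this loop from the host: the same x_seq and seeded state go in once, and out_seq plus the final state come back from a single call.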

The conversion is a direct jaxpr -> ONNX lowering via jax2onnx (https://pypi.org/project/jax2onnx/): jax2onnx.to_onnx traces the pure JAX function and emits an onnx.ModelProto, with no TensorFlow, jax2tf, TFLite, or tf2onnx in the path. Its scan plugin maps jax.lax.scan to a native ONNX Loop, which is what makes the full-sequence export a single self-contained graph.

jax2onnx (and onnx) are imported lazily inside the functions, so import spyx.experimental.onnx works without them installed. Install the conversion dependencies with::

pip install jax2onnx onnx onnxruntime

Inference only needs onnxruntime (or ONNX Runtime Mobile on-device), not the conversion stack. Only the forward Heaviside spike is exported; the surrogate gradient is training-only and irrelevant to inference.

Example::

import jax.numpy as jnp
from flax import nnx
from spyx import nn
from spyx.experimental import onnx

rngs = nnx.Rngs(0)
model = nn.Sequential(
    nnx.Linear(8, 16, rngs=rngs),
    nn.LIF((16,), rngs=rngs),
    nnx.Linear(16, 4, rngs=rngs),
    nn.LI((4,), rngs=rngs),
)

onnx_bytes = onnx.to_onnx(model, (8,), batch=1)  # per-timestep step
with open("step.onnx", "wb") as f:
    f.write(onnx_bytes)

# Or the whole temporal loop in one graph (native ONNX Loop):
seq_bytes = onnx.to_onnx(model, (8,), batch=1, sequence_length=100)

sig = onnx.step_signature(model, (8,), batch=1)
# sig.state_shapes -> [(1, 16), (1, 4)] : seed each with zeros on-device.

`ONNXStepSignature` `dataclass`

Flat tensor layout of an exported step (or full-sequence) function.

The per-timestep export has the signature step(x_t, *state_flat) -> (out, *new_state_flat); the full-sequence export has run(x_seq, *state_flat) -> (out_seq, *final_state_flat) where x_seq carries a leading time axis. This dataclass records everything a caller needs to drive either: how to seed the state (zeros of state_shapes / state_dtypes), the order state tensors appear as inputs and outputs, and the pytree structure to reassemble the flat state back into the model's native (possibly nested / None-holed) state tree.

:input_shape: Shape of the input tensor (including batch, and, for the full-sequence export, a leading time axis).
:input_dtype: NumPy dtype of the input.
:state_shapes: Shape of each flattened state tensor, in call order.
:state_dtypes: NumPy dtype of each flattened state tensor, in call order.
:output_shape: Shape of the primary output tensor.
:output_dtype: NumPy dtype of the primary output tensor.
:input_names: ONNX graph input names, in call order (x first, then each flat state tensor).
:output_names: ONNX graph output names, in call order (primary output first, then each flat new-/final-state tensor).
:sequence_length: None for the per-timestep export; T for the full-sequence export.
:state_treedef: The pytree structure of the model's native state, so the flat state_i tensors can be reassembled with jax.tree_util.tree_unflatten(state_treedef, state_flat).

Source code in spyx/experimental/onnx.py

@dataclass
class ONNXStepSignature:
    """Flat tensor layout of an exported step (or full-sequence) function.

    The per-timestep export has the signature ``step(x_t, *state_flat) ->
    (out, *new_state_flat)``; the full-sequence export has
    ``run(x_seq, *state_flat) -> (out_seq, *final_state_flat)`` where ``x_seq``
    carries a leading time axis. This dataclass records everything a caller
    needs to drive either: how to seed the state (zeros of ``state_shapes`` /
    ``state_dtypes``), the order state tensors appear as inputs and outputs, and
    the pytree structure to reassemble the flat state back into the model's
    native (possibly nested / ``None``-holed) state tree.

    :input_shape: Shape of the input tensor (including batch, and, for the
        full-sequence export, a leading time axis).
    :input_dtype: NumPy dtype of the input.
    :state_shapes: Shape of each flattened state tensor, in call order.
    :state_dtypes: NumPy dtype of each flattened state tensor, in call order.
    :output_shape: Shape of the primary output tensor.
    :output_dtype: NumPy dtype of the primary output tensor.
    :input_names: ONNX graph input names, in call order (``x`` first, then each
        flat state tensor).
    :output_names: ONNX graph output names, in call order (primary output first,
        then each flat new-/final-state tensor).
    :sequence_length: ``None`` for the per-timestep export; ``T`` for the
        full-sequence export.
    :state_treedef: The pytree structure of the model's native state, so the
        flat ``state_i`` tensors can be reassembled with
        ``jax.tree_util.tree_unflatten(state_treedef, state_flat)``.
    """

    input_shape: tuple[int, ...]
    input_dtype: np.dtype
    state_shapes: list[tuple[int, ...]] = field(default_factory=list)
    state_dtypes: list[np.dtype] = field(default_factory=list)
    output_shape: tuple[int, ...] = ()
    output_dtype: np.dtype = None  # ty: ignore[invalid-assignment]
    input_names: list[str] = field(default_factory=list)
    output_names: list[str] = field(default_factory=list)
    sequence_length: Any = None
    state_treedef: Any = None

    @property
    def num_state(self) -> int:
        """Number of flat state tensors threaded through the step."""
        return len(self.state_shapes)

    def seed_state(self, dtype=None) -> list[np.ndarray]:
        """Return a fresh zero-initialized flat state (one array per tensor).

        :dtype: Override dtype for every state tensor; defaults to each
            tensor's recorded ``state_dtypes`` entry.
        """
        return [
            np.zeros(shape, dtype=dtype if dtype is not None else dt)
            for shape, dt in zip(self.state_shapes, self.state_dtypes, strict=True)
        ]

`num_state` `property`

Number of flat state tensors threaded through the step.

`seed_state(dtype=None)`

Return a fresh zero-initialized flat state (one array per tensor).

:dtype: Override dtype for every state tensor; defaults to each tensor's recorded state_dtypes entry.
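As a rough illustration of how seeded state might be combined with the recorded input names, the sketch below builds a name-to-array feed dictionary. The names `x`, `state_0`, `state_1` and the shapes are assumed values for a hypothetical two-layer model, and the free function `seed` only mirrors the zero-filling logic of `seed_state` so the snippet runs without Spyx installed.

```python
import numpy as np

# Hypothetical layout, such as a signature might report for a two-layer model.
input_names = ["x", "state_0", "state_1"]
state_shapes = [(1, 4), (1, 2)]
state_dtypes = [np.dtype("float32"), np.dtype("float32")]


def seed(shapes, dtypes, dtype=None):
    """Zeros per state tensor; `dtype`, if given, overrides every recorded dtype."""
    return [
        np.zeros(shape, dtype=dtype if dtype is not None else dt)
        for shape, dt in zip(shapes, dtypes)
    ]


x_t = np.ones((1, 8), dtype=np.float32)
state = seed(state_shapes, state_dtypes)

# Inputs are ordered x first, then each flat state tensor.
feeds = dict(zip(input_names, [x_t, *state]))

# Overriding the dtype applies to every state tensor at once.
half_state = seed(state_shapes, state_dtypes, dtype=np.float16)
```

A dictionary of this shape is what a runtime such as ONNX Runtime expects as its input feed, keyed by graph input name.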

Source code in spyx/experimental/onnx.py

def seed_state(self, dtype=None) -> list[np.ndarray]:
    """Return a fresh zero-initialized flat state (one array per tensor).

    :dtype: Override dtype for every state tensor; defaults to each
        tensor's recorded ``state_dtypes`` entry.
    """
    return [
        np.zeros(shape, dtype=dtype if dtype is not None else dt)
        for shape, dt in zip(self.state_shapes, self.state_dtypes, strict=True)
    ]

`step_signature(model, input_shape, *, batch=1, dtype=jnp.float32, sequence_length=None)`

Describe the flat tensor I/O of model's exported step.

Does not require jax2onnx or onnx: it only traces shapes and dtypes with JAX, so callers can plan state seeding and threading without running a conversion. See ONNXStepSignature.

:model: A spyx neuron or spyx.nn.Sequential implementing (x_t, state) -> (out, new_state) and exposing initial_state.
:input_shape: Per-timestep input feature shape, excluding batch and time (e.g. (8,) for a length-8 input vector).
:batch: Batch dimension of the exported step. Defaults to 1.
:dtype: Input/compute dtype. Defaults to jnp.float32.
:sequence_length: None (default) describes the per-timestep step; an integer T describes the full-sequence export (leading time axis).
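The relationship between the `input_shape` argument (features only) and the recorded input shape (batch and, optionally, time included) can be sketched as plain tuple arithmetic. `planned_input_shape` is a hypothetical helper, not a Spyx function, and it assumes the time axis sits directly in front of the batch axis, which is how the "leading time axis" wording above reads.

```python
def planned_input_shape(input_shape, batch=1, sequence_length=None):
    """Hypothetical helper: full tensor shape for a given feature shape."""
    step_shape = (batch, *input_shape)  # per-timestep: batch + features
    if sequence_length is None:
        return step_shape
    # Full-sequence: prepend the time axis (assumed time-major layout).
    return (sequence_length, *step_shape)


step_shape = planned_input_shape((8,))
seq_shape = planned_input_shape((8,), batch=4, sequence_length=16)
```

In practice the shape reported by `step_signature` is the authority; a helper like this is only useful for sanity-checking buffers before the signature is available.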

Source code in spyx/experimental/onnx.py

def step_signature(
    model, input_shape, *, batch=1, dtype=jnp.float32, sequence_length=None
) -> ONNXStepSignature:
    """Describe the flat tensor I/O of ``model``'s exported step.

    Does **not** require jax2onnx/onnx — it only traces shapes/dtypes with JAX,
    so callers can plan state seeding/threading without running a conversion.
    See :class:`ONNXStepSignature`.

    :model: A spyx neuron or :class:`spyx.nn.Sequential` implementing
        ``(x_t, state) -> (out, new_state)`` and exposing ``initial_state``.
    :input_shape: Per-timestep input feature shape, *excluding* batch and time
        (e.g. ``(8,)`` for a length-8 input vector).
    :batch: Batch dimension of the exported step. Defaults to 1.
    :dtype: Input/compute dtype. Defaults to ``jnp.float32``.
    :sequence_length: ``None`` (default) describes the per-timestep step;
        an integer ``T`` describes the full-sequence export (leading time axis).
    """
    _, sig = _build_fn(model, batch, input_shape, dtype, sequence_length)
    return sig

`to_onnx(model, input_shape, *, batch=1, dtype=jnp.float32, opset=None, sequence_length=None)`

Export a spiking model to ONNX and return the serialized ModelProto.

With sequence_length=None (default) this exports the single feed-forward step (x_t, state) -> (out, new_state), with no temporal scan, whose flat ONNX signature is step(x_t, *state_flat) -> (out, *new_state_flat). The application runs the temporal loop, calling the graph once per timestep and threading new_state_i back in as state_i. Pair with step_signature to learn the flat state layout and to seed zeros.

With an integer sequence_length=T this exports spyx.nn.run over T timesteps, so the ONNX graph contains the whole temporal loop as a native Loop (jax2onnx's scan plugin lowers the jax.lax.scan to it); the signature becomes run(x_seq, *state_flat) -> (out_seq, *final_state_flat) with a leading time axis of length T on x_seq and out_seq.

Conversion is a direct jaxpr -> ONNX lowering via jax2onnx.to_onnx — no TensorFlow. Only the forward Heaviside spike is exported; the surrogate gradient is training-only and irrelevant to inference.

Requires jax2onnx and onnx (pip install jax2onnx onnx onnxruntime); they are imported lazily here so importing this module does not need them. Inference only needs onnxruntime (or ONNX Runtime Mobile on a phone), not the conversion stack.

:model: A spyx neuron or spyx.nn.Sequential implementing (x_t, state) -> (out, new_state) and exposing initial_state.
:input_shape: Per-timestep input feature shape, excluding batch and time (e.g. (8,)).
:batch: Batch dimension of the exported graph. Defaults to 1.
:dtype: Input/compute dtype. Defaults to jnp.float32.
:opset: ONNX opset version to target. None defaults to 21 (recent enough for the native Loop used by the full-sequence export).
:sequence_length: None exports the per-timestep step; an integer T exports the full spyx.nn.run over T timesteps.
:return: The serialized ONNX ModelProto as bytes.
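The two export modes differ only in who owns the temporal loop. The sketch below uses NumPy stand-ins for the two graphs: `step` and `run` here are hypothetical functions with an assumed leaky-integrate-and-fire update (decay 0.9, threshold 1.0, soft reset), not the exported Spyx model. With ONNX Runtime, each call would instead be an `InferenceSession.run` on the corresponding graph.

```python
import numpy as np

BETA, THRESHOLD = 0.9, 1.0  # assumed constants for the stand-in neuron


def step(x_t, v):
    """Stand-in for the per-timestep graph: (x_t, state) -> (out, new_state)."""
    v = BETA * v + x_t
    spikes = (v > THRESHOLD).astype(x_t.dtype)  # forward Heaviside only
    v = v - spikes * THRESHOLD  # soft reset (assumed)
    return spikes, v


def run(x_seq, v):
    """Stand-in for the full-sequence graph: loops over the leading time axis."""
    outs = []
    for x_t in x_seq:
        out, v = step(x_t, v)
        outs.append(out)
    return np.stack(outs), v


rng = np.random.default_rng(0)
T, batch, features = 5, 1, 8
x_seq = rng.uniform(0.0, 1.0, size=(T, batch, features)).astype(np.float32)

# Per-timestep mode: the application loops and threads the state back in.
v = np.zeros((batch, features), dtype=np.float32)
out_steps = []
for t in range(T):
    out_t, v = step(x_seq[t], v)
    out_steps.append(out_t)
out_loop = np.stack(out_steps)

# Full-sequence mode: one call consumes the whole (T, batch, features) input.
out_seq, v_final = run(x_seq, np.zeros((batch, features), dtype=np.float32))
```

The per-timestep form suits streaming input where `T` is not known in advance; the full-sequence form fixes `T` at export time but needs only a single runtime call.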

Source code in spyx/experimental/onnx.py

def to_onnx(
    model,
    input_shape,
    *,
    batch=1,
    dtype=jnp.float32,
    opset=None,
    sequence_length=None,
) -> bytes:
    """Export a spiking model to ONNX and return the serialized ``ModelProto``.

    With ``sequence_length=None`` (default) this exports the single feed-forward
    step ``(x_t, state) -> (out, new_state)`` — no temporal scan — whose flat
    ONNX signature is ``step(x_t, *state_flat) -> (out, *new_state_flat)``. The
    application runs the temporal loop, calling the graph once per timestep and
    threading ``new_state_i`` back in as ``state_i``. Pair with
    :func:`step_signature` to learn the flat state layout and to seed zeros.

    With an integer ``sequence_length=T`` this exports :func:`spyx.nn.run` over
    ``T`` timesteps, so the ONNX graph contains the whole temporal loop as a
    native ``Loop`` (jax2onnx's scan plugin lowers the ``jax.lax.scan`` to it);
    the signature becomes ``run(x_seq, *state_flat) -> (out_seq,
    *final_state_flat)`` with a leading time axis of length ``T`` on ``x_seq``
    and ``out_seq``.

    Conversion is a direct jaxpr -> ONNX lowering via ``jax2onnx.to_onnx`` — no
    TensorFlow. Only the forward Heaviside spike is exported; the surrogate
    gradient is training-only and irrelevant to inference.

    Requires ``jax2onnx`` and ``onnx`` (``pip install jax2onnx onnx
    onnxruntime``); they are imported lazily here so importing this module does
    not need them. Inference only needs ``onnxruntime`` (or ONNX Runtime Mobile
    on a phone), not the conversion stack.

    :model: A spyx neuron or :class:`spyx.nn.Sequential` implementing
        ``(x_t, state) -> (out, new_state)`` and exposing ``initial_state``.
    :input_shape: Per-timestep input feature shape, *excluding* batch and time
        (e.g. ``(8,)``).
    :batch: Batch dimension of the exported graph. Defaults to 1.
    :dtype: Input/compute dtype. Defaults to ``jnp.float32``.
    :opset: ONNX opset version to target. ``None`` defaults to ``21`` (recent
        enough for the native ``Loop`` used by the full-sequence export).
    :sequence_length: ``None`` exports the per-timestep step; an integer ``T``
        exports the full ``spyx.nn.run`` over ``T`` timesteps.
    :return: The serialized ONNX ``ModelProto`` as ``bytes``.
    """
    try:
        import jax2onnx  # noqa: PLC0415  # ty: ignore[unresolved-import]
    except ImportError as exc:  # pragma: no cover - env-dependent
        raise ImportError(
            "spyx.experimental.onnx.to_onnx requires jax2onnx + onnx for the "
            "direct jaxpr -> ONNX conversion. Install them with "
            "`pip install jax2onnx onnx onnxruntime`. (Inference only needs "
            "onnxruntime, not the conversion stack.)"
        ) from exc

    fn, sig = _build_fn(model, batch, input_shape, dtype, sequence_length)

    opset = _DEFAULT_OPSET if opset is None else int(opset)

    # jax2onnx traces the pure JAX fn and emits an onnx.ModelProto directly. Its
    # scan plugin maps the jax.lax.scan driving spyx.nn.run to a native ONNX
    # Loop, so the full-sequence export is a single self-contained graph.
    inputs = [
        jax.ShapeDtypeStruct(sig.input_shape, dtype),
    ]
    inputs += [
        jax.ShapeDtypeStruct(shape, np.dtype(dt))
        for shape, dt in zip(sig.state_shapes, sig.state_dtypes, strict=True)
    ]

    suffix = f"seq{sequence_length}" if sequence_length is not None else "step"
    model_proto = jax2onnx.to_onnx(
        fn,
        inputs,
        model_name=f"spyx_{suffix}",
        opset=opset,
        input_names=sig.input_names,
    )

    return model_proto.SerializeToString()
