probly.layers.flax.BatchEnsembleLinear¶
- class probly.layers.flax.BatchEnsembleLinear(*args: Any, **kwargs: Any)[source]¶
Bases: Linear
BatchEnsemble Linear layer based on [WTB20].
The effective weight for ensemble member i is the Hadamard product W * (r_i s_i^T); r modulates the input features and s the output features.
- kernel¶
nnx.Param, weight matrix of the layer.
- bias¶
nnx.Param of shape [num_members, out_features] (or None if the base layer had no bias).
- in_features¶
int, number of input features.
- out_features¶
int, number of output features.
- use_bias¶
bool, whether to add bias to the output.
- num_members¶
int, number of batch ensemble members.
- r¶
nnx.Param, rank-one factor on the input features.
- s¶
nnx.Param, rank-one factor on the output features.
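The effective-weight formula above can be checked numerically. The following is a minimal NumPy sketch, not the layer's actual implementation: it assumes the kernel has shape [in_features, out_features] and that r and s have shapes [num_members, in_features] and [num_members, out_features]. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
in_features, out_features, num_members = 4, 3, 2

W = rng.normal(size=(in_features, out_features))  # shared kernel (assumed layout)
r = rng.normal(size=(num_members, in_features))   # input-side factors (assumed shape)
s = rng.normal(size=(num_members, out_features))  # output-side factors (assumed shape)
x = rng.normal(size=(5, in_features))             # a small batch for one member

for i in range(num_members):
    # Explicit per-member weight: elementwise product of W with the outer product r_i s_i^T.
    W_i = W * np.outer(r[i], s[i])
    explicit = x @ W_i
    # Equivalent factored form: scale inputs by r_i, apply the shared W, scale outputs by s_i.
    factored = ((x * r[i]) @ W) * s[i]
    assert np.allclose(explicit, factored)
```

The factored form never materializes a separate weight matrix per member, which is why only the two rank-one factors need to be stored for each one.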
Initialize a BatchEnsembleLinear layer based on a given Linear layer.
- Parameters:
base_layer – The base nnx.Linear layer to wrap.
rngs – nnx.Rngs or seed used to initialize new parameters.
num_members – Number of ensemble members.
use_base_weights – If True, share the base layer’s kernel; otherwise initialize a fresh kernel.
init – Initialization scheme for r and s: "normal" (Gaussian, ImageNet default) or "random_sign" (entries in {-1, +1}, paper Appendix B).
r_mean – Mean of the Gaussian initialization of r when init="normal".
r_std – Standard deviation of the Gaussian initialization of r when init="normal".
s_mean – Mean of the Gaussian initialization of s when init="normal".
s_std – Standard deviation of the Gaussian initialization of s when init="normal".
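The two init schemes can be pictured with a small NumPy sketch. The helper `init_factor` is hypothetical, and the mean and standard deviation values are placeholders chosen for illustration; this page does not state the layer's defaults. Drawing the signs with equal probability is likewise an assumption of the sketch.

```python
import numpy as np

def init_factor(rng, shape, init="normal", mean=1.0, std=0.1):
    """Hypothetical helper mirroring the two `init` options (placeholder mean/std)."""
    if init == "normal":
        # Gaussian entries with the given mean and standard deviation.
        return rng.normal(loc=mean, scale=std, size=shape)
    if init == "random_sign":
        # Each entry is -1 or +1 (equal probability assumed in this sketch).
        return rng.choice(np.array([-1.0, 1.0]), size=shape)
    raise ValueError(f"unknown init scheme: {init!r}")

rng = np.random.default_rng(0)
num_members, in_features, out_features = 4, 8, 3

r = init_factor(rng, (num_members, in_features), init="normal")
s = init_factor(rng, (num_members, out_features), init="random_sign")
```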
- __call__(inputs: Array, out_sharding: Any = None) Array[source]¶
Forward pass of the BatchEnsembleLinear layer.
The layer expects an input of shape [E * B, in_features], where E is the number of ensemble members and B the per-member batch size; rows [k * B, (k + 1) * B) belong to ensemble member k.
- Parameters:
inputs – jax.Array, the input of shape [E * B, in_features].
out_sharding – Optional sharding specification for the output array.
- Returns:
jax.Array, output of shape [E * B, out_features].
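The row layout can be illustrated with NumPy. This is a sketch under stated assumptions, not the layer's code: the factor shapes match the attribute descriptions, the bias is assumed to be added after the output scaling, and every name is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
E, B, in_features, out_features = 3, 2, 4, 5

W = rng.normal(size=(in_features, out_features))  # shared kernel (assumed layout)
r = rng.normal(size=(E, in_features))
s = rng.normal(size=(E, out_features))
bias = rng.normal(size=(E, out_features))

# To show the same B examples to every member, stack the batch E times.
batch = rng.normal(size=(B, in_features))
inputs = np.tile(batch, (E, 1))                   # [E * B, in_features]

# Expand per-member parameters so that row k * B + j uses member k's values.
r_rows = np.repeat(r, B, axis=0)                  # [E * B, in_features]
s_rows = np.repeat(s, B, axis=0)                  # [E * B, out_features]
b_rows = np.repeat(bias, B, axis=0)               # [E * B, out_features]

outputs = ((inputs * r_rows) @ W) * s_rows + b_rows   # [E * B, out_features]

# Rows [k * B, (k + 1) * B) hold member k's predictions.
per_member = outputs.reshape(E, B, out_features)
```

Reshaping the output to [E, B, out_features] recovers one prediction per member for each example, which can then be averaged or compared across members.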
- eval(**attributes)[source]¶
Sets the Module to evaluation mode.
eval uses set_attributes to recursively set attributes deterministic=True and use_running_average=True of all nested Modules that have these attributes. It's primarily used to control the runtime behavior of the Dropout and BatchNorm Modules.
Example:
>>> from flax import nnx
...
>>> class Block(nnx.Module):
...   def __init__(self, din, dout, *, rngs: nnx.Rngs):
...     self.linear = nnx.Linear(din, dout, rngs=rngs)
...     self.dropout = nnx.Dropout(0.5)
...     self.batch_norm = nnx.BatchNorm(10, rngs=rngs)
...
>>> block = Block(2, 5, rngs=nnx.Rngs(0))
>>> block.dropout.deterministic, block.batch_norm.use_running_average
(False, False)
>>> block.eval()
>>> block.dropout.deterministic, block.batch_norm.use_running_average
(True, True)
- Parameters:
**attributes – additional attributes passed to set_attributes.
- iter_children() Iterator[tuple[Key, Module]][source]¶
Warning: this method is deprecated; use the iter_children() function instead.
Iterates over all children Modules of the current Module. This method is similar to iter_modules(), except it only iterates over the immediate children and does not recurse further down. Alias of iter_children().
- iter_modules() Iterator[tuple[tuple[Key, ...], Module]][source]¶
Warning: this method is deprecated; use the iter_modules() function instead.
Recursively iterates over all nested Modules of the current Module, including the current Module. Alias of iter_modules().
- perturb(name: str, value: Any, variable_type: str | type[Variable[Any]] = <class 'flax.nnx.variablelib.Perturbation'>)[source]¶
Extract gradients of intermediate values during training.
Used with nnx.capture() to record intermediate values in the forward pass and their gradients in the backward pass. Returns the value plus whatever perturbation is stored under name in the current capture context, allowing gradient computation via nnx.grad.
The workflow has four steps:
1. Initialize perturbations with nnx.capture(model, nnx.Perturbation)
2. Run model with nnx.capture(model, nnx.Intermediate, init=perturbations)
3. Take gradients with respect to perturbations using nnx.grad
4. Combine results with nnx.merge_state(perturb_grads, intermediates)
Note
This creates extra variables of the same size as value and thus occupies more memory. Use it only to debug gradients in training.
Example usage:
>>> from flax import nnx
>>> import jax.numpy as jnp
>>> class Model(nnx.Module):
...   def __call__(self, x):
...     x2 = self.perturb('grad_of_x', x)
...     return 3 * x2
>>> model = Model()
>>> x = 1.0
>>> # Step 1: Initialize perturbations
>>> forward = nnx.capture(model, nnx.Perturbation)
>>> _, perturbations = forward(x)
>>> # Steps 2-4: Capture gradients
>>> def train_step(model, perturbations, x):
...   def loss(model, perturbations, x):
...     return nnx.capture(model, nnx.Intermediate, init=perturbations)(x)
...   (grads, perturb_grads), sowed = nnx.grad(loss, argnums=(0, 1), has_aux=True)(model, perturbations, x)
...   return nnx.merge_state(perturb_grads, sowed)
>>> metrics = train_step(model, perturbations, x)
>>> # metrics contains gradients of intermediate values
- Parameters:
name – A string key for storing the perturbation value.
value – The intermediate value to capture gradients for. You must use the returned value (not the original) for gradient capturing to work.
variable_type – The Variable type for the stored perturbation. Default is nnx.Perturbation.
- set_attributes(*filters: filterlib.Filter, raise_if_not_found: bool = True, graph: bool | None = None, **attributes: tp.Any) None[source]¶
Sets the attributes of nested Modules including the current Module. If the attribute is not found in the Module, it is ignored.
Example:
>>> from flax import nnx
...
>>> class Block(nnx.Module):
...   def __init__(self, din, dout, *, rngs: nnx.Rngs):
...     self.linear = nnx.Linear(din, dout, rngs=rngs)
...     self.dropout = nnx.Dropout(0.5, deterministic=False)
...     self.batch_norm = nnx.BatchNorm(10, use_running_average=False, rngs=rngs)
...
>>> block = Block(2, 5, rngs=nnx.Rngs(0))
>>> block.dropout.deterministic, block.batch_norm.use_running_average
(False, False)
>>> block.set_attributes(deterministic=True, use_running_average=True)
>>> block.dropout.deterministic, block.batch_norm.use_running_average
(True, True)
Filters can be used to set the attributes of specific Modules:
>>> block = Block(2, 5, rngs=nnx.Rngs(0))
>>> block.set_attributes(nnx.Dropout, deterministic=True)
>>> # Only the dropout will be modified
>>> block.dropout.deterministic, block.batch_norm.use_running_average
(True, False)
- Parameters:
*filters – Filters to select the Modules to set the attributes of.
raise_if_not_found – If True (default), raises a ValueError if at least one attribute instance is not found in one of the selected Modules.
**attributes – The attributes to set.
- sow(variable_type: type[Variable[B]] | str, name: str, value: A, reduce_fn: Callable[[B, A], B] = <function <lambda>>, init_fn: Callable[[], B] = <function <lambda>>) bool[source]¶
Store intermediate values during module execution for later extraction.
Used with the nnx.capture() decorator to collect intermediate values without explicitly passing containers through module calls. Values are stored under the specified name in a collection associated with variable_type.
By default, values are appended to a tuple, allowing multiple values to be tracked when the same module is called multiple times.
Example usage:
>>> from flax import nnx
>>> import jax.numpy as jnp
>>> class Model(nnx.Module):
...   def __init__(self, rngs):
...     self.linear1 = nnx.Linear(2, 3, rngs=rngs)
...     self.linear2 = nnx.Linear(3, 4, rngs=rngs)
...   def __call__(self, x):
...     x = self.linear1(x)
...     self.sow(nnx.Intermediate, 'features', x)
...     x = self.linear2(x)
...     return x
>>> # With the capture decorator, sow returns intermediates
>>> model = Model(rngs=nnx.Rngs(0))
>>> @nnx.capture(nnx.Intermediate)
... def forward(model, x):
...   return model(x)
>>> result, intermediates = forward(model, jnp.ones(2))
>>> assert 'features' in intermediates
Custom init/reduce functions can be passed to control accumulation:
>>> class Model(nnx.Module):
...   def __init__(self, rngs):
...     self.linear = nnx.Linear(2, 3, rngs=rngs)
...   def __call__(self, x):
...     x = self.linear(x)
...     self.sow(nnx.Intermediate, 'sum', x,
...              init_fn=lambda: 0,
...              reduce_fn=lambda prev, curr: prev+curr)
...     return x
- Parameters:
variable_type – The Variable type for the stored value. Typically Intermediate or a subclass is used.
name – A string key for storing the value in the collection.
value – The value to be stored.
reduce_fn – Function to combine existing and new values. Default appends to a tuple.
init_fn – Function providing the initial value for the first reduce_fn call. Default is an empty tuple.
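How init_fn and reduce_fn interact can be pictured with a plain-Python stand-in. The `accumulate` helper below is hypothetical and does not use Flax; it only mimics the accumulation rule described above, with defaults chosen to match the documented behaviour (start from an empty tuple, append each value).

```python
def accumulate(values, init_fn=lambda: (), reduce_fn=lambda prev, curr: prev + (curr,)):
    """Hypothetical stand-in for how repeated sow calls under one name combine."""
    stored = init_fn()  # initial value, used before the first reduce_fn call
    for value in values:
        stored = reduce_fn(stored, value)  # fold each new value into the stored one
    return stored

# Documented default: values are appended to a tuple.
as_tuple = accumulate([1.0, 2.0, 3.0])

# Custom functions, as in the 'sum' example: start at 0 and add each value.
as_sum = accumulate([1.0, 2.0, 3.0], init_fn=lambda: 0, reduce_fn=lambda prev, curr: prev + curr)
```

A running sum keeps the stored value the same size no matter how many times the module is called, whereas the tuple default grows with each call.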
- train(**attributes)[source]¶
Sets the Module to training mode.
train uses set_attributes to recursively set attributes deterministic=False and use_running_average=False of all nested Modules that have these attributes. It's primarily used to control the runtime behavior of the Dropout and BatchNorm Modules.
Example:
>>> from flax import nnx
...
>>> class Block(nnx.Module):
...   def __init__(self, din, dout, *, rngs: nnx.Rngs):
...     self.linear = nnx.Linear(din, dout, rngs=rngs)
...     # initialize Dropout and BatchNorm in eval mode
...     self.dropout = nnx.Dropout(0.5, deterministic=True)
...     self.batch_norm = nnx.BatchNorm(10, use_running_average=True, rngs=rngs)
...
>>> block = Block(2, 5, rngs=nnx.Rngs(0))
>>> block.dropout.deterministic, block.batch_norm.use_running_average
(True, True)
>>> block.train()
>>> block.dropout.deterministic, block.batch_norm.use_running_average
(False, False)
- Parameters:
**attributes – additional attributes passed to set_attributes.