Estimation

kriterion.fit

ModelSummary dataclass

ModelSummary(
    dof: int | float,
    chi2: float,
    chi2_p: float,
    g2: float,
    g2_p: float,
    log_likelihood: float,
    aic: float,
    bic: float,
    sse: float,
)

Goodness-of-fit statistics for a fitted model.

ATTRIBUTE DESCRIPTION
dof

Degrees of freedom.

TYPE: int | float

chi2

Pearson \(\chi^2\) statistic.

TYPE: float

chi2_p

\(p\)-value for the \(\chi^2\) statistic.

TYPE: float

g2

Likelihood-ratio \(G^2\) statistic.

TYPE: float

g2_p

\(p\)-value for the \(G^2\) statistic.

TYPE: float

log_likelihood

Log-likelihood of the fitted model.

TYPE: float

aic

Akaike Information Criterion.

TYPE: float

bic

Bayesian Information Criterion.

TYPE: float

sse

Sum of squared errors between observed and expected cumulative proportions.

TYPE: float
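
As a rough illustration of how chi2, g2 and their p-values relate, the sketch below computes both statistics from observed and expected counts using the textbook formulas. The helper names, the counts, and the choice of 2 degrees of freedom are assumptions made for this example; kriterion's internal implementation may differ.

```python
import numpy as np
from scipy import stats


def pearson_chi2(observed, expected):
    """Textbook Pearson statistic: sum((O - E)^2 / E)."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    return float(np.sum((observed - expected) ** 2 / expected))


def g_squared(observed, expected):
    """Textbook likelihood-ratio statistic: 2 * sum(O * ln(O / E))."""
    observed = np.asarray(observed, dtype=float)
    expected = np.asarray(expected, dtype=float)
    # Cells with zero observations contribute nothing to the sum.
    mask = observed > 0
    ratio = observed[mask] / expected[mask]
    return float(2.0 * np.sum(observed[mask] * np.log(ratio)))


# Made-up counts, for illustration only.
observed = [30, 50, 20]
expected = [25, 50, 25]
dof = 2  # assumed here; in practice it depends on the model's free parameters

chi2 = pearson_chi2(observed, expected)
g2 = g_squared(observed, expected)

# Upper-tail probabilities under a chi-square distribution with `dof` df.
chi2_p = stats.chi2.sf(chi2, dof)
g2_p = stats.chi2.sf(g2, dof)

print(f"chi2={chi2:.3f} (p={chi2_p:.3f}), g2={g2:.3f} (p={g2_p:.3f})")
```

The two statistics usually agree closely when expected counts are not small, which is why they are reported side by side.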

fit

fit(
    model: Model,
    objective: ObjectiveFunction = log_likelihood_objective,
    method: str = "L-BFGS-B",
) -> ModelSummary

Fit a theoretical model to observed data.

PARAMETER DESCRIPTION
model

An instance of a model subclass, e.g. an instance of SignalDetection.

TYPE: Model

objective

One of the objective functions, e.g. \(G^2\).

TYPE: ObjectiveFunction DEFAULT: log_likelihood_objective

method

The type of solver to use (see scipy.optimize.minimize). Note that some solvers are incompatible with fitting detection models.

TYPE: str DEFAULT: 'L-BFGS-B'

Source code in src/kriterion/fit.py
def fit(
    model: Model,
    objective: ObjectiveFunction = objectives.log_likelihood_objective,
    method: str = "L-BFGS-B",
) -> ModelSummary:
    """Fit a theoretical model to observed data.

    Parameters
    ----------
    model :
        An instance of a model subclass, e.g. an instance of `SignalDetection`.
    objective :
        One of the objective functions, e.g. $G^2$.
    method :
        The type of solver to use (see `scipy.optimize.minimize`). Note that some
        solvers are incompatible with fitting detection models.
    """

    # This closure wraps the steps common to every optimiser iteration.
    def _obj(x: np.ndarray) -> float:
        model.update(x)
        noise_exp, signal_exp = model.compute_expected()
        return objective(signal_exp, noise_exp, model)

    result = minimize(
        fun=_obj, x0=model.x0, bounds=model.bounds, method=method, tol=1e-8
    )

    if not result.success:
        raise Exception(
            f"Failed to fit {model.__class__.__name__} using {objective.__name__}"
        )

    model.update(result.x)

    return _calculate_all_stats(model)
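
The closure pattern in the source above (update the parameters, recompute the expected values, score them) can be tried in isolation. The following is a minimal sketch under stated assumptions: `ToyDetectionModel`, `negative_log_likelihood` and `fit_sketch` are hypothetical stand-ins written for this example, not kriterion classes or functions, and the toy model is a two-parameter equal-variance yes/no detection model.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


class ToyDetectionModel:
    """Hypothetical stand-in exposing x0, bounds, update and compute_expected."""

    def __init__(self, signal_counts, noise_counts):
        # Counts of ("yes", "no") responses on signal and on noise trials.
        self.signal_counts = np.asarray(signal_counts, dtype=float)
        self.noise_counts = np.asarray(noise_counts, dtype=float)
        self.x0 = np.array([0.5, 0.3])            # start: d', criterion
        self.bounds = [(-5.0, 5.0), (-3.0, 3.0)]  # box constraints
        self.update(self.x0)

    def update(self, x):
        self.d_prime, self.criterion = float(x[0]), float(x[1])

    def compute_expected(self):
        p_hit = norm.cdf(self.d_prime / 2 - self.criterion)
        p_fa = norm.cdf(-self.d_prime / 2 - self.criterion)
        signal_exp = self.signal_counts.sum() * np.array([p_hit, 1 - p_hit])
        noise_exp = self.noise_counts.sum() * np.array([p_fa, 1 - p_fa])
        return noise_exp, signal_exp


def negative_log_likelihood(signal_exp, noise_exp, model):
    """Assumed objective: multinomial log-likelihood (up to a constant), negated."""
    eps = 1e-12  # guards against log(0)
    p_signal = np.clip(signal_exp / signal_exp.sum(), eps, 1.0)
    p_noise = np.clip(noise_exp / noise_exp.sum(), eps, 1.0)
    ll = np.sum(model.signal_counts * np.log(p_signal))
    ll += np.sum(model.noise_counts * np.log(p_noise))
    return float(-ll)


def fit_sketch(model, objective, method="L-BFGS-B"):
    # Same shape as the closure above: update, recompute, score.
    def _obj(x):
        model.update(x)
        noise_exp, signal_exp = model.compute_expected()
        return objective(signal_exp, noise_exp, model)

    result = minimize(
        fun=_obj, x0=model.x0, bounds=model.bounds, method=method, tol=1e-8
    )
    model.update(result.x)  # leave the model at the best parameters found
    return result


toy = ToyDetectionModel(signal_counts=[80, 20], noise_counts=[20, 80])
result = fit_sketch(toy, negative_log_likelihood)
print(f"d'={toy.d_prime:.3f}, criterion={toy.criterion:.3f}")
```

With 80% hits and 20% false alarms the estimates should land near d' ≈ 1.68 and a criterion of about 0. The sketch returns the raw optimiser result and leaves the `result.success` check to the caller, whereas `fit` raises on failure.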

aic

aic(k: int, ll: float) -> float

Akaike's Information Criterion:

\[ 2k-2\ln(\hat{L}) \]

This statistic is useful for model comparisons.

PARAMETER DESCRIPTION
k

Number of estimated parameters in the model.

TYPE: int

ll

The log of the maximised value of the likelihood function for the model.

TYPE: float

Source code in src/kriterion/fit.py
def aic(k: int, ll: float) -> float:
    """Akaike's Information Criterion:

    $$
    2k-2\\ln(\\hat{L})
    $$

    This statistic is useful for model comparisons.

    Parameters
    ----------
    k :
        Number of estimated parameters in the model.
    ll :
        The log of the maximised value of the likelihood function for the model.
    """
    return float(2 * k - 2 * ll)
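
A short worked example may help with reading the statistic. The function below repeats the documented formula so the block runs on its own; the parameter counts and log-likelihoods are made-up values for two hypothetical fits to the same data.

```python
def aic(k: int, ll: float) -> float:
    # Same formula as documented above: 2k - 2 ln(L-hat).
    return float(2 * k - 2 * ll)


# Made-up fits to the same data, for illustration only.
aic_simple = aic(k=2, ll=-100.0)    # 2*2 - 2*(-100.0) = 204.0
aic_complex = aic(k=3, ll=-99.5)    # 2*3 - 2*(-99.5)  = 205.0

# Lower AIC is preferred; only differences between models are meaningful.
delta = aic_complex - aic_simple    # 1.0
print(aic_simple, aic_complex, delta)
```

Here the extra parameter raises the log-likelihood by only 0.5, which does not offset its penalty of 2, so the simpler model has the lower AIC.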

bic

bic(k: int, n: int, ll: float) -> float

Bayesian Information Criterion:

\[ k\ln(n) - 2\ln(\hat{L}) \]

This statistic is useful for model comparisons.

PARAMETER DESCRIPTION
k

Number of estimated parameters in the model.

TYPE: int

n

Total number of observations in the data.

TYPE: int

ll

The log of the maximised value of the likelihood function for the model.

TYPE: float

Source code in src/kriterion/fit.py
def bic(k: int, n: int, ll: float) -> float:
    """Bayesian Information Criterion:

    $$
    k\\ln(n) - 2\\ln(\\hat{L})
    $$

    This statistic is useful for model comparisons.

    Parameters
    ----------
    k :
        Number of estimated parameters in the model.
    n :
        Total number of observations in the data.
    ll :
        The log of the maximised value of the likelihood function for the model.
    """
    return float(k * np.log(n) - 2 * ll)
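
The only difference from AIC is the penalty term: each parameter costs ln(n) rather than 2, so BIC penalises complexity more heavily once ln(n) > 2 (that is, from n = 8 upwards). The sketch below repeats both documented formulas so it runs on its own; the values of `k`, `ll` and `n` are made up for illustration.

```python
import numpy as np


def aic(k: int, ll: float) -> float:
    return float(2 * k - 2 * ll)


def bic(k: int, n: int, ll: float) -> float:
    # Same formula as documented above: k ln(n) - 2 ln(L-hat).
    return float(k * np.log(n) - 2 * ll)


k, ll = 3, -100.0  # made-up parameter count and log-likelihood

for n in (5, 8, 200):
    # Compare the per-fit penalty terms: k * ln(n) for BIC, 2k for AIC.
    print(f"n={n:>3}  AIC={aic(k, ll):.3f}  BIC={bic(k, n, ll):.3f}")
```

For a fixed log-likelihood, BIC falls below AIC at n = 5 and exceeds it at n = 8 and n = 200, so the two criteria can rank models differently on large samples.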

compare_nested

compare_nested(
    restricted: ModelSummary, full: ModelSummary
) -> tuple[float, int | float, ndarray]

Likelihood-ratio test between two nested models.

Tests whether the additional parameters of the fuller model yield a significant improvement in fit, using the difference in \(G^2\) against a \(\chi^2\) distribution with degrees of freedom equal to the difference in parameter counts.

Assumes the two models are nested: the restricted model must be obtainable by fixing one or more of the fuller model's parameters to constants. If they are not nested, the likelihood-ratio test is invalid; compare the models by their difference in AIC or BIC instead, e.g. \(\text{AIC}_a - \text{AIC}_b\).

PARAMETER DESCRIPTION
restricted

Fit summary of the simpler (restricted) model. This model should have fewer free parameters, and therefore larger residual degrees of freedom.

TYPE: ModelSummary

full

Fit summary of the fuller model. This model should have more free parameters, and therefore smaller residual degrees of freedom.

TYPE: ModelSummary

RETURNS DESCRIPTION
tuple[float, int | float, ndarray]

(delta_g, delta_dof, p): the likelihood-ratio statistic \(\Delta G^2 = G^2_{\text{restricted}} - G^2_{\text{full}}\), the degrees of freedom \(\Delta\text{dof}\), and the \(p\)-value.

RAISES DESCRIPTION
ValueError

If full does not have more parameters than restricted (i.e. delta_dof <= 0), or if the restricted model fits better than the fuller one (delta_g < 0), which shouldn't occur for correctly nested, correctly fitted models.

Source code in src/kriterion/fit.py
def compare_nested(
    restricted: ModelSummary, full: ModelSummary
) -> tuple[float, int | float, np.ndarray]:
    """Likelihood-ratio test between two nested models.

    Tests whether the additional parameters of the fuller model yield a
    significant improvement in fit, using the difference in $G^2$ against a
    $\\chi^2$ distribution with degrees of freedom equal to the difference in
    parameter counts.

    Assumes the two models are nested: the restricted model must be obtainable
    by fixing one or more of the fuller model's parameters to constants. If
    they are not nested, the likelihood-ratio test is invalid; compare the
    models by their difference in AIC or BIC instead, e.g.
    $\\text{AIC}_a - \\text{AIC}_b$.

    Parameters
    ----------
    restricted :
        Fit summary of the simpler (restricted) model. This model should have fewer free
        parameters, and therefore larger residual degrees of freedom.
    full :
        Fit summary of the fuller model. This model should have more free parameters,
        and therefore smaller residual degrees of freedom.

    Returns
    -------
    tuple[float, int | float, np.ndarray]
        `(delta_g, delta_dof, p)`: the likelihood-ratio statistic
        $\\Delta G^2 = G^2_{\\text{restricted}} - G^2_{\\text{full}}$, the
        degrees of freedom $\\Delta\\text{dof}$, and the $p$-value.

    Raises
    ------
    ValueError
        If `full` does not have more parameters than `restricted` (i.e.
        `delta_dof <= 0`), or if the restricted model fits better than the
        fuller one (`delta_g < 0`), which shouldn't occur for correctly
        nested, correctly fitted models.
    """
    delta_g = restricted.g2 - full.g2
    delta_dof = restricted.dof - full.dof

    if delta_dof <= 0:
        raise ValueError(
            "`full` must have more parameters (smaller dof) than `restricted`"
        )

    if delta_g < 0:
        raise ValueError("restricted model fits better than full - check nesting/fit")

    p = stats.chi2.sf(delta_g, delta_dof)
    return delta_g, delta_dof, p
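
The arithmetic of the test can be followed by hand. The sketch below uses a hypothetical `FitStub` dataclass holding only the fields involved, with made-up values, rather than real ModelSummary objects produced by `fit`.

```python
from dataclasses import dataclass

from scipy import stats


@dataclass
class FitStub:
    """Hypothetical stand-in carrying only the fields used here."""

    g2: float
    dof: float
    aic: float


# Made-up summaries: the fuller model has one more free parameter,
# hence one fewer residual degree of freedom.
restricted = FitStub(g2=12.4, dof=4, aic=310.2)
full = FitStub(g2=5.1, dof=3, aic=304.9)

# Likelihood-ratio test, following the same steps as the source above.
delta_g = restricted.g2 - full.g2        # about 7.3
delta_dof = restricted.dof - full.dof    # 1
p = stats.chi2.sf(delta_g, delta_dof)    # upper tail of chi-square(1)

# Alternative when the models are not nested: difference in AIC.
delta_aic = restricted.aic - full.aic    # about 5.3, favouring `full`

print(f"delta_g={delta_g:.2f}, delta_dof={delta_dof}, p={p:.4f}")
print(f"delta_aic={delta_aic:.2f}")
```

With these numbers the p-value falls below 0.01, so the additional parameter would be judged a significant improvement at conventional thresholds.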