trustyai.explainers.BackgroundGenerator
- class trustyai.explainers.BackgroundGenerator(datapoints: ndarray | DataFrame | List[PredictionInput], feature_domains=None, seed=0)
Generate a background for the SHAP explainer via one of three algorithms:
sample: Randomly sample a set of provided points
kmeans: Summarize a set of provided points into k centroids
counterfactual: Generate a set of background points that meet certain criteria
- __init__(datapoints: ndarray | DataFrame | List[PredictionInput], feature_domains=None, seed=0)
Initialize the BackgroundGenerator.
- Parameters:
- datapoints: numpy.ndarray, pandas.DataFrame, or List[PredictionInput]
The set of datapoints to be used to sample/generate the background, as one of:
  - A Numpy array of shape [n_rows, n_features]
  - A Pandas DataFrame with n_rows rows and n_features columns
  - A list of TrustyAI PredictionInput
- seed: int
The random seed to use in the sampling/generation method
Methods
- __init__(datapoints[, feature_domains, seed]): Initialize the BackgroundGenerator.
- counterfactual(goals, model[, k_per_goal]): Generate a background via the CounterfactualExplainer.
- kmeans([k]): Use k-means clustering over datapoints and return k centroids as the background data set.
- sample([k]): Randomly sample datapoints.
- counterfactual(goals: ndarray | DataFrame | List[PredictionOutput], model: PredictionProvider, k_per_goal=100, **kwargs)
Generate a background via the CounterfactualExplainer. This lets you specify exact output values that the background dataset conforms to, and thus set the reference point against which all SHAP values are compared. For example, if your model is a regression model, choosing a counterfactual goal of 0 will create a background dataset where :math:`f(x) \approx 0 \ \forall x \in \text{background}`, and as such the SHAP values will compare against zero, which is a useful baseline for regression.
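To illustrate why a zero-output background gives a zero baseline, here is a minimal sketch using a toy linear model, for which SHAP values have a closed form when features are treated as independent. It does not use TrustyAI; the function names, weights, and background points are all hypothetical and chosen only for illustration.

```python
from statistics import fmean


def linear_model(weights, bias, x):
    """Toy regression model: f(x) = bias + sum_i weights[i] * x[i]."""
    return bias + sum(w * xi for w, xi in zip(weights, x))


def linear_shap_values(weights, x, background):
    """Closed-form SHAP values of a linear model under feature independence:
    phi_i = w_i * (x_i - mean of feature i over the background)."""
    means = [fmean(row[i] for row in background) for i in range(len(weights))]
    return [w * (xi - m) for w, xi, m in zip(weights, x, means)]


weights, bias = [2.0, -1.0], 0.0
x = [3.0, 1.0]  # f(x) = 2*3 - 1 = 5

# Hypothetical background where every point satisfies f(b) = 2*b0 - b1 = 0.
zero_background = [[1.0, 2.0], [2.0, 4.0], [-1.0, -2.0]]
# Another hypothetical background with outputs 2 and 4 (mean output 3).
other_background = [[1.0, 0.0], [3.0, 2.0]]

phi_zero = linear_shap_values(weights, x, zero_background)
phi_other = linear_shap_values(weights, x, other_background)

# SHAP values sum to f(x) minus the mean background output.
print(sum(phi_zero), sum(phi_other))
```

With the zero-output background the SHAP values sum to f(x) itself (up to floating-point rounding), so each value reads as a contribution relative to an output of zero. With the other background they sum to f(x) minus that background's mean output.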
- Parameters:
- goals: numpy.ndarray, pandas.DataFrame, or List[PredictionOutput]
The set of goal outputs that the background datapoints should produce, as one of:
  - A Numpy array of shape [n_rows, n_outputs]
  - A Pandas DataFrame with n_rows rows and n_outputs columns
  - A list of TrustyAI PredictionOutput
- model: PredictionProvider
The TrustyAI PredictionProvider, as generated by Model
- k_per_goal: int
The number of background datapoints to generate per goal.
- Keyword Arguments:
- k_seeds: int (default=5)
For each goal, a number of starting seeds from datapoints are used to start the search from. These are the k_seeds points within datapoints whose corresponding outputs are closest to the goal output. Choose a larger number to get a more diverse background dataset, but the search might require larger max_attempt_count, step_count, and timeout_seconds to get good results.
- goal_threshold: float (default=0.01)
The distance (percentage) threshold defining whether a particular output satisfies the goal. Set to 0 to require an exact match, but this will likely require larger max_attempt_count, step_count, and timeout_seconds to get good results.
- chain: boolean (default=False)
If chaining is set to true, found counterfactual datapoints will be added to the search seeds for subsequent searches. This is useful when a range of counterfactual outputs is desired; for example, if the desired goals are [0, 1, 2, 3], whichever goal is closest to the closest point within datapoints will be searched for first. The found counterfactuals from that search are then included in the search for the second-closest goal, and so on. This is especially helpful if the extremes of the goal range are far outside the range produced by the datapoints. If only
- step_count: int (default=5_000)
The number of datapoints to evaluate during the search.
- timeout_seconds: int (default=3)
The maximum number of seconds allowed for each counterfactual search. This bounds the total runtime of the search at roughly timeout_seconds * max_attempt_count * k_per_goal * len(goals).
- Returns:
- List[PredictionInput]
The background dataset to pass to the SHAPExplainer
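The keyword arguments above describe three pieces of bookkeeping: picking the k_seeds starting points, ordering goals for chaining, and bounding the total runtime. The following is a rough sketch of that logic in plain Python, assuming scalar outputs and absolute distance. It is not TrustyAI's implementation. The helper names are hypothetical, and the max_attempt_count value is a placeholder, since its default is not stated here.

```python
def pick_seeds(datapoints, outputs, goal, k_seeds=5):
    """Return the k_seeds datapoints whose (scalar) outputs are closest to goal."""
    ranked = sorted(range(len(datapoints)), key=lambda i: abs(outputs[i] - goal))
    return [datapoints[i] for i in ranked[:k_seeds]]


def order_goals(outputs, goals):
    """Order goals so that the one nearest to any existing output comes first."""
    return sorted(goals, key=lambda g: min(abs(o - g) for o in outputs))


def max_runtime_seconds(timeout_seconds, max_attempt_count, k_per_goal, n_goals):
    """Rough upper bound on total search time, per the formula quoted above."""
    return timeout_seconds * max_attempt_count * k_per_goal * n_goals


# Hypothetical data: four one-feature datapoints and their model outputs.
datapoints = [[0.0], [1.0], [2.0], [3.0]]
outputs = [10.0, 11.0, 12.0, 13.0]
goals = [0, 5, 9, 20]

search_order = order_goals(outputs, goals)
seeds_for_first_goal = pick_seeds(datapoints, outputs, search_order[0], k_seeds=2)
# max_attempt_count=2 is an illustrative value only.
runtime_bound = max_runtime_seconds(3, 2, 100, len(goals))
print(search_order, seeds_for_first_goal, runtime_bound)
```

Even with the default timeout_seconds of 3, the bound grows multiplicatively with k_per_goal and the number of goals, so it is worth estimating before launching a large search.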
- kmeans(k=100)
Use k-means clustering over datapoints and return k centroids as the background data set.
- Parameters:
- k: int
The number of centroids to find
- Returns:
- List[PredictionInput]
The background dataset to pass to the SHAPExplainer
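For intuition about what "summarizing datapoints into k centroids" means, here is a minimal pure-Python sketch of Lloyd's k-means algorithm. It is an illustration only and makes no claim about how TrustyAI performs the clustering. The function name, the iterations parameter, and the example points are hypothetical.

```python
import random


def kmeans_centroids(points, k, seed=0, iterations=20):
    """Summarize points (lists of floats) into k centroids with Lloyd's algorithm."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct datapoints.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


# Two well-separated groups of four points each.
points = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0],
]
centroids = kmeans_centroids(points, k=2)
print(sorted(centroids))
```

The eight points collapse to two representative rows, one per group, which is why a k-means background can be much smaller than the original dataset while still covering its main regions.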
- sample(k=100)
Randomly sample datapoints.
- Parameters:
- k: int
The number of datapoints to select
- Returns:
- List[PredictionInput]
The background dataset to pass to the SHAPExplainer
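As a rough picture of seeded random sampling, the sketch below draws k rows from a list of datapoints using Python's standard library. It assumes sampling without replacement and caps k at the number of available rows; whether TrustyAI behaves the same way is not stated here, and the function name is hypothetical.

```python
import random


def sample_background(datapoints, k=100, seed=0):
    """Draw up to k datapoints without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)
    k = min(k, len(datapoints))  # assumption: never request more rows than exist
    return rng.sample(datapoints, k)


# Hypothetical dataset of ten two-feature rows.
datapoints = [[float(i), float(i * i)] for i in range(10)]

background = sample_background(datapoints, k=3, seed=0)
same_again = sample_background(datapoints, k=3, seed=0)
print(background == same_again)
```

Fixing the seed makes the background reproducible, which keeps SHAP values comparable across runs.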