GcslRecommender

GcslRecommender implements GCSL (Goal-Conditioned Supervised Learning): one model that produces different recommendations depending on the goals you specify at inference time — no retraining required.

Overview

Purpose: Multi-objective recommendation where you want to steer results toward specific outcome targets — e.g. maximize engagement AND revenue, or trade off popularity against novelty.

Key Insight: Standard recommenders drop outcome columns before training. GcslRecommender keeps them as input features, so the model learns:

P(positive | user, item, context, outcome_1, outcome_2, ...)

At inference time, the configured inference method overwrites the outcome columns with the desired goal values. Items whose feature profile is most consistent with those goals score highest. Changing the goals changes the recommendations, while the model weights stay the same.
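To make the idea concrete, here is a minimal sketch in plain Python that does not use skrec at all. It estimates a goal-conditioned score for each item from a toy log by averaging the labels of rows whose outcome value lies near the requested goal. The `score` and `rank` helpers, the `bandwidth` parameter, and the data are illustrative assumptions, not part of the library.

```python
# Toy log of (item, engagement outcome, primary label). All values are made up.
LOG = [
    ("A", 0.9, 1), ("A", 0.8, 1), ("A", 0.2, 0),
    ("B", 0.2, 1), ("B", 0.1, 1), ("B", 0.9, 0),
]

def score(item, goal, bandwidth=0.2):
    """Mean label of rows for `item` whose outcome is within `bandwidth` of `goal`."""
    labels = [y for it, eng, y in LOG if it == item and abs(eng - goal) <= bandwidth]
    return sum(labels) / len(labels) if labels else 0.0

def rank(goal):
    """Items sorted by goal-conditioned score, best first."""
    items = {it for it, _, _ in LOG}
    return sorted(items, key=lambda it: score(it, goal), reverse=True)

high = rank(goal=0.9)  # condition on high engagement
low = rank(goal=0.1)   # condition on low engagement
```

In this toy log, item A was positive mostly when engagement was high and item B mostly when it was low, so the two goals produce opposite rankings from the same data.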

When to Use

Perfect For:

  • Balancing multiple metrics (engagement + revenue + diversity)
  • Steering between popular and niche recommendations
  • A/B testing different business objectives without retraining
  • Any scenario where you want one model to serve multiple objectives

Not Ideal For:

Basic Usage

1. Build the Pipeline

from skrec.estimator.classification.xgb_classifier import XGBClassifierEstimator
from skrec.recommender.gcsl.gcsl_recommender import GcslRecommender
from skrec.recommender.gcsl.inference.predefined_value import PredefinedValue
from skrec.scorer.universal import UniversalScorer

# Layer 1: Choose an estimator
estimator = XGBClassifierEstimator({
    "learning_rate": 0.1,
    "n_estimators": 100,
    "max_depth": 5
})

# Layer 2: Choose a scorer
scorer = UniversalScorer(estimator)

# Layer 3: Create the recommender with an inference method
inference = PredefinedValue({"OUTCOME_engagement": 0.8, "OUTCOME_revenue": 0.5})
recommender = GcslRecommender(scorer, inference_method=inference)

2. Prepare Multi-Outcome Data

Your interactions dataset must include OUTCOME_* columns alongside the primary OUTCOME label. These columns become input features during training.

import pandas as pd

# Interactions with multiple outcome dimensions
interactions_df = pd.DataFrame({
    "USER_ID":            ["u1", "u1", "u2", "u2"],
    "ITEM_ID":            ["A",  "B",  "A",  "C"],
    "OUTCOME":            [1,    0,    1,    1],      # primary label
    "OUTCOME_engagement": [0.9,  0.2,  0.7,  0.8],   # kept as feature
    "OUTCOME_revenue":    [5.0,  0.0,  3.0,  8.0],   # kept as feature
})

3. Train

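# interactions_dataset and items_dataset are dataset objects for the
# interactions and items data; their construction is not shown in this section.
recommender.train(
    interactions_ds=interactions_dataset,
    items_ds=items_dataset
)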

4. Recommend

recommendations = recommender.recommend(
    interactions=inference_df,
    users=users_df,
    top_k=10
)

5. Swap Goals Without Retraining

from skrec.recommender.gcsl.inference.percentile_value import PercentileValue

# Switch to a different goal — auto-fits using stored training data
recommender.set_inference_method(
    PercentileValue({"OUTCOME_engagement": 80, "OUTCOME_revenue": 50})
)

# Same model, different recommendations
new_recommendations = recommender.recommend(
    interactions=inference_df,
    users=users_df,
    top_k=10
)

Inference Methods

The inference method controls how goal values are injected at scoring time. All methods follow the sklearn-style fit() / transform() lifecycle.

PredefinedValue — Fixed Goals

Set exact goal values. Simplest method.

from skrec.recommender.gcsl.inference.predefined_value import PredefinedValue

inference = PredefinedValue({
    "OUTCOME_engagement": 0.8,
    "OUTCOME_revenue": 5.0
})

When to use: You know the exact outcome values you want to target.

PercentileValue — Percentile-Based Goals

Target a specific percentile of the training distribution. Goals are always within the training range — the safest option.

from skrec.recommender.gcsl.inference.percentile_value import PercentileValue

inference = PercentileValue({
    "OUTCOME_engagement": 80,  # 80th percentile
    "OUTCOME_revenue": 75      # 75th percentile
})

When to use: You want goals relative to observed data. Percentile 50 targets the median, 80 targets the top 20%, and 95 targets the top 5%.
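As a rough illustration of how a percentile can be turned into a concrete goal value, the sketch below computes a percentile with linear interpolation between sorted training values. The `percentile` helper is hypothetical, and the interpolation rule is an assumption; the library may compute percentiles differently.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted points."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be in [0, 100]")
    ordered = sorted(values)
    rank = (q / 100) * (len(ordered) - 1)     # fractional index into the sorted list
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)  # clamp at the last element
    frac = rank - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

engagement = [0.9, 0.2, 0.7, 0.8]  # sample training values
goal_p50 = percentile(engagement, 50)
goal_p80 = percentile(engagement, 80)
```

Because the result is always an interpolation between two observed values, it cannot fall outside the training minimum and maximum.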

MeanScalarization — Scaled Training Mean

Multiply the per-outcome training mean by a scalar. A scalar of 1.0 targets the average; scalars above 1.0 push toward above-average outcomes.

from skrec.recommender.gcsl.inference.mean_scalarization import MeanScalarization

inference = MeanScalarization({
    "OUTCOME_engagement": 1.2,  # 20% above mean engagement
    "OUTCOME_revenue": 0.8      # 20% below mean revenue
})

When to use: You want to express goals as "X% above/below average".
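The arithmetic behind this method is simple enough to sketch directly. The helper below is a hypothetical stand-in, not the library's implementation: it multiplies each outcome's training mean by the requested scalar.

```python
from statistics import fmean

def scaled_mean_goals(training_columns, scalars):
    """Return {column: scalar * training mean} for every column in `scalars`."""
    return {col: scalars[col] * fmean(training_columns[col]) for col in scalars}

# Sample training values (assumed for illustration).
training_columns = {
    "OUTCOME_engagement": [0.9, 0.2, 0.7, 0.8],
    "OUTCOME_revenue": [5.0, 0.0, 3.0, 8.0],
}
goals = scaled_mean_goals(
    training_columns,
    {"OUTCOME_engagement": 1.2, "OUTCOME_revenue": 0.8},
)
```

Note that a large scalar on a skewed outcome can push the goal past the largest observed value, which is why this method can produce out-of-range goals.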

Out-of-Distribution Warnings

PredefinedValue and MeanScalarization emit a UserWarning when a goal falls outside the observed training range — the model has no signal for values it never saw. PercentileValue is bounded by construction and never triggers this warning.

# This will warn: goal 99.0 is outside the observed training range
# (for example [0.0, 8.0] for the sample revenue data above)
inference = PredefinedValue({"OUTCOME_revenue": 99.0})
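A range check of this kind can be sketched with the standard `warnings` module. The `check_goal_in_range` function below is a hypothetical illustration of the idea, not the library's code, and the message wording is an assumption.

```python
import warnings

def check_goal_in_range(column, goal, training_values):
    """Warn if `goal` lies outside [min, max] of the training values.

    Returns True when the goal is in range, False otherwise.
    """
    lo, hi = min(training_values), max(training_values)
    if lo <= goal <= hi:
        return True
    warnings.warn(
        f"Goal {goal} for {column} is outside the training range [{lo}, {hi}]",
        UserWarning,
        stacklevel=2,
    )
    return False

revenue = [5.0, 0.0, 3.0, 8.0]  # sample training values
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure nothing is suppressed
    inside = check_goal_in_range("OUTCOME_revenue", 5.0, revenue)
    outside = check_goal_in_range("OUTCOME_revenue", 99.0, revenue)
```

The same `catch_warnings(record=True)` pattern is handy in tests when you want to assert that a goal configuration stays in distribution.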

Writing a Custom Inference Method

Subclass BaseInference and implement fit() and transform():

from skrec.recommender.gcsl.inference.base_inference import BaseInference

class ClippedValue(BaseInference):
    """Goals clipped to [training_min, training_max]."""

    def __init__(self, goal_values):
        super().__init__()
        self.goal_values = goal_values

    def fit(self, interactions_df, outcome_cols):
        self._ranges = {
            col: (float(interactions_df[col].min()),
                  float(interactions_df[col].max()))
            for col in outcome_cols
        }
        self.outcome_cols_ = outcome_cols
        self._fitted = True
        return self

    def transform(self, interactions):
        self._check_fitted()
        interactions = interactions.copy()
        for col in self.outcome_cols_:
            lo, hi = self._ranges[col]
            interactions[col] = max(lo, min(hi, self.goal_values[col]))
        return interactions

set_inference_method() auto-calls fit() when the recommender is already trained, so custom methods work immediately:

recommender.set_inference_method(
    ClippedValue({"OUTCOME_engagement": 999.0, "OUTCOME_revenue": -5.0})
)
recommendations = recommender.recommend(interactions=inference_df, users=users_df, top_k=10)

How It Works

Training

GcslRecommender overrides _process_outcome_columns() to be a no-op. Where the base class drops OUTCOME_* columns, GCSL keeps them — so the model trains on:

X = [user_features, item_features, context_features, OUTCOME_engagement, OUTCOME_revenue]
y = OUTCOME (primary label)
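The difference between the two behaviours comes down to which columns reach the model. The sketch below is a simplified illustration with a hypothetical `feature_columns` helper; how the library actually treats ID columns and builds features is not covered here.

```python
def feature_columns(columns, keep_outcomes, label="OUTCOME", prefix="OUTCOME_"):
    """Return the model input columns; the primary label is never a feature."""
    features = []
    for col in columns:
        if col == label:
            continue  # this is y, not part of X
        if col.startswith(prefix) and not keep_outcomes:
            continue  # standard behaviour: secondary outcomes are dropped
        features.append(col)
    return features

columns = ["USER_ID", "ITEM_ID", "OUTCOME", "OUTCOME_engagement", "OUTCOME_revenue"]
standard_X = feature_columns(columns, keep_outcomes=False)  # outcomes dropped
gcsl_X = feature_columns(columns, keep_outcomes=True)       # outcomes kept as inputs
```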

Inference

The inference method's transform() overwrites the outcome columns with goal values before scoring:

inference_df["OUTCOME_engagement"] = 0.8   # goal value
inference_df["OUTCOME_revenue"]    = 5.0   # goal value
scorer.score_items(inference_df)           # then score all items given these goals
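The overwrite step can be pictured on plain rows as well. This sketch uses dictionaries instead of a DataFrame, and `inject_goals` is a hypothetical helper; it returns new rows so the caller's data is left untouched, mirroring the copy-then-overwrite pattern used by transform() in the custom method above. Treating outcome columns as optional on the input rows is an assumption made for the example.

```python
def inject_goals(rows, goals):
    """Return copies of `rows` with every goal column set to its goal value."""
    return [{**row, **goals} for row in rows]

candidates = [
    # A row that still carries logged outcome values.
    {"USER_ID": "u1", "ITEM_ID": "A", "OUTCOME_engagement": 0.1, "OUTCOME_revenue": 0.0},
    # A row without outcome columns (assumed possible at inference time).
    {"USER_ID": "u1", "ITEM_ID": "B"},
]
goals = {"OUTCOME_engagement": 0.8, "OUTCOME_revenue": 5.0}
scoring_rows = inject_goals(candidates, goals)
```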

Items that historically co-occurred with the requested outcome profile score highest.

Why This Works

The model learns a conditional distribution: P(positive | user, item, goals). By conditioning on different goals at inference, you query: "which items are most consistent with achieving these outcomes for this user?" Change the goals, change the answer — same weights.

Limitation: the model learns correlation, not causation. Injecting revenue=10 doesn't guarantee revenue=10 — it recommends items that historically appeared alongside high revenue.

Scorer Compatibility

GcslRecommender works with all non-sequential scorer types:

| Scorer | Compatible | Notes |
|---|---|---|
| UniversalScorer | Yes | Most common choice |
| IndependentScorer | Yes | Per-item models |
| MulticlassScorer | Yes | Items as classes |

Not compatible: SequentialScorer — use SequentialRecommender for sequence models.

Evaluation

from skrec.evaluator.datatypes import RecommenderEvaluatorType
from skrec.metrics.datatypes import RecommenderMetricType
import numpy as np

eval_data = {
    "logged_items": np.array([["item_A"], ["item_B"]]),
    "logged_rewards": np.array([[1.0], [0.5]])
}

ndcg = recommender.evaluate(
    eval_type=RecommenderEvaluatorType.SIMPLE,
    metric_type=RecommenderMetricType.NDCG_AT_K,
    eval_top_k=5,
    score_items_kwargs={"interactions": inference_df, "users": users_df},
    eval_kwargs=eval_data
)

All standard evaluators and metrics are supported — see Evaluation Guide.
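For readers unfamiliar with the metric, the following is the textbook definition of NDCG@k in plain Python. It is a reference sketch only: the library's evaluators work from logged items and rewards and may compute the metric differently, and the function names here are hypothetical.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k positions (log2 discount)."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Rewards of the recommended items, in ranked order (made-up values).
ranked_rewards = [1.0, 0.0, 0.5]
value = ndcg_at_k(ranked_rewards, k=3)
perfect = ndcg_at_k([1.0, 0.5, 0.0], k=3)
```

A ranking that already lists items in descending reward order scores 1.0; any other ordering of the same rewards scores lower.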

Best Practices

1. Stay In-Distribution

Use PercentileValue when possible — goals are bounded by construction. If using PredefinedValue, keep goals within the observed training range.

2. Validate Feature Importance

After training, check that outcome columns have non-trivial feature importance. If they rank low, goal conditioning has little effect on the scores and a standard RankingRecommender would be simpler.
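Feature importance is estimator-specific, so a model-agnostic complement is to measure how much scores move when only the goals change. The sketch below assumes a hypothetical `score_fn` that takes a list of row dictionaries and returns one score per row; the two toy scorers stand in for a trained model.

```python
def goal_sensitivity(score_fn, rows, goals_a, goals_b):
    """Mean absolute score change when goals_a is swapped for goals_b."""
    scores_a = score_fn([{**row, **goals_a} for row in rows])
    scores_b = score_fn([{**row, **goals_b} for row in rows])
    return sum(abs(a - b) for a, b in zip(scores_a, scores_b)) / len(rows)

# Hypothetical scorers standing in for trained models.
def ignores_goals(rows):
    return [0.5 for _ in rows]

def uses_goals(rows):
    return [0.25 + 0.5 * row["OUTCOME_engagement"] for row in rows]

rows = [{"ITEM_ID": "A"}, {"ITEM_ID": "B"}]
low, high = {"OUTCOME_engagement": 0.0}, {"OUTCOME_engagement": 1.0}
flat = goal_sensitivity(ignores_goals, rows, low, high)      # no reaction to goals
responsive = goal_sensitivity(uses_goals, rows, low, high)   # scores shift with goals
```

A sensitivity close to zero across a wide spread of goals suggests the model has learned to ignore the outcome columns.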

3. Use Real Per-Interaction Outcomes

Outcome columns that vary per interaction (dwell time, purchase amount, explicit rating) give the model more signal than static item properties. Static properties can be learned from item features alone.

4. Compare Goals Systematically

Use set_inference_method() to sweep goals and compare top-k lists or evaluation metrics without retraining. This is the core workflow GCSL enables.
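One way to structure such a sweep is to compare each goal setting's top-k list against a baseline using set overlap. In this sketch `recommend_fn` is a hypothetical callable that maps a goals dictionary to a top-k list of item IDs; in practice it might wrap set_inference_method() followed by recommend(), but that wiring is an assumption and is replaced here by a stub.

```python
def jaccard(a, b):
    """Overlap of two top-k lists: size of intersection over size of union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 1.0

def sweep(recommend_fn, goal_grid, baseline_goals):
    """Map each labelled goal setting to its top-k overlap with the baseline."""
    baseline = recommend_fn(baseline_goals)
    return {
        label: jaccard(baseline, recommend_fn(goals))
        for label, goals in goal_grid.items()
    }

# Hypothetical stand-in for a recommender: fixed top-3 list per engagement goal.
def fake_recommend(goals):
    return ["A", "B", "C"] if goals["OUTCOME_engagement"] < 0.5 else ["A", "D", "E"]

overlaps = sweep(
    fake_recommend,
    {"low": {"OUTCOME_engagement": 0.2}, "high": {"OUTCOME_engagement": 0.9}},
    baseline_goals={"OUTCOME_engagement": 0.2},
)
```

An overlap of 1.0 means the goal change had no effect on the top-k set, while lower values show how far the recommendations moved.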

Common Issues

Issue: Goals don't change recommendations

Solution: Check feature importance — if OUTCOME_* columns have low gain, the model ignores them. Increase n_estimators or add more varied training data.

Issue: NotFittedError when calling recommend()

Solution: The inference method needs fit() before transform(). If you construct an inference method manually (outside set_inference_method()), call fit() explicitly:

method = PredefinedValue({"OUTCOME_engagement": 0.8})
method.fit(training_df, ["OUTCOME_engagement"])
recommender.inference_method = method

Using set_inference_method() or passing the method to the constructor avoids this — fit() is called automatically.

Issue: recommend_online() raises NotImplementedError

Expected: recommend_online() is not supported for GCSL because the single-row fast path bypasses goal injection. Use recommend() instead.

Next Steps