Config-Driven Orchestration

Build complete recommender pipelines from a single configuration dictionary using create_recommender_pipeline(). This eliminates manual instantiation and provides a single entry point for MLOps pipelines, agent-driven systems, and A/B testing.

Basic Usage

from skrec.orchestrator import create_recommender_pipeline

config = {
    "recommender_type": "ranking",
    "scorer_type": "universal",
    "estimator_config": {
        "ml_task": "classification",
        "xgboost": {
            "n_estimators": 100,
            "max_depth": 5,
            "learning_rate": 0.1,
        },
    },
}

recommender = create_recommender_pipeline(config)

recommender.train(
    interactions_ds=interactions_dataset,
    items_ds=items_dataset,
    users_ds=users_dataset,
)

recommendations = recommender.recommend(interactions=query_df, top_k=5)

Agent-Friendly Orchestration

scikit-rec exposes a config-driven recommender factory that is especially useful for automated and LLM-based systems. The factory validates compatibility between recommender, scorer, and estimator types, and capability_matrix() can be used by tooling or an agent to enumerate supported options.

Configuration Reference

Top-Level Config (RecommenderConfig)

| Key | Type | Required | Description |
| --- | --- | --- | --- |
| recommender_type | str | Yes | "ranking", "bandits", "sequential", "hierarchical_sequential", "uplift", "gcsl" |
| scorer_type | str | Yes | "universal", "independent", "multiclass", "multioutput", "sequential", "hierarchical" |
| estimator_config | dict | Yes | Estimator configuration (see below) |
| scorer_config | dict | No | Per-scorer constructor kwargs (see below). Unsupported keys raise ValueError upfront. |
| recommender_params | dict | No | Per-recommender parameters (see below) |

Estimator Config (EstimatorConfig)

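| Key | Type | Default | Description |
| --- | --- | --- | --- |
| estimator_type | str | "tabular" | "tabular", "embedding", or "sequential" |
| ml_task | str | "classification" | "classification" or "regression" (tabular only) |
| xgboost | dict | {} | XGBoost hyperparameters (tabular only) |
| lightgbm | dict |  | LightGBM hyperparameters (tabular only; see Factory scope) |
| deepfm | dict |  | DeepFM hyperparameters (tabular only, classification only; see Factory scope) |
| hpo | dict |  | HPO configuration (tabular only) |
| weights | dict |  | Sample/feature weighting (tabular only) |
| embedding | dict |  | Embedding model config (embedding only) |
| sequential | dict |  | Sequential model config (sequential only) |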

Scorer Config (ScorerConfig)

Per-scorer constructor kwargs. Optional; defaults preserve historical scorer behavior. The accepted keys depend on scorer_type; passing a key a scorer doesn't accept raises ValueError upfront. capability_matrix()["scorer_config_keys"] exposes the live whitelist for tooling.

| Key | Used by | Type | Description |
| --- | --- | --- | --- |
| on_degenerate_target | multioutput | DegenerateTargetPolicy or "raise" / "constant" | Policy for single-class targets in the training slice. "raise" (default) aborts training with the offending column names; "constant" fits a constant predictor for degenerate columns and trains the rest. See MultioutputScorer degenerate-target handling. |

Scorers without entries here (multiclass, independent, universal, sequential, hierarchical) accept no scorer_config keys today; passing any key to them raises ValueError.
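The whitelist check described above can be sketched in plain Python. This is an illustration only, not the factory's own code: `SCORER_CONFIG_KEYS` and `unsupported_scorer_keys` are hypothetical names, and the whitelist is copied by hand from the table above rather than read from capability_matrix()["scorer_config_keys"].

```python
# Hand-copied from the table above (an assumption for this sketch); in real
# tooling, read capability_matrix()["scorer_config_keys"] instead.
SCORER_CONFIG_KEYS = {
    "multioutput": ("on_degenerate_target",),
    "multiclass": (),
    "independent": (),
    "universal": (),
    "sequential": (),
    "hierarchical": (),
}


def unsupported_scorer_keys(scorer_type, scorer_config, whitelist=SCORER_CONFIG_KEYS):
    """Return the scorer_config keys that scorer_type does not accept, sorted."""
    if scorer_type not in whitelist:
        raise ValueError(f"Unknown scorer_type {scorer_type!r}")
    allowed = set(whitelist[scorer_type])
    return sorted(key for key in (scorer_config or {}) if key not in allowed)


# multioutput accepts the key, universal does not.
print(unsupported_scorer_keys("multioutput", {"on_degenerate_target": "constant"}))  # []
print(unsupported_scorer_keys("universal", {"on_degenerate_target": "constant"}))    # ['on_degenerate_target']
```

Running a check like this before calling the factory lets a tool report every offending key at once instead of stopping at the first ValueError.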

Embedding Config

"embedding": {
    "model_type": str,  # Required. See table below.
    "params": dict,     # Constructor kwargs passed to the estimator.
}

| model_type | Class | Requires PyTorch |
| --- | --- | --- |
| "matrix_factorization" | MatrixFactorizationEstimator | No (NumPy) |
| "ncf" | NCFEstimator | Yes |
| "two_tower" | ContextualizedTwoTowerEstimator | Yes |
| "deep_cross_network" | DeepCrossNetworkEstimator | Yes |
| "neural_factorization" | NeuralFactorizationEstimator | Yes |

Sequential Config

"sequential": {
    "model_type": str,  # Required. See table below.
    "params": dict,     # Constructor kwargs passed to the estimator.
}

| model_type | Class | Description |
| --- | --- | --- |
| "sasrec_classifier" | SASRecClassifierEstimator | Self-attentive sequential (binary) |
| "sasrec_regressor" | SASRecRegressorEstimator | Self-attentive sequential (continuous) |
| "hrnn_classifier" | HRNNClassifierEstimator | Hierarchical RNN (binary) |
| "hrnn_regressor" | HRNNRegressorEstimator | Hierarchical RNN (continuous) |

Recommender Params

Per-recommender constructor parameters. Only keys relevant to the chosen recommender_type are used.

| Key | Used by | Type | Description |
| --- | --- | --- | --- |
| max_len | sequential | int | Maximum sequence length (default: 50) |
| max_sessions | hierarchical_sequential | int | Max past sessions (default: 10) |
| max_session_len | hierarchical_sequential | int | Max items per session (default: 20) |
| session_timeout_minutes | hierarchical_sequential | float | Session boundary timeout (default: 30.0) |
| control_item_id | uplift | str | Required. Control group item ID |
| mode | uplift | str | "t_learner", "s_learner", or "x_learner". Auto-detects if omitted. |
| inference_method | gcsl | dict | Goal-injection method (see below) |
| retriever | ranking, gcsl | dict | Candidate retriever (see below) |
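To see which recommender_params keys a given recommender_type would pick up, the table above can be turned into a small lookup. This is a sketch under assumptions: `PARAMS_BY_RECOMMENDER` and `split_recommender_params` are hypothetical names, the mapping is transcribed from the table, and the empty set for "bandits" is inferred from the table listing no keys for it.

```python
# Transcribed from the Recommender Params table; not read from the library.
PARAMS_BY_RECOMMENDER = {
    "ranking": {"retriever"},
    "bandits": set(),  # assumption: the table lists no keys for bandits
    "sequential": {"max_len"},
    "hierarchical_sequential": {"max_sessions", "max_session_len", "session_timeout_minutes"},
    "uplift": {"control_item_id", "mode"},
    "gcsl": {"inference_method", "retriever"},
}


def split_recommender_params(recommender_type, params):
    """Split params into (used, ignored) for the given recommender_type."""
    relevant = PARAMS_BY_RECOMMENDER[recommender_type]
    used = {key: value for key, value in params.items() if key in relevant}
    ignored = sorted(key for key in params if key not in relevant)
    # control_item_id is marked Required for uplift in the table.
    if recommender_type == "uplift" and "control_item_id" not in used:
        raise ValueError("uplift requires recommender_params['control_item_id']")
    return used, ignored


used, ignored = split_recommender_params("sequential", {"max_len": 50, "max_sessions": 10})
print(used)     # {'max_len': 50}
print(ignored)  # ['max_sessions']
```

Surfacing the ignored keys is useful in config review, since a key that is silently unused often signals a mismatched recommender_type.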

Inference Method Config (GCSL)

"inference_method": {
    "type": str,    # "mean_scalarization", "percentile_value", "predefined_value"
    "params": dict, # Constructor kwargs
}

Retriever Config

"retriever": {
    "type": str,    # "popularity", "content_based", "embedding"
    "params": dict, # Constructor kwargs (e.g. {"top_k": 200})
}

Compatibility Rules

The factory validates these constraints at pipeline creation time and raises ValueError with a clear message if violated:

| Rule | Why |
| --- | --- |
| sequential / hierarchical_sequential recommender requires estimator_type: "sequential" | Sequential models need sequence data |
| sequential recommender requires scorer_type: "sequential" | SASRec needs SequentialScorer |
| hierarchical_sequential recommender requires scorer_type: "hierarchical" | HRNN needs HierarchicalScorer |
| sequential / hierarchical scorer requires estimator_type: "sequential" | Scorer delegates to SequentialEstimator |
| embedding estimator only works with scorer_type: "universal" | IndependentScorer/MulticlassScorer/MultioutputScorer reject embedding estimators |
| uplift recommender requires scorer_type: "independent" or "universal" | UpliftRecommender needs T-Learner or S-Learner compatible scorer |
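For tooling that wants to lint configs without importing the library, the six rules above can be mirrored in a standalone checker. This is a rough sketch, not the factory's validation code: `compatibility_problems` is a hypothetical helper, its messages are invented for illustration, and it collects all violations rather than raising on the first.

```python
def compatibility_problems(config):
    """Return human-readable rule violations for a pipeline config (empty if none)."""
    recommender = config.get("recommender_type")
    scorer = config.get("scorer_type")
    # estimator_type defaults to "tabular" per the Estimator Config table.
    estimator = config.get("estimator_config", {}).get("estimator_type", "tabular")
    problems = []

    if recommender in ("sequential", "hierarchical_sequential") and estimator != "sequential":
        problems.append(f"{recommender} recommender requires estimator_type 'sequential'")
    if recommender == "sequential" and scorer != "sequential":
        problems.append("sequential recommender requires scorer_type 'sequential'")
    if recommender == "hierarchical_sequential" and scorer != "hierarchical":
        problems.append("hierarchical_sequential recommender requires scorer_type 'hierarchical'")
    if scorer in ("sequential", "hierarchical") and estimator != "sequential":
        problems.append(f"{scorer} scorer requires estimator_type 'sequential'")
    if estimator == "embedding" and scorer != "universal":
        problems.append("embedding estimator only works with scorer_type 'universal'")
    if recommender == "uplift" and scorer not in ("independent", "universal"):
        problems.append("uplift recommender requires scorer_type 'independent' or 'universal'")
    return problems


bad_config = {
    "recommender_type": "ranking",
    "scorer_type": "independent",
    "estimator_config": {"estimator_type": "embedding"},
}
print(compatibility_problems(bad_config))
# ["embedding estimator only works with scorer_type 'universal'"]
```

The factory remains the source of truth; a mirror like this can drift, so it is best treated as a fast pre-check before the real call.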

Programmatic Capability Introspection

For tooling that needs to enumerate what the factory understands (system-prompt builders, config validators, UI pickers, etc.), skrec.orchestrator exposes four authoritative enum tuples plus a capability_matrix() accessor. These are the same values the factory validates against internally, so they stay in lockstep automatically.

from skrec.orchestrator import (
    RECOMMENDER_TYPES,
    SCORER_TYPES,
    ESTIMATOR_TYPES,
    TABULAR_MODEL_TYPES,
    capability_matrix,
)

RECOMMENDER_TYPES    # ('ranking', 'bandits', 'sequential', 'hierarchical_sequential', 'uplift', 'gcsl')
SCORER_TYPES         # ('universal', 'independent', 'multiclass', 'multioutput', 'sequential', 'hierarchical')
ESTIMATOR_TYPES      # ('tabular', 'embedding', 'sequential')
TABULAR_MODEL_TYPES  # ('xgboost', 'lightgbm', 'deepfm')

capability_matrix()
# {
#     "recommender_types": (...),
#     "scorer_types": (...),
#     "estimator_types": (...),
#     "tabular_model_types": ("xgboost", "lightgbm", "deepfm"),
#     "embedding_model_types": ("matrix_factorization", "ncf", "two_tower", ...),
#     "sequential_model_types": ("sasrec_classifier", "sasrec_regressor", ...),
#     "inference_method_types": ("mean_scalarization", "percentile_value", "predefined_value"),
#     "retriever_types": ("popularity", "content_based", "embedding"),
#     "scorer_config_keys": {
#         "multioutput": ("on_degenerate_target",),
#         "multiclass": (),
#         "independent": (),
#         "universal": (),
#         "sequential": (),
#         "hierarchical": (),
#     },
#     "evaluator_types": (...),   # RecommenderEvaluatorType members
#     "metric_types": (...),      # RecommenderMetricType members
# }

The full compatibility reference (which scorer works with which estimator, retriever constraints, etc.) is on the Capability Matrix page.
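As one example of the system-prompt use case, a capability-matrix-shaped dict can be flattened into plain text for an agent. The helper below is a sketch: `describe_capabilities` is a hypothetical name, and `sample_matrix` is a hand-written stand-in trimmed to values listed on this page, so the snippet runs without the library installed.

```python
def describe_capabilities(matrix):
    """Render a capability-matrix-like dict as plain text, one line per entry."""
    lines = []
    for name, values in matrix.items():
        if isinstance(values, dict):
            # Nested mappings (e.g. scorer_config_keys) get one line per sub-key.
            for sub_name, sub_values in values.items():
                rendered = ", ".join(map(str, sub_values)) or "(none)"
                lines.append(f"{name}.{sub_name}: {rendered}")
        else:
            # str() keeps this working if entries are enum members, not strings.
            lines.append(f"{name}: {', '.join(map(str, values))}")
    return "\n".join(lines)


# Stand-in for capability_matrix(); replace with the real call in practice.
sample_matrix = {
    "estimator_types": ("tabular", "embedding", "sequential"),
    "retriever_types": ("popularity", "content_based", "embedding"),
    "scorer_config_keys": {"multioutput": ("on_degenerate_target",), "universal": ()},
}
print(describe_capabilities(sample_matrix))
```

Because the text is generated from the same values the factory validates against, a prompt built this way should not go stale when new types are added.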

Complete Examples

1. Tabular: XGBoost + Universal + Ranking

The simplest pipeline — a pointwise XGBoost ranker.

recommender_type: ranking
scorer_type: universal
estimator_config:
  ml_task: classification
  xgboost:
    n_estimators: 200
    max_depth: 6
    learning_rate: 0.1
    subsample: 0.8
    colsample_bytree: 0.8

2. Embedding: Matrix Factorization + Universal + Ranking

Collaborative filtering via learned user/item embeddings.

recommender_type: ranking
scorer_type: universal
estimator_config:
  estimator_type: embedding
  embedding:
    model_type: matrix_factorization
    params:
      n_factors: 64
      algorithm: als
      epochs: 30

3. Embedding: NCF + Universal + Ranking with Retriever

Neural collaborative filtering with a two-stage retrieval pipeline.

recommender_type: ranking
scorer_type: universal
estimator_config:
  estimator_type: embedding
  embedding:
    model_type: ncf
    params:
      ncf_type: neumf
      gmf_embedding_dim: 32
      mlp_embedding_dim: 32
      epochs: 20
recommender_params:
  retriever:
    type: embedding
    params:
      top_k: 200

4. Sequential: SASRec

Self-attentive sequential recommendation from interaction history.

recommender_type: sequential
scorer_type: sequential
estimator_config:
  estimator_type: sequential
  sequential:
    model_type: sasrec_classifier
    params:
      hidden_units: 64
      num_blocks: 2
      num_heads: 2
      max_len: 50
      epochs: 100
recommender_params:
  max_len: 50

5. Sequential: HRNN (Hierarchical)

Session-aware recommendation with hierarchical GRU.

recommender_type: hierarchical_sequential
scorer_type: hierarchical
estimator_config:
  estimator_type: sequential
  sequential:
    model_type: hrnn_classifier
    params:
      hidden_units: 64
      max_sessions: 10
      max_session_len: 20
      epochs: 100
recommender_params:
  max_sessions: 10
  max_session_len: 20
  session_timeout_minutes: 30.0

6. Uplift: T-Learner

Causal treatment effect estimation with per-treatment models.

recommender_type: uplift
scorer_type: independent
estimator_config:
  ml_task: classification
  xgboost:
    n_estimators: 200
recommender_params:
  control_item_id: "control_arm"
  mode: t_learner

7. GCSL: Goal-Conditioned Supervised Learning

Multi-objective recommendation with goal injection at inference time.

recommender_type: gcsl
scorer_type: universal
estimator_config:
  ml_task: classification
  xgboost:
    n_estimators: 200
recommender_params:
  inference_method:
    type: percentile_value
    params:
      percentiles:
        OUTCOME_revenue: 80
        OUTCOME_clicks: 75

8. Contextual Bandits

Exploration/exploitation via epsilon-greedy or static action strategies.

recommender_type: bandits
scorer_type: universal
estimator_config:
  ml_task: classification
  xgboost:
    n_estimators: 100

Note: Bandit strategies are set after pipeline creation via recommender.set_strategy().

9. Tabular with HPO

Hyperparameter optimization using grid search or randomized search.

recommender_type: ranking
scorer_type: universal
estimator_config:
  ml_task: classification
  xgboost:
    objective: binary:logistic
    n_jobs: 1
  hpo:
    hpo_method: grid_search_cv
    param_space:
      n_estimators: [50, 100, 200]
      learning_rate: [0.01, 0.05, 0.1]
      max_depth: [3, 5, 7]
    optimizer_params:
      cv: 3
      scoring: roc_auc
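
Before launching a search, it helps to estimate its cost. The sketch below counts the candidates in the param_space above, assuming that grid_search_cv performs an exhaustive grid search with k-fold cross-validation, as its name suggests; the page does not spell this out, and a final refit on the full data, if one happens, would add one more fit.

```python
from itertools import product

param_space = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 5, 7],
}
cv_folds = 3

# Every combination of one value per hyperparameter is one candidate.
candidates = [dict(zip(param_space, values)) for values in product(*param_space.values())]
n_fits = len(candidates) * cv_folds

print(len(candidates))  # 27 candidates (3 * 3 * 3)
print(n_fits)           # 81 cross-validation fits (27 * 3 folds)
```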

10. Tabular with Sample Weighting

Feature-level and item-level weighting for imbalanced datasets.

recommender_type: ranking
scorer_type: universal
estimator_config:
  ml_task: classification
  xgboost:
    n_estimators: 100
    colsample_bynode: 0.9
  weights:
    action_weight: 0.8
    item_sample_weights:
      itemA: 1.2
      itemB: 0.5

Loading Config from Files

YAML

import yaml
from skrec.orchestrator import create_recommender_pipeline

with open("config.yaml") as f:
    config = yaml.safe_load(f)

recommender = create_recommender_pipeline(config)

JSON

import json
from skrec.orchestrator import create_recommender_pipeline

with open("config.json") as f:
    config = json.load(f)

recommender = create_recommender_pipeline(config)
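
The two loaders above can be folded into one helper that dispatches on the file extension and checks the required top-level keys before the config reaches the factory. This is a sketch: `load_config` and `REQUIRED_KEYS` are hypothetical names, with the required keys taken from the Top-Level Config table.

```python
import json
import tempfile
from pathlib import Path

# The three keys marked "Yes" under Required in the Top-Level Config table.
REQUIRED_KEYS = ("recommender_type", "scorer_type", "estimator_config")


def load_config(path):
    """Load a pipeline config from .json, .yaml, or .yml and check required keys."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open() as f:
            config = json.load(f)
    elif suffix in (".yaml", ".yml"):
        import yaml  # PyYAML; imported lazily so JSON-only users don't need it

        with path.open() as f:
            config = yaml.safe_load(f)
    else:
        raise ValueError(f"Unsupported config extension: {suffix!r}")

    if not isinstance(config, dict):
        raise ValueError(f"Config {path} must contain a mapping at the top level")
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"Config {path} is missing required keys: {missing}")
    return config


# Round-trip a small config through a temporary JSON file.
with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    config_path.write_text(json.dumps({
        "recommender_type": "ranking",
        "scorer_type": "universal",
        "estimator_config": {"ml_task": "classification", "xgboost": {"n_estimators": 100}},
    }))
    config = load_config(config_path)

print(config["recommender_type"])  # ranking
```

The returned dict can be passed straight to create_recommender_pipeline(config).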

Use Cases

Kubeflow Pipelines

from kfp import dsl

@dsl.component
def train_recommender(config_path: str, data_path: str):
    import yaml
    from skrec.orchestrator import create_recommender_pipeline

    with open(config_path) as f:
        config = yaml.safe_load(f)

    recommender = create_recommender_pipeline(config)
    recommender.train(...)

A/B Testing

from skrec.orchestrator import create_recommender_pipeline

configs = {
    "variant_a": {
        "recommender_type": "ranking",
        "scorer_type": "universal",
        "estimator_config": {
            "ml_task": "classification",
            "xgboost": {"n_estimators": 100, "max_depth": 5},
        },
    },
    "variant_b": {
        "recommender_type": "ranking",
        "scorer_type": "universal",
        "estimator_config": {
            "estimator_type": "embedding",
            "embedding": {"model_type": "ncf", "params": {"embedding_dim": 32}},
        },
    },
}

for name, config in configs.items():
    recommender = create_recommender_pipeline(config)
    recommender.train(...)
    metrics = recommender.evaluate(...)
    print(f"{name}: {metrics}")

Environment-Specific Configs

import os

from skrec.orchestrator import create_recommender_pipeline

env = os.getenv("ENVIRONMENT", "dev")

configs = {
    "dev": {
        "recommender_type": "ranking",
        "scorer_type": "universal",
        "estimator_config": {
            "ml_task": "classification",
            "xgboost": {"n_estimators": 50},
        },
    },
    "prod": {
        "recommender_type": "ranking",
        "scorer_type": "universal",
        "estimator_config": {
            "ml_task": "classification",
            "xgboost": {"n_estimators": 500, "max_depth": 8},
        },
    },
}

recommender = create_recommender_pipeline(configs[env])
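
When environments differ only in a few hyperparameters, one alternative to repeating full configs is a shared base plus per-environment overrides. The sketch below uses a hypothetical `deep_merge` helper; it is not part of skrec, and the resulting dict is passed to the factory as usual.

```python
import copy


def deep_merge(base, override):
    """Return a new dict: override applied on top of base, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


base = {
    "recommender_type": "ranking",
    "scorer_type": "universal",
    "estimator_config": {
        "ml_task": "classification",
        "xgboost": {"n_estimators": 50},
    },
}
overrides = {
    "dev": {},
    "prod": {"estimator_config": {"xgboost": {"n_estimators": 500, "max_depth": 8}}},
}

prod_config = deep_merge(base, overrides["prod"])
print(prod_config["estimator_config"])
# {'ml_task': 'classification', 'xgboost': {'n_estimators': 500, 'max_depth': 8}}
```

The merge copies its inputs, so the base config stays unchanged and can be reused for every environment.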

Error Handling

The factory raises clear errors for invalid configurations:

# Missing required key
>>> create_recommender_pipeline({"scorer_type": "universal"})
ValueError: 'recommender_type' must be specified in the configuration.

# Typo in recommender_type
>>> create_recommender_pipeline({..., "recommender_type": "ranknig"})
ValueError: Unknown recommender_type 'ranknig'. Valid:
    ('ranking', 'bandits', 'sequential', 'hierarchical_sequential',
     'uplift', 'gcsl')

# Incompatible estimator + scorer
>>> create_recommender_pipeline({..., "scorer_type": "independent", "estimator_config": {"estimator_type": "embedding", ...}})
ValueError: scorer_type 'independent' does not support embedding estimators.
    Use scorer_type='universal' with embedding estimators.

Factory scope

The tabular estimator path supports XGBoost, LightGBM, and DeepFM natively. Pick one by adding its key to estimator_config:

# XGBoost (default)
create_recommender_pipeline({
    "recommender_type": "ranking",
    "scorer_type": "universal",
    "estimator_config": {"ml_task": "classification", "xgboost": {"n_estimators": 200}},
})

# LightGBM — same shape, just swap the key
create_recommender_pipeline({
    "recommender_type": "ranking",
    "scorer_type": "universal",
    "estimator_config": {"ml_task": "classification", "lightgbm": {"n_estimators": 200, "num_leaves": 63}},
})

# DeepFM — requires scikit-rec[torch]; classification only
create_recommender_pipeline({
    "recommender_type": "ranking",
    "scorer_type": "universal",
    "estimator_config": {"ml_task": "classification", "deepfm": {"embedding_dim": 16, "epochs": 10}},
})

Only one tabular key (xgboost, lightgbm, or deepfm) may be present in a single config; specifying more than one raises ValueError. For estimators the factory does not cover (custom sklearn models, etc.), you can still compose the pipeline manually and pass the estimator directly to a scorer.
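
The one-key rule can be checked ahead of time with a few lines of Python. This is a sketch, not the factory's code: `tabular_model_key` is a hypothetical helper, and falling back to "xgboost" when no key is present is an assumption based on XGBoost being marked as the default above.

```python
# Same values as TABULAR_MODEL_TYPES in skrec.orchestrator, copied by hand here
# so the sketch runs without the library.
TABULAR_MODEL_TYPES = ("xgboost", "lightgbm", "deepfm")


def tabular_model_key(estimator_config):
    """Return the single tabular model key in estimator_config."""
    present = [name for name in TABULAR_MODEL_TYPES if name in estimator_config]
    if len(present) > 1:
        raise ValueError(f"Specify only one tabular model key, got: {present}")
    # Assumption: with no model key present, XGBoost is used as the default.
    return present[0] if present else "xgboost"


print(tabular_model_key({"ml_task": "classification", "lightgbm": {"num_leaves": 63}}))  # lightgbm
print(tabular_model_key({"ml_task": "classification"}))                                  # xgboost
```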

Next Steps