Config-Driven Orchestration¶
Build complete recommender pipelines from a single configuration dictionary using create_recommender_pipeline(). This eliminates manual instantiation and provides a single entry point for MLOps pipelines, agent-driven systems, and A/B testing.
Basic Usage¶
```python
from skrec.orchestrator import create_recommender_pipeline

config = {
    "recommender_type": "ranking",
    "scorer_type": "universal",
    "estimator_config": {
        "ml_task": "classification",
        "xgboost": {
            "n_estimators": 100,
            "max_depth": 5,
            "learning_rate": 0.1,
        },
    },
}

recommender = create_recommender_pipeline(config)
recommender.train(
    interactions_ds=interactions_dataset,
    items_ds=items_dataset,
    users_ds=users_dataset,
)
recommendations = recommender.recommend(interactions=query_df, top_k=5)
```
Agent-Friendly Orchestration¶
scikit-rec exposes a config-driven recommender factory that is especially useful for automated and LLM-based systems. The factory validates compatibility between recommender, scorer, and estimator types, and capability_matrix() can be used by tooling or an agent to enumerate supported options.
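For instance, an agent can reject obviously invalid type choices before ever calling the factory. A minimal sketch, using a hardcoded snapshot of the documented type tuples (in a live system you would read them from `capability_matrix()` instead so they stay current):

```python
# Snapshot of the documented enumerations; in practice, read these from
# skrec.orchestrator.capability_matrix() rather than hardcoding them.
RECOMMENDER_TYPES = ("ranking", "bandits", "sequential",
                     "hierarchical_sequential", "uplift", "gcsl")
SCORER_TYPES = ("universal", "independent", "multiclass",
                "multioutput", "sequential", "hierarchical")

def check_types(config: dict) -> list[str]:
    """Return a list of problems found in a candidate config's type fields."""
    problems = []
    if config.get("recommender_type") not in RECOMMENDER_TYPES:
        problems.append(f"unknown recommender_type: {config.get('recommender_type')!r}")
    if config.get("scorer_type") not in SCORER_TYPES:
        problems.append(f"unknown scorer_type: {config.get('scorer_type')!r}")
    return problems

check_types({"recommender_type": "ranknig", "scorer_type": "universal"})
# -> ["unknown recommender_type: 'ranknig'"]
```

Running the check first lets the agent repair a typo and retry, instead of parsing a `ValueError` raised by `create_recommender_pipeline()`.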
Configuration Reference¶
Top-Level Config (RecommenderConfig)¶
| Key | Type | Required | Description |
|---|---|---|---|
| `recommender_type` | str | Yes | `"ranking"`, `"bandits"`, `"sequential"`, `"hierarchical_sequential"`, `"uplift"`, `"gcsl"` |
| `scorer_type` | str | Yes | `"universal"`, `"independent"`, `"multiclass"`, `"multioutput"`, `"sequential"`, `"hierarchical"` |
| `estimator_config` | dict | Yes | Estimator configuration (see below) |
| `scorer_config` | dict | No | Per-scorer constructor kwargs (see below). Unsupported keys raise `ValueError` upfront. |
| `recommender_params` | dict | No | Per-recommender parameters (see below) |
Estimator Config (EstimatorConfig)¶
| Key | Type | Default | Description |
|---|---|---|---|
| `estimator_type` | str | `"tabular"` | `"tabular"`, `"embedding"`, or `"sequential"` |
| `ml_task` | str | `"classification"` | `"classification"` or `"regression"` (tabular only) |
| `xgboost` | dict | `{}` | XGBoost hyperparameters (tabular only) |
| `hpo` | dict | — | HPO configuration (tabular only) |
| `weights` | dict | — | Sample/feature weighting (tabular only) |
| `embedding` | dict | — | Embedding model config (embedding only) |
| `sequential` | dict | — | Sequential model config (sequential only) |
Scorer Config (ScorerConfig)¶
Per-scorer constructor kwargs. Optional; defaults preserve historical scorer behavior. The accepted keys depend on scorer_type; passing a key a scorer doesn't accept raises ValueError upfront. capability_matrix()["scorer_config_keys"] exposes the live whitelist for tooling.
| Key | Used by | Type | Description |
|---|---|---|---|
| `on_degenerate_target` | multioutput | `DegenerateTargetPolicy` or `"raise"` / `"constant"` | Policy for single-class targets in the training slice. `"raise"` (default) aborts training with the offending column names; `"constant"` fits a constant predictor for degenerate columns and trains the rest. See MultioutputScorer degenerate-target handling. |
Scorers without entries here (multiclass, independent, universal, sequential, hierarchical) accept no scorer_config keys today — passing any key raises.
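As an illustration, a multioutput pipeline that fits constant predictors for degenerate columns instead of aborting could be configured as follows (a sketch; the estimator settings are placeholders):

```python
config = {
    "recommender_type": "ranking",
    "scorer_type": "multioutput",
    # Fit a constant predictor for single-class target columns
    # instead of raising during training.
    "scorer_config": {"on_degenerate_target": "constant"},
    "estimator_config": {
        "ml_task": "classification",
        "xgboost": {"n_estimators": 100},
    },
}
```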
Embedding Config¶
```python
"embedding": {
    "model_type": str,  # Required. See table below.
    "params": dict,     # Constructor kwargs passed to the estimator.
}
```
| `model_type` | Class | Requires PyTorch |
|---|---|---|
| `"matrix_factorization"` | MatrixFactorizationEstimator | No (NumPy) |
| `"ncf"` | NCFEstimator | Yes |
| `"two_tower"` | ContextualizedTwoTowerEstimator | Yes |
| `"deep_cross_network"` | DeepCrossNetworkEstimator | Yes |
| `"neural_factorization"` | NeuralFactorizationEstimator | Yes |
Sequential Config¶
```python
"sequential": {
    "model_type": str,  # Required. See table below.
    "params": dict,     # Constructor kwargs passed to the estimator.
}
```
| `model_type` | Class | Description |
|---|---|---|
| `"sasrec_classifier"` | SASRecClassifierEstimator | Self-attentive sequential (binary) |
| `"sasrec_regressor"` | SASRecRegressorEstimator | Self-attentive sequential (continuous) |
| `"hrnn_classifier"` | HRNNClassifierEstimator | Hierarchical RNN (binary) |
| `"hrnn_regressor"` | HRNNRegressorEstimator | Hierarchical RNN (continuous) |
Recommender Params¶
Per-recommender constructor parameters. Only keys relevant to the chosen recommender_type are used.
| Key | Used by | Type | Description |
|---|---|---|---|
| `max_len` | sequential | int | Maximum sequence length (default: 50) |
| `max_sessions` | hierarchical_sequential | int | Max past sessions (default: 10) |
| `max_session_len` | hierarchical_sequential | int | Max items per session (default: 20) |
| `session_timeout_minutes` | hierarchical_sequential | float | Session boundary timeout (default: 30.0) |
| `control_item_id` | uplift | str | Required. Control-group item ID |
| `mode` | uplift | str | `"t_learner"`, `"s_learner"`, or `"x_learner"`. Auto-detected if omitted. |
| `inference_method` | gcsl | dict | Goal-injection method (see below) |
| `retriever` | ranking, gcsl | dict | Candidate retriever (see below) |
Inference Method Config (GCSL)¶
```python
"inference_method": {
    "type": str,    # "mean_scalarization", "percentile_value", "predefined_value"
    "params": dict, # Constructor kwargs
}
```
Retriever Config¶
```python
"retriever": {
    "type": str,    # "popularity", "content_based", "embedding"
    "params": dict, # Constructor kwargs (e.g. {"top_k": 200})
}
```
Compatibility Rules¶
The factory validates these constraints at pipeline creation time and raises ValueError with a clear message if violated:
| Rule | Why |
|---|---|
| `sequential` / `hierarchical_sequential` recommender requires `estimator_type: "sequential"` | Sequential models need sequence data |
| `sequential` recommender requires `scorer_type: "sequential"` | SASRec needs SequentialScorer |
| `hierarchical_sequential` recommender requires `scorer_type: "hierarchical"` | HRNN needs HierarchicalScorer |
| `sequential` / `hierarchical` scorer requires `estimator_type: "sequential"` | Scorer delegates to SequentialEstimator |
| `embedding` estimator only works with `scorer_type: "universal"` | IndependentScorer/MulticlassScorer/MultioutputScorer reject embedding estimators |
| `uplift` recommender requires `scorer_type: "independent"` or `"universal"` | UpliftRecommender needs a T-Learner or S-Learner compatible scorer |
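These rules reduce to a handful of predicate checks. A minimal sketch that mirrors the documented table (not the factory's actual implementation, which may validate more):

```python
def check_compatibility(recommender_type: str, scorer_type: str,
                        estimator_type: str = "tabular") -> list[str]:
    """Return the documented compatibility rules violated by this combination."""
    errors = []
    if recommender_type in ("sequential", "hierarchical_sequential") and estimator_type != "sequential":
        errors.append("sequential recommenders require estimator_type='sequential'")
    if recommender_type == "sequential" and scorer_type != "sequential":
        errors.append("sequential recommender requires scorer_type='sequential'")
    if recommender_type == "hierarchical_sequential" and scorer_type != "hierarchical":
        errors.append("hierarchical_sequential recommender requires scorer_type='hierarchical'")
    if scorer_type in ("sequential", "hierarchical") and estimator_type != "sequential":
        errors.append("sequential/hierarchical scorers require estimator_type='sequential'")
    if estimator_type == "embedding" and scorer_type != "universal":
        errors.append("embedding estimators only work with scorer_type='universal'")
    if recommender_type == "uplift" and scorer_type not in ("independent", "universal"):
        errors.append("uplift requires scorer_type 'independent' or 'universal'")
    return errors

check_compatibility("uplift", "multiclass")
# -> ["uplift requires scorer_type 'independent' or 'universal'"]
```

A pre-flight check like this is useful in CI or in an agent loop, where surfacing every violated rule at once is friendlier than the first `ValueError` the factory raises.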
Programmatic Capability Introspection¶
For tooling that needs to enumerate what the factory understands (system-prompt builders, config validators, UI pickers, etc.), skrec.orchestrator exposes four authoritative enum tuples plus a capability_matrix() accessor. These are the same values the factory validates against internally, so they stay in lockstep automatically.
```python
from skrec.orchestrator import (
    RECOMMENDER_TYPES,
    SCORER_TYPES,
    ESTIMATOR_TYPES,
    TABULAR_MODEL_TYPES,
    capability_matrix,
)

RECOMMENDER_TYPES    # ('ranking', 'bandits', 'sequential', 'hierarchical_sequential', 'uplift', 'gcsl')
SCORER_TYPES         # ('universal', 'independent', 'multiclass', 'multioutput', 'sequential', 'hierarchical')
ESTIMATOR_TYPES      # ('tabular', 'embedding', 'sequential')
TABULAR_MODEL_TYPES  # ('xgboost', 'lightgbm', 'deepfm')

capability_matrix()
# {
#     "recommender_types": (...),
#     "scorer_types": (...),
#     "estimator_types": (...),
#     "tabular_model_types": ("xgboost", "lightgbm", "deepfm"),
#     "embedding_model_types": ("matrix_factorization", "ncf", "two_tower", ...),
#     "sequential_model_types": ("sasrec_classifier", "sasrec_regressor", ...),
#     "inference_method_types": ("mean_scalarization", "percentile_value", "predefined_value"),
#     "retriever_types": ("popularity", "content_based", "embedding"),
#     "scorer_config_keys": {
#         "multioutput": ("on_degenerate_target",),
#         "multiclass": (),
#         "independent": (),
#         "universal": (),
#         "sequential": (),
#         "hierarchical": (),
#     },
#     "evaluator_types": (...),  # RecommenderEvaluatorType members
#     "metric_types": (...),     # RecommenderMetricType members
# }
```
The full compatibility reference (which scorer works with which estimator, retriever constraints, etc.) is on the Capability Matrix page.
Complete Examples¶
1. Tabular: XGBoost + Universal + Ranking¶
The simplest pipeline — a pointwise XGBoost ranker.
```yaml
recommender_type: ranking
scorer_type: universal
estimator_config:
  ml_task: classification
  xgboost:
    n_estimators: 200
    max_depth: 6
    learning_rate: 0.1
    subsample: 0.8
    colsample_bytree: 0.8
```
2. Embedding: Matrix Factorization + Universal + Ranking¶
Collaborative filtering via learned user/item embeddings.
```yaml
recommender_type: ranking
scorer_type: universal
estimator_config:
  estimator_type: embedding
  embedding:
    model_type: matrix_factorization
    params:
      n_factors: 64
      algorithm: als
      epochs: 30
```
3. Embedding: NCF + Universal + Ranking with Retriever¶
Neural collaborative filtering with a two-stage retrieval pipeline.
```yaml
recommender_type: ranking
scorer_type: universal
estimator_config:
  estimator_type: embedding
  embedding:
    model_type: ncf
    params:
      ncf_type: neumf
      gmf_embedding_dim: 32
      mlp_embedding_dim: 32
      epochs: 20
recommender_params:
  retriever:
    type: embedding
    params:
      top_k: 200
```
4. Sequential: SASRec¶
Self-attentive sequential recommendation from interaction history.
```yaml
recommender_type: sequential
scorer_type: sequential
estimator_config:
  estimator_type: sequential
  sequential:
    model_type: sasrec_classifier
    params:
      hidden_units: 64
      num_blocks: 2
      num_heads: 2
      max_len: 50
      epochs: 100
recommender_params:
  max_len: 50
```
5. Sequential: HRNN (Hierarchical)¶
Session-aware recommendation with hierarchical GRU.
```yaml
recommender_type: hierarchical_sequential
scorer_type: hierarchical
estimator_config:
  estimator_type: sequential
  sequential:
    model_type: hrnn_classifier
    params:
      hidden_units: 64
      max_sessions: 10
      max_session_len: 20
      epochs: 100
recommender_params:
  max_sessions: 10
  max_session_len: 20
  session_timeout_minutes: 30.0
```
6. Uplift: T-Learner¶
Causal treatment effect estimation with per-treatment models.
```yaml
recommender_type: uplift
scorer_type: independent
estimator_config:
  ml_task: classification
  xgboost:
    n_estimators: 200
recommender_params:
  control_item_id: "control_arm"
  mode: t_learner
```
7. GCSL: Goal-Conditioned Supervised Learning¶
Multi-objective recommendation with goal injection at inference time.
```yaml
recommender_type: gcsl
scorer_type: universal
estimator_config:
  ml_task: classification
  xgboost:
    n_estimators: 200
recommender_params:
  inference_method:
    type: percentile_value
    params:
      percentiles:
        OUTCOME_revenue: 80
        OUTCOME_clicks: 75
```
8. Contextual Bandits¶
Exploration/exploitation via epsilon-greedy or static action strategies.
```yaml
recommender_type: bandits
scorer_type: universal
estimator_config:
  ml_task: classification
  xgboost:
    n_estimators: 100
```
Note: Bandit strategies are set after pipeline creation via `recommender.set_strategy()`.
9. Tabular with HPO¶
Hyperparameter optimization using grid search or randomized search.
```yaml
recommender_type: ranking
scorer_type: universal
estimator_config:
  ml_task: classification
  xgboost:
    objective: binary:logistic
    n_jobs: 1
  hpo:
    hpo_method: grid_search_cv
    param_space:
      n_estimators: [50, 100, 200]
      learning_rate: [0.01, 0.05, 0.1]
      max_depth: [3, 5, 7]
    optimizer_params:
      cv: 3
      scoring: roc_auc
```
10. Tabular with Sample Weighting¶
Feature-level and item-level weighting for imbalanced datasets.
```yaml
recommender_type: ranking
scorer_type: universal
estimator_config:
  ml_task: classification
  xgboost:
    n_estimators: 100
    colsample_bynode: 0.9
  weights:
    action_weight: 0.8
    item_sample_weights:
      itemA: 1.2
      itemB: 0.5
```
Loading Config from Files¶
YAML¶
```python
import yaml
from skrec.orchestrator import create_recommender_pipeline

with open("config.yaml") as f:
    config = yaml.safe_load(f)

recommender = create_recommender_pipeline(config)
```
JSON¶
```python
import json
from skrec.orchestrator import create_recommender_pipeline

with open("config.json") as f:
    config = json.load(f)

recommender = create_recommender_pipeline(config)
```
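Whichever format you load from, it can be worth verifying the required top-level keys before handing the dict to the factory, e.g. in a CI lint step. A small sketch:

```python
# The three keys the top-level config requires (see Configuration Reference).
REQUIRED_KEYS = ("recommender_type", "scorer_type", "estimator_config")

def missing_keys(config: dict) -> list[str]:
    """Return required top-level keys absent from a loaded config."""
    return [key for key in REQUIRED_KEYS if key not in config]

missing_keys({"recommender_type": "ranking", "scorer_type": "universal"})
# -> ['estimator_config']
```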
Use Cases¶
Kubeflow Pipelines¶
```python
from kfp import dsl

@dsl.component
def train_recommender(config_path: str, data_path: str):
    import yaml
    from skrec.orchestrator import create_recommender_pipeline

    with open(config_path) as f:
        config = yaml.safe_load(f)

    recommender = create_recommender_pipeline(config)
    recommender.train(...)
```
A/B Testing¶
```python
configs = {
    "variant_a": {
        "recommender_type": "ranking",
        "scorer_type": "universal",
        "estimator_config": {
            "ml_task": "classification",
            "xgboost": {"n_estimators": 100, "max_depth": 5},
        },
    },
    "variant_b": {
        "recommender_type": "ranking",
        "scorer_type": "universal",
        "estimator_config": {
            "estimator_type": "embedding",
            "embedding": {"model_type": "ncf", "params": {"embedding_dim": 32}},
        },
    },
}

for name, config in configs.items():
    recommender = create_recommender_pipeline(config)
    recommender.train(...)
    metrics = recommender.evaluate(...)
    print(f"{name}: {metrics}")
```
Environment-Specific Configs¶
```python
import os

env = os.getenv("ENVIRONMENT", "dev")

configs = {
    "dev": {
        "recommender_type": "ranking",
        "scorer_type": "universal",
        "estimator_config": {
            "ml_task": "classification",
            "xgboost": {"n_estimators": 50},
        },
    },
    "prod": {
        "recommender_type": "ranking",
        "scorer_type": "universal",
        "estimator_config": {
            "ml_task": "classification",
            "xgboost": {"n_estimators": 500, "max_depth": 8},
        },
    },
}

recommender = create_recommender_pipeline(configs[env])
```
Error Handling¶
The factory raises clear errors for invalid configurations:
```python
# Missing required key
>>> create_recommender_pipeline({"scorer_type": "universal"})
ValueError: 'recommender_type' must be specified in the configuration.

# Typo in recommender_type
>>> create_recommender_pipeline({..., "recommender_type": "ranknig"})
ValueError: Unknown recommender_type 'ranknig'. Valid:
('ranking', 'bandits', 'sequential', 'hierarchical_sequential',
 'uplift', 'gcsl')

# Incompatible estimator + scorer
>>> create_recommender_pipeline({..., "estimator_type": "embedding", "scorer_type": "independent"})
ValueError: scorer_type 'independent' does not support embedding estimators.
Use scorer_type='universal' with embedding estimators.
```
Factory scope¶
The tabular estimator path supports XGBoost, LightGBM, and DeepFM natively. Pick one by adding its key to estimator_config:
```python
# XGBoost (default)
create_recommender_pipeline({
    "recommender_type": "ranking",
    "scorer_type": "universal",
    "estimator_config": {"ml_task": "classification", "xgboost": {"n_estimators": 200}},
})

# LightGBM — same shape, just swap the key
create_recommender_pipeline({
    "recommender_type": "ranking",
    "scorer_type": "universal",
    "estimator_config": {"ml_task": "classification", "lightgbm": {"n_estimators": 200, "num_leaves": 63}},
})

# DeepFM — requires scikit-rec[torch]; classification only
create_recommender_pipeline({
    "recommender_type": "ranking",
    "scorer_type": "universal",
    "estimator_config": {"ml_task": "classification", "deepfm": {"embedding_dim": 16, "epochs": 10}},
})
```
Only one tabular key (xgboost, lightgbm, or deepfm) may be present in a single config — specifying more than one raises ValueError. For estimator types not covered here (custom sklearn models, etc.) you can still compose manually and pass the estimator directly to a scorer.
Next Steps¶
- HPO Guide — Add hyperparameter optimization to configs
- Training Guide — Train config-driven pipelines
- Capability Matrix — Full compatibility reference
- Production Guide — Deploy config-driven systems