
Config-Driven Orchestration

Build a complete recommender pipeline from a single configuration dictionary with create_recommender_pipeline(). Instead of instantiating the estimator, scorer, and recommender by hand, you describe them in one config. This gives MLOps pipelines, agent-driven systems, and A/B tests a single entry point.

Basic Usage

from skrec.orchestrator import create_recommender_pipeline

config = {
    "recommender_type": "ranking",
    "scorer_type": "universal",
    "estimator_config": {
        "ml_task": "classification",
        "xgboost": {
            "n_estimators": 100,
            "max_depth": 5,
            "learning_rate": 0.1,
        },
    },
}

recommender = create_recommender_pipeline(config)

recommender.train(
    interactions_ds=interactions_dataset,
    items_ds=items_dataset,
    users_ds=users_dataset,
)

recommendations = recommender.recommend(interactions=query_df, top_k=5)

Configuration Reference

Top-Level Config (RecommenderConfig)

Key Type Required Description
recommender_type str Yes "ranking", "bandits", "sequential", "hierarchical_sequential", "uplift", "gcsl"
scorer_type str Yes "universal", "independent", "multiclass", "multioutput", "sequential", "hierarchical"
estimator_config dict Yes Estimator configuration (see below)
recommender_params dict No Per-recommender parameters (see below)

Estimator Config (EstimatorConfig)

Key Type Default Description
estimator_type str "tabular" "tabular", "embedding", or "sequential"
ml_task str "classification" "classification" or "regression" (tabular only)
xgboost dict {} XGBoost hyperparameters (tabular only)
hpo dict - Hyperparameter optimization (HPO) configuration (tabular only)
weights dict - Sample/feature weighting (tabular only)
embedding dict - Embedding model config (embedding only)
sequential dict - Sequential model config (sequential only)

Embedding Config

"embedding": {
    "model_type": str,  # Required. See table below.
    "params": dict,     # Constructor kwargs passed to the estimator.
}
model_type Class Requires PyTorch
"matrix_factorization" MatrixFactorizationEstimator No (NumPy)
"ncf" NCFEstimator Yes
"two_tower" ContextualizedTwoTowerEstimator Yes
"deep_cross_network" DeepCrossNetworkEstimator Yes
"neural_factorization" NeuralFactorizationEstimator Yes

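Every model_type except matrix_factorization needs PyTorch, so a config builder may want to fail fast when it is missing. The sketch below is a hypothetical helper, not part of skrec: it copies the table above into a dict and uses importlib.util.find_spec to check whether torch is importable without actually importing it.

```python
import importlib.util
from typing import List

# model_type -> requires PyTorch? Copied from the table above.
# Hypothetical helper data for illustration; not a skrec API.
EMBEDDING_REQUIRES_TORCH = {
    "matrix_factorization": False,  # NumPy implementation
    "ncf": True,
    "two_tower": True,
    "deep_cross_network": True,
    "neural_factorization": True,
}


def torch_available() -> bool:
    """True if the `torch` package can be imported in this environment."""
    return importlib.util.find_spec("torch") is not None


def usable_embedding_models(has_torch: bool) -> List[str]:
    """Embedding model_types that can run, given whether PyTorch is installed."""
    return [
        name
        for name, needs_torch in EMBEDDING_REQUIRES_TORCH.items()
        if has_torch or not needs_torch
    ]


def embedding_estimator_config(model_type: str, **params) -> dict:
    """Build an estimator_config block for an embedding model, failing early if PyTorch is missing."""
    if model_type not in EMBEDDING_REQUIRES_TORCH:
        raise ValueError(f"Unknown embedding model_type {model_type!r}")
    if EMBEDDING_REQUIRES_TORCH[model_type] and not torch_available():
        raise ImportError(f"model_type {model_type!r} requires PyTorch")
    return {
        "estimator_type": "embedding",
        "embedding": {"model_type": model_type, "params": params},
    }
```

For example, embedding_estimator_config("matrix_factorization", n_factors=64) returns a block you could use as estimator_config, and it never needs PyTorch.
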
Sequential Config

"sequential": {
    "model_type": str,  # Required. See table below.
    "params": dict,     # Constructor kwargs passed to the estimator.
}
model_type Class Description
"sasrec_classifier" SASRecClassifierEstimator Self-attentive sequential (binary)
"sasrec_regressor" SASRecRegressorEstimator Self-attentive sequential (continuous)
"hrnn_classifier" HRNNClassifierEstimator Hierarchical RNN (binary)
"hrnn_regressor" HRNNRegressorEstimator Hierarchical RNN (continuous)

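Because the compatibility rules tie each sequential family to a specific recommender and scorer (SASRec to sequential/sequential, HRNN to hierarchical_sequential/hierarchical, as in Examples 4 and 5), a small builder can fill those in for you. sequential_pipeline_config is a hypothetical helper, not a skrec function; it assumes the family is the part of model_type before the first underscore, which holds for the four names listed.

```python
from typing import Optional

# Family -> (recommender_type, scorer_type), per the compatibility rules.
# Hypothetical helper data; not a skrec API.
SEQUENTIAL_FAMILIES = {
    "sasrec": ("sequential", "sequential"),
    "hrnn": ("hierarchical_sequential", "hierarchical"),
}

SEQUENTIAL_MODEL_TYPES = (
    "sasrec_classifier",
    "sasrec_regressor",
    "hrnn_classifier",
    "hrnn_regressor",
)


def sequential_pipeline_config(model_type: str, params: Optional[dict] = None,
                               recommender_params: Optional[dict] = None) -> dict:
    """Build a full config whose recommender/scorer types match the sequential model family."""
    if model_type not in SEQUENTIAL_MODEL_TYPES:
        raise ValueError(f"Unknown sequential model_type {model_type!r}")
    family = model_type.split("_", 1)[0]  # "sasrec" or "hrnn"
    recommender_type, scorer_type = SEQUENTIAL_FAMILIES[family]
    config = {
        "recommender_type": recommender_type,
        "scorer_type": scorer_type,
        "estimator_config": {
            "estimator_type": "sequential",
            "sequential": {"model_type": model_type, "params": dict(params or {})},
        },
    }
    if recommender_params:
        config["recommender_params"] = dict(recommender_params)
    return config
```
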
Recommender Params

Per-recommender constructor parameters. Only keys relevant to the chosen recommender_type are used.

Key Used by Type Description
max_len sequential int Maximum sequence length (default: 50)
max_sessions hierarchical_sequential int Max past sessions (default: 10)
max_session_len hierarchical_sequential int Max items per session (default: 20)
session_timeout_minutes hierarchical_sequential float Session boundary timeout (default: 30.0)
control_item_id uplift str Required. Control group item ID
mode uplift str "t_learner", "s_learner", or "x_learner". Auto-detects if omitted.
inference_method gcsl dict Goal-injection method (see below)
retriever ranking, gcsl dict Candidate retriever (see below)
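
Since keys meant for other recommender types are simply not used, a misplaced or misspelled key can go unnoticed. The hypothetical helper below splits a recommender_params dict according to the "Used by" column above so you can warn about keys that would be ignored. It is a sketch based on this table only; skrec itself may treat unknown keys differently.

```python
from typing import Dict, List, Tuple

# "Used by" column of the table above. Copied from the docs; not a skrec API.
RECOMMENDER_PARAM_USERS: Dict[str, Tuple[str, ...]] = {
    "max_len": ("sequential",),
    "max_sessions": ("hierarchical_sequential",),
    "max_session_len": ("hierarchical_sequential",),
    "session_timeout_minutes": ("hierarchical_sequential",),
    "control_item_id": ("uplift",),
    "mode": ("uplift",),
    "inference_method": ("gcsl",),
    "retriever": ("ranking", "gcsl"),
}


def split_recommender_params(recommender_type: str, params: dict) -> Tuple[dict, List[str]]:
    """Return (params this recommender_type uses, names of keys that would be ignored)."""
    used, ignored = {}, []
    for key, value in params.items():
        # Unknown keys (e.g. typos) have no users, so they also land in `ignored`.
        if recommender_type in RECOMMENDER_PARAM_USERS.get(key, ()):
            used[key] = value
        else:
            ignored.append(key)
    return used, ignored
```
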

Inference Method Config (GCSL)

"inference_method": {
    "type": str,    # "mean_scalarization", "percentile_value", "predefined_value"
    "params": dict, # Constructor kwargs
}

Retriever Config

"retriever": {
    "type": str,    # "popularity", "content_based", "embedding"
    "params": dict, # Constructor kwargs (e.g. {"top_k": 200})
}
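
Per the Recommender Params table, only ranking and gcsl recommenders use a retriever. The hypothetical helper below attaches a retriever block to an existing config without mutating the original, and rejects retriever types outside the three listed above. Which retriever works with which estimator is covered on the Capability Matrix page and is not checked here.

```python
import copy

RETRIEVER_TYPES = ("popularity", "content_based", "embedding")
RETRIEVER_RECOMMENDERS = ("ranking", "gcsl")  # "Used by" column for `retriever`


def with_retriever(config: dict, retriever_type: str, **params) -> dict:
    """Return a deep copy of `config` with recommender_params.retriever set (hypothetical helper)."""
    if retriever_type not in RETRIEVER_TYPES:
        raise ValueError(f"Unknown retriever type {retriever_type!r}")
    if config.get("recommender_type") not in RETRIEVER_RECOMMENDERS:
        raise ValueError("retriever is only used by 'ranking' and 'gcsl' recommenders")
    new_config = copy.deepcopy(config)  # leave the caller's dict untouched
    new_config.setdefault("recommender_params", {})["retriever"] = {
        "type": retriever_type,
        "params": params,
    }
    return new_config
```
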

Compatibility Rules

The factory validates these constraints at pipeline creation time and raises ValueError with a clear message if violated:

Rule Why
sequential / hierarchical_sequential recommender requires estimator_type: "sequential" Sequential models need sequence data
sequential recommender requires scorer_type: "sequential" SASRec needs SequentialScorer
hierarchical_sequential recommender requires scorer_type: "hierarchical" HRNN needs HierarchicalScorer
sequential / hierarchical scorer requires estimator_type: "sequential" Scorer delegates to SequentialEstimator
embedding estimator only works with scorer_type: "universal" IndependentScorer/MulticlassScorer/MultioutputScorer reject embedding estimators
uplift recommender requires scorer_type: "independent" or "universal" UpliftRecommender needs T-Learner or S-Learner compatible scorer
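
The factory already enforces these rules. If you want the same check where skrec is not installed (for example, validating agent-generated configs or config files in CI), a standalone sketch can mirror the table. It is hypothetical, not skrec's validator: it hard-codes the type tuples listed under Programmatic Capability Introspection, treats an omitted estimator_type as "tabular" (the documented default), and collects every violation instead of stopping at the first. With skrec available, import RECOMMENDER_TYPES, SCORER_TYPES, and ESTIMATOR_TYPES from skrec.orchestrator instead of copying them.

```python
from typing import List

# Copied from the documented tuples; import them from skrec.orchestrator when available.
RECOMMENDER_TYPES = ("ranking", "bandits", "sequential", "hierarchical_sequential", "uplift", "gcsl")
SCORER_TYPES = ("universal", "independent", "multiclass", "multioutput", "sequential", "hierarchical")
ESTIMATOR_TYPES = ("tabular", "embedding", "sequential")


def compatibility_errors(config: dict) -> List[str]:
    """Check a config against the documented compatibility rules; return all violations."""
    errors = []
    for key in ("recommender_type", "scorer_type", "estimator_config"):
        if key not in config:
            errors.append(f"missing required key {key!r}")
    if errors:
        return errors

    rec = config["recommender_type"]
    scorer = config["scorer_type"]
    est = config["estimator_config"].get("estimator_type", "tabular")  # documented default

    if rec not in RECOMMENDER_TYPES:
        errors.append(f"unknown recommender_type {rec!r}")
    if scorer not in SCORER_TYPES:
        errors.append(f"unknown scorer_type {scorer!r}")
    if est not in ESTIMATOR_TYPES:
        errors.append(f"unknown estimator_type {est!r}")

    # One check per row of the compatibility table.
    if rec in ("sequential", "hierarchical_sequential") and est != "sequential":
        errors.append(f"{rec} recommender requires estimator_type 'sequential'")
    if rec == "sequential" and scorer != "sequential":
        errors.append("sequential recommender requires scorer_type 'sequential'")
    if rec == "hierarchical_sequential" and scorer != "hierarchical":
        errors.append("hierarchical_sequential recommender requires scorer_type 'hierarchical'")
    if scorer in ("sequential", "hierarchical") and est != "sequential":
        errors.append(f"{scorer} scorer requires estimator_type 'sequential'")
    if est == "embedding" and scorer != "universal":
        errors.append("embedding estimators only work with scorer_type 'universal'")
    if rec == "uplift" and scorer not in ("independent", "universal"):
        errors.append("uplift recommender requires scorer_type 'independent' or 'universal'")
    return errors
```
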

Programmatic Capability Introspection

For tooling that needs to enumerate what the factory understands (system-prompt builders, config validators, UI pickers, etc.), skrec.orchestrator exposes three tuples of valid type names plus a capability_matrix() accessor. The factory validates against these same values internally, so they always stay in sync with it.

from skrec.orchestrator import (
    RECOMMENDER_TYPES,
    SCORER_TYPES,
    ESTIMATOR_TYPES,
    capability_matrix,
)

RECOMMENDER_TYPES  # ('ranking', 'bandits', 'sequential', 'hierarchical_sequential', 'uplift', 'gcsl')
SCORER_TYPES       # ('universal', 'independent', 'multiclass', 'multioutput', 'sequential', 'hierarchical')
ESTIMATOR_TYPES    # ('tabular', 'embedding', 'sequential')

capability_matrix()
# {
#     "recommender_types": (...),
#     "scorer_types": (...),
#     "estimator_types": (...),
#     "embedding_model_types": ("matrix_factorization", "ncf", "two_tower", ...),
#     "sequential_model_types": ("sasrec_classifier", "sasrec_regressor", ...),
#     "inference_method_types": ("mean_scalarization", "percentile_value", "predefined_value"),
#     "retriever_types": ("popularity", "content_based", "embedding"),
# }

Evaluator and metric enums are not included in capability_matrix() — those already live as proper Enum subclasses in skrec.evaluator.datatypes.RecommenderEvaluatorType and skrec.metrics.datatypes.RecommenderMetricType and are enumerable directly (e.g. list(RecommenderEvaluatorType)).

The full compatibility reference (which scorer works with which estimator, retriever constraints, etc.) is on the Capability Matrix page.
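
As one example of such tooling, the hypothetical function below renders a capability mapping as bullet lines for a system prompt or help text. It assumes capability_matrix() returns a dict of string tuples, as shown above; because it accepts any mapping, it can be exercised without skrec installed.

```python
from typing import Mapping, Sequence


def describe_capabilities(matrix: Mapping[str, Sequence[str]]) -> str:
    """Render one bullet line per category, e.g. '- scorer types: universal, independent'."""
    lines = []
    for category, values in sorted(matrix.items()):  # stable, alphabetical order
        label = category.replace("_", " ")
        lines.append(f"- {label}: {', '.join(values)}")
    return "\n".join(lines)


# With skrec installed:
# from skrec.orchestrator import capability_matrix
# print(describe_capabilities(capability_matrix()))
```
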

Complete Examples

1. Tabular: XGBoost + Universal + Ranking

The simplest pipeline — a pointwise XGBoost ranker.

recommender_type: ranking
scorer_type: universal
estimator_config:
  ml_task: classification
  xgboost:
    n_estimators: 200
    max_depth: 6
    learning_rate: 0.1
    subsample: 0.8
    colsample_bytree: 0.8

2. Embedding: Matrix Factorization + Universal + Ranking

Collaborative filtering via learned user/item embeddings.

recommender_type: ranking
scorer_type: universal
estimator_config:
  estimator_type: embedding
  embedding:
    model_type: matrix_factorization
    params:
      n_factors: 64
      algorithm: als
      epochs: 30

3. Embedding: NCF + Universal + Ranking with Retriever

Neural collaborative filtering with a two-stage retrieval pipeline.

recommender_type: ranking
scorer_type: universal
estimator_config:
  estimator_type: embedding
  embedding:
    model_type: ncf
    params:
      ncf_type: neumf
      gmf_embedding_dim: 32
      mlp_embedding_dim: 32
      epochs: 20
recommender_params:
  retriever:
    type: embedding
    params:
      top_k: 200

4. Sequential: SASRec

Self-attentive sequential recommendation from interaction history.

recommender_type: sequential
scorer_type: sequential
estimator_config:
  estimator_type: sequential
  sequential:
    model_type: sasrec_classifier
    params:
      hidden_units: 64
      num_blocks: 2
      num_heads: 2
      max_len: 50
      epochs: 100
recommender_params:
  max_len: 50

5. Sequential: HRNN (Hierarchical)

Session-aware recommendation with hierarchical GRU.

recommender_type: hierarchical_sequential
scorer_type: hierarchical
estimator_config:
  estimator_type: sequential
  sequential:
    model_type: hrnn_classifier
    params:
      hidden_units: 64
      max_sessions: 10
      max_session_len: 20
      epochs: 100
recommender_params:
  max_sessions: 10
  max_session_len: 20
  session_timeout_minutes: 30.0

6. Uplift: T-Learner

Causal treatment effect estimation with per-treatment models.

recommender_type: uplift
scorer_type: independent
estimator_config:
  ml_task: classification
  xgboost:
    n_estimators: 200
recommender_params:
  control_item_id: "control_arm"
  mode: t_learner

7. GCSL: Goal-Conditioned Supervised Learning

Multi-objective recommendation with goal injection at inference time.

recommender_type: gcsl
scorer_type: universal
estimator_config:
  ml_task: classification
  xgboost:
    n_estimators: 200
recommender_params:
  inference_method:
    type: percentile_value
    params:
      percentiles:
        OUTCOME_revenue: 80
        OUTCOME_clicks: 75
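
This page does not spell out how percentile_value turns these numbers into goals. One plausible reading, sketched here purely as an assumption, is that each outcome column's goal is the given percentile of that column's training values, and that goal is written into every inference row so the model scores items as if that outcome were targeted. The code uses NumPy's np.percentile (linear interpolation by default); percentile_goals and inject_goals are hypothetical names, not skrec functions.

```python
from typing import Dict, List, Mapping, Sequence

import numpy as np


def percentile_goals(outcomes: Mapping[str, Sequence[float]],
                     percentiles: Mapping[str, float]) -> Dict[str, float]:
    """Goal per outcome column = requested percentile of its training values (assumed behaviour)."""
    return {
        column: float(np.percentile(np.asarray(outcomes[column], dtype=float), q))
        for column, q in percentiles.items()
    }


def inject_goals(rows: List[dict], goals: Mapping[str, float]) -> List[dict]:
    """Copy each inference row and set its outcome columns to the goal values."""
    return [{**row, **goals} for row in rows]


train_outcomes = {
    "OUTCOME_revenue": [0.0, 10.0, 20.0, 30.0, 40.0],
    "OUTCOME_clicks": [0, 1, 2, 3, 4],
}
goals = percentile_goals(train_outcomes, {"OUTCOME_revenue": 80, "OUTCOME_clicks": 75})
```

With this toy data the revenue goal is roughly 32.0 (the 80th percentile of 0 to 40 under linear interpolation) and the clicks goal is 3.0.
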

8. Contextual Bandits

Exploration/exploitation via epsilon-greedy or static action strategies.

recommender_type: bandits
scorer_type: universal
estimator_config:
  ml_task: classification
  xgboost:
    n_estimators: 100

Note: Bandit strategies are set after pipeline creation via recommender.set_strategy().
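
This page only says that strategies are attached afterwards with recommender.set_strategy(); the strategy classes and their arguments are not documented here. For intuition about the epsilon-greedy option, here is a generic, library-independent sketch (not skrec's implementation) that chooses among already-scored items: with probability epsilon it explores a uniformly random item, otherwise it exploits the highest score.

```python
import random
from typing import Dict, Optional


def epsilon_greedy(scores: Dict[str, float], epsilon: float,
                   rng: Optional[random.Random] = None) -> str:
    """Pick the top-scoring item with probability 1 - epsilon, else a uniformly random item."""
    if not scores:
        raise ValueError("scores must not be empty")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    rng = rng or random.Random()
    if rng.random() < epsilon:
        return rng.choice(sorted(scores))  # explore: any item, uniformly
    return max(scores, key=scores.get)  # exploit: highest score
```

Note that exploration can also land on the best item, so with three items and epsilon = 0.1 the top item is chosen about 93% of the time.
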

9. Tabular with HPO

Hyperparameter optimization using grid search or randomized search.

recommender_type: ranking
scorer_type: universal
estimator_config:
  ml_task: classification
  xgboost:
    objective: binary:logistic
    n_jobs: 1
  hpo:
    hpo_method: grid_search_cv
    param_space:
      n_estimators: [50, 100, 200]
      learning_rate: [0.01, 0.05, 0.1]
      max_depth: [3, 5, 7]
    optimizer_params:
      cv: 3
      scoring: roc_auc
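
Exhaustive grid search evaluates every combination in param_space on every fold, so its cost grows multiplicatively. Assuming grid_search_cv behaves like scikit-learn's GridSearchCV (the name suggests it, but that is an assumption), the config above means 3 × 3 × 3 = 27 candidates × cv=3 = 81 fits, plus one final refit if the best model is refit on all data. A quick estimate before launching:

```python
import itertools
import math


def grid_fit_count(param_space: dict, cv: int) -> int:
    """Number of model fits for an exhaustive grid search with `cv` folds (final refit excluded)."""
    n_candidates = math.prod(len(values) for values in param_space.values())
    return n_candidates * cv


param_space = {
    "n_estimators": [50, 100, 200],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 5, 7],
}
print(grid_fit_count(param_space, cv=3))  # 81

# The same 27 candidates, enumerated explicitly.
candidates = [dict(zip(param_space, combo)) for combo in itertools.product(*param_space.values())]
print(len(candidates))  # 27
```
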

10. Tabular with Sample Weighting

Feature-level and item-level weighting for imbalanced datasets.

recommender_type: ranking
scorer_type: universal
estimator_config:
  ml_task: classification
  xgboost:
    n_estimators: 100
    colsample_bynode: 0.9
  weights:
    action_weight: 0.8
    item_sample_weights:
      itemA: 1.2
      itemB: 0.5
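
This page does not spell out how action_weight and item_sample_weights are combined. As an illustration of item-level weighting only, the hypothetical helper below turns item_sample_weights into one weight per interaction row, assuming items missing from the mapping keep a neutral weight of 1.0. action_weight is left out because its exact semantics are not described here.

```python
from typing import Dict, List, Sequence


def row_sample_weights(item_ids: Sequence[str], item_sample_weights: Dict[str, float],
                       default: float = 1.0) -> List[float]:
    """Map each interaction row's item to its weight; unlisted items get `default` (assumed 1.0)."""
    return [item_sample_weights.get(item, default) for item in item_ids]


weights = row_sample_weights(["itemA", "itemB", "itemC"], {"itemA": 1.2, "itemB": 0.5})
```

A per-row list like this is the shape that XGBoost's scikit-learn estimators accept through the sample_weight argument of fit(); whether skrec passes weights that way is not stated on this page.
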

Loading Config from Files

YAML

import yaml
from skrec.orchestrator import create_recommender_pipeline

with open("config.yaml") as f:
    config = yaml.safe_load(f)

recommender = create_recommender_pipeline(config)

JSON

import json
from skrec.orchestrator import create_recommender_pipeline

with open("config.json") as f:
    config = json.load(f)

recommender = create_recommender_pipeline(config)
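
If configs arrive in either format, a small loader can dispatch on the file extension. load_config is a hypothetical helper, not part of skrec; it imports PyYAML lazily so JSON-only environments do not need it, and it rejects files whose top level is not a mapping.

```python
import json
from pathlib import Path


def load_config(path: str) -> dict:
    """Load a pipeline config from a .json, .yaml, or .yml file."""
    p = Path(path)
    suffix = p.suffix.lower()
    with p.open() as f:
        if suffix == ".json":
            config = json.load(f)
        elif suffix in (".yaml", ".yml"):
            import yaml  # PyYAML, only needed for YAML files

            config = yaml.safe_load(f)
        else:
            raise ValueError(f"Unsupported config extension {suffix!r}")
    if not isinstance(config, dict):
        raise ValueError(f"{path} must contain a mapping at the top level")
    return config


# recommender = create_recommender_pipeline(load_config("config.yaml"))
```
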

Use Cases

Kubeflow Pipelines

from kfp import dsl

@dsl.component
def train_recommender(config_path: str, data_path: str):
    import yaml
    from skrec.orchestrator import create_recommender_pipeline

    with open(config_path) as f:
        config = yaml.safe_load(f)

    recommender = create_recommender_pipeline(config)
    recommender.train(...)

A/B Testing

configs = {
    "variant_a": {
        "recommender_type": "ranking",
        "scorer_type": "universal",
        "estimator_config": {
            "ml_task": "classification",
            "xgboost": {"n_estimators": 100, "max_depth": 5},
        },
    },
    "variant_b": {
        "recommender_type": "ranking",
        "scorer_type": "universal",
        "estimator_config": {
            "estimator_type": "embedding",
            "embedding": {
                "model_type": "ncf",
                "params": {"gmf_embedding_dim": 32, "mlp_embedding_dim": 32},
            },
        },
    },
}

for name, config in configs.items():
    recommender = create_recommender_pipeline(config)
    recommender.train(...)
    metrics = recommender.evaluate(...)
    print(f"{name}: {metrics}")
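
The loop above compares variants offline. Serving them as a live A/B test also needs a stable assignment of users to variants. A common approach, shown here as a generic sketch that is not part of skrec, is to hash the user ID together with an experiment-specific salt; the salt value below is a hypothetical placeholder.

```python
import hashlib
from typing import Sequence


def assign_variant(user_id: str, variants: Sequence[str], salt: str = "ab-test-1") -> str:
    """Deterministically map a user to one of `variants` with roughly equal traffic."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % len(variants)  # first 32 bits are plenty for bucketing
    return variants[bucket]


# variant = assign_variant(user_id, list(configs))  # e.g. "variant_a" or "variant_b"
```

The same user always lands in the same variant; changing the salt reshuffles everyone, so keep it fixed for the lifetime of an experiment.
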

Environment-Specific Configs

import os

env = os.getenv("ENVIRONMENT", "dev")

configs = {
    "dev": {
        "recommender_type": "ranking",
        "scorer_type": "universal",
        "estimator_config": {
            "ml_task": "classification",
            "xgboost": {"n_estimators": 50},
        },
    },
    "prod": {
        "recommender_type": "ranking",
        "scorer_type": "universal",
        "estimator_config": {
            "ml_task": "classification",
            "xgboost": {"n_estimators": 500, "max_depth": 8},
        },
    },
}

recommender = create_recommender_pipeline(configs[env])
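
The dev and prod configs above differ only in their XGBoost settings. To avoid repeating the shared keys, you could keep one base config and deep-merge per-environment overrides onto it. deep_merge and config_for are small hypothetical helpers, not skrec functions.

```python
import copy


def deep_merge(base: dict, overrides: dict) -> dict:
    """Recursively merge `overrides` into a copy of `base`; override values win."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


BASE = {
    "recommender_type": "ranking",
    "scorer_type": "universal",
    "estimator_config": {"ml_task": "classification", "xgboost": {"n_estimators": 50}},
}
OVERRIDES = {
    "dev": {},
    "prod": {"estimator_config": {"xgboost": {"n_estimators": 500, "max_depth": 8}}},
}


def config_for(env: str) -> dict:
    """Base config with the overrides for `env` applied; unknown environments are an error."""
    if env not in OVERRIDES:
        raise ValueError(f"Unknown environment {env!r}; expected one of {sorted(OVERRIDES)}")
    return deep_merge(BASE, OVERRIDES[env])


# recommender = create_recommender_pipeline(config_for(os.getenv("ENVIRONMENT", "dev")))
```
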

Error Handling

The factory raises clear errors for invalid configurations:

# Missing required key
>>> create_recommender_pipeline({"scorer_type": "universal"})
ValueError: 'recommender_type' must be specified in the configuration.

# Typo in recommender_type
>>> create_recommender_pipeline({..., "recommender_type": "ranknig"})
ValueError: Unknown recommender_type 'ranknig'. Valid:
    ('ranking', 'bandits', 'sequential', 'hierarchical_sequential',
     'uplift', 'gcsl')

# Incompatible estimator + scorer
>>> create_recommender_pipeline({..., "scorer_type": "independent", "estimator_config": {"estimator_type": "embedding", ...}})
ValueError: scorer_type 'independent' does not support embedding estimators.
    Use scorer_type='universal' with embedding estimators.
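
For typos like 'ranknig', the factory's error lists the valid values. If you build configs from user or agent input, you may also want a "did you mean" hint before calling the factory. The standard library's difflib.get_close_matches can supply one; the helpers below are hypothetical and hard-code the documented tuple (with skrec installed, import RECOMMENDER_TYPES from skrec.orchestrator instead).

```python
import difflib
from typing import Optional, Sequence

RECOMMENDER_TYPES = ("ranking", "bandits", "sequential", "hierarchical_sequential", "uplift", "gcsl")


def suggest(value: str, valid: Sequence[str]) -> Optional[str]:
    """Closest valid value for a likely typo, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(value, valid, n=1, cutoff=0.6)
    return matches[0] if matches else None


def check_choice(field: str, value: str, valid: Sequence[str]) -> None:
    """Raise ValueError with a 'did you mean' hint when `value` is not in `valid`."""
    if value in valid:
        return
    hint = suggest(value, valid)
    message = f"Unknown {field} {value!r}."
    if hint is not None:
        message += f" Did you mean {hint!r}?"
    raise ValueError(message)
```
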

Next Steps