Inference Guide¶
This guide covers how to make recommendations after training your models.
Basic Inference¶
All recommenders follow the same recommendation interface:
Parameters:
- interactions (optional): DataFrame containing interaction context features and USER_ID
- users (optional): DataFrame containing user features and USER_ID
  - For most recommenders, this provides user-level features
  - When using a BaseEmbeddingEstimator subclass with UniversalScorer for real-time inference, this DataFrame should contain USER_ID and USER_EMBEDDING_NAME (pre-computed embeddings), and optionally other user features
- top_k: Number of items to recommend per user
Output: NumPy array of shape (n_users, top_k) with recommended item IDs.
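As a sketch of that interface, the input frames might look like this (the feature columns below are made up for illustration; your schema will differ):

```python
import pandas as pd

# Hypothetical feature columns for illustration only.
interactions_df = pd.DataFrame({
    "USER_ID": ["u1", "u2"],
    "time_of_day": ["morning", "evening"],  # interaction context feature
})
users_df = pd.DataFrame({
    "USER_ID": ["u1", "u2"],
    "age": [25, 34],  # user-level feature
})

# recommender.recommend(interactions=interactions_df, users=users_df, top_k=5)
# would return an array of shape (n_users, top_k), i.e. (2, 5) here.
expected_shape = (len(users_df), 5)
```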
Complete Example¶
See the Quick Start Tutorial for a complete walkthrough.
Inference Patterns¶
1. Deterministic Ranking (Default)¶
# Returns top-k items by score (deterministic)
recommendations = recommender.recommend(
    interactions=interactions_df,
    users=users_df,
    top_k=5
)
2. Probabilistic Sampling¶
# Sample from score distribution (exploration)
recommendations = recommender.recommend(
    interactions=interactions_df,
    users=users_df,
    top_k=5,
    sampling_temperature=0.3,  # Lower = more deterministic
    replace=False  # Sample without replacement
)
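Conceptually, temperature sampling turns scores into a softmax distribution and samples items from it; lower temperatures concentrate probability mass on the top-scored items. A standalone sketch of the idea (not the library's internal implementation):

```python
import numpy as np

def sample_top_k(scores, k, temperature=0.3, rng=None):
    # Lower temperature sharpens the distribution toward high scores
    if rng is None:
        rng = np.random.default_rng(0)
    logits = np.asarray(scores, dtype=float) / temperature
    probs = np.exp(logits - logits.max())  # numerically stable softmax
    probs /= probs.sum()
    # Draw k distinct item indices, weighted by probability
    return rng.choice(len(scores), size=k, replace=False, p=probs)

picks = sample_top_k([0.9, 0.5, 0.2, 0.1], k=2)
```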
Learn more: RankingRecommender Guide
3. Item Subsetting¶
# Recommend from specific item subset
recommendations = recommender.recommend(
    interactions=interactions_df,
    users=users_df,
    top_k=5,
    item_subset=["item_A", "item_B", "item_C"]
)
4. Getting Scores¶
# Get underlying scores for all items
scores = recommender.score_items(
    interactions=interactions_df,
    users=users_df
)
# Returns: DataFrame with rows=users, columns=items, values=scores
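Since the result is a users-by-items frame, you can derive a top-k ranking from it directly. A small sketch with made-up scores:

```python
import pandas as pd

# Pretend this is what score_items() returned (rows=users, columns=items)
scores = pd.DataFrame(
    {"item_A": [0.9, 0.1], "item_B": [0.5, 0.8], "item_C": [0.2, 0.7]},
    index=["u1", "u2"],
)
# Top-2 items per user, highest score first
top2 = scores.apply(lambda row: row.nlargest(2).index.tolist(), axis=1)
# top2["u1"] -> ["item_A", "item_B"]
```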
Real-Time Inference Optimization¶
1. Single-User Mode (recommend_online and score_fast)¶
For real-time APIs with one user at a time, use recommend_online() — it skips the
pandas join entirely, applies schema validation, and returns ranked item names directly:
# Preferred: handles schema + feature merging + ranking in one call
recommendations = recommender.recommend_online(
    interactions=single_user_interactions_df,
    users=single_user_df,
    top_k=5,
)
For scoring only (without ranking), call scorer.score_fast() directly with a
pre-merged single-row DataFrame (no USER_ID):
# Pre-merge interactions and user features yourself
features_df = pd.DataFrame({"feat1": [18], "feat2": [0]}) # no USER_ID
scores_df = recommender.scorer.score_fast(features_df)
# Returns: DataFrame with item names as columns
Supported scorers: UniversalScorer, MulticlassScorer, MultioutputScorer,
and IndependentScorer. Not supported for embedding-based estimators
(NCF, Two-Tower, DeepFM) — use score_items() for those.
2. Parallel Inference (Independent Scorer)¶
from skrec.scorer.independent import IndependentScorer
scorer = IndependentScorer(estimator)
scorer.set_parallel_inference(parallel_inference_status=True, num_cores=4)
recommender = RankingRecommender(scorer)
# Inference now parallelized across items
Batch vs Real-Time¶
Batch Inference¶
# Process many users at once
large_interactions_df = pd.DataFrame({"USER_ID": user_ids}) # 1000s of users
large_users_df = pd.DataFrame({"USER_ID": user_ids, ...})
# Single batch call
all_recommendations = recommender.recommend(
    interactions=large_interactions_df,
    users=large_users_df,
    top_k=5
)
Real-Time Inference¶
# Process one user at a time (API endpoint) — no join overhead
def get_recommendations_for_user(user_id, user_features):
    interactions_df = pd.DataFrame({"USER_ID": [user_id]})
    users_df = pd.DataFrame({"USER_ID": [user_id], **user_features})
    recommendations = recommender.recommend_online(
        interactions=interactions_df,
        users=users_df,
        top_k=5,
    )
    return recommendations.tolist()
Inference by Recommender Type¶
RankingRecommender¶
# Standard deterministic or sampled recommendations
recommendations = recommender.recommend(interactions_df, users_df, top_k=5)
Learn more: RankingRecommender Guide
ContextualBanditsRecommender¶
# Recommendations with exploration
recommendations = recommender.recommend(interactions_df, users_df, top_k=5)
# Check which were exploratory
flags = recommender.get_latest_strategy_flags() # 0=exploit, 1=explore
Learn more: ContextualBanditsRecommender Guide
Embedding-Based Inference (Real-Time with Pre-Computed Embeddings)¶
When using BaseEmbeddingEstimator subclasses (e.g., NeuralFactorizationEstimator, ContextualizedTwoTowerEstimator) with UniversalScorer, you can leverage pre-computed user embeddings for efficient real-time inference:
from skrec.constants import USER_EMBEDDING_NAME
import numpy as np
# Assume embeddings are pre-computed and stored externally (e.g., in Redis)
user_embeddings = get_user_embeddings_from_store(user_ids) # Shape: (n_users, embedding_dim)
# Create users DataFrame with pre-computed embeddings
users_df = pd.DataFrame({
    "USER_ID": user_ids,
    USER_EMBEDDING_NAME: list(user_embeddings),  # List of numpy arrays
    # Optionally include other user features if model uses them
    "age": user_ages,
    "income": user_incomes
})
interactions_df = pd.DataFrame({
    "USER_ID": user_ids,
    # Include any interaction context features
    "time_of_day": ["morning", "evening"]
})
# Real-time inference with pre-computed embeddings
recommendations = recommender.recommend(
    interactions=interactions_df,
    users=users_df,  # Contains pre-computed embeddings
    top_k=5
)
Workflow:
- Training: Train the embedding estimator on historical data
- Extract Embeddings: Use estimator.get_user_embeddings() to extract learned embeddings
- Store Embeddings: Save embeddings to an external store (Redis, database, etc.)
- Real-Time Inference: Fetch embeddings and pass them in the users DataFrame with a USER_EMBEDDING_NAME column
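The store-and-fetch half of this workflow can be sketched with a plain dict standing in for Redis. The random vectors and the literal column name "USER_EMBEDDING" below are illustrative stand-ins: real embeddings come from estimator.get_user_embeddings(), and the real column name comes from the USER_EMBEDDING_NAME constant.

```python
import numpy as np
import pandas as pd

# Plain dict as a stand-in for an external embedding store (e.g., Redis)
embedding_store = {}

def store_embeddings(user_ids, embeddings):
    for uid, emb in zip(user_ids, embeddings):
        embedding_store[uid] = emb

def fetch_embeddings(user_ids):
    return [embedding_store[uid] for uid in user_ids]

user_ids = ["u1", "u2"]
# Stand-in for estimator.get_user_embeddings() output
embeddings = np.random.rand(len(user_ids), 16)
store_embeddings(user_ids, embeddings)

# At request time: fetch embeddings and build the users DataFrame
users_df = pd.DataFrame({
    "USER_ID": user_ids,
    "USER_EMBEDDING": fetch_embeddings(user_ids),  # placeholder column name
})
```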
Benefits:
- Faster inference: No need to re-compute user embeddings at inference time
- Scalability: User embeddings can be managed separately (cached, updated periodically)
- Flexibility: Works with externally managed embedding stores
Learn more: Embedding Estimators Guide
Performance Tips¶
1. Pre-filter Items¶
# Apply business rules before scoring
valid_items = get_in_stock_items()
recommender.set_item_subset(valid_items)
recommendations = recommender.recommend(...)
2. Batch Requests¶
# Process multiple users together (more efficient)
# Good: 100 users in one call
recommendations = recommender.recommend(large_batch_df, ...)
# Bad: 100 separate calls
for user in users:
    recommendations = recommender.recommend(single_user_df, ...)
3. Use Lighter Models¶
# For real-time: LogisticRegression or LightGBM
estimator = LightGBMClassifierEstimator({"n_estimators": 50}) # Faster
# For batch: XGBoost or DeepFM
estimator = XGBClassifierEstimator({"n_estimators": 200}) # Better quality
4. Cache Scores¶
# Cache scores if user context doesn't change frequently
scores_cache = {}
def get_recommendations(user_id):
    if user_id not in scores_cache:
        scores_cache[user_id] = recommender.score_items(...)
    # score_items returns a one-row DataFrame (rows=users, columns=items),
    # so rank that row as a Series
    return scores_cache[user_id].iloc[0].nlargest(5)
Common Issues¶
Issue: Inference is slow¶
Solution:
- Use recommend_online() for single-user requests (no join overhead)
- Enable parallel inference for Independent scorer
- Use lighter models (LightGBM, LogisticRegression)
- Batch requests when possible
Issue: Out of memory for large batches¶
Solution:
- Process in smaller chunks
- Reduce model complexity
- Use sparse features
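Chunked processing can be as simple as a wrapper like the one below (recommend_in_chunks and dummy_recommend are illustrative helpers, not library functions):

```python
import numpy as np
import pandas as pd

def recommend_in_chunks(recommend_fn, interactions_df, users_df, top_k, chunk_size=1000):
    # Bound peak memory by scoring at most chunk_size users at a time
    results = []
    for start in range(0, len(users_df), chunk_size):
        u_chunk = users_df.iloc[start:start + chunk_size]
        i_chunk = interactions_df[interactions_df["USER_ID"].isin(u_chunk["USER_ID"])]
        results.append(recommend_fn(interactions=i_chunk, users=u_chunk, top_k=top_k))
    return np.vstack(results)

# Demo with a dummy scoring function in place of recommender.recommend
def dummy_recommend(interactions, users, top_k):
    return np.tile(np.arange(top_k), (len(users), 1))

users = pd.DataFrame({"USER_ID": range(2500)})
inter = pd.DataFrame({"USER_ID": range(2500)})
recs = recommend_in_chunks(dummy_recommend, inter, users, top_k=5)
# recs.shape == (2500, 5)
```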
Issue: Recommendations don't match expectations¶
Solution:
- Check score_items() to debug scores
- Verify input data format and features
- Check if item_subset is set correctly
Next Steps¶
- Evaluation Guide - Evaluate recommendation quality
- Production Guide - Deploy to production
- Recommender Types - Explore different recommenders