Model Inference Pipeline

Use this tutorial if you work with machine learning model inference results and want to understand the complete workflow, from finding scenarios to accessing detailed forecast data. It covers three endpoints that work together to provide access to model inference data.

The inference pipeline involves:

  1. Finding scenarios - Use the scenarios API to discover available inference scenarios
  2. Checking inference results - Use the model inference API to see what inference has been completed
  3. Accessing detailed data - Get the actual forecast data with all percentiles and timestamps
Note: These endpoints require a valid API token. No special admin permissions are needed.

Prerequisites

  1. Access to the Enertel platform
  2. API token (create one from the user menu in the top right)
  3. Knowledge of target IDs you want to work with

Setup

import pandas as pd
import requests
from datetime import datetime, timedelta

# Your API token
token = '<your-api-token>'
base_url = "https://app.enertel.ai/api"

# Common headers for all requests
headers = {"Authorization": f"Bearer {token}"}

Step 1: Find Available Scenarios

First, let's find inference scenarios for your targets. Scenarios represent specific time periods and conditions under which models can run inference.

# Define your search parameters
target_id = 123
start_date = "2024-10-01T00:00:00Z"
end_date = "2024-10-05T00:00:00Z"

# Get scenarios
scenarios_url = f"{base_url}/scenarios"
scenarios_params = {
    "target_id": target_id,
    "start": start_date,
    "end": end_date,
    "limit": 50
}

scenarios_response = requests.get(
    scenarios_url,
    headers=headers,
    params=scenarios_params
)

scenarios_data = scenarios_response.json()
print(f"Found {len(scenarios_data)} scenarios")

# Convert to DataFrame for easier analysis
scenarios_df = pd.DataFrame(scenarios_data)
print(scenarios_df[['id', 'target_id', 'range_start', 'range_end', 'series_name']].head())

Understanding Scenario Data

Each scenario includes:

  • id: Unique identifier for the scenario
  • target_id: The target this scenario belongs to
  • range_start/range_end: Time period covered by the scenario
  • scheduled_at: When the scenario was scheduled to run
  • series_name: Name of the data series (e.g., "DALMP")
  • key: Object storage key for the scenario data
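Once the scenario list is in a DataFrame, it is easy to narrow down to the scenarios you care about. The sketch below filters by `series_name` and sorts by coverage period; the records are hypothetical examples built from the fields listed above, not real API output.

```python
import pandas as pd

# Hypothetical scenario records with the fields described above
scenarios_data = [
    {"id": 1, "target_id": 123, "range_start": "2024-10-01T00:00:00Z",
     "range_end": "2024-10-02T00:00:00Z", "series_name": "DALMP"},
    {"id": 2, "target_id": 123, "range_start": "2024-10-02T00:00:00Z",
     "range_end": "2024-10-03T00:00:00Z", "series_name": "DALMP"},
    {"id": 3, "target_id": 123, "range_start": "2024-10-01T00:00:00Z",
     "range_end": "2024-10-02T00:00:00Z", "series_name": "RTLMP"},
]

scenarios_df = pd.DataFrame(scenarios_data)

# Keep one series and order scenarios by the period they cover
dalmp = scenarios_df[scenarios_df["series_name"] == "DALMP"].sort_values("range_start")
print(dalmp[["id", "range_start", "range_end"]])
```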

Step 2: Check Existing Inference Results

Now let's see what inference results already exist for these scenarios and specific models.

# Extract scenario IDs from our results
scenario_ids = [str(s['id']) for s in scenarios_data[:10]] # First 10 scenarios
model_ids = ["100", "101", "102"] # Replace with your model IDs

# Get inference results
inference_url = f"{base_url}/model/inference"
inference_params = {
    "scenarios": ",".join(scenario_ids),
    "models": ",".join(model_ids),
    "max_results": 1000
}

inference_response = requests.get(
    inference_url,
    headers=headers,
    params=inference_params
)

inference_results = inference_response.json()
print(f"Found {len(inference_results)} inference results")

# Convert to DataFrame
inference_df = pd.DataFrame(inference_results)
if not inference_df.empty:
    print(inference_df[['id', 'model_id', 'scenario_id', 'model_integration', 'scenario_range_start']].head())

Understanding Inference Results

Each inference result includes:

  • id: Unique identifier for the inference result
  • model_id: ID of the model that ran the inference
  • scenario_id: ID of the scenario used for inference
  • file_key: Object storage key for the detailed results
  • model_integration: Type of model integration (e.g., "mlflow")
  • scenario_range_start/scenario_range_end: Time period of the scenario
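A common question at this stage is which (model, scenario) pairs have completed inference and which are still missing. A pivot table over the fields above makes the gaps visible; the records here are hypothetical examples, not real API output.

```python
import pandas as pd

# Hypothetical inference results with the fields described above
inference_results = [
    {"id": 10, "model_id": 100, "scenario_id": 1, "model_integration": "mlflow"},
    {"id": 11, "model_id": 101, "scenario_id": 1, "model_integration": "mlflow"},
    {"id": 12, "model_id": 100, "scenario_id": 2, "model_integration": "mlflow"},
]
inference_df = pd.DataFrame(inference_results)

# Count results per (scenario, model); zeros mark missing inference runs
coverage = inference_df.pivot_table(index="scenario_id", columns="model_id",
                                    values="id", aggfunc="count").fillna(0)
print(coverage)
```

Here scenario 2 shows a zero for model 101, so that combination either has not run yet or failed.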

Step 3: Access Detailed Inference Data

Finally, let's get the actual forecast data from a specific inference result.

# Pick an inference result to examine in detail
if not inference_df.empty:
    inference_id = inference_df.iloc[0]['id']

    # Get detailed forecast data
    detail_url = f"{base_url}/model/inference/{inference_id}"
    detail_response = requests.get(detail_url, headers=headers)

    forecast_data = detail_response.json()
    print(f"Retrieved {len(forecast_data)} forecast data points")

    # Convert to DataFrame
    forecast_df = pd.DataFrame(forecast_data)
    print(forecast_df.head())

    # Show the structure of probabilistic forecasts
    percentile_cols = [col for col in forecast_df.columns if col.startswith('p')]
    print(f"Available percentiles: {percentile_cols}")

Understanding Forecast Data

Each forecast data point includes:

  • feature_id: ID of the feature being forecasted
  • timestamp: Forecast timestamp in the target's timezone
  • value: Expected value (average)
  • p01 through p99: All percentile forecasts (1st through 99th percentile)
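The percentile fields make simple uncertainty summaries easy to compute per data point. This sketch uses a single hypothetical forecast point with the fields above; the values are illustrative only.

```python
# Hypothetical single forecast point with the fields described above
point = {"feature_id": 7, "timestamp": "2024-10-01T14:00:00", "value": 42.5,
         "p10": 30.0, "p50": 41.0, "p90": 55.0}

# Width of the central 80% interval (p10 to p90) as a simple uncertainty measure
interval_width = point["p90"] - point["p10"]
print(f"80% interval width: {interval_width}")

# The point forecast (average) can differ from the median (p50)
# when the forecast distribution is skewed
skew_hint = point["value"] - point["p50"]
print(f"Mean minus median: {skew_hint}")
```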

Step 4: Complete Workflow Example

Here's a complete example that combines all three steps into a comprehensive data pipeline:

def get_inference_pipeline_data(target_id, model_ids, start_date, end_date, token):
    """
    Complete pipeline to get inference data from scenarios to detailed forecasts.

    Always returns a (scenarios_df, inference_df, forecasts_df) tuple so the
    caller can unpack it unconditionally.
    """
    base_url = "https://app.enertel.ai/api"
    headers = {"Authorization": f"Bearer {token}"}

    # Step 1: Get scenarios
    print("Step 1: Fetching scenarios...")
    scenarios_response = requests.get(
        f"{base_url}/scenarios",
        headers=headers,
        params={
            "target_id": target_id,
            "start": start_date,
            "end": end_date,
            "limit": 100
        }
    )
    scenarios = scenarios_response.json()
    print(f"Found {len(scenarios)} scenarios")

    if not scenarios:
        print("No scenarios found for the given criteria")
        return pd.DataFrame(), pd.DataFrame(), pd.DataFrame()

    # Step 2: Get inference results
    print("Step 2: Fetching inference results...")
    scenario_ids = [str(s['id']) for s in scenarios]
    inference_response = requests.get(
        f"{base_url}/model/inference",
        headers=headers,
        params={
            "scenarios": ",".join(scenario_ids),
            "models": ",".join(str(m) for m in model_ids),
            "max_results": 1000
        }
    )
    inference_results = inference_response.json()
    print(f"Found {len(inference_results)} inference results")

    if not inference_results:
        print("No inference results found")
        return pd.DataFrame(scenarios), pd.DataFrame(), pd.DataFrame()

    # Step 3: Get detailed data for a sample of results
    print("Step 3: Fetching detailed forecast data...")
    all_forecasts = []

    # Get detailed data for the first 5 inference results (or fewer if less available)
    sample_results = inference_results[:5]

    for i, result in enumerate(sample_results):
        print(f"Fetching details for inference {i + 1}/{len(sample_results)}")
        detail_response = requests.get(
            f"{base_url}/model/inference/{result['id']}",
            headers=headers
        )

        if detail_response.status_code == 200:
            forecast_data = detail_response.json()

            # Add metadata to each forecast point
            for point in forecast_data:
                point['inference_id'] = result['id']
                point['model_id'] = result['model_id']
                point['scenario_id'] = result['scenario_id']
                point['model_integration'] = result['model_integration']

            all_forecasts.extend(forecast_data)

    # Convert to DataFrames
    scenarios_df = pd.DataFrame(scenarios)
    inference_df = pd.DataFrame(inference_results)
    forecasts_df = pd.DataFrame(all_forecasts)

    print("Pipeline complete:")
    print(f"- {len(scenarios_df)} scenarios")
    print(f"- {len(inference_df)} inference results")
    print(f"- {len(forecasts_df)} detailed forecast points")

    return scenarios_df, inference_df, forecasts_df

# Example usage
target_id = 123 # Your target ID
model_ids = [100, 101] # Your model IDs
start_date = "2024-10-01T00:00:00Z"
end_date = "2024-10-05T00:00:00Z"
token = "<your-api-token>"

scenarios_df, inference_df, forecasts_df = get_inference_pipeline_data(
target_id, model_ids, start_date, end_date, token
)


Data Analysis Examples

Once you have the data, here are some common analysis patterns:

Analyzing Forecast Distributions
# Look at the forecast distribution for a specific timestamp
if not forecasts_df.empty:
    sample_forecast = forecasts_df.iloc[0]

    # Extract percentiles
    percentiles = {}
    for col in forecasts_df.columns:
        if col.startswith('p') and col[1:].isdigit():
            percentile = int(col[1:])
            percentiles[percentile] = sample_forecast[col]

    # Plot forecast distribution
    import matplotlib.pyplot as plt

    plt.figure(figsize=(10, 6))
    plt.plot(list(percentiles.keys()), list(percentiles.values()), 'b-', linewidth=2)
    plt.axhline(y=sample_forecast['value'], color='r', linestyle='--', label='Point forecast')
    plt.xlabel('Percentile')
    plt.ylabel('Forecast Value')
    plt.title(f'Forecast Distribution for {sample_forecast["timestamp"]}')
    plt.legend()
    plt.grid(True, alpha=0.3)
    plt.show()

Time Series Analysis

# Analyze forecasts over time for a specific feature
if not forecasts_df.empty:
    # Select one feature and sort by timestamp
    # (.copy() avoids pandas SettingWithCopyWarning on the assignment below)
    first_feature = forecasts_df['feature_id'].iloc[0]
    feature_forecasts = forecasts_df[forecasts_df['feature_id'] == first_feature].copy()
    feature_forecasts = feature_forecasts.sort_values('timestamp')

    # Convert timestamp to datetime
    feature_forecasts['timestamp'] = pd.to_datetime(feature_forecasts['timestamp'])

    # Plot time series with uncertainty bands
    plt.figure(figsize=(12, 6))
    plt.fill_between(feature_forecasts['timestamp'],
                     feature_forecasts['p10'],
                     feature_forecasts['p90'],
                     alpha=0.3, label='80% confidence interval')
    plt.plot(feature_forecasts['timestamp'], feature_forecasts['p50'], 'b-', label='Median forecast')
    plt.plot(feature_forecasts['timestamp'], feature_forecasts['value'], 'r--', label='Point forecast')
    plt.xlabel('Time')
    plt.ylabel('Forecast Value')
    plt.title('Forecast Time Series with Uncertainty')
    plt.legend()
    plt.xticks(rotation=45)
    plt.tight_layout()
    plt.show()

Best Practices

  1. Batch Processing: When processing many inference results, consider batching your requests to avoid overwhelming the API.

  2. Error Handling: Always check response status codes and handle errors gracefully:

    if response.status_code != 200:
        print(f"Error: {response.status_code} - {response.text}")
        return None
  3. Data Validation: Verify that you have the expected data structure before processing:

    required_cols = ['timestamp', 'value', 'p50', 'feature_id']
    if not all(col in forecasts_df.columns for col in required_cols):
        print("Warning: Missing expected columns in forecast data")
  4. Timezone Awareness: The timestamps in the detailed forecast data are already converted to the target's timezone, but be aware of this when combining with other data sources.

  5. Memory Management: For large datasets, consider processing data in chunks rather than loading everything into memory at once.
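The batch-processing advice above can be sketched as a small helper that splits the inference IDs into fixed-size batches and pauses between them. The `batch_size` and `pause` values are illustrative choices, not documented API limits, and `fetch_details_batched` is a hypothetical wrapper around the detail endpoint from Step 3.

```python
import time
import requests

def chunked(seq, size):
    """Split a sequence into consecutive batches of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def fetch_details_batched(inference_ids, base_url, headers, batch_size=10, pause=1.0):
    """Fetch detailed forecast data in small batches, pausing between batches
    so a long list of inference results doesn't become a burst of requests."""
    all_points = []
    for batch in chunked(inference_ids, batch_size):
        for inference_id in batch:
            resp = requests.get(f"{base_url}/model/inference/{inference_id}",
                                headers=headers)
            if resp.status_code == 200:
                all_points.extend(resp.json())
            else:
                print(f"Error: {resp.status_code} - {resp.text}")
        time.sleep(pause)  # brief pause between batches
    return all_points
```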

Troubleshooting

No scenarios found: Check that your target ID is correct and that scenarios exist for your date range.

No inference results: Verify that models have actually run inference for your scenarios.

403 Forbidden: Ensure you have a valid API token and that it hasn't expired.

Empty forecast data: Some inference results may not have processed successfully. Check the file_key field to ensure the data exists in object storage.

This tutorial provides a complete workflow for accessing and analyzing model inference data through the Enertel API. The three endpoints work together to give you full visibility into the inference pipeline from scenarios to detailed probabilistic forecasts.