I. Overview and Workflow¶
A. Introduction¶
This Jupyter Notebook provides an end-to-end analysis of Switzerland’s energy data, focusing on load forecasting and time series diagnostics. It is designed for clarity and reproducibility, with each section logically organized and annotated.
Navigation¶
- The notebook is divided into sections using markdown headers (e.g., data loading, preprocessing, modeling, diagnostics).
- Code blocks are accompanied by comments for clarity.
Purpose¶
The notebook aims to:
- Understand the seasonal and stochastic properties of the Swiss power grid data.
- Fit and compare ARIMA, SARIMA, and SARIMAX models.
- Evaluate model adequacy through residual diagnostics and select the best model by MAPE (mean absolute percentage error); a small reference sketch of the metric follows below.
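For reference, MAPE expresses forecast error as the mean absolute deviation in percent of the actual values. A minimal sketch (the function name mape and the toy numbers are illustrative, not taken from the notebook):
import numpy as np
def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent (assumes no zero actuals)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))
# Toy example: a forecast that is off by 5% on every point scores a MAPE of 5.
print(mape([100, 200, 300], [105, 190, 315]))  # 5.0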
B. Energy Data Retrieval¶
Data Sources¶
- Swissgrid operational data: Weekly records of energy consumption, production, control energy, and prices.
- ECMWF seasonal forecasts: Monthly ensemble mean temperature data aggregated over Switzerland, used as a covariate for SARIMAX modeling.
0. Installing Dependencies
#Dependencies
%pip install openpyxl matplotlib pandas darts statsmodels -q
#Dependencies for weather data
%pip install xarray cfgrib eccodes -q
Note: you may need to restart the kernel to use updated packages.
file_paths = [
'../Archive/Data/EnergieUebersichtCH-2022.xlsx',
'../Archive/Data/EnergieUebersichtCH-2023.xlsx',
'../Archive/Data/EnergieUebersichtCH-2024.xlsx',
'../Archive/Data/EnergieUebersichtCH-2025.xlsx',
]
sheet_name = 'Zeitreihen0h15'
total = 29000
1.1 Retrieving Energy Data
import pandas as pd
# Load each yearly workbook once per session; the flag avoids re-reading on re-runs
if 'dfs_loaded' not in globals():
    dfs = []
    for path in file_paths:
        xls = pd.ExcelFile(path)
        df_year = pd.read_excel(xls, sheet_name=sheet_name, header=1)
        dfs.append(df_year)
    dfs_loaded = True
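Parsing four Excel workbooks is slow, which is why the dfs_loaded flag skips re-reading within a session; across sessions the same effect can be had with an on-disk cache. A minimal sketch, assuming a hypothetical cache file energy_raw.pkl that is not part of the notebook:
import os
cache_path = '../Archive/Data/energy_raw.pkl'  # hypothetical cache location
if os.path.exists(cache_path):
    df_cached = pd.read_pickle(cache_path)      # fast path: reuse the cached frame
else:
    df_cached = pd.concat(
        [pd.read_excel(p, sheet_name=sheet_name, header=1) for p in file_paths],
        ignore_index=True
    )
    df_cached.to_pickle(cache_path)              # persist for the next session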
1.2 Combining all years into a single DataFrame and renaming columns
df = pd.concat(dfs, ignore_index=True)
# Map the generic 'kWh.N' headers from the Swissgrid sheet to descriptive names
column_mapping = {
'kWh': 'Total Energy Consumed by End Users (kWh)',
'kWh.1': 'Total Energy Production (kWh)',
'kWh.2': 'Total Energy Consumption (kWh)',
'kWh.3': 'Net Outflow (kWh)',
'kWh.4': 'Grid Feed-In (kWh)',
'kWh.5': 'Positive Secondary Control Energy (kWh)',
'kWh.6': 'Negative Secondary Control Energy (kWh)',
'kWh.7': 'Positive Tertiary Control Energy (kWh)',
'kWh.8': 'Negative Tertiary Control Energy (kWh)',
'kWh.11': 'Secondary Control Energy Prices (€/MWh)'
}
df.rename(columns=column_mapping, inplace=True)
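Because the generic kWh.N headers depend on the column order in the source sheets, it is worth asserting that every renamed column is actually present. A minimal sketch:
# Confirm that every descriptive name from the mapping made it into the frame
missing_cols = [name for name in column_mapping.values() if name not in df.columns]
assert not missing_cols, f"Renamed columns not found: {missing_cols}"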
D. Data Cleaning¶
Checking for missing values
import plotly.express as px
# Tag rows where total consumption is missing (used for labelling if plotted)
df.loc[df['Total Energy Consumption (kWh)'].isna(), 'color'] = 'Total Energy Consumption'
# Mask for rows with any NA values
mask = df.isna().any(axis=1)
df_missing = df[mask].copy()
df_missing['is_missing'] = True  # constant flag, useful as a y value if plotted
print(df_missing.head())
Empty DataFrame
Columns: [Zeitstempel, Total Energy Consumed by End Users (kWh), Total Energy Production (kWh), Total Energy Consumption (kWh), Net Outflow (kWh), Grid Feed-In (kWh), Positive Secondary Control Energy (kWh), Negative Secondary Control Energy (kWh), Positive Tertiary Control Energy (kWh), Negative Tertiary Control Energy (kWh), kWh.9, kWh.10, Secondary Control Energy Prices (€/MWh), kWh.12, kWh.13, kWh.14, kWh.15, kWh.16, kWh.17, kWh.18, kWh.19, Euro/MWh, Euro/MWh.1, Euro/MWh.2, Euro/MWh.3, kWh.20, kWh.21, kWh.22, kWh.23, kWh.24, kWh.25, kWh.26, kWh.27, kWh.28, kWh.29, kWh.30, kWh.31, kWh.32, kWh.33, kWh.34, kWh.35, kWh.36, kWh.37, kWh.38, kWh.39, kWh.40, kWh.41, kWh.42, kWh.43, kWh.44, kWh.45, kWh.46, kWh.47, kWh.48, kWh.49, kWh.50, kWh.51, kWh.52, kWh.53, kWh.54, kWh.55, kWh.56, kWh.57, kWh.58, kWh.59, color, is_missing]
Index: []
[0 rows x 67 columns]
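No rows contain missing values, so there is nothing to plot here; had df_missing been non-empty, the plotly import above could mark where the gaps occur. A minimal sketch:
# Only meaningful when df_missing has rows
if not df_missing.empty:
    fig = px.scatter(
        df_missing,
        x='Zeitstempel',
        y='is_missing',
        title='Timestamps with missing values'
    )
    fig.show()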
Is each index unique?
df.index.is_unique
True
Check min, max, std for anomalies
print(df['Total Energy Consumption (kWh)'].describe()[1:9])
mean    1.803166e+06
std     2.656051e+05
min     1.038507e+06
25%     1.626075e+06
50%     1.808781e+06
75%     1.983396e+06
max     2.603810e+06
Name: Total Energy Consumption (kWh), dtype: float64
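The summary looks plausible (no zero or negative readings; the maximum is roughly 44% above the mean). A simple rule-of-thumb screen for outliers, with an arbitrary illustrative threshold of four standard deviations:
col = df['Total Energy Consumption (kWh)']
z = (col - col.mean()) / col.std()
outliers = df[z.abs() > 4]   # threshold chosen for illustration only
print(f"{len(outliers)} readings lie more than 4 standard deviations from the mean")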
E. Data Aggregation¶
1.1 Aggregate data by week (anchored on Monday), drop the incomplete first and last weeks, and back up the DataFrame
# Parse the day-first timestamps and use them as the index
df['Zeitstempel'] = pd.to_datetime(
    df['Zeitstempel'],
    format='%d.%m.%Y %H:%M',
    dayfirst=True,
    errors='raise'
)
df = df.set_index('Zeitstempel')
# Sum the 15-minute readings into weekly totals anchored on Monday
df = df.resample('W-MON').sum()
# Drop the first and last weeks, which cover incomplete ranges
df = df[1:-1]
df_backup = df.copy()  # keep an untouched copy before further processing
energy_consumption_data = df['Total Energy Consumption (kWh)'].copy()
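darts is among the installed dependencies, so the weekly consumption series can also be wrapped as a darts TimeSeries for the modelling sections. A minimal sketch (the variable name series is illustrative):
from darts import TimeSeries
# Wrap the weekly consumption series; darts picks up the W-MON frequency from the index
series = TimeSeries.from_series(energy_consumption_data)
print(series.start_time(), series.end_time(), len(series))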
C. Weather Data Retrieval¶
time_index = df.index
import cdsapi
import xarray as xr
import pandas as pd
c = cdsapi.Client()
# Bounding box covering Switzerland (degrees latitude / longitude)
lat_min = 45.8
lat_max = 48.5
lon_min = 5.9
lon_max = 10.5
# Load and combine the two GRIB datasets once per session
if 'weather_loaded' not in globals():
    ds_before_2025 = xr.open_dataset(
        "../Archive/Data/Weather/Seasonal forecast monthly statistics on single levels - CH- 2021-2024.grib",
        engine="cfgrib"
    )
    ds_2025 = xr.open_dataset(
        "../Archive/Data/Weather/Seasonal forecast monthly statistics on single levels - CH- 2025.grib",
        engine="cfgrib"
    )
    ds = xr.concat([ds_before_2025, ds_2025], dim="time")
    # 2 m temperature over the Swiss bounding box (latitude is stored descending)
    swiss = ds['t2m'].sel(
        latitude=slice(lat_max, lat_min),
        longitude=slice(lon_min, lon_max)
    )
    weather_loaded = True
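The cdsapi.Client created above is not used here because the GRIB files are read from disk. If they ever needed to be re-downloaded, the request would look roughly like the sketch below; the dataset id, request keys, and values are assumptions to be checked against the CDS catalogue before running, and the area order is north, west, south, east.
download_grib = False  # hypothetical switch; set True to attempt a CDS download
if download_grib:
    c.retrieve(
        'seasonal-monthly-single-levels',      # assumed dataset id
        {
            'originating_centre': 'ecmwf',     # assumed request keys and values
            'variable': '2m_temperature',
            'product_type': 'monthly_mean',
            'year': ['2022', '2023', '2024', '2025'],
            'month': [f'{m:02d}' for m in range(1, 13)],
            'leadtime_month': ['1'],
            'area': [lat_max, lon_min, lat_min, lon_max],  # N, W, S, E
            'format': 'grib',
        },
        'seasonal_t2m_ch.grib'                 # illustrative target filename
    )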
# Step 1: Average over spatial and ensemble dimensions
monthly_avg = swiss.mean(dim=['latitude', 'longitude', 'number'])
monthly_mean = monthly_avg.mean(dim='step') # Mean over lead times
# Step 2: Convert to pandas Series and round index to start of month
monthly_series = monthly_mean.to_series()
monthly_series.index = monthly_series.index.to_period("M").to_timestamp()
# Step 3: Filter out data before 2022
monthly_series = monthly_series[monthly_series.index >= pd.Timestamp("2022-01-01")]
# Step 4: Map each weekly point to its corresponding monthly temperature
assert isinstance(time_index, pd.DatetimeIndex)
weekly_temp = pd.Series(
    index=time_index,
    data=[monthly_series.get(dt.to_period("M").to_timestamp(), pd.NA) for dt in time_index]
)
weekly_temp = weekly_temp.dropna()
# Optional: enforce datetime index and infer frequency
weekly_temp.index = pd.DatetimeIndex(weekly_temp.index)
try:
    weekly_temp.index.freq = pd.infer_freq(weekly_temp.index)
except (ValueError, TypeError):
    pass  # leave the frequency unset if it cannot be inferred
# Preview
print(weekly_temp.head())
Zeitstempel
2022-01-10    272.436157
2022-01-17    272.436157
2022-01-24    272.436157
2022-01-31    272.436157
2022-02-07    275.031769
Freq: W-MON, dtype: float32
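With consumption and temperature on the same weekly index, the covariate can be aligned and handed to statsmodels' SARIMAX as the exogenous regressor. A minimal sketch (the order and seasonal_order values are placeholders, not the configuration fitted later in the notebook):
from statsmodels.tsa.statespace.sarimax import SARIMAX
# Align the two series on their common weekly timestamps
aligned = pd.concat(
    [energy_consumption_data, weekly_temp.rename('t2m_kelvin')],
    axis=1, join='inner'
)
# Placeholder orders for illustration; the actual orders are selected in the modelling sections
model = SARIMAX(
    aligned['Total Energy Consumption (kWh)'],
    exog=aligned['t2m_kelvin'],
    order=(1, 0, 1),
    seasonal_order=(1, 1, 1, 52),
)
# result = model.fit(disp=False)  # fitting is deferred to the modelling sections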