I. Overview and Workflow¶
A. Introduction¶
This Jupyter Notebook provides an end-to-end analysis of Switzerland’s energy data, focusing on load forecasting and time series diagnostics. It is designed for clarity and reproducibility, with each section logically organized and annotated.
Navigation¶
- The notebook is divided into sections using markdown headers (e.g., data loading, preprocessing, modeling, diagnostics).
- Code blocks are accompanied by comments for clarity.
Purpose¶
The notebook aims to:
- Understand the seasonal and stochastic properties of the Swiss power grid data.
- Fit and compare ARIMA, SARIMA, and SARIMAX models.
- Evaluate model adequacy through residual diagnostics and select the best model by MAPE (mean absolute percentage error); a small reference sketch of the metric follows below.
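For reference, MAPE expresses forecast error as the mean absolute deviation in percent of the actual values. A minimal sketch (the function name mape and the toy numbers are illustrative, not taken from the notebook):
import numpy as np
def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent (assumes no zero actuals)."""
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return 100.0 * np.mean(np.abs((actual - forecast) / actual))
# Toy example: a forecast that is off by 5% on every point scores a MAPE of 5.
print(mape([100, 200, 300], [105, 190, 315]))  # 5.0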
B. Energy Data Retrieval¶
Data Sources¶
- Swissgrid operational data: Weekly records of energy consumption, production, control energy, and prices.
- ECMWF seasonal forecasts: Monthly ensemble mean temperature data aggregated over Switzerland, used as a covariate for SARIMAX modeling.
0. Installing Dependencies
#Dependencies
%pip install openpyxl matplotlib pandas darts statsmodels -q
#Dependencies for weather data
%pip install xarray cfgrib eccodes -q
Note: you may need to restart the kernel to use updated packages.
file_paths = [
'../Archive/Data/EnergieUebersichtCH-2022.xlsx',
'../Archive/Data/EnergieUebersichtCH-2023.xlsx',
'../Archive/Data/EnergieUebersichtCH-2024.xlsx',
'../Archive/Data/EnergieUebersichtCH-2025.xlsx',
]
sheet_name = 'Zeitreihen0h15'
total = 29000
1.1 Retrieving Energy Data
import pandas as pd
# Load each yearly workbook once per session; the flag avoids re-reading on re-runs
if 'dfs_loaded' not in globals():
    dfs = []
    for path in file_paths:
        xls = pd.ExcelFile(path)
        df_year = pd.read_excel(xls, sheet_name=sheet_name, header=1)
        dfs.append(df_year)
    dfs_loaded = True
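Parsing four Excel workbooks is slow, which is why the dfs_loaded flag skips re-reading within a session; across sessions the same effect can be had with an on-disk cache. A minimal sketch, assuming a hypothetical cache file energy_raw.pkl that is not part of the notebook:
import os
cache_path = '../Archive/Data/energy_raw.pkl'  # hypothetical cache location
if os.path.exists(cache_path):
    df_cached = pd.read_pickle(cache_path)      # fast path: reuse the cached frame
else:
    df_cached = pd.concat(
        [pd.read_excel(p, sheet_name=sheet_name, header=1) for p in file_paths],
        ignore_index=True
    )
    df_cached.to_pickle(cache_path)              # persist for the next session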
1.2 Combining all years into a single DataFrame and renaming columns
df = pd.concat(dfs, ignore_index=True)
# Map the generic 'kWh.N' headers from the Swissgrid sheet to descriptive names
column_mapping = {
'kWh': 'Total Energy Consumed by End Users (kWh)',
'kWh.1': 'Total Energy Production (kWh)',
'kWh.2': 'Total Energy Consumption (kWh)',
'kWh.3': 'Net Outflow (kWh)',
'kWh.4': 'Grid Feed-In (kWh)',
'kWh.5': 'Positive Secondary Control Energy (kWh)',
'kWh.6': 'Negative Secondary Control Energy (kWh)',
'kWh.7': 'Positive Tertiary Control Energy (kWh)',
'kWh.8': 'Negative Tertiary Control Energy (kWh)',
'kWh.11': 'Secondary Control Energy Prices (€/MWh)'
}
df.rename(columns=column_mapping, inplace=True)
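Because the generic kWh.N headers depend on the column order in the source sheets, it is worth asserting that every renamed column is actually present. A minimal sketch:
# Confirm that every descriptive name from the mapping made it into the frame
missing_cols = [name for name in column_mapping.values() if name not in df.columns]
assert not missing_cols, f"Renamed columns not found: {missing_cols}"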
D. Data Cleaning¶
Checking for missing values
import plotly.express as px
# Tag rows where total consumption is missing (used for labelling if plotted)
df.loc[df['Total Energy Consumption (kWh)'].isna(), 'color'] = 'Total Energy Consumption'
# Mask for rows with any NA values
mask = df.isna().any(axis=1)
df_missing = df[mask].copy()
df_missing['is_missing'] = True  # constant flag, useful as a y value if plotted
print(df_missing.head())
Empty DataFrame
Columns: [Zeitstempel, Total Energy Consumed by End Users (kWh), Total Energy Production (kWh), Total Energy Consumption (kWh), Net Outflow (kWh), Grid Feed-In (kWh), Positive Secondary Control Energy (kWh), Negative Secondary Control Energy (kWh), Positive Tertiary Control Energy (kWh), Negative Tertiary Control Energy (kWh), kWh.9, kWh.10, Secondary Control Energy Prices (€/MWh), kWh.12, kWh.13, kWh.14, kWh.15, kWh.16, kWh.17, kWh.18, kWh.19, Euro/MWh, Euro/MWh.1, Euro/MWh.2, Euro/MWh.3, kWh.20, kWh.21, kWh.22, kWh.23, kWh.24, kWh.25, kWh.26, kWh.27, kWh.28, kWh.29, kWh.30, kWh.31, kWh.32, kWh.33, kWh.34, kWh.35, kWh.36, kWh.37, kWh.38, kWh.39, kWh.40, kWh.41, kWh.42, kWh.43, kWh.44, kWh.45, kWh.46, kWh.47, kWh.48, kWh.49, kWh.50, kWh.51, kWh.52, kWh.53, kWh.54, kWh.55, kWh.56, kWh.57, kWh.58, kWh.59, color, is_missing]
Index: []
[0 rows x 67 columns]
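No rows contain missing values, so there is nothing to plot here; had df_missing been non-empty, the plotly import above could mark where the gaps occur. A minimal sketch:
# Only meaningful when df_missing has rows
if not df_missing.empty:
    fig = px.scatter(
        df_missing,
        x='Zeitstempel',
        y='is_missing',
        title='Timestamps with missing values'
    )
    fig.show()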
Is each index unique?
df.index.is_unique
True
Check min, max, std for anomalies
print(df['Total Energy Consumption (kWh)'].describe()[1:9])
mean    1.803166e+06
std     2.656051e+05
min     1.038507e+06
25%     1.626075e+06
50%     1.808781e+06
75%     1.983396e+06
max     2.603810e+06
Name: Total Energy Consumption (kWh), dtype: float64
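The summary looks plausible (no zero or negative readings; the maximum is roughly 44% above the mean). A simple rule-of-thumb screen for outliers, with an arbitrary illustrative threshold of four standard deviations:
col = df['Total Energy Consumption (kWh)']
z = (col - col.mean()) / col.std()
outliers = df[z.abs() > 4]   # threshold chosen for illustration only
print(f"{len(outliers)} readings lie more than 4 standard deviations from the mean")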
E. Data Aggregation¶
1.1 Aggregate data by week (anchored on Monday), drop the incomplete first and last weeks, and back up the DataFrame
# Parse the day-first timestamps and use them as the index
df['Zeitstempel'] = pd.to_datetime(
    df['Zeitstempel'],
    format='%d.%m.%Y %H:%M',
    dayfirst=True,
    errors='raise'
)
df = df.set_index('Zeitstempel')
# Sum the 15-minute readings into weekly totals anchored on Monday
df = df.resample('W-MON').sum()
# Drop the first and last weeks, which cover incomplete ranges
df = df[1:-1]
df_backup = df.copy()  # keep an untouched copy before further processing
energy_consumption_data = df['Total Energy Consumption (kWh)'].copy()
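darts is among the installed dependencies, so the weekly consumption series can also be wrapped as a darts TimeSeries for the modelling sections. A minimal sketch (the variable name series is illustrative):
from darts import TimeSeries
# Wrap the weekly consumption series; darts picks up the W-MON frequency from the index
series = TimeSeries.from_series(energy_consumption_data)
print(series.start_time(), series.end_time(), len(series))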
C. Weather Data Retrieval¶
time_index = df.index
import cdsapi
import xarray as xr
import pandas as pd
c = cdsapi.Client()
# Bounding box covering Switzerland (degrees latitude / longitude)
lat_min = 45.8
lat_max = 48.5
lon_min = 5.9
lon_max = 10.5
# Load and combine the two GRIB datasets once per session
if 'weather_loaded' not in globals():
    ds_before_2025 = xr.open_dataset(
        "../Archive/Data/Weather/Seasonal forecast monthly statistics on single levels - CH- 2021-2024.grib",
        engine="cfgrib"
    )
    ds_2025 = xr.open_dataset(
        "../Archive/Data/Weather/Seasonal forecast monthly statistics on single levels - CH- 2025.grib",
        engine="cfgrib"
    )
    ds = xr.concat([ds_before_2025, ds_2025], dim="time")
    # 2 m temperature over the Swiss bounding box (latitude is stored descending)
    swiss = ds['t2m'].sel(
        latitude=slice(lat_max, lat_min),
        longitude=slice(lon_min, lon_max)
    )
    weather_loaded = True
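The cdsapi.Client created above is not used here because the GRIB files are read from disk. If they ever needed to be re-downloaded, the request would look roughly like the sketch below; the dataset id, request keys, and values are assumptions to be checked against the CDS catalogue before running, and the area order is north, west, south, east.
download_grib = False  # hypothetical switch; set True to attempt a CDS download
if download_grib:
    c.retrieve(
        'seasonal-monthly-single-levels',      # assumed dataset id
        {
            'originating_centre': 'ecmwf',     # assumed request keys and values
            'variable': '2m_temperature',
            'product_type': 'monthly_mean',
            'year': ['2022', '2023', '2024', '2025'],
            'month': [f'{m:02d}' for m in range(1, 13)],
            'leadtime_month': ['1'],
            'area': [lat_max, lon_min, lat_min, lon_max],  # N, W, S, E
            'format': 'grib',
        },
        'seasonal_t2m_ch.grib'                 # illustrative target filename
    )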
# Step 1: Average over spatial and ensemble dimensions
monthly_avg = swiss.mean(dim=['latitude', 'longitude', 'number'])
monthly_mean = monthly_avg.mean(dim='step') # Mean over lead times
# Step 2: Convert to pandas Series and round index to start of month
monthly_series = monthly_mean.to_series()
monthly_series.index = monthly_series.index.to_period("M").to_timestamp()
# Step 3: Filter out data before 2022
monthly_series = monthly_series[monthly_series.index >= pd.Timestamp("2022-01-01")]
# Step 4: Map each weekly point to its corresponding monthly temperature
assert isinstance(time_index, pd.DatetimeIndex)
weekly_temp = pd.Series(
    index=time_index,
    data=[monthly_series.get(dt.to_period("M").to_timestamp(), pd.NA) for dt in time_index]
)
weekly_temp = weekly_temp.dropna()
# Optional: enforce datetime index and infer frequency
weekly_temp.index = pd.DatetimeIndex(weekly_temp.index)
try:
    weekly_temp.index.freq = pd.infer_freq(weekly_temp.index)
except (ValueError, TypeError):
    pass  # leave the frequency unset if it cannot be inferred
# Preview
print(weekly_temp.head())
Zeitstempel
2022-01-10    272.436157
2022-01-17    272.436157
2022-01-24    272.436157
2022-01-31    272.436157
2022-02-07    275.031769
Freq: W-MON, dtype: float32
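With consumption and temperature on the same weekly index, the covariate can be aligned and handed to statsmodels' SARIMAX as the exogenous regressor. A minimal sketch (the order and seasonal_order values are placeholders, not the configuration fitted later in the notebook):
from statsmodels.tsa.statespace.sarimax import SARIMAX
# Align the two series on their common weekly timestamps
aligned = pd.concat(
    [energy_consumption_data, weekly_temp.rename('t2m_kelvin')],
    axis=1, join='inner'
)
# Placeholder orders for illustration; the actual orders are selected in the modelling sections
model = SARIMAX(
    aligned['Total Energy Consumption (kWh)'],
    exog=aligned['t2m_kelvin'],
    order=(1, 0, 1),
    seasonal_order=(1, 1, 1, 52),
)
# result = model.fit(disp=False)  # fitting is deferred to the modelling sections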