Learning Spatiotemporal Dynamics in Wholesale Energy Markets with Dynamic Mode Decomposition

,


Introduction
Wholesale energy markets are ubiquitous across the globe and provide a competitive mechanism to balances electrical energy supply and demand and ensure reliable operation of large-scale power systems. [41] Energy prices encapsulate complex trade-offs between costs and flexibility of diverse generation technologies (e.g., coal, nuclear, natural gas, solar, wind, hydro), demand elasticity (e.g., demand response), and operational limits of high-voltage transmission systems. These constraints interact with uncertainties from demand, renewable availability, unplanned equipment outages to create strong spatial-temporal variations in energy prices. [12,40] Figure 1 illustrates stochastic and periodic variations in wholesale prices for six nodes in the day-ahead market run by the California Independent System Operator (CAISO). In this paper, we analyze hourly time-series data obtained from http://oasis.caiso.com/ for all 6587 nodes (locations) during the calendar year 2015. Descriptive statistics for these data are reported in [12,13].
The economics of modern electricity generation and consumption is driven by these price variations. For example, the economic case for energy storage systems is based, in part, on energy arbitrage. Through "buy low, sell high" operating policies, storage systems can monetize time fluctuations in wholesale energy prices. In their seminal work, Walawalkar et al. [48] use mathematical optimization to estimate the maximum possible revenue from historical market data and compare energy storage technology options. This now-standard approach has been applied by many technologies including batteries [12,43,16,30,44], and concentrated solar thermal with energy storage [13,25,22,27,14]. Likewise, time-varying energy prices incentivize large energy consumers to either directly participate in wholesale markets or demand response programs. [1,34]. Early work focused on optimizing the schedule of large energy consumers such as flour mills, air separation, and combined heat and power plants subject to time-varying electricity prices. [mitra2013optimal , 4, 21, 5, 29] Recent work has focused on providing ancillary services with aluminum smelters, air separation plants, heat ventilation and air conditioning systems, and chemical plants. [dowling2018economic, 50, 49, 32] Yet many of these approaches assume market forecasts are available. Both time-series and machine learning techniques have been used to extract trends from massive market datasets and make accurate forecasts. For example, Conejo et al. demonstrate autoregressive integrated moving average (ARIMA) models can predict 2002 Spanish DAM market prices with 10% mean squared error (MSE). [9] Likewise, Garcia et al. used generalized autoregressive conditional heteroskedasticity (GARCH) methods to predict Spanish and California DAM market prices from 2000 with 10% MSE. [18] These classical time-series models require tedious hypertuning (e.g., feature selection, residual analysis), especially to capture periodic trends. As such, recent work focuses on applying machine learning algorithms for DAM forecasting [2,17,45,6,33,35,39,3,35,26] [46]. Wavelet transformations add robustness to noise and provide a systematic framework to capture diverse timescale [46,3,8]. All of these frameworks build individual forecasting models for each node in the electricity market, which can cause high computational times, necessitate model tuning at each node, and offers no fundamental insights about the aggregated dynamics of energy markets.
In this paper, we propose the first application of Dynamic Mode Decomposition (DMD) to forecasting day-ahead energy market prices. DMD is a so-called equation-free dimensionality reduction technique to extract complex, nonlinear, periodic behavior from experimental and computational simulations. [37,38,31,47,24,38,36] DMD is best-known for its ability to discover low-dimension spatial and temporal insights from computational fluid dynamics, although emerging applications include video processing and financial (stock) markets [24]. For example, Mann and Kutz [28] proposed a stock trading algorithm build on spatiotemporal information encoded in DMD modes and demonstrate 5 times higher growth than the S&P 500 index via retrospective analysis. Likewise, Cui and Long [10] showed how DMD-based trading beat a benchmark moving-average strategy in a bull market for Chinese A-stocks. Hua et al. [20] identified four new reproducible modes (cycles) by analyzing stock prices between 1992 and 2014 for major companies. All of these works underscore that DMD can successfully capture the underling nonlinear dynamics in stock markets from data alone. In this paper, we show that Augmented DMD is an accurate short-term forecaster energy prices in wholesale electricity markets. ADMD requires only a modest amount of training data, much less than machine learning methods, and is orders of magnitude faster than ARIMA methods.
The paper is organized as follows: Section 2 provides a mathematical primer on DMD and Augmented DMD (ADMD). Section 3 explores singular value decomposition (SVD) rank reduction, interprets dominate DMD modes, shows how DMD suffers from the so-called standing wave problem. Section 4 demonstrates the superior performance of ADMD. Section 5 demonstrates how ADMD forecasting can capture up to 92% available revenue in a simple energy arbitrage optimize case study. Finally, Section 7 summarizes conclusions and future work.

Mathematical Primer on DMD
In this section, we review the mathematical foundations of DMD in the context of energy market forecasts. See Kutz et al. [24] and Schmid et al. [38] for additional details.

DMD Algorithm
DMD starts with an input matrix X t ∈ R m×n where the rows (m) of X t are the spatial data features and the columns (n) are discrete time measurements. For day-ahead energy market analysis, each row is a node (location in the transmission network), each column is an hour, and each element is a price. The price matrix X t is then partitioned into two matrices X ∈ R m×(n−1) and X ∈ R m×(n−1) , as shown in Eqs. (2) and (3). Here each x i ∈ R m×1 is a single time step of spatial data in price matrix X t . (1) The overall goal of DMD is to compute a best fit linear operator, A ∈ R m×m in Eq. (4), which advances the energy price dynamics by one timestep (one hour).
A is a finite dimensional approximation to the infinite dimensional Koopman operator, K , which exactly maps any measurement function f from one state vector, x i , to another state vector, x i+1 : For practical analysis, we desire a low rank approximation to the linear operator A. This can be accomplished via singular value decomposition (SVD) of the data matrix X ∈ R m×(n−1) : Here U ∈ R m×m and V ∈ R (n−1)×(n−1) are unitary matrices and * is the conjugate transpose. Matrix Σ ∈ R m×(n−1) has singular values σ on the diagonal ordered σ 1 ≥ σ 2 ≥ ... ≥ σ n−1 . When m > n − 1, which is often the case in DMD analysis, the bottom of Σ are rows of zeros: Retaining only the r largest singular values and vectors gives a rank r approximation to X. The choice of r is an important tuning parameter in DMD workflows (see Section 3). The maximum number of singular values is r max = min(m, n − 1).
Next, we calculate A using Eq. (4) and the pseudoinverse X −1 . Substituting Eqs. (6) and (8) gives: Often it is more convenient to use the low-rank projectionÃ ∈ R r×r : which advances the low-dimensional projected states,x ∈ R r×1 , from timestep k to k + 1: High-dimensional states are reconstructed via projection with U r ∈ R m×r : BecauseÃ is a linear operator, the price trajectories can be calculated analytically using an eigendecomposition:Ã Here the columns of W ∈ C r×r are eigenvectors and the diagonal of Λ are the corresponding eigenvalues. The spatial modes, Φ = [φ 1 · · · φ r ] ∈ C m×r , are then calculated via Eq. (14), and represent how prices depend on geographic location in the electric transmission network. Next, the discrete time eigenvalues, Λ, are converted to continuous time eigenvalues, Ω ∈ C r×r , via Eq. (15). Here i and j are row and column indices, respectively, such that i = j are the diagonals of Λ and Ω. Thus Ω encodes price time dynamics that are independent of spatial behavior.
The low-dimensional time dynamics, D(t) ∈ C r×1 , are defined in Eq. (16). The vector b ∈ C r×1 is the initial condition of the low-dimensional system, which is obtained by solving Eq. (17), often via least-squares.
This concludes the decomposition of the original price data. A key advantage of DMD is that forecasting for any time and every node (location) can be quickly computed via Eq. (18).
In DMD literature, reconstruction of the original data refers to evaluating Eq. (18) for all of the times in the original original X t input matrix. Recall that DMD assumes a low-dimensional linear system governs the data. When this assumption is violated, the reconstructed prices can differ dramatically from the actual (training) prices. We further discuss the relationship between reconstruction error and forecasting error in Section 3.

Augmented DMD Algorithm
A well-documented challenge for standard DMD is modeling simple dynamics of standing waves, e.g., x(t) = sin(t) [24]. The fundamental problem for this example is that the input matrix X is rank one, and as such DMD identifies a single mode with a positive real eigenvalue. But a sine wave is the solution to the differential equationẍ + x = 0; a conjugate pair of complex eigenvalues are needed to correctly recapitulate a sine wave. For typical days, the energy price closely resembles a simple oscillating wave ( Fig. 1), which offers one explanation for the mediocre reconstruction and forecasting errors with standard DMD.
Augmented DMD (ADMD) overcomes the so-called standing wave problem by stacking time-shifted copies of input data to form the augmented input matrix which has higher rank than the original input matrix. This is analogous to applying a finite difference rules to estimate unobserved derivates needed to model higher order systems (e.g., sine wave) and differencing data to achieve stationary for ARIMA time-series modeling [7]. To implement ADMD, we stack l < n time-shifted copies of the data to form the augmented input matrix. The augmented matrix is then partitioned into two matrices, similar to Eqs. (2) and (3): The DMD algorithm prescribed in Eqs.
(2) -(18) is applied to augmented matrices X aug ∈ R (m·l)×(n−l) and X aug ∈ R (m·l)×(n−l) in place of X and X , giving eigenvalues Λ aug and modes Φ aug . The first m rows of Φ aug correspond to the current (not shifted) time and are used to forecast x(t). For ADMD, the maximum number of singular values is r max = min(m × l, n − l)

DMD Results and Discussion
We now apply DMD to CAISO energy price data. Our analysis focuses on truncation level, window size, consistency, forecasting, interpretation of DMD modes. Throughout this paper, we measure percent error using the 2 norm: where X DMD are either reconstructions or forecasts calculated with Eq. (18).

Optimal Singular Value Trunctionation to Minimize Reconstruction Error
An essential tuning parameter in DMD analysis is the level of singular value truncation, especially for noisy data [24]. We begin by systematically exploring the impact r, the number of singular values retained in Eqs. (8) -(10) on reconstruction and forecasting error for DAM energy price data. It is well-known that retaining too many singular values, especially for noisy datasets, can corrupt DMD results. [24] Fig. 2 shows this trade-off holds for DAM price data for all nodes for three days (X t ∈ R m=6587×n=96 ). Too much truncation leads to decaying reconstruction with damped dynamics, whereas not enough truncation leads to an exponentially growing reconstruction (diverges to ∞) with exaggerated dynamics. The optimal truncation level was determined via sensitivity analysis and corresponds with the minimum reconstruction error of 12%, computed via Eq. (20). We reiterate these findings are consistent with DMD literature for noisy datasets. We now explore how the optimal singular value truncation level, r, changes for different datasets. Fig. 3 shows the sensitivity analysis of reconstruction error versus r for three four-day windows considering all nodes (X t ∈ R m=6587×n=96 ). For all three datasets, we clearly see the trade-off between not enough and too much truncation. Moreover, the optimal truncation level (lowest reconstruction error averaged over all nodes) varies between r = 81 and r = 91, which is 81/96 = 84% and 91/96 = 95% of the full rank number of singular values. This means that standard DMD fails to extract coherent low-rank structures from the DAM energy price data. Moreover, each individual training window requires a different r to recover the optimal reconstruction. We will revisit these two observations in Section 4 when we discuss ADMD.
Next, we quantify how DMD performance varies for individual nodes. Fig. 4 shows the distribution of reconstruction errors across all nodes for three 4-day training windows. These errors are approximately normally distributed for the January 1 st -4 th dataset, but skewed-left for the January 11 th -14 th dataset and bimodal for January 6 th -9 th dataset. All three datasets show statistical outliers, which are likely due to random phenomena in the market clearing (price setting) process. Next, we compute average reconstruction errors for individual nodes using the optimal r (determined via sensitivity analysis) for all eight-day training windows (X t ∈ R m=6587×n=192 ) spanning January 1 st through 30 th . Fig. 5 visualizes means reconstruction errors in a histogram (all nodes) and overlaid on a map (≈ 2000 nodes with known latitude and longitude coordinates). We find 75% of nodes have an average reconstruction error between 12% and 13% for January 2015. Reconstruction errors follow a right-skewed distribution, with 1.9% of nodes with 20% to 30% error. From these results, we six geographically separated nodes in California such that nodes 1 and 2 have great (less than 11%), nodes 3 and 4 have average (≈ 12%), and nodes 5 and 6 have poor (greater than 16%) reconstruction errors. Table 1 highlights official CAISO node names, location, and mean reconstruction error. For these six nodes, we reproduce the sensitivity analysis from Fig. 3. The reconstruction error versus r plots are only 1% different from those shown in Fig. 3. This suggests that the optimal r for the entire dataset (all nodes) is near optimal for individual nodes. Moreover, there are no obvious geographic trends in the distribution of reconstruction errors in Fig. 5 (middle).    Fig. 5 (right), ordered from lowest to highest mean reconstruction error. The node names are the unique identifiers used by CAISO.

Optimal Training Window Size to Minimize Reconstruction Error
Besides r, selecting the window size, i.e., the dimensions of the input data X t ∈ R m×n , is another important choice in DMD analysis including financial market forecasting [28]. We repeat sensitivity analysis of reconstruction error versus r for January 1 st to 4 th (n = 24 × 4 = 96) with m = 20, 100, 1000, 2000, 3000, and 6587 randomly selected nodes. Figure 6 shows that, in general, as m increases, the minimum reconstruction decreases. This suggests that standard DMD is extracting useful multimode dynamics and correlations such that more data (larger m) improves recapitulation of trends (lower error). With only 20 nodes, reconstruction error is between 25% and 40%, which suggests there is insufficient data to extract meaningful trends. We will revisit this finding in Section 4.
Next, we explore how the time horizon n impacts reconstruction error. Training windows for 2 through 15 days (n ∈ {48, 72, 96, · · · , 360}) were evaluated for a rolling horizon for a whole year (2015) using all 6587 nodes in Figure 7. Each box and whisker plot in Figure 7 reports the reconstruction error with the optimal r (lowest average reconstruction error for all nodes). For the year 2015 data set, we see the median reconstruction error gradually increases with n. The lowest median error is 16% for n = 48 (2 days), which is the shortest time horizon considered. These results are consistent with intuition. DAM prices are strongly periodic with well-documented seasonal, day of the week, and daily trends. [15] A two-day window captures two full cycles. But too long of a training window attempts to learn one monolithic linear operator A in Eq. (4) to explain dynamics across too many days. We conclude by highlight our findings to maximize m (consider all nodes) and minimize n (short time horizon) are consistent with literature that suggests DMD inputs should be skinny, i.e., with m >> n [24].

From Optimal Reconstruction to Optimal Forecasts
We now explore how insights for specifying r, m, and n to optimize reconstruction translate to forecasting. Figure 8 shows sensitivity analysis of forecasting and reconstruction errors versus truncation level r. For all three data sets considered, we observe the optimal r for reconstruction and forecasting are similar.
To generalize this trend, we conduct sensitivity analysis over the entire year with 4-day training windows in a rolling horizon framework. On average, there was only a 4% difference between r values that minimized reconstruction error and day 1 forecast errors. Differences were 0.4% and 0.8% for reconstruction error versus day 2 and day 3 forecast errors, respectively. From these results, we propose a simple heuristic: truncate DMD the with best r for reconstruction to produce for the (near) optimal forecast. with data from January 1 st through 4 th . January 5 th , 6 th , and 7 th correspond to the day 1, 2, and 3 forecasts, respectively.
Next, we develop guidelines for the minimum training horizon length n to create stable multi-day DMD price forecasts. Computing optimal bids for participating in day-ahead energy markets, including CAISO, often require 48-hour or longer forecasts [12]. We perform a sensitivity analysis of all possible 2 to 15day training windows in the year 2015 and record forecast errors for 1 to 4 days after the end of each training window. Fig. 9 shows forecasting error averaged over all of these windows. When the training window is shorter than the forecasting horizon, the forecasting error is large. For example, forecasting 4 days into the future using only a 1-day training horizon gives 50% average errors. In contrast, forecasting 4 days into the future with a 5-day or longer training horizon gives average errors less than 20%. This trend is consistent for all forecasting horizons considered and leads to another heuristic: choose the training horizon length n to be longer than the desired forecasting horizon. Beyond this threshold, the average forecasting error only gradually changes as n increases. For shorter forecasts, e.g., 1 day, longer training windows slightly increase forecasting error. But for longer forecasts, e.g., 4 days, longer training windows slightly decrease forecasting errors. We emphasize that while reconstruction error is minimized with shorter training horizons (see Fig. 7), forecasting requires longer horizons. We speculate that unplanned outages, transmission constraints, uncertain electricity demands, and other factors add noise to the energy prices. This noise causes reconstruction error to increase with training window length. But, for accurate forecasts, more data is required to separate meaningful dynamics from noise, and hence longer training horizons are required.

DMD Mode Analysis
We now interpret the DMD modes, similar to previous financial market analyses [24,28,20]. Specifically, temporal modes (eigenvalues) capture price periodicity and spatial modes weight different locations (nodes) in the electricity market. Figure 10 shows spatial-temporal DMD modes for a 4-day training window n = 96 without truncation. In the top three plots, we see eigenvalues ordered by the prominence of the singular values. The first six singular values explain 99% of the variability in the dataset. Interestingly, as the most important singular values (lowest number) correspond to the lowest frequency oscillations. Beyond singular value 60, this linear trend between singular value number and oscillation frequency ceases. We hypothesize the least prominent singular values are localized fluctuations in energy prices caused by external factors. We find the real eigenvalue components are both positive and negative. We do not see a clear trend with real eigenvalue components and the number of retained singular values r. The middle three plots show how the three most prominent DMD modes φ 1 , φ 2 , and φ 3 are spatially distributed. We see all three modes place the least weight on prices from the San Fransisco area. The bottom three plots explore the maximum, minimum, and median contributions to φ 1 . We see the minimum contribution has price spikes on the last day. This makes sense, as we expect the most prominent singular value to capture the average trends of the entire CA system and deemphasize localized events. The imaginary and real components of the eigenvalues associated with each singular value ordered from largest to smallest. (d), (e), (f ) Three most prominent spatial modes, φ 1 , φ 2 , and φ 3 , respectively. Larger absolute values (red or blue) indicate a larger weight for the specific location in the DMD mode. (i) Time-series price data from January 4 th to 7 th for corresponding to the nodes with the maximum (most positive), minimum (most negative), and median, respectively, weights in φ 1 .

ADMD Results and Discussion
As discussed in Section 3, standard DMD, even with optimal truncation, is only able to achieve a 16% reconstruction error for the CAISO dataset from 2015. We now show that ADMD overcomes these limitations and consistently achieve less than 5% reconstruction error.

ADMD is Less Sensitivity to Truncation
First, we explore the sensitivity of ADMD to truncation level r for three n = 96 training windows containing 100 randomly selected nodes and augmentation level l = 48. Figure 11 clearly shows ADMD is more robust with 1.5 to 3 times smaller reconstruction error than DMD. ADMD reconstruction error reaches a plateau around r = 15%, suggesting ADMD successfully identifies a low-rank structure. To explain this, we verified the condition number of X is 10,000 times larger than X aug . These results support our hypothesis that standard DMD struggles to extract low-rank trends because of the standing wave problem. Figure 11: Sensitivity analysis of reconstruction error versus truncation level r shows that ADMD consistently outperforms DMD. For this comparison, m = 100, l = 48, and n = 96 such that X ∈ R 96×100 and X aug ∈ R 4800×48 . DMD performs much better in Figure 6 with m = 6587 whereas this Figure considers m = 100 to benchmark against ADMD.

ADMD Performs Best Augmenting Over At Least 1 Day
Next, we explore the impact of augmentation level l in Eq. 19 on ADMD performance. In Fig. 12, we see reconstruction error is not robust with l = 12 and l = 18. In contrast, reconstruction error reaches a plateau around truncation level r/r max = 15% with l = 24, l = 36, and l = 48. We hypothesize this is because the dominant price dynamics in energy markets have 24-hour periodicity. Based on these results, we recommend l = 36, l = 48, or a similar value. Figure 12: Sensitivity analysis of ADMD reconstruction error with respect to the truncation level r/r max where r max = min(m · l, n − l). Results show l ≥ 24 is needed for a robust low rank approximation with a small reconstruction error.

ADMD Does Not Benefit from Training Many Nodes Simultaneously
Next, we determine if ADMD benefits from analyzing multiple nodes simultaneously. Recall, DMD performs best when the data matrix X is "tall and skinny", i.e., there are many more nodes than timesteps. As consequence, we show in Section 3.2 DMD works best with m = 6587 (all nodes considered). But as augmentation level l increases, the number of rows in X aug increase and the number of columns decrease. This means ADMD can be applied to a single node. Figure 13 compares reconstruction error for a single node and 100 randomly selected nodes over with r = 8 and l = 48 for 96 hours of training data. Surprisingly, we find reconstruction error only improves 0.65% when forecasting many nodes simultaneously. This suggests there is less valuable information available in the spatial dynamics than previously postulated. Figure 13: Comparison of ADMD with m = 1 (red) and m = 100. Each panel is a different 48-hour training window. There is little benefit to reconstructing and forecasting multiple nodes simultaneously.

ADMD Outperforms DMD for Forecasting
Next, we compare forecasting error from DMD and ADMD for a single node. Fig. 14 shows forecasts for January 16 th and 17 th , 2015 (48 hours) using the previous 96 hours for training. DMD forecasts were constructed with all 6587 nodes, whereas ADMD was only trained using the node of interest (same as Fig. 14) with r = 8 and l = 48. The results show the ADMD forecast closely tracks the realized price with around 10% forecasting error. In contrast, the DMD forecast fails to predict major temporal features and is out of phase. Figure 14: Comparison of DMD and ADMD forecasts using n = 96 training data (January 12 th through 15 th , 2015) and two-days of testing data (January 16 th and 17 th , 2015). The ADMD forecast is superior and most importantly captures the phase of the testing data.

Optimal Market Arbitrage for Energy Storage Systems
Accurate price forecasts are important for many aspects of grid reliability, including optimal control of individual energy systems such as conventional generators, grid-connected storage, microgrids, and virtual power plants. In this section, we benchmark the performance of deterministic model predictive control of a price-taker battery energy storage system self-scheduling into a wholesale energy market using different price forecasting techniques including DMD and ADMD.

Optimization Formulation
We consider a storage system with time-varying energy hold-up E i at time i. The system can purchase (charge) or sell (discharge) energy to and from the market at rates c i and d i , respectively, during time interval [i, i + 1). The energy storage operator seeks to maximize net revenue as follows: where x i is the market price forecast during time interval [i, i + 1). The hold-up is constrained by an energy balance and subject to round trip efficiency η. Variable bounds, parameter values, and units are specified in Table 2. Figure ?? illustrates the interaction between the energy storage system and the energy market. Fig. 15 compares the solution to Eq. (21) for the market node near Chino, CA. Price 48-hour forecasts were generated with ADMD using the previous 96 hours of prices for the node of interest as training data with r = 8 and l = 48. As expected, the optimal solution is to sell energy when the forecasted price is lowest and buy energy when the forecasted price is the highest. Although price forecasts are consistently biased (e.g., higher than the actual price) the periodicity (e.g., when prices increase and decrease) is correct, leading to an effective control strategy.   (Middle) As expected, the optimal arbitrage policy is to charge the battery (buy from the market) during low price periods and discharge (sell) at higher prices. (Bottom) The state-of-charge (SOC) in the battery is the integral of the charge and discharge signals. The black marks the SOC at the end of the first day (E 24 ). In the rolling horizon framework, this value is the fixed initial condition E 0 for the subsequent day.

Backcasting Benchmark
We use simple backcasting as a benchmark forecasting strategy [43]. In backcasting, prices for the n most recent observed days as used as forecasts for the next n days. We compare 48-hour rolling horizon forecasting errors for backcasting, DMD (48 hours of training data, optimal truncation), and ADMD (96 hours of data using, r = 8, l = 48) averaged over all 6587 nodes in CAISO for January 5 th , 2015 to December 7 th , 2015. Surprisingly, we found median forecasting errors of 17%, 26%, and 31% for backcasting, ADMD, and DMD respectively. Below we show the lowest median forecasting error does not always correspond with the maximum realized revenue. Sioshansi et al observe backcasting is extremely effective because it recapitulates the periodicity of day-ahead energy market prices. [42]

Rolling Horizon Computational Results
We now compare the performance of DMD, ADMD, and backcasting for energy arbitrage by repeatedly solving Eq. (21) in a rolling horizon framework for January th , 2015 to December 9 th , 2015. Each instance of Eq. (21) is first solved with a 48 hour prediction horizon. The computed policy is then simulated over the first 24 hours and profits are recorded with the real (historical) price. The simulation window is then advanced 24 hours. The value for E 24 for the previous day is copied to E 0 for the next day.
We start by considering all 6587 nodes in the CAISO market. For DMD, we use a 48-hour training window (X t ∈ R 48×6587 ) with optimal truncation. For ADMD, we generate forecasts individually for each node using r = 8 and l = 48 with 96 hours of training data (thus 6587 unique instances of X aug ∈ R 48×48 ). Backcasting was performed using the previous 48 hours of data. Figure 16 compares the market revenues for optimal arbitrage with perfect information, backcast, DMD, and ADMD price forecasts. The perfect information scenario, which uses historical prices in Eq. (21), represents the maximum available revenue. We see hot spots in the perfect information map, which show the greatest potential for arbitrage is in central California as well as the coasts. We find that DMD performs poorly, yielding only 24% of the perfect information revenue. This is expected as DMD suffers from the so-called standing wave problem. In contrast, ADMD and backcasting capture 81% and 84% of the perfect information revenue, respectively. Backcasting and ADMD capture the hot spot in central California but miss the opportunities along the coast. These results are summarized in Table 3.
Next we identified 2209 nodes (2209 / 6587 = 33.5% of the original data set) for which the prices were always between 0 $ / MWh and 200 $ / MWh during calendar year 2015. We refer to these as Select Nodes. We find that ADMD and backcasting recover 92% and 88% of potential revenues, respectively, for only select nodes. This is a modest increase compared to all nodes. DMD, however, recovers less relative profit when only considering select nodes. We hypothesize this is because DMD benefits from forecasting as many nodes as possible.
Backcasting DMD ADMD All Nodes 83.7% 24.3% 80.8% Select Nodes 88.3% 21.7% 92.2% Table 3: Comparison of net revenues, relative to the perfect information case, for each forecasting method for January 5 th to December 9 th in 2015. Select nodes are locations where the market price was always between 0 $ / MWh and 200 $ / MWh. DMD forecasting greatly underperforms relative to ADMD and backcasting. ADMD and backcasting perform well through most of CAISO, but fail to capture the largest revenue opportunities across the coast.
The perfect information reference shows the maximum available revenue.

Computational Timing Results
A key advantage of DMD and ADMD is computational speed. Training is quick, often limited by the SVD calculation, and forecasting is almost instantaneous using the explicit formula in Eq. (18). To demonstrate these benefits, we performed simple computational benchmarks using an Auto-Regressive Integrated Moving Average (ARIMA) model implemented in the StatsModel package. We considered 48 hours of training data and ARIMA model form (p, d, q) = (7, 1, 1). Training and forecasting with an ARIMA model took 33 seconds per node on average. In contrast, it took 11.2 seconds in total for ADMD (r = 8, l = 48, n = 96) to train and forecast for all 6587 nodes, which is 0.002 seconds per node. On a per-node basis, ADMD is 20,000 times faster than ARIMA. Our custom implementation of DMD using SciPy is available on GitHub (https://github. com/dowlinglab/dmd-energy-markets). These benchmark tests were performed on a 2014 MacBook Pro (3.0 GHz Intel Core i7 processor, 16 GB of RAM) using an Anaconda Python environment.

Conclusions and Future Work
In this paper, we extend the success of DMD in forecasting financial markets [28,10] to wholesale electricity markets. We find that standard DMD is unable to extract coherent low-rank structures in California Independent System Operator (CAISO) day-ahead market prices for the calendar year 2015. Instead, we find Augmented DMD (ADMD), which is equivalent to differencing in a time-series model, overcomes the so-called standing wave problem and can consistently reconstruct and forecast with errors less than 5% with a low-rank approximation. We benchmark DMD, ADMD, and backcasting in a rolling horizon model predictive control case study for energy arbitrage with a market connected battery. We show that ADMD outperforms backcasting by capturing 92% versus 88% of available net market revenue at nodes without extreme prices. We also find that ADMD is up to 20,000 times faster than ARIMA for training and forecasting.
There are several opportunities to further enhance DMD for energy market forecasting. Denoising procedures can greatly improve the performance of DMD with experimental data and should be exploring for energy price forecasting. [11,19] Multiresolution DMD may better identify and recapitulate dynamics at disparate time-scales in electricity markets. [23] Both of these methods may help explain why ADMD as currently implemented does not benefit from forecasting multiple nodes simultaneously. We also plan to more thoroughly benchmark DMD against other forecasting techniques, including emerging machine learning methods, to understand trade-offs between data requirements, computational time, need for model tuning, and forecasting error. Real-time markets (RTMs) set prices in 5 to 15-minute increments and are much more volatile than day-ahead markets. [13] Studying the performance of DMD for RTMs and other geographic regions is another future opportunity.