
Chapter 1: Alpha Diagnostics & the Information Coefficient

What you will learn

Book 1 ended at the moment an alpha model went into production. Quick refresher in plain English:

  • Alpha model — a model that predicts which stocks (or assets) will go up tomorrow, this month, or this year. It outputs a number per stock per date. If you read the model’s output table top-down, you get a ranked list of stocks: most-recommended at the top.

This chapter is what happens after you build that model. A live alpha model is not a static object — the market keeps changing under it. Your job, as the person who owns the model, is to monitor it: each month, decide whether the model is still useful or whether it has quietly stopped working.

The toolkit for that monitoring is small and standard. It is built around one number — the Information Coefficient (IC) — and five plots: the monthly IC bar chart, the rolling 12-month IC line, the cumulative IC curve, the twin-axis IC-vs-wealth overlay, and the predicted-vs-realized P&L scatter. Read together, these five charts tell you within thirty seconds whether the model you shipped six months ago is still doing what you said it would do.

A few more first-use definitions you will need:

  • Backtest — running the model on historical data and asking “if I had traded this signal back then, what would have happened?”
  • Walk-forward — the honest kind of backtest: train only on data up to month \(t-1\), test on month \(t\), slide forward.
  • Decay — a model that “worked” eventually stops working, because markets adapt or the model overfits a regime that ended. Detecting decay before it costs money is the entire point of this chapter.

Section overview

Why an alpha model needs to be watched

There is a hazard in quantitative investing that does not exist in physics or chemistry: the laws change while you are using them. A model fit on data from 2005–2015 captures one particular joint distribution of features and returns. If that distribution were stable forever, you could put the model in a glass case and run it untouched for twenty years. It is not stable. Investors copy good signals (so they stop being good), economies move through cycles, and once-in-a-decade events — the 2009 recovery, the March-2020 COVID dislocation, the 2022 rate hikes — periodically reshuffle which stocks behave like which.

The dangerous feature of model decay is that it is silent. The model keeps producing predictions, the portfolio keeps generating trades, and the P&L curve keeps drifting up and down. A routine backtest will not tell you that the predictive correlation between \(\hat R_{i,t+1}\) (what the model said) and \(R_{i,t+1}\) (what really happened) has been sliding from \(0.05\) to \(0.02\) to \(-0.01\) over the past nine months. By the time the equity curve visibly flattens, you have lost six to twelve months of attention and capital. The whole point of diagnostics is to catch the decay months before the wealth curve makes it obvious.

The silent failure mode

A backtest tells you whether the model worked in the past. A wealth curve tells you whether the strategy is currently making money. Neither tells you whether the model’s predictive content — its information about the future cross-section — is still alive. The Information Coefficient is the only metric that does, and it does it month by month rather than after the fact.

Beta vs alpha — a quick recap

Before we move further, recall the distinction from Book 1, Chapter 5. Plain English first: a beta model explains today’s return using today’s factor returns — it is a backward-looking decomposition, like saying “the stock went down 3% today because the market went down 2% and tech sold off.” Formally:

\[ r_{i,t} - r_{f,t} = \alpha_i + \beta_{i,1} F_{1,t} + \cdots + \beta_{i,K} F_{K,t} + \varepsilon_{i,t}. \]

Every variable on the right-hand side is dated \(t\), just like the return on the left. The regression explains a return that already happened. It is a diagnostic, not a forecast.

An alpha model is forward-looking — it predicts next month’s return using only information available today:

\[ \hat R_{i,t+1} = f(\mathbf X_{i,t}), \]

where every component of \(\mathbf X_{i,t}\) is something you can observe at time \(t\) (book-to-market, recent momentum, news sentiment, etc.) and the target \(R_{i,t+1}\) is one period in the future. Whatever the functional form of \(f\) — linear regression with shrinkage (Ridge, LASSO), Random Forest, Gradient Boosting (HGBR), or a deep neural network — the contract is the same: features known today, return tomorrow.

The reason this distinction matters for diagnostics is that beta-model residuals can be evaluated with classical statistics (the \(\alpha\) intercept has a standard error and a \(t\)-statistic). Alpha-model predictions cannot. There is no closed-form formula for the uncertainty of \(\hat R\); the only honest way to evaluate an alpha model is to deploy it walk-forward and measure how well its predictions order the stocks against the actual future returns. That measurement is the Information Coefficient.

Pointer to Book 1

If the distinction between contemporaneous and predictive modelling, or the rationale for walk-forward backtesting, is hazy, return briefly to Book 1, Chapter 5. The present chapter assumes you can write down the protocol — at month \(t\), train on data dated strictly before \(t\), predict at \(t\), observe at \(t+1\) — without effort. Everything that follows builds on that protocol.

The Information Coefficient

Where you’ll see this. If you intern at a quant fund, the very first thing your PM will ask you to produce for any new signal is the IC time series and the cumulative IC plot. Master these two charts and you can sit at the senior table on day one.

Intuition — what an IC really is

Imagine you wrote down at the start of the month a ranked list of 300 stocks, top of the list = your strongest “buy” pick, bottom = your strongest “avoid” pick. A month later, you look at what actually happened and write down a second list: the same 300 stocks, ranked from best realised return to worst.

The IC is just: “how similar are the two ranked lists?” Quantified on a \([-1, +1]\) scale.

  • IC \(= +1\): the two lists are identical (perfect prediction).
  • IC \(= 0\): your predicted ranking has nothing to do with what happened (coin flip).
  • IC \(= -1\): your top picks were the worst stocks (anti-predictive).

A “real” alpha model in equities lives near IC \(\approx +0.05\). Yes, that low — markets are very noisy. We will see why \(0.05\) is actually enough to make real money.

🎥 Watch — Spearman’s Rank Correlation (9 min)

The IC is just Spearman’s rank correlation applied to (prediction, realisation) pairs. This video explains why using ranks instead of raw magnitudes makes the metric robust to the outliers and nonlinearities that are ubiquitous in financial returns.

— StatQuest

Definition

Plain English first, then the formula. At the start of month \(t\), the model produces one number per stock — a predicted return for next month. There are \(N_t\) stocks. After waiting one period, we observe what actually happened: one realised return per stock. The Information Coefficient at month \(t\) is just the Spearman rank correlation between these two columns of numbers — predictions and realisations:

\[ \mathrm{IC}_t \;=\; \mathrm{Spearman}\!\Bigl(\{\hat R_{i,t+1}\}_{i=1}^{N_t},\; \{R_{i,t+1}\}_{i=1}^{N_t}\Bigr). \]

Mechanically: rank the predictions from 1 (smallest) to \(N_t\) (largest), rank the realisations the same way, and compute the ordinary Pearson correlation of the two rank vectors. The output is a number in \([-1, 1]\). A positive IC means stocks the model said would outperform did in fact outperform (on the rank scale); a negative IC means the model is anti-predictive (its top picks did worst); an IC near zero means the model is no better than coin-flipping.
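The "rank, then Pearson" mechanics are a one-line identity you can verify directly. A minimal check, assuming NumPy and SciPy (the toy numbers are illustrative only):

```python
# Spearman = Pearson applied to the two rank vectors.
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

pred = np.array([0.02, -0.01, 0.03, 0.00, -0.02])   # toy predictions
real = np.array([0.01, -0.03, 0.05, -0.01, 0.02])   # toy realisations
assert np.isclose(spearmanr(pred, real)[0],
                  pearsonr(rankdata(pred), rankdata(real))[0])
```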

For a respectable equity alpha model on a monthly horizon, a typical \(\mathrm{IC}_t\) falls in the range \(0.02\) to \(0.08\). These are small numbers. They look pathological to anyone trained on regression in marketing or biology, where ICs of \(0.5\) are routine. They are not pathological — they are the empirical reality of public-equity return prediction, and they translate into substantial Sharpe ratios for the reasons we develop in Section 1.4.

Caption. A labeled anatomy of the IC: rank both columns, feed them into Spearman, get one number per cross-section; the IC time series then yields a mean (level) and an ICIR (stability).

Caption. Even a healthy alpha model produces a mix of positive (blue) and negative (red) monthly ICs; the dashed line is the mean, and the spread around it is the sampling noise from which the standard error and \(t\)-statistic of the mean are computed.

Why Spearman, not Pearson

Intuition — Spearman vs Pearson

Pearson correlation uses the raw numbers — so a single stock that returned +80% (because of a buyout) will tug Pearson around violently. Spearman correlation first replaces every number by its rank (1, 2, 3, …, \(N\)) and then runs Pearson. So a +80% return and a +8% return become “rank 300” and “rank 299” — just one step apart. The big outlier is squashed to a single rank step. That is exactly what you want for portfolios, because you only ever trade the top names and bottom names — you do not care whether the #1 stock returned 8% or 80%, only that it was at the top of the list.

Caption. A single \(+8.0\) outlier slashes Pearson’s correlation while Spearman barely budges — the rank transform compresses the offender into one rank step, which is exactly the robustness portfolios built from rankings need.

The choice of rank correlation rather than ordinary Pearson correlation is deliberate and consequential. Three reasons.

Robustness to outliers. A single tail observation — a stock that returned \(+80\%\) or \(-50\%\) in one month because of a takeover or a fraud — can swing a Pearson correlation from \(0.05\) to \(-0.10\). Spearman compresses the offender to its rank and absorbs the shock. In a cross-section of 500 stocks, the difference between the largest realised return and the second-largest is exactly one rank, regardless of whether the gap is \(5\%\) or \(50\%\).

Monotone invariance. If the true relationship between features and returns is monotonic but nonlinear — say, the top-decile signal predicts a \(3\%\) return and the bottom-decile signal predicts \(-3\%\), with everything else in between — Pearson will under-state the predictive power unless the model nails the magnitudes exactly. Spearman cares only about ordering, so it credits the model for getting the ranking right even when the predicted magnitudes are systematically too small or too large. For portfolios built from rankings (long the top \(K\), short the bottom \(K\)) this is exactly the right notion.

No need to standardise the predictions. Pearson is sensitive to the scale and centring of both inputs. Spearman applies the rank transform first, so a model that systematically over-predicts the size of returns gets the same IC as a perfectly calibrated model with the same ordering. Since portfolios are built from ranks anyway, this is the metric that aligns with how you will actually use the signal.

In real work, Pearson and Spearman ICs on alpha-model output are usually within \(0.005\) of each other in stable months and diverge by \(0.02\) or more in months containing extreme returns. The discipline you should adopt is: compute and report Spearman by default, and only fall back to Pearson when you have a specific reason — for example, when evaluating a calibrated return forecast that you plan to feed into a mean-variance optimiser.
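The outlier claim is easy to verify numerically. A minimal sketch, assuming NumPy/SciPy; the sample size, noise level, and the \(+8.0\) outlier value mirror Exercise 1.1 and are otherwise arbitrary:

```python
# One takeover-sized outlier swings Pearson but barely moves Spearman.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(0)
pred = rng.standard_normal(200)
real = 0.05 * pred + rng.standard_normal(200)
p0, s0 = pearsonr(pred, real)[0], spearmanr(pred, real)[0]
real[0] = 8.0                                  # inject the outlier
p1, s1 = pearsonr(pred, real)[0], spearmanr(pred, real)[0]
print(f"Pearson {p0:+.3f} -> {p1:+.3f}   Spearman {s0:+.3f} -> {s1:+.3f}")
```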

Mean IC, IC standard error, IC \(t\)-statistic

Intuition — why we don’t trust a single month

A single month’s IC could be high by luck or low by bad luck. So we collect IC\(_1\), IC\(_2\), … , IC\(_T\) across many months and ask two questions: (1) what is the average level? — that is the mean IC; (2) how stable is it from month to month? — that is the standard deviation \(\sigma(\mathrm{IC})\), and the closely related ICIR. A high mean with high volatility is a coin flip in disguise; a modest mean with low volatility is a real, durable signal.

A single \(\mathrm{IC}_t\) is dominated by sampling noise. We need to average over time. Let the model produce ICs for \(T\) months — \(\mathrm{IC}_1, \mathrm{IC}_2, \ldots, \mathrm{IC}_T\). The mean IC is the simple average over the months:

\[ \overline{\mathrm{IC}} \;=\; \frac{1}{T} \sum_{t=1}^{T} \mathrm{IC}_t. \]

This is the central tendency: the average rank correlation the model achieves in a typical month. Note that mean IC is a cross-time average of a cross-sectional statistic, so it absorbs the cross-sectional sampling noise inside each month and is left only with month-to-month variation.

In plain English: how uncertain are we about the mean IC? Standard formula from intro statistics — divide the month-to-month standard deviation by \(\sqrt{T}\):

\[ \mathrm{SE}(\overline{\mathrm{IC}}) \;=\; \frac{\sigma(\mathrm{IC}_t)}{\sqrt{T}}, \]

assuming the monthly ICs are approximately independent of each other. (They are, in real data, slightly auto-correlated — regimes persist for a while — but the correction is small for monthly cross-sections.)

The corresponding IC \(t\)-statistic is exactly the same idea as a \(t\)-statistic from intro stats: how many standard errors is the mean away from zero?

\[ t_{\mathrm{IC}} \;=\; \frac{\overline{\mathrm{IC}}}{\mathrm{SE}(\overline{\mathrm{IC}})} \;=\; \overline{\mathrm{IC}} \cdot \frac{\sqrt{T}}{\sigma(\mathrm{IC}_t)}. \]

This is the number you cite in a research memo. By convention, \(t_{\mathrm{IC}} > 2\) is “the mean IC is statistically distinguishable from zero at the 5% level”; \(t_{\mathrm{IC}} > 3\) is “compelling”; \(t_{\mathrm{IC}} > 4\) is “extraordinary — suspect a data leak.”

Intuition — ICIR is the ‘is this signal stable?’ number

The ICIR asks: if your average IC is \(0.05\) but it swings between \(-0.10\) and \(+0.20\) every month, is that really a stable signal? The ICIR divides the mean IC by its monthly standard deviation:

\[\mathrm{ICIR} = \frac{\overline{\mathrm{IC}}}{\sigma(\mathrm{IC}_t)}.\]

It is the per-month Sharpe ratio of the IC time series: the same flavour as \(t_{\mathrm{IC}}\) but without the \(\sqrt{T}\) factor, since \(\mathrm{ICIR} = t_{\mathrm{IC}} / \sqrt{T}\). ICIR above \(0.30\) on monthly data is good; above \(0.50\) is excellent. Multiply by \(\sqrt{12}\) to annualise.

A closely related statistic is the Information Coefficient Information Ratio (sometimes called ICIR, sometimes IR-IC, sometimes just IR — naming varies by author):

\[ \mathrm{ICIR} \;=\; \frac{\overline{\mathrm{IC}}}{\sigma(\mathrm{IC}_t)}. \]

Do not confuse ICIR with \(t_{\mathrm{IC}}\) — they differ by a factor of \(\sqrt{T}\).

A worked example

The cell below simulates 120 months of cross-sections with \(N = 300\) stocks per month and a true (population) predictive correlation of \(0.06\). Plain-English roadmap of the code: for each month, draw a hidden quality signal for every stock; build a realised return that is correlated with the signal at level \(0.06\); build the model’s prediction as the signal plus extra noise (the model is imperfect); compute the IC for that month. Repeat for 120 months, then summarise.
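The original cell is not reproduced here; the following is a minimal sketch that follows the roadmap. NumPy/SciPy are assumed, and the prediction-noise scale (1.0) is an assumption chosen so the realised mean IC lands near the \(0.04\) discussed below; exact numbers vary with the seed.

```python
# 120 months x 300 stocks with a population predictive correlation of 0.06.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(42)
T, N, rho = 120, 300, 0.06

ics = np.empty(T)
for t in range(T):
    signal = rng.standard_normal(N)                        # hidden quality signal
    realized = rho * signal + np.sqrt(1 - rho**2) * rng.standard_normal(N)
    predicted = signal + rng.standard_normal(N)            # signal + measurement error
    ics[t] = spearmanr(predicted, realized)[0]             # monthly Spearman IC

mean_ic, sigma_ic = ics.mean(), ics.std(ddof=1)
t_stat = mean_ic * np.sqrt(T) / sigma_ic
icir = mean_ic / sigma_ic
hit_rate = (ics > 0).mean()
print(f"mean IC {mean_ic:.4f} | sigma {sigma_ic:.4f} | t {t_stat:.2f} | "
      f"ICIR {icir:.3f} | hit rate {hit_rate:.1%}")
```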

What we just got. Five numbers that together describe a signal: the average IC, its month-to-month volatility, the \(t\)-statistic, the ICIR, and the hit rate (fraction of months the IC was positive).

Interpretation. The mean IC near \(0.04\) is below the population value of \(0.06\) because the model only sees the signal plus measurement error. The \(t\)-statistic comfortably exceeds 2, so the IC is statistically distinguishable from zero. The ICIR is in the healthy range. The hit rate — the fraction of months with positive IC — is well above 50%, which is itself a useful sanity check: a real model with \(t_{\mathrm{IC}} \approx 3\) should win the rank-correlation game in roughly 65–70% of months.

Reporting convention

A short alpha-model memo should always include the four numbers above plus the hit rate: \(\overline{\mathrm{IC}}\), \(\sigma(\mathrm{IC}_t)\), \(t_{\mathrm{IC}}\), ICIR, and hit rate. Together they let a reader judge both the magnitude of the predictive edge and the reliability with which the model achieves it.

The Fundamental Law of Active Management

Where you’ll see this. Anyone selling you a “three-factor model with Sharpe 3” is implicitly claiming an IC. Use the Fundamental Law to back out what IC they’re claiming. If it comes out larger than \(0.10\), raise an eyebrow.

Intuition — Grinold’s Fundamental Law in one breath

Skill (your IC) × number of independent bets (breadth) \(=\) your annual risk-adjusted return. More formally, the annual Sharpe you can hope for is approximately

\[\text{annual Sharpe} \;\approx\; \mathrm{IC} \times \sqrt{\mathrm{BR}}.\]

Two slogans:

  • A small IC plus lots of independent bets beats a big IC on a few bets. That is why quants hold portfolios of 100+ stocks rebalanced 12 times a year, not 3 stocks held for a decade.
  • Double the bets \(\to\) Sharpe goes up by \(\sqrt{2} \approx 1.41\times\) — if the bets are independent. The “if” is doing all the work in real life.

Statement

The single deepest result in active-management theory — the bridge between a tiny rank correlation and a substantial portfolio Sharpe ratio — is Grinold’s Fundamental Law of Active Management (Grinold, 1989; Grinold and Kahn, 1995):

\[ \mathrm{IR} \;\approx\; \mathrm{IC} \cdot \sqrt{\mathrm{BR}}. \]

Here \(\mathrm{IR}\) is the information ratio of the resulting portfolio (mean active return divided by tracking error, annualised — basically a Sharpe ratio), \(\mathrm{IC}\) is the per-bet information coefficient (the rank or Pearson correlation between predicted and realised active returns), and \(\mathrm{BR}\) is the breadth — the number of independent bets the manager makes per year.

The law is approximate, not exact. The derivation assumes things that real portfolios violate to varying degrees (independence of bets across stocks and months, no constraints on portfolio weights, perfect translation of forecast into weight). But the order of magnitude it predicts is consistently right, and the qualitative implication — that a quant fund survives on breadth, not on the absolute size of any one signal — is one of the most important pieces of intuition in the field.

Caption. The Fundamental Law in three labeled pieces — annual skill (\(\mathrm{IR}\)) equals per-period skill (\(\mathrm{IC}\)) times the square root of how many independent bets you place; rearranging gives the implied-breadth reality check.

Derivation sketch

Plain English: we are going to show why \(\mathrm{IR} \approx \mathrm{IC}\sqrt{\mathrm{BR}}\) by looking first at one bet, then at many independent bets summed together. The standard intuition from intro statistics — averages of \(n\) independent things have variance shrinking like \(1/n\) but mean staying the same — is the engine driving the \(\sqrt{\mathrm{BR}}\) factor.

Consider a single bet. You have a forecast \(\hat r\) for the active return of one position; the realised active return is \(r\). Suppose forecasts and realisations have unit variance and correlation \(\rho\) (so \(\rho\) plays the role of IC):

\[ \hat r, r \sim \mathcal{N}(0, 1), \quad \mathrm{Corr}(\hat r, r) = \rho. \]

You invest in proportion to the forecast: \(w = \hat r / k\) for some scaling constant \(k\). The realised P&L on this one bet is \(w \cdot r = \hat r r / k\). Its expectation conditional on the forecast is

\[ \mathbb E[w \cdot r \mid \hat r] = \frac{\hat r}{k} \cdot \mathbb E[r \mid \hat r] = \frac{\hat r}{k} \cdot \rho \hat r = \frac{\rho}{k} \hat r^2. \]

Averaging over the unconditional distribution of \(\hat r\) (which has variance 1), the expected P&L per bet is \(\rho / k\). The variance of the P&L per bet (to leading order in \(\rho\), which is small) is \(1/k^2\). So the per-bet Sharpe (expected P&L divided by its standard deviation) is

\[ \mathrm{SR}_{\text{single bet}} = \frac{\rho/k}{1/k} = \rho = \mathrm{IC}. \]

That is the punchline for one bet: the Sharpe of a single independent forecast equals its IC. Now suppose you make \(\mathrm{BR}\) independent bets per year, each with the same IC. Independent bets add their means linearly and their variances linearly (this is the basic “variance of a sum of independent variables” rule from intro statistics), so the annualised P&L mean grows like \(\mathrm{BR} \cdot \rho/k\), the annualised P&L variance grows like \(\mathrm{BR}/k^2\), and the annualised Sharpe is

\[ \mathrm{IR} = \frac{\mathrm{BR} \cdot \rho/k}{\sqrt{\mathrm{BR}/k^2}} = \rho \sqrt{\mathrm{BR}} = \mathrm{IC} \sqrt{\mathrm{BR}}. \]

That is the Fundamental Law.
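The single-bet punchline is easy to check by simulation. A sketch under the derivation's own assumptions (unit-variance bivariate normal forecasts and realisations, \(k = 1\)):

```python
# Monte Carlo check that the per-bet Sharpe of pnl = forecast * realisation is rho.
import numpy as np

rng = np.random.default_rng(2)
rho, n = 0.05, 1_000_000
fcast = rng.standard_normal(n)
real = rho * fcast + np.sqrt(1 - rho**2) * rng.standard_normal(n)
pnl = fcast * real                      # w = fcast, i.e. k = 1
print(pnl.mean() / pnl.std())           # ~= 0.05 = rho
```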

Where the assumptions fail

The derivation assumes independent bets. A monthly alpha model that takes 60 stock positions in any given month does not place 60 independent bets that month — all 60 share the same macro environment, the same sector backdrop, the same liquidity conditions. So when the market goes down they tend to go down together. Effective breadth is therefore smaller than the nominal count \(\mathrm{BR}_{\text{nominal}} = K \cdot 12\). We will quantify the gap below using implied breadth.

Implied breadth: a reality check

Same equation, rearranged to solve for breadth instead of Sharpe:

\[ \mathrm{BR}_{\text{implied}} = \left( \frac{\mathrm{SR}}{\mathrm{IC}} \right)^2. \]

In plain English: given a backtested Sharpe and a measured mean IC, this tells you how many effective independent bets the strategy actually placed. If \(\mathrm{BR}_{\text{implied}}\) is close to the nominal count (\(K\) positions \(\times\) \(12\) months), the independent-bets assumption is roughly holding. If \(\mathrm{BR}_{\text{implied}}\) is a small fraction of nominal — say \(20\%\) — then cross-sectional correlation between positions is eating most of the breadth.

In real work at equity alpha funds, implied breadth typically runs \(30\)–\(50\%\) of nominal for a long-short top-K strategy. This is sobering on first encounter and useful as a calibration tool. It says: adding a hundredth position to a portfolio of \(K = 100\) is not really one more independent bet — it is more like one-third or one-half of one.

A numerical illustration

Plug in real-world orders of magnitude. A model with \(\mathrm{IC} = 0.05\), run on \(K = 60\) stocks per side (long \(60\), short \(60\)) rebalanced 12 times per year, has

\[ \mathrm{BR}_{\text{nominal}} = 60 \times 12 = 720, \qquad \sqrt{\mathrm{BR}_{\text{nominal}}} \approx 26.8, \qquad \mathrm{IR}_{\text{nominal}} \approx 0.05 \times 26.8 \approx 1.34. \]

A backtested Sharpe of, say, \(0.85\) then implies

\[ \mathrm{BR}_{\text{implied}} = \left(\frac{0.85}{0.05}\right)^2 = 289, \]

or about \(40\%\) of nominal. The gap of roughly \(2.5\times\) is your cross-sectional correlation penalty.
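The same arithmetic as a five-line script, using exactly the numbers in the text:

```python
ic, k, rebalances, sr_actual = 0.05, 60, 12, 0.85
br_nominal = k * rebalances                              # 720
ir_nominal = ic * br_nominal ** 0.5                      # ~1.34
br_implied = (sr_actual / ic) ** 2                       # 289
print(ir_nominal, br_implied, br_implied / br_nominal)   # ~1.34, 289.0, ~0.40
```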

What we just got. A direct numerical comparison between the Sharpe the Fundamental Law would predict on paper, and the Sharpe an honest backtest actually delivers — plus the “implied breadth” gap between them.

Interpretation. The Fundamental Law over-predicts the Sharpe by a factor of \(\mathrm{IR}_{\text{nominal}} / \mathrm{SR}_{\text{actual}} \approx 1.6\) in this stylised example. Real equity quant funds report multiples in the 1.5–3.0 range. Two readings of the same number: pessimistic (“we only get \(40\%\) of the breadth we naively count”) or operational (“now we know how much margin to leave when budgeting a new strategy”).

Why the Law is the most useful identity in this chapter

Three uses, all routine if you do this for a job. First, sanity-checking a backtest: if your IC is \(0.03\) and you somehow report a Sharpe of \(5\), the law tells you the breadth would have to be \(\sim 28{,}000\) — implausible unless you are trading every minute. Treat that result as suspicious. Second, capacity planning: if you want to scale a model from \(K = 30\) to \(K = 100\), the breadth grows by a factor of \(\sqrt{100/30} \approx 1.8\times\) at most, often less. Third, signal-vs-portfolio diagnosis: if Sharpe falls but IC is stable, the problem is in portfolio construction; if IC falls and Sharpe falls together, the problem is the signal itself.

The Monthly IC Time Series

Where you’ll see this. This bar chart is the first picture on the wall in any quant equity team’s morning meeting. If you can read it fluently, you understand 80% of what alpha-model monitoring is about.

What to plot

A single mean IC summarises 60 or 120 numbers into one. That throws away most of what you wanted to see. A serious diagnostic plots the full time series of \(\mathrm{IC}_t\) as a bar chart, with positive bars in one colour and negative bars in another. Three patterns become visible at a glance:

  1. Dispersion. The thickness of the cloud of bars around zero. A model with \(\overline{\mathrm{IC}} = 0.04\) and \(\sigma(\mathrm{IC}_t) = 0.06\) has a tight cluster of positive bars and the occasional negative one; the same mean IC with \(\sigma = 0.15\) has a much messier picture. Dispersion is what kills statistical significance; a tight \(\sigma\) is as important as a high mean.

  2. Regime visibility. A long run of consecutive positive bars followed by a cluster of negative bars indicates regime change. The mean IC over the whole sample can disguise this — five years of healthy IC averaged with two years of broken IC still produces a positive mean. The bar chart does not let you average the picture away.

  3. Tail months. Extreme negative IC months — a single month of \(\mathrm{IC} = -0.20\), for instance — are usually associated with macro stress events: the March-2020 COVID reversal, the August-2007 “quant quake”, the November-2016 US election surprise. Knowing which months these are helps separate “the model is broken” from “every model was broken that month.”

A demonstration

Plain-English roadmap of the next cell: we manufacture a 144-month IC series with a healthy mean of about \(0.045\), then inject 10 bad months in the middle (months 60–70) to mimic a regime event. We then plot the bars, colour them by sign, and overlay the long-run mean as a dashed line.
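The cell itself is not shown; a minimal sketch follows, assuming NumPy and Matplotlib. The baseline volatility (\(0.07\)) and the depth of the injected regime are assumptions:

```python
# 144 months of IC with a bad-regime cluster injected at months 60-70.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
T = 144
ic = rng.normal(0.045, 0.07, T)            # healthy baseline
ic[60:71] = rng.normal(-0.06, 0.05, 11)    # injected regime event

plt.figure(figsize=(10, 3))
plt.bar(range(T), ic, color=np.where(ic >= 0, "tab:blue", "tab:red"))
plt.axhline(ic.mean(), ls="--", color="k", label=f"mean IC = {ic.mean():.3f}")
plt.xlabel("month"); plt.ylabel("monthly IC"); plt.legend(); plt.tight_layout()
plt.show()
```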

What we just got. One bar per month of IC, coloured blue for positive and red for negative, plus the long-run mean as a dashed line — the canonical lead figure of an alpha-model report.

Interpretation. The bar chart immediately shows the regime cluster around months 60–70 — a stretch of red bars deeper than usual. Without this picture you would just see a mean IC of about \(0.04\) and a \(t\)-statistic in the high single digits, with no way to flag that something exceptional happened mid-sample. This is why every IC summary that ships with a model carries a bar chart as the lead figure.

How to read a bar chart at a glance

Healthy alpha model: most bars blue, a sprinkling of red, no obvious clusters of red. Decaying model: gradual loss of blue density, recent months mostly red. Regime-event model: long run of blue interrupted by a dense cluster of deep red. Each pattern implies a different intervention — keep running, retrain on a shorter window, pause and audit, respectively.

Rolling and Cumulative IC

Where you’ll see this. Where the monthly bar chart shows you what happened, the rolling and cumulative IC plots show you the trend — the two questions a senior PM will ask after looking at the bar chart for ten seconds: “is it getting better or worse?” and “what is the total return on this model?”

Intuition — cumulative IC is the model’s equity curve

Plain English: cumulative IC is just the running total of all the monthly ICs you’ve seen so far — IC\(_1\) + IC\(_2\) + … + IC\(_t\). Plot it like a stock price.

  • A rising line means the model is working in the period.
  • A flat line means the model has stopped working (each new IC is near zero, so the running total is not climbing any more).
  • A falling line means the model is anti-predicting (negative ICs are eating into your running total).

In other words: cumulative IC is to the alpha model what the wealth curve is to a trading strategy. The slope is the local mean IC.

Rolling 12-month IC: smoothing without averaging away

The bar chart is high-information but visually noisy. A complementary view is the rolling 12-month average IC — the average of the most recent 12 monthly ICs, recomputed each month:

\[ \mathrm{RollIC}_t \;=\; \frac{1}{12}\sum_{s=t-11}^{t} \mathrm{IC}_s. \]

The rolling average smooths the cross-sectional noise that dominates an individual month’s IC and shows the trend — whether the underlying predictive content of the model is rising, flat, or falling. A horizontal rolling IC near \(0.04\) over four years says the model is delivering on its training-set claim. A rolling IC that drifts from \(0.05\) down to \(0.01\) over the same period says the model is decaying even if the cumulative wealth curve is still climbing on inertia.

A subtle but important point: when plotting rolling IC, lag it by one month. The IC at month \(t\) is the correlation between predictions made at \(t\) and realisations observed at \(t+1\). So \(\mathrm{IC}_t\) is only known after the realisations arrive — at the very end of month \(t+1\). If you plot rolling IC at month \(t\) without lagging, the picture implicitly uses the future. A .shift(1) after the rolling mean fixes this. Trivially small in the math; consequential when the chart is the one shown to a Portfolio Committee.
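In pandas terms, the fix is one line. A sketch with a stand-in IC series (the series itself is assumed):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ic = pd.Series(rng.normal(0.04, 0.07, 120))    # stand-in monthly IC series

# IC_t is only known at the end of month t+1, so lag the rolling mean by one
# month before plotting it against the calendar.
roll_ic = ic.rolling(12).mean().shift(1)
```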

Cumulative IC: the model’s equity curve

Even more compactly, the cumulative IC is the running sum:

\[ \mathrm{CumIC}_t \;=\; \sum_{s=1}^{t} \mathrm{IC}_s. \]

Plotted against time, this is the analogue of the wealth curve, but for the predictive content of the model rather than the dollars it earned. The slope of \(\mathrm{CumIC}\) at any point is the local mean IC. A healthy model produces a roughly linearly increasing cumulative IC curve. The slope’s interpretation is precise: \(\mathrm{slope} \approx \overline{\mathrm{IC}}_{\text{local}}\) — the number of “rank-correlation units” the model harvests each month.

Three diagnostic readings of the cumulative IC curve:

  1. Steady linear climb. Model is working consistently. Slope equals mean IC.
  2. Flat stretch. Model’s predictive content has gone to zero for a period — IC is hovering around zero. Investigate why; the wealth curve will follow with a lag.
  3. Downward run. Model is anti-predicting. Stop. Audit. Do not retrain in panic.

Caption. The cumulative-IC line is read like a wealth curve: rising slope is the local mean IC, a flat stretch means the model has gone to sleep, and a downward run is anti-prediction.

Caption. The rolling IC (blue, left axis) shows the local level of the signal, while the cumulative IC (red, right axis) shows its running total — when rolling IC sags toward zero around months 48–84, the cumulative curve flattens by exactly the same amount, because slope of cumulative IC equals the local mean.

The cumulative IC curve is also useful for sub-period analysis. Pick two dates, read the cumulative IC at each, take the difference, divide by months elapsed: that gives you the model’s mean IC over that interval. This kind of slicing is much harder on a bar chart or rolling line.
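The two-panel figure described below can be sketched as follows (NumPy/pandas/Matplotlib assumed; the decay window at months 48–84 and the \(0.02\)/\(0.04\) threshold lines are assumptions chosen to match the captions):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
T = 120
mu = np.full(T, 0.05); mu[48:84] = 0.015              # decay window
ic = pd.Series(rng.normal(mu, 0.07))
roll, cum = ic.rolling(12).mean().shift(1), ic.cumsum()

fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10, 5), sharex=True)
ax1.plot(roll, color="tab:blue")
ax1.axhline(0.04, color="green", ls="--", label="healthy")
ax1.fill_between(roll.index, -0.05, 0.02, color="red", alpha=0.1, label="danger zone")
ax1.set_ylabel("rolling 12-mo IC"); ax1.legend()
ax2.plot(cum, color="tab:red")
ax2.set_ylabel("cumulative IC"); ax2.set_xlabel("month")
plt.tight_layout(); plt.show()
```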

What we just got. Two stacked panels — the rolling 12-month IC line (top, with a green “healthy” threshold and a red “danger” fill) and the cumulative IC (bottom, the running-total equity curve for the signal).

Interpretation. The cumulative IC curve climbs cleanly through the first 60 months, flattens during the decay window, then resumes climbing — but with a shallower slope. The rolling IC plot makes the decay visible as a downward drift from about \(0.05\) to about \(0.015\), with a brief excursion into the “danger zone” near month 90. If you had been watching this in real time, you would have flagged decay around month 70 and intervened — by retraining, by switching features, or by stepping down position size — well before the wealth curve made the problem obvious.

Why two plots, not one

Rolling and cumulative IC carry overlapping information but with different visual emphasis. Rolling IC shows the level of the signal each month; cumulative IC shows the running total. A flat patch in cumulative IC corresponds to a near-zero patch in rolling IC. If you do this for a job, you will keep both on the same monitor — the eye picks up changes in slope of cumulative IC before changes in level of rolling IC, because the cumulative chart compresses the noise.

IC vs Wealth: The Twin-Axis Decay Detector

Where you’ll see this. If you could only ship a single chart to a Portfolio Committee, ship this one. It is the chart a PM points to when arguing for or against pulling the plug on a strategy.

The one chart you must produce

On a single x-axis (months), plot two things:

  • Left y-axis (blue): monthly IC bars plus the rolling 12-month IC line.
  • Right y-axis (red): cumulative wealth of the strategy.

The two series usually move together. When they don’t, you have caught something important.

There are four typical readings of the chart:

  • IC up, wealth up. Reading: healthy, signal and execution both working. Action: continue.
  • IC flat, wealth up. Reading: drifting, wealth is moving on momentum while the signal fades. Action: investigate, prepare a retrain.
  • IC down, wealth down. Reading: confirmed decay, the signal is broken and P&L is following. Action: stop, audit, retrain on a shorter window.
  • IC up, wealth down. Reading: execution/construction problem, the signal is fine but the portfolio is not. Action: audit portfolio construction and costs.

The fourth pattern is the most diagnostic. If the IC is healthy but wealth is declining, the model is not broken — the portfolio construction is. Common causes are transaction-cost drag on a high-turnover strategy, sector concentration that happened to hurt during the period, or a small-cap tilt that paid a liquidity premium back. None of these problems are fixed by retraining the alpha model — they are fixed by changing how you turn predictions into trades.

Caption. Cumulative IC and the strategy equity curve are mechanically coupled: during the shaded window the rolling IC drops near zero, cumulative IC stops climbing, and wealth plateaus in lock-step — a flat cumulative-IC stretch is always followed by a flat wealth stretch within a month or two.

Plain-English roadmap for the next cell: simulate 132 months of IC with a forced “decay window” of bad months around months 80–100; convert the IC series into a stylised long-short return series (return ≈ scaled IC + noise); then plot IC on the left axis (in blue) and the wealth curve on the right axis (in red). The visual cross-check is the whole point.
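A minimal sketch of that cell, assuming NumPy/pandas/Matplotlib; the conversion of IC into a monthly long-short return (\(0.4 \times \mathrm{IC}\) plus noise) is a stylised assumption:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
T = 132
mu = np.full(T, 0.05); mu[80:100] = -0.04            # forced decay window
ic = pd.Series(rng.normal(mu, 0.07))
ret = 0.4 * ic + rng.normal(0, 0.01, T)              # stylised long-short return
wealth = (1 + ret).cumprod()

fig, ax1 = plt.subplots(figsize=(10, 4))
ax1.bar(ic.index, ic, color=np.where(ic >= 0, "tab:blue", "tab:red"), alpha=0.5)
ax1.plot(ic.rolling(12).mean().shift(1), color="tab:blue", lw=2)
ax1.set_ylabel("monthly IC", color="tab:blue"); ax1.set_xlabel("month")
ax2 = ax1.twinx()
ax2.plot(wealth, color="tab:red", lw=2)
ax2.set_ylabel("cumulative wealth", color="tab:red")
plt.tight_layout(); plt.show()
```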

What we just got. A twin-axis chart: IC bars and IC trend on the left (blue), cumulative wealth on the right (red). The eye compares trend in IC to trend in wealth in a single glance.

Interpretation. The IC plunges in the window months 80–100 and the wealth curve stops climbing during the same period. The visual cross-check is unmistakable: red bars cluster, blue line dips below zero, red wealth curve plateaus. Either of the two series alone might be dismissed as random; together they confirm a real decay episode. After month 100, both recover.

The discipline this chart enforces

You are not allowed to evaluate the wealth curve in isolation. Every quant who has lived through a quant-quake has the same story: the wealth curve looked fine for months while the IC was bleeding out, and by the time the wealth turned, the damage was unrecoverable. The twin-axis chart forces you to look at the predictive and the realised sides together.

Model Decay Detection

Where you’ll see this. In real work, this is the part of monitoring that produces actions — emails, Slack alerts, pages to the on-call quant. The rest of the dashboard is informational; the decay rules are operational.

Intuition — three rules of thumb for spotting decay

A model has decayed when its IC has fallen meaningfully below the IC it had during training and early deployment — and the fall is not just a one-month fluke. Three handy rules of thumb you can apply by eye, before any code runs:

  1. One bad month is noise. Three consecutive negative months is suspicious. Six straight months below \(0.02\) is evidence.
  2. Magnitude matters, not just sign. An IC of \(+0.005\) is mathematically positive but economically zero (it would imply a Sharpe well below any reasonable hurdle). Treat anything below \(\approx 0.02\) as “essentially zero”.
  3. Compare slopes, not just levels. The slope of cumulative IC over the most recent 12 months versus the prior 24 months is the cleanest decay statistic. A 50% drop in slope is a strong signal.

What “decay” means precisely

A model has decayed when its forward-looking IC has fallen meaningfully below the IC that was achievable in training and early deployment, and the change is not a transient. Three observations sharpen this definition.

Transient vs persistent. A single month of bad IC is not decay; it is sampling noise. Three consecutive months of negative IC are suspicious. Six consecutive months of IC \(< 0.02\) — even if not strictly negative — are evidence of substantive deterioration. A useful trigger is three or more consecutive months of negative rolling IC combined with at least six months since the last retrain. The first condition catches the decay; the second guards against retraining-induced false alarms.

Magnitude vs sign. “Negative IC” makes for a vivid headline, but the more economically relevant breakpoint is often “IC near zero.” A model with IC \(= 0.005\) produces a portfolio with Sharpe \(\approx 0.005 \times \sqrt{700} \approx 0.13\) — well below any reasonable hurdle. Anything below an empirical floor of about \(0.02\) for monthly equity ICs is “essentially zero” in real work, even if the formal sign is positive.

Single-period vs cumulative. The cumulative IC slope over the past 12 or 24 months is the cleanest decay statistic. Compare it to the slope over the previous 24 months. A drop of more than 50% in slope is a strong decay signal.

Detection rules

A small set of rules covers most setups in real work. They are conservative by design — false positives waste a retrain cycle, but a false negative leaves a broken model running.

  • R1. IC collapse. Trigger: monthly \(\mathrm{IC} < -0.03\). Severity: CRITICAL.
  • R2. IC decay. Trigger: rolling 6-month \(\mathrm{IC} < 0.02\). Severity: WARN.
  • R3. Negative streak. Trigger: 3 consecutive months of negative monthly IC. Severity: WARN.
  • R4. Cumulative slope drop. Trigger: trailing 12-month cumulative-IC slope below 50% of the prior 24-month slope. Severity: WARN.
  • R5. Hit rate collapse. Trigger: trailing 12-month hit rate below 40%. Severity: WARN.

The conventional response to a WARN is to investigate but not to act blindly; the conventional response to a CRITICAL is to halt new position openings and convene a model-review meeting. Both should be logged with the date, the rule that fired, and the model state at that moment.

A worked decay detector

Plain-English roadmap for the next cell: build a fake IC series with three regimes (healthy / broken / recovered), walk through it month by month, and at each month check whether any of the rules above are tripped. Print a summary table of every alert and the first eight in detail.
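A sketch of that detector, with thresholds taken from the rules list above; the regime means, volatilities, and seed are assumptions:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(11)
ic = np.concatenate([
    rng.normal(0.05, 0.07, 48),     # healthy
    rng.normal(-0.02, 0.08, 24),    # broken (months 48-71)
    rng.normal(0.04, 0.07, 48),     # recovered
])

alerts = []
for t in range(12, len(ic)):                          # walk forward month by month
    if ic[t] < -0.03:
        alerts.append((t, "R1_collapse", "CRITICAL"))
    if ic[t - 5 : t + 1].mean() < 0.02:
        alerts.append((t, "R2_decay", "WARN"))
    if (ic[t - 2 : t + 1] < 0).all():
        alerts.append((t, "R3_streak", "WARN"))
    if t >= 35:
        recent = ic[t - 11 : t + 1].sum() / 12        # trailing 12-mo cum-IC slope
        prior = ic[t - 35 : t - 11].sum() / 24        # prior 24-mo slope
        if prior > 0 and recent < 0.5 * prior:
            alerts.append((t, "R4_slope", "WARN"))
    if (ic[t - 11 : t + 1] > 0).mean() < 0.40:
        alerts.append((t, "R5_hitrate", "WARN"))

print(Counter(rule for _, rule, _ in alerts))         # alerts per rule
for alert in alerts[:8]:                              # first eight in detail
    print(alert)
```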

What we just got. A counter of how many alerts each rule fired, plus the first eight alerts in chronological order — the kind of log entry a monitoring system writes when it detects trouble.

Interpretation. Alerts cluster densely in the decay window (months 48–72), exactly where the underlying IC turned. The rules ramp in a sensible order: R3_streak (three consecutive negative months) tends to fire first because it has the shortest memory; R2_decay and R5_hitrate follow as the longer rolling windows fill with bad data; R1_collapse only fires on the truly disastrous individual months. After the model recovers, the alerts subside — but with a lag of several months, because the rolling windows still contain bad data from the broken regime.

False positive vs false negative

Set the trigger thresholds too tight and you will burn compute and operator attention on retrains that achieve nothing. Set them too loose and a real decay episode runs for an extra quarter before anyone notices. The practical answer is asymmetric: prefer a few extra retrains over a missed decay. A retrain is cheap; six months of capital allocated to a broken model is not.

Retraining and Wealth

Where you’ll see this. Once you decide a model is decaying, what do you actually do? Real funds pick one of three policies and stick with it. Knowing the menu is most of the battle.

The three retraining policies

What happens after the model decays? Three policies cover the practical space.

No-retrain. Fit the model once on a long initial window — say, the first eight years of available data — and run it forever. This is the cleanest scientific protocol: results are reproducible, look-ahead is impossible, and the equity curve directly measures whether the original-training-set claim held up. It is also the policy with the most decay risk, because it has no mechanism for adapting to regime change.

Scheduled retrain. Retrain on a fixed calendar — annually is the standard, sometimes quarterly for fast-moving signals. The training set typically expands (use all data available up to the retrain date) or rolls (use the most recent \(W\) months). Scheduled retraining buys regime adaptation in exchange for a small computational cost and a slightly more complex deployment story.

Hybrid: scheduled plus IC-triggered. Retrain on the calendar and trigger an extra retrain whenever a decay rule fires (e.g., rolling 6-month IC turns negative). The hybrid policy is the best of both: routine maintenance keeps the model current; emergency retrains catch genuine regime shifts when they happen. The cost is a higher operational burden — someone has to monitor the rules — and a real risk that an emergency retrain after a single rough quarter actually worsens the model by re-fitting to noise.

What each policy looks like on the equity curve

The dominant pattern is intuitive. The no-retrain policy delivers a clean, monotone wealth curve in good regimes and a flat-to-declining one through structural breaks. The scheduled policy has the same general shape but with small inflection points at each retrain date — the new model sometimes has a slightly different signature than the old. The hybrid policy looks like the scheduled one most of the time but with extra inflection points around regime events; if the trigger fired correctly, the curve resumes climbing; if it fired falsely, the curve experiences a brief pause as the new model finds its footing.

Plain-English roadmap for the next cell: we simulate three wealth curves on the same random shocks but apply different retraining policies. The “regime change” happens at month 60 — that is where the policies diverge.
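A stylised sketch of that simulation. Everything here is an assumption chosen to mimic the narrative below: a retrain restores the model's IC to \(0.04\), the scheduled retrain lands at month 72 (the next January), and the hybrid trigger fires at month 66:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(5)
T = 132
shocks = rng.normal(0, 0.02, T)                  # shared monthly noise
ic = np.full(T, 0.05); ic[60:] = 0.01            # regime change at month 60

def wealth(retrain_months):
    q = ic.copy()
    for m in retrain_months:                     # retrain restores most of the edge
        q[m:] = np.maximum(q[m:], 0.04)
    ret = 0.4 * q + shocks                       # stylised IC-to-return mapping
    return np.cumprod(1 + ret)

for label, w in {"no retrain": wealth([]),
                 "scheduled (annual)": wealth([72]),
                 "hybrid (IC-triggered)": wealth([66])}.items():
    lr = np.diff(np.log(w))
    plt.plot(w, label=f"{label} (Sharpe {lr.mean() / lr.std() * np.sqrt(12):.2f})")
plt.axvline(60, color="k", ls=":", label="regime change")
plt.axvline(66, color="r", ls=":", label="emergency retrain")
plt.xlabel("month"); plt.ylabel("wealth"); plt.legend(); plt.tight_layout(); plt.show()
```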

What we just got. Three wealth curves on one chart, labelled with their Sharpe ratios; a black dotted line marks the regime change, a red dotted line marks the hybrid emergency retrain. The visual question is: how do the policies look after the regime change?

Interpretation. In a stationary regime all three policies produce essentially identical wealth curves. They only diverge after the simulated regime change at month 60. The no-retrain curve flattens; the scheduled curve recovers partially at the next January retrain; the hybrid curve fires an emergency retrain at month 66 and recovers slightly earlier. In real backtests the gaps between policies are usually \(0.10\)–\(0.30\) Sharpe points — meaningful but not enormous. The bigger benefit of disciplined retraining is risk: drawdowns under hybrid retraining are typically 30–40% smaller than under no-retrain because the strategy is not allowed to drift unchecked through a long decay episode.

Pick a policy, then run it

The choice between scheduled and hybrid is not a knob to optimise on the backtest. It is an operational choice. If you have an analyst whose job is to watch the IC dashboard every day, the hybrid policy puts that person to work. If the model is fully autonomous, the scheduled policy is simpler and harder to misuse. Whichever you pick, commit to the rule and run it consistently — switching policies mid-stream because a recent month went badly is the fastest route to over-fitting your operational decisions to noise.

P&L Attribution

Where you’ll see this. IC tells you whether the model still has predictive content. P&L attribution tells you whether the portfolio you actually traded delivered what the model promised. In real work, IC dashboards live with the modelling team and P&L attribution lives with the portfolio team — and the two teams compare notes every Monday morning.

Predicted vs realised monthly P&L

A complementary diagnostic operates at the portfolio level, not the cross-section. For each month, compute two numbers:

  • Predicted portfolio P&L: \(\hat\Pi_t = \sum_i w_{i,t} \hat R_{i,t+1}\) — sum of (weight on each stock) \(\times\) (predicted return on each stock). This is what the model promised.
  • Realised portfolio P&L: \(\Pi_t = \sum_i w_{i,t} R_{i,t+1}\) — same sum, but with the actual returns instead of the predicted ones. This is what actually arrived.

If the model’s predictions are calibrated (the right size, not just the right ordering) and the portfolio construction is faithful to the predictions, these two series should move together — month by month, the realised P&L should be a noisy version of the predicted P&L. The most useful summary is their rolling 12-month correlation — the Pearson correlation between \(\hat\Pi\) and \(\Pi\) computed over a moving 12-month window:

\[ \rho_t^{(12)} \;=\; \mathrm{Corr}\!\Bigl(\{\hat\Pi_s\}_{s=t-11}^{t},\; \{\Pi_s\}_{s=t-11}^{t}\Bigr). \]

A healthy attribution profile has \(\rho_t^{(12)} > 0.30\) most of the time. A drift of this correlation toward zero — or worse, negative — is one of the clearest early warning signs that the model is no longer useful for portfolio sizing, even if its rank-IC is still positive.

The distinction between rank-IC and prediction-P&L correlation deserves a moment. Rank-IC says: “the model ranks the cross-section correctly.” Prediction-P&L correlation says: “the model’s magnitudes are right enough that, when fed into the portfolio construction, the predicted P&L looks like the realised P&L.” A model can pass the first test and fail the second: for example, a model that over-predicts every return by a factor of four still ranks stocks correctly, yet reports predicted P&Ls four times the realised P&L. For top-K equal-weight constructions this is fine (only ranks matter). For score-weighted constructions or anything fed into a mean-variance optimiser, miscalibrated magnitudes can wreck the portfolio.

Decomposition: signal, sizing, residual

A more granular attribution splits monthly realised P&L into two pieces — the part the model predicted and everything else:

\[ \Pi_t \;=\; \underbrace{\hat\Pi_t}_{\text{predicted}} \;+\; \underbrace{\bigl(\Pi_t - \hat\Pi_t\bigr)}_{\text{residual}}. \]

The residual is the part of P&L the model did not predict. It is the sum of (a) cross-sectional sampling noise — month-to-month variation in idiosyncratic returns that no model could have foreseen — and (b) miscalibration — systematic over- or under-prediction by the model.

Over a long sample, the residual should average to zero if the model is well-calibrated; a persistently positive mean residual means the model under-predicts (the strategy makes more than the model says it should), a persistently negative mean residual means the model over-predicts. The ratio \(\sigma(\text{residual}) / \sigma(\hat\Pi)\) is a quick measure of how much of the P&L variation the model captures (smaller is better).

A worked attribution

Plain-English roadmap for the next cell: simulate 132 months of predicted and realised P&L where the two are correlated in normal times but decoupled in a window of bad months (70–95); then plot a scatter of pred vs real (left) and a rolling 12-month correlation line (right).
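A sketch of that cell (NumPy/pandas/Matplotlib assumed). The correlation levels, \(0.4\) in normal months and \(0\) in the bad window, are assumptions; the printed numbers will differ from the narrative run to run:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(9)
T = 132
rho = np.full(T, 0.4); rho[70:95] = 0.0              # decoupled window
pred = pd.Series(rng.standard_normal(T))
real = pd.Series(rho * pred + np.sqrt(1 - rho**2) * rng.standard_normal(T))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.scatter(pred, real, s=12)
lim = [pred.min(), pred.max()]
ax1.plot(lim, lim, "k--", lw=1)                      # y = x reference line
ax1.set_xlabel("predicted P&L"); ax1.set_ylabel("realised P&L")
ax2.plot(pred.rolling(12).corr(real))
ax2.axhline(0.30, color="green", ls="--", label="healthy threshold")
ax2.axhline(0.0, color="red", ls=":", label="anti-predictive zone")
ax2.set_xlabel("month"); ax2.set_ylabel("rolling 12-mo corr"); ax2.legend()
plt.tight_layout(); plt.show()
```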

What we just got. Two diagnostic panels — a scatter of predicted vs realised P&L with the \(y=x\) reference line, plus a rolling 12-month correlation of pred vs real on the right. The rolling line is the early-warning signal.

Interpretation. The scatter on the left shows the typical signal-in-noise picture of a real alpha model: a cloud of points loosely organised around the \(y=x\) line, full-sample correlation of about \(0.20\). The rolling correlation on the right is the diagnostic-quality plot: most of the time it sits comfortably above the \(0.30\) threshold, but in the window months 70–95 it dives into the “anti-predictive zone.” Without watching the rolling correlation you would not see this episode at all — the scatter compresses it into a few diffuse points. The lesson, again, is to monitor over time, not over the whole sample.

The point of P&L attribution

The IC tells you whether the model is working. P&L attribution tells you whether the portfolio is working — whether the model’s predictions are translating, month by month, into the realised P&L it promised. The two are correlated but not identical. If you do this for a job, you will monitor both, with the IC dashboard owned by the modelling team and the P&L attribution owned by the portfolio team.

A Worked Diagnostic Dashboard

Where you’ll see this. This is the single-screen status report you would put up on the wall at a quant fund. It is the deliverable a junior quant produces in week one — and it is the deliverable you should be able to produce on your own by the end of this chapter.

This section assembles all of the diagnostics into a single running dashboard. It simulates a model that is healthy for the first half of the sample, decays in the middle, and partially recovers at the end — the typical life cycle of a real alpha model. Every plot you have seen so far appears on the dashboard, in the order you would scan them: IC bar chart, rolling IC, cumulative IC, IC-vs-wealth twin axis, rolling P&L correlation. The whole dashboard runs in your browser using only synthetic data generated inside the cell.

Plain-English roadmap of the long cell below — seven steps:

  1. Simulate a panel of 144 months \(\times\) 250 stocks with a population IC that is healthy → decays → recovers.
  2. Compute monthly IC, its \(t\)-stat, ICIR, hit rate.
  3. Build a top-25 long-short portfolio and compute its wealth curve and Sharpe.
  4. Compute predicted vs realised portfolio P&L plus their rolling 12-month correlation.
  5. Run the decay rules over the IC series and log every alert.
  6. Render a four-panel dashboard.
  7. Print a one-paragraph summary report.
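The full cell is not reproduced here; the condensed sketch below covers the same seven steps (with only three of the five decay rules, for brevity). All population parameters, scales, and thresholds are assumptions:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import spearmanr

rng = np.random.default_rng(1)
T, N, K = 144, 250, 25
pop_ic = np.full(T, 0.06); pop_ic[60:96] = 0.005; pop_ic[96:] = 0.04   # 1. regimes

ics, pred_pnl, real_pnl = np.empty(T), np.empty(T), np.empty(T)
for t in range(T):
    sig = rng.standard_normal(N)
    real = pop_ic[t] * sig + np.sqrt(1 - pop_ic[t] ** 2) * rng.standard_normal(N)
    pred = sig + rng.standard_normal(N)
    ics[t] = spearmanr(pred, real)[0]                         # 2. monthly IC
    w = np.zeros(N)                                           # 3. top-K long-short
    w[np.argsort(pred)[-K:]], w[np.argsort(pred)[:K]] = 1 / K, -1 / K
    pred_pnl[t], real_pnl[t] = 0.01 * (w @ pred), 0.01 * (w @ real)

ic = pd.Series(ics)
mean_ic, sd_ic = ic.mean(), ic.std()
t_ic, icir, hit = mean_ic * np.sqrt(T) / sd_ic, mean_ic / sd_ic, (ic > 0).mean()
ret = pd.Series(real_pnl)
wealth, sharpe = (1 + ret).cumprod(), ret.mean() / ret.std() * np.sqrt(12)
roll_corr = pd.Series(pred_pnl).rolling(12).corr(pd.Series(real_pnl))   # 4.

alerts = [(t, "R1") for t in range(T) if ics[t] < -0.03]                # 5. rules
alerts += [(t, "R2") for t in range(5, T) if ics[t - 5 : t + 1].mean() < 0.02]
alerts += [(t, "R3") for t in range(2, T) if (ics[t - 2 : t + 1] < 0).all()]

fig, ax = plt.subplots(2, 2, figsize=(12, 7))                           # 6. panels
ax[0, 0].bar(ic.index, ic, color=np.where(ic >= 0, "tab:blue", "tab:red"))
ax[0, 0].set_title("monthly IC")
ax[0, 1].plot(ic.rolling(12).mean().shift(1)); ax[0, 1].axhline(0.02, ls="--", c="r")
ax[0, 1].set_title("rolling 12-mo IC")
ax[1, 0].plot(ic.cumsum(), c="tab:blue"); ax[1, 0].twinx().plot(wealth, c="tab:red")
ax[1, 0].set_title("cumulative IC (blue) vs wealth (red)")
ax[1, 1].plot(roll_corr); ax[1, 1].axhline(0.30, ls="--", c="g")
ax[1, 1].set_title("rolling pred-vs-real P&L correlation")
plt.tight_layout(); plt.show()

print(f"mean IC {mean_ic:.3f} | t {t_ic:.1f} | ICIR {icir:.2f} | "     # 7. report
      f"hit {hit:.0%} | Sharpe {sharpe:.2f} | alerts fired: {len(alerts)}")
```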

What we just got. A four-panel monitoring dashboard plus a printed report — the kind of one-screen summary a junior quant pastes into a daily status channel.

Interpretation. The dashboard reproduces the standard four-panel layout used by real monitoring systems: monthly IC, rolling IC, IC-vs-wealth, P&L-correlation. The simulated decay between months 60 and 96 is visible in all four panels and would have been flagged by the alert rules. Behind the simple presentation lies the full apparatus we developed in this chapter — Spearman rank correlation, the Fundamental Law, monitoring rules, P&L attribution — assembled into a single browser-runnable program.

What to add for a real production dashboard

The simulation above keeps the code under 200 lines. A real production dashboard layers on three more things: (1) drift monitoring (a Kolmogorov–Smirnov test on the distribution of each input feature, to catch the case where the inputs themselves are changing; covered in Book 1 Chapter 5); (2) turnover and transaction-cost tracking (a separate panel of monthly portfolio turnover and the cost drag it produces); (3) feature-level IC (the IC of each individual input feature against returns, plotted alongside the overall model-level IC, to pinpoint which feature has decayed). Each of these is a small addition to step 2 of the code above.

Summary

This chapter is about looking after a model after it has gone to work. The single number that does most of the looking-after is the Information Coefficient — the Spearman rank correlation between the model’s predictions for next month and the realised returns. From that one number, four standard plots, the Fundamental Law, and a handful of monitoring rules, you have everything you need to run a credible alpha-model diagnostic operation.

The key results to keep handy:

  • Definition: \(\mathrm{IC}_t = \mathrm{Spearman}(\hat R_{i,t+1}, R_{i,t+1})\), computed cross-sectionally each month.
  • Use Spearman, not Pearson — robust to outliers, monotone-invariant, aligned with rank-based portfolio constructions.
  • Standard reporting tuple: \(\overline{\mathrm{IC}}\), \(\sigma(\mathrm{IC}_t)\), \(t_{\mathrm{IC}}\), ICIR, hit rate.
  • Fundamental Law: \(\mathrm{IR} \approx \mathrm{IC} \sqrt{\mathrm{BR}}\). Implied breadth \(\mathrm{BR}_{\text{implied}} = (\mathrm{SR}/\mathrm{IC})^2\) is the single cleanest reality check on a backtest.
  • Five canonical plots: monthly IC bar chart, rolling 12-month IC, cumulative IC, IC-versus-wealth twin axis, rolling correlation of predicted vs realized P&L.
  • Detection rules: collapse (\(\mathrm{IC} < -0.03\)), decay (rolling-6 \(< 0.02\)), streak (3 negative months), slope drop, hit-rate collapse.
  • Retraining: pick a policy (no-retrain, scheduled, hybrid), commit to it, and let the equity curve be evaluated against the policy you chose — not against the policy you wish you had chosen ex post.

The discipline this chapter enforces

A model is not a fixed object. It is a process that interacts with a non-stationary world, and its predictive content can erode silently for months while its wealth curve still looks comfortably upward-sloping. The Information Coefficient is the only metric that catches the erosion early enough to do something about it. Build the dashboard. Look at it every month. When the rules fire, take them seriously.

Exercises

Exercise 1.1 — Spearman versus Pearson on a single outlier

Generate a cross-section of \(N = 200\) stocks with predicted and realized returns each drawn from \(\mathcal N(0, 1)\) and correlated at \(\rho = 0.05\). Compute the Spearman and Pearson ICs. Now replace one realized return with the value \(+8.0\) (an extreme outlier) and recompute both. Repeat the experiment 1,000 times with different random seeds. Report the standard deviation of the Spearman IC and the standard deviation of the Pearson IC across the 1,000 trials. Which is more stable? By how much? Explain the gap in terms of the rank transform.

Exercise 1.2 — Fundamental Law and implied breadth

Take the worked diagnostic dashboard in Section 1.10 and treat its mean IC and realized Sharpe as fixed. Compute the implied breadth and compare it to the nominal breadth \(K \cdot 12\). Now modify the simulation to make the predictions correlated across stocks within a month (add a common factor that affects every stock’s prediction). Rerun. How does the implied breadth change? At what level of within-month prediction correlation does the implied breadth fall to \(20\%\) of nominal? Discuss what this tells you about the empirical limits on monthly alpha-model breadth.

Exercise 1.3 — Detection rules on a real-looking IC time series

Construct an IC time series of 96 months in three regimes: 36 months of healthy mean \(0.05\) and std \(0.07\); 24 months of broken mean \(-0.01\) and std \(0.08\); 36 months of recovered mean \(0.03\) and std \(0.07\). Implement the five detection rules listed in Section 1.7. Report the first month each rule fires within the broken regime, and the last month each rule fires after the regime recovers. Rank the rules by responsiveness (first to fire = most responsive) and by hysteresis (last to clear = slowest to recover). Which rule would you assign to a CRITICAL severity and which to WARN, and why?

Exercise 1.4 — Predicted-versus-realized P&L attribution

Using the worked dashboard’s panel of stocks and predictions, build the top-K long-short portfolio for \(K \in \{10, 25, 50, 100\}\). For each \(K\), compute the full-period correlation between predicted P&L and realized P&L. Plot the correlation versus \(K\). Is the relationship monotone? At what \(K\) does the correlation stop improving? Use this to argue for a practical \(K\) choice and explain how the answer would differ if you cared about portfolio Sharpe rather than P&L correlation.

Exercise 1.5 — Compare retraining policies on a broken regime

Re-run the worked dashboard simulation with a step change in the population IC: drop it from \(0.06\) to \(0.005\) exactly at month 60 and keep it there. Implement three retraining policies — no retrain, annual retrain on an expanding window, and a hybrid that triggers an extra retrain when rolling-6-month IC falls below \(0.005\). Report Sharpe, maximum drawdown, and final wealth for each policy. Which policy delivers the best risk-adjusted return? How sensitive is the answer to the timing of the regime change?

Exercise 1.6 — Author your own dashboard

Pick a public dataset of monthly equity returns (the Ken French data library is a good starting point: 12 industry portfolios since 1926). Take industry mean return over the past 3 months as the single feature; predict next month’s return cross-sectionally; run the full diagnostic dashboard developed in Section 1.10 across the entire sample. Report mean IC, ICIR, Sharpe of a top-3-vs-bottom-3 long-short, and the implied breadth. Then write a one-paragraph memo to a hypothetical Portfolio Committee saying whether you would deploy the strategy and why. This is the deliverable a junior quant produces in week one.

 

Prof. Xuhu Wan · HKUST ISOM · Model Risk in Quantitative Finance