The Stink: Methods and Data Sources

Dataset

4,067 finished Premier League fixtures across 11 seasons:

Season    Matches   WhoScored   Mclachbot   Notes
2015/16   380       Yes         -
2016/17   380       Yes         -
2017/18   380       Yes         Yes         Mclachbot coverage begins
2018/19   380       Yes         Yes
2019/20   379       Yes         Yes         1 unscraped match
2020/21   380       Yes         Yes
2021/22   377       Yes         Yes         3 unscraped matches
2022/23   380       Yes         Yes
2023/24   380       Yes         Yes
2024/25   380       Yes         Yes
2025/26   271       Yes         Yes         Through late Feb (GW27), when the feed ended

Total: 4,067 matches with WhoScored event-level data. Opta aggregated shot patterns (mclachbot) cover 9 seasons from 2017/18 onwards (180 team-season records).

Data sources

WhoScored / Opta event data

  • Match-level event streams scraped from WhoScored and stored in Postgres (whoscored_match_data)
  • Exported to src/data/pl-meta/whoscored-events.json via scripts/export-whoscored-pl-meta.ts
  • Shot context determined by Opta qualifiers: 22 (RegularPlay), 23 (FastBreak), 24 (SetPiece), 25 (FromCorner), 26 (DirectFreekick)
  • Throw-in set pieces are tagged via qualifier 160 (ThrowinSetPiece). These are shots from throw-in sequences that do not carry the standard shot-context qualifiers. The export counts them under set pieces
  • Penalties identified by qualifier 9; own goals excluded (qualifier 28)
  • Body part: 15 (Head), 20 (RightFoot), 72 (LeftFoot)
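The qualifier rules above can be sketched as a small classifier. This is an illustration only, not the logic of scripts/export-whoscored-pl-meta.ts: the ShotEvent shape, the classifyShot name, the context labels, and the precedence order when a shot carries several qualifiers are all assumptions.

```typescript
// Opta qualifier IDs as listed above.
const QUALIFIER = {
  Penalty: 9,
  RegularPlay: 22,
  FastBreak: 23,
  SetPiece: 24,
  FromCorner: 25,
  DirectFreekick: 26,
  OwnGoal: 28,
  ThrowinSetPiece: 160,
} as const;

type ShotContext =
  | "penalty"
  | "corner"
  | "direct_free_kick"
  | "set_piece"
  | "counter_attack"
  | "open_play"
  | "unclassified";

// Hypothetical input shape: the qualifier IDs attached to one shot event.
interface ShotEvent {
  qualifierIds: number[];
}

// Returns null for own goals, which are excluded.
// The precedence order below is an assumption, not the export script's logic.
function classifyShot(shot: ShotEvent): ShotContext | null {
  const has = (id: number): boolean => shot.qualifierIds.indexOf(id) !== -1;
  if (has(QUALIFIER.OwnGoal)) return null;
  if (has(QUALIFIER.Penalty)) return "penalty";
  if (has(QUALIFIER.FromCorner)) return "corner";
  if (has(QUALIFIER.DirectFreekick)) return "direct_free_kick";
  // Throw-in set pieces (160) are counted under set pieces.
  if (has(QUALIFIER.SetPiece) || has(QUALIFIER.ThrowinSetPiece)) return "set_piece";
  if (has(QUALIFIER.FastBreak)) return "counter_attack";
  if (has(QUALIFIER.RegularPlay)) return "open_play";
  return "unclassified";
}
```

A shot carrying only qualifier 160 comes back as "set_piece", which mirrors how the export treats throw-in sequences.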

Opta aggregated shot patterns (mclachbot)

  • Source: data/mclachbot-shots/aggregated-by-team-season.json — 9 seasons (2017/18 to 2025/26), 180 team-season records, 32 unique teams
  • Per-team-season breakdown: total shots/goals/xG, dead ball shots/goals/xG, pattern-level splits (corner, set_piece, free_kick, throw_in), including header sub-counts per pattern
  • Includes against-side totals (shots/goals/xG conceded) for each pattern
  • Current-season granular data in src/data/pl-meta/mclachbot-aggregated/: one file per restart type (corner, set_piece, free_kick, throw_in), 2025/26 only, 20 teams. Includes attack and defence sides per team

SportMonks Football API

  • src/data/pl-meta/fixture-stats.json: fixture-level statistics for 5 seasons (2021/22 to 2025/26), including corner counts
  • src/data/pl-meta/squad-physicals.json: starts-weighted average height per team-season, 6 seasons (2020/21 to 2025/26). Only 2025/26 used in the article (height vs header conversion scatter)
  • src/data/pl-meta/header-shots-by-club.json: per-club header shots, goals, conversion rate, and xG for 2025/26
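A starts-weighted average height can be sketched as below. The PlayerRecord fields, the centimetre unit, and the function name are hypothetical and do not reflect the actual schema of squad-physicals.json.

```typescript
// Hypothetical per-player record for one team-season.
interface PlayerRecord {
  heightCm: number;
  starts: number;
}

// Weights each player's height by the number of matches started,
// so regular starters count for more than fringe players.
function startsWeightedHeight(players: PlayerRecord[]): number | null {
  let weightedSum = 0;
  let totalStarts = 0;
  for (const p of players) {
    weightedSum += p.heightCm * p.starts;
    totalStarts += p.starts;
  }
  // No starts recorded means the average is undefined.
  return totalStarts > 0 ? weightedSum / totalStarts : null;
}
```

For example, a 190 cm player with 30 starts and a 180 cm player with 10 starts give (190 × 30 + 180 × 10) / 40 = 187.5 cm.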

Volley chart data (Sankey + suppression)

  • src/data/pl-meta/volley-chart-data.json: 2025/26 only, 20 teams
  • Sankey: set-piece shots conceded, structured as flows from source → phase → body part → outcome. Built from WhoScored event data with custom sequence analysis
  • "Second phase" = any follow-up shot within 20 seconds of the initial restart event
  • Suppression: dead-ball shots faced, goals conceded, and xG conceded per team, broken down by restart type (corner, set piece, free kick, throw-in)
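The 20-second second-phase rule can be sketched as follows. This is one reading of the definition, not the article's sequence analysis: it assumes the first shot inside the window is the "first ball", that the 20-second bound is inclusive, and that times are seconds on a shared match clock. All names are hypothetical.

```typescript
const SECOND_PHASE_WINDOW_SECONDS = 20;

type SequenceTag = "first_ball" | "second_phase" | "outside_window";

// Tags each shot relative to one restart event, in chronological order.
function tagSequence(restartSeconds: number, shotSeconds: number[]): SequenceTag[] {
  const tags: SequenceTag[] = [];
  let seenFirstBall = false;
  const ordered = shotSeconds.slice().sort((a, b) => a - b);
  for (const t of ordered) {
    const elapsed = t - restartSeconds;
    if (elapsed < 0 || elapsed > SECOND_PHASE_WINDOW_SECONDS) {
      tags.push("outside_window");
      continue;
    }
    // The first in-window shot is the first ball; any later one is second phase.
    tags.push(seenFirstBall ? "second_phase" : "first_ball");
    seenFirstBall = true;
  }
  return tags;
}
```

With a restart at 600 seconds and shots at 603, 611 and 625 seconds, the third shot falls 25 seconds after the restart and so sits outside the window.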

Definitions

  • Dead ball: set piece + from corner + direct free kick. Excludes penalties. This is the article's primary unit of analysis
  • Corner conversion: goals scored from corner-sourced shots / total shots from corners (not corners taken)
  • Restart types: corners, indirect free kicks (set_piece in Opta), direct free kicks (free_kick), and throw-in set-piece sequences (throw_in)
  • Second phase: follow-up shots occurring within 20 seconds of the initial restart event. Used in the Sankey diagram
  • Big Six: Arsenal, Chelsea, Liverpool, Manchester City, Manchester United, Tottenham Hotspur
  • xG (expected goals): Opta-provided xG values. This article does not use a custom xG model
  • Rolling window: 20-match rolling average used for the construction timeline chart

Chart methodology

1. Where goals come from (stacked area, 11 seasons)

  • Source: whoscored-events.json (4,067 matches, 2015/16 to 2025/26)
  • Per-season aggregation of all shots and goals by context: open play, dead ball, counter attack, penalty
  • Toggles between % of goals and % of shots. "Focus" view isolates a single segment

2. The structural break (league-level trends, 9 seasons)

  • Source: aggregated-by-team-season.json (2017/18 to 2025/26)
  • Three league-wide metrics per season: dead-ball shot share (intent), dead-ball goal share (outcome), corner conversion (execution)
  • Also counts teams with 25%+ of goals from dead balls each season
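The three league-wide metrics and the 25% team count can be sketched as below. The TeamSeason field names are hypothetical, and the sketch assumes league rates are pooled (league totals divided by league totals) rather than averaged across teams, which the notes above do not specify.

```typescript
// Hypothetical flattening of one team-season record.
interface TeamSeason {
  totalShots: number;
  totalGoals: number;
  deadBallShots: number;
  deadBallGoals: number;
  cornerShots: number;
  cornerGoals: number;
}

interface LeagueMetrics {
  deadBallShotShare: number; // intent
  deadBallGoalShare: number; // outcome
  cornerConversion: number; // execution
  teamsAt25PctOrMore: number;
}

function leagueMetrics(rows: TeamSeason[]): LeagueMetrics {
  const ratio = (n: number, d: number): number => (d > 0 ? n / d : 0);
  const sum = (pick: (r: TeamSeason) => number): number =>
    rows.reduce((acc, r) => acc + pick(r), 0);
  return {
    deadBallShotShare: ratio(sum((r) => r.deadBallShots), sum((r) => r.totalShots)),
    deadBallGoalShare: ratio(sum((r) => r.deadBallGoals), sum((r) => r.totalGoals)),
    // Goals from corner-sourced shots over corner-sourced shots, not corners taken.
    cornerConversion: ratio(sum((r) => r.cornerGoals), sum((r) => r.cornerShots)),
    teamsAt25PctOrMore: rows.filter(
      (r) => ratio(r.deadBallGoals, r.totalGoals) >= 0.25
    ).length,
  };
}
```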

3. The miasma theory (height vs header conversion, 2025/26)

  • Height: starts-weighted average from squad-physicals.json (2025/26 only)
  • Header conversion: from header-shots-by-club.json (2025/26 only)
  • Rank comparison chart: teams ranked by height on the left axis and by header conversion on the right, with connecting lines
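The two rankings behind the connecting lines can be sketched as below. The ClubPoint shape and function names are hypothetical, and the handling of ties (input order is kept) is an assumption.

```typescript
// Hypothetical merged record: height from squad-physicals.json,
// conversion from header-shots-by-club.json.
interface ClubPoint {
  team: string;
  heightCm: number;
  headerConversion: number;
}

interface RankPair {
  team: string;
  heightRank: number;
  conversionRank: number;
}

// Rank 1 is the largest value. Ties keep input order (an assumption).
function rankDescending(values: number[]): number[] {
  const order = values
    .map((value, index) => ({ value, index }))
    .sort((a, b) => b.value - a.value);
  const ranks = new Array<number>(values.length);
  order.forEach((entry, position) => {
    ranks[entry.index] = position + 1;
  });
  return ranks;
}

// One row per club: its position on the left axis and on the right axis.
function rankComparison(clubs: ClubPoint[]): RankPair[] {
  const heightRanks = rankDescending(clubs.map((c) => c.heightCm));
  const conversionRanks = rankDescending(clubs.map((c) => c.headerConversion));
  return clubs.map((c, i) => ({
    team: c.team,
    heightRank: heightRanks[i],
    conversionRank: conversionRanks[i],
  }));
}
```

A club whose two ranks differ sharply produces a steep connecting line, which is what the chart is designed to expose.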

4. Engineered corners (corner xG per shot, 9 seasons)

  • Source: aggregated-by-team-season.json, corner pattern
  • xG per corner shot for Arsenal and Manchester City, 2017/18 to 2025/26, with coaching change markers (Jover to City 2019, Jover to Arsenal 2021)

5. Construction timeline (rolling 20-match, 11 seasons)

  • Source: whoscored-events.json, filtered to Arsenal and Man City
  • Rolling 20-match window: dead-ball conversion rate (goals/shots) and dead-ball shot share (shots/total shots)
  • Coaching events marked: Jover → City (July 2019), Jover → Arsenal (July 2021)
  • Toggle between "Jover focus" (zoomed to coaching-change window) and "Full history"
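The rolling window can be sketched as below. The MatchTotals shape and function name are hypothetical. The sketch pools shots and goals across the window before dividing; averaging per-match rates instead would give different values, and the notes above do not say which the chart uses.

```typescript
// Hypothetical per-match totals for one team, in chronological order.
interface MatchTotals {
  totalShots: number;
  deadBallShots: number;
  deadBallGoals: number;
}

interface RollingPoint {
  matchIndex: number; // index of the last match in the window
  conversion: number; // dead-ball goals / dead-ball shots
  shotShare: number; // dead-ball shots / total shots
}

function rollingDeadBall(matches: MatchTotals[], windowSize = 20): RollingPoint[] {
  const points: RollingPoint[] = [];
  // The first point appears once a full window of matches is available.
  for (let end = windowSize; end <= matches.length; end++) {
    let shots = 0;
    let deadBallShots = 0;
    let deadBallGoals = 0;
    for (const m of matches.slice(end - windowSize, end)) {
      shots += m.totalShots;
      deadBallShots += m.deadBallShots;
      deadBallGoals += m.deadBallGoals;
    }
    points.push({
      matchIndex: end - 1,
      conversion: deadBallShots > 0 ? deadBallGoals / deadBallShots : 0,
      shotShare: shots > 0 ? deadBallShots / shots : 0,
    });
  }
  return points;
}
```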

6. Different pipes (restart mix, 2025/26)

  • Source: mclachbot-aggregated/ current-season files
  • Stacked bar showing share of set-piece xG (or goals) by restart type: corner, indirect FK, throw-in, direct FK
  • Toggles: xG vs goals, attack ("for") vs defence ("against"). Five featured teams plus league average
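The restart-mix shares behind each stacked bar can be sketched as below. The keys follow the pattern names used in the mclachbot files; the type and function names are hypothetical, and the same function is assumed to apply to xG or goals, for or against.

```typescript
type RestartType = "corner" | "set_piece" | "free_kick" | "throw_in";

// One value per restart type: xG or goals, attack or defence side.
type RestartTotals = Record<RestartType, number>;

// Converts raw totals into shares that sum to 1 (or all zeros if there is no data).
function restartShares(totals: RestartTotals): RestartTotals {
  const sum =
    totals.corner + totals.set_piece + totals.free_kick + totals.throw_in;
  const share = (value: number): number => (sum > 0 ? value / sum : 0);
  return {
    corner: share(totals.corner),
    set_piece: share(totals.set_piece),
    free_kick: share(totals.free_kick),
    throw_in: share(totals.throw_in),
  };
}
```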

7. Who drags the average? (throw-in index, 2025/26)

  • Source: mclachbot-aggregated/ current-season files
  • Throw-in share of total set-piece xG (or goals), ranked across all 20 teams, with league average reference line
  • Shows attack and defence sides
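The ranking and its reference line can be sketched as below. Field and function names are hypothetical. The league average here is pooled across all teams, which is an assumption: a simple mean of the team shares would be an equally plausible reading.

```typescript
// Hypothetical per-team input: throw-in xG and total set-piece xG (one side).
interface TeamRestartXg {
  team: string;
  throwInXg: number;
  setPieceXg: number;
}

interface ThrowInIndex {
  rows: { team: string; throwInShare: number }[]; // highest share first
  leagueAverage: number;
}

function throwInIndex(teams: TeamRestartXg[]): ThrowInIndex {
  const rows = teams
    .map((t) => ({
      team: t.team,
      throwInShare: t.setPieceXg > 0 ? t.throwInXg / t.setPieceXg : 0,
    }))
    .sort((a, b) => b.throwInShare - a.throwInShare);
  const totalThrowIn = teams.reduce((acc, t) => acc + t.throwInXg, 0);
  const totalSetPiece = teams.reduce((acc, t) => acc + t.setPieceXg, 0);
  return {
    rows,
    leagueAverage: totalSetPiece > 0 ? totalThrowIn / totalSetPiece : 0,
  };
}
```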

8. Brentford's throw (throw-in volume over time)

  • Source: aggregated-by-team-season.json, Brentford only, throw_in pattern
  • Bars: throw-in shots per season. Line: xG per throw-in shot. Covers Brentford's Premier League seasons (2021/22 to 2025/26)

9. Set-piece shots against (Sankey, 2025/26)

  • Source: volley-chart-data.json, sankey section
  • Flow diagram: restart source → phase (first ball / second phase) → body part → outcome (goal / on target / off target / blocked)
  • Team selector compares any club against the league aggregate
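Turning per-shot paths into Sankey links can be sketched as below. This is a generic construction, not the article's sequence analysis: the ShotPath and SankeyLink shapes and the stage labels are hypothetical, and the actual structure of volley-chart-data.json may differ.

```typescript
// Hypothetical per-shot path through the four stages of the diagram.
interface ShotPath {
  source: string; // restart type
  phase: string; // first ball or second phase
  bodyPart: string;
  outcome: string;
}

interface SankeyLink {
  from: string;
  to: string;
  value: number; // number of shots following this step
}

function buildLinks(shots: ShotPath[]): SankeyLink[] {
  const byKey: Record<string, SankeyLink> = {};
  for (const s of shots) {
    const stages = [s.source, s.phase, s.bodyPart, s.outcome];
    for (let i = 0; i < stages.length - 1; i++) {
      // Prefix with the stage index so equal labels at different stages stay distinct.
      const key = i + ":" + stages[i] + ">" + stages[i + 1];
      if (!byKey[key]) {
        byKey[key] = { from: stages[i], to: stages[i + 1], value: 0 };
      }
      byKey[key].value += 1;
    }
  }
  return Object.keys(byKey).map((k) => byKey[k]);
}
```

Filtering the input shots by team before calling the function would give the per-club view, and passing all shots would give the league aggregate.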

10. Attack and defence (suppression dot plot, 2025/26)

  • Defence side: volley-chart-data.json suppression data (dead-ball goals conceded)
  • Attack side: mclachbot-aggregated/all-dead-ball.json (dead-ball goals scored)
  • Sorted by net dead-ball goals (scored minus conceded)
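Joining the two sources and sorting by net dead-ball goals can be sketched as below. The inputs are reduced to hypothetical team-to-goals maps, and descending order (best net first) is an assumption.

```typescript
interface NetRow {
  team: string;
  scored: number;
  conceded: number;
  net: number; // scored minus conceded
}

// scored: attack side; conceded: defence side. Both keyed by team name.
function netDeadBallGoals(
  scored: Record<string, number>,
  conceded: Record<string, number>
): NetRow[] {
  return Object.keys(scored)
    // Keep only teams present in both sources.
    .filter((team) => team in conceded)
    .map((team) => ({
      team,
      scored: scored[team],
      conceded: conceded[team],
      net: scored[team] - conceded[team],
    }))
    .sort((a, b) => b.net - a.net);
}
```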

Caveats

  1. 2025/26 is in progress: 271 of 380 matches in WhoScored data. All current-season charts and statistics will change as the season completes.
  2. WhoScored data has small gaps: 2019/20 is missing 1 match (379 of 380), 2021/22 is missing 3 matches (377 of 380). These are unscraped fixtures and should not materially affect aggregates.
  3. "Set piece" categorisation relies on Opta qualifier tagging. Qualifier 160 (ThrowinSetPiece) was only identified during analysis, so some shots from throw-in sequences were initially left unclassified. The export script now counts them under set pieces.
  4. The mclachbot aggregated dataset starts at 2017/18. Charts using this source cover 9 seasons, not 11. The first two seasons (2015/16 and 2016/17) appear only in the stacked area and construction timeline charts, which use WhoScored event data directly.
  5. Height data is starts-weighted squad average. It does not capture set-piece-specific personnel (e.g. a team may bring on a tall substitute for late corners). Single-season snapshot (2025/26).
  6. Corner conversion is volatile at team level with small samples. A single season of 20–40 corner shots per club is insufficient to distinguish skill from variance. League-wide rates (400+ corner shots) are more stable.
  7. The Sankey "second phase" is defined as follow-up shots within 20 seconds of the initial restart event. This window is arbitrary but consistent across all teams and matches.
  8. xG values are Opta-provided. This article does not use a custom expected goals model. Opta xG is a proprietary model and its methodology is not publicly documented in full.
  9. Single-season charts (scatter, restart mix, Sankey, suppression) reflect a snapshot of 2025/26 at the time of data export. They do not show trends and should not be extrapolated.
  10. Man City decline table shows selected seasons (2017/18, 2021/22, 2024/25, 2025/26) rather than all 9 available. This is an editorial choice to highlight the arc: pre-Jover, peak, and decline.
  11. Arsenal pre/post-Jover xG comparison uses 2017/18–2020/21 as "before" and 2021/22–2024/25 as "with Jover" (4 seasons each). 2025/26 is excluded from the "with" average because it is incomplete.
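Caveat 6 can be illustrated with the binomial standard error of a conversion rate. This is a rough sketch that treats shots as independent trials and assumes a 10% conversion rate purely for illustration; that rate is not a figure from the data, and the function name is hypothetical.

```typescript
// Standard error of a proportion: sqrt(p * (1 - p) / n).
function conversionStandardError(rate: number, shots: number): number {
  return Math.sqrt((rate * (1 - rate)) / shots);
}

// Hypothetical rate, chosen only to make the comparison concrete.
const assumedRate = 0.1;

// One club-season (30 corner shots) versus a league-wide sample (400).
const clubStandardError = conversionStandardError(assumedRate, 30);
const leagueStandardError = conversionStandardError(assumedRate, 400);
```

Under those assumptions the club-level standard error is roughly 5.5 percentage points against 1.5 for the league-wide sample, so a club's single-season rate can move several points on variance alone.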