The Rabbit Hole: Cracking the FPL Price Algorithm (Part 1 of 7)


Edit (15/2/26): People have already started digging into my data and offering help, and in the process I spotted an error in the fall model. The honest test F1 is 0.60, not 0.63; the previous number had test-set selection bias baked in.


Everyone who plays FPL has watched a player rise or fall in price and wondered why. I decided to actually find out, or at least try.

What started as a weekend curiosity turned into 4 seasons of data, 720,000 rows, machine learning models, and a server that never sleeps. This is the story of what I found.


The moment it started

I transferred out a player. He rose overnight. I lost 0.1m.

If you play FPL you know the feeling. You stare at the app and think: how does this actually work? The price went up, but why him? Why not the other player with more transfers? What are the rules?

The FPL website tells you almost nothing. There’s no documentation. No explanation. Just prices that change overnight while you sleep, decided by an algorithm that nobody outside of FPL towers has ever seen.

So I did what any reasonable person would do. I opened the API.

What everyone thinks they know

There’s a lot of conventional wisdom about FPL prices. Most of it is wrong.

“It’s based on net transfers.” Sort of, but not how you think.

“The most transferred-in player always rises.” Definitely not.

“fplstatistics will tell you who’s going to rise.” Sometimes. Their accuracy is… we’ll get to that.

Here’s what the FPL website actually tells you about price changes: nothing. But the API, the public endpoint that every FPL app and tool uses, gives you three key fields per player, updated once a day:

FPL API response showing key transfer fields

transfers_in_event, transfers_out_event, and selected_by_percent. That’s it. That’s your starting material. Three numbers per player per day, and from those three numbers, sites like fplstatistics.co.uk try to predict which players will change price.
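Extracting those three fields is a few lines of code. A minimal sketch using only the standard library; the field names (`transfers_in_event`, `transfers_out_event`, `selected_by_percent`, `elements`) are the real FPL API names, but the helper functions are illustrative scaffolding, not any site's actual pipeline:

```python
# Minimal sketch: reduce the bootstrap-static payload to the three
# daily per-player fields described above. Field names match the real
# FPL API; the helper functions are illustrative.
import json
from urllib.request import urlopen

API_URL = "https://fantasy.premierleague.com/api/bootstrap-static/"

def extract_transfer_fields(payload):
    """Keep only the three per-player fields that matter here."""
    return [
        {
            "id": p["id"],
            "transfers_in_event": p["transfers_in_event"],
            "transfers_out_event": p["transfers_out_event"],
            # The API serves ownership as a string like "29.2"
            "selected_by_percent": float(p["selected_by_percent"]),
        }
        for p in payload["elements"]
    ]

def daily_snapshot(url=API_URL):
    """One network call per day is all the collection needs."""
    with urlopen(url) as resp:
        return extract_transfer_fields(json.load(resp))
```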

The question is whether you can do better. And the answer is yes, but it’s going to take a while.

The first discovery that hooked me

I pulled the API for the first time and plotted net transfers against “did this player rise the next day.” And immediately something didn’t add up.

Thiago with 413k transfers didn't rise, Keane with 17k did
Thiago: 6 Jan 2026 — 413,851 net transfers, 29.2% owned, no rise.

Keane: 1 Dec 2025 — 17,648 net transfers, 2.3% owned, rose. (He also rose twice more that month on higher numbers.)

Thiago got 413,851 net transfers in a single day. Didn’t rise. Meanwhile Keane, with 17,648 transfers, rose the same week.

Thiago is owned by 29.2% of managers. Keane is owned by 2.3%.

That difference matters. A lot. But I didn’t know that yet. All I knew was that the simple “most transfers = price rise” model was obviously wrong, and now I needed to know what the actual model was.

This is how rabbit holes start. You notice one thing that doesn’t make sense, and instead of moving on with your life, you pull on the thread.

Haaland got 335,160 net transfers one day in September. 47.4% of all managers brought him in. He didn't rise. If the most transferred player in the game, on one of the biggest transfer days of the season, doesn't rise, then whatever the algorithm is doing, it's not what people think it's doing.

The scale of the problem

Let me give you the numbers so you understand what we’re dealing with.

720,254 player-days, 2,035 rises scattered among them

720,254 player-days across 4 seasons. That’s roughly 800 players tracked every day for 4 years.

Of those 720,254 days, exactly 2,035 were rises. That’s 0.28%.

Falls are more common: 7,934 of them, about 1.1%. But that's a different story for later.

So the task is this: find the pattern in 0.28% of the data. It’s like trying to spot a specific person in a crowd of 350, except the crowd changes every day and the person is wearing the same clothes as everyone else.

If you built a model that just said “nobody will rise today, ever” it would be correct 99.7% of the time. And completely useless.

This is the fundamental problem. Not finding the signal but finding it in a sea of noise where almost everything is “nothing happened.”
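To make the imbalance concrete, here is the trivial baseline in numbers (the counts are this section's own totals):

```python
# The "nobody rises today, ever" baseline: high accuracy, zero value.
# Counts are the dataset totals quoted above.
TOTAL_DAYS = 720_254
RISES = 2_035

# Predicting "no rise" for every row is right on every non-rise day...
accuracy = (TOTAL_DAYS - RISES) / TOTAL_DAYS
# ...and catches exactly none of the 2,035 actual rises.
rise_recall = 0 / RISES

print(f"accuracy:    {accuracy:.4%}")    # ~99.72%
print(f"rise recall: {rise_recall:.0%}")  # 0%
```

This is why accuracy is the wrong metric for this problem, and why later parts lean on precision, recall, and F1 instead.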

The decision to go deep

At this point a normal person would have read a Reddit thread about it and moved on. I decided I needed every daily snapshot of every FPL player for the last 4 seasons.

The Wayback Machine (that site that archives the entire internet) has snapshots of the FPL API going back years. Every bootstrap-static endpoint, every day, every player. Someone at archive.org is doing god’s work and I doubt they know that one of the beneficiaries is a bloke trying to reverse-engineer a fantasy football algorithm.

Three seasons of historical data from the Wayback Machine. One season of live data from a Supabase pipeline I built to capture the current season in real time. 122,000 records from the live collection alone.
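The Wayback side of that collection can be sketched with archive.org's public availability API, which returns the closest archived snapshot of a URL for a given date. The endpoint and its JSON response shape are archive.org's real API; the helper names and flow are illustrative, not my actual scraping scripts:

```python
# Sketch: locate an archived bootstrap-static snapshot for a given day
# via the Wayback Machine availability API. Endpoint and JSON shape are
# archive.org's public API; the helpers are illustrative.
import json
from urllib.request import urlopen

AVAILABILITY = "https://archive.org/wayback/available?url={url}&timestamp={ts}"
TARGET = "fantasy.premierleague.com/api/bootstrap-static/"

def closest_snapshot_url(availability_payload):
    """Pull the closest archived snapshot URL out of the availability
    response, or None if nothing is archived near that date."""
    closest = availability_payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None

def snapshot_url_for_day(ts):
    """ts is a YYYYMMDD string, e.g. '20230115'."""
    with urlopen(AVAILABILITY.format(url=TARGET, ts=ts)) as resp:
        return closest_snapshot_url(json.load(resp))
```

Loop that over every date in a season and you have a historical collection pipeline, rate limits permitting.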

4 seasons of FPL data from 2022-23 to 2025-26

“I’ll just scrape a few weeks of data” became “I need everything from 2022 onwards.” The dataset landed at 720,254 rows in a single parquet file, one row per player per day, with price, ownership, transfers in, transfers out, form, status, the lot.

This file, combined_all_seasons.parquet, became the centre of everything that followed. Every experiment, every model, every discovery started with loading it.

The research begins

Phase 1 was boring but necessary: data quality. Timestamp consistency. Missing days. Duplicates. Leakage checks, making sure I wasn’t accidentally using tomorrow’s data to predict today. The kind of work that nobody writes blog posts about because it’s tedious and important in equal measure.

Phase 2 was more interesting: classifying each day as rise, fall, or no-change. Matching price changes to the daily snapshots. Building the target variable.
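In code, that target-building step is essentially a per-player day-over-day diff. A simplified sketch; the column names follow the fields listed in the technical sidebar, but the real matching against official price change records is more involved:

```python
# Sketch of the Phase 2 labelling step: derive is_rise / is_fall from
# day-over-day price moves per player. Column names follow the fields
# in the sidebar; the matching logic is simplified.
import pandas as pd

def label_price_moves(df):
    df = df.sort_values(["player_id", "date"]).copy()
    # Price change relative to the previous day's snapshot, per player
    df["price_change_daily"] = df.groupby("player_id")["price"].diff()
    df["is_rise"] = (df["price_change_daily"] > 0).astype(int)
    df["is_fall"] = (df["price_change_daily"] < 0).astype(int)
    return df

snap = pd.DataFrame({
    "player_id": [7, 7, 7],
    "date": pd.to_datetime(["2026-01-05", "2026-01-06", "2026-01-07"]),
    "price": [5.0, 5.1, 5.1],  # rose once, then held
})
labelled = label_price_moves(snap)
print(labelled[["date", "is_rise", "is_fall"]])
```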

And immediately there were puzzles. Players who rose with surprisingly low transfers. Players who didn’t rise despite massive demand. Days where nothing happened even though the transfer market was going mad.

| Season | Records | Rises | Falls | Days |
| --- | --- | --- | --- | --- |
| 2022-23 | 156,011 | 453 | 2,093 | 222 |
| 2023-24 | 231,700 | 620 | 1,945 | 300 |
| 2024-25 | 202,361 | 571 | 2,006 | 280 |
| 2025-26 | 130,182 | 391 | 1,890 | 175 |
| Total | 720,254 | 2,035 | 7,934 | 977 |

Each of those 2,035 rises had a story. Each one was a decision the algorithm made based on rules I couldn’t see. But the decisions were all in the data. I just couldn’t see the pattern yet.

That realisation, that the answer was already there, sitting in 720,000 rows, waiting to be found, is what turned a weekend project into a months-long obsession.

A disclaimer before we go any further

I should be honest with you now, before you read six more parts expecting a triumph at the end.
I haven’t cracked the code. At best, the model I built gets about 66% of rises right and 64% of falls. That’s the headline number after months of work.
To put that in context, here’s where the big three price prediction sites sit this season:
So on rises I’d sit somewhere between FFFix and FFHub. On falls I’d actually beat all three. However, there’s a massive caveat.
Those sites are predicting *live*. In real time. Right now. They’re looking at today’s transfers and telling you what’s going to happen tonight. My numbers come from backtesting on historical data, running the model against past seasons where I already know what happened. That’s obviously easier. You’re not predicting the future, you’re pattern-matching the past.
The live version of my model, running on an actual server making real predictions each night, scores lower. Significantly lower. We’ll get to why in Part 5.
So if you’re reading this hoping I’ve built something that beats LiveFPL: I haven’t. What I have done is pulled apart *how* the algorithm works, built something that gets close enough to be useful, and learned a lot about the gap between “works on my laptop” and “works in the real world”. That story, I think, is worth telling. In reality I’m hoping someone will go “Hey, you forgot to carry the 1” and then my model will predict everything correctly.

Technical Sidebar: The Data Pipeline

If you’re here for the data science, here’s what the pipeline looks like:

Source: FPL API bootstrap-static endpoint, one snapshot per day
Historical: 3 seasons scraped from Wayback Machine (2022-23 through 2024-25)
Live: 2025-26 season collected via Supabase with a daily cron job (122k records in player_daily_activity.json)
Master dataset: combined_all_seasons.parquet — 720,254 records, one row per player per day
Fields per row: player_id, date, price, ownership_percent, transfers_in/out (daily and event), form, status, news, team, position, season, gameweek
Labels: is_rise and is_fall derived from price_change_daily (matched against official price change records)
Validation scripts: p1_extract_timestamps.py, p1_missing_days.py, p1_leakage_check.py, p1_duplicates.py
Class balance: 0.28% rises, 1.10% falls, 98.62% no change

The single biggest data quality issue was timestamp alignment, making sure each row represented the state of the player before the price change decision, not after. Getting this wrong means leaking future information into the model, which makes your results look amazing and your predictions worthless.
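That alignment rule can be expressed as a one-line shift: the features paired with a given day's label must come from the previous snapshot. A simplified sketch; the column and function names are illustrative, not the actual pipeline's schema:

```python
# Sketch of the anti-leakage rule: the row that predicts tonight's
# price change may only contain yesterday's state. Column names are
# illustrative.
import pandas as pd

def align_features(df, feature_cols):
    """Shift each player's features forward one day so the label at
    date t is paired with the state observed at t-1."""
    df = df.sort_values(["player_id", "date"]).copy()
    df[feature_cols] = df.groupby("player_id")[feature_cols].shift(1)
    # The first day per player has no prior state, so drop it
    return df.dropna(subset=feature_cols)

snap = pd.DataFrame({
    "player_id": [1, 1, 1],
    "date": pd.to_datetime(["2026-01-05", "2026-01-06", "2026-01-07"]),
    "transfers_in_event": [10_000, 250_000, 5_000],
    "is_rise": [0, 0, 1],  # the rise lands on the 7th...
})
aligned = align_features(snap, ["transfers_in_event"])
# ...and is now paired with the 250k transfers observed on the 6th,
# not with the 5k recorded after the price had already moved.
print(aligned[["date", "transfers_in_event", "is_rise"]])
```

Skipping this shift is exactly the failure mode described above: the model "sees" post-change transfer counts, backtests beautifully, and falls apart live.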


What’s next

I had the data. 720,000 rows across 4 seasons. Now I just needed to figure out what the algorithm was actually doing with it.

Spoiler: it took months. And the answer was both simpler and more complicated than I expected.

Next: Part 2 — “720,000 Rows of Obsession”


This is Part 1 of a 7-part series about reverse-engineering the FPL price change algorithm. The research behind this series powers fplcore.com.
