How ML Route Optimisation Cut Emissions 10-15% on Asia-Europe Lanes — and Where It Still Loses to Human Planners
TL;DR
One 14,000 TEU box ship on Ningbo-Hamburg, late October 2024. The model recommended a southerly track to dodge a North Atlantic low. The chief mate ignored it because the model had not seen a Manila tug strike that was about to close two berths. He was right. The ML still gets a real 10-15% on Asia-Europe when it has good data. It loses on cold starts, port labour, and anything outside its training distribution.
Ningbo to Hamburg, 27 October 2024. A 14,000 TEU vessel. The router I helped ship recommended a track 180 nautical miles south of the planner's filed route. Bunker math said 8% less fuel. The chief mate filed the original route anyway. Two days later a tug strike in Manila pulled feeder slots out of the schedule. The model's cheap path would have arrived into a closed berth at Hamburg. The planner's path arrived 14 hours early into an open one. Net win to the human: about 40 tonnes of CO2 not burned at anchor.
That is the post. ML routing works. It also loses. I want to be honest about both because most of what gets written on this topic is marketing. Background on me: author profile. I run the API platform at EcoFreight and I lean on our own multimodal calculator to keep myself honest about the per-shipment numbers I cite below.
My take, unhedged: ML route optimisation is a 10-15% improvement, not a transformation. Anyone selling you 50% is selling overfitted training data. The physics has not moved. You are still burning VLSFO against a v^3 fuel curve. The model finds maybe 1-2 knots of slack the planner missed.
Where ML actually helps
Two cases. Both narrow. Both real.
First, dynamic speed profiles across long ocean legs. The speed-to-fuel relationship for a container ship is roughly P ~ v^3 plus hotel load of around 1.5-3 MW depending on reefer count. That cube is from hull resistance. Holtrop and Mennen wrote the canonical paper on it in 1982 ("An approximate power prediction method," International Shipbuilding Progress, vol. 29) and every naval-architecture suite since has reproduced their regression. The cube means a 10% slowdown is roughly a 27% drop in propulsion fuel burned per hour. A human planner picks a single service speed for the leg. An ML model that has weather-grid forecasts at 25 km resolution and the vessel's actual fuel curve can pick a speed per six-hour window. On the Asia-Europe lanes I have shipped against, that buys roughly 6-9% over a flat-speed schedule. Most of the win is here.
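The 27% figure falls straight out of the cube. A short derivation, assuming the cube law holds exactly (it is a regression approximation) and leaving hotel load out:

```latex
\[
P(v) \propto v^{3}
\quad\Longrightarrow\quad
\frac{P(0.9\,v)}{P(v)} = 0.9^{3} = 0.729
\]
\[
\frac{E}{d} = \frac{P(v)}{v} \propto v^{2}
\quad\Longrightarrow\quad
\frac{E(0.9\,v)/d}{E(v)/d} = 0.9^{2} = 0.81
\]
```

The first line is power, so fuel per hour: about 27% lower. The second is energy per nautical mile: about 19% lower, because the slower ship spends more hours covering the same distance.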
Second, anchor-time avoidance at congested ports. Rotterdam, Hamburg, Singapore, Antwerp. A vessel waiting at anchor still burns 2-4 tonnes of fuel a day on hotel load plus thrusters. If the model predicts congestion 36 hours out and pulls speed down across the prior leg, you arrive into a free berth and save the anchor burn. Maersk's public arrival-optimisation programme has reported roughly 4-6% emission reductions from this alone (their 2023 sustainability report, section on Just-in-Time arrivals, pages 38-40 of the PDF). I have seen similar in my own logs. This is the unsexy half of the gain and it is real.
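The arithmetic behind slowing down to meet the berth can be sketched as below. This is a minimal illustration under a pure cube law, not the router's actual logic. The function names and every number in the usage lines are hypothetical. Hotel load is counted in both scenarios, so the saving comes from the propulsion difference plus whatever the anchor burn exceeds the at-sea hotel burn.

```python
def leg_fuel_tonnes(distance_nm, speed_kn, ref_speed_kn,
                    ref_propulsion_t_per_day, hotel_t_per_day):
    """Fuel for one leg: cube-law propulsion plus constant hotel load."""
    days = distance_nm / speed_kn / 24.0
    propulsion = ref_propulsion_t_per_day * (speed_kn / ref_speed_kn) ** 3
    return days * (propulsion + hotel_t_per_day)


def jit_saving_tonnes(distance_nm, planned_speed_kn, hours_until_berth,
                      ref_speed_kn, ref_propulsion_t_per_day,
                      hotel_t_per_day, anchor_t_per_day):
    """Fuel saved by arriving just in time instead of rushing and anchoring."""
    planned_hours = distance_nm / planned_speed_kn
    wait_hours = hours_until_berth - planned_hours
    if wait_hours <= 0:
        return 0.0  # berth is free on arrival, nothing to gain by slowing

    # Scenario A: sail at the planned speed, then wait at anchor.
    rush = leg_fuel_tonnes(distance_nm, planned_speed_kn, ref_speed_kn,
                           ref_propulsion_t_per_day, hotel_t_per_day)
    rush += anchor_t_per_day * wait_hours / 24.0

    # Scenario B: slow down so arrival coincides with the berth opening.
    jit_speed_kn = distance_nm / hours_until_berth
    slow = leg_fuel_tonnes(distance_nm, jit_speed_kn, ref_speed_kn,
                           ref_propulsion_t_per_day, hotel_t_per_day)
    return rush - slow


# Hypothetical inputs: 720 nm to go, planned 18 kn, berth free in 48 h.
saving = jit_saving_tonnes(720, 18.0, 48.0, ref_speed_kn=18.0,
                           ref_propulsion_t_per_day=100.0,
                           hotel_t_per_day=2.0, anchor_t_per_day=3.0)
print(f"{saving:.1f} tonnes saved")
```

Most of the saving in a case like this comes from the cube on the prior leg, not from the anchor burn itself, which is why the congestion forecast needs to arrive early enough to act on.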
Add those together on a clean Asia-Europe lane with reasonable data, and you land in the 10-15% zone. Not 23. Not 50.
Where it loses
Four cases. Each one I have watched in production. Each one cost real money or real CO2, or both.
1. Cold start on a new lane. First Indonesia-East Coast Mexico voyage on a fleet I worked with. No historical AIS for that vessel class on that track. The model defaulted to its generic container archetype, which assumed northern-hemisphere weather statistics. The voyage ran through a Pacific swell regime the model had never priced. Routing was off by maybe 4% on fuel. The planner, who had run that lane for a competitor for nine years, would not have made that error. Every model I have shipped has a cold-start problem nobody mentions in the case studies. You need fifty-plus voyages of the same vessel on the same lane before the recommendation is worth listening to.
2. Unmodelled labour and political stoppages. The Manila tug strike I opened with. Also: a French port strike in March 2023 that froze Le Havre for nine days, an Egyptian Suez fee change that re-priced the Cape option overnight, the August 2024 Houthi attack window that the model still thought was navigable. These do not appear in AIS. They do not appear in weather feeds. A planner in a WhatsApp group with five other planners knows about them six hours before any model does.
3. Typhoon paths outside the training set. Typhoon Doksuri's recurvature in July 2023 went further north than any storm in our five-year training window. The model's storm-avoidance heuristic produced a path that, in hindsight, ran into the worst of the swell. This is a distribution-shift problem. The climate is changing the storms faster than we can retrain on them. The planner who had sailed the Philippines for twenty years smelled it and re-routed earlier than the model would have.
4. Model staleness against a structural break. Post-Red-Sea, in Q1 2024, a router I had tuned on Suez-routed Asia-Europe voyages kept producing recommendations that assumed Suez was available. We had not retrained on the Cape-routed reality. For three weeks the model gave bunker estimates that landed about 8% below actuals. Models do not know that a war has happened. You have to tell them, and you have to refit.
What goes in the request
I run an emissions API for a living. People ask me what an ML routing request actually needs, because the marketing decks are vague. Here is what the body looked like on the last system I shipped:
    POST /v1/route-optimise
    {
      "vessel": {
        "imo": "9811000",
        "design_speed_kn": 22.5,
        "fuel_curve_source": "noon_reports_180d",
        "hotel_load_kw": 2800,
        "reefer_plugs": 412
      },
      "voyage": {
        "origin_unlocode": "CNNGB",
        "destination_unlocode": "DEHAM",
        "etd_utc": "2024-10-27T08:00:00Z",
        "latest_eta_utc": "2024-11-26T18:00:00Z",
        "cargo_tonnes": 142000
      },
      "data_freshness": {
        "ais_cadence_seconds": 30,
        "weather_grid_km": 25,
        "weather_horizon_h": 240,
        "port_congestion_ttl_h": 6
      }
    }

The thing the marketing decks skip is the freshness block. That is the part that actually determines whether the model is useful. Specifically:
- AIS cadence at 30 seconds or better. One position per five minutes is fine for tracking. It is not fine for fitting a fuel curve to wave-induced speed loss. You need higher resolution if you want the model to learn the vessel's response to a head sea.
- Weather grid at 25 km, horizon out to 10 days. The ECMWF ensemble at 25 km is the floor. NOAA GFS works at 13 km and runs out to 16 days, with uncertainty growing toward the end of the horizon. Coarser than 25 km you start missing local seas around the Taiwan Strait and the Bay of Biscay. Coarser than 50 km you are routing on smoothed garbage.
- Port congestion data with TTL under 12 hours. Berth queues move. A snapshot from yesterday is already wrong. The carriers I have worked with poll port community systems every two to four hours where they can, and cross-check with AIS at-anchor signatures.
- Fuel curve fit from at least 180 days of noon reports. Anything less and the curve is noisy. Anything generic from a class average and you are routing a specific vessel against a fiction.
If any of these is missing or stale, the model degrades gracefully on paper and ungracefully in practice. The most expensive bugs I have shipped were not in the model code. They were in a stale weather feed that nobody noticed for eleven hours.
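One way to stop a stale feed from degrading the model silently is to check the freshness block before the model ever runs. A minimal sketch, assuming the thresholds listed above. The class, the function, and the noon_report_days field are hypothetical names for illustration, not part of the API shown earlier.

```python
from dataclasses import dataclass


@dataclass
class Freshness:
    ais_cadence_seconds: float
    weather_grid_km: float
    weather_horizon_h: float
    port_congestion_ttl_h: float
    noon_report_days: int  # assumption: not a field in the request's freshness block


def freshness_problems(f: Freshness) -> list[str]:
    """Return a human-readable list of inputs that miss the stated floors."""
    problems = []
    if f.ais_cadence_seconds > 30:
        problems.append("AIS cadence slower than 30 s")
    if f.weather_grid_km > 25:
        problems.append("weather grid coarser than 25 km")
    if f.weather_horizon_h < 240:
        problems.append("weather horizon shorter than 10 days")
    if f.port_congestion_ttl_h >= 12:
        problems.append("port congestion TTL not under 12 h")
    if f.noon_report_days < 180:
        problems.append("fuel curve fit on fewer than 180 days of noon reports")
    return problems


# The request body shown earlier passes every check.
print(freshness_problems(Freshness(30, 25, 240, 6, 180)))
```

Returning a list rather than raising on the first failure is deliberate: when a feed goes stale, the planner should see every degraded input at once, next to the recommendation.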
What the 10-15% number actually says
The figure I am quoting comes from a small number of credible sources. I want to name them so you can check the math yourself:
- IMO Fourth GHG Study (2020), Table 47. Operational efficiency measures including weather routing and speed optimisation, modelled at 1-10% emission reductions per voyage when applied individually, with combined potential cited at "up to 23%" in stacked scenarios. The 23% is the upper stack, not the typical observed value.
- Maersk Sustainability Report 2023, "Just-in-Time arrivals" section, pages 38-40. Reports a 4-6% emission reduction per port call from arrival optimisation alone, deployed across roughly 40% of their network during 2022-2023.
- BCG, "AI in Maritime: A 2024 Outlook," published October 2024. Survey of 31 carriers reporting deployed ML routing. Median reported emission reduction 11%, range 6-18%. The 23% headline number that floats around consulting decks is from a single named operator and BCG flags it as an outlier.
- McKinsey Global Institute, "Decarbonising shipping," September 2022. Estimates operational measures (which include ML routing) at 5-15% by 2030, with the rest of any deep cut needing fuel-side decarbonisation.
When I say 10-15% on Asia-Europe lanes, I am picking the middle of the credible range. The 23% headline that circulated for years traces to the IMO Fourth GHG Study's stacked-scenario upper bound, applied as if it were a typical result. It was never that. The honest expectation, for a mature deployment with good data, is 10-15%. If you do not have good data, expect 5-8%, and watch for cold-start episodes that will cost you back some of it.
What this means if you are building one
Three rules I follow now:
One. Treat the model as a recommendation, not a command. Every router I have shipped runs alongside a human planner who can override. The override rate on day one is 60%. After six months it settles around 15-20%, and those overrides are usually the ones that save the day on the off-distribution event. If your override rate is below 5%, you have probably built a tool the planners do not trust enough to fight.
Two. Refit monthly. Vessel fuel curves drift. Hull fouling, dock repaints, engine tunings — they all show up in the residuals. A model that was good in March will be three percent off by September if you do not refit on fresh noon reports.
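A drift check on the residuals can be as small as the sketch below. It compares what the fitted curve predicted against what the noon reports recorded. The function names are hypothetical, and the 3% tolerance is simply borrowed from the figure above as an assumed trigger, not a tuned threshold.

```python
from statistics import mean


def fuel_curve_drift(predicted_tonnes, reported_tonnes):
    """Mean relative residual; positive means the vessel burns more than predicted."""
    if len(predicted_tonnes) != len(reported_tonnes) or not predicted_tonnes:
        raise ValueError("need two equal-length, non-empty series")
    return mean((reported - predicted) / predicted
                for predicted, reported in zip(predicted_tonnes, reported_tonnes))


def needs_refit(predicted_tonnes, reported_tonnes, tolerance=0.03):
    """Flag the curve for refitting once the mean residual exceeds the tolerance."""
    return abs(fuel_curve_drift(predicted_tonnes, reported_tonnes)) > tolerance


# Hypothetical daily figures: the vessel is burning about 4% over the curve.
print(needs_refit([100.0, 100.0, 100.0], [104.0, 105.0, 103.0]))
```

A mean residual hides offsetting errors, so a real check would probably also look at residuals by speed band. This only catches the uniform drift that fouling tends to produce.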
Three. Measure the result. Run a counterfactual. If you cannot point at the kilograms of CO2 you saved against what the planner alone would have burned, you are guessing. We use per-shipment WTW numbers from our own engine (the calculation pipeline is walked through in this post) for this, and you can do the same — see the multimodal calculator for the shape of the output, or read the API docs if you want to wire it up. If you are comparing routing vendors before integrating, the API comparison names the trade-offs in the routing layer specifically.
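The counterfactual itself is plain arithmetic once you have both fuel figures per voyage. A minimal sketch, with hypothetical names throughout. The emission factor is deliberately a required argument: take it from whatever WTW source you use rather than from this example.

```python
from dataclasses import dataclass


@dataclass
class VoyageResult:
    planner_fuel_t: float  # estimated burn on the planner's route (the counterfactual)
    actual_fuel_t: float   # measured burn on the route actually sailed


def co2e_saved_kg(voyages, kg_co2e_per_t_fuel):
    """Total saving against the planner baseline; negative if the model lost."""
    return sum((v.planner_fuel_t - v.actual_fuel_t) * kg_co2e_per_t_fuel
               for v in voyages)


def saving_fraction(voyages):
    """Fuel saved as a fraction of what the planner alone would have burned."""
    baseline = sum(v.planner_fuel_t for v in voyages)
    if baseline == 0:
        raise ValueError("baseline fuel is zero")
    return sum(v.planner_fuel_t - v.actual_fuel_t for v in voyages) / baseline


# Hypothetical voyages: one win and one loss for the model.
voyages = [VoyageResult(1000.0, 900.0), VoyageResult(800.0, 820.0)]
print(f"{saving_fraction(voyages):.1%} fuel saved against the planner baseline")
```

Keeping the losing voyages in the sum matters. A fleet-level number that only counts the wins is the marketing figure, not the measured one.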
One honest gap. Real-time tracking inside the ML routing loop depends on the carrier exposing position data at sub-minute cadence. For carriers that don't — mostly small and mid-size operators outside the top twenty container lines — we model from schedule and forecast, which degrades the model's ability to fit the in-voyage fuel curve. That is the single biggest practical limitation I do not yet have a clean answer for.
The cubic curve, again, for the back of the napkin
Propulsion fuel scales as roughly v^3, plus a near-constant hotel load. Holtrop-Mennen 1982 is the source. On a typical 14,000 TEU box ship:
Source: Holtrop and Mennen, "An approximate power prediction method," ISP vol. 29 (1982). Hotel load constants from current vessel specifications.
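For a napkin version you can run, the sketch below prints propulsion power relative to design speed under a pure cube law. The 22.5 kn design speed is taken from the request example earlier. The function names are hypothetical, and the absolute tonnes-per-day inputs to daily_fuel_tonnes are left for you to supply from your own vessel's noon reports.

```python
DESIGN_SPEED_KN = 22.5  # design_speed_kn from the request example


def relative_propulsion_power(speed_kn, design_speed_kn=DESIGN_SPEED_KN):
    """Propulsion power as a fraction of the design-speed value (cube law)."""
    return (speed_kn / design_speed_kn) ** 3


def daily_fuel_tonnes(speed_kn, design_propulsion_t_per_day, hotel_t_per_day,
                      design_speed_kn=DESIGN_SPEED_KN):
    """Cube-law propulsion burn plus a constant hotel load, in tonnes per day."""
    propulsion = design_propulsion_t_per_day * relative_propulsion_power(
        speed_kn, design_speed_kn)
    return propulsion + hotel_t_per_day


for speed in (22.5, 20.0, 18.0, 16.0, 14.0):
    share = relative_propulsion_power(speed)
    print(f"{speed:5.1f} kn  {share:6.1%} of design propulsion power")
```

The hotel load does not shrink with speed, so the slower the ship goes, the larger the share of total burn that the cube cannot touch.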
Sources
- IMO Fourth GHG Study (2020), Table 47 — operational efficiency measures.
- Maersk Sustainability Report 2023, Just-in-Time arrivals, pp. 38-40.
- BCG, "AI in Maritime: A 2024 Outlook," October 2024.
- McKinsey Global Institute, "Decarbonising shipping," September 2022.
- Holtrop and Mennen, ISP vol. 29 (1982) — ship resistance regression.
- ECMWF and NOAA GFS — operational weather grids.
- GLEC Framework v3.2, Smart Freight Centre — emission factor source.