AMD Zen Server RAS: Address Translation Library and FRU Memory Poison Manager
Reliability, availability, and serviceability (RAS) support for AMD's Zen-based server platforms. It comprises the Address Translation Library (ATL), which translates the normalized addresses in memory-error reports into system physical addresses, and the FRU Memory Poison Manager (FMPM), which tracks bad memory pages across reboots. Used on EPYC servers and Instinct MI300 accelerators to handle ECC events in datacentre and HPC systems.
recommendation
This code should stay. It is new (added in 2024), it is actively maintained, with bug fixes landing in 2025 and 2026, and it directly supports hardware AMD still sells today, including EPYC 9004 server CPUs and Instinct MI300 AI accelerators. No deprecation or removal discussion was found upstream.
repository signals
sources
- kernel.org
The official kernel docs describe the AMD Address Translation Library (ATL) as the address-translation component of memory-error handling on Zen-based systems.
- spinics.net
ATL received an upstream maintenance patch on 2026-03-07 ('Only load ATL when needed'), indicating current maintenance rather than removal.
- spinics.net
A 2025 patch updated both ATL and FMPM for MI300 row-address masking, showing substantive post-merge fixes in this directory.
- amd.com
AMD still markets the Instinct MI300 series accelerators, which align with the MI300-specific code paths present in this directory.
- amd.com
AMD still markets EPYC 9004 server CPUs, matching the Zen 4 server platform scope called out by ATL documentation and Kconfig help.
codex reasoning notes (technical)
Real driver directory: local shell inspection found loadable modules in drivers/ras/amd/fmpm.c and drivers/ras/amd/atl/core.c, plus Kconfig entries and MAINTAINERS coverage. URLs were obtained via web search: kernel.org RAS docs for scope, spinics msg24046 for 2026 ATL maintenance activity, spinics msg5628570 for 2025 ATL/FMPM bug-fix traffic, and AMD product pages for current EPYC 9004 and Instinct MI300 market availability. No removal or deprecation thread was found. The code is new (2024+), maintained, and tied to currently sold server/HPC platforms, so removal or deprecation is not indicated. Deployment is low because the code applies only to ECC/RAS-capable AMD server and accelerator systems, not to broad commodity hardware.