coredump-debug
Debug segfaults and crashes in JAX/XLA/ROCm training workloads using coredump analysis. Use when the user has a coredump file, SIGSEGV, segfault, crash dump, or core file to analyze. Covers GDB backtrace extraction, identifying the crash cause from registers and disassembly, finding and cloning the correct source code versions, and reading the relevant code to determine the root cause.
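The backtrace, register, and disassembly steps mentioned above can be sketched as a small Python wrapper around GDB's batch mode. This is an illustrative sketch, not the skill's actual implementation: the helper names (`build_gdb_command`, `parse_backtrace`, `analyze`) are hypothetical, and the frame regex assumes GDB's default `#N  0xADDR in func (args) ...` output, so it may miss unusual C++ symbols such as `operator()`.

```python
import re
import subprocess
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    index: int              # frame number as printed by GDB (#0 is the crash site)
    address: Optional[str]  # program counter, if GDB printed one for this frame
    function: str           # symbol name, or "??" when GDB could not resolve it


def build_gdb_command(binary: str, core: str) -> List[str]:
    """Build a non-interactive GDB invocation for a core file (hypothetical helper)."""
    return [
        "gdb", "--batch", "-nx",   # run the commands and exit; skip init files
        "-ex", "bt",               # backtrace of the crashing thread
        "-ex", "info registers",   # register state at the time of the crash
        "-ex", "x/8i $pc",         # disassemble 8 instructions at the program counter
        binary, core,
    ]


# Matches lines such as "#0  0x00007f... in func (args) ..." and "#1  main () at f.c:3".
_FRAME_RE = re.compile(r"^#(\d+)\s+(?:(0x[0-9a-fA-F]+)\s+in\s+)?(.+?)\s+\(")


def parse_backtrace(output: str) -> List[Frame]:
    """Extract (index, address, function) from the frame lines of GDB output."""
    frames = []
    for line in output.splitlines():
        match = _FRAME_RE.match(line)
        if match:
            frames.append(Frame(int(match.group(1)), match.group(2), match.group(3)))
    return frames


def analyze(binary: str, core: str) -> List[Frame]:
    """Run GDB on the core file and return the parsed frames (requires gdb on PATH)."""
    result = subprocess.run(
        build_gdb_command(binary, core),
        capture_output=True, text=True, check=False,
    )
    return parse_backtrace(result.stdout)
```

Calling `analyze("/path/to/binary", "/path/to/core")` returns the frames of the crashing thread, with frame 0 as the crash site. The binary must be the one that produced the core, otherwise symbol resolution is unreliable.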
Source: GitHub https://github.com/AMD-AGI/maxtext-slurm