
Project Case Study

Autograd Engine (From Scratch)

Updated: May 2026

View Repository →

Built a reverse-mode autodiff engine from scratch to understand how computation graphs are constructed and executed. Focused on correctness, scalability, and system-level trade-offs in gradient computation.

Problem

Modern ML frameworks abstract away gradient computation, making it difficult to understand how backpropagation and graph execution work internally. This project rebuilds autograd from first principles to analyze system behavior and execution trade-offs.

System Design

  • Dynamic computation graph (define-by-run); a minimal sketch follows this list
  • Reverse-mode autodiff using topological traversal
  • Gradient accumulation across dependency paths
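A minimal sketch of what define-by-run construction can look like, using an illustrative `Value` node (the class name and structure are assumptions for illustration, not the repository's exact API): each operator records its inputs and a local gradient rule, and gradients accumulate with `+=` so contributions from shared dependency paths sum correctly.

```python
class Value:
    """Illustrative scalar node in a define-by-run computation graph."""

    def __init__(self, data, parents=(), backward_fn=None):
        self.data = data                  # forward result
        self.grad = 0.0                   # accumulated gradient
        self.parents = parents            # nodes this value depends on
        self.backward_fn = backward_fn    # local gradient rule

    def __add__(self, other):
        out = Value(self.data + other.data, parents=(self, other))
        def backward_fn():
            # d(out)/d(self) = 1, d(out)/d(other) = 1; accumulate, never overwrite
            self.grad += out.grad
            other.grad += out.grad
        out.backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, parents=(self, other))
        def backward_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out.backward_fn = backward_fn
        return out
```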

Architecture

The system builds a dynamic computation graph during forward execution and performs reverse-mode differentiation using topological traversal.
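To make the traversal concrete, here is a sketch of a reverse-mode backward pass over the illustrative `Value` graph above: a depth-first post-order produces a topological ordering, and walking it in reverse guarantees each node's local gradient rule runs only after every consumer has contributed to its output gradient.

```python
def backward(root):
    # Reverse-mode pass: topologically order the graph, then apply each
    # node's local gradient rule from the output back toward the inputs.
    order, visited = [], set()

    def visit(node):
        if node not in visited:
            visited.add(node)
            for parent in node.parents:
                visit(parent)
            order.append(node)          # post-order: inputs before consumers

    visit(root)
    root.grad = 1.0                     # seed d(root)/d(root)
    for node in reversed(order):
        if node.backward_fn is not None:
            node.backward_fn()
```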

[Figure: Autograd architecture diagram]

Results & Insights

  • Gradient correctness validated (~1e-10 error)
  • Backward pass scales approximately linearly with graph size
  • Recursive traversal hit depth limits at ~1k nodes
  • Replaced recursion with iterative DFS (sketched below), scaling to 10,000+ nodes
  • Observed trade-offs between flexibility, memory usage, and execution efficiency
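A sketch of the iterative replacement, assuming the same illustrative `Value` node as above: an explicit stack with a "processed" flag reproduces the post-order traversal without growing the Python call stack, which is what removes the ~1k-node depth ceiling.

```python
def topo_order_iterative(root):
    # Iterative post-order DFS with an explicit stack, avoiding the
    # recursion depth limit that capped the recursive traversal.
    order, visited = [], set()
    stack = [(root, False)]
    while stack:
        node, processed = stack.pop()
        if processed:
            order.append(node)                 # all inputs already emitted
        elif node not in visited:
            visited.add(node)
            stack.append((node, True))         # revisit after its inputs
            for parent in node.parents:
                stack.append((parent, False))
    return order
```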

Benchmark

Backward pass time grows with graph size, demonstrating near-linear scaling.
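A rough way to take this kind of measurement, assuming the sketches above (`make_chain` and `backward_iterative` are hypothetical helpers for illustration, not code from the repository):

```python
import time

def backward_iterative(root):
    # Backward pass driven by the iterative topological order sketched above.
    root.grad = 1.0
    for node in reversed(topo_order_iterative(root)):
        if node.backward_fn is not None:
            node.backward_fn()

def make_chain(n):
    # Hypothetical helper: a linear chain of n multiplications.
    x = Value(1.0)
    for _ in range(n):
        x = x * Value(1.001)
    return x

for n in (1_000, 10_000, 100_000):
    root = make_chain(n)
    t0 = time.perf_counter()
    backward_iterative(root)
    print(f"{n:>7} nodes: {time.perf_counter() - t0:.4f} s")
```

On a chain graph the work per node is constant, so timings from a script like this should grow roughly linearly with n.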

[Figure: Backward pass benchmark]

Takeaway: Autograd performance depends not only on mathematical correctness, but also on graph traversal strategy, memory behavior, and execution overhead.

Technical Stack

Python · NumPy · Autograd · Backpropagation · Performance Analysis