Atlas AI
Distributed AI infrastructure platform for transformer systems, inference optimization, scaling analysis, observability, and performance engineering.
Distributed ML · Optimization · Systems
I build distributed training systems, autograd engines, and performance-focused ML infrastructure from first principles, analyzing how memory, communication, and compute constraints shape real-world performance.
Selected Work
Systems-oriented profiler for analyzing communication overhead, memory bottlenecks, scaling efficiency, and distributed training behavior in large-scale ML workloads.
Developer infrastructure platform for automated benchmark regression detection, performance analysis, and GitHub pull request feedback in ML systems and backend workflows.
Distributed training simulator analyzing scaling efficiency, communication overhead, and system-level bottlenecks across data-parallel workloads.
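At its core, a simulator like this reduces to an analytical cost model. A minimal sketch, assuming a ring all-reduce with no compute/communication overlap; the function names, bandwidth, and parameter counts below are illustrative, not taken from the project:

```python
# Toy data-parallel scaling model: per-step time = compute + gradient all-reduce.
# Assumes ring all-reduce and weak scaling (fixed per-worker batch), no overlap.

def ring_allreduce_time(param_bytes, workers, bandwidth_gbps=100.0):
    """Ring all-reduce moves 2*(W-1)/W of the gradient bytes per worker."""
    if workers == 1:
        return 0.0
    traffic = 2 * (workers - 1) / workers * param_bytes
    return traffic / (bandwidth_gbps * 1e9 / 8)  # Gb/s -> bytes/s

def step_time(compute_s, param_bytes, workers):
    # Weak scaling: compute stays flat while communication grows with W.
    return compute_s + ring_allreduce_time(param_bytes, workers)

def scaling_efficiency(compute_s, param_bytes, workers):
    """Single-worker step time divided by W-worker step time."""
    return step_time(compute_s, param_bytes, 1) / step_time(compute_s, param_bytes, workers)

# Hypothetical workload: 0.1 s of compute, ~350M fp32 parameters.
for w in (1, 8, 64):
    eff = scaling_efficiency(0.1, 4 * 350e6, w)
    print(f"{w:>3} workers: efficiency {eff:.2f}")
```

Even this crude model reproduces the headline behavior: efficiency decays toward the compute/(compute + comm) asymptote as worker count grows, which is why overlap and gradient compression matter at scale.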
Reverse-mode autodiff engine with dynamic computation graphs and topological backpropagation. Verified gradient correctness and analyzed trade-offs between memory usage, execution efficiency, and graph flexibility.
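The mechanism behind such an engine fits in a few dozen lines. A micrograd-style sketch of the idea, not the project's actual API: the forward pass builds a dynamic graph, and a topological sort drives backpropagation:

```python
# Minimal reverse-mode autodiff value. Each op records its parents and a
# closure that propagates out.grad into them; backward() runs those closures
# in reverse topological order.

class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # accumulates grads into parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topological order guarantees a node's grad is complete
        # before it is pushed to its parents.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# d(x*y + x)/dx = y + 1, d(x*y + x)/dy = x
x, y = Value(3.0), Value(4.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

Holding the full graph alive until backward() is exactly the memory/flexibility trade-off the blurb mentions: dynamic graphs allow arbitrary control flow but keep every intermediate activation resident.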
CLI-based ML reproducibility auditor that evaluates repositories for engineering quality, system design patterns, and reproducibility signals using GitHub API analysis.
CLI-based experiment tracking system for reproducible ML workflows, enabling structured run logging, metric comparison, and evaluation across experiments.
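The core of structured run logging can be sketched as an append-only record per run. A minimal illustration assuming a JSON-lines layout; the directory structure and function names here are hypothetical, not the project's actual format:

```python
# Hypothetical run tracker: one directory per run, params.json for config,
# metrics.jsonl as an append-only log of structured records.
import json
import time
import uuid
from pathlib import Path

def start_run(root="runs", **params):
    """Create a run directory and record its hyperparameters."""
    run_dir = Path(root) / uuid.uuid4().hex[:8]
    run_dir.mkdir(parents=True)
    (run_dir / "params.json").write_text(json.dumps(params))
    return run_dir

def log_metric(run_dir, step, **metrics):
    """Append one record; JSONL keeps the log greppable and crash-tolerant."""
    with open(run_dir / "metrics.jsonl", "a") as f:
        f.write(json.dumps({"step": step, "ts": time.time(), **metrics}) + "\n")

def best(run_dir, key, mode="min"):
    """Scan a run's log and return the best value for one metric key."""
    rows = [json.loads(line) for line in open(run_dir / "metrics.jsonl")]
    vals = [r[key] for r in rows if key in r]
    return min(vals) if mode == "min" else max(vals)

run = start_run(lr=3e-4, batch_size=64)
for step, loss in enumerate([1.2, 0.9, 0.7]):
    log_metric(run, step, loss=loss)
print(best(run, "loss"))  # 0.7
```

Flat files over a database is a deliberate choice for reproducibility tooling: runs stay diffable, copyable, and readable without the tool installed.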
Journey
2017 — 2020
Built foundational knowledge in programming, data structures, and core systems concepts, laying the groundwork for formal study in computer science.
2020 — 2023
Studied computer science in depth, including operating systems, distributed systems, and machine learning, which shaped a systems-oriented approach to problem solving.
2023 — Present
Building distributed training systems, autograd engines, and performance benchmarking tools from first principles.
Next
Moving toward building and studying large-scale machine learning systems, combining distributed systems, optimization, and performance engineering to understand how models behave under real-world constraints.