Atlas AI
Distributed AI infrastructure platform for transformer systems, inference optimization, scaling analysis, observability, and performance engineering.
Distributed ML · Optimization · Systems
I build distributed training systems, autograd engines, and performance-focused ML infrastructure from first principles, analyzing how memory, communication, and compute constraints shape real-world performance.
Selected Work
Systems-oriented profiler for analyzing communication overhead, memory bottlenecks, scaling efficiency, and distributed training behavior in large-scale ML workloads.
Developer infrastructure platform for automated benchmark regression detection, performance analysis, and GitHub pull request feedback in ML systems and backend workflows.
Distributed training simulator analyzing scaling efficiency, communication overhead, and system-level bottlenecks across data-parallel workloads.
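At its core, a simulator like this reduces to an analytical cost model. A minimal sketch, assuming a ring all-reduce with no compute/communication overlap; the function names, bandwidth, and parameter counts below are illustrative, not taken from the project:

```python
# Toy data-parallel scaling model: per-step time = compute + gradient all-reduce.
# Assumes ring all-reduce and weak scaling (fixed per-worker batch), no overlap.

def ring_allreduce_time(param_bytes, workers, bandwidth_gbps=100.0):
    """Ring all-reduce moves 2*(W-1)/W of the gradient bytes per worker."""
    if workers == 1:
        return 0.0
    traffic = 2 * (workers - 1) / workers * param_bytes
    return traffic / (bandwidth_gbps * 1e9 / 8)  # Gb/s -> bytes/s

def step_time(compute_s, param_bytes, workers):
    # Weak scaling: compute stays flat while communication grows with W.
    return compute_s + ring_allreduce_time(param_bytes, workers)

def scaling_efficiency(compute_s, param_bytes, workers):
    """Single-worker step time divided by W-worker step time."""
    return step_time(compute_s, param_bytes, 1) / step_time(compute_s, param_bytes, workers)

# Hypothetical workload: 0.1 s of compute, ~350M fp32 parameters.
for w in (1, 8, 64):
    eff = scaling_efficiency(0.1, 4 * 350e6, w)
    print(f"{w:>3} workers: efficiency {eff:.2f}")
```

Even this crude model reproduces the headline behavior: efficiency decays toward the compute/(compute + comm) asymptote as worker count grows, which is why overlap and gradient compression matter at scale.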
Reverse-mode autodiff engine with dynamic computation graphs and topological backpropagation. Verified gradient correctness and analyzed trade-offs between memory usage, execution efficiency, and graph flexibility.
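The mechanism behind such an engine fits in a few dozen lines. A micrograd-style sketch of the idea, not the project's actual API: the forward pass builds a dynamic graph, and a topological sort drives backpropagation:

```python
# Minimal reverse-mode autodiff value. Each op records its parents and a
# closure that propagates out.grad into them; backward() runs those closures
# in reverse topological order.

class Value:
    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # accumulates grads into parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        # Topological order guarantees a node's grad is complete
        # before it is pushed to its parents.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    visit(p)
                order.append(v)
        visit(self)
        self.grad = 1.0
        for v in reversed(order):
            v._backward()

# d(x*y + x)/dx = y + 1, d(x*y + x)/dy = x
x, y = Value(3.0), Value(4.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

Holding the full graph alive until backward() is exactly the memory/flexibility trade-off the blurb mentions: dynamic graphs allow arbitrary control flow but keep every intermediate activation resident.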
CLI-based ML reproducibility auditor that evaluates repositories for engineering quality, system design patterns, and reproducibility signals using GitHub API analysis.
CLI-based experiment tracking system for reproducible ML workflows, enabling structured run logging, metric comparison, and evaluation across experiments.
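The core of structured run logging can be sketched as an append-only record per run. A minimal illustration assuming a JSON-lines layout; the directory structure and function names here are hypothetical, not the project's actual format:

```python
# Hypothetical run tracker: one directory per run, params.json for config,
# metrics.jsonl as an append-only log of structured records.
import json
import time
import uuid
from pathlib import Path

def start_run(root="runs", **params):
    """Create a run directory and record its hyperparameters."""
    run_dir = Path(root) / uuid.uuid4().hex[:8]
    run_dir.mkdir(parents=True)
    (run_dir / "params.json").write_text(json.dumps(params))
    return run_dir

def log_metric(run_dir, step, **metrics):
    """Append one record; JSONL keeps the log greppable and crash-tolerant."""
    with open(run_dir / "metrics.jsonl", "a") as f:
        f.write(json.dumps({"step": step, "ts": time.time(), **metrics}) + "\n")

def best(run_dir, key, mode="min"):
    """Scan a run's log and return the best value for one metric key."""
    rows = [json.loads(line) for line in open(run_dir / "metrics.jsonl")]
    vals = [r[key] for r in rows if key in r]
    return min(vals) if mode == "min" else max(vals)

run = start_run(lr=3e-4, batch_size=64)
for step, loss in enumerate([1.2, 0.9, 0.7]):
    log_metric(run, step, loss=loss)
print(best(run, "loss"))  # 0.7
```

Flat files over a database is a deliberate choice for reproducibility tooling: runs stay diffable, copyable, and readable without the tool installed.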
Journey
2017 — 2020
Built foundational knowledge in programming, data structures, and core systems concepts, laying the groundwork for formal study in computer science.
2020 — 2023
Studied computer science in depth, including operating systems, distributed systems, and machine learning, which shaped a systems-oriented approach to problem solving.
2023 — Present
Building distributed training systems, autograd engines, and performance benchmarking tools from first principles.
Next
Moving toward building and studying large-scale machine learning systems, combining distributed systems, optimization, and performance engineering to understand how models behave under real-world constraints.