We are building the cloud execution layer for Physical AI and next-generation multimodal workloads. We begin with the models powering robots, world simulators, spatial systems, and visual intelligence, and build the infrastructure that makes these compute-intensive AI workloads faster, easier, and more cost-effective to run.
The Role:
We are looking for an ML Systems & Inference Engineer to build the technical foundation of our platform. This role sits at the intersection of model serving, GPU systems, performance engineering, and cloud infrastructure.
This is not a generic backend role or a pure research role. The ideal candidate should be able to understand model code, profile runtimes, identify bottlenecks, develop optimizations, measure improvements, and ship reliable infrastructure solutions.
Selected intern's day-to-day responsibilities include:
Build and optimize cloud inference pipelines for Physical AI, multimodal, generative, simulation, and world-model workloads.
Improve performance across startup time, queue time, latency, throughput, GPU utilization, reliability, and cost per output or job.
Develop platform execution components, including model packaging, warm pools, artifact and model caching, batching, queueing, scheduling, and model-aware execution policies.
Apply optimization techniques such as dynamic batching, quantized model variants, torch.compile, TensorRT, ONNX Runtime, caching, routing, and distributed execution.
Profile bottlenecks across GPU compute, memory bandwidth, CPU preprocessing, I/O, model loading, serialization, queueing, and serving overhead.
Build benchmarking and evaluation systems to measure latency, throughput, startup time, memory usage, GPU utilization, cost, reliability, and workload quality.
Convert execution telemetry into product capabilities such as performance reporting, cost visibility, configuration recommendations, and workload comparisons.
Who can apply
Candidates who:
are available for the work from home job/internship
can start the work from home job/internship between 29th Jun'26 and 3rd Aug'26
are available for a duration of 6 months
have relevant skills and interests
Other requirements
Experience serving or optimizing multimodal workloads such as video, image generation, 3D, VLM, VLA, world models, simulations, synthetic data, gaming, or similar AI applications.
Experience with GPU cloud platforms like AWS, GCP, Azure, CoreWeave, Lambda, RunPod, Modal, or equivalent environments.
Experience with multi-GPU inference, cluster scheduling, or multi-tenant serving systems.
Familiarity with CUDA C++, Triton kernels, CUTLASS, PTX, or GPU performance optimization tools.
Knowledge of compiler and runtime stacks such as TVM, MLIR, XLA, TorchInductor, or TensorRT.
Strong experience in ML systems, inference serving, GPU-backed deployment, and performance optimization.
Deep PyTorch expertise with the ability to convert research implementations into production-ready systems.
Experience deploying and operating large AI models in cloud GPU environments.
Proficiency with performance debugging tools like PyTorch Profiler and NVIDIA Nsight.
Hands-on experience with inference frameworks such as TensorRT, Triton, ONNX Runtime, Ray Serve, vLLM, SGLang, Modal, or BentoML.
Understanding of batching, scheduling, quantization, compilation, distributed inference, Docker, cloud GPUs, and backend development.
Strong problem-solving skills, engineering judgment, and ability to work in a fast-paced startup environment.
Perks
Certificate
Flexible work hours
5 days a week
Number of openings
3
About Hubnine India Private Limited
Delhi
Hubnine is a data-driven software company leveraging modern AI/ML tools to improve outcomes for our clients.