Subramaniyam Venkata Pooni

Subramaniyam Venkata Pooni · 2026-06-22T01:30:47.087Z

CPU: AMD RYZEN THREADRIPPER PRO 9995WX
M/B: ASUS PRO WS WRX90E-SAGE SE
RAM: SAMSUNG DDR5-5600 ECC/REG 1TB (128GB x 8)
SSD: SAMSUNG 990 PRO M.2 NVMe 4TB
VGA: NVIDIA H200 141GB NVL (4EA)
POWER: CORSAIR WS3000 ATX3.1 (2EA)
CASE: Phanteks ENTHOO PRO 2 T
https://lnkd.in/gRZcTcBC

San Jose, California, United States


5K followers · 500+ connections


CS²B TECHNOLOGIES INC

Indian Institute of Technology Madras

About

Founding Principal Architect of CS²B Technologies Inc (🔗…


Articles by Subramaniyam Venkata Pooni

Speculative Decoding on the DGX Spark (GB10)

Jul 2, 2026

Llama-3.3-70B + 8B Draft: a 3.

Flash Attention kernel in NVIDIA's new cuTile Python DSL

Jul 2, 2026

I built a Flash Attention kernel in NVIDIA's new cuTile Python DSL and optimized it step by step on a DGX Spark (GB10).…

What a $4,000 Desktop Supercomputer Can Actually Do: Benchmarking LLM Inference on the NVIDIA DGX Spark (GB10)

Jul 1, 2026

FP4 vs FP8 · vLLM vs TensorRT-LLM · thermal & clock behavior, all measured on-device. The NVIDIA DGX Spark puts a…

3 comments

Own the Stack, Rent the Frontier: The Case for a Personal AI Computer

Jun 21, 2026

For about fifteen years, personal computing was a story of disappearance. Your files moved into someone else's cloud…

The $3.6 Trillion Question Nobody on the Roadshow Wants Asked

Jun 9, 2026

Three companies are lining up to go public at a combined valuation of roughly $3.6 trillion.

The Token Bill Comes Due

Jun 8, 2026

How the cheapest unit of compute in history became the most expensive line on the enterprise ledger, and why the…

Smart Agents, No Kernel: Why Enterprise AI Needs an Operating System

Jun 6, 2026

We keep talking about making agents smarter. After…

2 comments

Your SOC Was Built for a World That's Ending

Jun 5, 2026

The 90-day patch window just became a liability. Here's what defending at machine speed actually requires.

1 comment

Measuring the Quality of Software Design

Jun 4, 2026

Design quality feels like something you know when you see it, and that's exactly the problem. "Good design" is…

Picking an Architecture Without Lying to Yourself

Jun 3, 2026

Every architecture decision I've ever regretted started the same way: I fell in love with a solution before I…


Activity

Subramaniyam Venkata Pooni posted this · 3h
I arrived in America with two suitcases and a dream.

The formula: arrive with almost nothing, work hard, achieve a title. The implication: the system works, and if it didn't work for you, look inward.

Why not an H-1B, a signed offer letter, a relocation stipend, and a cousin with a spare bedroom in Fremont? But those don't fit the template. LinkedIn has turned the immigrant story into a Mad Libs exercise: [number] suitcases, $[small number] in my pocket, [one-way ticket / borrowed coat], and now [C-suite title]. Cue 40,000 likes.

Three problems with this narrative:

1. Survivorship bias. For every two-suitcase VP, thousands arrived with the same suitcases and got ground down by visa limbo, credential non-recognition, or plain bad luck. Their stories don't trend.
2. Omitted variables. Many tellers arrived with elite degrees, employer sponsorship, and professional networks. The suitcases were light; the invisible capital was not.
3. It weaponizes struggle. Compressing decades of structural difficulty into an inspirational hook trains audiences to see immigration as an individual grit test rather than a policy environment, one where a decades-long green card backlog can define an entire career.

Immigrant stories deserve telling. But honest ones include the H-1B handcuffs, the country-cap queues, the years of deferred agency. That version is less viral and more true.

Here's the thing: the real immigrant story is harder AND less cinematic. It's credential re-verification. It's visa renewals that hold your career hostage for a decade. It's watching younger colleagues get promoted while your green card sits in a queue. It's your kids translating for you at the DMV. Nobody posts that version. It doesn't fit in a hook line.

So the next time you see 'two suitcases and a dream', ask what got edited out. Usually, it's the parts that would actually teach you something.

#immigration #careers #authenticity
2 comments

Subramaniyam Venkata Pooni shared this · 8h
Algorithms Are Logarithmic, AI Is Quadratic: The Hidden Cost of Long Context Windows

"Attention is quadratic: cost per token scales with sequence length; long context is expensive for structural reasons, not pricing choice." KV cache, positional generalization, and effective-vs-advertised are parallel constraints.

Core thesis: "The 1M token window is the megapixel race of LLMs: every spec sheet has one, almost no real workload uses it." Advertised capacity ≠ effective context; production long-context is an architecture problem, not a model problem.

Long context vs RAG: not competitors. Retrieval + reranking feeding a capped window beats stuffing 800K tokens of structured noise. Long context wins only for coherent artifacts needing whole-document structure (contracts, codebases).

Failure modes: lost-in-the-middle (20–40% mid-context accuracy drop), multi-needle/multi-hop collapse at 1M.

RAG not obsolete; 32–128K multi-zone for a 200-page contract; 10M advertised but no public evidence of quality workloads near it; build domain-specific multi-needle tests. "95% of real workloads live under 128K."

https://lnkd.in/gGgzyRvG
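The structural cost the post describes can be put into rough numbers. The sketch below is illustrative only: the `ModelShape` dimensions are assumed values for a mid-sized decoder with grouped-query attention, not figures from the post, and the FLOP count ignores causal masking and every non-attention layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    # Assumed, illustrative dimensions (not taken from the post).
    n_layers: int = 32
    n_heads: int = 32
    n_kv_heads: int = 8
    head_dim: int = 128
    kv_bytes: int = 2  # bytes per cached value (fp16 / bf16)


def attention_score_flops(seq_len: int, m: ModelShape) -> int:
    """Approximate prefill FLOPs for QK^T plus the attention-weighted sum over V.

    Each of the two matmuls costs about 2 * seq_len^2 * head_dim per head,
    so the total grows with the square of the sequence length.
    """
    per_head = 2 * 2 * seq_len * seq_len * m.head_dim
    return per_head * m.n_heads * m.n_layers


def kv_cache_bytes(seq_len: int, m: ModelShape) -> int:
    """KV cache size: keys and values for every layer, KV head and position."""
    return 2 * m.n_layers * m.n_kv_heads * m.head_dim * m.kv_bytes * seq_len


shape = ModelShape()
for tokens in (8_192, 131_072, 1_048_576):
    flops = attention_score_flops(tokens, shape)
    cache_gib = kv_cache_bytes(tokens, shape) / 2**30
    print(f"{tokens:>9} tokens: {flops:.3e} attention FLOPs, {cache_gib:.1f} GiB KV cache")
```

Under these assumed dimensions the cache costs 128 KiB per token, so a 128K-token context needs 16 GiB for the KV cache alone, while doubling the context quadruples the attention FLOPs.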

Subramaniyam Venkata Pooni shared this · 3d
Benchmark: vLLM vs TensorRT-LLM on DGX Spark (GB10)

Model: Llama-3.1-8B · FP4 · 600-token generation · clock-capped @ 2200 MHz
Two independent automated runs.

RUN 1
vLLM (NVFP4): single-stream 39.2 tok/s, concurrent (16) 519.1 tok/s
TensorRT-LLM (FP4): single-stream 19.6 tok/s, concurrent (16) 624.6 tok/s

RUN 2
vLLM (NVFP4): single-stream 40.9 tok/s, concurrent (16) 540.1 tok/s
TensorRT-LLM (FP4): single-stream 19.8 tok/s, concurrent (16) 615.5 tok/s

THE TAKEAWAY
Each engine wins a different category, and the split held across both runs:
vLLM → ~2× faster single-stream latency (~40 vs ~20 tok/s)
TensorRT-LLM → ~15% higher batched throughput (~620 vs ~530 tok/s)

vLLM is the pick for interactive / single-user workloads (chat, coding assistants). TensorRT-LLM is the pick for high-concurrency serving and batch pipelines.

At 8B, both converge to a similar concurrent ceiling because, on the GB10, LLM inference is memory-bandwidth bound (~273 GB/s) and both engines saturate the same bus. The "most optimized engine" doesn't automatically win; match the engine to whether you're latency- or throughput-bound.

Method: fixed prompt, 600-token cap, OpenAI-compatible endpoint, engines served one at a time (each with the full GPU), averaged over repeated automated runs. Clock-capped at 2200 MHz for a fair, thermally-stable comparison.

#AIInfrastructure #LLM #NVIDIA #DGXSpark #Inference #Quantization #MLOps #EdgeAI
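The ratios in the takeaway can be recomputed from the per-run figures. The sketch below averages the two runs and adds a rough bandwidth ceiling. The ceiling is an assumption-laden estimate, not part of the post's method: it treats the model as exactly 8 billion parameters at 0.5 byte each and ignores KV-cache reads, quantization scales and activations.

```python
from statistics import mean

# Tokens/s figures copied from the two runs reported in the post.
RUNS = {
    "vLLM": {"single": [39.2, 40.9], "concurrent16": [519.1, 540.1]},
    "TensorRT-LLM": {"single": [19.6, 19.8], "concurrent16": [624.6, 615.5]},
}


def averages(runs):
    """Mean tokens/s per engine and mode across runs."""
    return {
        engine: {mode: mean(values) for mode, values in modes.items()}
        for engine, modes in runs.items()
    }


def bandwidth_ceiling(bandwidth_gb_s, params_billion, bytes_per_param):
    """Upper bound on single-stream tokens/s if every token re-reads all weights.

    Assumption: ignores KV-cache reads, quantization scales and activations.
    """
    weight_gb = params_billion * bytes_per_param
    return bandwidth_gb_s / weight_gb


avg = averages(RUNS)
single_speedup = avg["vLLM"]["single"] / avg["TensorRT-LLM"]["single"]
batch_gain = avg["TensorRT-LLM"]["concurrent16"] / avg["vLLM"]["concurrent16"] - 1.0
ceiling = bandwidth_ceiling(273.0, 8.0, 0.5)  # 4-bit weights: 0.5 byte per parameter

print(f"vLLM single-stream speedup: {single_speedup:.2f}x")
print(f"TensorRT-LLM batched gain:  {batch_gain:.1%}")
print(f"Bandwidth-bound ceiling:    {ceiling:.2f} tok/s per stream")
```

Averaged over both runs, the single-stream ratio comes out at about 2.03x and the batched gain at about 17%, a little above the rounded ~15% quoted. The idealized ceiling of 68.25 tok/s sits above the measured ~40 tok/s, which is the direction expected once the ignored traffic is counted.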
3 comments

Subramaniyam Venkata Pooni reposted this

Sunil Baliga · 5d

Today we achieved two key milestones:

(1) We launched self-service. Customers can sign up to use our LAKEer (natural language search of unstructured data) and SQLer (natural language-to-SQL generation) agents via our website. They can start with our free plan and then move to a paid plan as their requirements grow.

We believe trustworthy answers are of critical importance to enterprise users. Earlier this year, LAKEer achieved an industry-leading score of 77.7 on the Google DeepMind FACTS Grounding Benchmark, which measures factual accuracy and grounding, a critical requirement for enterprise AI, where incorrect answers can lead to costly mistakes. DaaX achieves this accuracy by giving LLMs a helping hand, combining them with knowledge graphs, reasoners, verifiers, domain knowledge, and company language (neuro-symbolic AI). Domain knowledge is the second milestone we achieved today.

(2) We launched an ecosystem where partners can monetize their domain knowledge by developing Domain Knowledge Cartridges for use with our platform. These cartridges provide domain knowledge to our tech. We announced Fission Labs (a leader in applied AI engineering) as our inaugural partner for the Domain Knowledge Cartridges ecosystem. They plan to develop cartridges for use with LAKEer and SQLer. Cartridges our partners develop are their IP, and they can monetize them as they want. Two cartridges developed by DaaX (for eCommerce and Oil & Gas) are available today.

We have some more exciting announcements in the pipeline. Stay tuned!

Subramaniyam Venkata Pooni reposted this

DaaX.ai · 5d

DaaX Launches Self-Service AI, Powered by Domain Knowledge Cartridges for Trustworthy Answers

Customers can sign up at www.daax.ai and start using LAKEer (natural language search of unstructured data) and SQLer (natural language to SQL generation) in minutes, beginning with a free plan, with no sales calls or procurement cycles.

DaaX also opens our platform for partners to build and sell Industry AI knowledge packages. Partners can now monetize their industry-specific expertise for use with our benchmark-proven enterprise AI platform.

Read the full press release at https://lnkd.in/gp4QwMpy

Most Trustworthy Agentic Enterprise Search

2 comments
Subramaniyam Venkata Pooni shared this · 1w

This week's Token Drop podcast: Who Decides What's "Safe" AI? Ethics, Trust & the Global Language Divide

Dr. Sarah Luger, AI safety and evaluation expert, generative AI research director, University of Edinburgh PhD, and a contributor to MLCommons' AILuminate benchmark, who earlier in her career worked on IBM Watson's Jeopardy! Challenger, was this week's Token Drop guest panelist. We trace how the industry's language has shifted from ethics to responsible AI to trust and transparency to safety and security, and why that shift matters.

Listen at https://lnkd.in/gwwksuHH
Dr. Luger's profile: https://lnkd.in/g2sJ8wWc

#artificialintelligence

1 comment
Subramaniyam Venkata Pooni reposted this

SVCE Alumni Association Sri Venkateswara College of Engineering (AASVCE) · 1w

One of ours. A Padma Shri awardee. Heartiest congratulations to Prof. Dr V. Kamakoti (Batch of 1989), Director of IIT Madras, on being conferred the Padma Shri by the Government of India. The SVCE alumni community is delighted to see one of our distinguished alumni receive this well-deserved recognition. #PadmaShri #SVCEAlumni #AlumniAchievement

18 comments
Subramaniyam Venkata Pooni shared this · 1w

CPU: AMD RYZEN THREADRIPPER PRO 9995WX
M/B: ASUS PRO WS WRX90E-SAGE SE
RAM: SAMSUNG DDR5-5600 ECC/REG 1TB (128GB x 8)
SSD: SAMSUNG 990 PRO M.2 NVMe 4TB
VGA: NVIDIA H200 141GB NVL (4EA)
POWER: CORSAIR WS3000 ATX3.1 (2EA)
CASE: Phanteks ENTHOO PRO 2 T
https://lnkd.in/gRZcTcBC

$150,000 The world's best workstation

4 comments

Subramaniyam Venkata Pooni replied to a comment · 2h

Fair question, Garth — and honestly, no. It's harder today. The two-suitcase era stories mostly date from when an H-1B was near-certain and a green card took 2-3 years. Today the H-1B is a lottery with ~25% odds, the India EB-2/EB-3 backlog runs into decades, and a layoff gives you 60 days to find a sponsor or leave. So if anything, the myth is more misleading now: the survivors who post these stories climbed a ladder that's since been pulled up several rungs. That's exactly why I think the honest version matters more than the cinematic one. I have students who've completed their master's degrees reaching out to me for help securing jobs — they have to leave the country within a month.

Subramaniyam Venkata Pooni replied to a comment · 2d

Deepak Chaturvedi Thank you — fair pushback, and you're right on the omission. I held the simple case on purpose (single model, no spec decoding, stock CUDA-graph config) to isolate the quantization/engine/kernel variables, but that means the piece undersells the device. Speculative decoding is the clearest gap. On a bandwidth-bound box it's arguably the lever — a small draft model proposes tokens the target verifies in one memory pass, spending spare compute to avoid weight reads, exactly where this hardware is constrained. A 27B target with a small draft is the right next experiment, and I'll run it. On stacking: my read is it helps capacity and multi-service hosting (two 273 GB/s domains over the CX-7 link) but not single-model bandwidth — you don't sum the buses for one model. Agree the production gap is real: an LLM + embedder + reranker + vector store fills 128 GB far faster than any single-model benchmark suggests.
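The draft-and-verify loop described in this reply can be sketched with toy stand-ins for the two models. Everything here is hypothetical: `target` and `draft` are plain functions mapping a token sequence to the next token, decoding is greedy, and the target is called once per position for readability, whereas a real engine would score all proposed positions in a single batched forward pass.

```python
from typing import Callable, List, Sequence, Tuple

NextToken = Callable[[Sequence[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     context: List[int], k: int) -> List[int]:
    """One round of greedy speculative decoding.

    The draft proposes k tokens autoregressively. The target checks each
    position, keeps the longest agreeing prefix, and then contributes one
    token of its own (a correction, or a bonus token if all k matched).
    """
    proposal: List[int] = []
    for _ in range(k):
        proposal.append(draft(context + proposal))

    accepted: List[int] = []
    for tok in proposal:
        expected = target(context + accepted)
        if expected != tok:
            accepted.append(expected)  # the target's correction ends the round
            return accepted
        accepted.append(tok)
    accepted.append(target(context + accepted))  # bonus token after full acceptance
    return accepted


def generate(target: NextToken, draft: NextToken, prompt: List[int],
             n_tokens: int, k: int = 4) -> Tuple[List[int], int]:
    """Generate n_tokens and report how many verification rounds were needed."""
    out = list(prompt)
    rounds = 0
    while len(out) - len(prompt) < n_tokens:
        out.extend(speculative_step(target, draft, out, k))
        rounds += 1
    return out[len(prompt):len(prompt) + n_tokens], rounds


def toy_target(seq: Sequence[int]) -> int:
    """Toy target model: counts upward modulo 10."""
    return (seq[-1] + 1) % 10


def toy_draft(seq: Sequence[int]) -> int:
    """Toy draft model: agrees with the target except after token 4."""
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


tokens, rounds = generate(toy_target, toy_draft, [2], n_tokens=12)
print(tokens, "in", rounds, "verification rounds")
```

The output is identical to what the target would produce on its own under greedy decoding. The benefit is that several tokens are committed per target verification round, which matters when each pass over the weights is the bottleneck.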

Subramaniyam Venkata Pooni commented on a post · 4d

Huge day for daax.ai


Experience & Education

CS²B TECHNOLOGIES INC

GATE score CSE 98%


Licenses & Certifications

AI Agent Engineering: From ReAct, Agentic RAG to Multi-Agent Orchestration
Maven
Issued Oct 2025
Credential ID ejRjKlLJ

VMware by Broadcom Center for Advanced Learning ACE 2025
Broadcom
Issued Apr 2025

VMware by Broadcom, Center for Advanced Learning AAC 2025
Broadcom
Issued Apr 2025

Technical Deep Dive Livefire: VCF Multi-location Implementation (NSX Federation)
Broadcom
Issued Mar 2025

Technical Deep Dive Livefire: Mastering Aria Automation with Deployment Placement & YAML Template Configuration
Broadcom
Issued Feb 2025

Technical Deep Dive Livefire: VCF Networking Design
Broadcom
Issued Feb 2025

VCF Livefire: Infrastructure and Automation
Broadcom
Issued Jan 2025

Patents

Backup procedure with transparent load balancing
Issued February 12, 2013 · US 8,375,396
De-duplication/Chunking Patent

Method and apparatus for generating persistent path identifiers
Issued October 27, 2009 · US 7,610,295
Resource Unique Identifiers

Method and apparatus for identifying multiple paths to discovered SCSI devices and specific to set of physical path information
Issued February 20, 2007 · US 7,181,553
Multi-Path Detection Patent

Method and apparatus for identifying multiple paths to a SCSI device using a calculated unique identifier
Issued June 27, 2006 · US 7,069,354
Multi-Path Detection Patent

Method and arrangement for communicating with SCSI devices
Issued August 23, 2005 · US 6,934,711
LUN Addressing Patent

Method and arrangement for dynamic detection of SCSI devices on linux host
Filed October 1, 2002 · US 10/260,419
Dynamic Device Detection

Courses

Advanced Programming with Python by David Beazley
Functional programming in Scala by John De Goes
Implementing a Raft Consensus Algorithm
The art of Functional Design by John De Goes
Write a Compiler (in Python) by David Beazley

Projects

🚀Crusty Lox Interpreter in Rust [Based on Crafting Interpreters by Bob Nystrom]

Dec 2024 - Present

Description:
Reimplemented a tree-walking interpreter for the Lox language using idiomatic Rust, emphasizing functional programming techniques and immutability over OOP. The project explores compiler construction fundamentals: lexing, parsing, AST generation, and interpretation with proper error handling and runtime environments.

Key Features:
Scanner / Lexer: Tokenizes Lox source using Rust string manipulation and pattern matching
Parser: Constructs ASTs from token streams with recursive descent techniques
Evaluator: Implements tree-walk evaluation supporting expressions, variables, functions, and control flow
Environment: Mutable runtime state using nested HashMap environments (scope chains)
Project Architecture: Modular design using idiomatic Rust crates and cargo-based organization
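The scope-chain idea behind the Environment feature can be sketched briefly. This is a language-neutral illustration in Python, not the project's Rust code (the repository is private). The class and method names are hypothetical, loosely following the vocabulary of Crafting Interpreters.

```python
from typing import Any, Dict, Optional


class Environment:
    """One scope: local bindings plus a link to the enclosing scope."""

    def __init__(self, enclosing: Optional["Environment"] = None) -> None:
        self.values: Dict[str, Any] = {}
        self.enclosing = enclosing

    def define(self, name: str, value: Any) -> None:
        """Create (or shadow) a binding in the current scope only."""
        self.values[name] = value

    def _find(self, name: str) -> "Environment":
        """Walk outward through the scope chain to the scope that owns name."""
        env: Optional[Environment] = self
        while env is not None:
            if name in env.values:
                return env
            env = env.enclosing
        raise NameError(f"Undefined variable '{name}'.")

    def get(self, name: str) -> Any:
        return self._find(name).values[name]

    def assign(self, name: str, value: Any) -> None:
        """Update an existing binding in whichever scope defined it."""
        self._find(name).values[name] = value


globals_env = Environment()
globals_env.define("counter", 0)
block = Environment(enclosing=globals_env)
block.define("label", "inner")
block.assign("counter", 1)  # resolves to the global binding
print(globals_env.get("counter"), block.get("label"))
```

Lookup walks from the innermost scope outward, so an inner `define` shadows an outer name without touching it, while `assign` mutates the scope that owns the binding.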

Rust Skills Developed:

Ownership and lifetimes in recursive data structures
Enums, pattern matching, and traits
Functional thinking and iterator combinators
Modularization and test-driven development

References:
Repo: github.com/SamPooni/crusty_interpreter [private]
Based on Crafting Interpreters [Python-based evaluator → translated to idiomatic Rust]

LLMOps Frameworks | Prompt Engineering | RAG Observability

Jan 2025 - Aug 2025

✅ Led design of proprietary prompt-based LLM finetuning pipelines for Mistral-7B, Phi-2, and Gemma models, leveraging RLHF-based pairwise prompt evaluations. Integrated runtime observability for every step—from embedding generation to output hallucination scoring.
✅ Designed CI/CD workflows for chunking, embedding, inference retry, and telemetry rollback via GitLab + Argo-based pipelines. Benchmarked structured prompt extractors with FastAPI, LangChain, and pgvector-backed retrieval systems.
✅ Delivered GPU-aware observability modules to trace vRAM fragmentation, token throughput, and latency spikes—instrumented using Prometheus exporters and real-time dashboards in Grafana.
✅ Introduced model versioning logic and signature hash matching to automate compatibility and rollback decisions during deployment of multi-tenant RAG-based GenAI systems.
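One way signature hash matching could drive a rollback decision is sketched below. It is an assumed design, not the implementation described above: the metadata fields (`embed_dim`, `tokenizer`), the release records and the `pick_rollback` helper are all hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def signature(interface: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding of a model's interface metadata."""
    canonical = json.dumps(interface, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def pick_rollback(history: List[Dict[str, Any]], required_sig: str) -> Optional[str]:
    """Return the newest release whose interface signature matches required_sig.

    history is ordered oldest to newest. None means no compatible release exists.
    """
    for release in reversed(history):
        if signature(release["interface"]) == required_sig:
            return release["version"]
    return None


# Hypothetical release history for an embedding model behind a RAG index.
history = [
    {"version": "v1", "interface": {"embed_dim": 768, "tokenizer": "tok-1"}},
    {"version": "v2", "interface": {"tokenizer": "tok-1", "embed_dim": 768}},
    {"version": "v3", "interface": {"embed_dim": 1024, "tokenizer": "tok-2"}},
]

# The deployed vector index was built against the v1 interface.
index_sig = signature(history[0]["interface"])
print(pick_rollback(history, index_sig))
```

Serializing with sorted keys makes the hash independent of field order, so two releases with the same interface always compare equal, and the rollback target is the newest release that the existing index can still serve.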

Key KPIs & Business Impact: Led prompt-based fine-tuning of LLMs like Mistral-7B, Phi-2, and Gemma, achieving a 30% improvement in response consistency using RLHF-driven evaluation pipelines. Built 4× faster CI/CD workflows for chunking, embedding, inference retries, and telemetry rollback using GitLab and Argo. Delivered 100% runtime traceability of token throughput, vRAM fragmentation, and latency spikes through Prometheus and Grafana. Introduced automated versioning and signature-based hash matching to enable zero-downtime rollback, while inference retry mechanisms maintained <10s recovery time, enhancing reliability of multi-tenant RAG deployments.

AI Performance Engineering | HPC Systems | LLM Inference Optimization

Jun 2024 - Aug 2025

✅ Currently optimizing AI performance for both training and inference workloads across large-scale LLMs such as Llama 3.1, Llama 2–70B, Mixtral, BERT, ResNet, 3D U-Net, and Stable Diffusion using high-performance Supermicro and Dell GPU clusters. Tuning involves TensorRT, vLLM, Triton, and vCUDA workflows, aligned with BIOS, NUMA, storage tiering, and memory management optimizations.
✅ Custom performance tuning on NVIDIA (B200, GH200, H100), AMD (MI350X, MI325X, MI300X), and Intel Xeon platforms, with submitted MLPerf benchmarks demonstrating up to 3x model throughput gains over baseline.
✅ Integrated end-to-end performance tracing pipelines using Prometheus, OpenSearch, and Grafana for AI inference and training workloads, enabling real-time observability on GPU utilization, batch latencies, memory fragmentation, and inference token rates.
✅ Developed prompt optimization and RLHF pipelines for LLM evaluation and fine-tuning using LoRA/PEFT, including architecture comparisons between prompt-based tuning and RAG-based grounding with vector retrieval systems.
✅ Engineered AI workload bring-up flows from scratch, including prompt injection, self-healing batch retries, model profiling, GPU saturation detection, and fine-tuning with hardware-aware compilers (ONNX, MLIR).

Key KPIs & Business Impact: Delivered up to 3× throughput improvements in MLPerf benchmarks across NVIDIA (B200, GH200, H100) and AMD (MI350X, MI300X) platforms through BIOS, NUMA, and batch-size tuning. Achieved 40% reduction in latency jitter and 2× faster LLM fine-tuning using LoRA with ONNX and MLIR-based compilers. Engineered full-stack automation to bring up models in under 2 minutes, and implemented self-healing inference pipelines with a 95% success rate. Deployed real-time GPU observability using Prometheus, Grafana, and OpenSearch, enabling 100% visibility into token rates, memory fragmentation, and GPU utilization.

AIaaS for CSPs Expertise

Jun 2024 - Aug 2025

✅ GenAI Platform Automation: Enabled delivery of VMware Private AI Foundation with NVIDIA (PAIF-N) to empower Cloud Service Providers (CSPs) to monetize Generative AI with a production-ready, multi-tenant platform.
✅ AIaaS Monetization Layers:
   GPU as a Service (GaaS): GPU-backed VM rental
   AI PaaS: pre-configured DL environments (Jupyter, PyTorch, Triton)
   Model-as-a-Service (MaaS): hosted inference APIs (LLaMA, Falcon, Mixtral)
   AI Applications: custom chatbots, document agents, RAG assistants
✅ RAG Stack CI/CD: Authored SDKs and pipelines for chunking, embeddings, inference retries, rollback, telemetry

Key KPIs & Business Impact: less than 2 min to deploy full LLM + RAG pipeline, ~40% margins vs. 10–15% for traditional IaaS, zero-touch provisioning of GPU-backed AI workstations, 100% model traceability via GitLab + Harbor, 95%+ developer adoption of platform tools, 3x faster time-to-revenue via pre-integrated GenAI stack

LLM Agent Programming

Jun 2024 - Aug 2025

✅ Engineered modular agents using LangChain, Triton, FastAPI, and pgvector for RAG-backed assistants and structured extractors.

Key KPIs & Business Impact: 95%+ task success, less than 2s latency, 80%+ tool accuracy, 50% fewer hallucinations, 3× code reuse, 10K+ tool calls/week, full observability via logging/tracing

VMware Aria Automation | Multi-Cloud IaC & CI/CD | Hybrid Cloud

Dec 2020 - Aug 2025

✅ Designed and deployed advanced automation using VMware Aria Automation 8.x, with high availability (HA) clustering, SAML/LDAP-based identity integration, and RBAC controls. Architected self-service hybrid cloud platforms across VCF, AWS, Azure, and GCP using Aria Service Broker and dynamic NSX-T policies.
✅ Enabled full observability using Aria Operations, authoring lightweight collectors and telemetry agents in C++ and Python for performance metrics.

Key KPIs & Business Impact: Drove CI/CD automation, multi-vCenter orchestration, and post-sales technical governance across Tier-1 telco operators in the Americas. $50M+ SOWs influenced, 100% 5G Open RAN success, 30%+ faster TTM, 20+ exec sessions, 50+ CI/CD flows, 50+ CNFs onboarded, 90%+ SLA compliance, 3× adoption growth, 5+ roadmap wins, less than 10% post-deploy issues

Compiler for WebAssembly based AI Edge Inference (Python, LLVM, MLIR)

Nov 2019 - Dec 2020

✅Designed and implemented a custom MLIR-based compiler in Python targeting WebAssembly (WASM) for browser-native ML inference with near-native performance.

✅Created a statically typed, C-like DSL supporting LLVM IR and WASM backends, enabling developers to write edge ML logic that compiles to highly efficient bytecode.

✅Delivered a full compiler toolchain: front-end parser, transpiler to C, LLVM codegen, and a runtime interpreter, achieving performance comparable to hand-tuned C++.

✅Use cases include IoT analytics, offline AI agents, and real-time edge inferencing in constrained environments like browsers and embedded devices.

References:
Repo: https://github.com/SamPooni/compilers [private]

🚀Scalable Distributed ML Parameter Server (Python, Asyncio, Raft)

Sep 2019 - Dec 2020

✅Engineered a high-performance distributed Parameter Server in Python for scalable ML training across multi-node clusters, supporting tens of thousands of parameter updates per second.

✅Architected a fault-tolerant key-value store with Raft-based consensus for leader election, replication, and dynamic node membership, ensuring strong consistency and high availability.

✅Integrated priority-aware scheduling for gradient aggregation and asynchronous messaging via asyncio, improving inter-node throughput by 40%+ under peak load.

✅Designed for seamless integration with PyTorch/TensorFlow training loops and extensible for federated learning or reinforcement learning workloads.

References:
Repo: https://github.com/SamPooni/pyraft [private]
Java Language new Features Experimentation

May 2020 - Jul 2020

1. Design Patterns: Applying Powerful Design Ideas
2. Scala Essentials: The Intriguing Parts
3. Functional Programming in Java: Creating Maintainable Code
4. Java Modules: From Legacy to Modularized Code
5. The New Java: Languages and JDK Features from 9 to 14
Applied Research & Development in Software Design Patterns and Testing Frameworks in Python

Apr 2020 - Jun 2020

Designed and developed a modular, high-impact software engineering framework focused on pragmatic, scalable programming practices. The project explored how to build complex systems by focusing on composability, interface design, layered abstractions, and testable architecture—rather than language or framework specifics. It integrated functional, object-oriented, and event-driven paradigms into a cohesive design philosophy.

The core components the project concentrates on are: Data Abstraction Layer, Interface Contracts, Compositional Class Architectures, Reactive Event Systems, Functional Primitives, Verification & Test Harness, and Problem-Driven Design Process.

References:
Repo: https://github.com/SamPooni/advanced_python_programming [private]

Honors and Awards

Certificate of Outstanding Contributions and Innovation

Huawei, New Jersey Research Center

Dec 2018

In recognition of outstanding contributions to AI in the wireless space

Recommendations received

18 people have recommended Subramaniyam Venkata Pooni

Posts by Subramaniyam Venkata Pooni

Speculative Decoding on the DGX Spark (GB10)

Flash Attention kernel in NVIDIA's new cuTile Python DSL

What a $4,000 Desktop Supercomputer Can Actually Do Benchmarking LLM Inference on the NVIDIA DGX Spark (GB10)

Own the Stack, Rent the Frontier: The Case for a Personal AI Computer

The $3.6 Trillion Question Nobody on the Roadshow Wants Asked

The Token Bill Comes Due

Smart Agents, No Kernel: Why Enterprise AI Needs an Operating System

Your SOC Was Built for a World That's Ending

Measuring the Quality of Software Design

Picking an Architecture Without Lying to Yourself

Experience and Education

CS²B TECHNOLOGIES INC

Licenses and Certifications

Patents

US 8,375,396, issued February 12, 2013

US 7,610,295, issued October 27, 2009

US 7,181,553, issued February 20, 2007

US 7,069,354, issued June 27, 2006

US 6,934,711, issued August 23, 2005

US 10/260,419, filed October 1, 2002

Courses

Advanced Programming with Python by David Beazley

Functional Programming in Scala by John De Goes

Implementing a Raft Consensus Algorithm

The Art of Functional Design by John De Goes

Write a Compiler (in Python) by David Beazley

Projects

🚀Crusty Lox Interpreter in Rust [Based on Crafting Interpreters by Bob Nystrom]

Dec 2024 - Present

LLMOps Frameworks | Prompt Engineering | RAG Observability

Jan 2025 - Aug 2025

AI Performance Engineering | HPC Systems | LLM Inference Optimization

Jun 2024 - Aug 2025

AIaaS for CSPs Expertise

Jun 2024 - Aug 2025

LLM Agent Programming

Jun 2024 - Aug 2025

VMware Aria Automation | Multi-Cloud IaC & CI/CD | Hybrid Cloud

Dec 2020 - Aug 2025

Compiler for WebAssembly based AI Edge Inference (Python, LLVM, MLIR)

Nov 2019 - Dec 2020

🚀Scalable Distributed ML Parameter Server (Python, Asyncio, Raft)

Sep 2019 - Dec 2020

Java Language new Features Experimentation

May 2020 - Jul 2020

Applied Research & Development in Software Design Patterns and Testing Frameworks in Python

Apr 2020 - Jun 2020

Honors and Awards

Certificate of Outstanding Contributions and Innovation

Huawei, New Jersey Research Center

Recommendations received

Paul-André Raymond

Jason Whitt
