Kundan Kumar

Curriculum Vitae

Professional Experience

National Renewable Energy Laboratory (NREL)

Machine Learning Engineer (Intern), May 2024 – Jan 2025
  • Developed novel machine learning models for automated network topology inference and resilient control policy optimization for complex distributed systems under extreme and uncertain operating scenarios.
  • Designed and implemented semi-supervised learning frameworks based on Bayesian neural networks with explicit uncertainty estimation, enabling learning from limited and unreliable labeled data in energy distribution networks and achieving up to 98% improvement in model accuracy across varying label-availability and data-quality settings.
  • Co-authored the paper “Advanced Semi-Supervised Learning with Uncertainty Estimation for Phase Identification in Distribution Systems,” accepted at IEEE Power & Energy Society General Meeting (PES GM) 2025, demonstrating robust phase identification under noisy measurements and scarce ground-truth labels.
  • Deployed the trained ML models into a practical workflow for distribution utilities, including data preprocessing pipelines, model serving, and evaluation; implemented containerized deployment and API-based integration with existing grid analytics tools to enable reproducible, scalable, and operator-ready decision support.

Comcast

Software Engineer, Jul 2019 – Feb 2020
  • Built and scaled real-time data pipelines using Amazon Kinesis, RabbitMQ, and microservices, processing 1 TB+ of streaming data per day to support fraud detection and system monitoring at production scale.
  • Developed and deployed ML-based anomaly and behavior-analysis models, incorporating temporal features and probabilistic scoring, leading to a 70% reduction in fraudulent activity through early-signal detection.
  • Delivered low-latency analytics dashboards with PrestoDB, Athena, and Python, enabling real-time visibility into network performance and fraud patterns and improving cross-team decision speed and operational response.
  • Collaborated with cross-functional teams (security, data engineering, product) to integrate fraud-intelligence signals into production systems, improving detection latency, system resilience, and customer-impact mitigation.

IBM

Software Engineer, Jan 2019 – Jun 2019
  • Optimized large-scale cloud infrastructure on OpenShift, implementing adaptive auto-scaling and resource-allocation policies that reduced operational costs by 30% while improving system stability.
  • Built a real-time monitoring and observability platform using Grafana, Flask, and distributed exporters, providing actionable visibility across 100+ cloud servers and critical microservices.
  • Designed and automated performance-monitoring and alerting pipelines, cutting incident response time by 60% through intelligent thresholds, anomaly alerts, and integrated on-call workflows.

Hewlett Packard Enterprise (HPE)

Software Engineer, Apr 2017 – Dec 2018
  • Led the zero-downtime migration of critical enterprise applications from HPI to HPE domains, coordinating infrastructure, DNS, and service-cutover workflows to ensure seamless continuity for all users.
  • Built and integrated OAuth 2.0–based authentication and secure REST APIs using Spring Boot, hardening access controls and improving reliability for applications serving 50K+ active users.
  • Designed and deployed a microservices architecture across Apache and WebLogic servers, optimizing service boundaries and request flows to achieve a 40% improvement in system response time.

Tata Consultancy Services (TCS)

System Engineer, Jul 2012 – Dec 2015
  • Engineered high-throughput ETL pipelines for large-scale data-warehouse integration, reliably processing 100 GB+ of data per day and improving downstream analytics latency.
  • Optimized database performance through advanced SQL tuning, indexing, and query-plan diagnostics, reducing execution times by 70% across critical workloads.
  • Delivered $100K in annual cost savings by leading database-optimization and storage-efficiency initiatives, earning an organizational Excellence Award for impact on operational efficiency.

Education

Iowa State University

Ph.D. in Computer Science (Minor: Statistics), 2020 – 2025 (Expected)
  • Research Focus: Deep Reinforcement Learning, Physics-Informed AI, Uncertainty Quantification, Bayesian Modeling, Secure & Robust Learning, and LLM-Driven Autonomous Agents. Work spans critical-infrastructure optimization, safety-critical control, and large-scale distributed systems.
  • Technical Contributions: Developed physics-informed DRL algorithms for real-time control, Bayesian neural networks with uncertainty estimation for limited/unreliable data, adversarial robustness frameworks for cyber-physical systems, and LLM-augmented decision-making for energy and autonomous systems.
  • Graduate Coursework:
    • Deep Learning, Natural Language Processing, Advanced Machine Learning, Computer Vision, AI for Cybersecurity, Algorithms, Database Systems, Computer Networking
    • Statistical Theory, Empirical Methods, Experimental Design, Data Analysis and Visualization

Teaching Experience

Iowa State University

Teaching Assistant, 2020 – 2025

Department of Computer Science

  • Supported multiple undergraduate and graduate courses, including Software Development Practices, Database Systems, and Spreadsheets, reaching 300+ students across semesters.
  • Led weekly labs and office hours, guiding students through debugging, system design reasoning, data modeling, and code quality challenges; strengthened problem-solving skills across diverse cohorts.
  • Designed programming assignments, quizzes, and real-world case studies aligned with industry workflows, agile practices, and modern software engineering tools (Git, CI/CD, databases).
  • Mentored students on semester-long capstone projects, coaching teams on architecture decisions, sprint planning, testing, and documentation to simulate professional engineering environments.

Research Experience

Iowa State University

Research Assistant, Aug 2022 – Jul 2025

Physics-Informed Deep Reinforcement Learning for Critical Infrastructure Systems

  • Conducted research on physics-informed deep reinforcement learning (DRL) for large-scale distributed networks, advancing intelligent resource management and security for critical infrastructure.
  • Applied computational DRL algorithms in smart energy systems, optimizing real-time control policies to minimize voltage violations, reduce power loss, and improve system stability across diverse operating conditions.
  • Developed physics-informed actor–critic algorithms that embed domain constraints into the learning process, achieving 30% higher resource-allocation efficiency and significantly reducing violations in complex networks.
  • Designed adversarial attack detection and mitigation frameworks for grid-scale AI models, performing systematic stress testing and implementing defensive DRL techniques to enhance robustness against security threats.
  • Created transfer-learning pipelines enabling DRL agents to generalize across networks of different sizes and topologies, cutting retraining time by 40% when deploying to new environments.
  • Built a Python-based simulation and real-time control framework integrating OPAL-RT hardware with OpenDSS and distributed system components to support HIL (hardware-in-the-loop) experiments.
  • Integrated LLM-driven reasoning to support adaptive control, predictive optimization, and human-AI collaboration inside simulation environments, improving interpretability and situational awareness.

Research Assistant, Aug 2020 – Jul 2022

Deep Reinforcement Learning & Safety-Critical AI for Autonomous Systems

  • Conducted research on deep reinforcement learning and safety-critical autonomy, focusing on robust perception, control, and sequential decision-making in high-stakes environments.
  • Used the CARLA simulator to develop end-to-end autonomous driving stacks, including vision-based perception, object detection, trajectory planning, and policy learning under complex traffic dynamics.
  • Applied deep computer vision models for object recognition, semantic segmentation, and multi-sensor fusion, enabling reliable situational awareness and improving downstream control performance in autonomous driving systems.

Skills

Programming
Python, R, Java, C++, SAS, MATLAB, SQL, HTML/CSS, JavaScript (Node.js, React)
Machine / Deep Learning
scikit-learn, TensorFlow, PyTorch, pandas, NumPy, Matplotlib, Seaborn, Gym, RLlib, Stable-Baselines3
LLMs & NLP
OpenAI API, Hugging Face Transformers, LangChain, LlamaIndex, RAG (retrievers, chunking, reranking), vector DBs (FAISS, Chroma, Pinecone), prompt engineering & structured outputs, evaluation (Ragas, Promptfoo)
Agentic Systems & Orchestration
LangGraph (stateful workflows), LangChain Agents (ReAct/MRKL/tools), function/tool calling, multi-agent design, planning & memory, tool-use (search/code/execution), guards & grounding
MLOps & Deployment
MLflow, Weights & Biases, Docker, Kubernetes, FastAPI, gRPC, CI/CD (GitHub Actions, Jenkins, CircleCI), model versioning, scalable inference, monitoring & drift detection
AI Safety & Robustness
Safety evaluations & red-teaming pipelines, gradual rollout strategies, alignment & policy-enforcement guardrails, interpretability tooling (SHAP, LIME, Captum), adversarial robustness frameworks (Foolbox, RobustML), anomaly & drift monitoring, reliability & stress testing for safety-critical deployments
HPC & Big Data
Hadoop, Hive, Spark, Kafka, Kinesis, Presto, Athena, distributed computing (SLURM, MPI, OpenMP)
Simulation & Modeling
OPAL-RT (HIL), OpenDSS (power systems), CARLA (autonomous driving), CityLearn, cyber-physical system simulation & real-time control frameworks
Optimization
Gurobi, Pyomo, BoTorch, Optuna, Hyperopt
Visualization & GIS
Tableau, ArcGIS, Leaflet, Plotly, Dash
Cloud & DevOps
AWS (EC2, S3, Lambda, EKS), GCP, DigitalOcean, Terraform, Docker, Kubernetes, Git, Jenkins, CircleCI

Honors & Awards

  • Selected, Seventh Workshop on Autonomous Energy Systems @ NREL (2024)
  • Selected, ByteBoost Workshop on Accelerating HPC Research Skills (2024)
  • Selected, Oxford Machine Learning Summer School (OxML) (2022)
  • Excellence Award, Database Optimization @ TCS
  • 2nd Place, BAJA SAE India (Safest Terrain Vehicle Category, National Level)
  • Won multiple robotics competitions at inter-university technical festivals.

Service

  • Reviewer:

    • IEEE Transactions on Industrial Informatics (2025)
    • Conference on Neural Information Processing Systems (Ethics) (2025)
    • IEEE Transactions on Neural Networks and Learning Systems (2024)
    • IEEE PES GM, Grid Edge & ISGT (2023, 2024)
  • Mock Interviewer: Supporting underrepresented minorities in tech.

  • Volunteer, Prayaas India (BIT): NGO providing quality education to underprivileged children in slums and villages.

Projects

RAG-Enhanced Energy Advisor
Retrieval-augmented LLM agent for adaptive control and decision-making.

LLM-Driven Grid Planner
Natural-language-guided reinforcement learning for smart grid management.

LLM‑Powered Energy Optimizer
Multi‑building energy optimization in CityLearn with LLM guidance.

© 2025 Kundan Kumar ∙ Made with Quarto
