Preface

This book is a comprehensive guide to Safe AI, bringing together the foundations of machine learning security, adversarial robustness, AI alignment, and trustworthy deployment practices for modern cyber-physical systems (CPS) and LLM-based autonomous agents.
It is written as a hybrid between a research handbook, a course textbook, and a practical engineering guide for building and securing intelligent systems.

The material integrates insights from academic literature, real-world deployment case studies, and hands-on adversarial evaluations, ranging from classical adversarial machine learning to emerging red/blue-team techniques for large language models.
Special emphasis is placed on CPS, where failures in AI behavior directly influence physical infrastructure such as power grids, transportation systems, and industrial control environments.

Each chapter opens with explicit learning objectives, then presents conceptual explanations and diagrams, followed by practical examples, research notes, and curated references.
The aim is to equip the reader with the full intellectual toolkit required to design, analyze, attack, defend, and align AI systems operating in safety-critical settings.

Readers should be comfortable with programming and have a working familiarity with the following concepts and methods:

- Probability and statistics for uncertainty quantification
- Algorithms and data structures for secure and efficient computation
- Reinforcement learning for sequential decision-making in CPS
- Optimization under constraints
  - convex and non-convex optimization
  - gradient-based and gradient-free methods
  - robust and stochastic optimization
- Adversarial machine learning
  - evasion, poisoning, and backdoor attacks
  - model extraction and inversion
  - adversarial training and certified defenses
- Optimization & Control in CPS
  - model predictive control (MPC)
  - dynamic system stability and safety constraints
  - uncertainty modeling and robust control strategies
- Robustness techniques
  - adversarial training
  - defensive distillation
  - randomized smoothing and certified guarantees
- Testing and evaluation
  - black-box, grey-box, and white-box testing
  - stress-testing and red-team adversarial probes
  - safety benchmarks for LLMs and RL agents
- Trustworthiness and safe deployment
  - explainability and transparency
  - data governance and privacy preservation
  - monitoring, auditing, and incident response in AI systems
- AI security and alignment foundations
  - prompt injection and jailbreak mechanisms
  - safety guardrails and defensive prompting
  - AI alignment principles, goal specification, and misgeneralization
  - safe agent architectures and oversight mechanisms
- Research methodology
  - experimental design
  - reproducibility and benchmarking
  - responsible evaluation of high-stakes AI systems

This book may serve students, researchers, engineers, and practitioners seeking a deep and structured understanding of modern AI security, from the mathematical fundamentals to frontier challenges in alignment, LLM safety, and real-world CPS integration.

Author

Kundan Kumar
https://kundan-kumarr.github.io/

Citation

Kumar, K. (2026). Safe AI for Cyber-Physical & Intelligent Systems:
Model Security, Adversarial Robustness, Agent Safety, and Trustworthy Methods.
Edition 2026-01.

License

This work is licensed under the MIT License.