Welcome to my blog! 🚀 This is a hub for exploring cutting-edge technologies like Kubernetes, Istio, GPU management, Golang, and software development. Each article bridges theoretical knowledge with practical insights, offering hands-on guidance for cloud-native systems, distributed architectures, and advanced engineering practices. Whether you’re troubleshooting Kubernetes issues, diving into Istio’s traffic management, or optimizing GPU utilization, you’ll find valuable resources here. Join me in unraveling complex tech concepts and advancing your skills!
Large Language Models (LLMs) and AI Infrastructure
Delving into GPU Management with Kubernetes
- Kubernetes GPU Management Basics: Introduction to Device Plugin and Source Code Analysis
- Advanced Kubernetes GPU Management: Enabling Nvidia MPS
- Troubleshooting: Resolving “Failed to initialize NVML: Unknown Error”
Istio: In-Depth Traffic Management for Microservices
- Istio Control Plane Management on Kubernetes: Multi-Instance Deployment
- Practical Multi-Environment Application Development: Building Microservices with Istio
- Exploring Istio Core Technologies: Network Principles and Sidecar Auto Injection
- Resolving Memory Error in Istioctl Analyze Command
Kubernetes (K8s)
- Learning Kubernetes by Running Applications: A Beginner’s Guide
- Understanding the Difference Between K8s Affinity and Taint/Toleration: Node Affinity and Anti-Affinity Configuration
- The Mechanism and Strategies of the Default Kubernetes Scheduler: An In-Depth Look at How the Scheduler Works
- Effective Use of Secret, ConfigMap, and Lease in Kubernetes: Detailed Explanation and Examples
- K8s Cloud Provider Source Code Analysis: An In-Depth Look at the Kubernetes Cloud Provider Source Code
- Resolving OCI Runtime Create Failed: Expected CgroupsPath: Fixing Container Runtime Configuration Issues
- Client-go Label Selector Causing CPU Throttling: Diagnosing and Fixing CPU Limitation Issues
Software Development: Best Practices & Techniques
🎯 About Me: Over the past year, I have been active in the open-source community, contributing mainly to cloud-native, Kubernetes, and AI infrastructure projects. My work spans reporting and resolving complex bugs in production environments, proposing and implementing new features for GPU scheduling and observability, and improving deployment documentation for large-scale AI serving frameworks such as vLLM. I have also helped refine workflows, helped users troubleshoot GPU operators, and advocated for enhancements that better support enterprise requirements in hybrid cloud clusters. My efforts reflect a deep commitment to improving reliability, scalability, and user experience in modern infrastructure systems.