VirtRigaud Documentation
Welcome to the VirtRigaud documentation. VirtRigaud is a Kubernetes operator for managing virtual machines across multiple hypervisors including vSphere, Libvirt/KVM, and Proxmox VE.
Quick Navigation
Getting Started
- 15-Minute Quickstart - Get up and running quickly
- Installation Guide - Helm installation instructions
- Helm CRD Upgrades - Managing CRD updates
Core Documentation
- Custom Resource Definitions - Complete API reference
- Examples - Practical configuration examples
- Cloud-Init Configuration - UserData and MetaData guide
- Provider Documentation - Provider development guide
- Provider Capabilities Matrix - Feature comparison
Provider-Specific Guides
- vSphere Provider - VMware vCenter/ESXi integration
- Libvirt Provider - KVM/QEMU virtualization
- Proxmox VE Provider - Proxmox Virtual Environment
- Provider Tutorial - Build your own provider
- Provider Versioning - Version management
Advanced Features
- VM Lifecycle Management - Advanced VM operations
- VM Adoption - Onboard existing VMs into VirtRigaud
- Nested Virtualization - Run hypervisors in VMs
- Graceful Shutdown - Proper VM shutdown handling
- VM Snapshots - Backup and restore
- Remote Providers - Provider architecture
Operations & Administration
- Observability - Monitoring and metrics
- Security - Security best practices
- Resilience - High availability and fault tolerance
- Upgrade Guide - Version upgrade procedures
- vSphere Hardware Versions - Hardware compatibility
Security Configuration
API Reference
- CLI Tools Reference - Command-line interface guide
- CLI API Reference - Detailed CLI documentation
- Metrics Catalog - Available metrics
- Provider Catalog - Available providers
Development
- Testing Workflows Locally - Local CI/CD testing
- Contributing - Contribution guidelines
- Development Guide - Developer setup
Examples Directory
- Example README - Overview of all examples
- Complete Examples - Working configuration files
- Advanced Examples - Complex scenarios
- Security Examples - Security configurations
Version Information
This documentation covers VirtRigaud v0.2.3.
Recent Changes
- v0.2.3: Provider feature parity - Reconfigure, Clone, TaskStatus, ConsoleURL
- v0.2.2: Nested virtualization, TPM support, snapshot management
- v0.2.1: Critical fixes and documentation updates
- v0.2.0: Production-ready vSphere and Libvirt providers
See CHANGELOG.md for complete version history.
Provider Status
| Provider | Status | Maturity | Documentation |
|---|---|---|---|
| vSphere | Production Ready | Stable | Guide |
| Libvirt/KVM | Production Ready | Stable | Guide |
| Proxmox VE | Production Ready | Beta | Guide |
| Mock | Complete | Testing | PROVIDERS.md |
Support
- GitHub Issues: github.com/projectbeskar/virtrigaud/issues
- Discussions: github.com/projectbeskar/virtrigaud/discussions
- Slack: #virtrigaud on Kubernetes Slack
Quick Links
- Main README - Project overview
- CHANGELOG - Version history
- Contributing - How to contribute
- License - Apache License 2.0
15-Minute Quickstart
This guide will get you up and running with VirtRigaud in 15 minutes using both vSphere and Libvirt providers.
Prerequisites
- Kubernetes cluster (1.24+)
- kubectl configured
- Helm 3.x
- Access to a vSphere environment (optional)
- Access to a Libvirt/KVM host (optional)
API Support
Default API: v1beta1 - The recommended stable API for all new deployments.
Legacy API: v1alpha1 - Served for compatibility but deprecated. See the upgrade guide for migration instructions.
All resources support seamless conversion between API versions via webhooks.
Step 1: Install VirtRigaud
Using Helm (Recommended)
# Add the VirtRigaud Helm repository
helm repo add virtrigaud https://projectbeskar.github.io/virtrigaud
helm repo update
# Install with default settings (CRDs included automatically)
helm install virtrigaud virtrigaud/virtrigaud \
--namespace virtrigaud-system \
--create-namespace
# Or install with specific providers enabled
helm install virtrigaud virtrigaud/virtrigaud \
--namespace virtrigaud-system \
--create-namespace \
--set providers.vsphere.enabled=true \
--set providers.libvirt.enabled=true
# To skip CRDs if already installed separately
helm install virtrigaud virtrigaud/virtrigaud \
--namespace virtrigaud-system \
--create-namespace \
--skip-crds
Using Kustomize
# Clone the repository
git clone https://github.com/projectbeskar/virtrigaud.git
cd virtrigaud
# Apply base installation
kubectl apply -k deploy/kustomize/base
# Or apply with overlays
kubectl apply -k deploy/kustomize/overlays/standard
Step 2: Verify Installation
# Check that the manager is running
kubectl get pods -n virtrigaud-system
# Check CRDs are installed
kubectl get crds | grep virtrigaud
# Verify API conversion is working (v1alpha1 <-> v1beta1)
kubectl get crd virtualmachines.infra.virtrigaud.io -o yaml | yq '.spec.conversion'
# Check manager logs
kubectl logs -n virtrigaud-system deployment/virtrigaud-manager
Step 3: Configure a Provider
Option A: vSphere Provider
Create a secret with vSphere credentials:
kubectl create secret generic vsphere-credentials \
--namespace default \
--from-literal=endpoint=https://vcenter.example.com \
--from-literal=username=administrator@vsphere.local \
--from-literal=password=your-password \
--from-literal=insecure=false
Create a vSphere provider:
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: vsphere-prod
namespace: default
spec:
type: vsphere
endpoint: https://vcenter.example.com
credentialSecretRef:
name: vsphere-credentials
runtime:
mode: Remote
image: "ghcr.io/projectbeskar/virtrigaud/provider-vsphere:v0.2.3"
service:
port: 9090
defaults:
datastore: "datastore1"
cluster: "cluster1"
folder: "virtrigaud-vms"
Option B: Libvirt Provider
Create a secret with Libvirt connection details:
kubectl create secret generic libvirt-credentials \
--namespace default \
--from-literal=uri=qemu+ssh://root@libvirt-host.example.com/system \
--from-literal=username=root \
--from-literal=privateKey="$(cat ~/.ssh/id_rsa)"
Create a Libvirt provider:
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: libvirt-lab
namespace: default
spec:
type: libvirt
endpoint: qemu+ssh://root@libvirt-host.example.com/system
credentialSecretRef:
name: libvirt-credentials
runtime:
mode: Remote
image: "ghcr.io/projectbeskar/virtrigaud/provider-libvirt:v0.2.0"
service:
port: 9090
defaults:
defaultStoragePool: "default"
defaultNetwork: "default"
Apply the provider configuration:
kubectl apply -f provider.yaml
💡 Behind the scenes: VirtRigaud automatically converts your Provider resource into the appropriate command-line arguments, environment variables, and secret mounts for the provider pod. See the configuration flow documentation for complete details.
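As a rough illustration of that conversion, the rendered provider pod looks something like the Deployment below. This is a sketch, not the controller's exact output: the resource names and labels are hypothetical, while the `PROVIDER_ENDPOINT` variable and the `/etc/virtrigaud/credentials/` mount path follow the configuration-management conventions described elsewhere in these docs.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: virtrigaud-provider-vsphere-prod   # hypothetical generated name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: provider-vsphere-prod
  template:
    metadata:
      labels:
        app: provider-vsphere-prod
    spec:
      containers:
        - name: provider
          image: ghcr.io/projectbeskar/virtrigaud/provider-vsphere:v0.2.3
          ports:
            - containerPort: 9090          # from spec.runtime.service.port
          env:
            - name: PROVIDER_ENDPOINT      # from spec.endpoint
              value: https://vcenter.example.com
          volumeMounts:
            - name: credentials
              mountPath: /etc/virtrigaud/credentials/
              readOnly: true
      volumes:
        - name: credentials
          secret:
            secretName: vsphere-credentials  # from spec.credentialSecretRef
```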
Step 4: Create a VM Class
Define resource templates for your VMs:
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClass
metadata:
name: small
namespace: default
spec:
cpu: 2
memoryMiB: 2048
disks:
- name: root
sizeGiB: 20
type: thin
networks:
- name: default
type: "VM Network" # vSphere network name
kubectl apply -f vmclass.yaml
Step 5: Create a VM Image
Define the base image for your VMs:
vSphere Image (OVA)
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMImage
metadata:
name: ubuntu-20-04
namespace: virtrigaud-system
spec:
source:
vsphere:
ovaURL: "https://cloud-images.ubuntu.com/releases/20.04/ubuntu-20.04-server-cloudimg-amd64.ova"
checksum: "sha256:abc123..."
datastore: "datastore1"
folder: "vm-templates"
prepare:
onMissing: Import
timeout: "30m"
Libvirt Image (qcow2)
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMImage
metadata:
name: ubuntu-20-04
namespace: virtrigaud-system
spec:
source:
libvirt:
qcow2URL: "https://cloud-images.ubuntu.com/releases/20.04/ubuntu-20.04-server-cloudimg-amd64.img"
checksum: "sha256:def456..."
storagePool: "default"
prepare:
onMissing: Import
timeout: "30m"
kubectl apply -f vmimage.yaml
Step 6: Create Your First VM
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: my-first-vm
namespace: default
spec:
providerRef:
name: vsphere-prod # or libvirt-lab
namespace: default
classRef:
name: small
namespace: default
imageRef:
name: ubuntu-20-04
namespace: default
powerState: "On"
userData:
cloudInit:
inline: |
#cloud-config
users:
- name: ubuntu
sudo: ALL=(ALL) NOPASSWD:ALL
ssh_authorized_keys:
- ssh-rsa AAAAB3... your-public-key
packages:
- curl
- vim
networks:
- name: default
networkRef:
name: default-network
namespace: default
kubectl apply -f vm.yaml
Step 7: Monitor VM Creation
# Watch VM status
kubectl get vm my-first-vm -w
# Check detailed status
kubectl describe vm my-first-vm
# View events
kubectl get events --field-selector involvedObject.name=my-first-vm
# Check provider logs
kubectl logs -n virtrigaud-system deployment/virtrigaud-provider-vsphere
Step 8: Access Your VM
# Get VM IP address
kubectl get vm my-first-vm -o jsonpath='{.status.ips[0]}'
# Get console URL (if supported)
kubectl get vm my-first-vm -o jsonpath='{.status.consoleURL}'
# SSH to the VM (once it has an IP)
ssh ubuntu@<vm-ip>
Step 9: Try Advanced Operations
Create a Snapshot
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMSnapshot
metadata:
name: my-vm-snapshot
namespace: default
spec:
vmRef:
name: my-first-vm
nameHint: "pre-update-snapshot"
memory: true
Clone the VM
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClone
metadata:
name: my-vm-clone
namespace: default
spec:
sourceRef:
name: my-first-vm
target:
name: cloned-vm
classRef:
name: small
namespace: default
linked: true
Scale with VMSet
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMSet
metadata:
name: web-servers
namespace: default
spec:
replicas: 3
template:
spec:
providerRef:
name: vsphere-prod
namespace: default
classRef:
name: small
namespace: default
imageRef:
name: ubuntu-20-04
namespace: default
powerState: "On"
Step 10: Clean Up
# Delete VM
kubectl delete vm my-first-vm
# Delete snapshots and clones
kubectl delete vmsnapshot my-vm-snapshot
kubectl delete vmclone my-vm-clone
kubectl delete vmset web-servers
# Uninstall VirtRigaud (optional)
helm uninstall virtrigaud -n virtrigaud-system
kubectl delete namespace virtrigaud-system
Next Steps
- Browse Complete Examples for production-ready configurations
- Explore the VM Lifecycle Guide
- Learn about Advanced Networking
- Set up Monitoring and Observability
- Configure Security and RBAC
- Read the Remote Providers Documentation
- Read the Provider Development Guide
Troubleshooting
If you encounter issues:
- Check the Troubleshooting Guide
- Verify your provider credentials and connectivity
- Check the manager and provider logs
- Ensure your Kubernetes cluster meets the requirements
- File an issue on GitHub
Helm-only Installation & Verify Conversion
This guide covers installing virtrigaud using only Helm (without pre-applying CRDs via Kustomize) and verifying that API conversion is working correctly.
Helm-only Install
VirtRigaud can be installed using only Helm, which will automatically install all required CRDs including conversion webhook configuration.
Prerequisites
- Kubernetes cluster (1.26+)
- Helm 3.8+
- kubectl configured to access your cluster
Installation
# Add the virtrigaud Helm repository
helm repo add virtrigaud https://projectbeskar.github.io/virtrigaud
helm repo update
# Or install directly from source
git clone https://github.com/projectbeskar/virtrigaud.git
cd virtrigaud
# Install virtrigaud with CRDs
helm install virtrigaud charts/virtrigaud \
--namespace virtrigaud \
--create-namespace \
--wait \
--timeout 10m
Skip CRDs (if already installed)
If you need to install the chart without CRDs (e.g., they're managed separately):
helm install virtrigaud charts/virtrigaud \
  --namespace virtrigaud \
  --create-namespace \
  --skip-crds \
  --wait
Verify Conversion
After installation, verify that API conversion is working correctly.
Check CRD Conversion Configuration
# Verify all CRDs have conversion webhook configuration
kubectl get crd virtualmachines.infra.virtrigaud.io -o yaml | yq '.spec.conversion'
Expected output:
strategy: Webhook
webhook:
clientConfig:
service:
name: virtrigaud-webhook
namespace: virtrigaud
path: /convert
conversionReviewVersions:
- v1
Check API Versions
Verify that both v1alpha1 and v1beta1 versions are available:
# Check available versions for VirtualMachine CRD
kubectl get crd virtualmachines.infra.virtrigaud.io -o jsonpath='{.spec.versions[*].name}' | tr ' ' '\n'
Expected output:
v1alpha1
v1beta1
Verify Storage Version
Confirm that v1beta1 is set as the storage version:
# Check storage version
kubectl get crd virtualmachines.infra.virtrigaud.io -o jsonpath='{.spec.versions[?(@.storage==true)].name}'
Expected output:
v1beta1
Test Conversion
Create resources using different API versions and verify conversion works:
# Create a VM using v1alpha1 API
cat <<EOF | kubectl apply -f -
apiVersion: infra.virtrigaud.io/v1alpha1
kind: VirtualMachine
metadata:
name: test-vm-alpha
namespace: default
spec:
providerRef:
name: test-provider
classRef:
name: small
imageRef:
name: ubuntu-22
powerState: "On"
EOF
# Read it back as v1beta1
kubectl get vm test-vm-alpha -o yaml | grep "apiVersion:"
# Should show: apiVersion: infra.virtrigaud.io/v1beta1
# Create a VM using v1beta1 API
cat <<EOF | kubectl apply -f -
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: test-vm-beta
namespace: default
spec:
providerRef:
name: test-provider
classRef:
name: small
imageRef:
name: ubuntu-22
powerState: On
EOF
# Clean up test resources
kubectl delete vm test-vm-alpha test-vm-beta
Troubleshooting
Conversion Webhook Missing
If the conversion webhook is missing or not configured:
# Check if webhook service exists
kubectl get svc virtrigaud-webhook -n virtrigaud
# Check webhook pod logs
kubectl logs -l app.kubernetes.io/name=virtrigaud -n virtrigaud
# Verify webhook certificate
kubectl get secret virtrigaud-webhook-certs -n virtrigaud
Conversion Webhook Failing
If conversion is failing:
# Check conversion webhook logs
kubectl logs -l app.kubernetes.io/name=virtrigaud -n virtrigaud | grep conversion
# Test webhook connectivity
kubectl get --raw "/api/v1/namespaces/virtrigaud/services/virtrigaud-webhook:webhook/proxy/convert"
# Check webhook certificate validity
kubectl get secret virtrigaud-webhook-certs -n virtrigaud -o yaml
API Version Issues
If certain API versions aren't working:
# List all available APIs
kubectl api-resources | grep virtrigaud
# Check specific CRD status
kubectl describe crd virtualmachines.infra.virtrigaud.io
# Verify controller is running
kubectl get pods -l app.kubernetes.io/name=virtrigaud -n virtrigaud
Integration with GitOps
ArgoCD
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: virtrigaud
spec:
source:
chart: virtrigaud
repoURL: https://projectbeskar.github.io/virtrigaud
targetRevision: "1.0.0"
helm:
values: |
manager:
image:
repository: ghcr.io/projectbeskar/virtrigaud/manager
tag: v1.0.0
Flux
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
name: virtrigaud
spec:
chart:
spec:
chart: virtrigaud
sourceRef:
kind: HelmRepository
name: virtrigaud
version: "1.0.0"
values:
manager:
image:
repository: ghcr.io/projectbeskar/virtrigaud/manager
tag: v1.0.0
Migration from Kustomize to Helm
If you're currently using Kustomize for CRD management and want to switch to Helm:
1. Backup existing resources:
   kubectl get vms,providers,vmclasses -A -o yaml > virtrigaud-backup.yaml
2. Uninstall Kustomize-managed CRDs (optional):
   kubectl delete -k config/default
3. Install via Helm:
   helm install virtrigaud charts/virtrigaud --namespace virtrigaud --create-namespace
4. Restore resources:
   kubectl apply -f virtrigaud-backup.yaml
The conversion webhook will handle any necessary API version transformations automatically.
Automatic CRD Upgrades in VirtRigaud Helm Chart
Overview
VirtRigaud Helm chart now supports automatic CRD upgrades during helm upgrade. This eliminates the need for manual CRD management and provides a seamless upgrade experience.
The Problem
By default, Helm has a limitation:
- CRDs are installed during helm install
- CRDs are NOT upgraded during helm upgrade
This meant users had to apply CRD updates manually before upgrading, which:
- Was error-prone
- Was easy to forget
- Broke GitOps workflows
- Caused version drift between the chart and its CRDs
The Solution
VirtRigaud uses Helm Hooks with a Kubernetes Job to automatically apply CRDs during both install and upgrade:
kubectl Image
VirtRigaud builds and publishes its own kubectl image as part of the release process. This image:
- Based on Alpine Linux for minimal size (~50MB)
- Includes kubectl 1.32.0 binary from official Kubernetes releases
- Includes bash and shell for scripting support
- Runs as non-root user (UID 65532)
- Verified with SHA256 checksums
- Signed with Cosign and includes SBOM
- Security scanned but uses official kubectl binary (vulnerabilities tracked upstream)
The image is automatically built and tagged to match each VirtRigaud release version, ensuring version consistency across all components.
Image Location: ghcr.io/projectbeskar/virtrigaud/kubectl:<version>
How It Works
- Pre-Upgrade Hook: Before the main upgrade starts, a Job is created
- CRD Application: The Job applies all CRDs using kubectl apply --server-side
- Safe Upgrades: Server-side apply handles conflicts gracefully
- Automatic Cleanup: The Job is deleted after successful completion
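The steps above can be sketched with standard Helm hook annotations. The annotation keys (`helm.sh/hook`, `helm.sh/hook-weight`, `helm.sh/hook-delete-policy`) are real Helm conventions, but the resource names, image tag, and command below are illustrative, not the chart's exact template:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: virtrigaud-crd-upgrade
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-weight": "0"             # runs after the -10 and -5 hooks
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
spec:
  template:
    spec:
      serviceAccountName: virtrigaud-crd-upgrade   # RBAC created at weight -5
      restartPolicy: OnFailure
      containers:
        - name: kubectl
          image: ghcr.io/projectbeskar/virtrigaud/kubectl:v0.2.3
          command: ["sh", "-c", "kubectl apply --server-side -f /crds/"]
          volumeMounts:
            - name: crds
              mountPath: /crds
      volumes:
        - name: crds
          configMap:
            name: virtrigaud-crds                  # created at weight -10
```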
Architecture
helm upgrade virtrigaud
        ↓
[Pre-Upgrade Hook -10]
        ↓
ConfigMap with CRDs created
        ↓
[Pre-Upgrade Hook -5]
        ↓
ServiceAccount + RBAC created
        ↓
[Pre-Upgrade Hook 0]
        ↓
Job applies CRDs via kubectl
        ↓
[Standard Helm Resources]
        ↓
Manager & Providers deployed
        ↓
[Hook Cleanup]
        ↓
Job & Hook resources deleted
Features
Enabled by Default
No configuration needed - just works:
helm upgrade virtrigaud virtrigaud/virtrigaud -n virtrigaud-system
Server-Side Apply
Uses kubectl apply --server-side for:
- Safe conflict resolution
- Field management
- No ownership conflicts
GitOps Compatible
Works seamlessly with:
- ArgoCD: Helm hooks execute properly
- Flux: Compatible with HelmRelease CRD upgrades
- Terraform: Helm provider handles hooks
Configurable
Customize the upgrade behavior:
crdUpgrade:
enabled: true # Enable/disable automatic upgrades
image:
repository: ghcr.io/projectbeskar/virtrigaud/kubectl # VirtRigaud kubectl image
tag: "v0.2.0" # Auto-updated to match release version
backoffLimit: 3
ttlSecondsAfterFinished: 300
waitSeconds: 5
resources:
limits:
cpu: 100m
memory: 128Mi
Usage Examples
Standard Upgrade (Automatic CRDs)
# CRDs are automatically upgraded
helm upgrade virtrigaud virtrigaud/virtrigaud \
-n virtrigaud-system
Disable Automatic CRD Upgrade
# Disable if you manage CRDs separately
helm upgrade virtrigaud virtrigaud/virtrigaud \
-n virtrigaud-system \
--set crdUpgrade.enabled=false
Manual CRD Management
# Apply CRDs manually before upgrade
kubectl apply -f charts/virtrigaud/crds/
# Then upgrade without CRD management
helm upgrade virtrigaud virtrigaud/virtrigaud \
-n virtrigaud-system \
--set crdUpgrade.enabled=false
Skip CRDs Entirely
# Skip CRDs during upgrade (for external CRD management)
helm upgrade virtrigaud virtrigaud/virtrigaud \
-n virtrigaud-system \
--skip-crds \
--set crdUpgrade.enabled=false
GitOps Integration
ArgoCD
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: virtrigaud
spec:
source:
chart: virtrigaud
targetRevision: 0.2.2
helm:
values: |
crdUpgrade:
enabled: true # Automatic upgrades work!
syncPolicy:
automated:
prune: true
selfHeal: true
Note: ArgoCD executes Helm hooks properly, so CRDs will be upgraded automatically.
Flux
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
name: virtrigaud
spec:
chart:
spec:
chart: virtrigaud
version: 0.2.2
values:
crdUpgrade:
enabled: true # Automatic upgrades work!
install:
crds: CreateReplace
upgrade:
crds: CreateReplace
Note: Flux's crds: CreateReplace works alongside our hook-based upgrades for maximum compatibility.
Troubleshooting
Check CRD Upgrade Job
# View job status
kubectl get jobs -n virtrigaud-system -l app.kubernetes.io/component=crd-upgrade
# View job logs
kubectl logs -n virtrigaud-system -l app.kubernetes.io/component=crd-upgrade
# View job details
kubectl describe job -n virtrigaud-system -l app.kubernetes.io/component=crd-upgrade
Common Issues
1. RBAC Permissions
Symptom: Job fails with "forbidden" errors
Solution: Ensure the ServiceAccount has CRD permissions:
kubectl get clusterrole -l app.kubernetes.io/component=crd-upgrade
kubectl describe clusterrole <role-name>
2. Image Pull Failures
Symptom: Job fails to start, ImagePullBackOff
Solution: Check image configuration:
crdUpgrade:
image:
repository: ghcr.io/projectbeskar/virtrigaud/kubectl
tag: "v0.2.2-rc1" # Use matching VirtRigaud version
pullPolicy: IfNotPresent
3. CRD Conflicts
Symptom: Apply errors about field conflicts
Solution: Server-side apply handles this automatically, but you can force:
kubectl apply --server-side=true --force-conflicts -f charts/virtrigaud/crds/
4. Job Not Cleaning Up
Symptom: Old jobs remain after upgrade
Solution: Adjust TTL or manually clean:
kubectl delete jobs -n virtrigaud-system -l app.kubernetes.io/component=crd-upgrade
Debug Mode
Enable verbose logging:
helm upgrade virtrigaud virtrigaud/virtrigaud \
-n virtrigaud-system \
--debug
Migration Guide
Migrating from Manual CRD Management
If you were previously managing CRDs manually:
1. Enable automatic upgrades:
   helm upgrade virtrigaud virtrigaud/virtrigaud \
     -n virtrigaud-system \
     --set crdUpgrade.enabled=true
2. Verify CRDs are upgraded:
   kubectl get crd -l app.kubernetes.io/name=virtrigaud
3. Remove manual steps from your upgrade process
Migrating to External CRD Management
If you want to manage CRDs externally (e.g., separate Helm chart):
1. Disable automatic upgrades:
   crdUpgrade:
     enabled: false
2. Extract CRDs:
   helm show crds virtrigaud/virtrigaud > my-crds.yaml
3. Manage CRDs separately:
   kubectl apply -f my-crds.yaml
Technical Details
Hook Weights
The upgrade process uses weighted hooks for proper ordering:
| Weight | Resource | Purpose |
|---|---|---|
| -10 | ConfigMap | Store CRD content |
| -5 | RBAC | Create permissions |
| 0 | Job | Apply CRDs |
Resource Requirements
The CRD upgrade job is lightweight:
resources:
limits:
cpu: 100m
memory: 128Mi
requests:
cpu: 50m
memory: 64Mi
Security
- Runs as non-root user (65532)
- Read-only root filesystem
- No privilege escalation
- Minimal RBAC (only CRD permissions)
- Automatic cleanup after completion
Compatibility
- Kubernetes: 1.25+
- Helm: 3.8+
- kubectl: 1.24+ (in Job image)
Best Practices
- Use Automatic Upgrades: Enable by default for best UX
- Monitor Job Logs: Check logs during first upgrade
- Test in Dev First: Verify upgrades in non-production
- Backup CRDs: Keep backups before major upgrades
- Review Changelogs: Check for breaking CRD changes
FAQ
Q: Will this delete my existing resources?
A: No. CRD upgrades are additive and preserve existing Custom Resources.
Q: What happens if the job fails?
A: Helm upgrade will fail, leaving your cluster in the previous state. Fix the issue and retry.
Q: Can I use this with ArgoCD?
A: Yes! ArgoCD properly executes Helm hooks.
Q: Does this work with Flux?
A: Yes! Flux HelmRelease handles hooks correctly.
Q: How do I roll back?
A: Use helm rollback. CRDs are not rolled back (Kubernetes limitation).
Q: Can I customize the kubectl image?
A: Yes, via crdUpgrade.image.repository and crdUpgrade.image.tag. By default the chart uses VirtRigaud's own published kubectl image (ghcr.io/projectbeskar/virtrigaud/kubectl), tagged to match the release.
Custom Resource Definitions (CRDs)
This document describes all the Custom Resource Definitions (CRDs) provided by virtrigaud.
VirtualMachine
The VirtualMachine CRD represents a virtual machine instance.
Spec
| Field | Type | Required | Description |
|---|---|---|---|
providerRef | ObjectRef | Yes | Reference to the Provider resource |
classRef | ObjectRef | Yes | Reference to the VMClass resource |
imageRef | ObjectRef | Yes | Reference to the VMImage resource |
networks | []VMNetworkRef | No | Network attachments |
disks | []DiskSpec | No | Additional disks |
userData | UserData | No | Cloud-init configuration |
metaData | MetaData | No | Cloud-init metadata configuration |
placement | Placement | No | Placement hints |
powerState | string | No | Desired power state (On/Off) |
tags | []string | No | Tags for organization |
Status
| Field | Type | Description |
|---|---|---|
id | string | Provider-specific VM identifier |
powerState | string | Current power state |
ips | []string | Assigned IP addresses |
consoleURL | string | Console access URL |
conditions | []Condition | Status conditions |
observedGeneration | int64 | Last observed generation |
lastTaskRef | string | Reference to last async task |
provider | map[string]string | Provider-specific details |
Example
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: demo-web-01
spec:
providerRef:
name: vsphere-prod
classRef:
name: small
imageRef:
name: ubuntu-22-template
networks:
- name: app-net
ipPolicy: dhcp
powerState: On
VMClass
The VMClass CRD defines resource allocation for virtual machines.
Spec
| Field | Type | Required | Description |
|---|---|---|---|
cpu | int32 | Yes | Number of virtual CPUs |
memoryMiB | int32 | Yes | Memory in MiB |
firmware | string | No | Firmware type (BIOS/UEFI) |
diskDefaults | DiskDefaults | No | Default disk settings |
guestToolsPolicy | string | No | Guest tools policy |
extraConfig | map[string]string | No | Provider-specific configuration |
Example
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClass
metadata:
name: small
spec:
cpu: 2
memoryMiB: 4096
firmware: UEFI
diskDefaults:
type: thin
sizeGiB: 40
VMImage
The VMImage CRD defines base templates/images for virtual machines.
Spec
| Field | Type | Required | Description |
|---|---|---|---|
vsphere | VSphereImageSpec | No | vSphere-specific configuration |
libvirt | LibvirtImageSpec | No | Libvirt-specific configuration |
prepare | ImagePrepare | No | Image preparation options |
Example
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMImage
metadata:
name: ubuntu-22-template
spec:
vsphere:
templateName: "tmpl-ubuntu-22.04-cloudimg"
libvirt:
url: "https://cloud-images.ubuntu.com/jammy/current/jammy-server-cloudimg-amd64.img"
format: qcow2
VMNetworkAttachment
The VMNetworkAttachment CRD defines network configurations.
Spec
| Field | Type | Required | Description |
|---|---|---|---|
vsphere | VSphereNetworkSpec | No | vSphere-specific network config |
libvirt | LibvirtNetworkSpec | No | Libvirt-specific network config |
ipPolicy | string | No | IP assignment policy |
macAddress | string | No | Static MAC address |
Example
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMNetworkAttachment
metadata:
name: app-net
spec:
vsphere:
portgroup: "PG-App"
ipPolicy: dhcp
Provider
The Provider CRD configures hypervisor connection details.
Spec
| Field | Type | Required | Description |
|---|---|---|---|
type | string | Yes | Provider type (vsphere/libvirt/etc) |
endpoint | string | Yes | Provider endpoint URI |
credentialSecretRef | ObjectRef | Yes | Secret containing credentials |
insecureSkipVerify | bool | No | Skip TLS verification |
defaults | ProviderDefaults | No | Default placement settings |
rateLimit | RateLimit | No | API rate limiting |
Example
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: vsphere-prod
spec:
type: vsphere
endpoint: https://vcenter.example.com
credentialSecretRef:
name: vsphere-creds
defaults:
datastore: datastore1
cluster: compute-cluster-a
Common Types
ObjectRef
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Object name |
namespace | string | No | Object namespace |
DiskSpec
| Field | Type | Required | Description |
|---|---|---|---|
sizeGiB | int32 | Yes | Disk size in GiB |
type | string | No | Disk type |
name | string | No | Disk name |
UserData
| Field | Type | Required | Description |
|---|---|---|---|
cloudInit | CloudInitConfig | No | Cloud-init configuration |
MetaData
| Field | Type | Required | Description |
|---|---|---|---|
inline | string | No | Inline cloud-init metadata in YAML format |
secretRef | ObjectRef | No | Secret containing cloud-init metadata |
CloudInitConfig
| Field | Type | Required | Description |
|---|---|---|---|
secretRef | ObjectRef | No | Secret containing cloud-init data |
inline | string | No | Inline cloud-init configuration |
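Because inline cloud-init can leak credentials into the VirtualMachine spec, the secretRef form is often preferable. Below is a minimal sketch using the fields from the tables above; the Secret name and its `user-data` key layout are illustrative assumptions, not a documented contract:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: web-cloudinit
  namespace: default
stringData:
  user-data: |          # key name assumed for illustration
    #cloud-config
    packages:
      - nginx
---
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
  name: web-01
  namespace: default
spec:
  providerRef:
    name: vsphere-prod
  classRef:
    name: small
  imageRef:
    name: ubuntu-22-template
  userData:
    cloudInit:
      secretRef:
        name: web-cloudinit
```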
Examples
This document provides practical examples for using VirtRigaud with the Remote provider architecture.
Quick Start Examples
All VirtRigaud providers now run as Remote providers. Here are the essential examples to get started:
Basic Provider Setup
- vSphere Provider - Basic vSphere provider configuration
- LibVirt Provider - Basic LibVirt provider configuration
Complete Working Examples
- Complete vSphere Setup - End-to-end vSphere VM creation
- Advanced vSphere Setup - Production-ready vSphere configuration
- LibVirt Complete Setup - End-to-end LibVirt VM creation
- Multi-Provider Setup - Using multiple providers together
Individual Resource Examples
- VMClass - VM resource allocation template
- VMImage - VM image/template definition
- VMNetworkAttachment - Network configuration
- Simple VM - Basic virtual machine
Advanced Examples
- Security Configuration - RBAC, network policies, external secrets
- Advanced Operations - Snapshots, reconfiguration, lifecycle management
Example Directory Structure
docs/examples/
├── provider-*.yaml          # Provider configurations
├── complete-example.yaml    # Full working setup
├── *-advanced-example.yaml  # Production configurations
├── vm*.yaml                 # Individual resource definitions
├── advanced/                # Advanced operations
├── security/                # Security configurations
└── secrets/                 # Credential examples
Key Changes from Previous Versions
Remote-Only Architecture
All providers now run as separate pods with the Remote runtime:
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: my-provider
spec:
type: vsphere # or libvirt, proxmox
endpoint: https://vcenter.example.com
credentialSecretRef:
name: provider-creds
runtime:
mode: Remote # Required - only mode supported
image: "ghcr.io/projectbeskar/virtrigaud/provider-vsphere:v0.2.3"
service:
port: 9090
Current API Schema (v0.2.3)
- VMClass: Standard Kubernetes resource quantities (cpus: 4, memory: "4Gi")
- VMImage: Provider-specific source configurations
- VMNetworkAttachment: Network provider abstractions
- VirtualMachine: Declarative power state management
Configuration Management
Providers receive configuration through:
- Endpoint: Environment variable PROVIDER_ENDPOINT
- Credentials: Mounted secret files in /etc/virtrigaud/credentials/
- Runtime: Managed automatically by the provider controller
Getting Started
- Choose your provider from the basic examples above
- Create a credentials secret (see examples/secrets/)
- Apply the provider configuration with the required runtime section
- Define VM resources (VMClass, VMImage, VMNetworkAttachment)
- Create a VirtualMachine referencing your resources
Need Help?
- Check the Remote Providers documentation for architecture details
- Review provider-specific guides for setup instructions
- Look at complete examples for working configurations
- See troubleshooting tips for common issues
Provider Development Guide
This document explains how to implement a new provider for VirtRigaud.
Overview
Providers are responsible for implementing VM lifecycle operations on specific hypervisor platforms. VirtRigaud uses a Remote Provider architecture where each provider runs as an independent gRPC service, communicating with the manager controller.
Provider Interface
All providers must implement the contracts.Provider interface:
type Provider interface {
// Validate ensures the provider session/credentials are healthy
Validate(ctx context.Context) error
// Create creates a new VM if it doesn't exist (idempotent)
Create(ctx context.Context, req CreateRequest) (CreateResponse, error)
// Delete removes a VM (idempotent)
Delete(ctx context.Context, id string) (taskRef string, err error)
// Power performs a power operation on the VM
Power(ctx context.Context, id string, op PowerOp) (taskRef string, err error)
// Reconfigure modifies VM resources
Reconfigure(ctx context.Context, id string, desired CreateRequest) (taskRef string, err error)
// Describe returns the current state of the VM
Describe(ctx context.Context, id string) (DescribeResponse, error)
// IsTaskComplete checks if an async task is complete
IsTaskComplete(ctx context.Context, taskRef string) (done bool, err error)
}
Implementation Steps
1. Create Provider Package
Create a new package under internal/providers/ for your provider:
internal/providers/yourprovider/
├── provider.go   # Main provider implementation
├── session.go    # Connection/session management
├── tasks.go      # Async task handling
├── converter.go  # Type conversions
├── network.go    # Network operations
└── storage.go    # Storage operations
2. Implement the Provider
package yourprovider
import (
"context"
"github.com/projectbeskar/virtrigaud/api/v1beta1"
"github.com/projectbeskar/virtrigaud/internal/providers/contracts"
)
type Provider struct {
config *v1beta1.Provider
client YourProviderClient
}
func NewProvider(ctx context.Context, provider *v1beta1.Provider) (contracts.Provider, error) {
// Initialize your provider client
// Parse credentials from secret
// Establish connection
return &Provider{
config: provider,
client: client,
}, nil
}
func (p *Provider) Validate(ctx context.Context) error {
// Check connection health
// Validate credentials
return nil
}
// Implement other interface methods...
3. Create Provider gRPC Server
Create a gRPC server for your provider:
// cmd/provider-yourprovider/main.go
package main
import (
"context"
"log"
"net"
"google.golang.org/grpc"
"github.com/projectbeskar/virtrigaud/pkg/grpc/provider"
"github.com/projectbeskar/virtrigaud/internal/providers/yourprovider"
)
func main() {
lis, err := net.Listen("tcp", ":9090")
if err != nil {
log.Fatal(err)
}
s := grpc.NewServer()
provider.RegisterProviderServer(s, &yourprovider.GRPCServer{})
log.Println("Provider server listening on :9090")
if err := s.Serve(lis); err != nil {
log.Fatal(err)
}
}
4. Handle Credentials
Providers should read credentials from Kubernetes secrets. Common credential fields:
- `username` / `password`: Basic authentication
- `token`: API token authentication
- `tls.crt` / `tls.key`: TLS client certificates
Example:
func (p *Provider) getCredentials(ctx context.Context) (*Credentials, error) {
secret := &corev1.Secret{}
err := p.client.Get(ctx, types.NamespacedName{
Name: p.config.Spec.CredentialSecretRef.Name,
Namespace: p.config.Namespace,
}, secret)
if err != nil {
return nil, err
}
return &Credentials{
Username: string(secret.Data["username"]),
Password: string(secret.Data["password"]),
}, nil
}
Error Handling
Use the provided error types for consistent error handling:
import "github.com/projectbeskar/virtrigaud/internal/providers/contracts"
// For not found errors
return contracts.NewNotFoundError("VM not found", err)
// For retryable errors
return contracts.NewRetryableError("Connection timeout", err)
// For validation errors
return contracts.NewInvalidSpecError("Invalid CPU count", nil)
Asynchronous Operations
For long-running operations, return a task reference:
func (p *Provider) Create(ctx context.Context, req CreateRequest) (CreateResponse, error) {
vmID, taskID, err := p.client.CreateVMAsync(...)
if err != nil {
return CreateResponse{}, err
}
return CreateResponse{
ID: vmID,
TaskRef: taskID,
}, nil
}
func (p *Provider) IsTaskComplete(ctx context.Context, taskRef string) (bool, error) {
task, err := p.client.GetTask(taskRef)
if err != nil {
return false, err
}
return task.IsComplete(), nil
}
Type Conversions
Convert between CRD types and provider-specific types:
func (p *Provider) convertVMClass(class contracts.VMClass) YourProviderVMSpec {
return YourProviderVMSpec{
CPUs: class.CPU,
Memory: class.MemoryMiB * 1024 * 1024, // Convert to bytes
// ... other conversions
}
}
Testing
Create unit tests for your provider:
func TestProvider_Create(t *testing.T) {
provider := &Provider{
client: &mockClient{},
}
req := contracts.CreateRequest{
Name: "test-vm",
// ... populate request
}
resp, err := provider.Create(context.Background(), req)
assert.NoError(t, err)
assert.NotEmpty(t, resp.ID)
}
Provider-Specific CRD Fields
Update the CRD types to include provider-specific fields:
// In VMImage types
type YourProviderImageSpec struct {
ImageID string `json:"imageId,omitempty"`
Checksum string `json:"checksum,omitempty"`
}
// In VMNetworkAttachment types
type YourProviderNetworkSpec struct {
NetworkID string `json:"networkId,omitempty"`
VLAN int32 `json:"vlan,omitempty"`
}
Best Practices
- Idempotency: All operations should be idempotent
- Error Classification: Use appropriate error types
- Resource Cleanup: Ensure proper cleanup in Delete operations
- Logging: Use structured logging with context
- Timeouts: Respect context timeouts
- Rate Limiting: Implement client-side rate limiting
- Retry Logic: Handle transient failures gracefully
Examples
See the existing providers for reference:
- `internal/providers/vsphere/` - vSphere implementation
- `internal/providers/libvirt/` - Libvirt implementation (production ready)
Provider Configuration
Each provider type should support these configuration options:
- Connection endpoints
- Authentication credentials
- Default placement settings
- Rate limiting configuration
- Provider-specific options
Example Provider spec:
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
  name: my-provider
spec:
  type: yourprovider
  endpoint: https://api.yourprovider.com
  credentialSecretRef:
    name: provider-creds
  defaults:
    region: us-west-2
    zone: us-west-2a
  rateLimit:
    qps: 10
    burst: 20
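The `qps`/`burst` semantics above map onto a token bucket: `burst` tokens are available immediately, refilled at `qps` per second. The stdlib-only sketch below is illustrative; a real provider would typically use golang.org/x/time/rate instead:

```go
package main

import (
	"fmt"
	"time"
)

// limiter is a tiny token bucket matching the qps/burst knobs above.
type limiter struct {
	tokens chan struct{}
}

func newLimiter(qps, burst int) *limiter {
	l := &limiter{tokens: make(chan struct{}, burst)}
	for i := 0; i < burst; i++ {
		l.tokens <- struct{}{} // start with a full bucket
	}
	go func() {
		t := time.NewTicker(time.Second / time.Duration(qps))
		defer t.Stop()
		for range t.C {
			select {
			case l.tokens <- struct{}{}: // refill one token every 1/qps
			default: // bucket already full
			}
		}
	}()
	return l
}

// Allow reports whether a call may proceed right now.
func (l *limiter) Allow() bool {
	select {
	case <-l.tokens:
		return true
	default:
		return false
	}
}

func main() {
	l := newLimiter(10, 3) // qps: 10, burst: 3
	allowed := 0
	for i := 0; i < 5; i++ {
		if l.Allow() {
			allowed++
		}
	}
	fmt.Println("burst allowed:", allowed) // first 3 pass, then the bucket is empty
}
```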
Provider Capabilities Matrix
This document provides a comprehensive overview of VirtRigaud provider capabilities as of v0.2.3.
Overview
VirtRigaud supports multiple hypervisor platforms through a provider architecture. Each provider implements the core VirtRigaud API while supporting platform-specific features and capabilities.
Core Provider Interface
All providers implement these core operations:
- Validate: Test provider connectivity and credentials
- Create: Create new virtual machines
- Delete: Remove virtual machines and cleanup resources
- Power: Control VM power state (On/Off/Reboot)
- Describe: Query VM state and properties
- GetCapabilities: Report provider-specific capabilities
Provider Status
| Provider | Status | Implementation | Maturity |
|---|---|---|---|
| vSphere | ✅ Production Ready | govmomi-based | Stable |
| Libvirt/KVM | ✅ Production Ready | virsh-based | Stable |
| Proxmox VE | ✅ Production Ready | REST API-based | Beta |
| Mock | ✅ Complete | In-memory simulation | Testing |
Comprehensive Capability Matrix
Core Operations
| Capability | vSphere | Libvirt | Proxmox | Mock | Notes |
|---|---|---|---|---|---|
| VM Create | ✅ | ✅ | ✅ | ✅ | All providers support VM creation |
| VM Delete | ✅ | ✅ | ✅ | ✅ | With resource cleanup |
| Power On/Off | ✅ | ✅ | ✅ | ✅ | Basic power management |
| Reboot | ✅ | ✅ | ✅ | ✅ | Graceful and forced restart |
| Suspend | ✅ | ✅ | ✅ | ✅ | Memory state preservation |
| Describe | ✅ | ✅ | ✅ | ✅ | VM state and properties |
| Reconfigure | ✅ | ⚠️ | ✅ | ✅ | CPU/Memory/Disk changes (Libvirt requires restart) |
| TaskStatus | ✅ | N/A | ✅ | ✅ | Async operation tracking |
| ConsoleURL | ✅ | ✅ | ⚠️ | ✅ | Remote console access (Proxmox planned) |
Resource Management
| Capability | vSphere | Libvirt | Proxmox | Mock | Notes |
|---|---|---|---|---|---|
| CPU Configuration | β | β | β | β | Cores, sockets, threading |
| Memory Allocation | β | β | β | β | Static memory sizing |
| Hot CPU Add | β | β | β | β | Online CPU expansion |
| Hot Memory Add | β | β | β | β | Online memory expansion |
| Resource Reservations | β | β | β | β | Guaranteed resources |
| Resource Limits | β | β | β | β | Resource capping |
Storage Operations
| Capability | vSphere | Libvirt | Proxmox | Mock | Notes |
|---|---|---|---|---|---|
| Disk Creation | β | β | β | β | Virtual disk provisioning |
| Disk Expansion | β | β | β | β | Online disk growth |
| Multiple Disks | β | β | β | β | Multi-disk VMs |
| Thin Provisioning | β | β | β | β | Space-efficient disks |
| Thick Provisioning | β | β | β | β | Pre-allocated storage |
| Storage Policies | β | β | β | β | Policy-based placement |
| Storage Pools | β | β | β | β | Organized storage management |
Network Configuration
| Capability | vSphere | Libvirt | Proxmox | Mock | Notes |
|---|---|---|---|---|---|
| Basic Networking | β | β | β | β | Single network interface |
| Multiple NICs | β | β | β | β | Multi-interface VMs |
| VLAN Support | β | β | β | β | Network segmentation |
| Static IP | β | β | β | β | Fixed IP assignment |
| DHCP | β | β | β | β | Dynamic IP assignment |
| Bridge Networks | β | β | β | β | Direct host bridging |
| Distributed Switches | β | β | β | β | Advanced vSphere networking |
VM Lifecycle
| Capability | vSphere | Libvirt | Proxmox | Mock | Notes |
|---|---|---|---|---|---|
| Template Deployment | β | β | β | β | Deploy from templates |
| Clone Operations | ✅ Complete | β | β | β | Full VM duplication with snapshot support |
| Linked Clones | β | β | β | β | COW-based clones with automatic snapshot creation |
| Full Clones | β | β | β | β | Independent copies |
| VM Reconfiguration | ✅ Complete | ⚠️ Restart Required | ✅ | ✅ | Online resource modification |
Snapshot Operations
| Capability | vSphere | Libvirt | Proxmox | Mock | Notes |
|---|---|---|---|---|---|
| Create Snapshots | β | β | β | β | Point-in-time captures |
| Delete Snapshots | β | β | β | β | Snapshot cleanup |
| Revert Snapshots | β | β | β | β | Restore VM state |
| Memory Snapshots | β | β | β | β | Include RAM state |
| Quiesced Snapshots | β | β | β | β | Consistent filesystem |
| Snapshot Trees | β | β | β | β | Hierarchical snapshots |
Image Management
| Capability | vSphere | Libvirt | Proxmox | Mock | Notes |
|---|---|---|---|---|---|
| OVA/OVF Import | β | β | β | β | Standard VM formats |
| Cloud Image Download | β | β | β | β | Remote image fetch |
| Content Libraries | β | β | β | β | Centralized image management |
| Image Conversion | β | β | β | β | Format transformation |
| Image Caching | β | β | β | β | Performance optimization |
Guest Operating System
| Capability | vSphere | Libvirt | Proxmox | Mock | Notes |
|---|---|---|---|---|---|
| Cloud-Init | β | β | β | β | Guest initialization |
| Guest Tools | β | β | β | β | Enhanced guest integration |
| Guest Agent | β | β | β | β | Runtime guest communication |
| Guest Customization | β | β | β | β | OS-specific customization |
| Guest Monitoring | β | β | β | β | Resource usage tracking |
Advanced Features
| Capability | vSphere | Libvirt | Proxmox | Mock | Notes |
|---|---|---|---|---|---|
| High Availability | β | β | β | β | Automatic failover |
| DRS/Load Balancing | β | β | β | β | Resource optimization |
| Fault Tolerance | β | β | β | β | Zero-downtime protection |
| vMotion/Migration | β | β | β | β | Live VM migration |
| Resource Pools | β | β | β | β | Hierarchical resource mgmt |
| Affinity Rules | β | β | β | β | VM placement policies |
Monitoring & Observability
| Capability | vSphere | Libvirt | Proxmox | Mock | Notes |
|---|---|---|---|---|---|
| Performance Metrics | β | β | β | β | CPU, memory, disk, network |
| Event Logging | β | β | β | β | Operation audit trail |
| Health Checks | β | β | β | β | VM and guest health |
| Alerting | β | β | β | β | Threshold-based notifications |
| Historical Data | β | β | β | β | Performance history |
| Console URL Generation | ✅ | ✅ | ⚠️ | ✅ | Web/VNC console access (Proxmox planned) |
| Guest Agent Integration | ✅ | ✅ | ✅ Complete | ✅ | IP detection and guest info |
Provider-Specific Features
vSphere Exclusive
- vCenter Integration: Full vCenter Server and ESXi support
- Content Library: Centralized template and ISO management
- Distributed Resource Scheduler (DRS): Automatic load balancing
- vMotion: Live migration between hosts
- High Availability (HA): Automatic VM restart on host failure
- Fault Tolerance: Zero-downtime VM protection
- Storage vMotion: Live storage migration
- vSAN Integration: Hyper-converged storage
- NSX Integration: Software-defined networking
- Hot Reconfiguration: Online CPU/memory/disk changes with hot-add support
- TaskStatus Tracking: Real-time async operation monitoring via govmomi
- Clone Operations: Full and linked clones with automatic snapshot handling
- Web Console URLs: Direct vSphere web client console access
Libvirt/KVM Exclusive
- Virsh Integration: Command-line management
- QEMU Guest Agent: Advanced guest OS integration
- KVM Optimization: Native Linux virtualization
- Bridge Networking: Direct host network bridging
- Storage Pool Flexibility: Multiple storage backend support
- Cloud Image Support: Direct cloud image deployment
- Host Device Passthrough: Hardware device assignment
- Reconfiguration Support: CPU/memory/disk changes via virsh (restart required)
- VNC Console Access: Direct VNC console URL generation for remote viewers
Proxmox VE Exclusive
- Web UI Integration: Built-in management interface
- Container Support: LXC container management
- Backup Integration: Built-in backup and restore
- Cluster Management: Multi-node cluster support
- ZFS Integration: Advanced filesystem features
- Ceph Integration: Distributed storage
- Guest Agent IP Detection: Accurate IP address extraction via QEMU guest agent
- Hot-plug Reconfiguration: Online CPU/memory/disk modifications
- Complete CRD Integration: Full Kubernetes custom resource support
Mock Provider Features
- Testing Scenarios: Configurable failure modes
- Performance Simulation: Controllable operation delays
- Sample Data: Pre-populated demonstration VMs
- Development Support: Full API coverage for testing
Supported Disk Types
| Provider | Disk Formats | Notes |
|---|---|---|
| vSphere | thin, thick, eagerZeroedThick | vSphere native formats |
| Libvirt | qcow2, raw, vmdk | QEMU-supported formats |
| Proxmox | qcow2, raw, vmdk | Proxmox storage formats |
| Mock | thin, thick, raw, qcow2 | Simulated formats |
Supported Network Types
| Provider | Network Types | Notes |
|---|---|---|
| vSphere | distributed, standard, vlan | vSphere networking |
| Libvirt | virtio, e1000, rtl8139 | QEMU network adapters |
| Proxmox | virtio, e1000, rtl8139 | Proxmox network models |
| Mock | bridge, nat, distributed | Simulated network types |
Provider Images
All provider images are available from the GitHub Container Registry:
- vSphere: `ghcr.io/projectbeskar/virtrigaud/provider-vsphere:v0.2.3`
- Libvirt: `ghcr.io/projectbeskar/virtrigaud/provider-libvirt:v0.2.3`
- Proxmox: `ghcr.io/projectbeskar/virtrigaud/provider-proxmox:v0.2.3`
- Mock: `ghcr.io/projectbeskar/virtrigaud/provider-mock:v0.2.3`
Choosing a Provider
Use vSphere When:
- You have existing VMware infrastructure
- You need enterprise features (HA, DRS, vMotion)
- You require advanced networking (NSX, distributed switches)
- You need centralized management (vCenter)
Use Libvirt/KVM When:
- You want open-source virtualization
- You're running on Linux hosts
- You need cost-effective virtualization
- You want direct host integration
Use Proxmox VE When:
- You need both VMs and containers
- You want integrated backup solutions
- You need cluster management
- You want web-based management
Use Mock Provider When:
- You're developing or testing VirtRigaud
- You need to simulate VM operations
- You're creating demos or training materials
- You're testing VirtRigaud without hypervisors
Performance Considerations
vSphere
- Best for: Large-scale enterprise deployments
- Scalability: Hundreds to thousands of VMs
- Overhead: Higher due to feature richness
- Resource Efficiency: Excellent with DRS
Libvirt/KVM
- Best for: Linux-based deployments
- Scalability: Moderate to large deployments
- Overhead: Low, near-native performance
- Resource Efficiency: Good with proper tuning
Proxmox VE
- Best for: SMB and mixed workloads
- Scalability: Small to medium deployments
- Overhead: Moderate
- Resource Efficiency: Good with clustering
Future Roadmap
Planned Enhancements
vSphere
- vSphere 8.0 support
- Enhanced NSX integration
- GPU passthrough support
- vSAN policy automation
Libvirt
- Live migration support
- SR-IOV networking
- NUMA topology optimization
- Enhanced performance monitoring
Proxmox
- HA configuration
- Storage replication
- Advanced networking
- Performance optimizations
Support Matrix
| Feature Category | vSphere | Libvirt | Proxmox | Mock |
|---|---|---|---|---|
| Production Ready | ✅ | ✅ | ✅ Beta | ❌ Testing |
| Documentation | Complete | Complete | Complete | Complete |
| Community Support | Active | Active | Growing | N/A |
| Enterprise Support | Available | Available | Available | N/A |
Version History
- v0.2.3: Provider feature parity - Reconfigure, Clone, TaskStatus, ConsoleURL
- v0.2.2: Nested virtualization, TPM support, comprehensive snapshot management
- v0.2.1: Critical fixes, documentation updates, VMClass disk settings
- v0.2.0: Production-ready vSphere and Libvirt providers
- v0.1.0: Initial provider framework and mock implementation
This document reflects VirtRigaud v0.2.3 capabilities. For the latest updates, see the VirtRigaud documentation.
vSphere Provider
The vSphere provider enables VirtRigaud to manage virtual machines on VMware vSphere environments, including vCenter Server and standalone ESXi hosts. This provider is designed for enterprise production environments with comprehensive support for vSphere features.
Overview
This provider implements the VirtRigaud provider interface to manage VM lifecycle operations on VMware vSphere:
- Create: Create VMs from templates, content libraries, or OVF/OVA files
- Delete: Remove VMs and associated storage (with configurable retention)
- Power: Start, stop, restart, and suspend virtual machines
- Describe: Query VM state, resource usage, guest info, and vSphere properties
- Reconfigure: Hot-add CPU/memory, resize disks, modify network adapters (v0.2.3+)
- Clone: Create full or linked clones from existing VMs or templates (v0.2.3+)
- Snapshot: Create, delete, and revert VM snapshots with memory state
- TaskStatus: Track asynchronous operations with progress monitoring (v0.2.3+)
- ConsoleURL: Generate vSphere web client console URLs (v0.2.3+)
- ImagePrepare: Import OVF/OVA, deploy from content library, or ensure template existence
Prerequisites
⚠️ IMPORTANT: Active vSphere Environment Required
The vSphere provider connects to VMware vSphere infrastructure and requires active vCenter Server or ESXi hosts.
Requirements:
- vCenter Server 7.0+ or ESXi 7.0+ (running and accessible)
- User account with appropriate privileges for VM management
- Network connectivity from VirtRigaud to vCenter/ESXi (HTTPS/443)
- vSphere infrastructure:
- Configured datacenters, clusters, and hosts
- Storage (datastores) for VM files
- Networks (port groups) for VM connectivity
- Resource pools for VM placement (optional)
Testing/Development:
For development environments:
- Use VMware vSphere Hypervisor (ESXi) free version
- vCenter Server Appliance evaluation license
- VMware Workstation/Fusion with nested ESXi
- EVE-NG or GNS3 with vSphere emulation
Authentication
The vSphere provider supports multiple authentication methods:
Username/Password Authentication (Common)
Standard vSphere user authentication:
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
  name: vsphere-prod
  namespace: default
spec:
  type: vsphere
  endpoint: https://vcenter.example.com/sdk
  credentialSecretRef:
    name: vsphere-credentials
  # Optional: Skip TLS verification (development only)
  insecureSkipVerify: false
  runtime:
    mode: Remote
    image: "ghcr.io/projectbeskar/virtrigaud/provider-vsphere:v0.2.3"
    service:
      port: 9090
Create credentials secret:
apiVersion: v1
kind: Secret
metadata:
  name: vsphere-credentials
  namespace: default
type: Opaque
stringData:
  username: "virtrigaud@vsphere.local"
  password: "SecurePassword123!"
Session Token Authentication (Advanced)
For environments using external authentication:
apiVersion: v1
kind: Secret
metadata:
  name: vsphere-token
  namespace: default
type: Opaque
stringData:
  token: "vmware-api-session-id:abcd1234..."
Service Account Authentication (Recommended)
Create a dedicated service account with minimal required privileges:
# vSphere privileges for VirtRigaud service account:
# - Datastore: Allocate space, Browse datastore, Low level file operations
# - Network: Assign network
# - Resource: Assign virtual machine to resource pool
# - Virtual machine: All privileges (or subset based on requirements)
# - Global: Enable methods, Disable methods, Licenses
Configuration
Connection Endpoints
| Endpoint Type | Format | Use Case |
|---|---|---|
| vCenter Server | https://vcenter.example.com/sdk | Multi-host management (recommended) |
| vCenter FQDN | https://vcenter.corp.local/sdk | Internal domain environments |
| vCenter IP | https://192.168.1.10/sdk | Direct IP access |
| ESXi Host | https://esxi-host.example.com | Single host environments |
Deployment Configuration
Using Helm Values
# values.yaml
providers:
  vsphere:
    enabled: true
    endpoint: "https://vcenter.example.com/sdk"
    insecureSkipVerify: false  # Set to true for self-signed certificates
    credentialSecretRef:
      name: vsphere-credentials
      namespace: virtrigaud-system
Production Configuration with TLS
# Create secret with credentials and TLS certificates
apiVersion: v1
kind: Secret
metadata:
name: vsphere-secure-credentials
namespace: virtrigaud-system
type: Opaque
stringData:
username: "svc-virtrigaud@vsphere.local"
password: "SecurePassword123!"
# Optional: Custom CA certificate for vCenter
ca.crt: |
-----BEGIN CERTIFICATE-----
# Your vCenter CA certificate here
-----END CERTIFICATE-----
---
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: vsphere-production
namespace: virtrigaud-system
spec:
type: vsphere
endpoint: https://vcenter.prod.example.com/sdk
credentialSecretRef:
name: vsphere-secure-credentials
insecureSkipVerify: false
Development Configuration
# For development with self-signed certificates
providers:
  vsphere:
    enabled: true
    endpoint: "https://esxi-dev.local"
    insecureSkipVerify: true  # Only for development!
    credentialSecretRef:
      name: vsphere-dev-credentials
Multi-vCenter Configuration
# Deploy multiple providers for different vCenters
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
  name: vsphere-datacenter-a
spec:
  type: vsphere
  endpoint: https://vcenter-a.example.com/sdk
  credentialSecretRef:
    name: vsphere-credentials-a
---
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
  name: vsphere-datacenter-b
spec:
  type: vsphere
  endpoint: https://vcenter-b.example.com/sdk
  credentialSecretRef:
    name: vsphere-credentials-b
vSphere Infrastructure Setup
Required vSphere Objects
The provider expects the following vSphere infrastructure to be configured:
Datacenters and Clusters
# Example vSphere hierarchy:
Datacenter: "Production"
├── Cluster: "Compute-Cluster"
│   ├── ESXi Host: esxi-01.example.com
│   ├── ESXi Host: esxi-02.example.com
│   └── ESXi Host: esxi-03.example.com
├── Datastores:
│   ├── "datastore-ssd"     # High-performance storage
│   ├── "datastore-hdd"     # Standard storage
│   └── "datastore-backup"  # Backup storage
└── Networks:
    ├── "VM Network"   # Default VM network
    ├── "DMZ-Network"  # DMZ port group
    └── "Management"   # Management network
Resource Pools (Optional)
# Create resource pools for workload isolation
Datacenter: "Production"
└── Cluster: "Compute-Cluster"
    └── Resource Pools:
        ├── "Development"  # Dev workloads (lower priority)
        ├── "Production"   # Prod workloads (high priority)
        └── "Testing"      # Test workloads (medium priority)
VM Configuration
VMClass Specification
Define CPU, memory, and vSphere-specific settings:
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClass
metadata:
  name: standard-vm
spec:
  cpus: 4
  memory: "8Gi"
  # vSphere-specific configuration
  spec:
    # VM hardware settings
    hardware:
      version: "vmx-19"         # Hardware version
      firmware: "efi"           # BIOS or EFI
      secureBoot: true          # Secure boot (EFI only)
      enableCpuHotAdd: true     # Hot-add CPU
      enableMemoryHotAdd: true  # Hot-add memory
    # CPU configuration
    cpu:
      coresPerSocket: 2            # CPU topology
      enableVirtualization: false  # Nested virtualization
      reservationMHz: 1000         # CPU reservation
      limitMHz: 4000               # CPU limit
    # Memory configuration
    memory:
      reservationMB: 2048   # Memory reservation
      limitMB: 8192         # Memory limit
      shareLevel: "normal"  # Memory shares (low/normal/high)
    # Storage configuration
    storage:
      diskFormat: "thin"                        # thick/thin/eagerZeroedThick
      storagePolicy: "VM Storage Policy - SSD"  # vSAN storage policy
    # vSphere placement
    placement:
      datacenter: "Production"    # Target datacenter
      cluster: "Compute-Cluster"  # Target cluster
      resourcePool: "Production"  # Target resource pool
      datastore: "datastore-ssd"  # Preferred datastore
      folder: "/vm/virtrigaud"    # VM folder
VMImage Specification
Reference vSphere templates, content library items, or OVF files:
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMImage
metadata:
  name: ubuntu-22-04-template
spec:
  # Template from vSphere inventory
  source:
    template: "ubuntu-22.04-template"
    datacenter: "Production"
    folder: "/vm/templates"
  # Or from content library
  # source:
  #   contentLibrary: "OS Templates"
  #   item: "ubuntu-22.04-cloud"
  # Or from OVF/OVA URL
  # source:
  #   ovf: "https://releases.ubuntu.com/22.04/ubuntu-22.04-server-cloudimg-amd64.ova"
  # Guest OS identification
  guestOS: "ubuntu64Guest"
  # Customization specification
  customization:
    type: "cloudInit"          # cloudInit, sysprep, or linux
    spec: "ubuntu-cloud-init"  # Reference to customization spec
Complete VM Example
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
  name: web-application
spec:
  providerRef:
    name: vsphere-prod
  classRef:
    name: standard-vm
  imageRef:
    name: ubuntu-22-04-template
  powerState: On
  # Disk configuration
  disks:
    - name: root
      size: "100Gi"
      storageClass: "ssd-storage"
      # vSphere-specific disk options
      spec:
        diskMode: "persistent"  # persistent, independent_persistent, independent_nonpersistent
        diskFormat: "thin"      # thick, thin, eagerZeroedThick
        controllerType: "scsi"  # scsi, ide, nvme
        unitNumber: 0           # SCSI unit number
    - name: data
      size: "500Gi"
      storageClass: "hdd-storage"
      spec:
        diskFormat: "thick"
        controllerType: "scsi"
        unitNumber: 1
  # Network configuration
  networks:
    # Primary application network
    - name: app-network
      portGroup: "VM Network"
      # Optional: Static IP assignment
      staticIP:
        address: "192.168.100.50/24"
        gateway: "192.168.100.1"
        dns: ["192.168.1.10", "8.8.8.8"]
    # Management network
    - name: mgmt-network
      portGroup: "Management"
      # DHCP assignment (default)
  # vSphere-specific placement
  placement:
    datacenter: "Production"
    cluster: "Compute-Cluster"
    resourcePool: "Production"
    folder: "/vm/applications"
    datastore: "datastore-ssd"   # Override class default
    host: "esxi-01.example.com"  # Pin to specific host (optional)
  # Guest customization
  userData:
    cloudInit:
      inline: |
        #cloud-config
        hostname: web-application
        users:
          - name: ubuntu
            sudo: ALL=(ALL) NOPASSWD:ALL
            ssh_authorized_keys:
              - "ssh-ed25519 AAAA..."
        packages:
          - nginx
          - docker.io
          - open-vm-tools  # VMware tools for guest integration
        runcmd:
          - systemctl enable nginx
          - systemctl enable docker
          - systemctl enable open-vm-tools
Advanced Features
VM Reconfiguration (v0.2.3+)
The vSphere provider supports online VM reconfiguration for CPU, memory, and disk resources:
# Reconfigure VM resources
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: web-server
spec:
vmClassRef: medium # Change from small to medium
powerState: "On"
Capabilities:
- Online CPU Changes: Hot-add CPUs to running VMs (requires guest OS support)
- Online Memory Changes: Hot-add memory to running VMs (requires guest OS support)
- Disk Resizing: Expand disks online (shrinking not supported for safety)
- Automatic Fallback: Falls back to offline changes if hot-add not supported
- Intelligent Detection: Only applies changes when needed
Memory Format Support:
- Standard units: `2Gi`, `4096Mi`, `2048MiB`, `2GiB`
- Parser handles multiple memory unit formats
Limitations:
- Disk shrinking prevented to avoid data loss
- Some guest operating systems require special configuration for hot-add
- BIOS firmware VMs have limited hot-add support (use EFI firmware)
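The listed memory formats can be handled by a small suffix-based parser. This is an illustrative sketch only, not the provider's actual parsing code (which may accept more units):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseMemory converts the unit formats listed above into bytes.
// Longer suffixes are checked first so "2GiB" matches "GiB", not "Gi".
func parseMemory(s string) (int64, error) {
	units := []struct {
		suffix string
		mul    int64
	}{
		{"GiB", 1 << 30}, {"Gi", 1 << 30},
		{"MiB", 1 << 20}, {"Mi", 1 << 20},
	}
	for _, u := range units {
		if strings.HasSuffix(s, u.suffix) {
			n, err := strconv.ParseInt(strings.TrimSuffix(s, u.suffix), 10, 64)
			if err != nil {
				return 0, err
			}
			return n * u.mul, nil
		}
	}
	return 0, fmt.Errorf("unsupported memory format: %q", s)
}

func main() {
	for _, s := range []string{"2Gi", "4096Mi", "2048MiB", "2GiB"} {
		b, err := parseMemory(s)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s = %d bytes\n", s, b)
	}
}
```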
VM Cloning (v0.2.3+)
Create full or linked clones of existing VMs and templates:
# Clone from existing VM
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: web-server-02
spec:
vmClassRef: small
vmImageRef: web-server-01 # Source VM
cloneType: linked # or "full"
Clone Types:
- Full Clone: Independent copy with separate storage
- Linked Clone: Space-efficient copy using snapshots
- Automatically creates snapshot if none exists
- Requires less storage and faster creation
- Parent VM must remain available
Use Cases:
- Rapid test environment provisioning
- Development environment duplication
- Template-based deployments
- Disaster recovery scenarios
Task Status Tracking (v0.2.3+)
Monitor asynchronous vSphere operations in real-time:
# VirtRigaud automatically tracks long-running operations
# No manual configuration needed
# Task tracking provides:
# - Real-time task state (queued, running, success, error)
# - Progress percentage
# - Error messages for failed tasks
# - Integration with vSphere task manager
Features:
- Automatic tracking of all async operations
- Progress monitoring via govmomi task manager
- Detailed error reporting
- Task history visibility in vCenter
Console Access (v0.2.3+)
Generate direct vSphere web client console URLs:
# Access provided in VM status
kubectl get vm web-server -o yaml
status:
  consoleURL: "https://vcenter.example.com/ui/app/vm;nav=h/urn:vmomi:VirtualMachine:vm-123:xxxxx/summary"
  phase: Running
Features:
- Direct browser-based VM console access
- No additional tools required
- Works with vSphere web client
- Includes VM instance UUID for reliable identification
- Generated automatically in Describe operations
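Generating such a URL amounts to string assembly from the vCenter host, the VM's managed object reference, and its instance UUID. `consoleURL` below is a hypothetical helper modeled on the status example above; the exact path format varies by vCenter version:

```go
package main

import (
	"fmt"
	"net/url"
)

// consoleURL assembles a vSphere web client console link in the shape
// shown in the status example. Illustrative only: the path format is
// vCenter-version dependent and should not be treated as a stable API.
func consoleURL(vcenter, moRef, uuid string) string {
	return fmt.Sprintf("https://%s/ui/app/vm;nav=h/urn:vmomi:VirtualMachine:%s:%s/summary",
		vcenter, url.PathEscape(moRef), url.PathEscape(uuid))
}

func main() {
	fmt.Println(consoleURL("vcenter.example.com", "vm-123", "4210abcd-0000-1111-2222-333344445555"))
}
```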
Template Management
Creating Templates
# Convert existing VM to template
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMTemplate
metadata:
name: create-ubuntu-template
spec:
sourceVM: "ubuntu-base-vm"
datacenter: "Production"
targetFolder: "/vm/templates"
templateName: "ubuntu-22.04-template"
# Template metadata
annotation: |
Ubuntu 22.04 LTS Template
Created: 2024-01-15
Includes: cloud-init, open-vm-tools
# Template customization
powerOff: true # Power off before conversion
removeSnapshots: true # Clean up snapshots
updateTools: true # Update VMware tools
Content Library Integration
# Deploy from content library
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMImage
metadata:
  name: centos-stream-9
spec:
  source:
    contentLibrary: "OS Templates"
    item: "CentOS-Stream-9"
    datacenter: "Production"
  # Content library item properties
  properties:
    version: "9.0"
    provider: "CentOS"
    osType: "linux"
Storage Policies
# VMClass with vSAN storage policy
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClass
metadata:
  name: high-performance
spec:
  cpus: 8
  memory: "32Gi"
  spec:
    storage:
      # vSAN storage policies
      homePolicy: "VM Storage Policy - Performance"  # VM home/config files
      diskPolicy: "VM Storage Policy - SSD Only"     # Virtual disks
      swapPolicy: "VM Storage Policy - Standard"     # Swap files
      # Traditional storage
      datastoreCluster: "DatastoreCluster-SSD"  # Datastore cluster
      antiAffinityRules: true                   # VM anti-affinity
Network Advanced Configuration
# Advanced networking with distributed switches
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMNetworkAttachment
metadata:
  name: advanced-networking
spec:
  networks:
    # Distributed port group
    - name: frontend
      portGroup: "DPG-Frontend-VLAN100"
      distributedSwitch: "DSwitch-Production"
      vlan: 100
    # NSX-T logical switch
    - name: backend
      portGroup: "LS-Backend-App"
      nsx: true
      securityPolicy: "Backend-Security-Policy"
    # SR-IOV for high performance
    - name: storage
      portGroup: "DPG-Storage-VLAN200"
      sriov: true
      bandwidth:
        reservation: 1000  # Mbps
        limit: 10000       # Mbps
        shares: 100        # Priority
High Availability
# VM with HA/DRS settings
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: critical-application
spec:
providerRef:
name: vsphere-prod
# ... other config ...
# High availability configuration
availability:
# HA restart priority
restartPriority: "high" # disabled, low, medium, high
isolationResponse: "powerOff" # none, powerOff, shutdown
vmMonitoring: "vmMonitoringOnly" # vmMonitoringDisabled, vmMonitoringOnly, vmAndAppMonitoring
# DRS configuration
drsAutomationLevel: "fullyAutomated" # manual, partiallyAutomated, fullyAutomated
drsVmBehavior: "fullyAutomated" # manual, partiallyAutomated, fullyAutomated
# Anti-affinity rules
antiAffinityGroups: ["web-tier", "database-tier"]
# Host affinity (pin to specific hosts)
hostAffinityGroups: ["production-hosts"]
Snapshot Management
# Advanced snapshot configuration
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMSnapshot
metadata:
name: pre-upgrade-snapshot
spec:
vmRef:
name: web-application
# Snapshot settings
name: "Pre-upgrade snapshot"
description: "Snapshot before application upgrade"
memory: true # Include memory state
quiesce: true # Quiesce guest filesystem
# Retention policy
retention:
maxSnapshots: 3 # Keep max 3 snapshots
maxAge: "7d" # Delete after 7 days
# Schedule (optional)
schedule: "0 2 * * 0" # Weekly at 2 AM Sunday
Troubleshooting
Common Issues
❌ Connection Failed
Symptom: failed to connect to vSphere: connection refused
Causes & Solutions:
- Network connectivity:
# Test connectivity to vCenter
telnet vcenter.example.com 443
# Test from Kubernetes pod
kubectl run debug --rm -i --tty --image=curlimages/curl -- \
  curl -k https://vcenter.example.com
- DNS resolution:
# Test DNS resolution
nslookup vcenter.example.com
# Use IP address if DNS fails
- Firewall rules: Ensure port 443 is accessible from the Kubernetes cluster
❌ Authentication Failed
Symptom: Login failed: incorrect user name or password
Solutions:
- Verify credentials:
# Test credentials manually
kubectl get secret vsphere-credentials -o yaml
# Decode and verify
echo "base64-password" | base64 -d
- Check user permissions:
  - Verify user exists in vCenter
  - Check assigned roles and privileges
  - Ensure user is not locked out
- Test login via vSphere Client: Verify credentials work in the GUI
❌ Insufficient Privileges
Symptom: operation requires privilege 'VirtualMachine.Interact.PowerOn'
Solution: Grant required privileges to the service account:
# Required privileges for VirtRigaud:
# - Datastore privileges:
# * Datastore.AllocateSpace
# * Datastore.Browse
# * Datastore.FileManagement
# - Network privileges:
# * Network.Assign
# - Resource privileges:
# * Resource.AssignVMToPool
# - Virtual machine privileges:
# * VirtualMachine.* (all) or specific subset
# - Global privileges:
# * Global.EnableMethods
# * Global.DisableMethods
❌ Template Not Found
Symptom: template 'ubuntu-template' not found
Solutions:
# List available templates
govc ls /datacenter/vm/templates/
# Check template path and permissions
govc object.collect -s vm/templates/ubuntu-template summary.config.name
# Verify template is properly marked as template
govc object.collect -s vm/templates/ubuntu-template config.template
❌ Datastore Issues
Symptom: insufficient disk space or datastore not accessible
Solutions:
# Check datastore capacity
govc datastore.info datastore-name
# List accessible datastores
govc datastore.ls
# Check datastore cluster configuration
govc cluster.ls
❌ Network Configuration
Symptom: network 'VM Network' not found
Solutions:
# List available networks
govc ls /datacenter/network/
# Check distributed port groups
govc dvs.portgroup.info
# Verify network accessibility from cluster
govc cluster.network.info
Validation Commands
Test your vSphere setup before deploying:
# 1. Install and configure govc CLI tool
export GOVC_URL='https://vcenter.example.com'
export GOVC_USERNAME='administrator@vsphere.local'
export GOVC_PASSWORD='password'
export GOVC_INSECURE=1 # for self-signed certificates
# 2. Test connectivity
govc about
# 3. List datacenters
govc ls
# 4. List clusters and hosts
govc ls /datacenter/host/
# 5. List datastores
govc ls /datacenter/datastore/
# 6. List networks
govc ls /datacenter/network/
# 7. List templates
govc ls /datacenter/vm/templates/
# 8. Test VM creation (creates a throwaway VM, then deletes it)
govc vm.create -c 1 -m 1024 -g ubuntu64Guest -net "VM Network" test-vm
govc vm.destroy test-vm
Debug Logging
Enable verbose logging for the vSphere provider:
providers:
vsphere:
env:
- name: LOG_LEVEL
value: "debug"
- name: GOVMOMI_DEBUG
value: "true"
endpoint: "https://vcenter.example.com"
Monitor vSphere tasks:
# Monitor recent tasks in vCenter
govc task.ls
# Get details of specific task
govc task.info task-123
Performance Optimization
Resource Allocation
# High-performance VMClass
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClass
metadata:
name: performance-optimized
spec:
cpus: 16
memory: "64Gi"
spec:
cpu:
coresPerSocket: 8 # Match physical CPU topology
reservationMHz: 8000 # Guarantee CPU resources
shares: 2000 # High priority (normal=1000)
enableVirtualization: false # Disable if not needed for performance
memory:
reservationMB: 65536 # Guarantee memory
shares: 2000 # High priority
shareLevel: "high" # Alternative to shares value
hardware:
enableCpuHotAdd: false # Better performance when disabled
enableMemoryHotAdd: false # Better performance when disabled
# NUMA configuration for large VMs
numa:
enabled: true
coresPerSocket: 8 # Align with NUMA topology
Storage Optimization
# Storage-optimized configuration
spec:
storage:
diskFormat: "eagerZeroedThick" # Best performance, more space usage
controllerType: "pvscsi" # Paravirtual SCSI for better performance
multiwriter: false # Disable unless needed
# vSAN optimization
storagePolicy: "Performance-Tier"
cachingPolicy: "writethrough" # or "writeback" for better performance
# Multiple controllers for high IOPS
scsiControllers:
- type: "pvscsi"
busNumber: 0
maxDevices: 15
- type: "pvscsi"
busNumber: 1
maxDevices: 15
Network Optimization
# High-performance networking
networks:
- name: high-performance
portGroup: "DPG-HighPerf-SR-IOV"
adapter: "vmxnet3" # Best performance adapter
sriov: true # SR-IOV for near-native performance
bandwidth:
reservation: 1000 # Guaranteed bandwidth (Mbps)
limit: 10000 # Maximum bandwidth (Mbps)
shares: 100 # Priority level
API Reference
For complete API reference, see the Provider API Documentation.
Contributing
To contribute to the vSphere provider:
- See the Provider Development Guide
- Check the GitHub repository
- Review open issues
Support
- Documentation: VirtRigaud Docs
- Issues: GitHub Issues
- Community: Discord
- VMware: vSphere API Documentation
- govc: govc CLI Tool
LibVirt/KVM Provider
The LibVirt provider enables VirtRigaud to manage virtual machines on KVM/QEMU hypervisors using the LibVirt API. This provider runs as a dedicated pod that communicates with LibVirt daemons locally or remotely, making it ideal for development, on-premises deployments, and cloud environments.
Overview
This provider implements the VirtRigaud provider interface to manage VM lifecycle operations on LibVirt/KVM:
- Create: Create VMs from cloud images with comprehensive cloud-init support
- Delete: Remove VMs and associated storage volumes (with cleanup)
- Power: Start, stop, and reboot virtual machines
- Describe: Query VM state, resource usage, guest agent information, and network details
- Reconfigure: Modify VM resources (v0.2.3+ - requires VM restart)
- Clone: Create new VMs based on existing VM configurations
- Snapshot: Create, delete, and revert VM snapshots (storage-dependent)
- ConsoleURL: Generate VNC console URLs for remote access (v0.2.3+)
- ImagePrepare: Download and prepare cloud images from URLs
- Storage Management: Advanced storage pool and volume operations
- Cloud-Init: Full NoCloud datasource support with ISO generation
- QEMU Guest Agent: Integration for enhanced guest OS monitoring
- Network Configuration: Support for various network types and bridges
Prerequisites
The LibVirt provider connects to a LibVirt daemon (libvirtd) which can run locally or remotely. This makes it flexible for both development and production environments.
Connection Options:
- Local LibVirt: Connects to local libvirtd via qemu:///system (ideal for development)
- Remote LibVirt: Connects to remote libvirtd over SSH/TLS (production)
- Container LibVirt: Works with containerized libvirt or KubeVirt
Requirements:
- LibVirt daemon (libvirtd) running locally or accessible remotely
- KVM/QEMU hypervisor support (hardware virtualization recommended)
- Storage pools configured for VM disk storage
- Network bridges or interfaces for VM networking
- Appropriate permissions for VM management operations
Development Setup:
For local development, you can:
- Linux: Install the libvirt-daemon-system and qemu-kvm packages
- macOS/Windows: Use remote LibVirt or nested virtualization
- Testing: The provider can connect to local libvirtd without complex infrastructure
Authentication & Connection
The LibVirt provider supports multiple connection methods:
Local LibVirt Connection
For connecting to a LibVirt daemon on the same host as the provider pod:
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: libvirt-local
namespace: default
spec:
type: libvirt
endpoint: "qemu:///system" # Local system connection
credentialSecretRef:
name: libvirt-local-credentials
runtime:
mode: Remote
image: "ghcr.io/projectbeskar/virtrigaud/provider-libvirt:v0.2.3"
service:
port: 9090
Note: When using local connections, ensure the provider pod has appropriate permissions to access the LibVirt socket.
Remote Connection with SSH
For remote LibVirt over SSH:
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: libvirt-remote
namespace: default
spec:
type: libvirt
endpoint: "qemu+ssh://user@libvirt-host/system"
credentialSecretRef:
name: libvirt-ssh-credentials
runtime:
mode: Remote
image: "ghcr.io/projectbeskar/virtrigaud/provider-libvirt:v0.2.3"
service:
port: 9090
Create SSH credentials secret:
apiVersion: v1
kind: Secret
metadata:
name: libvirt-ssh-credentials
namespace: default
type: Opaque
stringData:
username: "libvirt-user"
# For key-based auth (recommended):
tls.key: |
-----BEGIN PRIVATE KEY-----
# Your SSH private key here
-----END PRIVATE KEY-----
# For password auth (less secure):
password: "your-password"
Remote Connection with TLS
For remote LibVirt over TLS:
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: libvirt-tls
namespace: default
spec:
type: libvirt
endpoint: "qemu+tls://libvirt-host:16514/system"
credentialSecretRef:
name: libvirt-tls-credentials
runtime:
mode: Remote
image: "ghcr.io/projectbeskar/virtrigaud/provider-libvirt:v0.2.3"
service:
port: 9090
Create TLS credentials secret:
apiVersion: v1
kind: Secret
metadata:
name: libvirt-tls-credentials
namespace: default
type: kubernetes.io/tls
data:
tls.crt: # Base64 encoded client certificate
tls.key: # Base64 encoded client private key
ca.crt: # Base64 encoded CA certificate
Configuration
Connection URIs
The LibVirt provider supports standard LibVirt connection URIs:
| URI Format | Description | Use Case |
|---|---|---|
| qemu:///system | Local system connection | Development, single-host |
| qemu+ssh://user@host/system | SSH connection | Remote access with SSH |
| qemu+tls://host:16514/system | TLS connection | Secure remote access |
| qemu+tcp://host:16509/system | TCP connection | Insecure remote (testing only) |
⚠️ Note: All LibVirt URI schemes are now supported in the CRD validation pattern.
Deployment Configuration
Using Helm Values
# values.yaml
providers:
libvirt:
enabled: true
endpoint: "qemu:///system" # Adjust for your environment
# For remote connections:
# endpoint: "qemu+ssh://user@libvirt-host/system"
credentialSecretRef:
name: libvirt-credentials # Optional for local connections
Development Configuration
# For local development with LibVirt
providers:
libvirt:
enabled: true
endpoint: "qemu:///system"
runtime:
# Mount host libvirt socket (for local access)
volumes:
- name: libvirt-sock
hostPath:
path: /var/run/libvirt/libvirt-sock
volumeMounts:
- name: libvirt-sock
mountPath: /var/run/libvirt/libvirt-sock
Production Configuration
# For production with remote LibVirt
apiVersion: v1
kind: Secret
metadata:
name: libvirt-credentials
namespace: virtrigaud-system
type: Opaque
stringData:
username: "virtrigaud-service"
tls.crt: |
-----BEGIN CERTIFICATE-----
# Client certificate for TLS authentication
-----END CERTIFICATE-----
tls.key: |
-----BEGIN PRIVATE KEY-----
# Client private key
-----END PRIVATE KEY-----
ca.crt: |
-----BEGIN CERTIFICATE-----
# CA certificate
-----END CERTIFICATE-----
---
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: libvirt-production
namespace: virtrigaud-system
spec:
type: libvirt
endpoint: "qemu+tls://libvirt.example.com:16514/system"
credentialSecretRef:
name: libvirt-credentials
Storage Configuration
Storage Pools
LibVirt requires storage pools for VM disks. Common configurations:
# Create directory-based storage pool
virsh pool-define-as default dir --target /var/lib/libvirt/images
virsh pool-build default
virsh pool-start default
virsh pool-autostart default
# Create LVM-based storage pool (performance)
virsh pool-define-as lvm-pool logical --source-name vg-libvirt --target /dev/vg-libvirt
virsh pool-start lvm-pool
virsh pool-autostart lvm-pool
VMClass Storage Specification
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClass
metadata:
name: standard
spec:
cpus: 2
memory: "4Gi"
# LibVirt-specific storage settings
spec:
storage:
pool: "default" # Storage pool name
format: "qcow2" # Disk format (qcow2, raw)
cache: "writethrough" # Cache mode
io: "threads" # I/O mode
Network Configuration
Network Setup
Configure LibVirt networks for VM connectivity:
# Create NAT network (default)
virsh net-define /usr/share/libvirt/networks/default.xml
virsh net-start default
virsh net-autostart default
# Create bridge network (for external access)
cat > /tmp/bridge-network.xml << EOF
<network>
<name>br0</name>
<forward mode='bridge'/>
<bridge name='br0'/>
</network>
EOF
virsh net-define /tmp/bridge-network.xml
virsh net-start br0
Network Bridge Mapping
| Network Name | LibVirt Network | Use Case |
|---|---|---|
| default, nat | default | NAT networking |
| bridge, br0 | br0 | Bridged networking |
| isolated | isolated | Host-only networking |
VM Network Configuration
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: web-server
spec:
providerRef:
name: libvirt-local
networks:
# Use default NAT network
- name: default
# Use bridged network for external access
- name: bridge
bridge: br0
mac: "52:54:00:12:34:56" # Optional MAC address
VM Configuration
VMClass Specification
Define hardware resources and LibVirt-specific settings:
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClass
metadata:
name: development
spec:
cpus: 2
memory: "4Gi"
# LibVirt-specific configuration
spec:
machine: "pc-i440fx-2.12" # Machine type
cpu:
mode: "host-model" # CPU mode (host-model, host-passthrough)
topology:
sockets: 1
cores: 2
threads: 1
features:
acpi: true
apic: true
pae: true
clock:
offset: "utc"
timers:
rtc: "catchup"
pit: "delay"
hpet: false
VMImage Specification
Reference existing disk images or templates:
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMImage
metadata:
name: ubuntu-22-04
spec:
source:
# Path to existing image in storage pool
disk: "/var/lib/libvirt/images/ubuntu-22.04-base.qcow2"
# Or reference by pool and volume
# pool: "default"
# volume: "ubuntu-22.04-base"
format: "qcow2"
# Cloud-init preparation
cloudInit:
enabled: true
userDataTemplate: |
#cloud-config
hostname: {{ .Name }}
users:
- name: ubuntu
sudo: ALL=(ALL) NOPASSWD:ALL
ssh_authorized_keys:
- {{ .SSHPublicKey }}
Complete VM Example
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: dev-workstation
spec:
providerRef:
name: libvirt-local
classRef:
name: development
imageRef:
name: ubuntu-22-04
powerState: On
# Disk configuration
disks:
- name: root
size: "50Gi"
storageClass: "fast-ssd" # Maps to LibVirt storage pool
# Network configuration
networks:
- name: default # NAT network for internet
- name: bridge # Bridge for LAN access
staticIP:
address: "192.168.1.100/24"
gateway: "192.168.1.1"
dns: ["8.8.8.8", "1.1.1.1"]
# Cloud-init user data
userData:
cloudInit:
inline: |
#cloud-config
hostname: dev-workstation
users:
- name: developer
sudo: ALL=(ALL) NOPASSWD:ALL
shell: /bin/bash
ssh_authorized_keys:
- "ssh-ed25519 AAAA..."
packages:
- build-essential
- docker.io
- code
runcmd:
- systemctl enable docker
- usermod -aG docker developer
Cloud-Init Integration
Automatic Configuration
The LibVirt provider automatically handles cloud-init setup:
- ISO Generation: Creates cloud-init ISO with user-data and meta-data
- Attachment: Attaches ISO as CD-ROM device to VM
- Network Config: Generates network configuration from VM spec
- User Data: Renders templates with VM-specific values
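For reference, the ISO the provider generates follows the standard cloud-init NoCloud layout and can be reproduced by hand when debugging. This is a sketch of what the provider automates internally; the /tmp paths, hostname, and the use of genisoimage are assumptions.

```shell
# Build a NoCloud seed manually, mirroring what the provider automates.
# The NoCloud datasource requires files named user-data and meta-data
# on a volume labeled "cidata".
mkdir -p /tmp/seed
cat > /tmp/seed/meta-data <<'EOF'
instance-id: dev-workstation
local-hostname: dev-workstation
EOF
cat > /tmp/seed/user-data <<'EOF'
#cloud-config
hostname: dev-workstation
EOF
# genisoimage (or mkisofs/xorrisofs) packs the seed into an ISO, if installed:
{ command -v genisoimage >/dev/null && \
  genisoimage -output /tmp/seed.iso -volid cidata -joliet -rock \
    /tmp/seed/user-data /tmp/seed/meta-data; } || true
```

Attaching the resulting ISO as a CD-ROM device is exactly what the provider does in step 2 above.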
Advanced Cloud-Init
userData:
cloudInit:
inline: |
#cloud-config
hostname: {{ .Name }}
# Network configuration (if not using DHCP)
network:
version: 2
ethernets:
ens3:
addresses: [192.168.1.100/24]
gateway4: 192.168.1.1
nameservers:
addresses: [8.8.8.8, 1.1.1.1]
# Storage configuration
disk_setup:
/dev/vdb:
table_type: gpt
layout: true
fs_setup:
- device: /dev/vdb1
filesystem: ext4
label: data
mounts:
- [/dev/vdb1, /data, ext4, defaults]
# Package installation
packages:
- qemu-guest-agent # Enable guest agent
- cloud-init
- curl
# Enable services
runcmd:
- systemctl enable qemu-guest-agent
- systemctl start qemu-guest-agent
Performance Optimization
KVM Optimization
# VMClass with performance optimizations
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClass
metadata:
name: high-performance
spec:
cpus: 8
memory: "16Gi"
spec:
cpu:
mode: "host-passthrough" # Best performance
topology:
sockets: 1
cores: 8
threads: 1
# NUMA topology for large VMs
numa:
cells:
- id: 0
cpus: "0-7"
memory: "16"
# Virtio devices for performance
devices:
disk:
bus: "virtio"
cache: "none"
io: "native"
network:
model: "virtio"
video:
model: "virtio"
Storage Performance
# Create high-performance storage pool
virsh pool-define-as ssd-pool logical --source-name vg-ssd --target /dev/vg-ssd
virsh pool-start ssd-pool
# Use raw format for better performance (larger disk usage)
virsh vol-create-as ssd-pool vm-disk 100G --format raw
# Enable native AIO and disable cache for direct I/O
# (configured automatically by provider based on VMClass)
Troubleshooting
Common Issues
❌ Connection Failed
Symptom: failed to connect to Libvirt: <error>
Causes & Solutions:
- Local connection issues:
# Check libvirtd status
sudo systemctl status libvirtd
# Start if not running
sudo systemctl start libvirtd
sudo systemctl enable libvirtd
# Test connection
virsh -c qemu:///system list
- Remote SSH connection:
# Test SSH connectivity
ssh user@libvirt-host virsh list
# Check SSH key permissions
chmod 600 ~/.ssh/id_rsa
- Remote TLS connection:
# Verify certificates
openssl x509 -in client-cert.pem -text -noout
# Test TLS connection
virsh -c qemu+tls://host:16514/system list
❌ Permission Denied
Symptom: authentication failed or permission denied
Solutions:
# Add user to libvirt group
sudo usermod -a -G libvirt $USER
# Check libvirt group membership
groups $USER
# Verify permissions on libvirt socket
ls -la /var/run/libvirt/libvirt-sock
# For containerized providers, ensure socket is mounted
❌ Storage Pool Not Found
Symptom: storage pool 'default' not found
Solution:
# List available pools
virsh pool-list --all
# Create default pool if missing
virsh pool-define-as default dir --target /var/lib/libvirt/images
virsh pool-build default
virsh pool-start default
virsh pool-autostart default
# Verify pool is active
virsh pool-info default
❌ Network Not Available
Symptom: network 'default' not found
Solution:
# List networks
virsh net-list --all
# Start default network
virsh net-start default
virsh net-autostart default
# Create bridge network if needed
virsh net-define /usr/share/libvirt/networks/default.xml
❌ KVM Not Available
Symptom: KVM is not available or hardware acceleration not available
Solutions:
- Check virtualization support:
# Check CPU virtualization features
egrep -c '(vmx|svm)' /proc/cpuinfo
# Check KVM modules
lsmod | grep kvm
# Load KVM modules if missing
sudo modprobe kvm
sudo modprobe kvm_intel  # or kvm_amd
- BIOS/UEFI settings: Enable Intel VT-x or AMD-V
- Nested virtualization: If running in a VM, enable nested virtualization
Validation Commands
Test your LibVirt setup before deploying:
# 1. Test LibVirt connection
virsh -c qemu:///system list
# 2. Check storage pools
virsh pool-list --all
# 3. Check networks
virsh net-list --all
# 4. Test VM creation (simple test)
virt-install --name test-vm --memory 512 --vcpus 1 \
--disk size=1 --network network=default \
--boot cdrom --noautoconsole --dry-run
# 5. From within Kubernetes pod
kubectl run debug --rm -i --tty --image=ubuntu:22.04 -- bash
# Then test virsh commands if socket is mounted
Debug Logging
Enable verbose logging for the LibVirt provider:
providers:
libvirt:
env:
- name: LOG_LEVEL
value: "debug"
- name: LIBVIRT_DEBUG
value: "1"
endpoint: "qemu:///system"
Advanced Features
VM Reconfiguration (v0.2.3+)
The Libvirt provider supports VM reconfiguration for CPU, memory, and disk resources:
# Reconfigure VM resources
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: web-server
spec:
classRef:
  name: medium # Changed from 'small'
powerState: "On"
Capabilities:
- Online CPU Changes: Modify CPU count using virsh setvcpus --live for running VMs
- Online Memory Changes: Modify memory using virsh setmem --live for running VMs
- Disk Resizing: Expand disk volumes via storage provider integration
- Offline Configuration: Updates persistent config for stopped VMs via the --config flag
Important Notes:
- Most changes require VM restart for full effect
- Online changes apply to running VM but may need restart for persistence
- Disk shrinking not supported for safety
- Memory format parsing supports bytes, KiB, MiB, GiB
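The binary-suffix arithmetic behind that memory parsing can be checked with coreutils `numfmt`, used here as a stand-in (the provider performs this parsing internally in Go):

```shell
# IEC binary suffixes as used in VMClass memory fields: 1 GiB = 1024^3 bytes.
numfmt --from=iec-i 4Gi    # -> 4294967296
numfmt --from=iec-i 512Mi  # -> 536870912
```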
Implementation Details:
- Uses virsh setvcpus --live --config for CPU changes
- Uses virsh setmem --live --config for memory changes
- Parses the current VM configuration with virsh dominfo
- Integrates with the storage provider for volume resizing
VNC Console Access (v0.2.3+)
Generate VNC console URLs for direct VM access:
# Access provided in VM status
kubectl get vm web-server -o yaml
status:
consoleURL: "vnc://libvirt-host.example.com:5900"
phase: Running
Features:
- Automatic VNC port extraction from domain XML
- Direct connection URLs for VNC clients
- Support for standard VNC viewers (TigerVNC, RealVNC, etc.)
- Web-based VNC viewers compatible (noVNC)
VNC Client Usage:
# Using vncviewer
vncviewer libvirt-host.example.com:5900
# Using TigerVNC
tigervnc libvirt-host.example.com:5900
# Web browser (with noVNC)
# Access through web-based VNC proxy
Configuration: VNC is automatically configured during VM creation. The provider:
- Extracts VNC configuration from domain XML using virsh dumpxml
- Parses the graphics port number
- Constructs the VNC URL with host and port
- Returns URL in Describe operations
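The port extraction can be sketched in shell. Here a sample `<graphics>` fragment stands in for live `virsh dumpxml` output, and the hostname is an assumption:

```shell
# Parse the VNC port from a <graphics> element as emitted by `virsh dumpxml`.
XML="<graphics type='vnc' port='5900' autoport='yes' listen='0.0.0.0'/>"
PORT=$(printf '%s\n' "$XML" | sed -n "s/.*port='\([0-9]*\)'.*/\1/p")
# Construct the URL the provider returns in Describe
echo "vnc://libvirt-host.example.com:${PORT}"   # -> vnc://libvirt-host.example.com:5900
```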
Advanced Configuration
High Availability Setup
# Multiple LibVirt hosts for HA
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: libvirt-cluster
spec:
type: libvirt
# Use load balancer or failover endpoint
endpoint: "qemu+tls://libvirt-cluster.example.com:16514/system"
runtime:
replicas: 2 # Multiple provider instances
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app: libvirt-provider
topologyKey: kubernetes.io/hostname
GPU Passthrough
# VMClass with GPU passthrough
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClass
metadata:
name: gpu-workstation
spec:
cpus: 8
memory: "32Gi"
spec:
devices:
hostdev:
- type: "pci"
source:
address:
domain: "0x0000"
bus: "0x01"
slot: "0x00"
function: "0x0"
managed: true
API Reference
For complete API reference, see the Provider API Documentation.
Contributing
To contribute to the LibVirt provider:
- See the Provider Development Guide
- Check the GitHub repository
- Review open issues
Support
- Documentation: VirtRigaud Docs
- Issues: GitHub Issues
- Community: Discord
- LibVirt: libvirt.org
Proxmox VE Provider
The Proxmox VE provider enables VirtRigaud to manage virtual machines on Proxmox Virtual Environment (PVE) clusters using the native Proxmox API.
Overview
This provider implements the VirtRigaud provider interface to manage VM lifecycle operations on Proxmox VE:
- Create: Create VMs from templates or ISO images with cloud-init support
- Delete: Remove VMs and associated resources
- Power: Start, stop, and reboot virtual machines
- Describe: Query VM state, IPs, and console access
- Guest Agent Integration: Enhanced IP detection via QEMU guest agent (v0.2.3+)
- Reconfigure: Hot-plug CPU/memory changes, disk expansion
- Clone: Create linked or full clones of existing VMs
- Snapshot: Create, delete, and revert VM snapshots with memory state
- ImagePrepare: Import and prepare VM templates from URLs or ensure existence
Prerequisites
⚠️ IMPORTANT: Active Proxmox VE Server Required
The Proxmox provider requires a running Proxmox VE server to function. Unlike some providers that can operate in simulation mode, this provider performs actual API calls to Proxmox VE during startup and operation.
Requirements:
- Proxmox VE 7.0 or later (running and accessible)
- API token or user account with appropriate privileges
- Network connectivity from VirtRigaud to Proxmox API (port 8006/HTTPS)
- Valid TLS configuration (production) or skip verification (development)
Testing/Development:
If you don't have a Proxmox VE server available:
- Use Proxmox VE in a VM for testing
- Consider alternative providers (libvirt, vSphere) for local development
- The provider will fail startup validation without a reachable Proxmox endpoint
Authentication
The Proxmox provider supports two authentication methods:
API Token Authentication (Recommended)
API tokens provide secure, scope-limited access without exposing user passwords.
- Create API Token in Proxmox:
# In Proxmox web UI: Datacenter -> Permissions -> API Tokens
# Or via CLI:
pveum user token add <USER@REALM> <TOKENID> --privsep 0
- Configure Provider:
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
  name: proxmox-prod
  namespace: default
spec:
  type: proxmox
  endpoint: https://pve.example.com:8006
  credentialSecretRef:
    name: pve-credentials
  runtime:
    mode: Remote
    image: "ghcr.io/projectbeskar/virtrigaud/provider-proxmox:v0.2.3"
    service:
      port: 9090
- Create Credentials Secret:
apiVersion: v1
kind: Secret
metadata:
  name: pve-credentials
  namespace: default
type: Opaque
stringData:
  token_id: "virtrigaud@pve!vrtg-token"
  token_secret: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
Session Cookie Authentication (Optional)
For environments that cannot use API tokens:
apiVersion: v1
kind: Secret
metadata:
name: pve-credentials
namespace: default
type: Opaque
stringData:
username: "virtrigaud@pve"
password: "secure-password"
Deployment Configuration
Required Environment Variables
The Proxmox provider requires environment variables to connect to your Proxmox VE server. Configure these variables in your Helm values file:
| Variable | Required | Description | Example |
|---|---|---|---|
| PVE_ENDPOINT | ✅ Yes | Proxmox VE API endpoint URL | https://pve.example.com:8006 |
| PVE_USERNAME | ✅ Yes* | Username for password auth | root@pam or user@realm |
| PVE_PASSWORD | ✅ Yes* | Password for username | secure-password |
| PVE_TOKEN_ID | ✅ Yes** | API token ID (alternative) | user@realm!tokenid |
| PVE_TOKEN_SECRET | ✅ Yes** | API token secret (alternative) | xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx |
| PVE_INSECURE_SKIP_VERIFY | 🔵 Optional | Skip TLS verification | true (dev only) |
* Either username/password OR token authentication is required
** API token authentication is recommended for production
Helm Configuration Examples
Username/Password Authentication
# values.yaml
providers:
proxmox:
enabled: true
env:
- name: PVE_ENDPOINT
value: "https://your-proxmox-server.example.com:8006"
- name: PVE_USERNAME
value: "root@pam"
- name: PVE_PASSWORD
value: "your-secure-password"
API Token Authentication (Recommended)
# values.yaml
providers:
proxmox:
enabled: true
env:
- name: PVE_ENDPOINT
value: "https://your-proxmox-server.example.com:8006"
- name: PVE_TOKEN_ID
value: "virtrigaud@pve!automation"
- name: PVE_TOKEN_SECRET
value: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
Using Kubernetes Secrets (Production)
For production environments, use Kubernetes secrets:
# Create secret first
apiVersion: v1
kind: Secret
metadata:
name: proxmox-credentials
type: Opaque
stringData:
PVE_ENDPOINT: "https://your-proxmox-server.example.com:8006"
PVE_TOKEN_ID: "virtrigaud@pve!automation"
PVE_TOKEN_SECRET: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
---
# values.yaml - Reference the secret
providers:
proxmox:
enabled: true
env:
- name: PVE_ENDPOINT
valueFrom:
secretKeyRef:
name: proxmox-credentials
key: PVE_ENDPOINT
- name: PVE_TOKEN_ID
valueFrom:
secretKeyRef:
name: proxmox-credentials
key: PVE_TOKEN_ID
- name: PVE_TOKEN_SECRET
valueFrom:
secretKeyRef:
name: proxmox-credentials
key: PVE_TOKEN_SECRET
Configuration Validation
The provider validates configuration at startup and will fail to start if:
- ❌ PVE_ENDPOINT is missing or invalid
- ❌ Neither username/password nor token credentials are provided
- ❌ Proxmox server is unreachable
Error Examples
# Missing endpoint
ERROR Failed to create PVE client error="endpoint is required"
# Invalid endpoint format
ERROR Failed to create PVE client error="invalid endpoint URL"
# Authentication failure
ERROR Failed to authenticate error="authentication failed: invalid credentials"
# Connection failure
ERROR Failed to connect error="dial tcp: no route to host"
Development vs Production
| Environment | Endpoint | Authentication | TLS | Notes |
|---|---|---|---|---|
| Development | https://pve-test.local:8006 | Username/Password | Skip verify | Use PVE_INSECURE_SKIP_VERIFY=true |
| Staging | https://pve-staging.company.com:8006 | API Token | Custom CA | Configure CA bundle |
| Production | https://pve.company.com:8006 | API Token | Valid cert | Use Kubernetes secrets |
TLS Configuration
Self-Signed Certificates (Development)
For test environments with self-signed certificates:
spec:
runtime:
env:
- name: PVE_INSECURE_SKIP_VERIFY
value: "true"
Custom CA Certificate (Production)
For production with custom CA:
apiVersion: v1
kind: Secret
metadata:
name: pve-credentials
type: Opaque
stringData:
ca.crt: |
-----BEGIN CERTIFICATE-----
MIIDXTCCAkWgAwIBAgIJAL...
-----END CERTIFICATE-----
Reconfiguration Support
Online Reconfiguration
The Proxmox provider supports online (hot-plug) reconfiguration for:
- CPU: Add/remove vCPUs while VM is running (guest OS support required)
- Memory: Increase memory using balloon driver (guest tools required)
- Disk Expansion: Expand disks online (disk shrinking not supported)
Reconfigure Matrix
| Operation | Online Support | Requirements | Notes |
|---|---|---|---|
| CPU increase | ✅ Yes | Guest OS support | Most modern Linux/Windows |
| CPU decrease | ✅ Yes | Guest OS support | May require guest cooperation |
| Memory increase | ✅ Yes | Balloon driver | Install qemu-guest-agent |
| Memory decrease | ⚠️ Limited | Balloon driver + guest | May require power cycle |
| Disk expand | ✅ Yes | Online resize support | Filesystem resize separate |
| Disk shrink | ❌ No | Not supported | Security/data protection |
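The matrix notes that the filesystem resize after a disk expand is a separate guest-side step. The sketch below mimics the expand-only semantics locally on a sparse file; the VM ID, disk name, and guest device names in the comments are assumptions for a typical ext4 layout.

```shell
# Host side (Proxmox CLI): qm resize 100 scsi0 +10G   -- expands, never shrinks.
# Guest side afterwards (ext4 root on /dev/sda1, names are assumptions):
#   sudo growpart /dev/sda 1 && sudo resize2fs /dev/sda1
#   (for XFS: sudo xfs_growfs /)
# Mimic expand-only behavior on a sparse file:
truncate -s 1G /tmp/demo-disk.img
truncate -s 2G /tmp/demo-disk.img    # grow the backing "disk"
stat -c %s /tmp/demo-disk.img        # -> 2147483648
```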
Example Reconfiguration
# Scale up VM resources
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: web-server
spec:
# ... existing spec ...
classRef:
name: large # Changed from 'small'
---
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClass
metadata:
name: large
spec:
cpus: 8 # Increased from 2
memory: "16Gi" # Increased from 4Gi
Snapshot Management
Snapshot Features
- Memory Snapshots: Include VM memory state for consistent restore
- Crash-Consistent: Without memory for faster snapshots
- Snapshot Trees: Nested snapshots with parent-child relationships
- Metadata: Description and timestamp tracking
Snapshot Operations
# Create snapshot with memory
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMSnapshot
metadata:
name: before-upgrade
spec:
vmRef:
name: web-server
description: "Pre-maintenance snapshot"
includeMemory: true # Include running memory state
# Create snapshot via kubectl
kubectl create vmsnapshot before-upgrade \
--vm=web-server \
--description="Before major upgrade" \
--include-memory=true
Multi-NIC Networking
Network Configuration
The provider supports multiple network interfaces with:
- Bridge Assignment: Map to Proxmox bridges (vmbr0, vmbr1, etc.)
- VLAN Tagging: 802.1Q VLAN support
- Static IPs: Cloud-init integration for network configuration
- MAC Addresses: Custom MAC assignment
Example Multi-NIC VM
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: multi-nic-vm
spec:
providerRef:
name: proxmox-prod
classRef:
name: medium
imageRef:
name: ubuntu-22
networks:
# Primary LAN interface
- name: lan
bridge: vmbr0
staticIP:
address: "192.168.1.100/24"
gateway: "192.168.1.1"
dns: ["8.8.8.8", "1.1.1.1"]
# DMZ interface with VLAN
- name: dmz
bridge: vmbr1
vlan: 100
staticIP:
address: "10.0.100.50/24"
# Management interface
- name: mgmt
bridge: vmbr2
mac: "02:00:00:aa:bb:cc"
Network Bridge Mapping
| Network Name | Default Bridge | Use Case |
|---|---|---|
| lan, default | vmbr0 | General LAN connectivity |
| dmz | vmbr1 | DMZ/public services |
| mgmt, management | vmbr2 | Management network |
| vmbr* | Same name | Direct bridge reference |
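As an illustration, the name-to-bridge defaults above could be implemented as a simple lookup. This is a hypothetical sketch; resolveBridge is not part of the provider's actual API:

```go
package main

import (
	"fmt"
	"strings"
)

// resolveBridge maps a VirtualMachine network name to a Proxmox bridge,
// following the defaults in the table above.
func resolveBridge(name string) string {
	switch strings.ToLower(name) {
	case "lan", "default":
		return "vmbr0"
	case "dmz":
		return "vmbr1"
	case "mgmt", "management":
		return "vmbr2"
	}
	// Names like "vmbr5" reference a bridge directly.
	if strings.HasPrefix(name, "vmbr") {
		return name
	}
	return "vmbr0" // fall back to the default bridge
}

func main() {
	for _, n := range []string{"lan", "dmz", "vmbr5"} {
		fmt.Printf("%s -> %s\n", n, resolveBridge(n))
	}
}
```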
Configuration
Required Environment Variables
⚠️ The provider requires environment variables to connect to Proxmox VE:
| Variable | Description | Required | Default | Example |
|---|---|---|---|---|
| PVE_ENDPOINT | Proxmox API endpoint URL | Yes | - | https://pve.example.com:8006/api2 |
| PVE_TOKEN_ID | API token identifier | Yes* | - | virtrigaud@pve!vrtg-token |
| PVE_TOKEN_SECRET | API token secret | Yes* | - | xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx |
| PVE_USERNAME | Username for session auth | Yes* | - | virtrigaud@pve |
| PVE_PASSWORD | Password for session auth | Yes* | - | secure-password |
| PVE_NODE_SELECTOR | Preferred nodes (comma-separated) | No | Auto-detect | pve-node-1,pve-node-2 |
| PVE_INSECURE_SKIP_VERIFY | Skip TLS verification | No | false | true |
| PVE_CA_BUNDLE | Custom CA certificate | No | - | -----BEGIN CERTIFICATE-----... |
* Either token (PVE_TOKEN_ID + PVE_TOKEN_SECRET) or username/password (PVE_USERNAME + PVE_PASSWORD) is required
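The credential rules above (endpoint always required, plus either a token pair or a username/password pair) can be sketched as a small validation helper. This is illustrative only; validatePVEEnv and its error messages are assumptions, not the provider's actual code:

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// validatePVEEnv checks the credential rules described above:
// PVE_ENDPOINT is always required, plus either an API token pair
// or a username/password pair.
func validatePVEEnv(get func(string) string) error {
	if get("PVE_ENDPOINT") == "" {
		return errors.New("endpoint is required")
	}
	hasToken := get("PVE_TOKEN_ID") != "" && get("PVE_TOKEN_SECRET") != ""
	hasPassword := get("PVE_USERNAME") != "" && get("PVE_PASSWORD") != ""
	if !hasToken && !hasPassword {
		return errors.New("either token or username/password credentials are required")
	}
	return nil
}

func main() {
	// Validate the real process environment.
	err := validatePVEEnv(os.Getenv)
	fmt.Println("config valid:", err == nil)
}
```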
Deployment Configuration
The provider needs environment variables to connect to Proxmox. Here are complete deployment examples:
Using Helm Values
# values.yaml
providers:
proxmox:
enabled: true
env:
- name: PVE_ENDPOINT
value: "https://pve.example.com:8006/api2"
- name: PVE_TOKEN_ID
value: "virtrigaud@pve!vrtg-token"
- name: PVE_TOKEN_SECRET
value: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
- name: PVE_INSECURE_SKIP_VERIFY
value: "true" # Only for development!
- name: PVE_NODE_SELECTOR
value: "pve-node-1,pve-node-2" # Optional
Using Kubernetes Secrets (Recommended)
# Create secret with credentials
apiVersion: v1
kind: Secret
metadata:
name: proxmox-credentials
namespace: virtrigaud-system
type: Opaque
stringData:
PVE_ENDPOINT: "https://pve.example.com:8006/api2"
PVE_TOKEN_ID: "virtrigaud@pve!vrtg-token"
PVE_TOKEN_SECRET: "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"
PVE_INSECURE_SKIP_VERIFY: "false"
---
# Reference secret in deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: virtrigaud-provider-proxmox
spec:
template:
spec:
containers:
- name: provider-proxmox
image: ghcr.io/projectbeskar/virtrigaud/provider-proxmox:v0.2.3
envFrom:
- secretRef:
name: proxmox-credentials
Development/Testing Configuration
# For development with a local Proxmox VE instance
providers:
proxmox:
enabled: true
env:
- name: PVE_ENDPOINT
value: "https://192.168.1.100:8006/api2"
- name: PVE_USERNAME
value: "root@pam"
- name: PVE_PASSWORD
value: "your-password"
- name: PVE_INSECURE_SKIP_VERIFY
value: "true"
Node Selection
The provider can be configured to prefer specific nodes:
env:
- name: PVE_NODE_SELECTOR
value: "pve-node-1,pve-node-2"
If not specified, the provider will automatically select nodes based on availability.
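A sketch of how a comma-separated PVE_NODE_SELECTOR value might be parsed (hypothetical helper, not the provider's actual implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// parseNodeSelector splits a PVE_NODE_SELECTOR value into a list of
// preferred nodes; an empty result means "auto-select by availability".
func parseNodeSelector(v string) []string {
	var nodes []string
	for _, n := range strings.Split(v, ",") {
		if n = strings.TrimSpace(n); n != "" {
			nodes = append(nodes, n)
		}
	}
	return nodes
}

func main() {
	fmt.Println(parseNodeSelector("pve-node-1, pve-node-2"))
}
```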
VM Configuration
VMClass Specification
Define CPU and memory resources:
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClass
metadata:
name: small
spec:
cpus: 2
memory: "4Gi"
  # Proxmox-specific settings
  machine: "q35"
  bios: "uefi"
VMImage Specification
Reference Proxmox templates:
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMImage
metadata:
name: ubuntu-22
spec:
source: "ubuntu-22-template" # Template name in Proxmox
# Or clone from existing VM:
# source: "9000" # VMID to clone from
VirtualMachine Example
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: web-server
spec:
providerRef:
name: proxmox-prod
classRef:
name: small
imageRef:
name: ubuntu-22
powerState: On
networks:
- name: lan
# Maps to Proxmox bridge or VLAN configuration
disks:
- name: root
size: "40Gi"
userData:
cloudInit:
inline: |
#cloud-config
hostname: web-server
users:
- name: ubuntu
ssh_authorized_keys:
- "ssh-ed25519 AAAA..."
packages:
- nginx
Cloud-Init Integration
The provider automatically configures cloud-init for supported VMs:
Automatic Configuration
- IDE2 Device: Attached as cloudinit drive
- User Data: Rendered from VirtualMachine spec
- Network Config: Generated from network specifications
- SSH Keys: Extracted from userData or secrets
Static IP Configuration
Configure static IPs using cloud-init:
userData:
cloudInit:
inline: |
#cloud-config
write_files:
- path: /etc/netplan/01-static.yaml
content: |
network:
version: 2
ethernets:
ens18:
addresses: [192.168.1.100/24]
gateway4: 192.168.1.1
nameservers:
addresses: [8.8.8.8, 1.1.1.1]
Or use Proxmox IP configuration:
# This would be handled by the provider internally
# when processing network specifications
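For illustration, Proxmox expresses per-NIC addressing as an ipconfigN value such as ip=192.168.1.100/24,gw=192.168.1.1. A sketch of rendering that value from a staticIP spec (the StaticIP struct and helper are hypothetical, not the provider's internal types):

```go
package main

import (
	"fmt"
	"strings"
)

// StaticIP mirrors the staticIP fields used in the examples above.
type StaticIP struct {
	Address string // CIDR notation, e.g. "192.168.1.100/24"
	Gateway string
}

// pveIPConfig renders a Proxmox "ipconfigN" value such as
// "ip=192.168.1.100/24,gw=192.168.1.1", falling back to DHCP
// when no static address is set.
func pveIPConfig(ip *StaticIP) string {
	if ip == nil || ip.Address == "" {
		return "ip=dhcp"
	}
	parts := []string{"ip=" + ip.Address}
	if ip.Gateway != "" {
		parts = append(parts, "gw="+ip.Gateway)
	}
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println(pveIPConfig(&StaticIP{Address: "192.168.1.100/24", Gateway: "192.168.1.1"}))
}
```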
Guest Agent Integration (v0.2.3+)
The Proxmox provider now integrates with the QEMU Guest Agent for enhanced VM monitoring:
IP Address Detection
When a VM is running, the provider automatically queries the QEMU guest agent to retrieve accurate IP addresses:
# IP addresses are automatically populated in VM status
kubectl get vm my-vm -o yaml
status:
phase: Running
ipAddresses:
- 192.168.1.100
- fd00::1234:5678:9abc:def0
Features
- Automatic IP Detection: Retrieves all network interface IPs from running VMs
- IPv4 and IPv6 Support: Reports both address families
- Smart Filtering: Excludes loopback (127.0.0.1, ::1) and link-local (169.254.x.x, fe80::) addresses
- Real-time Updates: Information updated during Describe operations
- Graceful Degradation: Falls back gracefully when guest agent is not available
Requirements
For guest agent integration to work, the VM must have:
- QEMU Guest Agent Installed:
  # Ubuntu/Debian
  apt-get install qemu-guest-agent
  # CentOS/RHEL
  yum install qemu-guest-agent
  # Enable and start the service
  systemctl enable --now qemu-guest-agent
- VM Configuration: Guest agent is automatically enabled during VM creation
Implementation Details
The provider:
- Checks if VM is in running state
- Makes an API call to /api2/json/nodes/{node}/qemu/{vmid}/agent/network-get-interfaces
- Parses network interface details from the guest agent response
- Filters out irrelevant addresses (loopback, link-local)
- Populates the status.ipAddresses field
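The filtering rules above can be sketched with Go's net package (illustrative helper; the provider's actual filter may differ):

```go
package main

import (
	"fmt"
	"net"
)

// reportableIP returns true for addresses worth surfacing in
// status.ipAddresses: loopback (127.0.0.1, ::1) and link-local
// (169.254.x.x, fe80::) addresses are excluded.
func reportableIP(s string) bool {
	ip := net.ParseIP(s)
	if ip == nil {
		return false
	}
	return !ip.IsLoopback() && !ip.IsLinkLocalUnicast()
}

func main() {
	for _, a := range []string{"192.168.1.100", "127.0.0.1", "169.254.10.1", "fe80::1", "fd00::1"} {
		fmt.Printf("%-15s reportable=%v\n", a, reportableIP(a))
	}
}
```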
Troubleshooting
If IP addresses are not appearing:
- Verify the guest agent is installed: systemctl status qemu-guest-agent
- Check Proxmox VM options: qm config <vmid> | grep agent
- Ensure the VM has network connectivity
- Check provider logs for guest agent errors
Cloning Behavior
Linked Clones (Default)
Efficient space usage, faster creation:
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClone
metadata:
name: web-clone
spec:
sourceVMRef:
name: template-vm
linkedClone: true # Default
Full Clones
Independent copies, slower creation:
spec:
linkedClone: false
Snapshots
Create and manage VM snapshots:
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMSnapshot
metadata:
name: before-upgrade
spec:
vmRef:
name: web-server
description: "Snapshot before system upgrade"
Troubleshooting
Common Issues
Authentication Failures
Error: failed to connect to Proxmox VE: authentication failed
Solutions:
- Verify API token permissions
- Check token expiration
- Ensure user has VM.* privileges
TLS Certificate Errors
Error: x509: certificate signed by unknown authority
Solutions:
- Add custom CA certificate to credentials secret
- Use PVE_INSECURE_SKIP_VERIFY=true for testing
- Verify the certificate chain
VM Creation Failures
Error: create VM failed with status 400: storage 'local-lvm' does not exist
Solutions:
- Verify storage configuration in Proxmox
- Check node availability
- Ensure sufficient resources
Debug Logging
Enable debug logging for troubleshooting:
env:
- name: LOG_LEVEL
value: "debug"
Health Checks
Monitor provider health:
# Check provider pod logs
kubectl logs -n virtrigaud-system deployment/provider-proxmox
# Test connectivity
kubectl exec -n virtrigaud-system deployment/provider-proxmox -- \
curl -k https://pve.example.com:8006/api2/json/version
Performance Considerations
Resource Allocation
For production environments:
resources:
requests:
cpu: 100m
memory: 256Mi
limits:
cpu: 500m
memory: 512Mi
Concurrent Operations
The provider handles concurrent VM operations efficiently but consider:
- Node capacity limits
- Storage I/O constraints
- Network bandwidth
Task Polling
Task completion is polled every 2 seconds with a 5-minute timeout. These can be tuned via environment variables if needed.
Minimal Proxmox VE Permissions
Required API Token Permissions
Create an API token with these minimal privileges:
# Create user for VirtRigaud
pveum user add virtrigaud@pve --comment "VirtRigaud Provider"
# Create API token
pveum user token add virtrigaud@pve vrtg-token --privsep 1
# Grant minimal required permissions
pveum acl modify / --users virtrigaud@pve --roles PVEVMAdmin,PVEDatastoreUser
# Custom role with minimal permissions (alternative)
pveum role add VirtRigaud --privs "VM.Allocate,VM.Audit,VM.Config.CPU,VM.Config.Memory,VM.Config.Disk,VM.Config.Network,VM.Config.Options,VM.Monitor,VM.PowerMgmt,VM.Snapshot,VM.Clone,Datastore.Allocate,Datastore.AllocateSpace,Pool.Allocate"
pveum acl modify / --users virtrigaud@pve --roles VirtRigaud
Permission Details
| Permission | Usage | Required |
|---|---|---|
| VM.Allocate | Create new VMs | ✅ Core |
| VM.Audit | Read VM configuration | ✅ Core |
| VM.Config.* | Modify VM settings | ✅ Reconfigure |
| VM.Monitor | VM status monitoring | ✅ Core |
| VM.PowerMgmt | Power operations | ✅ Core |
| VM.Snapshot | Snapshot operations | ⚠️ Optional |
| VM.Clone | VM cloning | ⚠️ Optional |
| Datastore.Allocate | Create VM disks | ✅ Core |
| Pool.Allocate | Resource pool usage | ⚠️ Optional |
Token Rotation Procedure
# 1. Create new token
NEW_TOKEN=$(pveum user token add virtrigaud@pve vrtg-token-2 --privsep 1 --output-format json | jq -r '.value')
# 2. Update Kubernetes secret
kubectl patch secret proxmox-credentials -n virtrigaud-system --type='merge' -p='{"stringData":{"PVE_TOKEN_ID":"virtrigaud@pve!vrtg-token-2","PVE_TOKEN_SECRET":"'$NEW_TOKEN'"}}'
# 3. Restart provider to use new token
kubectl rollout restart deployment provider-proxmox -n virtrigaud-system
# 4. Verify new token works
kubectl logs deployment/provider-proxmox -n virtrigaud-system
# 5. Remove old token
pveum user token remove virtrigaud@pve vrtg-token
NetworkPolicy Examples
Production NetworkPolicy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: provider-proxmox-netpol
namespace: virtrigaud-system
spec:
podSelector:
matchLabels:
app: provider-proxmox
policyTypes: [Ingress, Egress]
ingress:
- from:
- podSelector:
matchLabels:
app: virtrigaud-manager
ports:
- port: 9443
- port: 8080
egress:
# DNS resolution
- to: []
ports:
- protocol: UDP
  port: 53
- protocol: TCP
  port: 53
# Proxmox VE API
- to:
- ipBlock:
cidr: 192.168.1.0/24 # Your PVE network
ports:
- port: 8006
Development NetworkPolicy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: provider-proxmox-dev-netpol
namespace: virtrigaud-system
spec:
podSelector:
matchLabels:
app: provider-proxmox
environment: development
policyTypes: [Egress]
egress:
- to: [] # Allow all egress for development
Storage and Placement
Storage Class Mapping
Configure storage placement for different workloads:
# High-performance storage
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClass
metadata:
name: high-performance
spec:
cpus: 8
memory: "32Gi"
storage:
class: "nvme-storage" # Maps to PVE storage
type: "thin" # Thin provisioning
# Standard storage
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClass
metadata:
name: standard
spec:
cpus: 4
memory: "8Gi"
storage:
class: "ssd-storage"
type: "thick" # Thick provisioning
Placement Policies
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMPlacementPolicy
metadata:
name: production-placement
spec:
nodeSelector:
- "pve-node-1"
- "pve-node-2"
antiAffinity:
- key: "vm.type"
operator: "In"
values: ["database"]
constraints:
maxVMsPerNode: 10
minFreeMemory: "4Gi"
Performance Testing
Load Test Results
Performance benchmarks using virtrigaud-loadgen against fake PVE server:
| Operation | P50 Latency | P95 Latency | Throughput | Notes |
|---|---|---|---|---|
| Create VM | 2.3s | 4.1s | 12 ops/min | Including cloud-init |
| Power On | 800ms | 1.2s | 45 ops/min | Async operation |
| Power Off | 650ms | 1.1s | 50 ops/min | Graceful shutdown |
| Describe | 120ms | 200ms | 200 ops/min | Status query |
| Reconfigure CPU | 1.8s | 3.2s | 15 ops/min | Online hot-plug |
| Snapshot Create | 3.5s | 6.8s | 8 ops/min | With memory |
| Clone (Linked) | 1.9s | 3.4s | 12 ops/min | Fast COW clone |
Running Performance Tests
# Deploy fake PVE server for testing
kubectl apply -f test/performance/proxmox-loadtest.yaml
# Run performance test
kubectl create job proxmox-perf-test --from=cronjob/proxmox-performance-test
# View results
kubectl logs job/proxmox-perf-test -f
Security Best Practices
- Use API Tokens: Prefer API tokens over username/password
- Least Privilege: Grant minimal required permissions (see above)
- TLS Verification: Always verify certificates in production
- Secret Management: Use Kubernetes secrets with proper RBAC
- Network Policies: Restrict provider network access (see examples)
- Regular Rotation: Rotate API tokens quarterly
- Audit Logging: Enable PVE audit logs for provider actions
- Resource Quotas: Limit provider resource consumption
Examples
Multi-Node Setup
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: proxmox-cluster
spec:
type: proxmox
endpoint: https://pve-cluster.example.com:8006
runtime:
env:
- name: PVE_NODE_SELECTOR
value: "pve-1,pve-2,pve-3"
High-Availability Configuration
apiVersion: apps/v1
kind: Deployment
metadata:
name: provider-proxmox
spec:
replicas: 2
template:
spec:
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app: provider-proxmox
topologyKey: kubernetes.io/hostname
Troubleshooting
Common Issues
❌ "endpoint is required" Error
Symptom: Provider pod crashes with ERROR Failed to create PVE client error="endpoint is required"
Cause: Missing or empty PVE_ENDPOINT environment variable
Solution:
# Ensure PVE_ENDPOINT is set in deployment
env:
- name: PVE_ENDPOINT
value: "https://your-proxmox.example.com:8006/api2"
❌ Connection Timeout/Refused
Symptom: Provider fails with connection timeouts or "connection refused"
Cause: Network connectivity issues or wrong endpoint URL
Solutions:
- Verify endpoint: Test from a pod in the cluster:
  kubectl run test-curl --rm -i --tty --image=curlimages/curl -- \
    curl -k https://your-proxmox.example.com:8006/api2/json/version
- Check firewall: Ensure port 8006 is accessible from the Kubernetes cluster
- Verify URL format: Should be https://hostname:8006/api2 (note the /api2 path)
❌ TLS Certificate Errors
Symptom: x509: certificate signed by unknown authority
Solutions:
- Development: Set PVE_INSECURE_SKIP_VERIFY=true (not for production!)
- Production: Provide valid TLS certificates or a CA bundle
❌ Authentication Failures
Symptom: 401 Unauthorized or authentication failure
Solutions:
- Verify token permissions:
  # Test the API token manually
  curl -k "https://pve.example.com:8006/api2/json/version" \
    -H "Authorization: PVEAPIToken=USER@REALM!TOKENID=SECRET"
- Check user privileges: Ensure the user has VM management permissions
- Verify token format: Should be user@realm!tokenid (note the !)
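A quick way to sanity-check the user@realm!tokenid format is a regular expression (illustrative helper; not part of the provider):

```go
package main

import (
	"fmt"
	"regexp"
)

// tokenIDPattern matches the PVE API token identifier format
// user@realm!tokenid, e.g. "virtrigaud@pve!vrtg-token".
var tokenIDPattern = regexp.MustCompile(`^[^@!\s]+@[^@!\s]+![^@!\s]+$`)

func main() {
	for _, id := range []string{"virtrigaud@pve!vrtg-token", "virtrigaud@pve", "root@pam"} {
		fmt.Printf("%-28s valid=%v\n", id, tokenIDPattern.MatchString(id))
	}
}
```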
❌ Provider Not Starting
Symptom: Pod in CrashLoopBackOff or 0/1 Ready
Diagnostic Steps:
# Check pod logs
kubectl logs -n virtrigaud-system deployment/virtrigaud-provider-proxmox
# Check environment variables
kubectl describe pod -n virtrigaud-system -l app.kubernetes.io/component=provider-proxmox
# Verify configuration
kubectl get secret proxmox-credentials -o yaml
Validation Commands
Test your Proxmox connection before deploying:
# 1. Test network connectivity
telnet your-proxmox.example.com 8006
# 2. Test API endpoint
curl -k https://your-proxmox.example.com:8006/api2/json/version
# 3. Test authentication
curl -k "https://your-proxmox.example.com:8006/api2/json/nodes" \
-H "Authorization: PVEAPIToken=USER@REALM!TOKENID=SECRET"
# 4. Test from within cluster
kubectl run debug --rm -i --tty --image=curlimages/curl -- sh
# Then run curl commands from inside the pod
Debug Logging
Enable verbose logging for the provider:
providers:
proxmox:
env:
- name: LOG_LEVEL
value: "debug"
- name: PVE_ENDPOINT
value: "https://pve.example.com:8006/api2"
API Reference
For complete API reference, see the Provider API Documentation.
Contributing
To contribute to the Proxmox provider:
- See the Provider Development Guide
- Check the GitHub repository
- Review open issues
Support
- Documentation: VirtRigaud Docs
- Issues: GitHub Issues
- Community: Discord
Provider Developer Tutorial
This comprehensive tutorial walks you through creating a complete VirtRigaud provider from scratch. By the end, you'll have a fully functional provider that can create, manage, and delete virtual machines.
Prerequisites
Before starting this tutorial, ensure you have:
- Go 1.23 or later installed
- Docker installed for containerization
- kubectl and a Kubernetes cluster (Kind/minikube for local development)
- Helm 3.x installed
- Basic understanding of gRPC and protobuf
Tutorial Overview
We'll build a File Provider that manages "virtual machines" as JSON files on disk. While not practical for production, this provider demonstrates all the core concepts without requiring actual hypervisor access.
What we'll build:
- A complete provider implementation using the VirtRigaud SDK
- Conformance tests that pass VCTS core profile
- A Helm chart for deployment
- CI/CD integration
- Publication to the provider catalog
Step 1: Initialize Your Provider Project
1.1 Create Project Structure
# Create project directory
mkdir virtrigaud-provider-file
cd virtrigaud-provider-file
# Initialize the provider project
vrtg-provider init file
The vrtg-provider init command creates the following structure:
virtrigaud-provider-file/
├── cmd/
│   └── provider-file/
│       ├── main.go
│       └── Dockerfile
├── internal/
│   └── provider/
│       ├── provider.go
│       ├── capabilities.go
│       └── provider_test.go
├── charts/
│   └── provider-file/
│       ├── Chart.yaml
│       ├── values.yaml
│       └── templates/
├── .github/
│   └── workflows/
│       └── ci.yml
├── Makefile
├── go.mod
├── go.sum
├── .gitignore
└── README.md
1.2 Examine Generated Files
main.go - Entry point that sets up the gRPC server:
package main
import (
"log"
"github.com/projectbeskar/virtrigaud/sdk/provider/server"
"github.com/projectbeskar/virtrigaud/proto/rpc/provider/v1"
"virtrigaud-provider-file/internal/provider"
)
func main() {
// Create provider instance
p, err := provider.New()
if err != nil {
log.Fatalf("Failed to create provider: %v", err)
}
// Configure server
config := &server.Config{
Port: 9443,
HealthPort: 8080,
EnableTLS: false,
}
srv, err := server.New(config)
if err != nil {
log.Fatalf("Failed to create server: %v", err)
}
// Register provider service
providerv1.RegisterProviderServiceServer(srv.GRPCServer(), p)
// Start server
log.Println("Starting file provider on port 9443...")
if err := srv.Serve(); err != nil {
log.Fatalf("Server failed: %v", err)
}
}
go.mod - Module definition with SDK dependency:
module virtrigaud-provider-file
go 1.23
require (
github.com/projectbeskar/virtrigaud/sdk v0.1.0
github.com/projectbeskar/virtrigaud/proto v0.1.0
)
Step 2: Implement the Core Provider
2.1 Design the File Provider
Our file provider will:
- Store VM metadata as JSON files in
/var/lib/virtrigaud/vms/ - Use filename as VM ID
- Simulate power operations with state files
- Support basic CRUD operations
2.2 Define the VM Model
Create internal/provider/vm.go:
package provider
import (
"encoding/json"
"fmt"
"os"
"path/filepath"
"time"
"github.com/projectbeskar/virtrigaud/proto/rpc/provider/v1"
)
type VirtualMachine struct {
ID string `json:"id"`
Name string `json:"name"`
Spec *providerv1.VMSpec `json:"spec"`
Status *providerv1.VMStatus `json:"status"`
CreatedAt time.Time `json:"created_at"`
UpdatedAt time.Time `json:"updated_at"`
}
type FileStore struct {
baseDir string
}
func NewFileStore(baseDir string) *FileStore {
return &FileStore{baseDir: baseDir}
}
func (fs *FileStore) Save(vm *VirtualMachine) error {
if err := os.MkdirAll(fs.baseDir, 0755); err != nil {
return fmt.Errorf("failed to create directory: %w", err)
}
vm.UpdatedAt = time.Now()
data, err := json.MarshalIndent(vm, "", " ")
if err != nil {
return fmt.Errorf("failed to marshal VM: %w", err)
}
filename := filepath.Join(fs.baseDir, vm.ID+".json")
return os.WriteFile(filename, data, 0644)
}
func (fs *FileStore) Load(id string) (*VirtualMachine, error) {
filename := filepath.Join(fs.baseDir, id+".json")
data, err := os.ReadFile(filename)
if err != nil {
if os.IsNotExist(err) {
return nil, fmt.Errorf("VM not found: %s", id)
}
return nil, fmt.Errorf("failed to read VM file: %w", err)
}
var vm VirtualMachine
if err := json.Unmarshal(data, &vm); err != nil {
return nil, fmt.Errorf("failed to unmarshal VM: %w", err)
}
return &vm, nil
}
func (fs *FileStore) Delete(id string) error {
filename := filepath.Join(fs.baseDir, id+".json")
if err := os.Remove(filename); err != nil && !os.IsNotExist(err) {
return fmt.Errorf("failed to delete VM file: %w", err)
}
return nil
}
func (fs *FileStore) List() ([]*VirtualMachine, error) {
entries, err := os.ReadDir(fs.baseDir)
if err != nil {
if os.IsNotExist(err) {
return []*VirtualMachine{}, nil
}
return nil, fmt.Errorf("failed to read directory: %w", err)
}
var vms []*VirtualMachine
for _, entry := range entries {
if !entry.IsDir() && filepath.Ext(entry.Name()) == ".json" {
id := entry.Name()[:len(entry.Name())-5] // Remove .json extension
vm, err := fs.Load(id)
if err != nil {
continue // Skip invalid files
}
vms = append(vms, vm)
}
}
return vms, nil
}
2.3 Implement the Provider Interface
Update internal/provider/provider.go:
package provider
import (
"context"
"fmt"
"os"
"path/filepath"
"time"
"github.com/google/uuid"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/status"
"github.com/projectbeskar/virtrigaud/proto/rpc/provider/v1"
"github.com/projectbeskar/virtrigaud/sdk/provider/capabilities"
"github.com/projectbeskar/virtrigaud/sdk/provider/errors"
)
type Provider struct {
store *FileStore
caps *capabilities.ProviderCapabilities
}
func New() (*Provider, error) {
// Get storage directory from environment or use default
baseDir := os.Getenv("PROVIDER_STORAGE_DIR")
if baseDir == "" {
baseDir = "/var/lib/virtrigaud/vms"
}
// Create capabilities
caps := &capabilities.ProviderCapabilities{
ProviderInfo: &providerv1.ProviderInfo{
Name: "file",
Version: "0.1.0",
Description: "File-based virtual machine provider for development and testing",
},
SupportedCapabilities: []capabilities.Capability{
capabilities.CapabilityCore,
capabilities.CapabilitySnapshot,
capabilities.CapabilityClone,
},
}
return &Provider{
store: NewFileStore(baseDir),
caps: caps,
}, nil
}
// GetCapabilities returns provider capabilities
func (p *Provider) GetCapabilities(ctx context.Context, req *providerv1.GetCapabilitiesRequest) (*providerv1.GetCapabilitiesResponse, error) {
return &providerv1.GetCapabilitiesResponse{
ProviderId: "file-provider",
Capabilities: []*providerv1.Capability{
{
Name: "vm.create",
Supported: true,
Description: "Create virtual machines",
},
{
Name: "vm.read",
Supported: true,
Description: "Read virtual machine information",
},
{
Name: "vm.update",
Supported: true,
Description: "Update virtual machine configuration",
},
{
Name: "vm.delete",
Supported: true,
Description: "Delete virtual machines",
},
{
Name: "vm.power",
Supported: true,
Description: "Control virtual machine power state",
},
{
Name: "vm.snapshot",
Supported: true,
Description: "Create and manage VM snapshots",
},
{
Name: "vm.clone",
Supported: true,
Description: "Clone virtual machines",
},
},
}, nil
}
// CreateVM creates a new virtual machine
func (p *Provider) CreateVM(ctx context.Context, req *providerv1.CreateVMRequest) (*providerv1.CreateVMResponse, error) {
// Validate request
if req.Name == "" {
return nil, errors.NewInvalidSpec("VM name is required")
}
if req.Spec == nil {
return nil, errors.NewInvalidSpec("VM spec is required")
}
// Generate unique ID
vmID := uuid.New().String()
// Create VM object
vm := &VirtualMachine{
ID: vmID,
Name: req.Name,
Spec: req.Spec,
Status: &providerv1.VMStatus{
State: "Creating",
Message: "VM is being created",
},
CreatedAt: time.Now(),
UpdatedAt: time.Now(),
}
// Save to store
if err := p.store.Save(vm); err != nil {
return nil, status.Errorf(codes.Internal, "failed to save VM: %v", err)
}
// Simulate creation time
go func() {
time.Sleep(2 * time.Second)
vm.Status.State = "Running"
vm.Status.Message = "VM is running"
p.store.Save(vm)
}()
return &providerv1.CreateVMResponse{
VmId: vmID,
Status: vm.Status,
}, nil
}
// GetVM retrieves virtual machine information
func (p *Provider) GetVM(ctx context.Context, req *providerv1.GetVMRequest) (*providerv1.GetVMResponse, error) {
if req.VmId == "" {
return nil, errors.NewInvalidSpec("VM ID is required")
}
vm, err := p.store.Load(req.VmId)
if err != nil {
return nil, errors.NewNotFound("VM not found: %s", req.VmId)
}
return &providerv1.GetVMResponse{
VmId: vm.ID,
Name: vm.Name,
Spec: vm.Spec,
Status: vm.Status,
}, nil
}
// UpdateVM updates virtual machine configuration
func (p *Provider) UpdateVM(ctx context.Context, req *providerv1.UpdateVMRequest) (*providerv1.UpdateVMResponse, error) {
if req.VmId == "" {
return nil, errors.NewInvalidSpec("VM ID is required")
}
vm, err := p.store.Load(req.VmId)
if err != nil {
return nil, errors.NewNotFound("VM not found: %s", req.VmId)
}
// Update spec if provided
if req.Spec != nil {
vm.Spec = req.Spec
vm.Status.Message = "VM configuration updated"
if err := p.store.Save(vm); err != nil {
return nil, status.Errorf(codes.Internal, "failed to save VM: %v", err)
}
}
return &providerv1.UpdateVMResponse{
Status: vm.Status,
}, nil
}
// DeleteVM deletes a virtual machine
func (p *Provider) DeleteVM(ctx context.Context, req *providerv1.DeleteVMRequest) (*providerv1.DeleteVMResponse, error) {
if req.VmId == "" {
return nil, errors.NewInvalidSpec("VM ID is required")
}
// Check if VM exists
_, err := p.store.Load(req.VmId)
if err != nil {
return nil, errors.NewNotFound("VM not found: %s", req.VmId)
}
// Delete VM
if err := p.store.Delete(req.VmId); err != nil {
return nil, status.Errorf(codes.Internal, "failed to delete VM: %v", err)
}
return &providerv1.DeleteVMResponse{
Success: true,
Message: "VM deleted successfully",
}, nil
}
// PowerVM controls virtual machine power state
func (p *Provider) PowerVM(ctx context.Context, req *providerv1.PowerVMRequest) (*providerv1.PowerVMResponse, error) {
if req.VmId == "" {
return nil, errors.NewInvalidSpec("VM ID is required")
}
vm, err := p.store.Load(req.VmId)
if err != nil {
return nil, errors.NewNotFound("VM not found: %s", req.VmId)
}
// Update power state based on operation
switch req.PowerOp {
case providerv1.PowerOp_POWER_OP_ON:
vm.Status.State = "Running"
vm.Status.Message = "VM is running"
case providerv1.PowerOp_POWER_OP_OFF:
vm.Status.State = "Stopped"
vm.Status.Message = "VM is stopped"
case providerv1.PowerOp_POWER_OP_REBOOT:
vm.Status.State = "Rebooting"
vm.Status.Message = "VM is rebooting"
// Simulate reboot
go func() {
time.Sleep(3 * time.Second)
vm.Status.State = "Running"
vm.Status.Message = "VM is running"
p.store.Save(vm)
}()
default:
return nil, errors.NewInvalidSpec("unsupported power operation: %v", req.PowerOp)
}
if err := p.store.Save(vm); err != nil {
return nil, status.Errorf(codes.Internal, "failed to save VM: %v", err)
}
return &providerv1.PowerVMResponse{
Status: vm.Status,
}, nil
}
// ListVMs lists all virtual machines
func (p *Provider) ListVMs(ctx context.Context, req *providerv1.ListVMsRequest) (*providerv1.ListVMsResponse, error) {
vms, err := p.store.List()
if err != nil {
return nil, status.Errorf(codes.Internal, "failed to list VMs: %v", err)
}
var vmInfos []*providerv1.VMInfo
for _, vm := range vms {
vmInfos = append(vmInfos, &providerv1.VMInfo{
VmId: vm.ID,
Name: vm.Name,
Status: vm.Status,
})
}
return &providerv1.ListVMsResponse{
Vms: vmInfos,
}, nil
}
// CreateSnapshot creates a VM snapshot
func (p *Provider) CreateSnapshot(ctx context.Context, req *providerv1.CreateSnapshotRequest) (*providerv1.CreateSnapshotResponse, error) {
if req.VmId == "" {
return nil, errors.NewInvalidSpec("VM ID is required")
}
vm, err := p.store.Load(req.VmId)
if err != nil {
return nil, errors.NewNotFound("VM not found: %s", req.VmId)
}
// Create snapshot (simulate by copying VM file)
snapshotID := uuid.New().String()
snapshotPath := filepath.Join(filepath.Dir(p.store.baseDir), "snapshots")
if err := os.MkdirAll(snapshotPath, 0755); err != nil {
return nil, status.Errorf(codes.Internal, "failed to create snapshot directory: %v", err)
}
// Copy VM data to snapshot
snapshotVM := *vm
snapshotVM.ID = snapshotID
snapshotStore := NewFileStore(snapshotPath)
if err := snapshotStore.Save(&snapshotVM); err != nil {
return nil, status.Errorf(codes.Internal, "failed to save snapshot: %v", err)
}
return &providerv1.CreateSnapshotResponse{
SnapshotId: snapshotID,
Status: &providerv1.TaskStatus{
State: "Completed",
Message: "Snapshot created successfully",
},
}, nil
}
// CloneVM clones a virtual machine
func (p *Provider) CloneVM(ctx context.Context, req *providerv1.CloneVMRequest) (*providerv1.CloneVMResponse, error) {
if req.SourceVmId == "" {
return nil, errors.NewInvalidSpec("Source VM ID is required")
}
if req.CloneName == "" {
return nil, errors.NewInvalidSpec("Clone name is required")
}
// Load source VM
sourceVM, err := p.store.Load(req.SourceVmId)
if err != nil {
return nil, errors.NewNotFound("Source VM not found: %s", req.SourceVmId)
}
// Create clone
cloneID := uuid.New().String()
cloneVM := &VirtualMachine{
ID: cloneID,
Name: req.CloneName,
Spec: sourceVM.Spec, // Copy spec from source
Status: &providerv1.VMStatus{
State: "Stopped",
Message: "Clone created successfully",
},
CreatedAt: time.Now(),
UpdatedAt: time.Now(),
}
if err := p.store.Save(cloneVM); err != nil {
return nil, status.Errorf(codes.Internal, "failed to save clone: %v", err)
}
return &providerv1.CloneVMResponse{
CloneVmId: cloneID,
Status: &providerv1.TaskStatus{
State: "Completed",
Message: "VM cloned successfully",
},
}, nil
}
Step 3: Add Tests and Validation
3.1 Create Unit Tests
Create internal/provider/provider_test.go:
package provider
import (
"context"
"os"
"path/filepath"
"testing"
"time"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"github.com/projectbeskar/virtrigaud/proto/rpc/provider/v1"
)
func TestProvider_CreateVM(t *testing.T) {
// Create temporary directory for testing
tmpDir, err := os.MkdirTemp("", "file-provider-test")
require.NoError(t, err)
defer os.RemoveAll(tmpDir)
// Set storage directory
os.Setenv("PROVIDER_STORAGE_DIR", tmpDir)
defer os.Unsetenv("PROVIDER_STORAGE_DIR")
// Create provider
p, err := New()
require.NoError(t, err)
// Test VM creation
req := &providerv1.CreateVMRequest{
Name: "test-vm",
Spec: &providerv1.VMSpec{
Cpu: 2,
Memory: 4096,
Image: "ubuntu:20.04",
},
}
resp, err := p.CreateVM(context.Background(), req)
require.NoError(t, err)
assert.NotEmpty(t, resp.VmId)
assert.Equal(t, "Creating", resp.Status.State)
// Verify VM file was created
vmFile := filepath.Join(tmpDir, resp.VmId+".json")
assert.FileExists(t, vmFile)
}
func TestProvider_GetVM(t *testing.T) {
tmpDir, err := os.MkdirTemp("", "file-provider-test")
require.NoError(t, err)
defer os.RemoveAll(tmpDir)
os.Setenv("PROVIDER_STORAGE_DIR", tmpDir)
defer os.Unsetenv("PROVIDER_STORAGE_DIR")
p, err := New()
require.NoError(t, err)
// Create VM first
createReq := &providerv1.CreateVMRequest{
Name: "test-vm",
Spec: &providerv1.VMSpec{
Cpu: 2,
Memory: 4096,
},
}
createResp, err := p.CreateVM(context.Background(), createReq)
require.NoError(t, err)
// Get VM
getReq := &providerv1.GetVMRequest{
VmId: createResp.VmId,
}
getResp, err := p.GetVM(context.Background(), getReq)
require.NoError(t, err)
assert.Equal(t, createResp.VmId, getResp.VmId)
assert.Equal(t, "test-vm", getResp.Name)
assert.Equal(t, int32(2), getResp.Spec.Cpu)
}
func TestProvider_PowerVM(t *testing.T) {
tmpDir, err := os.MkdirTemp("", "file-provider-test")
require.NoError(t, err)
defer os.RemoveAll(tmpDir)
os.Setenv("PROVIDER_STORAGE_DIR", tmpDir)
defer os.Unsetenv("PROVIDER_STORAGE_DIR")
p, err := New()
require.NoError(t, err)
// Create VM
createReq := &providerv1.CreateVMRequest{
Name: "test-vm",
Spec: &providerv1.VMSpec{Cpu: 1, Memory: 1024},
}
createResp, err := p.CreateVM(context.Background(), createReq)
require.NoError(t, err)
// Power off VM
powerReq := &providerv1.PowerVMRequest{
VmId: createResp.VmId,
PowerOp: providerv1.PowerOp_POWER_OP_OFF,
}
powerResp, err := p.PowerVM(context.Background(), powerReq)
require.NoError(t, err)
assert.Equal(t, "Stopped", powerResp.Status.State)
// Power on VM
powerReq.PowerOp = providerv1.PowerOp_POWER_OP_ON
powerResp, err = p.PowerVM(context.Background(), powerReq)
require.NoError(t, err)
assert.Equal(t, "Running", powerResp.Status.State)
}
func TestProvider_GetCapabilities(t *testing.T) {
p, err := New()
require.NoError(t, err)
req := &providerv1.GetCapabilitiesRequest{}
resp, err := p.GetCapabilities(context.Background(), req)
require.NoError(t, err)
assert.Equal(t, "file-provider", resp.ProviderId)
assert.NotEmpty(t, resp.Capabilities)
// Check for core capabilities
capNames := make(map[string]bool)
for _, cap := range resp.Capabilities {
capNames[cap.Name] = cap.Supported
}
assert.True(t, capNames["vm.create"])
assert.True(t, capNames["vm.read"])
assert.True(t, capNames["vm.delete"])
assert.True(t, capNames["vm.power"])
}
func TestProvider_CloneVM(t *testing.T) {
tmpDir, err := os.MkdirTemp("", "file-provider-test")
require.NoError(t, err)
defer os.RemoveAll(tmpDir)
os.Setenv("PROVIDER_STORAGE_DIR", tmpDir)
defer os.Unsetenv("PROVIDER_STORAGE_DIR")
p, err := New()
require.NoError(t, err)
// Create source VM
createReq := &providerv1.CreateVMRequest{
Name: "source-vm",
Spec: &providerv1.VMSpec{
Cpu: 4,
Memory: 8192,
Image: "centos:8",
},
}
createResp, err := p.CreateVM(context.Background(), createReq)
require.NoError(t, err)
// Clone VM
cloneReq := &providerv1.CloneVMRequest{
SourceVmId: createResp.VmId,
CloneName: "cloned-vm",
}
cloneResp, err := p.CloneVM(context.Background(), cloneReq)
require.NoError(t, err)
assert.NotEmpty(t, cloneResp.CloneVmId)
assert.NotEqual(t, createResp.VmId, cloneResp.CloneVmId)
// Verify clone has same specs as source
getReq := &providerv1.GetVMRequest{
VmId: cloneResp.CloneVmId,
}
getResp, err := p.GetVM(context.Background(), getReq)
require.NoError(t, err)
assert.Equal(t, "cloned-vm", getResp.Name)
assert.Equal(t, int32(4), getResp.Spec.Cpu)
assert.Equal(t, int32(8192), getResp.Spec.Memory)
assert.Equal(t, "centos:8", getResp.Spec.Image)
}
3.2 Add Build and Test Targets
Update the Makefile:
# File Provider Makefile
.PHONY: help build test lint clean run docker-build docker-push
help: ## Show this help message
@echo 'Usage: make [target]'
@echo ''
@echo 'Targets:'
@awk 'BEGIN {FS = ":.*?## "} /^[a-zA-Z_-]+:.*?## / {printf " %-15s %s\n", $$1, $$2}' $(MAKEFILE_LIST)
build: ## Build the provider binary
go build -o bin/provider-file ./cmd/provider-file
test: ## Run tests
go test -v ./...
test-coverage: ## Run tests with coverage
go test -v -coverprofile=coverage.out ./...
go tool cover -html=coverage.out -o coverage.html
lint: ## Run linters
golangci-lint run ./...
clean: ## Clean build artifacts
rm -rf bin/
rm -f coverage.out coverage.html
run: build ## Run the provider locally
PROVIDER_STORAGE_DIR=/tmp/virtrigaud-file ./bin/provider-file
docker-build: ## Build Docker image
docker build -f cmd/provider-file/Dockerfile -t provider-file:latest .
docker-push: docker-build ## Build and push Docker image
docker tag provider-file:latest ghcr.io/yourorg/provider-file:latest
docker push ghcr.io/yourorg/provider-file:latest
# Development targets
dev-setup: ## Set up development environment
go mod download
go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
integration-test: build ## Run integration tests
./scripts/integration-test.sh
Step 4: Test with VCTS (VirtRigaud Conformance Test Suite)
4.1 Install VCTS
# Build VCTS from the main repository
go install github.com/projectbeskar/virtrigaud/cmd/vcts@latest
4.2 Create VCTS Configuration
Create vcts-config.yaml:
provider:
name: "file"
endpoint: "localhost:9443"
tls: false
profiles:
core:
enabled: true
vm_specs:
- name: "basic"
cpu: 1
memory: 1024
image: "test:latest"
- name: "medium"
cpu: 2
memory: 4096
image: "ubuntu:20.04"
snapshot:
enabled: true
clone:
enabled: true
tests:
timeout: "30s"
parallel: false
cleanup: true
4.3 Run Conformance Tests
# Start the provider
make run &
PROVIDER_PID=$!
# Wait for provider to start
sleep 3
# Run VCTS core profile
vcts run --config vcts-config.yaml --profile core
# Run all enabled profiles
vcts run --config vcts-config.yaml --profile all
# Stop the provider
kill $PROVIDER_PID
Expected output:
✓ Core Profile Tests
  ✓ Provider.GetCapabilities
  ✓ Provider.CreateVM
  ✓ Provider.GetVM
  ✓ Provider.UpdateVM
  ✓ Provider.DeleteVM
  ✓ Provider.PowerVM
  ✓ Provider.ListVMs
✓ Snapshot Profile Tests
  ✓ Provider.CreateSnapshot
✓ Clone Profile Tests
  ✓ Provider.CloneVM

🎉 All tests passed! Provider is conformant.
Step 5: Create Helm Chart for Deployment
5.1 Chart Structure
The generated chart in charts/provider-file/ includes:
charts/provider-file/
├── Chart.yaml
├── values.yaml
├── templates/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── serviceaccount.yaml
│   ├── rbac.yaml
│   └── _helpers.tpl
└── examples/
    └── values-development.yaml
5.2 Customize Chart Values
Update charts/provider-file/values.yaml:
# Default values for provider-file
replicaCount: 1
image:
repository: ghcr.io/yourorg/provider-file
pullPolicy: IfNotPresent
tag: "0.1.0"
nameOverride: ""
fullnameOverride: ""
serviceAccount:
create: true
annotations: {}
name: ""
podAnnotations: {}
podSecurityContext:
fsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 1000
service:
type: ClusterIP
port: 9443
healthPort: 8080
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 100m
memory: 128Mi
nodeSelector: {}
tolerations: []
affinity: {}
# Provider-specific configuration
provider:
storageDir: "/var/lib/virtrigaud/vms"
logLevel: "info"
# Persistent storage for VM data
persistence:
enabled: true
accessMode: ReadWriteOnce
size: 10Gi
storageClass: ""
5.3 Test Helm Chart
# Lint the chart
helm lint charts/provider-file/
# Template the chart
helm template provider-file charts/provider-file/ \
--values charts/provider-file/values.yaml
# Install to local cluster
helm install provider-file charts/provider-file/ \
--namespace provider-file \
--create-namespace \
--values charts/provider-file/examples/values-development.yaml
Step 6: Set Up CI/CD
6.1 GitHub Actions Workflow
The generated .github/workflows/ci.yml includes:
name: CI
on:
push:
branches: [ main, develop ]
pull_request:
branches: [ main, develop ]
env:
GO_VERSION: '1.23'
jobs:
test:
name: Test
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Set up Go
uses: actions/setup-go@v4
with:
go-version: ${{ env.GO_VERSION }}
- name: Run tests
run: make test
- name: Run linting
run: make lint
build:
name: Build
runs-on: ubuntu-latest
needs: test
steps:
- uses: actions/checkout@v4
- name: Set up Go
uses: actions/setup-go@v4
with:
go-version: ${{ env.GO_VERSION }}
- name: Build binary
run: make build
- name: Build Docker image
run: make docker-build
conformance:
name: Conformance Tests
runs-on: ubuntu-latest
needs: build
steps:
- uses: actions/checkout@v4
- name: Set up Go
uses: actions/setup-go@v4
with:
go-version: ${{ env.GO_VERSION }}
- name: Build provider
run: make build
- name: Install VCTS
run: go install github.com/projectbeskar/virtrigaud/cmd/vcts@latest
- name: Run conformance tests
run: |
# Start provider in background
PROVIDER_STORAGE_DIR=/tmp/vcts-test ./bin/provider-file &
PROVIDER_PID=$!
# Wait for startup
sleep 5
# Run VCTS
vcts run --config vcts-config.yaml --profile core
# Clean up
kill $PROVIDER_PID
release:
name: Release
runs-on: ubuntu-latest
needs: [test, build, conformance]
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
steps:
- uses: actions/checkout@v4
- name: Build and push Docker image
run: |
echo ${{ secrets.GITHUB_TOKEN }} | docker login ghcr.io -u ${{ github.actor }} --password-stdin
make docker-push
- name: Package Helm chart
run: |
helm package charts/provider-file/ -d dist/
- name: Upload artifacts
uses: actions/upload-artifact@v4
with:
name: release-artifacts
path: |
bin/
dist/
Step 7: Publish to Provider Catalog
7.1 Run Provider Verification
# Verify the provider meets all requirements
vrtg-provider verify --profile all
7.2 Publish to Catalog
# Publish to the VirtRigaud provider catalog
vrtg-provider publish \
--name file \
--image ghcr.io/yourorg/provider-file \
--tag 0.1.0 \
--repo https://github.com/yourorg/virtrigaud-provider-file \
--maintainer your-email@example.com \
--license Apache-2.0
This command will:
- Run VCTS conformance tests
- Generate a provider badge
- Create a catalog entry
- Open a pull request to the main VirtRigaud repository
7.3 Example Catalog Entry
The generated catalog entry will look like:
- name: file
displayName: "File Provider"
description: "File-based virtual machine provider for development and testing"
repo: "https://github.com/yourorg/virtrigaud-provider-file"
image: "ghcr.io/yourorg/provider-file"
tag: "0.1.0"
capabilities:
- core
- snapshot
- clone
conformance:
profiles:
core: pass
snapshot: pass
clone: pass
image-prepare: skip
advanced: skip
report_url: "https://github.com/yourorg/virtrigaud-provider-file/actions"
badge_url: "https://img.shields.io/badge/conformance-pass-green"
last_tested: "2025-08-26T15:00:00Z"
maintainer: "your-email@example.com"
license: "Apache-2.0"
maturity: "beta"
tags:
- file
- development
- testing
documentation: "https://github.com/yourorg/virtrigaud-provider-file/blob/main/README.md"
Step 8: Production Considerations
8.1 Security Hardening
# Production values.yaml
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop:
- ALL
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 65534
podSecurityContext:
fsGroup: 65534
runAsNonRoot: true
runAsUser: 65534
seccompProfile:
type: RuntimeDefault
networkPolicy:
enabled: true
ingress:
fromNamespaces:
- virtrigaud-system
egress:
- to: []
ports:
- protocol: UDP
port: 53
8.2 Observability
Add monitoring and logging:
// Add to provider.go
import (
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
)
var (
vmOperations = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "file_provider_vm_operations_total",
Help: "Total number of VM operations",
},
[]string{"operation", "status"},
)
vmOperationDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "file_provider_vm_operation_duration_seconds",
Help: "Duration of VM operations",
},
[]string{"operation"},
)
)
func (p *Provider) CreateVM(ctx context.Context, req *providerv1.CreateVMRequest) (*providerv1.CreateVMResponse, error) {
start := time.Now()
defer func() {
vmOperationDuration.WithLabelValues("create").Observe(time.Since(start).Seconds())
}()
// ... existing implementation ...
vmOperations.WithLabelValues("create", "success").Inc()
return resp, nil
}
8.3 Performance Optimization
- Add connection pooling for gRPC clients
- Implement caching for frequently accessed VMs
- Use background workers for long-running operations
- Add rate limiting and request validation
8.4 Error Handling and Resilience
- Implement circuit breakers for external dependencies
- Add retry logic with exponential backoff
- Use structured logging with correlation IDs
- Implement graceful shutdown handling
Conclusion
You've successfully created a complete VirtRigaud provider! This tutorial covered:
✅ Provider Implementation - Full gRPC service with all core operations
✅ SDK Integration - Using VirtRigaud SDK for server setup and utilities
✅ Testing - Unit tests and VCTS conformance validation
✅ Containerization - Docker images and Helm charts
✅ CI/CD - Automated testing and publishing
✅ Catalog Integration - Publishing to the provider ecosystem
Next Steps
1. Explore Advanced Features:
   - Add image management capabilities
   - Implement networking configuration
   - Add storage volume management
2. Integration Examples:
   - Connect to real hypervisors (libvirt, vSphere, etc.)
   - Add authentication and authorization
   - Implement backup and disaster recovery
3. Community Contribution:
   - Submit your provider to the catalog
   - Contribute improvements to the SDK
   - Help other developers with provider development
4. Production Deployment:
   - Set up monitoring and alerting
   - Implement proper security measures
   - Plan for scaling and high availability
For more information, visit the VirtRigaud documentation or join our community discussions.
Versioning & Breaking Changes
This document outlines VirtRigaud's approach to versioning, compatibility, and managing breaking changes across the provider ecosystem.
Overview
VirtRigaud follows semantic versioning (SemVer) principles and maintains backward compatibility through careful API design and migration strategies. The system has multiple versioning dimensions:
- VirtRigaud Core - The main platform (API server, manager, CRDs)
- Provider SDK - Go SDK for building providers
- Proto Contracts - gRPC/protobuf API definitions
- Individual Providers - Each provider has independent versioning
Semantic Versioning
All VirtRigaud components follow Semantic Versioning 2.0.0:
Version Format: MAJOR.MINOR.PATCH
- MAJOR (X.0.0): Breaking changes that require user action
- MINOR (0.X.0): New features that are backward compatible
- PATCH (0.0.X): Bug fixes and security updates
Examples
1.0.0 → 1.0.1 # Patch: Bug fixes only
1.0.1 → 1.1.0 # Minor: New features, backward compatible
1.1.0 → 2.0.0 # Major: Breaking changes
Component Versioning Strategy
VirtRigaud Core APIs
Kubernetes-style API versioning with multiple supported versions:
# Supported API versions
apiVersion: infra.virtrigaud.io/v1alpha1 # Development/preview
apiVersion: infra.virtrigaud.io/v1beta1 # Pre-release/testing
apiVersion: infra.virtrigaud.io/v1 # Stable/production
Stability Levels:
- Alpha (v1alpha1): Experimental, may change or be removed
- Beta (v1beta1): Well-tested, minimal changes expected
- Stable (v1): Production-ready, strong backward compatibility
Support Windows:
- Alpha: Best effort, no guarantees
- Beta: Supported for 2 minor releases after stable equivalent
- Stable: Supported for 12 months after deprecation
Provider SDK Versioning
SDK versions are independent of core VirtRigaud versions:
// Go module versioning
module github.com/projectbeskar/virtrigaud/sdk
// Version tags
sdk/v0.1.0 # Initial release
sdk/v0.2.0 # New features
sdk/v1.0.0 # First stable release
sdk/v2.0.0 # Breaking changes (new module path: sdk/v2)
SDK Compatibility Matrix:
| SDK Version | VirtRigaud Core | Go Version | Status |
|---|---|---|---|
| v0.1.x | 0.1.0 - 0.2.x | 1.23+ | Beta |
| v1.0.x | 0.2.0 - 1.0.x | 1.23+ | Stable |
| v1.1.x | 0.3.0 - 1.1.x | 1.23+ | Stable |
| v2.0.x | 1.0.0+ | 1.24+ | Future |
Proto Contract Versioning
Protobuf APIs use both module versions and service versions:
// Service versioning in proto files
package provider.v1;
service ProviderService {
// API methods
}
// Module versioning
module github.com/projectbeskar/virtrigaud/proto
Proto Evolution Rules:
- ✅ Add new fields (with proper defaults)
- ✅ Add new RPC methods
- ✅ Add new enum values
- ❌ Remove fields or methods
- ❌ Change field types or semantics
- ❌ Remove enum values
Provider Versioning
Each provider maintains independent versioning:
# Provider catalog entry
name: vsphere
tag: "1.2.3" # Provider version
sdk_version: "v1.0.0" # SDK dependency
proto_version: "v0.1.0" # Proto dependency
Breaking Change Policy
What Constitutes a Breaking Change
API Breaking Changes:
- Removing or renaming API fields
- Changing field types or semantics
- Removing API endpoints or methods
- Changing required vs optional fields
- Modifying default behaviors
- Changing error codes or messages that clients depend on
SDK Breaking Changes:
- Removing public functions, types, or methods
- Changing function signatures
- Modifying struct fields (without proper backward compatibility)
- Changing package import paths
- Removing or renaming configuration options
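By contrast, many additions can be made without breaking the SDK surface. A generic Go sketch of the functional-options pattern (illustrative only; `NewClient`, `Config`, and `WithTimeout` are hypothetical, not the VirtRigaud SDK API), which lets new options ship in minor releases without changing signatures:

```go
package main

import "fmt"

// Config is an SDK configuration struct: adding exported fields is
// backward compatible, while removing or retyping fields is breaking.
type Config struct {
	Endpoint string
	Timeout  int // seconds
}

// Option is a functional option; new options can be introduced in minor
// releases without touching NewClient's signature.
type Option func(*Config)

// WithTimeout overrides the default timeout.
func WithTimeout(seconds int) Option {
	return func(c *Config) { c.Timeout = seconds }
}

// NewClient keeps a stable signature even as the option set grows.
func NewClient(endpoint string, opts ...Option) *Config {
	cfg := &Config{Endpoint: endpoint, Timeout: 30} // sensible default
	for _, opt := range opts {
		opt(cfg)
	}
	return cfg
}

func main() {
	c := NewClient("localhost:9443", WithTimeout(60))
	fmt.Println(c.Endpoint, c.Timeout)
}
```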
Proto Breaking Changes:
- Removing fields or RPC methods
- Changing field numbers or types
- Removing enum values
- Modifying service or method names
Breaking Change Process
1. Proposal Phase
# Breaking Change Proposal: [Title]
## Summary
Brief description of the change and motivation.
## Motivation
Why is this change necessary? What problems does it solve?
## Proposed Changes
Detailed description of the changes.
## Migration Path
How will users migrate from old to new behavior?
## Timeline
- Deprecation announcement: v1.1.0
- Breaking change implementation: v2.0.0
- Legacy support removal: v3.0.0
## Alternatives Considered
What other approaches were considered?
2. Deprecation Phase
// Deprecated functions include clear migration guidance
// Deprecated: Use NewCreateVMRequest instead. Will be removed in v2.0.0.
func CreateVM(name string) *VMRequest {
return &VMRequest{Name: name}
}
// New recommended approach
func NewCreateVMRequest(spec *VMSpec) *CreateVMRequest {
return &CreateVMRequest{Spec: spec}
}
3. Migration Tools
# Migration command examples
vrtg-provider migrate --from v1 --to v2
vrtg-provider check-compatibility --target-version v2.0.0
4. Communication
- Release notes with migration guide
- Blog posts for major changes
- Community discussions and Q&A
- Updated documentation
Compatibility Testing
Automated Compatibility Checks
# .github/workflows/compatibility.yml
name: Compatibility Check
jobs:
compatibility-matrix:
strategy:
matrix:
sdk_version: [v1.0.0, v1.1.0, current]
provider_version: [v1.0.0, v1.1.0, current]
steps:
- name: Test SDK ${{ matrix.sdk_version }} with Provider ${{ matrix.provider_version }}
run: |
# Build provider with specific SDK version
# Run conformance tests
# Report compatibility results
Buf Proto Compatibility
# proto/buf.yaml
version: v1
breaking:
use:
# Prevent breaking changes
- FILE_NO_DELETE
- FIELD_NO_DELETE
- FIELD_SAME_TYPE
- ENUM_VALUE_NO_DELETE
- RPC_NO_DELETE
- SERVICE_NO_DELETE
ignore:
# Allowed changes during alpha/beta
- "provider/v1beta1"
# Check for breaking changes
buf breaking --against 'https://github.com/projectbeskar/virtrigaud.git#branch=main'
Provider Compatibility Testing
# Test provider against multiple VirtRigaud versions
vcts run --provider ./provider --virtrigaud-version 0.1.0
vcts run --provider ./provider --virtrigaud-version 0.2.0
vcts run --provider ./provider --virtrigaud-version 1.0.0
Migration Strategies
API Version Migration
Example: VirtualMachine v1alpha1 → v1beta1
// Conversion webhook approach
func (src *v1alpha1.VirtualMachine) ConvertTo(dst *v1beta1.VirtualMachine) error {
// Convert common fields
dst.ObjectMeta = src.ObjectMeta
// Handle field migrations
if src.Spec.PowerState == "On" {
dst.Spec.PowerState = v1beta1.PowerStateOn
}
// Set new fields with appropriate defaults
if dst.Spec.Phase == "" {
dst.Spec.Phase = v1beta1.PhaseUnknown
}
return nil
}
Gradual Migration Process
# Phase 1: Dual support (both versions work)
kubectl apply -f vm-v1alpha1.yaml # Still works
kubectl apply -f vm-v1beta1.yaml # Also works
# Phase 2: Deprecation warning
kubectl apply -f vm-v1alpha1.yaml
# Warning: v1alpha1 is deprecated, use v1beta1
# Phase 3: Conversion only (internal storage uses v1beta1)
kubectl apply -f vm-v1alpha1.yaml # Automatically converted
# Phase 4: Removal (after support window)
kubectl apply -f vm-v1alpha1.yaml # Error: version not supported
Provider SDK Migration
Example: SDK v1 → v2
SDK v1 (deprecated):
// Old SDK pattern
func NewProvider(config Config) *Provider {
return &Provider{config: config}
}
func (p *Provider) CreateVM(name string, cpu int, memory int) error {
// Implementation
}
SDK v2 (new):
// New SDK pattern with better types
func NewProvider(config *Config) (*Provider, error) {
if err := config.Validate(); err != nil {
return nil, err
}
return &Provider{config: config}, nil
}
func (p *Provider) CreateVM(ctx context.Context, req *CreateVMRequest) (*CreateVMResponse, error) {
// Implementation with proper context and structured types
}
Migration Bridge:
// sdk/v2/compat/v1.go - Compatibility layer
package compat
import (
v1 "github.com/projectbeskar/virtrigaud/sdk/provider"
v2 "github.com/projectbeskar/virtrigaud/sdk/v2/provider"
)
// Bridge for gradual migration
func AdaptV1Provider(v1Provider v1.Provider) v2.Provider {
return &v1ProviderAdapter{old: v1Provider}
}
type v1ProviderAdapter struct {
old v1.Provider
}
func (a *v1ProviderAdapter) CreateVM(ctx context.Context, req *v2.CreateVMRequest) (*v2.CreateVMResponse, error) {
// Convert v2 request to v1 format
err := a.old.CreateVM(req.Name, int(req.Spec.CPU), int(req.Spec.Memory))
// Convert v1 response to v2 format
if err != nil {
return nil, err
}
return &v2.CreateVMResponse{
Status: "Created",
}, nil
}
Configuration Migration
Example: Configuration Schema Changes
v1 Configuration:
# provider-config-v1.yaml
provider:
type: "vsphere"
server: "vcenter.example.com"
username: "admin"
password: "secret"
v2 Configuration:
# provider-config-v2.yaml
apiVersion: config.virtrigaud.io/v2
kind: ProviderConfig
metadata:
name: vsphere-config
spec:
type: "vsphere"
connection:
endpoint: "vcenter.example.com"
authentication:
method: "basic"
secretRef:
name: "vsphere-credentials"
features:
snapshots: true
cloning: true
Migration Command:
# Automatic migration tool
vrtg-provider config migrate \
--from provider-config-v1.yaml \
--to provider-config-v2.yaml \
--create-secret vsphere-credentials
Release Planning
Release Cadence
- Patch releases: As needed for critical bugs/security
- Minor releases: Every 2-3 months
- Major releases: Every 12-18 months
Feature Lifecycle
Experimental → Alpha → Beta → Stable → Deprecated → Removed
     |           |      |       |          |           |
     |           |      |       |          |           +-- After support window
     |           |      |       |          +-- 2 releases notice
     |           |      |       +-- Production ready
     |           |      +-- Pre-release testing
     |           +-- Public preview
     +-- Internal/development only
Release Branch Strategy
main # Current development
├── release-0.1 # Patch releases for v0.1.x
├── release-0.2 # Patch releases for v0.2.x
└── release-1.0 # Patch releases for v1.0.x
Support Matrix
| Version | Status | Support Level | End of Life |
|---|---|---|---|
| 1.0.x | Stable | Full support | 2026-01-01 |
| 0.2.x | Stable | Security only | 2025-06-01 |
| 0.1.x | Deprecated | None | 2025-01-01 |
Best Practices
For Provider Developers
1. Version Dependencies Carefully
   // Use specific versions, not floating
   require github.com/projectbeskar/virtrigaud/sdk v1.2.3
2. Test Compatibility Early
   # Test against multiple SDK versions
   go mod edit -require=github.com/projectbeskar/virtrigaud/sdk@v1.1.0
   go test ./...
   go mod edit -require=github.com/projectbeskar/virtrigaud/sdk@v1.2.0
   go test ./...
3. Handle Deprecations Gracefully
   // Check for deprecated features
   if provider.IsDeprecated("vm.legacy-create") {
       log.Warn("Using deprecated API, migrate to vm.create")
   }
4. Document Breaking Changes
   # CHANGELOG.md
   ## [2.0.0] - 2025-01-15
   ### BREAKING CHANGES
   - Removed deprecated `CreateVM` method, use `CreateVMRequest` instead
   - Changed configuration format, see migration guide
   ### Migration Guide
   Old: `provider.CreateVM("vm1", 2, 4096)`
   New: `provider.CreateVM(ctx, &CreateVMRequest{...})`
For Users
1. Pin Versions in Production
   # Helm values
   image:
     tag: "1.2.3" # Not "latest"
2. Test Upgrades in Staging
   # Upgrade strategy
   helm upgrade provider-test virtrigaud/provider \
     --version 1.3.0 \
     --namespace staging
3. Monitor Deprecation Warnings
   # Check for deprecation warnings
   kubectl logs -l app=provider | grep -i deprecat
4. Plan Migration Windows
   # Schedule upgrades during maintenance windows
   # Have rollback plans ready
   # Test compatibility thoroughly
Future Considerations
Long-term Compatibility
- 10-year Support Goal: Core APIs should remain usable for 10 years
- Gradual Evolution: Prefer gradual evolution over revolutionary changes
- Ecosystem Stability: Consider impact on the entire provider ecosystem
Emerging Standards
- OCI Compliance: Align with OCI runtime and image standards
- CNCF Integration: Follow CNCF project graduation requirements
- Industry Standards: Adopt relevant industry standards as they emerge
Technology Evolution
- Go Version Support: Support 2-3 latest Go versions
- Kubernetes Compatibility: Support 3-4 latest Kubernetes versions
- gRPC Evolution: Adapt to gRPC and protobuf improvements
This versioning strategy ensures VirtRigaud can evolve while maintaining stability and compatibility for the provider ecosystem.
Advanced VM Lifecycle Management
This document describes the advanced VM lifecycle features in VirtRigaud, including reconfiguration, snapshots, cloning, multi-VM sets, and placement policies.
Overview
VirtRigaud Stage E introduces comprehensive VM lifecycle management capabilities that go beyond basic create/delete operations:
- VM Reconfiguration: Modify CPU, memory, and disk resources of running VMs
- Snapshot Management: Create, delete, and revert VM snapshots
- VM Cloning: Create new VMs from existing ones with linked clone support
- Multi-VM Sets: Manage groups of VMs with rolling updates
- Placement Policies: Advanced placement rules and anti-affinity constraints
- Image Preparation: Automated image import and preparation workflows
VM Reconfiguration
Online vs Offline Reconfiguration
VirtRigaud supports both online (hot) and offline reconfiguration depending on provider capabilities:
- vSphere: Supports online CPU/memory changes and hot disk expansion
- Libvirt: Typically requires a power cycle for resource changes
Example: CPU/Memory Upgrade
# Original VM with 2 CPU, 4GB RAM
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: web-server
spec:
resources:
cpu: 2
memoryMiB: 4096
# Patch to upgrade resources
# kubectl patch vm web-server --type merge -p '{"spec":{"resources":{"cpu":4,"memoryMiB":8192}}}'
The controller will:
- Detect resource changes in the VM spec
- Attempt online reconfiguration if supported
- If offline reconfiguration is required, orchestrate a graceful power cycle:
  - Set condition ReconfigurePendingPowerCycle=True
  - Power off the VM gracefully
  - Apply the reconfiguration
  - Power on the VM
- Update status.lastReconfigureTime and clear the power-cycle condition when complete
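The online-versus-offline decision can be modeled as a small predicate on the provider's advertised capabilities. This is an illustrative sketch (the `resources` type and `needsPowerCycle` helper are hypothetical, not the controller's actual code):

```go
package main

import "fmt"

// resources mirrors the VM spec's CPU/memory fields.
type resources struct {
	CPU       int
	MemoryMiB int
}

// needsPowerCycle reports whether applying a resource change requires a
// graceful power cycle, given whether the provider supports online
// reconfiguration.
func needsPowerCycle(current, desired resources, supportsOnline bool) bool {
	if current == desired {
		return false // nothing to apply
	}
	return !supportsOnline
}

func main() {
	cur := resources{CPU: 2, MemoryMiB: 4096}
	want := resources{CPU: 4, MemoryMiB: 8192}
	fmt.Println(needsPowerCycle(cur, want, true))  // vSphere-like: applies online
	fmt.Println(needsPowerCycle(cur, want, false)) // Libvirt-like: power cycle
}
```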
Disk Expansion
spec:
disks:
- name: data
sizeGiB: 100 # Expanded from 50GB
expandPolicy: "Online" # Try online first
Snapshot Management
Creating Snapshots
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMSnapshot
metadata:
name: pre-maintenance-backup
spec:
vmRef:
name: web-server
nameHint: "maintenance-backup"
memory: true # Include memory state
description: "Backup before maintenance"
retentionPolicy:
maxAge: "7d"
deleteOnVMDelete: true
Snapshot Lifecycle
- Creating: Snapshot creation in progress
- Ready: Snapshot available for use
- Deleting: Snapshot being removed
- Failed: Snapshot operation failed
Reverting to Snapshots
# Patch VM to revert to snapshot
spec:
snapshot:
revertToRef:
name: pre-maintenance-backup
The controller will:
- Power off the VM if running
- Call the provider's SnapshotRevert RPC
- Power on the VM
- Clear revertToRef when complete
VM Cloning
Basic Cloning
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClone
metadata:
name: web-server-clone
spec:
sourceRef:
name: web-server
target:
name: web-server-test
classRef:
name: test-class
linked: true # Faster, space-efficient
powerOn: true
Clone Customization
spec:
customization:
hostname: web-server-test
networks:
- name: primary
ipAddress: "192.168.1.100"
gateway: "192.168.1.1"
dns: ["8.8.8.8"]
userData:
cloudInit:
inline: |
#cloud-config
runcmd:
- echo "Test environment" > /etc/motd
Multi-VM Sets (VMSet)
VMSets provide declarative management of multiple VMs with rolling updates.
Basic VMSet
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMSet
metadata:
name: web-tier
spec:
replicas: 3
selector:
matchLabels:
app: web-server
template:
metadata:
labels:
app: web-server
spec:
providerRef:
name: vsphere-prod
classRef:
name: web-class
imageRef:
name: nginx-image
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
maxSurge: 1
Rolling Updates
When you update the template spec, VMSet will:
- Create new VMs with the updated configuration
- Wait for new VMs to be ready
- Delete old VMs, respecting maxUnavailable
- Continue until all replicas are updated
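One round of that replacement loop can be sketched as a pure function: pick which old-revision VMs may be deleted without exceeding `maxUnavailable`. This is an illustrative model (the `vmInstance` type and `rollingUpdateStep` helper are hypothetical), not the VMSet controller's code:

```go
package main

import "fmt"

// vmInstance is a hypothetical VMSet member tagged with a template revision.
type vmInstance struct {
	Name     string
	Revision string
	Ready    bool
}

// rollingUpdateStep returns the old-revision VMs that may be deleted this
// round: not-ready VMs already count against the unavailability budget.
func rollingUpdateStep(vms []vmInstance, targetRev string, maxUnavailable int) []string {
	unavailable := 0
	for _, vm := range vms {
		if !vm.Ready {
			unavailable++
		}
	}
	var toDelete []string
	for _, vm := range vms {
		if vm.Revision != targetRev && vm.Ready && unavailable < maxUnavailable {
			toDelete = append(toDelete, vm.Name)
			unavailable++
		}
	}
	return toDelete
}

func main() {
	vms := []vmInstance{
		{Name: "web-0", Revision: "v1", Ready: true},
		{Name: "web-1", Revision: "v1", Ready: true},
		{Name: "web-2", Revision: "v2", Ready: true},
	}
	// With maxUnavailable=1, at most one old VM is replaced per round.
	fmt.Println(rollingUpdateStep(vms, "v2", 1))
}
```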
Placement Policies
Advanced Placement Rules
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMPlacementPolicy
metadata:
name: production-policy
spec:
hard:
clusters: ["prod-cluster-1", "prod-cluster-2"]
datastores: ["ssd-datastore-1", "ssd-datastore-2"]
hosts: ["esxi-01", "esxi-02", "esxi-03"]
soft:
folders: ["/Production/WebServers"]
zones: ["zone-a", "zone-b"]
antiAffinity:
hostAntiAffinity: true # Spread across hosts
clusterAntiAffinity: false
datastoreAntiAffinity: true # Spread across datastores
Using Placement Policies
spec:
placementRef:
name: production-policy
The provider will attempt to satisfy:
- Hard constraints: Must be satisfied
- Soft constraints: Best effort
- Anti-affinity rules: Avoid co-location
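The hard/soft split can be illustrated with a toy scorer: hard constraints filter candidates outright, while soft constraints only raise a candidate's score. This is a hedged sketch (the `host`, `policy`, and `place` names are hypothetical), not VirtRigaud's placement engine:

```go
package main

import "fmt"

// host is a hypothetical placement candidate.
type host struct {
	Name    string
	Cluster string
	Zone    string
}

// policy mirrors the hard/soft split: hard constraints filter,
// soft constraints only influence scoring.
type policy struct {
	HardClusters []string
	SoftZones    []string
}

func contains(list []string, v string) bool {
	for _, s := range list {
		if s == v {
			return true
		}
	}
	return false
}

// place returns the best candidate: hosts violating hard constraints are
// rejected; soft-constraint matches merely score higher.
func place(candidates []host, p policy) (host, bool) {
	var best host
	bestScore := -1
	for _, h := range candidates {
		if len(p.HardClusters) > 0 && !contains(p.HardClusters, h.Cluster) {
			continue // hard constraint: must be satisfied
		}
		score := 0
		if contains(p.SoftZones, h.Zone) {
			score++ // soft constraint: best effort
		}
		if score > bestScore {
			best, bestScore = h, score
		}
	}
	return best, bestScore >= 0
}

func main() {
	hosts := []host{
		{"esxi-01", "prod-cluster-1", "zone-a"},
		{"esxi-09", "dev-cluster", "zone-a"},
	}
	p := policy{HardClusters: []string{"prod-cluster-1"}, SoftZones: []string{"zone-a"}}
	h, ok := place(hosts, p)
	fmt.Println(h.Name, ok)
}
```

Anti-affinity would extend the scorer by penalizing candidates already hosting VMs from the same group.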
Image Preparation
Automated Image Import
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMImage
metadata:
name: ubuntu-22-04
spec:
vsphere:
ovaURL: "https://releases.ubuntu.com/22.04/ubuntu-22.04-server.ova"
checksum: "sha256:abcd1234..."
libvirt:
url: "https://cloud-images.ubuntu.com/22.04/ubuntu-22.04-server.img"
format: "qcow2"
prepare:
onMissing: "Import" # Auto-import if missing
validateChecksum: true
timeout: "30m"
retries: 3
storage:
vsphere:
datastore: "images-datastore"
folder: "/Templates"
thinProvisioned: true
Image Preparation Phases
- Pending: Waiting to start preparation
- Importing: Downloading/importing image
- Preparing: Processing image (conversion, etc.)
- Ready: Image ready for use
- Failed: Preparation failed
Provider Capabilities
Different providers support different features. Query capabilities:
# Example capabilities response
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
status:
capabilities:
supportsReconfigureOnline: true # vSphere: true, Libvirt: false
supportsDiskExpansionOnline: true # vSphere: true, Libvirt: false
supportsSnapshots: true # Both: true
supportsMemorySnapshots: true # vSphere: true, Libvirt: varies
supportsLinkedClones: true # Both: true
supportsImageImport: true # Both: true
supportedDiskTypes: ["thin", "thick"]
supportedNetworkTypes: ["VMXNET3", "E1000"]
Observability
Metrics
New metrics for advanced lifecycle operations:
virtrigaud_vm_reconfigure_total{provider_type,outcome}
virtrigaud_vm_snapshot_total{action,provider_type,outcome}
virtrigaud_vm_clone_total{linked,provider_type,outcome}
virtrigaud_vm_image_prepare_total{provider_type,outcome}
Events
Detailed events for lifecycle operations:
Normal SnapshotCreating Started snapshot creation
Normal SnapshotReady Snapshot created successfully
Normal ReconfigureStarted Started VM reconfiguration
Warning ReconfigurePowerCycle Reconfiguration requires power cycle
Normal CloneCompleted VM clone created successfully
Conditions
Comprehensive condition reporting:
VM Conditions:
- Ready: VM is ready for use
- Provisioning: VM is being created
- Reconfiguring: VM is being reconfigured
- ReconfigurePendingPowerCycle: Needs power cycle for changes
Snapshot Conditions:
- Ready: Snapshot is ready
- Creating: Snapshot being created
- Deleting: Snapshot being deleted
Clone Conditions:
- Ready: Clone completed successfully
- Cloning: Clone operation in progress
- Customizing: Applying customizations
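These conditions follow the standard Kubernetes condition convention in each resource's status; a sketch of what a VM mid-reconfigure might report (reasons and messages are illustrative and vary by provider):

```yaml
# Illustrative VM status.conditions during an online reconfiguration.
status:
  conditions:
    - type: Ready
      status: "True"
      reason: VMRunning
    - type: Reconfiguring
      status: "True"
      reason: ResizeInProgress
      message: "Applying CPU change 2 -> 4"
```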
Best Practices
Snapshot Management
- Retention Policies: Always set appropriate retention policies
- Memory Snapshots: Use sparingly due to storage overhead
- Cleanup: Implement automated cleanup for old snapshots
- Testing: Test snapshot revert procedures regularly
VM Reconfiguration
- Gradual Changes: Make incremental resource changes
- Monitoring: Monitor VM performance after changes
- Rollback Plan: Have snapshots before major changes
- Capacity Planning: Ensure host resources before scaling up
Placement Policies
- Start Simple: Begin with basic constraints
- Test Anti-Affinity: Verify rules work as expected
- Monitor Placement: Check actual VM placement matches policy
- Balance Performance: Don't over-constrain placement
Multi-VM Operations
- Rolling Updates: Use appropriate maxUnavailable settings
- Health Checks: Implement proper readiness checks
- Monitoring: Monitor rollout progress
- Rollback Strategy: Plan for rollback scenarios
Troubleshooting
Common Issues
Reconfiguration Fails:
- Check provider capabilities
- Verify resource availability on host
- Check for VM tools/agent issues
Snapshot Operations Fail:
- Verify storage backend supports snapshots
- Check available storage space
- Ensure VM is not in transitional state
Clone Customization Issues:
- Verify network configuration
- Check cloud-init/guest tools
- Validate IP address availability
Placement Policy Violations:
- Check resource availability in target locations
- Verify anti-affinity rules aren't too restrictive
- Review cluster resource distribution
Debugging
# Check VM reconfiguration status
kubectl describe vm web-server
# Monitor snapshot progress
kubectl get vmsnapshots -w
# Check clone status
kubectl describe vmclone web-server-clone
# Review placement policy usage
kubectl describe vmplacementpolicy production-policy
# Check VMSet rollout
kubectl describe vmset web-tier
Migration from Basic VMs
Existing VMs can be enhanced with advanced features:
- Add Placement Policy: Update VM spec with placementRef
- Enable Reconfiguration: Add resource overrides
- Create Snapshots: Deploy VMSnapshot resources
- Scale with VMSets: Migrate to VMSet for multi-instance workloads
The controller maintains backward compatibility with existing VM definitions.
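Because the features are additive, enhancing an existing VM is a plain spec update. For example, attaching the placement policy defined earlier to the existing web-server VM changes only one field:

```yaml
# Existing VirtualMachine, updated in place: only placementRef is added.
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
  name: web-server
spec:
  providerRef:
    name: vsphere-prod
  classRef:
    name: web-class
  imageRef:
    name: nginx-image
  placementRef:
    name: production-policy   # newly added; everything else is unchanged
```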
Nested Virtualization Support
This document describes how to enable and configure nested virtualization in VirtRigaud virtual machines across different hypervisor providers.
Overview
Nested virtualization allows virtual machines to run hypervisors and create their own virtual machines. This is useful for:
- Development and testing of virtualization software
- Running container orchestration platforms like Kubernetes
- Creating nested lab environments
- Educational purposes for learning virtualization concepts
VirtRigaud supports nested virtualization through the PerformanceProfile configuration in VMClass resources.
Prerequisites
vSphere Provider
- ESXi 6.0 or later
- VM hardware version 9 or later (recommended: version 14+)
- ESXi host must have VT-x/AMD-V enabled in BIOS
- Sufficient CPU and memory resources on the ESXi host
LibVirt Provider
- QEMU/KVM hypervisor
- Host CPU with VT-x (Intel) or AMD-V (AMD) support
- Nested virtualization enabled in host kernel modules
- libvirt 1.2.13 or later
Proxmox Provider
- Proxmox VE 6.0 or later
- Host CPU with nested virtualization support
- Nested virtualization enabled in Proxmox configuration
Enabling Nested Virtualization
Nested virtualization is configured at the VMClass level using the PerformanceProfile section:
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClass
metadata:
name: nested-vm-class
namespace: virtrigaud-system
spec:
cpu: 4
memory: 8Gi
firmware: UEFI # Recommended for modern features
# Enable nested virtualization
performanceProfile:
nestedVirtualization: true
# Optional: Enable additional features
virtualizationBasedSecurity: true
cpuHotAddEnabled: true
memoryHotAddEnabled: true
# Optional: Security features that work well with nested virtualization
securityProfile:
secureBoot: false # May interfere with some nested hypervisors
tpmEnabled: false # Optional, depending on nested OS requirements
vtdEnabled: true # Enable VT-d/AMD-Vi for better performance
diskDefaults:
type: thin
size: 100Gi # Larger disk for nested VMs
Complete Example
Here's a complete example showing how to create a VM with nested virtualization support:
---
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClass
metadata:
name: hypervisor-class
namespace: default
spec:
cpu: 8
memory: 16Gi
firmware: UEFI
performanceProfile:
nestedVirtualization: true
virtualizationBasedSecurity: false # May conflict with nested hypervisors
cpuHotAddEnabled: true
memoryHotAddEnabled: true
latencySensitivity: low # Better performance for nested VMs
hyperThreadingPolicy: prefer
securityProfile:
secureBoot: false # Disable for compatibility
tpmEnabled: false
vtdEnabled: true # Enable for better I/O performance
resourceLimits:
cpuReservation: 4000 # Reserve 4GHz for nested VMs
memoryReservation: 8Gi
diskDefaults:
type: thin
size: 200Gi
storageClass: fast-ssd
---
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMImage
metadata:
name: ubuntu-server-22-04
namespace: default
spec:
source:
libvirt:
url: "https://cloud-images.ubuntu.com/releases/22.04/release/ubuntu-22.04-server-cloudimg-amd64.img"
checksum: "sha256:de5e632e17b8965f2baf4ea6d2b824788e154d9a65df4fd419ec4019898e15cd"
---
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: nested-hypervisor
namespace: default
spec:
providerRef:
name: my-provider
classRef:
name: hypervisor-class
imageRef:
name: ubuntu-server-22-04
userData:
cloudInit:
inline: |
#cloud-config
hostname: nested-hypervisor
users:
- name: ubuntu
sudo: ALL=(ALL) NOPASSWD:ALL
ssh_authorized_keys:
- ssh-rsa AAAAB3NzaC1yc2E... # Your SSH key
packages:
- qemu-kvm
- libvirt-daemon-system
- libvirt-clients
- bridge-utils
- virt-manager
runcmd:
# Enable nested virtualization verification
- echo "Checking nested virtualization support..."
- cat /proc/cpuinfo | grep -E "(vmx|svm)"
- ls -la /dev/kvm
# Configure libvirt
- systemctl enable libvirtd
- systemctl start libvirtd
- usermod -aG libvirt ubuntu
# Verify nested KVM support
- modprobe kvm_intel nested=1 || modprobe kvm_amd nested=1
- echo "Nested virtualization setup complete"
powerState: On
Provider-Specific Configuration
vSphere Provider
For vSphere, nested virtualization is enabled using the following VM configuration:
- vhv.enable = TRUE - Enables hardware-assisted virtualization
- vhv.allowNestedPageTables = TRUE - Improves nested VM performance
- Hardware version 14+ recommended for best compatibility
Additional considerations:
- Use UEFI firmware for modern guest operating systems
- Ensure sufficient CPU and memory allocation
- Consider enabling VT-d for better I/O performance
LibVirt Provider
For LibVirt/KVM, nested virtualization requires:
- Host kernel modules: kvm_intel nested=1 or kvm_amd nested=1
- CPU features: vmx (Intel) or svm (AMD) passed through to the guest
- QEMU machine type: q35 recommended for modern features
The LibVirt provider automatically configures:
<cpu mode='host-model' check='partial'>
<feature policy='require' name='vmx'/> <!-- Intel -->
<feature policy='require' name='svm'/> <!-- AMD -->
</cpu>
Proxmox Provider
For Proxmox VE, nested virtualization is configured through:
- CPU type: host or kvm64 with nested features
- Enable nested virtualization in the VM CPU configuration
- Ensure host has nested virtualization enabled
Verification
After creating a VM with nested virtualization enabled, verify the setup:
On Linux Guests
# Check for virtualization extensions
grep -E "(vmx|svm)" /proc/cpuinfo
# Verify KVM device availability
ls -la /dev/kvm
# Check nested virtualization status
cat /sys/module/kvm_intel/parameters/nested # Intel
cat /sys/module/kvm_amd/parameters/nested # AMD
# Test with a simple nested VM
virt-host-validate
On Windows Guests
# Check Hyper-V compatibility
systeminfo | findstr /i hyper
# Verify virtualization extensions
Get-ComputerInfo | Select-Object HyperV*
Performance Considerations
CPU Allocation
- Allocate sufficient CPU cores (minimum 4, recommended 8+)
- Consider CPU reservation for consistent performance
- Enable CPU hot-add for flexibility
Memory Configuration
- Allocate generous memory (minimum 8GB, recommended 16GB+)
- Consider memory reservation for nested VMs
- Enable memory hot-add for dynamic scaling
Storage
- Use fast storage (SSD/NVMe) for better nested VM performance
- Allocate sufficient disk space for multiple nested VMs
- Consider thin provisioning for efficient space usage
Network
- Configure appropriate network topology
- Consider SR-IOV for high-performance networking
- Plan IP address allocation for nested environments
Troubleshooting
Common Issues
Nested virtualization not working:
- Verify host CPU supports VT-x/AMD-V
- Check host BIOS settings
- Ensure hypervisor nested virtualization is enabled
Poor performance in nested VMs:
- Increase CPU and memory allocation
- Enable CPU/memory reservations
- Use faster storage
- Verify nested page tables are enabled
Guest OS doesn't detect virtualization extensions:
- Check VM hardware version (vSphere)
- Verify CPU feature passthrough (LibVirt)
- Ensure proper CPU type configuration (Proxmox)
Debugging Commands
# Check virtualization support on host
lscpu | grep Virtualization
# Verify KVM nested support
cat /sys/module/kvm_*/parameters/nested
# Check VM CPU features (inside guest)
lscpu | grep -E "(vmx|svm|Virtualization)"
# Test nested VM creation
virt-install --name test-nested --memory 1024 --vcpus 1 --disk size=10 --cdrom /path/to/iso
Security Considerations
Isolation
- Nested VMs add additional attack surface
- Consider network isolation for nested environments
- Implement proper access controls
Resource Limits
- Set appropriate resource limits to prevent resource exhaustion
- Monitor nested VM resource usage
- Implement quotas for nested environments
Updates and Patches
- Keep host hypervisor updated
- Maintain guest hypervisor software
- Apply security patches to nested VMs
Best Practices
Planning:
- Design nested architecture carefully
- Plan resource allocation in advance
- Consider network topology requirements
Configuration:
- Use UEFI firmware for modern features
- Enable VT-d/AMD-Vi for better performance
- Configure appropriate CPU and memory reservations
Monitoring:
- Monitor resource usage at all levels
- Set up alerting for resource exhaustion
- Track performance metrics
Maintenance:
- Back up nested environments regularly
- Plan for hypervisor updates
- Test disaster recovery procedures
Limitations
vSphere Provider
- Requires ESXi 6.0+ and hardware version 9+
- Performance overhead of 10-20% typical
- Some advanced features may not be available in nested VMs
LibVirt Provider
- Requires host kernel support
- Performance depends on host CPU features
- Limited to x86_64 architecture
Proxmox Provider
- Requires Proxmox VE 6.0+
- Performance overhead varies by workload
- Some clustering features may not work in nested environments
Support Matrix
| Provider | Min Version | Nested Support | Performance | Security Features |
|---|---|---|---|---|
| vSphere | ESXi 6.0 | Full | Good | TPM, Secure Boot |
| LibVirt | 1.2.13 | Full | Good | TPM, Secure Boot |
| Proxmox | PVE 6.0 | Planned | Good | Limited |
For more information, see the provider-specific documentation in the docs/providers/ directory.
Graceful Shutdown Feature
VirtRigaud supports graceful shutdown of virtual machines to prevent data corruption and ensure proper cleanup of running processes.
Overview
Graceful shutdown uses VM guest tools (VMware Tools, QEMU Guest Agent, etc.) to properly shut down the operating system before powering off the virtual machine. This prevents data corruption and allows applications to save their state properly.
Power States
VirtRigaud supports three power states:
- On: Power on the VM
- Off: Hard power off (immediate shutdown without guest OS notification)
- OffGraceful: Graceful shutdown using guest tools, with automatic fallback to hard power off
Configuration
Basic Usage
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: my-vm
spec:
powerState: OffGraceful # Use graceful shutdown
# ... other configuration
Advanced Configuration with Lifecycle Hooks
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: my-vm
spec:
powerState: OffGraceful
lifecycle:
# Timeout for graceful shutdown (default: 60s)
gracefulShutdownTimeout: "120s"
# Pre-stop hook runs before shutdown
preStop:
exec:
command:
- "/bin/bash"
- "-c"
- |
# Save application state
systemctl stop my-application
# Sync filesystem
sync
How It Works
vSphere Provider
- Guest Tools Check: Verifies VMware Tools is installed and running
- Graceful Shutdown: Calls vm.ShutdownGuest() to initiate OS shutdown
- Monitoring: Polls VM power state every 2 seconds
- Timeout Handling: Falls back to hard power off if the timeout is reached
- Fallback: Uses vm.PowerOff() if graceful shutdown fails
Libvirt Provider
- Graceful Attempt: Uses the virsh shutdown command
- Fallback: Falls back to virsh destroy if shutdown fails
- Guest Agent: Requires QEMU Guest Agent for best results
Proxmox Provider
- API Call: Uses the Proxmox shutdown API endpoint
- Built-in Timeout: Proxmox handles timeout and fallback internally
Default Timeouts
- vSphere: 60 seconds (configurable via gRPC request)
- Libvirt: Immediate fallback if virsh shutdown fails
- Proxmox: Managed by Proxmox server configuration
Requirements
VMware vSphere
- VMware Tools must be installed and running in the guest OS
- Guest OS must support ACPI shutdown signals
Libvirt/KVM
- QEMU Guest Agent recommended for reliable graceful shutdown
- Guest OS must support ACPI shutdown signals
Proxmox
- QEMU Guest Agent recommended
- Guest OS must support ACPI shutdown signals
Best Practices
- Always Install Guest Tools: Ensure VMware Tools or QEMU Guest Agent is installed
- Test Graceful Shutdown: Verify your VMs respond properly to shutdown signals
- Set Appropriate Timeouts: Allow enough time for applications to shut down gracefully
- Use Lifecycle Hooks: Implement pre-stop hooks for critical applications
- Monitor Logs: Check provider logs to verify graceful shutdown is working
Troubleshooting
Graceful Shutdown Not Working
Check Guest Tools Status:
# For VMware
vmware-toolbox-cmd stat running
# For QEMU/KVM
systemctl status qemu-guest-agent
Verify ACPI Support:
# Check if ACPI shutdown is supported
cat /proc/acpi/button/power/*/info
Test Manual Shutdown:
# Test graceful shutdown manually
sudo shutdown -h now
Timeout Issues
If VMs consistently hit the graceful shutdown timeout:
- Increase Timeout: Set a longer gracefulShutdownTimeout
- Optimize Applications: Ensure applications shut down quickly
- Check System Resources: Verify the system isn't under heavy load
Fallback to Hard Power Off
The provider will automatically fall back to hard power off if:
- Guest tools are not available
- Graceful shutdown times out
- Guest tools command fails
This ensures VMs are always powered off even if graceful shutdown isn't possible.
Examples
See examples/graceful-shutdown-vm.yaml for complete examples of using graceful shutdown with various configurations.
Provider Architecture
This document describes the provider architecture in VirtRigaud.
Overview
VirtRigaud uses a Remote Provider architecture where providers run as independent pods, communicating with the manager controller via gRPC. This design provides scalability, security, and reliability benefits.
Architecture
+------------------+     +---------------------+     +-------------------+
|  VirtualMachine  |     |      Provider       |     | Provider Runtime  |
|       CRD        |     |        CRD          |     |    Deployment     |
+------------------+     +---------------------+     +-------------------+
         |                          |                          ^
         v                          v                          |
+------------------+     +---------------------+               |
|     Manager      |     |      Provider       |               |
|    Controller    |     |     Controller      |               |
|                  |     |                     |               |
| +--------------+ |     |  - Creates Deploy   |               |
| | VM Reconcile | |     |  - Creates Service  |               |
| +--------------+ |     |  - Updates Status   |               |
| +--------------+ |     +---------------------+               |
| | gRPC Client  |-+-------------------------------------------+
| +--------------+ |        gRPC Connection
+------------------+        Port 9090
Provider Components
1. Provider Runtime Deployments
Each Provider resource automatically creates:
- Deployment: Runs provider-specific containers
- Service: ClusterIP service for gRPC communication
- ConfigMaps: Provider configuration
- Secret mounts: Credentials for hypervisor access
Configuration Flow: Provider Resource → Provider Pod
The VirtRigaud Provider Controller automatically translates your Provider resource configuration into the appropriate command-line arguments and environment variables for the provider pod.
Command-Line Arguments
The controller generates these arguments from your Provider spec:
| Provider Field | Generated Argument | Example |
|---|---|---|
spec.type | --provider-type | --provider-type=vsphere |
spec.endpoint | --provider-endpoint | --provider-endpoint=https://vcenter.example.com |
spec.runtime.service.port | --grpc-addr | --grpc-addr=:9090 |
| (hardcoded) | --metrics-addr | --metrics-addr=:8080 |
| (optional) | --tls-enabled | --tls-enabled=false |
Environment Variables
The controller also sets these environment variables:
| Provider Field | Environment Variable | Example |
|---|---|---|
spec.type | PROVIDER_TYPE | vsphere |
spec.endpoint | PROVIDER_ENDPOINT | https://vcenter.example.com |
metadata.namespace | PROVIDER_NAMESPACE | default |
metadata.name | PROVIDER_NAME | vsphere-datacenter |
| (optional) | TLS_ENABLED | false |
Secret Volume Mounts
Credentials from spec.credentialSecretRef are automatically mounted at:
- Mount Path: /etc/virtrigaud/credentials/
- Files Created: Each secret key becomes a file
  - username → /etc/virtrigaud/credentials/username
  - password → /etc/virtrigaud/credentials/password
  - token → /etc/virtrigaud/credentials/token
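For illustration, the relevant excerpt of the generated provider Deployment looks roughly like this. This is a sketch of what the controller produces, not something you write yourself, and the volume name is an assumption:

```yaml
# Illustrative excerpt of the controller-generated provider Deployment.
spec:
  containers:
    - name: provider
      volumeMounts:
        - name: credentials
          mountPath: /etc/virtrigaud/credentials
          readOnly: true
  volumes:
    - name: credentials
      secret:
        secretName: vsphere-credentials   # from spec.credentialSecretRef
```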
Complete Example
When you create this Provider resource:
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: vsphere-datacenter
namespace: default
spec:
type: vsphere
endpoint: "https://vcenter.example.com:443"
credentialSecretRef:
name: vsphere-credentials
runtime:
mode: Remote
image: "ghcr.io/projectbeskar/virtrigaud/provider-vsphere:v0.2.0"
service:
port: 9090
The controller automatically creates a deployment with:
Command-line arguments:
/provider-vsphere \
--grpc-addr=:9090 \
--metrics-addr=:8080 \
--provider-type=vsphere \
--provider-endpoint=https://vcenter.example.com:443 \
--tls-enabled=false
Environment variables:
PROVIDER_TYPE=vsphere
PROVIDER_ENDPOINT=https://vcenter.example.com:443
PROVIDER_NAMESPACE=default
PROVIDER_NAME=vsphere-datacenter
TLS_ENABLED=false
Volume mounts:
/etc/virtrigaud/credentials/username # Contains: admin@vsphere.local
/etc/virtrigaud/credentials/password # Contains: your-password
Key Point: You Don't Configure This Manually
The beauty of VirtRigaud's Remote Provider architecture is that you never need to manually configure command-line arguments or environment variables. Simply create the Provider resource, and the controller handles all the deployment details automatically.
2. Provider Images
Specialized images for each provider type:
- ghcr.io/projectbeskar/virtrigaud/provider-vsphere: vSphere provider with govmomi
- ghcr.io/projectbeskar/virtrigaud/provider-libvirt: LibVirt provider via virsh commands
- ghcr.io/projectbeskar/virtrigaud/provider-proxmox: Proxmox VE provider
- ghcr.io/projectbeskar/virtrigaud/provider-mock: Mock provider for testing
3. gRPC Communication
- Protocol: gRPC with protocol buffers
- Security: Secure communication over TLS (optional)
- Health: Built-in health checks and graceful shutdown
- Metrics: Prometheus metrics on port 8080
Provider Configuration
Basic Provider Setup
apiVersion: v1
kind: Secret
metadata:
name: vsphere-credentials
namespace: default
type: Opaque
stringData:
username: "admin@vsphere.local"
password: "your-password"
---
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: vsphere-datacenter
namespace: default
spec:
type: vsphere
endpoint: "https://vcenter.example.com:443"
credentialSecretRef:
name: vsphere-credentials
runtime:
mode: Remote
image: "ghcr.io/projectbeskar/virtrigaud/provider-vsphere:v0.2.0"
service:
port: 9090
Advanced Configuration
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: libvirt-cluster
namespace: production
spec:
type: libvirt
endpoint: "qemu+ssh://admin@kvm.example.com/system"
credentialSecretRef:
name: libvirt-credentials
defaults:
cluster: production
rateLimit:
qps: 20
burst: 50
runtime:
mode: Remote
image: "ghcr.io/projectbeskar/virtrigaud/provider-libvirt:v0.2.0"
replicas: 3
service:
port: 9090
resources:
requests:
cpu: "200m"
memory: "256Mi"
limits:
cpu: "2"
memory: "2Gi"
# High availability setup
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app.kubernetes.io/instance: libvirt-cluster
topologyKey: kubernetes.io/hostname
# Node placement
nodeSelector:
workload-type: compute
tolerations:
- key: "compute-dedicated"
operator: "Equal"
value: "true"
effect: "NoSchedule"
# Environment variables
env:
- name: LIBVIRT_DEBUG
value: "1"
- name: PROVIDER_TIMEOUT
value: "300s"
Security Model
Pod Security
- Non-root execution: All containers run as non-root users
- Read-only filesystem: Immutable container filesystem
- Minimal capabilities: Reduced Linux capabilities
- Security contexts: Enforced via deployment templates
Credential Isolation
- Separated secrets: Each provider has dedicated credential secrets
- Scoped access: Providers only access their own hypervisor credentials
- RBAC isolation: Fine-grained RBAC per provider namespace
Network Security
- Service mesh ready: Compatible with Istio/Linkerd
- Network policies: Optional traffic restrictions
- TLS support: Secure gRPC communication (configurable)
Communication Protocol
gRPC Service Definition
service Provider {
rpc Validate(ValidateRequest) returns (ValidateResponse);
rpc Create(CreateRequest) returns (CreateResponse);
rpc Delete(DeleteRequest) returns (TaskResponse);
rpc Power(PowerRequest) returns (TaskResponse);
rpc Reconfigure(ReconfigureRequest) returns (TaskResponse);
rpc Describe(DescribeRequest) returns (DescribeResponse);
rpc TaskStatus(TaskStatusRequest) returns (TaskStatusResponse);
rpc ListCapabilities(CapabilitiesRequest) returns (CapabilitiesResponse);
}
Error Handling
- Retry logic: Exponential backoff for transient failures
- Circuit breakers: Prevent cascade failures
- Timeout controls: Configurable per-operation timeouts
- Status reporting: Conditions reflected in Kubernetes status
Observability
Metrics
Provider pods expose Prometheus metrics on port 8080:
# Request metrics
provider_grpc_requests_total{method="Create",status="success"} 42
provider_grpc_request_duration_seconds{method="Create",quantile="0.95"} 2.5
# VM metrics
provider_vms_total{state="running"} 15
provider_vms_total{state="stopped"} 3
# Health metrics
provider_health_status{provider="vsphere-datacenter"} 1
provider_hypervisor_connection_status{endpoint="vcenter.example.com"} 1
Logging
- Structured logs: JSON format with correlation IDs
- Log levels: Configurable verbosity (debug, info, warn, error)
- Request tracing: Context propagation across gRPC calls
Health Checks
- Kubernetes probes: Liveness and readiness probes
- gRPC health protocol: Standard health check implementation
- Hypervisor connectivity: Validates connection to external systems
Deployment Patterns
Single Provider Setup
# Simple development setup
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: dev-vsphere
spec:
type: vsphere
endpoint: "https://vcenter-dev.example.com:443"
credentialSecretRef:
name: dev-credentials
runtime:
mode: Remote
image: "ghcr.io/projectbeskar/virtrigaud/provider-vsphere:v0.2.0"
High Availability Setup
# Production HA setup
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: prod-vsphere
spec:
type: vsphere
endpoint: "https://vcenter-prod.example.com:443"
credentialSecretRef:
name: prod-credentials
runtime:
mode: Remote
image: "ghcr.io/projectbeskar/virtrigaud/provider-vsphere:v0.2.0"
replicas: 3
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchLabels:
app.kubernetes.io/instance: prod-vsphere
topologyKey: kubernetes.io/hostname
Multi-Environment Setup
# Development environment
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: dev-libvirt
namespace: development
spec:
type: libvirt
endpoint: "qemu+ssh://dev@libvirt-dev.example.com/system"
runtime:
mode: Remote
image: "ghcr.io/projectbeskar/virtrigaud/provider-libvirt:v0.2.0"
resources:
requests:
cpu: "100m"
memory: "128Mi"
---
# Production environment
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: prod-libvirt
namespace: production
spec:
type: libvirt
endpoint: "qemu+ssh://prod@libvirt-prod.example.com/system"
runtime:
mode: Remote
image: "ghcr.io/projectbeskar/virtrigaud/provider-libvirt:v0.2.0"
replicas: 2
resources:
requests:
cpu: "500m"
memory: "512Mi"
limits:
cpu: "2"
memory: "2Gi"
Benefits
Scalability
- Horizontal scaling: Multiple provider replicas per hypervisor
- Resource isolation: Independent resource allocation per provider
- Load distribution: gRPC load balancing across provider instances
Security
- Credential isolation: Hypervisor credentials isolated to provider pods
- Network segmentation: Providers can run in separate namespaces
- Least privilege: Manager runs without direct hypervisor access
Reliability
- Fault isolation: Provider failures don't affect the manager
- Independent updates: Provider images updated separately
- Circuit breaking: Automatic failure detection and recovery
Operational Excellence
- Rolling updates: Zero-downtime provider updates
- Health monitoring: Built-in health checks and metrics
- Debugging: Isolated provider logs and observability
Troubleshooting
Common Issues
Image Pull Failures:
# Check image availability
docker pull ghcr.io/projectbeskar/virtrigaud/provider-vsphere:v0.2.0
# Verify imagePullSecrets if using a private registry
kubectl get secret regcred -o yaml
Network Connectivity:
# Test provider service
kubectl get svc virtrigaud-provider-*
# Check provider pod logs
kubectl logs -l app.kubernetes.io/name=virtrigaud-provider
Credential Issues:
# Verify secret exists and is mounted
kubectl get secret vsphere-credentials
kubectl describe pod virtrigaud-provider-*
Debugging Commands
# Check provider status
kubectl describe provider vsphere-datacenter
# Check provider deployment
kubectl get deployment -l app.kubernetes.io/instance=vsphere-datacenter
# Check provider pods
kubectl get pods -l app.kubernetes.io/instance=vsphere-datacenter
# View provider logs
kubectl logs -l app.kubernetes.io/instance=vsphere-datacenter -f
# Check provider metrics
kubectl port-forward svc/virtrigaud-provider-vsphere-datacenter 8080:8080
curl http://localhost:8080/metrics
Performance Tuning
# Optimize for high-volume workloads
spec:
rateLimit:
qps: 100 # Increase API rate limit
burst: 200 # Allow burst capacity
runtime:
replicas: 5 # Scale out for throughput
resources:
requests:
cpu: "1" # Guarantee CPU resources
memory: "1Gi"
limits:
cpu: "4" # Allow burst CPU
memory: "4Gi"
Best Practices
Resource Management
- Right-sizing: Start with small requests, monitor and adjust
- Limits: Always set memory limits to prevent OOM kills
- QoS: Use Guaranteed QoS for production workloads
Security
- Secrets rotation: Implement regular credential rotation
- Network policies: Restrict provider-to-hypervisor traffic
- RBAC: Use dedicated service accounts per provider
Monitoring
- Alerting: Set up alerts on provider health metrics
- Dashboards: Create Grafana dashboards for provider metrics
- Log aggregation: Centralize logs for debugging and auditing
Migration and Upgrades
Provider Image Updates
# Update provider image
kubectl patch provider vsphere-datacenter --type=merge -p '
{
"spec": {
"runtime": {
"image": "ghcr.io/projectbeskar/virtrigaud/provider-vsphere:v0.2.0"
}
}
}'
# Monitor rollout
kubectl rollout status deployment virtrigaud-provider-vsphere-datacenter
Configuration Changes
# Update provider configuration
kubectl edit provider vsphere-datacenter
# Verify changes applied
kubectl describe provider vsphere-datacenter
VirtRigaud Observability Guide
This document describes the comprehensive observability features of VirtRigaud, including structured logging, metrics, tracing, and monitoring.
Overview
VirtRigaud provides production-grade observability through:
- Structured JSON Logging with correlation IDs and automatic secret redaction
- Comprehensive Prometheus Metrics for all components and operations
- OpenTelemetry Tracing with gRPC instrumentation
- Health Endpoints for liveness and readiness probes
- Grafana Dashboards for visualization
- Prometheus Alerts for proactive monitoring
Logging
Configuration
Configure logging via environment variables:
LOG_LEVEL=info # debug, info, warn, error
LOG_FORMAT=json # json or console
LOG_SAMPLING=true # Enable log sampling
LOG_DEVELOPMENT=false # Development mode
Correlation IDs
All log entries include correlation fields:
{
"level": "info",
"ts": "2025-01-27T10:30:45.123Z",
"msg": "VM operation started",
"correlationID": "req-12345",
"vm": "default/web-server-1",
"provider": "default/vsphere-prod",
"providerType": "vsphere",
"taskRef": "task-67890",
"reconcile": "uuid-abcdef"
}
Secret Redaction
Sensitive information is automatically redacted:
{
"msg": "Connecting to provider",
"endpoint": "vcenter://user:[REDACTED]@vc.example.com/Datacenter",
"userData": "[REDACTED]"
}
Metrics Catalog
Manager Metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
virtrigaud_manager_reconcile_total | Counter | Total reconcile operations | kind, outcome |
virtrigaud_manager_reconcile_duration_seconds | Histogram | Reconcile duration | kind |
virtrigaud_queue_depth | Gauge | Work queue depth | kind |
Provider Metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
virtrigaud_provider_rpc_requests_total | Counter | RPC requests | provider_type, method, code |
virtrigaud_provider_rpc_latency_seconds | Histogram | RPC latency | provider_type, method |
virtrigaud_provider_tasks_inflight | Gauge | Inflight tasks | provider_type, provider |
VM Operation Metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
| `virtrigaud_vm_operations_total` | Counter | VM operations | `operation`, `provider_type`, `provider`, `outcome` |
| `virtrigaud_ip_discovery_duration_seconds` | Histogram | IP discovery time | `provider_type` |
Circuit Breaker Metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
| `virtrigaud_circuit_breaker_state` | Gauge | CB state (0=closed, 1=half-open, 2=open) | `provider_type`, `provider` |
| `virtrigaud_circuit_breaker_failures_total` | Counter | CB failures | `provider_type`, `provider` |
Error Metrics
| Metric | Type | Description | Labels |
|---|---|---|---|
| `virtrigaud_errors_total` | Counter | Errors by reason | `reason`, `component` |
Tracing
Configuration
Enable OpenTelemetry tracing:
VIRTRIGAUD_TRACING_ENABLED=true
VIRTRIGAUD_TRACING_ENDPOINT=http://jaeger:14268/api/traces
VIRTRIGAUD_TRACING_SAMPLING_RATIO=0.1
VIRTRIGAUD_TRACING_INSECURE=true
Span Structure
Key spans include:
- `vm.reconcile` - Full VM reconciliation
- `vm.create` - VM creation operation
- `provider.validate` - Provider validation
- `rpc.Create` - gRPC calls to providers
Trace Attributes
Standard attributes:
vm.namespace = "default"
vm.name = "web-server-1"
provider.type = "vsphere"
operation = "Create"
task.ref = "task-12345"
Health Endpoints
HTTP Endpoints
All components expose health endpoints on port 8080:
- `GET /healthz` - Liveness probe (always returns 200)
- `GET /readyz` - Readiness probe (checks dependencies)
- `GET /health` - Detailed health status (JSON)
gRPC Health
Providers implement the standard `grpc.health.v1.Health` service for health checks.
Grafana Dashboards
Manager Dashboard
- Reconcile rates and duration
- Queue depth monitoring
- Error rate tracking
- Resource usage (CPU/memory)
Provider Dashboard
- RPC latency and error rates
- Task monitoring
- Circuit breaker status
- Provider-specific metrics
VM Lifecycle Dashboard
- Creation success rates
- IP discovery times
- Failure analysis
- Provider comparison
Prometheus Alerts
Critical Alerts
- `VirtrigaudProviderDown` - Provider unavailable
- `VirtrigaudManagerDown` - Manager unavailable
Warning Alerts
- `VirtrigaudProviderErrorRateHigh` - High error rate (>50%)
- `VirtrigaudReconcileStuck` - Slow reconciles (>5min)
- `VirtrigaudQueueBackedUp` - Queue depth >100
- `VirtrigaudCircuitBreakerOpen` - CB protection active
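As a sketch, one of the warning alerts above could be expressed as a PrometheusRule like the following (the expression and threshold are illustrative; the shipped rules live in `deploy/observability/prometheus/alerts.yaml`):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: virtrigaud-warnings
spec:
  groups:
    - name: virtrigaud.warnings
      rules:
        - alert: VirtrigaudQueueBackedUp
          expr: virtrigaud_queue_depth > 100
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "VirtRigaud work queue depth above 100 for 5 minutes"
```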
Configuration Reference
Complete Environment Variables
# Logging
LOG_LEVEL=info
LOG_FORMAT=json
LOG_SAMPLING=true
LOG_DEVELOPMENT=false
# Tracing
VIRTRIGAUD_TRACING_ENABLED=false
VIRTRIGAUD_TRACING_ENDPOINT=""
VIRTRIGAUD_TRACING_SAMPLING_RATIO=0.1
VIRTRIGAUD_TRACING_INSECURE=true
# RPC Timeouts
RPC_TIMEOUT_DESCRIBE=30s
RPC_TIMEOUT_MUTATING=4m
RPC_TIMEOUT_VALIDATE=10s
RPC_TIMEOUT_TASK_STATUS=10s
# Retry Configuration
RETRY_MAX_ATTEMPTS=5
RETRY_BASE_DELAY=500ms
RETRY_MAX_DELAY=30s
RETRY_MULTIPLIER=2.0
RETRY_JITTER=true
# Circuit Breaker
CB_FAILURE_THRESHOLD=10
CB_RESET_SECONDS=60s
CB_HALF_OPEN_MAX_CALLS=3
# Rate Limiting
RATE_LIMIT_QPS=10
RATE_LIMIT_BURST=20
# Workers
WORKERS_PER_KIND=2
MAX_INFLIGHT_TASKS=100
# Feature Gates
FEATURE_GATES=""
# Performance
VIRTRIGAUD_PPROF_ENABLED=false
VIRTRIGAUD_PPROF_ADDR=:6060
Deployment
ServiceMonitor
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: virtrigaud-manager
spec:
selector:
matchLabels:
app.kubernetes.io/name: virtrigaud
endpoints:
- port: metrics
interval: 30s
PrometheusRule
Deploy alerts:
kubectl apply -f deploy/observability/prometheus/alerts.yaml
Grafana Dashboards
Import dashboards from `deploy/observability/grafana/`.
Troubleshooting
High Error Rates
- Check provider health: `kubectl get providers`
- Review error metrics: `virtrigaud_errors_total`
- Check circuit breaker state
- Review provider logs
Slow Operations
- Check RPC latency metrics
- Review reconcile duration
- Check resource constraints
- Monitor task queue depth
Memory Issues
- Monitor `process_resident_memory_bytes`
- Check for goroutine leaks: `go_goroutines`
- Review heap usage: `go_memstats_heap_inuse_bytes`
Security Policy
Supported Versions
We actively support the following versions of VirtRigaud with security updates:
| Version | Supported |
|---|---|
| 0.1.x | :white_check_mark: |
| < 0.1 | :x: |
Reporting a Vulnerability
The VirtRigaud team takes security vulnerabilities seriously. We appreciate your efforts to responsibly disclose your findings, and will make every effort to acknowledge your contributions.
How to Report
Please do not report security vulnerabilities through public GitHub issues.
Instead, please send an email to security@virtrigaud.io with the following information:
- A description of the vulnerability
- Steps to reproduce the issue
- Potential impact
- Any possible mitigations you've identified
You should receive a response within 48 hours. If for some reason you do not, please follow up via email to ensure we received your original message.
What to Expect
- Acknowledgment: We will acknowledge receipt of your vulnerability report within 48 hours.
- Assessment: We will assess the vulnerability and determine its severity within 5 business days.
- Mitigation: For confirmed vulnerabilities, we will work on a fix and coordinate disclosure timeline with you.
- Recognition: We will credit you in our security advisory and release notes (unless you prefer to remain anonymous).
Disclosure Policy
- We ask that you do not publicly disclose the vulnerability until we have had a chance to address it.
- We will coordinate with you on an appropriate disclosure timeline.
- We typically aim to disclose within 90 days of initial report.
Security Considerations
General Security
- VirtRigaud runs with minimal privileges and follows security best practices
- All communications with providers use TLS encryption
- Sensitive data (credentials, user data) is properly handled and never logged
- RBAC is enforced to limit access to resources
Supply Chain Security
- All container images are signed with Cosign
- Software Bill of Materials (SBOM) is provided for all releases
- Container images are scanned for vulnerabilities
- Dependencies are regularly updated
Network Security
- Network policies are provided to restrict traffic
- mTLS is supported for provider communications
- No unnecessary ports are exposed
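As an illustration, a minimal NetworkPolicy restricting ingress to the manager might look like the following (the namespace, labels, and metrics port here are assumptions; see `examples/security/network-policies.yaml` for the shipped policies):

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: virtrigaud-manager
  namespace: virtrigaud-system
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: virtrigaud
  policyTypes:
    - Ingress
  ingress:
    - ports:
        - port: 8080 # health endpoints
        - port: 9090 # metrics (illustrative port)
```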
Access Control
- RBAC roles follow principle of least privilege
- Service accounts are properly scoped
- Admission webhooks enforce security policies
Vulnerability Management
Scanning
We regularly scan our codebase and dependencies for known vulnerabilities using:
- GitHub Security Advisories
- Trivy for container scanning
- Go vulnerability database
- OWASP dependency checking
Response Process
- Detection: Vulnerability discovered through scanning or reporting
- Assessment: Determine severity and impact
- Patching: Develop and test fix
- Release: Create security release with patch
- Notification: Inform users through security advisory
Severity Classification
We use the following severity levels:
- Critical: Immediate action required, patch within 24 hours
- High: Patch within 7 days
- Medium: Patch within 30 days
- Low: Patch in next regular release
Security Features
Authentication and Authorization
- Integration with Kubernetes RBAC
- Support for external identity providers
- Service account token projection
- Webhook authentication
Encryption
- TLS 1.2+ for all communications
- Certificate rotation and management
- Support for custom CA certificates
- Secrets encryption at rest (Kubernetes level)
Audit and Monitoring
- Comprehensive audit logging
- Security event monitoring
- Metrics for security-relevant events
- Integration with security monitoring tools
Best Practices for Users
Deployment Security
- Use namespace isolation: Deploy in dedicated namespace
- Apply network policies: Restrict network access
- Enable Pod Security Standards: Use strict or baseline profiles
- Regular updates: Keep VirtRigaud and dependencies updated
- Monitor security advisories: Subscribe to security notifications
Credential Management
- Use external secret management: HashiCorp Vault, External Secrets Operator
- Rotate credentials regularly: Implement credential rotation
- Principle of least privilege: Grant minimal required permissions
- Secure storage: Never store credentials in Git or plain text
Network Security
- Enable TLS: Use TLS for all communications
- Network segmentation: Isolate provider networks
- Firewall rules: Restrict hypervisor access
- VPN access: Use VPN for remote hypervisor access
Monitoring and Alerting
- Security monitoring: Monitor for security events
- Failed authentication alerts: Alert on authentication failures
- Unusual activity: Monitor for unexpected behavior
- Compliance scanning: Regular security scans
Compliance
VirtRigaud is designed to support compliance with various security frameworks:
- SOC 2: Control implementation guidance available
- ISO 27001: Security control mapping provided
- CIS Kubernetes Benchmark: Alignment with security benchmarks
- NIST Cybersecurity Framework: Control implementation guidance
Security Tools and Integrations
Supported Security Tools
- Falco: Runtime security monitoring
- OPA Gatekeeper: Policy enforcement
- Twistlock/Prisma: Container security scanning
- Aqua Security: Container and runtime security
- Cilium: Network security and observability
Security Configurations
Example security-hardened configurations are provided in:
- `examples/security/strict-rbac.yaml`
- `examples/security/network-policies.yaml`
- `examples/security/pod-security-policies.yaml`
- `examples/security/external-secrets.yaml`
Contact
For security-related questions that are not vulnerabilities, you can:
- Open a GitHub Discussion in the Security category
- Email security@virtrigaud.io
- Join the #virtrigaud-security channel on Kubernetes Slack
Recognition
We maintain a security hall of fame for researchers who have helped improve VirtRigaud security:
Thank you to all the security researchers who have contributed to making VirtRigaud more secure!
VirtRigaud Resilience Guide
This document describes the resilience patterns and error handling mechanisms in VirtRigaud.
Overview
VirtRigaud implements comprehensive resilience patterns:
- Error Taxonomy - Structured error classification
- Circuit Breakers - Protection against cascading failures
- Exponential Backoff - Intelligent retry strategies
- Timeout Policies - Prevent resource exhaustion
- Rate Limiting - Provider protection
Error Taxonomy
Error Types
VirtRigaud classifies all errors into specific categories:
| Type | Retryable | Description | Example |
|---|---|---|---|
| `NotFound` | No | Resource doesn't exist | VM not found |
| `InvalidSpec` | No | Invalid configuration | Malformed VM spec |
| `Unauthorized` | No | Authentication failed | Invalid credentials |
| `NotSupported` | No | Unsupported operation | Feature not available |
| `Retryable` | Yes | Transient error | Network timeout |
| `Unavailable` | Yes | Service unavailable | Provider down |
| `RateLimit` | Yes | Rate limited | API quota exceeded |
| `Timeout` | Yes | Operation timeout | Long-running task |
| `QuotaExceeded` | No | Resource quota hit | Storage full |
| `Conflict` | No | Resource conflict | Duplicate name |
Error Creation
import "github.com/projectbeskar/virtrigaud/internal/providers/contracts"
// Create specific error types
err := contracts.NewNotFoundError("VM not found", originalErr)
err := contracts.NewRetryableError("Network timeout", originalErr)
err := contracts.NewUnavailableError("Provider unavailable", originalErr)
// Check if error is retryable
if providerErr, ok := err.(*contracts.ProviderError); ok {
if providerErr.IsRetryable() {
// Retry the operation
}
}
Circuit Breaker Pattern
Configuration
import "github.com/projectbeskar/virtrigaud/internal/resilience"
config := &resilience.Config{
FailureThreshold: 10, // Open after 10 failures
ResetTimeout: 60 * time.Second, // Try again after 60s
HalfOpenMaxCalls: 3, // Allow 3 test calls
}
cb := resilience.NewCircuitBreaker("provider-vsphere", "vsphere", "prod", config)
Usage
err := cb.Call(ctx, func(ctx context.Context) error {
// Call the potentially failing operation
return provider.Create(ctx, request)
})
if err != nil {
// Handle error (may be circuit breaker protection)
log.Error(err, "Operation failed")
}
States
- Closed - Normal operation, failures are counted
- Open - Fast-fail mode, requests are rejected immediately
- Half-Open - Testing mode, limited requests allowed
Metrics
Circuit breaker state is exposed via metrics:
virtrigaud_circuit_breaker_state{provider_type="vsphere",provider="prod"} 0
virtrigaud_circuit_breaker_failures_total{provider_type="vsphere",provider="prod"} 5
Retry Strategies
Exponential Backoff
import "github.com/projectbeskar/virtrigaud/internal/resilience"
config := &resilience.RetryConfig{
MaxAttempts: 5,
BaseDelay: 500 * time.Millisecond,
MaxDelay: 30 * time.Second,
Multiplier: 2.0,
Jitter: true,
}
err := resilience.Retry(ctx, config, func(ctx context.Context, attempt int) error {
return provider.Describe(ctx, vmID)
})
Backoff Calculation
For attempt n:
delay = BaseDelay × Multiplier^n
delay = min(delay, MaxDelay)
if Jitter:
delay += random(0, delay * 0.1)
Example delays with BaseDelay=500ms, Multiplier=2.0:
- Attempt 0: 500ms
- Attempt 1: 1s
- Attempt 2: 2s
- Attempt 3: 4s
- Attempt 4: 8s
Predefined Configurations
// For frequent, low-latency operations
aggressive := resilience.AggressiveRetryConfig()
// MaxAttempts: 10, BaseDelay: 100ms, Multiplier: 1.5
// For expensive operations
conservative := resilience.ConservativeRetryConfig()
// MaxAttempts: 3, BaseDelay: 1s, Multiplier: 3.0
// Disable retries
none := resilience.NoRetryConfig()
// MaxAttempts: 1
Combined Resilience Policies
Policy Builder
policy := resilience.NewPolicyBuilder("vm-operations").
WithRetry(resilience.DefaultRetryConfig()).
WithCircuitBreaker(circuitBreaker).
Build()
err := policy.Execute(ctx, func(ctx context.Context) error {
return provider.Create(ctx, request)
})
Integration Example
// In VirtualMachine controller
func (r *VirtualMachineReconciler) createVM(ctx context.Context, vm *v1beta1.VirtualMachine) error {
// Get circuit breaker for this provider
cb := r.CircuitBreakerRegistry.GetOrCreate(
"vm-operations",
provider.Spec.Type,
provider.Name,
)
// Create resilience policy
policy := resilience.NewPolicyBuilder("create-vm").
WithRetry(&resilience.RetryConfig{
MaxAttempts: 3,
BaseDelay: 1 * time.Second,
MaxDelay: 30 * time.Second,
Multiplier: 2.0,
Jitter: true,
}).
WithCircuitBreaker(cb).
Build()
// Execute with resilience
return policy.Execute(ctx, func(ctx context.Context) error {
resp, err := provider.Create(ctx, createReq)
if err != nil {
return err
}
vm.Status.ID = resp.ID
vm.Status.TaskRef = resp.TaskRef
return nil
})
}
Timeout Policies
RPC Timeouts
Different operations have different timeout requirements:
// Operation-specific timeouts
config := &config.RPCConfig{
TimeoutDescribe: 30 * time.Second, // Quick status check
TimeoutMutating: 4 * time.Minute, // Create/Delete/Power
TimeoutValidate: 10 * time.Second, // Provider validation
TimeoutTaskStatus: 10 * time.Second, // Task polling
}
// Usage in gRPC client
timeout := config.GetRPCTimeout("Create")
ctx, cancel := context.WithTimeout(ctx, timeout)
defer cancel()
resp, err := client.Create(ctx, request)
Context Propagation
Always respect context deadlines:
func (p *Provider) Create(ctx context.Context, req CreateRequest) error {
// Check if context is already cancelled
select {
case <-ctx.Done():
return ctx.Err()
default:
}
// Perform operation with context
return p.performCreate(ctx, req)
}
Rate Limiting
Provider Protection
import "golang.org/x/time/rate"
// Configure rate limiter
limiter := rate.NewLimiter(
rate.Limit(config.RateLimit.QPS), // 10 requests per second
config.RateLimit.Burst, // Allow bursts of 20
)
// Check rate limit before operation
if !limiter.Allow() {
return contracts.NewRateLimitError("Rate limit exceeded", nil)
}
// Proceed with operation
return provider.Create(ctx, request)
Per-Provider Limits
Each provider instance has its own rate limiter:
type ProviderManager struct {
	mu       sync.Mutex // guards limiters against concurrent access
	limiters map[string]*rate.Limiter
}

func (pm *ProviderManager) getLimiter(providerType, provider string) *rate.Limiter {
	key := fmt.Sprintf("%s:%s", providerType, provider)

	pm.mu.Lock()
	defer pm.mu.Unlock()
	if limiter, exists := pm.limiters[key]; exists {
		return limiter
	}
	// Create a new limiter for this provider instance
	limiter := rate.NewLimiter(rate.Limit(10), 20)
	pm.limiters[key] = limiter
	return limiter
}
Condition Mapping
VM Conditions
VirtRigaud sets standard conditions based on operations:
| Condition | Status | Reason | Description |
|---|---|---|---|
| `Ready` | True | `VMReady` | VM is ready for use |
| `Ready` | False | `ProviderError` | Provider operation failed |
| `Ready` | False | `ValidationError` | Spec validation failed |
| `Provisioning` | True | `Creating` | VM creation in progress |
| `Provisioning` | False | `CreateFailed` | VM creation failed |
Provider Conditions
| Condition | Status | Reason | Description |
|---|---|---|---|
| `ProviderRuntimeReady` | True | `DeploymentReady` | Remote runtime ready |
| `ProviderRuntimeReady` | False | `DeploymentError` | Deployment failed |
| `ProviderAvailable` | True | `HealthCheckPassed` | Provider healthy |
| `ProviderAvailable` | False | `HealthCheckFailed` | Provider unhealthy |
Error to Condition Mapping
func mapErrorToCondition(err error) metav1.Condition {
if providerErr, ok := err.(*contracts.ProviderError); ok {
switch providerErr.Type {
case contracts.ErrorTypeNotFound:
return metav1.Condition{
Type: "Ready",
Status: metav1.ConditionFalse,
Reason: "ResourceNotFound",
Message: providerErr.Message,
}
case contracts.ErrorTypeUnauthorized:
return metav1.Condition{
Type: "Ready",
Status: metav1.ConditionFalse,
Reason: "AuthenticationFailed",
Message: providerErr.Message,
}
case contracts.ErrorTypeUnavailable:
return metav1.Condition{
Type: "Ready",
Status: metav1.ConditionFalse,
Reason: "ProviderUnavailable",
Message: providerErr.Message,
}
}
}
// Default error condition
return metav1.Condition{
Type: "Ready",
Status: metav1.ConditionFalse,
Reason: "InternalError",
Message: err.Error(),
}
}
Best Practices
Error Handling
- Always classify errors - Use appropriate error types
- Preserve context - Wrap errors with additional context
- Avoid retrying non-retryable errors - Check error type first
- Set meaningful conditions - Help users understand state
Circuit Breakers
- Per-provider instances - Isolate failures
- Appropriate thresholds - Balance protection vs availability
- Monitor state changes - Alert on circuit breaker trips
- Manual override - Provide way to reset if needed
Timeouts
- Operation-appropriate - Different timeouts for different ops
- Propagate context - Always pass context through
- Handle cancellation - Check context.Done() regularly
- Resource cleanup - Ensure resources are freed on timeout
Rate Limiting
- Provider protection - Prevent overwhelming providers
- Burst handling - Allow reasonable bursts
- Back-pressure - Surface rate limits to users
- Fair sharing - Consider tenant isolation
Configuration Examples
Development Environment
apiVersion: v1
kind: ConfigMap
metadata:
name: virtrigaud-config
data:
# Relaxed timeouts for development
RPC_TIMEOUT_MUTATING: "10m"
# Aggressive retries for flaky dev environments
RETRY_MAX_ATTEMPTS: "10"
RETRY_BASE_DELAY: "100ms"
# Lower circuit breaker threshold
CB_FAILURE_THRESHOLD: "5"
CB_RESET_SECONDS: "30s"
Production Environment
apiVersion: v1
kind: ConfigMap
metadata:
name: virtrigaud-config
data:
# Strict timeouts
RPC_TIMEOUT_MUTATING: "4m"
RPC_TIMEOUT_DESCRIBE: "30s"
# Conservative retries
RETRY_MAX_ATTEMPTS: "3"
RETRY_BASE_DELAY: "1s"
RETRY_MAX_DELAY: "60s"
# Higher circuit breaker threshold
CB_FAILURE_THRESHOLD: "15"
CB_RESET_SECONDS: "120s"
# Rate limiting
RATE_LIMIT_QPS: "20"
RATE_LIMIT_BURST: "50"
VirtRigaud Upgrade Guide
This guide covers upgrading VirtRigaud installations, including CRD updates and breaking changes.
Quick Upgrade
Helm-based Upgrade (Recommended)
# 1. Update Helm repository
helm repo update
# 2. Check for breaking changes (requires the helm-diff plugin)
helm diff upgrade virtrigaud virtrigaud/virtrigaud --version v0.2.1
# 3. Upgrade CRDs first (required for schema changes)
helm pull virtrigaud/virtrigaud --version v0.2.1 --untar
kubectl apply -f virtrigaud/crds/
# 4. Upgrade VirtRigaud
helm upgrade virtrigaud virtrigaud/virtrigaud \
--namespace virtrigaud-system \
--version v0.2.1
Alternative: Direct CRD Download
# Download and apply CRDs from release
curl -L "https://github.com/projectbeskar/virtrigaud/releases/download/v0.2.1/virtrigaud-crds.yaml" | kubectl apply -f -
# Upgrade application
helm upgrade virtrigaud virtrigaud/virtrigaud --version v0.2.1
Version-Specific Upgrade Notes
v0.2.0 → v0.2.1
Breaking Changes:
- PowerState validation fixed (OffGraceful now supported)
- Hardware version management added (vSphere only)
- Disk size configuration respected
Required Actions:
- CRD Update Required: New powerState validation and schema changes
- Provider Image Update: Ensure providers use v0.2.1+ images for new features
- Field Testing: Verify OffGraceful, hardware version, and disk sizing work correctly
Upgrade Steps:
# 1. Backup existing resources
kubectl get virtualmachines,vmclasses,providers -A -o yaml > virtrigaud-backup-v021.yaml
# 2. Update CRDs (fixes OffGraceful validation)
kubectl apply -f https://github.com/projectbeskar/virtrigaud/releases/download/v0.2.1/virtrigaud-crds.yaml
# 3. Upgrade VirtRigaud
helm upgrade virtrigaud virtrigaud/virtrigaud --version v0.2.1
# 4. Verify OffGraceful works
kubectl patch virtualmachine <vm-name> --type='merge' -p='{"spec":{"powerState":"OffGraceful"}}'
Rollback Procedures
Rollback to Previous Version
# 1. Rollback application
helm rollback virtrigaud <revision>
# 2. Rollback CRDs (if schema breaking changes)
kubectl apply -f https://github.com/projectbeskar/virtrigaud/releases/download/v0.2.0/virtrigaud-crds.yaml
# 3. Verify resources still work
kubectl get virtualmachines -A
Emergency Recovery
# 1. Restore from backup
kubectl apply -f virtrigaud-backup-v021.yaml
# 2. Check controller logs
kubectl logs -n virtrigaud-system deployment/virtrigaud-manager
# 3. Force reconciliation
kubectl annotate virtualmachine <vm-name> virtrigaud.io/force-sync="$(date)"
Automated Upgrade with GitOps
ArgoCD
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: virtrigaud
spec:
source:
chart: virtrigaud
repoURL: https://projectbeskar.github.io/virtrigaud
targetRevision: "0.2.1"
helm:
parameters:
- name: manager.image.tag
value: "v0.2.1"
syncPolicy:
syncOptions:
- CreateNamespace=true
- Replace=true # Required for CRD updates
Flux
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
name: virtrigaud
spec:
chart:
spec:
chart: virtrigaud
version: "0.2.1"
sourceRef:
kind: HelmRepository
name: virtrigaud
upgrade:
crds: CreateReplace # Ensure CRDs are updated
Troubleshooting Upgrades
CRD Validation Errors
# Check CRD status
kubectl get crd virtualmachines.infra.virtrigaud.io -o yaml
# Fix validation conflicts
kubectl patch crd virtualmachines.infra.virtrigaud.io --type='json' -p='[{"op": "remove", "path": "/spec/versions/0/schema/openAPIV3Schema/properties/spec/properties/powerState/allOf"}]'
Provider Image Mismatch
# Check provider images
kubectl get providers -o jsonpath='{.items[*].spec.runtime.image}'
# Update provider image
kubectl patch provider <provider-name> --type='merge' -p='{"spec":{"runtime":{"image":"ghcr.io/projectbeskar/virtrigaud/provider-vsphere:v0.2.1"}}}'
Resource Conflicts
# Check for resource conflicts
kubectl get events --sort-by=.metadata.creationTimestamp
# Force resource refresh
kubectl delete pod -l app.kubernetes.io/name=virtrigaud -n virtrigaud-system
Best Practices
Pre-Upgrade Checklist
- Backup all VirtRigaud resources
- Check for breaking changes in release notes
- Test upgrade in staging environment
- Verify provider connectivity
- Plan rollback strategy
Post-Upgrade Verification
- All CRDs updated successfully
- Controller manager running
- Providers healthy and responsive
- Existing VMs still manageable
- New features working (OffGraceful, hardware version, etc.)
Monitoring During Upgrade
# Watch controller logs
kubectl logs -n virtrigaud-system deployment/virtrigaud-manager -f
# Monitor VM status
kubectl get virtualmachines -A --watch
# Check provider health
kubectl get providers -o custom-columns=NAME:.metadata.name,STATUS:.status.conditions[0].type,MESSAGE:.status.conditions[0].message
Support and Recovery
If you encounter issues during upgrade:
- Check Release Notes: https://github.com/projectbeskar/virtrigaud/releases
- Review Logs: Controller and provider logs for error details
- Community Support: GitHub issues and discussions
- Emergency Rollback: Use documented rollback procedures
Remember: Always test upgrades in non-production environments first!
Development Workflow (v0.2.1+)
CRD Management
Starting with v0.2.1+, VirtRigaud uses a single-source-of-truth approach for CRDs:
- Code is the source of truth (API types in `api/infra.virtrigaud.io/v1beta1`)
- `config/crd/bases/` contains generated CRDs for local development and is checked into git
- `charts/virtrigaud/crds/` contains CRDs generated during Helm chart packaging and is NOT checked into git
For Developers
# Generate CRDs for local development
make gen-crds
# Generate CRDs for Helm chart packaging
make gen-helm-crds
# Package Helm chart with generated CRDs
make helm-package
Pre-commit Hooks
Install pre-commit hooks to automatically generate CRDs:
# Install pre-commit
pip install pre-commit
# Install hooks
pre-commit install
# CRDs will now be generated automatically on commits that modify:
# - api/**.go files
CI/CD Integration
The CI/CD pipeline automatically:
- Generates CRDs from code during builds
- Includes CRDs in release artifacts for users to download
- Generates Helm chart CRDs during packaging
This ensures CRDs are always up-to-date and not duplicated in the repository.
Repository Workflow
# 1. Make API changes
vim api/infra.virtrigaud.io/v1beta1/virtualmachine_types.go
# 2. Generate CRDs (automated by pre-commit)
make gen-crds
# 3. Commit changes
git add .
git commit -m "feat: add new VM power states"
# 4. CI validates and builds with generated CRDs
git push origin feature-branch
vSphere Hardware Version Management
This document describes how to configure and upgrade VM hardware compatibility versions in VMware vSphere environments using virtrigaud.
Overview
VMware vSphere virtual machines have a hardware compatibility version (also called virtual hardware version) that determines which features and capabilities are available to the VM. Higher hardware versions provide access to newer features but require compatible ESXi hosts.
Note: Hardware version management is specific to VMware vSphere and is not available for other providers (LibVirt, Proxmox, etc.).
Hardware Version Numbers
Common hardware versions and their corresponding VMware products:
| Hardware Version | vSphere/ESXi Version | Key Features |
|---|---|---|
| 10 | ESXi 5.5 | Legacy baseline |
| 11 | ESXi 6.0 | Enhanced graphics, larger VM memory |
| 13 | ESXi 6.5 | Enhanced security, more CPU/memory |
| 14 | ESXi 6.7 | Persistent memory, enhanced security |
| 15 | ESXi 6.7 U2 | Enhanced graphics, more vCPU |
| 17 | ESXi 7.0 | TPM 2.0, enhanced security |
| 18 | ESXi 7.0 U1 | Enhanced networking |
| 19 | ESXi 7.0 U2 | Precision time protocol |
| 20 | ESXi 7.0 U3 | Enhanced graphics, more memory |
| 21 | ESXi 8.0 | Latest features, DPU support |
Setting Hardware Version During VM Creation
Configure the hardware version in the VMClass using the extraConfig field:
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClass
metadata:
name: modern-vm-class
namespace: virtrigaud-system
spec:
cpu: 4
memory: 8Gi
firmware: UEFI
# vSphere-specific hardware version configuration
extraConfig:
vsphere.hardwareVersion: "21" # Use latest hardware version
diskDefaults:
type: thin
sizeGiB: 50
---
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: modern-vm
namespace: default
spec:
providerRef:
name: vsphere-provider
namespace: virtrigaud-system
classRef:
name: modern-vm-class # Uses hardware version 21
namespace: virtrigaud-system
imageRef:
name: ubuntu-22-04
namespace: virtrigaud-system
Upgrading Hardware Version for Existing VMs
You can upgrade the hardware version of existing VMs using the dedicated hardware upgrade API:
Using kubectl with Raw gRPC
# First, ensure the VM is powered off
kubectl patch vm my-vm --type='merge' -p='{"spec":{"powerState":"Off"}}'
# Wait for VM to be powered off, then upgrade hardware version
# Note: This requires direct access to the provider gRPC endpoint
# A kubectl plugin or controller extension would be needed for this operation
Programmatic Upgrade (Example Go Code)
package main
import (
	"context"
	"fmt"
	"log"

	providerv1 "github.com/projectbeskar/virtrigaud/proto/rpc/provider/v1"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func upgradeVMHardwareVersion(vmID string, targetVersion int32) error {
	// Connect to the vSphere provider.
	// Plaintext transport is used here for brevity; use TLS in production.
	conn, err := grpc.Dial("vsphere-provider:9090",
		grpc.WithTransportCredentials(insecure.NewCredentials()))
if err != nil {
return fmt.Errorf("failed to connect: %w", err)
}
defer conn.Close()
client := providerv1.NewProviderClient(conn)
// Upgrade hardware version
req := &providerv1.HardwareUpgradeRequest{
Id: vmID,
TargetVersion: targetVersion,
}
resp, err := client.HardwareUpgrade(context.Background(), req)
if err != nil {
return fmt.Errorf("hardware upgrade failed: %w", err)
}
log.Printf("Hardware upgrade completed: %+v", resp)
return nil
}
Requirements and Limitations
Prerequisites
- VM Must Be Powered Off: Hardware version upgrades require the VM to be completely powered off
- ESXi Host Compatibility: Target hardware version must be supported by the ESXi host
- VMware Tools: For best results, ensure VMware Tools is installed and up-to-date
- Backup Recommended: Take a snapshot before upgrading hardware version
Limitations
- One-Way Operation: Hardware version upgrades cannot be downgraded
- vSphere Only: This feature is not available for LibVirt, Proxmox, or other providers
- Host Requirements: Upgrading to newer versions may prevent VM from running on older ESXi hosts
- Compatibility: Some older guest operating systems may not support newer hardware versions
Best Practices
Choosing Hardware Version
- Match ESXi Version: Use the hardware version that matches your ESXi environment
- Conservative Approach: Don't always use the latest version unless you need specific features
- Test First: Test hardware version upgrades in development before production
Upgrade Process
- Plan Maintenance Window: VMs must be powered off during upgrade
- Backup First: Always take a snapshot before upgrading
- Batch Operations: Group VMs by hardware requirements for efficient upgrades
- Verify Compatibility: Ensure all ESXi hosts in your cluster support the target version
Example VMClass Configurations
Legacy Environment (ESXi 6.5)
extraConfig:
vsphere.hardwareVersion: "13"
Modern Environment (ESXi 7.0)
extraConfig:
vsphere.hardwareVersion: "17"
Latest Features (ESXi 8.0)
extraConfig:
vsphere.hardwareVersion: "21"
Troubleshooting
Common Issues
- VM Not Powered Off
  - Error: `VM must be powered off for hardware upgrade, current state: poweredOn`
  - Solution: Power off the VM first using `powerState: Off`
- Unsupported Hardware Version
  - Error: `target version vmx-21 is not supported by ESXi host`
  - Solution: Check ESXi host compatibility and use a supported version
- Version Not Newer
  - Error: `target version vmx-15 is not newer than current version vmx-17`
  - Solution: Hardware versions can only be upgraded, not downgraded
Validation
After upgrading, verify the hardware version:
# Check VM configuration in vSphere
kubectl get vm my-vm -o jsonpath='{.status.provider}'
Integration Examples
Complete VM Lifecycle with Hardware Version
# 1. Create VMClass with specific hardware version
apiVersion: infra.virtrigaud.io/v1beta1
kind: VMClass
metadata:
name: production-vm-class
spec:
cpu: 8
memory: 16Gi
firmware: UEFI
extraConfig:
vsphere.hardwareVersion: "19" # ESXi 7.0 U2 compatible
---
# 2. Create VM using the class
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: production-vm
spec:
powerState: On
providerRef:
name: vsphere-provider
namespace: virtrigaud-system
classRef:
name: production-vm-class
namespace: virtrigaud-system
imageRef:
name: ubuntu-22-04
namespace: virtrigaud-system
---
# 3. Update to newer hardware version (requires separate upgrade operation)
# This would typically be done through a controller or manual gRPC call
# after powering off the VM
This vSphere-specific feature provides fine-grained control over VM hardware capabilities while maintaining compatibility with your ESXi infrastructure.
vSphere Datastore Cluster (StoragePod) Support
This document describes how to use vSphere Datastore Clusters (also known as StoragePods) for automatic datastore selection when provisioning virtual machines with virtrigaud.
Note: StoragePod support is specific to the vSphere provider and is not available for Libvirt or Proxmox.
Overview
A vSphere Datastore Cluster (internally called a StoragePod) is a logical grouping of datastores managed together as a single unit. When you specify a Datastore Cluster instead of an individual datastore, virtrigaud automatically selects the datastore within the cluster that has the most available free space at provisioning time.
This simplifies VM placement in environments with multiple datastores: instead of tracking which individual datastore has capacity, you point to the cluster and let virtrigaud choose.
Datastore Selection Strategy
virtrigaud uses a simple, predictable strategy: pick the datastore with the most free space. This distributes VMs across the cluster over time as datastores fill up.
vSphere Storage DRS is not required to be enabled on the cluster. virtrigaud queries datastore summaries directly via the vSphere API.
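The "most free space" strategy reduces to a max-by comparison over datastore summaries. A minimal stdlib sketch of that selection (the DatastoreInfo type and sample data are illustrative stand-ins, not the provider's actual types):

```go
package main

import "fmt"

// DatastoreInfo is an illustrative stand-in for the summary data the
// provider reads from the vSphere property collector.
type DatastoreInfo struct {
	Name      string
	FreeSpace int64 // bytes
}

// pickMostFree returns the datastore with the most free space, or an
// error when the cluster is empty.
func pickMostFree(datastores []DatastoreInfo) (DatastoreInfo, error) {
	if len(datastores) == 0 {
		return DatastoreInfo{}, fmt.Errorf("StoragePod contains no datastores")
	}
	best := datastores[0]
	for _, ds := range datastores[1:] {
		if ds.FreeSpace > best.FreeSpace {
			best = ds
		}
	}
	return best, nil
}

func main() {
	pool := []DatastoreInfo{
		{Name: "vsanDatastore-01", FreeSpace: 400 << 30},
		{Name: "vsanDatastore-02", FreeSpace: 812 << 30},
		{Name: "vsanDatastore-03", FreeSpace: 120 << 30},
	}
	ds, _ := pickMostFree(pool)
	fmt.Println(ds.Name) // the datastore with the most free bytes
}
```

Because only free space is compared, ties and near-ties naturally spread VMs across the cluster as datastores fill up.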
Configuration
Per-VM Placement (VirtualMachine spec)
Specify storagePod inside spec.placement on a VirtualMachine resource:
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: my-vm
namespace: virtrigaud-system
spec:
providerRef:
name: vsphere-prod
classRef:
name: standard-2cpu-4gb
imageRef:
name: ubuntu-24-04
placement:
cluster: prod-cluster
storagePod: "Production-DS-Cluster" # Datastore Cluster name
folder: /prod/vms
virtrigaud will inspect every datastore in Production-DS-Cluster and clone the VM onto the one with the most free space.
Provider-Level Default
Set spec.defaults.storagePod on the Provider resource to apply a Datastore Cluster as the default for all VMs that do not specify their own placement:
apiVersion: infra.virtrigaud.io/v1beta1
kind: Provider
metadata:
name: vsphere-prod
namespace: virtrigaud-system
spec:
type: vsphere
endpoint: https://vcenter.example.com
credentialSecretRef:
name: vsphere-credentials
defaults:
cluster: prod-cluster
storagePod: "Production-DS-Cluster" # cluster-wide default
folder: /prod/vms
runtime:
image: ghcr.io/projectbeskar/virtrigaud-provider-vsphere:latest
Alternatively, pass the default through the provider pod's environment by adding it to spec.runtime.env:
spec:
runtime:
env:
- name: PROVIDER_DEFAULT_STORAGE_POD
value: "Production-DS-Cluster"
Precedence Rules
When multiple sources specify storage placement, virtrigaud applies the following priority (highest to lowest):
| Priority | Source | Field |
|---|---|---|
| 1 | VM spec → explicit datastore | spec.placement.datastore |
| 2 | VM spec → StoragePod | spec.placement.storagePod |
| 3 | Provider default → StoragePod | spec.defaults.storagePod / PROVIDER_DEFAULT_STORAGE_POD |
| 4 | Provider default → datastore | spec.defaults.datastore / PROVIDER_DEFAULT_DATASTORE |
An explicit datastore always wins. storagePod is only consulted when no explicit datastore is set.
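The precedence table maps to a straightforward cascade; a stdlib sketch of that resolution order (field names are illustrative, not virtrigaud's internal types):

```go
package main

import "fmt"

// placement holds the four possible storage-placement inputs; the field
// names are illustrative stand-ins for the spec fields in the table above.
type placement struct {
	VMDatastore       string // spec.placement.datastore
	VMStoragePod      string // spec.placement.storagePod
	DefaultStoragePod string // spec.defaults.storagePod
	DefaultDatastore  string // spec.defaults.datastore
}

// resolve applies the documented precedence: an explicit VM datastore wins,
// then the VM StoragePod, then the provider-default StoragePod, then the
// provider-default datastore.
func resolve(p placement) (kind, name string) {
	switch {
	case p.VMDatastore != "":
		return "datastore", p.VMDatastore
	case p.VMStoragePod != "":
		return "storagePod", p.VMStoragePod
	case p.DefaultStoragePod != "":
		return "storagePod", p.DefaultStoragePod
	default:
		return "datastore", p.DefaultDatastore
	}
}

func main() {
	// Even though a StoragePod is set, the explicit datastore wins.
	kind, name := resolve(placement{
		VMDatastore:  "regulated-ds-01",
		VMStoragePod: "SSD-Datastore-Cluster",
	})
	fmt.Println(kind, name)
}
```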
Examples
StoragePod only (recommended for large environments)
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
name: web-server-01
namespace: virtrigaud-system
spec:
providerRef:
name: vsphere-prod
classRef:
name: web-4cpu-8gb
imageRef:
name: ubuntu-24-04
placement:
cluster: prod-cluster
storagePod: "SSD-Datastore-Cluster"
Override with an explicit datastore (e.g. for compliance)
When you need a specific datastore (for example, one dedicated to regulated workloads), set datastore; storagePod is then ignored:
placement:
cluster: prod-cluster
datastore: "regulated-ds-01" # StoragePod is ignored when this is set
storagePod: "SSD-Datastore-Cluster"
Use different clusters for different teams via a shared Provider
Combine a provider-level StoragePod default with per-VM overrides:
# Provider default: route most VMs to the general-purpose cluster
spec:
defaults:
cluster: general-cluster
storagePod: "General-DS-Cluster"
# High-performance VM overrides both cluster and StoragePod
spec:
placement:
cluster: nvme-cluster
storagePod: "NVMe-DS-Cluster"
How it Works Internally
When a VM is created and a StoragePod is resolved:
- virtrigaud creates a container view scoped to the vSphere root folder and searches for StoragePod managed objects.
- It matches the named StoragePod and reads its childEntity list, the set of datastores it contains.
- For each child datastore, the summary.freeSpace property is retrieved via the property collector.
- The datastore with the highest freeSpace is selected and used as the target in the clone specification (VirtualMachineRelocateSpec.Datastore).
The selection happens at provisioning time; it is not re-evaluated on subsequent reconciliations or reboots.
Troubleshooting
"StoragePod 'X' not found"
- Verify the name exactly matches the Datastore Cluster name in vCenter (case-sensitive).
- Confirm the vCenter user account has the Datastore.Browse privilege on the Datastore Cluster.
- Check provider pod logs for the container view query.
"StoragePod 'X' contains no datastores"
The Datastore Cluster exists but is empty (no datastores are members). Add datastores to the cluster in vCenter.
"failed to retrieve datastores from StoragePod"
The provider account lacks permission to read datastore summary properties. Grant Datastore.Browse on the individual datastores within the cluster.
VM is always placed on the same datastore
This is expected when one datastore consistently has significantly more free space. It is not a bug.
Checking which datastore was selected
The provider logs an INFO message at VM creation time:
INFO Selected datastore from StoragePod storagePod=Production-DS-Cluster datastore=vsanDatastore-02 freeSpaceGiB=812
Check the provider pod logs to see the selection for any specific VM.
Limitations
- Free-space only: virtrigaud does not use vSphere Storage DRS policies, IOPS limits, or storage tags when selecting a datastore. Only free space is considered.
- Point-in-time selection: The datastore is chosen once at clone time. Subsequent Storage vMotion by Storage DRS is not prevented.
- No rebalancing: virtrigaud does not rebalance existing VMs when free space changes.
- vSphere only: This feature has no equivalent for Libvirt or Proxmox providers.
Bearer Token Authentication
This guide covers how to configure bearer token authentication for VirtRigaud providers using JWT tokens and RBAC.
Overview
Bearer token authentication provides a stateless, scalable authentication mechanism using JSON Web Tokens (JWT). This approach is suitable for:
- Multi-tenant environments: Different tokens for different tenants
- API-based access: External systems accessing provider services
- Short-lived sessions: Tokens with configurable expiration
- Fine-grained permissions: Token-based RBAC
JWT Token Structure
Token Claims
{
"iss": "virtrigaud-manager",
"sub": "provider-client",
"aud": "virtrigaud-provider",
"exp": 1640995200,
"iat": 1640908800,
"nbf": 1640908800,
"scope": "vm:create vm:read vm:update vm:delete",
"tenant": "default",
"provider": "vsphere",
"jti": "unique-token-id"
}
Scopes Definition
| Scope | Description |
|---|---|
| vm:create | Create virtual machines |
| vm:read | Read virtual machine information |
| vm:update | Update virtual machine configuration |
| vm:delete | Delete virtual machines |
| vm:power | Control virtual machine power state |
| vm:snapshot | Create and manage snapshots |
| vm:clone | Clone virtual machines |
| admin | Full administrative access |
Token Generation
JWT Signing Key
# Generate RS256 private key
openssl genrsa -out jwt-private-key.pem 2048
# Extract public key
openssl rsa -in jwt-private-key.pem -pubout -out jwt-public-key.pem
# Store as Kubernetes secret
kubectl create secret generic jwt-keys \
--from-file=private-key=jwt-private-key.pem \
--from-file=public-key=jwt-public-key.pem \
--namespace=virtrigaud-system
Token Generation Service
package auth
import (
"crypto/rsa"
"fmt"
"strings"
"time"
"github.com/golang-jwt/jwt/v4"
"github.com/google/uuid"
)
type TokenClaims struct {
Issuer string `json:"iss"`
Subject string `json:"sub"`
Audience string `json:"aud"`
ExpiresAt int64 `json:"exp"`
IssuedAt int64 `json:"iat"`
NotBefore int64 `json:"nbf"`
Scope string `json:"scope"`
Tenant string `json:"tenant"`
Provider string `json:"provider"`
ID string `json:"jti"`
jwt.RegisteredClaims
}
type TokenService struct {
privateKey *rsa.PrivateKey
publicKey *rsa.PublicKey
issuer string
}
func NewTokenService(privateKey *rsa.PrivateKey, publicKey *rsa.PublicKey, issuer string) *TokenService {
return &TokenService{
privateKey: privateKey,
publicKey: publicKey,
issuer: issuer,
}
}
func (ts *TokenService) GenerateToken(subject, tenant, provider string, scopes []string, duration time.Duration) (string, error) {
now := time.Now()
claims := &TokenClaims{
Issuer: ts.issuer,
Subject: subject,
Audience: "virtrigaud-provider",
ExpiresAt: now.Add(duration).Unix(),
IssuedAt: now.Unix(),
NotBefore: now.Unix(),
Scope: strings.Join(scopes, " "),
Tenant: tenant,
Provider: provider,
ID: generateJTI(),
}
token := jwt.NewWithClaims(jwt.SigningMethodRS256, claims)
return token.SignedString(ts.privateKey)
}
func (ts *TokenService) ValidateToken(tokenString string) (*TokenClaims, error) {
token, err := jwt.ParseWithClaims(tokenString, &TokenClaims{}, func(token *jwt.Token) (interface{}, error) {
if _, ok := token.Method.(*jwt.SigningMethodRSA); !ok {
return nil, fmt.Errorf("unexpected signing method: %v", token.Header["alg"])
}
return ts.publicKey, nil
})
if err != nil {
return nil, err
}
if claims, ok := token.Claims.(*TokenClaims); ok && token.Valid {
return claims, nil
}
return nil, fmt.Errorf("invalid token")
}
func generateJTI() string {
return uuid.New().String()
}
Provider Authentication Interceptor
gRPC Interceptor
package middleware
import (
"context"
"fmt"
"strings"
"google.golang.org/grpc"
"google.golang.org/grpc/codes"
"google.golang.org/grpc/metadata"
"google.golang.org/grpc/status"
// auth is the token package defined above; adjust the path to your module
"github.com/projectbeskar/virtrigaud/internal/auth"
)
type AuthInterceptor struct {
tokenService *auth.TokenService
rbac *RBACManager
}
func NewAuthInterceptor(tokenService *auth.TokenService, rbac *RBACManager) *AuthInterceptor {
return &AuthInterceptor{
tokenService: tokenService,
rbac: rbac,
}
}
func (ai *AuthInterceptor) Unary() grpc.UnaryServerInterceptor {
return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
// Skip authentication for health checks
if strings.HasSuffix(info.FullMethod, "/Health/Check") {
return handler(ctx, req)
}
token, err := ai.extractToken(ctx)
if err != nil {
return nil, status.Errorf(codes.Unauthenticated, "missing or invalid token: %v", err)
}
claims, err := ai.tokenService.ValidateToken(token)
if err != nil {
return nil, status.Errorf(codes.Unauthenticated, "invalid token: %v", err)
}
// Check authorization
if !ai.rbac.IsAuthorized(claims, info.FullMethod) {
return nil, status.Errorf(codes.PermissionDenied, "insufficient permissions")
}
// Add claims to context (production code should use a package-private
// typed key instead of a raw string to avoid collisions)
ctx = context.WithValue(ctx, "claims", claims)
return handler(ctx, req)
}
}
func (ai *AuthInterceptor) extractToken(ctx context.Context) (string, error) {
md, ok := metadata.FromIncomingContext(ctx)
if !ok {
return "", fmt.Errorf("missing metadata")
}
authHeaders := md.Get("authorization")
if len(authHeaders) == 0 {
return "", fmt.Errorf("missing authorization header")
}
authHeader := authHeaders[0]
if !strings.HasPrefix(authHeader, "Bearer ") {
return "", fmt.Errorf("invalid authorization header format")
}
return strings.TrimPrefix(authHeader, "Bearer "), nil
}
RBAC Manager
package middleware
import (
"strings"
// auth provides TokenClaims; adjust the path to your module
"github.com/projectbeskar/virtrigaud/internal/auth"
)
type Permission struct {
Resource string
Action string
}
type RBACManager struct {
permissions map[string][]Permission
}
func NewRBACManager() *RBACManager {
return &RBACManager{
permissions: map[string][]Permission{
// RPC method to required permissions mapping
"/provider.v1.ProviderService/CreateVM": {
{Resource: "vm", Action: "create"},
},
"/provider.v1.ProviderService/GetVM": {
{Resource: "vm", Action: "read"},
},
"/provider.v1.ProviderService/UpdateVM": {
{Resource: "vm", Action: "update"},
},
"/provider.v1.ProviderService/DeleteVM": {
{Resource: "vm", Action: "delete"},
},
"/provider.v1.ProviderService/PowerVM": {
{Resource: "vm", Action: "power"},
},
"/provider.v1.ProviderService/CreateSnapshot": {
{Resource: "vm", Action: "snapshot"},
},
"/provider.v1.ProviderService/CloneVM": {
{Resource: "vm", Action: "clone"},
},
},
}
}
func (rbac *RBACManager) IsAuthorized(claims *auth.TokenClaims, method string) bool {
requiredPerms, exists := rbac.permissions[method]
if !exists {
// Allow if no specific permissions required
return true
}
userScopes := strings.Split(claims.Scope, " ")
// Check if user has admin scope
for _, scope := range userScopes {
if scope == "admin" {
return true
}
}
// Check specific permissions
for _, requiredPerm := range requiredPerms {
requiredScope := requiredPerm.Resource + ":" + requiredPerm.Action
hasPermission := false
for _, userScope := range userScopes {
if userScope == requiredScope {
hasPermission = true
break
}
}
if !hasPermission {
return false
}
}
return true
}
Kubernetes RBAC Integration
ServiceAccount and ClusterRole
apiVersion: v1
kind: ServiceAccount
metadata:
name: virtrigaud-token-manager
namespace: virtrigaud-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: virtrigaud-token-manager
rules:
- apiGroups: [""]
resources: ["secrets"]
verbs: ["get", "list", "watch"]
- apiGroups: [""]
resources: ["configmaps"]
verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: virtrigaud-token-manager
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: virtrigaud-token-manager
subjects:
- kind: ServiceAccount
name: virtrigaud-token-manager
namespace: virtrigaud-system
Token Management ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
name: token-config
namespace: virtrigaud-system
data:
config.yaml: |
tokenService:
issuer: "virtrigaud-manager"
defaultDuration: "1h"
maxDuration: "24h"
scopes:
- name: "vm:create"
description: "Create virtual machines"
- name: "vm:read"
description: "Read virtual machine information"
- name: "vm:update"
description: "Update virtual machine configuration"
- name: "vm:delete"
description: "Delete virtual machines"
- name: "vm:power"
description: "Control virtual machine power state"
- name: "vm:snapshot"
description: "Create and manage snapshots"
- name: "vm:clone"
description: "Clone virtual machines"
- name: "admin"
description: "Full administrative access"
tenants:
- name: "default"
description: "Default tenant"
allowedScopes: ["vm:create", "vm:read", "vm:update", "vm:delete", "vm:power"]
- name: "development"
description: "Development environment"
allowedScopes: ["vm:create", "vm:read", "vm:update", "vm:delete", "vm:power", "vm:snapshot", "vm:clone"]
- name: "production"
description: "Production environment"
allowedScopes: ["vm:read", "vm:power"]
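Before a token is issued, the requested scopes should be checked against the tenant's allowedScopes list. A stdlib sketch of that check, with the tenant lists copied from the ConfigMap above:

```go
package main

import "fmt"

// allowedScopes mirrors the tenants section of the token-config ConfigMap
// above (only two tenants shown for brevity).
var allowedScopes = map[string][]string{
	"default":    {"vm:create", "vm:read", "vm:update", "vm:delete", "vm:power"},
	"production": {"vm:read", "vm:power"},
}

// validateRequest rejects any requested scope the tenant is not allowed.
func validateRequest(tenant string, requested []string) error {
	allowed := map[string]bool{}
	for _, s := range allowedScopes[tenant] {
		allowed[s] = true
	}
	for _, s := range requested {
		if !allowed[s] {
			return fmt.Errorf("tenant %q may not request scope %q", tenant, s)
		}
	}
	return nil
}

func main() {
	// production may read but not delete
	fmt.Println(validateRequest("production", []string{"vm:read"}))
	fmt.Println(validateRequest("production", []string{"vm:delete"}) != nil)
}
```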
Client Configuration
Manager Client Setup
package client
import (
"context"
"time"
"google.golang.org/grpc"
"google.golang.org/grpc/metadata"
)
type AuthenticatedClient struct {
client providerv1.ProviderServiceClient
token string
}
func NewAuthenticatedClient(endpoint, token string) (*AuthenticatedClient, error) {
conn, err := grpc.Dial(endpoint, grpc.WithInsecure())
if err != nil {
return nil, err
}
return &AuthenticatedClient{
client: providerv1.NewProviderServiceClient(conn),
token: token,
}, nil
}
func (ac *AuthenticatedClient) CreateVM(ctx context.Context, req *providerv1.CreateVMRequest) (*providerv1.CreateVMResponse, error) {
ctx = ac.addAuthHeader(ctx)
return ac.client.CreateVM(ctx, req)
}
func (ac *AuthenticatedClient) addAuthHeader(ctx context.Context) context.Context {
md := metadata.Pairs("authorization", "Bearer "+ac.token)
return metadata.NewOutgoingContext(ctx, md)
}
Token Refresh
package auth
import (
"sync"
"time"
)
type TokenManager struct {
tokenService *TokenService
currentToken string
expiresAt time.Time
mutex sync.RWMutex
subject string
tenant string
provider string
scopes []string
}
func NewTokenManager(tokenService *TokenService, subject, tenant, provider string, scopes []string) *TokenManager {
return &TokenManager{
tokenService: tokenService,
subject: subject,
tenant: tenant,
provider: provider,
scopes: scopes,
}
}
func (tm *TokenManager) GetToken() (string, error) {
tm.mutex.RLock()
if tm.currentToken != "" && time.Now().Before(tm.expiresAt.Add(-5*time.Minute)) {
token := tm.currentToken
tm.mutex.RUnlock()
return token, nil
}
tm.mutex.RUnlock()
return tm.refreshToken()
}
func (tm *TokenManager) refreshToken() (string, error) {
tm.mutex.Lock()
defer tm.mutex.Unlock()
// Double-check after acquiring write lock
if tm.currentToken != "" && time.Now().Before(tm.expiresAt.Add(-5*time.Minute)) {
return tm.currentToken, nil
}
token, err := tm.tokenService.GenerateToken(tm.subject, tm.tenant, tm.provider, tm.scopes, time.Hour)
if err != nil {
return "", err
}
tm.currentToken = token
tm.expiresAt = time.Now().Add(time.Hour)
return token, nil
}
Helm Chart Integration
Provider Runtime with Bearer Token Auth
# values-bearer-auth.yaml
auth:
type: "bearer"
jwt:
publicKeySecret: "jwt-keys"
publicKeyKey: "public-key"
issuer: "virtrigaud-manager"
audience: "virtrigaud-provider"
# Environment variables for authentication
env:
- name: AUTH_TYPE
value: "bearer"
- name: JWT_PUBLIC_KEY_PATH
value: "/etc/jwt/public-key"
- name: JWT_ISSUER
value: "virtrigaud-manager"
- name: JWT_AUDIENCE
value: "virtrigaud-provider"
# Mount JWT public key
volumes:
- name: jwt-public-key
secret:
secretName: jwt-keys
volumeMounts:
- name: jwt-public-key
mountPath: /etc/jwt
readOnly: true
Monitoring and Logging
Authentication Metrics
package metrics
import (
"time"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
)
var (
authenticationAttempts = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "virtrigaud_authentication_attempts_total",
Help: "Total number of authentication attempts",
},
[]string{"method", "result", "tenant"},
)
authenticationDuration = promauto.NewHistogramVec(
prometheus.HistogramOpts{
Name: "virtrigaud_authentication_duration_seconds",
Help: "Duration of authentication operations",
},
[]string{"method", "result"},
)
activeTokens = promauto.NewGaugeVec(
prometheus.GaugeOpts{
Name: "virtrigaud_active_tokens",
Help: "Number of active tokens by tenant",
},
[]string{"tenant", "provider"},
)
)
func RecordAuthAttempt(method, result, tenant string) {
authenticationAttempts.WithLabelValues(method, result, tenant).Inc()
}
func RecordAuthDuration(method, result string, duration time.Duration) {
authenticationDuration.WithLabelValues(method, result).Observe(duration.Seconds())
}
Audit Logging
package audit
import (
"context"
"encoding/json"
"time"
"go.uber.org/zap"
)
type AuditEvent struct {
Timestamp time.Time `json:"timestamp"`
EventType string `json:"event_type"`
Subject string `json:"subject"`
Tenant string `json:"tenant"`
Provider string `json:"provider"`
Resource string `json:"resource"`
Action string `json:"action"`
Result string `json:"result"`
Error string `json:"error,omitempty"`
Metadata map[string]interface{} `json:"metadata,omitempty"`
}
type AuditLogger struct {
logger *zap.Logger
}
func NewAuditLogger(logger *zap.Logger) *AuditLogger {
return &AuditLogger{logger: logger}
}
func (al *AuditLogger) LogAuthEvent(ctx context.Context, eventType, subject, tenant, provider, result string, err error) {
event := AuditEvent{
Timestamp: time.Now(),
EventType: eventType,
Subject: subject,
Tenant: tenant,
Provider: provider,
Result: result,
}
if err != nil {
event.Error = err.Error()
}
eventJSON, _ := json.Marshal(event)
al.logger.Info("audit_event", zap.String("event", string(eventJSON)))
}
Security Best Practices
1. Token Validation
// Always validate all token claims.
// expectedIssuer and expectedAudience are assumed package-level config values.
func validateTokenClaims(claims *TokenClaims) error {
now := time.Now()
// Check expiration
if claims.ExpiresAt < now.Unix() {
return fmt.Errorf("token expired")
}
// Check not before
if claims.NotBefore > now.Unix() {
return fmt.Errorf("token not yet valid")
}
// Check issuer
if claims.Issuer != expectedIssuer {
return fmt.Errorf("invalid issuer")
}
// Check audience
if claims.Audience != expectedAudience {
return fmt.Errorf("invalid audience")
}
return nil
}
2. Rate Limiting
// Implement rate limiting for token generation
type RateLimiter struct {
requests map[string][]time.Time
mutex sync.RWMutex
limit int
window time.Duration
}
func (rl *RateLimiter) Allow(key string) bool {
rl.mutex.Lock()
defer rl.mutex.Unlock()
now := time.Now()
requests := rl.requests[key]
// Remove old requests outside the window
var validRequests []time.Time
for _, req := range requests {
if now.Sub(req) < rl.window {
validRequests = append(validRequests, req)
}
}
// Check if we've exceeded the limit
if len(validRequests) >= rl.limit {
return false
}
// Add the current request
validRequests = append(validRequests, now)
rl.requests[key] = validRequests
return true
}
3. Token Blacklisting
// Implement token blacklisting for revoked tokens
type TokenBlacklist struct {
blacklistedTokens map[string]time.Time
mutex sync.RWMutex
}
func (tb *TokenBlacklist) IsBlacklisted(jti string) bool {
// A write lock is required because expired entries are deleted below;
// deleting under an RLock would be a data race.
tb.mutex.Lock()
defer tb.mutex.Unlock()
expiresAt, exists := tb.blacklistedTokens[jti]
if !exists {
return false
}
// Remove expired entries
if time.Now().After(expiresAt) {
delete(tb.blacklistedTokens, jti)
return false
}
return true
}
func (tb *TokenBlacklist) BlacklistToken(jti string, expiresAt time.Time) {
tb.mutex.Lock()
defer tb.mutex.Unlock()
tb.blacklistedTokens[jti] = expiresAt
}
mTLS Security Configuration
This guide covers how to configure mutual TLS (mTLS) authentication between VirtRigaud managers and providers.
Overview
mTLS provides strong authentication and encryption for gRPC communication between the VirtRigaud manager and provider services. It ensures:
- Authentication: Both client and server verify each other's certificates
- Encryption: All traffic is encrypted in transit
- Certificate Pinning: Specific certificate authorities are trusted
- Certificate Rotation: Automated certificate renewal
Certificate Management
1. Generate CA Certificate
# Create CA private key
openssl genrsa -out ca-key.pem 4096
# Create CA certificate
openssl req -new -x509 -key ca-key.pem -out ca-cert.pem -days 365 \
-subj "/C=US/ST=CA/L=San Francisco/O=VirtRigaud/CN=VirtRigaud CA"
2. Generate Server Certificate (Provider)
# Create server private key
openssl genrsa -out server-key.pem 4096
# Create server certificate signing request
openssl req -new -key server-key.pem -out server-csr.pem \
-subj "/C=US/ST=CA/L=San Francisco/O=VirtRigaud/CN=provider-service"
# Sign server certificate
openssl x509 -req -in server-csr.pem -CA ca-cert.pem -CAkey ca-key.pem \
-CAcreateserial -out server-cert.pem -days 365 \
-extensions v3_req -extfile <(cat <<EOF
[v3_req]
keyUsage = keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names
[alt_names]
DNS.1 = provider-service
DNS.2 = provider-service.default.svc.cluster.local
DNS.3 = localhost
IP.1 = 127.0.0.1
EOF
)
3. Generate Client Certificate (Manager)
# Create client private key
openssl genrsa -out client-key.pem 4096
# Create client certificate signing request
openssl req -new -key client-key.pem -out client-csr.pem \
-subj "/C=US/ST=CA/L=San Francisco/O=VirtRigaud/CN=manager-client"
# Sign client certificate
openssl x509 -req -in client-csr.pem -CA ca-cert.pem -CAkey ca-key.pem \
-CAcreateserial -out client-cert.pem -days 365 \
-extensions v3_req -extfile <(cat <<EOF
[v3_req]
keyUsage = keyEncipherment, dataEncipherment
extendedKeyUsage = clientAuth
EOF
)
Kubernetes Secret Configuration
Provider TLS Secret
apiVersion: v1
kind: Secret
metadata:
name: provider-tls
namespace: default
type: kubernetes.io/tls
data:
tls.crt: # base64 encoded server-cert.pem
tls.key: # base64 encoded server-key.pem
ca.crt: # base64 encoded ca-cert.pem
Manager TLS Secret
apiVersion: v1
kind: Secret
metadata:
name: manager-tls
namespace: virtrigaud-system
type: kubernetes.io/tls
data:
tls.crt: # base64 encoded client-cert.pem
tls.key: # base64 encoded client-key.pem
ca.crt: # base64 encoded ca-cert.pem
Provider Configuration
SDK Server Configuration
package main
import (
"crypto/tls"
"crypto/x509"
"fmt"
"io/ioutil"
"github.com/projectbeskar/virtrigaud/sdk/provider/server"
)
func main() {
// Load certificates
cert, err := tls.LoadX509KeyPair("/etc/tls/tls.crt", "/etc/tls/tls.key")
if err != nil {
panic(fmt.Sprintf("Failed to load server certificates: %v", err))
}
// Load CA certificate for client verification
caCert, err := ioutil.ReadFile("/etc/tls/ca.crt")
if err != nil {
panic(fmt.Sprintf("Failed to load CA certificate: %v", err))
}
caCertPool := x509.NewCertPool()
if !caCertPool.AppendCertsFromPEM(caCert) {
panic("Failed to parse CA certificate")
}
// Configure TLS
tlsConfig := &tls.Config{
Certificates: []tls.Certificate{cert},
ClientAuth: tls.RequireAndVerifyClientCert,
ClientCAs: caCertPool,
MinVersion: tls.VersionTLS12,
CipherSuites: []uint16{
tls.TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,
tls.TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,
tls.TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,
tls.TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305,
},
}
// Create server with mTLS
srv, err := server.New(&server.Config{
Port: 9443,
TLS: tlsConfig,
EnableTLS: true,
})
if err != nil {
panic(fmt.Sprintf("Failed to create server: %v", err))
}
// Register your provider implementation here
// providerv1.RegisterProviderServiceServer(srv.GRPCServer(), &YourProvider{})
if err := srv.Serve(); err != nil {
panic(fmt.Sprintf("Server failed: %v", err))
}
}
Helm Chart Values (Provider Runtime)
# values-mtls.yaml
tls:
enabled: true
secretName: provider-tls
# Mount TLS certificates
volumes:
- name: tls-certs
secret:
secretName: provider-tls
volumeMounts:
- name: tls-certs
mountPath: /etc/tls
readOnly: true
# Environment variables for TLS
env:
- name: TLS_ENABLED
value: "true"
- name: TLS_CERT_PATH
value: "/etc/tls/tls.crt"
- name: TLS_KEY_PATH
value: "/etc/tls/tls.key"
- name: TLS_CA_PATH
value: "/etc/tls/ca.crt"
Manager Configuration
Client TLS Configuration
// In manager code
func createProviderClient(endpoint string) (providerv1.ProviderServiceClient, error) {
// Load client certificates
cert, err := tls.LoadX509KeyPair("/etc/manager-tls/tls.crt", "/etc/manager-tls/tls.key")
if err != nil {
return nil, fmt.Errorf("failed to load client certificates: %w", err)
}
// Load CA certificate for server verification
caCert, err := os.ReadFile("/etc/manager-tls/ca.crt")
if err != nil {
return nil, fmt.Errorf("failed to load CA certificate: %w", err)
}
caCertPool := x509.NewCertPool()
if !caCertPool.AppendCertsFromPEM(caCert) {
return nil, fmt.Errorf("failed to parse CA certificate")
}
// Configure TLS
tlsConfig := &tls.Config{
Certificates: []tls.Certificate{cert},
RootCAs: caCertPool,
ServerName: "provider-service", // Must match server certificate CN/SAN
MinVersion: tls.VersionTLS12,
}
// Create gRPC connection with mTLS
conn, err := grpc.Dial(endpoint,
grpc.WithTransportCredentials(credentials.NewTLS(tlsConfig)),
)
if err != nil {
return nil, fmt.Errorf("failed to connect: %w", err)
}
return providerv1.NewProviderServiceClient(conn), nil
}
Certificate Rotation
Using cert-manager
# Install cert-manager first
# kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.12.0/cert-manager.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: virtrigaud-ca-issuer
spec:
ca:
secretName: virtrigaud-ca-secret
---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
name: provider-tls
namespace: default
spec:
secretName: provider-tls
issuerRef:
name: virtrigaud-ca-issuer
kind: ClusterIssuer
commonName: provider-service
dnsNames:
- provider-service
- provider-service.default.svc.cluster.local
duration: 8760h # 1 year
renewBefore: 720h # 30 days before expiry
Manual Rotation Script
#!/bin/bash
# rotate-certs.sh
NAMESPACE=${1:-default}
SECRET_NAME=${2:-provider-tls}
echo "Rotating certificates for $SECRET_NAME in namespace $NAMESPACE"
# Generate new certificates (using the same process as above)
# ...
# Update Kubernetes secret
kubectl create secret tls $SECRET_NAME \
--cert=server-cert.pem \
--key=server-key.pem \
--namespace=$NAMESPACE \
--dry-run=client -o yaml | kubectl apply -f -
# Add CA certificate to the secret
kubectl patch secret $SECRET_NAME -n $NAMESPACE \
--patch="$(cat <<EOF
data:
ca.crt: $(base64 -w 0 ca-cert.pem)
EOF
)"
# Restart provider deployment to pick up new certificates
kubectl rollout restart deployment/provider-deployment -n $NAMESPACE
echo "Certificate rotation completed"
Security Best Practices
1. Certificate Validation
// Always validate certificate chains
func validateCertificate(cert *x509.Certificate, caCert *x509.Certificate) error {
roots := x509.NewCertPool()
roots.AddCert(caCert)
opts := x509.VerifyOptions{
Roots: roots,
KeyUsages: []x509.ExtKeyUsage{x509.ExtKeyUsageServerAuth},
}
_, err := cert.Verify(opts)
return err
}
2. Certificate Pinning
// Pin specific certificate or CA
func createTLSConfigWithPinning(expectedCertFingerprint string) *tls.Config {
return &tls.Config{
VerifyPeerCertificate: func(rawCerts [][]byte, verifiedChains [][]*x509.Certificate) error {
if len(rawCerts) == 0 {
return fmt.Errorf("no certificates provided")
}
cert, err := x509.ParseCertificate(rawCerts[0])
if err != nil {
return err
}
fingerprint := sha256.Sum256(cert.Raw)
if hex.EncodeToString(fingerprint[:]) != expectedCertFingerprint {
return fmt.Errorf("certificate fingerprint mismatch")
}
return nil
},
}
}
3. Monitoring and Alerting
# Prometheus AlertManager rules
groups:
- name: virtrigaud.certificates
rules:
- alert: CertificateExpiringSoon
expr: (cert_manager_certificate_expiration_timestamp_seconds - time()) / 86400 < 30
for: 1h
labels:
severity: warning
annotations:
summary: "Certificate expiring soon"
description: "Certificate {{ $labels.name }} expires in less than 30 days"
- alert: CertificateExpired
expr: cert_manager_certificate_expiration_timestamp_seconds < time()
for: 0m
labels:
severity: critical
annotations:
summary: "Certificate expired"
description: "Certificate {{ $labels.name }} has expired"
Troubleshooting
Common Issues
- Certificate chain issues

  # Verify certificate chain
  openssl verify -CAfile ca-cert.pem server-cert.pem

- SAN mismatch

  # Check certificate SAN entries
  openssl x509 -in server-cert.pem -text -noout | grep -A1 "Subject Alternative Name"

- TLS handshake failures

  # Test TLS connection
  openssl s_client -connect provider-service:9443 -cert client-cert.pem -key client-key.pem -CAfile ca-cert.pem

- Clock skew issues

  # Ensure time synchronization
  ntpdate -s time.nist.gov
Debug Commands
# Check certificate validity
kubectl get secret provider-tls -o yaml | grep tls.crt | base64 -d | openssl x509 -text -noout
# Monitor certificate expiration
kubectl get certificates
# Check provider logs for TLS errors
kubectl logs deployment/provider-deployment | grep -i tls
External Secrets Management
This guide covers integrating VirtRigaud providers with external secret management systems using ExternalSecrets operators and best practices for credential security.
Overview
External secret management provides secure, centralized credential storage and automatic secret rotation. Supported systems include:
- HashiCorp Vault: Enterprise secret management with dynamic secrets
- AWS Secrets Manager: Cloud-native secret storage with automatic rotation
- Azure Key Vault: Azure-integrated secret management
- Google Secret Manager: GCP secret storage service
- Kubernetes External Secrets: Generic external secret integration
External Secrets Operator Setup
Installation
# Install External Secrets Operator
helm repo add external-secrets https://charts.external-secrets.io
helm repo update
helm install external-secrets external-secrets/external-secrets \
--namespace external-secrets-system \
--create-namespace \
--set installCRDs=true
Basic Configuration
# ServiceAccount for External Secrets Operator
apiVersion: v1
kind: ServiceAccount
metadata:
name: external-secrets
namespace: virtrigaud-system
annotations:
# For AWS IRSA (IAM Roles for Service Accounts)
eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT:role/external-secrets-role
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: external-secrets
rules:
- apiGroups: [""]
resources: ["secrets"]
verbs: ["create", "update", "patch", "delete", "get", "list", "watch"]
- apiGroups: ["external-secrets.io"]
resources: ["*"]
verbs: ["*"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: external-secrets
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: external-secrets
subjects:
- kind: ServiceAccount
name: external-secrets
namespace: virtrigaud-system
HashiCorp Vault Integration
Vault SecretStore
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
name: vault-secret-store
namespace: virtrigaud-system
spec:
provider:
vault:
server: "https://vault.example.com:8200"
path: "secret"
version: "v2"
auth:
# Use Kubernetes service account for authentication
kubernetes:
mountPath: "kubernetes"
role: "virtrigaud-role"
serviceAccountRef:
name: "external-secrets"
---
# For multi-namespace access
apiVersion: external-secrets.io/v1beta1
kind: ClusterSecretStore
metadata:
name: vault-cluster-store
spec:
provider:
vault:
server: "https://vault.example.com:8200"
path: "secret"
version: "v2"
auth:
kubernetes:
mountPath: "kubernetes"
role: "virtrigaud-cluster-role"
serviceAccountRef:
name: "external-secrets"
namespace: "virtrigaud-system"
Vault Policy Configuration
# Vault policy for VirtRigaud secrets
path "secret/data/virtrigaud/*" {
capabilities = ["read"]
}
path "secret/data/providers/*" {
capabilities = ["read"]
}
# Dynamic database credentials
path "database/creds/readonly" {
capabilities = ["read"]
}
# PKI for TLS certificates
path "pki/issue/virtrigaud" {
capabilities = ["create", "update"]
}
vSphere Credentials from Vault
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: vsphere-credentials
namespace: vsphere-providers
spec:
refreshInterval: 1h
secretStoreRef:
name: vault-secret-store
kind: SecretStore
target:
name: vsphere-credentials
creationPolicy: Owner
template:
type: Opaque
data:
username: "{{ .username }}"
password: "{{ .password }}"
server: "{{ .server }}"
# Optional: TLS certificate
ca.crt: "{{ .ca_cert | b64dec }}"
data:
- secretKey: username
remoteRef:
key: secret/data/providers/vsphere
property: username
- secretKey: password
remoteRef:
key: secret/data/providers/vsphere
property: password
- secretKey: server
remoteRef:
key: secret/data/providers/vsphere
property: server
- secretKey: ca_cert
remoteRef:
key: secret/data/providers/vsphere
property: ca_cert
AWS Secrets Manager Integration
AWS SecretStore with IRSA
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
name: aws-secrets-manager
namespace: virtrigaud-system
spec:
provider:
aws:
service: SecretsManager
region: us-west-2
auth:
# Use IAM Roles for Service Accounts (IRSA)
serviceAccount:
name: external-secrets
namespace: virtrigaud-system
---
# IAM Policy for the IRSA role
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"secretsmanager:GetSecretValue",
"secretsmanager:DescribeSecret"
],
"Resource": [
"arn:aws:secretsmanager:us-west-2:ACCOUNT:secret:virtrigaud/*"
]
}
]
}
AWS Secret Configuration
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: aws-provider-credentials
namespace: provider-namespace
spec:
refreshInterval: 15m
secretStoreRef:
name: aws-secrets-manager
kind: SecretStore
target:
name: provider-credentials
creationPolicy: Owner
data:
- secretKey: credentials.json
remoteRef:
key: "virtrigaud/provider-credentials"
property: "credentials"
- secretKey: api-key
remoteRef:
key: "virtrigaud/api-keys"
property: "provider-api-key"
Azure Key Vault Integration
Azure SecretStore
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
name: azure-key-vault
namespace: virtrigaud-system
spec:
provider:
azurekv:
vaultUrl: "https://virtrigaud-vault.vault.azure.net/"
authType: "ManagedIdentity"
# Or use Service Principal:
# authType: "ServicePrincipal"
# authSecretRef:
# clientId:
# name: azure-secret
# key: client-id
# clientSecret:
# name: azure-secret
# key: client-secret
tenantId: "tenant-id-here"
---
# Service Principal credentials (only needed when not using Managed Identity)
apiVersion: v1
kind: Secret
metadata:
name: azure-config
namespace: virtrigaud-system
type: Opaque
data:
# Base64 encoded values
tenant-id: dGVuYW50LWlkLWhlcmU=
client-id: Y2xpZW50LWlkLWhlcmU=
client-secret: Y2xpZW50LXNlY3JldC1oZXJl
Azure Key Vault Secret
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: azure-provider-secrets
namespace: provider-namespace
spec:
refreshInterval: 30m
secretStoreRef:
name: azure-key-vault
kind: SecretStore
target:
name: provider-secrets
creationPolicy: Owner
data:
- secretKey: subscription-id
remoteRef:
key: "azure-subscription-id"
- secretKey: resource-group
remoteRef:
key: "azure-resource-group"
- secretKey: client-certificate
remoteRef:
key: "azure-client-cert"
Google Secret Manager Integration
GCP SecretStore
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
name: gcp-secret-manager
namespace: virtrigaud-system
spec:
provider:
gcpsm:
projectId: "your-gcp-project"
auth:
# Use Workload Identity
workloadIdentity:
clusterLocation: us-central1
clusterName: virtrigaud-cluster
serviceAccountRef:
name: external-secrets
namespace: virtrigaud-system
---
# Workload Identity binding
apiVersion: v1
kind: ServiceAccount
metadata:
name: external-secrets
namespace: virtrigaud-system
annotations:
iam.gke.io/gcp-service-account: virtrigaud-secrets@PROJECT.iam.gserviceaccount.com
GCP Secret Configuration
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: gcp-provider-secrets
namespace: provider-namespace
spec:
refreshInterval: 20m
secretStoreRef:
name: gcp-secret-manager
kind: SecretStore
target:
name: gcp-provider-credentials
creationPolicy: Owner
data:
- secretKey: service-account.json
remoteRef:
key: "virtrigaud-service-account"
version: "latest"
- secretKey: project-id
remoteRef:
key: "gcp-project-id"
version: "latest"
Provider-Specific Configurations
vSphere Provider with Dynamic Credentials
# Vault configuration for vSphere dynamic credentials
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: vsphere-dynamic-credentials
namespace: vsphere-providers
spec:
refreshInterval: 15m # Short refresh for dynamic credentials
secretStoreRef:
name: vault-secret-store
kind: SecretStore
target:
name: vsphere-dynamic-creds
creationPolicy: Owner
template:
type: Opaque
data:
username: "{{ .username }}"
password: "{{ .password }}"
server: "{{ .server }}"
session_ttl: "{{ .lease_duration }}"
data:
- secretKey: username
remoteRef:
key: "vsphere/creds/dynamic-role"
property: "username"
- secretKey: password
remoteRef:
key: "vsphere/creds/dynamic-role"
property: "password"
- secretKey: server
remoteRef:
key: "secret/data/vsphere/static"
property: "server"
- secretKey: lease_duration
remoteRef:
key: "vsphere/creds/dynamic-role"
property: "lease_duration"
---
# Provider deployment using dynamic credentials
apiVersion: apps/v1
kind: Deployment
metadata:
name: vsphere-provider
namespace: vsphere-providers
spec:
template:
spec:
containers:
- name: provider
env:
- name: VSPHERE_USERNAME
valueFrom:
secretKeyRef:
name: vsphere-dynamic-creds
key: username
- name: VSPHERE_PASSWORD
valueFrom:
secretKeyRef:
name: vsphere-dynamic-creds
key: password
- name: VSPHERE_SERVER
valueFrom:
secretKeyRef:
name: vsphere-dynamic-creds
key: server
Libvirt Provider with SSH Keys
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: libvirt-ssh-keys
namespace: libvirt-providers
spec:
refreshInterval: 1h
secretStoreRef:
name: vault-secret-store
kind: SecretStore
target:
name: libvirt-ssh-credentials
creationPolicy: Owner
template:
type: kubernetes.io/ssh-auth
data:
ssh-privatekey: "{{ .private_key }}"
ssh-publickey: "{{ .public_key }}"
known_hosts: "{{ .known_hosts }}"
data:
- secretKey: private_key
remoteRef:
key: "secret/data/libvirt/ssh"
property: "private_key"
- secretKey: public_key
remoteRef:
key: "secret/data/libvirt/ssh"
property: "public_key"
- secretKey: known_hosts
remoteRef:
key: "secret/data/libvirt/ssh"
property: "known_hosts"
---
# Mount SSH keys in provider
apiVersion: apps/v1
kind: Deployment
metadata:
name: libvirt-provider
spec:
template:
spec:
containers:
- name: provider
volumeMounts:
- name: ssh-keys
mountPath: /home/provider/.ssh
readOnly: true
env:
- name: SSH_AUTH_SOCK
value: "/tmp/ssh-agent.sock"
volumes:
- name: ssh-keys
secret:
secretName: libvirt-ssh-credentials
defaultMode: 0600
TLS Certificate Management
Automatic TLS with External Secrets
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: provider-tls-certs
namespace: provider-namespace
spec:
refreshInterval: 24h
secretStoreRef:
name: vault-secret-store
kind: SecretStore
target:
name: provider-tls
creationPolicy: Owner
template:
type: kubernetes.io/tls
data:
tls.crt: "{{ .certificate }}"
tls.key: "{{ .private_key }}"
ca.crt: "{{ .ca_certificate }}"
data:
- secretKey: certificate
remoteRef:
key: "pki/issue/virtrigaud"
property: "certificate"
- secretKey: private_key
remoteRef:
key: "pki/issue/virtrigaud"
property: "private_key"
- secretKey: ca_certificate
remoteRef:
key: "pki/issue/virtrigaud"
property: "issuing_ca"
---
# Vault PKI configuration (run in Vault)
# vault write pki/roles/virtrigaud \
# allowed_domains="virtrigaud.local,provider-service" \
# allow_subdomains=true \
# max_ttl="8760h" \
# generate_lease=true
Monitoring and Alerting
ExternalSecret Monitoring
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: external-secrets-monitor
namespace: monitoring
spec:
selector:
matchLabels:
app.kubernetes.io/name: external-secrets
endpoints:
- port: metrics
interval: 30s
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: external-secrets-alerts
namespace: monitoring
spec:
groups:
- name: external-secrets.rules
rules:
- alert: ExternalSecretSyncFailure
expr: increase(external_secrets_sync_calls_error[5m]) > 0
for: 2m
labels:
severity: warning
annotations:
summary: "External secret sync failure"
description: "ExternalSecret {{ $labels.name }} in namespace {{ $labels.namespace }} failed to sync"
- alert: ExternalSecretStale
expr: increase(external_secrets_sync_calls_total[1h]) == 0
for: 5m
labels:
severity: critical
annotations:
summary: "External secret not refreshed"
description: "ExternalSecret {{ $labels.name }} has not been refreshed for over 1 hour"
Custom Monitoring
package monitoring
import (
"context"
"time"
"github.com/prometheus/client_golang/prometheus"
"github.com/prometheus/client_golang/prometheus/promauto"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/client-go/kubernetes"
)
var (
secretAge = promauto.NewGaugeVec(
prometheus.GaugeOpts{
Name: "virtrigaud_secret_age_seconds",
Help: "Age of provider secrets in seconds",
},
[]string{"secret_name", "namespace", "provider"},
)
secretRotationCount = promauto.NewCounterVec(
prometheus.CounterOpts{
Name: "virtrigaud_secret_rotations_total",
Help: "Total number of secret rotations",
},
[]string{"secret_name", "namespace", "provider"},
)
)
type SecretMonitor struct {
client kubernetes.Interface
}
func (sm *SecretMonitor) MonitorSecrets(ctx context.Context) {
ticker := time.NewTicker(60 * time.Second)
defer ticker.Stop()
for {
select {
case <-ctx.Done():
return
case <-ticker.C:
sm.updateSecretMetrics()
}
}
}
func (sm *SecretMonitor) updateSecretMetrics() {
secrets, err := sm.client.CoreV1().Secrets("").List(context.TODO(), metav1.ListOptions{
LabelSelector: "app.kubernetes.io/managed-by=external-secrets",
})
if err != nil {
return
}
for _, secret := range secrets.Items {
provider := secret.Labels["provider"]
if provider == "" {
continue
}
age := time.Since(secret.CreationTimestamp.Time).Seconds()
secretAge.WithLabelValues(secret.Name, secret.Namespace, provider).Set(age)
}
}
Security Best Practices
1. Least Privilege Access
# Minimal Vault policy for specific provider
path "secret/data/providers/vsphere/{{ identity.entity.aliases.auth_kubernetes_*.metadata.service_account_namespace }}" {
capabilities = ["read"]
}
# Time-bound secrets
path "vsphere/creds/readonly" {
capabilities = ["read"]
allowed_parameters = {
"ttl" = ["15m", "30m", "1h"]
}
}
2. Secret Rotation Automation
apiVersion: batch/v1
kind: CronJob
metadata:
name: rotate-provider-secrets
namespace: virtrigaud-system
spec:
schedule: "0 2 * * 0" # Weekly on Sunday at 2 AM
jobTemplate:
spec:
template:
spec:
containers:
- name: secret-rotator
image: virtrigaud/secret-rotator:latest
command:
- /bin/sh
- -c
- |
# Force refresh of all external secrets
kubectl annotate externalsecret --all \
force-sync="$(date +%s)" \
--namespace=vsphere-providers
# Restart provider deployments to pick up new secrets
kubectl rollout restart deployment \
--selector=app.kubernetes.io/name=virtrigaud-provider-runtime \
--namespace=vsphere-providers
restartPolicy: OnFailure
serviceAccountName: secret-rotator
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: secret-rotator
rules:
- apiGroups: ["external-secrets.io"]
resources: ["externalsecrets"]
verbs: ["get", "list", "patch"]
- apiGroups: ["apps"]
resources: ["deployments"]
verbs: ["get", "list", "patch"]
3. Audit Logging
# Vault audit configuration
vault audit enable file file_path=/vault/logs/audit.log
# Example audit log entry structure
{
"time": "2023-12-01T10:30:00Z",
"type": "request",
"auth": {
"client_token": "hvs.xxx",
"accessor": "hmac-sha256:xxx",
"display_name": "kubernetes-virtrigaud-system-external-secrets",
"policies": ["virtrigaud-policy"],
"metadata": {
"service_account_name": "external-secrets",
"service_account_namespace": "virtrigaud-system"
}
},
"request": {
"id": "request-id",
"operation": "read",
"path": "secret/data/providers/vsphere",
"data": null,
"remote_address": "10.0.0.100"
}
}
4. Emergency Procedures
#!/bin/bash
# emergency-secret-rotation.sh
echo "=== Emergency Secret Rotation ==="
# 1. Revoke all active leases for a provider
vault lease revoke -prefix vsphere/creds/
# 2. Force refresh all external secrets
kubectl get externalsecret --all-namespaces -o name | \
xargs -I {} kubectl annotate {} force-sync="$(date +%s)"
# 3. Restart all provider deployments
kubectl get deployments --all-namespaces \
-l app.kubernetes.io/name=virtrigaud-provider-runtime \
-o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}' | \
while read deployment; do
kubectl rollout restart deployment $deployment
done
# 4. Monitor rollout status
kubectl get deployments --all-namespaces \
-l app.kubernetes.io/name=virtrigaud-provider-runtime \
-o jsonpath='{range .items[*]}{.metadata.namespace}/{.metadata.name}{"\n"}{end}' | \
while read deployment; do
kubectl rollout status deployment $deployment --timeout=300s
done
echo "Emergency rotation completed"
5. Secret Validation
package validation
import (
"crypto/x509"
"encoding/pem"
"fmt"
"time"
)
func ValidateSecret(secretData map[string][]byte, secretType string) error {
switch secretType {
case "tls":
return validateTLSSecret(secretData)
case "ssh":
return validateSSHSecret(secretData)
case "credential":
return validateCredentialSecret(secretData)
}
return nil
}
func validateTLSSecret(data map[string][]byte) error {
cert, ok := data["tls.crt"]
if !ok {
return fmt.Errorf("missing tls.crt")
}
key, ok := data["tls.key"]
if !ok {
return fmt.Errorf("missing tls.key")
}
// Parse certificate
block, _ := pem.Decode(cert)
if block == nil {
return fmt.Errorf("failed to parse certificate PEM")
}
parsedCert, err := x509.ParseCertificate(block.Bytes)
if err != nil {
return fmt.Errorf("failed to parse certificate: %w", err)
}
// Check expiration
if time.Now().After(parsedCert.NotAfter) {
return fmt.Errorf("certificate expired on %v", parsedCert.NotAfter)
}
if time.Now().Add(24*time.Hour).After(parsedCert.NotAfter) {
return fmt.Errorf("certificate expires soon on %v", parsedCert.NotAfter)
}
// Validate key
block, _ = pem.Decode(key)
if block == nil {
return fmt.Errorf("failed to parse private key PEM")
}
return nil
}
Network Policies for Provider Security
This guide covers Kubernetes NetworkPolicy configurations to secure communication between VirtRigaud components and provider services.
Overview
NetworkPolicies provide network-level security by controlling traffic flow between pods, namespaces, and external endpoints. For VirtRigaud providers, this includes:
- Ingress Control: Restricting which services can communicate with providers
- Egress Control: Limiting provider access to external hypervisor endpoints
- Namespace Isolation: Preventing cross-tenant communication
- External Access: Controlling access to hypervisor management interfaces
Basic NetworkPolicy Template
Provider Ingress Policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: provider-ingress
namespace: provider-namespace
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: virtrigaud-provider
policyTypes:
- Ingress
ingress:
# Allow from VirtRigaud manager
- from:
- namespaceSelector:
matchLabels:
name: virtrigaud-system
- podSelector:
matchLabels:
app.kubernetes.io/name: virtrigaud-manager
ports:
- protocol: TCP
port: 9443 # gRPC provider port
# Allow health checks from monitoring
- from:
- namespaceSelector:
matchLabels:
name: monitoring
- podSelector:
matchLabels:
app: prometheus
ports:
- protocol: TCP
port: 8080 # Health/metrics port
# Allow from same namespace (for debugging)
- from:
- podSelector: {}
ports:
- protocol: TCP
port: 8080
Provider Egress Policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: provider-egress
namespace: provider-namespace
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: virtrigaud-provider
policyTypes:
- Egress
egress:
# Allow DNS resolution
- to: []
ports:
- protocol: UDP
port: 53
- protocol: TCP
port: 53
# Allow HTTPS to Kubernetes API
- to:
- namespaceSelector:
matchLabels:
name: kube-system
ports:
- protocol: TCP
port: 443
# Allow access to hypervisor management interfaces
- to: []
ports:
- protocol: TCP
port: 443 # vCenter HTTPS
- protocol: TCP
port: 80 # vCenter HTTP (if needed)
# For libvirt providers - allow access to hypervisor nodes
- to:
- ipBlock:
cidr: 10.3.0.0/24 # Hypervisor node network (adjust to your environment)
ports:
- protocol: TCP
port: 16509 # libvirt daemon
Environment-Specific Policies
vSphere Provider
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: vsphere-provider-policy
namespace: vsphere-providers
labels:
provider: vsphere
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: virtrigaud-provider-runtime
provider: vsphere
policyTypes:
- Ingress
- Egress
ingress:
# Manager access
- from:
- namespaceSelector:
matchLabels:
name: virtrigaud-system
ports:
- protocol: TCP
port: 9443
# Monitoring access
- from:
- namespaceSelector:
matchLabels:
name: monitoring
ports:
- protocol: TCP
port: 8080
egress:
# DNS
- to: []
ports:
- protocol: UDP
port: 53
# vCenter access (specific IP ranges)
- to:
- ipBlock:
cidr: 10.0.0.0/8
except:
- 10.244.0.0/16 # Exclude pod network
ports:
- protocol: TCP
port: 443
- to:
- ipBlock:
cidr: 192.168.0.0/16
ports:
- protocol: TCP
port: 443
# ESXi host access for direct operations
- to:
- ipBlock:
cidr: 10.1.0.0/24 # ESXi management network
ports:
- protocol: TCP
port: 443
- protocol: TCP
port: 902 # vCenter agent
Libvirt Provider
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: libvirt-provider-policy
namespace: libvirt-providers
labels:
provider: libvirt
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: virtrigaud-provider-runtime
provider: libvirt
policyTypes:
- Ingress
- Egress
ingress:
# Manager access
- from:
- namespaceSelector:
matchLabels:
name: virtrigaud-system
ports:
- protocol: TCP
port: 9443
# Monitoring access
- from:
- namespaceSelector:
matchLabels:
name: monitoring
ports:
- protocol: TCP
port: 8080
egress:
# DNS
- to: []
ports:
- protocol: UDP
port: 53
# Access to hypervisor nodes
- to: []
ports:
- protocol: TCP
port: 16509 # libvirt daemon
- protocol: TCP
port: 22 # SSH for remote libvirt
# Access to shared storage (NFS, iSCSI, etc.)
- to:
- ipBlock:
cidr: 10.2.0.0/24 # Storage network
ports:
- protocol: TCP
port: 2049 # NFS
- protocol: TCP
port: 3260 # iSCSI
- protocol: UDP
port: 111 # RPC portmapper
Mock Provider (Development)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: mock-provider-policy
namespace: development
labels:
provider: mock
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: virtrigaud-provider-runtime
provider: mock
policyTypes:
- Ingress
- Egress
ingress:
# Allow from manager and other development pods
- from:
- namespaceSelector:
matchLabels:
environment: development
ports:
- protocol: TCP
port: 9443
- protocol: TCP
port: 8080
egress:
# Allow all egress for development environment
- to: []
Multi-Tenant Isolation
Tenant Namespace Policies
# Template for tenant-specific policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: tenant-isolation
namespace: tenant-{{TENANT_NAME}}
labels:
tenant: "{{TENANT_NAME}}"
spec:
podSelector: {} # Apply to all pods in namespace
policyTypes:
- Ingress
- Egress
ingress:
# Allow from same tenant namespace
- from:
- namespaceSelector:
matchLabels:
tenant: "{{TENANT_NAME}}"
# Allow from VirtRigaud system namespace
- from:
- namespaceSelector:
matchLabels:
name: virtrigaud-system
# Allow from monitoring namespace
- from:
- namespaceSelector:
matchLabels:
name: monitoring
egress:
# Allow to same tenant namespace
- to:
- namespaceSelector:
matchLabels:
tenant: "{{TENANT_NAME}}"
# Allow to VirtRigaud system namespace
- to:
- namespaceSelector:
matchLabels:
name: virtrigaud-system
# DNS resolution
- to: []
ports:
- protocol: UDP
port: 53
# External hypervisor access (tenant-specific IP ranges)
- to:
- ipBlock:
cidr: "{{TENANT_HYPERVISOR_CIDR}}"
ports:
- protocol: TCP
port: 443
Cross-Tenant Communication Prevention
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: deny-cross-tenant
namespace: tenant-production
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
ingress:
# Allow only trusted sources; any source not listed here is denied by omission
- from:
- namespaceSelector:
matchLabels:
tenant: production
- namespaceSelector:
matchLabels:
name: virtrigaud-system
egress:
# Allow egress only to trusted namespaces; all others are denied by omission
- to:
- namespaceSelector:
matchLabels:
name: virtrigaud-system
- to:
- namespaceSelector:
matchLabels:
name: monitoring
- to:
- namespaceSelector:
matchLabels:
tenant: production
# Deny all other namespace access
Advanced Policies
Time-Based Access Control
# Use external controllers like OPA Gatekeeper for time-based policies
apiVersion: templates.gatekeeper.sh/v1beta1
kind: ConstraintTemplate
metadata:
name: timerestriction
spec:
crd:
spec:
names:
kind: TimeRestriction
validation:
type: object
properties:
allowedHours:
type: array
items:
type: integer
description: "Allowed hours (0-23) for network access"
targets:
- target: admission.k8s.gatekeeper.sh
rego: |
package timerestriction
import future.keywords.in
violation[{"msg": msg}] {
current_hour := floor(time.now_ns() / 1000000000 / 3600) % 24
not current_hour in input.parameters.allowedHours
msg := sprintf("Network access not allowed at hour %v", [current_hour])
}
---
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: TimeRestriction
metadata:
name: business-hours-only
spec:
match:
kinds:
- apiGroups: ["networking.k8s.io"]
kinds: ["NetworkPolicy"]
namespaces: ["production"]
parameters:
allowedHours: [8, 9, 10, 11, 12, 13, 14, 15, 16, 17] # 08:00-17:59 UTC
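The Rego rule boils down to "is the current UTC hour in the allow-list" (time.now_ns() is UTC-based, so the hour arithmetic yields the UTC hour). The same check expressed in Go, for reference; the business-hours list mirrors the example constraint:

```go
package main

import (
	"fmt"
	"time"
)

// hourAllowed reports whether h (0-23) appears in the allow-list, mirroring
// the Rego violation rule. Note the Rego hour is UTC, so the Go equivalent
// must use time.Now().UTC().Hour(), not the local hour.
func hourAllowed(h int, allowed []int) bool {
	for _, a := range allowed {
		if a == h {
			return true
		}
	}
	return false
}

func main() {
	businessHours := []int{8, 9, 10, 11, 12, 13, 14, 15, 16, 17}
	now := time.Now().UTC().Hour()
	fmt.Printf("hour %d allowed: %v\n", now, hourAllowed(now, businessHours))
}
```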
Dynamic IP Allow-listing
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: dynamic-hypervisor-access
namespace: provider-namespace
annotations:
# Use external controllers to update IP blocks dynamically
network-policy-controller/update-interval: "300s"
network-policy-controller/ip-source: "configmap:hypervisor-ips"
spec:
podSelector:
matchLabels:
app.kubernetes.io/name: virtrigaud-provider
policyTypes:
- Egress
egress:
# Will be dynamically updated by controller
- to:
- ipBlock:
cidr: 10.0.0.0/8
# Static rules remain
- to: []
ports:
- protocol: UDP
port: 53
Monitoring and Troubleshooting
NetworkPolicy Monitoring
# ServiceMonitor for network policy violations
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: networkpolicy-monitoring
namespace: monitoring
spec:
selector:
matchLabels:
app: networkpolicy-exporter
endpoints:
- port: metrics
interval: 30s
path: /metrics
---
# Example alerts for network policy violations
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: networkpolicy-alerts
namespace: monitoring
spec:
groups:
- name: networkpolicy.rules
rules:
- alert: NetworkPolicyDeniedConnections
expr: increase(networkpolicy_denied_connections_total[5m]) > 10
for: 2m
labels:
severity: warning
annotations:
summary: "High number of denied network connections"
description: "{{ $labels.source_namespace }}/{{ $labels.source_pod }} had {{ $value }} denied connections to {{ $labels.dest_namespace }}/{{ $labels.dest_pod }}"
Debug NetworkPolicies
#!/bin/bash
# debug-networkpolicy.sh
NAMESPACE=${1:-default}
POD_NAME=${2}
echo "=== NetworkPolicy Debug for $NAMESPACE/$POD_NAME ==="
# List all NetworkPolicies in namespace
echo "NetworkPolicies in namespace $NAMESPACE:"
kubectl get networkpolicy -n $NAMESPACE
# Show specific NetworkPolicy details
echo -e "\nNetworkPolicy details:"
kubectl get networkpolicy -n $NAMESPACE -o yaml
# Test connectivity
if [ ! -z "$POD_NAME" ]; then
echo -e "\nTesting connectivity from $POD_NAME:"
# Test DNS resolution
kubectl exec -n $NAMESPACE $POD_NAME -- nslookup kubernetes.default.svc.cluster.local
# Test internal connectivity
kubectl exec -n $NAMESPACE $POD_NAME -- wget -qO- --timeout=5 --no-check-certificate https://kubernetes.default.svc.cluster.local/healthz
# Test external connectivity (adjust as needed)
kubectl exec -n $NAMESPACE $POD_NAME -- wget -qO- --timeout=5 https://google.com
fi
# Check iptables rules (if accessible)
echo -e "\nIPTables rules (if accessible):"
kubectl get nodes -o wide
echo "Run the following on a node to see iptables:"
echo "sudo iptables -L -n | grep -E '(KUBE|Chain)'"
CNI-Specific Troubleshooting
Calico
# Check Calico network policies
kubectl get networkpolicy --all-namespaces
kubectl get globalnetworkpolicy
# Check Calico endpoints
kubectl get endpoints --all-namespaces
# Debug Calico connectivity
kubectl exec -it -n kube-system <calico-node-pod> -- /bin/sh
calicoctl get wep --all-namespaces
calicoctl get netpol --all-namespaces
Cilium
# Check Cilium network policies
kubectl get cnp --all-namespaces # Cilium Network Policies
kubectl get ccnp --all-namespaces # Cilium Cluster Network Policies
# Debug Cilium connectivity
kubectl exec -it -n kube-system <cilium-pod> -- cilium endpoint list
kubectl exec -it -n kube-system <cilium-pod> -- cilium policy get
Security Best Practices
1. Principle of Least Privilege
# Example: Minimal egress for a provider
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: minimal-egress-example
spec:
podSelector:
matchLabels:
app: provider
policyTypes:
- Egress
egress:
# Only allow what's absolutely necessary
- to: []
ports:
- protocol: UDP
port: 53 # DNS only
- to:
- ipBlock:
cidr: 10.1.1.100/32 # Specific vCenter IP only
ports:
- protocol: TCP
port: 443 # HTTPS only
2. Default Deny Policies
# Apply default deny to all namespaces
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-all
namespace: production
spec:
podSelector: {}
policyTypes:
- Ingress
- Egress
# Empty ingress/egress rules = deny all
3. Regular Policy Auditing
#!/bin/bash
# audit-networkpolicies.sh
echo "=== NetworkPolicy Audit Report ==="
echo "Generated: $(date)"
echo
# Check for namespaces without NetworkPolicies
echo "Namespaces without NetworkPolicies:"
for ns in $(kubectl get namespaces -o jsonpath='{.items[*].metadata.name}'); do
if [ $(kubectl get networkpolicy -n $ns --no-headers 2>/dev/null | wc -l) -eq 0 ]; then
echo " - $ns (WARNING: No network policies)"
fi
done
echo
# Check for overly permissive policies
echo "Potentially overly permissive policies:"
kubectl get networkpolicy --all-namespaces -o json | jq -r '
.items[] |
select(
(.spec.egress[]?.to // []) | length == 0 or
(.spec.ingress[]?.from // []) | length == 0
) |
"\(.metadata.namespace)/\(.metadata.name) - Check for overly broad rules"
'
echo
# Check for unused NetworkPolicies
echo "NetworkPolicies with no matching pods:"
kubectl get networkpolicy --all-namespaces -o json | jq -r '
.items[] as $np |
$np.metadata.namespace as $ns |
$np.spec.podSelector as $selector |
if ($selector | keys | length) == 0 then
"\($ns)/\($np.metadata.name) - Applies to all pods in namespace"
else
"\($ns)/\($np.metadata.name) - Check if pods match selector"
end
'
4. Integration with Service Mesh
# Example: Istio integration with NetworkPolicies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: istio-compatible-policy
spec:
podSelector:
matchLabels:
app: provider
policyTypes:
- Ingress
- Egress
ingress:
# Allow Istio sidecar communication
- from:
- podSelector:
matchLabels:
app: istio-proxy
ports:
- protocol: TCP
port: 15090 # Istio pilot
# Your application ports
- from:
- namespaceSelector:
matchLabels:
name: virtrigaud-system
ports:
- protocol: TCP
port: 9443
egress:
# Allow Istio control plane
- to:
- namespaceSelector:
matchLabels:
name: istio-system
ports:
- protocol: TCP
port: 15010 # Pilot
- protocol: TCP
port: 15011 # Pilot secure
CLI Tools Reference
VirtRigaud provides a comprehensive set of command-line tools for managing virtual machines, developing providers, running conformance tests, and performing load testing. This guide covers all available CLI tools and their usage.
Overview
| Tool | Purpose | Target Users |
|---|---|---|
| vrtg | Main CLI for VM management and operations | End users, DevOps teams, System administrators |
| vcts | Conformance testing suite | Provider developers, QA teams, CI/CD pipelines |
| vrtg-provider | Provider development toolkit | Provider developers, Contributors |
| virtrigaud-loadgen | Load testing and benchmarking | Performance engineers, SREs |
Installation
From GitHub Releases
# Download the latest release
export VIRTRIGAUD_VERSION="v0.2.1"
export PLATFORM="linux-amd64" # or darwin-amd64, windows-amd64
# Install main CLI tool
curl -L "https://github.com/projectbeskar/virtrigaud/releases/download/${VIRTRIGAUD_VERSION}/vrtg-${PLATFORM}" -o vrtg
chmod +x vrtg
sudo mv vrtg /usr/local/bin/
# Install all CLI tools
curl -L "https://github.com/projectbeskar/virtrigaud/releases/download/${VIRTRIGAUD_VERSION}/virtrigaud-cli-${PLATFORM}.tar.gz" | tar xz
sudo mv vrtg vcts vrtg-provider virtrigaud-loadgen /usr/local/bin/
From Source
git clone https://github.com/projectbeskar/virtrigaud.git
cd virtrigaud
# Build all CLI tools
make build-cli
# Install to /usr/local/bin
sudo make install-cli
# Or install to custom location
make install-cli PREFIX=/usr/local
Using Go
# Install specific version
go install github.com/projectbeskar/virtrigaud/cmd/vrtg@v0.2.1
go install github.com/projectbeskar/virtrigaud/cmd/vcts@v0.2.1
go install github.com/projectbeskar/virtrigaud/cmd/vrtg-provider@v0.2.1
go install github.com/projectbeskar/virtrigaud/cmd/virtrigaud-loadgen@v0.2.1
# Install latest
go install github.com/projectbeskar/virtrigaud/cmd/vrtg@latest
Completion
Enable shell completion for enhanced productivity:
# Bash
vrtg completion bash > /etc/bash_completion.d/vrtg
source /etc/bash_completion.d/vrtg
# Zsh
vrtg completion zsh > "${fpath[1]}/_vrtg"
# Fish
vrtg completion fish > ~/.config/fish/completions/vrtg.fish
# PowerShell
vrtg completion powershell | Out-String | Invoke-Expression
vrtg
The main CLI tool for managing VirtRigaud resources and virtual machines.
Global Flags
--kubeconfig string Path to kubeconfig file (default: $KUBECONFIG or ~/.kube/config)
--namespace string Kubernetes namespace (default: "default")
--output string Output format: table, json, yaml (default: "table")
--timeout duration Operation timeout (default: 30s)
--verbose Enable verbose output
-h, --help Help for vrtg
Commands
vm - Virtual Machine Management
Manage virtual machines with comprehensive lifecycle operations.
# List virtual machines
vrtg vm list [flags]
# Describe a virtual machine
vrtg vm describe <name> [flags]
# Show VM events
vrtg vm events <name> [flags]
# Get VM console URL
vrtg vm console-url <name> [flags]
Flags:
- `--all-namespaces`: List VMs across all namespaces
- `--label-selector`: Filter by labels (e.g., `app=web,env=prod`)
- `--field-selector`: Filter by fields (e.g., `spec.powerState=On`)
- `--sort-by`: Sort output by column (name, namespace, powerState, provider)
- `--watch`: Watch for changes
Examples:
# List all VMs in table format
vrtg vm list
# List VMs with custom output format
vrtg vm list --output json --namespace production
# List VMs across all namespaces
vrtg vm list --all-namespaces
# Filter VMs by labels
vrtg vm list --label-selector environment=production,tier=web
# Watch VM status changes
vrtg vm list --watch
# Get detailed VM information
vrtg vm describe my-vm --output yaml
# Get VM console URL
vrtg vm console-url my-vm
# Show recent VM events
vrtg vm events my-vm
provider - Provider Management
Manage provider configurations and monitor their health.
# List providers
vrtg provider list [flags]
# Show provider status
vrtg provider status <name> [flags]
# Show provider logs
vrtg provider logs <name> [flags]
Flags:
- `--follow`: Follow log output (for `logs` command)
- `--tail`: Number of lines to show from end of logs (default: 100)
- `--since`: Show logs since timestamp (e.g., 1h, 30m)
Examples:
# List all providers
vrtg provider list
# Check provider status
vrtg provider status vsphere-provider
# View provider logs
vrtg provider logs vsphere-provider --tail 50
# Follow provider logs in real-time
vrtg provider logs vsphere-provider --follow
# Show logs from last hour
vrtg provider logs vsphere-provider --since 1h
snapshot - Snapshot Management
Manage VM snapshots for backup and recovery.
# Create a VM snapshot
vrtg snapshot create <vm-name> <snapshot-name> [flags]
# List snapshots
vrtg snapshot list [vm-name] [flags]
# Revert VM to snapshot
vrtg snapshot revert <vm-name> <snapshot-name> [flags]
Flags for create:
- `--description`: Snapshot description
- `--include-memory`: Include memory state in snapshot
Examples:
# Create a simple snapshot
vrtg snapshot create my-vm pre-upgrade
# Create snapshot with description and memory
vrtg snapshot create my-vm pre-maintenance \
--description "Before maintenance window" \
--include-memory
# List all snapshots
vrtg snapshot list
# List snapshots for specific VM
vrtg snapshot list my-vm
# Revert to a snapshot
vrtg snapshot revert my-vm pre-upgrade
clone - VM Cloning
Clone virtual machines for rapid provisioning.
# Clone a virtual machine
vrtg clone run <source-vm> <target-vm> [flags]
# List clone operations
vrtg clone list [flags]
Flags for run:
- `--linked`: Create linked clone (faster, space-efficient)
- `--target-namespace`: Namespace for target VM
- `--customize`: Apply customization during clone
Examples:
# Simple VM clone
vrtg clone run template-vm new-vm
# Linked clone for development
vrtg clone run production-vm dev-vm --linked
# Clone to different namespace
vrtg clone run template-vm test-vm --target-namespace testing
# List clone operations
vrtg clone list
conformance - Provider Testing
Run conformance tests against providers.
# Run conformance tests
vrtg conformance run <provider> [flags]
Flags:
- `--output-dir`: Directory for test results
- `--skip-tests`: Comma-separated list of tests to skip
- `--timeout`: Test timeout (default: 30m)
Examples:
# Run conformance tests
vrtg conformance run vsphere-provider
# Run tests with custom timeout
vrtg conformance run vsphere-provider --timeout 1h
# Skip specific tests
vrtg conformance run vsphere-provider --skip-tests "test-large-vms,test-network"
diag - Diagnostics
Diagnostic tools for troubleshooting.
# Create diagnostic bundle
vrtg diag bundle [flags]
Flags:
- `--output`: Output file path (default: virtrigaud-diag-<timestamp>.tar.gz)
- `--include-logs`: Include provider logs in bundle
- `--since`: Collect logs since timestamp
Examples:
# Create diagnostic bundle
vrtg diag bundle
# Create bundle with logs from last 2 hours
vrtg diag bundle --include-logs --since 2h
# Custom output location
vrtg diag bundle --output /tmp/debug-bundle.tar.gz
init - Installation
Initialize VirtRigaud in a Kubernetes cluster.
# Initialize virtrigaud
vrtg init [flags]
Flags:
- `--chart-version`: Helm chart version to install
- `--namespace`: Installation namespace (default: virtrigaud-system)
- `--values`: Values file for Helm chart
- `--dry-run`: Show what would be installed
Examples:
# Basic installation
vrtg init
# Install specific version
vrtg init --chart-version v0.2.1
# Install with custom values
vrtg init --values custom-values.yaml
# Dry run to see what would be installed
vrtg init --dry-run
vcts
VirtRigaud Conformance Test Suite - runs standardized tests against providers.
Usage
vcts [command] [flags]
Global Flags
--kubeconfig string Path to kubeconfig file
--namespace string Kubernetes namespace (default: "virtrigaud-system")
--provider string Provider name to test
--output-dir string Output directory for test results (default: "./conformance-results")
--skip-tests strings Comma-separated list of tests to skip
--timeout duration Test timeout (default: 30m)
--parallel int Number of parallel test executions (default: 1)
--verbose Enable verbose output
Commands
run - Execute Tests
# Run all conformance tests
vcts run --provider vsphere-provider
# Run with custom settings
vcts run --provider vsphere-provider \
--timeout 1h \
--parallel 3 \
--output-dir /tmp/test-results
# Skip specific tests
vcts run --provider libvirt-provider \
--skip-tests "test-snapshots,test-linked-clones"
# Verbose output for debugging
vcts run --provider proxmox-provider --verbose
list - List Available Tests
# List all available tests
vcts list
# List tests for specific capability
vcts list --capability snapshots
validate - Validate Provider
# Validate provider configuration
vcts validate --provider vsphere-provider
# Check provider connectivity
vcts validate --provider vsphere-provider --check-connectivity
Test Categories
- Basic Operations: VM creation, deletion, power operations
- Lifecycle Management: Start, stop, restart, suspend operations
- Resource Management: CPU, memory, disk operations
- Networking: Network configuration and connectivity
- Storage: Disk operations, resizing, multiple disks
- Snapshots: Create, list, revert, delete snapshots
- Cloning: VM cloning and linked clones
- Error Handling: Provider error scenarios
- Performance: Basic performance benchmarks
Output Formats
Test results are available in multiple formats:
- JUnit XML: For CI/CD integration
- JSON: Machine-readable format
- HTML: Human-readable report
- TAP: Test Anything Protocol
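For CI gating, the JUnit XML output can be checked with standard shell tools. A minimal sketch follows; the file path and its contents are hypothetical, but the `failures` attribute is standard JUnit XML:

```shell
# Hypothetical conformance result file in JUnit XML format.
cat > /tmp/vcts-results.xml <<'EOF'
<testsuite name="core" tests="3" failures="1">
  <testcase name="test-create-vm"/>
  <testcase name="test-power-ops"/>
  <testcase name="test-snapshots"><failure message="not supported"/></testcase>
</testsuite>
EOF

# Extract the failure count and fail the CI step if any test failed.
failures=$(grep -o 'failures="[0-9]*"' /tmp/vcts-results.xml | grep -o '[0-9]*')
[ "$failures" -eq 0 ] || echo "conformance failures: $failures"
```

A dedicated JUnit parser is preferable in larger pipelines; this is only a quick gate.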
vrtg-provider
Provider development toolkit for creating and managing VirtRigaud providers.
Usage
vrtg-provider [command] [flags]
Global Flags
--verbose Enable verbose output
--help Help for vrtg-provider
Commands
init - Initialize Provider
Bootstrap a new provider project with scaffolding.
vrtg-provider init <provider-name> [flags]
Flags:
- `--template`: Template to use (grpc, rest, hybrid)
- `--output-dir`: Output directory (default: current directory)
- `--module`: Go module name
- `--author`: Author name for generated files
Examples:
# Create basic gRPC provider
vrtg-provider init my-provider --template grpc
# Create with custom module
vrtg-provider init my-provider \
--template grpc \
--module github.com/myorg/my-provider \
--author "John Doe <john@example.com>"
# Create in specific directory
vrtg-provider init my-provider \
--output-dir /path/to/providers \
--template grpc
generate - Code Generation
Generate boilerplate code for provider implementation.
vrtg-provider generate [type] [flags]
Types:
- `client`: Generate client code
- `server`: Generate server implementation
- `tests`: Generate test scaffolding
- `docs`: Generate documentation templates
Examples:
# Generate client code
vrtg-provider generate client --provider my-provider
# Generate test scaffolding
vrtg-provider generate tests --provider my-provider
# Generate documentation
vrtg-provider generate docs --provider my-provider
verify - Verification
Verify provider implementation and compliance.
vrtg-provider verify [flags]
Flags:
- `--provider-dir`: Provider directory to verify
- `--check-interface`: Verify interface compliance
- `--check-docs`: Verify documentation completeness
- `--check-tests`: Verify test coverage
Examples:
# Basic verification
vrtg-provider verify --provider-dir ./my-provider
# Comprehensive check
vrtg-provider verify \
--provider-dir ./my-provider \
--check-interface \
--check-docs \
--check-tests
publish - Publishing
Prepare provider for publishing and distribution.
vrtg-provider publish [flags]
Flags:
- `--provider-dir`: Provider directory
- `--version`: Version to publish
- `--registry`: Container registry
- `--chart-repo`: Helm chart repository
Examples:
# Publish provider
vrtg-provider publish \
--provider-dir ./my-provider \
--version v1.0.0 \
--registry ghcr.io/myorg
# Publish with Helm chart
vrtg-provider publish \
--provider-dir ./my-provider \
--version v1.0.0 \
--registry ghcr.io/myorg \
--chart-repo https://charts.myorg.com
Provider Template Structure
my-provider/
├── cmd/
│   └── provider/
│       └── main.go          # Provider entry point
├── internal/
│   ├── provider/
│   │   ├── server.go        # gRPC server implementation
│   │   ├── client.go        # Provider client
│   │   └── types.go         # Provider-specific types
│   └── config/
│       └── config.go        # Configuration management
├── pkg/
│   └── api/                 # Public API interfaces
├── test/
│   ├── conformance/         # Conformance tests
│   └── integration/         # Integration tests
├── deploy/
│   ├── helm/                # Helm charts
│   └── k8s/                 # Kubernetes manifests
├── docs/                    # Documentation
├── Dockerfile               # Container image
├── Makefile                 # Build automation
└── README.md                # Provider documentation
virtrigaud-loadgen
Load testing and benchmarking tool for VirtRigaud deployments.
Usage
virtrigaud-loadgen [command] [flags]
Global Flags
--kubeconfig string Path to kubeconfig file
--namespace string Kubernetes namespace (default: "default")
--output-dir string Output directory for results (default: "./loadgen-results")
--config-file string Load generation configuration file
--dry-run Show what would be executed without running
--verbose Enable verbose output
Commands
run - Execute Load Test
virtrigaud-loadgen run [flags]
Flags:
- `--vms`: Number of VMs to create (default: 10)
- `--duration`: Test duration (default: 10m)
- `--ramp-up`: Ramp-up time (default: 2m)
- `--workers`: Number of concurrent workers (default: 5)
- `--provider`: Provider to test against
- `--vm-class`: VMClass to use for test VMs
- `--vm-image`: VMImage to use for test VMs
Examples:
# Basic load test
virtrigaud-loadgen run --vms 50 --duration 15m
# Comprehensive load test
virtrigaud-loadgen run \
--vms 100 \
--duration 30m \
--ramp-up 5m \
--workers 10 \
--provider vsphere-provider
# Test with specific configuration
virtrigaud-loadgen run --config-file loadtest-config.yaml
config - Configuration Management
# Generate sample configuration
virtrigaud-loadgen config generate --output sample-config.yaml
# Validate configuration
virtrigaud-loadgen config validate --config-file my-config.yaml
Configuration File
# loadtest-config.yaml
metadata:
name: "production-load-test"
description: "Load test for production environment"
spec:
# Test parameters
vms: 100
duration: "30m"
rampUp: "5m"
workers: 10
# Target configuration
provider: "vsphere-provider"
namespace: "loadtest"
# VM configuration
vmClass: "standard-vm"
vmImage: "ubuntu-22-04"
# Test scenarios
scenarios:
- name: "vm-lifecycle"
weight: 70
operations:
- create
- start
- stop
- delete
- name: "vm-operations"
weight: 20
operations:
- snapshot
- clone
- reconfigure
- name: "provider-stress"
weight: 10
operations:
- rapid-create-delete
- concurrent-operations
# Reporting
reporting:
formats: ["json", "html", "csv"]
metrics:
- response-time
- throughput
- error-rate
- resource-usage
Metrics and Reporting
Load test results include:
- Performance Metrics: Response times, throughput, latency percentiles
- Error Analysis: Error rates, failure patterns, error categorization
- Resource Usage: CPU, memory, network utilization
- Provider Metrics: Provider-specific performance indicators
- Trend Analysis: Performance over time, bottleneck identification
Output Formats
- JSON: Machine-readable results for automation
- HTML: Interactive dashboard with charts and graphs
- CSV: Raw data for further analysis
- Prometheus: Metrics export for monitoring systems
Advanced Usage
Automation and Scripting
Bash Integration
#!/bin/bash
# VM management script
# Function to check VM status
check_vm_status() {
local vm_name=$1
vrtg vm describe "$vm_name" --output json | jq -r '.status.powerState'
}
# Wait for VM to be ready
wait_for_vm() {
local vm_name=$1
local timeout=300
local count=0
while [ $count -lt $timeout ]; do
status=$(check_vm_status "$vm_name")
if [ "$status" = "On" ]; then
echo "VM $vm_name is ready"
return 0
fi
sleep 5
count=$((count + 5))
done
echo "Timeout waiting for VM $vm_name"
return 1
}
# Create and wait for VM
vrtg vm create --file vm-config.yaml
wait_for_vm "my-vm"
CI/CD Integration
# .github/workflows/vm-test.yml
name: VM Integration Test
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Install vrtg CLI
run: |
curl -L "https://github.com/projectbeskar/virtrigaud/releases/latest/download/vrtg-linux-amd64" -o vrtg
chmod +x vrtg
sudo mv vrtg /usr/local/bin/
- name: Setup kubeconfig
run: echo "${{ secrets.KUBECONFIG }}" | base64 -d > ~/.kube/config
- name: Run conformance tests
run: vcts run --provider test-provider --output-dir test-results
- name: Upload test results
uses: actions/upload-artifact@v3
with:
name: conformance-results
path: test-results/
Configuration Management
Environment-specific Configurations
# Development environment
export VRTG_KUBECONFIG=~/.kube/dev-config
export VRTG_NAMESPACE=development
export VRTG_OUTPUT=yaml
# Production environment
export VRTG_KUBECONFIG=~/.kube/prod-config
export VRTG_NAMESPACE=production
export VRTG_OUTPUT=json
# Use environment-specific settings
vrtg vm list # Uses environment variables
Configuration Files
Create ~/.vrtg/config.yaml:
contexts:
development:
kubeconfig: ~/.kube/dev-config
namespace: development
output: yaml
timeout: 30s
production:
kubeconfig: ~/.kube/prod-config
namespace: production
output: json
timeout: 60s
current-context: development
aliases:
ls: vm list
get: vm describe
logs: provider logs
Troubleshooting
Common Issues
- Connection Issues
# Check cluster connectivity
vrtg provider list
# Validate kubeconfig
kubectl cluster-info
# Check provider logs
vrtg provider logs <provider-name> --tail 100
- Permission Issues
# Check RBAC permissions
kubectl auth can-i create virtualmachines
# Get current user context
kubectl auth whoami
- Provider Issues
# Check provider status
vrtg provider status <provider-name>
# Run diagnostics
vrtg diag bundle --include-logs
Debug Mode
Enable debug output:
# Global debug flag
vrtg --verbose vm list
# Provider-specific debugging
vrtg provider logs <provider-name> --follow --verbose
# Conformance test debugging
vcts run --provider <provider-name> --verbose
CLI Reference
VirtRigaud provides several command-line tools for managing virtual machines, testing providers, and developing new providers. All tools are available as part of VirtRigaud v0.2.0.
Overview
| Tool | Purpose | Target Users |
|---|---|---|
| `vrtg` | Main CLI for VM management | End users, DevOps teams |
| `vcts` | Conformance testing suite | Provider developers, QA teams |
| `vrtg-provider` | Provider development toolkit | Provider developers |
| `virtrigaud-loadgen` | Load testing and benchmarking | Performance engineers |
Installation
From GitHub Releases
# Download the latest release
curl -L "https://github.com/projectbeskar/virtrigaud/releases/download/v0.2.0/vrtg-linux-amd64" -o vrtg
chmod +x vrtg
sudo mv vrtg /usr/local/bin/
# Install all CLI tools
curl -L "https://github.com/projectbeskar/virtrigaud/releases/download/v0.2.0/virtrigaud-cli-linux-amd64.tar.gz" | tar xz
sudo mv vrtg vcts vrtg-provider virtrigaud-loadgen /usr/local/bin/
From Source
git clone https://github.com/projectbeskar/virtrigaud.git
cd virtrigaud
# Build all CLI tools
make build-cli
# Install to /usr/local/bin
sudo make install-cli
Using Go
go install github.com/projectbeskar/virtrigaud/cmd/vrtg@v0.2.0
go install github.com/projectbeskar/virtrigaud/cmd/vcts@v0.2.0
go install github.com/projectbeskar/virtrigaud/cmd/vrtg-provider@v0.2.0
go install github.com/projectbeskar/virtrigaud/cmd/virtrigaud-loadgen@v0.2.0
vrtg
The main CLI tool for managing VirtRigaud resources and virtual machines.
Global Flags
--kubeconfig string Path to kubeconfig file (default: $KUBECONFIG or ~/.kube/config)
--namespace string Kubernetes namespace (default: "default")
--output string Output format: table, json, yaml (default: "table")
--timeout duration Operation timeout (default: 5m0s)
-h, --help Help for vrtg
Commands
vm
Manage virtual machines.
# List all VMs
vrtg vm list
# Get detailed VM information
vrtg vm get <vm-name>
# Create a VM from configuration
vrtg vm create --file vm.yaml
# Delete a VM
vrtg vm delete <vm-name>
# Power operations
vrtg vm start <vm-name>
vrtg vm stop <vm-name>
vrtg vm restart <vm-name>
# Scale VMSet
vrtg vm scale <vmset-name> --replicas 5
# Get VM console URL
vrtg vm console <vm-name>
# Watch VM status changes
vrtg vm watch <vm-name>
Examples:
# List VMs with custom output
vrtg vm list --output json --namespace production
# Create VM with timeout
vrtg vm create --file my-vm.yaml --timeout 10m
# Power on all VMs in namespace
vrtg vm list --output json | jq -r '.items[].metadata.name' | xargs -I {} vrtg vm start {}
provider
Manage provider configurations.
# List providers
vrtg provider list
# Get provider details
vrtg provider get <provider-name>
# Check provider connectivity
vrtg provider validate <provider-name>
# Get provider capabilities
vrtg provider capabilities <provider-name>
# View provider logs
vrtg provider logs <provider-name>
# Test provider functionality
vrtg provider test <provider-name>
Examples:
# Validate all providers
vrtg provider list --output json | jq -r '.items[].metadata.name' | xargs -I {} vrtg provider validate {}
# Get detailed provider status
vrtg provider get vsphere-prod --output yaml
image
Manage VM images and templates.
# List available images
vrtg image list
# Get image details
vrtg image get <image-name>
# Prepare an image
vrtg image prepare <image-name>
# Delete an image
vrtg image delete <image-name>
snapshot
Manage VM snapshots.
# List snapshots for a VM
vrtg snapshot list --vm <vm-name>
# Create a snapshot
vrtg snapshot create <vm-name> --name "pre-upgrade"
# Restore from snapshot
vrtg snapshot restore <vm-name> <snapshot-name>
# Delete a snapshot
vrtg snapshot delete <vm-name> <snapshot-name>
completion
Generate shell completion scripts.
# Bash
vrtg completion bash > /etc/bash_completion.d/vrtg
# Zsh
vrtg completion zsh > "${fpath[1]}/_vrtg"
# Fish
vrtg completion fish > ~/.config/fish/completions/vrtg.fish
# PowerShell
vrtg completion powershell > vrtg.ps1
Configuration
vrtg uses the same kubeconfig as kubectl. Configuration precedence:
1. `--kubeconfig` flag
2. `KUBECONFIG` environment variable
3. `~/.kube/config`
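As a rough illustration, the lookup order can be sketched in shell. This mirrors the documented precedence only; `resolve_kubeconfig` is a hypothetical helper, not vrtg source code:

```shell
# Illustrative sketch of the kubeconfig precedence (not vrtg's implementation).
resolve_kubeconfig() {
  # $1 simulates the value of the --kubeconfig flag, if given.
  if [ -n "$1" ]; then
    echo "$1"                   # 1. explicit flag wins
  elif [ -n "$KUBECONFIG" ]; then
    echo "$KUBECONFIG"          # 2. then the environment variable
  else
    echo "$HOME/.kube/config"   # 3. then the default path
  fi
}

KUBECONFIG=/tmp/env-config
resolve_kubeconfig                  # prints /tmp/env-config
resolve_kubeconfig /tmp/flag-config # prints /tmp/flag-config
unset KUBECONFIG
resolve_kubeconfig                  # prints $HOME/.kube/config
```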
Config File
Create ~/.vrtg/config.yaml for default settings:
defaults:
namespace: "virtrigaud-system"
timeout: "10m"
output: "table"
providers:
preferred: "vsphere-prod"
output:
colors: true
timestamps: true
vcts
VirtRigaud Conformance Test Suite for validating provider implementations.
Global Flags
--kubeconfig string Path to kubeconfig file
--namespace string Test namespace (default: "vcts")
--provider string Provider to test
--output-dir string Directory for test results
--timeout duration Test timeout (default: 30m)
--parallel int Number of parallel tests (default: 1)
--skip strings Tests to skip (comma-separated)
--verbose Verbose output
-h, --help Help for vcts
Commands
run
Run conformance tests against a provider.
# Run all tests
vcts run --provider vsphere-prod
# Run specific test suites
vcts run --provider vsphere-prod --suites core,storage
# Run with custom configuration
vcts run --provider libvirt-test --config test-config.yaml
# Skip specific tests
vcts run --provider vsphere-prod --skip "test-large-vm,test-snapshot-memory"
# Generate detailed report
vcts run --provider vsphere-prod --output-dir ./test-results --verbose
list
List available test suites and tests.
# List all test suites
vcts list suites
# List tests in a suite
vcts list tests --suite core
# List supported providers
vcts list providers
validate
Validate test configuration.
# Validate configuration file
vcts validate --config test-config.yaml
# Validate provider setup
vcts validate --provider vsphere-prod
Test Suites
Core Suite
- Basic VM lifecycle (create, start, stop, delete)
- Provider connectivity and authentication
- Resource allocation and management
Storage Suite
- Disk creation and attachment
- Volume expansion operations
- Storage pool management
Network Suite
- Network interface management
- IP address allocation
- Network connectivity tests
Snapshot Suite
- Snapshot creation and deletion
- Snapshot restoration
- Memory state preservation
Performance Suite
- VM creation performance
- Resource utilization benchmarks
- Concurrent operation handling
Test Configuration
Create test-config.yaml:
provider:
name: "vsphere-prod"
type: "vsphere"
tests:
core:
enabled: true
timeout: "15m"
storage:
enabled: true
testDiskSize: "10Gi"
network:
enabled: false # Skip network tests
resources:
vmClass: "test-small"
vmImage: "ubuntu-22-04"
cleanup:
enabled: true
timeout: "10m"
vrtg-provider
Development toolkit for creating and maintaining VirtRigaud providers.
Global Flags
--verbose Enable verbose output
-h, --help Help for vrtg-provider
Commands
init
Initialize a new provider project.
# Create a new provider
vrtg-provider init --name hyperv --type hyperv --output ./hyperv-provider
# Create with custom options
vrtg-provider init --name aws-ec2 --type aws \
--capabilities snapshots,linked-clones \
--output ./aws-provider
Options:
- `--name`: Provider name
- `--type`: Provider type
- `--capabilities`: Comma-separated capabilities list
- `--output`: Output directory
- `--remote`: Generate remote provider (default: true)
generate
Generate code for provider components.
# Generate API types
vrtg-provider generate api --provider-type vsphere
# Generate client code
vrtg-provider generate client --provider-type vsphere --api-version v1
# Generate test suite
vrtg-provider generate tests --provider-type vsphere
# Generate documentation
vrtg-provider generate docs --provider-type vsphere
verify
Verify provider implementation.
# Verify provider structure
vrtg-provider verify structure --path ./my-provider
# Verify capabilities
vrtg-provider verify capabilities --path ./my-provider
# Verify API compatibility
vrtg-provider verify api --path ./my-provider --api-version v1beta1
publish
Publish provider artifacts.
# Build and publish provider image
vrtg-provider publish --path ./my-provider --registry ghcr.io/myorg
# Publish with specific tag
vrtg-provider publish --path ./my-provider --tag v1.0.0
# Dry run publication
vrtg-provider publish --path ./my-provider --dry-run
Provider Structure
my-provider/
├── cmd/
│   └── provider-mytype/
│       ├── Dockerfile
│       └── main.go
├── internal/
│   └── provider/
│       ├── provider.go
│       ├── capabilities.go
│       └── provider_test.go
├── deploy/
│   ├── provider.yaml
│   ├── service.yaml
│   └── deployment.yaml
├── docs/
│   └── README.md
├── go.mod
├── go.sum
└── Makefile
virtrigaud-loadgen
Load testing and performance benchmarking tool for VirtRigaud providers.
Global Flags
--kubeconfig string Path to kubeconfig file
--namespace string Test namespace (default: "loadgen")
--output-dir string Output directory for results
--config-file string Load generation configuration file
--dry-run Show what would be created without executing
--verbose Verbose output
-h, --help Help for virtrigaud-loadgen
Commands
run
Execute load generation scenarios.
# Run default load test
virtrigaud-loadgen run --config loadtest.yaml
# Run with custom settings
virtrigaud-loadgen run --config loadtest.yaml --workers 50 --duration 10m
# Run specific scenario
virtrigaud-loadgen run --scenario vm-creation --vms 100
# Generate performance report
virtrigaud-loadgen run --config loadtest.yaml --output-dir ./perf-results
scenarios
Manage load testing scenarios.
# List available scenarios
virtrigaud-loadgen scenarios list
# Show scenario details
virtrigaud-loadgen scenarios get vm-lifecycle
# Validate scenario configuration
virtrigaud-loadgen scenarios validate --config custom-scenario.yaml
analyze
Analyze load test results.
# Generate performance report
virtrigaud-loadgen analyze --input ./perf-results
# Compare test runs
virtrigaud-loadgen analyze --compare run1.csv,run2.csv
# Generate charts
virtrigaud-loadgen analyze --input ./perf-results --charts
Load Test Configuration
Create loadtest.yaml:
metadata:
name: "vm-creation-load-test"
description: "Test VM creation performance"
scenarios:
- name: "vm-creation"
type: "vm-lifecycle"
workers: 20
duration: "5m"
resources:
vmClass: "small"
vmImage: "ubuntu-22-04"
provider: "vsphere-prod"
- name: "vm-scaling"
type: "vmset-scaling"
workers: 5
iterations: 10
scaling:
min: 1
max: 50
step: 5
providers:
- name: "vsphere-prod"
type: "vsphere"
- name: "libvirt-test"
type: "libvirt"
output:
format: ["csv", "json"]
metrics: ["latency", "throughput", "errors"]
cleanup:
enabled: true
timeout: "15m"
Performance Scenarios
VM Lifecycle
- Create, start, stop, delete operations
- Measures end-to-end VM management performance
Burst Creation
- Rapid VM creation under load
- Tests provider scaling capabilities
VMSet Scaling
- Scale VMSets up and down
- Measures horizontal scaling performance
Provider Stress
- High concurrent operations
- Tests provider reliability under stress
Results Analysis
Load test results include:
- Latency metrics: P50, P95, P99 response times
- Throughput: Operations per second
- Error rates: Failed operations percentage
- Resource usage: CPU, memory, network utilization
- Provider metrics: API call statistics
Example output:
timestamp,scenario,operation,latency_ms,status,provider
2025-01-15T10:00:01Z,vm-creation,create,2500,success,vsphere-prod
2025-01-15T10:00:03Z,vm-creation,create,2800,success,vsphere-prod
2025-01-15T10:00:05Z,vm-creation,create,,timeout,vsphere-prod
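As a quick sketch, the success rate and mean latency of successful operations can be pulled from such a CSV with awk. The sample rows below are hypothetical but follow the column layout shown above:

```shell
# Hypothetical results file matching the documented CSV layout.
cat > /tmp/loadgen-sample.csv <<'EOF'
timestamp,scenario,operation,latency_ms,status,provider
2025-01-15T10:00:01Z,vm-creation,create,2500,success,vsphere-prod
2025-01-15T10:00:03Z,vm-creation,create,2800,success,vsphere-prod
2025-01-15T10:00:05Z,vm-creation,create,,timeout,vsphere-prod
EOF

# Count rows after the header, tally successes, and average their latency.
awk -F, 'NR > 1 {
  total++
  if ($5 == "success") { ok++; sum += $4 }
}
END {
  printf "success_rate=%.1f%%\n", 100 * ok / total
  printf "mean_latency_ms=%.0f\n", sum / ok
}' /tmp/loadgen-sample.csv
```

For the three sample rows this reports a 66.7% success rate and a 2650 ms mean latency; the HTML and JSON reports cover the same ground without scripting.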
Best Practices
Using vrtg
- Use namespaces to organize resources
- Set timeouts appropriately for your environment
- Use dry-run options for validation before execution
- Monitor operations with watch commands
Testing with vcts
- Run core tests first to validate basic functionality
- Use separate namespaces for different test runs
- Clean up resources after testing
- Document test results for compliance tracking
Developing with vrtg-provider
- Start with init to create proper structure
- Implement core capabilities before advanced features
- Test thoroughly with vcts before publishing
- Follow naming conventions for consistency
Load Testing with virtrigaud-loadgen
- Start small and gradually increase load
- Monitor system resources during tests
- Use realistic scenarios that match production workloads
- Analyze results to identify bottlenecks
Support
- Documentation: VirtRigaud Docs
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Community: Discord
Version Information
This documentation covers VirtRigaud CLI tools v0.2.0.
For older versions, see the releases page.
Metrics Catalog
VirtRigaud exposes comprehensive metrics for monitoring and observability. All metrics are available at the /metrics endpoint on port 8080.
Manager Metrics
Reconciliation Metrics
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `virtrigaud_manager_reconcile_total` | Counter | `kind`, `outcome` | Total number of reconcile loops |
| `virtrigaud_manager_reconcile_duration_seconds` | Histogram | `kind` | Time spent in reconcile loops |
| `virtrigaud_queue_depth` | Gauge | `kind` | Current queue depth for each resource kind |
VM Operation Metrics
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `virtrigaud_vm_operations_total` | Counter | `operation`, `provider_type`, `provider`, `outcome` | Total VM operations |
| `virtrigaud_vm_reconfigure_total` | Counter | `provider_type`, `outcome` | Total VM reconfiguration operations |
| `virtrigaud_vm_snapshot_total` | Counter | `action`, `provider_type`, `outcome` | Total VM snapshot operations |
| `virtrigaud_vm_clone_total` | Counter | `linked`, `provider_type`, `outcome` | Total VM clone operations |
| `virtrigaud_vm_image_prepare_total` | Counter | `provider_type`, `outcome` | Total VM image preparation operations |
Build Information
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `virtrigaud_build_info` | Gauge | `version`, `git_sha`, `go_version` | Build information |
Provider Metrics
gRPC Metrics
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `virtrigaud_provider_rpc_requests_total` | Counter | `provider_type`, `method`, `code` | Total gRPC requests |
| `virtrigaud_provider_rpc_latency_seconds` | Histogram | `provider_type`, `method` | gRPC request latency |
| `virtrigaud_provider_tasks_inflight` | Gauge | `provider_type`, `provider` | Number of inflight tasks |
Provider-Specific Metrics
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `virtrigaud_ip_discovery_duration_seconds` | Histogram | `provider_type` | Time to discover VM IP addresses |
Error Metrics
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| `virtrigaud_errors_total` | Counter | `reason`, `component` | Total errors by reason and component |
Label Definitions
Common Labels
- `provider_type`: The type of provider (`vsphere`, `libvirt`)
- `provider`: The name of the provider instance
- `outcome`: The result of an operation (`success`, `failure`, `error`)
- `kind`: The Kubernetes resource kind (`VirtualMachine`, `VMClass`, etc.)
- `component`: The component generating the metric (`manager`, `provider`)
Operation-Specific Labels
- `operation`: Type of VM operation (`Create`, `Delete`, `Power`, `Describe`, `Reconfigure`)
- `method`: gRPC method name (`CreateVM`, `DeleteVM`, `PowerVM`, etc.)
- `code`: gRPC status code (`OK`, `INVALID_ARGUMENT`, `DEADLINE_EXCEEDED`, etc.)
- `action`: Snapshot action (`create`, `delete`, `revert`)
- `linked`: Whether a clone is linked (`true`, `false`)
- `reason`: Error reason (`ConnectionFailed`, `AuthenticationError`, etc.)
Histogram Buckets
Duration histograms use the following buckets (in seconds):
0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10, 30, 60, 120, 300
Example Queries
Prometheus Queries
Error Rate
# Overall error rate
sum(rate(virtrigaud_vm_operations_total{outcome="failure"}[5m])) /
sum(rate(virtrigaud_vm_operations_total[5m]))
# Provider-specific error rate
sum(rate(virtrigaud_provider_rpc_requests_total{code!="OK"}[5m])) /
sum(rate(virtrigaud_provider_rpc_requests_total[5m]))
Latency
# 95th percentile VM creation time
histogram_quantile(0.95,
  sum by (le) (rate(virtrigaud_vm_operations_duration_seconds_bucket{operation="Create"}[5m]))
)
# gRPC request latency by method
histogram_quantile(0.95,
  sum by (le, method) (rate(virtrigaud_provider_rpc_latency_seconds_bucket[5m]))
)
Throughput
# VM operations per second
rate(virtrigaud_vm_operations_total[5m])
# Operations by provider
sum(rate(virtrigaud_vm_operations_total[5m])) by (provider_type, provider)
Queue Depth
# Current queue depth
virtrigaud_queue_depth
# Average queue depth over time
avg_over_time(virtrigaud_queue_depth[5m])
Inflight Tasks
# Current inflight tasks
virtrigaud_provider_tasks_inflight
# Inflight tasks by provider
sum(virtrigaud_provider_tasks_inflight) by (provider_type, provider)
Grafana Dashboard Queries
VM Creation Success Rate Panel
sum(rate(virtrigaud_vm_operations_total{operation="Create",outcome="success"}[5m])) /
sum(rate(virtrigaud_vm_operations_total{operation="Create"}[5m])) * 100
Provider Health Panel
up{job="virtrigaud-provider"}
Error Rate by Provider Panel
sum(rate(virtrigaud_errors_total[5m])) by (component, provider_type)
ServiceMonitor Configuration
Example ServiceMonitor for Prometheus Operator:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: virtrigaud-manager
  namespace: virtrigaud-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: virtrigaud
      app.kubernetes.io/component: manager
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: virtrigaud-providers
  namespace: virtrigaud-system
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: virtrigaud
      app.kubernetes.io/component: provider
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
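Before pointing Prometheus at these endpoints, it can help to sanity-check what a scrape returns by collapsing the exposed series to their metric family names. A sketch using an abbreviated sample payload (the series shown are illustrative; in practice, pipe `curl -s http://<pod-ip>:<metrics-port>/metrics` into the same filter):

```shell
# Abbreviated /metrics payload; replace with a real scrape in practice.
sample='virtrigaud_queue_depth{kind="VirtualMachine"} 3
virtrigaud_errors_total{reason="ConnectionFailed",component="provider"} 1
virtrigaud_queue_depth{kind="VMClass"} 0'

# Collapse individual series down to unique metric family names.
printf '%s\n' "$sample" | cut -d'{' -f1 | sort -u
# → virtrigaud_errors_total
# → virtrigaud_queue_depth
```

If the expected families are missing, check the ServiceMonitor label selectors and the metrics port name before debugging Prometheus itself.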
Alert Rules
Example PrometheusRule for common alerts:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: virtrigaud-alerts
  namespace: virtrigaud-system
spec:
  groups:
    - name: virtrigaud.rules
      rules:
        - alert: VirtrigaudProviderDown
          expr: up{job="virtrigaud-provider"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Virtrigaud provider is down"
            description: "Provider {{ $labels.instance }} has been down for more than 5 minutes"
        - alert: VirtrigaudHighErrorRate
          expr: |
            sum by (provider) (rate(virtrigaud_vm_operations_total{outcome="failure"}[5m])) /
            sum by (provider) (rate(virtrigaud_vm_operations_total[5m])) > 0.1
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "High error rate in VM operations"
            description: "Error rate is {{ $value | humanizePercentage }} for {{ $labels.provider }}"
        - alert: VirtrigaudSlowVMCreation
          expr: |
            histogram_quantile(0.95,
              rate(virtrigaud_vm_operations_duration_seconds_bucket{operation="Create"}[5m])
            ) > 600
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "Slow VM creation times"
            description: "95th percentile VM creation time is {{ $value }}s"
        - alert: VirtrigaudQueueBacklog
          expr: virtrigaud_queue_depth > 100
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "Queue backlog detected"
            description: "Queue depth for {{ $labels.kind }} is {{ $value }}"
Custom Metrics
Providers can expose additional custom metrics specific to their implementation:
vSphere Provider Metrics
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| virtrigaud_vsphere_sessions_total | Counter | datacenter | Total vSphere sessions created |
| virtrigaud_vsphere_api_calls_total | Counter | method, datacenter | Total vSphere API calls |
Libvirt Provider Metrics
| Metric Name | Type | Labels | Description |
|---|---|---|---|
| virtrigaud_libvirt_connections_total | Counter | host | Total Libvirt connections |
| virtrigaud_libvirt_domains_total | Gauge | host, state | Current number of domains by state |
Metric Collection Best Practices
- Scrape Interval: Use 30s interval for most metrics
- Retention: Keep metrics for at least 30 days for trending
- High Cardinality: Be careful with VM names and IDs in labels
- Aggregation: Use recording rules for frequently queried metrics
- Alerting: Set up alerts for SLI/SLO violations
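The aggregation guidance above can be implemented as Prometheus recording rules, so dashboards query pre-computed series instead of re-evaluating `rate()` on every refresh. A minimal sketch (the rule names follow common conventions and are illustrative, not shipped with VirtRigaud):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: virtrigaud-recording-rules
  namespace: virtrigaud-system
spec:
  groups:
    - name: virtrigaud.recording
      rules:
        # Pre-computed per-provider operation rate for dashboards.
        - record: virtrigaud:vm_operations:rate5m
          expr: sum by (provider_type, provider) (rate(virtrigaud_vm_operations_total[5m]))
        # Pre-computed overall failure ratio for SLO panels and alerts.
        - record: virtrigaud:vm_operations_failure:ratio_rate5m
          expr: |
            sum(rate(virtrigaud_vm_operations_total{outcome="failure"}[5m])) /
            sum(rate(virtrigaud_vm_operations_total[5m]))
```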
Related Documentation
Provider Catalog
Last updated: 2025-08-26T14:30:00Z
The VirtRigaud Provider Catalog lists all verified and community providers available for the VirtRigaud virtualization management platform. All providers in this catalog have been tested for conformance and compatibility.
Provider Overview
| Provider | Description | Capabilities | Conformance | Maintainer | License |
|---|---|---|---|---|---|
| Mock Provider | A mock provider for testing and demonstrations | core, snapshot, clone, image-prepare, advanced | ✅ | virtrigaud@projectbeskar.com | Apache-2.0 |
| vSphere Provider | VMware vSphere provider for VirtRigaud | core, snapshot, clone, advanced | ✅ | virtrigaud@projectbeskar.com | Apache-2.0 |
| Libvirt Provider | Libvirt/KVM provider for VirtRigaud | core, snapshot, clone | ✅ | virtrigaud@projectbeskar.com | Apache-2.0 |
Quick Start
Installing a Provider
To install a provider in your Kubernetes cluster, use the VirtRigaud provider runtime Helm chart:
# Add the VirtRigaud Helm repository
helm repo add virtrigaud https://projectbeskar.github.io/virtrigaud
helm repo update
# Install a provider using the runtime chart
helm install my-vsphere-provider virtrigaud/virtrigaud-provider-runtime \
--namespace vsphere-providers \
--create-namespace \
--set image.repository=ghcr.io/projectbeskar/virtrigaud/provider-vsphere \
--set image.tag=0.1.1 \
--set env[0].name=VSPHERE_SERVER \
--set env[0].value=vcenter.example.com
Provider Discovery
Once installed, providers automatically register with the VirtRigaud manager. You can list available providers:
kubectl get providers -n virtrigaud-system
Provider Details
Mock Provider
- Image: `ghcr.io/projectbeskar/virtrigaud/provider-mock:0.1.1`
- Repository: https://github.com/projectbeskar/virtrigaud
- Maturity: stable
- Tags: testing, development, demo
- Documentation: https://projectbeskar.github.io/virtrigaud/providers/mock/
The mock provider is perfect for:
- Testing VirtRigaud functionality
- Development and CI/CD pipelines
- Learning provider concepts
- Demonstrating VirtRigaud capabilities
Installation:
helm install mock-provider virtrigaud/virtrigaud-provider-runtime \
--namespace development \
--create-namespace \
--set image.repository=ghcr.io/projectbeskar/virtrigaud/provider-mock \
--set image.tag=0.1.1 \
--set env[0].name=LOG_LEVEL \
--set env[0].value=debug
vSphere Provider
- Image: `ghcr.io/projectbeskar/virtrigaud/provider-vsphere:0.1.1`
- Repository: https://github.com/projectbeskar/virtrigaud
- Maturity: beta
- Tags: vmware, vsphere, enterprise
- Documentation: https://projectbeskar.github.io/virtrigaud/providers/vsphere/
The vSphere provider enables VirtRigaud to manage VMware vSphere environments, including:
- VM lifecycle management (create, update, delete)
- Power operations (on, off, restart, suspend)
- Snapshot management
- VM cloning and templates
- Resource allocation and configuration
Prerequisites:
- VMware vSphere 6.7 or later
- vCenter Server credentials
- Network connectivity to vCenter API
Installation:
# Create secret for vSphere credentials
kubectl create secret generic vsphere-credentials \
--namespace vsphere-providers \
--from-literal=username=your-username \
--from-literal=password=your-password
# Install provider
helm install vsphere-provider virtrigaud/virtrigaud-provider-runtime \
--namespace vsphere-providers \
--create-namespace \
--set image.repository=ghcr.io/projectbeskar/virtrigaud/provider-vsphere \
--set image.tag=0.1.1 \
--set env[0].name=VSPHERE_SERVER \
--set env[0].value=vcenter.example.com \
--set env[1].name=VSPHERE_USERNAME \
--set env[1].valueFrom.secretKeyRef.name=vsphere-credentials \
--set env[1].valueFrom.secretKeyRef.key=username \
--set env[2].name=VSPHERE_PASSWORD \
--set env[2].valueFrom.secretKeyRef.name=vsphere-credentials \
--set env[2].valueFrom.secretKeyRef.key=password
Libvirt Provider
- Image: `ghcr.io/projectbeskar/virtrigaud/provider-libvirt:0.1.1`
- Repository: https://github.com/projectbeskar/virtrigaud
- Maturity: beta
- Tags: libvirt, kvm, qemu, open-source
- Documentation: https://projectbeskar.github.io/virtrigaud/providers/libvirt/
The libvirt provider manages KVM/QEMU virtual machines through libvirt, supporting:
- VM lifecycle management
- Power state control
- Snapshot operations
- Basic cloning capabilities
- Local and remote libvirt connections
Prerequisites:
- Libvirt daemon running on target hosts
- SSH access for remote connections
- Shared storage for multi-host deployments
Installation:
helm install libvirt-provider virtrigaud/virtrigaud-provider-runtime \
--namespace libvirt-providers \
--create-namespace \
--set image.repository=ghcr.io/projectbeskar/virtrigaud/provider-libvirt \
--set image.tag=0.1.1 \
--set env[0].name=LIBVIRT_URI \
--set env[0].value=qemu:///system \
--set securityContext.runAsUser=0 \
--set podSecurityContext.runAsUser=0
Capability Profiles
VirtRigaud defines several capability profiles that providers can implement:
Core Profile
Required for all providers
- `vm.create` - Create virtual machines
- `vm.read` - Get virtual machine information
- `vm.update` - Update virtual machine configuration
- `vm.delete` - Delete virtual machines
- `vm.power` - Control power state (on/off/restart)
- `vm.list` - List virtual machines
Snapshot Profile
Optional - for providers supporting VM snapshots
- `vm.snapshot.create` - Create VM snapshots
- `vm.snapshot.list` - List VM snapshots
- `vm.snapshot.delete` - Delete VM snapshots
- `vm.snapshot.restore` - Restore VM from snapshot
Clone Profile
Optional - for providers supporting VM cloning
- `vm.clone` - Clone virtual machines
- `vm.template` - Create and manage VM templates
Image Prepare Profile
Optional - for providers with image management
- `image.prepare` - Prepare VM images
- `image.list` - List available images
- `image.upload` - Upload custom images
Advanced Profile
Optional - for advanced provider features
- `vm.migrate` - Live migrate VMs between hosts
- `vm.resize` - Dynamic resource allocation
- `vm.backup` - Backup and restore operations
- `vm.monitoring` - Advanced monitoring and metrics
Contributing a Provider
Want to add your provider to the catalog? Follow these steps:
1. Develop Your Provider
Use the Provider Developer Tutorial to create your provider using the VirtRigaud SDK.
2. Ensure Conformance
Run the VirtRigaud Conformance Test Suite (VCTS) to verify your provider meets the requirements:
# Install the VCTS tool
go install github.com/projectbeskar/virtrigaud/cmd/vcts@latest
# Run conformance tests
vcts run --provider-endpoint=localhost:9443 --profile=core
3. Publish to Catalog
Use the vrtg-provider publish command to submit your provider:
vrtg-provider publish \
--name your-provider \
--image ghcr.io/yourorg/your-provider \
--tag v1.0.0 \
--repo https://github.com/yourorg/your-provider \
--maintainer your-email@example.com \
--license Apache-2.0
This will:
- Run conformance tests
- Generate provider badges
- Create a catalog entry
- Open a pull request to add your provider
4. Catalog Requirements
To be included in the catalog, providers must:
- ✅ Pass VCTS core profile tests
- ✅ Include comprehensive documentation
- ✅ Provide a Helm chart for deployment
- ✅ Follow security best practices
- ✅ Include proper error handling
- ✅ Support health checks and metrics
- ✅ Have active maintenance and support
Provider Support Matrix
| Provider | Kubernetes | VirtRigaud | Go Version | Platforms |
|---|---|---|---|---|
| Mock | 1.25+ | 0.1.0+ | 1.23+ | linux/amd64, linux/arm64 |
| vSphere | 1.25+ | 0.1.0+ | 1.23+ | linux/amd64, linux/arm64 |
| Libvirt | 1.25+ | 0.1.0+ | 1.23+ | linux/amd64 |
Community and Support
- Documentation: VirtRigaud Provider Docs
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Slack: VirtRigaud Community
Versioning and Compatibility
Providers follow semantic versioning (SemVer) and maintain compatibility with VirtRigaud versions:
- Major versions (1.0.0 β 2.0.0): Breaking changes to APIs or behavior
- Minor versions (1.0.0 β 1.1.0): New features, backward compatible
- Patch versions (1.0.0 β 1.0.1): Bug fixes, security updates
Compatibility Policy:
- Current VirtRigaud version supports providers from current major version
- Providers should support at least 2 minor versions of VirtRigaud
- Breaking changes require migration documentation
License and Legal
All providers in this catalog are open source and follow the licensing terms specified in their individual repositories. The catalog itself is maintained under the Apache 2.0 license.
Trademark Notice: VMware and vSphere are trademarks of VMware, Inc. KVM and QEMU are trademarks of their respective owners. All trademarks are the property of their respective owners.
Testing GitHub Actions Workflows Locally
This guide explains how to test VirtRigaud's GitHub Actions workflows locally before pushing, to save on GitHub Actions costs and catch issues early.
Overview
We provide several scripts to test workflows locally:
| Script | Purpose | Dependencies | Use Case |
|---|---|---|---|
| `hack/test-workflows-locally.sh` | Main orchestrator using act | act, docker | Full GitHub Actions simulation |
| `hack/test-lint-locally.sh` | Lint workflow replica | go, golangci-lint | Quick lint testing |
| `hack/test-ci-locally.sh` | CI workflow replica | go, helm, system deps | Comprehensive CI testing |
| `hack/test-release-locally.sh` | Release workflow simulation | docker, helm, go | Release preparation testing |
| `hack/test-helm-locally.sh` | Helm charts testing | helm, kind, kubectl | Chart validation and deployment |
Quick Start
1. Setup (First Time)
# Install dependencies and configure act
./hack/test-workflows-locally.sh setup
This will:
- Install `act` (GitHub Actions runner)
- Create `.actrc` configuration
- Create `.env.local` with environment variables
- Create `.secrets` file (update with real values if needed)
2. Quick Validation
# Fast syntax check of all workflows
./hack/test-workflows-locally.sh smoke
3. Test Individual Workflows
# Test lint workflow (fastest)
./hack/test-lint-locally.sh
# Test CI workflow (comprehensive)
./hack/test-ci-locally.sh
# Test Helm charts
./hack/test-helm-locally.sh
# Test release workflow (requires Docker)
./hack/test-release-locally.sh v0.2.0-test
Detailed Usage
Lint Testing (test-lint-locally.sh)
Replicates the lint.yml workflow:
# Quick lint test
./hack/test-lint-locally.sh
What it tests:
- Go version compatibility
- golangci-lint installation and execution
- Comprehensive code linting (matching CI exactly)
Requirements:
- Go 1.23+ (matching CI)
- Internet access (to download golangci-lint if needed)
CI Testing (test-ci-locally.sh)
Replicates the ci.yml workflow jobs:
# Interactive mode (asks about optional jobs)
./hack/test-ci-locally.sh
# Quick essential tests only
./hack/test-ci-locally.sh quick
# Full CI replication including security scans
./hack/test-ci-locally.sh full
Jobs tested:
- test: Go tests and coverage
- lint: Code linting with golangci-lint
- generate: Code and manifest generation
- build: Binary compilation
- build-tools: CLI tools compilation
- helm: Helm chart validation
- security: Security scanning (optional)
Requirements:
- Go 1.23+
- Helm 3.12+
- System dependencies (libvirt-dev on Linux)
- Python 3 (for YAML validation)
Release Testing (test-release-locally.sh)
Simulates the release.yml workflow:
# Test with default tag
./hack/test-release-locally.sh
# Test with specific tag
./hack/test-release-locally.sh v0.3.0-rc.1
# Skip image building (faster)
./hack/test-release-locally.sh --no-images
What it tests:
- Container image building and pushing (to local registry)
- Helm chart packaging with version updates
- CLI tools building for multiple platforms
- Changelog generation
- Checksum creation
- Container image smoke testing
Requirements:
- Docker
- Go 1.23+
- Helm 3.12+
- Local Docker registry (started automatically)
Helm Testing (test-helm-locally.sh)
Tests Helm charts with real Kubernetes:
# Full helm test suite
./hack/test-helm-locally.sh
# Individual test types
./hack/test-helm-locally.sh lint # Chart linting only
./hack/test-helm-locally.sh template # Template rendering only
./hack/test-helm-locally.sh crd # CRD installation only
./hack/test-helm-locally.sh main # Main chart installation
./hack/test-helm-locally.sh runtime # Runtime chart installation
# Cleanup after testing
./hack/test-helm-locally.sh cleanup
What it tests:
- Helm chart linting (`helm lint`)
- Template rendering with various value files
- CRD installation and functionality
- Chart installation in Kind cluster
- Pod readiness and basic functionality
Requirements:
- Helm 3.12+
- Kind (Kubernetes in Docker)
- kubectl
- Docker
Act-Based Testing (test-workflows-locally.sh)
Uses act to run actual GitHub Actions workflows:
# Setup first time
./hack/test-workflows-locally.sh setup
# Test individual workflows
./hack/test-workflows-locally.sh lint
./hack/test-workflows-locally.sh ci
./hack/test-workflows-locally.sh runtime
# Test all workflows (interactive)
./hack/test-workflows-locally.sh all
# Cleanup
./hack/test-workflows-locally.sh cleanup
Advanced usage:
- Supports secrets from `.secrets` file
- Uses reusable containers for speed
- Artifact handling with local storage
- Environment variable injection
Configuration Files
.actrc
# Act configuration for GitHub Actions simulation
-P ubuntu-latest=catthehacker/ubuntu:act-22.04
-P ubuntu-22.04=catthehacker/ubuntu:act-22.04
-P ubuntu-24.04=catthehacker/ubuntu:act-22.04
--container-daemon-socket /var/run/docker.sock
--reuse
--rm
.env.local
# Local environment variables
GO_VERSION=1.23
GOLANGCI_LINT_VERSION=v1.64.8
REGISTRY=localhost:5000
IMAGE_NAME_PREFIX=virtrigaud
GITHUB_ACTOR=local-user
GITHUB_REPOSITORY=projectbeskar/virtrigaud
# ... more environment variables
.secrets (optional)
# GitHub token for release workflows
GITHUB_TOKEN=your_github_token_here
REGISTRY=localhost:5000
Workflow-Specific Notes
Lint Workflow (lint.yml)
- Fast: Usually completes in 1-2 minutes
- Requirements: Minimal (Go + golangci-lint)
- Run before: Every commit
- Catches: Code style, syntax, and simple errors
CI Workflow (ci.yml)
- Comprehensive: Tests building, testing, security
- Duration: 10-20 minutes for full run
- Platform deps: LibVirt requires Linux for full testing
- Run before: Pull requests and major changes
Release Workflow (release.yml)
- Complex: Multi-platform builds, signing, publishing
- Duration: 20-30 minutes
- Local only: Uses local registry, no real publishing
- Run before: Creating releases
Runtime Chart Workflow (runtime-chart.yml)
- Kubernetes focused: Tests provider runtime charts
- Requirements: Kind cluster
- Duration: 5-10 minutes
- Run before: Chart changes
Best Practices
Daily Development Workflow
# Before committing
./hack/test-lint-locally.sh
# Before pushing feature branch
./hack/test-ci-locally.sh quick
# Before creating PR
./hack/test-ci-locally.sh full
Pre-Release Workflow
# Test release preparation
./hack/test-release-locally.sh v0.2.0-rc.1
# Test chart deployment
./hack/test-helm-locally.sh full
# Test with act for full simulation
./hack/test-workflows-locally.sh all
Troubleshooting
Common Issues
- Docker permission denied:

sudo usermod -aG docker $USER  # Then log out and back in

- LibVirt dependencies missing:

# Ubuntu/Debian
sudo apt-get install libvirt-dev pkg-config
# Skip libvirt tests on non-Linux
./hack/test-ci-locally.sh quick

- Kind cluster creation fails:

# Clean up and retry
kind delete cluster --name virtrigaud-test
./hack/test-helm-locally.sh

- Act fails with container errors:

# Clean up act containers
docker ps -a | grep "act-" | awk '{print $1}' | xargs docker rm -f
# Rebuild without cache
./hack/test-workflows-locally.sh cleanup
./hack/test-workflows-locally.sh setup
Debugging Tips
- Check logs: All scripts provide detailed logging
- Use dry-run: Most scripts support `--help` for options
- Incremental testing: Start with lint, then `ci quick`, then full tests
- Docker cleanup: Regular `docker system prune` helps with space
Performance Tips
- Use quick modes for daily development
- Skip expensive jobs like security scans during iteration
- Reuse Kind clusters with `./hack/test-helm-locally.sh`
- Use local registry for container testing
- Run parallel tests when possible
Integration with Development
Git Hooks
Add to .git/hooks/pre-push:
#!/bin/bash
echo "Running local lint check before push..."
./hack/test-lint-locally.sh
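Git only invokes hooks that have the executable bit set, so mark the new hook executable after saving it:

```shell
# Hooks without the executable bit are silently ignored by Git.
chmod +x .git/hooks/pre-push
```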
IDE Integration
Many IDEs can run these scripts as build tasks:
VS Code (.vscode/tasks.json):
{
  "version": "2.0.0",
  "tasks": [
    {
      "label": "Test Lint Locally",
      "type": "shell",
      "command": "./hack/test-lint-locally.sh",
      "group": "test",
      "presentation": {
        "echo": true,
        "reveal": "always",
        "focus": false,
        "panel": "shared"
      }
    }
  ]
}
CI Cost Optimization
By testing locally first:
- Reduce failed CI runs by ~80%
- Save GitHub Actions minutes
- Faster feedback (local runs are often faster)
- Better debugging (local environment is easier to inspect)
Conclusion
These local testing scripts allow you to:
- ✅ Catch issues early before they reach GitHub Actions
- ✅ Save costs by reducing failed CI runs
- ✅ Debug faster with local environment access
- ✅ Test thoroughly with multiple approaches
- ✅ Iterate quickly during development
Start with the lint script for daily use, and gradually incorporate the full test suite for comprehensive validation before releases.
Contributing to VirtRigaud
Thank you for your interest in contributing to VirtRigaud! This document provides guidelines and information for contributors.
Development Setup
Prerequisites
- Go 1.23+
- Docker
- Kubernetes cluster (kind, k3s, or remote)
- kubectl
- Helm 3.x
- make
Clone and Setup
git clone https://github.com/projectbeskar/virtrigaud.git
cd virtrigaud
# Install development dependencies
make dev-setup
# Install pre-commit hooks (optional but recommended)
pip install pre-commit
pre-commit install
Development Workflow
1. Making Changes
API Changes
When modifying API types:
# Edit API types
vim api/infra.virtrigaud.io/v1beta1/virtualmachine_types.go
# Generate code and CRDs
make generate
make gen-crds
Code Changes
For other code changes:
# Run tests
make test
# Lint code
make lint
# Format code
make fmt
2. CRD Management
Important: CRDs are generated from code (the source of truth) and are not duplicated in git.
- `config/crd/bases/` - CRDs for local development and releases (checked into git)
- `charts/virtrigaud/crds/` - CRDs for Helm charts (generated during packaging, not checked into git)
# After API changes, generate CRDs
make gen-crds
# For Helm chart development/packaging
make gen-helm-crds
3. Testing
# Unit tests
make test
# Integration tests (requires cluster)
make test-integration
# End-to-end tests
make test-e2e
# Test specific provider
make test-provider-vsphere
4. Local Development
# Deploy to local cluster
make dev-deploy
# Watch for changes and auto-reload
make dev-watch
# Clean up
make dev-clean
Contribution Guidelines
Pull Request Process
- Fork and branch: Create a feature branch from `main`
- Make changes: Follow the development workflow above
- Test thoroughly: Run all relevant tests
- Update docs: Update documentation if needed
- CRD sync: Ensure CRDs are synchronized (CI will verify)
- Submit PR: Create a pull request with clear description
PR Requirements
- All tests pass
- CRDs are in sync (verified by CI)
- Code is formatted (`make fmt`)
- Code is linted (`make lint`)
- Changelog entry added (for user-facing changes)
Commit Message Format
Use conventional commit format:
<type>(<scope>): <description>
[optional body]
[optional footer(s)]
Types:
- `feat`: New feature
- `fix`: Bug fix
- `docs`: Documentation changes
- `style`: Code style changes
- `refactor`: Code refactoring
- `test`: Test changes
- `chore`: Maintenance tasks
Examples:
feat(vsphere): add graceful shutdown support
fix(crd): resolve powerState validation conflict
docs(upgrade): add CRD synchronization guide
Code Style
Go Code
- Follow standard Go conventions
- Use `gofmt` and `golangci-lint`
- Add meaningful comments for exported functions
- Include unit tests for new functionality
YAML/Kubernetes
- Use 2-space indentation
- Follow Kubernetes API conventions
- Add descriptions to CRD fields
- Include examples in documentation
Documentation
- Use clear, concise language
- Include code examples
- Update both API docs and user guides
- Test documentation examples
Testing
Unit Tests
# Run all unit tests
make test
# Run tests for specific package
go test ./internal/controller/...
# Run with coverage
make test-coverage
Integration Tests
# Requires running Kubernetes cluster
export KUBECONFIG=~/.kube/config
make test-integration
Provider Tests
# Test specific provider (requires infrastructure)
make test-provider-vsphere
make test-provider-libvirt
make test-provider-proxmox
Release Process
For Maintainers
1. Prepare release:

# Generate CRDs for config directory (will be in release artifacts)
make gen-crds
# Update version in charts
vim charts/virtrigaud/Chart.yaml
# Update changelog
vim CHANGELOG.md

2. Create release:

git tag v0.2.1
git push origin v0.2.1

3. CI handles:
- Building and pushing images
- Creating GitHub release
- Publishing Helm charts
- Generating CLI binaries
Common Issues
CRD Generation Issues
If you need to regenerate CRDs:
# For local development and config directory
make gen-crds
# For Helm chart packaging
make gen-helm-crds
Note: CRDs in charts/virtrigaud/crds/ are generated during packaging and should not be committed to git.
Test Failures
# Clean and retry
make clean
make test
# For libvirt-related failures
export SKIP_LIBVIRT_TESTS=true
make test
Development Environment
# Reset development environment
make dev-clean
make dev-deploy
# Check logs
kubectl logs -n virtrigaud-system deployment/virtrigaud-manager
Getting Help
- GitHub Issues: Bug reports and feature requests
- GitHub Discussions: Questions and community support
- Documentation: Check docs/ directory
- Code Review: Maintainers will provide feedback on PRs
Recognition
Contributors are recognized in:
- CHANGELOG.md for significant contributions
- README.md contributors section
- GitHub contributor graphs
Thank you for contributing to VirtRigaud!
VirtRigaud Examples
This directory contains comprehensive examples for VirtRigaud v0.2.3+, showcasing all features and capabilities.
Quick Start Examples
Basic Examples
- complete-example.yaml - Complete end-to-end example with v0.2.1 features
- vm-ubuntu-small.yaml - Simple Ubuntu VM with graceful shutdown
- vmclass-small.yaml - Basic VMClass with hardware version support
Provider Examples
- provider-vsphere.yaml - Basic vSphere provider configuration
- provider-libvirt.yaml - Basic LibVirt provider configuration
Resource Examples
- vmimage-ubuntu.yaml - VM image configuration
- vmnetwork-app.yaml - Network attachment configuration
- vm-adoption-example.yaml - VM adoption with filters
v0.2.1 Feature Examples
New in v0.2.1
- v021-feature-showcase.yaml - COMPREHENSIVE DEMO - All v0.2.1 features in one example
- graceful-shutdown-examples.yaml - OffGraceful shutdown configurations
- vsphere-hardware-versions.yaml - Hardware version management
- disk-sizing-examples.yaml - Disk size configuration tests
Advanced Provider Examples
- vsphere-advanced-example.yaml - Advanced vSphere with v0.2.1 features
- libvirt-advanced-example.yaml - Advanced LibVirt configuration
- proxmox-complete-example.yaml - Complete Proxmox setup
Multi-Provider Examples
- multi-provider-example.yaml - Multiple providers in one cluster
- libvirt-complete-example.yaml - Complete LibVirt deployment
v0.2.3 Feature Summary
VM Reconfiguration (vSphere, Libvirt, Proxmox)
# Online resource changes (vSphere, Proxmox)
# Offline changes (Libvirt - requires restart)
spec:
  vmClassRef: medium  # Change from small to medium
  powerState: "On"
Async Task Tracking (vSphere, Proxmox)
# Automatic tracking of long-running operations
# Real-time progress and error reporting
Console Access (vSphere, Libvirt)
# Web console URLs automatically generated
status:
  consoleURL: "https://vcenter.example.com/ui/app/vm..."  # vSphere
  # or
  consoleURL: "vnc://libvirt-host:5900"  # Libvirt VNC
Guest Agent Integration (Proxmox)
# Accurate IP detection via QEMU guest agent
status:
  ipAddresses:
    - 192.168.1.100
    - fd00::1234:5678:9abc:def0
VM Cloning (vSphere)
# Full and linked clones with automatic snapshot handling
spec:
  vmImageRef: source-vm
  cloneType: linked  # or "full"
Previous Features (v0.2.1)
- Graceful Shutdown: OffGraceful power state with VMware Tools
- Hardware Version Management: vSphere hardware version control
- Proper Disk Sizing: Correct disk allocation across providers
- Enhanced Lifecycle Management: postStart/preStop hooks
Usage Patterns
Testing v0.2.3 Features
1. Test VM reconfiguration:

# Change VM class to trigger reconfiguration
kubectl patch virtualmachine my-vm --type='merge' \
  -p='{"spec":{"vmClassRef":"medium"}}'
# Watch the reconfiguration process
kubectl get vm my-vm -w

2. Access VM console:

# Get console URL from VM status
kubectl get vm my-vm -o jsonpath='{.status.consoleURL}'
# For VNC (Libvirt): use any VNC client
vncviewer $(kubectl get vm my-vm -o jsonpath='{.status.consoleURL}' | sed 's/vnc:\/\///')

3. Monitor async tasks (vSphere, Proxmox):

# Watch task progress in provider logs
kubectl logs -f deployment/virtrigaud-provider-vsphere

4. Verify guest agent (Proxmox):

# Check IP addresses from guest agent
kubectl get vm my-vm -o jsonpath='{.status.ipAddresses}'

5. Test VM cloning (vSphere):

# Create a clone of an existing VM
kubectl apply -f - <<EOF
apiVersion: infra.virtrigaud.io/v1beta1
kind: VirtualMachine
metadata:
  name: web-server-clone
spec:
  vmClassRef: small
  vmImageRef: web-server-01
  cloneType: linked
EOF
Development Workflow
- Choose base example based on your use case
- Customize provider, class, and VM specifications
- Test locally with your infrastructure
- Iterate based on your requirements
Production Deployment
- Start with complete-example.yaml
- Add security configurations from security/ subdirectory
- Configure secrets from secrets/ subdirectory
- Apply advanced patterns from advanced/ subdirectory
File Organization
docs/examples/
├── README.md                        # This file
├── complete-example.yaml            # Complete setup guide
├── v021-feature-showcase.yaml       # v0.2.1 comprehensive demo
├── vm-ubuntu-small.yaml             # Simple VM example
├── vmclass-small.yaml               # Basic VMClass
├── provider-*.yaml                  # Provider configurations
├── graceful-shutdown-examples.yaml  # OffGraceful demos
├── vsphere-hardware-versions.yaml   # Hardware version examples
├── disk-sizing-examples.yaml        # Disk sizing tests
├── advanced/                        # Complex scenarios
├── secrets/                         # Secret management
└── security/                        # Security configurations
Version Compatibility
- v0.2.3+: All examples with v0.2.3 features (Reconfigure, Clone, TaskStatus, ConsoleURL, Guest Agent)
- v0.2.2: Nested virtualization, TPM support, snapshot management
- v0.2.1: Graceful shutdown, hardware version, disk sizing fixes
- v0.2.0: Initial production-ready providers
- v0.1.x: Legacy examples in git history
Need Help?
- Documentation: ../README.md
- Quick Start: ../getting-started/quickstart.md
- CLI Tools: ../CLI.md
- Upgrade Guide: ../UPGRADE.md
- Contributing: ../../CONTRIBUTING.md
Pro Tip: Start with v021-feature-showcase.yaml to see all v0.2.1 capabilities in action!