Session 17: CI/CD Pipelines for AI Solutions

MLOps and deployment automation

🎯 Session Goals

  • Implementing CI/CD pipelines for ML projects
  • Azure DevOps for MLOps automation
  • Model deployment strategies (blue-green, canary)
  • Automated testing for ML systems

🔄 MLOps Pipeline Architecture

CI/CD for Machine Learning

CODE COMMIT → AUTOMATED TESTING → MODEL TRAINING → MODEL VALIDATION → DEPLOYMENT
     ↓              ↓                    ↓               ↓              ↓
VERSION CTRL → DATA VALIDATION → EXPERIMENT TRACKING → QUALITY GATES → MONITORING

Key components:

  1. Source Control - code, data versioning, model registry
  2. Automated Testing - unit tests, data validation, model tests
  3. Training Pipeline - automated retraining, hyperparameter optimization
  4. Model Validation - performance benchmarking, bias testing
  5. Deployment - containerization, blue-green deployments
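
Component 4 above is usually enforced as a quality gate: the freshly trained model is promoted only if it clears an absolute threshold and does not regress against the current production model. A minimal sketch - the metric names and thresholds are illustrative assumptions, not values from this workshop:

# Hypothetical quality gate: promote the candidate model only if it beats the baseline
def passes_quality_gate(new_metrics: dict, baseline_metrics: dict,
                        min_accuracy: float = 0.85, max_regression: float = 0.01) -> bool:
    """Return True when the candidate model may be promoted to production."""
    if new_metrics["accuracy"] < min_accuracy:
        return False  # absolute floor
    # Relative check: no more than max_regression worse than the production model
    return new_metrics["accuracy"] >= baseline_metrics["accuracy"] - max_regression

# Example usage inside a CI step (values are illustrative):
if not passes_quality_gate({"accuracy": 0.91}, {"accuracy": 0.90}):
    raise SystemExit("Quality gate failed - keeping the current production model")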

Azure DevOps for ML

# azure-pipelines.yml for an ML project

trigger:
  branches:
    include:
    - main
    - develop
  paths:
    include:
    - src/
    - data/
    - models/

variables:
  azureServiceConnection: 'azure-ml-service-connection'
  workspaceName: 'ai-workshop-workspace'
  experimentName: 'production-model-training'
  dataPath: 'data/'  # consumed by the data validation step below (value is illustrative)

stages:
- stage: DataValidation
  displayName: 'Data Quality Validation'
  jobs:
  - job: ValidateData
    displayName: 'Validate Training Data'
    pool:
      vmImage: 'ubuntu-latest'
    steps:
    - task: UsePythonVersion@0
      inputs:
        versionSpec: '3.9'
    
    - script: |
        pip install -r requirements.txt
        python scripts/validate_data.py --data-path $(dataPath)
      displayName: 'Run Data Validation'
    
    - task: PublishTestResults@2
      inputs:
        testResultsFiles: 'data-validation-results.xml'

- stage: ModelTraining
  displayName: 'Model Training & Evaluation'
  dependsOn: DataValidation
  condition: succeeded()
  jobs:
  - job: TrainModel
    displayName: 'Train and Evaluate Model'
    pool:
      vmImage: 'ubuntu-latest'
    steps:
    - task: AzureCLI@2
      displayName: 'Train Model in Azure ML'
      inputs:
        azureSubscription: $(azureServiceConnection)
        scriptType: 'bash'
        scriptLocation: 'inlineScript'
        inlineScript: |
          az ml job create --file training-job.yml \
                          --workspace-name $(workspaceName)

- stage: ModelDeployment
  displayName: 'Model Deployment'
  dependsOn: ModelTraining
  condition: and(succeeded(), eq(variables['Build.SourceBranch'], 'refs/heads/main'))
  jobs:
  - deployment: DeployToStaging
    displayName: 'Deploy to Staging'
    environment: 'staging'
    strategy:
      runOnce:
        deploy:
          steps:
          - task: AzureCLI@2
            displayName: 'Deploy Model Endpoint'
            inputs:
              azureSubscription: $(azureServiceConnection)
              scriptType: 'bash'
              scriptLocation: 'inlineScript'
              inlineScript: |
                az ml online-endpoint create --file staging-endpoint.yml
                az ml online-deployment create --file staging-deployment.yml
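
The DataValidation stage above expects a scripts/validate_data.py that writes JUnit-style results to data-validation-results.xml for PublishTestResults@2. A minimal sketch of such a script - it assumes the training data is a single CSV readable by pandas, and the specific checks are illustrative:

# scripts/validate_data.py - minimal data quality checks with JUnit XML output (sketch)
import argparse
import pandas as pd

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("--data-path", required=True)
    args = parser.parse_args()

    df = pd.read_csv(args.data_path)  # assumption: one CSV file at the given path
    checks = {
        "non_empty_dataset": len(df) > 0,
        "no_missing_values": df.isnull().sum().sum() == 0,
        "no_duplicate_rows": not df.duplicated().any(),
    }

    # Emit a JUnit-style report so PublishTestResults@2 can pick it up
    cases = []
    for name, passed in checks.items():
        failure = "" if passed else '<failure message="check failed"/>'
        cases.append(f'<testcase name="{name}">{failure}</testcase>')
    xml = (f'<testsuite name="data-validation" tests="{len(checks)}" '
           f'failures="{sum(not p for p in checks.values())}">{"".join(cases)}</testsuite>')
    with open("data-validation-results.xml", "w") as f:
        f.write(xml)

    if not all(checks.values()):
        raise SystemExit("Data validation failed")

if __name__ == "__main__":
    main()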

🚀 Model Deployment Strategies

Blue-Green Deployment

import asyncio
from typing import Dict

from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    ManagedOnlineEndpoint,
    ManagedOnlineDeployment,
    OnlineRequestSettings,
    ProbeSettings,
)

class ModelDeploymentManager:
    def __init__(self, ml_client):
        self.ml_client = ml_client
    
    async def blue_green_deployment(self, model_name, new_model_version, endpoint_name):
        """Blue-green deployment strategy"""
        
        print(f"🔄 Starting blue-green deployment for {model_name}")
        
        try:
            # Get current deployment status
            current_deployments = list(
                self.ml_client.online_deployments.list(endpoint_name)
            )
            
            # Determine current and new deployment names
            if current_deployments:
                active_deployment = current_deployments[0].name
                new_deployment = "green" if active_deployment == "blue" else "blue"
            else:
                active_deployment = None
                new_deployment = "blue"
            
            print(f"📊 Current: {active_deployment}, New: {new_deployment}")
            
            # Create new deployment
            deployment_config = ManagedOnlineDeployment(
                name=new_deployment,
                endpoint_name=endpoint_name,
                model=f"{model_name}:{new_model_version}",
                instance_type="Standard_DS3_v2",
                instance_count=1,
                request_settings=OnlineRequestSettings(
                    request_timeout_ms=60000,
                    max_concurrent_requests_per_instance=1
                ),
                liveness_probe=ProbeSettings(
                    initial_delay=10,
                    period=10,
                    timeout=2,
                    failure_threshold=30
                )
            )
            
            # Deploy new version
            print(f"🚀 Deploying {new_deployment}...")
            deployment_poller = self.ml_client.online_deployments.begin_create_or_update(
                deployment_config
            )
            deployment_result = deployment_poller.result()
            
            # Test new deployment
            print("🧪 Testing new deployment...")
            test_results = await self._test_deployment(endpoint_name, new_deployment)
            
            if test_results["success"]:
                # Switch traffic to new deployment
                print("✅ Tests passed, switching traffic...")
                await self._update_traffic_allocation(
                    endpoint_name, 
                    {new_deployment: 100, active_deployment: 0} if active_deployment else {new_deployment: 100}
                )
                
                # Clean up old deployment after verification
                if active_deployment:
                    print(f"🗑️ Cleaning up old deployment: {active_deployment}")
                    await asyncio.sleep(60)  # Wait for traffic switch
                    self.ml_client.online_deployments.begin_delete(
                        name=active_deployment, endpoint_name=endpoint_name
                    )
                
                return {
                    "status": "success",
                    "active_deployment": new_deployment,
                    "previous_deployment": active_deployment,
                    "test_results": test_results
                }
            else:
                # Rollback - delete failed deployment
                print("❌ Tests failed, rolling back...")
                self.ml_client.online_deployments.begin_delete(
                    name=new_deployment, endpoint_name=endpoint_name
                )
                
                return {
                    "status": "rollback",
                    "active_deployment": active_deployment,
                    "error": test_results["error"],
                    "failed_deployment": new_deployment
                }
                
        except Exception as e:
            print(f"❌ Deployment failed: {str(e)}")
            return {
                "status": "failed",
                "error": str(e)
            }
    
    async def _test_deployment(self, endpoint_name: str, deployment_name: str) -> Dict:
        """Comprehensive testing nowego deployment"""
        
        test_scenarios = [
            {
                "name": "basic_prediction",
                "input": {"data": [1, 2, 3, 4, 5]},
                "expected_type": "array"
            },
            {
                "name": "edge_case_input", 
                "input": {"data": []},
                "expected_error": True
            },
            {
                "name": "performance_test",
                "input": {"data": list(range(1000))},
                "max_response_time": 5000  # ms
            }
        ]
        
        test_results = {"success": True, "tests": []}
        
        for scenario in test_scenarios:
            try:
                # Get endpoint details
                endpoint = self.ml_client.online_endpoints.get(endpoint_name)
                scoring_uri = endpoint.scoring_uri
                
                # Make test request
                import requests
                import time
                
                start_time = time.time()
                
                response = requests.post(
                    scoring_uri,
                    json=scenario["input"],
                    headers={
                        "Authorization": f"Bearer {self._get_auth_token()}",
                        "Content-Type": "application/json"
                    },
                    timeout=30
                )
                
                response_time = (time.time() - start_time) * 1000  # ms
                
                # Validate response
                test_result = {
                    "scenario": scenario["name"],
                    "status": "passed",
                    "response_time_ms": response_time
                }
                
                if scenario.get("expected_error"):
                    if response.status_code == 200:
                        test_result["status"] = "failed"
                        test_result["reason"] = "Expected error but got success"
                elif response.status_code != 200:
                    test_result["status"] = "failed" 
                    test_result["reason"] = f"Request failed with status {response.status_code}"
                
                # Performance check
                if "max_response_time" in scenario and response_time > scenario["max_response_time"]:
                    test_result["status"] = "failed"
                    test_result["reason"] = f"Response time {response_time:.0f}ms exceeds limit {scenario['max_response_time']}ms"
                
                test_results["tests"].append(test_result)
                
                if test_result["status"] == "failed":
                    test_results["success"] = False
                
                print(f"✅ Test {scenario['name']}: {test_result['status']}")
                
            except Exception as e:
                test_results["tests"].append({
                    "scenario": scenario["name"],
                    "status": "failed",
                    "error": str(e)
                })
                test_results["success"] = False
                print(f"❌ Test {scenario['name']}: {str(e)}")
        
        return test_results
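
The class above calls two helpers that are not defined in the listing: _update_traffic_allocation and _get_auth_token. A minimal sketch of both, to be added to ModelDeploymentManager - the traffic update rewrites the endpoint's traffic map via the SDK v2 pattern, and the token helper assumes the endpoint accepts Azure AD tokens (the credential type and token scope are assumptions, not part of the original):

    async def _update_traffic_allocation(self, endpoint_name: str, traffic: Dict[str, int]):
        """Rewrite the endpoint's traffic map, e.g. {"green": 100, "blue": 0}."""
        endpoint = self.ml_client.online_endpoints.get(endpoint_name)
        endpoint.traffic = traffic
        self.ml_client.online_endpoints.begin_create_or_update(endpoint).result()

    def _get_auth_token(self) -> str:
        """Assumption: the endpoint uses Azure AD (aml_token) auth rather than keys."""
        from azure.identity import DefaultAzureCredential
        return DefaultAzureCredential().get_token("https://ml.azure.com/.default").token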

✅ Hands-on Exercises

Exercise 1: Azure DevOps Setup (45 min)

  1. Configure an Azure DevOps project
  2. Create service connections for Azure ML
  3. Implement a basic CI/CD pipeline
  4. Test automated training

Exercise 2: Blue-Green Deployment (30 min)

  1. Implement a blue-green deployment strategy
  2. Add comprehensive testing
  3. Configure automatic rollback
  4. Test with a sample model

Exercise 3: Monitoring Pipeline (30 min)

  1. Add monitoring to the deployment pipeline
  2. Configure alerts for failures
  3. Implement automated notifications (see the sketch after this list)
  4. Create a deployment dashboard
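
For step 3, a minimal notification sketch, assuming an incoming-webhook URL (e.g. Teams or Slack) stored as a pipeline secret - the variable name and payload shape are assumptions:

# Hypothetical failure notification posted from a deployment script
import os
import requests

def notify_deployment_failure(stage: str, error: str) -> None:
    """Send a short message to an incoming webhook (Teams/Slack-style)."""
    webhook_url = os.environ["DEPLOYMENT_WEBHOOK_URL"]  # assumed pipeline secret
    payload = {"text": f"🚨 Deployment failed in stage '{stage}': {error}"}
    requests.post(webhook_url, json=payload, timeout=10).raise_for_status()

# Example usage:
# notify_deployment_failure("ModelDeployment", "staging endpoint rollout timed out")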

Exercise 4: Advanced Automation (15 min)

  1. Trigger retraining on data changes
  2. Automate hyperparameter optimization
  3. Set up multi-environment deployment
  4. Automate performance benchmarking

🎯 Success Metrics

  • Deployment frequency - daily releases are possible
  • Lead time < 4 hours from commit to production
  • Mean time to recovery < 30 minutes
  • Change failure rate < 15%

📚 Additional Materials

💡 Tip

Each session is 2 hours of intensive learning with hands-on exercises. You can review the materials at your own pace.

📈 Progress

Track your progress in learning AI and preparing for the Azure AI-102 certification. Each module builds on the previous one.