argo-rollouts-demo

Canary Releases with Argo Rollouts and Pixie

Use Pixie analytics to drive canary releases and make your application deployment process safer and faster. The blog post "Canary Releases with Argo Rollouts and Pixie" accompanies this demo.

What is this demo?

This demo uses Pixie to perform canary analysis as part of an Argo Rollouts example project. A Pixie metrics server is deployed with an endpoint that returns the HTTP error rate for the specified pod(s). The Argo Rollouts controller uses this error rate to decide whether to promote or roll back an application upgrade.

Requirements

A Kubernetes cluster. If you don't already have one, follow these directions to create a minikube cluster.
Install Pixie into the cluster
Install kubectl
Install kustomize

Set Up the Pixie Metrics Server

The Pixie metrics server has an /error-rate/<namespace>/<pod(s)> endpoint that returns the HTTP error rate for the specified pod(s).

Clone this repo and navigate to the argo-rollouts-demo folder:

git clone https://github.com/pixie-io/pixie-demos.git
cd pixie-demos/argo-rollouts-demo

Create a secret containing the Pixie API credentials for your Kubernetes cluster:

# Get your current cluster name from your Kubernetes context.
kubectl config current-context

# Get the Pixie Cluster ID for the above cluster name.
# Record the value of the `ID` column for this cluster.
px get viziers

# Create an API key. Record the value of the `Key` parameter.
px api-key create

# Create `px-metrics` namespace.
kubectl create namespace px-metrics

# Create the secret.
kubectl -n px-metrics create secret generic px-credentials --from-literal=px-api-key=<YOUR API KEY VALUE HERE> --from-literal=px-cluster-id=<YOUR CLUSTER ID VALUE HERE>

(Optional) If you are using a self-hosted Pixie Cloud, update PX_CLOUD_ADDR in px-metrics.yaml.
Create the Pixie metrics provider in your Kubernetes cluster in the px-metrics namespace:

kubectl apply -f px-metrics.yaml
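Once the server is running, the /error-rate/<namespace>/<pod(s)> path described earlier can be assembled with a small helper. This is a minimal sketch: the function name is hypothetical, and the base URL is an assumption that depends on how you expose the server (for example, a local port-forward).

```shell
# Build the URL for the metrics server's error-rate endpoint.
# $1: base URL (assumption: wherever the server is reachable from your machine)
# $2: namespace of the pods
# $3: pod name(s)
error_rate_url() {
  # Strip one trailing slash from the base so the path never contains "//".
  printf '%s/error-rate/%s/%s\n' "${1%/}" "$2" "$3"
}
```

With the server reachable at a hypothetical http://localhost:8080, running curl "$(error_rate_url http://localhost:8080 default <POD_NAME>)" would query the endpoint for that pod.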

Set Up the Demo Application

Install Argo Rollouts onto your cluster with:

kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml

Install the Argo Rollouts kubectl plugin. To use Homebrew, run:

brew install argoproj/tap/kubectl-argo-rollouts

Apply the manifests (including the application Rollout and AnalysisTemplate):

kustomize build . | kubectl apply -f -

Get the EXTERNAL-IP address for the canary-demo-preview service:

kubectl get svc canary-demo-preview

Navigate to the External IP in your browser to access the demo application front-end. Leave this open.

Each of these colorful squares represents a request made by the browser to the demo application backend.

The backend responds with a color that indicates which version of the app it is. In this case we're getting blue boxes for the application image with the blue tag set in the Rollout yaml.

The bar chart at the bottom represents the percentage of requests that were handled by the different backends (stable, canary). Currently you should see all requests are handled by the stable backend. We will see the stable and canary backends split the traffic in the next step.

Successful Canary Rollout

Watch the rollout live with the following command. Leave this tab open.

kubectl argo rollouts get rollout canary-demo --watch

In another tab, modify the Rollout application image to trigger an upgrade:

kubectl argo rollouts set image canary-demo "*=argoproj/rollouts-demo:yellow"

You should now see blue (stable) running alongside yellow (canary):

In the first step, we direct 50% of traffic to the blue release and 50% to the yellow release.

Argo Rollouts splits traffic between versions by creating a new ReplicaSet behind the same Service object. The Service still balances traffic evenly across all pods, new and old, so the number of pods running each version determines that version's share of traffic.
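The relationship between pod counts and traffic share is simple arithmetic. The sketch below assumes the Service balances requests evenly across every pod; the function name is hypothetical and the result is truncated to a whole percent.

```shell
# Approximate share of traffic (whole percent) that reaches the canary pods
# when a single Service balances evenly across all pods.
# $1: number of stable pods
# $2: number of canary pods
canary_traffic_percent() {
  total=$(( $1 + $2 ))
  if [ "$total" -eq 0 ]; then
    # No pods at all: report 0 rather than dividing by zero.
    echo 0
    return
  fi
  # Integer division, so 1 canary out of 3 pods reports 33, not 33.3.
  echo $(( $2 * 100 / total ))
}
```

For example, canary_traffic_percent 1 1 prints 50, which matches the even split in this step, and canary_traffic_percent 3 1 prints 25.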

The Rollout controller queries the Pixie metrics server every 30 seconds to get the HTTP error rate for the canary pods.

Let's look at our front-end:

The bar chart on the bottom shows us that the requests are being roughly equally split between the blue (stable) and yellow (canary) versions.

Let's check the kubectl plugin:

After 2 minutes, the error rate has met the success criteria and the yellow canary image is fully promoted.

Unsuccessful Canary Rollout (HTTP error rate)

Let's again modify the image tag of the application Rollout to trigger an upgrade. This time we'll update it to a buggy application image which returns 500 errors for most requests.

kubectl argo rollouts set image canary-demo "*=argoproj/rollouts-demo:bad-red"

Let's look at our front-end:

We see that requests are being split between the yellow (stable) and red (canary) versions.

Let's check the kubectl plugin:

The Rollout controller queries the Pixie metrics server every 30 seconds to get the HTTP error rate for the canary pods.

After 30 seconds or so (the length of time it takes to get the analysis results back), the analysis should return an HTTP error rate that does not meet the successCondition defined in the pixie-analysis.yaml file.
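The pass/fail decision amounts to comparing the measured rate against a threshold. The sketch below illustrates that comparison only. The function name and the 0.1 threshold used in the example are hypothetical, and the real condition is whatever pixie-analysis.yaml defines.

```shell
# Succeed (exit 0) when the measured error rate is at or below the threshold.
# $1: measured error rate as a fraction (0.82 means 82%)
# $2: maximum acceptable rate (hypothetical; see pixie-analysis.yaml for the real one)
error_rate_ok() {
  # awk handles the floating-point comparison; "+ 0" forces numeric context.
  awk -v rate="$1" -v max="$2" 'BEGIN { exit !(rate + 0 <= max + 0) }'
}
```

With an assumed threshold of 0.1, error_rate_ok 0.05 0.1 succeeds, while error_rate_ok 0.82 0.1 fails, which corresponds to the rollback case described here.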

The rollout fails and the Rollout controller automatically rolls the deployment back to the yellow (stable) version.

To see the result of the analysis run:

kubectl get analysisrun
kubectl get analysisrun <ANALYSISRUN_NAME> -o yaml
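If you only want the measured values rather than the full YAML, a jsonpath query is one option. This is a sketch with a hypothetical function name; it assumes the AnalysisRun stores its measurements under status.metricResults[].measurements[].value.

```shell
# Print the value of every measurement recorded by an AnalysisRun.
# $1: AnalysisRun name, as listed by `kubectl get analysisrun`
analysisrun_values() {
  kubectl get analysisrun "$1" \
    -o jsonpath='{.status.metricResults[*].measurements[*].value}'
}
```

Call it as analysisrun_values <ANALYSISRUN_NAME>.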

Let's inspect the results of the analysis run:

The analysis run output shows us the value Pixie measured for HTTP error rate of the canary release.

The HTTP error rate value of 82% is well above the criteria we defined for a successful release.

Note that if you don't have the front-end open in your browser, no requests are made to the backend and therefore no errors are returned, so Pixie will report an error rate of 0.
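To make the numbers concrete, an error rate is the fraction of observed requests that returned an error. The sketch below shows only this arithmetic, with a hypothetical function name and made-up request counts; it is not the query the metrics server actually runs.

```shell
# Error rate as the fraction of requests that failed, to two decimals.
# $1: requests that returned an error
# $2: total requests observed
http_error_rate() {
  awk -v errors="$1" -v total="$2" 'BEGIN {
    # With no traffic there is nothing to fail, so report 0.
    if (total + 0 == 0) { print 0; exit }
    printf "%.2f\n", errors / total
  }'
}
```

For example, http_error_rate 82 100 prints 0.82, and http_error_rate 0 0 prints 0, the no-traffic case noted above.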

Development

This tutorial used Pixie to analyze the performance of the canary deployment. Pixie can generate many different types of metrics, not just HTTP error rate and latency by pod.

Pixie can generate metrics by pod, service, node, or container.

Other example metrics Pixie can generate:

Latency, error rate, and throughput for our supported protocols.
Latency, error rate, and throughput by request path (including wildcards, such as /orders/*/item/*)
System metrics such as CPU, network utilization, memory utilization
Application CPU profiles
See our example PxL scripts for additional examples

If you'd like to modify the Pixie metrics server to return different metrics, you'll need to build and deploy a new version:

Make your changes to pixie_metrics_server/pixie-metric-provider.go and build a new version of the server image:

docker build . -t <YOUR DOCKER IMAGE PATH HERE>

(Optional) Push your version of the image. This is not necessary if your cluster has access to your local Docker images.

docker push <YOUR DOCKER IMAGE PATH HERE>:latest

Depending on your ImagePullPolicy, delete and recreate the px-metrics deployment:

kubectl -n px-metrics delete deployment px-metrics
kubectl apply -f px-metrics.yaml
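The build command above names the image without a tag, while the push command names it with :latest. Docker treats an untagged reference as :latest, so both refer to the same image. If you script these steps, a small helper can make the tag explicit. This is a sketch with a hypothetical function name.

```shell
# Print an image reference with an explicit tag, appending ":latest" when
# the reference has neither a tag nor a digest.
# $1: image reference, e.g. example.com/me/px-metrics
image_with_tag() {
  # Inspect only the last path component so a registry port such as
  # localhost:5000 is not mistaken for a tag.
  case "${1##*/}" in
    *:*|*@*) printf '%s\n' "$1" ;;
    *)       printf '%s:latest\n' "$1" ;;
  esac
}
```

For example, image_with_tag localhost:5000/px-metrics prints localhost:5000/px-metrics:latest, while a reference that already carries a tag is printed unchanged.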

Bugs & Features

Feel free to file a bug or an issue for a feature request. You can also join our Slack community.
