Deploy AKS with Nginx, External DNS, Helm Operator and Flux

A while ago, I blogged about an Azure YAML pipeline to deploy AKS together with Traefik. As a variation on that theme, this post talks about deploying AKS together with Nginx, External DNS, a Helm Operator and Flux CD. I blogged about Flux before if you want to know what it does.

Video version (1.5x speed recommended)

I added the Azure DevOps pipeline to the existing GitHub repo, in the nginx-dns-helm-flux folder.

Let’s break the pipeline down a little. In what follows, replace AzureMPN with a reference to your own subscription. The first two tasks, AKS deployment and IP address deployment, are ARM template deployments that create these resources in Azure. Nothing too special there. Note that the AKS cluster uses default networking, no Azure AD integration and no VMSS (so no multiple node pools either).

Note: I modified the pipeline to deploy a VMSS-based cluster with a standard load balancer, which is recommended instead of a cluster based on an availability set with a basic load balancer.

The third task takes the output of the IP address deployment and parses out the IP address using jq (last echo statement on one line):

- task: Bash@3
      name: GetIP
      inputs:
        targetType: 'inline'
        script: |
          echo "##vso[task.setvariable variable=test-ip;]$(echo '$(armoutputs)' | jq .ipaddress.value -r)"

The IP address is saved in a variable test-ip for easy reuse later.
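For the jq command to find anything, the preceding ARM deployment task has to write its outputs to the armoutputs variable. Below is a minimal sketch of what that IP address deployment task could look like; the template path and location are illustrative and not copied from the repo:

- task: AzureResourceGroupDeployment@2
      inputs:
        azureSubscription: 'AzureMPN'
        action: 'Create Or Update Resource Group'
        resourceGroupName: '$(aksTestRG)'
        location: 'West Europe'
        templateLocation: 'Linked artifact'
        csmFile: 'ip.json' # illustrative path to the IP address ARM template
        deploymentMode: 'Incremental'
        deploymentOutputs: 'armoutputs' # exposes the ARM outputs as the $(armoutputs) variable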

Next, we install kubectl and Helm v3. Indeed, Azure DevOps now supports installation of Helm v3 with:

- task: HelmInstaller@1
      inputs:
        helmVersionToInstall: 'latest'

Next, we need to run a script to achieve a couple of things:

  • Get AKS credentials with Azure CLI
  • Add Helm repositories
  • Install a custom resource definition (CRD) for the Helm operator

This is achieved with the following inline Bash script:

- task: AzureCLI@1
      inputs:
        azureSubscription: 'AzureMPN'
        scriptLocation: 'inlineScript'
        inlineScript: |
          az aks get-credentials -g $(aksTestRG) -n $(aksTest) --admin
          helm repo add stable https://kubernetes-charts.storage.googleapis.com/
          helm repo add fluxcd https://charts.fluxcd.io
          helm repo update
          kubectl apply -f https://raw.githubusercontent.com/fluxcd/helm-operator/master/deploy/flux-helm-release-crd.yaml

Next, we create a Kubernetes namespace called fluxcd. I create the namespace with some inline YAML in the Kubernetes@1 task:

- task: Kubernetes@1
      inputs:
        connectionType: 'None'
        command: 'apply'
        useConfigurationFile: true
        configurationType: 'inline'
        inline: |
          apiVersion: v1
          kind: Namespace
          metadata:
            name: fluxcd

It’s best to use the approach above instead of kubectl create ns because the apply is idempotent: if the namespace already exists, you will not get an error.

Now we are ready to deploy Nginx, External DNS, the Helm Operator and Flux CD.

Nginx

This is a pretty basic installation with the Azure DevOps Helm task:

- task: HelmDeploy@0
      inputs:
        connectionType: 'None'
        namespace: 'kube-system'
        command: 'upgrade'
        chartType: 'Name'
        chartName: 'stable/nginx-ingress'
        releaseName: 'nginx'
        overrideValues: 'controller.service.loadBalancerIP=$(test-ip),controller.publishService.enabled=true,controller.metrics.enabled=true'

For External DNS to work, I found I had to set controller.publishService.enabled=true. As you can see, the Nginx service is configured to use the IP we created earlier. Azure will create a load balancer with a front end IP configuration that uses this address. This all happens automatically.

Note: controller.metrics.enabled enables a Prometheus scraping endpoint; that is not discussed further in this blog.

External DNS

External DNS can automatically add DNS records for ingresses and services you add to Kubernetes. For instance, if I create an ingress for test.baeke.info, External DNS can create this record in the baeke.info zone and use the IP address of the Ingress Controller (nginx here). Installation is pretty straightforward but you need to provide credentials to your DNS provider. In my case, I use CloudFlare. Many others are available. Here is the task:

- task: HelmDeploy@0
      inputs:
        connectionType: 'None'
        namespace: 'kube-system'
        command: 'upgrade'
        chartType: 'Name'
        chartName: 'stable/external-dns'
        releaseName: 'externaldns'
        overrideValues: 'cloudflare.apiToken=$(CFAPIToken)'
        valueFile: 'externaldns/values.yaml'

On CloudFlare, I created a token that has the required access rights to my zone (read, edit). I provide that token to the chart via the CFAPIToken variable defined as a secret on the pipeline. The valueFile looks like this:

rbac:
  create: true

provider: cloudflare

logLevel: debug

cloudflare:
  apiToken: CFAPIToken
  email: email address
  proxied: false

interval: "1m"

policy: sync # or upsert-only

domainFilters: [ 'baeke.info' ]

In the beginning, it’s best to set the logLevel to debug in case things go wrong. With interval 1m, External DNS checks for ingresses and services every minute and syncs with your DNS zone. Note that External DNS only touches the records it created. It does so by creating TXT records that mark External DNS as the owner of the DNS records it manages.
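If you want to check what External DNS created, you can query the zone directly. A quick sketch with dig; the exact TXT content depends on your External DNS version and owner id:

# A record created by External DNS for the ingress host
dig +short A real.baeke.info

# ownership marker created by External DNS, something like
# "heritage=external-dns,external-dns/owner=default"
dig +short TXT real.baeke.info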

With External DNS in place, you just need to create an ingress like below to have the A record real.baeke.info created:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: realtime-ingress
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
  - host: real.baeke.info
    http:
      paths:
      - path: /
        backend:
          serviceName: realtime
          servicePort: 80

Helm Operator

The Helm Operator allows us to install Helm charts by simply submitting a YAML file. First, we install the operator:

- task: HelmDeploy@0
      name: HelmOp
      displayName: Install Flux CD Helm Operator
      inputs:
        connectionType: 'None'
        namespace: 'kube-system'
        command: 'upgrade'
        chartType: 'Name'
        chartName: 'fluxcd/helm-operator'
        releaseName: 'helm-operator'
        overrideValues: 'extraEnvs[0].name=HELM_VERSION,extraEnvs[0].value=v3,image.repository=docker.io/fluxcd/helm-operator-prerelease,image.tag=helm-v3-dev-53b6a21d'
        arguments: '--namespace fluxcd'

This installs the latest version of the operator at the time of this writing (image.repository and image.tag) and also sets Helm to v3. With this installed, you can install a Helm chart by submitting files like below:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: influxdb
  namespace: default
spec:
  releaseName: influxdb
  chart:
    repository: https://charts.bitnami.com/bitnami
    name: influxdb
    version: 0.2.4

You can create files that use kind HelmRelease (HR) because we installed the Helm Operator CRD before. To check installed Helm releases in a namespace, you can run kubectl get hr.
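For example, after the Helm Operator processes the InfluxDB HelmRelease above, a quick check could look like this:

# list HelmReleases in the default namespace
kubectl get hr -n default

# show details and events for a specific release
kubectl describe hr influxdb -n default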

The Helm operator is useful if you want to install Helm charts from a git repository with the help of Flux CD.

Flux CD

Deploy Flux CD with the following task:

- task: HelmDeploy@0
      name: FluxCD
      displayName: Install Flux CD
      inputs:
        connectionType: 'None'
        namespace: 'fluxcd'
        command: 'upgrade'
        chartType: 'Name'
        chartName: 'fluxcd/flux'
        releaseName: 'flux'
        overrideValues: 'git.url=git@github.com:$(gitURL),git.pollInterval=1m'

The gitURL variable should be set to a git repo that contains your cluster configuration. For instance: gbaeke/demo-clu-flux. Flux will check the repo for changes every minute. Note that we are using a public repo here. Private repos and systems other than GitHub are supported.

Take a look at GitOps with Weaveworks Flux for further instructions. Some things you need to do (a sketch of the fluxctl commands follows below):

  • Install fluxctl
  • Use fluxctl identity to obtain the public key from the key pair created by Flux (when you do not use your own)
  • Set the public key as a deploy key on the git repo
GitHub deploy key
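Here is a minimal sketch of those fluxctl steps; it assumes Flux was deployed to the fluxcd namespace, as done in this pipeline:

# print the public key of the SSH key pair generated by Flux
fluxctl identity --k8s-fwd-ns fluxcd

# after adding that key as a deploy key on the GitHub repo (give it write access
# if you want Flux to push image updates back), trigger an immediate sync
fluxctl sync --k8s-fwd-ns fluxcd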

By connecting the https://github.com/gbaeke/demo-clu-flux repo to Flux CD (as done here), the following is done based on the content of the repo (the complete repo is scanned):

  • Install the InfluxDB Helm chart
  • Add a simple app that uses a Go socket.io implementation to provide realtime updates based on Redis channel content; this app is published via nginx and real.baeke.info is created in DNS (by External DNS)
  • Add a ConfigMap that is used to configure Azure Monitor to enable Prometheus endpoint scraping (to show this can be used for any object you need to add to Kubernetes)

Note that the ingress of the Go app has an annotation (in realtime.yaml, in the git repo) to issue a certificate via cert-manager. If you want to make that work, add an extra task to the pipeline that installs cert-manager:

- task: HelmDeploy@0
      inputs:
        connectionType: 'None'
        namespace: 'cert-manager'
        command: 'upgrade'
        chartType: 'Name'
        chartName: 'jetstack/cert-manager'
        releaseName: 'cert-manager'
        arguments: '--version v0.12.0'

You will also need to create another namespace, cert-manager, just like we created the fluxcd namespace.

In order to make the above work, you will need Issuers or ClusterIssuers. The repo used by Flux CD contains two ClusterIssuers, one for Let’s Encrypt staging and one for production. The ingress resource uses the production issuer due to the following annotation:

cert-manager.io/cluster-issuer: "letsencrypt-prod" 
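For reference, a ClusterIssuer for Let’s Encrypt production with the cert-manager v0.12 API could look like the sketch below. The name matches the annotation, but the e-mail address is a placeholder and the solver assumes the nginx ingress class used in this post:

apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: you@example.com # placeholder; use your own address
    privateKeySecretRef:
      name: letsencrypt-prod # secret that stores the ACME account key
    solvers:
    - http01:
        ingress:
          class: nginx # assumes the nginx ingress controller deployed in this post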

The Go app that is deployed by Flux now has TLS enabled by default:

https on the Go app

I often use this deployment in demos of all sorts. I hope it is helpful for you too!

GitOps with Weaveworks Flux – Installing and Updating Applications

In a previous post, we installed Weaveworks Flux. Flux synchronizes the contents of a git repository with your Kubernetes cluster. Flux can easily be installed via a Helm chart. As an example, we installed Traefik by adding the following yaml to the synced repository:

apiVersion: helm.fluxcd.io/v1
kind: HelmRelease
metadata:
  name: traefik
  namespace: default
  annotations:
    fluxcd.io/ignore: "false"
spec:
  releaseName: traefik
  chart:
    repository: https://kubernetes-charts.storage.googleapis.com/
    name: traefik
    version: 1.78.0
  values:
    serviceType: LoadBalancer
    rbac:
      enabled: true
    dashboard:
      enabled: true   

It does not matter where you put this file because Flux scans the complete repository. I added the file to a folder called traefik.

If you look more closely at the YAML file, you’ll notice its kind is HelmRelease. You need an operator that can handle this resource type: the Flux CD Helm Operator. In the previous post, we installed the custom resource definition and the operator manually.

Adding a custom application

Now it’s time to add our own application. You do not need to use Helm packages or the Helm operator to install applications. Regular yaml will do just fine.

The application we will deploy needs a Redis backend. Let’s deploy that first. Add the following yaml file to your repository:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  labels:
    app: redis       
spec:
  selector:
    matchLabels:     
      app: redis
  replicas: 1        
  template:          
    metadata:
      labels:        
        app: redis
    spec:            
      containers:
      - name: redis
        image: redis
        resources:
          requests:
            cpu: 200m
            memory: 100Mi
        ports:
        - containerPort: 6379
---        
apiVersion: v1
kind: Service        
metadata:
  name: redis
  labels:            
    app: redis
spec:
  ports:
  - port: 6379       
    targetPort: 6379
  selector:          
    app: redis

After committing this file, wait a moment or run fluxctl sync. When you run kubectl get pods for the default namespace, you should see the Redis pod:

Redis is running — yay!!!

Now it’s time to add the application. I will use an image, based on the following code: https://github.com/gbaeke/realtime-go (httponly branch because master contains code to automatically request a certificate with Let’s Encrypt). I pushed the image to Docker Hub as gbaeke/fluxapp:1.0.0. Now let’s deploy the app with the following yaml:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: realtime
  labels:
    app: realtime       
spec:
  selector:
    matchLabels:     
      app: realtime
  replicas: 1        
  template:          
    metadata:
      labels:        
        app: realtime
    spec:            
      containers:
      - name: realtime
        image: gbaeke/fluxapp:1.0.0
        env:
        - name: REDISHOST
          value: "redis:6379"
        resources:
          requests:
            cpu: 50m
            memory: 50Mi
          limits:
            cpu: 150m
            memory: 150Mi
        ports:
        - containerPort: 8080
---        
apiVersion: v1
kind: Service        
metadata:
  name: realtime
  labels:            
    app: realtime
spec:
  ports:
  - port: 80       
    targetPort: 8080
  selector:          
    app: realtime
---
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: realtime-ingress
spec:
  rules:
  - host: realtime.IP.xip.io
    http:
      paths:
      - path: /
        backend:
          serviceName: realtime
          servicePort: 80

In the above yaml, replace IP in the Ingress specification with the IP of the external load balancer used by your Ingress Controller. Once you add the yaml to the git repository and you run fluxctl sync, the application should be deployed. You see the following page when you browse to http://realtime.IP.xip.io:

Web app deployed via Flux and standard yaml

Great, v1.0.0 of the app is deployed using the gbaeke/fluxapp:1.0.0 image. But what if I have a new version of the image and the yaml specification does not change? Read on…

Upgrading the application

If you have been following along, you can now run the following command:

fluxctl list-workloads -a

This will list all workloads on the cluster, including the ones that were not installed by Flux. If you check the list, none of the workloads are automated. When a workload is automated, it can automatically upgrade the application when a new image appears. Let’s try to automate the fluxapp. To do so, you can either add annotations to your yaml or use fluxctl. Let’s use the yaml approach by adding the following to our deployment:

annotations:
    flux.weave.works/automated: "true"
    flux.weave.works/tag.realtime: semver:~1.0

Note: Flux only works with immutable tags; do not use latest.
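As an alternative to the annotations, you can set the same policies with fluxctl. A sketch, assuming Flux runs in a namespace called flux (adjust --k8s-fwd-ns to match your installation):

# automate the workload (same effect as the flux.weave.works/automated annotation)
fluxctl automate --workload=default:deployment/realtime --k8s-fwd-ns flux

# restrict updates to tags that match the semver filter
fluxctl policy --workload=default:deployment/realtime --tag-all='semver:~1.0' --k8s-fwd-ns flux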

After committing the file and running fluxctl sync, you can run fluxctl list-workloads -a again. The deployment should now be automated:

fluxapp is now automated

Now let’s see what happens when we add a new version of the image with tag 1.0.1. That image uses a different header color to show the difference. Flux monitors the repository for changes. When it detects a new version of the image that matches the semver filter, it will modify the deployment. Let’s check with fluxctl list-workloads -a:

new image deployed

And here’s the new color:

New color in version 1.0.1. Exciting! 😊

But wait… what about the git repo?

With the configuration of a deploy key, Flux has access to the git repository. When a deployment is automated and the image is changed, that change is also reflected in the git repo:

Weave Flux updated the realtime yaml file

In the yaml, version 1.0.1 is now used:

Flux updated the yaml file

What if I don’t like this release? With fluxctl, you can rollback to a previous version like so:

Rolling back a release – will also update the git repo

Although this works, the deployment will be updated to 1.0.1 again since it is automated. To avoid that, first lock the deployment (or workload) and then force the release of the old image:

fluxctl lock -w=deployment/realtime

fluxctl release -n default --workload=deployment/realtime --update-image=gbaeke/fluxapp:1.0.0 --force

In your yaml, there will be an additional annotation, fluxcd.io/locked: 'true', and the image will be set to 1.0.0.
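When you want automation to take over again, unlock the workload (same namespace assumption as above):

fluxctl unlock --workload=default:deployment/realtime --k8s-fwd-ns flux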

Conclusion

In this post, we looked at deploying and updating an application via Flux automation. You only need a couple of annotations to make this work. This was just a simple example. For an example with dev, staging and production branches and promotion from staging to production, be sure to look at https://github.com/fluxcd/helm-operator-get-started as well.

The basics of meshing Traefik 2.0 with Linkerd

A while ago, I blogged about Linkerd 2.x. In that post, I used a simple calculator API, reachable via an Azure Load Balancer. When you look at that traffic in Linkerd, you see the following:

Incoming load balancer traffic to a meshed deployment (in this case Traefik 2.0)

In the image above, you cannot tell that this is Azure Load Balancer traffic. The traffic reaches the meshed service via the Azure CNI pods.

In this post, we will install Traefik 2.0, mesh the Traefik deployment and make the calculator service reachable via Traefik and the new IngressRoute. Let’s get started!

Install Traefik 2.0

We will install Traefik 2.0 with http support only. There’s an excellent blog that covers the installation over here. In short, you do the following:

  • deploy prerequisites such as custom resource definitions (CRDs), ClusterRole, ClusterRoleBinding, ServiceAccount
  • deploy Traefik 2.0: it’s just a Kubernetes deployment
  • deploy a service to expose the Traefik HTTP endpoint via a Load Balancer; I used an Azure Load Balancer automatically deployed via Azure Kubernetes Service (AKS)
  • deploy a service to expose the Traefik admin endpoint via an IngressRoute

Here are the prerequisites for easy copy and pasting:

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressroutes.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRoute
    plural: ingressroutes
    singular: ingressroute
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: ingressroutetcps.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: IngressRouteTCP
    plural: ingressroutetcps
    singular: ingressroutetcp
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: middlewares.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: Middleware
    plural: middlewares
    singular: middleware
  scope: Namespaced

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: tlsoptions.traefik.containo.us

spec:
  group: traefik.containo.us
  version: v1alpha1
  names:
    kind: TLSOption
    plural: tlsoptions
    singular: tlsoption
  scope: Namespaced

---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller

rules:
  - apiGroups:
      - ""
    resources:
      - services
      - endpoints
      - secrets
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - extensions
    resources:
      - ingresses/status
    verbs:
      - update
  - apiGroups:
      - traefik.containo.us
    resources:
      - middlewares
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - traefik.containo.us
    resources:
      - ingressroutes
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - traefik.containo.us
    resources:
      - ingressroutetcps
    verbs:
      - get
      - list
      - watch
  - apiGroups:
      - traefik.containo.us
    resources:
      - tlsoptions
    verbs:
      - get
      - list
      - watch

---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: traefik-ingress-controller

roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: traefik-ingress-controller
subjects:
  - kind: ServiceAccount
    name: traefik-ingress-controller
    namespace: default

---
apiVersion: v1
kind: ServiceAccount
metadata:
  namespace: default
  name: traefik-ingress-controller

Save this to a file and then use kubectl apply -f filename.yaml. Here’s the deployment:

kind: Deployment
apiVersion: extensions/v1beta1
metadata:
  namespace: default
  name: traefik
  labels:
    app: traefik

spec:
  replicas: 2
  selector:
    matchLabels:
      app: traefik
  template:
    metadata:
      labels:
        app: traefik
    spec:
      serviceAccountName: traefik-ingress-controller
      containers:
        - name: traefik
          image: traefik:v2.0
          args:
            - --api
            - --accesslog
            - --entrypoints.web.Address=:8000
            - --entrypoints.web.forwardedheaders.insecure=true
            - --providers.kubernetescrd
            - --ping
            - --accesslog=true
            - --log=true
          ports:
            - name: web
              containerPort: 8000
            - name: admin
              containerPort: 8080

Here’s the service to expose Traefik’s web endpoint. This is different from the post I referred to because that post used DigitalOcean. I am using Azure here.

apiVersion: v1
kind: Service
metadata:
  name: traefik
spec:
  type: LoadBalancer
  ports:
    - protocol: TCP
      name: web
      port: 80
      targetPort: 8000
  selector:
    app: traefik

The above service definition will give you a public IP. Traffic destined to port 80 on that IP goes to the Traefik pods on port 8000.
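It can take a minute before Azure assigns the address; a quick way to watch for it (Traefik runs in the default namespace here):

kubectl get svc traefik -n default -w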

Now we can expose the Traefik admin interface via Traefik itself. Note that I am not using any security here. Check the original post for basic auth config via middleware.

apiVersion: v1
kind: Service
metadata:
  name: traefik-admin
spec:
  type: ClusterIP
  ports:
    - protocol: TCP
      name: admin
      port: 8080
  selector:
    app: traefik
---
apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: traefik-admin
spec:
  entryPoints:
    - web
  routes:
  - match: Host(`somehost.somedomain.com`) && PathPrefix(`/`)
    kind: Rule
    priority: 1
    services:
    - name: traefik-admin
      port: 8080

Traefik’s admin site is first exposed as a ClusterIP service on port 8080. Next, an object of kind IngressRoute is defined, which is new for Traefik 2.0. You don’t need to create standard Ingress objects and configure Traefik with custom annotations. This new approach is cleaner. Of course, substitute the host with a host that points to the public IP of the load balancer. Or use the IP address with the xip.io domain. If your IP would be 1.1.1.1 then you could use something like admin.1.1.1.1.xip.io. That name automatically resolves to the IP in the name.

Let’s see if we can reach the admin interface:

The new Traefik 2 admin UI

Traefik 2.0 is now installed in a basic way and working properly. We exposed the admin interface but now it is time to expose the calculator API.

Exposing the calculator API

The API is deployed as 5 pods in the add namespace:

Calculator API exposed

The API is exposed as a service of type ClusterIP with only an internal Kubernetes IP. To expose it via Traefik, we create the following object in the add namespace:

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRoute
metadata:
  name: calc-svc
  namespace: add  
spec:
  entryPoints:
    - web
  routes:
  - match: Host(`calc.1.1.1.1.xip.io`) && PathPrefix(`/`)
    kind: Rule
    priority: 1
    middlewares:
      - name: calcheader
    services:
    - name: add-svc
      port: 80

I am using xip.io above. Change 1.1.1.1 to the public IP of Traefik’s Azure Load Balancer. The add-svc that exposes the calculator API on port 80 is exposed via Traefik. We can easily call the service via:

curl http://calc.1.1.1.1.xip.io/add/10/10

20

Great! But what is that calcheader middleware? Middlewares modify the requests and responses to and from Traefik 2.0. There are all sorts of middlewares as explained here. You can set headers, configure authentication, perform rate limiting and much, much more. In this case we create the following middleware object in the add namespace:

apiVersion: traefik.containo.us/v1alpha1
kind: Middleware
metadata:
  name: calcheader
  namespace: add
spec:
  headers:
    customRequestHeaders:
      l5d-dst-override: "add-svc.add.svc.cluster.local:80"

This middleware adds a header to the request before Traefik forwards it to the backend. The header overrides the destination and sets it to the internal DNS name of the add-svc service that exposes the calculator API. This requirement is documented by Linkerd here.

Meshing the Traefik deployment

Because we want to mesh Traefik to get Linkerd metrics and more, we need to inject the Linkerd proxy in the Traefik pods. In my case, Traefik is deployed in the default namespace so the command below can be used:

kubectl get deploy -o yaml | linkerd inject - | kubectl apply -f - 

Make sure you run the command on a system with the linkerd executable in your path and kubectl homed to the cluster that has Linkerd installed.
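To verify the injection worked, you can check the pods and ask Linkerd for stats; a quick sketch (the app=traefik label comes from the deployment above):

# each Traefik pod should now show 2/2 containers (traefik + linkerd-proxy)
kubectl get pods -l app=traefik

# success rate, RPS and latency for the meshed Traefik deployment
linkerd stat deploy/traefik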

Checking the traffic in the Linkerd dashboard

With some traffic generated, this is what you should see when you check the meshed deployment that runs the calculator API (deploy/add):

Both the traffic generator (add-cli) and Traefik are meshed which results in a more detailed view of the traffic

If you are wondering what these services are and do, check this post. In the above diagram, we can clearly see we are receiving traffic to the calculator API from Traefik. When I click on Traefik, I see the following:

A view on the meshed Traefik deployment

From the above, we see Traefik receives traffic via the Azure Load Balancer and that it forwards traffic to the calculator service. The live calls are coming from the admin UI which refreshes regularly.

In Grafana, we can get more information about the Traefik deployment:

Linkerd metrics for Traefik in the Grafana dashboard that comes with Linkerd
More metrics

Conclusion

This was just a brief look at both Traefik 2 and “meshing” Traefik with Linkerd. There is much more to say and I have much more to explore. Hopefully, this can get you started!

Giving linkerd a spin

A while ago, I gave linkerd a spin. Due to vacations and a busy schedule, I was not able to write about my experience. I will briefly discuss how to setup linkerd and then deploy a sample service to illustrate what it can do out of the box. Let’s go!

Wait! What is linkerd?

linkerd basically is a network proxy for your Kubernetes pods that’s designed to be deployed as a service mesh. When the pods you care about have been infused with linkerd, you will automatically get metrics like latency and requests per second, a web portal to check these metrics, live inspection of traffic and much more. Below is an example of a Kubernetes namespace that has been meshed:

A meshed namespace; all deployments in this particular namespace are meshed which means all pods get the linkerd network proxy that provides the metrics and features such as encryption

Installation

I can be very brief about this: installation is about as simple as it gets. Simply navigate to https://linkerd.io/2/getting-started to get started. Here are the simplified steps:

  • Download the linkerd executable as described in the Getting Started guide; I used WSL for this
  • Create a Kubernetes cluster with AKS (or another provider); for AKS, use the Azure CLI to get your credentials (az aks get-credentials); make sure the Azure CLI is installed in WSL and that you connected to your Azure subscription with az login
  • Make sure you can connect to your cluster with kubectl
  • Run linkerd check --pre to check if prerequisites are fulfilled
  • Install linkerd with linkerd install | kubectl apply -f -
  • Check the installation with linkerd check

The last step will nicely show its progress and end when the installation is complete:

linkerd check output

Exploring linkerd with the dashboard

linkerd automatically installs a dashboard. The dashboard is exposed as a Kubernetes service called linkerd-web. The service is of type ClusterIP. Although you could expose the service using an ingress, you can easily tunnel to the service with the following linkerd command (first line is the command; other lines are the output):

linkerd dashboard

Linkerd dashboard available at:
http://127.0.0.1:50750
Grafana dashboard available at:
http://127.0.0.1:50750/grafana
Opening Linkerd dashboard in the default browser
Failed to open Linkerd dashboard automatically
Visit http://127.0.0.1:50750 in your browser to view the dashboard

From WSL, the dashboard cannot open automatically but you can manually browse to it. Note that linkerd also installs Prometheus and Grafana.

Out of the box, the linkerd deployment itself is meshed.

Adding linkerd to your own service

In this section, we will deploy a simple service that can add numbers and add linkerd to it. Although there are many ways to do this, I chose to create a separate namespace and enable auto-injection via an annotation. Here’s the yaml to create the namespace (add-ns.yaml):

apiVersion: v1
kind: Namespace
metadata:
  name: add
  annotations:
    linkerd.io/inject: enabled

Just run kubectl create -f add-ns.yaml to create the namespace. The annotation ensures that all pods added to the namespace get the linkerd proxy in the pod. All traffic to and from the pod will then pass through the proxy.

Now, let’s install the add service and deployment:

apiVersion: v1
kind: Service
metadata:
  name: add-svc
spec:
  ports:
  - port: 80
    name: http
    protocol: TCP
    targetPort: 8000
  - port: 8080
    name: grpc
    protocol: TCP
    targetPort: 8080
  selector:
    app: add
    version: v1
  type: LoadBalancer
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: add
spec:
  replicas: 2
  selector:
    matchLabels:
      app: add
  template:
    metadata:
      labels:
        app: add
        version: v1
    spec:
      containers:
      - name: add
        image: gbaeke/adder

The deployment creates two pods with the gbaeke/adder image. To deploy the above, save it to a file (add.yaml) and run the following command:

kubectl create -f add.yaml -n add

Because the deployment uses the add namespace, the linkerd proxy will be added to each pod automatically. When you list the pods in the deployment, you see:

Each add pod has two containers: the actual add container based on gbaeke/adder and the proxy

To see more details about one of these pods, I can use the following command:

kubectl get po add-5b48fcc894-2dc97 -o yaml -n add

You will clearly see the two containers in the output:

Two containers in the pod: actual service (gbaeke/adder) and the linkerd proxy

Generating some traffic

Let’s deploy a client that continuously uses the calculator service:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: add-cli
spec:
  replicas: 1
  selector:
    matchLabels:
      app: add-cli
  template:
    metadata:
      labels:
        app: add-cli
    spec:
      containers:
      - name: add-cli
        image: gbaeke/adder-cli
        env:
        - name: SERVER
          value: "add-svc"

Save the above to add-cli.yaml and deploy with the below command:

kubectl create -f add-cli.yaml -n add

The deployment uses another image called gbaeke/adder-cli that continuously makes requests to the server specified in the SERVER environment variable.

Checking the deployment in the linkerd portal

When you now open the add namespace in the linkerd portal, you should see something similar to the below screenshot (note: I deployed 5 servers and 5 clients):

A view on the add namespace; linkerd has learned how the deployments talk to each other

The linkerd proxy in all pods sees all traffic. From the traffic, it can infer that the add-cli deployment talks to the add deployment. The add deployment receives about 150 requests per second. The 99th percentile latency is relatively high because the cluster nodes are very small, I deployed multiple instances, and the client is relatively inefficient.
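The same numbers are available from the command line if you prefer that over the dashboard:

# success rate, requests per second and latency percentiles per deployment
linkerd stat deploy -n add

# live view of the most active request paths to the add deployment
linkerd top deploy/add --namespace add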

When I click the deployment called add, the following screen is shown:

A view on the deployment

The deployment clearly shows where traffic is coming from plus relevant metrics such as RPS and P99 latency. You also get a view on the live calls now. Note that the client is using gRPC, which uses HTTP POST. When you scroll down on this page, you get more information about the caller and a view on the individual pods:

A view on the inbound calls to the deployment plus a view on the pods

To see live calls in more detail, you can click the Tap icon:

A live view on the calls with Tap

For each call, details can be requested:

Request details

Conclusion

This was just a brief look at linkerd. It is trivially easy to install and with auto-injection, very simple to add it to your own services. Highly recommended to give it a spin to see where it can add value to your projects!

Securing your API with Kong and CloudFlare

In the previous post, we looked at API Management with Kong and the Kong Ingress Controller. We did not care about security and exposed a sample toy API over a public HTTP endpoint that also required an API key. All in the clear, no firewall, no WAF, nothing… πŸ‘ŽπŸ‘ŽπŸ‘Ž

In this post, we will expose the API over TLS and configure Kong to use a CloudFlare origin certificate. An origin certificate is issued and trusted by CloudFlare to connect to the origin, which in our case is an API hosted on Kubernetes.

The API consumer will not connect directly to the Kubernetes-hosted API exposed via Kong. Instead, the consumer connects to CloudFlare over TLS and uses a certificate issued by CloudFlare that is fully trusted by browsers and other clients.

The traffic flow is as follows:

Consumer --> CloudFlare (TLS with fully trusted cert, WAF, ...) --> Kong Ingress (TLS with origin cert) --> API (HTTP)

Configuring Kong

Refer to the previous post for installation instructions. The YAML files to configure the Ingress, KongIngress, Consumer, etc… are almost the same. The Ingress resource has the following changes:

  • We use a new hostname api.baeke.info
  • We configure TLS for api.baeke.info by referring to a secret called baeke.info.tls which contains the CloudFlare origin certificate.
  • We use an additional Kong plugin which provides whitelisting of CloudFlare addresses; only CloudFlare is allowed to connect to the Ingress

Here is the full definition:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: func
  namespace: default
  annotations:
    kubernetes.io/ingress.class: kong
    plugins.konghq.com: http-auth, whitelist
spec:
  tls:
  - hosts:
    - api.baeke.info
    secretName: baeke.info.tls # cloudflare origin cert
  rules:
    - host: api.baeke.info
      http:
        paths:
        - path: /users
          backend:
            serviceName: func
            servicePort: 80

Here is the plugin definition for whitelisting with the current (June 15th, 2019) list of IP ranges used by CloudFlare. Note that you have to supply the addresses and ranges as an array. The documentation shows a comma-separated list! πŸ€·β€β™‚οΈ

apiVersion: configuration.konghq.com/v1
kind: KongPlugin
metadata:
  name: whitelist
  namespace: default
config:
  whitelist: 
  - 173.245.48.0/20
  - 103.21.244.0/22
  - 103.22.200.0/22
  - 103.31.4.0/22
  - 141.101.64.0/18
  - 108.162.192.0/18
  - 190.93.240.0/20
  - 188.114.96.0/20
  - 197.234.240.0/22
  - 198.41.128.0/17
  - 162.158.0.0/15
  - 104.16.0.0/12
  - 172.64.0.0/13
  - 131.0.72.0/22
plugin: ip-restriction 

I also made a change to the KongIngress resource, to only allow https to the back-end service. Only the route section is shown below:

route:
 methods:
 - GET
 regex_priority: 0
 strip_path: true
 preserve_host: true
 protocols:
 - https 

In the previous post, the protocols array contained the http value.

Note: for whitelisting to work, the Kong proxy service needs externalTrafficPolicy set to Local. Use kubectl edit svc kong-kong-proxy to modify that setting. You can set this value at deployment time as well. This might or might not work for you. I used AKS where this produces the desired outcome.
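If you prefer not to edit the service interactively, a one-liner patch does the same. The namespace below is an assumption; adjust it to wherever you installed Kong:

kubectl patch svc kong-kong-proxy -n kong -p '{"spec":{"externalTrafficPolicy":"Local"}}'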

CloudFlare

Get the external IP of the kong-kong-proxy service and create a DNS entry for it. I created a A record for api.baeke.info:

Make sure the orange cloud is active. In this case, this means that requests for api.baeke.info are proxied by CloudFlare. That allows us to enable caching, WAF (web application firewall), rate limiting and more!

In the Firewall section, WAF is turned on. Note that this is a paid feature!

WAF to protect your API

In Crypto, Universal SSL is turned on and set to Full (strict).

Full (strict) means that CloudFlare connects to your origin over HTTPS and that it expects a valid certificate, which is checked. An origin certificate, issued by CloudFlare but not trusted by your operating system, is also valid. As stated above, I use such an origin certificate at the Ingress level.

The origin certificate can be issued and/or downloaded from the Crypto section:

Origin certs

I created an origin certificate for *.baeke.info and baeke.info and downloaded the certificate and private key in PEM format. I then encoded the contents of the certificate and key in base64 format and used them in a secret:

apiVersion: v1
kind: Secret
metadata:
  name: baeke.info.tls
  namespace: default
type: kubernetes.io/tls
data:
  tls.crt: base64-encoded-cert
  tls.key: base64-encoded-key

As you have seen in the Ingress definition, it referred to this secret via its name, baeke.info.tls.
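Instead of base64-encoding the PEM files by hand, you can also let kubectl build the same secret. The file names below are placeholders for the downloaded origin certificate and key:

kubectl create secret tls baeke.info.tls --cert=origin-cert.pem --key=origin-key.pem --namespace default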

When a consumer connects to the API, the fully trusted certificate issued by CloudFlare is used:

Universal SSL cert from CloudFlare

We also make sure consumers of the API need to use TLS:

Force HTTPS at the CloudFlare level

With the above configuration, consumers need to securely connect to https://api.baeke.info at CloudFlare. CloudFlare connects securely to the origin, which is the external IP of the ingress. Only CloudFlare is allowed to connect to that external IP because of the whitelisting configuration.

Testing the API

Let’s try the API with the http tool:

Connecting to the API

All sorts of headers are added by CloudFlare which makes it clear that CloudFlare is proxying the requests. When we don’t add a key or specify a wrong one:

Kong is still doing its work

The key is now securely sent from consumer to CloudFlare to origin. Phew! 😎

Conclusion

In this post, we hosted an API on Kubernetes, exposed it with Kong and secured it with CloudFlare. This example can easily be extended with multiple Kong proxies for high availability and multiple APIs (/users, /orders, /products, …) that are all protected by CloudFlare with end-to-end encryption and WAF. CloudFlare lends an extra helping hand by automatically generating both the “front-end” and origin certificates.

In a follow-up post, we will look at an alternative approach via Azure Front Door Service. Stay tuned!

Quick overview of Traefik Ingress Controller Installation

This post is mainly a note to self πŸ“πŸ“πŸ“ that describes a quick way to deploy a Kubernetes Ingress Controller with Traefik.

There is also a video version:

We will install Traefik with Helm and I assume the cluster has rbac enabled. If you deploy clusters with AKS, that is the default although you can turn it off. With rbac enabled, you need to install the server-side component of Helm, tiller, using the following commands:

kubectl apply -f tiller-rbac.yaml
helm init --service-account tiller

The file tiller-rbac.yaml should contain the following:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system 

Note that you create an account that has cluster-wide admin privileges. That’s guaranteed to work but might not be what you want.

Next, install the Traefik Ingress Controller with the following Helm one-liner:

helm install stable/traefik --name traefik --set serviceType=LoadBalancer,rbac.enabled=true,ssl.enabled=true,ssl.enforced=true,acme.enabled=true,acme.email=email@domain.com,onHostRule=true,acme.challengeType=tls-alpn-01,acme.staging=false,dashboard.enabled=true --namespace kube-system 

The above command uses Helm to install the stable/traefik chart. Note that the chart is maintained by the community and not by the folks at Traefik. Traefik itself is exposed via a service of type LoadBalancer, which results in a public IP address. Use kubectl get svc traefik -n kube-system to check. There are ways to make sure the service uses a static IP but that is not discussed in this post. Check out this doc for AKS. The other settings do the following:

  • ssl.enabled: yes, SSL πŸ˜‰
  • ssl.enforced: redirect to https when user uses http
  • acme.enabled: enable Let’s Encrypt
  • acme.email: set the e-mail address to use with Let’s Encrypt; you will get certificate expiry mails on that address
  • onHostRule: issue certificates based on the host setting in the ingress definition
  • acme.challengeType: method used by Let’s Encrypt to issue the certificate; use this one for regular certs; use DNS verification for wildcard certs
  • acme.staging: set to false to issue fully trusted certs; beware of rate limiting
  • dashboard.enabled: enable the Traefik dashboard; you can expose the service via an ingress object as well

Note: to specify a specific version of Traefik, use the imageTag parameter as part of --set; for instance imageTag=1.7.12

When the installation is finished, run the following commands:

# check installation
helm ls

# check traefik service
kubectl get svc traefik --namespace kube-system -w

The first command should show that Traefik is installed. The second command returns the traefik service, which we configured with serviceType LoadBalancer. The external IP of the service will be pending for a while. When you have an address and you browse it, you should get a 404. Result from curl -v below:

* Rebuilt URL to: http://IP/
*   Trying 137.117.140.116...
* Connected to 137.117.140.116 (IP) port 80 (#0)
> GET / HTTP/1.1
> Host: IP
> User-Agent: curl/7.47.0
> Accept: */*
>
< HTTP/1.1 404 Not Found
< Content-Type: text/plain; charset=utf-8
< Vary: Accept-Encoding
< X-Content-Type-Options: nosniff
< Date: Fri, 24 May 2019 17:00:29 GMT
< Content-Length: 19
<
404 page not found

Next, install nginx just to have a simple website to securely publish. Yes I know, kubectl run… 🀷

kubectl run nginx --image nginx --expose --port 80

The above command installs nginx but also creates an nginx service of type ClusterIP. We can expose that service via an ingress definition:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: nginx
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
    - host: your.domain.com
      http:
        paths:
        - path: /
          backend:
            serviceName: nginx
            servicePort: 80

Replace your.domain.com with a host that resolves to the external IP address of the Traefik service. The annotation is not technically required if Traefik is the only Ingress Controller in your cluster. I prefer being explicit though. Save the above contents to a file and then run:

kubectl apply -f yourfile.yaml

Now browse to whatever you used as domain. The result should be:

Yes… nginx exposed via Traefik and a Let’s Encrypt certificate

To expose the Traefik dashboard, use the yaml below. Note that we explicitly installed the dashboard by setting dashboard.enabled to true.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: traefikdb
  annotations:
    kubernetes.io/ingress.class: traefik
spec:
  rules:
    - host: yourother.domain.com
      http:
        paths:
        - path: /
          backend:
            serviceName: traefik-dashboard
            servicePort: 80

Put the above contents in a file and create the ingress object in the same namespace as the traefik-dashboard service. Use kubectl apply -f yourfile.yaml -n kube-system. You should then be able to access the dashboard with the host name you provided:

Traefik dashboard

Note: if you do not want to mess with DNS records that map to the IP address of the Ingress Controller, just use a xip.io address. In the ingress object’s host setting, use something like web.w.x.y.z.xip.io where web is just something you choose and w.x.y.z is the IP address of the Ingress Controller. Traefik will also request a certificate for such a name. For more information, check xip.io. Simple for testing purposes!

Hope it helps!

Cloud Run on Google Kubernetes Engine

In this short post, we will take a look at Cloud Run on Google Kubernetes Engine (GKE). To get this to work, you will need to deploy a Kubernetes cluster. Make sure you use nodes with at least 2 vCPUs and 7.5 GB of memory. Take a look here for more details. You will notice that you need to include Istio which will make the option to enable Cloud Run on GKE available.

To create a Cloud Run service on GKE, navigate to Cloud Run in the console and click Create Service. For location, you can select your Kubernetes cluster. In the screenshot below, the default namespace of my cluster gebacr in zone us-central1-a was chosen:

Cloud Run service on GKE

In Connectivity, select external:

External connectivity to the service

In the optional settings, you can specify the allocated memory and maximum requests per container.

When finished, you will see a deployment on your cluster:

Cloud Run Kubernetes deployment (note that the Cloud Run service is nasnet-gke)

Notice that, like with Cloud Run without GKE, the deployment is scaled to zero when it is not in use!

To connect to the service, check the URL given to you by Cloud Run. It will be in the form of: http://SERVICE.NAMESPACE.example.com. For example: http://nasnet-gke.default.example.com. Clearly, we will not be able to connect to that from the browser.

To fix that, you can patch the domain name to something that can be resolved, for instance a xip.io address. First get the external IP of the istio-ingressgateway:

kubectl get service istio-ingressgateway --namespace istio-system

Next, patch the config-domain configmap to replace example.com with [EXTERNAL-IP].xip.io:

kubectl patch configmap config-domain --namespace knative-serving --patch \
'{"data": {"example.com": null, "[EXTERNAL-IP].xip.io": ""}}'

In my example Cloud Run service, I now get the following URL (not the actual IP):

http://nasnet-gke.default.107.198.183.182.xip.io/

Note: instead of patching the domain, you could also use curl to connect to the external IP of the ingress and pass the host header nasnet-gke.default.example.com.
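A sketch of that curl alternative; replace [EXTERNAL-IP] with the IP of the istio-ingressgateway service:

# let the Host header select the Cloud Run service behind the ingress gateway
curl -H "Host: nasnet-gke.default.example.com" http://[EXTERNAL-IP]/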

With that URL, I can connect to the service. In case of a cold start (when the ReplicaSet has been scaled to 0), it takes a bit longer than with “native” Cloud Run, which takes a second or so.

It is clear that connecting to the Cloud Run service on GKE takes a bit more work than with “native” Cloud Run. Enabling HTTPS is also more of a pain on GKE, whereas in “native” Cloud Run you merely need to validate your domain and Google will configure a Let’s Encrypt certificate for the domain name you have configured. “Native” Cloud Run cold starts also seem faster.

That’s it for this quick look. In general, try to use Cloud Run versus Cloud Run on GKE as much as possible. Less fuss, more productivity! πŸ˜‰