I recently gave a talk at TechTrain, a monthly event in Mechelen (Belgium), hosted by Cronos. The talk is called “GitOps with Kubernetes: a better way to deploy” and is an introduction to GitOps with Weaveworks Flux as an example.
You can find a re-recording of the presentation on Youtube:
In today’s post, we will write a simple operator with Kopf, which is a Python framework created by Zalando. A Kubernetes operator is a piece of software, running in Kubernetes, that does something application specific. To see some examples of what operators are used for, check out operatorhub.io.
Our operator will do something simple in order to easily grasp how it works:
the operator will create a deployment that runs nginx
nginx will serve a static website based on a git repository that you specify; we will use an init container to grab the website from git and store it in a volume
you can control the number of instances via a replicas parameter
That’s great but how will the operator know when it has to do something, like creating or updating resources? We will use custom resources for that. Read on to learn more…
Note that we specified our own API and version in the CRD (baeke.info/v1) and that we set the kind to DemoWeb. In the additionalPrinterColumns, we defined some properties that can be set in the spec that will also be printed on screen. When you list resources of kind DemoWeb, you will the see replicas and gitrepo columns:
Custom resources based on the DemoWeb CRD
Of course, creating the CRD and the custom resources is not enough. To actually create the nginx deployment when the custom resource is created, we need to write and run the operator.
Writing the operator
I wrote the operator on a Mac with Python 3.7.6 (64-bit). On Windows, for best results, make sure you use Miniconda instead of Python from the Windows Store. First install Kopf and the Kubernetes package:
pip3 install kopf kubernetes
Verify you can run kopf:
Running kopf
Let’s write the operator. You can find it in full here. Here’s the first part:
Naturally, we import kopf and other necessary packages. As noted before, kopf and kubernetes will have to be installed with pip. Next, we define a handler that runs whenever a resource of our custom type is spotted by the operator (with the @kopf.on.create decorator). The handler has two parameters:
spec object: allows us to retrieve our custom properties with spec.get (e.g. spec.get(‘replicas’, 1) – the second parameter is the default value)
**kwargs: a dictionary with lots of extra values we can use; we use it to retrieve the name of our custom resource (e.g. demoweb1); we can use that name to derive the name of our deployment and to set labels for our pods
Note: instead of using **kwargs to retrieve the name, you can also define an extra name parameter in the handler like so: def create_fn(spec, name, **kwargs); see the docs for more information
Our deployment is just yaml stored in the doc variable with some help from the Python yaml package. We use spec.get and the name variable to customise it.
After the doc variable, the following code completes the event handler:
The rest of the operator
With kopf.adopt, we make sure the deployment we create is a child of our custom resource. When we delete the custom resource, its children are also deleted.
Next, we simply use the kubernetes client to create a deployment via the apps/v1 api. The method create_namespaced_deployment takes two required parameters: the namespace and the deployment specification. Note there is only minimal error checking here. There is much more you can do with regards to error checking, retries, etc…
Now we can run the operator with:
kopf run operator-filename.py
You can perfectly run this on your local workstation if you have a working kube config pointing at a running cluster with the CRD installed. Kopf will automatically use that for authentication:
Running the operator on your workstation
Running the operator in your cluster
To run the operator in your cluster, create a Dockerfile that produces an image with Python, kopf, kubernetes and your operator in Python. In my case:
FROM python:3.7
RUN mkdir /src
ADD with_create.py /src
RUN pip install kopf
RUN pip install kubernetes
CMD kopf run /src/with_create.py --verbose
We added the verbose parameter for extra logging. Next, run the following commands to build and push the image (example with my image name):
The above is just a regular deployment but the serviceAccountName is extremely important. It gives kopf and your operator the required access rights to create the deployment is the target namespace. Check out the documentation to find out more about the creation of the service account and the required roles. Note that you should only run one instance of the operator!
Once the operator is deployed, you will see it running as a normal pod:
The operator is running
To see what is going on, check the logs. Let’s show them with octant:
Your operator logs
At the bottom, you see what happens when a creation event is detected for a resource of type DemoWeb. The spec is shown with the git repository and the number on replicas.
Now you can create resources of kind DemoWeb and see what happens. If you have your own git repository with some HTML in it, try to use that. Otherwise, just use mine at https://github.com/gbaeke/static-web.
Conclusion
Writing an operator is easy to do with the Kopf framework. Do note that we only touched on the basics to get started. We only have an on.create handler, and no on.update handler. So if you want to increase the number of replicas, you will have to delete the custom resource and create a new one. Based on the example though, it should be pretty easy to fix that. The git repo contains an example of an operator that also implements the on.update handler (with_update.py).
If you have followed my blog a little, you have seen a few posts about GitOps with Flux CD. This time, I am taking a look at Argo CD which, like Flux CD, is a GitOps tool to deploy applications from manifests in a git repository.
Don’t want to read this whole thing?
Here’s the video version of this post
There are several differences between the two tools:
At first glance, Flux appears to use a single git repo for your cluster where Argo immediately introduces the concept of apps. Each app can be connected to a different git repo. However Flux can also use multiple git repositories in the same cluster. See https://github.com/fluxcd/multi-tenancy for more information
Flux has the concept of workloads which can be automated. This means that image repositories are scanned for updates. When an update is available (say from tag v1.0.0 to v1.0.1), Flux will update your application based on filters you specify. As far as I can see, Argo requires you to drive the update from your CI process, which might be preferred.
By default, Argo deploys an administrative UI (next to a CLI) with a full view on your deployment and its dependencies
Argo supports RBAC and integrates with external identity providers (e.g. Azure Active Directory)
The Argo CD admin interface is shown below:
Argo CD admin interface… not too shabby
Let’s take a look at how to deploy Argo and deploy the app you see above. The app is deployed using a single yaml file. Nothing fancy yet such as kustomize or jsonnet.
Deployment
The getting started guide is pretty clear, so do have a look over there as well. To install, just run (with a deployed Kubernetes cluster and kubectl pointing at the cluster):
Next, install the CLI. On a Mac, that is simple (with Homebrew):
brew tap argoproj/tap
brew install argoproj/tap/argocd
You will need access to the API server, which is not exposed over the Internet by default. For testing, port forwarding is easiest. In a separate shell, run the following command:
You can now connect to https://localhost:8080 to get to the UI. You will need the admin password by running:
kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-server -o name | cut -d'/' -f 2
You can now login to the UI with the user admin and the displayed password. You should also login from the CLI and change the password with the following commands:
Great! You are all set now to deploy an application.
Deploying an application
We will deploy an application that has a couple of dependencies. Normally, you would install those dependencies with Argo CD as well but since I am using a cluster that has these dependencies installed via Azure DevOps, I will just list what you need (Helm commands):
To know more about these dependencies and use an Azure DevOps YAML pipeline to deploy them, see this post. If you want, you can skip the externaldns installation and create a DNS record yourself that resolves to the public IP address of Nginx Ingress. If you do not want to use an Azure static IP address, you can remove the loadBalancerIP parameter from the first command.
The manifests we will deploy with Argo CD can be found in the following public git repository: https://github.com/gbaeke/argo-demo. The application is in three YAML files:
Two YAML files that create a certificate cluster issuer based on custom resource definitions (CRDs) from cert-manager
realtime.yaml: Redis deployment, Redis service (ClusterIP), realtime web app deployment (based on this), realtime web app service (ClusterIP), ingress resource for https://real.baeke.info (record automatically created by externaldns)
It’s best that you fork my repo and modify realtime.yaml’s ingress resource with your own DNS name.
Create the Argo app
Now you can create the Argo app based on my forked repo. I used the following command with my original repo:
The command above creates an app called realtime based on the specified repo. The app should use the manifests folder and apply (kubectl apply) all the manifests in that folder. The manifests are deployed to the cluster that Argo CD runs in. Note that you can run Argo CD in one cluster and deploy to totally different clusters.
The above command does not configure the repository to be synced automatically, although that is an option. To sync manually, use the following command:
argocd app sync realtime
The application should now be synced and viewable in the UI:
Not Secure because we use Let’s Encrypt staging for this app
Set up auto-sync
Let’s set up this app to automatically sync with the repo (default = every 3 minutes). This can be done from both the CLI and the UI. Let’s do it from the UI. Click on the app and then click App Details. You will find a Sync Policy in the app details where you can enable auto-sync
Setting up auto-sync from the UI
You can now make changes to the git repo like changing the image tag for gbaeke/fluxapp (yes, I used this image with the Flux posts as well 😊 ) to 1.0.6 and wait for the sync to happen. Or sync manually from the CLI or the UI.
Conclusion
This was a quick tour of Argo CD. There is much more you can do but the above should get you started quickly. I must say I quite like the solution and am eager to see what the collaboration of Flux CD, Argo CD and Amazon comes up with in the future.
A while ago, I blogged about an Azure YAML pipeline to deploy AKS together with Traefik. As a variation on that theme, this post talks about deploying AKS together with Nginx, External DNS, a Helm Operator and Flux CD. I blogged about Flux before if you want to know what it does.
Let’s break the pipeline down a little. In what follows, replace AzureMPN with a reference to your own subscription. The first two tasks, AKS deployment and IP address deployment are ARM templates that deploy these resources in Azure. Nothing too special there. Note that the AKS cluster is one with default networking, no Azure AD integration and without VMSS (so no multiple node pools either).
Note: I modified the pipeline to deploy a VMSS-based cluster with a standard load balancer, which is recommended instead of a cluster based on an availability set with a basic load balancer.
The third task takes the output of the IP address deployment and parses out the IP address using jq (last echo statement on one line):
For External DNS to work, I found I had to set controller.publishService.enabled=true. As you can see, the Nginx service is configured to use the IP we created earlier. Azure will create a load balancer with a front end IP configuration that uses this address. This all happens automatically.
Note: controller.metrics.enabled enables a Prometheus scraping endpoint; that is not discussed further in this blog
External DNS
External DNS can automatically add DNS records for ingresses and services you add to Kubernetes. For instance, if I create an ingress for test.baeke.info, External DNS can create this record in the baeke.info zone and use the IP address of the Ingress Controller (nginx here). Installation is pretty straightforward but you need to provide credentials to your DNS provider. In my case, I use CloudFlare. Many others are available. Here is the task:
On CloudFlare, I created a token that has the required access rights to my zone (read, edit). I provide that token to the chart via the CFAPIToken variable defined as a secret on the pipeline. The valueFile looks like this:
In the beginning, it’s best to set the logLevel to debug in case things go wrong. With interval 1m, External DNS checks for ingresses and services every minute and syncs with your DNS zone. Note that External DNS only touches the records it created. It does so by creating TXT records that provide a record that External DNS is indeed the owner.
With External DNS in place, you just need to create an ingress like below to have the A record real.baeke.info created:
This installs the latest version of the operator at the time of this writing (image.repository and image.tag) and also sets Helm to v3. With this installed, you can install a Helm chart by submitting files like below:
You can create files that use kind HelmRelease (HR) because we installed the Helm Operator CRD before. To check installed Helm releases in a namespace, you can run kubectl get hr.
The Helm operator is useful if you want to install Helm charts from a git repository with the help of Flux CD.
The gitURL variable should be set to a git repo that contains your cluster configuration. For instance: gbaeke/demo-clu-flux. Flux will check the repo for changes every minute. Note that we are using a public repo here. Private repos and systems other than GitHub are supported.
Use fluxctl identity to obtain the public key from the key pair created by Flux (when you do not use your own)
Set the public key as a deploy key on the git repo
GitHub deploy key
By connecting the https://github.com/gbaeke/demo-clu-flux repo to Flux CD (as done here), the following is done based on the content of the repo (the complete repo is scanned:
Install InfluxDB Helm chart
Add a simple app that uses a Go socket.io implementation to provide realtime updates based on Redis channel content; this app is published via nginx and real.baeke.info is created in DNS (by External DNS)
Adds a ConfigMap that is used to configure Azure Monitor to enable Prometheus endpoint scraping (to show this can be used for any object you need to add to Kubernetes)
Note that the ingress of the Go app has an annotation (in realtime.yaml, in the git repo) to issue a certificate via cert-manager. If you want to make that work, add an extra task to the pipeline that installs cert-manager:
You will also need to create another namespace, cert-manager, just like we created the fluxcd namespace.
In order to make the above work, you will need Issuers or ClusterIssuers. The repo used by Flux CD contains two ClusterIssuers, one for Let’s Encrypt staging and one for production. The ingress resource uses the production issuer due to the following annotation:
In a previous post, we installed Weaveworks Flux. Flux synchronizes the contents of a git repository with your Kubernetes cluster. Flux can easily be installed via a Helm chart. As an example, we installed Traefik by adding the following yaml to the synced repository:
It does not matter where you put this file because Flux scans the complete repository. I added the file to a folder called traefik.
If you look more closely at the YAML file, you’ll notice its kind is HelmRelease. You need an operator that can handle this type of file, which is this one. In the previous post, we installed the custom resource definition and the operator manually.
Adding a custom application
Now it’s time to add our own application. You do not need to use Helm packages or the Helm operator to install applications. Regular yaml will do just fine.
The application we will deploy needs a Redis backend. Let’s deploy that first. Add the following yaml file to your repository:
After committing this file, wait a moment or run fluxctl sync. When you run kubectl get pods for the default namespace, you should see the Redis pod:
Redis is running — yay!!!
Now it’s time to add the application. I will use an image, based on the following code: https://github.com/gbaeke/realtime-go (httponly branch because master contains code to automatically request a certificate with Let’s Encrypt). I pushed the image to Docker Hub as gbaeke/fluxapp:1.0.0. Now let’s deploy the app with the following yaml:
In the above yaml, replace IP in the Ingress specification to the IP of the external load balancer used by your Ingress Controller. Once you add the yaml to the git repository and you run fluxctl sync the application should be deployed. You see the following page when you browse to http://realtime.IP.xip.io:
Web app deployed via Flux and standard yaml
Great, v1.0.0 of the app is deployed using the gbaeke/fluxapp:1.0.0 image. But what if I have a new version of the image and the yaml specification does not change? Read on…
Upgrading the application
If you have been following along, you can now run the following command:
fluxctl list-workloads -a
This will list all workloads on the cluster, including the ones that were not installed by Flux. If you check the list, none of the workloads are automated. When a workload is automated, it can automatically upgrade the application when a new image appears. Let’s try to automate the fluxapp. To do so, you can either add annotations to your yaml or use fluxctl. Let’s use the yaml approach by adding the following to our deployment:
Note: Flux only works with immutable tags; do not use latest
After committing the file and running fluxctl sync, you can run fluxctl list-workloads -a again. The deployment should now be automated:
fluxapp is now automated
Now let’s see what happens when we add a new version of the image with tag 1.0.1. That image uses a different header color to show the difference. Flux monitors the repository for changes. When it detects a new version of the image that matches the semver filter, it will modify the deployment. Let’s check with fluxctl list-workloads -a:
new image deployed
And here’s the new color:
New color in version 1.0.1. Exciting! 😊
But wait… what about the git repo?
With the configuration of a deploy key, Flux has access to the git repository. When a deployment is automated and the image is changed, that change is also reflected in the git repo:
Weave Flux updated the realtime yaml file
In the yaml, version 1.0.1 is now used:
Flux updated the yaml file
What if I don’t like this release? With fluxctl, you can rollback to a previous version like so:
Rolling back a release – will also update the git repo
Although this works, the deployment will be updated to 1.0.1 again since it is automated. To avoid that, first lock the deployment (or workload) and then force the release of the old image:
In your yaml, there will be an additional annotation: fluxcd.io/locked: ‘true’ and the image will be set to 1.0.0.
Conclusion
In this post, we looked at deploying and updating an application via Flux automation. You only need a couple of annotations to make this work. This was just a simple example. For an example with dev, staging and production branches and promotion from staging to production, be sure to look at https://github.com/fluxcd/helm-operator-get-started as well.
A while ago, I blogged about Linkerd 2.x. In that post, I used a simple calculator API, reachable via an Azure Load Balancer. When you look at that traffic in Linkerd, you see the following:
Incoming load balancer traffic to a meshed deployment (in this case Traefik 2.0)
Above, you do not see this is Azure Load Balancer traffic. The traffic reaches the meshed service via the Azure CNI pods.
In this post, we will install Traefik 2.0, mesh the Traefik deployment and make the calculator service reachable via Traefik and the new IngressRoute. Let’s get started!
Install Traefik 2.0
We will install Traefik 2.0 with http support only. There’s an excellent blog that covers the installation over here. In short, you do the following:
deploy prerequisites such as custom resource definitions (CRDs), ClusterRole, ClusterRoleBinding, ServiceAccount
deploy Traefik 2.0: it’s just a Kubernetes deployment
deploy a service to expose the Traefik HTTP endpoint via a Load Balancer; I used an Azure Load Balancer automatically deployed via Azure Kubernetes Service (AKS)
deploy a service to expose the Traefik admin endpoint via an IngressRoute
Here are the prerequisites for easy copy and pasting:
Here’s the service to expose Traefik’s web endpoint. This is different from the post I referred to because that post used DigitalOcean. I am using Azure here.
The above service definition will give you a public IP. Traffic destined to port 80 on that IP goes to the Traefik pods on port 8000.
Now we can expose the Traefik admin interface via Traefik itself. Note that I am not using any security here. Check the original post for basic auth config via middleware.
Traefik’s admin site is first exposed as a ClusterIP service on port 8080. Next, an object of kind IngressRoute is defined, which is new for Traefik 2.0. You don’t need to create standard Ingress objects and configure Traefik with custom annotations. This new approach is cleaner. Of course, substitute the host with a host that points to the public IP of the load balancer. Or use the IP address with the xip.io domain. If your IP would be 1.1.1.1 then you could use something like admin.1.1.1.1.xip.io. That name automatically resolves to the IP in the name.
Let’s see if we can reach the admin interface:
The new Traefik 2 admin UI
Traefik 2.0 is now installed in a basic way and working properly. We exposed the admin interface but now it is time to expose the calculator API.
Exposing the calculator API
The API is deployed as 5 pods in the add namespace:
Calculator API exposed
The API is exposed as a service of type ClusterIP with only an internal Kubernetes IP. To expose it via Traefik, we create the following object in the add namespace:
I am using xip.io above. Change 1.1.1.1 to the public IP of Traefik’s Azure Load Balancer. The add-svc that exposes the calculator API on port 80 is exposed via Traefik. We can easily call the service via:
curl http://calc.1.1.1.1.xip.io/add/10/10
20
Great! But what is that calcheader middleware? Middlewares modify the requests and responses to and from Traefik 2.0. There are all sorts of middelwares as explained here. You can set headers, configure authentication, perform rate limiting and much much more. In this case we create the following middleware object in the add namespace:
This middleware adds a header to the request before it comes in to Traefik. The header overrides the destination and sets it to the internal DNS name of the add-svc service that exposes the calculator API. This requirement is documented by Linkerd here.
Meshing the Traefik deployment
Because we want to mesh Traefik to get Linkerd metrics and more, we need to inject the Linkerd proxy in the Traefik pods. In my case, Traefik is deployed in the default namespace so the command below can be used:
Make sure you run the command on a system with the linkerd executable in your path and kubectl homed to the cluster that has Linkerd installed.
Checking the traffic in the Linkerd dashboard
With some traffic generated, this is what you should see when you check the meshed deployment that runs the calculator API (deploy/add):
Both the traffic generator (add-cli) and Traefik are meshed which results in a more detailed view of the traffic
If you are wondering what these services are and do, check this post. In the above diagram, we can clearly see we are receiving traffic to the calculator API from Traefik. When I click on Traefik, I see the following:
A view on the meshed Traefik deployment
From the above, we see Traefik receives traffic via the Azure Load Balancer and that it forwards traffic to the calculator service. The live calls are coming from the admin UI which refreshes regularly.
In Grafana, we can get more information about the Traefik deployment:
Linkerd metrics for Traefik in the Grafana dashboard that comes with LinkerdMore metrics
Conclusion
This was just a brief look at both Traefik 2 and “meshing” Traefik with Linkerd. There is much more to say and I have much more to explore. Hopefully, this can get you started!
A while ago, I gave linkerd a spin. Due to vacations and a busy schedule, I was not able to write about my experience. I will briefly discuss how to setup linkerd and then deploy a sample service to illustrate what it can do out of the box. Let’s go!
Wait! What is linkerd?
linkerd basically is a network proxy for your Kubernetes pods that’s designed to be deployed as a service mesh. When the pods you care about have been infused with linkerd, you will automatically get metrics like latency and requests per second, a web portal to check these metrics, live inspection of traffic and much more. Below is an example of a Kubernetes namespace that has been meshed:
A meshed namespace; all deployments in this particular namespace are meshed which means all pods get the linkerd network proxy that provides the metrics and features such as encryption
Installation
I can be very brief about this: installation is about as simple as it gets. Simply navigate to https://linkerd.io/2/getting-started to get started. Here are the simplified steps:
Download the linkerd executable as described in the Getting Started guide; I used WSL for this
Create a Kubernetes cluster with AKS (or another provider); for AKS, use the Azure CLI to get your credentials (az aks get-credentials); make sure the Azure CLI is installed in WSL and that you connected to your Azure subscription with az login
Make sure you can connect to your cluster with kubectl
Run linkerd check –pre to check if prerequisites are fulfilled
Install linkerd with linkerd install | kubectl apply -f –
Check the installation with linkerd check
The last step will nicely show its progress and end when the installation is complete:
linkerd check output
Exploring linkerd with the dashboard
linkerd automatically installs a dashboard. The dashboard is exposed as a Kubernetes service called linkerd-web. The service is of type ClusterIP. Although you could expose the service using an ingress, you can easily tunnel to the service with the following linkerd command (first line is the command; other lines are the output):
linkerd dashboard
Linkerd dashboard available at:
http://127.0.0.1:50750
Grafana dashboard available at:
http://127.0.0.1:50750/grafana
Opening Linkerd dashboard in the default browser
Failed to open Linkerd dashboard automatically
Visit http://127.0.0.1:50750 in your browser to view the dashboard
From WSL, the dashboard can not open automatically but you can manually browse to it. Note that linkerd also installs Prometheus and Grafana.
Out of the box, the linkerd deployment is meshed:
Adding linkerd to your own service
In this section, we will deploy a simple service that can add numbers and add linkerd to it. Although there are many ways to do this, I chose to create a separate namespace and enable auto-injection via an annotation. Here’s the yaml to create the namespace (add-ns.yaml):
Just run kubectl create -f add-ns.yaml to create the namespace. The annotation ensures that all pods added to the namespace get the linkerd proxy in the pod. All traffic to and from the pod will then pass through the proxy.
Now, let’s install the add service and deployment:
The deployment deploys to two pods with the gbaeke/adder image. To deploy the above, save it to a file (add.yaml) and use the following command to deploy:
kubectl create -f add-yaml -n add
Because the deployment uses the add namespace, the linkerd proxy will be added to each pod automatically. When you list the pods in the deployment, you see:
Each add pod has two containers: the actual add container based on gbaeke/adder and the proxy
To see more details about one of these pods, I can use the following command:
k get po add-5b48fcc894-2dc97 -o yaml -n add
You will clearly see the two containers in the output:
Two containers in the pod: actual service (gbaeke/adder) and the linkerd proxy
Generating some traffic
Let’s deploy a client that continuously uses the calculator service:
Save the above to add-cli.yaml and deploy with the below command:
kubectl create -f add-cli.yaml -n add
The deployment uses another image called gbaeke/adder-cli that continuously makes requests to the server specified in the SERVER environment variable.
Checking the deployment in the linkerd portal
When you now open the add namespace in the linked portal, you should see something similar to the below screenshot (note: I deployed 5 servers and 5 clients):
A view on the add namespace; linkerd has learned how the deployments talk to eachother
The linkerd proxy in all pods sees all traffic. From the traffic, it can infer that the add-cli deployment talks to the add deployment. The add deployment receives about 150 requests per second. The 99th percentile latency is relatively high because the cluster nodes are very small, I deployed more instances and the client is relatively inefficient.
When I click the deployment called add, the following screen is shown:
A view on the deployment
The deployment clearly shows where traffic is coming from plus relevant metrics such as RPS and P99 latency. You also get a view on the live calls now. Note that the client is using GRPC which uses a HTTP POST. When you scroll down on this page, you get more information about the caller and a view on the individual pods:
A view on the inbound calls to the deployment plus a view on the pods
To see live calls in more detail, you can click the Tap icon:
A live view on the calls with Tap
For each call, details can be requested:
Request details
Conclusion
This was just a brief look at linkerd. It is trivially easy to install and with auto-injection, very simple to add it to your own services. Highly recommended to give it a spin to see where it can add value to your projects!
In the previous post, we looked at API Management with Kong and the Kong Ingress Controller. We did not care about security and exposed a sample toy API over a public HTTP endpoint that also required an API key. All in the clear, no firewall, no WAF, nothing… 👎👎👎
In this post, we will expose the API over TLS and configure Kong to use a CloudFlare origin certificate. An origin certificate is issued and trusted by CloudFlare to connect to the origin, which in our case is an API hosted on Kubernetes.
The API consumer will not connect directly to the Kubernetes-hosted API exposed via Kong. Instead, the consumer connects to CloudFlare over TLS and uses a certificate issued by CloudFlare that is fully trusted by browsers and other clients.
The traffic flow is as follows:
Consumer --> CloudFlare (TLS with fully trusted cert, WAF, ...) --> Kong Ingress (TLS with origin cert) --> API (HTTP)
Configuring Kong
Refer to the previous post for installation instructions. The YAML files to configure the Ingress, KongIngress, Consumer, etc… are almost the same. The Ingress resource has the following changes:
We use a new hostname api.baeke.info
We configure TLS for api.baeke.info by referring to a secret called baeke.info.tls which contains the CloudFlare origin certificate.
We use an additional Kong plugin which provides whitelisting of CloudFlare addresses; only CloudFlare is allowed to connect to the Ingress
Here is the plugin definition for whitelisting with the current (June 15th, 2019) list of IP ranges used by CloudFlare. Note that you have to supply the addresses and ranges as an array. The documentation shows a comma-separated list! 🤷♂️
In the previous post, the protocols array contained the http value.
Note: for whitelisting to work, the Kong proxy service needs externalTrafficPolicy set to Local. Use kubectl edit svc kong-kong-proxy to modify that setting. You can set this value at deployment time as well. This might or might not work for you. I used AKS where this produces the desired outcome.
CloudFlare
Get the external IP of the kong-kong-proxy service and create a DNS entry for it. I created a A record for api.baeke.info:
Make sure the orange cloud is active. In this case, this means that requests for api.baeke.info are proxied by CloudFlare. That allows us to cache, enable WAF (web application firewall), rate limiting and more!
In the Firewall section, WAF is turned on. Note that this is a paying feature!
WAF to protect your API
In Crypto, Universal SSL is turned on and set to Full (strict).
Full (strict) means that CloudFlare connects to your origin over HTTPS and that it expects a valid certificate, which is checked. An origin certificate, issued by CloudFlare but not trusted by your operating system is also valid. As stated above, I use such an origin certificate at the Ingress level.
The origin certificate can be issued and/or downloaded from the Crypto section:
Origin certs
I created an origin certificate for *.baeke.info and baeke.info and downloaded the certificate and private key in PEM format. I then encoded the contents of the certificate and key in base64 format and used them in a secret:
As you have seen in the Ingress definition, it referred to this secret via its name, baeke.info.tls.
When a consumer connects to the API, the fully trusted certificate issued by CloudFlare is used:
Universal SSL cert from CloudFlare
We also make sure consumers of the API need to use TLS:
Force HTTPS at the CloudFlare level
With the above configuration, consumers need to securely connect to https://api.baeke.info at CloudFlare. CloudFlare connects securely to the origin, which is the external IP of the ingress. Only CloudFlare is allowed to connect to that external IP because of the whitelisting configuration.
Testing the API
Let’s try the API with the http tool:
Connecting to the API
All sorts of headers are added by CloudFlare which makes it clear that CloudFlare is proxying the requests. When we don’t add a key or specify a wrong one:
Kong is still doing its work
The key is now securely sent from consumer to CloudFlare to origin. Phew! 😎
Conclusion
In this post, we hosted an API on Kubernetes, exposed it with Kong and secured it with CloudFlare. This example can easily be extended with multiple Kong proxies for high availability and multiple APIs (/users, /orders, /products, …) that are all protected by CloudFlare with end-to-end encryption and WAF. CloudFlare lends an extra helping hand by automatically generating both the “front-end” and origin certificates.
In a follow-up post, we will look at an alternative approach via Azure Front Door Service. Stay tuned!
We will install Traefik with Helm and I assume the cluster has rbac enabled. If you deploy clusters with AKS, that is the default although you can turn it off. With rbac enabled, you need to install the server-side component of Helm, tiller, using the following commands:
The above command uses Helm to install the stable/traefik chart. Note that the chart is maintained by the community and not by the folks at Traefik. Traefik itself is exposed via a service of type LoadBalancer, which results in a public IP address. Use kubectl get svc traefik -n kube-system to check. There are ways to make sure the service uses a static IP but that is not discussed in this post. Check out this doc for AKS. The other settings do the following:
ssl.enabled: yes, SSL 😉
ssl.enforced: redirect to https when user uses http
acme.enabled: enable Let’s Encrypt
acme.email: set the e-mail address to use with Let’s Encrypt; you will get certificate expiry mails on that address
onHostRule: issue certificates based on the host setting in the ingress definition
acme.challengeType: method used by Let’s Encrypt to issue the certificate; use this one for regular certs; use DNS verification for wildcard certs
acme.staging: set to false to issue fully trusted certs; beware of rate limiting
dashboard.enabled: enable the Traefik dashboard; you can expose the service via an ingress object as well
Note: to specify a specific version of Traefik, use the imageTag parameter as part of –set; for instance imageTag=1.7.12
When the installation is finished, run the following commands:
# check installation
helm ls
# check traefik service
kubectl get svc traefik --namespace kube-system -w
The first command should show that Traefik is installed. The second command returns the traefik service, which we configured with serviceType LoadBalancer. The external IP of the service will be pending for a while. When you have an address and you browse it, you should get a 404. Result from curl -v below:
Rebuilt URL to: http://IP/
Trying 137.117.140.116…
Connected to 137.117.140.116 (IP) port 80 (#0)
GET / HTTP/1.1
Host: IP
User-Agent: curl/7.47.0
Accept: /
< HTTP/1.1 404 Not Found
< Content-Type: text/plain; charset=utf-8
< Vary: Accept-Encoding
< X-Content-Type-Options: nosniff
< Date: Fri, 24 May 2019 17:00:29 GMT
< Content-Length: 19
<
404 page not found
Next, install nginx just to have a simple website to securely publish. Yes I know, kubectl run… 🤷
kubectl run nginx --image nginx --expose --port 80
The above command installs nginx but also creates an nginx service of type ClusterIP. We can expose that service via an ingress definition:
Replace your.domain.com with a host that resolves to the external IP address of the Traefik service. The annotation is not technically required if Traefik is the only Ingress Controller in your cluster. I prefer being explicit though. Save the above contents to a file and then run:
kubectl apply -f yourfile.yaml
Now browse to whatever you used as domain. The result should be:
Yes… nginx exposed via Traefik and a Let’s Encrypt certificate
To expose the Traefik dashboard, use the yaml below. Note that we explicitly installed the dashboard by setting dashboard.enabled to true.
Put the above contents in a file and create the ingress object in the same namespace as the traefik-dashboard service. Use kubectl apply -f yourfile.yaml -n kube-system. You should then be able to access the dashboard with the host name you provided:
Traefik dashboard
Note: if you do not want to mess with DNS records that map to the IP address of the Ingress Controller, just use a xip.io address. In the ingress object’s host setting, use something like web.w.x.y.z.xip.io where web is just something you choose and w.x.y.z is the IP address of the Ingress Controller. Traefik will also request a certificate for such a name. For more information, check xip.io. Simple for testing purposes!
In this short post, we will take a look at Cloud Run on Google Kubernetes Engine (GKE). To get this to work, you will need to deploy a Kubernetes cluster. Make sure you use nodes with at least 2 vCPUs and 7.5 GB of memory. Take a look here for more details. You will notice that you need to include Istio which will make the option to enable Cloud Run on GKE available.
To create a Cloud Run service on GKE, navigate to Cloud Run in the console and click Create Service. For location, you can select your Kubernetes cluster. In the screenshot below, the default namespace of my cluster gebacr in zone us-central1-a was chosen:
Cloud Run service on GKE
In Connectivity, select external:
External connectivity to the service
In the optional settings, you can specify the allocated memory and maximum requests per container.
When finished, you will see a deployment on your cluster:
Cloud Run Kubernetes deployment (note that the Cloud Run service is nasnet-gke)
Notice that, like with Cloud Run without GKE, the deployment is scaled to zero when it is not in use!
To fix that, you can patch the domain name to something that can be resolved, for instance a xip.io address. First get the external IP of the istio-ingressgateway:
kubectl get service istio-ingressgateway --namespace istio-system
Next, patch the config-domain configmap to replace example.com with <EXTERNALIP>.xip.io
In my example Cloud Run service, I now get the following URL (not the actual IP):
http://nasnet-gke.default.107.198.183.182.xip.io/
Note: instead of patching the domain, you could also use curl to connect to the external IP of the ingress and pass the host header nasnet-gke.default.example.com.
With that URL, I can connect to the service. In case of a cold start (when the ReplicaSet has been scaled to 0), it takes a bit longer that “native” Cloud Run which takes a second or so.
It is clear that connecting to the Cloud Run service on GKE takes a bit more work than with “native” Cloud Run. Enabling HTTPS is also more of a pain on GKE where in “native” Cloud Run, you merely need to validate your domain and Google will configure a Let’s Encrypt certificate for the domain name you have configured. Cloud Run cold starts also seem faster.
That’s it for this quick look. In general, try to use Cloud Run versus Cloud Run on GKE as much as possible. Less fuss, more productivity! 😉