As explained on https://github.com/rancher/rio, Rancher Rio is a MicroPaaS that can be layered on top of any standard Kubernetes cluster. It makes it easier to deploy, scale, version and expose services. In this post, we will take a quick look at some of its basic capabilities.
To follow along, make sure you have a Kubernetes cluster running. I deployed a standard AKS cluster with three nodes. In your shell (I used Ubuntu Bash on Windows), install Rio:
curl -sfL https://get.rio.io | sh -
After installation, check the version of Rio with rio version. The output should be similar to:
v0.1.1-rc1 (cdb75cf1)
With v0.1.1 there was an issue with deploying the registry component. v0.1.1-rc1 fixes that.
Make sure you have kubectl installed and that its context points to the cluster in which you want to deploy Rio. If that is the case, just run the following command:
rio install
The above command will install a bunch of components in the rio-system namespace. After a while, running kubectl get po -n rio-system should show the Rio system pods, including the Istio and registry components, in the Running state.
Rio will install Istio and expose a service mesh gateway via a service of type load balancer. With AKS, this will result in an Azure load balancer that sends traffic to the service mesh gateway. When you deploy Rio services, you can automatically get a DNS name that will resolve to the external IP of the Azure load balancer.
Let’s install such a Rio service. We will use the following application: https://github.com/gbaeke/realtime-go. Instead of the master branch, we will deploy the httponly branch. The repo contains a Dockerfile with a two-stage build that results in a web application that displays messages published to redis in real time. Before we deploy the application, deploy redis with the following command:
kubectl run redis --image redis --port 6379 --expose
Now deploy the realtime-go app with Rio:
rio run -p 8080/http -n realtime --build-branch httponly --env REDISHOST=redis:6379 https://github.com/gbaeke/realtime-go.git
Rio makes it easy to deploy the application because it will pull the specified branch of the git repo and build the container image based on the Dockerfile. The above command also sets an environment variable that is used by the realtime-go code to find the redis host.
When the build is finished, the image is stored in the internal registry. You can check builds with rio builds. Get the build logs with rio build logs imagename. For example:
rio build logs default/realtime:7acdc6dfed59c1b93f2def1a84376a880aac9f5d
The result is the full build log for the image.
The rio run command results in a deployed service. Run rio ps to check this; the output lists the service together with its endpoint.
Notice that the endpoint is a URL which is publicly accessible over SSL via a Let's Encrypt certificate.
Just for fun, you can publish a message to the redis channel that this app subscribes to. The message should then appear in the web app in real time.
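If you want to script this, the snippet below is a minimal sketch using the redis package for Python. The channel name is an assumption (check the realtime-go source for the channel the app actually subscribes to), and it assumes you forwarded the redis port locally with kubectl port-forward svc/redis 6379:6379.

import redis  # pip3 install redis

# connect to the forwarded redis port on localhost
r = redis.Redis(host="localhost", port=6379)

# "device01" is a hypothetical channel name; use the one from the app source
r.publish("device01", "Hello from Rio!")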
To check the logs of the deployed service, run rio logs servicename.
When you run rio --system ps you will see the Rio system services. One of the services is Grafana, which contains Istio dashboards. Grab the URL of that service to access the dashboards.
Even in this early version, Rio works quite well. It is very simple to install and it takes the grunt work out of deploying services on Kubernetes. Going from source code repository to a published service is just a single command, which is a bit similar to OpenShift. Highly recommended to give it a go when you have some time!
In earlier posts (like here and here) I mentioned GoCV. GoCV allows you to use the popular OpenCV library from your Go programs. To avoid installing OpenCV and having to compile it from source, a container that runs your GoCV app can be beneficial. This post provides information about doing just that.
The result is two images, gbaeke/gocv-4.0.0-build and gbaeke/gocv-4.0.0-run. They are over on Docker Hub, ready for use. To actually use the above images in a typical two-stage build, I used the following Dockerfile:
FROM gbaeke/gocv-4.0.0-build as build
RUN go get -u -d gocv.io/x/gocv
RUN go get -u -d github.com/disintegration/imaging
RUN go get -u -d github.com/gbaeke/emotion
RUN cd $GOPATH/src/github.com/gbaeke/emotion && go build -o $GOPATH/bin/emo ./main.go

FROM gbaeke/gocv-4.0.0-run
COPY --from=build /go/bin/emo /emo
ADD haarcascade_frontalface_default.xml /
With the image built, run the container as follows:
docker run -it --rm --device=/dev/video0 --env SCOREURI="YOUR-SCORE-URI" --env VIDEO=0 gbaeke/emo
The SCOREURI environment variable needs to refer to the score URI offered by the ONNX FER+ container as discussed in Detecting Emotions with FER+. With VIDEO=0, the GUI window that shows the webcam video stream is turned off (required, because the container has no display to render it). Detected emotions will be logged to the console.
To be able to use the actual webcam of the host, the --device flag is used to map /dev/video0 from the host to the container. That works well on a Linux host and was tested on a laptop running Ubuntu 16.04.
In a previous post, I discussed the creation of a container image that uses the ResNet50v2 model for image classification. If you want to perform tasks such as localization or segmentation, there are other models that serve that purpose. The image was built with GPU support. Adding GPU support was pretty easy:
Use the enable_gpu flag in the Azure Machine Learning SDK or check the GPU box in the Azure Portal; the service will build an image that supports NVIDIA CUDA (see the sketch after this list)
Add GPU support in your score.py file and/or conda dependencies file (the scoring script uses the ONNX runtime, so we added the onnxruntime-gpu package)
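As a sketch of the first option, this is roughly what the image configuration looks like with the azureml-core SDK of that era; treat the exact parameter names as an assumption and check the SDK documentation for your version:

from azureml.core.image import ContainerImage

# Sketch: image configuration with GPU support enabled. File names are
# placeholders; enable_gpu makes the service build a CUDA-enabled image.
image_config = ContainerImage.image_configuration(
    execution_script="score.py",          # your scoring script
    runtime="python",
    conda_file="conda_dependencies.yml",  # lists onnxruntime-gpu
    enable_gpu=True)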
In this post, we will deploy the image to a Kubernetes cluster with GPU nodes, using Azure Kubernetes Service (AKS) for this purpose. Check my previous post for the cluster setup; I use hosts with one NVIDIA V100 GPU each.
To get started, make sure you have the Kubernetes cluster deployed and that you followed the steps in my previous post to create the GPU container image. Make sure you attached the cluster to the workspace as a compute target.
Deploy image to Kubernetes
Click the container image you created in the previous post and deploy it to the Kubernetes cluster you attached to the workspace by clicking + Create Deployment.
The Create Deployment screen is shown. Select AKS as deployment target and select the Kubernetes cluster you attached. Then press Create.
Azure Machine Learning now deploys the containers to Kubernetes. Note that I said containers in plural. In addition to the scoring container, another front-end container is added as well. You send your requests to the front-end container using HTTP POST. The front-end container talks to the scoring container over TCP port 5001 and passes the result back. The front-end container can be configured with certificates to support SSL.
Check the deployment and wait until it is healthy. We did not specify advanced settings during deployment, so the default settings were chosen. Click the deployment to see the settings.
As you can see, the deployment has authentication enabled. When you send your HTTP POST request to the scoring URI, make sure you pass an Authorization header of the form Bearer <primary-or-secondary-key>. The primary and secondary key are in the settings above. You can regenerate those keys at any time.
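For example, with Python and the requests package (a minimal sketch; the URI, key, and payload are placeholders):

import requests

scoring_uri = "http://<your-endpoint>/score"  # placeholder: from the deployment settings
key = "<primary-or-secondary-key>"            # placeholder: from the deployment settings

headers = {"Content-Type": "application/json",
           "Authorization": "Bearer " + key}
response = requests.post(scoring_uri, data='{"data": []}', headers=headers)
print(response.status_code, response.text)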
Checking the deployment
From the Azure Cloud Shell, issue the following commands in order to list the pods deployed to your Kubernetes cluster:
az aks list -o table
az aks get-credentials -g RESOURCEGROUP -n CLUSTERNAME
kubectl get pods
Azure Machine Learning has deployed three front-ends (the default; this can be changed via Advanced Settings during deployment) and one scoring container. Let's check the scoring pod with kubectl get pod onnxgpu-5d6c65789b-rnc56 -o yaml. Replace the pod name with yours. In the output, you should find the GPU resource request that Azure Machine Learning configured for the scoring container.
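On Kubernetes, that section of the pod spec typically looks like the sketch below (your values may differ):

resources:
  limits:
    nvidia.com/gpu: "1"
  requests:
    nvidia.com/gpu: "1"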
With the deployment in place, we can start scoring images. The scoring container expects a JSON payload with a data field that holds a multi-dimensional array, serialized to JSON. The shape of the array is (1,3,224,224); the dimensions correspond to the batch size, channels (RGB), height and width.
You only have to read an image and put the pixel values in the array! Easy, right? Well, as usual, the answer is "it depends"! The easiest way, in my opinion, is with Python and a few helper packages. The code is in the following GitHub gist: https://gist.github.com/gbaeke/b25849f3813e9eb984ee691659d1d05a. You need to run the code on a machine with Python 3 installed. Make sure you also install Keras and NumPy (pip3 install keras and pip3 install numpy). The code uses two images, cat.jpg and car.jpg, but you can use your own. When I run the code, I get the following result:
Using TensorFlow backend.
channels_last
Loading and preprocessing image… cat.jpg
Array shape (224, 224, 3)
Array shape afer moveaxis: (3, 224, 224)
Array shape after expand_dims (1, 3, 224, 224)
prediction time (as measured by the scoring container) 0.025304794311523438
Probably a: Egyptian_cat 0.9460222125053406
Loading and preprocessing image… car.jpg
Array shape (224, 224, 3)
Array shape afer moveaxis: (3, 224, 224)
Array shape after expand_dims (1, 3, 224, 224)
prediction time (as measured by the scoring container) 0.02526378631591797
Probably a: sports_car 0.948998749256134
It takes about 25 milliseconds to classify an image, or 40 images/second. By increasing the number of GPUs and scoring containers (we only deployed one), we can easily scale out the solution.
With a bit of help from Keras and NumPy, the code does the following (see the sketch after this list):
check the image format reported by the Keras back-end: it reports channels_last, which means that, by default, the RGB channels form the last dimension of the image array
load the image; the resulting array has a (224,224,3) shape
our container expects the channels_first format; we use moveaxis to move the last axis to the front; the array now has a (3,224,224) shape
our container expects a first dimension with a batch size; we use expand_dims to end up with a (1,3,224,224) shape
we convert the 4D array to a list and construct the JSON payload
we send the payload to the scoring URI and pass an authorization header
we get a JSON response with two fields: result and time; we print the inference time as reported by the container
from keras.applications.resnet50, we use the decode_predictions function to process the result field; result contains the 1000 values computed by the softmax function in the container; decode_predictions knows the categories and returns the top five by default
we print the name and probability of the category with the highest probability (item 0)
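The gist contains the full code; the block below is a minimal sketch of the same steps. The scoring URI, key, and image file name are placeholders you need to replace with your own values, and the reshape of the result field is an assumption about how the container serializes it.

import json
import numpy as np
import requests
from keras.preprocessing import image
from keras.applications.resnet50 import decode_predictions

SCORING_URI = "http://<your-endpoint>/score"  # placeholder: your scoring URI
KEY = "<primary-or-secondary-key>"            # placeholder: your key

# load the image at the expected size; the array has shape (224, 224, 3)
img = image.load_img("cat.jpg", target_size=(224, 224))
arr = image.img_to_array(img)

# the container expects channels_first: move the channel axis to the front -> (3, 224, 224)
arr = np.moveaxis(arr, -1, 0)

# add the batch dimension -> (1, 3, 224, 224)
arr = np.expand_dims(arr, axis=0)

# serialize the 4D array to JSON and post it with the authorization header
payload = json.dumps({"data": arr.tolist()})
headers = {"Content-Type": "application/json",
           "Authorization": "Bearer " + KEY}
response = requests.post(SCORING_URI, data=payload, headers=headers).json()

# result holds the 1000 softmax values; decode_predictions maps them to categories
scores = np.array(response["result"]).reshape(1, 1000)
top = decode_predictions(scores, top=5)[0]
print("prediction time:", response["time"])
print("Probably a:", top[0][1], top[0][2])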
What happens when you use a scoring container that uses the CPU? In that case, you could run the container in Azure Container Instances (ACI). Using ACI is much less costly! In ACI with the default setting of 0.1 CPU, it will take around 2 seconds to score an image. Ouch! With a full CPU (in ACI), the scoring time goes down to around 180-220ms per image. To achieve better results, simply increase the number of CPUs. On the Standard_NC6s_v3 Kubernetes node with 6 cores, scoring time with CPU hovers around 60ms.
In this post, you have seen how Azure Machine Learning makes it straightforward to deploy GPU scoring images to a Kubernetes cluster with GPU nodes. The service automatically configures the resource requests for the GPU and maps the NVIDIA drivers to the scoring container. The only thing left to do is to start scoring images with the service. We have seen how easy that is with a bit of help from Keras and NumPy. In practice, always start with CPU scoring and scale out that solution to match your requirements. But if you do need GPUs for scoring, Azure Machine Learning makes it pretty easy to do so!