A first look at Rancher Rio

As explained on https://github.com/rancher/rio, Rancher Rio is a MicroPaaS that can be layered on top of any standard Kubernetes cluster. It makes it easier to deploy, scale, version and expose services. In this post, we will take a quick look at some of its basic capabilities.

To follow along, make sure you have a Kubernetes cluster running. I deployed a standard AKS cluster with three nodes. In your shell (I used Ubuntu Bash on Windows), install Rio:

curl -sfL https://get.rio.io | sh - 

After installation, check the version of Rio with:

rio --version
rio version v0.1.1-rc1 (cdb75cf1)

With v0.1.1 there was an issue with deploying the registry component. v0.1.1-rc1 fixes that.

Make sure you have kubectl installed and that its context points to the cluster in which you want to deploy Rio. If that is the case, just run the following command:

rio install

The above command will install a bunch of components in the rio-system namespace. After a while, running kubectl get po -n rio-system should show the list below:

Rio installed

Rio will install Istio and expose a service mesh gateway via a service of type load balancer. With AKS, this will result in an Azure load balancer that sends traffic to the service mesh gateway. When you deploy Rio services, you can automatically get a DNS name that will resolve to the external IP of the Azure load balancer.

Let’s install such a Rio service. We will use the following application: https://github.com/gbaeke/realtime-go. Instead of the master branch, we will deploy the httponly branch. The repo contains a Dockerfile with a two-stage build that results in a web application that displays messages published to redis in real time. Before we deploy the application, deploy redis with the following command:

kubectl run redis --image redis --port 6379 --expose

Now deploy the realtime-go app with Rio:

rio run -p 8080/http -n realtime --build-branch httponly --env REDISHOST=redis:6379 https://github.com/gbaeke/realtime-go.git

Rio makes it easy to deploy the application because it will pull the specified branch of the git repo and build the container image based on the Dockerfile. The above command also sets an environment variable that is used by the realtime-go code to find the redis host.

When the build is finished, the image is stored in the internal registry. You can check builds with rio builds. Get the build logs with rio build logs imagename. For example:

rio build logs default/realtime:7acdc6dfed59c1b93f2def1a84376a880aac9f5d

The result would be something like:

build logs

The rio run command results in a deployed service. Run rio ps to check this:

rio ps displays the deployed service

Notice that you also get a URL which is publicly accessible over SSL via a Let’s Encrypt certificate:

Application on public endpoint using a staging Let’s Encrypt cert

Just for fun, you can publish a message to the redis channel that this app checks for:

kubectl exec -it redis-pod /bin/sh
redis-cli
127.0.0.1:6379> publish device01 Hello

The above commands should display the message in the web app:

Great success!!!

To check the logs of the deployed service, run rio logs servicename. The result should be:

Logs from the realtime-go service

When you run rio –system ps you will see the rio system services. One of the services is Grafana, which contains Istio dashboards. Grab the URL of that service to access the dashboards:

One of the Istio dashboards

Even in this early version, Rio works quite well. It is very simple to install and it takes the grunt work out of deploying services on Kubernetes. Going from source code repository to a published service is just a single command, which is a bit similar to OpenShift. Highly recommended to give it a go when you have some time!

Running a GoCV application in a container

In earlier posts (like here and here) I mentioned GoCV. GoCV allows you to use the popular OpenCV library from your Go programs. To avoid installing OpenCV and having to compile it from source, a container that runs your GoCV app can be beneficial. This post provides information about doing just that.

The following GitHub repository, https://github.com/denismakogon/gocv-alpine, contains all you need to get started. It’s for OpenCV 3.4.2 so you will run into issues when you want to use OpenCV 4.0. The pull request, https://github.com/denismakogon/gocv-alpine/pull/7, contains the update to 4.0 but it has not been merged yet. I used the proposed changes in the pull request to build two containers:

  • the build container: gbaeke/gocv-4.0.0-build
  • the run container: gbaeke/gocv-4.0.0-run

They are over on Docker Hub, ready for use. To actually use the above images in a typical two-step build, I used the following Dockerfile:

FROM gbaeke/gocv-4.0.0-build as build       
RUN go get -u -d gocv.io/x/gocv
RUN go get -u -d github.com/disintegration/imaging
RUN go get -u -d github.com/gbaeke/emotion
RUN cd $GOPATH/src/github.com/gbaeke/emotion && go build -o $GOPATH/bin/emo ./main.go

FROM gbaeke/gocv-4.0.0-run
COPY --from=build /go/bin/emo /emo
ADD haarcascade_frontalface_default.xml /

ENTRYPOINT ["/emo"]

The above Dockerfile uses the webcam emotion detection program from https://github.com/gbaeke/emotion. To run it on a Linux system, use the following command:

docker run -it --rm --device=/dev/video0 --env SCOREURI="YOUR-SCORE-URI" --env VIDEO=0 gbaeke/emo

The SCOREURI environment variable needs to refer to the score URI offered by the ONNX FER+ container as discussed in Detecting Emotions with FER+. With VIDEO=0 the GUI window that shows the webcam video stream is turned off (required). Detected emotions will be logged to the console.

To be able to use the actual webcam of the host, the –device flag is used to map /dev/video0 from the host to the container. That works well on a Linux host and was tested on a laptop running Ubuntu 16.04.

Recognizing images with Azure Machine Learning and the ONNX ResNet50v2 model

Featured image from: https://medium.com/comet-app/review-of-deep-learning-algorithms-for-object-detection-c1f3d437b852

In a previous post, I discussed the creation of a container image that uses the ResNet50v2 model for image classification. If you want to perform tasks such as localization or segmentation, there are other models that serve that purpose. The image was built with GPU support. Adding GPU support was pretty easy:

  • Use the enable_gpu flag in the Azure Machine Learning SDK or check the GPU box in the Azure Portal; the service will build an image that supports NVIDIA cuda
  • Add GPU support in your score.py file and/or conda dependencies file (scoring script uses the ONNX runtime, so we added the onnxruntime-gpu package)

In this post, we will deploy the image to a Kubernetes cluster with GPU nodes. We will use Azure Kubernetes Service (AKS) for this purpose. Check my previous post if you want to use NVIDIA V100 GPUs. In this post, I use hosts with one V100 GPU.

To get started, make sure you have the Kubernetes cluster deployed and that you followed the steps in my previous post to create the GPU container image. Make sure you attached the cluster to the workspace’s compute.

Deploy image to Kubernetes

Click the container image you created from the previous post and deploy it to the Kubernetes cluster you attached to the workspace by clicking + Create Deployment:

Starting the deployment from the image in the workspace

The Create Deployment screen is shown. Select AKS as deployment target and select the Kubernetes cluster you attached. Then press Create.

Azure Machine Learning now deploys the containers to Kubernetes. Note that I said containers in plural. In addition to the scoring container, another frontend container is added as well. You send your requests to the front-end container using HTTP POST. The front-end container talks to the scoring container over TCP port 5001 and passes the result back. The front-end container can be configured with certificates to support SSL.

Check the deployment and wait until it is healthy. We did not specify advanced settings during deployment so the default settings were chosen. Click the deployment to see the settings:

Deployment settings including authentication keys and scoring URI

As you can see, the deployment has authentication enabled. When you send your HTTP POST request to the scoring URI, make sure you pass an authentication header like so: bearer primary-or-secondary-key. The primary and secondary key are in the settings above. You can regenerate those keys at any time.

Checking the deployment

From the Azure Cloud Shell, issue the following commands in order to list the pods deployed to your Kubernetes cluster:

  • az aks list -o table
  • az aks get-credentials -g RESOURCEGROUP -n CLUSTERNAME
  • kubectl get pods
Listing the deployed pods

Azure Machine Learning has deployed three front-ends (default; can be changed via Advanced Settings during deployment) and one scoring container. Let’s check the container with: kubectl get pod onnxgpu-5d6c65789b-rnc56 -o yaml. Replace the container name with yours. In the output, you should find the following:

resources:
limits:
nvidia.com/gpu: "1"
requests:
cpu: 100m
memory: 500m
nvidia.com/gpu: "1"

The above allows the pod to use the GPU on the host. The nvidia drivers on the host are mapped to the pod with a volume:

volumeMounts:
- mountPath: /usr/local/nvidia
name: nvidia

Great! We did not have to bother with doing this ourselves. Let’s now try to recognize an image by sending requests to the front-end pods.

Recognizing images

To recognize an image, we need to POST a JSON payload to the scoring URI. The scoring URI can be found in the deployment properties in the workspace. In my case, the URI is:

http://23.97.218.34/api/v1/service/onnxgpu/score

The JSON payload needs to be in the below format:

{"data": [[[[143.06100463867188, 130.22100830078125, 122.31999969482422, ... ]]]]} 

The data field is a multi-dimensional array, serialized to JSON. The shape of the array is (1,3,224,224). The dimensions correspond to the batch size, channels (RGB), height and width.

You only have to read an image and put the pixel values in the array! Easy right? Well, as usual the answer is: “it depends”! The easiest way to do it, according to me, is with Python and a collection of helper packages. The code is in the following GitHub gist: https://gist.github.com/gbaeke/b25849f3813e9eb984ee691659d1d05a. You need to run the code on a machine with Python 3 installed. Make sure you also install Keras and NumPy (pip3 install keras / pip3 install numpy). The code uses two images, cat.jpg and car.jpg but you can use your own. When I run the code, I get the following result:

Using TensorFlow backend.
channels_last
Loading and preprocessing image… cat.jpg
Array shape (224, 224, 3)
Array shape afer moveaxis: (3, 224, 224)
Array shape after expand_dims (1, 3, 224, 224)
prediction time (as measured by the scoring container) 0.025304794311523438
Probably a: Egyptian_cat 0.9460222125053406
Loading and preprocessing image… car.jpg
Array shape (224, 224, 3)
Array shape afer moveaxis: (3, 224, 224)
Array shape after expand_dims (1, 3, 224, 224)
prediction time (as measured by the scoring container) 0.02526378631591797
Probably a: sports_car 0.948998749256134

It takes about 25 milliseconds to classify an image, or 40 images/second. By increasing the number of GPUs and scoring containers (we only deployed one), we can easily scale out the solution.

With a bit of help from Keras and NumPy, the code does the following:

  • check the image format reported by the keras back-end: it reports channels_last which means that, by default, the RGB channels are the last dimensions of the image array
  • load the image; the resulting array has a (224,224,3) shape
  • our container expects the channels_first format; we use moveaxis to move the last axis to the front; the array now has a (3,224,224) shape
  • our container expects a first dimension with a batch size; we use expand_dims to end up with a (1,3,224,224) shape
  • we convert the 4D array to a list and construct the JSON payload
  • we send the payload to the scoring URI and pass an authorization header
  • we get a JSON response with two fields: result and time; we print the inference time as reported by the container
  • from keras.applications.resnet50, we use the decode_predictions class to process the result field; result contains the 1000 values computed by the softmax function in the container; decode_predictions knows the categories and returns the first five
  • we print the name and probability of the category with the highest probability (item 0)

What happens when you use a scoring container that uses the CPU? In that case, you could run the container in Azure Container Instances (ACI). Using ACI is much less costly! In ACI with the default setting of 0.1 CPU, it will take around 2 seconds to score an image. Ouch! With a full CPU (in ACI), the scoring time goes down to around 180-220ms per image. To achieve better results, simply increase the number of CPUs. On the Standard_NC6s_v3 Kubernetes node with 6 cores, scoring time with CPU hovers around 60ms.

Conclusion

In this post, you have seen how Azure Machine Learning makes it straightforward to deploy GPU scoring images to a Kubernetes cluster with GPU nodes. The service automatically configures the resource requests for the GPU and maps the NVIDIA drivers to the scoring container. The only thing left to do is to start scoring images with the service. We have seen how easy that is with a bit of help from Keras and NumPy. In practice, always start with CPU scoring and scale out that solution to match your requirements. But if you do need GPUs for scoring, Azure Machine Learning makes it pretty easy to do so!