Microsoft Face API with a local container

A few days ago, I obtained access to the Face container. It provides access to the Face API via a container you can run where you want: on your pc, at the network edge or in your datacenter. You should allocate 6 GB or RAM and 2 cores for the container to run well. Note that you still need to create a Face API resource in the Azure Portal. The container needs to be associated with the Azure Face API via the endpoint and access key:

Face API with a West Europe (Amsterdam) endpoint

I used the Standard tier, which charges 0.84 euros per 1000 calls. As noted, the container will not function without associating it with an Azure Face API resource.

When you gain access to the container registry, you can pull the container:

docker pull containerpreview.azurecr.io/microsoft/cognitive-services-face:latest

After that, you can run the container as follows (for API billing endpoint in West Europe):

docker run --rm -it -p 5000:5000 --memory 6g --cpus 2 containerpreview.azurecr.io/microsoft/cognitive-services-face Eula=accept Billing=https://westeurope.api.cognitive.microsoft.com/face/v1.0 ApiKey=YOUR_API_KEY

The container will start. You will see the output (–it):

Running Face API container

And here’s the spec:

API spec Face API v1

Before showing how to use the detection feature, note that the container needs Internet access for billing purposes. You will not be able to run the container in fully offline scenarios.

Over at https://github.com/gbaeke/msface-go, you can find a simple example in Go that uses the container. The Face API can take a byte stream of an image or a URL to an image. The example takes the first approach and loads an image from disk as specified by the -image parameter. The resulting io.Reader is passed to the getFace function which does the actual call to the API (uri = http://localhost:5000/face/v1.0/detect):

request, err := http.NewRequest("POST", uri+"?returnFaceAttributes="+params, m)
request.Header.Add("Content-Type", "application/octet-stream")

// Send the request to the local web service
resp, err := client.Do(request)
if err != nil {
    return "", err
}

The response contains a Body attribute and that attribute is unmarshalled to a variable of type interface. That one is marshalled with indentation to a byte slice (b) which is returned by the function as a string:

var response interface{}
err = json.Unmarshal(respBody, &response)
if err != nil {
    return "", err
}
b, err := json.MarshalIndent(response, "", "\t")

Now you can use a picture like the one below:

Is he smiling?

Here are some parts of the input, following the command
detectface -image smiling.jpg

Emotion is clearly happiness with additional features such as age, gender, hair color, etc…

[
{
"faceAttributes": {
"accessories": [],
"age": 33,
"blur": {
"blurLevel": "high",
"value": 1
},
"emotion": {
"anger": 0,
"contempt": 0,
"disgust": 0,
"fear": 0,
"happiness": 1,
"neutral": 0,
"sadness": 0,
"surprise": 0
},
"exposure": {
"exposureLevel": "goodExposure",
"value": 0.71
},
"facialHair": {
"beard": 0.6,
"moustache": 0.6,
"sideburns": 0.6
},
"gender": "male",
"glasses": "NoGlasses",
"hair": {
"bald": 0.26,
"hairColor": [
{
"color": "black",
"confidence": 1
}],
"faceId": "b6d924c1-13ef-4d19-8bc9-34b0bb21f0ce",
"faceRectangle": {
"height": 1183,
"left": 944,
"top": 167,
"width": 1183
}
}
]

That’s it! Give the Face API container a go with the tool. You can get it here: https://github.com/gbaeke/msface-go/releases/tag/v0.0.1 (Windows)

Using the Microsoft Face API to detect emotions in photos and video

In a previous post, I blogged about detecting emotions with the ONNX FER+ model. As an alternative, you can use cloud models hosted by major cloud providers such as Microsoft, Amazon and Google. Besides those, there are many other services to choose from.

To detect facial emotions with Azure, there is a Face API in two flavours:

  • Cloud: API calls are sent to a cloud-hosted endpoint in the selected deployment region
  • Container: API calls are sent to a container that you deploy anywhere, including the edge (e.g. IoT Edge device)

To use the container version, you need to request access via this link. In another blog post, I already used the Text Analytics container to detect sentiment in a piece of text.

Note that the container version is not free and needs to be configured with an API key. The API key is obtained by deploying the Face API in the cloud. Doing so generates a primary and secondary key. Be aware that the Face API container, like the Text Analytics container, needs connectivity to the cloud to ensure proper billing. It cannot be used in completely offline scenarios. In short, no matter the flavour you use, you need to deploy the Face API. It will appear in the portal as shown below:

Deployed Face API (part of Cognitive Services)

Using the API is a simple matter. An image can be delivered to the API in two ways:

  • Link: just provide a URL to an image
  • Octet-stream: POST binary data (the image’s bytes) to the API

In the Go example you can find on GitHub, the second approach is used. You simply open the image file (e.g. a jpg or png) and pass the byte array to the endpoint. The endpoint is in the following form for emotion detection:

https://westeurope.api.cognitive.microsoft.com/face/v1.0/detect?returnFaceAttributes=emotion

Instead of emotion, you can ask for other attributes or a combination of attributes: age, gender, headPose, smile, facialHair, glasses, emotion, hair, makeup, occlusion, accessories, blur, exposure and noise. You simply add them together with +’s (e.g. emotion+age+gender). When you add attributes, the cost per call will increase slightly as will the response time. With the additional attributes, the Face API is much more useful than the simple FER+ model. The Face API has several additional features such as storing and comparing faces. Check out the documentation for full details.

To detect emotion in a video, the sample at https://github.com/gbaeke/emotion/blob/master/main.go contains some commented out code in the import section and around line 100 so you can use the Face API via the github.com/gbaeke/emotion/faceapi/msface package’s GetEmotion() function instead of the GetEmotion() function in the code. Because we have the full webcam image and face in an OpenCV mat, some extra code is needed to serialize it to a byte stream in a format the Face API understands:

encodedImage, _ := gocv.IMEncode(gocv.JPEGFileExt, face)       
emotion, err = msface.GetEmotion(bytes.NewReader(encodedImage))

In the above example, the face region detected by OpenCV is encoded to a JPG format as a byte slice. The byte slice is simply converted to an io.Reader and handed to the GetEmotion() function in the msface package.

When you use the Face API to detect emotions in a video stream from a webcam (or a video file), you will be hitting the API quite hard. You will surely need the standard tier of the API which allows you to do 10 transactions per second. To add face and emotion detection to video, the solution discussed in Detecting Emotions in FER+ is a better option.