Overview
This activity uses the MediaPipe library to demonstrate how we can take existing deep learning models trained on computer vision tasks, and build on their results. We won’t be re-training these models on new data, although that is one common use case.
- The GitHub repository for this assignment will contain five starter code files, one for each MediaPipe model.
- You will also need to download the models themselves from our Moodle site (they are also available on the MediaPipe website), and move them into your PyCharm project.
Do not try to add the models to the GitHub repo! If PyCharm asks whether to add them to version control, say NO. They are too large!
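All five starter programs follow the same basic MediaPipe Tasks pattern: load a downloaded model file into a detector object, run the detector on an image, and inspect the results. Here is a minimal sketch of that pattern for face detection; it is not the actual starter code, and the model and image file names are placeholders:

```python
# A minimal sketch of the MediaPipe Tasks workflow, not the starter code.
# 'face_detector.tflite' and 'photo.jpg' are placeholder file names.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Point the detector at the model file you downloaded and moved into the project
base_options = python.BaseOptions(model_asset_path='face_detector.tflite')
options = vision.FaceDetectorOptions(base_options=base_options)
detector = vision.FaceDetector.create_from_options(options)

# Load an image and run the model on it
image = mp.Image.create_from_file('photo.jpg')
detect_results = detector.detect(image)
print(detect_results)
```

The other programs swap in the matching classes (for example, FaceLandmarkerOptions and FaceLandmarker) but otherwise look much the same.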
Face detection
- Open up the Chapter 8 readings about the MediaPipe face detection model
- Open the mediapipeFaceDetect.py program
- Try running the program to see how it works
Try this:
- Study the structure of the results returned by the model when it is run on an image. This is described in the Chapter 8 readings
- Uncomment the call to findFacing in runFaceDetect
- Modify findFacing by doing the following:
- Remove the pass statement from the definition of findFacing, and replace it with code that prints detect_results (temporarily, until you figure out the format of the results, then comment out the print statement)
- Experiment until you can figure out how to get the left ear and left eye coordinates out of the detection results
- Do the same with the right ear and right eye coordinates (assign variables to hold these four values)
- Compute the distance in the x dimension between the left ear and eye, and do the same for the right ear and eye
- Write an if statement that checks the distances: if the distance between the left eye and ear is small enough, and the distance between the right eye and ear is large enough, print that the person is facing toward the left
- Do something similar for facing toward the right
- If neither of these holds, then print that the person is facing forward
- Remove the temporary print statement once everything works (a hedged sketch of one possible findFacing follows this list)
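If you get stuck, here is a rough sketch of the shape findFacing could take. It is not the official solution: it assumes the Tasks-API FaceDetectorResult structure, the BlazeFace keypoint ordering (0 right eye, 1 left eye, 2 nose tip, 3 mouth, 4 right ear, 5 left ear), and threshold values that are only guesses:

```python
def findFacing(detect_results):
    # Hedged sketch: assumes detect_results is a FaceDetectorResult and the
    # keypoints use the BlazeFace order (0 right eye, 1 left eye, 2 nose tip,
    # 3 mouth, 4 right ear tragion, 5 left ear tragion).
    if not detect_results.detections:
        return
    keypoints = detect_results.detections[0].keypoints
    right_eye_x = keypoints[0].x
    left_eye_x = keypoints[1].x
    right_ear_x = keypoints[4].x
    left_ear_x = keypoints[5].x

    # Distances in the x dimension, in normalized [0, 1] image coordinates
    left_dist = abs(left_ear_x - left_eye_x)
    right_dist = abs(right_ear_x - right_eye_x)

    # The thresholds are guesses: print the two distances for real images
    # and tune them. If your camera feed is mirrored, the left/right labels
    # may need to be swapped.
    if left_dist < 0.03 and right_dist > 0.06:
        print("Facing left")
    elif right_dist < 0.03 and left_dist > 0.06:
        print("Facing right")
    else:
        print("Facing forward")
```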
Facial landmark detection
- Open up the Chapter 8 readings about the MediaPipe facial landmark model
- Open the mediapipeFaceLandmark.py program
- Try running the program to see how it works
Try this:
- Study the structure of the results returned by the model when it is run on an image. This is described in the Chapter 8 readings
- Uncomment the call to findEyes in runFacialLandmarks
- Modify findEyes by doing the following:
- Remove the pass statement from the definition of findEyes, and replace it with code that prints detect_results (temporarily, until you figure out the format of the results, then comment out the print statement)
- Examine the blendshapes, and determine which ones will help you discover whether the person’s eyes are open or closed
- Experiment until you can figure out how to get the correct blendshape information from the detection results
- Determine a threshold or thresholds for when the blendshapes you have chosen indicate that eyes are closed
- Implement an if statement to test whether the eyes are open or closed, and print a message accordingly: “Eyes closed!” or “Eyes open”
- Remove the temporary print statement once everything works (a hedged sketch of one possible findEyes follows this list)
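Again, a hedged sketch of one possible findEyes, not the assignment’s solution. It assumes the FaceLandmarker result includes blendshapes, that the relevant blendshape names are 'eyeBlinkLeft' and 'eyeBlinkRight', and that 0.5 is a workable threshold; check all three against the Chapter 8 readings and your own printouts:

```python
def findEyes(detect_results):
    # Hedged sketch: assumes detect_results is a FaceLandmarkerResult with
    # blendshape output enabled; each blendshape is a Category object with
    # a category_name and a score between 0 and 1.
    if not detect_results.face_blendshapes:
        return
    scores = {cat.category_name: cat.score
              for cat in detect_results.face_blendshapes[0]}

    # The names 'eyeBlinkLeft'/'eyeBlinkRight' and the 0.5 threshold are
    # assumptions; print the scores dictionary to verify and tune them.
    if scores.get('eyeBlinkLeft', 0) > 0.5 and scores.get('eyeBlinkRight', 0) > 0.5:
        print("Eyes closed!")
    else:
        print("Eyes open")
```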
Hand landmark detection
- Open up the Chapter 8 readings about the MediaPipe hand landmark model
- Open the mediapipeHand.py program
- Try running the program to see how it works
Try this:
- Study the structure of the results returned by the model when it is run on an image. This is described in the Chapter 8 readings
- Uncomment the call to findHandPose in the program’s run function
- Modify findHandPose by doing the following:
- Remove the pass statement from the definition of findHandPose, and replace it with code that prints detect_results (temporarily, until you figure out the format of the results, then comment out the print statement)
- Examine the landmarks that are returned. Determine some subset of landmarks that you could use to distinguish between someone making a fist and someone with an open palm (fingers up, palm forward).
- Experiment until you can figure out how to get the coordinates of those landmarks out of the detection results
- Determine what relationship between the landmark coordinates indicates a fist versus an open palm
- Implement an if statement to test whether the person is making a fist or an open palm, and print a message accordingly: “Palm” or “Fist”
- Remove the temporary print statement once everything works (a hedged sketch of one possible findHandPose follows this list)
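One possible shape for findHandPose, sketched under assumptions: detect_results follows the HandLandmarker result structure, the standard 21-point hand numbering applies (fingertips at 8, 12, 16, 20; the PIP joints below them at 6, 10, 14, 18), and the hand is roughly upright in the frame:

```python
def findHandPose(detect_results):
    # Hedged sketch: assumes detect_results is a HandLandmarkerResult and
    # the hand is roughly upright. Uses the standard 21-landmark numbering;
    # note that y grows downward in image coordinates, so an extended
    # (pointing-up) fingertip has a smaller y than the joint below it.
    if not detect_results.hand_landmarks:
        return
    lm = detect_results.hand_landmarks[0]
    tips = [8, 12, 16, 20]   # index, middle, ring, pinky fingertips
    pips = [6, 10, 14, 18]   # the PIP joints below those fingertips
    extended = sum(1 for t, p in zip(tips, pips) if lm[t].y < lm[p].y)

    # Requiring 3 of 4 fingers extended is an arbitrary cutoff; adjust it
    # after experimenting with real images.
    if extended >= 3:
        print("Palm")
    else:
        print("Fist")
```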
Body pose landmark detection
- Open up the Chapter 8 readings about the MediaPipe body pose landmark model
- Open the mediapipePose.py program
- Try running the program to see how it works
Try this:
- Study the structure of the results returned by the model when it is run on an image. This is described in the Chapter 8 readings
- Uncomment the call to findHands in the program’s run function
- Modify findHands by doing the following:
- Remove the pass statement from the definition of findHands, and replace it with code that prints detect_results (temporarily, until you figure out the format of the results, then comment out the print statement)
- Examine the landmarks that are returned. Determine which landmarks would tell you the relative position of the person’s hands and their head: we want to tell whether someone’s hands are above their head, or below or even with it
- Experiment until you can figure out how to get the coordinates of those landmarks out of the detection results
- Determine what relationship between the landmark coordinates indicates hands up versus hands not up
- Implement an if statement to test whether the person has their hands up, and print a message accordingly: “Hands up” or “Hands down”
- Remove the temporary print statement once everything works (a hedged sketch of one possible findHands follows this list)
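A sketch of one possible findHands, assuming the PoseLandmarker result structure and the standard 33-landmark numbering (0 nose, 15 left wrist, 16 right wrist). Since y grows downward in image coordinates, “above the head” means a smaller y value:

```python
def findHands(detect_results):
    # Hedged sketch: assumes detect_results is a PoseLandmarkerResult and
    # the standard 33-landmark numbering (0 nose, 15 left wrist,
    # 16 right wrist). Comparing against the nose is a stand-in for
    # "the head"; you could also use the eyes or ears.
    if not detect_results.pose_landmarks:
        return
    lm = detect_results.pose_landmarks[0]
    nose_y = lm[0].y
    if lm[15].y < nose_y and lm[16].y < nose_y:
        print("Hands up")
    else:
        print("Hands down")
```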
Object detection (OPTIONAL, just for fun!)
If you like, experiment with the Object Detection code as well, and try to figure out some of the categories of objects it can detect.
Next steps…
Explore the two websites listed below. Each hosts hundreds of thousands or even millions of models trained for various purposes. Hugging Face is focused more on transformer models than on CNNs or object detection, but both have interesting work.