Overview
This activity uses the MediaPipe library to demonstrate how we can take existing deep learning models trained on computer vision tasks, and build on their results. We won’t be re-training these models on new data, although that is one common use case.
- The GitHub repository for this assignment will contain five starter code files, one for each MediaPipe model.
- You will also need to download the models themselves from our Moodle site (they are also available on the MediaPipe website), and move them into your PyCharm project.
Do not try to add the models to the GitHub repo! If PyCharm asks whether to add them to version control, say NO. They are too large!
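All five starter programs follow the same basic MediaPipe Tasks pattern: load a downloaded model file into a detector object, run the detector on an image, and inspect the results. Here is a minimal sketch of that pattern for face detection; it is not the actual starter code, and the model and image file names are placeholders:

```python
# A minimal sketch of the MediaPipe Tasks workflow, not the starter code.
# 'face_detector.tflite' and 'photo.jpg' are placeholder file names.
import mediapipe as mp
from mediapipe.tasks import python
from mediapipe.tasks.python import vision

# Point the detector at the model file you downloaded and moved into the project
base_options = python.BaseOptions(model_asset_path='face_detector.tflite')
options = vision.FaceDetectorOptions(base_options=base_options)
detector = vision.FaceDetector.create_from_options(options)

# Load an image and run the model on it
image = mp.Image.create_from_file('photo.jpg')
detect_results = detector.detect(image)
print(detect_results)
```

The other programs swap in the matching classes (for example, FaceLandmarkerOptions and FaceLandmarker) but otherwise look much the same.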
Face detection
- Open up the Chapter 8 readings about the MediaPipe face detection model
- Open the mediapipeFaceDetect.py program
- Try running the program to see how it works
Try this:
- Study the structure of the results returned by the model when it is run on an image. This is described in the Chapter 8 readings
- Uncomment the call to findFacing in runFaceDetect
- Modify findFacing by doing the following:
- Remove the pass statement from the definition of findFacing, and replace it with code that prints detect_results (temporarily, until you figure out the format of the results, then comment out the print statement)
- Experiment until you can figure out how to get the left ear and left eye coordinates out of the detection results
- Do the same with the right ear and right eye coordinates (assign variables to hold these four values)
- Compute the distance in the x dimension between the left ear and eye, and do the same for the right ear and eye
- Write an if statement that checks the distances: if the distance between the left eye and ear is small enough, and the distance between the right eye and ear is large enough, print that the person is facing toward the left
- Do something similar for facing toward the right
- If neither of these holds, then print that the person is facing forward
- Remove the temporary print statement once everything works (a hedged sketch of one possible findFacing follows this list)
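If you get stuck, here is a rough sketch of the shape findFacing could take. It is not the official solution: it assumes the Tasks-API FaceDetectorResult structure, the BlazeFace keypoint ordering (0 right eye, 1 left eye, 2 nose tip, 3 mouth, 4 right ear, 5 left ear), and threshold values that are only guesses:

```python
def findFacing(detect_results):
    # Hedged sketch: assumes detect_results is a FaceDetectorResult and the
    # keypoints use the BlazeFace order (0 right eye, 1 left eye, 2 nose tip,
    # 3 mouth, 4 right ear tragion, 5 left ear tragion).
    if not detect_results.detections:
        return
    keypoints = detect_results.detections[0].keypoints
    right_eye_x = keypoints[0].x
    left_eye_x = keypoints[1].x
    right_ear_x = keypoints[4].x
    left_ear_x = keypoints[5].x

    # Distances in the x dimension, in normalized [0, 1] image coordinates
    left_dist = abs(left_ear_x - left_eye_x)
    right_dist = abs(right_ear_x - right_eye_x)

    # The thresholds are guesses: print the two distances for real images
    # and tune them. If your camera feed is mirrored, the left/right labels
    # may need to be swapped.
    if left_dist < 0.03 and right_dist > 0.06:
        print("Facing left")
    elif right_dist < 0.03 and left_dist > 0.06:
        print("Facing right")
    else:
        print("Facing forward")
```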
Facial landmark detection
- Open up the Chapter 8 readings about the MediaPipe facial landmark model
- Open the mediapipeFaceLandmark.py program
- Try running the program to see how it works
Try this:
- Study the structure of the results returned by the model when it is run on an image. This is described in the Chapter 8 readings
- Uncomment the call to findEyes in runFacialLandmarks
- Modify findEyes by doing the following:
- Remove the pass statement from the definition of findEyes, and replace it with code that prints detect_results (temporarily, until you figure out the format of the results, then comment out the print statement)
- Examine the blendshapes, and determine which ones will help you discover whether the person’s eyes are open or closed
- Experiment until you can figure out how to get the correct blendshape information from the detection results
- Determine a threshold or thresholds for when the blendshapes you have chosen indicate that eyes are closed
- Implement an if statement to test whether the eyes are open or closed, and print a message accordingly: “Eyes closed!” or “Eyes open”
- Remove the temporary print statement once everything works (a hedged sketch of one possible findEyes follows this list)
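Again, a hedged sketch of one possible findEyes, not the assignment’s solution. It assumes the FaceLandmarker result includes blendshapes, that the relevant blendshape names are 'eyeBlinkLeft' and 'eyeBlinkRight', and that 0.5 is a workable threshold; check all three against the Chapter 8 readings and your own printouts:

```python
def findEyes(detect_results):
    # Hedged sketch: assumes detect_results is a FaceLandmarkerResult with
    # blendshape output enabled; each blendshape is a Category object with
    # a category_name and a score between 0 and 1.
    if not detect_results.face_blendshapes:
        return
    scores = {cat.category_name: cat.score
              for cat in detect_results.face_blendshapes[0]}

    # The names 'eyeBlinkLeft'/'eyeBlinkRight' and the 0.5 threshold are
    # assumptions; print the scores dictionary to verify and tune them.
    if scores.get('eyeBlinkLeft', 0) > 0.5 and scores.get('eyeBlinkRight', 0) > 0.5:
        print("Eyes closed!")
    else:
        print("Eyes open")
```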
Hand landmark detection
- Open up the Chapter 8 readings about the MediaPipe hand landmark model
- Open the mediapipeHand.py program
- Try running the program to see how it works
Try this:
- Study the structure of the results returned by the model when it is run on an image. This is described in the Chapter 8 readings
- Uncomment the call to findHandPose in the program’s run function
- Modify findHandPose by doing the following:
- Remove the pass statement from the definition of findHandPose, and replace it with code that prints detect_results (temporarily, until you figure out the format of the results, then comment out the print statement)
- Examine the landmarks that are returned. Determine some subset of landmarks that you could use to distinguish between someone making a fist and someone with an open palm (fingers up, palm forward).
- Experiment until you can figure out how to get the coordinates of those landmarks out of the detection results
- Determine what relationship between the landmark coordinates indicates a fist versus an open palm
- Implement an if statement to test whether the person is making a fist or an open palm, and print a message accordingly: “Palm” or “Fist”
- Remove the temporary print statement once everything works (a hedged sketch of one possible findHandPose follows this list)
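One possible shape for findHandPose, sketched under assumptions: detect_results follows the HandLandmarker result structure, the standard 21-point hand numbering applies (fingertips at 8, 12, 16, 20; the PIP joints below them at 6, 10, 14, 18), and the hand is roughly upright in the frame:

```python
def findHandPose(detect_results):
    # Hedged sketch: assumes detect_results is a HandLandmarkerResult and
    # the hand is roughly upright. Uses the standard 21-landmark numbering;
    # note that y grows downward in image coordinates, so an extended
    # (pointing-up) fingertip has a smaller y than the joint below it.
    if not detect_results.hand_landmarks:
        return
    lm = detect_results.hand_landmarks[0]
    tips = [8, 12, 16, 20]   # index, middle, ring, pinky fingertips
    pips = [6, 10, 14, 18]   # the PIP joints below those fingertips
    extended = sum(1 for t, p in zip(tips, pips) if lm[t].y < lm[p].y)

    # Requiring 3 of 4 fingers extended is an arbitrary cutoff; adjust it
    # after experimenting with real images.
    if extended >= 3:
        print("Palm")
    else:
        print("Fist")
```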
Body pose landmark detection
- Open up the Chapter 8 readings about the MediaPipe body pose landmark model
- Open the mediapipePose.py program
- Try running the program to see how it works
Try this:
- Study the structure of the results returned by the model when it is run on an image. This is described in the Chapter 8 readings
- Uncomment the call to findHands in the program’s run function
- Modify findHands by doing the following:
- Remove the pass statement from the definition of findHands, and replace it with code that prints detect_results (temporarily, until you figure out the format of the results, then comment out the print statement)
- Examine the landmarks that are returned. Determine which landmarks would tell you the relative position of the person’s hands and their head: we want to tell whether someone’s hands are above their head, or below or even with it
- Experiment until you can figure out how to get the coordinates of those landmarks out of the detection results
- Determine what relationship between the landmark coordinates indicates hands up versus hands not up
- Implement an if statement to test whether the person has their hands up, and print a message accordingly: “Hands up” or “Hands down”
- Remove the temporary print statement once everything works (a hedged sketch of one possible findHands follows this list)
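A sketch of one possible findHands, assuming the PoseLandmarker result structure and the standard 33-landmark numbering (0 nose, 15 left wrist, 16 right wrist). Since y grows downward in image coordinates, “above the head” means a smaller y value:

```python
def findHands(detect_results):
    # Hedged sketch: assumes detect_results is a PoseLandmarkerResult and
    # the standard 33-landmark numbering (0 nose, 15 left wrist,
    # 16 right wrist). Comparing against the nose is a stand-in for
    # "the head"; you could also use the eyes or ears.
    if not detect_results.pose_landmarks:
        return
    lm = detect_results.pose_landmarks[0]
    nose_y = lm[0].y
    if lm[15].y < nose_y and lm[16].y < nose_y:
        print("Hands up")
    else:
        print("Hands down")
```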
Object detection (OPTIONAL, just for fun!)
If you like, experiment with the Object Detection code as well, and try to figure out some of the categories of objects it can detect.
Next steps…
Explore the two websites listed below. Each hosts hundreds of thousands or even millions of models trained for various purposes. Hugging Face is focused more on transformer models than on CNNs or object detection, but both have interesting work.