Homework 4

Author

Susan Eileen Fox

Published

December 1, 2025

Overview

This assignment may be completed individually, or in teams of 2 or 3. Form a team and let me know, so I can set things up in Moodle.

  • Do not share or borrow solutions from students outside your team members.
  • Do not use AI assistants to generate code for you.
  • Do ask preceptors and instructors for help with the code, including what to write and debugging.
  • Do ask peers or AI assistants for help in understanding error messages, but not for help in writing or debugging your work.

This homework assignment will ask you to demonstrate your skills at working with machine learning for computer vision tasks. Topics for this assignment include:

  • Using Mediapipe models to identify gestures, body or face poses
  • Training a convolutional neural network to classify images
  • Working with an object detection or segmentation model
  • Analyzing the result of image-generating models

To hand in: You will use the ICA 16 Github assignment to submit your solutions to questions 1 and 2. You will also use one or two Google Colab notebooks, submitting links to them in your Github repo (as directed below), and you will submit a separate document with your write-up for question 6.

Handling images, videos and models:

  • The Github assignment will include just those images you need for this assignment
  • Because videos are often too large to store in Github, copy any video files into your project but do not add them to the files managed by Git
  • Models are even larger than videos, so copy any existing models into your project and do not add them to Git

Question 1: Working with Mediapipe models, part one

For this question, you will choose one of the four Mediapipe models, and demo programs, described in Chapter 8.1 and [ICA 16](https://comp-194-vision-master.github.io/PublishedMaterials/In-Class-Activities/ICA16-UsingMLModels.html).

Pick one, and complete the extensions described in the in-class activity. These extensions involve pulling apart the detection result to get the coordinates of features it has identified, and then using some threshold or calculation on those features to decide what is happening in the image. You then print the outcome (you can also display it on the image along with the visualization of the results, if you like, but it isn’t required).
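For instance, the hands-above-head check in the body pose program can come down to comparing y-coordinates. The sketch below assumes the detection result has already been unpacked into normalized landmark coordinates (where y = 0.0 is the top of the image); the function name and threshold logic here are illustrative, not the required solution:

```python
def hands_above_head(nose_y, left_wrist_y, right_wrist_y):
    """Decide whether both hands are above the head.

    Inputs are normalized y-coordinates (0.0 = top of image,
    1.0 = bottom), as extracted from a Mediapipe pose result.
    A wrist above the head therefore has a SMALLER y than the nose.
    """
    if left_wrist_y < nose_y and right_wrist_y < nose_y:
        return "hands above head"
    return "hands below or even with head"
```

Your actual helper would pull these y-values out of the detection result's landmark list before making the comparison.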

Specifications for Question 1

Base specifications:

  • Extensions are added to the designated helper function (findFacing, findEyes, findHandPose, or findHandsUp)
  • The function includes code that extracts appropriate values from the detection result
  • The function implements reasonable calculations and/or thresholds for deciding what is happening in the image
    • facing left, right, or forward for the face detection program
    • eyes open or shut for the facial landmark program
    • hand in fist or open palm (vertical, fingers pointing upward) for hand landmark program
    • hands above head or below/even with head for body pose program
  • Function prints or displays a message correctly describing the result (some inaccuracy is okay)

Extended specifications:

  • Code includes a triple-quoted docstring that describes the function’s input, purpose, and results
  • Student name is at the top of the file
  • Any completed TODO comments have been removed
  • File format has appropriate style:
    • import statements at the top
    • then function definitions with no script elements
    • then the main script, all inside the if __name__ == '__main__': statement

Ratings:

  • To receive a gold rating, complete at least 5 specifications, at least 4 of them base
  • To receive a silver rating, complete at least 4 specifications, at least 3 of them base
  • To receive a bronze rating, complete at least 3 specifications, at least 2 of them base

Question 2: Working with Mediapipe models, part two

For this question, you will choose a different one of the four Mediapipe models, and demo programs, described in Chapter 8.1 and ICA 16.

Pick one, and complete the extensions described in the activity. These extensions involve pulling apart the detection result to get the coordinates of features it has identified, and then using some threshold or calculation on those features to decide what is happening in the image. You then print the outcome (you can also display it on the image along with the visualization of the results, if you like, but it isn’t required).

Specifications for Question 2

Base specifications:

  • Extensions are added to the designated helper function (findFacing, findEyes, findHandPose, or findHandsUp)
  • The function includes code that extracts appropriate values from the detection result
  • The function implements reasonable calculations and/or thresholds for deciding what is happening in the image
    • facing left, right, or forward for the face detection program
    • eyes open or shut for the facial landmark program
    • hand in fist or open palm (vertical, fingers pointing upward) for hand landmark program
    • hands above head or below/even with head for body pose program
  • Function prints or displays a message correctly describing the result (some inaccuracy is okay)

Extended specifications:

  • Code includes a triple-quoted docstring that describes the function’s input, purpose, and results
  • Student name is at the top of the file
  • Any completed TODO comments have been removed
  • File format has appropriate style:
    • import statements at the top
    • then function definitions with no script elements
    • then the main script, all inside the if __name__ == '__main__': statement

Ratings:

  • To receive a gold rating, complete at least 5 specifications, at least 4 of them base
  • To receive a silver rating, complete at least 4 specifications, at least 3 of them base
  • To receive a bronze rating, complete at least 3 specifications, at least 2 of them base

Question 3: Data preparation for image classification

In ICA 17, you work through an example of training a CNN to perform image classification. The second half of the activity asks you to repeat the process for a new dataset.

Open the ICA Colab Notebook, make a copy for your team, and work through these sections:

  • Step 1: Reading in the data
  • Step 2: Examining the data
  • Step 3: Preprocessing the data
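Conceptually, the preprocessing "mini-model" in Step 3 rescales pixel values from the [0, 255] range to [0, 1]. The notebook does this with Keras layers; as a rough plain-NumPy sketch of the same idea (the shapes and values below are made up):

```python
import numpy as np

def preprocess(image, label):
    """Rescale uint8 pixels to floats in [0.0, 1.0].

    Mirrors what a Rescaling(1/255) step does when mapped over
    the (image, label) pairs loaded with as_supervised=True.
    """
    return image.astype(np.float32) / 255.0, label

# A fake 2x2 grayscale "image" with its label
img = np.array([[0, 255], [128, 64]], dtype=np.uint8)
scaled, lbl = preprocess(img, 3)
```

In the notebook, the equivalent step gets mapped over both the training and validation datasets.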

Specifications for Question 3

Base specifications:

  • Code block successfully reads in both training and validation data (using the as_supervised argument)
  • Code block displays 3x3 grid of sample images from the start of the dataset
  • Code block correctly creates the preprocess mini-model to adapt the dataset
  • Code block correctly applies the preprocess mini-model to both training and validation sets

Extended specifications:

  • A link to the Colab Notebook is added to the README.md file in Github
  • Student names are added to the top of the Colab notebook
  • Any completed TODO comments have been removed

Ratings:

  • To receive a gold rating, complete at least 5 specifications, at least 3 of which are base
  • To receive a silver rating, complete at least 4 specifications, at least 3 of which are base
  • To receive a bronze rating, complete at least 3 specifications, at least 2 of which are base

Question 4: Training a CNN

In ICA 17, you work through an example of training a CNN to perform image classification. The second half of the activity asks you to repeat the process for a new dataset.

Open the ICA Colab Notebook, make a copy for your team, and work through these sections:

  • Step 4: Setting up the CNN
  • Step 5: Compiling the model
  • Step 6: Training the network
  • Step 7: Visualizing the results
  • Step 8: Trying another model
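For Step 7, note that the history object returned by model.fit is just per-epoch lists of metrics; the numbers below are made up to show the formatting idea:

```python
def history_lines(history_dict):
    """Format per-epoch loss and accuracy, one line per epoch.

    history_dict has the shape of a Keras History.history dict:
    lists of per-epoch values keyed by metric name.
    """
    lines = []
    for epoch, (loss, acc) in enumerate(
            zip(history_dict["loss"], history_dict["accuracy"]), start=1):
        lines.append(f"Epoch {epoch}: loss={loss:.3f}  accuracy={acc:.3f}")
    return lines

# Made-up numbers standing in for an actual training run
fake_history = {"loss": [1.8, 1.2, 0.9], "accuracy": [0.35, 0.55, 0.68]}
print("\n".join(history_lines(fake_history)))
```

In the notebook you would pass model.fit(...).history instead of fake_history, or plot the same lists with matplotlib.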

Specifications for Question 4

Base specifications:

  • Code block correctly builds the model similar to the CIFAR10 model
  • Code block correctly configures the model for training
  • Code block correctly runs the training algorithm
  • Code block(s) correctly display the loss and accuracy epoch-by-epoch
  • Final code blocks correctly set up the VGG-19 model and train it on the data

Extended specifications:

  • A link to the Colab Notebook is added to the README.md file in Github
  • Student names are added to the top of the Colab notebook
  • Any completed TODO comments have been removed

Ratings:

  • To receive a gold rating, fully complete 5 specifications, at least 4 base, and partially complete at least 1 more
  • To receive a silver rating, fully complete at least 4 specifications, at least 3 base, and partially complete at least 1 more
  • To receive a bronze rating, fully complete at least 3 specifications, at least 2 base, and partially complete at least 1 more

Question 5: Experimenting with Object Detection and Segmentation Models

Complete the steps outlined in ICA 18.
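The detection and segmentation parts both follow the same loop shape: run the model over each image in the Images folder, save the results, then display them. A rough sketch, where model and display_results are placeholders for the notebook's detector and displayResults helper (the *.jpg pattern is an assumption about the folder's contents):

```python
from pathlib import Path

def run_detector(model, display_results, images_dir="Images"):
    """Run a detection model on every image in a folder.

    model and display_results stand in for the ICA 18 notebook's
    detector and displayResults helper; only the loop-save-display
    structure is the point here.
    """
    results = []
    for image_path in sorted(Path(images_dir).glob("*.jpg")):
        results.append((image_path, model(image_path)))   # call and save
    for image_path, result in results:                    # then display
        display_results(image_path, result)
    return results
```

The same skeleton works for the segmentation model; only the model call and the display step change.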

Specifications for Question 5

Base specifications:

  • Object detection:
    • Code block implements a loop over the images in the Images folder
    • Code correctly calls the model and saves the results
    • Code block iterates through the results and calls the displayResults function
    • Text block includes a written summary of interesting results: accuracies, inaccuracies, patterns in what is detected, or not
  • Image segmentation:
    • Code block implements a loop over the images in the Images folder
    • Code correctly calls the model and saves the results
    • Code block iterates through the results and calls the displayResults function
    • Text block includes a written summary of interesting results: accuracies, inaccuracies, patterns in what is detected, or not
  • Training YOLO:
    • Text block describes students’ experience with training YOLO on the Kitti dataset

Extended specifications:

  • Top text block is updated to include students’ names
  • Any completed TODO comments have been removed

Ratings:

  • To receive a gold rating, complete at least 8 specifications in all, at least 7 of them base specifications.
  • To receive a silver rating, complete at least 7 specifications in all, at least 6 of them base.
  • To receive a bronze rating, complete at least 6 total specifications, at least 5 of them base.

Question 6: Evaluating Generative AI for images

For this question, you won’t be writing code. Training generative AI systems is beyond our capabilities, other than finetuning existing models. Instead, you will evaluate several different image-generating AI systems. You will briefly describe what you did, and what the results were. Be sure to include (1) your prompt or prompts and (2) screenshots or saved images of the generated images.

Step 1: Read about good image prompt writing

To get good results from a generative AI system, you need to give detailed descriptive prompts (requests). Read the source below to learn how to craft a good prompt.

Harvard University IT: Getting started with prompts for image-based Generative AI Tools

Step 2: Create a starter prompt

Decide on a single safe-for-work (family-friendly) prompt that you will use throughout this assignment (if you are a team, agree on the prompt, and maybe work through the first example with all members present).

Step 3: Test three different AI image generators

We will look at three different AI image generators:

Pick one, and work through your original prompt, plus 2-3 modifications you use to get closer to what you had in mind (again, if you are a team, work through this first one together). Record the original prompt, the exact wording of each modification, and the resulting image from each stage.

Next, try the other two systems, using the same original prompt and modification prompts. Record all the images produced.

Special note: Reve produces multiple images for each prompt. Pick the two you like best, and change your modification prompt to read: “Based on @1 and @2, …” and then add your original wording.

Finally, write a report (see Susan’s Sample GenAI Report for an example) where you describe your prompts and what each AI system produced. Then analyze the results: which ones did you like better, and why?

Overall, how well does this kind of system work as an assistant to a human designer or artist? Is this an improvement upon “clip art” or downloading images from websites?

I chose these three AI systems based on an article by Harry Guinness that talked about good current AI generators. I chose ones that had free modes.

Specifications for Question 6

Base specifications:

  • Report clearly describes initial prompt, and prompt is detailed and appropriate
  • Report lists 2-3 modification prompts also used
  • Report shows all images produced for these prompts and all three AI systems
  • Report discusses positives and negatives of these specific results
  • Report analyzes which systems performed better or worse on this particular set of prompts

Ratings:

  • To receive a gold rating, complete fully at least 4 specifications in all, 1 partially
  • To receive a silver rating, complete fully at least 3 specifications in all, 1 partially
  • To receive a bronze rating, complete fully at least 2 total specifications, 2 partially

What to hand in

  • For questions 1 and 2, you should complete ICA 16 in its Github repo.
    • be sure to check the specifications here to make sure you have completed it properly
    • add a note in the Moodle assignment telling us that you have completed this ICA
  • For questions 3 and 4, your work will be in a Colab notebook; submit a link to the notebook to Moodle
    • be sure it is shared with all team members, with Susan, and with Jay and Sam
    • put all team member names in a text block at the top of the file
  • For question 5, your work will be in a Colab notebook
    • be sure it is shared with all team members, with Susan, and with Jay and Sam
    • put all team member names in a text block at the top of the file
  • For question 6, your work will be a report; submit it to the appropriate homework assignment in Moodle!
    • put all team member names at the top of the report
    • if it is a link to a Google doc, check that it is shared with all team members, with Susan, and with Jay and Sam

In the assignment in Moodle, include a note telling me that you have submitted ICA 16, and include links to the two Colab notebooks. You can either upload the report as a PDF file, or include a link to a Google Docs file.

Make sure both notebooks and all Google Docs are shared with me, with commenting privileges!