ICA: OpenCV Basics

Author

Susan Eileen Fox

Published

February 23, 2026

Overview

In this activity, you will practice with the fundamental tools of OpenCV and the Numpy representation of images. In addition, we’ll look at image arithmetic, blending of images, and making image masks and threshold images.

The Github repository for this assignment will contain a starter code file, ICAIntro.py. Put your code in this file, as directed by the TODO comments.

Before working on this activity, I encourage you to download the following zip files, unzip them, and move them into the Github repo for this activity. Do not add them to the files managed by Git; you do not want to try to push these files to Github!

Image and OpenCV Basics

Reading and displaying images

The script below reads in and displays a single image. Review Chapter 1 from your Vision readings if you need help understanding what each step does. Read the script carefully, and predict for yourself and your teammate what you think the program will do.

import cv2

image = cv2.imread('SampleImages/mushrooms.jpg')
cv2.imshow("My image", image)
cv2.waitKey()

Finally, run your copy of this program to see if your predictions were correct.

Try this: Copy this program, and follow the steps below to write a program that displays all the images in the SampleImages folder (you don’t have to loop through subfolders, though it can be done).

  • Add an import statement to import the os module. Use the os.listdir command to get a list of all the filenames in the SampleImages folder.
  • Loop over those filenames
  • Inside the loop, check whether the filename ends with png, jpg, or jpeg. If not, skip it
  • If it is an image name, then use cv2.imread to read in the image
  • Use cv2.imshow to display the image: put all images in the same window by using the same window name
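The steps above can be sketched as the functions below. This is just one rough outline, not the only way to structure it: the filename check is pulled into its own helper, and the folder name is assumed to match this activity's repo layout.

```python
import os

def isImageFile(filename):
    """True for filenames ending in png, jpg, or jpeg (any letter case)."""
    return filename.lower().endswith((".png", ".jpg", ".jpeg"))

def showAllImages(folder="SampleImages"):
    """Display every image in the folder, one at a time, in a single window."""
    import cv2   # imported here so isImageFile works even without OpenCV
    for filename in os.listdir(folder):
        if not isImageFile(filename):       # skip non-image files
            continue
        image = cv2.imread(os.path.join(folder, filename))
        cv2.imshow("Sample image", image)   # same window name = one window
        cv2.waitKey()

# showAllImages()   # uncomment to run against your SampleImages folder
```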

Experimenting with colors

The program below, included in your Github repo as colorBackground.py, uses Numpy tools to construct a new image from scratch. The zeros function creates an array of the size and data type we specify, so we can make a new image array without reading from a file! This program, as written, sets every pixel in the array to a dark red color.

import cv2
import numpy as np

image = np.zeros((200, 200, 3), np.uint8)

image[:,:] = (0, 0, 128)   # set every row and every column of image to the color tuple

cv2.imshow("Color", image)
cv2.waitKey()

Try this: Run this program to be sure what it does. Then make four copies of lines 6 to 9. You can comment out my original line by putting # at the start of the line (or use the keyboard shortcut listed under the Code menu as Comment with Line Comment).

For each copy, experiment with color values until you create the following:

  • A pleasing purple color
  • A dark blue-green color
  • A murky green with a hint of yellow
  • A pale orange color

Drawing on images

Chapter 1 of your Vision readings talks about the basic drawing tools OpenCV provides. You might want to open that document, or the OpenCV documentation, for assistance during this part.

Try this: Create a new Python file, and in it write a script to create a simple picture.

  • First, choose your background image, which could be one of the SampleImages, an image of your own that you copy into your PyCharm project, or a blank image with a background color of your choice
  • Next, choosing your own colors, size, and details, draw a smiling stick figure on your image using OpenCV’s drawing functions
  • Be sure to save your new picture to a new file

Converting color representations

The cvtColor function in OpenCV will convert an image from one color representation to another. The key to this function is the code that we pass as its second argument. The code tells the function what the representation of the current image is, and what representation we want. Below are some typical conversion codes:

Code Description
cv2.COLOR_BGR2GRAY Converts BGR to grayscale
cv2.COLOR_BGR2HSV Converts BGR to HSV
cv2.COLOR_GRAY2BGR Converts grayscale to BGR
cv2.COLOR_BGR2RGB Converts BGR to RGB

See the Color Space Conversions page in OpenCV’s documentation for a complete list.

The function below is in the starter code. Try it with the sample call in the file, then try varying the image, and examine the results. Do you understand how to use this function?

def testConversions(origImage):
    """Takes in an image, and it converts it to grayscale and HSV. It also converts the grayscale back to BGR,
    displays the results, and prints the shapes of the images."""
    gray1 = cv2.cvtColor(origImage, cv2.COLOR_BGR2GRAY)
    BGRIm2 = cv2.cvtColor(gray1, cv2.COLOR_GRAY2BGR)
    HSVIm = cv2.cvtColor(origImage, cv2.COLOR_BGR2HSV)

    print(origImage.shape, gray1.shape, BGRIm2.shape, HSVIm.shape)
    cv2.imshow("Original", origImage)
    cv2.imshow("Gray1", gray1)
    cv2.imshow("BGR2", BGRIm2)
    cv2.imshow("HSV", HSVIm)
    cv2.waitKey()

Images as matrices

Creating image arrays with Numpy

The list of functions below allows us to create Numpy arrays from scratch. With these, we can make a blank canvas for OpenCV's drawing functions, create synthetic images, or just make a black and white mask or frame to apply to an image.

Function Description
array Takes a sequence data type (list, string, tuple, or Numpy array), and builds a Numpy array with the same shape. Optional input allows us to specify the type of data for the array.
zeros Takes in a tuple giving dimensions, and an optional input for the type of data, and it makes an array with the given dimensions and type, all filled with zeros.
ones Similar to zeros but it fills the array with the value 1.
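As a quick illustration of the three functions (the sizes here are arbitrary choices):

```python
import numpy as np

# array: build an array from an existing sequence (here a nested list)
small = np.array([[1, 2], [3, 4]], np.uint8)

# zeros: a 100x100 three-channel image, every value 0 (all black)
black = np.zeros((100, 100, 3), np.uint8)

# ones: every value 1; scaling by 255 gives an all-white image
white = np.ones((100, 100, 3), np.uint8) * 255

print(small.shape, black.shape, white[0, 0])
```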

Try this: Create a function blackCanvas that takes in two inputs, the width and height for the canvas, and it creates a new color Numpy array all filled with black, with the given width and height. It should return the new image array. In your main script, try several test calls to this function with different sizes of images, and draw a square centered in each.

Image Arithmetic

Remember that both Numpy and OpenCV provide tools for doing arithmetic on images. This means we can add or subtract a constant amount to every value in an image, changing the overall brightness of the image. We can also add or subtract images from each other, so long as they are the same shape: corresponding values in the image arrays are added or subtracted from each other.

Subtracting images

We can perform a simple kind of motion detection on frames from a video, by subtracting one frame from its predecessor. The Numpy module tells Python how to interpret the minus sign when applied to Numpy arrays. OpenCV has two functions for taking the difference: subtract and absdiff. The first one just does subtraction, the second one computes the absolute value of the difference for each corresponding value in the arrays.
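To see the three behaviors side by side without needing image files, here is a tiny Numpy-only demonstration; the clipped and absolute versions below imitate what cv2.subtract and cv2.absdiff do, computed here with plain Numpy.

```python
import numpy as np

a = np.array([[10]], np.uint8)
b = np.array([[60]], np.uint8)

# Numpy's minus sign wraps around (modulo 256) on uint8 values
wrap = a - b                                       # 10 - 60 -> 206

# cv2.subtract clips negative results to 0; imitated with np.clip
wide = a.astype(np.int16) - b.astype(np.int16)     # true difference: -50
clipped = np.clip(wide, 0, 255).astype(np.uint8)   # -> 0

# cv2.absdiff keeps the magnitude of the difference
absdiff = np.abs(wide).astype(np.uint8)            # -> 50

print(wrap[0, 0], clipped[0, 0], absdiff[0, 0])
```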

Try this: In the script section of the starter code file, add a script to experiment with subtracting two frames from a video (this video is one we will work with later, it shows Prof. Fox holding an orange ball and moving it around in the air):

  • Start by reading in frame1.jpg and frame2.jpg
  • Use imshow to display the two images; notice that frame 1 is slightly different from frame 2
  • Define a new image diff1 to be the result of calculating frame 1 minus frame 2, using the minus sign (-)
  • Define another new image diff2 to hold the result of OpenCV’s subtract function applied to frame 1 and frame 2
  • Define a third new image diff3 to hold the result of OpenCV's absdiff function
  • Use imshow to display all three diff images
  • Examine the results. Based on what you know about how Numpy and OpenCV handle arithmetic, and how subtract and absdiff differ from each other, explain why the three difference images look the way they do, and how they differ from each other.

Blending images

We can use image arithmetic to blend two images together, using the addWeighted function from OpenCV. To blend two images, we want to look at corresponding pixels, and average their two red values to make a new red value, and do similarly with green and blue channel values. (It is kind of amazing that this works, actually!)

Resizing images

In order to blend two images, we need to make them the same size and shape. We could do that using the Numpy slicing operators, but here we will take a look at a function that lets you resize an image either by scaling it to a specific size, or by scaling it by given factors in the x and y directions.

cv2.resize(<img>, (<wid>, <hgt>), fx=?, fy=?)

The resize function takes in the image to be resized and a tuple giving the new size as width followed by height, and it returns a new image of that size. However, if we set the width and height in the tuple to zero, then we can provide optional inputs that give the new size as a factor of the original. The optional input fx specifies the factor in the x dimension, and fy in the y dimension. If we set fx to 0.5, for example, then the new image will have a width one half the size of the original's width.

The examples below illustrate different ways of calling resize.

Examples Meaning
cv2.resize(src, (100, 100)) Returns a new stretched/squashed image that is 100 x 100 pixels
cv2.resize(src, (0, 0), fx = 2, fy = 2) Returns a new image twice the size of the original
cv2.resize(src, (0, 0), fx = 0.5, fy = 1.0) Returns a new image whose width is half the original size

Try this: In the script section of the activity code file, read in three images from SampleImages (any ones you choose). Use the resize command to change the second and third images to match the size of the first. Be sure to imshow the images so that you can check your work. Call these images img1, img2, and img3.

Blending with arithmetic

Examine the code fragment below (also reproduced in your activity code file).

blendImg1 = cv2.add(img1, img2)
cv2.imshow("Blend by adding", blendImg1)

Try this fragment on the resized images you created in the previous section, and observe the results. If we just add the two images, the result is too bright, and we would get tons of overflow artifacts if we used Numpy addition instead. We want to average the two image values, not just add them. But consider this: if we first add the images and then divide by 2 (the way we typically think about computing an average), the result will be distorted. OpenCV's addition avoids overflow by capping the values at 255 whenever they would have added up to more than 255. That means that adding and then dividing by 2 will produce a different result than dividing each original image by 2 and then adding.

Try the code below on the images you resized, and compare blend1 and blend2.

sumImg = cv2.add(img1, img2)
blend1 = cv2.divide(sumImg, 2)

divImg1 = cv2.divide(img1, 2)
divImg2 = cv2.divide(img2, 2)
blend2 = cv2.add(divImg1, divImg2)

We can also use Numpy commands to compute the average more easily, so long as we remember to divide first, and then add, to avoid overflow artifacts. Try this:

avgImg = 0.5 * img1 + 0.5 * img2
blend3 = avgImg.astype(np.uint8)
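A quick numeric check of the claim above, using a single pair of pixel values and plain Python ints (min imitates the way cv2.add caps sums at 255):

```python
# Two bright pixel values from corresponding positions in two images
v1, v2 = 200, 180

# Add first (capping at 255, as cv2.add does), then divide:
addThenDivide = min(v1 + v2, 255) // 2   # min(380, 255) // 2 -> 127

# Divide each value first, then add -- the true average:
divideThenAdd = v1 // 2 + v2 // 2        # 100 + 90 -> 190

print(addThenDivide, divideThenAdd)
```

The add-first version loses information: every pair of pixels whose sum exceeds 255 ends up at the same capped value before the division.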

Weighted averages

A normal average (add the two numbers and divide by 2) weights both pixels/images equally: 50% from one image and 50% from the other. The previous example multiplied each image by 0.5; by shifting from division to multiplication, we can see that we are really multiplying each image by a weight: the percentage of the final image that should come from each image.

We can change the percentages by changing the weights. Just make sure they are each between 0.0 and 1.0 and that they add up to 1.0, so that the resulting image has the same brightness as the originals.

Try varying the weights for the example above, and examining the resulting blend.

The addWeighted function

OpenCV actually provides a function, addWeighted, to perform a weighted average of two images.

blend4 = cv2.addWeighted(img1, 0.5, img2, 0.5, 0)   

The addWeighted function has 5 required inputs: the first image to blend, the weight to multiply the first image by, the second image to blend, the weight for the second image, and a constant to add to the result. In other words, the function computes this mathematical formula:

\[newIm = \alpha \cdot img1 + \beta \cdot img2 + \gamma\]

Try this as an alternative to the Numpy arithmetic version, and compare the results.

Try this: Create a function phaseBlend that takes in two images presumed to be the same size. This function should include the following steps:

  • Set up one weight value, w, to be 0.0 (w is an accumulator variable for this loop)
  • Repeat with a for loop, enough times for w to reach 1.0 (experiment or calculate)
  • Inside the for loop, use w and 1 - w as the weights, and blend the two input images, assigning the result to a variable
  • Also in the loop, imshow the blended result, and include a waitKey
  • Finally, in the loop, add a small amount to w to change the weight for next time (use 0.1 or 0.05, or similar)
  • Optional extension: Instead of trying to time the loop to stop when w gets to 1.0, we could change the direction of the blend and start reducing w each time (until it gets back to 0.0). To do this, we need another accumulator variable, deltaW, to hold the amount to change w by each time. It will stay at 0.1 or 0.05 until w reaches 1.0, and then it should change to -0.1/-0.05.
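One way to sketch the loop above (not the only way): the weight sequence is pulled into its own helper, and the OpenCV calls are kept inside phaseBlend. The window name and the 100 ms pause are arbitrary choices.

```python
def blendWeights(step=0.1):
    """Return the sequence of weights w from 0.0 up to 1.0, in the given steps."""
    weights = []
    w = 0.0
    while w <= 1.0 + 1e-9:        # tiny tolerance for floating-point drift
        weights.append(round(w, 2))
        w += step
    return weights

def phaseBlend(img1, img2, step=0.1):
    """Blend img1 into img2 in phases, showing each intermediate blend."""
    import cv2   # imported here so blendWeights works without OpenCV installed
    for w in blendWeights(step):
        blended = cv2.addWeighted(img1, 1 - w, img2, w, 0)
        cv2.imshow("Phase blend", blended)
        cv2.waitKey(100)          # pause 100 ms between phases
```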

Accessing channels

The split and merge functions allow us to pull apart an image’s channels, and put them back together again.

Try this: Examine the code below, also found in your starter file.

flowerIm = cv2.imread("SampleImages/wildcolumbine.jpg")
(blueChan, greenChan, redChan) = cv2.split(flowerIm)
cv2.imshow("Original", flowerIm)
cv2.imshow("Blue channel alone", blueChan)
cv2.imshow("Green channel alone", greenChan)
cv2.imshow("Red channel alone", redChan)
cv2.waitKey()

When you run this code, what happens, and why? Discuss with classmates, preceptor, or instructor if you aren’t sure why the channels appear the way they do when displayed.

Add a call to merge to your code file, right after these lines. Merge takes in a tuple of three channels, and treats them as the blue, green, and red channels of the image. Try these variations:

  • Make an exact copy of the original, by calling merge and passing it a tuple containing the channel images in the original order (blue, then green, then red). (Be sure to imshow the result.)
  • Use zeros to make a copy of the red channel that is all filled with zeros. Call merge again, but replace the red channel with the new blank one. How does the result differ? What if you make a white image the same shape as the red channel, and then use that in merge?
  • Call merge a third time, and use the original three channels, but put them in a different order. What does the result look like?

Can you explain the results you see? If not, discuss with a neighbor or teammate, or with preceptor or faculty member.

Try this: Create a function, colorShuffle, that takes in one image as an input parameter. It will return a new image that has the three color channels randomly shuffled to a new order!

Do this:

  • Add an import statement at the top of the file to import the random module
  • Inside the function, use split to separate the three channels from each other
  • Define a variable to hold a list with the three channel arrays in it
  • Call random.shuffle and pass it the list from the previous step (this will change the list to a new ordering, try printing the list before and after to see how shuffle works)
  • Call merge and pass it the list to get the new image
  • Return the image
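For reference, the same idea can be expressed with Numpy slicing standing in for cv2.split, and np.dstack standing in for cv2.merge; this sketch runs without OpenCV, but the split/merge version you write should behave the same way.

```python
import random

import numpy as np

def colorShuffle(image):
    """Return a copy of image with its three color channels in a random order.

    Numpy slicing stands in for cv2.split here, and np.dstack for cv2.merge.
    """
    channels = [image[:, :, 0], image[:, :, 1], image[:, :, 2]]
    random.shuffle(channels)        # reorders the list in place
    return np.dstack(channels)      # stack the channels back into one image
```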

Regions of Interest

A region of interest is a section of an image that we want to focus on. Often it is the result of running computer vision algorithms to determine where something interesting is in the image, but it could also be human-designed.

At its core, we make a region of interest using Numpy’s array slicing operator. We will practice the slicing operator here, on small arrays and on images.

Key idea: Remember that Numpy often avoids copying the data in an array when we use slicing to access portions of it. Instead of a copy, slicing provides a view of the original data, limiting the indices we can see. When this happens, changes to the original array show up in the view array, and vice versa. (By contrast, the astype method does make a copy, so changes to the converted array do not affect the original.)
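You can verify the view behavior with a few lines (the array sizes here are arbitrary):

```python
import numpy as np

arr = np.zeros((4, 4), np.uint8)
view = arr[0:2, 0:2]          # slicing returns a view, not a copy
view[:, :] = 9                # write through the view...

print(arr[0, 0], arr[3, 3])   # ...and the original changes
```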

Consider the small 2d array shown in the code below.

arr1 = np.array([[2, 4, 6, 8], [3, 6, 9, 12], [4, 8, 12, 16]])
print(arr1)
[[ 2  4  6  8]
 [ 3  6  9 12]
 [ 4  8 12 16]]

If we want to access an individual element of the array, we can put its row and column indices inside square brackets, separated by commas:

print("last of second row:", arr1[1, 3])
print("second of last row:", arr1[2, 1])
last of second row: 12
second of last row: 8

If we want a subarray, we extend this notation to use slicing operators for the row or column we are selecting from. Here are a few examples:

print("middle values:", arr1[1, 1:3])
print("third and fourth columns:")
print(arr1[:, 2:4])
middle values: [6 9]
third and fourth columns:
[[ 6  8]
 [ 9 12]
 [12 16]]

Now you try some examples, putting your code in the script section indicated by a TODO comment.

  • Access just the value 16 from this array
  • Access the first column of the array
  • Select the last two elements from the first two rows (giving a 2x2 matrix)
  • Select values from every other row, and every other column, starting with the 2 at [0, 0]

When working with images, we typically use slicing in two ways: to select the color at a specific pixel, or to select a rectangular region of the picture. The code sample below illustrates two ways to select the color channels from a specific pixel location. It also draws a tiny circle at that location on the image and displays it.

img = cv2.imread("SampleImages/antiqueTractors.jpg")
col1 = img[150, 325, :]
col2 = img[150, 325]
print(col1, col2)
cv2.circle(img, (325, 150), 2, (255, 255, 255))
cv2.imshow("Image", img)
cv2.waitKey()

Notice that this is one of the places where Numpy and OpenCV’s different ordering of rows and columns comes into play: Numpy orders the location as (row, column), but when we specify the location for drawing the circle, we give it as (x, y).

To access a region of an image, we select the range of rows, then the range of columns, and then place a colon (:) for the channel dimension, to indicate we want all three channels.

steer = img[150:225, 410:460, :]
cv2.rectangle(img, (410, 150), (460, 225), (255, 255, 255))
cv2.imshow("Image", img)
cv2.imshow("ROI", steer)
cv2.waitKey()

Try this: Create a function centerCrop that takes in an image as its input parameter. The function should make and return an ROI that is 200x200 pixels, centered in the image.

  • Get the height and width of the image, and from them calculate the center row and center column
  • Define an ROI that extends 100 pixels to either side of the center column and 100 pixels to either side of the center row
  • Display the ROI and the original, and use waitKey to pause
  • Return the ROI
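A sketch of the cropping arithmetic, without the display calls, assuming the input image is at least 200 pixels in each dimension:

```python
import numpy as np

def centerCrop(image, size=200):
    """Return a size x size ROI centered in image (a view of the original)."""
    h, w = image.shape[:2]                 # works for color or grayscale
    centerRow, centerCol = h // 2, w // 2
    half = size // 2
    return image[centerRow - half:centerRow + half,
                 centerCol - half:centerCol + half]
```

In your own version, add the cv2.imshow and cv2.waitKey calls before returning, as the steps above describe.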

Working with Video

The table below lists the main functions you need in order to access frames from the computer’s camera. First there is the function VideoCapture, which takes one input and creates a VideoCapture object. The one input specifies which camera connected to the computer should be used (zero is the default: the built-in camera if there is one). Three methods belonging to the VideoCapture object are shown, as well.

Examples Meaning
cap = cv2.VideoCapture(0) Creates a VideoCapture object connected to specified camera
cap.isOpened() Boolean method returns true if camera connection succeeded
ret, image = cap.read() Returns a boolean (was read successful) and a frame
cap.release() Disconnects from camera

Start with the script below to get images from a built-in camera. If your computer does not have a built-in webcam, see Susan to get an external webcam!

You are going to improve this script.

import cv2

def videoFeeder(camOrFile = 0):
    vidCap = cv2.VideoCapture(camOrFile)   # connect to the chosen camera (or video file)
    while True:
        ret, img = vidCap.read()   # Read a frame from the video camera
        cv2.imshow("Webcam", img)  # Display the frame
        cv2.waitKey(10)            # Wait 10 milliseconds and then go on
    
    vidCap.release()
    
videoFeeder()

Ending the program with key input

Be sure to check the readings for today for how to use waitKey to get the key the user pressed.

  • Modify the code above like the example in Chapter 3 of our Vision readings, so that it catches the value returned by waitKey and checks it to see if the value is -1. If not, convert it to a character with chr and see if the user typed 'q'. Break out of the loop if so.
  • Add a function called processImage to the file. It should take in an image and return an image. For now just have it return the image it is passed (we will add functionality in later steps).
  • Insert a call to processImage in between the vidCap.read() line and the cv2.imshow line. Pass img to processImage, and assign the returned image to img2.
  • Change the displayed image to be img2.

Now that you have a functioning video streaming program, let’s start making improvements.

Try to hold something up in front of the camera while your program is running, and then move the object to the different corners of the video image. Did you find yourself moving the wrong way? We often have trouble with left and right movements on a video feed, because we are more used to seeing a mirror image of ourselves. We can use the OpenCV function flip to flip the image. This function will flip an image upside down, or left-to-right.

flippedIm = cv2.flip(img, flipCode)

The flip function takes in an image and a flip code which specifies whether to flip around the horizontal axis or around the vertical one. It returns a new image. A flip code of 0 will flip a picture upside down (around the horizontal axis), and a flip code of 1 will flip a picture left-to-right (around the vertical axis).

Try this: For this task, you will add an option to display the flipped version of the video feed.

  • Modify your processImage function so that it takes in an extra input: doFlip. When doFlip is True, the function should call the flip function, using flip code 1, and should return the resulting image. When doFlip is False, the function should return the original image.
  • In your main program, you will need to set up a boolean flag variable. This is a special kind of accumulator variable that just holds True or False. We call it a flag variable because it signals when a certain condition holds
    • Before the while True loop, set up the flag variable to be False
    • Inside the loop, add the flag variable to the call to processImage
    • Add to the if statement that checks the user’s input key: if the user hits the f key, we want to toggle the value of the flag variable: if it was False, it should become True, and vice versa.
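The key-handling part of that loop can be isolated into a small helper so the logic is easy to reason about; this is just one sketch, not the required structure. The & 0xFF mask is a common safeguard, since waitKey can return extra high bits on some platforms.

```python
def handleKey(keyCode, doFlip):
    """Interpret the value waitKey returned.

    Returns a (quitNow, doFlip) pair: quitNow is True when the user
    typed 'q', and doFlip is toggled when the user typed 'f'.
    """
    if keyCode == -1:                # no key was pressed this frame
        return False, doFlip
    ch = chr(keyCode & 0xFF)         # low byte holds the character code
    if ch == 'q':
        return True, doFlip
    if ch == 'f':
        return False, not doFlip     # toggle the flag variable
    return False, doFlip
```

Inside the loop you would write something like `key = cv2.waitKey(10)` followed by `quitNow, doFlip = handleKey(key, doFlip)`, breaking out of the loop when quitNow is True.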

Try this: Suppose that you wanted to be able to save individual frames from the video feed to image files.

  • Add to the if statement that checks the user’s input key, so that it checks if the user hits the s key
  • When the user hits that, save the current image to a file, using the cv2.imwrite function. This function takes two inputs: (1) a string for the filename of the file to save to, including .jpg or .png as the filename extension, and (2) the image you want to save.
cv2.imwrite("screenshot.jpg", frame)

For an extra, optional challenge, give each screenshot a unique filename, so that you can save any number you want. (Adding a counter to the loop, and attaching the counter to the filename is a great way to do this.)
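One way to build unique filenames for this optional challenge (the prefix and the zero-padding are arbitrary choices):

```python
def screenshotName(counter, prefix="screenshot"):
    """Build a filename like screenshot003.jpg from a counter value."""
    return "{}{:03d}.jpg".format(prefix, counter)
```

Keep a counter variable in the loop, call cv2.imwrite(screenshotName(count), img2) when the user hits the s key, and then add 1 to the counter.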

Masks and Thresholds

Image masks

A mask is just a black and white image. It can be created by processing a regular image with something like the threshold operations, or you can just make a black image from scratch and then draw white shapes on it.

To apply a mask to an image, the mask and the image need to be the same size. Then we use the bitwise_and operation to combine the two. Anywhere that the mask is white, it will keep the other image’s color, and anywhere that the mask is black, it will set the pixel to black.

Below is an example where we mask all but three rectangles of an image.

def maskBuilder(image):
    """Takes in an image, builds a mask with three white rectangles, and applies it to the image."""
    (h, w, d) = image.shape
    print(h, w)
    maskIm = np.zeros(image.shape, image.dtype)

    cv2.rectangle(maskIm, (0, 0), (w, h // 4), (255, 255, 255), -1)
    cv2.rectangle(maskIm, (200, 2 * h // 3), (400, 5 * h // 6), (255, 255, 255), -1)
    cv2.rectangle(maskIm, (3 * w // 5, 350), (3 * w // 4, 450), (255, 255, 255), -1)

    # Apply the mask to the input image
    newImage = cv2.bitwise_and(image, maskIm)
    return maskIm, newImage

midway = cv2.imread("SampleImages/mightyMidway.jpg")
mask, finalIm = maskBuilder(midway)
cv2.imshow("Final", finalIm)
cv2.waitKey()

Try this function out. Try changing the coordinates of the rectangles, adding shapes, and removing shapes.

Modifying the moving mask on video

In the reading you had a program that connected to the video camera, and showed a masked version of the camera image, where the only part of the video visible was a square that moved from one frame to the next, bouncing around. The starter code for this activity contains a function-based version of this program.

  • Try the program out as is; there are sample calls that run it on different video files as well as a webcam
  • Modify the program so that it draws a circle on the mask image, rather than the square it currently draws
  • Change the size of the circle drawn
  • Change the speed and direction of the mask shape’s movement by altering deltaX and deltaY

Using the threshold function

The threshold function takes in a grayscale image and produces a thresholded image that is grayscale, and often black and white. The readings go over all the many different thresholding modes that this function has. We will start with just the basic cv2.THRESH_BINARY mode for today.

Note: The threshold function returns two values as a tuple. The first value is the threshold that was used, and the second is the actual resulting thresholded image.

The function below takes in an image and a threshold value, and it returns the result of calling threshold.

def binaryThresh(image, threshValue=128):
    """Takes in an image and also has an optional input, threshValue, which can be set by the function call, or
    it defaults to the value 128. This function converts the image to grayscale and performs a binary threshold
    on it. It returns the resulting image."""

    grayImg = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    tVal, threshIm = cv2.threshold(grayImg, threshValue, 255, cv2.THRESH_BINARY)
    print("Threshold value:", tVal)
    return threshIm

In the starter code, read through this function, and then go to the bottom of the file where we have some sample calls. Try out the samples given to you, and examine the results.

Then, perform these experiments, and record the results either as a comment in the code file, or in a separate .txt file (this next part might be interesting to try with a partner):

  • Write code to try each of the images in the Coins folder (inside SampleImages) with this function. For each image, how well does the binaryThresh function isolate the coins from the background?
  • Can you modify the threshold value and improve the results? Try a few values and see: report on what you discovered.
  • Modify this function to use the combination of cv2.THRESH_BINARY and cv2.THRESH_OTSU or cv2.THRESH_TRIANGLE instead of using a fixed threshold value (see readings for how to combine and use these). Evaluate the results on all the coin pictures: does it help?
  • Look at the BallFinding folder, which contains images and videos of brightly colored balls. Pick one or two ball colors, and select one image for each. Try your function on these: how well does it work?

Combining thresholds with video (OPTIONAL)

Examine the functions processImage and threshVideo: these are a variation on the videoFeeder program from the last activity. Run the examples to see what it does.

Modify the processImage function so that it calls either cv2.threshold directly, or the binaryThresh function, to perform a threshold on the image. Examine the results. Test this in class using some of my bright and dark colorful balls and other objects.

Thresholding with color

The inRange function can isolate colorful objects better than the grayscale-based threshold function. To do this best, we need to work with HSV images, so that hue is a single value that won’t change greatly between well-lit and shadowed pixels of the same object. The inputs to inRange are as shown here:

threshImage = cv2.inRange(colorImage, lowBounds, highBounds)
  • It assumes that the colorImage is a three-channel color image. We will use HSV rather than BGR for this.
  • The lowBounds input is a tuple with three values: the lower boundary of acceptable values for each of the three channels
  • The highBounds input is a tuple with three values: the upper boundary of acceptable values for each of the three channels

Remember for HSV images that the three channels represent:

  • Hue: a value between 0 and 180 in OpenCV representing the hue of the pixel (given a standard HSV hue value, which ranges from 0 to 360, divide it by 2 to get the corresponding value in OpenCV's HSV representation)
  • Saturation: a value between 0 and 255 that represents the intensity of the color. Low saturation pixels tend to look gray; high saturation pixels have an intense version of the color
  • Value: a value between 0 and 255 that represents the brightness of the color. High value pixels are very light; low value pixels are very dark

The code snippet below shows how to use inRange on one of the ball pictures:

ballImg = cv2.imread("BallFinding/Green/Green1BG1Mid.jpg")
hsvBall = cv2.cvtColor(ballImg, cv2.COLOR_BGR2HSV)
threshImg = cv2.inRange(hsvBall, (45, 10, 0), (65, 255, 255))

cv2.imshow("Original", ballImg)
cv2.imshow("inRange", threshImg)
cv2.waitKey()

This code is included in the starter code: try it out!

  • Experiment with varying the bounds, changing one value at a time.
    • What happens if we increase the lower bound on saturation?
    • What if we lower the upper bound on either saturation or value?
    • What if we broaden or narrow the hue range?

Suppose we wanted to try this on a different picture, with a different color of ball, or on one of the coin pictures. To do that, we need to determine the correct range of hue values to pick out the object.

You can do this in one of two ways:

Option 1:

  • Bring up a picture of the object you want to track
  • Open an online color-picker that displays HSV
  • Do your best to pick a color close to your target color
  • Read its hue value, and divide it by 2, then make a range of acceptable hue values that surround it

Option 2:

  • Bring up a picture of the object you want to track (some image viewers will let you select colors from the pictures, with an “eyedropper” tool, but it isn’t universal)
  • Open a tool to let you select the color from your Desktop (Digital Color Meter on the Mac, Power Toys’ Color Picker on Windows)
  • Record the color, probably in RGB, of target pixels from the image
  • Use an online converter to convert RGB to HSV
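If you would rather skip the online converter, Python's built-in colorsys module can do the conversion; the helper below is a sketch that also rescales to OpenCV's ranges (hue 0-179, saturation and value 0-255). It uses ordinary rounding, so results may differ by one from other tools.

```python
import colorsys

def rgbToOpenCVHSV(r, g, b):
    """Convert an RGB color (0-255 per channel) to OpenCV's HSV scale."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    hue = int(round(h * 180)) % 180      # colorsys hue is 0.0-1.0
    return hue, int(round(s * 255)), int(round(v * 255))

print(rgbToOpenCVHSV(0, 255, 0))   # pure green
```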

Try at least one other colorful object.

Combining color thresholds with video (OPTIONAL)

  • Copy the threshVideo and processImage functions, and rename them colorThreshVideo and processImage2
  • Change the colorThreshVideo to call processImage2 instead of processImage
  • Modify the processImage2 function so that it calls cv2.inRange on the input image and returns the result
  • You can hard-code the color boundaries in the function, so it only looks for one color
  • Test this on the relevant video files, or try it on the webcam with you holding the correct colorful object (I have external USB cameras if you want to try rolling a ball on the floor)