Overview
In this activity, you will practice with the fundamental tools of OpenCV and the Numpy representation of images. In addition, we’ll look at image arithmetic, blending of images, and making image masks and threshold images.
The Github repository for this assignment will contain a starter code file, ICAIntro.py. Put your code in this file, as directed by the TODO comments.
Before working on this activity, I encourage you to download the following zip files, unzip them, and move them into the Github repo for this activity. Do not add them to the files managed by Git; you do not want to try to push these files to Github!
Image and OpenCV Basics
Reading and displaying images
The script below reads in and displays a single image. Review Chapter 1 from your Vision readings if you need help understanding what each step does. Read the script carefully, and predict for yourself and your teammate what you think the program will do.
Finally, run your copy of this program to see if your predictions were correct.
Try this: Copy this program, and follow the steps below to write a program that displays all the images in the SampleImages folder (you don’t have to loop through subfolders, though it can be done).
- Add an import statement to import the `os` module. Use the `os.listdir` command to get a list of all the filenames in the `SampleImages` folder.
- Loop over those filenames
- Inside the loop, check if the image filename ends with `png`, `jpg`, or `jpeg`. If not, skip it
- If it is an image name, then use `cv2.imread` to read in the image
- Use `cv2.imshow` to display the image: put all images in the same window by using the same window name (one possible sketch follows this list)
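Here is a minimal sketch of one way such a script could look. It assumes the `SampleImages` folder sits next to your script; adjust the path if yours is elsewhere.

```python
import os
import cv2

folder = "SampleImages"
for filename in os.listdir(folder):
    # Skip anything that is not an image file
    if not filename.lower().endswith(("png", "jpg", "jpeg")):
        continue
    img = cv2.imread(os.path.join(folder, filename))
    if img is None:               # imread returns None if the read failed
        continue
    cv2.imshow("Images", img)     # reuse the same window name for every image
    cv2.waitKey(0)                # wait for a key press before showing the next one
```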
Experimenting with colors
The program below, included in your Github repo as colorBackground.py, uses Numpy tools to construct a new image built from scratch. The zeros function creates an array of the size and data type we specify, so we can make a new image array without reading from a file! This program, as written, sets every pixel in the array to be a dark red color.
import cv2
import numpy as np
image = np.zeros((200, 200, 3), np.uint8)
image[:,:] = (0, 0, 128)
cv2.imshow("Color", image)
cv2.waitKey()

The line `image[:,:] = (0, 0, 128)` asks for every row and every column of `image` to be set to the color tuple.
Try this: Run this program to be sure what it does. Then make four copies of lines 6 to 9. You can comment out my original line by putting # at the start of the line (or use the keyboard shortcut listed under the Code menu as Comment with Line Comment).
For each copy, experiment with color values until you create the following:
- A pleasing purple color
- A dark blue-green color
- A murky green with a hint of yellow
- A pale orange color
Drawing on images
Chapter 1 of your Vision readings talks about the basic drawing tools OpenCV provides. You might want to open that document, or the OpenCV documentation, for assistance during this part.
Try this: Create a new Python file, and in it write a script to create a simple picture.
- First, choose your background image, which could be one of the `SampleImages`, an image of your own that you copy into your PyCharm project, or a blank image with a background color of your choice
- Next, choosing your own colors, size, and details, draw a smiling stick figure on your image using OpenCV's drawing functions
- Be sure to save your new picture to a new file (a starting-point sketch follows this list)
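If you are unsure where to begin, here is one hedged sketch of the kind of script this could be; the colors, coordinates, and the output filename `myPicture.png` are placeholders to adapt to your own design.

```python
import cv2
import numpy as np

# Blank light-gray background (or use cv2.imread on one of the SampleImages)
canvas = np.zeros((400, 400, 3), np.uint8)
canvas[:, :] = (230, 230, 230)

cv2.circle(canvas, (200, 100), 40, (0, 0, 0), 2)                      # head
cv2.circle(canvas, (185, 90), 5, (0, 0, 0), -1)                       # eyes
cv2.circle(canvas, (215, 90), 5, (0, 0, 0), -1)
cv2.ellipse(canvas, (200, 115), (15, 8), 0, 0, 180, (0, 0, 255), 2)   # smile
cv2.line(canvas, (200, 140), (200, 260), (0, 0, 0), 2)                # body
cv2.line(canvas, (200, 170), (150, 220), (0, 0, 0), 2)                # arms
cv2.line(canvas, (200, 170), (250, 220), (0, 0, 0), 2)
cv2.line(canvas, (200, 260), (160, 330), (0, 0, 0), 2)                # legs
cv2.line(canvas, (200, 260), (240, 330), (0, 0, 0), 2)

cv2.imshow("Stick figure", canvas)
cv2.imwrite("myPicture.png", canvas)   # save the result to a new file
cv2.waitKey(0)
```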
Converting color representations
The cvtColor function in OpenCV will convert an image from one color representation to another. The key to this function is the code that we pass as its second argument. The code tells the function what the representation of the current image is, and what representation we want. Below are some typical conversion codes:
| Code | Description |
|---|---|
| `cv2.COLOR_BGR2GRAY` | Converts BGR to grayscale |
| `cv2.COLOR_BGR2HSV` | Converts BGR to HSV |
| `cv2.COLOR_GRAY2BGR` | Converts grayscale to BGR |
| `cv2.COLOR_BGR2RGB` | Converts BGR to RGB |
See the Color Space Conversions page in OpenCV’s documentation for a complete list.
The function below is in the starter code. Try it with the sample call in the file, then try varying the image, and examine the results. Do you understand how to use this function?
def testConversions(origImage):
    """Takes in an image, and it converts it to grayscale and HSV. It also converts the grayscale back to BGR,
    displays the results, and prints the shapes of the images."""
    gray1 = cv2.cvtColor(origImage, cv2.COLOR_BGR2GRAY)
    BGRIm2 = cv2.cvtColor(gray1, cv2.COLOR_GRAY2BGR)
    HSVIm = cv2.cvtColor(origImage, cv2.COLOR_BGR2HSV)
    print(origImage.shape, gray1.shape, BGRIm2.shape, HSVIm.shape)
    cv2.imshow("Original", origImage)
    cv2.imshow("Gray1", gray1)
    cv2.imshow("BGR2", BGRIm2)
    cv2.imshow("HSV", HSVIm)
    cv2.waitKey()

Images as matrices
Creating image arrays with Numpy
The list of functions below allows us to create Numpy arrays from scratch. With this, we can make a blank canvas to use the Numpy drawing functions, create synthetic images, or just make a black and white mask or frame to apply to an image.
| Function | Description |
|---|---|
| `array` | Takes a sequence data type (list, string, tuple, or Numpy array), and builds a Numpy array with the same shape. Optional input allows us to specify the type of data for the array. |
| `zeros` | Takes in a tuple giving dimensions, and an optional input for the type of data, and it makes an array with the given dimensions and type, all filled with zeros. |
| `ones` | Similar to `zeros`, but it fills the array with the value 1. |
Try this: Create a function blackCanvas that takes in two inputs, the width and height for the canvas, and it creates a new color Numpy array all filled with black, with the given width and height. It should return the new image array. In your main script, try several test calls to this function with different sizes of images, and draw a square centered in each.
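A minimal sketch of what blackCanvas and one test call might look like; the square size and coordinates here are just one example to adapt.

```python
import cv2
import numpy as np

def blackCanvas(width, height):
    """Returns a new color image of the given width and height, filled with black."""
    # Numpy shapes are (rows, columns, channels), so the height comes first
    return np.zeros((height, width, 3), np.uint8)

canvas = blackCanvas(300, 200)
# Draw a 50x50 white square centered in the 300x200 canvas
cv2.rectangle(canvas, (125, 75), (175, 125), (255, 255, 255), 2)
cv2.imshow("Canvas", canvas)
cv2.waitKey(0)
```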
Image Arithmetic
Remember that both Numpy and OpenCV provide tools for doing arithmetic on images. This means we can add or subtract a constant amount to every value in an image, changing the overall brightness of the image. We can also add or subtract images from each other, so long as they are the same shape: corresponding values in the image arrays are added or subtracted from each other.
Subtracting images
We can perform a simple kind of motion detection on frames from a video, by subtracting one frame from its predecessor. The Numpy module tells Python how to interpret the minus sign when applied to Numpy arrays. OpenCV has two functions for taking the difference: subtract and absdiff. The first one just does subtraction, the second one computes the absolute value of the difference for each corresponding value in the arrays.
Try this: In the script section of the starter code file, add a script to experiment with subtracting two frames from a video (this video is one we will work with later, it shows Prof. Fox holding an orange ball and moving it around in the air):
- Start by reading in `frame1.jpg` and `frame2.jpg`
- Use `imshow` to display the two images; notice that frame 1 is slightly different from frame 2
- Define a new image `diff1` to be the result of calculating frame 1 minus frame 2, using the minus sign (`-`)
- Define another new image `diff2` to hold the result of OpenCV's `subtract` function applied to frame 1 and frame 2
- Define a third new image `diff3` to hold the result of OpenCV's `absdiff` function
- Use `imshow` to display all three diff images
- Examine the results. Based on what you know about how Numpy and OpenCV handle arithmetic, and how `subtract` and `absdiff` differ from each other, explain why the three difference images look the way they do, and how they are different. (A sketch of this script appears below.)
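Here is a hedged sketch of that script; it assumes frame1.jpg and frame2.jpg live in the SampleImages folder, so adjust the paths to match your repo.

```python
import cv2

frame1 = cv2.imread("SampleImages/frame1.jpg")
frame2 = cv2.imread("SampleImages/frame2.jpg")
cv2.imshow("Frame 1", frame1)
cv2.imshow("Frame 2", frame2)

diff1 = frame1 - frame2                  # Numpy subtraction: uint8 values wrap around on underflow
diff2 = cv2.subtract(frame1, frame2)     # OpenCV subtraction: negative results are clamped to 0
diff3 = cv2.absdiff(frame1, frame2)      # absolute value of the difference

cv2.imshow("Diff 1", diff1)
cv2.imshow("Diff 2", diff2)
cv2.imshow("Diff 3", diff3)
cv2.waitKey(0)
```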
Blending images
We can use image arithmetic to blend two images together, using the addWeighted function from OpenCV. To blend two images, we want to look at corresponding pixels, and average their two red values to make a new red value, and do similarly with green and blue channel values. (It is kind of amazing that this works, actually!)
Resizing images
In order to blend two images, we need to make them the same size and shape. We could do that using the Numpy slicing operators, but here we will take a look at a function that lets you resize an image either by scaling it to a specific size, or by scaling it by given factors in the x and y directions.
The resize function takes in the image to be resized, and a tuple giving the new size as width followed by height, and it returns a new image of the new size. However, if we set the width and height in the tuple to be zero, then we can provide optional inputs that give the new size as a factor of the original. The optional input fx specifies the factor in the x dimension, and fy in the y dimension. If we set fx to 0.5, for example, then the new image will have a width one half the size of the original's width.
The examples below illustrate different ways of calling resize.
| Examples | Meaning |
|---|---|
| `cv2.resize(src, (100, 100))` | Returns a new stretched/squashed image that is 100 x 100 pixels |
| `cv2.resize(src, (0, 0), fx = 2, fy = 2)` | Returns a new image twice the size of the original |
| `cv2.resize(src, (0, 0), fx = 0.5, fy = 1.0)` | Returns a new image whose width is half the original size |
Try this: In the script section of the activity code file, read in three images from SampleImages (any ones you choose). Use the resize command to change the second and third images to match the size of the first. Be sure to imshow the images so that you can check your work. Call these images img1, img2, and img3.
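A sketch of what that script might look like; the three filenames below are hypothetical placeholders for whichever SampleImages you choose.

```python
import cv2

img1 = cv2.imread("SampleImages/mightyMidway.jpg")   # pick any three images you like
img2 = cv2.imread("SampleImages/frame1.jpg")
img3 = cv2.imread("SampleImages/frame2.jpg")

# shape is (rows, columns, channels); resize wants (width, height), i.e. (columns, rows)
(h, w, d) = img1.shape
img2 = cv2.resize(img2, (w, h))
img3 = cv2.resize(img3, (w, h))

cv2.imshow("Image 1", img1)
cv2.imshow("Image 2", img2)
cv2.imshow("Image 3", img3)
cv2.waitKey(0)
```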
Blending with arithmetic
Examine the code fragment below (also reproduced in your activity code file).
Try this fragment on the resized images you created in the previous section, and observe the results. If we just add the two images, the result is too bright, and would have tons of overflow artifacts if we used Numpy addition. We want to average the two image values, not just add them. But consider this: if we first add the images, and then divide by 2 (the way we typically think about computing an average) the result will be distorted. Even OpenCV’s addition operator avoids overflow by capping the values at 255 when they would have added up to more than 255. That means that adding and then dividing by 2 will produce a different result than dividing each original image by 2 and then adding.
Try the code below on the images you resized, and compare blend1 and blend2.
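In case your copy of the fragment is unclear, here is a hedged reconstruction of the comparison, using the `img1` and `img2` you resized above: `blend1` adds first and then divides, while `blend2` divides each image first and then adds.

```python
# Add first, then divide: cv2.add caps sums at 255, so bright regions wash out after dividing
blend1 = cv2.add(img1, img2) // 2
# Divide each image first, then add: no value ever exceeds 255, so this is a true average
blend2 = cv2.add(img1 // 2, img2 // 2)
cv2.imshow("Blend 1", blend1)
cv2.imshow("Blend 2", blend2)
cv2.waitKey(0)
```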
We can also use Numpy commands to compute the average more easily, so long as we remember to divide first, and then add, to avoid overflow artifacts. Try this:
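One possible Numpy version of the same average (divide first, then add, so no pixel value can overflow):

```python
blend3 = img1 // 2 + img2 // 2
cv2.imshow("Blend 3", blend3)
cv2.waitKey(0)
```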
Weighted averages
A normal average (add the two numbers and divide by 2) weights both pixels/images equally: 50% from one image and 50% from the other. The previous example multiplied each image by 0.5: by shifting from division to multiplication, we can see that we are really multiplying each image by a weight, the percentage of the final image that should come from each image.
We can change the percentages, by changing the weights. Just make sure they are each between 0.0 and 1.0 and that they add up to 1.0, so that the resulting image has the same brightness as the originals.
Try varying the weights for the example above, and examining the resulting blend.
The addWeighted function
OpenCV actually provides a function, addWeighted, to perform a weighted average of two images.
The addWeighted function has 5 required inputs: the first image to blend, the weight to multiply the first image by, the second image to blend, the weight for the second image, and a constant to add to the result. In other words, the function computes this mathematical formula:
\[newIm = \alpha \cdot img1 + \beta \cdot img2 + \gamma\]
Try this as an alternative to the Numpy arithmetic version, and compare the results.
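For example, a 50/50 blend of the two resized images might look like this sketch (a gamma of 0 adds nothing to the result):

```python
# 0.5 * img1 + 0.5 * img2 + 0, computed without overflow
blend4 = cv2.addWeighted(img1, 0.5, img2, 0.5, 0)
cv2.imshow("Blend 4", blend4)
cv2.waitKey(0)
```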
Try this: Create a function phaseBlend that takes in two images presumed to be the same size. This function should include the following steps:
- Set up one weight value, `w`, to be 0.0 (`w` is an accumulator variable for this loop)
- Repeat with a `for` loop, enough times for `w` to reach 1.0 (experiment or calculate)
- Inside the for loop, use `w` and `1 - w` as the weights, and blend the two input images, assigning the result to a variable
- Also in the loop, `imshow` the blended result, and include a `waitKey`
- Finally, in the loop, add a small amount to `w` to change the weight for next time (use 0.1 or 0.05, or similar)
- Optional extension: Instead of trying to time the loop to stop when `w` gets to 1.0, we could change the direction of the blend and start reducing `w` each time (until it gets back to 0.0). To do this, we need another accumulator variable, `deltaW`, to hold the amount to change `w` by each time. It will stay at 0.1 or 0.05 until `w` reaches 1.0, and then it should change to -0.1/-0.05. (A sketch of the basic version appears after this list.)
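Here is one possible sketch of the basic (non-extended) version, assuming a step of 0.1, so the loop runs 11 times to carry `w` from 0.0 up to 1.0:

```python
def phaseBlend(imgA, imgB):
    """Gradually blends from imgB to imgA by stepping the weight w from 0.0 to 1.0."""
    w = 0.0
    for step in range(11):            # 11 steps takes w from 0.0 up to 1.0
        blended = cv2.addWeighted(imgA, w, imgB, 1 - w, 0)
        cv2.imshow("Phase blend", blended)
        cv2.waitKey(100)              # pause briefly so the change is visible
        w = w + 0.1
```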
Accessing channels
The split and merge functions allow us to pull apart an image’s channels, and put them back together again.
Try this: Examine the code below, also found in your starter file.
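In case you want to compare against your starter file, the code in question is roughly this (the image filename here is only a placeholder):

```python
img = cv2.imread("SampleImages/mightyMidway.jpg")   # any color image will do
(blue, green, red) = cv2.split(img)                 # each channel is a 2D grayscale array
cv2.imshow("Blue", blue)
cv2.imshow("Green", green)
cv2.imshow("Red", red)
cv2.waitKey(0)
```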
When you run this code, what happens, and why? Discuss with classmates, preceptor, or instructor if you aren’t sure why the channels appear the way they do when displayed.
Add a call to merge to your code file, right after these lines. Merge takes in a tuple of three channels, and treats them as the blue, green, and red channels of the image. Try these variations:
- Make an exact copy of the original, by calling `merge` and passing it a tuple containing the channel images in the original order (blue, then green, then red). (Be sure to `imshow` the result.)
- Use `zeros` to make a copy of the red channel that is all filled with zeros. Call `merge` again, but replace the red channel with the new blank one. How does it differ? What if you make a white image the same shape as the red channel, and then use that in `merge`?
- Call `merge` a third time, and use the original three channels, but put them in a different order. What does the result look like?
Can you explain the results you see? If not, discuss with a neighbor or teammate, or with preceptor or faculty member.
Try this: Create a function, colorShuffle, that takes in one image as an input parameter. It will return a new image that has the three color channels randomly shuffled to a new order!
Do this:
- Add an import statement at the top of the file to import the `random` module
- Inside the function, use `split` to separate the three channels from each other
- Define a variable to hold a list with the three channel arrays in it
- Call `random.shuffle` and pass it the list from the previous step (this will change the list to a new ordering; try printing the list before and after to see how `shuffle` works)
- Pass `merge` the list to get the new image
- Return the image (a sketch of one possible solution follows this list)
Regions of Interest
A region of interest is a section of an image that we want to focus on. Often it is the result of running computer vision algorithms to determine where something interesting is in the image, but it could also be human-designed.
At its core, we make a region of interest using Numpy’s array slicing operator. We will practice the slicing operator here, on small arrays and on images.
Key idea: Remember that Numpy often avoids copying the data in an array when we use slicing to access portions of it. Instead of making a copy, it provides a view of the original data that limits which indices we can see. When this happens, changes to the original array show up in the view array, and vice versa. (By contrast, the astype method for changing the data type returns a new copy rather than a view.)
Consider the small 2d array shown in the code below.
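One way to build and print such an array (the name `a` is an assumption here, reused in the snippets below):

```python
import numpy as np

a = np.array([[2, 4, 6, 8],
              [3, 6, 9, 12],
              [4, 8, 12, 16]])
print(a)
```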
[[ 2 4 6 8]
[ 3 6 9 12]
[ 4 8 12 16]]
If we want to access an individual element of the array, we can put its row and column indices inside square brackets, separated by a comma:
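For example (using negative indices to count from the end):

```python
print("last of second row:", a[1, -1])
print("second of last row:", a[-1, 1])
```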
last of second row: 12
second of last row: 8
If we want a subarray, we extend this notation to use slicing operators for the row or column we are selecting from. Here are a few examples:
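For instance:

```python
print("middle values:", a[1, 1:3])    # a slice of the middle row
print("third and fourth columns:")
print(a[:, 2:4])                      # all rows, columns 2 and 3
```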
middle values: [6 9]
third and fourth columns:
[[ 6 8]
[ 9 12]
[12 16]]
Now you try some examples, putting your code in the script section indicated by a TODO comment.
- Access just the value 16 from this array
- Access the first column of the array
- Select the last two elements from the first two rows (giving a 2x2 matrix)
- Select values from every other row, and every other column, starting with the 2 at [0, 0]
When working with images, we typically use slicing in two ways: to select the color at a specific pixel, or to select a rectangular region of the picture. The code sample below illustrates two ways to select the color channels from a specific pixel location. It also draws a tiny circle at that location on the image and displays it.
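A sketch of what that sample might look like; the image filename and the pixel location are placeholders.

```python
import cv2

img = cv2.imread("SampleImages/mightyMidway.jpg")

(b, g, r) = img[150, 200]        # all three channel values at row 150, column 200
blueVal = img[150, 200, 0]       # or pick out a single channel directly
print("Pixel color (BGR):", b, g, r, "  blue only:", blueVal)

# Drawing functions use (x, y) order, so the column comes first here
cv2.circle(img, (200, 150), 3, (0, 0, 255), -1)
cv2.imshow("Marked pixel", img)
cv2.waitKey(0)
```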
Notice that this is one of the places where Numpy and OpenCV’s different ordering of rows and columns comes into play: Numpy orders the location as (row, column), but when we specify the location for drawing the circle, we give it as (x, y).
To access a region of an image, we select the range of rows, then the range of columns, and then place a colon (:) for the channel dimension, to indicate we want all three channels.
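For example, a sketch of selecting a rectangular region (continuing with the `img` from the previous sketch, and assuming it is at least 250 rows by 300 columns):

```python
# Rows 100 through 249, columns 50 through 299, all three channels
roi = img[100:250, 50:300, :]
cv2.imshow("Region of interest", roi)
cv2.waitKey(0)
```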
Try this: Create a function centerCrop that takes in an image as its input parameter. The function should make and return an ROI that is 200x200 pixels, centered in the image. (One possible sketch follows this list.)

- Get the height and width of the image, and from that calculate the center row and center column
- Define an ROI that extends 100 to either side of the center column and 100 to either side of the center row
- Display the ROI and the original, and use waitKey to pause
- Return the ROI
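One possible sketch of centerCrop, assuming the input image is at least 200 pixels in each dimension:

```python
def centerCrop(image):
    """Returns a 200x200 region of interest centered in the image."""
    (h, w, d) = image.shape
    centerRow = h // 2
    centerCol = w // 2
    roi = image[centerRow - 100:centerRow + 100, centerCol - 100:centerCol + 100, :]
    cv2.imshow("Original", image)
    cv2.imshow("Center crop", roi)
    cv2.waitKey(0)
    return roi
```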
Working with Video
The table below lists the main functions you need in order to access frames from the computer’s camera. First there is the function VideoCapture, which takes one input and creates a VideoCapture object. The one input specifies which camera connected to the computer should be used (zero is the default: the built-in camera if there is one). Three methods belonging to the VideoCapture object are shown, as well.
| Examples | Meaning |
|---|---|
| `cap = cv2.VideoCapture(0)` | Creates a VideoCapture object connected to the specified camera |
| `cap.isOpened()` | Boolean method; returns true if the camera connection succeeded |
| `ret, image = cap.read()` | Returns a boolean (was the read successful?) and a frame |
| `cap.release()` | Disconnects from the camera |
Start with the script below to get images from a built-in camera. If your computer does not have a built-in webcam, see Susan to get an external webcam!
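A rough sketch of such a script (the one in your starter code may differ in its details):

```python
import cv2

vidCap = cv2.VideoCapture(0)          # 0 selects the default built-in camera
while True:
    ret, img = vidCap.read()
    if not ret:                       # stop if no frame could be read
        break
    cv2.imshow("Webcam", img)
    cv2.waitKey(30)                   # ~30 ms between frames
vidCap.release()
cv2.destroyAllWindows()
```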
You are going to improve this script.
Ending the program with key input
Be sure to check the readings for today for how to use waitKey to get the key the user pressed.
- Modify the code above, like the example in Chapter 3 of our Vision readings, so that it catches the value returned by `waitKey` and checks to see if the value is -1. If not, convert it to a character with `chr` and see if the user typed `'q'`. Break out of the loop if so.
- Add a function called `processImage` to the file. It should take in an image and return an image. For now just have it return the image it is passed (we will add functionality in later steps).
- Insert a call to `processImage` in between the `vidCap.read()` line and the `cv2.imshow` line. Pass `img` to `processImage`, and assign the returned image to `img2`.
- Change the displayed image to be `img2`. (A sketch of the modified loop appears after this list.)
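After those changes, the loop might look something like this sketch:

```python
import cv2

def processImage(img):
    """For now, just return the image unchanged; later steps add real processing."""
    return img

vidCap = cv2.VideoCapture(0)
while True:
    ret, img = vidCap.read()
    if not ret:
        break
    img2 = processImage(img)
    cv2.imshow("Webcam", img2)
    key = cv2.waitKey(30)
    if key != -1:                      # -1 means no key was pressed
        ch = chr(key & 0xFF)           # convert the key code to a character
        if ch == 'q':
            break
vidCap.release()
cv2.destroyAllWindows()
```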
Now that you have a functioning video streaming program, let’s start making improvements.
Try to hold something up in front of the camera while your program is running, and then move the object to the different corners of the video image. Did you find yourself moving the wrong way? We often have trouble with left and right movements on a video feed, because we are more used to seeing a mirror image of ourselves. We can use the OpenCV function flip to flip the image. This function will flip an image upside down, or left-to-right.
The flip function takes in an image and a flip code which specifies whether to flip around the horizontal axis, or around the vertical one. It returns a new image. A flip code of 0 will flip a picture upside down (around the horizontal axis), and a flip code of 1 will flip a picture left-to-right (around the vertical axis).
Try this: For this task, you will add an option to display the flipped version of the video feed. (A sketch of the pieces that change appears after this list.)

- Modify your `processImage` function so that it takes in an extra input: `doFlip`. When `doFlip` is `True`, the function should call the `flip` function, using flip code 1, and should return the resulting image. When `doFlip` is `False`, the function should return the original image.
- In your main program, you will need to set up a boolean flag variable. This is a special kind of accumulator variable that just holds `True` or `False`. We call it a flag variable because it signals when a certain condition holds.
  - Before the `while True` loop, set up the flag variable to be `False`
  - Inside the loop, add the flag variable to the call to `processImage`
  - Add to the `if` statement that checks the user's input key: if the user hits the f key, we want to toggle the value of the flag variable: if it was `False`, it should become `True`, and vice versa.
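A sketch of just the pieces that change; the flag is named `flipFlag` here, which is an assumed name.

```python
def processImage(img, doFlip):
    """Returns the image, flipped left-to-right when doFlip is True."""
    if doFlip:
        return cv2.flip(img, 1)       # flip code 1 flips around the vertical axis
    return img

flipFlag = False                      # set up the flag before the while True loop
# ... then inside the loop:
#     img2 = processImage(img, flipFlag)
#     if ch == 'f':
#         flipFlag = not flipFlag     # toggle between True and False
```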
Try this: Suppose that you wanted to be able to save individual frames from the video feed to image files.
- Add to the `if` statement that checks the user's input key, so that it checks if the user hits the s key
- When the user hits that key, save the current image to a file, using the `cv2.imwrite` function. This function takes two inputs: (1) a string for the filename of the file to save to, including `.jpg` or `.png` as the filename extension, and (2) the image you want to save.
For an extra, optional challenge, give each screenshot a unique filename, so that you can save any number you want. (Adding a counter to the loop, and attaching the counter to the filename is a great way to do this.)
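Putting the pieces together, a hedged sketch of the saving behavior (without the flip option, and with an assumed filename pattern of frame0.jpg, frame1.jpg, ...) might look like:

```python
import cv2

vidCap = cv2.VideoCapture(0)
count = 0                              # counts how many frames have been saved so far
while True:
    ret, img = vidCap.read()
    if not ret:
        break
    cv2.imshow("Webcam", img)
    key = cv2.waitKey(30)
    if key != -1:
        ch = chr(key & 0xFF)
        if ch == 'q':
            break
        elif ch == 's':
            # Give each screenshot a unique name by attaching the counter
            cv2.imwrite("frame" + str(count) + ".jpg", img)
            count = count + 1
vidCap.release()
cv2.destroyAllWindows()
```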
Masks and Thresholds
Image masks
A mask is just a black and white image. It can be created by processing a regular image with something like the threshold operations, or you can just make a black image from scratch and then draw white shapes on it.
To apply a mask to an image, the mask and the image need to be the same size. Then we use the bitwise_and operation to combine the two. Anywhere that the mask is white, it will keep the other image’s color, and anywhere that the mask is black, it will set the pixel to black.
Below is an example where we mask all but three rectangles of an image.
def maskBuilder(image):
    """Takes in an image, builds a mask with three white rectangles on a black background,
    applies the mask to the image, and returns both the mask and the masked image."""
    (h, w, d) = image.shape
    print(h, w)
    maskIm = np.zeros(image.shape, image.dtype)
    cv2.rectangle(maskIm, (0, 0), (w, h // 4), (255, 255, 255), -1)
    cv2.rectangle(maskIm, (200, 2 * h // 3), (400, 5 * h // 6), (255, 255, 255), -1)
    cv2.rectangle(maskIm, (3 * w // 5, 350), (3 * w // 4, 450), (255, 255, 255), -1)
    # Apply the mask: pixels keep their color where the mask is white, become black elsewhere
    newMidway = cv2.bitwise_and(image, maskIm)
    return maskIm, newMidway

midway = cv2.imread("SampleImages/mightyMidway.jpg")
mask, finalIm = maskBuilder(midway)
cv2.imshow("Final", finalIm)
cv2.waitKey()

Try this function out. Try changing the coordinates of the rectangles, adding shapes, and removing shapes.
Modifying the moving mask on video
In the reading you had a program that connected to the video camera, and showed a masked version of the camera image, where the only part of the video visible was a square that moved from one frame to the next, bouncing around. The starter code for this activity contains a function-based version of this program.
- Try the program out as is; there are sample calls that run it on different video files as well as a webcam
- Modify the program so that it draws a circle on the mask image, rather than the square it currently draws
- Change the size of the circle drawn
- Change the speed and direction of the mask shape's movement by altering `deltaX` and `deltaY`
Using the threshold function
The threshold function takes in a grayscale image and produces a thresholded image that is grayscale, and often black and white. The readings go over all the many different thresholding modes that this function has. We will start with just the basic cv2.THRESH_BINARY mode for today.
Note: The threshold function returns two values as a tuple. The first value is the threshold that was used, and the second is the actual resulting thresholded image.
The function below takes in an image and a threshold value, and it returns the result of calling threshold.
def binaryThresh(image, threshValue=128):
    """Takes in an image and also has an optional input, threshValue, which can be set by the function call, or
    it defaults to the value 128. This function converts the image to grayscale and performs a binary threshold
    on it. It returns the resulting image."""
    grayImg = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    tVal, threshIm = cv2.threshold(grayImg, threshValue, 255, cv2.THRESH_BINARY)
    print("Threshold value:", tVal)
    return threshIm

In the starter code, read through this function, and then go to the bottom of the file where we have some sample calls. Try out the samples given to you, and examine the results.
Then, perform these experiments, and record the results either as a comment in the code file, or in a separate .txt file (this next part might be interesting to try with a partner):
- Write code to try each of the images in the `Coins` folder (inside `SampleImages`) with this function. For each image, how well does the `binaryThresh` function isolate the coins from the background?
- Can you modify the threshold value and improve the results? Try a few values and see; report on what you discovered.
- Modify this function to use the combination of `cv2.THRESH_BINARY` and `cv2.THRESH_OTSU` or `cv2.THRESH_TRIANGLE` instead of using a fixed threshold value (see readings for how to combine and use these). Evaluate the results on all the coin pictures: does it help?
- Look at the `BallFinding` folder, which contains images and videos of brightly colored balls. Pick one or two ball colors, and select one image for each. Try your function on these: how well does it work?
Combining thresholds with video (OPTIONAL)
Examine the functions processImage and threshVideo: these are a variation on the videoFeeder program from the last activity. Run the examples to see what it does.
Modify the processImage function so that it calls either cv2.threshold directly, or the binaryThresh function, to perform a threshold on the image. Examine the results. Test this in class using some of my bright and dark colorful balls and other objects.
Thresholding with color
The inRange function can isolate colorful objects better than the grayscale-based threshold function. To do this best, we need to work with HSV images, so that hue is a single value that won’t change greatly between well-lit and shadowed pixels of the same object. The inputs to inRange are as shown here:
- It assumes that the `colorImage` is a three-channel color image. We will use HSV rather than BGR for this.
- The `lowBounds` input is a tuple with three values: the lower boundary of acceptable values for each of the three channels
- The `highBounds` input is a tuple with three values: the upper boundary of acceptable values for each of the three channels
Remember for HSV images that the three channels represent:
- Hue: a value between 0 and 180 in OpenCV representing the hue of the pixel (given a standard HSV hue value that ranges from 0 to 360, divide it by 2 to get the corresponding value in OpenCV's HSV representation)
- Saturation: a value between 0 and 255, representing the intensity of the color. Low-saturation pixels tend to look gray; high-saturation pixels have an intense version of the color
- Value: a value between 0 and 255, representing the brightness of the color. High-value pixels are very light; low-value pixels are very dark
The code snippet below shows how to use inRange on one of the ball pictures:
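Here is a hedged reconstruction of that snippet; the filename and the hue/saturation/value bounds are assumptions chosen for an orange ball, and the version in your starter file may use different values.

```python
import cv2

img = cv2.imread("SampleImages/BallFinding/orangeBall.jpg")   # hypothetical filename
hsvImg = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

lowBounds = (5, 120, 80)        # lower bounds for hue, saturation, value
highBounds = (20, 255, 255)     # upper bounds for hue, saturation, value
mask = cv2.inRange(hsvImg, lowBounds, highBounds)

cv2.imshow("Original", img)
cv2.imshow("In-range mask", mask)
cv2.waitKey(0)
```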
This code is included in the starter code: try it out!
- Experiment with varying the bounds, changing one value at a time.
- What happens if we increase the lower bound on saturation?
- What if we lower the upper bound on either saturation or value?
- What if we broaden or narrow the hue range?
Suppose we wanted to try this on a different picture, with a different color of ball or one of the coin pictures.
To do that, we need to determine the correct range of hue values to pick the ball.
You can do this in one of two ways:
Option 1:
- Bring up a picture of the object you want to track
- Open an online color-picker that displays HSV
- Do your best to pick a color close to your target color
- Read its hue value, and divide it by 2, then make a range of acceptable hue values that surround it
Option 2:
- Bring up a picture of the object you want to track (some image viewers will let you select colors from the pictures, with an “eyedropper” tool, but it isn’t universal)
- Open a tool to let you select the color from your Desktop (Digital Color Meter on the Mac, Power Toys’ Color Picker on Windows)
- Record the color, probably in RGB, of target pixels from the image
- Use an online converter to convert RGB to HSV
Try this with at least one other colorful object.
Combining color thresholds with video (OPTIONAL)
- Copy the `threshVideo` and `processImage` functions, and rename them `colorThreshVideo` and `processImage2`
- Change `colorThreshVideo` to call `processImage2` instead of `processImage`
- Modify the `processImage2` function so that it calls `cv2.inRange` on the input image and returns the result
- You can hard-code the color boundaries in the function, so it only looks for one color
- Test this on the relevant video files, or try it on the webcam with you holding the correct colorful object (I have external USB cameras if you want to try rolling a ball on the floor)