convolution

2023-06-13

the convolution operation is defined as the integral of the product of two functions after one is reflected about the y-axis and shifted:

$$(f * g)(t) = \int_{-\infty}^{\infty} f(\tau)\, g(t - \tau)\, d\tau$$

in essence, the convolution shows how the shape of one function modifies the other as it slides across it

for two functions $f$ and $g$, the discrete convolution of $f$ and $g$ is given by:

$$(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m]\, g[n - m]$$
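a direct way to sanity-check this definition is to implement it. a minimal sketch in plain python (the function name is mine); out-of-range terms are treated as zero, matching the convention for finite sequences:

```python
def conv1d(f, g):
    """full discrete convolution of two finite sequences:
    (f * g)[n] = sum_m f[m] * g[n - m], with out-of-range
    terms treated as zero."""
    n_out = len(f) + len(g) - 1
    out = [0] * n_out
    for n in range(n_out):
        for m in range(len(f)):
            k = n - m
            if 0 <= k < len(g):
                out[n] += f[m] * g[k]
    return out
```

for example, `conv1d([1, 2, 3], [0, 1, 2])` gives `[0, 1, 4, 7, 6]`, the same result as numpy's `np.convolve` in its default "full" mode.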

extending to 2-dimensional functions we'd get:

$$(f * g)[i, j] = \sum_{m} \sum_{n} f[m, n]\, g[i - m, j - n]$$
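the 2d case can be sketched the same way. this version computes the "valid" convolution used on images, where the kernel only visits positions at which it fully overlaps the image (names are mine):

```python
def conv2d_valid(image, kernel):
    """'valid' 2d convolution: slide the flipped kernel over
    every position where it fully overlaps the image."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # flip the kernel along both axes (true convolution,
    # not cross-correlation)
    flipped = [row[::-1] for row in kernel[::-1]]
    out = [[0] * (W - kw + 1) for _ in range(H - kh + 1)]
    for i in range(H - kh + 1):
        for j in range(W - kw + 1):
            s = 0
            for m in range(kh):
                for n in range(kw):
                    s += image[i + m][j + n] * flipped[m][n]
            out[i][j] = s
    return out
```

for a 3×3 image and a 2×2 kernel this produces a 2×2 output, matching scipy's `scipy.signal.convolve2d(image, kernel, mode="valid")`.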

in machine learning, specifically in convolutional neural networks, the convolution operation is used to detect features in an image: one of the operands is the image and the other is the kernel, see fig-conv-1.
the convolving of kernel $K$ over image $I$, $I * K$, visualized (a single step)

note that the kernel has to be flipped along every axis before we begin sliding it over the image; otherwise we'd be doing cross-correlation instead of convolution.
an example of a flipped kernel:

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} \rightarrow \begin{bmatrix} d & c \\ b & a \end{bmatrix}$$

if you look at fig-conv-1, you may notice the edges essentially get "trimmed off", turning a 7×7 matrix into a 5×5 one. the pixels on the edge are never at the center of the kernel, because the kernel has nothing to extend over beyond the edge. but we often want the size of the output to equal that of the input.
to solve this, padding is introduced: "fake" pixels (commonly zeros) are placed around the image to extend its dimensions, so that the result of the convolution comes out at the size we want.
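zero padding is simple to sketch in python (the function name and `fill` parameter are mine). padding by `p = (kernel_size - 1) // 2` is what makes a subsequent "valid" convolution with an odd-sized kernel preserve the image dimensions:

```python
def pad(image, p, fill=0):
    """surround the image with p rows/columns of fill pixels."""
    W = len(image[0])
    border_row = [fill] * (W + 2 * p)
    out = [list(border_row) for _ in range(p)]
    for row in image:
        out.append([fill] * p + list(row) + [fill] * p)
    out += [list(border_row) for _ in range(p)]
    return out
```

e.g. padding a 1×1 image with `p = 1` yields a 3×3 grid of zeros with the original pixel in the center.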

often when applying a convolution, we want an output smaller than the input. one way of accomplishing this is using a stride: the number of pixels the kernel slides over (skips) in a given dimension at each step.
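a strided "valid" convolution can be sketched by letting the loop jump `stride` pixels at a time (names are mine); the output then has dimensions $\lfloor (H - k_h) / \text{stride} \rfloor + 1$ by $\lfloor (W - k_w) / \text{stride} \rfloor + 1$:

```python
def conv2d_strided(image, kernel, stride=1):
    """'valid' 2d convolution where the kernel jumps `stride`
    pixels at a time, shrinking the output."""
    H, W = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    # flip along both axes for true convolution
    flipped = [row[::-1] for row in kernel[::-1]]
    out = []
    for i in range(0, H - kh + 1, stride):
        row = []
        for j in range(0, W - kw + 1, stride):
            row.append(sum(image[i + m][j + n] * flipped[m][n]
                           for m in range(kh) for n in range(kw)))
        out.append(row)
    return out
```

with a 4×4 image, a 2×2 kernel, and a stride of 2, the output is 2×2 instead of the 3×3 a stride of 1 would give.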

convolution is commutative: $f * g = g * f$

given an image $I$ of dimensions $h \times w$ and a kernel $K$ of dimensions $k_h \times k_w$, assuming a stride of 1 and no padding, the resulting matrix would be of dimensions $(h - k_h + 1) \times (w - k_w + 1)$, so $S = I * K$ and the operation would be defined as:

$$S[i, j] = (I * K)[i, j] = \sum_{m} \sum_{n} I[m, n]\, K[i - m, j - n]$$

or equivalently (since both convolution and multiplication are commutative):

$$S[i, j] = (K * I)[i, j] = \sum_{m} \sum_{n} I[i - m, j - n]\, K[m, n]$$
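the commutativity is easy to check numerically; a minimal 1d sketch (the function name is mine, and the same property holds in 2d):

```python
def conv1d_full(f, g):
    """full discrete convolution: (f * g)[n] = sum_m f[m] g[n - m]."""
    out = [0] * (len(f) + len(g) - 1)
    for n in range(len(out)):
        for m in range(len(f)):
            if 0 <= n - m < len(g):
                out[n] += f[m] * g[n - m]
    return out

f, g = [1, 2, 3], [4, 5]
assert conv1d_full(f, g) == conv1d_full(g, f)  # f * g == g * f
```

both orders produce `[4, 13, 22, 15]`: swapping the operands just relabels which sequence is "slid" over the other.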