An image is
- an array of numbers
- made up of a matrix of pixels
- each pixel having an intensity, typically between 0 and 255 for 8-bit images.
A JPEG image is a
- 3D matrix
- with channels for red, green, and blue pixel intensities.
For each pixel in the matrix, the intensity values across these three channels combine to create a full spectrum of colours; varying the pixel values in each channel changes the overall visual effect.
Most widely used image processing & manipulation tools:
- scikit-image [note: scikit-learn is a different library]
- Pillow – a fork of the Python Imaging Library (PIL)
The NumPy array format is ideal for sharing image data across all of these tools and is used extensively, like a common currency, in the image processing field.
To transpose an image means to swap its rows and columns – equivalent to flipping the image and then mirroring it (a reflection along the main diagonal).
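In NumPy (the common array format mentioned above), a transpose is a one-liner; a minimal sketch on a tiny grayscale array (the pixel values here are made up for illustration):

```python
import numpy as np

# A tiny 2x3 grayscale "image" of pixel intensities (values are arbitrary).
img = np.array([[10, 20, 30],
                [40, 50, 60]], dtype=np.uint8)

# Transposing swaps the row and column axes: shape (2, 3) -> (3, 2).
transposed = img.T

# The same result: rotate 90 degrees counter-clockwise, then flip vertically.
rotated_and_flipped = np.flipud(np.rot90(img))
assert np.array_equal(transposed, rotated_and_flipped)
```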
Resizing an image to a different aspect ratio distorts it with a squashing or stretching effect. To prevent distortion, resize proportionally so that the longest dimension meets the required size, then add padding along the shorter edge.
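A sketch of proportional resizing with padding, in plain NumPy. The helpers `resize_nn` and `fit_with_padding` are illustrative names, not library functions; real code would typically use e.g. Pillow's resizing instead of the crude nearest-neighbour step here:

```python
import numpy as np

def resize_nn(img, new_h, new_w):
    # Nearest-neighbour resize of a 2D grayscale array (illustration only).
    h, w = img.shape
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows][:, cols]

def fit_with_padding(img, size, pad_value=0):
    # Scale so the LONGEST side equals `size`, then pad the shorter side.
    h, w = img.shape
    scale = size / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    resized = resize_nn(img, new_h, new_w)
    pad_h, pad_w = size - new_h, size - new_w
    return np.pad(resized,
                  ((pad_h // 2, pad_h - pad_h // 2),
                   (pad_w // 2, pad_w - pad_w // 2)),
                  constant_values=pad_value)

wide = np.full((50, 100), 200, dtype=np.uint8)  # 2:1 aspect ratio
square = fit_with_padding(wide, 64)
assert square.shape == (64, 64)                 # no distortion, just padding
```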
A good level of contrast shows up in the CDF (cumulative distribution function) of the pixel values across the full intensity range.
Simple ways to improve contrast:
- Contrast/histogram stretching – scaling pixel values linearly from lowest to highest. This doesn't change the histogram's shape, so contrast may improve only a little.
- A better technique, histogram equalization – normalize pixel values on a scale of 0-255 with the frequent intensities spread out evenly, flattening the histogram and producing a diagonal CDF (a relatively uniform distribution).
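The equalization idea above can be sketched directly from the CDF in NumPy (the helper name `equalize_hist` is illustrative; scikit-image ships its own implementation):

```python
import numpy as np

def equalize_hist(img):
    # Histogram equalization via the CDF: maps pixel values so the
    # cumulative distribution becomes (approximately) a straight diagonal.
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf /= cdf[-1]                              # normalise CDF to 0..1
    lut = np.round(cdf * 255).astype(np.uint8)  # lookup table 0..255
    return lut[img]

# Low-contrast image: values crowded into the 100..120 band.
rng = np.random.default_rng(0)
low = rng.integers(100, 121, size=(64, 64)).astype(np.uint8)
eq = equalize_hist(low)
assert eq.max() - eq.min() > low.max() - low.min()  # contrast increased
```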
Filters are used to change pixel values in an image to produce visual effects, e.g. blurring, sharpening, embossing. A filter is applied using a matrix of values – a kernel – that is overlaid on the original image, centred on the pixel whose value is to change.
This process of sliding the kernel over the original image's pixel values to obtain a new set of calculated values in the range 0-255, producing a filtered version of the image, is called convolution.
If a convolved value is > 255, it is set to 255; if it is < 0, it is set to 0.
However, at the edges of the original image the kernel overhangs and convolution cannot be performed directly. Common solutions:
- retain the original edge values as they are,
- add a border of neutral pixel values, or
- extend the original image (e.g. by replicating the edge pixels).
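A naive NumPy sketch of convolution with border padding and the 0-255 clipping described above (the helper `convolve2d` is illustrative; libraries such as SciPy provide optimized versions):

```python
import numpy as np

def convolve2d(img, kernel, pad_value=0):
    # Naive 2D convolution with a 3x3 kernel. The image is padded with a
    # border of neutral pixels so edge pixels can be computed too, and the
    # result is clipped back into the valid 0-255 range.
    k = np.flipud(np.fliplr(kernel))  # true convolution flips the kernel
    padded = np.pad(img.astype(np.float64), 1, constant_values=pad_value)
    h, w = img.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * k)
    return np.clip(out, 0, 255).astype(np.uint8)  # enforce 0..255

blur = np.full((3, 3), 1 / 9)      # a simple 3x3 mean (blur) kernel
img = np.zeros((5, 5), dtype=np.uint8)
img[2, 2] = 255                    # single bright pixel
out = convolve2d(img, blur)
assert out[2, 2] < 255             # the bright pixel is spread out (blurred)
```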
Edge detection – uses filters to find sudden changes in pixel values, indicating boundaries, shapes, and objects.
- Convert the image to grayscale so we deal with only one channel of pixels.
- Apply a specific filter such as the Sobel filter – a 3×3 kernel for finding gradients.
- The process has two stages – two kernels find the horizontal and vertical gradients, Gx and Gy.
Then, for each pixel, add the squares of the two gradient values and take the square root to determine the length of the gradient vector:
G = √(Gx² + Gy²)
Then calculate the inverse tangent, θ = atan(Gy / Gx), to determine the gradient's angle.
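The two-kernel Sobel process can be sketched in NumPy (the `sobel` helper is illustrative; scikit-image's `skimage.filters.sobel` does this for you):

```python
import numpy as np

# The two Sobel kernels: KX responds to horizontal intensity changes
# (vertical edges), and its transpose KY to vertical changes.
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
KY = KX.T

def sobel(img):
    # Gradient magnitude G = sqrt(Gx^2 + Gy^2) and angle atan2(Gy, Gx),
    # computed on the interior pixels of a grayscale image.
    img = img.astype(np.float64)
    h, w = img.shape
    gx, gy = np.zeros((h, w)), np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            win = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(win * KX)
            gy[i, j] = np.sum(win * KY)
    return np.hypot(gx, gy), np.arctan2(gy, gx)

# Vertical step edge: left half dark, right half bright.
img = np.zeros((8, 8))
img[:, 4:] = 255
mag, ang = sobel(img)
assert mag[4, 4] > 0    # strong response at the edge
assert mag[4, 1] == 0   # no response in the flat region
```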
Mathematical morphology – using mathematics to change (morph) the shapes in an image.
Dilation – start by creating a mask, aka a structuring element.
Place this over the image just like a filter, then perform a logical OR: if ANY cell of the structuring element matches a foreground cell in the image, activate the target cell by setting it to the foreground value (here 0 = black). Repeat across the whole image.
Hence the main effect of dilation is to enlarge objects in the image by extending the pixels around their edges and filling in small holes.
Contrary to dilation, erosion removes pixels from the image. Similarly to the above, compare cells using a logical AND, setting the target cell to 0 (foreground) only when ALL cells match; otherwise set it to 255. The result is erosion: it removes a layer of cells at the edges and strips rounded corners and fine details from the image.
The net effect of applying:
- first dilation and then erosion = closing
- first erosion and then dilation = opening
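The OR/AND logic above can be sketched in NumPy. Note one convention swap: this sketch uses boolean arrays with True as foreground (the notes' "0 = black" foreground maps onto True here), and the `dilate`/`erode` helper names are illustrative, not library functions:

```python
import numpy as np

def dilate(img, se):
    # Dilation: target becomes foreground (True) if ANY pixel under the
    # structuring element `se` is foreground — a logical OR.
    ph, pw = se.shape[0] // 2, se.shape[1] // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), constant_values=False)
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = padded[i:i + se.shape[0], j:j + se.shape[1]]
            out[i, j] = np.any(win & se)
    return out

def erode(img, se):
    # Erosion: target stays foreground only if ALL pixels under the
    # structuring element are foreground — a logical AND.
    ph, pw = se.shape[0] // 2, se.shape[1] // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), constant_values=False)
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = padded[i:i + se.shape[0], j:j + se.shape[1]]
            out[i, j] = np.all(win[se])
    return out

se = np.ones((3, 3), dtype=bool)        # 3x3 structuring element
img = np.zeros((7, 7), dtype=bool)
img[2:5, 2:5] = True                    # a 3x3 foreground square
closed = erode(dilate(img, se), se)     # dilation then erosion = closing
opened = dilate(erode(img, se), se)     # erosion then dilation = opening
assert dilate(img, se).sum() > img.sum()  # dilation enlarges the object
assert erode(img, se).sum() < img.sum()   # erosion shrinks it
```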
Thresholding – binarizing pixels – comes in two types: global and adaptive.
For example, with a global threshold of 127:
if pixel value > 127 then set it to 255; if pixel value < 127 then set it to 0
(or vice versa, to obtain inverse thresholding). Adaptive thresholding applies thresholds to localized regions of the image.
To calculate threshold values automatically (globally or for specific regions), use Otsu's binarization algorithm.
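Otsu's method picks the threshold that maximises the between-class variance of the two resulting pixel groups; a NumPy sketch (the `otsu_threshold` helper is illustrative — scikit-image provides `skimage.filters.threshold_otsu`):

```python
import numpy as np

def otsu_threshold(img):
    # Otsu's method: try every threshold t and keep the one that
    # maximises the between-class variance w0*w1*(mu0 - mu1)^2.
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()     # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0        # class means
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# Bimodal image: a dark background and a bright object.
rng = np.random.default_rng(1)
img = rng.integers(20, 41, size=(32, 32)).astype(np.uint8)  # background
img[8:24, 8:24] = rng.integers(180, 201, size=(16, 16))     # bright object
t = otsu_threshold(img)
binary = np.where(img > t, 255, 0).astype(np.uint8)
assert 40 < t <= 180   # threshold lands between the two intensity modes
```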
After thresholding, erosion and dilation help separate the foreground from the background.
But for cases where multiple objects overlap in the image, a longer pipeline is needed:
- Convert the image to grayscale.
- Apply thresholding to binarize the image, then erosion followed by dilation, to obtain a clean monochrome format.
- Use a distance transformation to change the intensities of foreground pixels based on their proximity to the background.
- Threshold these transformed intensities to separate darker pixels from lighter ones.
Now we have three kinds of pixels:
- foreground pixels
- background pixels
- unknown pixels around the edges.
Mark both distinct known regions with different integer values, and set the unknown pixels to 0.
Apply watershed segmentation to fill in the marked foreground, leaving only the boundary pixels.
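The marker-preparation steps above can be sketched with NumPy and SciPy (assumed available); the final watershed fill itself lives in e.g. `skimage.segmentation.watershed` and is omitted here:

```python
import numpy as np
from scipy import ndimage

# Two square blobs joined by a thin bridge (True = foreground), standing in
# for two overlapping objects that thresholding alone cannot separate.
img = np.zeros((20, 31), dtype=bool)
img[4:16, 3:15] = True     # blob 1
img[4:16, 16:28] = True    # blob 2
img[9:11, 14:17] = True    # thin bridge connecting them

# Distance transform: each foreground pixel gets its distance to background.
dist = ndimage.distance_transform_edt(img)

# Threshold the distances: pixels far from the background are sure foreground.
sure_fg = dist > 0.6 * dist.max()
sure_bg = ~img
unknown = img & ~sure_fg   # foreground pixels near the edges

# Label each sure-foreground region with a distinct integer marker; the
# unknown pixels stay 0, for the watershed step to fill in afterwards.
markers, n = ndimage.label(sure_fg)
assert n == 2              # the joined blobs get separate markers
```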
Limitations of classical algorithms
Classical algorithms used in machine learning are straightforward and heavily used. But, as a limitation for computer vision, they can be heavily influenced by variations in pixel values due to colour, contrast, and other image features.
Moreover, they apply to the entire set of pixel values, when only some of the prominently important pixel values describe the features of the image that we're trying to classify.
They are also unduly influenced by background pixel values that should carry little or no weight.