An image is
- an array of numbers
- made up of a matrix of pixels
- each pixel having an intensity, typically between 0 and 255 for 8-bit images.
A JPEG image is a
- 3D matrix
- with channels for red, green, and blue pixel intensities.
For each pixel in the matrix, the intensity values across these three channels combine to create a full spectrum of colours; varying the pixel values in each channel changes the overall visual effect.
Most widely used image processing & manipulation tools:
- scikit-image [note: scikit-learn is a different library]
- Pillow – a fork of the Python Imaging Library (PIL)
The NumPy array format is ideal for sharing image data across all of these tools and is used extensively, like a common currency, in the image processing field.
To transpose an image means to swap its rows and columns – equivalent to flipping the image and then mirroring it (a reflection along the main diagonal).
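In NumPy (the common array format mentioned above), a transpose is a one-liner; a minimal sketch on a tiny grayscale array (the pixel values here are made up for illustration):

```python
import numpy as np

# A tiny 2x3 grayscale "image" of pixel intensities (values are arbitrary).
img = np.array([[10, 20, 30],
                [40, 50, 60]], dtype=np.uint8)

# Transposing swaps the row and column axes: shape (2, 3) -> (3, 2).
transposed = img.T

# The same result: rotate 90 degrees counter-clockwise, then flip vertically.
rotated_and_flipped = np.flipud(np.rot90(img))
assert np.array_equal(transposed, rotated_and_flipped)
```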
Resizing an image to a different aspect ratio distorts it with a squashing or stretching effect. To prevent distortion, resize proportionally so that the longest dimension meets the required size, then add padding along the shorter edge.
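A sketch of proportional resizing with padding, in plain NumPy. The helpers `resize_nn` and `fit_with_padding` are illustrative names, not library functions; real code would typically use e.g. Pillow's resizing instead of the crude nearest-neighbour step here:

```python
import numpy as np

def resize_nn(img, new_h, new_w):
    # Nearest-neighbour resize of a 2D grayscale array (illustration only).
    h, w = img.shape
    rows = np.arange(new_h) * h // new_h
    cols = np.arange(new_w) * w // new_w
    return img[rows][:, cols]

def fit_with_padding(img, size, pad_value=0):
    # Scale so the LONGEST side equals `size`, then pad the shorter side.
    h, w = img.shape
    scale = size / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    resized = resize_nn(img, new_h, new_w)
    pad_h, pad_w = size - new_h, size - new_w
    return np.pad(resized,
                  ((pad_h // 2, pad_h - pad_h // 2),
                   (pad_w // 2, pad_w - pad_w // 2)),
                  constant_values=pad_value)

wide = np.full((50, 100), 200, dtype=np.uint8)  # 2:1 aspect ratio
square = fit_with_padding(wide, 64)
assert square.shape == (64, 64)                 # no distortion, just padding
```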
A good level of contrast shows up in the CDF (cumulative distribution function) of the pixel values across the full intensity range.
Simple ways to improve contrast:
- Contrast/histogram stretching – scaling pixel values linearly from lowest to highest. This doesn't change the histogram's shape, so contrast may improve only a little.
- A better technique, histogram equalization – normalize pixel values on a scale of 0-255 with the frequent intensities spread out evenly, flattening the histogram and producing a diagonal CDF (a relatively uniform distribution).
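The equalization idea above can be sketched directly from the CDF in NumPy (the helper name `equalize_hist` is illustrative; scikit-image ships its own implementation):

```python
import numpy as np

def equalize_hist(img):
    # Histogram equalization via the CDF: maps pixel values so the
    # cumulative distribution becomes (approximately) a straight diagonal.
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf /= cdf[-1]                              # normalise CDF to 0..1
    lut = np.round(cdf * 255).astype(np.uint8)  # lookup table 0..255
    return lut[img]

# Low-contrast image: values crowded into the 100..120 band.
rng = np.random.default_rng(0)
low = rng.integers(100, 121, size=(64, 64)).astype(np.uint8)
eq = equalize_hist(low)
assert eq.max() - eq.min() > low.max() - low.min()  # contrast increased
```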
Filters are used to change pixel values in an image to produce visual effects, e.g. blurring, sharpening, embossing. A filter is applied using a matrix of values – a kernel – that is overlaid on the original image, centred on the pixel whose value is to change.
This process of sliding the kernel over the original image's pixel values to obtain a new set of calculated values in the range 0-255, producing a filtered version of the image, is called convolution.
If a convolved value is > 255, it is set to 255; if it is < 0, it is set to 0.
However, at the edges of the original image the kernel overhangs and convolution cannot be performed directly. Common solutions:
- retain the original edge values as they are,
- add a border of neutral pixel values, or
- extend the original image (e.g. by replicating the edge pixels).
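A naive NumPy sketch of convolution with border padding and the 0-255 clipping described above (the helper `convolve2d` is illustrative; libraries such as SciPy provide optimized versions):

```python
import numpy as np

def convolve2d(img, kernel, pad_value=0):
    # Naive 2D convolution with a 3x3 kernel. The image is padded with a
    # border of neutral pixels so edge pixels can be computed too, and the
    # result is clipped back into the valid 0-255 range.
    k = np.flipud(np.fliplr(kernel))  # true convolution flips the kernel
    padded = np.pad(img.astype(np.float64), 1, constant_values=pad_value)
    h, w = img.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * k)
    return np.clip(out, 0, 255).astype(np.uint8)  # enforce 0..255

blur = np.full((3, 3), 1 / 9)      # a simple 3x3 mean (blur) kernel
img = np.zeros((5, 5), dtype=np.uint8)
img[2, 2] = 255                    # single bright pixel
out = convolve2d(img, blur)
assert out[2, 2] < 255             # the bright pixel is spread out (blurred)
```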
Edge detection – uses filters to find sudden changes in pixel values, indicating boundaries, shapes, and objects.
- Convert the image to grayscale so we deal with only one channel of pixels.
- Apply a specific filter such as the Sobel filter – a 3×3 kernel for finding gradients.
- The process has two stages – two kernels find the horizontal and vertical gradients, Gx and Gy.
Then, for each pixel, add the squares of the two gradient values and take the square root to determine the length of the gradient vector:
G = √(Gx² + Gy²)
Then calculate the inverse tangent, θ = atan(Gy / Gx), to determine the gradient's angle.
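The two-kernel Sobel process can be sketched in NumPy (the `sobel` helper is illustrative; scikit-image's `skimage.filters.sobel` does this for you):

```python
import numpy as np

# The two Sobel kernels: KX responds to horizontal intensity changes
# (vertical edges), and its transpose KY to vertical changes.
KX = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
KY = KX.T

def sobel(img):
    # Gradient magnitude G = sqrt(Gx^2 + Gy^2) and angle atan2(Gy, Gx),
    # computed on the interior pixels of a grayscale image.
    img = img.astype(np.float64)
    h, w = img.shape
    gx, gy = np.zeros((h, w)), np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            win = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(win * KX)
            gy[i, j] = np.sum(win * KY)
    return np.hypot(gx, gy), np.arctan2(gy, gx)

# Vertical step edge: left half dark, right half bright.
img = np.zeros((8, 8))
img[:, 4:] = 255
mag, ang = sobel(img)
assert mag[4, 4] > 0    # strong response at the edge
assert mag[4, 1] == 0   # no response in the flat region
```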
Mathematical morphology – using mathematics to change (morph) the shapes in an image.
Dilation – start by creating a mask, aka a structuring element.
Place this over the image just like a filter, then perform a logical OR: if ANY cell of the structuring element matches a foreground cell in the image, activate the target cell by setting it to the foreground value (here 0 = black). Repeat across the whole image.
Hence the main effect of dilation is to enlarge objects in the image by extending the pixels around their edges and filling in small holes.
Contrary to dilation, erosion removes pixels from the image. Similarly to the above, compare cells using a logical AND, setting the target cell to 0 (foreground) only when ALL cells match; otherwise set it to 255. The result is erosion: it removes a layer of cells at the edges and strips rounded corners and fine details from the image.
The net effect of applying:
- first dilation and then erosion = closing
- first erosion and then dilation = opening
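The OR/AND logic above can be sketched in NumPy. Note one convention swap: this sketch uses boolean arrays with True as foreground (the notes' "0 = black" foreground maps onto True here), and the `dilate`/`erode` helper names are illustrative, not library functions:

```python
import numpy as np

def dilate(img, se):
    # Dilation: target becomes foreground (True) if ANY pixel under the
    # structuring element `se` is foreground — a logical OR.
    ph, pw = se.shape[0] // 2, se.shape[1] // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), constant_values=False)
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = padded[i:i + se.shape[0], j:j + se.shape[1]]
            out[i, j] = np.any(win & se)
    return out

def erode(img, se):
    # Erosion: target stays foreground only if ALL pixels under the
    # structuring element are foreground — a logical AND.
    ph, pw = se.shape[0] // 2, se.shape[1] // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)), constant_values=False)
    out = np.zeros_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            win = padded[i:i + se.shape[0], j:j + se.shape[1]]
            out[i, j] = np.all(win[se])
    return out

se = np.ones((3, 3), dtype=bool)        # 3x3 structuring element
img = np.zeros((7, 7), dtype=bool)
img[2:5, 2:5] = True                    # a 3x3 foreground square
closed = erode(dilate(img, se), se)     # dilation then erosion = closing
opened = dilate(erode(img, se), se)     # erosion then dilation = opening
assert dilate(img, se).sum() > img.sum()  # dilation enlarges the object
assert erode(img, se).sum() < img.sum()   # erosion shrinks it
```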
Thresholding – binarizing pixels – comes in two types: global and adaptive.
For example, with a global threshold of 127:
if pixel value > 127 then set it to 255; if pixel value < 127 then set it to 0
(or vice versa, to obtain inverse thresholding). Adaptive thresholding applies thresholds to localized regions of the image.
To calculate threshold values automatically (globally or for specific regions), use Otsu's binarization algorithm.
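Otsu's method picks the threshold that maximises the between-class variance of the two resulting pixel groups; a NumPy sketch (the `otsu_threshold` helper is illustrative — scikit-image provides `skimage.filters.threshold_otsu`):

```python
import numpy as np

def otsu_threshold(img):
    # Otsu's method: try every threshold t and keep the one that
    # maximises the between-class variance w0*w1*(mu0 - mu1)^2.
    hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = p[:t].sum(), p[t:].sum()     # class probabilities
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * p[:t]).sum() / w0        # class means
        mu1 = (np.arange(t, 256) * p[t:]).sum() / w1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t

# Bimodal image: a dark background and a bright object.
rng = np.random.default_rng(1)
img = rng.integers(20, 41, size=(32, 32)).astype(np.uint8)  # background
img[8:24, 8:24] = rng.integers(180, 201, size=(16, 16))     # bright object
t = otsu_threshold(img)
binary = np.where(img > t, 255, 0).astype(np.uint8)
assert 40 < t <= 180   # threshold lands between the two intensity modes
```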
After thresholding, erosion and dilation help separate the foreground from the background.
But for cases where multiple objects overlap in the image, a longer pipeline is needed:
- Convert the image to grayscale.
- Apply thresholding to binarize the image, then erosion followed by dilation, to obtain a clean monochrome format.
- Use a distance transformation to change the intensities of foreground pixels based on their proximity to the background.
- Threshold these transformed intensities to separate darker pixels from lighter ones.
Now we have three kinds of pixels:
- foreground pixels
- background pixels
- unknown pixels around the edges.
Mark both distinct known regions with different integer values, and set the unknown pixels to 0.
Apply watershed segmentation to fill in the marked foreground, leaving only the boundary pixels.
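The marker-preparation steps above can be sketched with NumPy and SciPy (assumed available); the final watershed fill itself lives in e.g. `skimage.segmentation.watershed` and is omitted here:

```python
import numpy as np
from scipy import ndimage

# Two square blobs joined by a thin bridge (True = foreground), standing in
# for two overlapping objects that thresholding alone cannot separate.
img = np.zeros((20, 31), dtype=bool)
img[4:16, 3:15] = True     # blob 1
img[4:16, 16:28] = True    # blob 2
img[9:11, 14:17] = True    # thin bridge connecting them

# Distance transform: each foreground pixel gets its distance to background.
dist = ndimage.distance_transform_edt(img)

# Threshold the distances: pixels far from the background are sure foreground.
sure_fg = dist > 0.6 * dist.max()
sure_bg = ~img
unknown = img & ~sure_fg   # foreground pixels near the edges

# Label each sure-foreground region with a distinct integer marker; the
# unknown pixels stay 0, for the watershed step to fill in afterwards.
markers, n = ndimage.label(sure_fg)
assert n == 2              # the joined blobs get separate markers
```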
Limitations of classical algorithms
Classical algorithms used in machine learning are straightforward and heavily used. But, as a limitation for computer vision, they can be heavily influenced by variations in pixel values due to colour, contrast, and other image features.
Moreover, they apply to the entire set of pixel values, when only some of the prominently important pixel values describe the features of the image that we're trying to classify.
They are also unduly influenced by background pixel values that should carry little or no weight.