Image – an array of numbers: a matrix of pixels, each pixel having an intensity between 0 and 255.
A color JPEG image – a 3D array, with channels for red, green, and blue pixel intensities.
For each pixel, the intensity values across these three channels combine to create a full spectrum of colors; varying the pixel values in each channel changes the overall visual effect.
Image processing/manipulation tools – scikit-image, OpenCV, Matplotlib, and PIL (maintained today as its fork, Pillow).
The NumPy array format is perfect for sharing image data across all of these tools and is used extensively as a common currency.
Transpose/flip – mirrors the image.
Resizing an image to a different aspect ratio distorts it (a squashing or stretching effect). To prevent distortion, resize proportionally so the longest dimension meets the required size, then add padding along the shorter edge.
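A minimal sketch of that resize-with-padding idea in plain NumPy, using nearest-neighbour sampling for simplicity (the function name is mine; in practice Pillow's `ImageOps.pad` or `cv2.resize` plus `cv2.copyMakeBorder` do this properly):

```python
import numpy as np

def resize_with_pad(img, size, pad_value=0):
    """Resize `img` (H x W or H x W x C) proportionally so its longest
    side equals `size`, then pad the shorter side to a size x size square."""
    h, w = img.shape[:2]
    scale = size / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    # nearest-neighbour index maps back into the original image
    rows = (np.arange(new_h) / scale).astype(int).clip(0, h - 1)
    cols = (np.arange(new_w) / scale).astype(int).clip(0, w - 1)
    resized = img[rows[:, None], cols]
    # centre the resized image on a neutral canvas (the padding)
    out = np.full((size, size) + img.shape[2:], pad_value, dtype=img.dtype)
    top, left = (size - new_h) // 2, (size - new_w) // 2
    out[top:top + new_h, left:left + new_w] = resized
    return out
```

A tall 100×50 image resized to 64 becomes 64×32 of content with 16 columns of padding on each side, instead of being squashed to 64×64.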
A good image has a good level of contrast, which can be judged from the CDF (cumulative distribution function) of its pixel values over the range 0–255.
A simple way to improve contrast – contrast/histogram stretching: scaling pixel values so the lowest maps to 0 and the highest to 255. However, this doesn't change the histogram's shape, so contrast may only improve a little.
A better technique – histogram equalization: remap pixel values on the 0–255 scale so the frequent intensities are spread out evenly, flattening the histogram and producing a diagonal CDF – a relatively uniform distribution.
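Both techniques can be sketched in plain NumPy (scikit-image's `exposure.rescale_intensity` and `exposure.equalize_hist` are the library equivalents):

```python
import numpy as np

def stretch_contrast(img):
    """Contrast stretching: linearly rescale so the minimum pixel maps
    to 0 and the maximum to 255. Histogram shape is preserved."""
    lo, hi = img.min(), img.max()
    return ((img.astype(float) - lo) / (hi - lo) * 255).astype(np.uint8)

def equalize_histogram(img):
    """Histogram equalization: remap intensities through the normalized
    CDF so the output histogram is roughly uniform (diagonal CDF)."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min()) * 255
    return cdf[img].astype(np.uint8)
```

Stretching only rescales the existing range; equalization redistributes it, which is why it usually improves contrast more.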
Filters change pixel values to produce visual effects in an image, e.g. blurring, sharpening, embossing. They are applied using a matrix of values – a kernel – that is overlaid on the original image with the pixel to change at its center.
This process of sliding the filter over the original image's pixel values to compute a new set of values in the range 0–255, producing a filtered version of the image, is called convolution.
During convolution, if a computed value is > 255 it is clipped to 255; if < 0 it is clipped to 0.
But at the edges of the original image the kernel overhangs, so convolution cannot be performed directly. Solutions include:
retaining the original edge values as-is, adding a border of neutral pixel values, extending the original image by replicating its edge pixels, etc.
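A naive sketch of this kernel convolution with clipping and edge-replication padding (strictly speaking this is cross-correlation – the kernel is not flipped – which matches what most image libraries actually compute):

```python
import numpy as np

def convolve2d(img, kernel):
    """Slide `kernel` over a grayscale image, clipping results to 0-255.
    Edges are handled by extending the image with replicated edge pixels."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img.astype(float), ((ph, ph), (pw, pw)), mode="edge")
    out = np.zeros_like(img, dtype=float)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            # weighted sum of the neighbourhood under the kernel
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return np.clip(out, 0, 255).astype(np.uint8)
```

An identity kernel (1 at the center, 0 elsewhere) returns the image unchanged, while a kernel whose weighted sums overflow 255 demonstrates the clipping rule.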
Edge detection – uses filters to find sudden changes in pixel values, indicating boundaries, shapes, and objects.
To start, convert the image to grayscale so we deal with only one channel of pixels. Then apply a specific filter such as the Sobel filter – a pair of 3×3 kernels for finding gradients in a two-stage process: one kernel for horizontal gradients (Gx) and one for vertical gradients (Gy).
Then, for each pixel, sum the squares of the Gx and Gy values and take the square root to get the length of the gradient vector (its magnitude):
G = √(Gx² + Gy²)
Then calculate the inverse tangent of those values to determine the gradient's angle: θ = atan2(Gy, Gx).
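The two-kernel Sobel process can be sketched as follows (zero-padded edges for brevity):

```python
import numpy as np

# Sobel kernels for horizontal (Gx) and vertical (Gy) gradients
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T

def sobel_edges(gray):
    """Return gradient magnitude G = sqrt(Gx^2 + Gy^2) and direction
    theta = atan2(Gy, Gx) for a grayscale image."""
    padded = np.pad(gray.astype(float), 1)
    gx = np.zeros(gray.shape)
    gy = np.zeros(gray.shape)
    for i in range(gray.shape[0]):
        for j in range(gray.shape[1]):
            region = padded[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(region * SOBEL_X)
            gy[i, j] = np.sum(region * SOBEL_Y)
    magnitude = np.sqrt(gx**2 + gy**2)
    direction = np.arctan2(gy, gx)
    return magnitude, direction
```

On an image with a sharp vertical boundary, the magnitude is large along that boundary (all from Gx) and zero in flat regions.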
Mathematical morphology – using maths to change/morph images. Dilation – start by creating a mask, aka a structuring element (e.g. a small square of cells).
Place this over the image just like a filter, then perform a logical OR: if ANY cell of the structuring element matches a foreground cell in the image, activate the target (center) cell by setting it to 0 = black. Repeat across the whole image.
Hence the main effect of dilation is to enlarge objects in the image by extending pixels around their edges and filling in small holes.
Conversely, erosion removes pixels from the image. Similar to the above, but compare cells using an AND operation, setting the target cell to 0 only when ALL cells match; otherwise set it to 255. This removes a layer of pixels at the edges, rounding them off and stripping fine details from the image.
Net effect of applying both:
Dilation then Erosion = closing
Erosion then Dilation = opening
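A toy NumPy sketch of these operations, following the notes' convention of black (0) foreground on a white (255) background (real code would use `cv2.dilate`/`cv2.erode` or `skimage.morphology`):

```python
import numpy as np

def dilate(img, k=3):
    """Dilation (OR rule): a pixel becomes foreground (0) if ANY pixel
    under the k x k structuring element is foreground."""
    pad = k // 2
    padded = np.pad(img, pad, constant_values=255)
    out = np.full_like(img, 255)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            if (padded[i:i + k, j:j + k] == 0).any():
                out[i, j] = 0
    return out

def erode(img, k=3):
    """Erosion (AND rule): a pixel stays foreground (0) only if ALL
    pixels under the structuring element are foreground."""
    pad = k // 2
    padded = np.pad(img, pad, constant_values=255)
    out = np.full_like(img, 255)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            if (padded[i:i + k, j:j + k] == 0).all():
                out[i, j] = 0
    return out

def closing(img, k=3):  # dilation then erosion: fills small holes
    return erode(dilate(img, k), k)

def opening(img, k=3):  # erosion then dilation: removes small specks
    return dilate(erode(img, k), k)
```

A single black pixel grows to a 3×3 block under dilation, vanishes under erosion (and hence under opening), and survives closing unchanged.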
Thresholding – binarizing pixels. Two types – global and adaptive.
E.g. with a global threshold of 127: pixel values > 127 are set to 255 and the rest to 0 (or vice versa, to obtain inverse thresholding). Adaptive thresholding instead applies thresholds to localized regions of the image.
To calculate the threshold value automatically (for the whole image or for specific regions) – Otsu's binarization algorithm.
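A from-scratch sketch of Otsu's algorithm, which picks the threshold that maximizes the between-class variance of the two resulting pixel groups (libraries expose this as `cv2.THRESH_OTSU` or `skimage.filters.threshold_otsu`):

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold that best separates the histogram into
    two classes, by maximizing between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0, w1 = hist[:t].sum(), hist[t:].sum()   # class weights
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (np.arange(t) * hist[:t]).sum() / w0        # class means
        mu1 = (np.arange(t, 256) * hist[t:]).sum() / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t

def threshold(gray, t):
    """Global thresholding: pixels >= t become 255 (white), the rest 0."""
    return np.where(gray >= t, 255, 0).astype(np.uint8)
```

On a bimodal image (two clusters of intensities), the chosen threshold falls between the clusters and the binarized result splits them cleanly.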
After thresholding, erosion and dilation help separate foreground and background. But when multiple objects overlap in the image, a fuller pipeline is needed:
Convert image to grayscale.
Apply thresholding to binarize the image, then erosion followed by dilation to obtain a clean monochrome mask.
Then use a distance transformation to change the intensities of foreground pixels based on their proximity to the background, and threshold the result to separate the darker (near-edge) pixels from the lighter (sure-foreground) ones.
Now we have three kinds of pixels – sure-foreground pixels, sure-background pixels, and unknown pixels around the edges. Mark each distinct known region with a different integer value, and set the unknown pixels to 0 = black.
Apply watershed segmentation to flood outward from the marked foreground regions, leaving the boundary pixels that separate the overlapping objects.
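The distance-transformation step in this pipeline can be sketched with a brute-force NumPy version (fine only for tiny images; real code would use `cv2.distanceTransform` or `scipy.ndimage.distance_transform_edt`, and `cv2.watershed` for the final step):

```python
import numpy as np

def distance_transform(binary):
    """Assign each foreground pixel (255) its Euclidean distance to the
    nearest background pixel (0). Interior pixels get large values,
    pixels near edges get small ones, ready for re-thresholding."""
    fg = np.argwhere(binary == 255)
    bg = np.argwhere(binary == 0)
    dist = np.zeros(binary.shape, dtype=float)
    for (i, j) in fg:
        dist[i, j] = np.sqrt(((bg - (i, j)) ** 2).sum(axis=1)).min()
    return dist
```

For a 3×3 foreground block the center pixel ends up brightest (farthest from background), which is exactly what makes thresholding the transform isolate "sure foreground".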
Limitations of classical algorithms
Classical computer-vision algorithms are straightforward and heavily used. But they can be heavily influenced by variations in pixel values due to color, contrast, and other image features.
Moreover, they apply to the entire set of pixel values, when often only some prominently important pixels describe the features of the image that we're trying to classify.
They are also influenced by background pixel values that should carry little or no weight.
Artificial neuron: y = f(x, w, b), where y is the output label, x the inputs, w the weights, and b the bias.
At its core, the function takes the weighted sum of the x inputs, each multiplied by its corresponding weight w, plus the bias b, i.e. Σ(x·w) + b.
Further, we want to simulate a neuron that fires or not based on whether it reaches a specific threshold.
So we wrap the weighted sum Σ(x·w) + b in a function that squashes the overall output into a range (0 to 1). This is most commonly done with the popular sigmoid function, which yields an output in (0, 1).
This is the activation function, which determines whether the artificial neuron fires or not.
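The whole neuron – weighted sum, bias, and sigmoid activation – is a few lines of NumPy:

```python
import numpy as np

def sigmoid(z):
    """Squash any real value into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    """Artificial neuron: y = sigmoid(sum(x_i * w_i) + b)."""
    return sigmoid(np.dot(x, w) + b)
```

When the weighted sum exactly cancels the bias the output is 0.5 (the decision boundary); large positive sums push the output toward 1 (the neuron "fires").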
How do these neuron functions fit into a machine learning model?
We apply multiple filter kernels, each initialized with random weights, to the image. Each kernel convolves across the image to produce a feature map. A ReLU activation function is then applied to the feature-map values: ReLU sets negative values to 0 and passes values greater than 0 through unchanged.
After one or more convolutional layers, we use a pooling (downsampling) kernel that convolves across the feature maps, similarly to the convolution filters before it, but downsamples each map by taking only the max values into a smaller feature map, emphasizing the most strongly activated pixels.
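ReLU and max pooling can both be sketched directly in NumPy:

```python
import numpy as np

def relu(x):
    """ReLU: negative values become 0, positive values pass through."""
    return np.maximum(x, 0)

def max_pool(fmap, size=2):
    """Downsample a feature map by keeping the max of each
    size x size block, preserving the strongest activations."""
    h, w = fmap.shape
    h, w = h // size * size, w // size * size  # drop ragged edges
    blocks = fmap[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))
```

A 4×4 feature map becomes a 2×2 map, with each output cell holding the strongest post-ReLU activation of its 2×2 block.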
Overfitting poses a tough challenge during any convolutional NN training process. It is the behaviour of a model that learns to classify the training data with high accuracy but fails to generalize, achieving lower accuracy on new data it has never been trained on.
When building a CNN, we feed training data through the network layers in batches, apply a loss function to assess the model's error, and use backpropagation to determine how best to adjust the weights to reduce the loss.
Then we similarly feed in validation data, except without adjusting the weights this time, so we can compare the loss achieved on data the model has never seen or been trained on.
We repeat these two processes multiple times, as epochs, tracking the statistics for training loss and validation loss as the epochs complete.
Ideally both the training loss and validation loss should drop progressively and tend to converge. But if the training loss continues to drop while the validation loss begins to rise or levels off, the model is clearly overfitting to the training data and won't generalize well to new data.
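The divergence pattern described above can be expressed as a toy heuristic over per-epoch loss curves (an illustrative check I'm introducing, not a standard API):

```python
def is_overfitting(train_loss, val_loss, window=3):
    """Flag overfitting when, over the last `window` epochs, training
    loss keeps falling while validation loss keeps rising."""
    t, v = train_loss[-window:], val_loss[-window:]
    train_falling = all(b < a for a, b in zip(t, t[1:]))
    val_rising = all(b > a for a, b in zip(v, v[1:]))
    return train_falling and val_rising
```

When both curves fall together the check stays False; once validation loss turns upward while training loss still drops, it flips to True.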
Ways to minimize or mitigate the risk of overfitting during CNN training:
- Dropout – randomly drop some of the feature maps generated in the feature-extraction layers of the model. This creates random variability during training that mitigates overfitting.
- Data augmentation – transform the images in each batch of training data, e.g. by rotating or flipping. This also increases the quantity of training data: e.g. a batch of 1k dog images can be flipped horizontally to generate another 1k dog images.
Combining both techniques helps reduce overfitting more effectively. Data augmentation is always and only performed on training data; validation data is used as-is to check the model's performance.
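The horizontal-flip example above is a one-liner on a NumPy batch (the function name is mine):

```python
import numpy as np

def augment_batch(batch):
    """Double a training batch of shape (N, H, W, C) by appending a
    horizontally flipped copy of every image (axis 2 is width)."""
    flipped = np.flip(batch, axis=2)
    return np.concatenate([batch, flipped], axis=0)
```

A batch of N images comes back as 2N, and each appended image is the mirror of one original.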
It is easier to learn a new skill if you already have expertise in a similar, transferable skill.
Given a CNN that has been trained to extract features from images and identify classes, we can apply transfer learning to create a new classifier that builds on the knowledge learned by the previous model.
Here the pretrained feature-extraction weights, which have already learned to extract edges and corners in the feature-extraction layers, are not changed – they are retained for use in the new model. However, we replace the fully connected classifier with a new layer that maps the extracted features to the classes we want to identify in the input images.
Then train the model by feeding data into it; only the weights of the new fully connected layer are adjusted. Finally, validation data is used as usual to check the model's performance.
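The frozen-extractor idea can be sketched with a tiny two-layer NumPy model (all names and shapes here are hypothetical; in a framework like PyTorch this is done by disabling gradients on the pretrained layers):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pretrained feature-extraction weights: frozen, never updated.
W_features = rng.normal(size=(8, 4))

# New classifier head: the only weights we train.
W_classifier = rng.normal(size=(4, 2))

def forward(x):
    features = np.maximum(x @ W_features, 0)  # frozen extractor + ReLU
    return features @ W_classifier            # new trainable head

def train_step(x, grad_output, lr=0.01):
    """One gradient step that adjusts ONLY the classifier weights;
    W_features stays fixed, as in transfer learning."""
    global W_classifier
    features = np.maximum(x @ W_features, 0)
    W_classifier = W_classifier - lr * features.T @ grad_output
```

After any number of training steps the feature-extraction weights are bit-for-bit unchanged; only the classifier head has moved.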