Object Detection

Ability to detect multiple classes of objects and identify them within a best possible concise bounding boxes in an image or video or live camera feed. There are various algorithms to achieve object detection like R-CNN, Fast R-CNN, Faster R-CNN, SSD and YOLO.

For eg: let’s classify vehicles and persons in this image. Generally the classification and location of multiple objects is known as Object detection.

( Image Classification + Image Localization ) = object detection

Fundamentally the solution is to generate outputs that indicate probabilities of the detected object to be a particular class and location of bounding box containing that object.

Now here we use a convention to denote target locations of detected object based on the presumed top left coordinate (0,0) in the image and locations as X, Y coordinate for center of bounding box.

To clarify, in the image:

first array of values [0.10, 0.85, 0.05] represent the generated probabilities of the detected object classes in range (0-1) totalling 1. So the detected object was classified as a “car” by 10%, “bus” by 85% and “person” by 5%. Hence the maximum probability of 85% being “bus” was selected.

Similarly, the second array of values [0.3, 0.5, 0.5, 0.25] represent the width and height of bounding box relative to the size of the overall image. So the center of the “bus” is around 0.3 from the left of image, 0.5 from top, 0.5 as wide as image and 0.25 as tall as the image.

Now to understand the process of object detection, we need to understand how to locate and classify multiple objects in an image. Out of various approaches, we take a look at some of the standard approaches.

Applying a classification model to multiple regions of the same image. And this could be achieved by using the Sliding Windows approach. Here we create usually a square shaped window and use it place it in a specific region of the image; mostly starting at the top left part like in the figure below.

Then feed this into a convolutional neural network to classify this patch of the image. And we slide the window along to the neighbouring regions and continue the process to classify all of the window size regions in the image. And we can repeat the same process with different sized windows or on the rescaled versions of the image size so that we can detect objects of varying sizes. Now the only problem with this is that it is computationally impractical for a convolutional neural network to apply classifiers for every position of the window.

Bounding boxes
to be continued…

Last modified: January 23, 2020


SEO Affiliate Program 

Awesome post! Keep up the great work! 🙂

I am genuinely thankful to the holder of this website who has shared this impressive paragraph at at this place.

Lisa Henderson 

I’m not sure where you are getting your info, but good topic.
I needs to spend some time learning more or understanding more.
Thanks for great information I was looking for this info for my


If some one needs expert view concerning blogging and site-building after that i propose him/her to pay a quick visit this webpage, Keep up the nice

An impressive share! I have just forwarded this onto a co-worker who was conducting a little research on this.
And he in fact bought me breakfast simply because I found it
for him… lol. So allow me to reword this…. Thanks for the
meal!! But yeah, thanks for spending time to discuss this subject
here on your web page.


Heya i am for the first time here. I found this board and I find It really
useful & it helped me out much. I hope to give something back and aid others like you helped me.


There’s definately a great deal to learn about this subject.
I like all the points you’ve made.

Write a Reply or Comment

Your email address will not be published.