Basic Computer Vision with ML (ML Zero to Hero, part 2)

♪ (music) ♪ Hi, everyone, and welcome to episode 2
of TensorFlow Zero to Hero. In the last episode, you learned about machine learning
and how it works. You saw a simple example
of matching numbers to each other and how, using Python code, a computer could learn
through trial and error what the relationship
between the numbers was. In this episode, you’re going
to take it a little further by teaching a computer how to see and recognize
different objects. For example, look at these pictures. How many shoes do you see? You might say two, right? But how do you know they are shoes? Imagine if somebody
had never seen shoes before. How would you tell them that despite the great difference
between the high heel and the sports shoe, they’re still both shoes? Maybe they would think
if it’s red, it’s a shoe. Because all they’ve seen are these two,
and they’re both red. But, of course, it’s not that simple. But how do you know
that these two are shoes? Because, in your life,
you’ve seen lots of shoes, and you’ve learned to understand
what makes a shoe a shoe. So it follows logically
that if we show a computer lots of shoes, it will be able to recognize
what a shoe is. And that’s where the dataset
called Fashion MNIST is useful. It has 70,000 images
in 10 different categories. So there’s 7,000 examples
of each category, including shoes. Hopefully, seeing 7,000 shoes is enough for a computer
to learn what a shoe looks like. The images in Fashion MNIST
are only 28×28 pixels. So they’re pretty small. And the less data used, the faster it is
for a computer to process it. That being said, they still lead
to recognizable items of clothing. In this case, you can still see
that it’s a shoe. In the next few minutes, I’ll show you the code that will teach you
how to train a computer to recognize items of clothing
based on this training data. The type of code you write is almost identical to what you did
in the last video. That’s part of the power of TensorFlow that allows you to design neural networks
for a variety of tasks with a consistent programming API. We’ll start by loading the data. The Fashion MNIST dataset
is built into TensorFlow, so it’s easy to load it
with code like this. The training images
are a set of 60,000 images, like our ankle boot here. The other 10,000 are a test set
that we can use to check to see how well our neural network performs. We’ll see them later. The label is a number indicating
the class of that type of clothing. So, in this case, the number 09
indicates an ankle boot. Why do you think it would be a number
and not just the text, “ankle boot”? There are two main reasons:
first, computers deal better with numbers; but perhaps more importantly,
there’s the issue with bias. If we label it as “ankle boot,” we’re already showing a bias
towards the English language. So by using a number, you can point to a text description
in any language as shown here. Can you guess all of the languages
that we used here? When looking at a neural network design, it’s always good to explore the input values
and the output values first. Here we can see that our neural network
is a little more complex than the one in the first episode. Our first layer has the input
of shape 28×28, which, if you remember,
was the size of our image. Our last layer is 10,
which, if you remember, is the number of different
items of clothing represented in our dataset. So our neural network
will kind of act like a filter, which takes in a 28×28 set of pixels
and outputs 1 of 10 values. So what about this number, 128?
What does that do? Well, think of it like this,
we’re going to have 128 functions, each one of which
has parameters inside of it. Let’s call these f0 through f127. What we want is that
when the pixels of the shoe get fed into them, one by one, that the combination
of all of these functions will output the correct value. In this case, 9. In order to do that,
the computer will need to figure out the parameters inside
of these functions to get that result. And it will then extend this to all of the other items
of clothing in the dataset. The logic is, once it has done this, then it should be able
to recognize items of clothing. So if you remember from the last video, there’s the optimizer function
and the loss function. The neural network
will be initialized with random values. The loss function will then measure
how good or how bad the results were, and then with the optimizer, it will generate new parameters
for the functions to see if it can do better. You probably also wondered about these. And they’re called activation functions. The first one is on the layer
of 128 functions, and it’s called relu, or rectified linear unit. What it really does
is as simple as returning a value if it’s greater than zero. So if that function had zero
or less as output, it just gets filtered out. And softmax has the effect
of picking the biggest number in a set. The output layer in this neural network
has 10 items in it, representing the probability that we’re looking
at that specific item of clothing. So, in this case, it has
a high probability that it’s item 09, which is our ankle boot. So instead of searching through
to find the largest, what softmax does is scale the outputs into probabilities that sum to 1,
pushing the largest one towards 1 and the rest towards 0, so all we have to do is find the value closest to 1. Training is then very simple: we fit the training images
to the training labels. This time, we’ll try it for just 5 epochs. Remember earlier we had 10,000 images
and labels that we didn’t train with? These are images that the model
hasn’t previously seen, so we can use them to test
how well our model performs. We can do that test by passing them
to the evaluate method, like this. And then, finally, we can get
predictions back for new images by calling model.predict like this. And that’s all it takes
to teach a computer how to see and recognize images. You can try this out for yourself in the notebook that I’ve linked
in the description below. Having gone through this,
you’ve probably seen one drawback and that’s the fact that the images
are always 28×28 grayscale with the item of clothing centered. So what if it’s just a normal photograph,
and we want to recognize its contents, and you don’t have the luxury
of it being the only thing in the picture as well as being centered? That’s where the process
of spotting features becomes useful and the tool of convolutional
neural networks is your friend. You’ll learn all about that
in the next video, so don’t forget to hit
that subscribe button, and I’ll see you there. ♪ (music) ♪
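Putting the whole episode together — load the built-in Fashion MNIST data, define the layers, compile with an optimizer and a loss function, fit for 5 epochs, evaluate on the 10,000 unseen test images, and get predictions back with model.predict — a minimal end-to-end sketch might look like this (the exact notebook code may differ, e.g. in optimizer choice):

```python
import numpy as np
import tensorflow as tf

# Fashion MNIST is built into TensorFlow: 60,000 training
# and 10,000 test images, each 28x28 grayscale.
(train_images, train_labels), (test_images, test_labels) = \
    tf.keras.datasets.fashion_mnist.load_data()

# Scale pixel values from 0-255 down to 0-1 before training.
train_images, test_images = train_images / 255.0, test_images / 255.0

model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),
])

# The loss function measures how good or bad the results are;
# the optimizer generates new parameters to try to do better.
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Fit the training images to the training labels for just 5 epochs.
model.fit(train_images, train_labels, epochs=5)

# Check performance on the 10,000 images the model has never seen.
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=0)

# Each prediction is 10 probabilities that sum to 1;
# the index of the largest is the predicted class.
predictions = model.predict(test_images)
print(np.argmax(predictions[0]))  # predicted class of the first test image
```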

25 thoughts on “Basic Computer Vision with ML (ML Zero to Hero, part 2)”

  1. Very excited and appreciative of this series. Doing a great job of simplifying a complex topic. Thanks again and can't wait for the next episode!

  2. Great video.
    Last number of Table showing softmax at minute 5:54 should be 0.978 and not 9.78. These are probabilities and should sum to 1
