Internet - Understanding Technology - by CS50 at Harvard




[MUSIC PLAYING] DAVID J. MALAN: The internet. Odds are, you use this every day, and odds are you have internet connectivity at home these days, or at work, or at school. But how does it all work? How is it that you can use your phone wirelessly? How is it that you can use your laptop, and your desktop, and so many other devices, all somehow on a network?

Well, let's consider what you yourself might have at home, or in your office, or at school, and let's assume for the sake of discussion that it's a home network. So over here is, of course, your home, and inside of that door are some number of devices that actually get you on the internet. But what are those devices? Well, odds are, inside of your home, for instance, you have a device that might be called a cable modem, or a DSL modem, or, these days, a FiOS device, and that device is something you generally pay some number of dollars per month for, because you're paying an ISP, an internet service provider. So that device is somehow connected to the internet, which for our purposes right now we'll just draw as a cloud. That there is the internet. And that device comes from an internet service provider like Verizon, or Comcast, or any number of other providers, and somehow they themselves are on the internet.

But how do we now get the rest of your home on the internet if all you have is just this one device? Well, depending on how this device functions, it might be all that you need, and now, somehow, wirelessly, your phone and your laptop and all of your other devices just work. Or maybe you need a second device, which we might call a home router, that is connected to that cable modem, or FiOS device, or the like, and that in turn makes network connectivity possible in your own home. And maybe this little home router does a little bit more: maybe it's got a couple of antennas that actually provide the Wi-Fi service. Meanwhile, maybe it also has some jacks, some physical ports in the back, into which you can plug cables, so that if you have a wired device like a DVR, or an Xbox, or something else that's not necessarily wireless, you have someplace to plug those devices in as well.

But this is all so high level, and in that sense, so poorly drawn. What is actually going on underneath the hood, so to speak? How is it that bits, zeros and ones, can transmit themselves from my house to everywhere else in the world and back? Well, let's take a closer look.

Every computer on the internet, it turns out, has something that looks like this, a so-called IP address, or Internet Protocol address, which really is just a number dot another number dot another number dot another number. So it's four numbers separated by dots, and each of those numbers is a value between 0 and 255, so there are 256 total possibilities for each of those values. Now, it turns out there are other types of IP addresses today that are actually much bigger than this, but more on that in a bit.
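To make the "four numbers separated by dots, each from 0 to 255" rule concrete, here is a minimal Python sketch that checks a string against exactly that rule. The function name `parse_ipv4` is our own, hypothetical helper; in real code, Python's standard `ipaddress` module already performs this validation.

```python
def parse_ipv4(text: str) -> tuple:
    """Split a dotted-quad string like "1.2.3.4" into its four numbers.

    Raises ValueError unless there are exactly four decimal parts,
    each between 0 and 255 (256 possible values per part).
    """
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four numbers separated by dots: {text!r}")
    numbers = []
    for part in parts:
        # Accept only plain ASCII digits, so "", "-1", " 7", and "x" are rejected.
        if not (part.isascii() and part.isdigit()):
            raise ValueError(f"not a decimal number: {part!r}")
        value = int(part)
        if not 0 <= value <= 255:
            raise ValueError(f"out of range 0-255: {value}")
        numbers.append(value)
    return tuple(numbers)


print(parse_ipv4("1.2.3.4"))    # (1, 2, 3, 4)
print(parse_ipv4("10.0.1.34"))  # (10, 0, 1, 34)
```

Anything with three parts, five parts, or a part like 256 raises `ValueError` instead of being silently accepted.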
So these IP addresses, much like our postal addresses, uniquely identify computers on the internet. If you have a laptop, a desktop, a mobile phone, or an Xbox on the internet, that device, by definition of how the internet works, has an IP address: a unique address that allows other computers on the internet to talk to it, much like you might live at 123 Main Street in Anytown, USA, or the computer science building down the road is at 33 Oxford Street, Cambridge, Massachusetts 02138, USA. Those very specific phrases uniquely describe some building in the world, much like these numbers uniquely define some computer in the world.

But where does this number come from? If I open up my laptop, or turn on my desktop, or take out my phone, how does any of those devices know what IP address to use? It might just have been a while, but I don't remember ever having typed a value like that into my phone, so it's got to be coming from somewhere else. But where?

Well, this is one of the things you get from your Internet Service Provider, or ISP: you get an IP address. And back in the day, not all that many years ago, a technician would probably come to your home or business and actually configure your computers to use this numeric address. But these days, software is a bit fancier. There's something called DHCP, Dynamic Host Configuration Protocol, which is software that ISPs run and provide to you that allows your Mac, or your PC, or your iPhone or Android device to say, upon turning on, "Hello, world, I need a unique address." And that DHCP server responds to that open-ended question with a specific IP address that the internet service provider controls and has allocated specifically for your home.

Well, that's all fine and good. But if my ISP is only providing me with one such address, how is it that I can have multiple devices at home on the internet at the same time? A whole family could be on the internet simultaneously, and if that means four, or five, or more separate devices, gosh, each of those devices somehow needs its own IP address. So where do those come from? Well, those too come from DHCP, but not necessarily from your ISP. Those additional IP addresses come from a device in your very home: that home router to which I alluded earlier, which is probably connected to your cable modem or your FiOS device. It's this home router, the one that might have those little antennas, that itself also supports DHCP. So when you turn on your laptop, turn on your desktop, power up your Xbox, or take out your phone, and those devices say, "Hello, world, I need an IP address," odds are it's this device within your home that's answering that question. But it's providing other answers as well. It's not just giving you an IP address; it's also telling you how to communicate, it turns out, with the rest of the world.
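Real DHCP is a message exchange with its own wire format; the Python sketch below is only a toy model of the bookkeeping described above. It assumes a home router that owns a hypothetical pool, 192.168.1.100 through 192.168.1.199, and answers each new device with an unused address plus the router (gateway) and DNS server to use. The class names, field names, and addresses are all illustrative.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class Lease:
    """What our toy server tells a device when it asks for an address."""
    ip: str
    router: str        # where to send traffic bound for the rest of the world
    dns_servers: list  # whom to ask when converting names to addresses


class ToyDhcpServer:
    """Hands out unique addresses from a pool; not the real DHCP protocol."""

    def __init__(self, first: str, last: str, router: str, dns_servers: list):
        start = int(ipaddress.IPv4Address(first))
        end = int(ipaddress.IPv4Address(last))
        self.free = [str(ipaddress.IPv4Address(n)) for n in range(start, end + 1)]
        self.leases = {}  # device identifier (e.g. a MAC address) -> Lease
        self.router = router
        self.dns_servers = dns_servers

    def request(self, device_id: str) -> Lease:
        """A device says "hello, world, I need an address"; reply with a lease."""
        if device_id in self.leases:  # the same device asking again keeps its address
            return self.leases[device_id]
        if not self.free:
            raise RuntimeError("address pool exhausted")
        lease = Lease(self.free.pop(0), self.router, list(self.dns_servers))
        self.leases[device_id] = lease
        return lease


server = ToyDhcpServer("192.168.1.100", "192.168.1.199",
                       router="192.168.1.1", dns_servers=["192.168.1.1"])
laptop = server.request("aa:bb:cc:00:00:01")
phone = server.request("aa:bb:cc:00:00:02")
print(laptop.ip, phone.ip)  # 192.168.1.100 192.168.1.101
```

The key design point mirrored here is that every device gets a *different* address from the same pool, along with the extra answers (router and DNS) it needs to reach the outside world.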
Because, indeed, when I type an address into a browser, it's not numeric, last time I checked. The last time I typed something into a browser, it was not something dot something dot something dot something; it was something like Facebook.com, or Twitter.com, or Gmail.com, or any number of other domain names. Recall that most any website you'd visit these days has a domain name. It's something.com, or something.edu, or something dot any number of other top-level domains, or TLDs. We humans are much better, I would think, at remembering words and phrases like .com and .edu than we are at remembering arbitrary numeric addresses like 1.2.3.4, or 5.6.7.8, or completely arbitrary numbers that aren't even as simple to remember as those.

So how is it that when I type in Facebook.com or Google.com, my computer knows how to find that computer in the world, if the computers in the world have just these IP addresses? Well, it turns out that computers not only have IP addresses that they get from DHCP servers; they also have access to what are called DNS servers, and DHCP tells them about exactly those as well. So in addition to a DHCP server somewhere out there in the world at your ISP, or maybe even in your home, you also have DNS servers. DNS, or Domain Name System, servers have as their sole purpose in life converting domain names like Facebook.com and Gmail.com to corresponding IP addresses. These DNS servers, therefore, help our computers talk to computers that, by definition, have IP addresses, but addresses we humans would never know if someone didn't tell us.

There are already so many acronyms piling up here, so just to recap. Every computer has an IP address. That IP address typically comes from a special server called a DHCP server that lives within your ISP, whoever that is, or maybe even within your own home. Meanwhile, there are also DNS servers in the world, typically also run by your ISP, that convert domain names to IP addresses, so that when you try to go to Facebook.com, your computer (Mac, PC, iPhone, Android, whatever) knows what the actual IP address is.

So why does that matter? Well, it turns out that the way computers intercommunicate on the internet is by sending packets, or virtual envelopes, to one another. Much like you might once have sent someone a physical letter, a handwritten letter inside an envelope with an address on the front and probably even a stamp, computers communicate in very much the same way, but it's all digital. It's all zeros and ones. So what do these envelopes, these packets, look like? Well, why don't we go ahead and construct something a little more physical?

All right, so I really like cats, and I want to find myself a cat on the internet. So I'm going to send a request to someone, a server, in fact, maybe someone like Google, and I'm going to say, literally, "Get me cat.jpeg," where JPEG is a common file format for cats. So this is the message I want to send to some server. Of course, it doesn't have any addressing information on it, so who is actually going to fulfill this? Well, I also have to put it in an envelope, so I might go ahead and put this message here in an envelope. Just a moment; I'll make the envelope and message disappear. Now we'll go ahead and address the envelope to the destination.

As a destination on the internet, this server is going to have its own IP address, and little old me, as a computer on the internet (laptop, desktop, phone, or whatnot), I too am going to have an IP address. So what I'm going to do is put my IP address in the top-left corner (it doesn't really matter where, since this is imaginary), and my IP address shall be 1.2.3.4, just for the sake of discussion. The server, meanwhile: I don't know what its IP address is. I know my own IP address, because that came from my ISP's DHCP server, but unless I happened to have memorized Google's IP address, I wouldn't know the other server's address myself. So I'm going to have to rely on DNS.

So I, as a computer, would actually send a request to my ISP's DNS server, saying, "Hey, DNS server, what is the IP address of Google.com?" Hopefully, my ISP knows, and a response will come back, maybe 5.6.7.8, so I'm going to go ahead and write 5.6.7.8. And frankly, if my ISP doesn't know (which is unlikely these days, given how popular Google is, but smaller websites might not be as well known to an ISP), my ISP is going to be configured by its owners to know about some other DNS server in the world. So they will simply escalate the question to another DNS server, and maybe that DNS server will escalate it to someone else. And thankfully, by nature of how the Domain Name System works, there are going to be some number of root servers, special servers that, in the worst case, at least know who knows the addresses for all of the .coms, or all of the .edus, or any other top-level domain. So there's this recursive, tiered system of questions that can be asked until finally someone knows, and then my own Mac, or PC, or phone can remember the answer.
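The chain of referrals just described can be sketched as a toy resolver in Python. Everything in it is hypothetical: the dictionaries stand in for a root server, a .com server, and Google's own name server, and 5.6.7.8 is the made-up address from this story. Real DNS has its own message format and record types; this only models the tiered questions and the caching ("can remember the answer").

```python
# Hypothetical name-server data: each server either knows the answer
# or refers the question to a more specific server.
ROOT_SERVER = {"com": "tld-com"}             # top-level domain -> TLD server
NAME_SERVERS = {
    "tld-com": {"google.com": "ns-google"},  # domain -> that domain's name server
    "ns-google": {"google.com": "5.6.7.8"},  # name -> IP address
}


class ToyResolver:
    """Plays the role of the ISP's DNS server, with a cache."""

    def __init__(self):
        self.cache = {}
        self.servers_asked = 0  # how many servers we had to query in total

    def resolve(self, name: str) -> str:
        if name in self.cache:                 # remembered from last time
            return self.cache[name]
        tld = name.rsplit(".", 1)[-1]          # "google.com" -> "com"
        tld_server = self._ask(ROOT_SERVER, tld)
        domain_server = self._ask(NAME_SERVERS[tld_server], name)
        address = self._ask(NAME_SERVERS[domain_server], name)
        self.cache[name] = address
        return address

    def _ask(self, table: dict, key: str) -> str:
        """Query one server's table; fail if it has no record."""
        self.servers_asked += 1
        if key not in table:
            raise LookupError(f"no record for {key!r}")
        return table[key]


resolver = ToyResolver()
print(resolver.resolve("google.com"))  # 5.6.7.8 (root, then .com, then Google's server)
print(resolver.resolve("google.com"))  # 5.6.7.8 (answered from the cache)
print(resolver.servers_asked)          # 3
```

The second lookup costs nothing extra, which is why caching answers matters so much for a system queried this often.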
So this message is going to go to 5.6.7.8, which I'm presuming is the IP address of Google.com per the response from my ISP's DNS server, and it's going to be from little old me at IP address 1.2.3.4. So I'm going to go ahead and seal this, all right, and hand it off to the internet. Now, where does it go? More on that in just a moment. But some number of seconds, or hopefully some number of milliseconds, later, I'm going to get back a response, and indeed I'm going to get back, of course, a cat, this here happy cat.

But it's not going to be as simple as just being handed a cat off the internet. This cat, too, is going to be in one or more envelopes. That is to say, Google's own server is going to put this cat into an envelope. But maybe, when Google tries to do that, oh, it doesn't quite fit. And frankly, maybe this image is so big that it would just be rude to other customers to cram this whole big image of a cat into just one envelope, thereby keeping other customers' data from getting to them as quickly. So what Google might actually do, and this is very common, is divide the cat into fragments. So hang in there, little guy. We might chop up this larger image into four or so smaller fragments, so that now these are much more reasonably sized. Google can then put one of these into one envelope, another of these into another envelope, the third into this envelope, and then the fourth into a fourth and final envelope.

Now, of course, some information has to be written on each of these four envelopes. So what goes on the outside here? Well, previously my IP address was 1.2.3.4, and Google's was 5.6.7.8. If they're responding to my original request, those numbers are going to have to be reversed, so that this packet is coming from Google at 5.6.7.8 and going to me at 1.2.3.4. And they're going to put that same information on every one of these envelopes. But that's not quite enough, it turns out. It's not enough for them to just put my address on these envelopes, because there are four of these packets. So they're going to have to provide another clue: they're going to have to tell me how many total packets there are in the response. So this one will be one of four, this one two of four, this one three of four, and this, of course, four of four. So what Google has put on each of its envelopes now looks a little something like this: to 1.2.3.4, which is me; from 5.6.7.8, which is them; and, per this mark down here, packet number one of four.
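Here is a toy Python sketch of the envelopes just described: a payload chopped into fixed-size pieces, each labeled with "to", "from", and "i of n". The `Envelope` class and the tiny 3-byte fragment size are illustrative assumptions chosen so the example yields four envelopes; real IPv4 headers record fragments with an offset and a "more fragments" flag rather than an "i of n" label.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    src: str     # "from" address, e.g. Google's 5.6.7.8
    dst: str     # "to" address, e.g. little old me at 1.2.3.4
    number: int  # this is packet `number`...
    total: int   # ...of `total`
    data: bytes  # one fragment of the payload


def fragment(src: str, dst: str, payload: bytes, size: int) -> list:
    """Chop payload into pieces of at most `size` bytes, one per envelope."""
    pieces = [payload[i:i + size] for i in range(0, len(payload), size)]
    total = len(pieces)
    return [Envelope(src, dst, n, total, piece)
            for n, piece in enumerate(pieces, start=1)]


cat = b"=^.^= meow"  # stand-in for cat.jpeg's bytes (10 bytes)
envelopes = fragment("5.6.7.8", "1.2.3.4", cat, size=3)
for e in envelopes:
    print(f"to {e.dst} from {e.src}: {e.number} of {e.total} {e.data!r}")
```

Every envelope carries the same source and destination, and only the numbering and the data differ from one envelope to the next.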
So this is to say that IP goes beyond addresses. IP, Internet Protocol, is really a set of conventions, a set of rules that computers and servers are supposed to follow, so that when they intercommunicate, one knows what to expect from the other, and the other knows how to respond to the first. And this support for fragmentation is also a feature of IP.

Now, what is the benefit of this? Well, suppose that I, little old me, now get packet two of four off the internet (it's a little strange that it arrived out of order), then packet three of four, and packet four of four, but I don't seem to have actually received packet one of four. Because of those numbers, I can logically infer from the packets I did get which one I'm missing. But IP alone says nothing about what I, as a computer, should do in that situation.

So it turns out that computers use not just IP but another protocol, another standard, called TCP. In fact, these are so commonly used together that you might have heard or read at some point of something called TCP/IP, which is just Transmission Control Protocol slash Internet Protocol, the combination of these two protocols used to transmit data on the internet. Among the roles that IP plays is to support addressing, and fragmentation, and a bunch of other things too. And among the roles that TCP plays is to ensure that packets get to their destination. In fact, TCP supports something called sequence numbers, in addition to any fragment identifiers, which also help ensure that data gets to its intended destination. So upon receiving just three of these packets, clearly missing one, what I, a computer, can do is say, "Hey, Google, I need you to resend one or more packets, because I know I'm missing them," since I haven't acknowledged receiving them. And oh, thankfully, Google has retransmitted this packet to me. So now I have all four, and on my end I can reassemble, albeit with some virtual tape, the cat in its final form, which, if all of the packets indeed came through, is going to look like the cat in question. And because these are all just bits, all just zeros and ones, they can certainly be stitched back together, so that we never actually know that the splitting happened.
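The receiving side can be sketched the same way. This toy Python function (our own, hypothetical `receive`) takes whatever numbered fragments arrived, in any order, reports which numbers are missing so they can be requested again, and reassembles the payload once everything is present. Real TCP does this with byte-level sequence numbers and acknowledgments rather than "i of n" labels; the fragment contents below are made up.

```python
def receive(fragments: list, total: int):
    """fragments: (number, data) pairs that arrived, possibly out of order.

    Returns (payload, missing): payload is the reassembled bytes if nothing
    is missing (else None); missing lists the numbers to ask for again.
    """
    by_number = {number: data for number, data in fragments}  # duplicates collapse
    missing = [n for n in range(1, total + 1) if n not in by_number]
    if missing:
        return None, missing
    payload = b"".join(by_number[n] for n in range(1, total + 1))
    return payload, []


# Packets 2, 3, and 4 of 4 arrive out of order; packet 1 is lost.
arrived = [(3, b"meo"), (2, b"^= "), (4, b"w")]
payload, missing = receive(arrived, total=4)
print(missing)  # [1], so: "Hey, Google, please resend packet 1"

# The sender retransmits packet 1, and now the cat can be reassembled.
arrived.append((1, b"=^."))
payload, missing = receive(arrived, total=4)
print(payload)  # b'=^.^= meow'
```

Sorting by number before joining is what makes arrival order irrelevant, which is exactly why the labels on the envelopes matter.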
So, it turns out TCP does something a little more. What if my original request to Google went to a server that does multiple things? Google is obviously a website, but they have search results, they have email, they have calendars, and so much more. So beyond web servers, they also have email servers, right? Gmail itself, not to mention their own employees' email. And they probably have chat servers, or video conferencing servers, like Google Hangouts and the like. So when I originally sent a packet to Google.com, it probably needed a little more information than I gave it. It probably wasn't sufficient for that original message of mine, "get cat.jpeg," to carry only Google's IP address, which again was 5.6.7.8, and my own "from" address, which again was 1.2.3.4. Just for thoroughness, I could write "one of one" on this envelope, since "get cat.jpeg" is a pretty small request, but I probably need a bit more information to make clear to Google that this is a request for a web page, not an email, or a chat message, or certainly not a video stream from me.

So I'm going to append one piece of information. I'm going to put, literally, a colon after Google's IP address, followed by the number 80. It turns out that, per TCP, the world has standardized on certain numbers that represent different services that servers might provide. 80 means HTTP, Hypertext Transfer Protocol, which is just the language that web servers speak, and it's the language I've been speaking inside of these envelopes. That little message I wrote a moment ago, "get cat.jpeg," was an HTTP message, and this cat that came back in several parts was, all together, an HTTP response. So by clarifying on the envelope that this message is meant specifically for port 80, that is, the service known as HTTP, Google's physical servers know to hand this packet and any others to their web server, not to their email server, or chat server, or video server, or the like.

And it might not actually be 80. In fact, odds are that these days Google, like many websites, is using SSL, or HTTPS, a secure connection, and that happens to use a different number, 443. You don't tend to see either of these numbers, because modern web browsers just assume them as the defaults, but they are there underneath the hood, on the virtual envelopes. There are other numbers too. Email tends to use TCP port 25, among a few others, and FTP, File Transfer Protocol, and many other protocols all have their own numeric port identifiers. Indeed, that's all this number is: whether it's 80, 443, or something else, it's a so-called port number.
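As a toy illustration of how one server machine might use the port number to pick which program receives a packet, here is a Python sketch. Ports 80, 443, and 25 are the ones named above; the handler functions and the "address:port" string format are hypothetical, and a real operating system does this dispatch inside its networking stack rather than in a dictionary like this.

```python
# Well-known TCP ports mentioned above.
WELL_KNOWN_PORTS = {80: "HTTP", 443: "HTTPS", 25: "SMTP (email)"}


def web_server(message: str) -> str:
    """Hypothetical web service on this machine."""
    return f"web server handles {message!r}"


def mail_server(message: str) -> str:
    """Hypothetical email service on this machine."""
    return f"mail server handles {message!r}"


# Which program "listens" on which port (an assumption for this sketch).
SERVICES = {80: web_server, 443: web_server, 25: mail_server}


def split_address(destination: str) -> tuple:
    """Turn "5.6.7.8:80" into ("5.6.7.8", 80)."""
    host, _, port = destination.rpartition(":")
    return host, int(port)


def deliver(destination: str, message: str) -> str:
    """Hand the message to whichever service listens on the port."""
    host, port = split_address(destination)
    handler = SERVICES.get(port)
    if handler is None:
        return f"nothing listening on {host} port {port}"
    return handler(message)


print(deliver("5.6.7.8:80", "get cat.jpeg"))  # web server handles 'get cat.jpeg'
print(WELL_KNOWN_PORTS[443])                  # HTTPS
```

The IP address gets the envelope to the right machine; the port number gets it to the right program on that machine.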
So this, then, is a more representative picture of what's going across the internet and coming back to me. This is more of the information, though not all of it, that's going back and forth across the wires, or wirelessly.

So these things, IP, Internet Protocol, and TCP, Transmission Control Protocol, are protocols. What is a protocol? Well, again, it's just a set of standards, a set of rules. And in fact, we humans have protocols, and some of them, if you stop to think about it, are a little silly. In a lot of cultures, when you meet some other human for the first time, you do something kind of weird: you extend your hand to shake that person's hand, and then you just do this up-and-down thing for a second or two, sometimes longer, awkwardly, and that somehow completes the transaction. Well, that's essentially what's going on with computers. When I originally send that message, "get cat.jpeg," Google, according to HTTP, Hypertext Transfer Protocol, is going to read that message and realize, oh, this user wants a picture of a cat; let's search for that file and return cat.jpeg. I'm simplifying the format of the message, because when you're actually searching for results, the message looks a little more complicated than that, but we're assuming we're just getting a very specific cat from the server. And according to HTTP, Google's web server, because it supports that protocol, because it speaks that protocol, much as we humans speak a common language, knows to respond with one or more envelopes of its own containing that cat.
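The "get cat.jpeg" message above is deliberately simplified. As a rough sketch, an actual HTTP/1.1 request is plain text like the one this Python snippet builds: a request line, a `Host` header, and a blank line. The host name and path are just this story's running example, the helper names are our own, and real browsers send many more headers. Nothing here opens a network connection.

```python
def build_get_request(host: str, path: str) -> bytes:
    """Build a minimal HTTP/1.1 GET request (each line ends with CRLF)."""
    lines = [
        f"GET {path} HTTP/1.1",  # method, resource, protocol version
        f"Host: {host}",         # required in HTTP/1.1
        "Connection: close",     # ask the server to close when done
        "",                      # an empty line ends the headers...
        "",                      # ...so the text finishes with CRLF CRLF
    ]
    return "\r\n".join(lines).encode("ascii")


def status_code(response: bytes) -> int:
    """Pull the numeric status out of a response's first line."""
    status_line = response.split(b"\r\n", 1)[0]  # e.g. b"HTTP/1.1 200 OK"
    return int(status_line.split()[1])


request = build_get_request("www.google.com", "/cat.jpeg")
print(request.decode())
print(status_code(b"HTTP/1.1 200 OK\r\nContent-Type: image/jpeg\r\n\r\n"))  # 200
```

In the envelope analogy, this text is the letter inside; TCP and IP supply the port and addresses written on the outside.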
But there are even more protocols than this. There's UDP, User Datagram Protocol, which you don't use quite as often, but which actually has value. The biggest difference between UDP and TCP is that UDP does not guarantee delivery, whereas with TCP you're guaranteed delivery so long as the internet is actually up and running between you and some endpoint. Why does TCP guarantee delivery? Because it knows how to resend packets as needed. UDP, by definition, does not do that; that's just not a feature you get. You can still use it with IP to get data somewhere, but what you request isn't necessarily going to come back.

So why would you ever want to send a request and maybe, or maybe not, get a response? Well, sometimes this is useful. Take video conferencing. If you've ever used FaceTime, or Google Hangouts, or Skype, you sometimes see things buffering. But if, while you're trying to talk to some other human in real time, so to speak, the video kept buffering, and buffering, and buffering, and prevented you from seeing or hearing that person in real time, frankly, it would get pretty annoying pretty quickly, and you'd just pick up your phone, or take a phone off the wall, an old landline, and make a call, which is much more synchronous, much more real time. Now, movies, of course, do buffer. If you're watching Apple TV, or Netflix, or iTunes, those videos do tend to buffer, because you don't really want to miss a few seconds, or a minute, of a movie, or some climactic ending. But in real time, when talking to another human, it's not really ideal to delay the conversation while someone is there on the other end of the line.

And because there are so many packets going back and forth for things like video conferencing, you know what, if a few get dropped, literally, if some of those packets just get lost, don't worry about it. I'll infer from context, from the conversation I'm having, what it is I missed, and we'll just forge ahead. Or I'll just say, "Hey, buddy, what did you say? Can you repeat that?" and he or she can simply oblige. So sometimes, when you want the data to keep coming and coming, especially when it's high volume, you don't want to stop and resend data; you just want to ignore what's lost and trust that the users are going to be OK with that. For live video conferencing, that might make sense; for live sporting events, that might make sense too, so that you're not drifting behind the rest of the world. So for some applications, that actually does make good sense.
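To see UDP's fire-and-forget style in practice, here is a small Python sketch using the standard `socket` module entirely on the local machine (127.0.0.1), so no real network is involved. The sender just emits datagrams: there is no connection setup, no acknowledgment, and no retransmission. On loopback these datagrams will almost always arrive, but UDP itself makes no such promise, which is why the receiver simply gives up after a short timeout instead of asking for anything again. The "video frame" payloads are made up.

```python
import socket

# Receiver: bind to any free port on the loopback interface.
receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
receiver.bind(("127.0.0.1", 0))  # port 0 asks the OS to pick a free port
receiver.settimeout(0.5)         # don't wait forever for data that may be lost
address = receiver.getsockname()

# Sender: no handshake, no acknowledgments, no retransmission.
sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for frame in (b"video frame 1", b"video frame 2", b"video frame 3"):
    sender.sendto(frame, address)

received = []
while True:
    try:
        data, _ = receiver.recvfrom(2048)
    except socket.timeout:       # nothing more arrived; just forge ahead
        break
    received.append(data)

sender.close()
receiver.close()
print(received)
```

With TCP, by contrast, the library would keep resending anything not acknowledged, which is great for a cat picture and not so great for a live conversation.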
But where do these packets go as they leave my hand, and where are they coming from when they land in my hand? Well, there's a whole internet out there that uses TCP or UDP, and IP, but there are a lot of devices between me and Google, between me and anyone else in the world, that somehow route that data left, right, up, and down. So how does all that work?

We know that my computer has an IP address, and we know that it's of this format: again, just a number dot a number dot a number dot a number, where each of those numbers is between 0 and 255. If we dive in a little deeper, and if you remember your binary, that means each of those numbers is 8 bits. So that's eight, plus eight, plus eight, plus eight, or 32 bits, and (hang in there) that means there are 2 to the 32 possible IP addresses. That's roughly four billion.

But I mentioned a bit ago that there's also a longer format, because the world, it turns out, is running out of IP addresses. Even though there are as many as four billion possible, there are so many phones, and people, and laptops, and servers, and Internet of Things, or IoT, devices these days, all of which need an IP address, that frankly, we've been running out for some time. So instead of using this format, IP version 4, or IPv4, the world is gradually starting to use IPv6, which uses 128-bit addresses, which are much, much larger. With 32 bits, you have 2 to the 32, roughly four billion, possible IP addresses. A 128-bit address doesn't sound like that much bigger a number, but these are exponents, not just multiplication: 2 to the 128 is roughly 3.4 times 10 to the 38, a number I can't even pronounce, and that's how many IP addresses the world is now going to have access to.
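The arithmetic above is easy to check in Python. This sketch also shows that a dotted-quad address is really just one 32-bit number written 8 bits at a time, using Python's standard `ipaddress` module for comparison; 1.2.3.4 is the example address from earlier in this talk.

```python
import ipaddress

print(2 ** 32)   # 4294967296 possible IPv4 addresses (about four billion)
print(2 ** 128)  # 340282366920938463463374607431768211456 possible IPv6 addresses

# Each of the four numbers is 8 bits, so shift each into place to build 32 bits.
a, b, c, d = 1, 2, 3, 4
as_int = (a << 24) | (b << 16) | (c << 8) | d
print(as_int)                                  # 16909060
print(int(ipaddress.IPv4Address("1.2.3.4")))   # 16909060, the same number
print(format(as_int, "032b"))                  # the 32 bits themselves
```

Going from 32 to 128 bits multiplies the address space by 2 to the 96, not by 4, which is the "exponents, not just multiplication" point.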
So with that said, where can you see this kind of information? Well, it turns out that if you have a Mac, for instance, you can go to System Preferences, then Network, and poke around (hopefully without changing anything), and you'll see something like this: a mention of IPv4, and a mention of that protocol being configured using DHCP, unless for some reason it's been statically hardcoded, perhaps by someone else. And you'll see that at the moment, per the screenshot, I'm connected with IP address 10.0.1.34.

As it turns out, a lot of IP addresses are actually private. If you have an address that starts with 10 dot something, or with 192.168 dot something, or with 172.16 through 172.31 dot something, your computer is using a private IP address that most likely came from a home router, or a business router, or maybe even your ISP. It's private in the sense that only with special configuration can someone else on the internet talk to your computer directly. And this is OK, because generally the phones, Xboxes, laptops, and desktops in our homes, and generally in our businesses and schools, are not themselves servers. People are not trying to contact us directly, per se; we are trying to contact them. Even when someone sends you an email, it doesn't go to your own laptop or desktop, per se; it generally goes to a server like Gmail, or Outlook, or the like, and your phone or laptop or desktop connects to that server in order to get the information.

If you now, on macOS, happen to click on Advanced here, you'll see some additional settings. You'll see that my IP address is again 10.0.1.34 in this case; you'll see a subnet mask, which is used to decide whether or not some other computer is on the same network as you; and then, most importantly, you'll see the router, sometimes called the gateway. In this case, it seems that my router, or gateway, has an address of 10.0.1.1. So that too, of course, is an IP address.
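Python's standard `ipaddress` module can answer both questions raised by these screens: is an address private, and is another computer on my network? In this sketch, the 255.255.255.0 subnet mask is an assumption (the mask's value isn't given above), as is 10.0.1.50, a hypothetical second device; 10.0.1.34, the router at 10.0.1.1, and 4.53.56.109 come from this talk.

```python
import ipaddress

# Which addresses are private? (10.x, 172.16.x through 172.31.x, and 192.168.x are.)
for text in ["10.0.1.34", "192.168.0.7", "4.53.56.109"]:
    print(text, ipaddress.ip_address(text).is_private)
# 10.0.1.34 True
# 192.168.0.7 True
# 4.53.56.109 False


def same_network(a: str, b: str, mask: str) -> bool:
    """True if b falls inside a's network, as defined by the subnet mask."""
    network = ipaddress.ip_network(f"{a}/{mask}", strict=False)  # e.g. 10.0.1.0/24
    return ipaddress.ip_address(b) in network


MASK = "255.255.255.0"  # assumed value, for illustration only
print(same_network("10.0.1.34", "10.0.1.50", MASK))    # True: talk to it directly
print(same_network("10.0.1.34", "4.53.56.109", MASK))  # False: hand it to the router
```

The mask marks which leading bits name "my network"; anything outside it goes to the gateway at 10.0.1.1 rather than directly to its destination.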
And a router, as the name suggests, is responsible for routing data in some direction. If you run Windows, here's what a similar screen might look like on that operating system, which shows, of course, your IPv4 address and, in this case, multiple addresses for DNS servers.

Routers are computers on the internet that usually have bunches of wires coming into them and going out of them, and inside of themselves, in their RAM, or random-access memory, they have essentially a table, like a big list or an Excel spreadsheet. That table generally has at least two columns, conceptually: one with an IP address or a prefix (the first few numbers of an IP address), and one with some explanation of where data should be routed if it's destined for that IP address or that prefix. So maybe if an IP address starts with 1, it should go that way, out that cable. Or if it starts with 2, it should go that way instead. A router's purpose in life is to route data in some direction to some next hop, that is to say, to some next router.
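A router's two-column table can be sketched in a few lines of Python. The prefixes and next-hop names below are made up. Where the talk says "starts with 1," real routers compare binary prefixes and, when several entries match, typically choose the most specific (longest) one, which is what this sketch does; `0.0.0.0/0` acts as a catch-all default route.

```python
import ipaddress

# Hypothetical routing table: (prefix, where to send matching packets).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "upstream ISP router"),  # default route
    (ipaddress.ip_network("1.0.0.0/8"), "cable 1"),
    (ipaddress.ip_network("2.0.0.0/8"), "cable 2"),
    (ipaddress.ip_network("1.2.0.0/16"), "cable 3"),
]


def next_hop(destination: str) -> str:
    """Return the next hop for the longest prefix that matches."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if address in net]
    _, best_hop = max(matches, key=lambda match: match[0].prefixlen)
    return best_hop


print(next_hop("1.9.9.9"))  # cable 1
print(next_hop("1.2.3.4"))  # cable 3 (more specific than 1.0.0.0/8)
print(next_hop("2.8.8.8"))  # cable 2
print(next_hop("5.6.7.8"))  # upstream ISP router
```

Because the default route matches everything, every destination gets *some* answer, even if that answer is just "pass it to the next router and let it figure things out."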
And so this means that this Mac here, with IP address 10.0.1.34, as preconfigured by DHCP (which, again, came from my ISP, or from my university, or company), is going to send data either directly to local computers on the same network, if I happen to be talking to another Mac or PC, maybe to transfer a file just a few feet away or somewhere else on campus or in the office, or, if it's destined for somewhere in the outside world, like Google.com, to the router. That's where the router comes in, because a router's purpose in life is to get data toward another destination. My little old laptop frankly doesn't know where in the world Google.com is, but maybe this router does, because that's its purpose in life. And frankly, if that router doesn't know, no big deal. There are other routers in the world, and so long as that router can route data to some other router, hopefully that other router can get the data closer to its destination. And hopefully, within some number of hops, some number of steps, transmissions of packets from one router to another to another, the data will reach its destination. Generally speaking, data will reach its destination within 30 or fewer such hops. There will be 30 or fewer routers between me and some destination, because humans and software have gotten really good at configuring the internet dynamically, so that data can route across continents, across countries, even across oceans, in order to get from one place to another.

So if this is little old me on my laptop here, and I want to talk to Google.com, which of course is a big company over here, inside of whose door is a whole bunch of servers, between us is the internet. Somehow we're both connected, and somehow or other, data is going across the internet from me to Google. That's because inside of this internet there's a whole bunch of routers, which I'll draw here as dots, and each of these routers is controlled by other big internet service providers, big companies, maybe even big universities, and they have all agreed to connect their routers. That's indeed what the internet is: a network of networks. It's a network of Harvard's network and MIT's network, and UC Berkeley's, and Stanford's, and Comcast's, and Verizon's, and all of these very big entities have connections among themselves, and each of them has some number of routers.

Ultimately, these routers are interconnected with cables, or some kind of satellite connectivity, or radio waves, or the like. And notice, too, that there are very often multiple ways to go from one location to another, depending on which path you take. And this is a feature. The internet, of course, has its origins in US military design, and among the goals was some resilience against downtime: if one or more cities or one or more routers went down for whatever reason, one of the internet's design principles was to be able to route around that issue. So it stands to reason that it's a good thing if data can flow from one point to another while following different intermediate stops.
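The idea of routing around a failure can be sketched as a graph search. In this toy Python example the router names and links are invented, and breadth-first search finds a path with the fewest hops; marking a router as down forces a different path. This is a swapped-in illustration, not how the internet actually routes: real routers run distributed routing protocols, and no single program sees the whole map.

```python
from collections import deque

# Hypothetical, undirected links between routers.
LINKS = {
    "me": ["A"],
    "A": ["me", "B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B", "google"],
    "E": ["C", "F"],
    "F": ["E", "google"],
    "google": ["D", "F"],
}


def fewest_hops(start: str, goal: str, down=frozenset()):
    """Breadth-first search from start to goal, skipping routers in `down`."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:  # walk the breadcrumbs back to the start
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in LINKS[node]:
            if neighbor not in previous and neighbor not in down:
                previous[neighbor] = node
                queue.append(neighbor)
    return None  # no route at all


print(fewest_hops("me", "google"))              # ['me', 'A', 'B', 'D', 'google']
print(fewest_hops("me", "google", down={"B"}))  # ['me', 'A', 'C', 'E', 'F', 'google']
```

With router B down, the data still arrives, just via a longer path, which is the resilience property described above.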
that cat over the internet back to me, that cat's four parts might have
gone in four different directions, but somehow they all made their way
back to me, because the routers know how to get data to me again, based on that envelope,
based on that IP address. But they might take
different paths just because. Now what does that mean? Well, sometimes the internet gets busy. Routers get busy; they get
overloaded with lots of packets, and so sometimes routers have
to say, go this way instead. Or sometimes things are
just so busy that a router gets overwhelmed and has to,
so to speak, drop packets on the floor, deleting them
without ever delivering them, at which point, hopefully,
if the users are using TCP, their computers will retransmit that
data, so it's not actually a problem. And all of this is happening
so quickly that you never really notice any of these delays
or reroutings. And so here might be
several paths that data takes to get from me to Google.com,
and maybe a different path back, and each of these represents
a hop, and each hop takes some amount of time. So how much time does it take for
data to go across the internet? Well, let's actually take a look. I'm going to go ahead here and run
a program called traceroute, which, per
its name, allows me to trace the route between
me and some other computer. To do this, I'm going to type traceroute
into a terminal window here on my Mac, and I'm going to do traceroute of, well,
let's try www.google.com, then hit Enter, and I'm going to see some
interesting information here. It seems to be a little slow at the
moment, and that's interesting; those stars probably
don't mean good things. So let's scroll up here
and see what's going on. I'm tracing the route to www.google.com,
and it turns out, parenthetically, that this is Google's IP address,
at least at this moment in time here on campus: 4.53.56.109. So it's not, as it turns out, 5.6.7.8. It's that instead. And each of these rows of output,
one, two, three, four, five, six, represents a router
between me and Google.com. What traceroute does is send
a message to the first router, then a message essentially to the
second router, then the third router, then the fourth router, and so on,
and it figures out each router's IP address and, where possible, its name. In fact, notice that
some of these routers seem to have somewhat cryptic
but English-like names. Traceroute also tells me how
many milliseconds it took for the data to get from me to that router and back. Look how fast this is. I don't know exactly where all these
routers are, but all of these numbers are super small: 3 milliseconds just to get from one
point, my computer, to another router. Now, you can infer
what some of these are. I don't know where these IP addresses
are, but odds are they're on campus. Odds are rows one and two, both of
whose IP addresses start with 10, are routers somewhere
on campus. As for step three, I'm very confident
that it is one of Harvard's routers because it's called Core GW, which
I just know by convention means core gateway, or core router, and it belongs
to the Faculty of Arts and Sciences on Harvard's network. Then there's another one also
called core gateway, which is probably somewhere
slightly different on campus, maybe not in the Faculty
of Arts and Sciences, but in the core Harvard network. And then it gets a
little more interesting. The data apparently gets handed off,
on rows five and six, to two routers whose names have the
word "bar" in them for some reason, but odds are they're indeed in Boston, which is not too far from Harvard, on Level 3's network. Level 3 is a
very big, common ISP, or internet service provider. Now thereafter, for whatever reason,
the routers between me and Google are not responding to this inquiry. And that's fine. They might just be configured
to ignore this type of request, but it's not all that enlightening. I just know that it takes more
steps to actually reach Google.com, because their servers are
beyond that sixth router. So let's try another destination. When in doubt, let's just try again, and
let's try our friends at, maybe, UC Berkeley, who
might be a little looser when it comes to sharing information. Let me go ahead and hit
Enter now, and wow, that just flew by. Nineteen steps later, notice what's happened. It looks like two of
Harvard's nameless routers up top, then a core router (this
one's a little different: northeast gateway), so it actually took a
different route off campus this time. Then this border gateway, BDR, probably
meaning border, also in harvard.edu. Row five is some nameless router
somewhere else; I'm not sure where. Row six is something in
northerncrossroads.org, nox.org. This is a very big peering point
where lots of ISPs interconnect. And then we're going to have
to take some guesses here. Then we have SDN, SW. I don't know where
this is, but Internet2 is a network, a very high-speed
network connecting a lot of universities. So that's great. It looks like our packets got
onto the academic superhighway, so to speak, which is good
because it tends to be pretty fast. And now, I don't know
where all of these are. But I'm going to go out
on a limb and say, you know what, this router in row eight
is probably in Chicago, just because of that abbreviation. The next one is as well.
Rows 10 and 11, if you're familiar
with US cities, are probably in Denver, then Las Vegas for
these next two, then Los Angeles here in row 14, with row 15 probably
in Los Angeles as well, given LAX. For whatever reason,
system administrators have historically often named their
routers after airport codes like LAX. And then, of course, we're
in California at that point, so it's not all that
far to UC Berkeley, up north. And indeed, it looks like
the official name of UC Berkeley's web server is CalWeb, for
California Web Server; Farm, which means a
cluster of computers; Prod, which means production, like
the official web servers in use; then ist.berkeley.edu. Now, it took me way longer to tell
this story than for the actual data to get from here to there. It only took 80
milliseconds for that data to get from Cambridge, Massachusetts,
on the east coast of the US, to Berkeley, California, on
the west coast of the US, and back, whereas a human might need five or
six hours at least to fly one way, not to mention waiting in the airport
and then getting your luggage. That can be an all-day affair, whereas
if I just want a cat from UC Berkeley, for instance, it would seem
to take only about 80 milliseconds for my request to get there
and for the cat to come back, less than 1/10 of one second.
And notice the variability, though. Sometimes routers are a little
busier than at other times, and so there's variance among
all of these various measurements. And each of these, to be
clear, is not cumulative, so they might go up and down. Each is how much time it takes to go from
my laptop to that router and back. You don't add them up;
each one is measured from the origin. So it's about 80 milliseconds in total. Well, let's try another one, one
that's a little closer to home here: traceroute www.mit.edu,
which is also in Cambridge. It's already done. MIT is in Cambridge,
Massachusetts, too, only eight hops away, eight routers
between us, and indeed we seem to be going through
Harvard's core network again. Then we get connected
to Qwest, another ISP, and this is actually
kind of interesting. It looks like my data is making
a little stop in New York City, if these names are to be believed. And that's kind of wild, and yet it
doesn't even seem to go to mit.edu, but to akamaitechnologies.com. So this is interesting,
and my inference here is that MIT has probably
outsourced parts of its website to a company called Akamai, which,
ironically, is itself based in Cambridge,
but whose servers seem to be in New York City
or thereabouts. It seems that MIT is essentially using Akamai
as a CDN, or content delivery network, which is indeed Akamai's
business, to host MIT's website. So even though I think of MIT as being
walkable from this theater here, just down the road, its servers
can certainly be somewhere else. And thanks to DNS, the Domain Name
System, and thanks to these routers, my laptop can nonetheless reach
MIT's web servers wherever they are in the world. And in this case, they're
only six milliseconds away. That's not necessarily as compelling when
I can walk to MIT pretty quickly, but six milliseconds is certainly
faster than the six minutes it might take me to drive, or the
half hour it might take me to walk. And what about places even farther? What if I'm interested in the news
somewhere abroad relative to here? I might do traceroute www.cnn.co.jp if
I wanted to trace the route between here and what I presume is CNN's
Japanese web server for news. And here, we again see the data
leaving Harvard's routers in steps one, and two, and three, and four. Step seven didn't really answer, it seems. And then it got a little private and
didn't really respond thereafter. But notice that something
interesting is going on here. Somewhere among these
first few steps, I'm going through Internet2,
which is encouraging, because that's typically a fast
connection. Then there's a nameless router, step 7, and I can't
quite make sense of all of these. But maybe SEA is Seattle, if those
airport codes are to be believed. And then, wow, notice this gap. We're starting at
less than one millisecond, less than one millisecond, one
millisecond, 20 milliseconds, then 85, then 106, then 193, 213, 191. That's a big jump, and it doesn't
seem to be just a bit of variance. It doesn't look like
the routers are just busy; the delay seems to persist, because
getting to each subsequent router takes about the same amount of time. Why is my connection so
much slower all of a sudden? Why is it taking so long
between steps nine and 10? Well, Seattle, where is that? That happens to be on the
far west coast of the US, and maybe Osaka, Japan,
is right there across, what, the Pacific Ocean? And so it would seem that
between steps nine and 10, there's a really big body
of water between these two routers, and that explains why all of a
sudden there's so much of a delay. And indeed, the internet, of
course, spans the globe these days. It spans oceans, either through
big trans-Atlantic, trans-Pacific, trans-oceanic cables that are
laid down by really large ships, or maybe via satellite, or
microwave, or other technologies. The world is so
incredibly interconnected, and you can see visually how those
interconnections are laid out and where they actually are. In fact, thanks to this animation,
we can see even more visually what the internet looks
like around the whole world. [MUSIC PLAYING] All right. Demonstration time. So within your home, or campus, or
office, we had a number of devices, and one of them was something like a cable
modem, or DSL modem, or a FiOS device. So what does that device look like? Well, if you have a cable modem, maybe
from a company like Comcast, whose brand name is Xfinity, you might
have a device like this, and it usually stands up
on your counter like this. It's got some blinking lights
in front, and in the back are a whole bunch of connectors. Now, what are these connectors? Well, the biggest of them, and
frankly the oldest, is this metal thing here,
which is a coaxial connector. This is what's long been used
for TV antennas and for cable connections from your TV into the wall. And the kind of cable that you
might plug into it is generally pretty thick, and it's
got a cylindrical end with a little pin in the middle, and it's often kind of
annoying to screw the thing in. But if you have a cable
modem, odds are you've got a jack that also looks like this
somewhere on one of your walls, maybe near your actual TV, and what you
really need is just a cable like this. One end goes into the cable modem,
the other end goes into the wall, and that's all you physically
need in terms of a connection to the wall, beyond, of course, the power
cable, which would plug in down here. And those are going to
vary based on the model. But there are some interesting
ports up top here too. There are some phone jacks, it
seems, because it turns out that with a lot of internet service
providers these days, especially those that offer digital support not
just for internet service but also for TV and phone, you can actually plug one
or two landline telephones in here and get telephone service. And then below that are four
jacks that look pretty similar, but they're actually a
bit wider, a bit fatter. These phone jacks, in case you
never knew, are called RJ11 connectors, and that is what, historically, you
would plug into the wall of your home, or now into the back of this device. And these other, bigger
ones are RJ45 jacks, into which you generally plug
Ethernet cables, which is the common name for these network cables. So if, back in the day, you had a phone
with one of these things on the wall, you would have one of these
RJ11 connectors, super small, and you'd plug one end into the
phone and the other into the wall, or into the back of this device. Meanwhile, though, you might
have an Ethernet cable, which is a little wider. So whereas the phone connector
might look like this, an Ethernet connector
is going to look like that, and you can probably tell here just
how much bigger one is than the other. And so inside of those
cables are just a whole bunch of wires that allow
electricity to flow, electrons traveling across those copper wires
from this device into the wall. And from there, Comcast, or Time
Warner, or whoever your internet service provider is takes care of the
technology from there on out. But what you can plug into
this device via those cables (not the phone cables,
but the Ethernet cables) is your desktop computer, your
Xbox, or other devices that use a wired connection. Or your cable modem might have, like
this one does, Wi-Fi support, wireless capabilities. Even though
there aren't visible antennas on this one, they're actually inside the
case, which frankly might partly explain why this thing is so darn big. There's absolutely no good reason
for these devices to be this large, but this device happens to be not just
a cable modem but also a home router, inside of which is
support for DNS and DHCP. It also has Wi-Fi capabilities. So with
this cable modem, you don't actually need a second device. You don't need your own
Wi-Fi device in the house. You can get all of that from your ISP. Now, if you have FiOS, another technology
that's in some cities here and abroad, you might have a device
that looks pretty similar. This one, frankly, looks
a little more elegant, and it probably has very
similar jacks on the back: some kind of coaxial connector that goes
into the wall, from which point Verizon, or whoever your provider is, might take
over (Frontier, in this case), and then, again, you might
have some RJ45 jacks that allow you to connect devices
in your home to this very device. But not all devices are this big. Here is another cable modem, made
by a company called Netgear, and it's this small. So, case in point: the size of the big one is ridiculous and
not necessary. It's the same technology in a much smaller form factor, so the
hardware inside this device is obviously much smaller. But we still see the coaxial connector,
some kind of power connector there, and just one jack for an Ethernet
cable, but that's probably fine so long as you have another
device, a home router or a switch, to connect it to. Indeed, if you simply want to provide
your home with a bunch more wired jacks, you might use
something like this. This is a Cisco Linksys device. It's a pretty dumb device. It's just a switch that's got a
whole bunch of those RJ45 connectors. You plug one of these into your
cable modem, or into your home router, and then you can plug up to seven
other devices into this device, thereby creating a small wired
network among those many devices. And this switch simply
switches data, that is, forwards traffic among its several ports
based on who's talking to whom. Or you might have something a little
beefier that looks pretty darn amazing, I must say, very geometric these days. This one is also made by
Linksys, which for years was owned by Cisco. It has these
antennas on back, which suggests that it has Wi-Fi support. This device happens to be
a home router, and it also has firewalling capabilities,
Wi-Fi capabilities, and switching capabilities. Indeed, in back, it has not
just a jack for your cable modem
or your FiOS device to plug into; it also has a few, though not as
many, Ethernet jacks, or RJ45 jacks, for your several devices. So which devices you need depends entirely
on your own situation, and odds are the first person to ask
is your internet service provider. Increasingly these days,
internet service providers are giving you, or selling you,
or renting you a device that takes care of all of this. So odds are you only need
one device these days, not several. But sometimes you might get something
lower profile like this one here, or maybe you'd buy it yourself
and plug it into the wall, and all your ISP does
is take it from there. They don't give you any
devices for your home, so you might have to wire some
of this together on your own. Now, at the end of the day, though,
it's all pretty simple, whether your cable is this, to
connect your various devices, or this, the coaxial
connector, or even this, which is a fiber optic cable, which essentially
has little strands through which data travels as pulses of light rather than as
electrical signals over copper wires. Inside of many of these cables, like
this one here, is just a bunch of wires, and they're actually
pretty cheap. And in fact, I thought it'd be fun
to get our hands dirty here with a cable that hopefully
I won't need anymore, and see if we can't see
inside this here thing. I wouldn't necessarily
do this more than once, because scissors aren't going
to work very well on this one. And actually, we can see what's
starting to happen before I even finish. Notice that as I pull back the blue
part of the cable, which is really just a rubbery sheath, you can see that
there are eight different wires in there, two of which I've cut,
so hopefully those were the bomb-defusing wires. If we just keep pulling, you can
see a lot of the wires inside. And these wires all
are different colors. Some of them are striped, some
of them are solid, and some of them have been cut, so they're
shorter than others. So long as the right
colors on this end line up with the right colors on
that end, your two devices will be able to talk, because some of
these wires are used for transmitting, some of them are used for
receiving, and some of them might not be used at all, depending on the connection's speed. The wires are also twisted together in pairs, which helps
cancel out what might otherwise be interference. So inside of here is
pretty simple technology, and, much like we've
seen in other contexts, there's just this layering, and
layering, and layering of complexity, so that at the end of the day
this is what's carrying your data, but there's just so much software and so
many interesting, advanced ideas on top of it, all of which ultimately
make the internet work. Now, how about some homework? Your homework for tonight,
perhaps, is this: when you go back home, whether it's your house,
or your dorm, or maybe your company if you're staying
late, find a device that looks a little something like this. Maybe it's your cable
modem, or your FiOS device, or maybe it's your home router, or
maybe it's someone else's home router. Turn it around carefully,
take a look at the various connectors on the back, and see if you don't
recognize some of the various shapes, some of the various labels, and,
ultimately, some of the technologies that we've been discussing here. If you really want to be brazen,
go ahead, hold your breath, unplug everything, and see if you
can, via a bit of pattern matching, plug everything back together. Of course, in the process you'll most likely take
down your entire internet, or your company's, or
possibly, in fact, your neighbor's as well. And that's OK if you're sort of
confident you can reassemble it. I mean, if you're really daring,
and you have an extra Ethernet cable lying around, go to town
on one of these things here. You're not really going to be
able to put it back together without special hardware
and a spare little clip, but that would be the extreme form
of getting your hands dirty here with the internet.
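
The routing idea from earlier in this talk, that there are often several paths between two points and that the internet was designed to route around a router that goes down, can be sketched in a few lines of Python. This is a toy model under stated assumptions: the router names and links are hypothetical, and a breadth-first search for the fewest hops stands in for the far more elaborate routing protocols real networks run (BGP between ISPs, for instance).

```python
from collections import deque

# Hypothetical topology: router names and links are made up for illustration
# and do not correspond to any real network from the traceroute demos.
LINKS = {
    "me": ["r1"],
    "r1": ["me", "r2", "r3"],
    "r2": ["r1", "r4"],
    "r3": ["r1", "r4", "r5"],
    "r4": ["r2", "r3", "google"],
    "r5": ["r3", "google"],
    "google": ["r4", "r5"],
}


def shortest_path(links, src, dst, down=frozenset()):
    """Breadth-first search for a fewest-hops path, skipping routers in `down`.

    A stand-in for real routing protocols, which weigh far more than hop count.
    Returns the path as a list of router names, or None if no path exists.
    """
    if src in down or dst in down:
        return None
    previous = {src: None}  # remembers how we first reached each router
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk back from the destination to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in links.get(node, []):
            if neighbor not in previous and neighbor not in down:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


if __name__ == "__main__":
    print(shortest_path(LINKS, "me", "google"))
    # Take router r4 "down" and route around it.
    print(shortest_path(LINKS, "me", "google", down={"r4"}))
```

With every router up, the search goes through `r2` and `r4`; with `r4` down, it finds the path through `r3` and `r5` instead, which is the "route around that issue" property described in the talk. Only if every path is cut (say, `r1` goes down) does it return `None`.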

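The traceroute demonstrations in this talk leave out how traceroute gets each router along the way to identify itself. Classic traceroute sends probes whose IP time-to-live (TTL) field starts at 1, then 2, then 3, and so on; each router decrements the TTL, and the router that brings it to zero discards the probe and sends back an ICMP "time exceeded" message, revealing its address. The Python sketch below simulates that loop rather than sending real packets. The router names, link delays, and the assumption that replies retrace the same links are all hypothetical, chosen only to mirror the shape of the demos: a router that doesn't answer shows up as `*`, each round-trip time is measured from the origin rather than added to the row before it, and one long link (think Seattle to Osaka) produces a jump that persists in every later row.

```python
from dataclasses import dataclass


@dataclass
class Router:
    name: str
    link_delay_ms: float  # one-way delay of the link leading *to* this router
    answers: bool = True  # some routers are configured to ignore probes


# Hypothetical path: names and delays are illustrative, not measured values.
PATH = [
    Router("campus-gw", 0.25),
    Router("core-gw", 0.5),
    Router("silent-router", 5.0, answers=False),
    Router("sea-router", 30.0),
    Router("osaka-router", 50.0),  # the long trans-Pacific link
    Router("destination", 1.0),
]


def send_probe(path, ttl):
    """Forward one probe hop by hop, decrementing its TTL at each router.

    Returns (router name, round-trip ms) from the router where the TTL hit
    zero (or from the destination), or None if that router stays silent.
    """
    elapsed_ms = 0.0
    for router in path:
        elapsed_ms += router.link_delay_ms
        ttl -= 1
        if ttl == 0 or router is path[-1]:
            if not router.answers:
                return None  # shown as "*" in traceroute output
            # Assumes the reply retraces the same links, so RTT = 2 x one-way.
            return router.name, 2 * elapsed_ms
    return None


def traceroute(path, max_hops=30):
    """Probe with TTL = 1, 2, 3, ... until the destination answers."""
    rows = []
    for ttl in range(1, max_hops + 1):
        reply = send_probe(path, ttl)
        rows.append((ttl, reply))
        if reply is not None and reply[0] == path[-1].name:
            break
    return rows


def biggest_gap(rows):
    """Largest RTT increase between consecutive answering rows: (gap, from_ttl, to_ttl)."""
    answered = [(ttl, reply[1]) for ttl, reply in rows if reply is not None]
    gaps = [
        (later_rtt - earlier_rtt, earlier_ttl, later_ttl)
        for (earlier_ttl, earlier_rtt), (later_ttl, later_rtt) in zip(answered, answered[1:])
    ]
    return max(gaps)


if __name__ == "__main__":
    for ttl, reply in traceroute(PATH):
        if reply is None:
            print(f"{ttl:2}  *")
        else:
            name, rtt_ms = reply
            print(f"{ttl:2}  {name:<14} {rtt_ms:6.1f} ms")
    print("biggest jump:", biggest_gap(traceroute(PATH)))
```

Running it prints one row per TTL, with `*` for the silent router, and `biggest_gap` reports the 100 ms jump between rows 4 and 5. That is the same reasoning used in the talk to spot the trans-Pacific hop: because every row is measured from the origin, a long link shows up as a step that every later row inherits, not as a one-off spike.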