Technology Stacks – Computer Science for Business Professionals – by CS50 at Harvard

Technology Stacks – Computer Science for Business Professionals – by CS50 at Harvard


SPEAKER: So much as you might
understand how the internet works, whether it’s HTTP that transports
data or HTML that is that data, and as much as you might understand
some of the fundamentals of programming like loops and conditions and Boolean
expressions, variables, and more, it turns out there are so many
different ways in which you can implement those ideas. And so, indeed, when it
comes time to actually build a website, a web application,
a mobile application, it turns out that it’s rather non
obvious where to begin some times if only because you have so
many options ahead of you. And much like the world of clothing and
the world of fashion, more generally, is constantly evolving such that what
is cool and appropriate to use now or to wear now might not necessarily be
appropriate some months or years hence, the same can be said for better or for
worse of the technology world in so far as humans are constantly innovating. Humans are constantly finding fault
or opportunities for improvement in languages that we’ve used for
years, in server software that we’ve used for years, and
intuitively improving on it. And so the reality is that staying
current with this whole world takes some effort even as the
fundamentals largely remain constant. And so what we’ll try to do here is give
you a sense of some of the languages, some of the frameworks,
some of the libraries, some of the overarching design decisions
that are both in vogue and both here to stay right now, as well as take the
lid off of some of these technologies and give you a better
understanding of how some of the fundamental types
of technologies from which you can choose actually work. So let’s consider for the moment the
so-called front end of an application. Front end generally refers to
that which is facing the user. So it’s the user interface and more
with which the human user typically interacts. Now, we’ve discussed, for
instance, the world of the web and how you might assemble a web-based
experience using HTML and CSS and, even more dynamically, using JavaScript. But that’s by using
those native languages right as that come out
of the box, so to speak. But it turns out that some
tasks are not as easily done in those various languages
as might be convenient. It turns out that there are
certain design patterns, so to speak, types of
code, types of markup, types of properties that people
have found themselves using again and again and again. And so much like you can factor out
into your own CSS files and JavaScript files, code that you want to
share across multiple files or even across multiple
projects so can– so has the world more generally
realized, you know what? Maybe I should package up my CSS
or my JavaScript in such a way that other people can
actually use it as well and thus have been born things
called libraries, collections of code that other people have written that we
can all use, often under an open source license, which means the code is
freely available for the world to critique, to use, to adapt some
times, and contribute back to. Now, within the world
of the front end, there are so many different
JavaScript frameworks. Indeed, depicted here just a few of
perhaps the most popular right now. But even this list is going to
grow stale over the coming months, certainly over the coming
years, and the like. And so really rather
than dive into the weeds of some of these
technologies in particular, we really aspire to
give you just a sense of what’s current, what should be
in your vocabulary perhaps now, and perhaps some context when it comes
to recruiting engineers or deciding among engineers which technologies
to build a business on, just how current you are, just how
dated you are, and the like. But invariably this kind of
thing requires some due diligence when the time comes to
design an actual project. Angular, Ember, Meteor,
React, View, these and more are the names for various
JavaScript frameworks. And a framework is not
just a library per se. A framework is also typically
a way of doing things. So a framework includes
some code that you should integrate into your own
projects, whether it’s CSS, JavaScript, or something more but it
also includes typically a way of doing things, a way
of naming your own files, a way of formatting your files, a way
of building ultimately your application. And reasonable people,
of course, will disagree, and so you’ll find among several of
these frameworks different design paradigms, different design beliefs,
the best way as to do things. And again different–
reasonable people will disagree, and so part of the process
of choosing these frameworks really boils down to what
resonates most with you or with the engineers
with whom you’re working. And indeed what resonates
above all else perhaps is what one is most familiar with. In fact, it’s often the case that you
or engineers you’re working with simply have done a previous project in one of
these frameworks but not the others. And so even if that framework is for
some definition of inferior inferior, that might not necessarily
be an overriding concern if you can actually build
your MVP or your prototype faster with that particular framework
because you know it already. Then if you could build
it a little bit better, quote unquote, in quotes and
so far as reasonable people can disagree as to what’s inferior
or superior in this world, then if you were to design
it using a completely new framework for which there’s just
a non-trivial learning curve for you. And so there’s, as in the
case of data structures, as in the case of algorithms, as in
the case of computer science more generally, there’s these tradeoffs,
and human time, developer time, learning time is certainly
one of the resources that you have to decide how much
of which you want to spend upfront. Meanwhile, in the world
of CSS, there are also libraries there, collections
of CSS files and frameworks really, methodologies for which you
lay– via which you lay out your site, like Bootstrap, Foundation,
Semantic UI, and more, and these focus more so on the
aesthetics of a user’s experience, more so on the
presentation of information and the types of user interface
mechanisms, the buttons, the menus, the windows, and the like that
a user might see on the screen. But here, too, there are
so many different wheels that have been invented in the past. So many different people
have decided, you know what? That default link on a web
page could look much prettier. Or that button on a web page could
look much better if you used my design. And so this is what’s happened. The world has created and shared with
others in the world various files that you, either in the context
of JavaScript or in CSS or beyond, can integrate into your own projects. So how to even begin to
vet these kinds of things, particularly since in
a class like this we won’t go into the weeds
of evaluating these and even then we might not
reach any sort of consensus. So the reality is typically
relying on the engineers with whom you’re working is first
and foremost the place to start. What do people know? What are they comfortable with? What did they like? What they dislike about some framework? Did it actually speed up the work? Did it slow down the work? Did it create– did it build up
technical debt for them so to speak? For instance, just because something
is easy and quick to get started with from the get go, is it so easy
because it’s riddled with poor design decisions such that as you
get more and more users, maybe your application or your
website’s going to be slower and slower? Or maybe it’s going to
become harder to maintain, or it’s going to be harder to
onboard new people altogether. There’s various trade offs there. And so considering what
is optimal now, what is optimal in the medium term,
what is optimal in the long term should perhaps be part of
that whole conversation. Meanwhile, it’s certainly a compelling
thing from professional development perspective, for keeping things fresh,
to actually go and learn something new. And so certainly punctuating
one’s experience in tech, should there be an opportunity
to both pick up some new skills, to familiarize oneself with the latest
and greatest and not necessarily change direction with each and
every fad but generally be familiar with some of
the trends in industry. And there’s a bunch of
ways with which to do that. I mean one, certainly relying on
Google and other search engines just to get a sense of what
the most popular hits are or search results when you search
for something like popular JavaScript framework or some such
search string like that. Looking on websites like Hacker News
from Y Combinator, where there’s an active community of folks
from the startup community, especially talking about these
kinds of technical decisions and design decisions more generally. Websites like Quora
or other Q&A websites. Looking at GitHub.com, a popular web
site where people store their code and can actually follow
or star other people’s repositories of code from which
you can infer a sense of popularity based on how many people are
following a framework x or y or z. But this is always a moving
target, and so it’s simply part of the conversation
to have from the get go. And you’re not necessarily
going to regret a decision if you don’t necessarily
pick the most trendy or the one that’s poised
to take over all others because this is a fast changing world. And, in fact, one of the most
frustrating if not expensive aspects of this world is
just how quickly it changes. And so what you design today might
not be what you design tomorrow, but that’s also part of the
excitement of this space. So with that said, that’s just a glance
at what the front end design process or decision process might be like. Let’s take a look now at the back end,
at least in the context of languages. So here you have an even longer list
because at least in the front end world, recall that the
de facto standard is to use JavaScript in the user
facing web browser experience, but on the back end on the servers from
which the HTML and the CSS and even the JavaScript are ultimately coming,
you have so many more design decisions. So you have languages like Go and Java,
JavaScript, .NET, PHP, Python Ruby, Scala, and so many others. These are perhaps just a few
of the most popular these days. And all of them have
their pluses and minuses. All of them have their
supporters and the detractors. And all of them have folks
who already know them or who might have to learn them among
engineers with whom you might work. Meanwhile, though, those languages
out of the box, so to speak, don’t necessarily make designing a
web-based application easy or as easy as it could be. They don’t necessarily make building a
mobile application as easy as it could be, or even if it is
relatively easy, humans have found over time that, gosh, every
time I build a mobile application, I’m like copying and pasting
dozens or hundreds of lines of code because they all share a
common framework or maybe a common meaning system or a
common set of functionality. And so in this world to have
libraries of reusable code built up and frameworks, libraries of code
and methodologies via which you’re building your applications, arisen. Among them Django, Flask, Laravel,
.NET, Node.js, Rails, and the like, .NET being up there, too, because it
generally refers to a set of languages you might use as well as the framework
that oversees those various languages. And there’s even more
options ahead of you here. So how do you begin to pick
among these options as well? Well, here, too, you’re often guided
by what your engineering team knows, perhaps what your own
system administrators or your operational people know,
so the folks who were actually maintaining the servers,
whether they’re locally on site, maybe they are the ones
running things in the cloud, whether that’s Amazon’s or Google’s or
Microsoft’s cloud or some other company still, depending on what that
cloud infrastructure supports, might influence your decision making
as to what language you might use. Sometimes the nature of
your application might influence the language you might use. For instance, some of these
languages make it a little bit easier to make real-time
applications, applications that support chat servers
or immediate interactivity, where there’s a constant connection or
the illusion of a constant connection between browser and server. PHP doesn’t really make
that all that easy. You can do it, but it wasn’t really
designed with that use case in mind. By contrast, JavaScript, via
framework called Node.js, makes it really easy to do,
and it was designed more so with that kind of use case in mind. And so here, too, you see hints of why
some of these languages and in terms frameworks have arisen because they are
actual solutions to concrete problems people have experienced in the past. And some of these languages
are newer than others, and so they might come
with more features so you don’t have to rely as
much on third-party libraries. Some of them less vetted or maybe less
robust or even less secure than what comes with the language itself. And recall, too, that these languages
are often constantly evolving, some more quickly than others,
but there are new versions of these languages coming out. And so even within the confines
of a given language like Java might there be new and improved features
every year, every couple of years. And so sometimes actually
picking a version of these languages or frameworks is
one of the design decisions to bear. And I would say from a non-technical
perspective, most compelling is just to be aware of these
kinds of technologies, these kinds of buzzwords du jour, but
also aware of these kinds of tradeoffs. It’s not necessary to
get into the weeds I think of understanding each and every
language and what it’s good for. Although once you have
a general understanding of this world, of programming,
of server side architecture, of HTTP, and web pages and the like,
can you via Google and other websites I think start to wrap your mind
around some of the tradeoffs and perhaps even start
to tease apart which are technically compelling arguments
that you might see online versus just religious objections to this
language or that because that’s often the case when folks get into
heated discussions of language choices for instance. But I think ultimately understanding
the tradeoffs, the onboarding time or the learning curve for various
languages, the appropriateness of language for certain
specific use cases like the real-time chat applications
or whatever your own product happens to be, what your engineers
already know what they’re good at, what they prefer to use what
language and framework is easiest for new hires, maybe six months hence or
two years and to actually come on board and understand so that you’re not
expecting the most experienced of new hires to constantly
be in your pipeline. So appreciating these kinds of tradeoffs
and asking these kinds of questions even among the engineers that are
perhaps making the decision ultimately is a valuable way to
contribute to the conversation and take some comfort in the
fact that your product need not be a complete black box. You might not necessarily be able to
implement it from scratch yourself, but you can at least
ask the right questions and be a sounding board for some
of the answers that come back. Now, there are some
fundamental differences and some of these architectural
decisions among which are around choice of database. Indeed, most any web application
today has a back end database inside of which is stored data from users,
whether its purchase orders or user registrations and passwords
and any amount of data that’s collected from users at the end
of the day is stored somewhere and that somewhere is called a database. But there’s different
types of databases. Two of the biggest categories these days
or perhaps SQL and the opposite NoSQL, as it’s playfully called, SQL being
structured query language and NoSQL referring to a class of databases
that doesn’t support SQL and indeed is not generally relational. And we’ll soon see what that means. But even with this world, do you
have a veritable menu of options MariaDB mySQL, Oracle,
PostgreSQL Server. And then within those– within that relational world
do you also have the cont– in addition to that
relational world, do you have the contrast of the
object-oriented or document store world, things like Bigtable,
Cassandra, HBase, MongoDB, and others. And already it can be sort of
overwhelming to feel like just as you’re getting up to speed on what
the web is and how web pages work, oh my god! We’re just beginning
to make our decisions. But generally these
decisions, too, can be guided by what your team knows,
what you’re comfortable with, what the price might
be for some of these. And some of these are
free and open source. Some of them have commercial
licenses associated with them. Some of them are supported easily
for you with your cloud provider, wherever you’re hosting your servers or
you might have to host them yourself. So you can begin to narrow
the field of options. And indeed, especially when
building multiple products, might you build on past
experience of yourself. So, for instance, for the course and
all of our web-based applications, we tend to use a lot of
the same technologies and only recently have we begun to
transition from one main language to another but doing it pretty
much for all of our applications across the board so that we don’t have
to worry about some of the team members knowing x and y and z. It’s just there’s an economy to
scale to focusing on relatively fewer technologies internally. But let’s dive in a little deeper into
SQL and NoSQL if only because they’re so cleanly bucketized into
SQL and not-SQL really. What do we mean by this, and
what does a database really do? So a database typically supports
these– at least these four operations or categories of
operations playfully called CRUD, which stands for create,
read, update, and delete, though you might see some variations
on what the actual words are. But CRUD refers to those
four fundamental operations. Now, in the world of SQL, or
S-Q-L, structured query language, which itself is a programming
language, and it’s a programming language you use to query
a database, to add data to a database, remove it, edit it, and the like. Within the world of SQL, there are– is a direct mapping of
these four operations, the four keywords, for
features of the SQL language, namely create, select,
update, and delete. So it’s almost CRUD, but it
doesn’t quite line up perfectly. So create, read, update, delete is the
general notion of the four operations database might support, and in the world
of SQL, which we’re about to dive into, might you see these four commands
specifically create, select, update, and delete. So what does it mean to
be a SQL database, or more generally, what does it mean
to be a relational database? Because a relational database
is often historically what people think of when they think
of databases and only in recent years has this NoSQL trend been catching
up that changes the paradigm. And we’ll look at the flip
side in just a moment. So if you’ve ever seen this,
whether it’s this version of Excel or some equivalent version of Numbers
or Google Spreadsheets or the like, this is kind of a relational database. It is a piece of software
that allows you to lay out your data in rows and columns. And among those rows and columns
are there typically relationships. Consider after all the last time you
used a spreadsheet, if ever, odds are there was some kind
of meaning if you put data in column A versus B versus
C versus D. In other words, when adding data to a spreadsheet,
typically if you’re using it correctly, you don’t just start plopping your
data in any random box that doesn’t yet have a number or a word in it. You generally organize the
data such that in column A might be one type of
data, column B might be another type of data, and so forth. And so it’s relational in the sense
that the numbers and the data within relate to one another. And it’s also relational in the sense
that you can have multiple sheets even within a file. So by default with Excel and Numbers and
Google Sheets do you get just one sheet or worksheet by default.
But if you ever want to have multiple types of
data but all in the same file just because it’s kind of nice and
orderly to keep it all together, you can click the plus
and create a new sheet and have a completely different tabular
structure, a different number of rows and columns and different
meanings for those columns but somehow the data
is all related, ergo this notion of a relational database. So a relational database
stores data in tables, and a table is in turn a
set of rows and columns. So why does this actually matter? Well, Excel is not all that powerful
when it comes large datasets. In, fact it wasn’t all that long
ago that Excel I believe only supported as many as
65,536 or 35 rows probably, and that’s because,
long story short, they used the 16-bit integer, the
biggest number for which is 65,535, and so Excel physically couldn’t
count as high as 65,536 or 7 or 8 because they just didn’t use
enough storage underneath the hood. But even if you had that much data,
and that’s quite a lot of rows, the software just
tended to be super slow at least in my own
experience back in the day trying to manipulate
very large datasets. And Excel just wasn’t designed
for tens of thousands of rows. By the time you’re at
that much data, you should really be graduating, so to
speak, to an actual relational database management system, a server-driven
database that actually leverages not just files but memory more effectively. In fact, what– among
the features you get from products like MariaDB and MySQL
and Oracle and Postgres and the like is you get really smart people who have
implemented the software in such a way that it makes your creates and your
reads and your updates and your deletes faster than they might be if you
were just storing all of your data in a big file. For instance, in a big
file like Excel, if you want to search for some information,
you can hit Command F or Control F or whatever, type in a keyword,
and then Excel or Numbers or Sheets will search for it. But generally these spreadsheet programs
are going to search for your data pretty much by a brute
force, search top to bottom, left to right, looking in
every darn cell for that data. That’s fine if you’ve got
a pretty small spreadsheet. We slow humans aren’t going
to notice the difference. But in the context of really
big datasets, tens of thousands of rows, let alone
millions or billions, it does not suffice to look
at every piece of data when looking for a certain phrase or
a certain number or some such value. You want the database itself to do
some anticipatory optimization, sort of working its magic underneath the
hood using various algorithms and data structures, so as to
optimize those queries and to give me answers
in logarithmic time, not linear time, or time that’s faster
than searching the whole darn thing. So at some point, spreadsheet
software does not cut it, and you transition to a more
proper relational database. But moreover at that
point, you have to start deciding how you want the
database to store your data. Because at the end of the
day, we humans generally know a little more about the
programs we write about the data we’re going to be storing. And by this I mean, if I am storing
a bunch of data in a database, I probably know better than the computer
might know which of these values is always going to be like an
integer or which of these is always going to be a dollar amount
or which phrase is always going to look like a time
of day or a date or day of the week or some other such value. And so we humans can actually
help databases help us be more highly performing by
providing them with hints, otherwise known as data types, that tell
the database what type of data to store and therefore how to
store it most efficiently. Some of those popular data types
in the world of SQL then are these, and let’s just take a
look at a few of these. So char and varchar. So char being shorthand
for character, and it’s not a single character like a or b or c. Character, or char,
generally refers to a column in a database that is
going to store one or more characters a little confusingly,
a string, so to speak, where a string is a sequence
of 0 or more characters. So when designing a database
column that you know is going to contain a word or
a sentence or even a paragraph, you can tell the
database, hey, database, make this column this
many characters wide, i.e. allocate that much data upfront. But if you’re not sure, as
might often be the case– maybe someone has a short name. Maybe someone has a long name. Maybe someone has a long
address or a short address. If you don’t really know
what the right length is for a column for the values
a user is going to provide, you can instead use varchar for variable
length character strings, which is to say you specify only an upper bound. So I don’t know what the longest
name is in the whole world. But my name is D-a-v-i-d, five
feels like it’s kind of short. Probably some people with
longer names in the world. 20, is that enough? I don’t know? 50? I don’t know, 100? Probably. I should probably Google to find
out with a bit more reassurance, but this is a decision that the web
designer or the database designer is going to have to make. You can’t just tell, and
you don’t want to just tell, the database accept any length
string because the more flexible you expect the database to be, the more
generous you expect the database to be, the less optimization it can do for you. By contrast, the more precise you can
be, the more conservative you can be, the more optimization
algorithmically the database can do so that when you ask for data back,
it can give you those answers faster. When you insert data,
it can insert it faster. So the more helpful we humans
can be with our databases, the more help the
database can be in turn, and that’s probably a good thing when
we have lots and lots of data and users because we want the common
case to be highly performing. It might cost me a minute, five minutes
upfront to really noodle on the problem and figure out what the best design is. But that cost is going to
be amortized over thousands of users, millions of users, who
are then benefiting thereafter from a better database design. So where is the line to be drawn? We’ll explore this in the context of
an example, but it kind of depends. There is no right answer necessarily. It really depends on your use case
and the data you’re trying to store. With numbers, too, do
you have some discretion. Integer means what it is,
generally a 32-bit value, which means you can have a number
from negative two billion to positive two billion, give or take. But that might be overkill. If you know you’re dealing
with really small integers, maybe you don’t need 32 bits. Maybe you want fewer and so
you might just say, small int. Doesn’t need to be that many bits. I know my values aren’t
going to get that large. I might as well save the database space. Or by contrast, wait a minute. Going to have more than two
billion users, success permitting. I’m going to, therefore, want
to use a big int like 64 bits, so I can have many, many, many,
many users or rows in my database. And indeed some of the most
popular websites out there have run into this issue. The Facebooks, YouTubes,
and the others of the world, well, they just have
so much darn data, they had better not cap the number
of rows in their database table at only two billion because they
might well have that many and more. Then why not just choose varchar
with a really big number? Why not choose big int with
a really big number of bits? Well, it’s wasteful. You shouldn’t over-allocate because
then you’re just spending more space, and space costs money
and might even cost time to search if there’s more
bits to be looked at. And so you don’t necessarily want to
just cop out and say, use as much space as you want or as is necessary because,
again, we can’t be as helpful therefore to the database. Now, Numbers are an interesting one, and
this is true in programming languages whether it’s SQL or C
or C++ or yet others. It turns out that choosing how big
your data is, or anticipating it, has some real impact in
some cases of numbers. So it turns out that a integer of course
is just a number like negative 101 and on up in both directions. But a floating point value
or float is a real number, a number that’s not necessarily
an integer, but a real number that has a decimal point and some number
of digits after that decimal point that may or may not be representable
precisely as a fraction. So that’s a real number or a float. If you want more bits
or precision than that, you can actually specify double
precision, which gives you more bits and therefore you can have even
more digits after the decimal point. But the key takeaway here is
that at the end of the day, it’s going to be finite if
you’re representing a number. And so if you do use something like a
float, even a double precision float, which gives you more bits of precision. At the end of the day, last
I recall from grade school, there is an infinite number of
numbers in the world, both integers and real numbers for that matter. So in both the case of these integer
base numbers and these floating point values, you can only count
so high, or you can only specify a number so precisely. And at the end of the day,
you might have some overflow where you just can’t represent bigger
numbers, whether positive or negative, or you just can’t represent
enough decimal points– enough numbers after the decimal
point to represent a number perfectly accurately. And so there’s this tradeoff. You might want more and more
space, but at some point, you can have an infinite
amount of space. Computers are physical devices. They only have a physical
amount of memory inside of them. You might have to draw a line. And so if you’ve ever seen some
older movies like Superman 3, which has a great incarnation of
this or somewhat more recently, Office Space, where there’s money making
scam whereby the companies in question, long story short, were constantly
manipulating monetary amounts in their database systems,
but they were always rounding off fractions of pennies. And so the masterminds
in both movies started pocketing all of those
fractions of pennies, but hilariousness ensues
when they don’t quite realize how much those
fractions of pennies add up. But that too is an issue of imprecision. We in the human world generally,
when going to stores and such, use only two decimal
points of precision. But investment banks
and banks more generally might actually use more
decimal point– more numbers after the decimal point than that. And so having the ability of expressing
numbers more precisely is compelling. Thankfully, there does
exist Decimal, which allows you to specify how many
numbers maximally you essentially want before and after the decimal
point or the total number in question. And so that would be an
alternative to these others. But it might end up then
costing you more space just to get that more precision. So here, too, as with the
decisions around frameworks and libraries and languages,
here, too, there’s a tradeoff. Even at this lower level, when
you really get into it, deciding how to store your data in a database. Lastly, and a little more easily, there
are data types like to Date and Time and Timestamp, which do as they say. They look like dates. They look like times. They look like timestamps,
just some counter from some preordained moment in time. And these data types
are commonly used as you might guess to store these
types of data in a database. When did the user last log in? When did he or she
register for the website? When did he or she buy something
from our catalog or the like? You can represent those
and more data types in a standard relational
database that supports SQL. But you have some other options, too. It turns out that in
a relational database, you can be even more helpful to the
database by telling it in advance if any of your columns
should be considered a primary key or a foreign
key or a unique constraint. Now, what does this mean? Well, typically with
data, it is useful to be able to uniquely identify a row
in your table in your spreadsheets without having to look at the whole row. For instance, when using Excel
or Numbers or Google Sheets, you’ll notice that by
default, all of the rows are just numbered 1 through whatever. That’s useful because if you are
collaborating with someone or you yourself are just trying
to find some value, you could just jump ahead to like row 50
to identify the 50th row of your data. You don’t have to look
for a specific name or address or purchase
order or whatever it is that you’re storing in this table. You can just jump to the
row number in question. A relational database very
often takes the same approach, using some piece of data, usually
just an integer 1, 2, 3, 4, just like the spreadsheet
programs, to uniquely identify the rows so that you
can access them very quickly via that number or that index. A foreign key, we’ll see,
is a notion of a piece of data that exists in
two separate tables– two sheets where there’s
an interrelationship but more on that kind
of example in a moment. And a unique key, a unique
column, is one where you should not see any duplicates. So, for instance, maybe when
building a website that has users register for your website,
if you want to ensure that no user can have the
same email address as another, you can specify to your
database, hey, database, make sure that [email protected], or
whatever the user’s email address is, only appears once in a
column in my database. Don’t let David or not-David register
with that same email address. And so this is a useful
way to ensure that you have correct behavior of your website
and you have integrity of your data so that you don’t accidentally
have duplicate values, which would lead potentially to ambiguity. And there’s even more features you
might get from a typical database. So let’s indeed now try an
example whereby we decide how best to store data in my database. But to simulate my database I’m going
to quite simply just use Excel here. I could use Apple
Numbers or Google Sheets or the like or any spreadsheet
program, but at the end of the day I’m really just using this because
it’s a program with rows and columns. In reality, if I am a business
owner and I have a web-based store and I sell widgets and
sprockets on my store, the reality is I want
to keep track of who has bought what so I
know what my revenue is, so I know to whom I need to
ship things, and so forth. And so I’m going to
pretend to be the database here so that we can walk through an
example where we design this database but realize that the actual data
that’s being inputted by the user into my website’s front end, the HTML,
JavaScript, and CSS user interface, is going to get sent to the
server, as by an HTML form, where my back end language, whether
it’s Python or PHP or Java or Ruby or the like with some
framework probably, is going to be ultimately
storing it in a database. And that database in turn might be my
SQL or MariaDB or Oracle or Postgres or something else. We’re just going to focus on what
any of those databases might– how any of those databases might
potentially store the information. So someone has just submitted
a form on our website. They’ve given their credit
card information and the like, and therefore it is time
for my website to store this information in a database. What am I going to store? Well if they’ve bought a widget, I
might type in widgets, quantity 1, and maybe it was Zamyla
Chan who bought this widget, and she is at the CS
Building at 33 Oxford Street. And that’s in Cambridge, and that’s
in Massachusetts in 02138, USA. So here is some information
therefore that I might store. This is good because I
know to whom to ship it. I know how many widgets I have sold. And maybe the price should
be in there as well. So she paid maybe $9.99 for this widget. All right, now, let’s
fast forward in time, and let’s assume that someone
else has visited my website, and they too have decided to buy
a widget but multiple widgets. We upsold them. So a widget was bought, quantity 2. This is, say, Rob Bouden also in 33
Oxford Street, Cambridge, Mass, 02138, USA, and this one was a total of
$19.98 since he bought two of them. So I’ve just been storing this
data in sort of freeform format but each of these columns
clearly has meaning. So maybe this first column
should really be called Product. This one should be called Quantity. This one could be called Name. This is maybe Street. This is maybe City. This is maybe State. This is Zip. This is maybe Country. And this is maybe total. But even here there are some
opportunities for disagreement. This is a little US centric, the
fact that we have cities and states, as well as zip codes. Indeed it might be the case
that zip codes don’t all follow the same format indeed even
in the US sometimes people write them with five digits, sometimes
with nine digits and a hyphen, so there’s a design opportunity there. But let’s drill in deeper as to
what data type these various fields should be at least right now. Let me make room at the top
here so we can just make notes as to the data types
but in reality these would be stored not in the table itself
but somewhere else in the database. What data type should product be? And remember that among our options
are data types like these product. It’s not a number. So we can knock off
most of these options. It’s definitely not a date, time, or
timestamp, so then it boils down to is it a char or a varchar? So char is a fixed-length field,
so we have to decide in advance is it going to be eight characters, 16
characters, 100 characters or something else. Varchar would mean we
just know the upper bound. So I don’t know. W-i-d-g-e-t is 6, so minimally it’s
got to be six characters but then there’s sprocket, s-p-r-o-c-k-e-t. Not sure if I’ve ever
had to spell that word. That’s eight characters so
six isn’t going to cut it. So maybe we do something like char8. But there’s a tradeoff here. If I specify now that this field is
maximally going to be eight characters, then I can’t sell anything with
a longer name than sprocket. I could change the database
size– or the column size later on but it’s ideal to get these
things right from the get go and not have to go back in and change your
infrastructure or hire someone to come in and make modifications. So maybe that’s a little shortsighted. Maybe it shouldn’t be eight. Maybe it should be twice that, like 16. I don’t know, and I don’t– I’m not necessarily going
to offer an answer here because it entirely depends on
what data you’re trying to store, what items you’re trying to sell. This, in fact, now is even more
wasteful because even though I’m now anticipating product names
that are up to 16 characters, the char data type is going to use
for every product name 16 characters, even if a whole bunch of those are
blank because the word isn’t long enough to need 16 characters. So maybe I would go for a variable
length field not char but varchar whereby the maximum length of my
column should be 16 characters. But here as in CS more
generally tradeoff like it might seem
like a win like OK wolf the problem is I’m using
too much space all the time, Let me just put an upper bound. There’s gotta be some price you
pay there’s got to be a tradeoff and indeed there is. It turns out that a
database can generally search your data more
quickly if it knows the entire column is the same width. Long story short, if it knows that
this column has eight characters, eight characters, eight
characters, eight characters, it can use very simple
arithmetic to jump mathematically from one row in that column to another
because they’re all the same distance apart essentially. But if you have a varchar
column and variable length, you can think of the column not as
being perfectly smooth on both sides but kind of jagged on one side. Some rows are short. Some rows are long. And so you can’t just
blindly use simple arithmetic and jump eight characters at a time if
the length were eight or 16 characters at a time if the length were 16. So it’s a tradeoff. If we want to be able to search
through our product names quickly, might not want to use a varchar. So here, too, no right answer. It’s a tradeoff. And it might not matter
for small datasets. Indeed probably doesn’t. If you don’t have many customers, you
don’t have many products but certainly and scale these kinds of things matter. And even when building something that’s
not going to have that many users, just getting it right doesn’t
cost that much upfront time. The most important thing, I dare say,
is to actually give it some thought and not just leave it to chance or take
the easiest way out because invariably over time, you will build
up so-called technical debt, where you make poor decision, poor
decision, poor decision, and now you have a very expensive decision later
on if you have to go back and change a lot of those things. What about quantity? Well, quantity, nicely enough,
would seem to fall more cleanly into one of these fields. Now, is that an integer? Is that a big integer? I think we’re doing pretty well if we
need more than 2 billion products sold. Maybe we’re sort of a
smaller shop, and we can get away with an
integer or even a small int, but this too would be a tradeoff. How many bits do you want to spend? I would say that the default typically
would be an integer unless you really are expecting a huge amount
of data, billions of rows. So we might say something
like integer here, name. Oh, gosh, this is that
can of worms again. How long is a maximum name? Maybe I do some googling and some due
diligence as the maximum length names. Maybe I just want to cut it off. You’ve probably been to a website
before where you’re happily entering your information, and then you
keep typing and nothing is happening. And that’s because the programmer,
or the database designer, has decided your name
doesn’t need to be that long. Or your address doesn’t
need to be that long. And it’s infuriating
sometimes because there are assumptions, naive,
insensitive assumptions sometimes, but that boiled down to perhaps either
calculated design decisions or maybe just poor design decisions. So I don’t know what is right here. 16 feels a little too
conservative, so maybe I would say something like varchar 128. But even that I’d probably want
to take a look at my customer base and see if that’s well beyond the
limit of what I might actually need. Same thing for street. Same thing for city. I don’t really know
what the right length is, but let’s assume it’s
going to be varchars for those. So we’ll just use a dot, dot, dot to
suggest that it’s an open question. State is an interesting one. If we expect to have only US customers,
we can do a little optimization here. If every US state has a
two character abbreviation, we could do char2 so that
we get that performance benefit of knowing that every row
is the same width, two characters, so long as we’re comfortable not selling
products to anyone else in the world beyond the United States zip code 2. Design opportunity there. I think it’s fair to say that
integer, while seemingly correct, might get you into some trouble,
at least here in Massachusetts, where we have a whole bunch of
zip codes that start with zero. Like in the world of
numbers and integers, leading zeros are meaningless. You can have as many zeros
to the left of your number and they don’t change the
actual value of your number. But in a zip code, it does have meaning. It is the first of five
digits here, and so calling this an integer probably isn’t
very wise because if the database is like most humans. The database might
ignore that first digit, and so my zip code is going to appear
to be 2138, which really isn’t right. Now, we could fix that in code. We can make sure that, well, if
we ever see a zip code that’s only four or fewer digits, this let’s
pre-pin some zeros, that feels messy. If we’re going to put that
data in there from the get go, let’s make sure it comes
back to us correctly. And so I might actually
say something here. Even though it looks
like a number, maybe I would actually say it’s a
char5 field, or maybe it’s nine or 10 if I want to have a
hyphen in there for US zip codes. Country, too. Here maybe it’s– going to be
a three-character abbreviation of two-character abbreviation,
not sure what’s best there. Really depends, too,
on what countries want to sell to if not just the US perhaps,
so there’s a design opportunity there. And then perhaps the last
to consider is this total. I think it’s fair to say that
integer would not be correct because we would either be
rounding down or rounding up all of the money we’re supposed to
be collecting from our customers. So we probably want one of these. And some databases differ, but generally
a data type like Decimal is ideal. You don’t want to even get
into the business of worrying about these rounding errors
or errors of imprecision as in Superman 3 and Office Space. Much better to just say that you
want a fixed number of digits to the left and a fixed number of
digits to the right of the decimal place so that you are not losing
even fractions of pennies or mischarging anyone or
losing out in any way. So we might use Decimal in that way. Some databases, though,
have an actual currency data type, which operates similarly. So there remains to be seen some
opportunities for improvement. If I continue to sell
widgets, let alone sprockets, I’m going to have more and more
and more rows in this table. And if Rob and Zamyla end
up being repeat customers, I might have more and more Robs and
more and more Zamylas in the same table. And as that happens, there begins
to be quite a bit of redundancy. Indeed, what can you
factor out over time? Well, certainly if Zamyla and Rob
keep ordering more and more items from my database, I could just
keep updating the quantity, but that feels a little messy. It’d be nice to have a veritable
history of all of my sales. I don’t want to just
aggregate everything. So adding more and more rows for
every sale seems pretty compelling, but then I’m going to
see Zamyla Chan and Rob Bouden again and again and again
and again and again in this table. And I’m also going to see their address
again and again and again and again. And herein lies now the
capabilities and of the feature of a relational database. You know what I’m going to do rather
than just treat this as my one and only table, let me go ahead
and just rename this sheet or worksheet to be Orders. I could call it anything I want. And you know what? Let me create another table or sheet,
and let me call this Customers. So even though, again,
I’m using Excel here, this is just like I might be
doing in Oracle or a SQL Server or in Postgres or mySQL or the like,
I’ve just created a second table. But as per the name
relational database, there’s going to be a relation
across these two tables now. And what’s that relation going to be? Well, you know what? I’m going to go ahead and
copy all of this customer data and actually cut it and paste it over
into this new table called Customers. And now this isn’t quite sufficient. I’m going to go ahead and
notice that Excel has already numbered these things for me. But I’m going to go ahead just
for clarity and add my own column, and I’m going to call this ID. And it’s going to be, say, an integer. And I’m going to cause Zamyla my first
customer, Rob my second customer, and in this case, notice now these
unique identifiers are part of my data. It’s not just part of Excel’s arbitrary
numbering on the left and arbitrary lettering on the top. Rather these are now actual
pieces of data in my database that will be stored and
backed up and so forth. But notice now that Zamyla
is customer number 1 and Rob is customer number 2, each
of whom lives at these addresses, I don’t have to worry now about
redundantly storing that data because in my orders table now,
any time Rob or Zamyla or some other customer
purchase from my website– notice I can shrink this. And I can say, you know what? This is the customer who bought this. It’s going to be an integer. And you know who bought
that first widget? Well, it was Zamyla. And you know who bought that
second widget and the third widget, too, since quantity was 2 was Rob. And if some new customer
comes into my database– so, for instance, suppose that
someone new orders from my website, they are going to become
customer number 3. That will be, for instance,
Doug Lloyd, and suppose he, too, is at that same address
at that same zip code in the USA. But now in my orders table, suppose
that Doug has bought 10 widgets. He really went all in. Well, he, too, is going to
have widget there, quantity 10, his customer ID is 3, and he, of course,
is going to have spent $99.90 with us in total. So notice how we’ve factored
out the common information to eliminate a redundancy. Notice now that if Doug or Rob or Zamyla
move addresses or change their address, or if we were storing more information,
like their phone number and email address and other personal data, too,
we could change it in just one place. And not in our orders table because,
indeed, there’s now distinct semantics. Our orders table stores orders. Our customers table stores customers. And if we wanted yet another
table, as we probably should have, it could store, say, products. In fact, there’s still this redundancy. Let’s go ahead and create another
table called Products inside of which is an ID field as
well as a product field, and then, just as before, let’s
start numbering our IDs from 1. So our first product is a widget
and while we’ve not sold any yet, our second product, ID 2, is a sprocket. Now, in this way in my orders table,
can I store not a product per se, but a product ID. And so now even though
my table is frankly becoming more and more
cryptic and a little harder for me to wrap my mind around– what am I looking at,
it’s all just numbers, it is now what we would call normalized
in the context of a database. And a database typically is not meant
to be looked at by human eyes just like this. Rather it’s meant to be queried and
data created and updated and deleted. And so there are certain
commands in this language called SQL that actually
facilitate programmatically, using a programming language, what I’ve
been doing with my keyboard and fingers alone. Indeed, the commands with which
you can manipulate the data in the database itself is
going to be SQL’s commands, create, select, update,
delete, and others. Indeed, much like Scratch
has the various puzzle pieces via which you can implement
logic in a Scratch-based program, so does SQL will have these
puzzle pieces, if you will, via which you can create and
select and update and delete data from your database just like I’ve
been simulating by using Excel here and my keyboard. And indeed, some of these
more sophisticated concepts, like primary key and
foreign key and unique key, now rather start to jump out at us
because if we consider what my orders table now looks like, notice
that it’s indeed mostly numbers, but those numbers are essentially
keys into another table. In fact, if you look at
products, my products table has an ID column, which
has unique numbers 1 2, and so forth my customers
table has its own ID column. And these are same numbers,
but different meaning. These are customer numbers
1, 2, 3, and so forth. So in each of these tables
customers and in products is that ID column a primary key for
the customers and products table respectively. Within each of those
tables, it is that ID column that uniquely identifies rows. Zamyla is and shall always
be customer number one. Rob is and shall always
be customer number 2. Doug is and shall always
be customer number 3. So those IDs those primary
keys must not change. They must be invariant,
and as such they can be reliably used to
uniquely identify customers or, in the context of
products, uniquely identify a product or, in the case of orders– we forgot something. It would seem valuable if we
continue this train of thought to also have here in my orders table an
order ID that should probably represent each of these orders, which is just
going to similarly be an integer that just keeps track really of how
many total orders have been placed, 1, 2, 3, 4, 5, 6 on up, all the way up
to 2 billion or best yet even higher than that. But notice these other numbers now. The product column is no longer
the name widget or sprocket. The quantity column,
still just an integer. That’s not anything to do with a
key even though it’s also an integer but customer is an ID. But it’s not a primary key, nor
is product a primary key here. In this context of my
orders table is product and is customer a foreign key
because those two columns are primary keys in two other tables. So within one table
if you have an ID, it should be generally
considered your primary key if that is the role it’s playing,
uniquely identifying your rows. But if that same number
appears in some other table for the purpose of cross-referencing
really, is it a foreign key? And suffice it to say that in
SQL, this database language, even though this looks cryptic to
us humans, realize that with SQL can you stitch these distinct
tables or sheets back together. You can quote unquote join
SQL tables in such a way that you can take your customers
table and your orders table and reassemble them so that
you see next to each order, say, on your administrative
web page that allows you to see all
of your recent orders, not the customer IDs of who
has bought what but actually the customer names and their addresses
and maybe their phone numbers and e-mails and more. You can join this information
back together again. And what databases are good at is
doing exactly that kind of joining, not to mention searching or more. But sheesh, this was a lot of work
just to get to this point right? It was pretty easy to make one worksheet
just put all of my orders in there. But then we went down this slope of oh,
well, maybe we should factor this out. Oh, wait. We can factor this out. Oh, maybe we should add some IDs here. We just created a whole
lot of work for ourselves. Now, I dare say it will
pay off over the long run, and indeed our database
will be much better designed where better design will lead
to faster performance, less redundant storage of data, and more,
but it certainly took a lot of work. So it turns out there is the
opposite of a SQL database that’s been in vogue for some
time called a noSQL database, or a document store, an
object-oriented database where the defining characteristic
really is that it is not SQL. It does not store data
in rows and columns. It does not store data in one or
more tables that can then be joined. Rather it stores all of your
data really all together in a hierarchical structure. And that’s an oversimplification
because there are other features. But consider this example here. This is written in essentially a format
that’s called JSON, JavaScript Object Notation, but this idea
of a noSQL database has no fundamental connection
to JavaScript the language. Just so happens this tends to be
the language with which these data structures are represented. The curly brace here and here just
means here is an object of information. The quotes are just used
around words and numbers, and the colon separate keys from
values where keys and values is a very common paradigm where
on the left is metadata and on the right is data typically,
key and value respectively. Square brackets just
mean an array, which means that this is an array
or a list of two values, something comma something, which
happens to be GPS coordinates, latitude and longitude here. So what is this? What are we looking at? This appears to be an
object, shall we say, that represents the city of
Austin where Harvard’s business school is, where Harvard’s
engineering school will soon be. And so this object contains a
bit of hierarchical information, not a huge amount, but
notice it has an ID, which happens to be its zip code 02134. It has a city name, Austin. Has a location which by convention is
a comma-separated list of two values, latitude and longitude, and so
that’s kind of some hierarchy. It’s not just a simple value. And then there’s a population of 23,775
at last count though surely to rise. And then in the state of Massachusetts. So this is actually a snippet
from a database called MongoDB, which is a very popular noSQL
database that stores data essentially like this. So rather than flatten
all of your data as is the case in a relational
database using SQL an object-oriented database or a
document store like this noSQL database called MongoDB, really stores
things as key value pairs. And those key value pairs might actually
have some hierarchical structure. So if you, for instance, stored
an order like we just did, instead of storing it
in rows and columns, you would just store it is one big
chunk of information like this. And inside of that
object, an order object might actually be the entire customer. Inside of that customer might be his
or her city and state and so forth. So there might actually be retained
in some hierarchy like you see here. And so this is just a different
way of viewing the world. It has typically been a more
efficient way of viewing and modeling your
world because you don’t have to give frankly as much thought
to the design and the division of some of your data and the
normalization thereof, but you do sometimes pay
a performance penalty. You do sometimes pay a penalty
and redundancy of data, though there are ways to avoid that by
reusing something like that ID field. So it really is ultimately a
different philosophy right now. And its a competing alternative to
something like a relational database, and here, too, will
there be an opportunity to read up on and to
debate exactly what is best for your actual problem at hand. And now mobile. Up until now we focused
on the front end, on the back end of really
web-based applications that you might access on a laptop
or desktop or even a mobile device. But what we haven’t given thought
to is the design opportunities for mobile devices specifically. Indeed, most any of you who
have a smartphone these days, iPhone, Android, or
the like, have probably downloaded some application that
did not come with your phone. And you downloaded that from the Google
Play Store or the Apple App Store and that software is quite likely
written in a very specific language. Indeed, the language
for Android is typically Java in which programs are written. The languages in which iPhone
and iPad applications are written is Objective-C or more recently
Swift, and so there, too, at least in the world
of iPhones and iPads, do you have design
discretion over what language you use with Swift being
the more modern and the one that Apple’s really been pushing. But even then, do you have the option to
not implement a native application per se, one that is implemented in Java or
in Objective-C or Swift, all of which are programming languages,
you can actually implement a web-based
application but package it up in a way that makes it seem
like it’s a native application, allows you to distribute it via the
App Store, via the Google Play Store, so that it ends up putting an
icon on your customers phones. But when they click it, they’re not
seeing an iPhone application per se or an Android application per se. They are seeing really
a secretly embedded web browser whereby your
application is implemented at the end of the day in
JavaScript and HTML and CSS, but it’s got a nice little
rectangular window around it, so the users don’t realize that
they’re looking at Safari or Chrome because all of the menus of those
browsers have been stripped away. All you get is an embedded web browser. And so here, do you have an opportunity
to choose among these options. And so one of the design
decisions one makes when designing for a mobile user base is
do we develop an iAndroid application? Do we develop an iPhone
or iPad application? Do we do both environments still? And how do you choose among those? Well, it certainly depends
on your demographic. Android is by far the most popular
mobile operating system these days. But in certain contexts, a
campus like this, iPhones are actually even more popular. So do you want to cater to one
demographic or another or ideally both. Both is probably your instinctive
answer, but that comes with a tradeoff. That certainly comes with a price. If you want to ship some new
and improved tool that you want to make available
to the world or a game or any other piece of
mobile software, well, it’d be nice to have it available
to all users with smartphones. But then you’re going to have
to know how to program in Java. You’re going to have to know how
to program in Swift or Objective-C. Or you’re going to have to know how to
take this hybrid approach of developing it using JavaScript and HTML and CSS,
but there, too, there’s a tradeoff. You tend to get very good performance
out of Android applications that are natively written in Java
and native applications in iOS that are written in Objective-C or Swift. They just tend to be very responsive. They tend to follow a
very similar paradigm. Menus and buttons and so forth
all tend to look and feel the same and therefore be familiar to users. And they’re very responsive. Touch a button, something
happens quickly. There doesn’t seem typically
to be much latency. In hybrid applications, that are really
written in JavaScript, HTML, and CSS, especially if they’re technically
server side hosted on your servers or in some cloud server,
they might actually feel a little slower because
there’s a whole internet between you and your user’s experience. Or they might have to download more data
that would be better to just bundle up in the application itself
where the menus and the buttons they don’t quite feel as native as
the default Android and iOS user experiences. So a bit of a tradeoff there. OK, so if you don’t want to pay that
penalty of performance and perception, implement the Android
app and the iOS app. But now you need two developers or one
developer who knows both platforms. So that, too, comes with a cost, both
in time or salary or talent or the like. So there, too, it’s
not obvious how to go. And even more recently are there
frameworks like Cordova, Ionic, Meteor, React Native, Supersonic,
Xamarin, and more that actually offer yet a fourth
option whereby you implement your application in some neutral
language like JavaScript, and then using these frameworks, these
tools that other people have kindly or commercially developed
for us to use, you can essentially convert or translate
that middle language JavaScript to Objective C or to Swift or to Java
or really to the underlying code that gets shipped ultimately
to the app stores so that you can actually
develop native applications but in an intermediate language. But there the learning curve might
be a little bit a little higher. Indeed, the menu of
options is even longer than the list of native
languages itself. So that requires some learning curve
or some time or some talent or money again. And so there too were
there are some tradeoffs. And so it really depends ultimately
on what is your application and who you have working with
you and what is most important and how much time do
you have and what do you view the technological horizon
looking like some months ahead? So at the end of the day, these
technology stacks, as they’re called, are really just menus of options. And those menus are constantly
evolving, and they focus on the front and on the back and on
the server, on the client, on mobile devices,
laptops, and desktops. There are solutions to
any number of problems. Indeed, the process of software
engineering and developing a product and developing a web app
or a native application itself is first doing some due diligence
and bringing yourself up to speed on what the design possibilities
are, having a discussion, having a debate even, with the
engineers with whom you’ll be working. And ultimately making the most
informed decision that you can with an eye toward what is trending
now, what has been trending, and where the industry might
be going but ultimately focusing on solving
optimally your own problems and choosing among these various
and ever-changing technology stacks.

7 thoughts to “Technology Stacks – Computer Science for Business Professionals – by CS50 at Harvard”

  1. Great overview. Got here from CS50-week3 and got hooked due to curiosity. If I had seen this back in 2010 when we were kickstarting an ERP project, it would have been an amazing time saver. Funny, today, on the other monitor right now, I'm setting up a datagridview query for a database I created, and it feels as familiar as analyzing a simple dcf valuation or some other business artifact. It would have been much easier if there were lectures like this available. Thank you for these videos, CS50.

  2. Front End
    CSS, HTML, JavaScript, …
    Angular, Ember, Meteor, React, Vue, …
    Bootstrap, Foundation, Semantic UI, …

    Back End
    Go, Java, JavaScript, .NET, PHP, Python, Ruby, Scala, …
    Django, Flask, Laravel, .NET, Node.js, Rails, …

    Back End
    SQL, NoSQL, …
    MariaDB, MySQL, Oracle, PostgreSQL, SQL Server, …
    Bigtable, Cassandra, HBase, MongoDB, …

    Mobile
    Android, iOS, …
    Java, Objective-C, Swift, …
    Cordova, Ionic, Meteor, React Native, Supersonic, Xamarin, …

Leave a Reply

Your email address will not be published. Required fields are marked *