A Possible Future of Software Development

A Possible Future of Software Development

>>PAWLIGER: Good afternoon.
>>Hello.>>PAWLIGER: Hi. So my name is Marc Pawliger.
I’m engineering director for client applications. And I’m here today with Sean Parent from Adobe
Systems. Sean and I have known each other since ’93, ’94 or something like that, long,
long time, when we worked together at Adobe on the Photoshop Team. And Sean has done a
tremendous amount primarily in and around Photoshop up until recent times for the work
that he’s going to be talking about today put amongst other things; automation and UI
layout and just a tremendous amount of capability into Photoshop. And of–I guess, the last
couple of years has moved on into more of the research side of Adobe and has concentrated
on areas primarily centering around STL and Adobe editions to them and also sort of looking
at the whole problem of user interfaces and their complexity and behavior and why a lot
of the software that we have works maybe despite itself and despite the way that it’s built.
So with that, I’ll turn it over to Sean.>>PARENT: Okay. So, is my mic on? Yeah. So,
that was a nice introduction. Thanks, Mark. I’ve been at Adobe now for something like
13 years and about the last four years working in either our advanced technology group or–now
I run our software technology lab which is a small team trying to rethink how Adobe goes
about building software. Now this talk came about because Bjarne Stroustrup, who created
C++, invited me to give a talk down at Texas A&M and said, “You’re in the industry, why
don’t you come and give a talk to the students about the future of software development?”
And I said, “I don’t know about the future of software development. I know about what
I’m building which is hopefully the future of software development. So, I’ll give a talk
on a possible future of software development.” For those of you who don’t know much about
Adobe, a little background information. We’re a big and growing company about two and a
half billion dollars of revenue, not growing nearly as fast as Google or as big as Google,
but we’ve got about 6,000 employees worldwide, very distributed. We do a number of products;
some of these you’ve probably heard of. So, internally, we’re structured loosely like
this. It varies from business unit to business unit, but we have product lines. So, Photoshop
is not just a product; it’s a product line. There’s Photoshop Elements, Photoshop Extended
Edition. And we have product teams where we typically have 20 developers working on a
product team, about 30 testers which surprises most people more QE than engineering. Typically
one or two UI designers working with the team, and then we have about 20 shared technology
groups which all have similar structure to the product teams dealing with libraries for
vector graphics, typography, color, help, localization, sure things like that. So development
process, we strive towards an incremental development process but the business we’re
in tends to push things into more of a waterfall process. There’s a huge lag between the times
that you have to reserve press to print the manuals, and the manuals have to be done between
you–when you can actually ship the products since we do physical boxes, getting everything
done in the quantities we demand takes a manufacturing ramp. And that tends to push the development
cycles into more of a waterfall model where you sacrifice quality early to get the features
in, so you can write the manual and ship later. We ship mostly Macintosh and Windows, simultaneously,
English, French, German, Japanese, our tier one languages. And by the time we’re done,
usually about three months after the first ship, we’ve covered about 24 languages with
the products. So, if you’re going to talk about the future of software development,
I thought I’d do a little research on what do the analysts say. You’ve got lots of buzzwords.
I spent sometime going through analysts’ reports. You hear lots about Java, and C#, and XML,
and web services, and open source and some of these things I agree are important and
some I don’t think are very important. Best practices and methodologies is listed first
up there because that’s where most people think the problems are. I pulled some quotes
from analysts that I just thought were interesting. The second one being the most interesting,
“Microsoft rolling out a new development process,” this is Directions on Microsoft which is an
analyst publication that’s usually very friendly to Microsoft. And the fact that they would
go through the trouble of putting together a whole development process when analysts
say they don’t think it will necessary help the company ship its products on time or with
fewer bugs. The top one is talking about the problem of security. And the number of analysts
who believe that the solution to security is more and more testing, which I think has
very, very diminishing return. We already have 30 QE people to 20 engineers. If you
double that to 60 QE people testing for security holes, you might find, you know, another quarter
or half of the security holes out there. So, you’re not going to close the holes that way.
Ultimately though, this is why I think the status quo in software development fails.
This is a quote from Jon Bentley from his book “Programming Pearls” published in ’86.
John went around teaching classes on how to develop software and gave a test to mostly
senior engineers at IBM in Bell Labs and asked them to write the binary search problem. Given
a specification, write a correct binary search. Ninety percent of the programmers failed finding
bugs in their solutions and of the solutions where they couldn’t find bugs, many of them,
they still weren’t convinced that they were correct. Okay. To put it in perspective, this
is a correct solution to Jon Bentley’s binary search problem. Okay. And they were not given
just like a few minutes to do this. They were several hours to try to do this. Photoshop
uses the same problem and has, for many, many years, as a take home test. And more than
90% of the candidates just fail miserably at coming up with anything remotely resembling
this piece of code. I think the worst case is probably 48 pages or something like that
of code. And once he figured out what the 48 pages of code did, it was wrong.
>>Man.>>PARENT: So, that’s a problem. Now Jon Bentley’s
own solution is considerably more complicated than this solution, okay, and considerably
less efficient. So, our experience at Adobe in teaching programmers is that the same rule
holds and you can take, you know, PhDs from Stanford which comprises a large section of
Adobe’s advanced technology group, and 90% of the PhDs from Stanford can’t write a correct
binary search. So, how in the world do we ever ship a product? Okay. If you can’t write
that much code correctly, how do you get Photoshop out the door? How do you get Word out the
door? How do you get Google Desktop out the door? Well, these are actual bug graphs from
product cycles inside of Adobe and this is a clue. What’s going on here is we’re not
solving problems; we’re approximating solutions. And some of these curves, this one in particular,
I don’t believe. What happened here is there was a projected ship date out here. The little
peak here is because QE stayed over a weekend to finish their regression tests while engineers
went home so you had a little peak which caused a panic, which led to a nearly linear slope.
The linear slope is not a fixing bugs, it’s deferring bugs. Okay? So, we never hit a zero
bug count, even pretended to. Here, you’ll see we’re still off zero. But in reality,
what’s not shown here is the number of bugs that were just plain deferred off the bottom.
So, that’s what our current programming methodologies do. They allow us to iterate and refine and
approximate a solution, but never actually come up with a correct solution. So, one thing
my team is working on is studying how do we go about writing correct software? Okay. If
you–if you want to write it correct to begin with, how would you go and do that? And this
leads us to generic programming, where my team spends a lot our time. The idea with
generic programming is you start with a concrete algorithm. Okay. You refine the algorithm,
reducing it to its minimal requirements. Clusters of related requirements are known as concepts,
and we have a little more formal definition that we’ll get to in a minute. Then you define
the algorithm in terms of the concept supporting maximum reuse. So if you have an algorithm
and you can then prove that it’s a correct algorithm, now what you want to do is lift
out that algorithm so it’s written in terms of its minimal requirements, so that you can
reuse it in the most number of areas. Now there’s a parallel here between Mathematics
and generic programming. Programming is Mathematics. Okay. In generic programming, we have what
we call Semantic Requirements. In Mathematics, they’re called Axioms. Okay? What we call
a concept in generic programming is an Algebraic structure like a ring or a monoid in Mathematics.
Models, we use the same terminology. In computer science and in generic programming what we
call–what we call an algorithm is just a theorem. Okay? So an algorithm is written
in terms of concept just as a theorem is written in terms of an algebraic structure. Both rely
on axioms for the correct definition. So a regular function, a function with no side
effects, corresponds to just a mathematical function. And the one thing we have as computer
scientists that mathematicians don’t have to worry about is complexity, right? We have
to worry about, how long does it actually take to compute the answer here? Now, we can
refine a concept just as mathematicians can refine an algebraic structure. So a mathematician
would that a monoid is a semi-group with an identity element, if you ever took any abstract
algebra. Okay. That’s an equivalent statement to saying–roughly equivalent–a bidirectional
iterator is a forward iterator with constant complexity decrement. Okay. So you can take
a concept and add additional axioms, semantics requirements, okay, in order to refine it
and that will give you something which is, in some sense, more powerful in that you should
have algorithms that you can execute more quickly on it. More algorithms, but you’ll
have less models. Okay? Less types that you can actually use with this thing. We also
refine algorithms, which is something mathematicians tend not to do. Mathematicians typically do
not refine a theorem. A refined algorithm is saying if you have a concept which is itself
refined, then you can perform the same operation that you could on the unrefined concept on
the refined concept but potentially, you can do it with better complexity or reduced space
requirements. Okay. So, in C++, this is the way concepts appear in the existing C++ standard.
They are tables that appear in the standard itself. They are not expressed in code. And,
in fact, the tables are very, very weak on their semantic requirements on this end. So,
my group takes it a bit more seriously and applies much stronger semantic requirement
to what each of the axioms are. So, we’re here by back up. We defined the notion of
copy to say that T is equivalent to U. The reason why the standard uses equivalence instead
of equals is because it doesn’t require that equal be defined in order to define a copy.
Okay. But if you want to define what it means to have a copy, then you have to require a
quality. Now, concepts enable equational reasoning about your code just as axioms and algebraic
structures enable equational reasoning about Mathematics. Through concepts, you can prove
algorithm correctness. Okay. And you can then reason about your system in its entirety.
Generic programming often gets confused with, you know, angle bracket templates in C++.
Generic programming is not a programming style as object-oriented programming is a programming
style. Generic Programming is a way to reason about code. I can reason about code through
axioms defined through generic programming even if the program is written in an object-oriented
style. Okay? The same axioms hold, so the axioms have to exist and the theorems have
to be correct if the system functions correctly and works. And where it doesn’t, it’s because
something’s been violated. Okay. So you can use the same ideas from generic programming
to–programming to analyze an existing piece of code as well as write a correct piece of
code. Now, in C++, it’s pretty horrible right now. So, we can’t do generic programming in
the language. So, what we do is we approximate it with templates which are really just macros,
and we have comments. So, when I say “type name I” or an STL that might “type name input
iterator” or, you know, “random access iterator,” in here, I say, “I models random access iterator”
just to be a little more explicit. Random access iterator is naming the concept that
you can formally define. Regular is naming the concept that you can formally define.
In fact, that was the definition of regular a couple of slides back. Okay. So in C++0x,
if anybody’s been following the progression of the language, much of what’s written here–up
here in comments, you can now write in code in your C++. So, this is a big part of what
my team is working on. Now, this brings us to a small pause in the talk. Okay. So, great.
So, we can write sort. We can prove sort correct and we can write reverse. And maybe we can
write some nice image processing algorithms going this way. So, how do you get from there?
How do you get from forward iterator to Photoshop? It just seems like this very large chasm.
Okay. How do you get from STL to Microsoft Word? How do you span that gap? So, we’re
going to change gears a little bit here. I have a conjecture, which is that all systems
eventually grow and turn into some kind of network problem. Okay. So if you have any
problem of scale, okay, what that usually means by a problem of scale is that you have
to be able to reason about the system as a whole by being able to reason about it locally,
okay? Such systems become graphs with sets of rules that apply to the graphs and those
describe an awful lot of the systems that we build. So if you go and you look at Photoshop,
what you’ll find internally is it’s become this mass of network of objects. These objects
are all interconnected. They form some giant, implicit data structure. There is messaging
between the objects. So those messages do something, hopefully, that’s an algorithm.
Okay. We use tools like design tool or like design patterns, Scott Meyers’, you know,
50 rules writing the C++ program to come up to come with local rules that, if we apply
them, hopefully, we get a correct system out the back. And then the–once we assemble all
these objects, we iteratively refine and refine and refine and refine until we can ship the
product, until it’s good enough. So, take a moment and define a generic algorithm for
solving a large system. If you’re–if you’re just, you know, building word, how are you
going to go do it? Well, identify the components and how they connect. Okay. This is my generic
algorithm. Organize the system into a directed acyclic graph. Okay. Everybody know what a
DAG is? Hopefully engineers here. So, the reason why a DAG is important, okay, if you
have–a DAG is just a directed graph with no cycles. So if you have a cycle in your
system, it makes reasoning about the correctness of that system very difficult. Okay? You can
no longer reason about the correctness of the system with only local knowledge. You
have to have complete knowledge, okay, and know what the effect of that data feeding
back on itself is. So, you want to define your system in terms of a DIG. Now, of course,
you’re going to have loops. You’re going have things that do iterate. And what you want
to do is encapsulate those, figure out where the loops in your graph are, and figure out
what the algorithm is that’s done by that loop and write that algorithm and make that
algorithm be just a node in your graph. It becomes just processing node in your graph.
And then, ensure that for all possible states in the system, you know–this DAG is going
to be changing, things are conditional, stuff coming on and off–that cycles don’t pop up,
okay, that this thing stays a DAG. And then make it generic which means once you have
this for Word, figure out how you describe similar structures to build other applications.
Now, this all sounds very complicated but the idea actually, for me, came from this.
One of the things I worked on at Adobe was a layout engine which is used in most of the
Adobe products at this point to lay out user interfaces. And it’s a constraint system of
sorts but it always forms DAGs. And I had this realization one day that I could solve
a spreadsheet inside of the layout system. And that I could turn the layout system into
a spreadsheet if I wanted to. And this is why; they both form DAGs. So, this is just
a graphical representation of this. We’re just summing up three numbers while I can
draw it like this. Okay? I’ve got three cells feeding into the plus operator and coming
out with one result. When you start to talk about constraint systems and declarative programming,
which is what we’re talking about here, people have a panic attack mostly because in school
they might have been exposed to the wonders of Prolog. Prolog is a rule-based system.
In some sense, a spreadsheet is a rule-based system. But Prolog’s was striving to be a
Turing complete system, a complete programming language where you could write anything. You
can sort in Prolog. You really don’t want to sort in Prolog, okay? So, sorting is an
iterative loop. If you have a rule-based system that’s iterating, it becomes very difficult
to reason about. So, what we’re talking about here is more the long–along the lines of
something like HTML or a spreadsheet or SQL or Lex & YACC. Those are also declarative
rule-based system of sorts, okay, that strive not to be Turing complete. Some of them are
accidentally, but in general, you stay from the scary edges. So part of what my team has
been working on is something called the property model library, which is to apply these ideas
to a real system. And this is the domain we went after. It’s a piece of a user interface.
So, this is the dataflow of a small section of an actual user interface in the product
that Marc Pawliger is very, very familiar with. The symbology here are basically these
four items in the middle are widgets, UI widgets. You can think about them that way. So, the
top two editing numbers, the bottom two are check boxes, just Booleans. And we have some
code that sets up the dialogue, so it populates these widgets with stuff. And then, we have
a bunch of event handlers, so if I enter a value into one thing, then it percolates and
changes the values in the UI. And I literally went through the code and drew a connection,
drew an arrow anytime one piece of code was connected to another piece of code; data could
flow from there to there. And the complete section of this dialogue has 14 widgets, I
think, in the middle. I stopped after four and said, “Screw it, it’s just a big ball
of spaghetti.” I did add this one box down here, which is script validation. So what’s
happening here is, you know, we have a UI. This is setting an image size for a particular
width and height. And you can also drive it without a UI attached by executing a script
and all the validation logic that’s contained in the spaghetti code of event handlers has
to be replicated into the script validation code so that you make sure that, you know,
what ever I record up here, I can play back without the UI and not end up with an error.
So, in fact, if you look at Adobe’s code base, a third of the code in our products ends up
being event handling code of this sort, part of the big ball of spaghetti, and if you survey
our bug database, you’ll find that half the bugs during the new development cycle are
in this body of code. Now, when I first saw this, I thought cherry picking. Go after the–this
problem, I come up with a solution and I’m the hero. I eliminated a third of the code
and half the bugs. What I didn’t realize was this is indicative that it’s a very hard problem.
So–and here’s why. If writing a correct algorithm itself is difficult, then writing a correct
algorithm when it’s implicit, okay, when you don’t actually have the algorithm as loop
in your–on one page, what you’ve got is a message to another object that eventually
loops back to you or not and algorithm is buried in there, gets to be almost impossible.
So, start to break down this problem. How would we define it? Well, you have to start
and say, “What is this thing?” You know, what’s this event handling logic? How can we even
define this? And we backed all the way up and said we don’t even have a good definition
of what it means to have a user interface. So, we came up with these as working definitions
for what it means to have a user interface. And it’s just a system designed to assist
a user in selecting a function and providing a valid set of parameters to that function.
So, what distinguishes a UI and a product from, say, consuming a movie is the fact that
the user is trying to do something. He’s selecting a function and then executing it. Now, a graphical
user interface is just a user interface that’s visual and interactive. A big part of the
code we’re talking about here is not the code that deals with selecting the function, right?
You pick the menu item, you’ve got the dialogue box or you’re on the palette and the function
is implied that’s going to set the set of properties on your document. But what does
it mean to assist the user in providing a valid set of arguments to that function? So,
we have two loaded terms there. One is valid, you know, we know what arguments to a function
are. What does it mean that they’re valid? And what does it mean to assist? So, valid
is pretty easy. It’s just a predicate that corresponds to the preconditions of the function.
Assist, there are many, many, many, many probably an infinite number of ways that you can assist
a user but if you go and look at the applications, we have general patterns and you can start
to categories these patterns. So, here are four that we’re dealing explicitly with. One
is validation, which is given that you have said predicate to know whether or not your
preconditions are satisfied, you can simply–validation is simply rejecting that input, saying no,
you can’t do that. Okay. correction is the idea that, let’s say I have a valid set of
input and now I poke a value and now the system is invalid, how do I get it back to a valid
state? That’s correction. So, prediction is letting the user specify the values for the
function in terms of the desired result. Okay. So, if I want to take a file and split it–split
it into 4k chunks, then depending on the size of the file, I’ll get some number files out
the back so the user could specify the file that they want and how many pieces they want
out the back and you can calculate how big the chunk needs to be. So, prediction is specifying
the parameters for the function in terms of the desired result. Then, related values.
So, if a user is specifying, we’ll use our example here, the size of a document in pixels
which might be the only thing the application cares about, that might not be a natural user
interface. There’s a related value there of inches, which is related to pixels through
resolution. So, you end up with related pixels or related values through which you can define
the arguments for the function. So, let’s make this a little more concrete. Flip to
a demo here. Are there questions at this point? People following? Think I’m nuts? Okay. So,
this is a little picture of the UI that we had the big ball of spaghetti about before,
and this is fully functioning here. Okay. So, we can change values, their constraining
proportions to the original document within height just up there. We can toggle our display
between percent and pixels. We can unconstraint and change the value. And then we can re-constrain
and the other one updates appropriately, and if you put this in percent, they’ll always
be equal. So, that’s the basic idea for this UI. Over there on the left, you see the complete
description for the underlying model for that. And we also have a description through our
layout language of what the visual portion of the UI looks like. So, this is–looks very
much like a box model, the solver is a little more general; it automatically does things
line up colons and has a little more smarts underneath it, but things are described in
terms of a basic box model. And you’ll see these bind statements against things. You
know, there’s bind here, bind here. The bind statements are what connect the UI to the
model. So, the way most people tend to write an application is the application logic reaches
up into the UI and mucks with it. And if your application is doing that, your application
knows all about the UI; you can’t have an alternate UI on it, you can’t re-share that
logic for scripting. So here, what we’ve done is we’ve factored out the description of the
model which is how these properties are connected together independently from what UI is bound
to it. And we can demo that by opening up another UI. So, this is another UI that we
just attached to the same model. We can see the original document with the height down
there and you’ll see these are actually attached to same instance. So, if I change the UI or
if I change the values up above, they’re reflected down below and it’s running non-modal so I
can change it here and it will change up above and un-constrain and change and re-constrain.
Okay. So, two alternate UIs. Or I can do a third UI and here, I wanted to represent my
percentages, sliders. So, I’ve got two sliders and it will rule through the numbers and all
the other views. And it’s pretty traditional model view controller stuff that you probably
could have seen demoed on small talk systems at Xerox many years ago but somehow, we’ve
forgotten how to do this. What we’ve forgotten how to do is define models. Okay? So, that’s
a basic idea of what we’re talking about here. Let’s flip back. Now, when I came up with
that piece of code, I went and found one of the–I think one of the best engineers at
Adobe, a guy named Mark Hamburg. He was chief architect on Photoshop for many years. He’s
responsible for Photoshop Lightroom which is recently shipping. And I said, “Hey Mark,
this little mini image size is just a portion of the Photoshop image size. It’s just got
a couple of connections fully specified. I’ll tell you exactly what the specification for
this dialogue is.” Write the code in the language of your choice and the framework of your choice.
I don’t care about how your UI is laid out. I just want the logic for this dialogue. This,
you can zoom in on it if you want or get really close to the screen, is his resulting code.
It’s written in Objective C to Cocoa on the Mac only. The gray areas here roughly correspond
to the logic that’s in the model, and the rest of the code is code that, given a change
in the model, reaches up into the UI and manipulates it. It’s all the event handlers going on there.
>>This isn’t the full program, right? The full program would also include an .nib file.
>>PARENT: Yeah, there is no .nib file here. There’s no description of the UI. This is
just the logic for it.>>Is this showing you’re not using any bindings
[INDISTINCT]>>PARENT: What’s that?
>>Since it’s normally not using Apple’s binding…>>PARENT: Oh, no. It is–there is a .nib
file and there are bindings. So…>>It’s just not showing it?
>>PARENT: This is just not showing it. Okay.>>If there are any complications using bindings.
>>PARENT: Yeah, yeah. This is not like trying to recreate the world. This is as small and
tight as he could get it. Okay? This is my solution in comparison.
>>Now, excuse me.>>PARENT: Okay. Yeah?
>>A huge chunk of code is [INDISTINCT] to interpret it.
>>PARENT: There’s–yeah, so the amount of code to interpret that is 2,000 lines. Hopefully,
the next time I get around to rewriting it will be about 1,500 lines. In comparison,
the code for the complete version of image size in Photoshop is 6,000 lines. So, my complete
system for the UI layout engine, the solver for the models and both the parsers is smaller
than 6,000 lines. Okay? So, this is the resulting structure that, you know, if I were to draw
that, this is what we end up drawing. Okay. So, here, the notation here is a solid arrow
with a solid point is a fixed connection. Okay? That’s a directive connection. The soft
arrows here can go either way, they can flow around. The doted line here, this is a conditional
connection, so there’s a predicate attached to that which can toggle it on or off. And
that describes the complete system. So, a little more about the structure, it’s always
expressed as a bipartite graph. It’s very similar to spreadsheets except directionality
can be determined at runtime, so you can kind of flip things the way they’re going. Data
is flowed from higher priority cells out towards dependent lower priority cells. So, there’s
a link reversal algorithm that’s employed to reserve soft cycles which there are systems
where if I tried to pin two values, it becomes over-constrained. What if I only pinned one?
Then it’s not over-constrained, so that’s a soft cycle. Hard cycles are very easy to
detect. So, we just detect and reject them. And soft cycles, we just solve them. Now,
the full notation here, we have sources which just have N output, syncs just have a single
input. A cell can have zero or one input, meaning it can be derived or self can be–can
be a source and the remaining are outputs on it. And then, relationships have N-inputs
and one output always. And a conditional relationship is the same with just a predicate on it. So,
those are the basic computational structures for the system. Now, we’re attacking 30% of
the problem. We’re getting pretty good traction right now. Last time I looked, we had–this
is going in as part of–part of future of Photoshop and several other apps. And last
time I looked, we’ve got 20, maybe 25 dialogues re-expressed this way and all the thousands
of lines of code underneath them ripped out. So, it’s–took a long time to get traction
but now that I have traction, the roll is actually going pretty quickly. My estimate
is that 85% of the existing code base can be replaced with small declarative descriptions
and a small library of generic algorithms if you go and look at a product. Now, that’s
not these small declarative descriptions. There are other problems that manifest themselves
in the applications and they need to be solved. But that’s my gut feeling looking at the application
code. In fact, if you would look at all of it, I think we’re about two or just a magnitude
off from the minimal expression for any large application which Photoshop is roughly 3 million
lines of code last time I counted, which means I think you ought to be able to do express
all of Photoshop in about 300,000 lines of code. Am I off there? Sorry, 30,000 lines
of code. I’ll let you guys read the rest. I’ll pretty much close it in. So, if you guys
want more information, I’ve got some links up here. We’ve got our open source site, all
the example here, the example application that I’m running this through, and I can give
examples all day of the system. It’s available on opensource.adobe.com. You can download
it; all the code is there. Stepanovpapers.com, Alex Stepanov, if people know who he is, the
guy who created STL. He works for me on my team and we have a website just for his collected
works. One of the things we’re working on is a book, kind of bottom to top here. These
are the drafts of the book. The first failed draft, second failed draft and the draft we’re
on, so we will get this thing out one of these days. But please read them, give us feedback.
We’re specifically most interested in feedback on the talk which is our current attempt.
This is the book that I think Alex should have written when he release STL, well, almost
15 years ago now. People look at STL and think it’s a nice container and algorithm library
and not it’s a different way to think about code. So the point of this book is to explain
what means to do generic programming, not to teach you about how to use STL. That’s
it. Adobe’s tag line. Question.>>What concept that here’s long programming
that [INDISTINCT] address much here and also that currency? I mean, does it work in [INDISTINCT]?
Are there any problems?>>PARENT: So…
>>Repeat the question or use the microphone?>>PARENT: Sure. I can–I can repeat the question.
Yeah, so the problem of concurrency, we’ve done a little work on concurrency usually
when we’re stuck on something else. Our two side projects have been–have been threading
issues and Unicode issues. I think the same basic techniques apply to concurrency meaning
that successful systems that I’ve seen that are concurrent are written basically in one
of two ways, either in a structure similar to a Unix pipe system which is basically flowing
data out into a dag and at each node, you have a cue and the cue provides enough elasticity
at the join so that you can wait until you have enough data to progress along the dag.
The other systems are ones where they are algorithmically parallel where I can do, you
know, a reduction algorithm, things of that nature. So if you look on kind of the algorithm
parallelization side, there’s a lot of work that’s been done on, you know, there’s the
staple project at take–Texas A&M which is parallelizing STL algorithms. I’m drawing
blank on an example but there’s many out there, especially in the domains of linear algebra,
image processing and we have a fair amount of work going on at Adobe in those domains.
On the how do you structure an application into a sequence of processes, you know, basically
even though maybe they’re threaded, they’re not full processes but how do you go about
building an application like that and describing the structure much as you would, say, on the
command line of a Unix shell to stream together a bunch of–a bunch of Unix tools. How would
you build complete applications, piping a bunch of components together? And that’s where
my team is most interested, and I think there’s some good work to be done there. So I don’t
know if we’ll ever get to it seriously but that’s my thought on it currently.
>>I see you’re trying to mold STL as the generic code right now, [INDISTINCT] ask you
then. You didn’t actually use any [INDISTINCT] to your solution. I’m just wondering if there’s
a reason for that.>>PARENT: Yeah, the code under the hood certainly
uses templates and the generic programming language itself is.
>>[INDISTINCT] language in terms of a bunch of [INDISTINCT] things, the C++.
>>PARENT: Oh, do it as like a metaprogramming language? It’s not part of our goals. One
of the guys at Adobe has done that for the layout engine. There’s a–there’s a domain
specific language done as metaprogramming in C++. Our goal is to get the mathematics
correct to describe what the computational structure of these systems are. And give a
generic description of what it means to solve one of these systems; not to do C++ metaprogramming.
>>Well, if you recall, the [INDISTINCT] if you–like are you trying to convince people
that C++ is a good generic programming language if you’re not even using that in your model
library?>>PARENT: Well, all the libraries are implemented
in C++. Do I think C++ is a good generic programming language? I think C++ is the closest approximation
that we have to a good generic programming language and getting better. So, you know,
Alex wrote STL in the Ada long before he wrote it in C++. And it’s been written in–portions
of it in Scheme and Java and C and we still tend to product type codes, sometimes in Scheme.
So, it’s the ideas of generic programming, supersede what language you express them in.
You know, typically, what we do is we work out what’s the generic description of this.
What are the axioms and what is the algorithms? And then you go through the painful process
of how do you map that into C++, you know. What I want is to just be able to write a
type function and what I have to do is write traits classes and partial specializations
to get them to dispatch in C++ because I can’t just write a function in the natural terminology
that manipulates types. So, yeah. So–and you had one more point there which was–which
is are we trying to convince people of this? I’ve got nothing to sell. I mean, we’re a
very small team. Everything we do is open source. The reason why we open sourced it
is because it’s interesting problems and I’m interested in collaborating with other people.
We do a lot of work with Indiana University and Texas A&M. I see a general trend in this
direction. I think eventually, this approach wins out, you know. By this approach, I just
mean a mathematical approach to programming. I don’t think we can continue to treat programming
as an approximation. Our systems are just getting too big and too complex. And without
perfect approximation, it just means more and more and more failures. So…
>>What kind of bug rates have you seen with this approach and are the–are the bugs easier
to solve or are they harder?>>PARENT: So, we’re seeing pretty much zero
bugs so far. The ones you have, your description out there. The challenge is getting the model
written in the first place and the system right now does a fair amount of checking,
not quite as much as I would like, but it does enough that it tends to bitch at you.
So, the whole reason for coming up with a way to draw the picture, and eventually we’re
going to wire that in with graphics and use dots so you can–the system can automatically
draw the pictures for you, is that–is that, you know, if the system bitches and says there’s
a cycle here because you over-constrained the system to finding 200 relationships; you
need some way to figure out well, what did I do wrong and how do I get it right? So,
there tends to be more thought process up front and it’s more difficult to get kind
of anything working. You know, once you get it working, you’re done. Here it comes.
>>So, back to the question whether C++ is a good language for generic programming.
>>PARENT: Yeah?>>You talked about the inadequacies in C++’s
support for concepts and especially about the goal of expressing concepts at run time
not just at compile time. What is missing from Haskell-type classes that would make
them useful as a representation of concepts?>>PARENT: I would go and read–let’s see,
what is that? Doug Injako and that crew did a survey paper not too long ago that covers
that in far more depth than I could give it justice here because I’m not much of a Haskell
programmer. My opinion is I would program in Haskell if I had a reason or some–you
know, whatever language worked the best if I had a path to deploy that in products. To
fund my team, I have to ship software. And right now, that means I ship in C++ so I’m
not a religious zealot when it comes to C++. I do think it’s the best language out there
for concept-based programming but…>>So when you’re making UIs, like, as a programmer,
you sort of already naturally think of it as a single when you’re designing, you know,
like it’s [INDISTINCT]>>PARENT: Correct.
>>So [INDISTINCT] to you and your newer model language for describing it. But have you tried
complying it to, like, [INDISTINCT] that aren’t already compliant?
>>PARENT: We’ve done somewhat wacky things with it. We’ve used the modeling system to
describe overload sets for functions in C++. So you described the relations on a set of
parameters and the system will go through and iterate what all the possible overloads
for that function are that have any meaning. We’ve used it to, well, the layout engine
which I didn’t talk too much about here. We’ve used the layout engine which usually does
2D UI layouts to describe 3D worlds and the same thing that connects baselines, made sure
that the you had paths between rooms that, you know, you didn’t end up with completely
walled in spaces. The solver here has been used for doing dependency tracking for installers.
What else? Somebody at Apple used it for doing music mixing. And the graph solver itself
is fairly independent of what the equation is on the node; it doesn’t–it doesn’t care
what the function is on the little circles, just that things are connected. So you can
plug whatever you want in there. Some people have used it for music processing. You know,
how far–how far it goes? I don’t know. I mean, we’re– from the–from the UI standpoint,
we solved a good chunk of the problem right now but we still have little pieces that are
hanging out that we’re focused on, not figuring out more domains for it.
>>Well, does this approach lend itself to maintainability? I mean, you said the main
problem is coming up with the model>>PARENT: Yeah.
>>Now when we want something else or changes a little bit, do you already think about that?
>>PARENT: Yeah, it lends itself pretty well. The interesting thing is developers have a
hard time changing the way they think. Okay? So what tends to happen is frequently, the
request comes in and says, well, the UI designer says they want it to work this way. And that
might just might be a view behavior on the outside that has nothing to do with the model,
the model itself doesn’t change. And getting developers in the mindset where if it does
come and impact the model, I mean, this is a model of parameters to some function and
you–and the goal is to try to make the model as complete as possible and that gives the
UI designers much flexibility as they want to put a big UI on it or a little UI on it.
You know, you don’t have to expose everything; you can expose whatever sets that you want
so long as the model is consistent and correct. So the system tends to be fairly easy to maintain.
What tends to be hard is recasting the problem into what is–what is the relationship here
that is being asked for. How do I express that? And since what I’m modeling is typically
a set of parameters to a function or a set of properties in a document, going from that
relationship to say, well, does that imply about what the function has to do? These things
just don’t exist externally. Okay? So you have to, you know–I’m going to go change
the behavior of this function and what it does and then I’m going to go express that
in my model and that’s–it’s just a different way to think about software. I mean, my experience
so far has been the more years somebody has spent doing object-oriented programming, you
know, myself included, you know, over decades worth of object-oriented programming, the
more trouble they have expressing things in terms of the relationships within the model
as opposed to how events are flowing through the system. So you know, one typical example,
people will say, well, I want this checkbox–when this checkbox is checked, I want this thing
to be disabled. Okay? Well, what it means, when something is disabled in the system is
that it’s disconnected from the output, that no matter what value you put into that cell
in the model, it could have no effect on the result. Okay? So there isn’t any way to say
disable this thing when I click this checkbox. What’s implied by that request is that there’s
some relationship to that checkbox that tends to disconnect that cell from the output. What
is that relationship? Write that down and the UI will just work. Okay? So, are there
questions? I’ve got, I don’t know, three minutes. I can give more demos or ask–answer a question
or…>>Have you had any interests in your QA team
using a model as sort of a [INDISTINCT] that they can work off?
>>PARENT: Yeah, the QA team likes the models because it gives them, you know, a very concise
spec of what’s going on. The other thing is we have a set of just command line tools that
you can check the models independent of the app. And it’s also nice. You can, you know,
we’ve got a little sample app here, so you can bring up a model, you know, they’ll function
attached on the back end so I can bring up, like, the complete image size from Photoshop
with all the 14 widgets and all the connections between them and fully test the thing without
launching Photoshop. Okay? And from a UI designer standpoint, it means that I can play with
my UI and say, oh, I want three panels instead of two panels and I want a reveally widget
and I want a slider here and never have to touch the code. So they like that. The one
thing that–where we do have tension with QE is QE is used to working in kind of a puppet
string world where, you know, a lot of the automation test tools will, like, reach into
the UI and say what’s the state of this checkbox? And in our system, our checkbox don’t have
any real, intrinsic states, they’re simply a view of the underlying model. So you can
ask the model what’s the state of this cell but it’s–it doesn’t make any sense to ask
the checkbox what its state is. And our automation QE team has a hard time getting past that
hurdle. They’re like, well, you know, what if the model is correct but the connection
to the UI is wrong? How do we test that? So I think that’s just friction that will fold
out over time.>>The–90% of the computer scientists that
[INDISTINCT] how do they fit into this new model with a program? You think this is going
to be easier to teach them or you think it’s not as important as you [INDISTINCT]
>>PARENT: Yeah, so I think eventually you end up with a shift in what it means, you
know, for people to be programmers. I think assembling collections of, say, generic algorithms
is an easier task than writing the generic algorithms to begin with. I think even, you
know, constructing these models for a system is somewhat easier. I can show you. It’s a
hundred lines for the image size in Photoshop and it’s way easier to write that hundred
lines than it is to write the 6,000 lines of code in Photoshop, and Mark can probably
attest to it. And the 6,000 lines is also wrong and I can prove to you it’s wrong.
>>I didn’t write that code.>>PARENT: I wrote one version of it. So,
I do think it makes it easier, but I think what you want is a shift. I mean, at this–at
this year, at this point in history, I think it would be great, you know, if you could
get a PhD by trimming a couple of cycles out of one of the STL algorithms. That should
be, you know, get you a PhD. Coming up with a new algorithm that’s generally useful certainly
should be worth a PhD, and yet, right now, very few people–you know, we don’t have a
massive library of this stuff. Very small libraries and everything is very primitive.
And my team spent the last week and a half working on a formal definition of integers
because math–the mathematic definition of integers doesn’t hold-up real well when you
integers of a fixed range. Right? We can define Z but we can’t define what Z over range of
zero to N is. Because N is Z if you…>>[INDISTINCT]
>>PARENT: What’s that?>>[INDISTINCT]
>>PARENT: Z sub N which doesn’t quite capture, say, signed arithmetic. It works pretty good
for twos complement unsigned. So, also, Pno axioms which is how mathematics are defined
are in terms of successor. So, you start with zero over one and then you add you one and
you can build up all of mathematics with just successor with basic increment. From a computability
standpoint, that’s very unsatisfactory. Right? We don’t–we don’t add two numbers by one,
two, three, four, okay? It just doesn’t work that way. So, coming up with kind of re-casting
mathematics into axioms that apply to computer science is a part of the work and, you know,
eventually, these should be textbooks and, you know, we need our Oilers and our Euclids
and those folks. Any more questions? Is it on?
>>Yeah, this is back on. So, thank you, Sean.

32 thoughts to “A Possible Future of Software Development”

  1. It's still the same old approach to software engineering though: look at a class of specific problems to see if it can be solved by a more general solution, which is basically a reusable library. By properly limiting the scope and expressiveness of the library via a DSL, you reduce the probability of errors. Somebody still have to have solved the class of problems before one can embark on writing such generic libraries.

    There is just no silver bullet in software engineering.

  2. As with geometry, there is no royal road to computer science. It is a challenge to the industry to learn to collect our knowledge into (re)usable components and the responsibility of every professional engineer and scientist to contribute.

  3. Agreed. The video is a great case study on how to build a quality reusable library/framework.

    Kudos to you and everyone involved to make ASL open source in MIT license. I'm one of the fortunate (paid) open source developers as well.

    My original comment meant to point out that the title of the video might be a bit misleading to people who're expecting new methodologies and/or new tools/languages.

  4. Sorry – I haven't been monitoring the questions here. Paper mentioned (incorrectly attributed to Doug Gregor – who's done other good work) is:
    Ronald Garcia, Jaakko Järvi, Andrew Lumsdaine, Jeremy Siek, and Jeremiah Willcock. An Extended Comparative Study of Language Support for Generic Programming. Journal of Functional Programming, 17(2):145–205, March 2007

  5. This sounds a great deal like why Erlang uses functional programming (no side effects) and actor-based messaging. A spreadsheet is the most common sort of functional programming. You have established a sort of analogy between a spreadsheet, a DAG and your programming tasks. It seems like this suggests that we are going to see more functional programming. Time to brush up on Lisp/Scheme/Clojure, Erlang, Scala or F#.

  6. @ProphetVS concrete class with template constructor taking an arg of any T and holds the arg via type-erasure. Operations on the held arg are done through free-functions found via Argument Dependent Lookup.

  7. How is that a correct binary search solution? If you have an array [0,1,2] it returns the same result when you search for 2 or 3…

  8. @jgoemat The range is a semi-open interval. So you would call this as:
    int a[] = {0, 1, 2};
    int* p = lower_bound(&a[0], &a[0} + 3, 2, less());

    The result for searching for 2 will be "&a[0]+2" and searching for 3 will be "&a[0]+3". The later being the end of the array so it would be invalid to dereference the pointer.

  9. @seanparent65 Ok, so instead of 'last' meaning the last element of the array, it actually means a pointer to one element past the end of the array?

    Also, the caller must then check if the returned value is past the end of the array and add the element to the end if so?

    And if the value is not past the end of the array, the caller must check the value at that location to see if it was actually found?

    Seems like bad practice to me.

  10. @jgoemat "last" refers to the last position, not element, in the array, so yes, one past the last item. By convention this interval is denoted [first, last). For an array of n elements there are n+1 possible positions and n+1 possible results for lower_bound. The return value is the first place the element being searched for could be inserted without violating the ordering. This is an algorithm in the C++ standard library.

  11. I know its been years past, but I am compelled to comment on this very important subject. [1] I think your approach and support of 'generic programming' as a fundamental is a good one. For, as you observe, much of the technology in CS (including hardware) really do possess a common root system (pun not intended). Service orientation, workflow, concurrency, grid computing, robotics (and any MOM), any form of networking (like network axiom) all share immense commonality, in principle.

  12. (cont.) [2] My second point, however, will argue that we will never be able to boil away variation, only minimize it. Your view of the programming discipline agrees with this, since we must deal with complexity and consequently variance. Consider the problem of infinity in either mathematics or CS. One must gulp hard and accept it as a reality. In my view, a 'model' is inherently approximate and non-atomic, a perspective – a resolution – of reality. Therein lies the 'art' of science.

  13. (cont.) Indeed developers must change the way to think about software code and design. We can thank the multi-core shift for that. The biggest primitive of all to manage is data vs messaging, which yields behavior. OOP is a nice paradigm for modeling data, but not as readily for modeling messaging. They must be married to be effective. Service orientation offers much to this problem but I think its a deeper problem. We need more primitives (software and hardware) to deal with these things.

  14. No – I would have to say we haven't succeeded. STLab is no more (but we did manage to publish Elements of Programming before our demise). I've returned to Adobe working on mobile software (you can see a demo in my C++Now keynote). ASL is still kicking and used in _many_ Adobe products, but not to the degree I'd hoped. There is a research project at Texas A&M continuing the property model work (publishing and a JS library) – I'm hoping to collaborate with Prof. Jarvi on this next year.

  15. Wonderful lecture but can you elaborate exactly what is generic programming and declarative programming? How exactly are they different from what's being used in today's computer programming?

  16. The term Generic Programming was first coined in the paper: "Generic Programming", Musser and Stepanov '88. It is the process of defining algorithms and data structures in terms of the axiomatic properties of types and their operations rather than on specific, concrete, types.
    Declarative programming is defining a program in terms of rules and constraints rather than specify an imperative sequence of actions.
    Both constitute programming at a higher level of programming than is common today.

  17. This is flow based programming. It is (as most things in computer science) nothing now. Applications like Houdini have used it for years. However it is interesting how it is used here. I haven't seen that before.

Leave a Reply

Your email address will not be published. Required fields are marked *