Teaching from Simple Abstractions

(You need to know programming to understand this post. If you know what linked lists are, that’s enough to get the general point, but more knowledge would be more helpful.)

Within the Programming Languages community, there’s a subcommunity that thinks a lot about education, especially for introductory courses. Two main approaches are SICP approach and the HtDP approach (which addresses shortcomings in SICP). I have taught the SICP approach several times at Berkeley using Scheme, and have talked with people who have taught the HtDP approach several times. At Berkeley, the first thing we teach is the sentence data structure (from Simply Scheme), whereas HtDP starts with the linked list data structure. I want to talk about one advantage of the sentence approach (but note that I’m completely ignoring the main justification for HtDP’s approach, which is mostly orthogonal to this point).

First, an aside, to explain what these data structures are. With the linked list approach, students are only given cons, car, cdr, and the empty list '(). (You may know these as cons, first, and rest, or cons, item, and next.) They write standard list functions like list, append, and length themselves using recursion. Some examples:

> (cons 1 '())
> (define lst (cons 1 (cons 3 (cons 5 '())))
> lst
(1 3 5)
> (car (cdr lst))
> (cons 7 lst)
(7 1 3 5)
> (cdr lst)
(3 5)

Note in particular that the first argument to cons must be an element of the list, while the second argument must be a list. If you disobey this rule you get strange results (or possibly an error message, depending on what language you are using):

> (cons 1 3)
(1 . 3)
> (cons 5 (cons 6 7))
(5 6 . 7)
> (cons lst lst)
((1 3 5) 1 3 5)

Sentences are similar, but they remove most of the restrictions on cons. Instead of cons, we have sentence, which can take any number of arguments, which can all be lists, or elements of the list. Instead of car, we have first. Instead of cdr, we have butfirst. In addition, we add selectors last and butlast. Sentences are typically made up of words and not numbers:

> (define request (sentence ‘pass ‘me ‘the ‘salt))
> request
(pass me the salt)
> (sentence ‘would ‘you request)
(would you pass me the salt)
> (define prefix (sentence ‘would ‘you))
> (sentence prefix request ‘please)
(would you pass me the salt please)
> (first request)
> (butfirst request)
(me the salt)
> (last request)
> (butlast request)
(pass me the)

Sentences are implemented as linked lists that are forced to be flat (not nested). The sentence constructor looks at all of its arguments and smooshes them all together into a single flat list using combinations of cons, list, and append. first and butfirst are the same as car and cdr. last and butlast are implemented as recursive functions on lists. (This is a bit simplified — in reality these functions work on words too, but we can ignore that here.)

I am a fan of sentences because students don’t have to worry about the type and number of arguments that they give to sentence — it just works and does the obvious thing. On the other hand, with cons, you have to make sure that you only give it two arguments, the first of which is an element, and the second of which is a list. This is hard for students to keep track of in addition to all the other things they’re keeping track of. (Once we introduce cons to them, they have all sorts of bugs involving incorrect applications of cons.)

One proponent of the HtDP approach has suggested that linked lists are better because it’s easier to understand how it works. (Again, this is not the main argument for the HtDP approach.) cons is the simplest possible thing you can think of, all it does is “glue” any two things it is given together. On the other hand, sentence is fairly complicated — it’s not clear to introductory students how on earth it “knows” to distinguish between words and sentences and do the right thing with each of them. Students would have no idea what’s going on behind the scenes of the sentence abstraction.

One issue I have with this is that students don’t know what’s going on behind the scenes of cons either — “gluing two things together” is still an abstraction. Sure, sentences are a higher level of abstraction than linked lists, but I don’t see why that matters. Perhaps it’s easier to believe that cons works than to believe that sentence works.

But the real issue I see here is that this puts a focus on the operational semantics (how the thing works) instead of the denotational semantics (what the thing does). It seems to me that you should aim for the simplest possible denotations, not the simplest possible operations. (Use the data structure where it is easy to explain what it does, as opposed to how it does it.) One of the central tenets of programming is abstraction, where you don’t think (much) about how the components you are using work, but just know what they do so that you can build them into bigger components. The denotations of sentences are much easier to explain, understand and use than the denotations of linked lists (to introductory students, at least).

This reasoning suggests changes to the ways we teach other programming topics:

  • Data structures: It may be preferable to first simply list out all the data structures that the class will consider, list the methods that they support and their running times, and first teach the skill of how to pick data structures for the task at hand and later teach the skill of how to come up with new data structures and to implement them yourself.
  • Memory management: Instead of explaining how a C program has both a “stack” and “heap” and “pointers”, you could start working in a pedagogical language where there are no functions and memory is a single long (but not infinite) array. There are no registers and no calling conventions, but you could still have high-level primitive operations (such as drawing pictures), values in memory could have different sizes, and values could retain type information. You could then show how to build calling conventions, the stack, the heap, and the idea of pointers from there. (This is all assuming that students already understand arrays from some other higher level language.)
  • Security: Rather than delving into real-world security problems (SQL injection, cross-site scripting, buffer overflows), start with a pedagogical application with security flaws that don’t depend on an intimate understanding of the underlying programming language or technology or protocol that’s being used. For example, the application could have TOCTTOU vulnerabilities — perhaps when you click “sign up and get $25 added to your account”, it redirects you to a page where you have to confirm, after which the $25 is added. It only checks whether you’ve already signed up for a free trial on the first page, so you can click the button many times and confirm all of them, to get as much money added to your account as you want. It could allow you to ask queries about your friends, but simply assume that the name you enter is a friend instead of checking, leading to privacy violations (bad input validation is what caused Heartbleed and is typically the entry point for buffer overflow attacks). It could have a secret admin panel, that relies on security through obscurity of the URL. You could also build a simple interpreter for a programming language and show how using this language with user input allows for injection attacks. (It’s not clear that this is simpler than just showing SQL injection though.)
  • Andrew

    Synced from FacebookIf I understand correctly, you argue that “when using abstractions to teach, pick the one that’s easiest to reason with”, and suggest that sentences are easier to reason with compared to pairs.

    One counter to that second point is that many students dislike toy examples or would prefer learning “how things really work”. Bringing this back to your example, you could argue that the pair abstraction is better suited to introducing students to sequences, because at the core, that’s what a linked list is–a sequence of pairs.

    Another point–once you teach students sentences and then reveal that they are implemented with pairs, then you have to deal with the complexity of translating sentences into pairs. In some sense, you’ve backloaded more things for the student to master.

    I think HtDP probably avoids the issue of the pair abstraction being more difficult to initially master by by putting more weight into getting the mechanics of manipulating pairs. Not 100% sure though, since I haven’t taught HtDP personally. SICP (Purple book, not Composing Programs) as Berkeley has taught it, has never been very good at deliberate, repetitive drills.

    Another way to see this is that you’ve said in some sense “I would rather them to Recursion early and so I’ll find the abstractions that best let the majority of the students focus on that.” Then we get into the argument of when students should tackle Recursion and what should be the focus earlier on in the curriculum of a course.

    Nit: “Sentences are typically made up of words and not numbers”–we can totally sentence together words and numbers though. All numbers are words.

    • Synced from Facebook“many students dislike toy examples or would prefer learning “how things really work”

      Agreed, I’m not suggesting that we not show students how things really work. I’m suggesting adding intermediate layers. (So first teach sentences, and then show them linked lists, and so on.)

      “then you have to deal with the complexity of translating sentences into pairs”

      Yup, that’s true, and that is a point against. I don’t think it’s too bad though. I think Scheme 61A could have done better here by enforcing that list selectors not work on sentences and vice versa.

      (Semi-related: One of HtDP’s main strengths is that where we tell students “don’t do this, it’s bad”, HtDP languages will often throw errors when students do bad things. I think that would have helped us a lot.)

      “I think HtDP probably avoids the issue you’re talking about by putting more weight into getting the mechanics of manipulating pairs”

      Idk, we put quite a lot of weight on that in lesson 4, and it is still always a sticking point for students.

      “Another way to see this is that you’ve said in some sense “I would rather them to Recursion early and so I’ll find the abstractions that best let the majority of the students focus on that.””

      Actually, HtDP is aiming to do the same thing. And in fact, they probably get to recursion faster than we do, because they start with *so little* (just cons, car, cdr, null?, if, define, and arithmetic, I think). They have a “structural recursion” problem solving technique that basically writes all the code for you with a few blanks that you fill in. (Similar to how Simply Scheme motivates every and keep.)

      It’s more like I’m saying “teach one thing at a time, first do recursion, then deal with pairs”. The HtDP method arguably also achieves this, mainly by restricting the functions they can write to such an extent that they don’t have to think about complicated uses of cons.

      “we can totally sentence together words and numbers though.”

      Absolutely, but the examples we use more often involve non-number words, because those are more intuitive. It also makes more sense for sentences of words to be flat (vs. say lists of numbers, where there’s no reason to be flat).

  • Pierre

    If I were to summarize this, it would be something along these lines:
    A good way to teach programming topics is starting with simple primitives and exploring the relevant concepts using these. Whereas people may interpret “simple” as “simple to implement”, we should instead think about primitives “simple to interpret”.

    I think this is a good idea, and I like the examples along this line. It’s more of a top-down teaching style, from concepts to structures/code, rather than the other way around. Instead of building a world little by little, you build a sequence of increasingly complex worlds.

    This approach certainly gets students thinking about many more rulesets than the bottom-up approach. I think this is generally better overall, although they may sometimes mangle the rulesets. For instance, I know many students tried to treat linked lists as sentences when progressing in Berkeley CS (possibly leading to the confusion about cons you identify). In your 3rd example, it’s unclear how privacy violations relate to buffer overflow attacks. You might imagine a student learning that bad input validation leads to privacy violation, then trying to think of ways how Heartbleed causes privacy violations.

    As long as this problem is attacked (e.g. by emphasizing the changes when moving between worlds), I think it’s really worth trying.

    That said, on linked lists vs sentences, I don’t think there’s such a big difference. The rule that “the first argument to cons must be an element of the list, while the second argument must be a list” is a confusing rule because the abstraction leaks in Scheme (or because you’re introducing the wrong abstraction, linked lists vs plain cons), not because there’s something inherently confusing about linked lists. This can be fixed by changing how cons behaves, with little change to linked lists as a concept.

    • Yeah, I agree with your summary, that’s a nice concise way to put it. And yes, in practice this does end up being top-down. Although the suggested change for memory management makes it a bit more bottom up — rather than being told how C works, you start with an assembly-like language and build the programming conventions that give you pointers, functions, the stack, and the heap.

      I do think the confusion about cons is inherent to cons — you see this confusion even in Python 61A, where students never see sentences at all. A properly type safe version of flat linked lists would help a lot with the confusion. (Neither Scheme nor Python 61A have this property.) That said, there’s still more to think about with linked lists (there are more restrictions on how you can use cons) so I expect that there would still be issues. This likely only matters for introductory students — I expect that the difference is too trivial to really matter for anyone who has had a semester of experience already.

      The point with the security example was to show that input validation is important. I do see the point that students may draw the wrong lesson from it though (input validation causes privacy violations). Though on the other hand, Heartbleed *can* cause privacy violations 😛 But yeah, I agree with you that it is a problem, but also that the problem can be attacked.

      • Pierre

        Yeah I’m not quite sure what to do with the 2nd example. For this one, it seems you have a case of both simple operational and denotational semantics. If I were someone who emphasized simplicity in operational over denotational semantics, I can’t think of a better way to teach it. Maybe you do?

        That’s why I thought of this way as emphasizing top-down over bottom-up, as in the cases where it is bottom-up, it reduces to the same as simple operational semantics.

        “A properly type safe version of flat linked lists would help a lot with the confusion.”
        — I wonder if students learning programming in Haskell (they exist!) have this trouble.

        • Yeah, that’s fair, the second example is just an example of using simple semantics (both operational and denotational). So I guess my strategy in choosing examples was “try to find ways to teach that use simpler denotational semantics than standard approaches”, and in that case the example I found has both simpler denotational and simpler operational semantics than the standard approach.

          Of course, the advantage of the standard approach is that you don’t have to build a brand new assembly-like language, and students learn about languages that are actually used in the real world, rather than wasting brain space on pedagogical languages. I’m not sure how valuable that is though.

  • Mehrdad

    Synced from FacebookHm, it seems you’re saying once you teach bubble sort then people should stop worrying about _how_ to sort their data and just use it whenever they need to sort, right? (After all, the denotational semantics of all sorting algorithms should be the same.) Which means now they start writing programs that don’t behave sanely. Wouldn’t this be bad? Wouldn’t you need to know the internals & operational semantics to understand the complexity, efficiency, and failure modes of your program?

    • Synced from FacebookNot exactly. I’m saying that you should figure out what it is you’re trying to teach, figure out what abstractions lie *beneath* the thing you’re trying to teach, and choose the simplest possible versions of those that still meet your constraints when teaching the thing you’re actually trying to teach.

      In your example, if you’re trying to teach sorting, of course you also want to teach the time complexity of various sorting algorithms as well, and so you wouldn’t just stop at bubble sort. My reasoning would say “figure out the abstraction beneath sorting and make it simple”. The abstraction beneath sorting is arrays, and an imperative programming language. I can’t think of any way to make that simpler, so I wouldn’t change how sorting is taught.

      On the other hand, if you are trying to teach something that uses sorting where time complexity is not important, then this reasoning would suggest that you should use the sorting algorithm that is simplest, ignoring time complexity. Of course, *all* sorting algorithms have the same denotations (they do the same thing), so you could use any of them. (If students had to implement the sorting algorithm themselves, then I would recommend using a simpler algorithm like insertion sort.)

    • Mehrdad

      Synced from FacebookOh I see. I’m fine with your first paragraph here, but then doesn’t it run against what you said in the article? Here (where you say “figure out what abstractions lie *beneath* the thing you’re trying to teach, and choose the simplest possible versions of those that still meet your constraints when teaching the thing you’re actually trying to teach”) you’re really saying we should find the sorting algorithm with the simplest operational semantics, whereas in the article you’re saying “It seems to me that you should aim for the simplest possible denotations, not the simplest possible operations”?

    • Synced from FacebookWell, all sorting algorithms have the same denotational semantics, so given that you might as well go with the one with the simplest operational semantics. If some sorting algorithms had weird edge cases (eg. only working on small numbers, or only working on lists of length 5 or more, or potentially never terminating) then I would recommend not using them since their denotational semantics are more complicated. (For this reason, I would recommend not using counting sort, bucket sort or bogosort.)