
Programming without Text Files

July 19, 2013

When I was an undergrad, I remember losing marks on a programming assignment because the teacher didn’t like that my source code spanned more than 80 columns, which didn’t play nicely with his Emacs setup. I also remember having arguments with people about the use of tabs vs spaces for indentation, or whether curly braces should be placed on their own line or not. It’s somewhat amusing to think that in 2013, decades after the first high-level programming languages (FORTRAN and LISP) were invented, we’re still programming by writing source code in ASCII (or UTF-8) text files. In some ways, programming hasn’t changed all that much since the days of the PDP-11.

This is somewhat strange to me because, as a compiler designer, I can tell you that the first thing any compiler typically does is parse your source code into something known as an Abstract Syntax Tree, or AST for short. This step is necessary for the compiler to extract any meaning from your program. It isn’t just compilers, though. Static analyzers, documentation generators and lint tools need to do the same thing, because they can’t work with textual source directly. Working with text-based source has advantages in terms of simplicity and interoperability, but I believe it will become advantageous to move to designs where source code is represented and edited directly as a hierarchical data structure instead. What I’m proposing is that programming languages of the future shouldn’t be based on text files. I believe that working with streams of characters imposes limitations on language design, limits expressiveness, and ends up hurting tool support and interoperability in the long run.
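To make the parsing step concrete, here is a small sketch using Python’s standard `ast` module (Python 3.9+ for the `indent` argument). Python is only a stand-in for any compiled language, and the variable names are illustrative. Note that the parentheses in the source leave no trace in the result: grouping is encoded by the shape of the tree.

```python
import ast

# Illustrative source; any valid Python statement would do.
source = "total = price * (1 + tax_rate)"

# Parse the text into a tree of node objects, as a compiler front end would.
tree = ast.parse(source)

# Print the nested node structure (indent= requires Python 3.9+).
print(ast.dump(tree, indent=2))

# The parentheses are gone: "1 + tax_rate" is simply the right child of the "*" node.
assign = tree.body[0]
print(type(assign.value.op).__name__, type(assign.value.right.op).__name__)  # Mult Add
```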

In terms of language design, working with linear streams of characters means we have to deal with silly things such as:

  • Designing a grammar that avoids ambiguity
  • Delimiters and operators not based on formal notation, but rather on ASCII characters
  • Limiting the length of keywords to save screen real-estate
  • Ambiguity in lookups and the names of symbols, name collisions, variable shadowing
  • Fixed-width fonts and inconsistent source code layouts
  • Significant vs insignificant whitespace
  • Tacked on documentation and source annotations through special comments

As for expressiveness, consider one of the most powerful ideas behind LISP: that of macros. The C language has “macros”, but they’re based on text substitution. It’s generally agreed that C’s macro system is weak and error-prone, and its usage is often discouraged. In contrast, LISP macros are based on the idea that you have macro functions that are executed at compilation time. These functions generate an AST (new programming code) to be inserted into your program. The LISP macro system is very powerful because it allows you to create new language constructs that integrate relatively seamlessly into the language. These new constructs can form a domain-specific language (DSL) which you can then use to better express solutions to the problem you’re trying to solve. Why don’t more programming languages have a macro system like LISP’s then? The problem is in large part that it’s not practical to implement such a system in other languages, because they’re text-based.
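The difference between text substitution and tree-based macros can be sketched in Python, which has no macro system of its own; the sketch below emulates one with the standard `ast.NodeTransformer` (Python 3.9+ for `ast.unparse`). `SQUARE` is a hypothetical macro, and the regex-based expander only mimics the C definition `#define SQUARE(x) x*x`.

```python
import ast
import copy
import re

def expand_text_macro(source: str) -> str:
    """Mimic C's #define SQUARE(x) x*x by pasting the argument text verbatim."""
    # Handles only arguments without nested parentheses; enough to show the pitfall.
    return re.sub(r"SQUARE\(([^()]*)\)", r"\1*\1", source)

class SquareMacro(ast.NodeTransformer):
    """Expand SQUARE(x) into the tree x * x, keeping x as one intact subtree."""

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # expand nested macro calls first
        if isinstance(node.func, ast.Name) and node.func.id == "SQUARE" and len(node.args) == 1:
            arg = node.args[0]
            return ast.BinOp(left=arg, op=ast.Mult(), right=copy.deepcopy(arg))
        return node

def expand_ast_macro(source: str) -> ast.Expression:
    """Run the macro at "compile time": parse, transform the tree, fix locations."""
    tree = SquareMacro().visit(ast.parse(source, mode="eval"))
    return ast.fix_missing_locations(tree)

text_version = expand_text_macro("SQUARE(1 + 2)")
print(text_version, "=", eval(text_version))  # 1 + 2*1 + 2 = 5

tree_version = expand_ast_macro("SQUARE(1 + 2)")
print(ast.unparse(tree_version), "=", eval(compile(tree_version, "<macro>", "eval")))
```

The text expansion silently changes the meaning because operator precedence is applied after the argument is pasted in; the tree expansion yields 9, since the argument can never be split apart.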


LISP source code is also text, but its parenthesized format is very regular: it represents an AST in the form of nested linked lists. Programmers can generate and manipulate code in the same nested representation internally, and even execute generated source code on-the-fly using the eval function. One of the main issues people have with LISP-like languages is that of readability. I believe it should be possible to improve upon LISP by bypassing the parenthesized representation and programming by editing the underlying data structures directly instead. At this point I should clarify that when I speak of programming languages that aren’t text-based, I’m not suggesting visual languages you edit using a giant touch interface as seen in Minority Report, nor do I mean languages based on boxes and arrows that require two dozen mouse clicks to implement one addition. I’m thinking about enhanced Integrated Development Environments (IDEs) you can use with a good old keyboard, in a workflow fairly similar to what you’re already used to.
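The code-as-nested-lists idea is easy to illustrate outside LISP as well. This Python sketch stores the expression `(+ 1 (* 2 3))` as nested lists, evaluates it with a deliberately tiny `evaluate` function (a hypothetical stand-in for LISP’s `eval` that supports only `+`, `-` and `*`), and shows a function that generates code as ordinary data.

```python
import operator
from typing import Any

# Code as data: the LISP expression (+ 1 (* 2 3)) becomes ["+", 1, ["*", 2, 3]].
OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def evaluate(expr: Any) -> Any:
    """A toy stand-in for eval: lists are applications, anything else is a literal."""
    if isinstance(expr, list):
        op, *args = expr
        values = [evaluate(arg) for arg in args]
        result = values[0]
        for value in values[1:]:
            result = OPERATORS[op](result, value)
        return result
    return expr

def make_sum(terms: list) -> list:
    """Generate code rather than a value: wrap existing forms in a (+ ...) form."""
    return ["+", *terms]

program = make_sum([["*", 2, 3], 4, ["-", 10, 1]])
print(program)            # ['+', ['*', 2, 3], 4, ['-', 10, 1]]
print(evaluate(program))  # 19
```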


What would this buy us, besides having a language that is potentially more LISP-like? For one, if such a language were to have macros, they could have an even more seamless integration into the language than what LISP allows. These constructs could have their own customized visual representations. They could also have additional semantic information attached for better integration with your IDE and compiler. Imagine being able to design a calculus-oriented DSL which is visually represented using mathematical notation, has associated error-detection and autocompletion in your IDE for quick editing and is automatically optimized based on rules you’ve made known to the compiler. Imagine being able to request the AST for a function so that you can transform and instrument it on the fly. Now imagine being able to visually format source code any way you want, without having to worry about tabs, spaces, curly braces, commenting styles or the width of other people’s displays.
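A rough sketch of the formatting point, again using Python’s `ast` module as a stand-in (Python 3.9+ for `ast.unparse`): two differently formatted versions of a function parse to identical trees, so an editor that stored the tree could render it in whatever layout each reader prefers. `area` is a made-up function.

```python
import ast

# Two stylistically different spellings of the same (made-up) function.
style_a = """
def area(w, h):
    return (w * h)
"""

style_b = """
def area( w,h ):   # odd spacing, deeper indentation, a comment
        return w*h
"""

tree_a = ast.parse(style_a)
tree_b = ast.parse(style_b)

# Layout, comments and redundant parentheses leave no trace in the tree.
print(ast.dump(tree_a) == ast.dump(tree_b))  # True

# Rendering is a separate step: ast.unparse emits one canonical layout,
# but an editor could just as well apply each reader's own style.
print(ast.unparse(tree_a))
```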

In my mind, it’s quite clear that there is much to gain by editing and storing source code using a data format that more directly represents its actual semantics. One of the bigger obstacles for this to work, however, is that of designing suitable editing software. People are relatively comfortable writing source code as text, but editing ASTs seems like it would be more tedious at first glance. It should be possible to make the editing easier and faster by having the editor produce accurate auto-completion suggestions. Since the editor has direct access to the AST, it would become possible for it to make better auto-completion suggestions based on knowledge of language semantics. Another possibility that opens up for speeding up editing is that of building your own pattern-based shortcuts. For example, you could have your own collection of macros (code generator functions or patterns of source trees with holes in them) to be expanded at the touch of a few keys.
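Pattern-based shortcuts of this kind could work roughly as follows. In this hypothetical Python sketch (Python 3.9+), a template is an ordinary syntax tree in which names starting with `HOLE_` mark the holes; filling them means substituting subtrees rather than pasting text. The `HOLE_` convention and the `expand` helper are assumptions made for illustration.

```python
import ast
import copy

# A reusable pattern: names starting with HOLE_ are placeholders (a made-up convention).
TEMPLATE = ast.parse(
    "if HOLE_value is None:\n"
    "    HOLE_value = HOLE_default\n"
)

class FillHoles(ast.NodeTransformer):
    """Replace each placeholder name with a copy of the subtree bound to it."""

    def __init__(self, bindings: dict[str, ast.expr]):
        self.bindings = bindings

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if node.id not in self.bindings:
            return node
        replacement = copy.deepcopy(self.bindings[node.id])
        # Keep the hole's Load/Store context so assignment targets stay valid.
        if hasattr(replacement, "ctx"):
            replacement.ctx = node.ctx
        return replacement

def expand(template: ast.Module, **holes: str) -> str:
    """Parse each filler as an expression, splice it into a copy of the template, render it."""
    bindings = {f"HOLE_{name}": ast.parse(text, mode="eval").body for name, text in holes.items()}
    filled = FillHoles(bindings).visit(copy.deepcopy(template))
    return ast.unparse(ast.fix_missing_locations(filled))

print(expand(TEMPLATE, value="timeout", default="30"))
print(expand(TEMPLATE, value="self.retries", default="max(1, n)"))
```

Because holes are filled with parsed subtrees, a filler such as `max(1, n)` can never merge with the surrounding code the way pasted text can.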

What about tool support? As it stands, many programming languages, even among the most popular ones, have rather poor tool support. Part of the difficulty comes from the fact that tools need to parse and generate source code. If you take languages like C and C++, for example, it can actually be difficult to find a parser that supports the complete grammar. If you want to write a tool that does code transformations, you also have to worry about issues like preserving comments, which is sometimes difficult, especially if the parser you chose has no support for this. If source code were represented in a more uniform data structure, it might actually be easier to write tools that parse, analyze, and generate code. It would also become easier to add metadata to existing source code. In the end, I believe that transitioning to this kind of programming model would enable significant gains that would largely outweigh the disadvantages.
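Both sides of this point show up in a small Python sketch using the standard `ast` module (Python 3.9+). An AST-based rename can tell an identifier apart from the same word inside a string literal, which a textual search-and-replace cannot. But `ast` discards comments, so rendering the transformed tree back to text loses them, which is exactly the comment-preservation problem described above. `RenameVariable` is a hypothetical helper, and it deliberately ignores scoping to stay short.

```python
import ast

source = '''
def greet(name):
    # Build the greeting message
    message = "Your name is " + name
    return message
'''

class RenameVariable(ast.NodeTransformer):
    """Rename an identifier wherever it appears as a variable or parameter.

    Simplification: scopes are ignored, so every binding with that name is renamed.
    """

    def __init__(self, old: str, new: str):
        self.old, self.new = old, new

    def visit_Name(self, node: ast.Name) -> ast.Name:
        if node.id == self.old:
            node.id = self.new
        return node

    def visit_arg(self, node: ast.arg) -> ast.arg:
        self.generic_visit(node)  # also visit any type annotation
        if node.arg == self.old:
            node.arg = self.new
        return node

# Textual replacement also rewrites the string literal.
naive = source.replace("name", "user_name")

# Tree-based renaming touches identifiers only, but the comment is lost on output.
renamed = ast.unparse(RenameVariable("name", "user_name").visit(ast.parse(source)))

print(naive)
print(renamed)
```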

34 Comments
  1. hi, I’m experimenting with making such an editor http://celeriac.net/iiiiioiooooo/public/ – very primitive now

  2. David Pokorny

    LyX is a little bit like this. Now, it isn’t programming, but at least you get the sensation of dealing directly with the ASTs. They’re working on this for teaching programming at snap.berkeley.edu/run.

  3. Hello Maxime, I think you are ready for Smalltalk now.

  4. Craig Henderson

    Text files are the lowest common format for writing, so language constructs are constrained by what can be expressed in a text grammar, as you describe. A language that is not text-based will, I guess, have to be tied to an environment that can edit it efficiently, and then you get compatibility problems across platforms, language vendors and even versions.

    Can you give some examples of what a non-text-based language would look like? I’m not clear on how it is distinct from visual languages.

    Interesting article. I agree there’s a problem, but I’m not sure if you have a solution yet?

  5. Guido

    Have a look at JetBrains’ MPS.

    • Federico Tomassetti

      I was about to suggest the same!
      By the way, other developers using JetBrains MPS and I just started a blog to help people dive into this magical new way of programming:

      jetbrainsmpscommunity.wordpress.com

  6. Jan

    You may find MPS from JetBrains (http://www.jetbrains.com/mps/) interesting.

    • Very similar to what I wrote about. They even mention better integration of macros and designing your own DSL. Nice to see that there is a real instantiation of this and that it’s a product people actually use.

  7. ASTs have this problem of just not being very legible. The trees tend to be a lot larger than their associated text forms. People also don’t think in terms of the AST, but rather the way their textual code looks. I admit, this may or may not be an advantage.

    A bigger problem I see is that the form of the AST is somewhat compiler-dependent. It may differ slightly from compiler to compiler for the same language, depending on how the parsing and lowering stages are implemented. In my Leaf language I continually alter the AST but allow the text code to stay the same.

    • that’s just a matter of formatting and style

    • and I very much disagree about thinking in terms of text – absolutely not! We think in terms of processes, spaces, mechanisms, not text! We have to translate into text, which is a massive source of errors. Working with the AST is a small step towards reducing the distance between our thoughts and the code.

    Have you had a look at http://mbeddr.com? This is a full-blown C IDE based on JetBrains MPS which already does most of the things you propose. In addition, we extended C with, e.g., support for physical units, components and state machines. We also make use of the projectional nature of MPS: it is possible (if you want to) to use notations other than text. For example, we have used tables to build decision tables. Another example is the way we build product line support. Instead of using IFDEFs from C, we have special annotations and different projections. This allows you to look at your source code either in the product line view or to see only the code for one given configuration.

    We have published a whole bunch of research papers on that topic:


    A good overview can be found here: http://mbeddr.files.wordpress.com/2010/06/voelteretal-mbeddr-final.pdf

    To some degree (although not as far-reaching as your suggestions), similar things are already happening in IDEs. For instance, the question of code formatting (including placement of braces and spaces vs. tabs) is increasingly uninteresting with automatic code formatters. In theory, every member of the team could have his own formatting preferences and see the code automatically converted from the “official” format to his own and back again as he starts and ends the editing. (Not that most people bother.) Another interesting feature is folding, which makes it easier to focus on specific aspects of the code, and effectively implies that what one is currently editing is not a “pure” text file. The way syntax highlighting or the display of Javadoc (or any equivalent in the current language) is handled can move us yet further away from text editing and somewhat more in your direction.

    (Not to mention areas that may or may not be of relevance, like mdd, code generation, …)

    At any given time, languages and IDEs are mutually influenced by what the counter-part has provided over the last years, and it seems plausible to me that they will, slowly and over time, develop together to a place that will seem very strange to us today—let alone to someone hacking Scheme in Emacs twenty years ago. Possibly, that place will be what you envision; possibly, it will be something entirely different.

    • I think IDEs have started to implement some of these things (collapsing, auto-formatting, autocompletion) because it’s obvious that they are good ideas that can help make you more productive. In order to implement these features though, the IDEs have to internally parse your code to obtain an AST representation, and map everything back to text, which sometimes doesn’t work all that well, and limits what the IDE is able to do.

      I’d say modern IDEs are sort of tending towards advanced AST manipulation. They’re offering you features that manipulate ASTs under the hood, but what we’re seeing is only the surface of what is possible. By switching to more direct manipulation of ASTs, I think we can shift to a paradigm that helps programmers better express their ideas, something more intuitive that really helps you better understand the nature of the code you’re working with. We should also be able to create more powerful IDEs that will be able to help you be even more productive.

      • I think it’s worthwhile to consider that not all programmers find IDEs to help their productivity. If something is already getting in the way, will making a language even more IDE dependent attract more followers, or turn even more people away? It would be worthwhile to figure out why these people don’t like the IDE. Perhaps hidden in their dislike is the correct path towards an AST centric programming style.

      • Something relevant to both my last paragraph and your answer is that IDEs are to some degree limited by the current paradigms of the languages (and vice versa, but somewhat weaker). What I suspect will happen is that there are gradual paradigm shifts in both areas—quite likely containing a drift towards more abstract not-for-plain-text-editing storage of the actual code resp. its successor.

        However, mortoray makes a valuable point: I am often more frustrated than helped by IDEs that do not deliver on their promises (e.g. through plugins for tools like Maven or Subversion that cause problems just as often as they help) and in many ways do not Integrate, but instead shut out other tools. Comparing, e.g., the Eclipse framework, with its implicit inside, outside and outside-but-has-a-plugin-inside divisions, with the Unix philosophy of true borderless cooperation can be quite insightful. Further, I often find myself doing tasks in Vim, because I have yet to find an IDE with a good built-in editor…

  10. Huw

    I made a prototype of this idea a few years ago, but I didn’t have the time to go anywhere with it.

    First you need a common representation format for all programming code, much as images have standard formats like .png, .bmp, etc.

    Different compilers would use this standard, different editors would read and write this standard. So already there’s an ecosystem for it. The format could also include rulesets. So different compilers could implement different features, even different projects could do the same. Developers could ADD NEW FEATURES for your compiler, pick and mix, this would be a 100% configurable language. And then you could add standards on top and those could be the official, safe, uses.

    The editor itself:

    So right away we have powerful features.

    Search: Search completely within the context of the AST. Instead of searching for “myfunc(” and going through the list, just say “I want to find this function”.

    Renaming variables: The editor explicitly knows what variables are what, renaming them is not an issue. In addition, the editor would be able to tell the difference between assigned variables and do all kinds of crazy stuff!

    No more files. You know the drill: open blah.c, scroll to where you want, now open blah.h… no, no, the concept of opening and closing files is gone. You click on a file and it just goes into it, like part of the tree; files are treated as part of the structure. You have breadcrumb trails at the top allowing you to easily get back up the AST, so it’s sort of like a file explorer, but for code. You can have tabs like internet browsers do. You could have many tabs for the same file, each looking at a different place. NO MORE PAGE UP AND PAGE DOWN! YES.

    It’s tempting to go the one-true-language, oh-so-perfect route that professors would like, with a large LISPy tree view descending from heaven, but anyone who’s done any real programming knows that code is not like that.

    So instead we use the idea of “tools”. Little programs within the IDE structure that operate on the AST nodes. The tools are operated context style, like how in VIM you press W, then start typing, the same sort of principle. No mouse required. This is very important.

    The comment tool: Attach comments to any element you like. Don’t worry, it won’t get in the way, they’re compact until you want to see them!

    The list/tree tool: NO MORE COMMAS. No more tabbing everything into place, you press a key, it adds an item. You can easily shuffle and SORT THE LISTS AT DESIGN TIME. WOW. Edit anything from a simple struct to a full blown XML file. As simple as editing a spreadsheet. The possibilities are endless.

    The expression tool: Simple enough, type in your maths. It automatically spaces all the appropriate variables for you, so typing (a+b)=c would show up on screen (a + b) = c. When you pressed the appropriate keys, it would show the list of available variables and functions. It can also evaluate simple expressions at design time so you can be even lazier! type 123*456, press the right key, it’s turned into the number you want.

    The text tool: Edit multiline text with every character explicitly viewed in this window, advanced hex and character set tools, full foreign language support and the ABILITY TO EMBED VARIABLES. None of that [lolbrackets] bbcode style shit, the editor would format and recognize and automatically tostring() all references. No more terrible “hello” + myvar + “world” setup, no more multiline nightmares trying to align it with all the rest of the file. And no more escaped characters. You just insert a character, sure you might still type it with slash-n, but it would become a literal newline character (represented graphically by the tool).

    The functional code / imperative code tools:

    This is where things get interesting. Already we have two distinct domains, a separation of concerns, so we can support every functional and imperative idea.

    Variable assignment, AUTOMATIC STATIC TYPING. You don’t type int 123 or string hello. No, you type it as if you were writing python or similar language and the tool automatically infers the type and explicitly sets it. Not happy with the choice? Then you can change it manually. Now, we already have all the tostring() stuff covered with the text tool, but what about changing ints to floats, etc.? Well that’s covered too, it automatically does (int)123.0 when you move data between types and makes it very clear on the screen, and you didn’t have to lift a finger, unless you want to manually change things.

    Assignment is very clear and distinct from comparison. No more operator pain.

    Functions are decoupled from their arguments at last! You can swap the arguments around at any time and every call to the function is automatically updated too. (The function itself just states the preferred order; internally, calls are stored like myfunc(myvar="hello", a=2, b=3).) The function tool makes it easy to design the layout, and documentation of arguments can be encouraged.

    Because functions are explicitly functions, and so are commands, there is no such thing as a reserved keyword anymore. Want to use ‘new’? Go ahead. It will be completely different from the ‘new’ command.

    Class tool is as you’d expect, not much to say about that.

    Of course, you could extend the editor. Want a custom tool for editing a certain class? Sure. Want to extend the expression tool to include a new type? Go for it. Want to support a new language feature? Yep.

    And one of the best features that could be really powerful is design time evaluation. It’s shown up now and again in various IDEs, but with the AST, you could see your code evaluating in the editor far more often with plenty of indicators and warnings, the editor would make you far more efficient as a programmer.

    I really hope one day this comes to fruition somehow. Clearly we’re all coming to the same conclusion ourselves, so it should only be a matter of time before this technology becomes a reality.

    • Not to get too philosophical, but I believe that some ideas are somewhat “inevitable”. This is a logical progression towards a better paradigm. It will most likely happen, for the simple reason that it’s a better paradigm. Just one person or team needs to implement a programming language (and associated tools) that makes effective use of this, and it will demonstrate how big an improvement this can be over what currently exists. From that point on, other people will copy the idea and it will eventually become mainstream, and maybe even the “default” method of programming.

      It could take a long time, possibly multiple decades, since new programming languages take a long time to catch on, and old ones take a long time to die out, but the way I see it, it’s only a matter of time. It will happen faster if someone comes up with a very good implementation of it. Clearly, the time for this idea to materialize is coming.

    • it’s exciting to hear that other people have come to the same place with this idea. Once you’re working in the tree, so many things become clear and many possibilities are available. The one principle I’ve been guided by, in the spirit of Smalltalk and LISP, is that the editor should first be able to edit itself. I’m trying to find the fewest features that would support its complete alteration by its users:

      The editor is a program, so start with a view on the editor itself (actually, its history of states), and add the code to be edited as subtrees.

      The editor has multiple views on the same data, each of which can have multiple subtrees selected at once. Most of the time people will be viewing subtrees from different points of view.

      The selected things form an expression, a tree, which can be evaluated in a certain context (another expression in the tree, which says what to do with the results), the results going back into the editor (or modifying it, depending on the context). The simplest example is to select one subtree and another one as the function to apply to it, and another to say what to do with the result, which means “apply this function to this subtree and put the result here”. If that’s useful, attach a keybinding.

      So the way to build features in the editor is to select various parts of the tree and evaluate them together. I’m using functional zippers, so you can also do indirect references by pointing at other zippers. This way, the editor is also a REPL, but more structured, because the results don’t just append to the end; instead they can be sent anywhere in the tree.

      Search and navigation – I’m experimenting with a keyboard navigation system in which keys can either mean their symbol (to be used in a code-completion-like way) OR their position on the keyboard, so that sometimes you can navigate left, right, up or down by pressing any key in that position relative to the last key pressed – fingers on the middle row as much as possible for comfort.

      Relational refactoring – we can use Clojure core.logic (miniKanren) rules and pattern-matching to perform refactorings throughout the tree.

      SVG view is zoomable – no modes, just one big tree and zoom.

    • Suman

      Huw, any chance we could look at the prototype that you created?

  11. Some of the ideas here were also the ideas of intentional programming. Have a look around, starting here: http://en.wikipedia.org/wiki/Intentional_programming

  12. You may find the work of my team relevant. We are working on composing language runtimes, but one of the aspects of this is a specialised editor:

  13. Thank you for sharing your thoughts. I kept thinking about this post and wrote my opinion about megamodels, AST editing and possible outcomes (http://language-engineer.com/inference-in-programming/).

    I have seen you are also pursuing a PhD, all the best for that (I just started writing the thesis)!

  14. I absolutely agree with your thoughts and in fact I’m working on a projectional editor called ProjecturEd that allows free combinations of different problem domains including text, graphics, markup, programming languages, etc.

    The project home page is at http://projectured.org and there’s also some documentation in a wiki at https://github.com/projectured/projectured

    There are also screenshots and videos on youtube.

  15. rst256

    “If you take a languages like C and C++, for example, it can actually be difficult to find a parser that supports the complete grammar” – maybe use a compiler front end. Who can know the language better than a compiler? Clang provides a special API for this purpose.

  16. Hi there,

    I couldn’t help but see a resemblance between some of your ideas and some of the ideas Bret Victor expresses in his talk “Inventing on Principle”. I think between you and him there is definitely something magical. I think proprietary IDE rendering will be a huge hurdle, though.
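Huw’s suggestion above, that calls be stored with named arguments so a function’s parameters can be reordered without breaking callers, can be approximated on today’s text-based code. This hypothetical Python sketch (Python 3.9+) uses the standard `ast` module to rewrite positional calls into keyword calls, based on signatures collected from the same module. `send` and `KeywordCalls` are illustrative names, and the sketch ignores positional-only parameters, `*args` and functions defined in other modules.

```python
import ast

source = '''
def send(recipient, subject, body):
    return [recipient, subject, body]

result = send("bob", "hi", "see you")
'''

class KeywordCalls(ast.NodeTransformer):
    """Turn positional arguments of known functions into keyword arguments."""

    def __init__(self, signatures: dict[str, list[str]]):
        self.signatures = signatures

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)
        if isinstance(node.func, ast.Name) and node.func.id in self.signatures:
            params = self.signatures[node.func.id]
            no_star_args = not any(isinstance(a, ast.Starred) for a in node.args)
            if no_star_args and len(node.args) <= len(params):
                named = [ast.keyword(arg=p, value=a) for p, a in zip(params, node.args)]
                node.keywords = named + node.keywords
                node.args = []
        return node

tree = ast.parse(source)
# Collect parameter names for every function defined in this module.
signatures = {
    fn.name: [a.arg for a in fn.args.args]
    for fn in ast.walk(tree)
    if isinstance(fn, ast.FunctionDef)
}
tree = ast.fix_missing_locations(KeywordCalls(signatures).visit(tree))
print(ast.unparse(tree))

# Reorder the definition's parameters; the rewritten call still binds correctly.
definition = tree.body[0]
definition.args.args.reverse()
namespace = {}
exec(compile(tree, "<sketch>", "exec"), namespace)
print(namespace["result"])  # ['bob', 'hi', 'see you']
```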

Trackbacks & Pingbacks

  1. On holistic programming languages. – Das Glasperlenspiel
  2. In the News: 2013-07-21 | Klaus' Korner
  3. Static Typing may not Scale | Pointers Gone Wild
  4. There’s Too Many Programming Languages! | Pointers Gone Wild
