Implementing an Emacs programming language mode: beyond the basics

As one of the two primary authors of idris-mode, I’ve had to learn a fair bit about implementing Emacs modes. While I’m far from an expert on the subject, I do think that I’ve picked up a few ideas that are useful. There are already a number of good tutorials about the very basics, such as define-derived-mode, font-lock, and Emacs packages. What I haven’t been able to find is an answer to the question “what’s next?”. You can view this post as a kind of checklist for useful things to add to your mode.

I won’t go into great detail about how to do all of these things, because the documentation is typically quite good. The real problem is learning that there is documentation to be read!

Imenu

While many Emacs users turn off the menu bar entirely, making Imenu not particularly attractive, it pays to support it anyway. This is because lots of other features in Emacs use Imenu as a data source. For example, speedbar will use it to show an index of your file and which-func-mode uses it to determine what to show in your mode line.

Completion

These days, it is very straightforward to implement completion for your mode. Many older modes contain complicated code to show completion buffers, restoring window configurations afterwards. In Emacs 24, we have completion-at-point. Simply ensure that this is bound to something (often M-TAB), and then arrange for it to have enough information to present good completions

The variable completion-at-point-functions provides the raw data that completion-at-point will use to construct a completions buffer. The variable contains a list of 0-argument functions, each of which should return either nil or a collection of completion candidates. Emacs takes care of all the tedious organization of buffers. In idris-mode, we simply delegate to the compiler to provide completion candidates.

Eldoc

Eldoc is a minor mode for showing documentation strings in the echo area. By default, it supports Emacs Lisp, but it’s quite straightfoward to make it work for your language. All you need is a function that returns either a docstring for the symbol at point or nil. In the case of Idris, we already had a command to ask the compiler for a type signature, so it was especially easy. Once you have this function, just set eldoc-documentation-function to its name and place turn-on-eldoc-mode in your default mode hook.

Here’s the function from idris-mode:

(defun idris-eldoc-lookup ()
  "Support for showing type signatures in the modeline when there's a running Idris"
  (let ((signature (ignore-errors (idris-eval (list :type-of (idris-name-at-point))))))
    (when signature
      (with-temp-buffer
        (idris-propertize-spans (idris-repl-semantic-text-props (cdr signature))
          (insert (car signature)))
        (buffer-string)))))

All of the idris-propertize-spans stuff is to apply font information from the compiler.

Flycheck

Flycheck is a minor mode that runs quick external programs on the contents of your buffer, annotating the buffer with any warnings produced, without you needing to save the buffer. It is fairly straightforward to define a checker for Flycheck, and it can be added to your mode without disturbing users who don’t use Flycheck by wrapping it up in an eval-after-load. Here is the entire checker from idris-mode, written by Brian McKenna:

;;;###autoload
(eval-after-load 'flycheck
  '(progn
     (flycheck-define-checker idris
       "An Idris syntax and type checker."
       :command ("idris" "--check" "--nocolor" "--warnpartial" source)
       :error-patterns
       ((warning line-start (file-name) ":" line ":" column ":Warning - "
                 (message (and (* nonl) (* "\n" (not (any "/" "~")) (* nonl)))))
        (error line-start (file-name) ":" line ":" column ":"
               (message (and (* nonl) (* "\n" (not (any "/" "~")) (* nonl))))))
       :modes idris-mode)

     (add-to-list 'flycheck-checkers 'idris)))

Customize

In my opinion, it’s important to have variables that control settings for a mode be separate from other variables, so that users can see what they’re intended to change and what they are not. A good way to indicate that a variable is intended for user customization is to make it available through customize. When thinking about how to customize features of your mode, consider how to organize them into groups so that they are easy to find.

If you define your mode as a derive mode (and you should), remember to pass the :group option to define-derived-mode. This makes the standard command customize-mode work.

Additionally, your mode should really avoid using built-in faces. Instead, define your own faces with defface and set them to inherit from the built-in faces that you were thinking of using. This lets your users decide how they want to see your mode. Good defaults are also important, to the extent that you can have them, so pay attention to built-in faces that you can inherit from, which will make your mode automatically conform to the user’s color theme. Perhaps the biggest source of complaints about idris-mode has come from default colors that don’t look good for users, where I was unable to find a built-in face to inherit from that makes sense. Surprisingly many Emacs users don’t know that they can easily customize a face.

Prog mode

Remember to derive your mode from prog-mode. If you don’t do that, at least make sure that you call prog-mode-hook. This allows Emacs users to set things up for every programming language mode, while not using these settings in other places. This is useful for things like showing trailing whitespace or turning on line numbering.

Common Lisp features

If you have experience writing Common Lisp code, you’re in for a disappointment when you write Emacs Lisp. There are many, many useful features, such as destructuring-bind, that Emacs Lisp just doesn’t support. Furthermore, when these features are available, you either have to use a compile-time-only macro package (cl) or an ugly prefixed version (cl-lib). I recommend the latter, as you avoid having to worry about which features are macros and which are functions. Unfortunately, you will probably end up with cl loaded by something in your initialization file, so Emacs will probably not complain if you accidentally use cl names instead of cl-lib names, which can lead to problems for your users. Consider using cl-lib-highlight to get warnings in your Lisp buffers when you use un-prefixed cl symbols.

All the rest

After you’ve got these things done, then it’s time to start attacking all the little things, like getting the parsing working for motion commands such as forward-sexp, implementing fill-paragraph for comments and docstrings, and so forth. I haven’t done all these things yet, but I’ll write it up if I find a nice strategy.

Techniques for using decision procedures in Idris

This post is a literate Idris file, so first let’s get some bureaucracy out of the way:

module LT
%default total

When writing Idris code, we sometimes need a proof as an argument to a function. In many cases, these proofs can be constructed automatically by a decision procedure. Here, I’m using the term "decision procedure" for some property p to refer to a function that returns something in the type Dec p.

An instance of Dec p is either a proof of p or a proof that p is an absurdity. The datatype is defined as follows:

data Dec : Type -> Type where
  Yes : p -> Dec p
  No : (p -> _|_) -> Dec p

As an example, take the following type from the Idris library that represents a proof that one natural number is less than or equal to another:

data LTE : Nat -> Nat -> Type where
  lteZero : LTE 0 m
  lteSucc : LTE n m -> LTE (S n) (S m)

The first constructor, lteZero, should be interpreted as an assertion that zero is less than or equal to any natural number. The second, lteSucc, is an assertion that if \(n \leq m\), then \(n+1 \leq m+1\). It happens to be the case that, for any two natural numbers, it is decidable whether the first is less than or equal to the second.

To prove this, we start with two simple lemmas. The first states that it is not the case that the successor of any number is less than or equal to zero:

notLTESZ : LTE (S k) Z -> _|_
notLTESZ lteZero impossible

Additionally, we show that, if \(k \not\leq j\), then \(k+1 \not\leq j+1\):

notLTE_S_S : (LTE k j -> _|_) -> LTE (S k) (S j) -> _|_
notLTE_S_S nope (lteSucc x) = nope x

We are now in a position to write the decision procedure, by examining each case and using our lemmas to produce the appropriate contents for the No constructor. If the details don’t make sense, then please fire up Idris, replace some right-hand sides with metavariables, and examine the proof context.

decLTE : (n,m : Nat) -> Dec (n `LTE` m)
decLTE Z m = Yes lteZero
decLTE (S k) Z = No notLTESZ
decLTE (S k) (S j) with (decLTE k j)
  decLTE (S k) (S j) | (Yes prf) = Yes (lteSucc prf)
  decLTE (S k) (S j) | (No contra) = No $ notLTE_S_S contra

Now that these details are out of the way, we can get to the main point of this blog post: demonstrating various techniques for getting Idris to fill out these proofs for us.

The most straightforward way is to use the interactive editing features to get Idris to construct the proof term directly. For example, if we want to show that \(3 \leq 5\), then we need only create a definition with type 3 `LTE` 5:

ok : 3 `LTE` 5

We can then use C-c C-s in idris-mode to cause it to become

ok : 3 `LTE` 5
ok = ?ok_rhs

Using the proof search command C-c C-a on the metavariable gives us

ok : 3 `LTE` 5
ok = lteSucc (lteSucc (lteSucc lteZero))

and the decision procedure simply isn’t necessary. This approach can make our programs ugly, but it is straightforward and direct. We need not actually show the resulting term, however. In Idris, tactic scripts are allowed as the right-hand side of definitions. So we can simply invoke the search tactic explicitly, rather than using it through the compiler’s editor interface, and the resulting term does not need to occur in the source file. This can also be more robust in the face of changes to the definition of LTE. The tactic-based definition is:

okTactic : 3 `LTE` 5
okTactic = proof search

Here, proof begins a tactic script, and search is the tactic to be invoked.

Using Idris’s built-in proof search has two major limitations:

  1. It cannot solve more "interesting" goals, in which a simple recursive application of constructors is insufficient
  2. It fails to find large goals, as the recursion depth is limited. For example, it cannot solve 103 `LTE` 105.

What we really want to do is use decLTE to find the term.

One non-solution is to have the right-hand side of the definition perform a case analysis on the result of a call to decLTE, and then return the contents of the Yes case. Because we know that three is, in fact, less than five, we will never need the No case:

okCase : 3 `LTE` 5
okCase = case decLTE 3 5 of
           Yes prf => prf

Unfortunately, the Idris compiler needs to be convinced of the fact that No cannot occur. We do this by showing that the No case could cause the empty type to be inhabited, and then use FalseElim. Unfortunately, doing this requires actually proving the property again:

okCase : 3 `LTE` 5
okCase = case decLTE 3 5 of
           Yes prf => prf
           No nope => FalseElim . nope $ lteSucc (lteSucc (lteSucc lteZero))

So this method is a non-starter.

The next method is to cause the solver for implicit arguments to execute the decision procedure while filling out an implicit argument for the proof, and then return this implicit argument. The definition okImplicit demonstrates this technique:

okImplicit : 103 `LTE` 105
okImplicit  = getProof
  where getProof : {prf : LTE 103 105} -> {auto yep : decLTE 103 105 = Yes prf} -> 103 `LTE` 105
        getProof {prf} = prf

All of the work happens in getProof, which is in a where block to keep the type signature of okImplicit clean. In the definition of getProof, the implicit argument prf is the one that will be filled out with the proof, and then returned. The argument yep is an implicit proof that the result of executing the decision procedure will be Yes prf – in other words, that it will succeed, and the thing it succeeds with will unify with prf. The auto keyword causes Idris to fill out the argument with refl, the constructor of proofs of equality.

While the okImplicit technique leads to noisy boilerplate in type signatures, it also tends to cause error messages that are easy to understand. I’ve had a lot of luck using it for embedded DSLs with some kind of side condition (like "these queries must have disjoint schemas").

The final technique that I’d like to demonstrate uses a dependently-typed extractor of proofs from Dec. The type of the function getYes does case-analysis on the result of a decision procedure. If the result was Yes, then getYes will return the contents – an instance of p. Otherwise, it returns unit, which contains no interesting information. Note that the result of the case expression is a type — either the type parameter to Dec or the unit type.

getYes : (res : Dec p) -> case res of { Yes _ => p ; No _ => () }

The body of getYes performs a case analysis on the result of the decision, either returning a proof or a trivial value, in accordance with the type. Note that Idris’s interactive editing features were able to write this code entirely – the only thing I needed to do was tell it to case split on the argument. Agda, which has more advanced interactive editing features, could probably write the entire body by passing Agsy the -c option, which enables case splitting.

getYes (Yes prf) = prf
getYes (No contra) = ()

Now, getYes can be used on the right-hand side of our definition to extract the proof:

okYes :  3 `LTE` 5
okYes = getYes (decLTE 3 5)

While this approach is clean, the error messages can be somewhat unpleasant if Idris is asked to prove a falsehood. The following code:

oops :  5 `LTE` 3
oops = getYes (decLTE 5 3)

results in this unification error:

     When elaborating right hand side of oops:
     Can't unify
             case block in getYes (LTE (fromInteger 5) (fromInteger 3)) (decLTE (fromInteger 5) (fromInteger 3)) (decLTE (fromInteger 5) (fromInteger 3))
     with
             LTE 5 3
     
     Specifically:
             Can't unify
                     case block in getYes (LTE (fromInteger 5) (fromInteger 3)) (decLTE (fromInteger 5) (fromInteger 3)) (decLTE (fromInteger 5) (fromInteger 3))
             with
                     LTE 5 3

whereas the error message from

badImplicit : 105 `LTE` 103
badImplicit  = getProof
  where getProof : {prf : LTE 105 103} -> {auto yep : decLTE 105 104 = Yes prf} -> 105 `LTE` 103
        getProof {prf} = prf

was

     When elaborating right hand side of badImplicit:
     When elaborating argument yep to LT.badImplicit, getProof:
             Can't solve goal 
                     decLTE (fromInteger 105) (fromInteger 104) = Yes prf

You have now seen a few ways of getting Idris to find proofs. Hopefully, you’ve also gotten a bit more understanding of how Idris works while compiling your source file.

This post sprang out of a discussion on a Github issue with Nicola Botta. The discussion of error messages is improved as the result of a suggestion from Anthony Cowley. Thanks, Nicola and Anthony!

New tutorial on bidirectional typing rules

I’ve just written a new tutorial on bidirectional typing rules. It’s available from my tutorials page at ITU. The PDF can also be downloaded directly.

Tutorials on my ITU page

In case you’re interested in programming languages and you’re working from Peter Sestoft’s book Programming Language Concepts, I’ve just posted some supplementary materials to my ITU page. It consists of a tutorial for converting from regular expressions to discrete finite automata via NFAs and a tutorial on constructing typing derivations in the particular miniature ML-like language used in the book.

You can find them on my tutorials page.

Using Scala libraries with new compiler versions

I recently needed to use Scalacheck with a milestone release of the compiler, and it took me a little while to wrap my head around the infrastructure. To preserve the steps for prosperity, I’m documenting them here, as I couldn’t find it described anywhere.

The first step is to understand the relationship between the various parts:

SBT
SBT is responsible for compiling your project, running tests, and generally performing the same role as make or ant. It also uses Ivy for retrieving library dependencies.
Ivy
Apache Ivy provides an infrastructure for dependency management. Libraries exist in repositories, similar to Debian package repositories, and they are available in multiple versions.

When you add library dependencies in your project’s SBT configuration, it will find the code by first looking in your local Ivy repository, then in any remote repositories that are configured. In other words, even if you don’t have access to the remote repositories, you can still put libraries in your local repository to be found by SBT.

Libraries such as Scalacheck that are intended to be used with multiple versions of Scala will typically use the crossScalaVersions setting to build against multiple compiler versions. The workflow to get a library working is, then:

  1. Get ahold of the source code to the library
  2. In build.sbt, find the crossScalaVersions setting. Add the desired version of Scala.
  3. Run sbt. At the command line, type + test. This will compile the project for each configured version of Scala and run the tests.
  4. Assuming the project passes, run + publish-local to push the compiled version

Now, the library will be available in your other projects.

Make Ensime work with newer versions of SBT

If you try to use Ensime with newer versions of SBT, you’ll find that SBT grows to fill available memory and then crashes. This is because the latest released version of Ensime assumes the project format of an old version of SBT. To fix this, just add the following to your .emacs after you require Ensime:

(defun ensime-sbt-project-dir-p (path)
  "Does a build.sbt or a project/build.properties exists in the given path."
  (or (file-exists-p (concat path "/build.sbt"))
      (file-exists-p (concat path "/project/build.properties"))))

The problem occurs because Ensime looks for your project directory by recursively going to parent directories until it sees something that looks like an SBT project. Eventually, it ends up in root, and SBT crashes while trying to recursively enumerate all the files in the current directory. Redefining this function to also look for newer project formats fixes the issue.

Emacs code for fetching a bibliography in BibTeX form from CiteULike

I typically manage my references with CiteULike. My typical workflow is to add the articles that I’m interested in, and then while writing, keep a complete copy of the exported BibTeX next to the LaTeX files. This gets tedious to maintain by hand, so I wrote this little chunk of elisp to download the file and name it the way I do.

To use, simply place it in a file called “citeulike.el” on your load-path and add “(require ‘citeulike)” to your .emacs. Make sure to look at the customization options. Then, run “M-x citeulike-download-bibtex” in your LaTeX buffer.

;;;; citeulike.el --- Getting things from CiteULike while working in AUCTeX
;;
;;; Copyright (C) 2011 David Christiansen
;;
;;; Author: David Christiansen 
;;
;; This program is free software: you can redistribute it and/or
;; modify it under the terms of the GNU General Public License as
;; published by the Free Software Foundation; either version 3 of the
;; License, or (at your option) any later version.
;;
;; This program is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;; General Public License for more details.
;;
;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs; see the file COPYING.  If not, write to the
;; Free Software Foundation, Inc., 59 Temple Place - Suite 330,
;; Boston, MA 02111-1307, USA.
;;
;;; Commentary:
;; Purpose:
;;
;; To make downloading an updated copy of references from CiteULike easier.
;;
;;; Code:

(provide 'citeulike)

(defgroup citeulike nil
  "Options for CiteULike support"
  :prefix 'citeulike)

(defcustom citeulike-username ""
  "CiteULike username, used for downloading BibTeX exports."
  :type 'string
  :group 'citeulike)

(defcustom citeulike-export-username-prefix nil
  "Whether to add a username prefix to keys"
  :type '(choice (const :tag "Yes" t)
                 (const :tag "No" nil))
  :group 'citeulike)

(defcustom citeulike-export-key-type 0
  "What kind of keys to put in the exported BibTeX"
  :type '(choice (const :tag "Prefer personal key; otherwise use numeric key" 0)
                 (const :tag "Prefer personal key; otherwise use AuthorYearTitle key" 4)
                 (const :tag "Prefer numeric keys" 1)
                 (const :tag "Only export articles with a personal key" 2)
                 (const :tag "Export both keys" 3))
  :group 'citeulike)

(defcustom citeulike-export-include-amazon-link nil
  "Include Amazon links for books in exported BibTeX"
  :type '(choice (const :tag "Yes" t)
                 (const :tag "No" nil))
  :group 'citeulike)

(defcustom citeulike-export-clean-urls t
  "Escape URLs for BibTeX?"
  :type '(choice (const :tag "Escaped URLs" t)
                 (const :tag "Don't escape URLs" nil))
  :group 'citeulike)

(defcustom citeulike-export-smart-wrap 0
  "Work around BibTeX capitalization"
  :type '(choice (const :tag "Don't wrap" 0)
                 (const :tag "Smart wrap whole field" 1)
                 (const :tag "Smart wrap individual words" 2))
  :group 'citeulike)

(defun citeulike-export-url ()
  (concat "http://www.citeulike.org/bibtex/user/"
          citeulike-username "?"
          "do_username_prefix=" (if citeulike-export-username-prefix "1" "0") "&"
          "key_type=" (number-to-string citeulike-export-key-type) "&"
          "incl_amazon=" (if citeulike-export-include-amazon-link "1" "0") "&"
          "clean-urls=" (if citeulike-export-clean-urls "1" "0") "&"
          "smart_wrap=" (number-to-string citeulike-export-smart-wrap)))

(defun citeulike-download-bibtex ()
  (interactive)
  (let* ((extension (file-name-extension buffer-file-name))
         (rest (file-name-sans-extension buffer-file-name))
         (target (if (equal extension "tex")
                     (concat rest ".bib")
                   (read-file-name "Filename to save: "))))
    (if (or (equal citeulike-username "") (not (stringp citeulike-username)))
        (error "No valid CiteULike username is configured")
      (let* ((command (concat "wget \"" (citeulike-export-url) "\""
                              " -O " target))
             (final-command (read-string "Command: " command)))
        (shell-command final-command)))))

Loading the current buffer in F#

My primary editor for use with F# is Emacs. One feature that I missed in F# mode was the ability to load the file corresponding to the current buffer. Here’s some elisp to do it. Put this in your .emacs file, and then when you are editing an F# module you can load it at the toplevel by pressing C-c C-f.

(defun fsharp-load-buffer-file ()
  "Load the filename corresponding to the present buffer in F# with #load"
  (interactive)
  (require 'inf-fsharp)
  (let* ((name buffer-file-name)
         (command (concat "#load \"" name "\"")))
    (when (buffer-modified-p)
      (when (y-or-n-p (concat "Do you want to save \"" name "\" before loading it? "))
        (save-buffer)))
    (save-excursion (fsharp-run-process-if-needed))
    (save-excursion (fsharp-simple-send inferior-fsharp-buffer-name command))))
(add-hook 'fsharp-mode-hook
          (lambda ()
            (define-key fsharp-mode-map "\C-c\C-f" 'fsharp-load-buffer-file)))

Hop Harvest 2010

Well, it’s just about time we get another beer post in here.

My girlfriend’s sister’s family has a hop plant in their back yard (Brewer’s Gold).  Last year, which was its first year, it yielded about 20 grams of dried cones.  This year, it looks as though it will be a bit more. Hops are such an amazing plant – everything in this picture is from just one plant!

I don’t know exactly how much their will be when it’s done drying, but the plant definitely outdid itself from last year.  I look forward to getting some good beer out of this!

Gammel vare i ny indpakning

I regeringen og Dansk Folkepartis nye udlændingeaftale er der kommet
et pointsystem, hvor udlændinge, som ønsker en tidsubegrænset
opholdstilladelse, skal opnå 100 point ved at demonstrere deres
integration i det danske samfund på forskellige måder.  Før i tiden
behøvede man at opholde sig i Danmark i 7 år, eller 5 år hvis man
havde fuldtidsjob, og bestå nogle prøver.  Nu kan man få
tidsubegrænset opholdstilladelse allerede efter 4 år.

Umiddelbart lyder forslaget meget godt – indvandrere er belønnet for
at integrere sig, og der er mulighed for hurtigere at blive
selvstændig.  Desværre viser en nærlæsning af aftalen, at regeringen
og Dansk Folkeparti stadig lider af akut betontænkning i forhold til
indvandring og integration.

Den største grund til at bruge point er, at man vil kunne være
fleksibel og tage højde for forskellige menneskers livssituationer.
Hvis man er i stand til at integrere sig hurtigt på nogle måder og
ikke på andre, så behøver man ikke blive hængt op på én ting, som man
ikke kan.  Desværre er pointsystemet i den nye aftale skruet sådan
sammen, at det har karakter af den gamle ufleksible tænkning.

70 ud af de 100 nødvendige point kommer nemlig i én samlet pakke.
Uden disse 70 point, er der ikke mulighed for at nå 100.  Denne pakke
er ret omfattende.  Bl.a. skal man bestå en danskprøve og en skriftlig
prøve om dansk historie og samfund. Desuden skal man skal have haft
fuldtidsarbejde i 3½ ud af de seneste 4 år og fortsat have
fuldtidsarbejde på ansøgningstidspunktet.  De resterende 30 point
kommer i blokke på 15 point, og inkluderer fine ting som frivilligt
arbejde for almennyttige foreninger og højere sprogprøver.

Reelt er der tale om et system, hvor man er stillet en lang række
ufravigelige krav plus en liste over fire krav, hvorfra man skal vælge
to.  Her er der ikke nytænkning – det er blot en gammel vare i ny
indpakning.