Filling out source locations in Idris

The mainline branch of Idris now has support for source location reflection. This means that Idris code can be informed about which line and column of which file it occurs on. In this post, I’ll walk you through how this can be used for implementing programmer tools, such as a facility similar to Haskell’s error function. Then, I’ll briefly discuss how the feature is implemented and the implications for referential transparency and claims of purity.

Note that I’m not claiming any major innovation here – this blog post exists only to explain a feature that I took from scala-virtualized and implemented in Idris.

Terminating with an error

The error function stops execution immediately, printing a user-supplied message. Because it will never return, it should inhabit every type, but it should most assuredly not pass the totality checker.

It is straightforward to implement error using Idris’s unsafe features. The C FFI can be used to call exit, terminating the program, and unsafePerformIO allows us to call arbitrary C functions in a pure context. The built-in primitive believe_me enables us to subvert the type checker. Combining these ingredients yields:

||| Terminate the program after printing a user-specified error message.
|||
||| @ message The error to print
partial
error : (message : String) -> a
error message =
  believe_me . unsafePerformIO $
    do putStrLn message
       exit 1

If this function is ever called, it means that something has gone wrong, and we have a bug in our program. However, the error itself may not be particularly helpful: there may be insufficient clues to determine why it occurred. Knowing where it occurred in the source code can make it much easier to debug the problem. Let’s extend error to display its call site along with the message.

Source locations for dynamic errors

Broadly speaking, Idris functions accept two kinds of arguments: explicit arguments, which the user directly provides, and implicit arguments, which the compiler is expected to work out for itself in most situations. An explicit argument can be provided implicitly by writing and underscore in its position, and an implicit argument can be provided explicitly by writing it in curly braces.

Implicit arguments are normally solved by unification. In other words, something else in the program will typically cause there to be precisely one possible value that would still type check, so the compiler is able to fill it out. When defining a function, you can specify an alternative way to fill out the implicit argument, such as by providing a default value or invoking the proof search system. Additionally, the default value need not be literal Idris code: it can also be a tactic script to be executed during elaboration, which is what we call the process of translating the high level Idris code that users write into the highly tedious, fully explicit core language that the compiler then proceeds to optimize and send off to the individual backends. Typical tactics include things like refine, which applies a name to the goal, generating subgoals as necessary for its arguments, and try, which attempts one tactic and falls back to another if the first fails.

As of a few days ago, Idris supports a new tactic called sourceLocation that fills out the argument using the source location at which it was invoked. This tactic can be used to improve the messages from error. Source locations are provided using the type Language.Reflection.SourceLocation, which is defined as follows:

data SourceLocation : Type where
  FileLoc : String -> (Int, Int) -> (Int, Int) -> SourceLocation

The first argument to the constructor is the source file name, and the second and third are the line and column at the beginning and end of the relevant expression. Because not all of the parser has been updated to save source spans, the second and third arguments are often identical for now, but this will eventually be fixed.

To improve the messages provided by error, we need to arrange for the sourceLocation tactic to be used to fill out an implicit source context argument. Then, we can destructure this implicit argument using ordinary pattern matching to recover the filename, line number, and column number to report. The updated error function (which is in the Idris library in the Debug.Error package) is:

||| Terminate the program after printing a user-specified error message.
|||
||| @ loc     The source location to display for the error
||| @ message The error to print
partial
error : {default tactics {sourceLocation} loc : SourceLocation} ->
        (message : String) ->
        a
error {loc = FileLoc filename (line, col) _} message =
  believe_me . unsafePerformIO $
    do let place = filename ++ " line " ++ show line ++ " column " ++ show col
       let message' = place ++ ": " ++ message
       putStrLn message'
       exit 1

The same basic technique can also be used to report source locations for errors in deeply embedded langauges that aren’t easily encoded in the type system. Note that higher-level features, such as an assertions system built on top of error, can provide loc explicitly, enabling them to override the displayed source code location with their own. This straightforwardly allows a fine-grained control over the level at which errors should be displayed.

Implementation

Idris’s expression elaborator is a recursive traversal of the expression. Elaboration occurs within a monad that resembles a tactic-based interactive theorem prover, with a present goal and list of assumptions, a stack of remaining goals to be solved, and tactics that manipulate this state. Smaller tatics can be composed to create larger tactics. For details, check out Edwin’s JFP paper from last year.

The high-level Idris AST already contains source code information, which is used to report static error messages such as unification errors. To implement sourceLocation, the already-existing source locations are simply used to generate Idris-level source locations. Because not every term resulting from the parser contains a source location, the closest containing location is passed recursively in the elaborator.

Purity

Does this change mean that Idris is no longer a pure language? After all, the following function will return a different value depending on where it is called:

getLocation : {default tactics {sourceLocation} loc : SourceLocation} -> SourceLocation
getLocation {loc} = loc

Even mere changes to whitespace could cause a program that uses getLoc to compute a radically different value! Clearly, the high-level Idris language is no longer referentially transparent.

However, this does not affect the purity of the uderlying core language. From the perspective of the core, when you move a call to getLoc to a new location, the term itself changes, and the constant value of loc is simply changed. While this feature could most assuredly be abused, it is surely no worse than type class resolution or proof search when it comes to referential transparency.


If you find a creative use for the sourceLocation tactic or if you use it in a serious manner, please let me know! I’d love to hear how it works for real-world users.

A Pretty-Printer that Says What It Means

Idris supports semantic highlighting of compiler output. In this post, I’ll sketch what the feature brings users, why it’s interesting, what inspired it, and how it’s implemented in the Idris compiler. Rather than spending a bunch of time describing a series of screenshots, here’s a quick video of it in action, inside of idris-mode for Emacs:

Note that all the interesting features there are implemented in the Idris compiler (specifically, the pretty-printer), and that Emacs does very little work to use these features. They are available for any editor, and partially at the command line REPL, in a manner that can be incrementally implemented to get more and more features.

I showed Simon Peyton Jones a quick demo at the Haskell Implementors’ Workshop in Gothenburg today, and he encouraged me to write it up, thinking that there might be more general interest. So here we go!

Context

The Idris compiler and REPL have two output modes: one intended for direct human interaction in a terminal, and one intended for integration into external tools. When in console mode, Idris emits ANSI codes in its output stream that place semantically relevant font information into the interpreter output. However, we’re limited by what terminals support. When working with more expressive output contexts, we can present much more than just fonts.

Output intended for other programs, when Idris is running in IDE mode, consists of S-expressions that represent the semantics of the output. Sometimes, the entire sense of the message can be a machine readable message, but other times annotated strings are sent. Examples of the first include a notification that a file was successfully type checked and a signal indicating that the user has completed a tactic proof, and examples of the second include error messages and REPL output. In this post, I’ll mostly focus on how annotated strings are generated, as it’s the most fun part. I’m focusing on the underlying ideas here, so I’m glossing over some details and type signatures may not match the real ones in the Idris code base.

Having colors in the output from the Idris interpreter was originally inspired by a demonstration of Julia at OPLSS 2013, by Leah Hanson. I was really impressed by how the colored output from the REPL was helpful in understanding the results and distinguishing between program output and expression values. I was also feeling envious of Agda’s nice semantic coloring in Emacs buffers.

The Pretty-Printer

All output from the Idris compiler is generated by the Idris pretty-printer. In this pretty-printer, we need to do two things: (1) generate strings from compiler data structures, and (2) maintain the link between regions of these strings and semantically interesting information about them. For example, the compiler needs to know that the “Nat” in “Z : Nat” is a type constructor and that “Z” is a data constructor, and furthermore that both really occur in the namespace “Prelude.Nat”, but are being displayed in a truncated form. That way, the full disambiguated name can be used to do things like looking up documentation.

It is not possible, however, to maintain the link between regions of text and semantic annotations during pretty-printing, because the whole reason we have a pretty-printer is to free us from the burden of performing specific text formatting. The pretty-printing library combinators generate an abstract type Doc, which can later be rendered to specific output contexts, such as terminals of different widths.

The approach taken in Idris to bridge this gap is to replace the type Doc with a type constructor Doc a, where the type a represents the type of the semantic annotations in the document. Then, a single additional combinator annotate :: a -> Doc a -> Doc a can be used to attach elements of the semantic annotation type to the document.

The Idris pretty-printer is based on wl-pprint, and its available as a mostly-API-compatible library on Hackage called annotated-wl-pprint. Please refer to the original library documentation or to Phil Wadler’s paper “A Prettier Printer” for the basics of the approach. Rendering, in wl-pprint, converts a Doc to a SimpleDoc, which is specialized to a particular output width. Then, various functions are available to convert a SimpleDoc to a String and to write it to an output strea in IO. In annotated-wl-pprint, both Doc and SimpleDoc are parameterized over annotations, and additional String-rendering functions are available that can use the semantic information.

The original rendering functions are still available in annotated-wl-pprint: they merely discard the annotations. However, two new rendering functions are available. The first, called displayDecorated, take a decorator function with type a -> String -> String as an argument, and uses this to rewrite each substring as it is generated. This function is used in Idris to generated console output. The second, called displaySpans, generates a pair consisting of the ordinary string representation as well as a list of text spans and their corresponding annotations. This is used in Idris to generate machine-readable IDE output.

Doc and SimpleDoc are Functors

The constructors of Doc a and SimpleDoc a are not exported, as they must maintain some invariants that the combinator language ensures. However, they have instances of Functor, which means that post-processing and rendering of user representations of the semantic annotations is just an fmap away. This is very useful in Idris, and I would like to thank Philippa Cowderoy for pointing out to me on IRC that pretty-printer documents are in fact functors.

For example, when the pretty-printer encounters a global name in a term, it simply saves its fully-qualified form in an annotation. Later, we can add the additional information (such as documentation summaries and type signatures) that is sent to IDEs. This enables the pretty-printer to be independent of the global definition context in Idris, enabling greater modularity. Additionally, the type system can be used to ensure that various processing steps have been carried out, although this could use some refactoring in Idris at the moment.

This structure enables a clean separation between the pretty-printing of textual output and the processing of semantic information. Additionally, the technique is also useful when displaying documentation, where parsed Markdown formatting information is inserted in the strings to be displayed, and to associate source locations in error messages with a declarative specification of the full filename and line and column numbers.

Active Terms in Idris

In the demo that I linked to at the top of this post, you can see editor commands to show the implicit arguments to already-printed terms and to normalize them on demand. This is implemented by annotating subterms in the output record with a serialized representation of their corresponding term in Idris’s core language, as well as some information about the names available in their scope. These terms are then attached as a semantic annotation. When these annotations are rendered for IDE clients, they are base64-encoded.

Idris’s IDE protocol has commands that can manipulate these encoded terms, pretty-printing them with various options, potentially after normalization. Naturally, these commands annotate their output with a serialized representation of the resulting term, allowing them to be used again.

Remaining

The current implementation of annotated-wl-pprint is lacking one important rendering function, which would allow actions in a monad to delimit output strings. This would enable the display of semantic colors on terminals in Windows, which don’t support ANSI color codes. Additionally, I’m quite sure that I can attach more information that would be useful.

Do you have any nifty ideas for idris-mode? Are you working on Idris support from another editor? Is there something in the above that should be done a better way? Do you need help using annotated-wl-pprint in your own project? Please don’t hesitate to get in touch!

New paper submission: “Type-Directed Elaboration of Quasiquotations: A High-Level Syntax for Low-Level Reflection”

I just finished a submission to IFL 2014. It’s a paper about Idris’s quasiquotations mechanism, which allow the use of high-level Idris syntax to describe low-level reflected terms, with the ability to escape from quotation in chosen areas and intentionally control the details of the representation. This is very much inspired by Lisp quasiquotation.

You can find the paper on my academic page. Because of the IFL submission process, I have ample time to make changes, so if I’ve made a mistake or failed to explain something clearly, I’d love to hear about it!

New podcast on type theory

I’m a cohost of a new pocast on type theory, The Type Theory Podcast. We just posted our first episode: an interview with Peter Dybjer on types and testing.

Implementing an Emacs programming language mode: beyond the basics

As one of the two primary authors of idris-mode, I’ve had to learn a fair bit about implementing Emacs modes. While I’m far from an expert on the subject, I do think that I’ve picked up a few ideas that are useful. There are already a number of good tutorials about the very basics, such as define-derived-mode, font-lock, and Emacs packages. What I haven’t been able to find is an answer to the question “what’s next?”. You can view this post as a kind of checklist for useful things to add to your mode.

I won’t go into great detail about how to do all of these things, because the documentation is typically quite good. The real problem is learning that there is documentation to be read!

Imenu

While many Emacs users turn off the menu bar entirely, making Imenu not particularly attractive, it pays to support it anyway. This is because lots of other features in Emacs use Imenu as a data source. For example, speedbar will use it to show an index of your file and which-func-mode uses it to determine what to show in your mode line.

Completion

These days, it is very straightforward to implement completion for your mode. Many older modes contain complicated code to show completion buffers, restoring window configurations afterwards. In Emacs 24, we have completion-at-point. Simply ensure that this is bound to something (often M-TAB), and then arrange for it to have enough information to present good completions

The variable completion-at-point-functions provides the raw data that completion-at-point will use to construct a completions buffer. The variable contains a list of 0-argument functions, each of which should return either nil or a collection of completion candidates. Emacs takes care of all the tedious organization of buffers. In idris-mode, we simply delegate to the compiler to provide completion candidates.

Eldoc

Eldoc is a minor mode for showing documentation strings in the echo area. By default, it supports Emacs Lisp, but it’s quite straightfoward to make it work for your language. All you need is a function that returns either a docstring for the symbol at point or nil. In the case of Idris, we already had a command to ask the compiler for a type signature, so it was especially easy. Once you have this function, just set eldoc-documentation-function to its name and place turn-on-eldoc-mode in your default mode hook.

Here’s the function from idris-mode:

(defun idris-eldoc-lookup ()
  "Support for showing type signatures in the modeline when there's a running Idris"
  (let ((signature (ignore-errors (idris-eval (list :type-of (idris-name-at-point))))))
    (when signature
      (with-temp-buffer
        (idris-propertize-spans (idris-repl-semantic-text-props (cdr signature))
          (insert (car signature)))
        (buffer-string)))))

All of the idris-propertize-spans stuff is to apply font information from the compiler.

Flycheck

Flycheck is a minor mode that runs quick external programs on the contents of your buffer, annotating the buffer with any warnings produced, without you needing to save the buffer. It is fairly straightforward to define a checker for Flycheck, and it can be added to your mode without disturbing users who don’t use Flycheck by wrapping it up in an eval-after-load. Here is the entire checker from idris-mode, written by Brian McKenna:

;;;###autoload
(eval-after-load 'flycheck
  '(progn
     (flycheck-define-checker idris
       "An Idris syntax and type checker."
       :command ("idris" "--check" "--nocolor" "--warnpartial" source)
       :error-patterns
       ((warning line-start (file-name) ":" line ":" column ":Warning - "
                 (message (and (* nonl) (* "\n" (not (any "/" "~")) (* nonl)))))
        (error line-start (file-name) ":" line ":" column ":"
               (message (and (* nonl) (* "\n" (not (any "/" "~")) (* nonl))))))
       :modes idris-mode)

     (add-to-list 'flycheck-checkers 'idris)))

Customize

In my opinion, it’s important to have variables that control settings for a mode be separate from other variables, so that users can see what they’re intended to change and what they are not. A good way to indicate that a variable is intended for user customization is to make it available through customize. When thinking about how to customize features of your mode, consider how to organize them into groups so that they are easy to find.

If you define your mode as a derive mode (and you should), remember to pass the :group option to define-derived-mode. This makes the standard command customize-mode work.

Additionally, your mode should really avoid using built-in faces. Instead, define your own faces with defface and set them to inherit from the built-in faces that you were thinking of using. This lets your users decide how they want to see your mode. Good defaults are also important, to the extent that you can have them, so pay attention to built-in faces that you can inherit from, which will make your mode automatically conform to the user’s color theme. Perhaps the biggest source of complaints about idris-mode has come from default colors that don’t look good for users, where I was unable to find a built-in face to inherit from that makes sense. Surprisingly many Emacs users don’t know that they can easily customize a face.

Prog mode

Remember to derive your mode from prog-mode. If you don’t do that, at least make sure that you call prog-mode-hook. This allows Emacs users to set things up for every programming language mode, while not using these settings in other places. This is useful for things like showing trailing whitespace or turning on line numbering.

Common Lisp features

If you have experience writing Common Lisp code, you’re in for a disappointment when you write Emacs Lisp. There are many, many useful features, such as destructuring-bind, that Emacs Lisp just doesn’t support. Furthermore, when these features are available, you either have to use a compile-time-only macro package (cl) or an ugly prefixed version (cl-lib). I recommend the latter, as you avoid having to worry about which features are macros and which are functions. Unfortunately, you will probably end up with cl loaded by something in your initialization file, so Emacs will probably not complain if you accidentally use cl names instead of cl-lib names, which can lead to problems for your users. Consider using cl-lib-highlight to get warnings in your Lisp buffers when you use un-prefixed cl symbols.

All the rest

After you’ve got these things done, then it’s time to start attacking all the little things, like getting the parsing working for motion commands such as forward-sexp, implementing fill-paragraph for comments and docstrings, and so forth. I haven’t done all these things yet, but I’ll write it up if I find a nice strategy.

Techniques for using decision procedures in Idris

This post is a literate Idris file, so first let’s get some bureaucracy out of the way:

module LT
%default total

When writing Idris code, we sometimes need a proof as an argument to a function. In many cases, these proofs can be constructed automatically by a decision procedure. Here, I’m using the term "decision procedure" for some property p to refer to a function that returns something in the type Dec p.

An instance of Dec p is either a proof of p or a proof that p is an absurdity. The datatype is defined as follows:

data Dec : Type -> Type where
  Yes : p -> Dec p
  No : (p -> _|_) -> Dec p

As an example, take the following type from the Idris library that represents a proof that one natural number is less than or equal to another:

data LTE : Nat -> Nat -> Type where
  lteZero : LTE 0 m
  lteSucc : LTE n m -> LTE (S n) (S m)

The first constructor, lteZero, should be interpreted as an assertion that zero is less than or equal to any natural number. The second, lteSucc, is an assertion that if \(n \leq m\), then \(n+1 \leq m+1\). It happens to be the case that, for any two natural numbers, it is decidable whether the first is less than or equal to the second.

To prove this, we start with two simple lemmas. The first states that it is not the case that the successor of any number is less than or equal to zero:

notLTESZ : LTE (S k) Z -> _|_
notLTESZ lteZero impossible

Additionally, we show that, if \(k \not\leq j\), then \(k+1 \not\leq j+1\):

notLTE_S_S : (LTE k j -> _|_) -> LTE (S k) (S j) -> _|_
notLTE_S_S nope (lteSucc x) = nope x

We are now in a position to write the decision procedure, by examining each case and using our lemmas to produce the appropriate contents for the No constructor. If the details don’t make sense, then please fire up Idris, replace some right-hand sides with metavariables, and examine the proof context.

decLTE : (n,m : Nat) -> Dec (n `LTE` m)
decLTE Z m = Yes lteZero
decLTE (S k) Z = No notLTESZ
decLTE (S k) (S j) with (decLTE k j)
  decLTE (S k) (S j) | (Yes prf) = Yes (lteSucc prf)
  decLTE (S k) (S j) | (No contra) = No $ notLTE_S_S contra

Now that these details are out of the way, we can get to the main point of this blog post: demonstrating various techniques for getting Idris to fill out these proofs for us.

The most straightforward way is to use the interactive editing features to get Idris to construct the proof term directly. For example, if we want to show that \(3 \leq 5\), then we need only create a definition with type 3 `LTE` 5:

ok : 3 `LTE` 5

We can then use C-c C-s in idris-mode to cause it to become

ok : 3 `LTE` 5
ok = ?ok_rhs

Using the proof search command C-c C-a on the metavariable gives us

ok : 3 `LTE` 5
ok = lteSucc (lteSucc (lteSucc lteZero))

and the decision procedure simply isn’t necessary. This approach can make our programs ugly, but it is straightforward and direct. We need not actually show the resulting term, however. In Idris, tactic scripts are allowed as the right-hand side of definitions. So we can simply invoke the search tactic explicitly, rather than using it through the compiler’s editor interface, and the resulting term does not need to occur in the source file. This can also be more robust in the face of changes to the definition of LTE. The tactic-based definition is:

okTactic : 3 `LTE` 5
okTactic = proof search

Here, proof begins a tactic script, and search is the tactic to be invoked.

Using Idris’s built-in proof search has two major limitations:

  1. It cannot solve more "interesting" goals, in which a simple recursive application of constructors is insufficient
  2. It fails to find large goals, as the recursion depth is limited. For example, it cannot solve 103 `LTE` 105.

What we really want to do is use decLTE to find the term.

One non-solution is to have the right-hand side of the definition perform a case analysis on the result of a call to decLTE, and then return the contents of the Yes case. Because we know that three is, in fact, less than five, we will never need the No case:

okCase : 3 `LTE` 5
okCase = case decLTE 3 5 of
           Yes prf => prf

Unfortunately, the Idris compiler needs to be convinced of the fact that No cannot occur. We do this by showing that the No case could cause the empty type to be inhabited, and then use FalseElim. Unfortunately, doing this requires actually proving the property again:

okCase : 3 `LTE` 5
okCase = case decLTE 3 5 of
           Yes prf => prf
           No nope => FalseElim . nope $ lteSucc (lteSucc (lteSucc lteZero))

So this method is a non-starter.

The next method is to cause the solver for implicit arguments to execute the decision procedure while filling out an implicit argument for the proof, and then return this implicit argument. The definition okImplicit demonstrates this technique:

okImplicit : 103 `LTE` 105
okImplicit  = getProof
  where getProof : {prf : LTE 103 105} -> {auto yep : decLTE 103 105 = Yes prf} -> 103 `LTE` 105
        getProof {prf} = prf

All of the work happens in getProof, which is in a where block to keep the type signature of okImplicit clean. In the definition of getProof, the implicit argument prf is the one that will be filled out with the proof, and then returned. The argument yep is an implicit proof that the result of executing the decision procedure will be Yes prf – in other words, that it will succeed, and the thing it succeeds with will unify with prf. The auto keyword causes Idris to fill out the argument with refl, the constructor of proofs of equality.

While the okImplicit technique leads to noisy boilerplate in type signatures, it also tends to cause error messages that are easy to understand. I’ve had a lot of luck using it for embedded DSLs with some kind of side condition (like "these queries must have disjoint schemas").

The final technique that I’d like to demonstrate uses a dependently-typed extractor of proofs from Dec. The type of the function getYes does case-analysis on the result of a decision procedure. If the result was Yes, then getYes will return the contents – an instance of p. Otherwise, it returns unit, which contains no interesting information. Note that the result of the case expression is a type — either the type parameter to Dec or the unit type.

getYes : (res : Dec p) -> case res of { Yes _ => p ; No _ => () }

The body of getYes performs a case analysis on the result of the decision, either returning a proof or a trivial value, in accordance with the type. Note that Idris’s interactive editing features were able to write this code entirely – the only thing I needed to do was tell it to case split on the argument. Agda, which has more advanced interactive editing features, could probably write the entire body by passing Agsy the -c option, which enables case splitting.

getYes (Yes prf) = prf
getYes (No contra) = ()

Now, getYes can be used on the right-hand side of our definition to extract the proof:

okYes :  3 `LTE` 5
okYes = getYes (decLTE 3 5)

While this approach is clean, the error messages can be somewhat unpleasant if Idris is asked to prove a falsehood. The following code:

oops :  5 `LTE` 3
oops = getYes (decLTE 5 3)

results in this unification error:

     When elaborating right hand side of oops:
     Can't unify
             case block in getYes (LTE (fromInteger 5) (fromInteger 3)) (decLTE (fromInteger 5) (fromInteger 3)) (decLTE (fromInteger 5) (fromInteger 3))
     with
             LTE 5 3
     
     Specifically:
             Can't unify
                     case block in getYes (LTE (fromInteger 5) (fromInteger 3)) (decLTE (fromInteger 5) (fromInteger 3)) (decLTE (fromInteger 5) (fromInteger 3))
             with
                     LTE 5 3

whereas the error message from

badImplicit : 105 `LTE` 103
badImplicit  = getProof
  where getProof : {prf : LTE 105 103} -> {auto yep : decLTE 105 104 = Yes prf} -> 105 `LTE` 103
        getProof {prf} = prf

was

     When elaborating right hand side of badImplicit:
     When elaborating argument yep to LT.badImplicit, getProof:
             Can't solve goal 
                     decLTE (fromInteger 105) (fromInteger 104) = Yes prf

You have now seen a few ways of getting Idris to find proofs. Hopefully, you’ve also gotten a bit more understanding of how Idris works while compiling your source file.

This post sprang out of a discussion on a Github issue with Nicola Botta. The discussion of error messages is improved as the result of a suggestion from Anthony Cowley. Thanks, Nicola and Anthony!

New tutorial on bidirectional typing rules

I’ve just written a new tutorial on bidirectional typing rules. It’s available from my tutorials page at ITU. The PDF can also be downloaded directly.

Tutorials on my ITU page

In case you’re interested in programming languages and you’re working from Peter Sestoft’s book Programming Language Concepts, I’ve just posted some supplementary materials to my ITU page. It consists of a tutorial for converting from regular expressions to discrete finite automata via NFAs and a tutorial on constructing typing derivations in the particular miniature ML-like language used in the book.

You can find them on my tutorials page.

Using Scala libraries with new compiler versions

I recently needed to use Scalacheck with a milestone release of the compiler, and it took me a little while to wrap my head around the infrastructure. To preserve the steps for prosperity, I’m documenting them here, as I couldn’t find it described anywhere.

The first step is to understand the relationship between the various parts:

SBT
SBT is responsible for compiling your project, running tests, and generally performing the same role as make or ant. It also uses Ivy for retrieving library dependencies.
Ivy
Apache Ivy provides an infrastructure for dependency management. Libraries exist in repositories, similar to Debian package repositories, and they are available in multiple versions.

When you add library dependencies in your project’s SBT configuration, it will find the code by first looking in your local Ivy repository, then in any remote repositories that are configured. In other words, even if you don’t have access to the remote repositories, you can still put libraries in your local repository to be found by SBT.

Libraries such as Scalacheck that are intended to be used with multiple versions of Scala will typically use the crossScalaVersions setting to build against multiple compiler versions. The workflow to get a library working is, then:

  1. Get ahold of the source code to the library
  2. In build.sbt, find the crossScalaVersions setting. Add the desired version of Scala.
  3. Run sbt. At the command line, type + test. This will compile the project for each configured version of Scala and run the tests.
  4. Assuming the project passes, run + publish-local to push the compiled version

Now, the library will be available in your other projects.

Make Ensime work with newer versions of SBT

If you try to use Ensime with newer versions of SBT, you’ll find that SBT grows to fill available memory and then crashes. This is because the latest released version of Ensime assumes the project format of an old version of SBT. To fix this, just add the following to your .emacs after you require Ensime:

(defun ensime-sbt-project-dir-p (path)
  "Does a build.sbt or a project/build.properties exists in the given path."
  (or (file-exists-p (concat path "/build.sbt"))
      (file-exists-p (concat path "/project/build.properties"))))

The problem occurs because Ensime looks for your project directory by recursively going to parent directories until it sees something that looks like an SBT project. Eventually, it ends up in root, and SBT crashes while trying to recursively enumerate all the files in the current directory. Redefining this function to also look for newer project formats fixes the issue.