type system for the win: phantom types

Today I want to show you some cool stunts you can do with single-case ADTs to help you make invalid states unrepresentable.

To be precise I want to talk a bit about Phantom Types – those are parametrised types where a type-variable only appears on the left-hand side (in the type-definition) but not on the right-hand side (the type-constructor). You can use this to put some kind of type-label on you types to help you keep things separated.

This has the advantage that (in contrast to ADTs with several cases) you can extent it without having to change the type.

text encodings

To make this a bit more practical the example throughout this short essay will be encoding of strings as byte-arrays and hashing those with a crypto-hash. So there are basically a lot of different byte-arrays you can operate on, but if you choose the wrong ones (say you try to compare a SHA1 hash with an MD5 hash) you are usually out of luck (better have some tests or you might have a nasty bug). Here you can use the labeling-technique with phantom-types to let the compiler help you out.

So the first example for a phantom type is this:

type 'a Bytes = Bytes of byte array

As you can see the type-variable 'a (the future label) only appears on the left side (so you can create inhabitants just by Bytes [| ... |]). Of course we need some labels so let’s differentiate between two encodings: Ascii and Utf8 and define some functions to encode strings into the right type:

type Ascii = Ascii
type Utf8  = Utf8

let ascii : string -> Ascii Bytes =
    System.Text.Encoding.ASCII.GetBytes >> Bytes

let utf8 : string -> Utf8 Bytes =
    System.Text.Encoding.UTF8.GetBytes >> Bytes

With this you can already have some fun and catch errors in FSI:

let text1 = ascii "Hello";;
 >> val text1 : Ascii Bytes = Bytes [|...

let text2 = utf8 "Hello";;
 >> val text2 : Utf8 Bytes = Bytes [|...

// compare with the right encoding
text1 = ascii "Hello";;
 >> val it : bool = true

// wrong spelling
text1 = ascii "HellO";;
 >> val it : bool = false

// different encodings
text1 = text2;;
 >> error: Type mismatch. Expecting a
        Ascii Bytes    
    but given a
        Utf8 Bytes    
    The type 'Ascii' does not match the type 'Utf8'

As you can see you equality just works but if you try to compare strings with different encodings the compiler will not let you – nice!

You can of course write generic functions for this too – here is a concatenation function that will not let you concat different encoded byte-arrays:

let (++) (Bytes bs : 'a Bytes) 
         (Bytes bs' : 'a Bytes) 
         : 'a Bytes = 
    Array.append bs bs' |> Bytes

// matching encodings
ascii "Hello" ++ ascii "World";;
 >> val it : Ascii Bytes =
      Bytes [|...

// different encodings
ascii "Hello" ++ utf8 "World";;
 >> error: This expression was expected to have type
        Ascii Bytes    
    but here has type
        Utf8 Bytes    

crypto hashes

No let’s add some types and functions to do some crypto-hashes:

type Hashed<'alg,'a> = Hashed of byte array 

Here I create another phantom-type that has labels for the hash-algorithm and the encodings of the hased types. Let’s add two hashes – MD5 and SHA1:

type Sha1 = Sha1
type Md5  = Md5

let hashSha1 (Bytes bs : Bytes<'a>) : Hashed<Sha1,'a> =
    use sha1 = System.Security.Cryptography.SHA1.Create ()
    sha1.ComputeHash bs |> Hashed

let hashMd5 (Bytes bs : Bytes<'a>) : Hashed<Md5,'a> =
    use md5 = System.Security.Cryptography.MD5.Create ()
    md5.ComputeHash bs |> Hashed

As you can see this copies the encoding from the argument into the resulting type – now the compiler will help you if you picked the wrong encodings:

let hash1 = hashMd5 (ascii "Hello");;
 >> val hash1 : Hashed<Md5,Ascii> = ..

// will the hashes fit?
hash1 = hashMd5 (ascii "Hello");;
 >> val it : bool = true

// what if the text is wrong?
hash1 = hashMd5 (ascii "HellO");;
 >> val it : bool = false

// what about the wrong encoding?
hash1 = hashMd5 (utf8 "Hello");;
 >> error: This expression was expected to have type
        Ascii Bytes    
    but here has type
        Utf8 Bytes    
 
// what about the wrong algorithm?
hash1 = hashSha1 (ascii "Hello");;
 >> error: Type mismatch. Expecting a
        Hashed<Md5,Ascii>    
    but given a
        Hashed<Sha1,'a>    
    The type 'Md5' does not match the type 'Sha1'

Nice huh?

checking the hashes … reaching the limits

Now let’s write something to check hashed values:

let checkSha1Hash (hs : Hashed<Sha1,'a>) 
                  (bs : 'a Bytes) : bool =
    hashSha1 bs = hs

let checkMd5Hash (hs : Hashed<Md5,'a>) 
                 (bs : 'a Bytes) : bool =
    hashMd5 bs = hs

It’s of course rather easy – but it’s a bit of a shame that we have to use different functions (name) when the types tell the complete story.

Well we are not totally out of luck – while we sadly cannot have functions with the same name in a module we can have them as static methods on a class (although we have to use tupled arguments):

type Hashes () =
    static member check ( hs : Hashed<Sha1, 'a>
                        , bs : 'a Bytes) =
        checkSha1Hash hs bs
    static member check ( hs : Hashed<Md5, 'a>
                        , bs : 'a Bytes) =
        checkMd5Hash hs bs

This indeed still works fine:

let hash1 = hashMd5 (ascii "Hello");;
 >> val hash1 : Hashed<Md5,Ascii> = ..

// check it
Hashes.check (hash1, ascii "Hello");;
 >> val it : bool = true

// wrong text:
Hashes.check (hash1, ascii "HellO");;
 >> val it : bool = false

// wrong type:
Hashes.check (hash1, utf8 "HellO");;
 >> error: No overloads match for method 'check'. ...

When we try to do the same for the hash itself we are out of luck:

...
static member hash (bs : 'a Bytes) : Hashed<Sha1,'a> =
   hashSha1 bs
   
static member hash (bs : 'a Bytes) : Hashed<Md5,'a> =
   hashMd5 bs

 >>> Error: Duplicate method. 
     The method 'hash' has the same name 
     and signature as another method in this type.

What we would need here is overloading based on the generic types – more or less what Haskell gives you with it’s type-classes.

There are advanced techniques using static-constraints in F# where you can simulate those but for now I just wanted to hint at the limits.

conclusion

I hope I could show a little trick to use F#s type-system for great profit – phantom types of course have many more use-cases, some of which you can read about in the Haskell-Wiki (make sure to follow the links there).

a possible Haskell implementation

Just in case you are interested how this could look in Haskell here is a simple translation using typeclasses:

import qualified Data.ByteString as BS

import qualified Data.Encoding as Enc
import Data.Encoding.ASCII (ASCII (..))
import Data.Encoding.UTF8 (UTF8 (..))

import qualified Crypto.Hash.SHA1 as CSha1
import qualified Crypto.Hash.MD5 as CMd5


newtype Bytes l = 
    Bytes { getBytes :: BS.ByteString }
    deriving (Show, Eq)

data Ascii
data Utf8

class Encoding l where
  encode :: String  -> Bytes l
  decode :: Bytes l -> String

instance Encoding Ascii where
  encode = Bytes . Enc.encodeStrictByteString ASCII
  decode = Enc.decodeStrictByteString ASCII . getBytes

instance Encoding Utf8 where
  encode = Bytes . Enc.encodeStrictByteString UTF8
  decode = Enc.decodeStrictByteString UTF8 . getBytes

newtype Hashed alg enc = 
    Hashed { getHashBytes :: BS.ByteString }
    deriving (Show, Eq)

data Md5
data Sha1

class Hash alg where
  hash :: Bytes enc -> Hashed alg enc

instance Hash Md5 where
  hash = Hashed . CMd5.hash . getBytes

instance Hash Sha1 where
  hash = Hashed . CSha1.hash . getBytes

  • C

    Awesome article. Just a quick comment, in the first declaration you used let utf8 : string and later dropped the 8 to just use utf (for example at the end of the text encodings paragraph). Still, great article helping me learn more about the power of the type system. Thanks much!

    • Hi and thanks for your comment and for pointing this out.

      Indeed I first used just `utf` and later decided to add the 8 for clarity but of course forgot to edit all the occurrences – I hope I got em all now.