Today I want to show you some cool stunts you can do with single-case ADTs to help you make invalid states unrepresentable.
To be precise I want to talk a bit about Phantom Types – those are parametrised types where a type-variable only appears on the left-hand side (in the type-definition) but not on the right-hand side (the type-constructor). You can use this to put some kind of type-label on you types to help you keep things separated.
This has the advantage that (in contrast to ADTs with several cases) you can extent it without having to change the type.
text encodings
To make this a bit more practical the example throughout this short essay will be encoding of strings as byte-arrays and hashing those with a crypto-hash. So there are basically a lot of different byte-arrays you can operate on, but if you choose the wrong ones (say you try to compare a SHA1 hash with an MD5 hash) you are usually out of luck (better have some tests or you might have a nasty bug). Here you can use the labeling-technique with phantom-types to let the compiler help you out.
So the first example for a phantom type is this:
type 'a Bytes = Bytes of byte array
As you can see the type-variable 'a
(the future label) only appears on the left side (so you can create inhabitants just by Bytes [| ... |]
). Of course we need some labels so let’s differentiate between two encodings: Ascii and Utf8 and define some functions to encode strings into the right type:
type Ascii = Ascii type Utf8 = Utf8 let ascii : string -> Ascii Bytes = System.Text.Encoding.ASCII.GetBytes >> Bytes let utf8 : string -> Utf8 Bytes = System.Text.Encoding.UTF8.GetBytes >> Bytes
With this you can already have some fun and catch errors in FSI:
let text1 = ascii "Hello";; >> val text1 : Ascii Bytes = Bytes [|... let text2 = utf8 "Hello";; >> val text2 : Utf8 Bytes = Bytes [|... // compare with the right encoding text1 = ascii "Hello";; >> val it : bool = true // wrong spelling text1 = ascii "HellO";; >> val it : bool = false // different encodings text1 = text2;; >> error: Type mismatch. Expecting a Ascii Bytes but given a Utf8 Bytes The type 'Ascii' does not match the type 'Utf8'
As you can see you equality just works but if you try to compare strings with different encodings the compiler will not let you – nice!
You can of course write generic functions for this too – here is a concatenation function that will not let you concat different encoded byte-arrays:
let (++) (Bytes bs : 'a Bytes) (Bytes bs' : 'a Bytes) : 'a Bytes = Array.append bs bs' |> Bytes // matching encodings ascii "Hello" ++ ascii "World";; >> val it : Ascii Bytes = Bytes [|... // different encodings ascii "Hello" ++ utf8 "World";; >> error: This expression was expected to have type Ascii Bytes but here has type Utf8 Bytes
crypto hashes
No let’s add some types and functions to do some crypto-hashes:
type Hashed<'alg,'a> = Hashed of byte array
Here I create another phantom-type that has labels for the hash-algorithm and the encodings of the hased types. Let’s add two hashes – MD5 and SHA1:
type Sha1 = Sha1 type Md5 = Md5 let hashSha1 (Bytes bs : Bytes<'a>) : Hashed<Sha1,'a> = use sha1 = System.Security.Cryptography.SHA1.Create () sha1.ComputeHash bs |> Hashed let hashMd5 (Bytes bs : Bytes<'a>) : Hashed<Md5,'a> = use md5 = System.Security.Cryptography.MD5.Create () md5.ComputeHash bs |> Hashed
As you can see this copies the encoding from the argument into the resulting type – now the compiler will help you if you picked the wrong encodings:
let hash1 = hashMd5 (ascii "Hello");; >> val hash1 : Hashed<Md5,Ascii> = .. // will the hashes fit? hash1 = hashMd5 (ascii "Hello");; >> val it : bool = true // what if the text is wrong? hash1 = hashMd5 (ascii "HellO");; >> val it : bool = false // what about the wrong encoding? hash1 = hashMd5 (utf8 "Hello");; >> error: This expression was expected to have type Ascii Bytes but here has type Utf8 Bytes // what about the wrong algorithm? hash1 = hashSha1 (ascii "Hello");; >> error: Type mismatch. Expecting a Hashed<Md5,Ascii> but given a Hashed<Sha1,'a> The type 'Md5' does not match the type 'Sha1'
Nice huh?
checking the hashes … reaching the limits
Now let’s write something to check hashed values:
let checkSha1Hash (hs : Hashed<Sha1,'a>) (bs : 'a Bytes) : bool = hashSha1 bs = hs let checkMd5Hash (hs : Hashed<Md5,'a>) (bs : 'a Bytes) : bool = hashMd5 bs = hs
It’s of course rather easy – but it’s a bit of a shame that we have to use different functions (name) when the types tell the complete story.
Well we are not totally out of luck – while we sadly cannot have functions with the same name in a module we can have them as static methods on a class (although we have to use tupled arguments):
type Hashes () = static member check ( hs : Hashed<Sha1, 'a> , bs : 'a Bytes) = checkSha1Hash hs bs static member check ( hs : Hashed<Md5, 'a> , bs : 'a Bytes) = checkMd5Hash hs bs
This indeed still works fine:
let hash1 = hashMd5 (ascii "Hello");; >> val hash1 : Hashed<Md5,Ascii> = .. // check it Hashes.check (hash1, ascii "Hello");; >> val it : bool = true // wrong text: Hashes.check (hash1, ascii "HellO");; >> val it : bool = false // wrong type: Hashes.check (hash1, utf8 "HellO");; >> error: No overloads match for method 'check'. ...
When we try to do the same for the hash itself we are out of luck:
... static member hash (bs : 'a Bytes) : Hashed<Sha1,'a> = hashSha1 bs static member hash (bs : 'a Bytes) : Hashed<Md5,'a> = hashMd5 bs >>> Error: Duplicate method. The method 'hash' has the same name and signature as another method in this type.
What we would need here is overloading based on the generic types – more or less what Haskell gives you with it’s type-classes.
There are advanced techniques using static-constraints in F# where you can simulate those but for now I just wanted to hint at the limits.
conclusion
I hope I could show a little trick to use F#s type-system for great profit – phantom types of course have many more use-cases, some of which you can read about in the Haskell-Wiki (make sure to follow the links there).
a possible Haskell implementation
Just in case you are interested how this could look in Haskell here is a simple translation using typeclasses:
import qualified Data.ByteString as BS import qualified Data.Encoding as Enc import Data.Encoding.ASCII (ASCII (..)) import Data.Encoding.UTF8 (UTF8 (..)) import qualified Crypto.Hash.SHA1 as CSha1 import qualified Crypto.Hash.MD5 as CMd5 newtype Bytes l = Bytes { getBytes :: BS.ByteString } deriving (Show, Eq) data Ascii data Utf8 class Encoding l where encode :: String -> Bytes l decode :: Bytes l -> String instance Encoding Ascii where encode = Bytes . Enc.encodeStrictByteString ASCII decode = Enc.decodeStrictByteString ASCII . getBytes instance Encoding Utf8 where encode = Bytes . Enc.encodeStrictByteString UTF8 decode = Enc.decodeStrictByteString UTF8 . getBytes newtype Hashed alg enc = Hashed { getHashBytes :: BS.ByteString } deriving (Show, Eq) data Md5 data Sha1 class Hash alg where hash :: Bytes enc -> Hashed alg enc instance Hash Md5 where hash = Hashed . CMd5.hash . getBytes instance Hash Sha1 where hash = Hashed . CSha1.hash . getBytes