Generic Data Types
module Tutorial.DataTypes.GenericDataTypes
import Tutorial.DataTypes.Enumerations
Sometimes, a concept is general enough that we'd like to apply it not just to one type, but to all kinds of types. For instance, it would be highly inconvenient if we had to define a notion of a list over and over again for every type we might want to put in a list, a list of integers and a list of strings may have differently typed contents, but they share the same structure, and repeating that structure would lead to lots of code duplication. Instead we'd like to have a single generic list type parameterized by the type of value it stores.
In this section, we will explore the definition and use of generic types.
Maybe
Suppose we want to parse a value of the Weekday
type, described in the section on enumerations, from user input. Such a parsing function should return Saturday
if the user provided string was "Saturday"
, but what should it return if the input were, say, "sdfkl332"
? We have another of options here, for instance, we could return another "default" value, such as Sunday
, but we must consider what behavior programmers using our library might expect. However, silently continuing with such a default value in the face of invalid user input is very rarely the best option, and can lead to a lot of confusion.
If we were working in a traditional imperative language, our parsing function would probably throw an exception upon encountering such invalid input. Idris has the option to throw an exception too, through the idris_crash
function in the Prelude, but we need to abandon totality to do this, our function would no longer return a value for every possible input. This is a high price to pay for something as commonplace as a parsing error.
In a language like Java, C#, or C++, our function might also return some sort of null
value (leading to the dreaded NullPointerException
if not handled properly in the consuming code). Our solution will be conceptually similar, but instead of silently returning a null
, we will make the possibility of failure visible in the type. We'll define a custom data type, which encapsulates the possibility of failure.
Defining new data types in Idris is very cheap, in terms of the amount of code needed, so this pattern is very often encountered as a way of increasing type safety. Here's what such a bespoke type might look like for our use-case, along with the associated parsing function:
data MaybeWeekday = WD Weekday | NoWeekday
total
readWeekday : String -> MaybeWeekday
readWeekday "Monday" = WD Monday
readWeekday "Tuesday" = WD Tuesday
readWeekday "Wednesday" = WD Wednesday
readWeekday "Thursday" = WD Thursday
readWeekday "Friday" = WD Friday
readWeekday "Saturday" = WD Saturday
readWeekday "Sunday" = WD Sunday
readWeekday _ = NoWeekday
But what if we'd like also like to read a Bool
from user input? We'd have to write another custom data type, MaybeBool
, and so on for all the types we'd like to be able to parse from a String
.
Idris, like many other programming languages, allows us to generalize this behavior by using generic data types. Here's what the generic version of our MaybeWeekday
might look like:
note
While the prelude calls this type Maybe
, we are calling it Option
to avoid conflict. We are also altering the names of the data constructors Some
is equivlant to Just
, and None
is equivalent to Nothing
.
You may recognize the name Option
from other programming languages, for instance, Rust uses the name Option
to refer to this concept.
data Option a = Some a | None
total
readBool : String -> Option Bool
readBool "True" = Some True
readBool "False" = Some False
readBool _ = None
Let's go to the REPL and take a look at the types to get a feel for what we are working with here:
Tutorial.DataTypes.GenericDataTypes> :t Some
Tutorial.DataTypes.GenericDataTypes.Some : a -> Option a
Tutorial.DataTypes.GenericDataTypes> :t None
Tutorial.DataTypes.GenericDataTypes.None : Option a
Tutorial.DataTypes.GenericDataTypes> :t Option
Tutorial.DataTypes.GenericDataTypes.Option : Type -> Type
note
Option
is what we call a type constructor, it is not yet a saturated type. It is, instead, a function from Type
to Type
. Only when we provide it a Type
argument does it actually become a type.
We say that Option
is a type constructor parameterized over a parameter of type Type
. While Option
may not be a type, Option Bool
is a type, and so is Option Weekday
. Even Option (Option Bool)
is a valid type.
Some
and None
are Option
's data constructors, they are functions used to create values of type Option a
for a given type a
Let's see some other use cases for Option
. Below is a safe division operation, returning None
on an attempt to divide by zero, instead of throwing an exception:
total
safeDiv : Integer -> Integer -> Option Integer
safeDiv n 0 = None
safeDiv n k = Some (n `div` k)
It is important to understand the distinction between a function returning, say, Maybe Integer
and returning null
in a language like Java. In the former case, the possibility of failure is visible in the types, and as a result the type checker forces the programmer to deal with the possibility of the function returning Nothing
you can't get an Integer
out of a Maybe Integer
without pattern matching on it and exposing the Nothing
, and Idris doesn't let us forget to handle a case in a pattern match. This stands in stark contrast with the Java-esq approach, where null
is implicitly a valid value of every pointer-backed type, and programmers may (and often will) forget to handle the null
case, leading to unexpected and often hard to debug runtime exceptions.
Either
While Maybe
is very useful for quickly encoding the possibilty of failure, the one value it can provide for the failure case, Nothing
, isn't very informative, it doesn't encode any information about what went wrong. While there are many cases where this is sensible, for instance, if you are attempting to look up a value in a Map, a return value of Nothing
can reasonably be assumed to mean the desired value wasn't present in the Map, there are many more cases where multiple things could have gone wrong and the user of or function would know to know which of the things actually did go wrong.
As an example, in the case of our Weekday
parsing function, it might be useful later on to know the value of the invalid input string. Just like with Maybe
/Option
in the previous section, this concept is general enough that we want to consider other types of invalid data.
Let's build a data type to encode this:
data Validated e a = Invalid e | Valid a
note
Validated
is a type constructor parameterized over two type parameters, e
and a
.
It's data constructors are Invalid
, which holds a value describing some error condition, and Valid
, which describes the result of a successful computation.
With our new Validated
type, we can augment our readWeekday
function with some pretty useful error reporting:
total
readWeekdayV : String -> Validated String Weekday
readWeekdayV "Monday" = Valid Monday
readWeekdayV "Tuesday" = Valid Tuesday
readWeekdayV "Wednesday" = Valid Wednesday
readWeekdayV "Thursday" = Valid Thursday
readWeekdayV "Friday" = Valid Friday
readWeekdayV "Saturday" = Valid Saturday
readWeekdayV "Sunday" = Valid Sunday
readWeekdayV s = Invalid ("Not a weekday: " ++ s)
note
Much like Maybe
, this is such a general and common concept that the Prelude already contains a data type like Validated
: Either
Either's Left
data constructor is equivalent to Validated
's Invalid
, and Right
is equivalent to Valid
.
It is very common for functions to encode the possibility of failure by returning an Either err val
, where err
is the error type and val
is the desired return type. This provides a type safe (and total!) alternative to throwing a catchable exception in an imperative language.
note
Use of Either
does not always imply that Left
is an error and Right
is a success, a function returning Either
just means that it can have two different types of results, each of which are tagged with the corresponding data constructor.
List
One of the most important data structures in pure functional programming is the singly linked list, defined as follows (called Seq
in order for it not to collide with List
, which is of course already available from the Prelude):
data Seq a = Nil | (::) a (Seq a)
note
Seq
consists of two data constructors:
Nil
, representing the empty list(::)
, usually pronounced 'cons', which adds a new element of typea
onto another list of values of typea
.
Notice how we are using operators as data constructors, this is sometimes quite useful, but it's usually more helpful to give your functions and data constructors clear names, and only use operators sparingly, when there is a compelling case to be made that they would improve readability.
If we wanted to use the List
data constructors directly, it would look something like this:
total
ints : List Int64
ints = 1 :: 2 :: -3 :: Nil
However, there is a more concise way of writing the above. Idris accepts special syntax for constructing data types consisting exactly of the two constructors Nil
and (::)
:
total
ints2 : List Int64
ints2 = [1, 2, -3]
total
ints3 : List Int64
ints3 = []
The two definitions ints
and ints2
are treated identically by the compiler.
note
List syntax can also be used in pattern matching.
Seq
and List
both have another special property, each of them is defined in terms of itself, due to the cons operating taking a value and another List
/Seq
as arguments. We call such data types recursive data types, and as a result of their recursive nature, decomposing or consuming them typically requires recursive functions.
In a traditional imperative language, we might use a for loop, or similar construct, to iterate over the values of a List
, but traditional loops don't exist in a lanugage without in-place mutation. Lets take a look at the recursive way of summing a list of integers:
total
intSum : List Integer -> Integer
intSum Nil = 0
intSum (n :: ns) = n + intSum ns
Recursive functions can be hard to grasp at first, so we'll break this down a bit. If we invoke intSum
with the empty list, the first pattern matches and the function returns zero immediately. If, however, we invoke intSum
with a non-empty list - [7,5,9]
for instance - the following happens:
-
The second pattern matches and splits the list into two parts: Its head (
7
) is bound to variablen
and its tail ([5,9]
) is bound tons
:7 + intSum [5,9]
-
In a second invocation,
intSum
is called with a new list:[5,9]
. The second pattern matches,n
is bound to5
, andns
is bound to[9]
:7 + (5 + intSum [9])
-
In a third invocation
intSum
is called with list[9]
. The second pattern matches,n
is bound to9
, andns
is bound to[]
:7 + (5 + (9 + intSum [])
-
In a fourth invocation,
intSum
is called with list[]
and returns0
immediately:7 + (5 + (9 + 0)
-
In the third invocation,
9
and0
are added and9
is returned:7 + (5 + 9)
-
In the second invocation,
5
and9
are added and14
is returned:7 + 14
-
Finally, our initial invocation of
intSum
adds7
and14
, returning21
.
The recursive implementation of intSum
leads to a sequence of nested calls to intSum
, terminating once the argument is the empty list.
Generic Functions
In order to fully appreciate the versatility of generic data types, we need to talk about generic functions. Much like generic types, they are parameterized over one or more type parameters.
Consider, for instance, breaking a value out of the Option
data type. In the event it contains a Some
, we'd like to return the stored value, while for the None
case we'd like to provide a default value. Let's take a look at a function that implements this logic, specialized over Integer
s:
total
integerFromOption : Integer -> Option Integer -> Integer
integerFromOption _ (Some y) = y
integerFromOption x None = x
As the pattern of this section might imply, this approach isn't as general as we'd like, with a specialized implementation like this, we'd need similarly bespoke functions to break a value out of an Option Bool
, or an Option String
, or any other possible Option
type. We can, of course, do better, with our generic fromOption
function:
total
fromOption : a -> Option a -> a
fromOption _ (Some y) = y
fromOption x None = x
The lower-case a
here is once again a type parameter, allowing us to read the type signature as "For any type a
, given a value of type a
, and a value of type Option a
, we can return a value of type a
".
note
The compiler knows nothing about a
except for the fact that it is a type, as a result we can't summon a value of a
out of thin air like we can with an Integer
. We must have a value available to deal with the None
case.
The equivalent to fromOption
for Maybe
is called fromMaybe
and is available from the Data.Maybe
module in the base library.
Sometimes, even fromOption
is not general enough. Assume we'd like to print the value of a freshly parsed Bool
, giving some generic error message in case of a None
. We can't use fromOption
for this, as we have an Option Bool
and we'd like to return a String
. Here's one way we might accomplish this:
total
option : b -> (a -> b) -> Option a -> b
option _ f (Some y) = f y
option x _ None = x
total
handleBool : Option Bool -> String
handleBool = option "Not a boolean value." show
The function option
is parameterized over two type parameters. The a
parameter represents the type of the values stored in the Option
, while b
is the return type of the function. If the Option
turns out to be a Just
, we need a way to convert the resulting a
to a b
, and that's done using second argument, which has the type of a function from a
to b
.
In Idris, lower-case identifiers in function types are treated as type parameters, while upper-case identifiers are treated as types or type constructors, which must be in scope.