Monday, January 21, 2008

What is an Algebraic Data Type?

Tony Morris recently ran an excellent Scala course for some of the guys @ the office. The course really opened my eyes to the power of functional programming languages like Haskell and to the object-oriented/functional programming hybrid - Scala.

Scala leverages all the goodness (or badness- as Tony might say) of Java. One of the main hindrances I found when moving to Scala and any functional programming language for that matter were, the concepts and and the terminology.

In this post, I hope to shed some light on what an Algebraic Data Type is.

Here are some definitions I have found for algebraic data types:

A type defined by giving several alternatives, each of which come with their own constructor. It usually comes with a way to decompose the type through pattern matching. The concept is found in specification languages and functional programming languages. Algebraic data types can be emulated in Scala with case classes. - Programming in Scala

In computer programming, an algebraic data type is a datatype each of whose values is data from other datatypes wrapped in one of the constructors of the datatype. Any wrapped data is an argument to the constructor. In contrast to other datatypes, the constructor is not executed and the only way to operate on the data is to unwrap the constructor using pattern matching. - Wikipedia

"It is using types to define an algebra" - Tony Morris

While all of the above definitions are correct in some form or other, it doesn't make it easy for a new-comer to understand these concepts due to implementation details such as "constructors" , "pattern matching", "case classes". After much deliberation with Tony, we finally came up with a simple definition of an algebraic data type:

"An algebraic data type is a classification"

There. That's very easy to understand. Anything and everything around you can be classified. We do this in Java with classes. For example, Person could be an algebraic data type with Barney and Fred as two elements of that type:


class Person {}
final class Fred extends Person {}
final class Barney extends Person {}


You could use the values of Fred and Barney to figure out which Person element was being used. In Java you could do this by:


if (person instanceof Fred) {
//do something
} else if (person instanceof Barney){
//do something else
} else {
throw new IllegalArgumentException("There are only 2 people. I don't know what you are.")
}


or if you used an enum:


public enum Person {
Fred, Barney
}

...
...
switch (person) {
case Fred: //do something; break
case Barney: //do something else; break
default: //show error
}



In Scala you could pattern match (select the values of the algebraic data type) with:


person match {
case Fred => //do something
case Barney =>// do something else
case _ => //handle this exceptional case.
}


as you can see it is very similar to switching with an enumerated type in java. One main difference is that you can pull out values from a matched element (deconstruct) if it were passed to its constructor:

if Fred had a constructor of the form:
Fred(name: String, surname: String)


then we could match and deconstruct (pull out parameters) with:


case Fred(name, surname) => //use name and surname.


Person could be called a closed data type if it defined a finite set of member elements (or closed sum type (to use more mathematical and rigorous terminology) - Tony). For example take a Java enum of Person:


public enum Person {

Fred("Flintstone"),
Barney("Rubble");

private final String surname;

private Person(final String surname) {
this.surname = surname;
}

public String getSurname() {
return surname;
}
}


We know that since we can't extend enums externally that there are only ever 2 Person elements: Fred and Barney. In Scala a closed data type could be implemented with a sealed trait:


sealed trait Person
final case class Fred(surname: String, age: Int) extends Person
final case class Barney(salutation: String) extends Person

...
...
person match {
case Fred(surname, age) => println("Hello Fred " + surname + " your age is " + age)
case Barney(salutation) => println("Hello " + salutation + " Barney")
case _ => println("Who are you?")
}



Scala lets you define different constructors for each type unlike in Java enums where all elements of a type have to have the same constructors.

This contrasts with an open data type like:


class Person {}
final class Fred extends Person {}
final class Barney extends Person {}


Here Person could have an infinite number of subclasses or elements in its algebraic data type.

Another confusing concept I came across was the Abstract Data Type, which I sometimes confused with Algebraic Data Types. An abstract data type is an algebraic data type (it's a classification) which does not expose the construction details of its elements. In Java it would be something like:


public abstract class Person {

public abstract String getName();

private Person() { }

private static class Fred extends Person {
@Override
public String getName() {
return "Fred";
}

}

private static class Barney extends Person {
@Override
public String getName() {
return "Barney";
}
}

public static Person createFred() {
return new Fred();
}

public static Person createBarney() {
return new Barney();
}
}


Here's an example of an abstract data type in Scala written by Tony for Tom's Obi project:


sealed trait Javac {
def apply(srcdir: SrcDir): Javac = this match {
case Javac_(_) => Javac_(Some(srcdir))
}

def srcdir(s: String) = apply(SrcDir.srcdir(s))

def srcdir = this match {
case Javac_(s) => s
}
}

private final case class Javac_(srcdir: Option[SrcDir]) extends Javac

object Javac {
def javac: Javac = Javac_(None)
}



Since the Javac_ class is private the only way to instantiate the class is through the static javac method on the object JavaC.

No comments: