Type theoretical databases

We present a soundness theorem for a dependent type theory with context constants with respect to an indexed category of (finite, abstract) simplicial complexes. The point of interest for computer science is that this category can be seen to represent tables in a natural way. Thus the category is a model for databases, a single mathematical structure in which all database schemas and instances (of a suitable, but sufficiently general form) are represented. The type theory then allows for the specification of database schemas and instances, the manipulation of the same with the usual type-theoretic operations, and the posing of queries.


Introduction
Databases being, essentially, collections of (possibly interrelated) tables of data, a foundational question is how best to represent such collections of tables mathematically in order to study their properties and ways of manipulating them. The relational model, essentially treating tables as structures of first-order relational signatures, is a simple and powerful representation, the virtues of which we need not recapitulate. Nevertheless, areas exist in which the relational model is less adequate than in others. One familiar example is the question of how to represent partially filled out rows or missing information. Another, more fundamental perhaps, is how to relate instances of different schemas, as opposed to the relatively well understood relations between instances of the same schema. As comparison and mappings between data structured in different ways is an area of some importance for database theorists, for example in the settings of (Fagin, Kolaitis, Miller and Popa 2005) and (Lenzerini 2005), this suggests looking for alternative and supplemental ways of modeling tables more suitable to such "dynamic" settings. It seems natural, in that case, to try to model tables of different shapes as living in a single mathematical structure, facilitating their manipulation across different schemas.
Formally, this paper presents a soundness theorem (Theorem 1) for a certain dependent type theory with respect to a rather simple category of (finite, abstract) simplicial complexes. The novelty is that the type theory has context constants, mirroring that our choice of "display maps" does not include all maps to the terminal object. From the database perspective, however, the interesting aspect is that this category can in a natural way be seen as a category of tables, collecting in a single mathematical structure (an indexed or fibered category) the totality of schemas and instances.
This representation can be introduced as follows. Let a schema $S$ be presented as a finite set $A$ of attributes and a set of relation variables over those attributes. One way of allowing for partially filled out rows is to assume that whenever the schema has a relation variable $R$, say over attributes $A_0, \ldots, A_n$, it also has relation variables over all non-empty subsets of $\{A_0, \ldots, A_n\}$. So a partially filled out row over $R$ is a full row over such a "sub-relation" of $R$. To this we add the requirement that the schema does not have two relation variables over exactly the same attributes (footnote 2). This requirement means that a relation variable can be identified with the set of its attributes, and together with the first requirement, this means that the schema can be seen as a downward closed sub-poset of the positive power set of the set of attributes $A$. Thus a schema is an (abstract) simplicial complex, a combinatorial and geometric object familiar from algebraic topology.
The key observation is now that an instance of the schema $S$ can also be regarded as a simplicial complex, by regarding the data as attributes and the tuples as relation variables. Accordingly, an instance over $S$ is a schema of its own, and the fact that it is an instance of $S$ is "displayed" by a certain projection to $S$. Thus the (structurally rich) category $\mathbf{S}$ of finite simplicial complexes and morphisms between them forms a category of schemas which includes, at the same time, all instances of those schemas, the connection between schema and instance being given by a collection of maps in $\mathbf{S}$ called display maps.
As such, $\mathbf{S}$ together with this collection $\mathbf{D}$ of maps forms a so-called display map category (Jacobs 1999), a notion originally developed in connection with categorical models of dependent type theory. We show that $(\mathbf{S}, \mathbf{D})$ is a model of a certain dependent type theory including the usual type-forming operations and with context constants. We indicate how any schema and instance (satisfying the above two requirements) can be specified in this type theory, and how the type-forming operations can then be used to build new instances and schemas (corresponding for instance to natural join and disjoint union), and thus to pose queries. Syntactically, that is in the type theory, schemas and instances are specified in the same way, in terms of types and terms over a distinguished set of contexts. These contexts correspond to, and thus reflect the special status of, single relation variable schemas (or relation schemas $R[n]$ in the terminology of (?)).
We focus, in the space available here, on the presentation of the model, the operations, the type theory, and the soundness theorem. Section 2 presents the model as an indexed category, defines the notion of a display morphism in this context, and shows how to pass from schema to instance and from instance to schema (and display morphism). Section 2.2 gives some brief examples of schemas and instances as simplicial complexes (which we also use further down in Example 1 involving natural join, Section 3.8 concerning schema and instance specification, and Example 4 concerning queries). Section 2.3 presents the semantic operations on schemas and instances interpreting the type theoretic operations, which in turn are presented together with the soundness theorem in Section 3. Future work will present expressivity and complexity analyses, the formulation of dependencies and constraints, as well as a more formal presentation of the relation to relational databases and to "real" instances (rather than the more "structural" instances we study here). Future work also includes exploiting the more geometric perspective on tables that this model offers (see (Spivak 2009)), and the modeling of schemas with multiple relation variables over the same attributes and instances with multiple keys representing the same data.
Only knowledge of the very basic notions of category theory, such as category, functor, and natural transformation, is assumed.

Finite simplicial complexes and schemas
We consider both schemas and instances as (abstract, finite) simplicial complexes, with a certain family of maps "displaying" that a simplicial complex (the source of the map) is an instance of another complex (the target of the map) seen as a schema. We start with formal definitions of simplicial complexes, the corresponding simplicial schemas, and the display maps. In what follows, all posets etc. are finite unless explicitly stated otherwise.
Footnote 2: [...] same column names, it should be natural to either collect them into one table or to rename some of the column names.
Definition 1.
1. Let $X$ be a poset. A subset $B \subseteq X$ is called a basis of $X$ if the following hold:
(a) for all $x, y \in X$, one has $x \le y$ if and only if $B_{\le x} \subseteq B_{\le y}$, where $B_{\le x} = (\downarrow x) \cap B = \{z \in B \mid z \le x\}$;
(b) no two distinct elements of $B$ are comparable, i.e. for all $g, h \in B$ with $g \ne h$ one has $g \not\le h$; and
(c) every element $x \in X$ has generators, i.e. $B_{\le x} \ne \emptyset$.
If $X$ has a basis, one sees easily that the basis is unique, and we say that $X$ is a based poset.
2. Let $X$ be a based poset with basis $B$. Then define $X_n := \{x \in X \mid |B_{\le x}| = n + 1\}$. In particular, $X_0 = B$.
3. A based poset $X$ is called a simplicial complex if for all $x \in X$ and all non-empty $Y \subseteq B_{\le x}$ there exists $y \in X$ such that $B_{\le y} = Y$.
4. A poset $X$ is called a simplicial schema if $X^{op}$ (the poset obtained by reversing the ordering) is a simplicial complex. The elements of $X_0$ are called attributes and the elements of $X_{n+1}$ are called relation variables, relation keys, or simply relations. We consider a simplicial schema as a category and use arrows $\delta^x_y : x \to y$ to indicate order. Thus the arrow $\delta^x_y$ exists iff $y \le x$ in the simplicial complex $X^{op}$. We reserve the use of arrows to indicate order in the schema $X$ and $\le$ to indicate the order in the complex $X^{op}$. We use the notation $B_{\le x}$ also in connection with schemas, where it means, accordingly, the set of attributes $A$ such that there is an arrow $\delta^x_A$.
5. Suppose that $X$ and $Y$ are based posets with bases $B$ and $C$ respectively. A (poset) morphism $f : X \to Y$ [...]. Let $\mathbf{S}$ be the category consisting of simplicial schemas and functors (poset morphisms) $f : X \to Y$ such that $f : X^{op} \to Y^{op}$ is a based poset morphism. Note that based poset morphisms are completely determined by their restriction to the basis.
6. A morphism $f : X \to Y$ of simplicial schemas is a display map if $f$ restricts to a family of maps $f_n : X_n \to Y_n$. It is straightforward to see that this is equivalent to the condition that for all $x \in X^{op}$ the restriction $f|_{(\downarrow x)} : (\downarrow x) \to (\downarrow f(x))$ is an isomorphism of sets (equivalently, of simplicial complexes).
With respect to the usual notion of schema, a simplicial schema $X$ can be thought of as given in the usual way by a finite set of attributes $X_0 = \{A_0, \ldots, A_{n-1}\}$ and a set of relational variables $X = \{R_0, \ldots, R_{m-1}\}$, each with a specification of column names in the form of a subset of $X_0$, but with the restrictions 1) that no two relation variables are over exactly the same attributes; and 2) that for any (non-empty) subset of the attributes of a relation variable there exists a relation variable over (exactly) those attributes. In the sequel we shall drop the word "simplicial" and simply say "complex" and "schema". Any complex (and hence schema) can be seen as a subposet of a finite positive power set which is downward closed and contains all singletons, via the injective function $X \to P^+(X_0)$ defined by $x \mapsto B_{\le x}$. We freely use this perspective when convenient.
The category $\mathbf{S}$ contains in particular the $n$-simplices $\Delta_n$ and the face maps. Recall that the $n$-simplex $\Delta_n$ is the complex given by the full positive power set on $\{0, \ldots, n\}$. As a schema, $\Delta_n$ is the schema of a single relation on $n + 1$ attributes named by numbers $0, \ldots, n$ (and all its "generated" sub-relations). A face map [...]. These schemas and morphisms play a special role in Section 3 where they are used to specify general schemas and instances.
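The power-set perspective on schemas can be made concrete with a small sketch. It is an illustration only, under the assumption that a face is encoded as a Python `frozenset` of attribute names; the function names `downward_closure`, `is_schema` and `simplex` are ours and do not occur in the paper.

```python
from itertools import combinations


def downward_closure(relations):
    """All non-empty subsets of the given attribute sets (the faces of the complex)."""
    faces = set()
    for relation in relations:
        attributes = sorted(relation)
        for size in range(1, len(attributes) + 1):
            for subset in combinations(attributes, size):
                faces.add(frozenset(subset))
    return faces


def is_schema(faces):
    """Check that every non-empty subset of a face is again a face."""
    return all(
        frozenset(subset) in faces
        for face in faces
        for size in range(1, len(face) + 1)
        for subset in combinations(sorted(face), size)
    )


def simplex(n):
    """The n-simplex: the full positive power set on {0, ..., n}."""
    return downward_closure([range(n + 1)])


# A schema with relation variables R over {A, B} and Q over {B, C}.
S = downward_closure([{"A", "B"}, {"B", "C"}])
```

Here `S` consists of the three attributes together with the two relation variables, while `simplex(2)` additionally contains the faces `{0, 2}` and `{0, 1, 2}`.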

Relational instances
Let $X$ be a (simplicial) schema, say with attributes $X_0 = \{A_0, \ldots, A_{n-1}\}$. A functor $F : X \to \mathrm{FinSet}$ from $X$ to the category of finite sets and functions can be regarded as an instance of the schema $X$. For $x = \{A_{i_0}, \ldots, A_{i_{m-1}}\} \in X$, the set $F(x)$ can be regarded as a set of "keys" (or "facts" or "row-names"). The "value" (or "data", see footnote 3) of a key $k \in F(x)$ at an attribute $A \in B_{\le x}$ is then $F(\delta^x_A)(k) \in F(A)$, so that each key determines a tuple of values in $\prod_{A \in B_{\le x}} F(A)$. For arbitrary $F$, this function from keys to tuples is not 1-1, that is, there can be distinct keys with the same values at all attributes. We say that $F$ is a relational instance if this does not happen:
Definition 2. Let $X$ be a schema. A relational instance is a functor $F : X \to \mathrm{FinSet}$ such that for all $x \in X$ the functions $\{F(\delta^x_A) \mid A \in B_{\le x}\}$ are jointly injective. Let $\mathrm{Rel}(X)$ be the category of relational instances and natural transformations between them.
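Joint injectivity is easy to test mechanically. The following sketch assumes a hypothetical encoding in which an instance maps each face to a dictionary from keys to their values at the attributes of that face; neither the encoding nor the sample data is taken from the paper.

```python
def is_relational(instance):
    """Return True if, on every face, distinct keys differ at some attribute.

    `instance` maps each face (a frozenset of attributes) to a dict
    key -> {attribute: value}.
    """
    for rows in instance.values():
        images = [tuple(sorted(values.items())) for values in rows.values()]
        if len(images) != len(set(images)):
            return False
    return True


AB = frozenset({"A", "B"})

# Two keys with different values at A: relational.
good = {AB: {"k1": {"A": "a", "B": "b"}, "k2": {"A": "a'", "B": "b"}}}

# Two distinct keys carrying exactly the same data: not relational.
bad = {AB: {"k1": {"A": "a", "B": "b"}, "k2": {"A": "a", "B": "b"}}}
```

In `bad` the keys `k1` and `k2` cannot be told apart by their values, which is exactly the situation Definition 2 excludes.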
We restrict interest in this paper to relational instances and refer to them in the sequel simply as instances. It is clear that every relational instance is isomorphic to one where the keys are actually tuples, and we make use of this in Section 2.3. The relation between the condition that schemas have at most one relation variable over a set of attributes and the restriction to relational instances is displayed in the following. Let $X$ be a schema and $F : X \to \mathrm{FinSet}$ an arbitrary functor.
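Recall, e.g. from (Mac Lane 1998), that the category of elements $\int_X F$ has objects $\langle x, a \rangle$ with $x \in X$ and $a \in F(x)$, and an arrow $\delta^{x,a}_{y,b} : \langle x, a \rangle \to \langle y, b \rangle$ whenever $\delta^x_y : x \to y$ in $X$ and $F(\delta^x_y)(a) = b$. The projection $p : \int_X F \to X$ is defined by $\langle x, a \rangle \mapsto x$ and $\delta^{x,a}_{y,b} \mapsto \delta^x_y$. We then have
Lemma 1. Let $X$ be a (simplicial) schema and $F : X \to \mathrm{FinSet}$ be a functor. Then $\int_X F$ is a (simplicial) schema and $p : \int_X F \to X$ a display morphism if and only if $F$ is a relational instance.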
Definition 3.
1. Given a schema $X$, the functor $1_X : X \to \mathrm{FinSet}$ defined by $x \mapsto \{x\}$ is the terminal instance, obtained by considering the attributes themselves to be the values and the relations themselves as the keys.
2. A full tuple $t$ of an instance $I$ over schema $X$ is a natural transformation $t : 1_X \Rightarrow I$. We write $\mathrm{Trm}_X(I)$ for the set of full tuples (indicating that we see them as terms type-theoretically).
3. Given an instance $I : X \to \mathrm{FinSet}$, the induced schema (over $X$) is the category of elements $\int_X I$. The canonical projection from the induced schema to $X$ is the projection $p : \int_X I \to X$.
4. Given a full tuple $t : 1_X \Rightarrow I$, the induced section is the morphism $\hat{t} : X \to \int_X I$ in $\mathbf{S}$ defined by $x \mapsto \langle x, t_x(x) \rangle$. Notice that the induced section is always a display morphism.
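The passage from instance to induced schema, and the notion of full tuple, can be sketched for instances on tuple form. The sketch assumes that a row is encoded as a `frozenset` of (attribute, value) pairs, so that a row is literally a face of the induced schema whose attributes are its data. The instance below is hypothetical: it lives over a schema with relation variables over {A, B} and {B, C}, and all names are ours.

```python
from itertools import product


def row(**values):
    """A row on tuple form: a frozenset of (attribute, value) pairs."""
    return frozenset(values.items())


def face(*attributes):
    return frozenset(attributes)


# Hypothetical instance; every restriction of a row is again a row.
I = {
    face("A"): {row(A="a"), row(A="a'")},
    face("B"): {row(B="b")},
    face("C"): {row(C="c")},
    face("A", "B"): {row(A="a", B="b"), row(A="a'", B="b")},
    face("B", "C"): {row(B="b", C="c")},
}


def induced_schema(instance):
    """Objects of the category of elements, each row read as a face."""
    return {r for rows in instance.values() for r in rows}


def projection(r):
    """The canonical projection sends a row to the face it lives over."""
    return frozenset(attribute for attribute, _ in r)


def full_tuples(instance):
    """One value per attribute such that the restriction to every face is a row."""
    attributes = sorted(a for f in instance if len(f) == 1 for a in f)
    domains = [sorted(v for r in instance[face(a)] for _, v in r) for a in attributes]
    found = []
    for values in product(*domains):
        choice = dict(zip(attributes, values))
        if all(frozenset((a, choice[a]) for a in f) in instance[f] for f in instance):
            found.append(choice)
    return found
```

The projection preserves the number of attributes of each row, which is the dimension-preserving behaviour expected of a display map.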
Examples.
1. Let $S$ be the schema the attributes of which are $A$, $B$, $C$ and the relation variables $R : AB$ and $Q : BC$, with indicated column names. Let an instance $I$ be given by [tables not recovered]. From a "simplicial" point of view, $S$ is the category [diagram not recovered] and $I$ is the functor [diagram not recovered], with [...] and so on. Notice that $I$ has precisely two full tuples; if we think of a full tuple as a selection of an element from each attribute such that the relevant projection of the resulting tuple has a matching key in all relations, then the two full tuples in $I$ are $\langle a, b, c \rangle$ and $\langle a', b, c \rangle$. Finally, the induced schema $\int_S I$ has attribute set and relation variables as follows, where we use subscript notation instead of pairing for readability: [listing not recovered].
2. Going from simplicial complexes to schemas, consider the 2-simplex $\Delta_2$ and an example functor $J : \Delta_2 \to \mathrm{FinSet}$ [definition not recovered], where we use tuples for keys to avoid having to write out the $J(\delta)$s. Writing this up in table form we obtain: [tables not recovered]. Notice that $J$ has precisely one full tuple.
3. There is a morphism of complexes $f : S \to \Delta_2$ defined (and determined) by $A \mapsto 0$, $B \mapsto 1$, $C \mapsto 2$. The instance $I$ of $S$ is the restriction of $J$ to $S$ along $f$, that is, $I = J \circ f$. Such composition operations let us regard the category of schemas together with their instances as an indexed category.

Footnote 3: Strictly speaking, "data" is somewhat misleading, as this notion of instance treats elements of, say, $F(A_0)$ and $F(A_1)$ as formally distinct. For example, the instances $F(A_0) = \{a\}$, $F(A_1) = \{b\}$ and $G(A_0) = \{a\}$, $G(A_1) = \{a\}$, of the schema $A_0$, $A_1$ with no relations, are isomorphic. An actual filling out of a table with data can be given as a function from, in this case, the disjoint union $F(A_0) + F(A_1)$ to a domain of strings and numbers, say. We leave this extra level of structure to future work, and restrict attention to our more abstract notion of instance.
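Restriction along a schema morphism is plain precomposition, which the following sketch illustrates. The instance `J` over the 2-simplex is hypothetical data of our own, chosen only to be compatible with what the examples state (one full tuple in $J$, and the two full tuples $\langle a, b, c \rangle$ and $\langle a', b, c \rangle$ after restriction); it is not the table of the example. The sketch also assumes that `f` is injective on attributes, so that column names can be translated back.

```python
def row(values):
    """A row on tuple form: a frozenset of (attribute, value) pairs."""
    return frozenset(values.items())


def face(*attributes):
    return frozenset(attributes)


# Hypothetical instance over the 2-simplex (attributes 0, 1, 2).
J = {
    face(0): {row({0: "a"}), row({0: "a'"})},
    face(1): {row({1: "b"})},
    face(2): {row({2: "c"})},
    face(0, 1): {row({0: "a", 1: "b"}), row({0: "a'", 1: "b"})},
    face(1, 2): {row({1: "b", 2: "c"})},
    face(0, 2): {row({0: "a", 2: "c"})},
    face(0, 1, 2): {row({0: "a", 1: "b", 2: "c"})},
}

# The morphism f : S -> Delta_2 on attributes, and the faces of S.
f = {"A": 0, "B": 1, "C": 2}
S_faces = [face("A"), face("B"), face("C"), face("A", "B"), face("B", "C")]


def substitute(instance, morphism, faces):
    """Compute J[f] = J o f, renaming columns back along the (injective) morphism."""
    back = {number: name for name, number in morphism.items()}
    return {
        x: {
            frozenset((back[n], v) for n, v in r)
            for r in instance[frozenset(morphism[a] for a in x)]
        }
        for x in faces
    }


I = substitute(J, f, S_faces)
```

The restricted instance forgets the tables of `J` over the faces `{0, 2}` and `{0, 1, 2}`, since these are not in the image of `f`.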

Simplicial databases
We have, with the above, a (strict) functor $\mathrm{Rel}(-) : \mathbf{S}^{op} \to \mathbf{Cat}$ defined by $X \mapsto \mathrm{Rel}(X)$ and $(f : X \to Y) \mapsto (- \circ f : \mathrm{Rel}(Y) \to \mathrm{Rel}(X))$. We regard this strict, indexed category as a "category of databases" in which the totality of databases and schemas are collected. We give, first, the notation, definitions and equations we need for the operation $\mathrm{Rel}(f)$ for morphisms $f$ (which can syntactically be thought of as substitution), and follow with the instance-forming operations 0, 1, dependent sum and product, +, and identity. We devote some space to substitution and dependent product, being perhaps the most interesting operations. For the remaining ones, only basic definitions (needed to interpret the type theory of Section 3) are given.

Substitution
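Definition 4. For $f : X \to Y$ in $\mathbf{S}$, $J \in \mathrm{Rel}(Y)$ and $t :$ [...]
3. With $p_J : \int_Y J \to Y$ the canonical projection, let $v_J : 1 \Rightarrow J[p_J]$ be the full tuple defined by $\langle y, a \rangle \mapsto a$. (We elsewhere leave subscripts on $v$ and $p$ determined by context.)
4. Denote by $\hat{f} : \int_X J[f] \to \int_Y J$ the schema morphism defined by $\langle x, a \rangle \mapsto \langle f(x), a \rangle$. Notice that if $f$ is display, then so is $\hat{f}$.
Lemma 2. The following equations hold:
1. For $X$ in $\mathbf{S}$, $I \in \mathrm{Rel}(X)$ and $t \in \mathrm{Trm}_X(I)$ we have $p \circ \hat{t} = \mathrm{id}_X$, where $\hat{t}$ is the induced section of $t$.
[...]
4. For $X \in \mathbf{S}$ and $I \in \mathrm{Rel}(X)$ we have $\hat{p} \circ \hat{v} = \mathrm{id}_{\int_X I}$.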
On tuplification and display maps. We have so far considered arbitrary morphisms $f : X \to Y$ in $\mathbf{S}$. In what follows we shall restrict to display ones, noting that all context morphisms of the type theory in Section 3 are interpreted as display maps. Furthermore, we shall assume, at least implicitly, that all instances are on tuple form, in the sense that for $J : X \to \mathrm{FinSet}$ the elements in $J(x)$ are in fact elements in the product $\prod_{A \in B_{\le x}} J(A)$. We identify a singleton tuple with its element and write $\langle\rangle$ for the empty tuple. It is clear that any instance is canonically isomorphic to one on tuple form; the tuplification of the instance, so to say. Moreover, if two instances are isomorphic, on tuple form, and agree on attributes, then they must be identical. It is this property that we need for certain of the equations below. That being said, it is often more convenient to work with instances not on this form, and we frequently do so, pointing to the fact that the "tuplification" is just a canonical rewriting. Finally, we note the following connection between tuple form and display maps: let $f : X \to Y$ be a morphism in $\mathbf{S}$. Then $f$ is display if and only if for all $J \in \mathrm{Rel}(Y)$ on tuple form $J[f]$ is also on tuple form.
Dependent product: Let $X \in \mathbf{S}$, $J \in \mathrm{Rel}(X)$, and $G \in \mathrm{Rel}(\int_X J)$. We define the instance $\Pi_J G : X \to \mathrm{FinSet}$ as the right Kan extension of $G$ along $p$. Explicitly, we first construct the following preliminary instance, which is not on tuple form. Let $\Xi_J G : X \to \mathrm{FinSet}$ be defined as follows. For $x \in X$ define $P_x = \{\langle y, a \rangle \mid y \le x, a \in J(y)\}$ and set $\Xi_J G(x)$ to be the set of families $(c_{y,a})_{\langle y,a \rangle \in P_x}$ with $c_{y,a} \in G(y, a)$ satisfying the condition that for all $\delta^{y,a}_{z,b}$ in $P_x$,
$$G(\delta^{y,a}_{z,b})(c_{y,a}) = c_{z,b}. \qquad (1)$$
Thus an element $c \in \Xi_J G(x)$ is a function $c$ assigning to each $y \le x$ and $a \in J(y)$ an element $c_{y,a}$ in $G(y, a)$ such that (1) is fulfilled. For $x' \le x$, we have $P_{x'} \subseteq P_x$ and the function $\Xi_J G(\delta^x_{x'}) : \Xi_J G(x) \to \Xi_J G(x')$ sends a family $(c_{y,a})_{P_x}$ to its projection on $P_{x'}$. In particular, for $x_0 \in B_{\le x}$ we have that $\Xi_J G(\delta^x_{x_0})((c_{y,a})_{P_x})$ is the projection on $y = x_0$. Since $G$ is relational, $c_{y,a} \in G(y, a)$ is determined by the family $(G(\delta^{y,a}_{y_0, \delta^y_{y_0}(a)})(c_{y,a}))_{y_0 \in B_{\le y}}$ (recall the notation $B_{\le y}$ for the set of attributes of $y$). Thus a family in $\Xi_J G(x)$ is determined by its values on $\Xi_J G(\delta^x_{x_0})$ as $x_0$ runs through $B_{\le x}$, and we have [...].
Let $X \in \mathbf{S}$, $J \in \mathrm{Rel}(X)$, $G \in \mathrm{Rel}(\int_X J)$, and $t : 1 \Rightarrow G$ a full tuple. Then for every $x \in X$ the family $(t(y, a))_{P_x}$ is an element in $\Xi_J G(x)$, since the condition (1) is satisfied. Furthermore, for $\delta^x_{x'}$ we have that $\Xi_J G(\delta^x_{x'})((t(y, a))_{P_x}) = (t(y, a))_{P_{x'}}$. Thus $t$ determines a full tuple which we call $\lambda t \in \mathrm{Trm}_X(\Xi_J G)$.
Next, given $X \in \mathbf{S}$, $J \in \mathrm{Rel}(X)$, $G \in \mathrm{Rel}(\int_X J)$, and $s \in \mathrm{Trm}_X(\Xi_J G)$, define a full tuple $\mathrm{Ap}_s \in \mathrm{Trm}_{\int_X J}(G)$ by $\langle x, a \rangle \mapsto s(x)_{\langle x, a \rangle}$ (the $\langle x, a \rangle$'th element of the family $s(x)$; it is straightforward to verify that this does define a full tuple by using condition (1) and the fact that $s$ is a full tuple). Now, this preliminary definition is not of tuple form, so the final definition of $\Pi_J G$ is the tuplification of $\Xi_J G$. The definitions of $\lambda$ and $\mathrm{Ap}$ are changed accordingly, i.e. along the isomorphism. Note that for an attribute $A \in X_0$ we have that $\Pi_J G(A)$ is the product $\prod_{a \in J(A)} G(A, a)$. We have then the following equations.
Lemma 5. Let $f : Z \to X$ be a display morphism in $\mathbf{S}$, $J \in \mathrm{Rel}(X)$, $G \in \mathrm{Rel}(\int_X J)$, and $t \in$ [...]
Example 1. Consider the schema $S$ of Section 2.2, which we can give as an instance of $\Delta_2$ as (ignoring tuplification for readability) $S : \Delta_2 \to \mathrm{FinSet}$ by $S(0) = \{A\}$, $S(1) = \{B\}$, $S(2) = \{C\}$, $S(01) = \{R\}$, $S(12) = \{Q\}$, and $S(02) = S(012) = \emptyset$. Notice that, modulo the isomorphism between $S$ as presented in 2.2 and $\int_{\Delta_2} S$, the morphism $f : S \to \Delta_2$ of 2.2 is the canonical projection $p : \int_{\Delta_2} S \to \Delta_2$. Next we have $I \in \mathrm{Rel}(\int_{\Delta_2} S)$ as (in tabular form, using subscript instead of pairing for elements in $\int_{\Delta_2} S$, and omitting the three single-column tables) [tables not recovered]. Then by unpacking the definition, one sees that $\Pi_S I$ is, in tabular form, [tables not recovered]. This exemplifies the link between the dependent product operation and natural join.
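The link with natural join can be checked on a small case. The sketch below is not the paper's construction in full generality: it assumes instances on tuple form, in which case a compatible family over the schema faces contained in a given face amounts to one value per attribute whose restriction to each of those schema faces is a row. The instance `I` is hypothetical data consistent with the two full tuples mentioned in Section 2.2, and attribute names A, B, C are used in place of 0, 1, 2.

```python
from itertools import product


def row(**values):
    """A row on tuple form: a frozenset of (attribute, value) pairs."""
    return frozenset(values.items())


def face(*attributes):
    return frozenset(attributes)


# Hypothetical instance over the schema with R over {A, B} and Q over {B, C}.
I = {
    face("A"): {row(A="a"), row(A="a'")},
    face("B"): {row(B="b")},
    face("C"): {row(C="c")},
    face("A", "B"): {row(A="a", B="b"), row(A="a'", B="b")},
    face("B", "C"): {row(B="b", C="c")},
}


def pi(x, instance):
    """Compatible families over the schema faces contained in x, as rows over x."""
    inside = [y for y in instance if y <= x]
    attributes = sorted(x)
    domains = [sorted(v for r in instance[face(a)] for _, v in r) for a in attributes]
    result = set()
    for values in product(*domains):
        candidate = dict(zip(attributes, values))
        if all(frozenset((a, candidate[a]) for a in y) in instance[y] for y in inside):
            result.add(frozenset(candidate.items()))
    return result
```

Over the top face `{A, B, C}` this returns the natural join of the two tables, over `{A, C}` (where the schema has no relation variable) the full product of the two columns, and over `{A, B}` the table for R itself.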
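0 and 1 instances: Given $X \in \mathbf{S}$ the terminal instance $1_X$ has already been defined. The initial instance $0_X$ is the constant 0 functor, $x \mapsto \emptyset$. Note that $\int_X 0_X$ is the empty schema.
Dependent sum: Let $X \in \mathbf{S}$, $J \in \mathrm{Rel}(X)$, and $G \in \mathrm{Rel}(\int_X J)$. We define the instance $\Sigma_J G : X \to \mathrm{FinSet}$ [...]. We leave to the reader the straightforward verification that $\Sigma_J G$ is relational and that the stability-under-substitution equation [...]
Disjoint union: Given $X \in \mathbf{S}$ and $I, J \in \mathrm{Rel}(X)$, the instance $I + J \in \mathrm{Rel}(X)$ is defined by $x \mapsto \{\langle n, a \rangle \mid (n = 0 \wedge a \in I(x)) \vee (n = 1 \wedge a \in J(x))\}$. We have full tuples $\mathrm{left} \in \mathrm{Trm}_{\int_X I}((I + J)[p])$ defined by $\langle x, a \rangle \mapsto \langle 0, a \rangle$ and $\mathrm{right} \in \mathrm{Trm}_{\int_X J}((I + J)[p])$ defined by $\langle x, a \rangle \mapsto \langle 1, a \rangle$.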
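The disjoint union is a face-wise tagged union, as the following sketch shows. It assumes, for illustration only, that an instance is encoded as a dictionary from faces to sets of keys and that both instances are given over the same faces; the sample keys are hypothetical, and the tagged pairs are not on tuple form.

```python
def disjoint_union(I, J):
    """(I + J)(x): keys of I(x) tagged with 0 together with keys of J(x) tagged with 1."""
    return {x: {(0, a) for a in I[x]} | {(1, b) for b in J[x]} for x in I}


def left(a):
    """The injection of a key of I into I + J."""
    return (0, a)


def right(b):
    """The injection of a key of J into I + J."""
    return (1, b)


AB = frozenset({"A", "B"})
I = {AB: {"r1", "r2"}}
J = {AB: {"r1"}}
union = disjoint_union(I, J)
```

The tags keep the key `r1` of `I` apart from the key `r1` of `J`, so the union over `AB` has three elements rather than two.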

The type theory
We introduce a Martin-Löf style type theory (Martin-Löf n.d.), with explicit substitutions, extended with context and substitution constants representing simplices and face maps. The substitution and context part of the type theory is essentially that of categories with families (Hofmann 1997), except for a novel set of context and substitution constants. In the following, we display the basic rules for substitution, context extension and for the type constructions Π, Σ, and +. Listed here are all rules introducing new terms, and selected, important equations. We also omit some elimination rules, but refer to the literature on type theory, such as (Hofmann 1997) and (Nordström, Petersson and Smith 2000).
For each collection of rules we give the intended interpretation in the model. The interpretation is given by the operation $[\![-]\!]$, defined recursively throughout this section. This is then summed up in the soundness theorem at the end.

Judgements
The type system has the following eight judgements, with intended interpretations.

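Judgement                                Interpretation
$\Gamma$ context                         $\Gamma$ is a schema
$\Gamma \vdash A$ type                   $A$ is an instance of the schema $\Gamma$
$\Gamma \vdash t : A$                    $t$ is a full tuple in the instance $A$
$\sigma : \Gamma \to \Lambda$            $\sigma$ is a (display) schema morphism
$\Gamma \equiv \Lambda$                  $\Gamma$ and $\Lambda$ are equal schemas
$\Gamma \vdash A \equiv B$               $A$ and $B$ are equal instances of $\Gamma$
$\Gamma \vdash t \equiv u : A$           $t$ and $u$ are equal full tuples in $A$
$\sigma \equiv \tau : \Gamma \to \Lambda$    the morphisms $\sigma$ and $\tau$ are equal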

Substitutions
The following axioms state that contexts and substitutions form a category, acting on the types and elements. Below the axioms are stated the intended interpretations, cf. Section 2.

Context extension
The following rules detail how types in a context extend it.

Other types
These types correspond to the constructions with the same names in subsection 2.3.
[Rule displays for: the Σ-type, the identity type, the +-type, the 0-type.]

Context and substitution constants
For each natural number $n$, and for each $i$, $j$ such that $i < j \le n + 2$ [...]
Theorem 1. The intended interpretation of the type theory is sound. That is, all rules for equality hold true in the interpretation given by $[\![-]\!]$.
Proof. The equalities for substitution are verified in Lemma 2, and the rules for Π in Lemma 5. The remaining equations are routine verification.

Instance specification as type introduction
The intended interpretation of $\Gamma \vdash A$ type is that $A$ is an instance of the schema $\Gamma$. But context extension allows us to view every instance as a schema in its own right. So for every instance $\Gamma \vdash A$ type, we get a schema $\Gamma.A$. It turns out that the most convenient way to specify a schema is by introducing a new type/instance over one of the simplex schemas $\Delta_n$.
To specify a schema with a maximum of $n$ attributes may be seen as introducing a type in the context $\Delta_n$. A relation variable with $k$ attributes in the schema is introduced as an element of the schema substituted into $\Delta_k$. Names of attributes are given as elements of the schema substituted down to $\Delta_0$.
Example 2. We construct the rules of the schema $S$ of 2.2, with attributes $A$, $B$ and $C$, with two tables $R$ and $Q$. The introduction rules tell us the names of tables and attributes in $S$.
From these introduction rules, we can generate an elimination rule.The elimination rule tells us how to construct full tuples in an instance over the schema S.
Another interpretation of the elimination rule is that it formulates that the schema S contains only what is specified by the above introduction rules; it specifies the schema up to isomorphism.
An instance of a schema is a type in the context of the schema. Therefore instance specification is completely analogous to schema specification. The following example shows how to introduce the instance $I$ of 2.2.
Example 3. Let S be the schema from the previous example.The following set of introductions presents an instance I of S.
The above is clearly very verbose, and can be compressed, at the cost of losing control over the naming of attributes, into the following.
A motivation for introducing a formal type theory of databases is for it to provide a query language.The development and analysis of this query language is future work, but we provide here an example of a query formulated in type theory, illustrating the concept.

Example 4. Queries
Let the schema $S$ and the instance $I$ be as in the previous examples. A query is represented as a type, with the terms of the type being the result of the query. We ask for the matching tuples of $R$ and $Q$, i.e. the tuples in the natural join $R \bowtie Q$. This query is represented by the Π-type of $I$ over $S$ (cf. Example 1), which is a type in $\Delta_2$, [...] To construct elements of this type, we can apply the constructor $\lambda$ to full tuples of $I$. These may in turn be constructed using the elimination rule of $S$. Thus, the result of the query is: [...] These terms represent the expected full tuples $\langle a, b, c \rangle$ and $\langle a', b, c \rangle$.

Appendix
This appendix has two parts. Part 1 contains several lemmas from Section 2 the proofs of which were omitted in the main text (in the second lemma below we have kept here an earlier version with certain calculations in the statement of the lemma). Part 2 is a fuller statement of the type theory in Section 3 (and also includes some heuristic remarks that were omitted from the main text).
Part 1
Lemma 6 (Lemma 1). Let $X$ be a (simplicial) schema and $F : X \to \mathrm{FinSet}$ be a functor. Then $\int_X F$ is a (simplicial) schema and $p : \int_X F \to X$ a display morphism if and only if $F$ is a relational instance.
Proof. Let $F$ be a relational instance. It is clear that there is at most one morphism between any two objects in $\int_X F$, and it is easy to see that [...] satisfying the condition for being a simplicial complex. Noticing that $(\int_X F)$ [...] Conversely, if $F$ is not relational, then condition 1.b of Definition 1 is violated (by any two keys with the same data).
Lemma 7 (Lemma 2). The following equations hold:
1. For $X$ in $\mathbf{S}$, $I \in \mathrm{Rel}(X)$ and $t \in \mathrm{Trm}_X(I)$ we have $p \circ \hat{t} = \mathrm{id}_X$, where $\hat{t}$ is the induced section of $t$.
2. For $X \in \mathbf{S}$, $J \in \mathrm{Rel}(X)$ and $t \in \mathrm{Trm}_X(J)$ we have [...]
Proof. The first two are immediate. For the third, $v[\hat{t}]$ is the full tuple of $J$ [...]. Finally for the fifth, we have [...]
[...] Conversely, suppose $f$ is not display. Then there exists $x \in X$ such that $x$ has two attributes, $B_{\le x} = \{A, B\}$, and $f(A) = f(B) = f(x)$. Let $J \in \mathrm{Rel}(Y)$ be an instance such that $J(f A) = \{a\}$. Then $J \circ f(x) = \{a\}$ and not the tuple $\langle a, a \rangle$.
2. Let $A$ be an attribute of $Z$. Then [...]
