Abstract

With the consensus human genome sequenced and many other sequencing projects at varying stages of completion, greater attention is being paid to the genetic differences among individuals and the ability of those differences to predict phenotypes. A significant obstacle to such work is the difficulty and expense of determining haplotypes - sets of variants genetically linked because of their proximity on the genome - for large numbers of individuals for use in association studies. This paper presents some algorithmic considerations in a new approach for haplotype determination: inferring haplotypes from localised polymorphism data gathered from short genome 'fragments'. Formalised models of the biological system under consideration are examined, given a variety of assumptions about the goal of the problem and the character of optimal solutions. Some theoretical results and algorithms for handling haplotype assembly under these models are then sketched. The primary conclusion is that some important simplified variants of the problem are tractable, while more general variants tend to be intractable in the worst case.