biomcmc-lib  0.1
low level library for phylogenetic analysis
Data Structures | Typedefs | Functions
alignment.h File Reference

File handling functions and calculation of distances for sequence data in nexus format. More...

#include "distance_matrix.h"
#include "nexus_common.h"
Include dependency graph for alignment.h:
This graph shows which files directly or indirectly include this file:

Go to the source code of this file.

Data Structures

struct  alignment_struct
 Data from alignment file. More...
 

Typedefs

typedef struct alignment_structalignment
 

Functions

alignment read_alignment_from_file (char *seqfilename)
 Reads DNA alignment (guess format between FASTA and NEXUS) from file and store info in alignment_struct.
 
alignment read_fasta_alignment_from_file (char *seqfilename)
 Reads DNA FASTA alignment from file and store info in alignment_struct.
 
alignment read_nexus_alignment_from_file (char *seqfilename)
 Reads DNA NEXUS alignment from file and store info in alignment_struct.
 
void print_alignment_in_fasta_format (alignment align, FILE *stream)
 Prints alignment to FILE stream in FASTA format (debug purposes).
 
void del_alignment (alignment align)
 Frees memory from alignment_struct.
 
distance_matrix new_distance_matrix_from_valid_matrix_elems (distance_matrix original, int *valid, int n_valid)
 new matrix of pairwise distance by simply excluding original elements not present in valid[]
 
distance_matrix new_distance_matrix_from_alignment (alignment align)
 creates and calculates matrix of pairwise distances based on alignment
 
void store_likelihood_info_at_leaf (double **l, char *align, int n_pat, int n_state)
 transform aligned sequence into likelihood for terminal taxa (e.g. A -> 0001, C-> 0010 etc) (e.g. A -> 0001, C-> 0010 etc) (e.g. A -> 0001, C-> 0010 etc) (e.g. A -> 0001, C-> 0010 etc) (e.g. A -> 0001, C-> 0010 etc) (e.g. A -> 0001, C-> 0010 etc) (e.g. A -> 0001, C-> 0010 etc) (e.g. A -> 0001, C-> 0010 etc)
 

Detailed Description

File handling functions and calculation of distances for sequence data in nexus format.

Reading of sequence data in nexus format (sequencial or interleaved) and fasta format. For fasta format the sequences don't need to be aligned, but for all formats if the sequences are aligned a data compression is used so that we keep only the distinct site (column) patterns and a mapping between original and compressed site columns. Based on the sequence pairs we can also calculate the matrix of distances between sequences.