biomcmc-lib
0.1
low level library for phylogenetic analysis
|
File handling functions and calculation of distances for sequence data in nexus format. More...
Go to the source code of this file.
Data Structures | |
struct | alignment_struct |
Data from alignment file. More... | |
Typedefs | |
typedef struct alignment_struct * | alignment |
Functions | |
alignment | read_alignment_from_file (char *seqfilename) |
Reads DNA alignment (guess format between FASTA and NEXUS) from file and store info in alignment_struct. | |
alignment | read_fasta_alignment_from_file (char *seqfilename) |
Reads DNA FASTA alignment from file and store info in alignment_struct. | |
alignment | read_nexus_alignment_from_file (char *seqfilename) |
Reads DNA NEXUS alignment from file and store info in alignment_struct. | |
void | print_alignment_in_fasta_format (alignment align, FILE *stream) |
Prints alignment to FILE stream in FASTA format (debug purposes). | |
void | del_alignment (alignment align) |
Frees memory from alignment_struct. | |
distance_matrix | new_distance_matrix_from_valid_matrix_elems (distance_matrix original, int *valid, int n_valid) |
new matrix of pairwise distance by simply excluding original elements not present in valid[] | |
distance_matrix | new_distance_matrix_from_alignment (alignment align) |
creates and calculates matrix of pairwise distances based on alignment | |
void | store_likelihood_info_at_leaf (double **l, char *align, int n_pat, int n_state) |
transform aligned sequence into likelihood for terminal taxa (e.g. A -> 0001, C-> 0010 etc) (e.g. A -> 0001, C-> 0010 etc) (e.g. A -> 0001, C-> 0010 etc) (e.g. A -> 0001, C-> 0010 etc) (e.g. A -> 0001, C-> 0010 etc) (e.g. A -> 0001, C-> 0010 etc) (e.g. A -> 0001, C-> 0010 etc) (e.g. A -> 0001, C-> 0010 etc) | |
File handling functions and calculation of distances for sequence data in nexus format.
Reading of sequence data in nexus format (sequencial or interleaved) and fasta format. For fasta format the sequences don't need to be aligned, but for all formats if the sequences are aligned a data compression is used so that we keep only the distinct site (column) patterns and a mapping between original and compressed site columns. Based on the sequence pairs we can also calculate the matrix of distances between sequences.