Hash functions and pseudo-random number generation
Here I show two small tests, one with generation of an initial set of random numbers, and another on `popcount()` speeds, implemented in the low-level phylogenetic library biomcmc-lib.
005 Hash functions and pseudo-random number generation
This notebook uses the low-level phylogenetic library biomcmc-lib (commit da29c5c).
Here I show two small tests, one with generation of an initial set of random numbers, and another on popcount()
speeds.
testing the generation of a deterministic array of random numbers
In biomcmc-lib
there are a few vectors with "random numbers" (some actually from random.org, some random prime numbers used in hash functions, etc.). The function biomcmc_salt_vector32_from_spice_table()
will populate a vector with these number in an order specified by the particular seed[4]
values.
//%cflags: -I/usr/users/QIB_fr005/deolivl/Academic/Quadram/009.supersptree/biomcmc-lib/lib
//%cflags: -I/usr/users/QIB_fr005/deolivl/Academic/Quadram/009.supersptree/build.191216/biomcmc-lib/lib
//%cflags: /usr/users/QIB_fr005/deolivl/Academic/Quadram/009.supersptree/build.191216/biomcmc-lib/lib/.libs/libbiomcmc.a
//%cflags:-lm
#include <biomcmc.h>
int
main (int argc, char **argv)
{
uint32_t i,j;
uint32_t seeds[]={0,1,2,3}, vec[32], nvec=34;
uint8_t *c;
size_t size = sizeof (vec);
for (i=0;i<2;i++) {
seeds[0] = i;
for (j=0;j<4;j++) printf ("%8x ",seeds[j]);
printf ("original seeds\n");
biomcmc_salt_vector32_from_spice_table (vec, nvec, seeds);
printf("\nseed: %u\n", i);
//for (j=0; j<nvec;j++) {printf("%12u ", vec[j]); if (!((j+1)%8)) printf ("\n"); }
c = (uint8_t*) vec;
size = sizeof (vec);
for (; (size > 0); c++, size--) {printf ("%4x ", *c); if (!((size-1)%16)) printf ("\n"); }
}
for (i=0;i<15;i++) {
for (j=0;j<4;j++) printf ("%8x ",biomcmc_hashint_salted (j, i));
printf ("<< for hash %u\n", i);
}
return EXIT_SUCCESS;
}
pop_count()
)
Counting the number of active bits (I have recently implemented a few faster versions of 64 bits popcounts, and below I compare them with my first implementation — in the context of "bipartitions", which are vectors of bitstrings.
First I check if they are calculating the right thing (number of "ones" in a vector of 64 bit integers). And then I see which is faster. I expected pop0()
to be slowest (it is linear on the number of active bits), and the others to be similar. Which is what we observe. The one I've chosen to be the new "default" on biomcmc-lib
is pop1()
, which I've seen in Jue Ruan's RedBean and also in novoBreak. The previous default was the i &= i -1
trick described in K & R. The other two are from Andrew Dalke.
//%cflags: -I/usr/users/QIB_fr005/deolivl/Academic/Quadram/009.supersptree/biomcmc-lib/lib
//%cflags: -I/usr/users/QIB_fr005/deolivl/Academic/Quadram/009.supersptree/build.191216/biomcmc-lib/lib
//%cflags: /usr/users/QIB_fr005/deolivl/Academic/Quadram/009.supersptree/build.191216/biomcmc-lib/lib/.libs/libbiomcmc.a
//%cflags:-lm
#include <biomcmc.h>
int
main (int argc, char **argv)
{
uint32_t i,j, n_iter = 10000;
clock_t time0, time1;
bipartition bp = new_bipartition (70);
biomcmc_random_number_init (0ULL);
printf ("STEP 1 : check that all pop counts are correct\n");
for (i=0; i<4; i++) {
for (j=0; j < bp->n->ints; j++) {
bp->bs[j] = biomcmc_rng_get ();
}
bipartition_print_to_stdout (bp);
printf ("\t%4d %4d %4d %4d\n",
bipartition_count_n_ones_pop0(bp),
bipartition_count_n_ones_pop1(bp),
bipartition_count_n_ones_pop2(bp),
bipartition_count_n_ones_pop3(bp));
}
del_bipartition (bp); bp = new_bipartition (50000); // bigger bipartition
printf ("STEP 2: time pop counts (pop1 is default, and we call through a wrapper to discount this overhead)\n");
time0 = clock ();
for (i=0; i < n_iter; i++) {
for (j=0; j < bp->n->ints; j++) bp->bs[j] = biomcmc_rng_get ();
bipartition_count_n_ones (bp);
}
time1 = clock (); fprintf (stderr, "pop1: %.8f secs\n", (double)(time1-time0)/(double)CLOCKS_PER_SEC);
time0 = time1;
for (i=0; i < n_iter; i++) {
for (j=0; j < bp->n->ints; j++) bp->bs[j] = biomcmc_rng_get ();
bipartition_count_n_ones_pop0 (bp);
}
time1 = clock (); fprintf (stderr, "pop0: %.8f secs\n", (double)(time1-time0)/(double)CLOCKS_PER_SEC);
time0 = time1;
for (i=0; i < n_iter; i++) {
for (j=0; j < bp->n->ints; j++) bp->bs[j] = biomcmc_rng_get ();
bipartition_count_n_ones_pop2 (bp);
}
time1 = clock (); fprintf (stderr, "pop2: %.8f secs\n", (double)(time1-time0)/(double)CLOCKS_PER_SEC);
time0 = time1;
for (i=0; i < n_iter; i++) {
for (j=0; j < bp->n->ints; j++) bp->bs[j] = biomcmc_rng_get ();
bipartition_count_n_ones_pop3 (bp);
}
time1 = clock (); fprintf (stderr, "pop3: %.8f secs\n", (double)(time1-time0)/(double)CLOCKS_PER_SEC);
time0 = time1;
del_bipartition (bp);
biomcmc_random_number_finalize ();
return EXIT_SUCCESS;
}