Wrong number of elements is copied with async_work_group_copy in kernel1
This instruction in the "debugfastergrad" branch copies
GENOTYPE_LENGTH_IN_GLOBMEM elements from global memory into a local array.
However, that array is declared with fewer elements than are copied into it, so the copy writes past the end of the local array.
Therefore, the instruction should copy
ACTUAL_GENOTYPE_LENGTH elements instead of GENOTYPE_LENGTH_IN_GLOBMEM.
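A minimal host-side sketch of the proposed fix follows. It is written in plain C rather than OpenCL C so it can run without a device. It assumes, as the macro names suggest, that each genotype occupies GENOTYPE_LENGTH_IN_GLOBMEM slots in global memory (here, hypothetically, one padding slot more than ACTUAL_GENOTYPE_LENGTH), while the local array holds only ACTUAL_GENOTYPE_LENGTH elements. The numeric values, `checked_copy`, and `copy_genotype` are hypothetical stand-ins; `checked_copy` plays the role of async_work_group_copy but refuses an oversized copy instead of overflowing. The point it illustrates: the source offset keeps the padded global-memory stride, while the element count uses the actual length.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical placeholder values; the real definitions live in the
   project's headers. Assumed here: each genotype in global memory is
   padded by one slot beyond its actual length. */
#define ACTUAL_GENOTYPE_LENGTH 38
#define GENOTYPE_LENGTH_IN_GLOBMEM (ACTUAL_GENOTYPE_LENGTH + 1)

/* Hypothetical stand-in for async_work_group_copy: copies `count` floats
   from `src` to `dst`, but refuses (returns 0) when `count` exceeds the
   destination capacity instead of writing out of bounds. */
static size_t checked_copy(float *dst, size_t dst_capacity,
                           const float *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    if (count > dst_capacity)
        return 0;
    memcpy(dst, src, count * sizeof *dst);
    return count;
}

/* Corrected copy for one work-group: the source offset uses the padded
   global-memory stride, while the element count uses the actual length,
   which matches the size of the local array. */
static size_t copy_genotype(float local_genotype[ACTUAL_GENOTYPE_LENGTH],
                            const float *pops, size_t group_id)
{
    const float *src = pops + GENOTYPE_LENGTH_IN_GLOBMEM * group_id;
    return checked_copy(local_genotype, ACTUAL_GENOTYPE_LENGTH,
                        src, ACTUAL_GENOTYPE_LENGTH);
}
```

With `pops` holding consecutive values 0, 1, 2, ..., calling `copy_genotype` for group 2 fills the local array with 78 through 115, whereas passing GENOTYPE_LENGTH_IN_GLOBMEM as the count makes `checked_copy` refuse the copy.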