Wrong number of elements is copied with async_work_group_copy in kernel1
This instruction in the "debugfastergrad" branch copies GENOTYPE_LENGTH_IN_GLOBMEM elements from global memory into the local array genotype[ACTUAL_GENOTYPE_LENGTH].
However, that array is declared with fewer elements than are copied into it, so the copy writes past the end of the array.
The instruction should therefore copy ACTUAL_GENOTYPE_LENGTH elements instead of GENOTYPE_LENGTH_IN_GLOBMEM.
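
To illustrate the proposed change, here is a minimal plain-C sketch, not the kernel code itself. It uses memcpy as a host-side stand-in for async_work_group_copy. The numeric values of the two macros, the float element type, and the helper names copy_fits and load_genotype are assumptions made only for this example; the real definitions live in the project's sources.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder sizes for illustration only (assumed, not taken from the project). */
#define ACTUAL_GENOTYPE_LENGTH     38
#define GENOTYPE_LENGTH_IN_GLOBMEM 64

/* Hypothetical helper: returns 1 if copying `count` elements fits into a
   destination that holds `capacity` elements, 0 otherwise. */
int copy_fits(size_t count, size_t capacity)
{
    return count <= capacity;
}

/* Hypothetical stand-in for loading one genotype into the local array.
   Each genotype occupies GENOTYPE_LENGTH_IN_GLOBMEM slots in global memory,
   but only the first ACTUAL_GENOTYPE_LENGTH of them fit into `genotype`. */
void load_genotype(float genotype[ACTUAL_GENOTYPE_LENGTH],
                   const float *globmem_entry)
{
    assert(genotype != NULL && globmem_entry != NULL);

    /* In OpenCL C the corrected call would look roughly like:
         async_work_group_copy(genotype, globmem_entry,
                               ACTUAL_GENOTYPE_LENGTH, 0);
       The element count matches the size of the destination array. */
    memcpy(genotype, globmem_entry, ACTUAL_GENOTYPE_LENGTH * sizeof(float));
}
```

With these placeholder sizes, copy_fits(GENOTYPE_LENGTH_IN_GLOBMEM, ACTUAL_GENOTYPE_LENGTH) returns 0, which is the mismatch described above, while load_genotype stays within the bounds of the destination.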