Memory space of constant kernel arguments should be re-qualified
The maximum allowed size for constant arguments varies for each device, e.g.:
AMD Vega56 GPU has 4.2 GB:
ULong attributes ...
1 CL_DEVICE_MAX_MEM_ALLOC_SIZE : 4244635648
1 CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : 16384
1 CL_DEVICE_GLOBAL_MEM_SIZE : 8573157376
1 CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE : 4244635648
1 CL_DEVICE_LOCAL_MEM_SIZE : 32768
NVidia M2000 GPU has 16.7 MB:
ULong attributes ...
1 CL_DEVICE_MAX_MEM_ALLOC_SIZE : 8589934592
1 CL_DEVICE_GLOBAL_MEM_CACHE_SIZE : 20971520
1 CL_DEVICE_GLOBAL_MEM_SIZE : 31497080832
1 CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE : 16777216
1 CL_DEVICE_LOCAL_MEM_SIZE : 16777216
As all constant kernel arguments were qualified as __constant
, the program might be allocating more data than what fits in the constant buffer. Perhaps this is handle automatically by the compiler, i.e., it might be moving this extra data to global memory.
However, this is signalled as an error by oclgrind --check-api:
Oclgrind - OpenCL runtime error detected
Function: clEnqueueNDRangeKernel
Error: CL_OUT_OF_RESOURCES
total constant memory size (252528) exceeds device maximum of 65536
Error: clEnqueueNDRangeKernel() -5
Oclgrind - OpenCL runtime error detected
Function: clEnqueueNDRangeKernel
Error: CL_OUT_OF_RESOURCES
total constant memory size (297680) exceeds device maximum of 65536
Error: clEnqueueNDRangeKernel() -5
A solution for this is evaluating the amount of data being passed to kernel, and then re-qualifying arguments either as __constant
or __global const
.
For size definitions see ref1 and ref2.
The calculation of sizes is as follows. Originally, each of these is passed as a separate __constant
argument from host to kernel. Here, these are listed in groups for convenient data-passing from host to device (see kernel codes in commits below).
interintra
(subtotal size: 1280)
Constant array | Size definition | Size calculation | Size in Bytes |
---|---|---|---|
atom_charges | MAX_NUM_OF_ATOMS * sizeof(float) | 256 * 4 | 1024 |
atom_types | MAX_NUM_OF_ATOMS * sizeof(char) | 256 * 1 | 256 |
intracontrib
Constant array | Size definition | Size calculation | Size in Bytes |
---|---|---|---|
intraE_contributors | 3 * MAX_INTRAE_CONTRIBUTORS * sizeof(char) | 3 * 256 * 256 * 1 | 196608 |
intra
(subtotal size: 2032)
Constant array | Size definition | Size calculation | Size in Bytes |
---|---|---|---|
reqm | ATYPE_NUM * sizeof(float) | 22 * 4 | 88 |
reqm_hbond | ATYPE_NUM * sizeof(float) | 22 * 4 | 88 |
atom1_types_reqm | ATYPE_NUM * sizeof(unsigned int) | 22 * 4 | 88 |
atom2_types_reqm | ATYPE_NUM * sizeof(unsigned int) | 22 * 4 | 88 |
VWpars_AC | MAX_NUM_OF_ATYPES * MAX_NUM_OF_ATYPES * sizeof(float) | 14 * 14 * 4 | 784 |
VWpars_BD | MAX_NUM_OF_ATYPES * MAX_NUM_OF_ATYPES * sizeof(float) | 14 * 14 * 4 | 784 |
dspars_S | MAX_NUM_OF_ATYPES * sizeof(float) | 14 * 4 | 56 |
dspars_V | MAX_NUM_OF_ATYPES * sizeof(float) | 14 * 4 | 56 |
rotlist
Constant array | Size definition | Size calculation | Size in Bytes |
---|---|---|---|
rotlist | MAX_NUM_OF_ROTATIONS * sizeof(int) | 256 * 32 * 4 | 32768 |
conform
(subtotal size: 19840)
Constant array | Size definition | Size calculation | Size in Bytes |
---|---|---|---|
ref_coords_x | MAX_NUM_OF_ATOMS * sizeof(float) | 256 * 4 | 1024 |
ref_coords_y | MAX_NUM_OF_ATOMS * sizeof(float) | 256 * 4 | 1024 |
ref_coords_z | MAX_NUM_OF_ATOMS * sizeof(float) | 256 * 4 | 1024 |
rotbonds_moving_vectors | 3 * MAX_NUM_OF_ROTBONDS * sizeof(float) | 3 * 32 * 4 | 384 |
rotbonds_unit_vectors | 3 * MAX_NUM_OF_ROTBONDS * sizeof(float) | 3 * 32 * 4 | 384 |
ref_orientation_quats | 4 * MAX_NUM_OF_RUNS * sizeof(float) | 4 * 1000 * 4 | 16000 |
A total of 252528 Bytes is required for constant data, which is a much smaller size than the minimum in the available GPUs.
For debugfastergrad, we require the following arrays as well:
gradsrotbonds
Constant array | Size definition | Size calculation | Size in Bytes |
---|---|---|---|
rotbonds_atoms | MAX_NUM_OF_ATOMS * MAX_NUM_OF_ROTBONDS * sizeof(int) | 256 * 32 * 4 | 32768 |
grads
(subtotal size: 12384)
Constant array | Size definition | Size calculation | Size in Bytes |
---|---|---|---|
rotbonds | 2 * MAX_NUM_OF_ROTBONDS * sizeof(int) | 2 * 32 * 4 | 256 |
num_rotating_atoms_per_rotbond | MAX_NUM_OF_ROTBONDS * sizeof(int) | 32 * 4 | 128 |
angle | 1000*sizeof(float) | 1000 * 4 | 4000 |
dependence_on_theta | 1000*sizeof(float) | 1000 * 4 | 4000 |
dependence_on_rotangle | 1000*sizeof(float) | 1000 * 4 | 4000 |
A total of 297680 (252528 + 45152) Bytes is required for constant data, which is a much smaller size than the minimum in the available GPUs.