Skip to content

Memory space of constant kernel arguments should be re-qualified

The maximum allowed size for constant arguments varies for each device, e.g.:

AMD Vega56 GPU has 4.2 GB:

  ULong attributes ...
   1 CL_DEVICE_MAX_MEM_ALLOC_SIZE                 : 4244635648 
   1 CL_DEVICE_GLOBAL_MEM_CACHE_SIZE              : 16384 
   1 CL_DEVICE_GLOBAL_MEM_SIZE                    : 8573157376 
   1 CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE           : 4244635648 
   1 CL_DEVICE_LOCAL_MEM_SIZE                     : 32768 

NVidia M2000 GPU has 16.7 MB:

 ULong attributes ...
  1 CL_DEVICE_MAX_MEM_ALLOC_SIZE                 : 8589934592  
  1 CL_DEVICE_GLOBAL_MEM_CACHE_SIZE              : 20971520  
  1 CL_DEVICE_GLOBAL_MEM_SIZE                    : 31497080832  
  1 CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE           : 16777216  
  1 CL_DEVICE_LOCAL_MEM_SIZE                     : 16777216  

As all constant kernel arguments were qualified as __constant, the program might be allocating more data than what fits in the constant buffer. Perhaps this is handle automatically by the compiler, i.e., it might be moving this extra data to global memory.

However, this is signalled as an error by oclgrind --check-api:

Oclgrind - OpenCL runtime error detected
	Function: clEnqueueNDRangeKernel
	Error:    CL_OUT_OF_RESOURCES
	total constant memory size (252528) exceeds device maximum of 65536

Error: clEnqueueNDRangeKernel() -5

Oclgrind - OpenCL runtime error detected
	Function: clEnqueueNDRangeKernel
	Error:    CL_OUT_OF_RESOURCES
	total constant memory size (297680) exceeds device maximum of 65536

Error: clEnqueueNDRangeKernel() -5

A solution for this is evaluating the amount of data being passed to kernel, and then re-qualifying arguments either as __constant or __global const.

For size definitions see ref1 and ref2.

The calculation of sizes is as follows. Originally, each of these is passed as a separate __constant argument from host to kernel. Here, these are listed in groups for convenient data-passing from host to device (see kernel codes in commits below).

interintra (subtotal size: 1280)

Constant array Size definition Size calculation Size in Bytes
atom_charges MAX_NUM_OF_ATOMS * sizeof(float) 256 * 4 1024
atom_types MAX_NUM_OF_ATOMS * sizeof(char) 256 * 1 256

intracontrib

Constant array Size definition Size calculation Size in Bytes
intraE_contributors 3 * MAX_INTRAE_CONTRIBUTORS * sizeof(char) 3 * 256 * 256 * 1 196608

intra (subtotal size: 2032)

Constant array Size definition Size calculation Size in Bytes
reqm ATYPE_NUM * sizeof(float) 22 * 4 88
reqm_hbond ATYPE_NUM * sizeof(float) 22 * 4 88
atom1_types_reqm ATYPE_NUM * sizeof(unsigned int) 22 * 4 88
atom2_types_reqm ATYPE_NUM * sizeof(unsigned int) 22 * 4 88
VWpars_AC MAX_NUM_OF_ATYPES * MAX_NUM_OF_ATYPES * sizeof(float) 14 * 14 * 4 784
VWpars_BD MAX_NUM_OF_ATYPES * MAX_NUM_OF_ATYPES * sizeof(float) 14 * 14 * 4 784
dspars_S MAX_NUM_OF_ATYPES * sizeof(float) 14 * 4 56
dspars_V MAX_NUM_OF_ATYPES * sizeof(float) 14 * 4 56

rotlist

Constant array Size definition Size calculation Size in Bytes
rotlist MAX_NUM_OF_ROTATIONS * sizeof(int) 256 * 32 * 4 32768

conform (subtotal size: 19840)

Constant array Size definition Size calculation Size in Bytes
ref_coords_x MAX_NUM_OF_ATOMS * sizeof(float) 256 * 4 1024
ref_coords_y MAX_NUM_OF_ATOMS * sizeof(float) 256 * 4 1024
ref_coords_z MAX_NUM_OF_ATOMS * sizeof(float) 256 * 4 1024
rotbonds_moving_vectors 3 * MAX_NUM_OF_ROTBONDS * sizeof(float) 3 * 32 * 4 384
rotbonds_unit_vectors 3 * MAX_NUM_OF_ROTBONDS * sizeof(float) 3 * 32 * 4 384
ref_orientation_quats 4 * MAX_NUM_OF_RUNS * sizeof(float) 4 * 1000 * 4 16000

A total of 252528 Bytes is required for constant data, which is a much smaller size than the minimum in the available GPUs.

For debugfastergrad, we require the following arrays as well:

gradsrotbonds

Constant array Size definition Size calculation Size in Bytes
rotbonds_atoms MAX_NUM_OF_ATOMS * MAX_NUM_OF_ROTBONDS * sizeof(int) 256 * 32 * 4 32768

grads (subtotal size: 12384)

Constant array Size definition Size calculation Size in Bytes
rotbonds 2 * MAX_NUM_OF_ROTBONDS * sizeof(int) 2 * 32 * 4 256
num_rotating_atoms_per_rotbond MAX_NUM_OF_ROTBONDS * sizeof(int) 32 * 4 128
angle 1000*sizeof(float) 1000 * 4 4000
dependence_on_theta 1000*sizeof(float) 1000 * 4 4000
dependence_on_rotangle 1000*sizeof(float) 1000 * 4 4000

A total of 297680 (252528 + 45152) Bytes is required for constant data, which is a much smaller size than the minimum in the available GPUs.

Edited by Leonardo Solis