Commit be4dc487 authored by Leonardo Solis

potential unroll in prng_ls123


Former-commit-id: fc53c03e
parent ce240e17
......@@ -123,6 +123,14 @@ void Krnl_Prng_LS123_ushort(unsigned int Host_seed1,
lfsr.x = Host_seed1;
lfsr.y = Host_seed2;
lfsr.z = Host_seed3;
/*
uint lfsr[3];
lfsr[0] = Host_seed1;
lfsr[1] = Host_seed2;
lfsr[2] = Host_seed3;
*/
bool valid = false;
while(!valid) {
......@@ -158,6 +166,35 @@ void Krnl_Prng_LS123_ushort(unsigned int Host_seed1,
if(!valid) {
success = write_channel_nb_altera(chan_PRNG2GA_LS123_ushort_prng, tmp);
}
/*
ushort tmp[3];
#pragma unroll
for (uint i=0; i<3; i++){
uchar lsb[3];
lsb [i] = lfsr[i] & 0x01u;
lfsr[i] >>= 1;
lfsr[i] ^= (-lsb[i]) & 0xA3000000u;
tmp [i] = (DockConst_pop_size/MAX_UINT)*lfsr[i];
}
// to avoid having same entities undergoing LS simultaneously
if ((tmp[0] == tmp[1]) || (tmp[0] == tmp[2]) || (tmp[1] == tmp[2])) {
tmp[1] = tmp[0] + 1;
tmp[2] = tmp[1] + 2;
}
bool success = false;
ushort3 tmp123;
tmp123.x = tmp[0];
tmp123.y = tmp[1];
tmp123.z = tmp[2];
if(!valid) {
success = write_channel_nb_altera(chan_PRNG2GA_LS123_ushort_prng, tmp123);
}
*/
} // End of while(active)
}
......
......@@ -504,8 +504,22 @@ Speedup vs i5 cpu core: 3ptb: 59/40 = 1.47x, 1stp: 84/76 = 1.1x
>>> commit "added pdbs for testing"
BEGIN: THESE CHANGES WERE TESTED IN HW BUT PERFORMANCE REDUCED 1 SEC
. `Krnl_GA`: to <TRY TO> infer coalesced accesses to glob mem
redefine GlobPopCurr and GlobEneCurr pointers in the deepest possible scope
(unrolling the corresponding for-loops provides no performance benefits: NOT DONE)
. `Krnl_GA`: for SINGLE_COPY_POP_ENE, pass `eval_cnt` and `generation_cnt` as x and y components
of the same global int2 variable, in order to increase coalescing
END: THESE CHANGES WERE TESTED IN HW BUT PERFORMANCE REDUCED 1 SEC
BEGIN: THESE CHANGES WERE TESTED IN HW BUT PERFORMANCE REDUCED 2 & 4 SEC (3ptb & 1stp)
`Krnl_PRNG.cl`: in `Krnl_Prng_LS123_ushort`, unroll the internal logic by converting vectors into arrays,
as it seems that vectorization increases performance only in NDRange kernels
END: THESE CHANGES WERE TESTED IN HW BUT PERFORMANCE REDUCED 2 & 4 SEC (3ptb & 1stp)
NO CHANGES, SAVED comments in PRNG for next commit
>>> commit "potential unroll in prng_ls123"
XXX, Between Conform and InterE, IntraE create a wider channel:
......
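
For reference, the array-based variant saved in the comments above can be sketched on the host in plain C (not OpenCL, so it can be compiled and checked without the FPGA toolchain). The names `lfsr_step`, `scale_to_pop` and `ls123_draw` are hypothetical, and the integer scaling in `scale_to_pop` is an assumption standing in for the kernel's `(DockConst_pop_size/MAX_UINT)*lfsr[i]` expression; only the shift/XOR step with mask `0xA3000000u` and the duplicate handling are taken from the diff.

```c
#include <assert.h>
#include <stdint.h>

#define LFSR_TAPS 0xA3000000u  /* tap mask used in the kernel */

/* One Galois LFSR step: shift right, then XOR in the tap mask
   if the bit that was shifted out was 1. */
static uint32_t lfsr_step(uint32_t state) {
    uint32_t lsb = state & 0x01u;
    state >>= 1;
    state ^= (0u - lsb) & LFSR_TAPS;  /* 0u - lsb is all ones when lsb == 1 */
    return state;
}

/* ASSUMPTION: integer scaling of a 32-bit state into [0, pop_size).
   The kernel uses its own scaling expression instead. */
static uint16_t scale_to_pop(uint32_t state, uint32_t pop_size) {
    return (uint16_t)(((uint64_t)state * pop_size) >> 32);
}

/* Advance three independent LFSRs and produce three entity indices.
   The loop has a fixed trip count of 3, which is what
   "#pragma unroll" would flatten in the kernel. */
static void ls123_draw(uint32_t lfsr[3], uint32_t pop_size, uint16_t out[3]) {
    assert(pop_size >= 3);
    for (int i = 0; i < 3; i++) {
        lfsr[i] = lfsr_step(lfsr[i]);
        out[i] = scale_to_pop(lfsr[i], pop_size);
    }
    /* Same duplicate handling as in the diff: avoid the same entity
       undergoing LS simultaneously. Note that it does not wrap
       around pop_size. */
    if (out[0] == out[1] || out[0] == out[2] || out[1] == out[2]) {
        out[1] = (uint16_t)(out[0] + 1);
        out[2] = (uint16_t)(out[1] + 2);
    }
}
```

Calling `ls123_draw` repeatedly with the same `lfsr` array mimics successive iterations of the kernel's `while` loop, minus the channel write. This only illustrates the logic; it says nothing about the hardware timing differences reported in the notes.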