
Bug in CUDA exx ek screening #154

@vmitq

Description

I experimented with GauXC to compute semi-numerical exchange using NVIDIA GPUs and found that, for large systems and large batch sizes, the computed exchange matrix is incorrect. Results obtained with the host integrator were correct. Interestingly, the CUDA results were also correct when using extremely tight screening thresholds, which suggests that something may be wrong with the Exx screening on CUDA.

Tracing the issue indicates that the problem lies in how batches of tasks are generated in the exx_ek_screening function:

```cpp
const size_t task_batch_size = 10000;

// Setup EXX EK Screening memory on the device
device_data.reset_allocations();
device_data.allocate_static_data_exx_ek_screening( ntasks, nbf, nshells,
  shpairs.npairs(), basis_map.max_l() );
device_data.send_static_data_density_basis( P_abs, ldp, nullptr, 0, nullptr, 0,
  nullptr, 0, basis );
device_data.send_static_data_exx_ek_screening( V_shell_max, ldv, basis_map,
  shpairs );

integrator_term_tracker enabled_terms;
enabled_terms.exx_ek_screening = true;

auto task_batch_begin = task_begin;
while( task_batch_begin != task_end ) {

  size_t nleft = std::distance( task_batch_begin, task_end );
  exx_detail::host_task_iterator task_batch_end;
  if( nleft > task_batch_size )
    task_batch_end = task_batch_begin + task_batch_size;
  else
    task_batch_end = task_end;

  device_data.zero_exx_ek_screening_intermediates();

  // Loop over tasks and form basis-related buffers
  auto task_it = task_batch_begin;
  while( task_it != task_batch_end ) {

    // Determine next task patch, send relevant data (EXX_EK only)
    task_it = device_data.generate_buffers( enabled_terms, basis_map, task_it,
      task_batch_end );

    // Evaluate collocation
    lwd->eval_collocation( &device_data );

    // Evaluate EXX EK Screening Basis Statistics
    lwd->eval_exx_ek_screening_bfn_stats( &device_data );
  }

  lwd->exx_ek_shellpair_collision( eps_E, eps_K, &device_data, task_batch_begin,
    task_batch_end, shpairs );

  task_batch_begin = task_batch_end;
}
```

This function contains a nested loop over task batches structured in such a way that, if the number of tasks that fit on the device in one pass of the inner loop is smaller than task_batch_size (hardcoded to 10000), some internal data buffers (bfn_max_device) are overwritten by the next iteration of the inner loop before they are consumed. This can occur when there are many large tasks and the GPU memory cannot fit a batch of 10000 tasks at once.

After removing the double-loop structure and using the same batching strategy as the other device functions, the results are correct.
