Skip to content

Conversation

@LKFEIYI
Copy link

@LKFEIYI LKFEIYI commented Jan 5, 2026

The previous memory leak was caused by induced_rho not being deleted in source/source_estate/module_pot/efield.cpp.

While fixing this issue, I also refactored several function calls in the surchem module and minimized matrix allocations inside loops.

Now, it seems that the efficiency of the surchem module has been improved by approximately 100%, and the memory leak has been resolved.
I merely test two examples. One is the optimization of ethanol in water, namely the example in #6794, where the time for cal_vel decreased from 1607.60s to 772.56s. The other one is the scf calculation of a Ag slab with cooh (adapted from http://bbs.keinsci.com/thread-57694-1-1.html), where the time for cal_vel decreased from 749.44s to 375.28s.
test.zip

I am not a specialist in quantum mechanics; I am merely trying to explore possible solutions with the assistance of AI. Therefore, I would greatly appreciate it if the reviewer could take the time to review my PR. If my PR does indeed resolve this issue, it would be a great honor for me.

Reminder

  • Have you linked an issue with this pull request?
  • Have you added adequate unit tests and/or case tests for your pull request?
  • Have you noticed possible changes of behavior below or in the linked issue?
  • Have you explained the changes of codes in core modules of ESolver, HSolver, ElecState, Hamilt, Operator or Psi? (ignore if not applicable)

Linked Issue

Fix #...#6794

Unit Tests and/or Case Tests for my changes

  • A unit test is added for each new feature or bug fix.

What's changed?

Any changes of core modules? (ignore if not applicable)

  • Example: I have added a new virtual function in the esolver base class in order to ...

@mohanchen mohanchen added Bugs Bugs that only solvable with sufficient knowledge of DFT Refactor Refactor ABACUS codes Performance Issues related to fail running ABACUS labels Jan 5, 2026
Copy link
Collaborator

@mohanchen mohanchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General speaking, you should divide 'bug fix' and 'refactor' into at least two separate PRs

std::complex<double> *aux_G = new std::complex<double>[rho_basis->npw];
double *aux_R = new double[rho_basis->nrxx];

local_grad_rho(ucell, rho_basis, ps_totn, nablan, aux_G, aux_R);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's not a good idea to invent a new function called 'local_grad_rho', for example, you don't add OPENMP acceleration code in the new 'local_grad_rho' function, and the developers need to maintain two sets of codes that share the similar functions

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

void XC_Functional::grad_rho(const std::complex* rhog,
ModuleBase::Vector3* gdr,
const ModulePW::PW_Basis* rho_basis,
const double tpiba)
{
std::complex *gdrtmp = new std::complex[rho_basis->nmaxgr];

// the formula is : rho(r)^prime = \int iG * rho(G)e^{iGr} dG
for(int i = 0 ; i < 3 ; ++i)
{
    // calculate the charge density gradient in reciprocal space.

#ifdef _OPENMP
#pragma omp parallel for schedule(static, 1024)
#endif
for(int ig=0; ig<rho_basis->npw; ig++) {
gdrtmp[ig] = ModuleBase::IMAG_UNIT * rhog[ig] * rho_basis->gcar[ig][i];
}

    // bring the gdr from G --> R
    rho_basis->recip2real(gdrtmp, gdrtmp);

    // remember to multily 2pi/a0, which belongs to G vectors.

#ifdef _OPENMP
#pragma omp parallel for schedule(static, 1024)
#endif
for(int ir=0; ir<rho_basis->nrxx; ir++) {
gdr[ir][i] = gdrtmp[ir].real() * tpiba;
}
}

delete[] gdrtmp;
return;

}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see above the original grad_rho code, which has OPENMP acceleration

Copy link
Author

@LKFEIYI LKFEIYI Jan 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. I agree that maintaining a duplicated local_grad_rho is not a good idea.

Although benchmarking showed that the shared XC_Functional::grad_rho is slightly slower (~3%) than the specialized local version, this performance cost is negligible compared to the benefits of code maintainability and avoiding redundancy. I will proceed to delete the local functions and use the unified interface.

@LKFEIYI LKFEIYI deleted the branch deepmodeling:develop January 6, 2026 21:40
@LKFEIYI LKFEIYI closed this Jan 6, 2026
@LKFEIYI LKFEIYI deleted the develop branch January 6, 2026 21:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bugs Bugs that only solvable with sufficient knowledge of DFT Performance Issues related to fail running ABACUS Refactor Refactor ABACUS codes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants