Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Libcint #113

Merged
merged 29 commits into from
Oct 13, 2023
Merged

Libcint #113

merged 29 commits into from
Oct 13, 2023

Conversation

AlexanderMath
Copy link
Contributor

@AlexanderMath AlexanderMath commented Oct 1, 2023

Got all integrals (except grad of kinetic) passing test case. I'd like to get this on main branch so we can start

  1. discuss how to interface nanoDFT with different integral implementations (THO, libcint, popcint, ...)
  2. discuss how to refactor nanoDFT for molecular dynamics (some subtleties wrt state and different combinations of arguments which may make sense)
Done compiling. Calling C code from python. 
[N=2]

[Nuclear Integral]
CPU:     2.763163926555734e-07
Compiling module jit_ipu_intor1e.0:
[##################################################] 100% Compilation Finished [Elapsed: 00:00:14.1]
IPU:     2.763163926555734e-07

[Kinetic Integral]
CPU:     -1.8022852765753328e-08
Compiling module jit_ipu_intor1e.1:
[##################################################] 100% Compilation Finished [Elapsed: 00:00:13.0]
IPU:     -1.05722721688295e-08

[Overlap Integral]
CPU:     -1.2445099606406274e-07
Compiling module jit_ipu_intor1e.2:
[##################################################] 100% Compilation Finished [Elapsed: 00:00:13.0]
IPU:     -6.484635128867211e-08

[Grad Nuclear]
CPU:     7.246001532124069e-08
Compiling module jit_ipu_intor1e.3:
[##################################################] 100% Compilation Finished [Elapsed: 00:00:13.1]
IPU:     7.246001532124069e-08

[Grad Kinetic]
CPU:     2.983345692708639e-08
Compiling module jit_ipu_intor1e.4:
[##################################################] 100% Compilation Finished [Elapsed: 00:00:13.3]
IPU:     2.983345692708639e-08

[Grad Overlap]
CPU:     6.077975783780332e-08
Compiling module jit_ipu_intor1e.5:
[##################################################] 100% Compilation Finished [Elapsed: 00:00:12.9]
IPU:     6.077975783780332e-08

[Electron Repulsion Integral]
CPU:     -4.443460513425812e-08
Compiling module jit_ipu_getints4c.6:
[##################################################] 100% Compilation Finished [Elapsed: 00:00:12.9]
IPU:     -2.953344391265489e-08

[Grad of Electron Repulsion Integral]
CPU:     1.341920186359591e-07
Compiling module jit_ipu_getints4c.7:
[##################################################] 100% Compilation Finished [Elapsed: 00:00:12.8]
IPU:     1.1929085744211143e-07

pyscf_ipu/electron_repulsion/direct.sh Outdated Show resolved Hide resolved
pyscf_ipu/electron_repulsion/popcint/libcint.cpp Outdated Show resolved Hide resolved
Comment on lines 108 to 117
Input<Vector<float>> mat;
Input<Vector<int>> shls_slice;
Input<Vector<int>> ao_loc;
Input<Vector<int>> atm;
Input<Vector<int>> bas;
Input<Vector<float>> env;
Input<Vector<int>> natm;
Input<Vector<int>> nbas;
Input<Vector<int>> which_integral;
Input<Vector<int>> comp;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly sizes here

cintopt = None

# type
float32 = "#define dtype float" in open("libcint.c", "r").read()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is not going to be called a lot? It will be expensive, and after ~1000 calls will fail due to running out of file handles.

Suggest a @memoized function called e.g. libcint_is_float32(), which closes the file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or just compute it up at line 19 where you are already interacting with the filesyste,

Copy link
Contributor Author

@AlexanderMath AlexanderMath Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this is not going to be called a lot?

Once per integral type during tracing (i.e. at most 8 times). I'd think of this as increasing compile time from 2min to 2min 100ms. We can refactor to do once and make it 2min and 10ms.

from tessellate_ipu import create_ipu_tile_primitive, ipu_cycle_count, tile_map, tile_put_sharded, tile_put_replicated
vertex_filename = osp.join(osp.dirname(__file__), "libcint.cpp")
#mat, shls_slice, ao_loc, atm, bas, env
grad = create_ipu_tile_primitive(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, probably want to do this only once?


cintopt = None
prescreen = lib.c_null_ptr()
ic(intor_name, 'GTOnr2e_fill_'+aosym, "GTOnr2e_fill_drv")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ic = icrecream? Probably remove?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Comment on lines 345 to 349
# type
float32 = "#define dtype float" in open("libcint.c", "r").read()
if float32:
out = out.astype(np.float32)
env = env.astype(np.float32)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of duplicated code here - and a lot of it is slow - so it feels as if there might be an opportunity to dedup.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a lot of duplicated code here - and a lot of it is slow - so it feels as if there might be an opportunity to dedup.

Sure, let's make those refactors. The heavy compute part is the 20k lines of C. Most of the above is the easy 1% optimizations I was hoping to off-load.

Co-authored-by: Andrew Fitzgibbon <awf@graphcore.ai>
@AlexanderMath AlexanderMath merged commit c21b92b into main Oct 13, 2023
4 checks passed
@AlexanderMath AlexanderMath deleted the libcint branch October 13, 2023 09:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants