-
Notifications
You must be signed in to change notification settings - Fork 13
/
TODO
178 lines (149 loc) · 7.18 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
base.py
-------
* Re-think the whole implementation of FlexibleGetters. I think all of this can
be done using lazy evaluation. See decorator.lazyprop, which we currently use
in eos.EosFit only.
As of Python 3.8, functools.cached_property() does the same thing as
lazyprop.
parse.py
--------
* Many classes still call the base classes' __init__ explicitly. We get unbound
method errors in interactive Ipython sometimes if we reload modules. We need
to use use super() instead. But that only calls *one* __init__, namely the
one from the major base class::
class C(B,A):
pass
# calls only B.__init__, not A.__init__
c=C()
In some classes, we call __init__'s of two base classes and cannot use
super() then? Is this bad design? Probably, since the MRO (method resolution
order) has clear rules also for multi inheritance. But who understands them?
* Speed: We did some profiling using kernprof.py on Cp2kMDOutputFile, which
uses np.fromstring(backtick(cmd)).reshape(...). We found that the Python
overhead for this command compared to executing all the grep/sed/awk stuff in
`cmd` directly in the shell is small (about 10% top). So, the rest of
slowdown comes from stuff which we do in pure Python with the parsed arrays
(esp. in Trajectory, I guess, which calls crys.*3d() a lot).
* auto_calc kwd in StructureFileParser.get_cont(): make kwd of __init__ instead
and also pass thru from io.read_*() such that the API is more simple and
users can do:
tr = io.read_foo(file, auto_calc=False)
instead of
pp = parse.SomeParser(file)
tr = pp.get_traj(auto_calc=False)
crys.py
-------
* All *3d() funcs use simple loops currently. This is slow. Find vectorized
versions or re-implement in Fortran (flib.f90) / Cython. But use profiling
first! Ipython's %prun or kernprof.py / line profiler.
* Structure / Trajectory: Instead of cryst_const, store abc+angles, apply
length unit to abc and derive cell + cryst_const from abc+angles.
* rpdf(): Use scipy.spatial.distance.pdist() or cdist() or a variant of
_flib.distsq_frac() for 3d array which is *much* faster.
pdist() also returns a "condensed" 1d array w/ all distances. We can feed
that directly to histogram()! This would solve major problems with that
fuction.
* rpdf(): If we re-code the distance part in Fortran, add support for variable
cell trajectories, if the rpdf is defined for that. Makes surely no sense if
the cell changes drastically during an MD and we do the average of the rpdf's
over time? Also, rmax will change in each timestep! We'll then output only up
to the smallest of all rmax values.
* Maybe make crys.Structure / Trajectory do lazy evaluation by default
by setting set_all_auto to False (maybe rename it to "lazy"). No we have:
st = Structure(...)
st.cell # access pre-calculated attr
st.get_cell() # calculate and return or return pre-calculated attr
We want:
st.cell # call self.get_cell() if not calculated etc.
This can be done only if we turn all attrs into lazy evaluated properties.
pydos.py
--------
* pydos is a very stupid name, rename to pdos.py
* let *_pdos() return one 2d array with several columns, like crys.rpdf()
all
---
* Drop verbose.py. Use the logging or warnings module.
* rename all functions foo3d() -> foo_traj(). But not funcs which take 3d arrays
directly. Only those which take a Trajectory as argument.
* remove all old (py27 version) deprecation warnings before adding new ones
examples
--------
* go thru all and check if they still work, make sure we use python3
* Clean up examples/phonon_dos/pdos_methods.py
bin
---
* go thru all and check if they still work, make sure we use python3
constants.py / units in general
-------------------------------
* Maybe change pressure (GPa currently) and time step (fs = 1e-15 sec) units to
ASE values?? Would probably break a lot of eval scripts. Maybe to ease the
transition, first introduce apu (atomic pressure unit = GPa, later ASE value
= eV / Ang**3), atu (atomic time unit = fs, later ASE value = ???) in
constants.py and use that everywhere instead of assuming GPa and fs.
Set apu_to_GPa =1.0 for now etc...
* The units implementation is a mess. Now, we have UnitsHandler ->
StructureFileParser -> parse.*OutputFile and default_units in each
*OutputFile. This must be simplified.
num.py
------
* Similar to the Fit1D base class, we could also define a FitND base class
which defines get_min() for the ND-case. Right now we have the same get_min()
implementation in PolyFit and Interpol2D. We could derive them from FitND.
* Interpol2D is actually InterpolND if we remove the 'bispl' method. So add
InterpolND and derive Interpol2D from that and add bispl only there. Uuuuhh :)
eos.py
------
* EosFit: Test data scaling when using it in thermo.Gibbs as fitfunc (as in
num.PolyFit).
mpl.py
------
* Farm this out to the mplextra package. We already placed the set_plot_*
functions there.
tests
-----
* move/merge test/tools.py with utils/, maybe place in utils/__init__.py ??
problem: utils/ is more a collection of scripts and data, not so much a
sub-package of pwtools.test
* make all paths in all tests absolute (well, relative to __file__) such that
tests can be started from anywhere, not just pwtools/test/
* tools.py: make skip functions decorators instead of a function that we call
which then raises nose.plugins.skip.SkipTest, reassess why we use
nose.plugins.skip.SkipTest instead of the unittest.skip() decorators in the
first place
2to3
----
* check all replacements
range() -> list(range())
.*.keys() -> list(.*.keys())
map() -> list(map())
zip() -> list(zip())
print("foo %s" %x) -> print(("foo %s" %x))
extenions
---------
* dist, extensions -- wheel?
* build extensions and link statically for that to deal w/ dep. on liblapack
etc which is outside of pip's scope
examples/benchmarks/distmat_speed.py
------------------------------------
* also compare against jax-md space.py
data scaling and derivatives
----------------------------
Evaluate the possibility to use sklearn data scalers for num.PolyFit and
rbf.Rbf. In PolyFit we have a working [0,1] scaler that *could* be replaced. In
Rbf we don't have any scaler and there we could benefit from that. In Rbf we do
however scale the trained weights internally, which is similar to PolyFit's
scale_vand, so no data scaling but better than nothing.
Without derivatives, we can just create user-facing wrapper APIs for Rbf and
PolyFit that scale and re-scale using sklearn scalers, no problem.
BUT: We get into trouble with derivatives in connection to scaling. In this
case I think we need to implement the scaler as part of the regression model
because we need to diff thru the scaler in all cases. In PolyFit1D we coded that
manually but support it only for the 1D case.
So, when we want to use jax to do diffs, we would need to re-implement the
sklearn scalers here and use jax.numpy -> np else jax' tracing will fall over
backwards.
PolyFit: calculating derivatives of polys using autodiff is usually overkill
since the implementation of analytic derivs is trivial (I hope! :). BUT: when
we need to diff thru the data scaler as well, it might be worth instead of
working all that out by hand.
# vim:comments=fb\:*