SURF2007
[wiki:SURF2008]
- Incorporating packaged solvers and parallel tools into FiPy
- Model solar cell devices using FiPy
- Increasing Meshing Efficiency in FiPy
- The student will first evaluate the SciPy solvers and, if necessary, incorporate them into FiPy. After completing this relatively straightforward task, which will provide an opportunity to become familiar with FiPy, the student will evaluate other Python-wrapped solver suites, e.g. PETSc and !PyTrilinos. After evaluation, the student will try to incorporate one of these suites for serial use with FiPy. If this is successful, the student will investigate the parallel use of the suite.
- The student will work through modeling a few simple partial differential equation systems in FiPy to gain familiarity with the code framework. The student will familiarize themselves with the governing equations for semiconductor devices, and particularly for solar cells, and will assist with the ongoing development and testing of FiPy codes to represent these equations. The student will use these codes to simulate different solar cell geometries and material properties.
- The student will first familiarize themselves with the way FiPy implements field variables with lazy evaluation and C-inlined binary operations. The student will refactor parts of FiPy so that inlining is pervasive throughout the code, especially the mesh. This will bring both memory and speed gains to FiPy for both structured and unstructured grids. The student will refactor the code so that dynamic mesh updates are possible, laying the groundwork for adaptive meshing. If time is available, the student will evaluate packaged meshing tools that are candidates for adaptive meshing.
- Python-wrapped solver suites, in their current form, are difficult to integrate into FiPy because of the way they handle insertion of elements into a sparse matrix. The available insertion methods take an m-by-n dense matrix, together with a list of m rows and n columns to which this matrix is to be added. This is not a reasonable way of doing an insertion such as "all the elements on the diagonal": it would require either a dense matrix the size of the sparse matrix, or explicit looping (in Python) over the rows of the matrix. Either of these is prohibitively inefficient. The interface we want would receive a list of row indices i, a list of column indices j, and a list of values v, all of the same length, and set the matrix elements (i_k, j_k) to the values v_k. This would allow elements to be inserted in an arbitrary pattern without having to loop in Python.
- The !PyTrilinos developers were contacted regarding this; a feature request was put in, and the new insertion mechanism will be available starting from release 7.0.9 of Trilinos.
- PETSc4py had the same sort of insertion interface issues; although it provided a function for putting values on the diagonal, it did not have a general insertion interface that would work for FiPy.
- To get around this, I have created a wrapper for the Trilinos matrix interface, keeping the same interface as the _SparseMatrix wrapper around pysparse. However, the functionality of this wrapper is slightly more limited than that of the pysparse wrapper, due to some limitations of Trilinos.
- There is no "take(ids1, ids2)" function. Trilinos does not have such a function written in C, and looping in Python is inefficient. However, the takeDiagonal function is still available, since Trilinos provides one.
- Trilinos sparse matrices have the property that after all insertion into them is done, they must have their !FillComplete method called. Before this method is called, the matrices are write-only - it is not possible to read information out of them. After !FillComplete is called, new elements cannot be inserted into the matrix; it is, however, possible to change values of existing non-empty elements. Because of these constraints, some matrix operations will not be guaranteed to work on all matrices. If necessary, there are possible workarounds; however, at the moment, matrix creation works in all of the ways that FiPy uses it.
- !__getitem!__, !__str!__, !__iadd!__, !__add!__, !__mul!__, copy, put, and takeDiagonal may call !FillComplete() on their arguments, and any matrices returned will also have had !FillComplete() called on them. Any function that attempts to write to a matrix may fail if the target matrix has already had !FillComplete() called on it and does not already have values in the target cells. However, those combinations of operations do not seem to occur in FiPy.
- The !TrilinosMatrix has been integrated into FiPy, so that the matrix wrapper (pysparse or !TrilinosMatrix) is chosen automatically based on the solver.
- Next, the interface for using Trilinos solvers needs to be cleaned up, so that the user does not need to specify argument lists.
- Trilinos has been integrated into FiPy, so that the solver suite can be chosen by setting the FIPY_SOLVERS environment variable to 'Trilinos' or 'Pysparse', or passing the --Trilinos or --Pysparse command-line flags to the script. Trilinos release 7.0.9 is required.
- The appropriate solver suite is imported, and solvers can be accessed by name (e.g. LinearLUSolver). They will be imported from whichever solver package is being used. If Trilinos solvers are being used, they can accept an optional preconditioner object as an argument at initialization.
- If necessary, the solverSuite() call can be used to determine from within a script whether Trilinos or Pysparse is being used.
- Trilinos is slower at building matrices, but can be faster at solving some large and difficult problems. In some cases, it allows using iterative solvers instead of LU, decreasing memory requirements significantly.
- NOTE: when debugging, remember that "print myMatrix" has to call !FillComplete() on the matrix in order to print it. Printing a matrix while it is still being built can therefore change the program's behavior, because !FillComplete() gets called before insertion has finished. Calling "print myMatrix._getMatrix()" or "print myMatrix.matrix" avoids this issue, though the output is less readable and exposes the underlying Trilinos matrix.
- Trilinos can solve the matrix equation in parallel, but since the matrix is being built on one processor and then redistributed, there are no memory-use gains from this. For all the examples that I have tried, there are no performance benefits because the overhead of network communication significantly outweighs any gains in calculation speed.
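The triplet-based insertion interface and the !FillComplete constraints described in the list above can be illustrated with a small pure-Python stand-in. This is only a sketch of the semantics, not the FiPy wrapper or the Trilinos API: the class {{{ToySparseMatrix}}} and its method names are hypothetical, and the loop inside {{{put}}} runs in Python here, whereas the point of the real interface is that this loop happens in compiled code.

```python
class ToySparseMatrix:
    """Dictionary-backed stand-in for a sparse matrix wrapper (hypothetical)."""

    def __init__(self, size):
        self.size = size
        self.entries = {}      # maps (row, col) -> value
        self.filled = False    # becomes True once fill_complete() has run

    def put(self, values, rows, cols):
        """Set element (rows[k], cols[k]) to values[k] for every k."""
        if not (len(values) == len(rows) == len(cols)):
            raise ValueError("values, rows and cols must have the same length")
        for value, i, j in zip(values, rows, cols):
            # After finalisation only existing non-empty elements may change.
            if self.filled and (i, j) not in self.entries:
                raise RuntimeError(
                    "cannot insert new element (%d, %d) after fill_complete()" % (i, j)
                )
            self.entries[(i, j)] = value

    def fill_complete(self):
        """Freeze the sparsity pattern; mimics the constraint described above."""
        self.filled = True

    def take_diagonal(self):
        """Reading requires a finalised matrix, so finalise it if needed."""
        if not self.filled:
            self.fill_complete()
        return [self.entries.get((i, i), 0.0) for i in range(self.size)]


n = 4
A = ToySparseMatrix(n)
# "All the elements on the diagonal" in one call, no dense n-by-n block needed.
A.put([2.0] * n, list(range(n)), list(range(n)))
# An arbitrary pattern (here the subdiagonal) works the same way.
A.put([-1.0] * (n - 1), list(range(1, n)), list(range(n - 1)))

diagonal = A.take_diagonal()   # reading finalises the matrix as a side effect
A.put([5.0], [0], [0])         # allowed: element (0, 0) already exists

try:
    A.put([1.0], [0], [3])     # not allowed: (0, 3) was never inserted
    new_insert_failed = False
except RuntimeError:
    new_insert_failed = True
```

The side effect in {{{take_diagonal}}} is the same kind of behavior the debugging note warns about: a read operation silently finalises the matrix, after which new elements can no longer be added.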
Currently, version 7.0.9 of Trilinos, compiled for the Sarge machines, is at . A version of Swig for those machines is at .
Note::
{{{/users/maxsimg/trilinos/swig-1.3.31-installed-2.4.30-seekdir/}}} should be {{{/users/maxsimg/trilinos/swig-1.3.31-installed-2.4.30-seekdir/bin/swig}}}
The call that was used to compile Trilinos was && &&
Note::
When compiling on the 64 bit machines don't use {{{"-malign-double"}}}. Use {{{"-fPIC"}}} as an option with the c++, Fortran and C flags. The configure command will thus be, {{{../configure CXXFLAGS="-O3 -fPIC" CFLAGS="-O3 -fPIC" FFLAGS="-O5 -funroll-all-loops -fPIC" F77="g77" --enable-epetra --enable-aztecoo --enable-pytrilinos --enable-ml --enable-ifpack --enable-amesos --with-gnumake --enable-galeri --cache-file=config.cache --with-swig=$PATH_TO_SWIG_EXECUTABLE --prefix=$LOCAL_INSTALLATION_DIR}}}
The option should not be necessary, since FiPy does not use Galeri, but it is needed since !PyTrilinos doesn't handle its own dependencies as well as it could. The Trilinos developers know of this and expect it to be fixed in Trilinos 8.0.
The Trilinos devs also suggest making a new directory and running from there, instead of running from the directory with the configure script.
Swig must be installed locally, since the version of Swig that is already on these machines is too old and swig 1.3.28 or higher is required. Swig can be downloaded from www.swig.org , and compiles with a ./configure --prefix=$SWIG_LOCAL_INSTALLATION_DIR && make && make install .
The optimization flags are important - they give significant performance improvements.
The --with-gnumake option should only be used on machines with gnumake.
Compiling Trilinos takes a very long time; I don't know exactly how long, but on the Sarge machines it's at least several hours.
The Trilinos python2.3/site-packages directory must be added to the PYTHONPATH, and the trilinos lib directory needs to be added to LD_LIBRARY_PATH, as described in the modifications to the FiPy manual and on the !PyTrilinos documentation page.
To compile Trilinos in parallel mode, add to the configure line. On the Sarge machines, to make this work you also need to use . On the Etch machines, to get MPI to work I had to copy over the directory from a Sarge machine to my own directory (since the Etch machines were, at the time, missing it from the mpi directory), and then use . There's probably a better way to get this to work.
FIPY_SOLVERS must be set to 'Trilinos'. All of the relevant environment variables (FIPY_SOLVERS, PYTHONPATH, LD_LIBRARY_PATH) need to be automatically set at login, such as in the .bashrc. Alternatively, there's some way that arguments can be passed to mpirun to make it export certain variables to the child process, but I haven't been able to get that to work and just used the simpler method of setting them on login.
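Since a missing variable tends to show up only as a confusing failure inside the child processes, a small pre-flight check at the top of a script can help. The following is a minimal sketch, assuming that an exact match on 'Trilinos' is what is wanted; the function name {{{check_parallel_environment}}} is hypothetical and is not part of FiPy.

```python
import os

# The three variables named in the text above.
REQUIRED_VARS = ("FIPY_SOLVERS", "PYTHONPATH", "LD_LIBRARY_PATH")


def check_parallel_environment(environ):
    """Return a list of human-readable problems; the list is empty if none are found."""
    problems = []
    for name in REQUIRED_VARS:
        if not environ.get(name):
            problems.append("%s is not set" % name)
    solver = environ.get("FIPY_SOLVERS")
    # Assumption: the value must be exactly 'Trilinos' for a parallel run.
    if solver and solver != "Trilinos":
        problems.append("FIPY_SOLVERS is %r, expected 'Trilinos'" % solver)
    return problems


if __name__ == "__main__":
    for problem in check_parallel_environment(os.environ):
        print("warning: " + problem)
```

The check only confirms that the variables are visible to the process that runs it; it does not verify that the paths they contain are correct.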
Also, the path to FiPy should be in the PYTHONPATH; having "." in the PYTHONPATH and being in the FiPy directory might not be enough, since the "current directory" does not appear to be stable throughout the process. This could be due to MPI or a bug in Trilinos.
Currently, parallelization gives few memory gains because the matrix is built on one processor anyway, and no speed gain because of communication overhead. However, if you want to run a script in parallel anyway, you should enclose plotting and output in an
{{{if mainProcessor():}}}[[BR]] {{{ # Do stuff here}}}
statement, which will make the output happen only once (on processor 0) instead of on all processors. This isn't an elegant long-term solution, but it works for now. Note that if anything involving defining variables or solving equations is run on only the main processor, this will result in wrong output or a program crash or freeze.
Also, it is not possible to easily run the test suite in parallel. Because mpirun adds extra arguments to the script invocation, "python setup.py test" gets several extra command-line arguments, which then make the option-processing fail and quit. To get around this, one can just make a script that has the relevant tests:
{{{import unittest}}}[[BR]] {{{theSuite = unittest.TestSuite()}}}
{{{import fipy.test}}}[[BR]] {{{theSuite.addTest(fipy.test._suite())}}}
{{{import examples.test}}}[[BR]] {{{theSuite.addTest(examples.test._suite())}}}
{{{testRunner = unittest.TextTestRunner(verbosity=2)}}}[[BR]] {{{result = testRunner.run(theSuite)}}}
and run that under mpirun, though it won't have all of the option-processing of the full test suite.
Also, parallelization has not been tried in combination with --inline; it is possible for problems to arise with the networked filesystem, if different copies of weave are all trying to create their own files.
To run a script with Trilinos in parallel through mpirun, include . The script can then be invoked with .
To make parallelization useful, building the matrix needs to be parallelized as well. To parallelize the matrix creation completely, each processor in FiPy would have to know its starting row and ending row. This would not be difficult, since the range could be generated in trilinosMatrix.!__init!__() based on the total number of processors (comm.!NumProc()) and the current processor's ID (comm.MyPID()). The Map passed to the FECrsMatrix constructor would then be created accordingly.
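The row-range computation mentioned above could look roughly like the following. This is a sketch under the assumption of a contiguous block partition with the remainder spread over the lowest-numbered processors; the function name {{{local_row_range}}} is hypothetical, and in real code the last two arguments would come from comm.!NumProc() and comm.MyPID(). Building the actual Map is not shown.

```python
def local_row_range(num_rows, num_procs, my_pid):
    """Return (start, end) for the rows owned by my_pid, with end exclusive.

    Rows are split into contiguous blocks; if num_rows is not divisible by
    num_procs, the first (num_rows % num_procs) processors get one extra row.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    if not 0 <= my_pid < num_procs:
        raise ValueError("my_pid must satisfy 0 <= my_pid < num_procs")
    base, extra = divmod(num_rows, num_procs)
    start = my_pid * base + min(my_pid, extra)
    end = start + base + (1 if my_pid < extra else 0)
    return start, end


# Example: 10 rows over 3 processors.
ranges = [local_row_range(10, 3, pid) for pid in range(3)]
```

Every row is owned by exactly one processor and the block sizes differ by at most one, which keeps the matrix-building work roughly balanced.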
The more difficult part would be having the rest of FiPy take these as input when necessary and generate only the relevant piece of the matrix and RHS vector. One way of doing this would require changes to the various mesh-accessing functions that do things like return the IDs of adjacent cells, to have them only return the ones appropriate to the current processor. I have not looked at this in enough detail to know whether this is doable or not. Without this, the unnecessary elements would have to be "filtered out" at some point, which would incur performance costs; however, if there is some way of doing this cheaply, that would work as well.