Problem in compiling gap_fit MPI with meson #716

@lormio

I am encountering the same (or at least similar) issue with the gap_fit executable compiled with meson, with MPI turned on:

SYSTEM ABORT: proc=0 Traceback (most recent call last)
File "../src/libAtoms/linearalgebra.F90", line 2348 kind unspecified
LA_Matrix_Factorise: cannot factorise, error: 8

However, when building gap_fit using QUIP_ARCH=linux_x86_64_gfortran_openmpi and make config (using -lopenblas -lscalapack and no extra link options), the produced gap_fit does not have any problem running the fit to the end, using the same training file and config_file.

I am compiling QUIP inside a conda environment, with openblas, scalapack and the compilers all installed from conda itself.

Originally posted by @lormio in #715

To the issue quoted above, I add the list of packages installed in the conda environment and some further diagnostics.

conda packages: scalapack, openblas, gxx, gcc, gfortran, openmpi, meson, ninja.

The issue seems to be the following: the old build system finds and links ScaLAPACK correctly, while Meson does not.
OLD BUILD
$ ldd build/linux_x86_64_gfortran_openmpi/gap_fit | grep scalapack
libscalapack.so => /home/miolalor/.conda/envs/quip_comp/lib/libscalapack.so (0x0000742212600000)
MESON
$ ldd builddir/src/Programs/gap_fit | grep scalapack
(no output)

Moreover, looking at the ldd output, I noticed that the linked MPI libraries also differ: the old build links two extra ones, libmpi_usempif08 and libmpi_usempi_ignore_tkr (I don't know whether this is related, but I don't see why it should differ):
OLD BUILD
$ ldd build/linux_x86_64_gfortran_openmpi/gap_fit | grep mpi
libmpi_usempif08.so.40 => /home/miolalor/.conda/envs/quip_comp/lib/libmpi_usempif08.so.40 (0x00007747a44ad000)
libmpi_usempi_ignore_tkr.so.40 => /home/miolalor/.conda/envs/quip_comp/lib/libmpi_usempi_ignore_tkr.so.40 (0x00007747a4499000)
libmpi_mpifh.so.40 => /home/miolalor/.conda/envs/quip_comp/lib/libmpi_mpifh.so.40 (0x00007747a4422000)
libmpi.so.40 => /home/miolalor/.conda/envs/quip_comp/lib/libmpi.so.40 (0x0000774799400000)
MESON
$ ldd builddir/src/Programs/gap_fit | grep mpi
libmpi_mpifh.so.40 => /home/miolalor/.conda/envs/quip_comp/lib/libmpi_mpifh.so.40 (0x00007f57742b9000)
libmpi.so.40 => /home/miolalor/.conda/envs/quip_comp/lib/libmpi.so.40 (0x00007f5769e00000)
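Rather than grepping for one library at a time, the two binaries can be compared in a single step. The following is only a sketch: the helper names lib_names and missing_libs are made up for illustration, and it assumes a POSIX shell with ldd, awk, sort, comm and mktemp available.

```shell
# Read `ldd` output on stdin and print the library names
# (first column), sorted and de-duplicated.
lib_names() {
    awk 'NF {print $1}' | sort -u
}

# Print the libraries linked by the first executable ($1)
# that are not linked by the second one ($2).
# (Hypothetical helper, not part of QUIP or Meson.)
missing_libs() {
    a=$(mktemp)
    b=$(mktemp)
    ldd "$1" | lib_names > "$a"
    ldd "$2" | lib_names > "$b"
    comm -23 "$a" "$b"   # lines only in the first list
    rm -f "$a" "$b"
}

# Usage, with the paths from this report:
# missing_libs build/linux_x86_64_gfortran_openmpi/gap_fit builddir/src/Programs/gap_fit
```

Run on the two builds above, this should list at least libscalapack.so and the two extra MPI Fortran libraries, together with anything else that only the old build links.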

However, I do not understand why Meson fails to link the ScaLAPACK library, given that with the old build system I only specify -lscalapack, without having to give any -I or -L paths.

When launched without mpiexec, the executable produced with Meson runs until it hits the error reported in the previous issue; when run in parallel, it complains about undefined symbols (blacs_gridinit_). The gap_fit from the old build system, by contrast, works without issue, with or without multiple MPI processes.
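To see which BLACS/ScaLAPACK symbols the Meson binary still expects from a shared library, one can filter its dynamic symbol table. This is a sketch only: undefined_matching is a made-up helper name, and it assumes binutils nm is available and prints undefined symbols as a "U" column followed by the symbol name.

```shell
# Read `nm -D` output on stdin and print the undefined ("U") symbols
# whose name contains the substring given as $1.
# (Hypothetical helper, not part of QUIP or binutils.)
undefined_matching() {
    awk -v pat="$1" '$1 == "U" && index($2, pat) {print $2}'
}

# Usage, with the path from this report:
# nm -D builddir/src/Programs/gap_fit | undefined_matching blacs_
```

If blacs_gridinit_ appears in that list while ldd shows no libscalapack.so, the binary references the routine but no linked library provides it, which would be consistent with the missing ScaLAPACK link described above.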
