[Users] Simfactory2, OpenMP, and hwloc
David Gore
8 years ago
Hello,

I am having difficulty compiling simfactory with OpenMP on RHEL 6.5 and
RHEL 6.8 machines.

The compilation goes well if I set "CCTK_OPENMP_MODE = no": I get a working
version of simfactory and can run the static_tov example. HOWEVER, it will
only run on one core.


When I set ppn = 8, max-num-threads=8, num-threads=8, and nodes=1 I get the
following errors:

$ ./simfactory/bin/sim submit static_tov1 --parfile=par/static_tov.par

Warning: Too many threads per process specified: specified num-threads=8
(ppn-used is 8)
Warning: Total number of threads and number of threads per process are
inconsistent: procs=1, num-threads=8 (procs*num-smt must be an integer
multiple of num-threads)
Warning: Total number of threads and number of cores per node are
inconsistent: procs=1, ppn-used=8 (procs must be an integer multiple of
ppn-used)

$ ./simfactory/bin/sim submit static_tov1 --parfile=par/static_tov.par
--procs=8

WARNING[L1,P0] (Carpet): Although OpenMP is disabled, the environment
variable OMP_NUM_THREADS is set to 8. It will be ignored.
WARNING level 0 from host jlabdaq.pcs.cnu.edu process 0
while executing schedule bin (none), routine (no thorn)::(no routine)
in thorn Carpet, file /home/dgore/Cactus/Cactus/configs/sim/build/Carpet/
SetupGH.cc:222:
-> Although OpenMP is disabled, the environment variable
CACTUS_NUM_THREADS is set to 8. This may indicate a severe problem with the
Cactus startup mechanism.
cactus_sim: /home/dgore/Cactus/Cactus/configs/sim/build/Carpet/helpers.cc:275:
int Carpet::Abort(const cGH*, int): Assertion `0' failed.
Rank 0 with PID 32440 received signal 6
Writing backtrace to static_tov/backtrace.0.txt
/home/dgore/simulations/static_tov2/output-0000/SIMFACTORY/RunScript: line
26: 32440 Aborted (core dumped)

$ ./simfactory/bin/sim submit static_tov1 --parfile=par/static_tov.par
--procs=1

-- Warning: Too many threads per process specified: specified num-threads=8
(ppn-used is 8)
Warning: Total number of threads and number of threads per process are
inconsistent: procs=1, num-threads=8 (procs*num-smt must be an integer
multiple of num-threads)
Warning: Total number of threads and number of cores per node are
inconsistent: procs=1, ppn-used=8 (procs must be an integer multiple of
ppn-used)

When I set all the .ini variables to 1,

$ ./simfactory/bin/sim submit static_tov5 --parfile=par/static_tov.par

This works and runs properly on a SINGLE core.

When I try to use OpenMP by setting "CCTK_OPENMP_MODE = yes", hwloc refuses
to compile. The error is that the "C compiler does not generate
executables" (or something similar).

If anyone can shed any light on this, I'd be very appreciative.
--
David Gore, Ph.D., Lecturer in Physics
Department of Physics, Computer Science and Engineering
Christopher Newport University
Office: 309 Luter Hall
Voice: 757 594 7827
David Gore
8 years ago
(I don't know if people prefer bottom-posting or top-posting, so I'm just
going with GMail's default)

As usually happens, 10 minutes (ok, an hour) after I post a problem, I find
a solution. The problem was that, in

Cactus/configs/sim/config-data/make.config.defn, the following code block
existed:

# OpenMP flags
export CPP_OPENMP_FLAGS = -openmp <----This was the problem
export FPP_OPENMP_FLAGS = -fopenmp
export C_OPENMP_FLAGS = -fopenmp
export CXX_OPENMP_FLAGS = -fopenmp
export CUCC_OPENMP_FLAGS =
export F90_OPENMP_FLAGS = -fopenmp
export F77_OPENMP_FLAGS = -fopenmp

I removed "-openmp" from the CPP_OPENMP_FLAGS variable and now hwloc
compiles.
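
For anyone hitting the same thing, here is a minimal sketch of a check for leftover -openmp values, assuming the "export NAME = value" layout shown above. The function name suspicious_openmp_flags is made up for this example, and the sample text is abbreviated from the block above.

```python
import re

# Matches lines such as "export CPP_OPENMP_FLAGS = -openmp" ("export" is optional).
FLAG_LINE = re.compile(r"^\s*(?:export\s+)?(\w+_OPENMP_FLAGS)\s*=\s*(.*?)\s*$")


def suspicious_openmp_flags(text):
    """Return the names of *_OPENMP_FLAGS variables whose value contains -openmp."""
    hits = []
    for line in text.splitlines():
        match = FLAG_LINE.match(line)
        # Compare whole words so that -fopenmp is not flagged.
        if match and "-openmp" in match.group(2).split():
            hits.append(match.group(1))
    return hits


sample = """\
# OpenMP flags
export CPP_OPENMP_FLAGS = -openmp
export FPP_OPENMP_FLAGS = -fopenmp
export C_OPENMP_FLAGS = -fopenmp
export CUCC_OPENMP_FLAGS =
"""
print(suspicious_openmp_flags(sample))  # ['CPP_OPENMP_FLAGS']
```

The check only reports the variable names; whether -openmp is actually wrong depends on which compiler the build uses.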

Much thanks to @bgoglin who was sitting on irc.freenode.net/#hwloc and
caught the error for me.
(And I thought IRC died in 2005 or so :-)

.... took about 40 minutes to compile. But I am now running static_tov
and... I'm at 740% cpu usage (Irix mode)

Another hurdle surmounted.
--
David Gore, Ph.D., Lecturer in Physics
Department of Physics, Computer Science and Engineering
Christopher Newport University
Office: 309 Luter Hall
Voice: 757 594 7827
Roland Haas
8 years ago
Hello David,

any style of quoting is fine.

If you are using the GNU compiler (gcc) then the correct OpenMP flag is
-fopenmp and not -openmp (gcc reads the latter as "-o penmp", so it creates
an output file named penmp). -openmp was used by older Intel compilers (icc).
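
A small sketch for reproducing this outside the Cactus build, assuming a compiler named gcc is on the PATH. The helper name files_after_compile and the file name conftest.c are chosen for this example only; nothing here is part of Cactus or hwloc.

```python
import pathlib
import shutil
import subprocess
import tempfile


def files_after_compile(flag, compiler="gcc"):
    """Compile a trivial C program with `flag` and return the file names left behind.

    Returns None when the compiler is not installed.
    """
    if shutil.which(compiler) is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        source = pathlib.Path(tmp) / "conftest.c"
        source.write_text("int main(void) { return 0; }\n")
        # check=False: we only want to see which files appear, not fail on errors.
        subprocess.run([compiler, flag, "conftest.c"], cwd=tmp,
                       capture_output=True, check=False)
        return sorted(p.name for p in pathlib.Path(tmp).iterdir())


if __name__ == "__main__":
    print(files_after_compile("-openmp"))
    print(files_after_compile("-fopenmp"))
```

With GNU gcc the first listing should contain penmp instead of a.out, which is why a configure script looking for the default output name would conclude that the compiler cannot create executables. Other compilers may treat the flag differently.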

Yours,
Roland
--
My email is as private as my paper mail. I therefore support encrypting
and signing email messages. Get my PGP key from http://pgp.mit.edu .
Ian Hinder
8 years ago
Hi,

Good to hear it works, and thanks for posting the solution for others to see!

The file Cactus/simfactory/mdb/optionlists/centos.cfg contains a configuration for CentOS 7, which is probably the closest to RHEL 6. Indeed, it has

OPENMP = yes
CPP_OPENMP_FLAGS = -fopenmp
FPP_OPENMP_FLAGS = -D_OPENMP
C_OPENMP_FLAGS = -fopenmp
CXX_OPENMP_FLAGS = -fopenmp
F77_OPENMP_FLAGS = -fopenmp
F90_OPENMP_FLAGS = -fopenmp

because it is using gcc.

In case you didn't catch it, when the Cactus configure stage says that the compiler cannot create executables, what it really means is that there was an error running the compiler during autoconf, and the log of the error can be found in

configs/sim/config-data/config.log

I don't remember whether it tells you this or not.
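
As a rough sketch of how to search those logs, the snippet below walks a configuration directory and prints every config.log line that mentions a given string. It assumes that bundled libraries keep their own config.log files somewhere below configs/sim; the function name grep_config_logs is made up for this example.

```python
from pathlib import Path


def grep_config_logs(config_dir, needle):
    """Yield (log path, line number, line) for each config.log line containing needle."""
    for log in sorted(Path(config_dir).rglob("config.log")):
        with log.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if needle in line:
                    yield log, number, line.rstrip("\n")


if __name__ == "__main__":
    # "configs/sim" is the configuration directory used in this thread.
    for log, number, line in grep_config_logs("configs/sim", "openmp"):
        print(f"{log}:{number}: {line}")
```

Searching for the flag itself rather than for the word "error" can help when, as here, the compiler runs without reporting a failure.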

An alternative to OpenMP is to use MPI for parallelisation, even on a single host. For that, I think you should be able to use

ppn = 8
max-num-threads=1
num-threads=1
nodes=1

The meaning of these variables is described at

http://simfactory.org/info/documentation/userguide/processterminology.html

You should then be able to run using --procs 8, and parallelisation will happen using MPI. "num-threads" is really "num_openmp_threads_per_process", so setting it to 1 means "don't use multiple threads per process", i.e. parallelise by creating more processes, not more threads.
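
The arithmetic behind these settings can be sketched as follows. The two rules are copied from the simfactory warnings quoted earlier in the thread; the function name check_layout and the way the MPI process count is derived are assumptions made for illustration, not simfactory's actual code.

```python
def check_layout(procs, num_threads, ppn_used, num_smt=1):
    """Return (mpi_processes, problems) for a requested layout.

    procs is the total thread count given with --procs; num_threads is the
    number of OpenMP threads per MPI process.
    """
    problems = []
    if (procs * num_smt) % num_threads != 0:
        problems.append("procs*num-smt must be an integer multiple of num-threads")
    if procs % ppn_used != 0:
        problems.append("procs must be an integer multiple of ppn-used")
    # Assumption: the MPI process count is the quotient from the first rule.
    mpi_processes = (procs * num_smt) // num_threads
    return mpi_processes, problems


print(check_layout(procs=8, num_threads=1, ppn_used=8))  # (8, []): 8 MPI processes, 1 thread each
print(check_layout(procs=8, num_threads=8, ppn_used=8))  # (1, []): 1 process with 8 OpenMP threads
print(check_layout(procs=1, num_threads=8, ppn_used=8)[1])  # both rules are violated
```

The last call mirrors the submission without --procs, where procs=1 and num-threads=8 triggered both warnings.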

Note that static_tov has quite a small grid, so parallelisation efficiency is probably not very good.
--
Ian Hinder
http://members.aei.mpg.de/ianhin
David Gore
8 years ago
Hi, Ian.

Given the amount of times I've found answers to problems because someone
posts their fix, I thought it would be criminal not to do the same.

I have a few questions regarding the difference between configurations:
(1) You're using simfactory/mdb/optionlists and I'm using
configs/sim/config-data/make.config.defn. Is there a reason to choose one
over the other?
(2) Can I assume that OPENMP=yes is the same as CCTK_OPENMP_MODE=yes ?
(3) Is it worth while to re-compile with "CPP_OPENMP_FLAGS = -fopenmp" and
"FPP_OPENMP_FLAGS = -D_OPENMP"?

Neither of the config.logs under configs/sim/config-data or under
configs/sim/scratch/hwloc/hwloc-1.10.1 had anything useful except for the
compile line. But this is because, when "-openmp" was passed to gcc, it
made an executable named "penmp" which the configure script wasn't looking
for. So there was no actual compilation error---it just couldn't find the
resulting binary. That made this harder to diagnose.

I am still quite the newbie when it comes to parallel-processing
nomenclature and that webpage is written for someone with more CS
background than I have. I don't know if this is expected from the cactus
user base so I don't know if it warrants any changes to the simfactory
documentation. My (hopefully short-lived) ignorance (detailed here for the
world to see) tells me that node := machine. Should the correct
association be node ~= motherboard and machine ~= cluster?

It seems to me that faster code would occupy more cores (one process per
core), not more threads/process. But it looks like I'll be wandering down
to our CS department to clean up "process," "thread," and "hyperthread" :-)

Much thanks for your help,
--
David Gore, Ph.D., Lecturer in Physics
Department of Physics, Computer Science and Engineering
Christopher Newport University
Office: 309 Luter Hall
Voice: 757 594 7827
Ian Hinder
8 years ago
Post by David Gore
Hi, Ian.
Given the amount of times I've found answers to problems because someone posts their fix, I thought it would be criminal not to do the same.
Indeed.
Post by David Gore
(1) You're using simfactory/mdb/optionlists and I'm using configs/sim/config-data/make.config.defn. Is there a reason to choose one over the other?
When you build Cactus, it does an "out of tree build" in configs/<configname>, where the config name defaults to "sim" if you are building with simfactory. You can build multiple configurations with different build settings by giving a name for the configuration in the sim build command (sim build <configname>).

SimFactory comes with a set of optionlists in simfactory/mdb/optionlists, mostly for different known clusters on which Cactus is commonly used, but also for different common operating systems. When you build with "sim build", if you are on a known machine, simfactory automatically selects an optionlist appropriate for that machine. Otherwise, you can specify an optionlist with --optionlist <name>, where the name should be an optionlist in simfactory/mdb/optionlists. You may also be able to pass a path to an optionlist; I'm not sure.

This optionlist is then copied into configs/<configname>/OptionList, and Cactus starts to build. As part of the Cactus build process, the file make.config.defn is generated containing various settings based on what was in the optionlist. I consider this to be an auto-generated file, and not one that should be edited by hand. You should be able to achieve everything you need by editing the original optionlist.

One gotcha is that if you do modify the optionlist in simfactory/mdb/optionlists, the next Cactus build will NOT use this updated file unless you explicitly pass the --optionlist <optionlist> argument to sim build.
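
To make the gotcha concrete, here is a small sketch that assembles a "sim build" command line with the optionlist passed explicitly. The helper sim_build_command is hypothetical; it only uses options that appear in this thread (--clean, --thornlist, --optionlist), with a thornlist path and the centos.cfg optionlist name taken from elsewhere in the discussion.

```python
import shlex


def sim_build_command(thornlist, optionlist=None, config=None, clean=False):
    """Assemble an argument list for "sim build" (it is not executed here)."""
    argv = ["./simfactory/bin/sim", "build"]
    if config:
        argv.append(config)  # configuration name; simfactory defaults to "sim"
    if clean:
        argv.append("--clean")
    argv.append(f"--thornlist={thornlist}")
    if optionlist:
        # Passing this explicitly is what makes an edited optionlist take effect.
        argv.append(f"--optionlist={optionlist}")
    return argv


command = sim_build_command("manifest/einsteintoolkit.th", optionlist="centos.cfg")
print(shlex.join(command))
```

The printed line can be pasted into a shell from the Cactus directory; leaving out optionlist reproduces the case where the previously copied OptionList is reused.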

You didn't actually say in your email how you are building Cactus. Are you building using simfactory, i.e. "sim build", or with the Cactus make system (which is what simfactory calls), i.e. "make sim-config"? SimFactory provides a layer on top of the Cactus make system, essentially providing a database of machines, and a mapping from machine to optionlist.

What instructions are you following for building?
Post by David Gore
(2) Can I assume that OPENMP=yes is the same as CCTK_OPENMP_MODE=yes ?
OPENMP is an option you can set in the optionlist, whereas CCTK_OPENMP_MODE is an internal build system variable. It is set from code in lib/make/configure and make.config.defn.in based on OPENMP from the optionlist.
Post by David Gore
(3) Is it worth while to re-compile with "CPP_OPENMP_FLAGS = -fopenmp" and "FPP_OPENMP_FLAGS = -D_OPENMP"?
I'm not sure. I think it can't hurt, and it is how the other machines that use gcc are set up.
Post by David Gore
Neither of the config.logs under configs/sim/config-data or under configs/sim/scratch/hwloc/hwloc-1.10.1 had anything useful except for the compile line. But this is because, when "-openmp" was passed to gcc, it made an executable named "penmp" which the configure script wasn't looking for. So there was no actual compilation error---it just couldn't find the resulting binary. That made this harder to diagnose.
Interesting. The error message surely must have gone somewhere though! It should have complained that it couldn't find a binary of the given name.
Post by David Gore
I am still quite the newbie when it comes to parallel-processing nomenclature and that webpage is written for someone with more CS background than I have. I don't know if this is expected from the cactus user base so I don't know if it warrants any changes to the simfactory documentation. My (hopefully short-lived) ignorance (detailed here for the world to see) tells me that node := machine. Should the correct association be node ~= motherboard and machine ~= cluster?
Aha - I was the one who wrote that page, so this is my fault. What is meant here is that machine ~= cluster, and node ~= compute node, which would indeed have a 1:1 mapping with a motherboard, and the motherboard may contain several CPUs (corresponding to "sockets"), and each CPU will contain multiple cores.
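
The hierarchy can be written down as a tiny data model. This is only an illustration of the terminology: the class names and all the socket and core counts below are hypothetical, not a description of any real machine in this thread.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One compute node: a motherboard with some sockets (CPUs)."""
    sockets: int
    cores_per_socket: int

    @property
    def cores(self):
        return self.sockets * self.cores_per_socket


@dataclass
class Machine:
    """A machine in simfactory's sense: one or more nodes."""
    name: str
    nodes: list

    @property
    def cores(self):
        return sum(node.cores for node in self.nodes)


# A single workstation is a machine with exactly one node.
workstation = Machine("workstation", [Node(sockets=2, cores_per_socket=4)])
# A cluster is a machine with many nodes.
cluster = Machine("cluster", [Node(sockets=2, cores_per_socket=8) for _ in range(16)])
print(workstation.cores, cluster.cores)  # 8 256
```

In this picture "nodes=1" together with "ppn = 8" describes a machine like the workstation entry: one node offering eight cores.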
Post by David Gore
It seems to me that faster code would occupy more cores (one process per core), not more threads/process. But it looks like I'll be wandering down to our CS department to clean up "process," "thread," and "hyperthread" :-)
That's going a bit far :) You could also try:

https://en.wikipedia.org/wiki/Process_(computing)
https://en.wikipedia.org/wiki/Thread_(computing)

I wouldn't worry about hyperthread for now.
--
Ian Hinder
http://members.aei.mpg.de/ianhin
David Gore
8 years ago
...
This is good to know, but I assume that simply copying the make.config.defn
file to mymachine.mydomain.edu in /mdb/optionlists will probably not work.
Is there a preferred way to know how to populate the optionlist?
Post by Ian Hinder
One gotcha is that if you do modify the optionlist in simfactory/mdb/optionlists, the next Cactus build will NOT use this updated file unless you explicitly pass the --optionlist <optionlist> argument to sim build.
You didn't actually say in your email how you are building Cactus. Are you building using simfactory, i.e. "sim build", or with the Cactus make system (which is what simfactory calls), i.e. "make sim-config"?
What instructions are you following for building?
I am compiling via ./simfactory/bin/sim build [--clean]
--thornlist=manifest/einsteintoolkit.th
But you say that even if I create a mymachine.mydomain.edu optionlist, when
I recompile, I will need to add "--optionlist=mymachine.mydomain.edu" to
that build line?

To be fair, I *was* kinda hoping that the build would be similar to
WaveToyDemo. But when I first started by building on my debian laptop, I
followed the instructions I found here:
https://docs.einsteintoolkit.org/et-docs/Simplified_Tutorial_for_New_Users

So I stuck with the altered build method.
...
It simply said that the compiler doesn't produce executables. I'm guessing
it was looking for a.out.
Post by Ian Hinder
Aha - I was the one who wrote that page, so this is my fault. What is meant here is that machine ~= cluster, and node ~= compute node, which would indeed have a 1:1 mapping with a motherboard, and the motherboard may contain several CPUs (corresponding to "sockets"), and each CPU will contain multiple cores.
Ok, so for a *single* computer, machine = node. I don't have such grand
designs as running on a cluster. Yet. :-)
Post by David Gore
It seems to me that faster code would occupy more cores (one process per core), not more threads/process. But it looks like I'll be wandering down to our CS department to clean up "process," "thread," and "hyperthread" :-)
Post by Ian Hinder
https://en.wikipedia.org/wiki/Process_(computing)
https://en.wikipedia.org/wiki/Thread_(computing)
Too late. I almost didn't get to my class on time after poking the bears
in our CS department. :-) I'm a little more educated, but it still seems
that---although there are caveats---one process per core is usually the
faster option.
Post by Ian Hinder
I wouldn't worry about hyperthread for now.
Vielen Dank. ;-)
--
David Gore, Ph.D., Lecturer in Physics
Department of Physics, Computer Science and Engineering
Christopher Newport University
Office: 309 Luter Hall
Voice: 757 594 7827