Using pip install dask-mpi
$ pip install dask-mpi
$ mpirun -np 2 dask-mpi --name=test-worker --nthreads=1 --memory-limit=0 --scheduler-file=test.json
distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO - Scheduler at: tcp://172.16.66.109:36539
distributed.scheduler - INFO - dashboard at: :8787
distributed.nanny - INFO - Start Nanny at: 'tcp://172.16.66.109:36297'
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
getting local rank failed
--> Returned value No permission (-17) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):
orte_ess_init failed
--> Returned value No permission (-17) instead of ORTE_SUCCESS
--------------------------------------------------------------------------
--------------------------------------------------------------------------
It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here's some
additional information (which may only be relevant to an Open MPI
developer):
ompi_mpi_init: ompi_rte_init failed
--> Returned "No permission" (-17) instead of "Success" (0)
--------------------------------------------------------------------------
*** An error occurred in MPI_Init_thread
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
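For reference, a pip wheel install generates a plain launcher script that imports the entry-point function directly. A rough sketch of what pip typically writes to bin/dask-mpi is shown below; the shebang path and the entry-point name (dask_mpi.cli:go) are assumptions here, not copied from my installation:

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Sketch of a pip/wheel-generated console script (assumed entry point: dask_mpi.cli:go).
import re
import sys

from dask_mpi.cli import go

if __name__ == "__main__":
    # pip's script template strips a trailing "-script.pyw"/".exe" from argv[0]
    sys.argv[0] = re.sub(r"(-script\.pyw|\.exe)?$", "", sys.argv[0])
    sys.exit(go())
```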
Using python setup.py install
$ python setup.py install
$ mpirun -np 2 dask-mpi --name=test-worker --nthreads=1 --memory-limit=0 --scheduler-file=test.json
distributed.http.proxy - INFO - To route to workers diagnostics web server please install jupyter-server-proxy: python -m pip install jupyter-server-proxy
distributed.scheduler - INFO - Clear task state
distributed.scheduler - INFO - Scheduler at: tcp://172.16.66.109:44933
distributed.scheduler - INFO - dashboard at: :8787
distributed.nanny - INFO - Start Nanny at: 'tcp://172.16.66.109:42437'
distributed.diskutils - INFO - Found stale lock file and directory '/home/mpaipuri/downloads/dask-mpi/dask-worker-space/worker-6h2hf4i6', purging
distributed.worker - INFO - Start worker at: tcp://172.16.66.109:37893
distributed.worker - INFO - Listening to: tcp://172.16.66.109:37893
distributed.worker - INFO - dashboard at: 172.16.66.109:45119
distributed.worker - INFO - Waiting to connect to: tcp://172.16.66.109:44933
distributed.worker - INFO - -------------------------------------------------
distributed.worker - INFO - Threads: 1
distributed.worker - INFO - Local Directory: /home/mpaipuri/downloads/dask-mpi/dask-worker-space/worker-t48hj0dc
distributed.worker - INFO - -------------------------------------------------
distributed.scheduler - INFO - Register worker <WorkerState 'tcp://172.16.66.109:37893', name: rascil-worker-1, status: undefined, memory: 0, processing: 0>
distributed.scheduler - INFO - Starting worker compute stream, tcp://172.16.66.109:37893
distributed.core - INFO - Starting established connection
distributed.worker - INFO - Registered to: tcp://172.16.66.109:44933
distributed.worker - INFO - -------------------------------------------------
distributed.core - INFO - Starting established connection
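By contrast, python setup.py install generates an easy-install (egg) launcher that resolves the entry point through pkg_resources at run time. Roughly (the requirement string and entry-point name are again assumptions for illustration):

```python
#!/usr/bin/env python
# EASY-INSTALL-ENTRY-SCRIPT: 'dask-mpi','console_scripts','dask-mpi'
# Sketch of a setuptools/egg-generated console script; exact requirement string may differ.
__requires__ = 'dask-mpi'
import re
import sys
from pkg_resources import load_entry_point

if __name__ == '__main__':
    sys.argv[0] = re.sub(r'(-script\.pyw?|\.exe)?$', '', sys.argv[0])
    # The entry point is looked up and called via pkg_resources instead of a direct import.
    sys.exit(
        load_entry_point('dask-mpi', 'console_scripts', 'dask-mpi')()
    )
```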
What happened: Installing dask-mpi with wheel packaging fails, but it works normally with egg packaging. I tested this on two different systems and observed the same behaviour.
What you expected to happen: dask-mpi to work with both packaging methods.
Anything else we need to know?: The only difference between the two approaches is the generated dask-mpi command-line executable (see the launcher sketches above).
- Dask version: 2021.11.2
- Python version: 3.8
- Operating System: Debian 11
- Install method (conda, pip, source): pip and source