Known issues

Known issues

UCX

UCX is a communication framework used by several MPI implementations.

Memory cache

When used with CUDA, UCX intercepts cudaMalloc so it can determine whether the pointer passed to MPI is on the host (main memory) or the device (GPU). Unfortunately, there are several known issues with how this works with Julia:

By default, MPI.jl disables this by setting

ENV["UCX_MEMTYPE_CACHE"] = "no"

at __init__ which may result in reduced performance, especially for smaller messages.

Multi-threading and signal handling

When using Julia multi-threading, the Julia garbage collector internally uses SIGSEGV to synchronize threads.

By default, UCX will error if this signal is raised (#337), resulting in a message such as:

Caught signal 11 (Segmentation fault: invalid permissions for mapped object at address 0xXXXXXXXX)

This signal interception can be controlled by setting the environment variable UCX_ERROR_SIGNALS: if not already defined, MPI.jl will set it as:

ENV["UCX_ERROR_SIGNALS"] = "SIGILL,SIGBUS,SIGFPE"

at __init__. If set externally, it should be modified to exclude SIGSEGV from the list.

Microsoft MPI

Custom operators on 32-bit Windows

It is not possible to use custom operators with 32-bit Microsoft MPI, as it uses the stdcall calling convention, which is not supported by Julia's C-compatible function pointers.