PETSC_EXTERN PetscErrorCode PetscCUDAInitialize(MPI_Comm comm, PetscInt device)

Logically collective
| comm | - the MPI communicator that will utilize the devices | |
| device | - the device assigned to the current MPI process. The special values PETSC_DECIDE and PETSC_DEFAULT have special meanings (see the -cuda_device option below) |
| -cuda_device <device> | - the device assigned to the current MPI rank. <device> is case-insensitive and can be: NONE (or none, or -3): the code will not use any device and errors out if it tries to; PETSC_DEFAULT (or DEFAULT, or -2): do not explicitly set the device, i.e., use whatever device the user has already set (probably before PetscInitialize()), and initialize the device runtime etc.; PETSC_DECIDE (or DECIDE, or -1): assign the MPI ranks in comm to the available devices in round-robin fashion, and initialize the device runtime etc. on the selected device; an integer >= 0: assign the device with this id to the current MPI process, error out if <device> is invalid, and initialize the device runtime etc. on this device. With PETSC_DECIDE or PETSC_DEFAULT, if no devices are actually present, the code can still run, but it errors out when it tries to create device objects. | |
| -cuda_view | - view information about the devices. | |
| -cuda_synchronize | - wait at the end of asynchronous device calls so that their time gets credited to the current event. With -log_view, the default is true; otherwise it is false. | |
| -log_view | - enable logging. Used alone or combined with `-cuda_device DEFAULT | DECIDE | >=0 int`, it also initializes the device; combined with `-cuda_device none`, it does not initialize the device. | |
| -petsc_default_use_null_stream | - if true (default), PETSc uses the default NULL stream to launch its kernels and to call vendor libraries such as cuBLAS, cuSPARSE, etc. | |
| -use_gpu_aware_mpi | - assume the MPI is device/GPU-aware when communicating data on devices. Default true. | 
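The accepted values of -cuda_device can be illustrated with a small standalone sketch. It is not PETSc's implementation: the names `parse_cuda_device`, `round_robin_device`, and the `DEV_*` constants are hypothetical, and mapping a rank to `rank % ndevices` is only one plausible reading of "round-robin", since the table does not spell out the exact assignment.

```c
#include <assert.h>
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical constants mirroring the numeric values in the option table:
   NONE = -3, DEFAULT = -2, DECIDE = -1. DEV_INVALID is an invented marker. */
enum { DEV_INVALID = -4, DEV_NONE = -3, DEV_DEFAULT = -2, DEV_DECIDE = -1 };

/* Case-insensitive equality test for two C strings. */
static int equals_nocase(const char *a, const char *b)
{
  while (*a && *b) {
    if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
    a++;
    b++;
  }
  return *a == '\0' && *b == '\0';
}

/* Map a -cuda_device argument to one of the special values above or to a
   device id >= 0. Returns DEV_INVALID for anything else. */
static int parse_cuda_device(const char *s)
{
  char *end;
  long  v;

  if (equals_nocase(s, "NONE")) return DEV_NONE;
  if (equals_nocase(s, "DEFAULT") || equals_nocase(s, "PETSC_DEFAULT")) return DEV_DEFAULT;
  if (equals_nocase(s, "DECIDE") || equals_nocase(s, "PETSC_DECIDE")) return DEV_DECIDE;

  v = strtol(s, &end, 10);
  if (end == s || *end != '\0') return DEV_INVALID;   /* not a whole integer */
  if (v < DEV_NONE || v > INT_MAX) return DEV_INVALID; /* outside the listed range */
  return (int)v;
}

/* Assumed round-robin rule for DECIDE: rank r gets device r mod ndevices. */
static int round_robin_device(int rank, int ndevices)
{
  assert(rank >= 0 && ndevices > 0);
  return rank % ndevices;
}
```

Under these assumptions, `parse_cuda_device("decide")` yields the same value as `parse_cuda_device("-1")`, and with four visible devices rank 5 would be placed on device 1.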
If this routine is triggered by command line options, it is called in PetscInitialize(). Users who call it directly should do so immediately after PetscInitialize().
If this routine is not called, CUDA initialization is delayed until the first CUDA object is created. This can affect timings, because the initialization takes a lot of time and happens asynchronously on different nodes.
.seealso: PetscCUDAInitializeCheck(), PetscHIPInitialize(), PetscHIPInitializeCheck()