NVIDIA · fbusato · Nov 18, 2025 · Nov 18, 2025 · Nov 19, 2025 · Nov 19, 2025
@@ -334,19 +334,19 @@ Usage example:
 
 CUDA doesn't support exceptions in device code, however, sometimes we need to write host/device functions that use exceptions on host and ``__trap()`` on device. CCCL provides a set of macros that should be used in place of the standard C++ keywords to make the code compile in both, host and device code.
 
-+----------------------------+-------------------------------------------------------------------+
-| ``_CCCL_TRY``              | Replacement for the ``try`` keyword                               |
-+----------------------------+-------------------------------------------------------------------+
-| ``_CCCL_CATCH (X)``        | Replacement for the ``catch (/*X*/)`` statement                   |
-+----------------------------+-------------------------------------------------------------------+
-| ``_CCCL_CATCH_ALL``        | Replacement for the ``catch (...)`` statement                     |
-+----------------------------+-------------------------------------------------------------------+
-| ``_CCCL_CATCH_FALLTHOUGH`` | End of ``try``/``catch`` block if ``_CCCL_CATCH_ALL`` is not used |
-+----------------------------+-------------------------------------------------------------------+
-| ``_CCCL_THROW``            | Replacement for the ``throw /*arg*/`` expression                  |
-+----------------------------+-------------------------------------------------------------------+
-| ``_CCCL_RETHROW``          | Replacement for the plain ``throw`` expression                    |
-+----------------------------+-------------------------------------------------------------------+
++-----------------------------+-----------------------------------------------------------------------+
+| ``_CCCL_TRY``               | Replacement for the ``try`` keyword                                   |
++-----------------------------+-----------------------------------------------------------------------+
+| ``_CCCL_CATCH (X)``         | Replacement for the ``catch (/*X*/)`` statement                       |
++-----------------------------+-----------------------------------------------------------------------+
+| ``_CCCL_CATCH_ALL``         | Replacement for the ``catch (...)`` statement                         |
++-----------------------------+-----------------------------------------------------------------------+
+| ``_CCCL_THROW``             | Replacement for the ``throw /*arg*/`` expression                      |
++-----------------------------+-----------------------------------------------------------------------+
+| ``_CCCL_THROW_IF(COND, X)`` | Replacement for the ``throw /*arg*/`` expression if ``COND`` is true  |
++-----------------------------+-----------------------------------------------------------------------+
+| ``_CCCL_RETHROW``           | Replacement for the plain ``throw`` expression                        |
++-----------------------------+-----------------------------------------------------------------------+
 
 *Note*: The ``_CCCL_CATCH`` clause must always introduce a named variable, like: ``_CCCL_CATCH(const exception_type& var)``.
 
@@ -360,22 +360,20 @@ Example:
         {
             return ptr;
         }
-        _CCCL_THROW std::bad_alloc{}; // on device calls cuda::std::terminate()
+        _CCCL_THROW(std::bad_alloc{}); // on device calls cuda::std::terminate()
     }
 
     __host__ __device__ void do_something(int* buff)
     {
-        _CCCL_THROW std::runtime_error{"Something went wrong"}; // on device calls cuda::std::terminate()
+        _CCCL_THROW(std::runtime_error{"Something went wrong"}); // on device calls cuda::std::terminate()
     }
 
     __host__ __device__ void fn(cuda::std::size_t n)
     {
         int* buff{};
-
         _CCCL_TRY
         {
             buff = reinterpret_cast<int*>(alloc(n * sizeof(int)));
-
             do_something(buff);
         }
         _CCCL_CATCH ([[maybe_unused]] const std::bad_alloc& e) // must be always named
@@ -431,9 +429,9 @@ Debugging Macros
 ----------------
 
 +-----------------------------------+-------------------------------------------------------------------------------------------------------------+
-| ``_CCCL_ASSERT(COND, MSG)``       | Portable, conditional CCCL `assert()` macro. Requires (``CCCL_ENABLE_HOST_ASSERTIONS`` or ``CCCL_ENABLE_DEVICE_ASSERTIONS``) |
+| ``_CCCL_ASSERT(COND, MSG)``       | Portable, conditional CCCL `assert()` macro. Requires (``CCCL_ENABLE_ASSERTIONS`` or a debug build)         |
 +-----------------------------------+-------------------------------------------------------------------------------------------------------------+
-| ``_CCCL_VERIFY(COND, MSG)``       | Portable, always-on `assert()` reserved for critical checks that are always required                                                           |
+| ``_CCCL_VERIFY(COND, MSG)``       | Portable, always-on `assert()` reserved for critical checks that are always required                        |
 +-----------------------------------+-------------------------------------------------------------------------------------------------------------+
 | ``_CCCL_ENABLE_ASSERTIONS``       | Enable assertions                                                                                           |
 +-----------------------------------+-------------------------------------------------------------------------------------------------------------+

@@ -8,6 +8,7 @@ Extended API
 
    extended_api/bit
    extended_api/execution_model
+   extended_api/exceptions
    extended_api/memory_model
    extended_api/thread_groups
    extended_api/synchronization_primitives

@@ -0,0 +1,34 @@
+.. _libcudacxx-extended-api-exceptions:
+
+Exception Handling
+==================
+
+Standard C++ exception handling (``try``, ``catch``, ``throw``) is not supported in CUDA device code, while it is enabled by default in host code.
+
+**Device code**
+
+``libcu++`` maps exceptions to ``cuda::std::terminate()`` calls in device code, which translates to ``__trap()`` and terminates the kernel.
+
+**Host code**
+
+``libcu++`` allows users to manually disable exceptions in host code in two ways:
+
+- By defining ``CCCL_DISABLE_EXCEPTIONS`` before including any library headers.
+- By compiling with ``-fno-exceptions`` compiler flag with ``gcc`` or ``clang``, or ``/EH-`` compiler flag with ``msvc``.
+
+If exceptions are disabled, a ``throw`` exception is translated into a `cuda::std::terminate() <https://en.cppreference.com/w/cpp/error/terminate.html>`__ call, which terminates the program.
+
+``cuda::cuda_error``
+--------------------
+
+Exception class thrown when a CUDA error is encountered. It inherits from ``std::runtime_error``.
+
+.. code-block:: cpp
+
+    class cuda_error : public std::runtime_error
+    {
+    public:
+        cuda_error(cudaError_t status, const char* msg);
+
+        cudaError_t status() const noexcept;
+    };
@@ -19,7 +19,7 @@ Defined in the ``<cuda/tma>`` header.
       tma_interleave_layout      interleave_layout = tma_interleave_layout::none,
       tma_swizzle                swizzle           = tma_swizzle::none,
       tma_l2_fetch_size          l2_fetch_size     = tma_l2_fetch_size::none,
-      tma_oob_fill               oobfill           = tma_oob_fill::none) noexcept;
+      tma_oob_fill               oobfill           = tma_oob_fill::none);
 
     [[nodiscard]] inline
     CUtensorMap make_tma_descriptor(
@@ -28,7 +28,7 @@ Defined in the ``<cuda/tma>`` header.
         tma_interleave_layout      interleave_layout = tma_interleave_layout::none,
         tma_swizzle                swizzle           = tma_swizzle::none,
         tma_l2_fetch_size          l2_fetch_size     = tma_l2_fetch_size::none,
-        tma_oob_fill               oobfill           = tma_oob_fill::none) noexcept;
+        tma_oob_fill               oobfill           = tma_oob_fill::none);
 
     } // namespace cuda
 
@@ -87,6 +87,8 @@ Return value
 Preconditions
 -------------
 
+See :ref:`libcudacxx-extended-api-exceptions` for more details on exception handling.
+
 **General preconditions**:
 
 * Compute Capability 9.0 or newer is required.