Performance is directly related to the total number of rays you shoot into your scene per optixLaunch.

This means that if every hit enters the recursion and spawns two new rays, following both the reflection and the transmission direction as for transparent materials, you get up to 2^N rays after N recursions in the worst case. (It's not that bad with all-opaque materials, but then an iterative algorithm would be much easier anyway.) That transparency case is obviously exponential growth, which means the number of rays, and with it the time per optixLaunch, gets massive pretty fast.

I’d recommend counting your rays per launch (that is, all optixTrace invocations) as a debugging exercise and seeing how your application’s performance relates to that number.
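One way to do that counting, sketched here under the assumption that your launch parameter struct (called `Params` in this example) carries a device pointer `stats` which you allocate and zero on the host before each launch, is a global atomic counter incremented right before every optixTrace call:

```cuda
// Hypothetical launch-parameter layout; the `stats` field is an
// assumption for this sketch.
struct Params
{
    unsigned long long*    stats;  // single device counter, zeroed per launch
    OptixTraversableHandle handle;
    // ... your other launch parameters ...
};

extern "C" __constant__ Params params;

// Wrap every optixTrace call site like this (or place the atomicAdd
// directly before each call) to count the rays shot per launch:
static __forceinline__ __device__ void traceCounted(
    float3 origin, float3 direction, float tmin, float tmax,
    unsigned int& p0, unsigned int& p1)
{
    atomicAdd(params.stats, 1ull); // one more optixTrace this launch
    optixTrace(params.handle, origin, direction, tmin, tmax, 0.0f,
               OptixVisibilityMask(255), OPTIX_RAY_FLAG_NONE,
               0, 1, 0, // SBT offset, SBT stride, miss SBT index (assumed)
               p0, p1);
}
```

After the launch, copy the counter back to the host (e.g. with cudaMemcpy) and compare it against the time the launch took.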

But you would get a similar performance impact when shooting the same number of rays iteratively, e.g. by storing the continuation rays from each hit on some per-ray stack and executing the optixTrace calls per launch index inside the ray generation program.
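Such an iterative ray generation program could look like the following sketch. Everything except the OptiX device API itself (`RayStack`, `makePrimaryRay`, the payload layout, the SBT indices) is an assumption for this example:

```cuda
struct RayEntry
{
    float3 origin;
    float3 direction;
};

struct RayStack
{
    RayEntry entries[16]; // bounded; rays beyond this depth are dropped
    int      top;
};

extern "C" __global__ void __raygen__iterative()
{
    RayStack stack;
    stack.top = 0;
    stack.entries[stack.top++] = makePrimaryRay(optixGetLaunchIndex()); // hypothetical helper

    // Pass the stack address through the two payload registers so the
    // closest-hit program can push its continuation rays (reflection,
    // transmission) onto it instead of calling optixTrace recursively.
    const unsigned long long ptr = reinterpret_cast<unsigned long long>(&stack);
    unsigned int u0 = static_cast<unsigned int>(ptr >> 32);
    unsigned int u1 = static_cast<unsigned int>(ptr & 0xffffffffull);

    while (stack.top > 0)
    {
        const RayEntry ray = stack.entries[--stack.top];
        optixTrace(params.handle, ray.origin, ray.direction,
                   0.0f, 1e16f, 0.0f, OptixVisibilityMask(255),
                   OPTIX_RAY_FLAG_NONE, 0, 1, 0, u0, u1);
    }
}
```

With this scheme the trace depth inside OptiX stays at 1 regardless of how many bounces the path has, which is exactly the stack-space advantage discussed next.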

The major difference between recursive and iterative algorithms would be the amount of stack space required.

Since OptiX needs to store the live state around an optixTrace call in addition to the local memory each hit-program invocation needs, a deep recursion can become too large to fit into the possible stack size, and the resulting memory accesses are generally bad for GPU performance.
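You can see how the recursion depth feeds into the stack requirement on the host side. A sketch using the stack-size helpers from the OptiX SDK (error checking omitted; `pipeline` and `programGroups` are assumed to exist already, and the callable depths are assumed to be zero):

```cuda
#include <optix_stack_size.h> // optixUtilAccumulateStackSizes, optixUtilComputeStackSizes

void configureStack(OptixPipeline pipeline,
                    OptixProgramGroup* programGroups, unsigned int numGroups,
                    unsigned int maxTraceDepth)
{
    OptixStackSizes stackSizes = {};
    for (unsigned int i = 0; i < numGroups; ++i)
        optixUtilAccumulateStackSizes(programGroups[i], &stackSizes);

    unsigned int dcStackFromTraversal = 0;
    unsigned int dcStackFromState     = 0;
    unsigned int continuationStack    = 0;
    optixUtilComputeStackSizes(&stackSizes, maxTraceDepth,
                               0, 0, // no continuation/direct callables assumed
                               &dcStackFromTraversal, &dcStackFromState,
                               &continuationStack);

    // The continuation stack grows with maxTraceDepth; a too-deep
    // recursion makes this call fail.
    optixPipelineSetStackSize(pipeline, dcStackFromTraversal,
                              dcStackFromState, continuationStack,
                              2 /* maxTraversableGraphDepth, e.g. IAS over GAS */);
}
```

The continuation stack size scales with the maximum trace depth, which is why the iterative approach with a trace depth of 1 needs far less stack than a deep recursion.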

The standard approach to keep things more interactive is to do less work more often.

Progressive path tracers solve this by being iterative and following only one path stochastically (a Monte Carlo algorithm) instead of all possible recursive paths at once. They then accumulate (integrate) the partial results over multiple launches to converge to the final result.

Note that there is a maximum recursion limit in OptiX (currently 31), and there is also an (undocumented) internal maximum stack size beyond which optixPipelineSetStackSize won’t succeed anymore.

https://raytracing-docs.nvidia.com/optix7/guide/index.html#limits#limits