Using KCT_dentk 1: dentk-propagate

23.03.2023

This post is about dentk-propagate, a new part of the dentk toolkit. I use it as a working notebook while developing the tool, to keep some important notes in one place.

It is an implementation of the Fresnel and Rayleigh-Sommerfeld propagators, intended for use within algorithms such as Gerchberg-Saxton.

Theory

We assume there is an initial plane wave given by the complex function $U(x,y,z)$, which is called a phasor in Introduction to Fourier Optics, and that we know the function $U(x,y,0)$. We can then obtain the values of $U$ at a particular $z$ as the convolution $$U(x,y,z) = U(x,y,0) * p^z(x,y),$$ where $p^z$ is the propagator for the distance $z$.

We assume that we deal with plane waves of the form $$U = E e^{ikz},$$ where $E$ is the complex envelope of the wave and $k = 2\pi/\lambda$ is the wavenumber.

Fresnel propagator

So now we are in the $\mathbb{R}^3$ space. The Fresnel propagator takes the form $$p_F(x,y) = \frac{e^{ikz}}{i \lambda z} e^{i k \frac{x^2+y^2}{2z}},$$ and it can be shown that its Fourier transform $P_F(\xi_x,\xi_y) = \mathcal{F}(p_F)(\xi_x,\xi_y)$ is $$P_{F}(\xi_x, \xi_y) = \frac{e^{ikz}}{i \lambda z} \frac{i 2 z\pi}{ k} e^{-i\frac{2 z\pi^2 (\xi_x^2 + \xi_y^2)}{k}} = e^{ikz} e^{-i \lambda z \pi (\xi_x^2 + \xi_y^2)},$$ where the last equality uses $k = 2\pi/\lambda$.

This is a very important formula: since $e^{ikz}$ is just a constant prefactor, we can also use it to propagate the envelope $E$.
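
A minimal Python sketch, assuming the Fourier convention $\mathcal{F}f(\xi) = \int f(x) e^{-i 2\pi x\xi}\,\mathrm{d}x$ used later in this post, that evaluates $P_F$ and checks numerically that the prefactor $\frac{1}{i\lambda z}\frac{i2z\pi}{k}$ equals one. The function names are hypothetical and not part of dentk.

```python
import cmath
import math


def fresnel_prefactor(wavelength, z):
    """The factor (1/(i lambda z)) * (i 2 z pi / k) multiplying e^{ikz} in P_F."""
    k = 2.0 * math.pi / wavelength
    return (1.0 / (1j * wavelength * z)) * (1j * 2.0 * z * math.pi / k)


def fresnel_transfer(xi_x, xi_y, wavelength, z, with_carrier=True):
    """Fresnel transfer function P_F at the spatial frequency (xi_x, xi_y).

    With with_carrier=False the constant e^{ikz} is dropped, which gives the
    kernel acting on the envelope E instead of the full wave U.
    """
    k = 2.0 * math.pi / wavelength
    chirp = cmath.exp(-1j * math.pi * wavelength * z * (xi_x**2 + xi_y**2))
    return cmath.exp(1j * k * z) * chirp if with_carrier else chirp


# Example: 0.062 nm wavelength (about 20 keV) propagated by 0.5 m, SI units
wavelength, z = 6.2e-11, 0.5
prefactor = fresnel_prefactor(wavelength, z)
value = fresnel_transfer(1.0e4, 2.0e4, wavelength, z)
```

Both the prefactor and the modulus of the transfer function come out as one up to rounding, so the Fresnel propagator only changes the phase of each spatial frequency.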

Rayleigh-Sommerfeld propagator

In the case of this propagator, which relies on fewer assumptions than the Fresnel one, we have $$p_H(x,y) = \frac{1}{i \lambda} \frac{z\, e^{i k \sqrt{x^2+y^2+z^2}}}{x^2+y^2+z^2}.$$

Interestingly, when neglecting evanescent waves, we also obtain a formula for its Fourier transform $$P_H(\xi_x, \xi_y) = \begin{cases} e^{i k z \sqrt{1 - (\lambda \xi_x)^2 - (\lambda \xi_y)^2}}, & (\lambda \xi_x)^2 + (\lambda \xi_y)^2 < 1, \\ 0, & (\lambda \xi_x)^2 + (\lambda \xi_y)^2 \geq 1.\end{cases}$$

Now we see that the prefactor $e^{ikz}$ is not as easy to separate in this expression, and thus some computations might be tricky. Note also that, neglecting the second case and using the first-order Taylor expansion $\sqrt{1-x} \approx 1 - x/2$, the Rayleigh-Sommerfeld propagator becomes the Fresnel propagator in Fourier space.
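
A sketch of $P_H$ next to the envelope form of $P_F$ (carrier $e^{ikz}$ removed), which lets us check the Taylor argument numerically. Evaluating $\sqrt{1-s}-1$ as $-s/(1+\sqrt{1-s})$ is a standard trick to avoid cancellation for small $s$; it is my choice here, not necessarily what dentk does, and the names are hypothetical.

```python
import cmath
import math


def rs_transfer(xi_x, xi_y, wavelength, z, with_carrier=True):
    """Rayleigh-Sommerfeld transfer function P_H with evanescent waves set to 0."""
    s = (wavelength * xi_x) ** 2 + (wavelength * xi_y) ** 2
    if s >= 1.0:
        return 0j  # evanescent part is neglected
    k = 2.0 * math.pi / wavelength
    # sqrt(1 - s) - 1 evaluated as -s / (1 + sqrt(1 - s)) to avoid cancellation
    phase = k * z * (-s / (1.0 + math.sqrt(1.0 - s)))
    if with_carrier:
        phase += k * z  # restores e^{ikz sqrt(1 - s)}
    return cmath.exp(1j * phase)


def fresnel_transfer_envelope(xi_x, xi_y, wavelength, z):
    """Fresnel transfer function without the constant e^{ikz}."""
    return cmath.exp(-1j * math.pi * wavelength * z * (xi_x**2 + xi_y**2))


wavelength, z = 1.0e-6, 1.0e-5  # 1 um wavelength, 10 um distance
low = abs(rs_transfer(1.0e3, 0.0, wavelength, z, False)
          - fresnel_transfer_envelope(1.0e3, 0.0, wavelength, z))
high = abs(rs_transfer(5.0e5, 0.0, wavelength, z, False)
           - fresnel_transfer_envelope(5.0e5, 0.0, wavelength, z))
```

At $\lambda\xi = 10^{-3}$ the two kernels agree to many digits, while at $\lambda\xi = 0.5$ the terms neglected by the expansion already produce a clearly visible phase difference.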

Discretization

Now the problem is that we usually measure the intensity $I$ or the extinction $EXT$, but not $U(x,y,0)$. For a plane wave, it is convenient to write $$U = E e^{ikz}$$ and further decompose the complex envelope as $E = A e^{i\Phi} = \sqrt{I} e^{i\Phi}$, where $I$ is the positive real intensity, $\Phi$ is the real phase shift and $A = \sqrt{I}$ is the positive real amplitude.

Inputs

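The inputs of dentk-propagate are the functions $I$ and $\Phi$ defining the complex envelope $E$, and the outputs are the intensity and phase of the propagated envelope. The corresponding arguments are: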

input_intensity

input_phase

output_intensity

output_phase

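If you are interested, I use Euler's formula to construct the complex envelope with the following code:

        // Each thread converts the pixel (PX, PY) of intensity and phase to E = A e^{i Phi}
        float intensity, phase, amplitude;
        IDX = dimx * PY + PX;               // index in the unpadded input arrays
        IDX_padded = dimx_padded * PY + PX; // index in the zero padded envelope
        intensity = GPU_intensity[IDX];
        phase = GPU_phase[IDX];
        amplitude = sqrtf(intensity);       // A = sqrt(I)
        v.x = amplitude * cosf(phase);      // real part A cos(Phi)
        v.y = amplitude * sinf(phase);      // imaginary part A sin(Phi)
        GPU_envelope[IDX_padded] = v;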
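
The same construction, together with the inverse step one would use to obtain output_intensity and output_phase from a propagated envelope, can be sketched in Python. That dentk-propagate computes its outputs exactly this way is an assumption here, and the function names are hypothetical.

```python
import cmath
import math


def envelope_from_intensity_phase(intensity, phase):
    """E = sqrt(I) (cos(Phi) + i sin(Phi)), the same Euler formula as the CUDA code."""
    amplitude = math.sqrt(intensity)
    return complex(amplitude * math.cos(phase), amplitude * math.sin(phase))


def intensity_phase_from_envelope(envelope):
    """Inverse step: I = |E|^2 and Phi = arg(E), wrapped to [-pi, pi]."""
    return abs(envelope) ** 2, cmath.phase(envelope)


E = envelope_from_intensity_phase(0.64, 1.0)
intensity, phase = intensity_phase_from_envelope(E)
wrapped_intensity, wrapped_phase = intensity_phase_from_envelope(
    envelope_from_intensity_phase(0.64, 4.0)
)
```

The recovered phase is only known modulo $2\pi$, so a phase map with a larger range needs unwrapping before it can be compared with the input phase.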

But let's leave the implementation for now and talk a bit about discretization.

Convolution theorem

It is, of course, tempting to use the Fourier convolution theorem to compute the outlined convolution $$U(x,y,z) = U(x,y,0) * p^z(x,y)$$ or rather, since $U(x,y,0) = E(x,y,0)$, $$E(x,y,z) = \left(E(x,y,0) * p^z(x,y)\right) e^{-ikz},$$ simply by $$\mathcal{F}[E(\cdot,\cdot,z)](\xi_x,\xi_y) = \mathcal{F}[E(\cdot,\cdot,0)](\xi_x,\xi_y) \cdot P^z(\xi_x, \xi_y)\, e^{-ikz}.$$

Do you see how nicely we can use the fact $e^{ikz} e^{-ikz} = 1$ to get rid of the fast oscillations along $z$ in the wave $U$ by working with $E$? This comes in especially handy in Fourier space, and it is one of the main reasons why the Fresnel propagator is implemented more often than the Rayleigh-Sommerfeld one.
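
Spelled out for the two propagators with $k = 2\pi/\lambda$, the factor multiplying $\mathcal{F}[E(\cdot,\cdot,0)]$ is

$$P_F(\xi_x,\xi_y)\, e^{-ikz} = e^{-i \pi \lambda z (\xi_x^2 + \xi_y^2)}$$

for the Fresnel propagator and

$$P_H(\xi_x,\xi_y)\, e^{-ikz} = e^{i k z \left(\sqrt{1 - (\lambda \xi_x)^2 - (\lambda \xi_y)^2} - 1\right)}, \quad (\lambda \xi_x)^2 + (\lambda \xi_y)^2 < 1,$$

and zero otherwise, for the Rayleigh-Sommerfeld propagator. In the Fresnel case the carrier cancels exactly and only the product $\lambda z$ remains; in the Rayleigh-Sommerfeld case the large number $kz$ multiplies a small difference of order $\left((\lambda\xi_x)^2 + (\lambda\xi_y)^2\right)/2$.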

However, we now have a serious issue with the discretization. We do not have a continuous function but discretized values of $E$, typically on some kind of detector. Again, it is tempting to use the discrete Fourier transform to compute the convolution by multiplication, but we have to do it right, and, as we will see in a moment, there are several shades of rightness involved.

I would like to mention three papers that deal with this kind of discretization/sampling problem. The paper Katkovnik2008 introduces the so-called discrete diffraction transform, assuming that the underlying field is stepwise constant on the area of each pixel, while the papers Onural2007 and Onural2004 deal with the question of what happens if the values on the pixels are just samples of an underlying function.

At the beginning, I started implementing the problem more in line with Katkovnik2008, since it is simpler. In some sense, the authors do something similar to what footprint projectors or my cutting voxel projector do in CT projection: they do not consider just the value of the kernel at the center of the pixel, but integrate it over the whole pixel area. This is additional work, but at least for Fresnel diffraction it should be doable to implement such an approach, and to be honest, I don't know yet whether it will bring any improvement. What is important, however, is that even when we approximate the integral by the central value times the pixel area, we have a nice theory of how the discretization should work and a solid foundation for transferring the continuous case to the discrete one. So this result serves here as a check that my implementation is reasonably correct, since in the first version I would like to use precomputed Fourier kernels of the propagators.
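
To see when the pixel integration matters, here is a 1D sketch with $\lambda z = 1$ in arbitrary units. It compares the kernel value at the pixel center times the pixel size with the integral of the Fresnel chirp over the pixel, computed here by Simpson's rule rather than in closed form; the helper names are hypothetical.

```python
import cmath
import math


def chirp(x, lambda_z):
    """1D Fresnel chirp e^{i pi x^2 / (lambda z)}, constant prefactor omitted."""
    return cmath.exp(1j * math.pi * x * x / lambda_z)


def center_times_size(x_center, delta, lambda_z):
    """Kernel value at the pixel center multiplied by the pixel size."""
    return delta * chirp(x_center, lambda_z)


def pixel_integral(x_center, delta, lambda_z, n=2000):
    """Integral of the chirp over [x_center - delta/2, x_center + delta/2],
    composite Simpson rule with n (even) subintervals."""
    a = x_center - 0.5 * delta
    h = delta / n
    total = chirp(a, lambda_z) + chirp(a + delta, lambda_z)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * chirp(a + i * h, lambda_z)
    return total * h / 3.0


lambda_z = 1.0  # arbitrary units
fine = abs(pixel_integral(0.3, 0.01, lambda_z) - center_times_size(0.3, 0.01, lambda_z))
coarse_integral = abs(pixel_integral(5.0, 0.5, lambda_z))
coarse_center = abs(center_times_size(5.0, 0.5, lambda_z))
```

For a small pixel where the chirp varies slowly both values agree closely, while for a pixel of size 0.5 at $x = 5$, where the chirp oscillates about 2.5 times within the pixel, the integral is several times smaller than the center value: pixel integration damps the high frequencies of the kernel.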

Sampled frequency space

It is natural for me to understand how we go from the continuous to the discrete space with the values of the envelope: we just know that the centers of the pixels are --pixel-sizex and --pixel-sizey apart. What I am struggling with is the sampling in the frequency space, since $M \times N$ values will be transformed to $M \times N$ frequencies. I have already precomputed the continuous kernels $P_F$ and $P_H$, which I would like to use. Katkovnik2008 does the integration in pixel space and not in frequency space, so the tricky part is the correspondence between continuous and discretized frequencies and the possible scaling of the kernels.

For a moment, let us forget about the sampling errors and just relate the definitions of the continuous and discrete Fourier transforms and their inverses. Assume we have a function with support on $[0,\delta N)$ that is constant on each interval $[n\delta, (n+1)\delta)$. The Fourier transform of such a function looks as follows $$F(\xi) = \int_{-\infty}^{\infty} f(x) e^{-i 2 \pi x \xi} \mathrm{d}x \approx \delta \sum_{n=0}^{N-1} f(n \delta) e^{-i2 \pi n \delta \xi}.$$

This is, in a nutshell, what we do when we approximate the continuous Fourier transform by the discrete one. For a pixelwise constant function the $\approx$ is nearly an equality: integrating over each pixel exactly gives the right-hand side multiplied by $e^{-i\pi\delta\xi}\,\mathrm{sinc}(\delta\xi)$, with $\mathrm{sinc}(t) = \sin(\pi t)/(\pi t)$, which is close to one for $|\xi| \ll 1/\delta$. In any case, we don't believe that the underlying function is really pixelwise constant. But wait, this formula looks similar to the formula for the discrete Fourier transform

$$F_k = \sum_{n=0}^{N-1} f_n e^{-i2 \pi n k/N},$$

When we set $f_n = f(n\delta)$, we get $F_k \approx \frac{1}{\delta} F\left(\frac{k}{N \delta}\right)$; therefore, the sampling step $\delta$ induces the step $1/(N\delta)$ in the frequency domain.

Wow! This should be the recipe for manipulating the kernel of the convolution! If you have a look at the actual implementation, e.g. the CUDA kernel spectralMultiplicationFresnel in cuda/diffractionPhysics.cu, you can see that there is no division by the pixel area. Why? Because of the convolution: when we discretize the convolution integral, we have to multiply each discretized cell by the pixel area to obtain the correct result. This is the reason for the multiplication by the pixel area in eq. (14) of Katkovnik2008. So instead of dividing and then multiplying, we just omit both operations.

This is now a standard implementation of discretized diffraction. The first item on my todo list is to implement the computation of the integral (24) in Katkovnik2008 and thus a more precise Fresnel propagator. This computation is relatively tricky, as it involves evaluating the complex error function; the C++ implementation in the Faddeeva package is to be considered for this task. For now, however, it seems that using analytically precomputed kernels is better than computing the kernel in real space and converting it to Fourier space using the FFT. The implementation in Katkovnik2008, however, requires the latter, because the averaging is not done in frequency space. Another point on my todo list is to consider sampling effects and possible corrections in the scope of Onural2007 and Onural2004.
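
A direct check of this relation, as a sketch with a plain $O(N^2)$ DFT and arbitrary sample values. The helper dft_frequencies maps bins $k \ge N/2$ to negative frequencies $(k-N)/(N\delta)$, which is the usual FFT ordering (for example numpy.fft.fftfreq); the names are hypothetical.

```python
import cmath
import math


def dft(values):
    """Direct O(N^2) DFT: F_k = sum_n f_n e^{-i 2 pi n k / N}."""
    size = len(values)
    return [
        sum(f * cmath.exp(-2j * math.pi * n * k / size) for n, f in enumerate(values))
        for k in range(size)
    ]


def riemann_transform(values, delta, xi):
    """delta * sum_n f(n delta) e^{-i 2 pi n delta xi}, the approximation of F(xi)."""
    return delta * sum(
        f * cmath.exp(-2j * math.pi * n * delta * xi) for n, f in enumerate(values)
    )


def dft_frequencies(size, delta):
    """Frequency of each DFT bin with step 1/(N delta); bins k >= N/2 are mapped
    to the negative frequency (k - N)/(N delta), the usual FFT ordering."""
    return [
        (k if k < (size + 1) // 2 else k - size) / (size * delta) for k in range(size)
    ]


samples, delta = [1.0, 2.0, -1.0, 0.5, 3.0, 0.0], 0.2
spectrum = dft(samples)
frequencies = dft_frequencies(len(samples), delta)
```

The negative mapping matters when a continuous kernel such as $P_F$ is evaluated on the DFT grid: $P_F$ depends on $\xi_x^2 + \xi_y^2$, and the bin $k = N-1$ corresponds to the small frequency $-1/(N\delta)$, not to $(N-1)/(N\delta)$.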
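
A 1D sketch of this argument: the discretized convolution $\delta\sum_n f_n\, g(x_m - x_n)$ is compared with the inverse DFT of the DFT of the samples multiplied by the continuous kernel transform sampled at $k/(N\delta)$, with no pixel-size factor anywhere. A Gaussian with a known analytic transform stands in for the propagator (an assumption that keeps the check short), and the grid is treated as cyclic, so padding is ignored here.

```python
import cmath
import math


def dft(values, inverse=False):
    """Direct O(N^2) DFT; the inverse includes the 1/N normalization."""
    size = len(values)
    sign = 1.0 if inverse else -1.0
    out = [
        sum(v * cmath.exp(sign * 2j * math.pi * n * k / size) for n, v in enumerate(values))
        for k in range(size)
    ]
    return [v / size for v in out] if inverse else out


def signed(j, size):
    """Signed index of DFT position/frequency bin j (negative for j >= N/2)."""
    return j if j < (size + 1) // 2 else j - size


def kernel(x):
    """Gaussian stand-in for the propagator, g(x) = exp(-pi x^2)."""
    return math.exp(-math.pi * x * x)


def kernel_transform(xi):
    """Its analytic Fourier transform G(xi) = exp(-pi xi^2)."""
    return math.exp(-math.pi * xi * xi)


size, delta = 64, 0.1
f = [math.exp(-((delta * signed(n, size) - 0.5) ** 2)) for n in range(size)]

# Discretized cyclic convolution: delta * sum_n f_n g(x_m - x_n)
direct = [
    delta * sum(f[n] * kernel(delta * signed((m - n) % size, size)) for n in range(size))
    for m in range(size)
]

# Spectral route: DFT of the samples times G sampled at signed(k)/(N delta);
# no pixel-size factor, the 1/delta of the DFT and the delta of the sum cancel.
spectrum = dft(f)
spectral = dft(
    [spectrum[k] * kernel_transform(signed(k, size) / (size * delta)) for k in range(size)],
    inverse=True,
)
max_error = max(abs(d - s) for d, s in zip(direct, spectral))
```

The remaining difference comes only from truncating and aliasing the Gaussian, which is negligible for these parameters.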

Implementation

We assume a plane wave that is known at $z=0$, and we would like to propagate it to a distance $z>0$ or $z<0$. To do so, the following parameters are necessary:

wave-energy

This parameter is in keV. Although the formulas usually need just the wavelength $\lambda$, it is better to use a natural unit for the typical energy of the wave. Visible light has roughly $0.002$ keV, while a typical higher-energy wave can have $20$ keV, so we set that as the default.
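
The conversion from wave-energy to $\lambda$ and $k$ follows from $E = hc/\lambda$. A minimal sketch; the constant and function names are mine, and dentk may handle units differently.

```python
import math

# h c in keV * nm, from the exact SI values of h, c and the elementary charge
HC_KEV_NM = 1.239841984


def wavelength_nm(energy_kev):
    """Photon wavelength lambda = h c / E in nanometers for an energy in keV."""
    return HC_KEV_NM / energy_kev


def wavenumber_per_nm(energy_kev):
    """Wavenumber k = 2 pi / lambda in 1/nm."""
    return 2.0 * math.pi / wavelength_nm(energy_kev)
```

For the default 20 keV this gives about 0.062 nm, and 0.002 keV gives about 620 nm, in the visible range.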

Existing outputs

Many programs in dentk behave differently when the outputs already exist, the force argument is specified and the frames argument is specified but does not cover the whole volume, compared to the case when the output does not exist.

If the outputs exist and have the same dimensions and element types as the inputs, and the frames argument is specified, the output z indices specified by that argument are overwritten.

If the output does not exist and the frames vector is specified, the output dimz equals the size of the frames vector, and the frames are written in the order in which they are specified.
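
To make the two rules concrete, here is a hypothetical sketch (not dentk code) that maps input z indices to output z indices. It assumes the force argument was given whenever the output already exists, and treats a missing frames argument as all frames.

```python
def frame_mapping(frames, output_exists_compatible, input_dimz):
    """Return (output_dimz, [(input_z, output_z), ...]) for the two cases above.

    Hypothetical helper; frames=None means all frames of the input.
    """
    if frames is None:
        frames = list(range(input_dimz))
    if output_exists_compatible:
        # Existing output of matching shape: overwrite only the listed z indices
        return input_dimz, [(z, z) for z in frames]
    # New output: dimz equals the number of frames, written in the given order
    return len(frames), [(z, i) for i, z in enumerate(frames)]
```

For example, frames 5 and 2 of a 10 frame input overwrite z = 5 and z = 2 of an existing output, but become z = 0 and z = 1 of a new two frame output.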

Zero padding

I would like to perform the convolution using the convolution theorem. The problem is that I need to pad the input so that the result is a linear rather than a cyclic convolution. Zero padding might, however, create phantom diffraction patterns, especially for objects with nonzero phases on the boundary, so symmetric padding is on the todo list.
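
A 1D sketch of why the padding is needed: without padding the convolution theorem yields the cyclic convolution, where the tail wraps around to the beginning, while zero padding both inputs to at least len(a) + len(b) - 1 samples recovers the linear convolution. The names are illustrative and not taken from dentk.

```python
import cmath
import math


def dft(values, inverse=False):
    """Direct O(N^2) DFT; the inverse includes the 1/N normalization."""
    size = len(values)
    sign = 1.0 if inverse else -1.0
    out = [
        sum(v * cmath.exp(sign * 2j * math.pi * n * k / size) for n, v in enumerate(values))
        for k in range(size)
    ]
    return [v / size for v in out] if inverse else out


def linear_convolution(a, b):
    """Reference linear convolution of length len(a) + len(b) - 1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def spectral_convolution(a, b, pad=True):
    """Convolution via the convolution theorem. With pad=True both inputs are zero
    padded to len(a) + len(b) - 1, so the cyclic result equals the linear one;
    with pad=False the result is the cyclic convolution of length max(len)."""
    size = len(a) + len(b) - 1 if pad else max(len(a), len(b))
    a_hat = dft(list(a) + [0.0] * (size - len(a)))
    b_hat = dft(list(b) + [0.0] * (size - len(b)))
    product = dft([x * y for x, y in zip(a_hat, b_hat)], inverse=True)
    return [v.real for v in product]


a, b = [1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 0.0, -1.0]
reference = linear_convolution(a, b)  # [1.0, 2.0, 2.0, 2.0, 2.0, -4.0, -5.0]
padded = spectral_convolution(a, b)
cyclic = spectral_convolution(a, b, pad=False)  # approximately [-3, -3, 2, 2, 2]
```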

Code efficiency

A possible bottleneck is that each thread allocates and frees CUDA memory. It would probably be more efficient if these blocks were preallocated and each thread used its own block. There is also the open question of a multi-GPU implementation.

Tomographic notes 1 Geometric conventions

15.07.2022

When approaching a reconstruction problem from the perspective of a mathematician, you do not start by thinking about geometric conventions. Why should we think about them when algorithms like FBP don't care about the initial position of the detector or whether the x axis points to the center of the Earth, is orthogonal to it, or otherwise? I usually start thinking about this when debugging the code and trying to figure out what the initial position of the detector was, how the world coordinates in the volume are specified, and why other tools produce a reconstruction that is rotated by 90 degrees with respect to my tool. Another situation where you need to think about the geometry is when you give your data to a physician or radiologist. They are used to interpreting the data in a frame of reference where the z axis goes from the feet of the patient to his head, the y axis has the same direction as the Earth's gravity, and xyz is a right-handed coordinate system.

Radon transform

If you had a basic tomography class at school, you probably first learned about the 2D Radon transform and its inverse, filtered backprojection. Let's start our considerations about geometric conventions here. We have the $\mathbb{R}^2$ Cartesian system with coordinates $(x,y)$ and some compactly supported function $\mu(x,y)$. We would like to transform $\mu(x,y)$ into a function that, for each line through $\mathbb{R}^2$, gives us the line integral of $\mu(x,y)$ along it. The question is how to parametrize the lines through $\mathbb{R}^2$, and here comes the first convention.

Radon transform geometry. The red ray denotes the path of integration, while $\theta$ and $s$ identify the position where the ray hits the blue detector.

When we subscribe to parametrizing lines through $\mathbb{R}^2$ by $(\theta, s)$, it is also natural to take $\theta \in [0, \pi)$ and $s \in (-\infty, \infty)$. Then the normal to these lines is $\vec{n} = (\cos \theta, \sin \theta)$ and the ray direction is $\vec{t} = (\sin \theta, -\cos \theta)$. Now let's say that $\vec{n}$ is the positive direction of the x axis of the detector, $\vec{t}$ is the direction of the incoming rays, and $\theta=0$ is the initial configuration, in which the detector is aligned with the x axis of the world coordinates. We have a convention!

Now we can start interpreting the Radon transform $$ p_\theta(s) = \int_{-\infty}^{\infty} \mu( s \vec{n} + q \vec{t}) \, dq = \int_{-\infty}^{\infty} \mu( s \cos \theta + q \sin \theta, s \sin \theta - q \cos \theta) \, dq $$ as the value of the extinction at the position $s$ on the detector tilted by $\theta$ from the $x$ axis. Note that the vector $\vec{n}$ is tangent to the detector and $\vec{t}$ is normal to it: these vectors were introduced to describe lines through $\mathbb{R}^2$, which are now interpreted as rays orthogonal to the detector.

In this package, we sometimes use the angle $\omega$, related to $\theta$ by $\omega = \theta - \pi/2$.
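
A minimal sketch of this convention: ray_point evaluates $s\vec{n} + q\vec{t}$, detector_coordinates inverts it, and a midpoint-rule line integral of the unit disk indicator reproduces the known chord length $p_\theta(s) = 2\sqrt{1-s^2}$ for every $\theta$. All names are illustrative.

```python
import math


def ray_point(theta, s, q):
    """Point s*n + q*t with n = (cos, sin) and t = (sin, -cos)."""
    return (s * math.cos(theta) + q * math.sin(theta),
            s * math.sin(theta) - q * math.cos(theta))


def detector_coordinates(x, y, theta):
    """(s, q) of the point (x, y): s = (x, y).n is the detector position."""
    s = x * math.cos(theta) + y * math.sin(theta)
    q = x * math.sin(theta) - y * math.cos(theta)
    return s, q


def omega_from_theta(theta):
    """Angle omega = theta - pi/2 used in parts of this package."""
    return theta - math.pi / 2.0


def radon_unit_disk(theta, s, steps=20000, q_max=1.5):
    """Midpoint-rule p_theta(s) for mu = indicator of the unit disk."""
    h = 2.0 * q_max / steps
    total = 0.0
    for i in range(steps):
        x, y = ray_point(theta, s, -q_max + (i + 0.5) * h)
        total += h if x * x + y * y <= 1.0 else 0.0
    return total
```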

Physician convention

The orientation of the axes of the 3D patient volume tends to be as follows.

Usual CT/MRI world frame of reference for the volume: the z axis goes from the feet of the patient to his head, the y axis has the same direction as the Earth's gravity, and xyz is a right-handed coordinate system.

This choice of the (x,y) coordinates is in fact natural for visualization on a monitor, where the convention is that coordinates start at the top left corner and the y axis goes from top to bottom. Usual convention to index monitor pixels or pixels in an image: zero is at the top left corner, x goes to the right and y goes to the bottom. Image source.

Storing the projections

For a 2D scanner, we simply relate the (x,y) coordinates to the PX axis of the detector. The world coordinate z is related to PY, but how? This is actually the tricky part.

In medical data, if we stored the projections in the same way as the CT volume, the bottom of the patient would go first, to the top of the image, and his top would go down, so the visualization would be upside down. Therefore, in medical practice, the projections are stored from the top of the patient to the bottom, which is opposite to the direction of the world z axis.

For samples in materials science and in industrial CT, we do not have such a strong feeling about what is upside down and what is upside up. Sometimes we just prefer to relate the world z direction directly to the PY coordinate on the detector, for example when we do a slice-wise reconstruction of a parallel beam scan. In this situation, we might prefer a geometry where these directions are not opposite.

Working with KCT CBCT 5 Parallel beam geometry

11.03.2022

In the previous chapter, we learned about cone beam geometry and projection matrices. For synchrotron applications, it might be convenient to also work with parallel ray geometry. Here we review how the concept of projection matrices can be used to define this type of geometry.

Defining parallel rays geometry

Parallel ray geometry simply projects 3D points onto a 2D plane. Here we review how parallel ray geometry is encoded in other tools and whether we can use the idea of projection matrices to describe it.

In the ASTRA toolbox, parallel ray geometry in 3D is described by 12 numbers representing four 3D vectors.

Vector	Description
r	ray direction
d	the center of the detector
$v_x$	the vector from detector pixel (0,0) to (0,1)
$v_y$	the vector from detector pixel (0,0) to (1,0)

Additionally, two integers representing the number of pixels in each detector direction need to be supplied.

Parameter	Description
NX	X dimension of detector in pixel count.
NY	Y dimension of detector in pixel count.

It is straightforward to see that such a description fully determines the setup. Now we need to use it to project a point $x = (x_1, x_2, x_3)$ onto the detector by means of a transform $P(x) = (PX, PY)$. For a parameter $t$, we know that the projections of all points $x + t \cdot r$ must be the same for all $t \in \mathbb{R}$. Let's try to model parallel beam projection in 3D onto the 2D detector by means of an affine transform, so that the coordinates on the detector $PX(x)$ and $PY(x)$ are

$$PX(x) = PX0 + a_1 x_1 + a_2 x_2 + a_3 x_3,$$ $$PY(x) = PY0 + b_1 x_1 + b_2 x_2 + b_3 x_3,$$

where $PX(0) = PX0$ and $PY(0) = PY0$. This description requires 8 numbers, represented by a $2 \times 4$ projection matrix, which projects a given $x$ to the position $P(x) = (PX(x), PY(x))$ on the detector.

Now, having the ASTRA representation of the geometry, what should the values defining the affine transform be? Since $P(x + t \cdot r) = P(x)$, the vectors $a$ and $b$ must be orthogonal to $r$. We also know that $PX(v_x) - PX(0) = 1$ and $PY(v_y) - PY(0) = 1$, so from further algebraic considerations we conclude that $a$ and $b$ are multiples of the orthogonalized vectors

$$v_x^0 = v_x - (v_x, r)/(r,r) r,$$ $$v_y^0 = v_y - (v_y, r)/(r,r) r.$$

In particular, assuming that $v_x^0$ and $v_y^0$ are orthogonal to each other, which is the usual case,

$$a = v_x^0/(v_x^0,v_x^0),$$ $$b = v_y^0/(v_y^0,v_y^0).$$

When we project the center of the detector $d$ onto the detector, we obviously obtain its center

$$PX(d) = PX0 + (d, a) = 0.5 NX,$$ $$PY(d) = PY0 + (d, b) = 0.5 NY$$

therefore

$$PX0 = 0.5 NX - (d,a),$$ $$PY0 = 0.5 NY - (d,b).$$

Therefore, the formula to get PX and PY for an arbitrary point x using the ASTRA geometry is

$$PX(x) = (x, a)-(d, a)+0.5 NX,$$ $$PY(x) = (x, b)-(d, b)+0.5 NY.$$
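
A minimal sketch with plain Python lists of the conversion from ASTRA-style vectors to the affine coefficients, assuming, as above, that $v_x^0$ and $v_y^0$ are orthogonal. The function names are hypothetical and do not correspond to the KCT classes.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def orthogonalize(v, r):
    """v^0 = v - (v, r)/(r, r) r, the component of v orthogonal to the rays."""
    c = dot(v, r) / dot(r, r)
    return [p - c * q for p, q in zip(v, r)]


def astra_parallel_to_affine(r, d, vx, vy, nx, ny):
    """Return (a, px0, b, py0) with PX(x) = px0 + (x, a) and PY(x) = py0 + (x, b).

    Assumes v_x^0 and v_y^0 are orthogonal, as in the formulas above.
    """
    vx0, vy0 = orthogonalize(vx, r), orthogonalize(vy, r)
    a = [p / dot(vx0, vx0) for p in vx0]
    b = [p / dot(vy0, vy0) for p in vy0]
    return a, 0.5 * nx - dot(d, a), b, 0.5 * ny - dot(d, b)


def project(x, a, px0, b, py0):
    """Detector coordinates (PX, PY) of the 3D point x."""
    return px0 + dot(x, a), py0 + dot(x, b)


# Rays along +y, detector centered at (0, 100, 0) and tilted in the x-y plane
r, d = [0.0, 1.0, 0.0], [0.0, 100.0, 0.0]
vx, vy = [0.5, 0.1, 0.0], [0.0, 0.0, 0.5]
a, px0, b, py0 = astra_parallel_to_affine(r, d, vx, vy, 200, 100)
```

Projecting $d$ lands at the detector center (100, 50), moving by $v_x$ changes PX by exactly one pixel, and shifting a point along $r$ does not change its projection.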

When we use the origin of the detector instead of its center, the projection matrix is independent of the detector dimensions. Therefore, the classes Geometry3DParallel and Geometry3DParallelCameraMatrix have a constructor analogous to ASTRA, where the detector origin replaces the center of the detector. The detector origin is the point on the detector with coordinates PX=PY=0, which by convention is at the center of the corner pixel. We use the following initialization.

Vector	Description
r	ray direction
o	the origin of the detector, point (0,0) by convention at the center of (0,0) pixel
u	the vector from detector pixel (0,0) to (0,1)
v	the vector from detector pixel (0,0) to (1,0)

It holds that

$$PX(o) = PX0 + (o, a) = 0,$$ $$PY(o) = PY0 + (o, b) = 0.$$

Therefore we have $PX0 = -(o,a)$, $PY0 = -(o,b)$ and $$PX(x) = (x, a)-(o, a),$$ $$PY(x) = (x, b)-(o, b).$$

Another observation is that we are not able to recover the vectors $v_x$ and $v_y$ from the projection matrix or the transformation $P$. The reason is that the matrix is based on the vectors $v_x^0$ and $v_y^0$, which are orthogonal to the incoming rays. Therefore, if we need information about the tilt of the detector, we have to provide it separately. For all applications, it should be sufficient to know the cosine of the angle between the detector and the plane orthogonal to the incoming rays, which is usually 1.

Projection matrices

To project the point x, we use homogeneous coordinates, where we represent $x = (x_1, x_2, x_3, 1)$, and the 3D parallel projection matrix $$ \mathbf{P} = \begin{pmatrix} a_1&a_2&a_3&px_0 \\ b_1&b_2&b_3&py_0 \end{pmatrix}, $$ which represents the projection onto the detector orthogonal to the incoming rays with a particular origin. For each tilted detector, there is a virtual detector orthogonal to the incoming rays that yields the same projection. Let's recover the vectors $v_x=v_x^0$, $v_y=v_y^0$, the ray direction $r$ and the origin $o$ of the virtual detector corresponding to the matrix $\mathbf{P}$.

The vector $r$, or in homogeneous form $(r,0)$, needs to be orthogonal to the rows of the matrix $\mathbf{P}$, since $\mathbf{P} (x,1)^\top = \mathbf{P} \left((x,1)^\top + t (r,0)^\top\right)$ for all $t \in \mathbb{R}$. Such a vector is $$r = \frac{a \times b}{|a \times b|}.$$

The vectors $v_x$ and $v_y$ have the properties $(v_x, a) = 1$, $(v_x,b) = 0$, $(v_x,r)=0$, $(v_y,b) = 1$, $(v_y, a) = 0$, $(v_y, r)=0$. When $a$ and $b$ are orthogonal, which is the usual case, $v_x$ and $v_y$ are therefore scaled vectors $a$ and $b$, namely $$v_x = \frac{a}{(a,a)},$$
$$v_y = \frac{b}{(b,b)}.$$ In general, $v_x$ and $v_y$ form the dual basis to $a$ and $b$ within their linear span.

Now it remains to identify the vector $o$ corresponding to the origin. In fact, there is a whole line that projects to (0,0), namely $o + t r$, so it is best to take the minimum norm solution. Being orthogonal to $r$, it is a linear combination of the vectors $a$ and $b$, whose linear span is a subspace of dimension two, so there is a unique decomposition $o = \alpha a + \beta b$ for which $\mathbf{P} (o,1)^\top = (0,0)^\top$. For orthogonal $a$ and $b$ the situation is simplest, but assuming these vectors are not orthogonal, we end up with the system $$ \begin{pmatrix} (a,a) & (a,b) \\ (b,a) & (b,b) \end{pmatrix} \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = -\begin{pmatrix} px_0 \\ py_0 \end{pmatrix} $$ and the solution $$ \begin{pmatrix} \alpha \\ \beta \end{pmatrix} = -\begin{pmatrix} (a,a) & (a,b) \\ (b,a) & (b,b) \end{pmatrix}^{-1} \begin{pmatrix} px_0 \\ py_0 \end{pmatrix} .$$

Finally, we conclude $o = \alpha a + \beta b$, which completes the correspondence between the projection operator given as a matrix and the explicit declaration of the geometry in ASTRA style. Note that the previous equation can also be used to establish the inverse projection operator and its minimum norm solution: instead of (0, 0), we can use any other point on the projection plane.
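
A sketch of the whole recovery from a $2 \times 4$ matrix $\mathbf{P}$. It uses the dual basis for $v_x$ and $v_y$, which reduces to $a/(a,a)$ and $b/(b,b)$ when $a$ and $b$ are orthogonal, and solves the same $2 \times 2$ Gram system for the origin; the helper names are hypothetical.

```python
import math


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def cross(u, v):
    return [u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0]]


def solve_gram(a, b, rhs):
    """Solve [[(a,a), (a,b)], [(b,a), (b,b)]] (alpha, beta) = rhs by Cramer's rule."""
    aa, ab, bb = dot(a, a), dot(a, b), dot(b, b)
    det = aa * bb - ab * ab  # nonzero when a and b are linearly independent
    return (rhs[0] * bb - ab * rhs[1]) / det, (aa * rhs[1] - ab * rhs[0]) / det


def combine(coeffs, a, b):
    alpha, beta = coeffs
    return [alpha * p + beta * q for p, q in zip(a, b)]


def virtual_detector(P):
    """From P = [[a1, a2, a3, px0], [b1, b2, b3, py0]] return (r, vx, vy, o)."""
    a, px0 = P[0][:3], P[0][3]
    b, py0 = P[1][:3], P[1][3]
    c = cross(a, b)
    r = [x / math.sqrt(dot(c, c)) for x in c]
    # dual basis in span{a, b}: (vx, a) = 1, (vx, b) = 0, (vy, a) = 0, (vy, b) = 1
    vx = combine(solve_gram(a, b, (1.0, 0.0)), a, b)
    vy = combine(solve_gram(a, b, (0.0, 1.0)), a, b)
    # minimum norm origin o = alpha a + beta b with P (o, 1)^T = (0, 0)^T
    o = combine(solve_gram(a, b, (-px0, -py0)), a, b)
    return r, vx, vy, o


P = [[2.0, 0.0, 0.0, 3.0], [1.0, 0.0, 2.0, -1.0]]  # deliberately non-orthogonal a, b
r, vx, vy, o = virtual_detector(P)
```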

CVP paper 2021: Comparing projectors accuracy

05.10.2021

We have made the following statement in the publication that will be sent for peer review to the IEEE Transactions on Medical Imaging:

Transparency and reproducibility of the results

We believe in the concept of open science. Therefore, not only the software developed by the first author is published under an open source license, we are also publishing all the procedures, scripts, input data and log files, including implementation details, that were used to produce the graphs and tables in the results section. This way, anyone can reproduce our steps on their hardware and compare the results, or use our logs and scripts to compare the results with the underlying data. The files and protocols are published on the website https://kulvait.github.io/KCT_doc/categories/cvp-paper-2020.html.

This is also important for the users of the software, because in the future we can disclose how the run times of the projectors change with new versions of the software or with new hardware.

Comparison of the accuracy

Data files

The data files contain projections of the voxels and the differences between these projections and those of the Siddon512 algorithm. They are therefore quite large, and we split them into three packages corresponding to the individual parts of Figure 4: projectorAccuracyComparisonCVPPaper2021_Fig4_top_1x1x5_center.tar.gz, projectorAccuracyComparisonCVPPaper2021_Fig4_middle_1x1x1_shifted20-20-20.tar.gz and projectorAccuracyComparisonCVPPaper2021_Fig4_bottom_Long.tar.gz.

These files were further split into parts using the Unix split utility; the parts can be downloaded separately from this data directory.

To extract them, it is best to run the following commands

cat projectorAccuracyComparisonCVPPaper2021_Fig4_top_1x1x5_center.tar.gz* | tar -xz
cat projectorAccuracyComparisonCVPPaper2021_Fig4_middle_1x1x1_shifted20-20-20.tar.gz* | tar -xz
cat projectorAccuracyComparisonCVPPaper2021_Fig4_bottom_Long.tar.gz* | tar -xz

Generating the graphs

To compare the accuracy of the projectors and to generate Figure 4 of the paper, we first create a single voxel with attenuation 1. To do so, we use dentk-empty from the dentk toolbox.

dentk-empty 1 1 1 singleUnitVoxel.den --value 1.0

This can also be done using the script

./01createUnitVoxel

The next step, which produces the projections and the differences, is calling the script

./02createProjections

It is a bash script that runs the reconstructions and creates the projections and differences. Beware that generating the Siddon512 projections might take relatively long, even for a single voxel.

Finally, the graphs can be generated using the command

./03compareProjections.py projectorComparison.csv

These scripts are present in, and should be run from, the folders of the individual figures.

CVP paper 2021: Benchmarks to compare projectors speed

05.10.2021


Comparison of the speed

Here we describe precisely how we compare the speeds of the projectors and backprojectors.

Link to the log files of the speed comparison can be found here.

From the tests we performed, we prepared the following deterministic benchmarks. They can be run in a reproducible setup using precisely defined input data, camera matrices representing the geometries, and projection data. The aim of both benchmarks is to run an iterative reconstruction process and precisely measure the speeds of the projector and backprojector during it. The reconstructed data are deterministically generated noise from the uniform distribution on [0,1]. These data are disclosed for each benchmark.

For a given setup, 40 iterations of a CT reconstruction technique shall be used, which performs exactly 40 projections and 40 backprojections. The time measurement starts after the initialization, which may include loading the data into GPU memory, and before the first backprojection starts. At the point between a backprojection and the consecutive projection, the time elapsed since the previous measurement point is reported as the time of the backprojection. It is not possible to exclude the time between a backprojection and the consecutive projection from the measurement, and the exact placement of the measurement points is up to the implementation. The time measurement can be stopped after the last projection is executed, before writing the data to the disk. Then the average times of projection and backprojection are computed.

In our tests, we used CGLS, because apart from projections and backprojections it does not perform computationally intensive operations.

The tests use the script createCameraMatricesForCircularScanTrajectory.py to create the geometry of a circular scan. Details of the implementation and derivation can be found in this blog post.
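
A sketch of this protocol as a toy loop, not the KCT instrumentation. It assumes that the projection time is measured analogously to the backprojection time, i.e. one measurement point after each backprojection and one after each projection, and that anything executed between two operations is attributed to the preceding one; the function names are hypothetical.

```python
import time


def average_times(timestamps):
    """Average (projection, backprojection) time from measurement points.

    Assumed order: t0 before the first backprojection, then alternately the point
    after backprojection i and the point after projection i.
    """
    if len(timestamps) < 3 or len(timestamps) % 2 == 0:
        raise ValueError("expected 2 * iterations + 1 timestamps")
    iterations = (len(timestamps) - 1) // 2
    backprojection = sum(timestamps[2 * i + 1] - timestamps[2 * i] for i in range(iterations))
    projection = sum(timestamps[2 * i + 2] - timestamps[2 * i + 1] for i in range(iterations))
    return projection / iterations, backprojection / iterations


def benchmark(project, backproject, iterations=40):
    """Collect the measurement points with time.perf_counter around the calls."""
    timestamps = [time.perf_counter()]
    for _ in range(iterations):
        backproject()
        timestamps.append(time.perf_counter())
        project()
        timestamps.append(time.perf_counter())
    return average_times(timestamps)
```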

Benchmark YL

Based on the setup from the paper http://doi.org/10.1109/TMI.2010.2050898

The data to reconstruct were prepared using the command dentk-empty from the dentk toolbox. They simulate 720 views on a $512 \times 512$ detector matrix.

dentk-empty --noise 512 512 720 noise512x512x720.den

Even though the current implementation of dentk-empty should produce deterministic noise, we include the file noise512x512x720.den in the disclosed data.

The camera matrices, which define the given geometry, were created using the script createCameraMatricesForCircularScanTrajectory.py from the scripts repository.

createCameraMatricesForCircularScanTrajectory.py --write-params-file --projection-sizex 512 --projection-sizey 512 --pixel-sizex 1.0 --pixel-sizey 1.0 --source-to-detector 949 --source-to-isocenter 541 --number-of-angles 720 YLBenchmarkInputData/CMLong720_512x512.den

These input files define the input setup of the test.

Benchmark TP

Based on the setup from T. Pfeiffer, R. Frysch, and G. Rose, “Two extensions of the separable footprint forward projector,” 16th International Meeting on Fully Three-Dimensional Image Reconstruction in Radiology and Nuclear Medicine, 2021.

The data to reconstruct were prepared using the command dentk-empty from the dentk toolbox. They simulate 100 views on a $1280 \times 960$ detector matrix.

dentk-empty --noise 1280 960 100 noise1280x960x100.den

Even though the current implementation of dentk-empty should produce deterministic noise, we include the file noise1280x960x100.den in the disclosed data.

The camera matrices, which define the given geometry, were created using the script createCameraMatricesForCircularScanTrajectory.py from the scripts repository.

createCameraMatricesForCircularScanTrajectory.py --write-params-file --projection-sizex 1280 --projection-sizey 960 --pixel-sizex 0.25 --pixel-sizey 0.25 --omega-zero 0 --omega-angular-range 198 --source-to-detector 1000 --source-to-isocenter 750 --number-of-angles 100 shortScanTP100_1280x960.den

These input files define the input setup of the test.

Results of the benchmarks for CVP paper

Log files and input files are disclosed here. The resulting logs can be parsed and the average projection and backprojection times obtained by running the script

averageTimes log_file

from the folder https://github.com/kulvait/KCT_scripts/tree/master/reconstruction

The structure of the data is as follows:

The speedComparison folders contain the log files from which the resulting speeds of the corresponding projectors/backprojectors were obtained; each log file also discloses the KCT commit and the full command. The particular input projection matrices and projection data can be found in TPBenchmarkInputData and YLBenchmarkInputData.

YL = Table I of the paper, TP = Table II of the paper.

The logs come from several CVP projector/backprojector settings, and only a few of them are referred to in the tables of the paper. Here is the mapping of projectors/backprojectors to the log files:

CVP=nobar_norel_elv.log

CVP relaxed=bar_rel_elv.log

TT=tt.log

Siddon8=siddon8.log

CVP paper 2021: Adjoint product test

04.10.2021


Here I show how to perform the adjoint product test using KCT CBCT and what the results are on individual platforms.

The problem we are about to solve is as follows. We have a linear operator, but we don't have its explicit matrix representation. For real matrices, the adjoint operator is just the transpose. So we have our projector $A$ and backprojector $A^\top$, but how can we tell that one is the transpose of the other when we cannot look at the individual elements? The adjoint product test addresses this issue, because we know that for all vectors $\vec{x}$ and $\vec{b}$ the following identity holds $$\vec{b} \cdot (A \vec{x}) = \vec{x} \cdot (A^\top \vec{b})$$

When this holds for random vectors $\vec{x}$ and $\vec{b}$, we can be pretty confident that we have adjoint operators.

In KCT, this adjoint product test is performed in the source file [adjoint.test.cpp](https://github.com/kulvait/KCT_cbct/blob/master/tests/adjoint.test.cpp). The version for what we call standard CVP in the paper is implemented in the TEST_CASE("CVP.AdjointDotProduct.nobarrier_norelaxed_elevationcorrection", "[adjointop][cuttingvox][NOVIZ]") routine, and the version for what we call relaxed CVP in the paper is implemented in the TEST_CASE("CVP.AdjointDotProduct.barrier_relaxed_elevationcorrection", "[adjointop][cuttingvox][NOVIZ]") routine. These routines, along with tests for other parameters of the cutting voxel projector, for TT (reported as TA3 projector TEST) and for the Siddon projector, can be run by executing the binary ./test_all -s, which also shows the passing tests. We study how far the ratio $$ \frac{\vec{b} \cdot (A \vec{x})}{\vec{x} \cdot (A^\top \vec{b})} $$ is from the optimal value 1.0. The results below were obtained with git commit 09daa3.
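
A self-contained sketch of the test on a small random dense matrix, where the backprojection computes $A^\top\vec{b}$ without forming the transpose, much like a matrix-free backprojector would. The names are illustrative and unrelated to the KCT test code; a deliberately wrong adjoint shows that the ratio then moves away from 1.

```python
import random


def make_operator(rows, cols, seed=0):
    """Random dense matrix standing in for a projector A."""
    rng = random.Random(seed)
    return [[rng.uniform(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def forward(A, x):
    """Projection A x computed row by row."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def backward(A, b):
    """Backprojection A^T b accumulated without forming the transpose."""
    out = [0.0] * len(A[0])
    for row, bi in zip(A, b):
        for j, a in enumerate(row):
            out[j] += a * bi
    return out


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def adjoint_ratio(fwd, bwd, n_in, n_out, seed=1):
    """(b, A x) / (x, A^T b) for random x, b; equals 1 for a true adjoint pair."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 1.0) for _ in range(n_in)]
    b = [rng.uniform(0.0, 1.0) for _ in range(n_out)]
    return dot(b, fwd(x)) / dot(x, bwd(b))


A = make_operator(30, 20)
good = adjoint_ratio(lambda x: forward(A, x), lambda b: backward(A, b), 20, 30)
B = make_operator(30, 20, seed=5)  # a wrong "adjoint" built from another matrix
bad = adjoint_ratio(lambda x: forward(A, x), lambda b: backward(B, b), 20, 30)
```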

AMD Radeon VII

pci id for fd 5: 1002:66af, driver (null)
pci id for fd 5: 1002:66af, driver (null)
2021-10-04 18:58:35.677 DEBUG [26327] [KCT::BaseReconstructor::BaseReconstructor@:485] projectorLocalNDRange = cl::NDRange()
2021-10-04 18:58:35.677 DEBUG [26327] [KCT::BaseReconstructor::BaseReconstructor@:497] backprojectorLocalNDRange = cl::NDRange(4, 16, 1)
2021-10-04 18:58:35.677 DEBUG [26327] [KCT::util::OpenCLManager::getPlatform@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:30] There exists 2 OpenCL platforms on the PC.
2021-10-04 18:58:35.677 INFO  [26327] [KCT::util::OpenCLManager::getPlatform@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:32] Selected OpenCL platform 1: AMD Accelerated Parallel Processing.
2021-10-04 18:58:35.677 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:48] Adding deviceID 0 on the platform 1.
2021-10-04 18:58:35.677 DEBUG [26327] [KCT::util::OpenCLManager::getDevice@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:81] There exists 1 OpenCL devices for the platform AMD Accelerated Parallel Processing.
2021-10-04 18:58:35.677 INFO  [26327] [KCT::util::OpenCLManager::getDevice@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:83] Selected device 0 on the platform AMD Accelerated Parallel Processing: gfx906
2021-10-04 18:58:35.677 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/utils.cl
2021-10-04 18:58:35.677 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/include.cl
2021-10-04 18:58:35.677 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/projector.cl
2021-10-04 18:58:35.677 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/backprojector.cl
2021-10-04 18:58:35.677 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/rescaleProjections.cl
2021-10-04 18:58:35.678 INFO  [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:102] Building file /tmp/allsources_369982541.cl with options : -Werror
2021-10-04 18:58:36.304 INFO  [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:152] Build succesfull
2021-10-04 18:58:57.106 INFO  [26327] [____C_A_T_C_H____T_E_S_T____0@:104] Ratio is 1.000000

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test_all is a Catch v2.11.1 host application.
Run with -? for options

-------------------------------------------------------------------------------
CVP.AdjointDotProduct.nobarrier
-------------------------------------------------------------------------------
/b/KCT/cbct/tests/adjoint.test.cpp:44
...............................................................................

/b/KCT/cbct/tests/adjoint.test.cpp:105: PASSED:
  REQUIRE( std::abs(adjointProductRatio - 1.0) < tol )
with expansion:
  0.0000000071 < 0.00001

2021-10-04 18:58:57.108 DEBUG [26327] [KCT::BaseReconstructor::BaseReconstructor@:485] projectorLocalNDRange = cl::NDRange()
2021-10-04 18:58:57.108 DEBUG [26327] [KCT::BaseReconstructor::BaseReconstructor@:497] backprojectorLocalNDRange = cl::NDRange(4, 16, 1)
2021-10-04 18:58:57.108 DEBUG [26327] [KCT::util::OpenCLManager::getPlatform@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:30] There exists 2 OpenCL platforms on the PC.
2021-10-04 18:58:57.108 INFO  [26327] [KCT::util::OpenCLManager::getPlatform@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:32] Selected OpenCL platform 1: AMD Accelerated Parallel Processing.
2021-10-04 18:58:57.108 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:48] Adding deviceID 0 on the platform 1.
2021-10-04 18:58:57.108 DEBUG [26327] [KCT::util::OpenCLManager::getDevice@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:81] There exists 1 OpenCL devices for the platform AMD Accelerated Parallel Processing.
2021-10-04 18:58:57.108 INFO  [26327] [KCT::util::OpenCLManager::getDevice@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:83] Selected device 0 on the platform AMD Accelerated Parallel Processing: gfx906
2021-10-04 18:58:57.108 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/utils.cl
2021-10-04 18:58:57.108 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/include.cl
2021-10-04 18:58:57.108 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/projector_cvp_barrier.cl
2021-10-04 18:58:57.108 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/backprojector.cl
2021-10-04 18:58:57.108 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/rescaleProjections.cl
2021-10-04 18:58:57.108 INFO  [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:102] Building file /tmp/allsources_127968647.cl with options : -DLOCALARRAYSIZE=7680 -DRELAXED -cl-fast-relaxed-math -Werror
2021-10-04 18:58:57.589 INFO  [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:152] Build succesfull
2021-10-04 18:59:14.029 INFO  [26327] [____C_A_T_C_H____T_E_S_T____2@:170] Ratio is 1.000000
-------------------------------------------------------------------------------
CVP.AdjointDotProduct.barrier_relaxed
-------------------------------------------------------------------------------
/b/KCT/cbct/tests/adjoint.test.cpp:110
...............................................................................

/b/KCT/cbct/tests/adjoint.test.cpp:171: PASSED:
  REQUIRE( std::abs(adjointProductRatio - 1.0) < tol )
with expansion:
  0.000000041 < 0.00001

2021-10-04 18:59:14.030 DEBUG [26327] [KCT::BaseReconstructor::BaseReconstructor@:485] projectorLocalNDRange = cl::NDRange()
2021-10-04 18:59:14.030 DEBUG [26327] [KCT::BaseReconstructor::BaseReconstructor@:497] backprojectorLocalNDRange = cl::NDRange(4, 16, 1)
2021-10-04 18:59:14.030 DEBUG [26327] [KCT::util::OpenCLManager::getPlatform@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:30] There exists 2 OpenCL platforms on the PC.
2021-10-04 18:59:14.030 INFO  [26327] [KCT::util::OpenCLManager::getPlatform@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:32] Selected OpenCL platform 1: AMD Accelerated Parallel Processing.
2021-10-04 18:59:14.030 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:48] Adding deviceID 0 on the platform 1.
2021-10-04 18:59:14.030 DEBUG [26327] [KCT::util::OpenCLManager::getDevice@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:81] There exists 1 OpenCL devices for the platform AMD Accelerated Parallel Processing.
2021-10-04 18:59:14.030 INFO  [26327] [KCT::util::OpenCLManager::getDevice@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:83] Selected device 0 on the platform AMD Accelerated Parallel Processing: gfx906
2021-10-04 18:59:14.030 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/utils.cl
2021-10-04 18:59:14.030 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/include.cl
2021-10-04 18:59:14.030 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/projector_cvp_barrier.cl
2021-10-04 18:59:14.030 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/backprojector.cl
2021-10-04 18:59:14.030 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/rescaleProjections.cl
2021-10-04 18:59:14.031 INFO  [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:102] Building file /tmp/allsources_336649016.cl with options : -DELEVATIONCORRECTION -DLOCALARRAYSIZE=7680 -DRELAXED -cl-fast-relaxed-math -Werror
2021-10-04 18:59:14.516 INFO  [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:152] Build succesfull
2021-10-04 18:59:31.450 INFO  [26327] [____C_A_T_C_H____T_E_S_T____4@:236] Ratio is 1.000000
-------------------------------------------------------------------------------
CVP.AdjointDotProduct.barrier_relaxed_elevationcorrection
-------------------------------------------------------------------------------
/b/KCT/cbct/tests/adjoint.test.cpp:176
...............................................................................

/b/KCT/cbct/tests/adjoint.test.cpp:237: PASSED:
  REQUIRE( std::abs(adjointProductRatio - 1.0) < tol )
with expansion:
  0.0000000414 < 0.00001

2021-10-04 18:59:31.451 DEBUG [26327] [KCT::BaseReconstructor::BaseReconstructor@:485] projectorLocalNDRange = cl::NDRange()
2021-10-04 18:59:31.451 DEBUG [26327] [KCT::BaseReconstructor::BaseReconstructor@:497] backprojectorLocalNDRange = cl::NDRange(4, 16, 1)
2021-10-04 18:59:31.451 DEBUG [26327] [KCT::util::OpenCLManager::getPlatform@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:30] There exists 2 OpenCL platforms on the PC.
2021-10-04 18:59:31.451 INFO  [26327] [KCT::util::OpenCLManager::getPlatform@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:32] Selected OpenCL platform 1: AMD Accelerated Parallel Processing.
2021-10-04 18:59:31.451 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:48] Adding deviceID 0 on the platform 1.
2021-10-04 18:59:31.451 DEBUG [26327] [KCT::util::OpenCLManager::getDevice@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:81] There exists 1 OpenCL devices for the platform AMD Accelerated Parallel Processing.
2021-10-04 18:59:31.451 INFO  [26327] [KCT::util::OpenCLManager::getDevice@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:83] Selected device 0 on the platform AMD Accelerated Parallel Processing: gfx906
2021-10-04 18:59:31.451 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/utils.cl
2021-10-04 18:59:31.451 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/include.cl
2021-10-04 18:59:31.451 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/projector.cl
2021-10-04 18:59:31.451 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/backprojector.cl
2021-10-04 18:59:31.451 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/rescaleProjections.cl
2021-10-04 18:59:31.452 INFO  [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:102] Building file /tmp/allsources_529597292.cl with options : -DELEVATIONCORRECTION -Werror
2021-10-04 18:59:32.076 INFO  [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:152] Build succesfull
2021-10-04 18:59:55.643 INFO  [26327] [____C_A_T_C_H____T_E_S_T____6@:302] Ratio is 1.000000
-------------------------------------------------------------------------------
CVP.AdjointDotProduct.nobarrier_norelaxed_elevationcorrection
-------------------------------------------------------------------------------
/b/KCT/cbct/tests/adjoint.test.cpp:242
...............................................................................

/b/KCT/cbct/tests/adjoint.test.cpp:303: PASSED:
  REQUIRE( std::abs(adjointProductRatio - 1.0) < tol )
with expansion:
  0.00000001 < 0.00001

2021-10-04 18:59:55.644 DEBUG [26327] [KCT::BaseReconstructor::BaseReconstructor@:485] projectorLocalNDRange = cl::NDRange()
2021-10-04 18:59:55.644 DEBUG [26327] [KCT::BaseReconstructor::BaseReconstructor@:497] backprojectorLocalNDRange = cl::NDRange(4, 16, 1)
2021-10-04 18:59:55.644 DEBUG [26327] [KCT::util::OpenCLManager::getPlatform@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:30] There exists 2 OpenCL platforms on the PC.
2021-10-04 18:59:55.644 INFO  [26327] [KCT::util::OpenCLManager::getPlatform@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:32] Selected OpenCL platform 1: AMD Accelerated Parallel Processing.
2021-10-04 18:59:55.644 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:48] Adding deviceID 0 on the platform 1.
2021-10-04 18:59:55.644 DEBUG [26327] [KCT::util::OpenCLManager::getDevice@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:81] There exists 1 OpenCL devices for the platform AMD Accelerated Parallel Processing.
2021-10-04 18:59:55.644 INFO  [26327] [KCT::util::OpenCLManager::getDevice@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:83] Selected device 0 on the platform AMD Accelerated Parallel Processing: gfx906
2021-10-04 18:59:55.644 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/utils.cl
2021-10-04 18:59:55.644 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/include.cl
2021-10-04 18:59:55.644 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/projector.cl
2021-10-04 18:59:55.644 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/backprojector.cl
2021-10-04 18:59:55.644 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/projector_tt.cl
2021-10-04 18:59:55.644 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/backprojector_tt.cl
2021-10-04 18:59:55.645 INFO  [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:102] Building file /tmp/allsources_1828164295.cl with options : -Werror
2021-10-04 18:59:56.292 INFO  [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:152] Build succesfull
2021-10-04 19:00:09.626 ERROR [26327] [____C_A_T_C_H____T_E_S_T____10@:479] X0.221993, 0.055180, 0.804684
2021-10-04 19:00:09.626 ERROR [26327] [____C_A_T_C_H____T_E_S_T____10@:480] B0.071454, 0.428649, 0.457496
-------------------------------------------------------------------------------
GLSQRReconstructor AdjointDotProduct TA3 projector TEST
-------------------------------------------------------------------------------
/b/KCT/cbct/tests/adjoint.test.cpp:432
...............................................................................

/b/KCT/cbct/tests/adjoint.test.cpp:488: PASSED:
  REQUIRE( std::abs(adjointProductRatio - 1.0) < tol )
with expansion:
  0.0000000331 < 0.00001

2021-10-04 19:00:19.626 DEBUG [26327] [KCT::BaseReconstructor::BaseReconstructor@:485] projectorLocalNDRange = cl::NDRange()
2021-10-04 19:00:19.626 DEBUG [26327] [KCT::BaseReconstructor::BaseReconstructor@:497] backprojectorLocalNDRange = cl::NDRange(4, 16, 1)
2021-10-04 19:00:19.626 DEBUG [26327] [KCT::util::OpenCLManager::getPlatform@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:30] There exists 2 OpenCL platforms on the PC.
2021-10-04 19:00:19.626 INFO  [26327] [KCT::util::OpenCLManager::getPlatform@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:32] Selected OpenCL platform 1: AMD Accelerated Parallel Processing.
2021-10-04 19:00:19.626 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:48] Adding deviceID 0 on the platform 1.
2021-10-04 19:00:19.626 DEBUG [26327] [KCT::util::OpenCLManager::getDevice@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:81] There exists 1 OpenCL devices for the platform AMD Accelerated Parallel Processing.
2021-10-04 19:00:19.626 INFO  [26327] [KCT::util::OpenCLManager::getDevice@/b/KCT/cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:83] Selected device 0 on the platform AMD Accelerated Parallel Processing: gfx906
2021-10-04 19:00:19.626 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/utils.cl
2021-10-04 19:00:19.626 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/include.cl
2021-10-04 19:00:19.626 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/projector_sidon.cl
2021-10-04 19:00:19.626 DEBUG [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:78] Including file opencl/backprojector_sidon.cl
2021-10-04 19:00:19.626 INFO  [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:102] Building file /tmp/allsources_966741580.cl with options : -Werror
2021-10-04 19:00:19.850 INFO  [26327] [KCT::Kniha::initializeOpenCL@/b/KCT/cbct/src/Kniha.cpp:152] Build succesfull
2021-10-04 19:00:33.125 ERROR [26327] [____C_A_T_C_H____T_E_S_T____14@:664] X0.221993, 0.055180, 0.804684
2021-10-04 19:00:33.126 ERROR [26327] [____C_A_T_C_H____T_E_S_T____14@:665] B0.071454, 0.428649, 0.457496
-------------------------------------------------------------------------------
GLSQRReconstructor AdjointDotProduct Sidon projector TEST
-------------------------------------------------------------------------------
/b/KCT/cbct/tests/adjoint.test.cpp:617
...............................................................................

/b/KCT/cbct/tests/adjoint.test.cpp:673: PASSED:
  REQUIRE( std::abs(adjointProductRatio - 1.0) < tol )
with expansion:
  0.0000000007 < 0.00001

===============================================================================
All tests passed (6 assertions in 7 test cases)

NVIDIA RTX 2080 Ti

2021-10-04 18:57:43.924 DEBUG [2185335] [KCT::BaseReconstructor::BaseReconstructor@:485] projectorLocalNDRange = cl::NDRange()
2021-10-04 18:57:43.924 DEBUG [2185335] [KCT::BaseReconstructor::BaseReconstructor@:497] backprojectorLocalNDRange = cl::NDRange(4, 16, 1)
2021-10-04 18:57:43.924 DEBUG [2185335] [KCT::util::OpenCLManager::getPlatform@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:30] There exists 1 OpenCL platforms on the PC.
2021-10-04 18:57:43.924 INFO  [2185335] [KCT::util::OpenCLManager::getPlatform@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:32] Selected OpenCL platform 0: NVIDIA CUDA.
2021-10-04 18:57:43.924 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:48] Adding deviceID 0 on the platform 0.
2021-10-04 18:57:43.924 DEBUG [2185335] [KCT::util::OpenCLManager::getDevice@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:81] There exists 10 OpenCL devices for the platform NVIDIA CUDA.
2021-10-04 18:57:43.924 INFO  [2185335] [KCT::util::OpenCLManager::getDevice@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:83] Selected device 0 on the platform NVIDIA CUDA: NVIDIA GeForce RTX 2080 Ti
2021-10-04 18:57:44.099 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/utils.cl
2021-10-04 18:57:44.099 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/include.cl
2021-10-04 18:57:44.099 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/projector.cl
2021-10-04 18:57:44.099 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/backprojector.cl
2021-10-04 18:57:44.100 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/rescaleProjections.cl
2021-10-04 18:57:44.100 INFO  [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:102] Building file /tmp/allsources_1927524877.cl with options : -Werror
2021-10-04 18:57:44.108 INFO  [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:152] Build succesfull
2021-10-04 18:58:11.721 INFO  [2185335] [____C_A_T_C_H____T_E_S_T____0@:104] Ratio is 1.000000

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
test_all is a Catch v2.11.1 host application.
Run with -? for options

-------------------------------------------------------------------------------
CVP.AdjointDotProduct.nobarrier
-------------------------------------------------------------------------------
/home/kulvait/git/kct_cbct/tests/adjoint.test.cpp:44
...............................................................................

/home/kulvait/git/kct_cbct/tests/adjoint.test.cpp:105: PASSED:
  REQUIRE( std::abs(adjointProductRatio - 1.0) < tol )
with expansion:
  0.0000000071 < 0.00001

2021-10-04 18:58:11.808 DEBUG [2185335] [KCT::BaseReconstructor::BaseReconstructor@:485] projectorLocalNDRange = cl::NDRange()
2021-10-04 18:58:11.808 DEBUG [2185335] [KCT::BaseReconstructor::BaseReconstructor@:497] backprojectorLocalNDRange = cl::NDRange(4, 16, 1)
2021-10-04 18:58:11.808 DEBUG [2185335] [KCT::util::OpenCLManager::getPlatform@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:30] There exists 1 OpenCL platforms on the PC.
2021-10-04 18:58:11.808 INFO  [2185335] [KCT::util::OpenCLManager::getPlatform@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:32] Selected OpenCL platform 0: NVIDIA CUDA.
2021-10-04 18:58:11.808 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:48] Adding deviceID 0 on the platform 0.
2021-10-04 18:58:11.808 DEBUG [2185335] [KCT::util::OpenCLManager::getDevice@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:81] There exists 10 OpenCL devices for the platform NVIDIA CUDA.
2021-10-04 18:58:11.808 INFO  [2185335] [KCT::util::OpenCLManager::getDevice@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:83] Selected device 0 on the platform NVIDIA CUDA: NVIDIA GeForce RTX 2080 Ti
2021-10-04 18:58:11.913 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/utils.cl
2021-10-04 18:58:11.913 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/include.cl
2021-10-04 18:58:11.913 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/projector_cvp_barrier.cl
2021-10-04 18:58:11.913 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/backprojector.cl
2021-10-04 18:58:11.913 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/rescaleProjections.cl
2021-10-04 18:58:11.913 INFO  [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:102] Building file /tmp/allsources_155732126.cl with options : -DLOCALARRAYSIZE=7680 -DRELAXED -cl-fast-relaxed-math -Werror
2021-10-04 18:58:11.917 INFO  [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:152] Build succesfull
2021-10-04 18:58:18.816 INFO  [2185335] [____C_A_T_C_H____T_E_S_T____2@:170] Ratio is 1.000000
-------------------------------------------------------------------------------
CVP.AdjointDotProduct.barrier_relaxed
-------------------------------------------------------------------------------
/home/kulvait/git/kct_cbct/tests/adjoint.test.cpp:110
...............................................................................

/home/kulvait/git/kct_cbct/tests/adjoint.test.cpp:171: PASSED:
  REQUIRE( std::abs(adjointProductRatio - 1.0) < tol )
with expansion:
  0.0000000434 < 0.00001

2021-10-04 18:58:18.902 DEBUG [2185335] [KCT::BaseReconstructor::BaseReconstructor@:485] projectorLocalNDRange = cl::NDRange()
2021-10-04 18:58:18.902 DEBUG [2185335] [KCT::BaseReconstructor::BaseReconstructor@:497] backprojectorLocalNDRange = cl::NDRange(4, 16, 1)
2021-10-04 18:58:18.902 DEBUG [2185335] [KCT::util::OpenCLManager::getPlatform@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:30] There exists 1 OpenCL platforms on the PC.
2021-10-04 18:58:18.902 INFO  [2185335] [KCT::util::OpenCLManager::getPlatform@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:32] Selected OpenCL platform 0: NVIDIA CUDA.
2021-10-04 18:58:18.902 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:48] Adding deviceID 0 on the platform 0.
2021-10-04 18:58:18.902 DEBUG [2185335] [KCT::util::OpenCLManager::getDevice@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:81] There exists 10 OpenCL devices for the platform NVIDIA CUDA.
2021-10-04 18:58:18.902 INFO  [2185335] [KCT::util::OpenCLManager::getDevice@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:83] Selected device 0 on the platform NVIDIA CUDA: NVIDIA GeForce RTX 2080 Ti
2021-10-04 18:58:19.006 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/utils.cl
2021-10-04 18:58:19.006 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/include.cl
2021-10-04 18:58:19.006 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/projector_cvp_barrier.cl
2021-10-04 18:58:19.006 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/backprojector.cl
2021-10-04 18:58:19.006 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/rescaleProjections.cl
2021-10-04 18:58:19.007 INFO  [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:102] Building file /tmp/allsources_950182500.cl with options : -DELEVATIONCORRECTION -DLOCALARRAYSIZE=7680 -DRELAXED -cl-fast-relaxed-math -Werror
2021-10-04 18:58:19.011 INFO  [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:152] Build succesfull
2021-10-04 18:58:26.426 INFO  [2185335] [____C_A_T_C_H____T_E_S_T____4@:236] Ratio is 1.000000
-------------------------------------------------------------------------------
CVP.AdjointDotProduct.barrier_relaxed_elevationcorrection
-------------------------------------------------------------------------------
/home/kulvait/git/kct_cbct/tests/adjoint.test.cpp:176
...............................................................................

/home/kulvait/git/kct_cbct/tests/adjoint.test.cpp:237: PASSED:
  REQUIRE( std::abs(adjointProductRatio - 1.0) < tol )
with expansion:
  0.0000000436 < 0.00001

2021-10-04 18:58:26.511 DEBUG [2185335] [KCT::BaseReconstructor::BaseReconstructor@:485] projectorLocalNDRange = cl::NDRange()
2021-10-04 18:58:26.511 DEBUG [2185335] [KCT::BaseReconstructor::BaseReconstructor@:497] backprojectorLocalNDRange = cl::NDRange(4, 16, 1)
2021-10-04 18:58:26.511 DEBUG [2185335] [KCT::util::OpenCLManager::getPlatform@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:30] There exists 1 OpenCL platforms on the PC.
2021-10-04 18:58:26.511 INFO  [2185335] [KCT::util::OpenCLManager::getPlatform@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:32] Selected OpenCL platform 0: NVIDIA CUDA.
2021-10-04 18:58:26.511 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:48] Adding deviceID 0 on the platform 0.
2021-10-04 18:58:26.511 DEBUG [2185335] [KCT::util::OpenCLManager::getDevice@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:81] There exists 10 OpenCL devices for the platform NVIDIA CUDA.
2021-10-04 18:58:26.511 INFO  [2185335] [KCT::util::OpenCLManager::getDevice@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:83] Selected device 0 on the platform NVIDIA CUDA: NVIDIA GeForce RTX 2080 Ti
2021-10-04 18:58:26.621 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/utils.cl
2021-10-04 18:58:26.621 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/include.cl
2021-10-04 18:58:26.621 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/projector.cl
2021-10-04 18:58:26.621 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/backprojector.cl
2021-10-04 18:58:26.621 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/rescaleProjections.cl
2021-10-04 18:58:26.621 INFO  [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:102] Building file /tmp/allsources_2039522998.cl with options : -DELEVATIONCORRECTION -Werror
2021-10-04 18:58:26.625 INFO  [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:152] Build succesfull
2021-10-04 18:59:12.160 INFO  [2185335] [____C_A_T_C_H____T_E_S_T____6@:302] Ratio is 1.000000
-------------------------------------------------------------------------------
CVP.AdjointDotProduct.nobarrier_norelaxed_elevationcorrection
-------------------------------------------------------------------------------
/home/kulvait/git/kct_cbct/tests/adjoint.test.cpp:242
...............................................................................

/home/kulvait/git/kct_cbct/tests/adjoint.test.cpp:303: PASSED:
  REQUIRE( std::abs(adjointProductRatio - 1.0) < tol )
with expansion:
  0.00000001 < 0.00001

2021-10-04 18:59:12.251 DEBUG [2185335] [KCT::BaseReconstructor::BaseReconstructor@:485] projectorLocalNDRange = cl::NDRange()
2021-10-04 18:59:12.251 DEBUG [2185335] [KCT::BaseReconstructor::BaseReconstructor@:497] backprojectorLocalNDRange = cl::NDRange(4, 16, 1)
2021-10-04 18:59:12.251 DEBUG [2185335] [KCT::util::OpenCLManager::getPlatform@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:30] There exists 1 OpenCL platforms on the PC.
2021-10-04 18:59:12.251 INFO  [2185335] [KCT::util::OpenCLManager::getPlatform@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:32] Selected OpenCL platform 0: NVIDIA CUDA.
2021-10-04 18:59:12.251 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:48] Adding deviceID 0 on the platform 0.
2021-10-04 18:59:12.251 DEBUG [2185335] [KCT::util::OpenCLManager::getDevice@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:81] There exists 10 OpenCL devices for the platform NVIDIA CUDA.
2021-10-04 18:59:12.251 INFO  [2185335] [KCT::util::OpenCLManager::getDevice@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:83] Selected device 0 on the platform NVIDIA CUDA: NVIDIA GeForce RTX 2080 Ti
2021-10-04 18:59:12.358 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/utils.cl
2021-10-04 18:59:12.358 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/include.cl
2021-10-04 18:59:12.358 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/projector.cl
2021-10-04 18:59:12.358 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/backprojector.cl
2021-10-04 18:59:12.358 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/projector_tt.cl
2021-10-04 18:59:12.358 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/backprojector_tt.cl
2021-10-04 18:59:12.359 INFO  [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:102] Building file /tmp/allsources_934493372.cl with options : -Werror
2021-10-04 18:59:12.363 INFO  [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:152] Build succesfull
2021-10-04 18:59:15.801 ERROR [2185335] [____C_A_T_C_H____T_E_S_T____10@:479] X0.221993, 0.055180, 0.804684
2021-10-04 18:59:15.802 ERROR [2185335] [____C_A_T_C_H____T_E_S_T____10@:480] B0.071454, 0.428649, 0.457496
-------------------------------------------------------------------------------
GLSQRReconstructor AdjointDotProduct TA3 projector TEST
-------------------------------------------------------------------------------
/home/kulvait/git/kct_cbct/tests/adjoint.test.cpp:432
...............................................................................

/home/kulvait/git/kct_cbct/tests/adjoint.test.cpp:488: PASSED:
  REQUIRE( std::abs(adjointProductRatio - 1.0) < tol )
with expansion:
  0.0000000331 < 0.00001

2021-10-04 19:00:05.821 DEBUG [2185335] [KCT::BaseReconstructor::BaseReconstructor@:485] projectorLocalNDRange = cl::NDRange()
2021-10-04 19:00:05.821 DEBUG [2185335] [KCT::BaseReconstructor::BaseReconstructor@:497] backprojectorLocalNDRange = cl::NDRange(4, 16, 1)
2021-10-04 19:00:05.821 DEBUG [2185335] [KCT::util::OpenCLManager::getPlatform@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:30] There exists 1 OpenCL platforms on the PC.
2021-10-04 19:00:05.821 INFO  [2185335] [KCT::util::OpenCLManager::getPlatform@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:32] Selected OpenCL platform 0: NVIDIA CUDA.
2021-10-04 19:00:05.821 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:48] Adding deviceID 0 on the platform 0.
2021-10-04 19:00:05.821 DEBUG [2185335] [KCT::util::OpenCLManager::getDevice@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:81] There exists 10 OpenCL devices for the platform NVIDIA CUDA.
2021-10-04 19:00:05.821 INFO  [2185335] [KCT::util::OpenCLManager::getDevice@/home/kulvait/git/kct_cbct/submodules/CTIOL/src/OPENCL/OpenCLManager.cpp:83] Selected device 0 on the platform NVIDIA CUDA: NVIDIA GeForce RTX 2080 Ti
2021-10-04 19:00:05.929 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/utils.cl
2021-10-04 19:00:05.929 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/include.cl
2021-10-04 19:00:05.929 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/projector_sidon.cl
2021-10-04 19:00:05.929 DEBUG [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:78] Including file opencl/backprojector_sidon.cl
2021-10-04 19:00:05.929 INFO  [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:102] Building file /tmp/allsources_2000313764.cl with options : -Werror
2021-10-04 19:00:05.932 INFO  [2185335] [KCT::Kniha::initializeOpenCL@/home/kulvait/git/kct_cbct/src/Kniha.cpp:152] Build succesfull
2021-10-04 19:00:09.358 ERROR [2185335] [____C_A_T_C_H____T_E_S_T____14@:664] X0.221993, 0.055180, 0.804684
2021-10-04 19:00:09.358 ERROR [2185335] [____C_A_T_C_H____T_E_S_T____14@:665] B0.071454, 0.428649, 0.457496
-------------------------------------------------------------------------------
GLSQRReconstructor AdjointDotProduct Sidon projector TEST
-------------------------------------------------------------------------------
/home/kulvait/git/kct_cbct/tests/adjoint.test.cpp:617
...............................................................................

/home/kulvait/git/kct_cbct/tests/adjoint.test.cpp:673: PASSED:
  REQUIRE( std::abs(adjointProductRatio - 1.0) < tol )
with expansion:
  0.0000000007 < 0.00001

===============================================================================
All tests passed (6 assertions in 7 test cases)

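The assertions in the log above are adjoint dot-product tests. For a projector $A$ and its backprojector $A^T$, the identity $\langle Ax, b\rangle = \langle x, A^T b\rangle$ must hold for any vectors $x$ and $b$. The ratio of the two sides should therefore be 1 up to rounding.

Below is a minimal C++ sketch of such a check. It assumes a small dense matrix as a stand-in for the matrix-free OpenCL projectors. `DenseOperator`, `dot` and `randomAdjointProductRatio` are hypothetical helpers, not part of the KCT code base. Only the name `adjointProductRatio` comes from the test output.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical dense stand-in for a projector A (rows x cols, row-major storage).
// The real projectors are matrix-free OpenCL kernels; this only illustrates the check.
struct DenseOperator
{
    int rows;
    int cols;
    std::vector<double> a;

    // Forward operator (projection analogue): y = A x
    std::vector<double> apply(const std::vector<double>& x) const
    {
        std::vector<double> y(rows, 0.0);
        for(int i = 0; i < rows; i++)
            for(int j = 0; j < cols; j++)
                y[i] += a[i * cols + j] * x[j];
        return y;
    }

    // Adjoint operator (backprojection analogue): x = A^T y
    std::vector<double> applyAdjoint(const std::vector<double>& y) const
    {
        std::vector<double> x(cols, 0.0);
        for(int i = 0; i < rows; i++)
            for(int j = 0; j < cols; j++)
                x[j] += a[i * cols + j] * y[i];
        return x;
    }
};

double dot(const std::vector<double>& u, const std::vector<double>& v)
{
    assert(u.size() == v.size());
    double s = 0.0;
    for(std::size_t k = 0; k < u.size(); k++)
        s += u[k] * v[k];
    return s;
}

// Ratio <A x, b> / <x, A^T b>; equals 1 up to rounding when applyAdjoint is the adjoint of apply.
double adjointProductRatio(const DenseOperator& A, const std::vector<double>& x,
                           const std::vector<double>& b)
{
    return dot(A.apply(x), b) / dot(x, A.applyAdjoint(b));
}

// Fills A, x and b with uniform random entries from [0, 1) and returns the ratio.
double randomAdjointProductRatio(int rows, int cols, unsigned seed)
{
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    DenseOperator A{rows, cols, std::vector<double>(static_cast<std::size_t>(rows) * cols)};
    for(double& v : A.a)
        v = dist(gen);
    std::vector<double> x(cols), b(rows);
    for(double& v : x)
        v = dist(gen);
    for(double& v : b)
        v = dist(gen);
    return adjointProductRatio(A, x, b);
}
```

The check `std::abs(randomAdjointProductRatio(50, 30, 42) - 1.0) < 1e-5` mirrors the `REQUIRE` condition above with `tol = 1e-5`. If `applyAdjoint` were not the exact transpose of `apply`, the ratio would in general differ from 1.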
CVP paper 2021: The platforms

04.10.2021

We made the following statement in the publication that will be submitted for peer review to IEEE Transactions on Medical Imaging:

Transparency and reproducibility of the results

We believe in the concept of open science. Therefore, not only is the software developed by the first author published under an open source license, but we are also publishing all the procedures, scripts, input data and log files, including implementation details, that were used to produce the graphs and tables in the results section. This way, anyone can reproduce our steps on their own hardware and compare the results, or use our logs and scripts to check the results against the underlying data. The files and protocols are published on the website https://kulvait.github.io/KCT_doc/categories/cvp-paper-2020.html.

This is also important for the users of the software, because in the future we will be able to document how the run times of the projectors change with new versions of the software or with new hardware.

Here I describe in more detail the platforms we used to produce the results.

AMD Radeon VII

A local PC workstation running Debian GNU/Linux 9.13 (stretch). The Linux kernel version is

Linux stretch 4.19.0-0.bpo.5-amd64 #1 SMP Debian 4.19.37-5+deb10u2~bpo9+1 (2019-08-16) x86_64 GNU/Linux
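A program can also record this kernel string itself, so that it ends up in its own log next to the results. Below is a minimal sketch, assuming a POSIX system. `kernelDescription` is a hypothetical helper that calls `uname()` and joins the fields of `struct utsname`. Unlike `uname -a` from GNU coreutils, it does not print the trailing `GNU/Linux` operating-system name, because `struct utsname` does not contain it.

```cpp
#include <cassert>
#include <string>
#include <sys/utsname.h>

// Returns "sysname nodename release version machine" from POSIX uname(),
// or an empty string if the call fails.
std::string kernelDescription()
{
    struct utsname info;
    if(uname(&info) != 0)
        return std::string();
    return std::string(info.sysname) + " " + info.nodename + " " + info.release + " "
        + info.version + " " + info.machine;
}
```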

It has an AMD Ryzen 7 1800X Eight-Core Processor and 32 GB of system memory.

It uses the AMD Radeon VII graphics card.

The OpenCL platform is based on the Radeon™ Software for Linux® 19.10.

The clinfo command gives the following output:

pci id for fd 5: 1002:66af, driver (null)
pci id for fd 5: 1002:66af, driver (null)
Number of platforms                               2
  Platform Name                                   Clover
  Platform Vendor                                 Mesa
  Platform Version                                OpenCL 1.1 Mesa 13.0.6
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd
  Platform Extensions function suffix             MESA

  Platform Name                                   AMD Accelerated Parallel Processing
  Platform Vendor                                 Advanced Micro Devices, Inc.
  Platform Version                                OpenCL 2.1 AMD-APP (2841.4)
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 
  Platform Host timer resolution                  1ns
  Platform Extensions function suffix             AMD

  Platform Name                                   Clover
Number of devices                                 0

  Platform Name                                   AMD Accelerated Parallel Processing
Number of devices                                 1
  Device Name                                     gfx906
  Device Vendor                                   Advanced Micro Devices, Inc.
  Device Vendor ID                                0x1002
  Device Version                                  OpenCL 2.0 AMD-APP (2841.4)
  Driver Version                                  2841.4 (PAL,HSAIL)
  Device OpenCL C Version                         OpenCL C 2.0 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Device Board Name (AMD)                         AMD Radeon VII
  Device Topology (AMD)                           PCI-E, 0d:00.0
  Max compute units                               60
  SIMD per compute unit (AMD)                     4
  SIMD width (AMD)                                16
  SIMD instruction width (AMD)                    1
  Max clock frequency                             1802MHz
  Graphics IP (AMD)                               9.6
  Device Partition                                (core)
    Max number of sub-devices                     60
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x1024
  Max work group size                             256
  Preferred work group size multiple              64
  Wavefront width (AMD)                           64
  Preferred / native vector sizes                 
    char                                                 4 / 4       
    short                                                2 / 2       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 1 / 1        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     No
    Infinity and NANs                             No
    Round to nearest                              No
    Round to zero                                 No
    Round to infinity                             No
    IEEE754-2008 fused multiply-add               No
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Single-precision Floating-point support         (core)
    Denormals                                     No
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    64, Little-Endian
  Global memory size                              16978542592 (15.81GiB)
  Global free memory (AMD)                        16515072 (15.75GiB)
  Global memory channels (AMD)                    128
  Global memory banks per channel (AMD)           4
  Global memory bank width (AMD)                  256 bytes
  Error Correction support                        No
  Max memory allocation                           4244635648 (3.953GiB)
  Unified memory for Host and Device              No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   Yes
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       2048 bits (256 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Max size for global variable                    3820172032 (3.558GiB)
  Preferred total size of global vars             16978542592 (15.81GiB)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        16384
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            134217728 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   256 bytes
    Pitch alignment for 2D image buffers          256 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             2048x2048x2048 pixels
    Max number of read image args                 128
    Max number of write image args                64
    Max number of read/write image args           64
  Max number of pipe args                         16
  Max active pipe reservations                    16
  Max pipe packet size                            4244635648 (3.953GiB)
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Local memory syze per CU (AMD)                  65536 (64KiB)
  Local memory banks (AMD)                        32
  Max constant buffer size                        4244635648 (3.953GiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties (on host)                      
    Out-of-order execution                        No
    Profiling                                     Yes
  Queue properties (on device)                    
    Out-of-order execution                        Yes
    Profiling                                     Yes
    Preferred size                                262144 (256KiB)
    Max size                                      8388608 (8MiB)
  Max queues on device                            1
  Max events on device                            1024
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      1ns
  Profiling timer offset since Epoch (AMD)        1633346617151404433ns (Mon Oct  4 13:23:37 2021)
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Thread trace supported (AMD)                  Yes
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_gl_sharing cl_khr_gl_depth_images cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_subgroups cl_khr_gl_event cl_khr_depth_images cl_khr_mipmap_image cl_khr_mipmap_image_writes 

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [AMD]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No devices found in platform

NVIDIA RTX 2080 Ti

A remote cluster running Manjaro Linux 21.1.2. The Linux kernel version is

Linux G481HA00 5.10.61-1-MANJARO #1 SMP PREEMPT Thu Aug 26 20:36:54 UTC 2021 x86_64 GNU/Linux

It has an Intel® Xeon® Gold 6248 Processor. According to /proc/meminfo, the amount of memory is MemTotal: 791199636 kB (about 754.5 GiB).
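The kernel reports MemTotal in kB, where 1 kB means 1024 bytes. Hence $791199636 / 1024^2 \approx 754.5$ GiB. Below is a minimal C++ sketch of this conversion, assuming the usual `/proc/meminfo` line format. `meminfoLineToGiB` and `readMemTotalGiB` are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <fstream>
#include <sstream>
#include <string>

// Parses a /proc/meminfo line such as "MemTotal:       791199636 kB" and returns GiB.
// The kernel's "kB" means KiB (1024 bytes), hence the division by 1024^2.
double meminfoLineToGiB(const std::string& line)
{
    std::istringstream in(line);
    std::string key, unit;
    unsigned long long kib = 0;
    in >> key >> kib >> unit;
    assert(unit == "kB"); // memory lines in /proc/meminfo carry the kB suffix
    return static_cast<double>(kib) / (1024.0 * 1024.0);
}

// Returns MemTotal of the running machine in GiB, or -1 when /proc/meminfo is not readable.
double readMemTotalGiB()
{
    std::ifstream meminfo("/proc/meminfo");
    std::string line;
    while(std::getline(meminfo, line))
    {
        if(line.rfind("MemTotal:", 0) == 0)
            return meminfoLineToGiB(line);
    }
    return -1.0;
}
```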

There are 10 NVIDIA GEFORCE® RTX 2080 Ti graphics cards, and users can claim each card for their computations through a self-deployed resource management system. We always use just one of the cards for the tests; the index of the allocated card might differ between runs.

The clinfo command gives the following output:


Number of platforms                               1
  Platform Name                                   NVIDIA CUDA
  Platform Vendor                                 NVIDIA Corporation
  Platform Version                                OpenCL 3.0 CUDA 11.4.112
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info
  Platform Extensions with Version                cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_nv_compiler_options                                           0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_nv_pragma_unroll                                              0x400000 (1.0.0)
                                                  cl_nv_copy_opts                                                  0x400000 (1.0.0)
                                                  cl_nv_create_buffer                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)
  Platform Numeric Version                        0xc00000 (3.0.0)
  Platform Extensions function suffix             NV
  Platform Host timer resolution                  0ns

  Platform Name                                   NVIDIA CUDA
Number of devices                                 10
  Device Name                                     NVIDIA GeForce RTX 2080 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 CUDA
  Device UUID                                     20410a11-0f3a-3f04-33a2-f5d432cea4ac
  Driver UUID                                     20410a11-0f3a-3f04-33a2-f5d432cea4ac
  Valid Device LUID                               No
  Device LUID                                     6d69-637300000000
  Device Node Mask                                0
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  470.63.01
  Device OpenCL C Version                         OpenCL C 1.2 
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_fp64                                                  0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
  Latest comfornace test passed                   v2021-02-01-00
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:1a:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               68
  Max clock frequency                             1545MHz
  Compute Capability (NV)                         7.5
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Warp size (NV)                                  32
  Max sub-groups per work group                   0
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              11554717696 (10.76GiB)
  Error Correction support                        No
  Max memory allocation                           2888679424 (2.69GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        2228224 (2.125MiB)
  Global Memory cache line size                   128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             32768x32768 pixels
    Max 3D image size                             16384x16384x16384 pixels
    Max number of read image args                 256
    Max number of write image args                32
    Max number of read/write image args           0
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   No
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)                    
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  3
    IL version                                    (n/a)
    ILs with version                              <186: get cl_device_ils_with_version : error>
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Built-in kernels with version                   <190: get cl_device_built_in_kernels_with_version : error>
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info
  Device Extensions with Version                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_nv_compiler_options                                           0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_nv_pragma_unroll                                              0x400000 (1.0.0)
                                                  cl_nv_copy_opts                                                  0x400000 (1.0.0)
                                                  cl_nv_create_buffer                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)

  Device Name                                     NVIDIA GeForce RTX 2080 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 CUDA
  Device UUID                                     04c88413-cbeb-4721-3976-7ec59802f5e9
  Driver UUID                                     04c88413-cbeb-4721-3976-7ec59802f5e9
  Valid Device LUID                               No
  Device LUID                                     6d69-637300000000
  Device Node Mask                                0
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  470.63.01
  Device OpenCL C Version                         OpenCL C 1.2 
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_fp64                                                  0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
  Latest comfornace test passed                   v2021-02-01-00
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:1b:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               68
  Max clock frequency                             1545MHz
  Compute Capability (NV)                         7.5
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Warp size (NV)                                  32
  Max sub-groups per work group                   0
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              11554717696 (10.76GiB)
  Error Correction support                        No
  Max memory allocation                           2888679424 (2.69GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        2228224 (2.125MiB)
  Global Memory cache line size                   128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             32768x32768 pixels
    Max 3D image size                             16384x16384x16384 pixels
    Max number of read image args                 256
    Max number of write image args                32
    Max number of read/write image args           0
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   No
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)                    
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  3
    IL version                                    (n/a)
    ILs with version                              <186: get cl_device_ils_with_version : error>
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Built-in kernels with version                   <190: get cl_device_built_in_kernels_with_version : error>
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info
  Device Extensions with Version                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_nv_compiler_options                                           0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_nv_pragma_unroll                                              0x400000 (1.0.0)
                                                  cl_nv_copy_opts                                                  0x400000 (1.0.0)
                                                  cl_nv_create_buffer                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)

  Device Name                                     NVIDIA GeForce RTX 2080 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 CUDA
  Device UUID                                     5a2564e3-ffa7-035a-090b-2522e355c977
  Driver UUID                                     5a2564e3-ffa7-035a-090b-2522e355c977
  Valid Device LUID                               No
  Device LUID                                     6d69-637300000000
  Device Node Mask                                0
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  470.63.01
  Device OpenCL C Version                         OpenCL C 1.2 
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_fp64                                                  0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
  Latest comfornace test passed                   v2021-02-01-00
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:1c:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               68
  Max clock frequency                             1545MHz
  Compute Capability (NV)                         7.5
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Warp size (NV)                                  32
  Max sub-groups per work group                   0
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              11554717696 (10.76GiB)
  Error Correction support                        No
  Max memory allocation                           2888679424 (2.69GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        2228224 (2.125MiB)
  Global Memory cache line size                   128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             32768x32768 pixels
    Max 3D image size                             16384x16384x16384 pixels
    Max number of read image args                 256
    Max number of write image args                32
    Max number of read/write image args           0
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   No
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)                    
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  3
    IL version                                    (n/a)
    ILs with version                              <186: get cl_device_ils_with_version : error>
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Built-in kernels with version                   <190: get cl_device_built_in_kernels_with_version : error>
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info
  Device Extensions with Version                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_nv_compiler_options                                           0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_nv_pragma_unroll                                              0x400000 (1.0.0)
                                                  cl_nv_copy_opts                                                  0x400000 (1.0.0)
                                                  cl_nv_create_buffer                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)

  Device Name                                     NVIDIA GeForce RTX 2080 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 CUDA
  Device UUID                                     6007995f-46e4-ae01-b07f-640523acafc8
  Driver UUID                                     6007995f-46e4-ae01-b07f-640523acafc8
  Valid Device LUID                               No
  Device LUID                                     6d69-637300000000
  Device Node Mask                                0
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  470.63.01
  Device OpenCL C Version                         OpenCL C 1.2 
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_fp64                                                  0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
  Latest comfornace test passed                   v2021-02-01-00
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:1d:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               68
  Max clock frequency                             1545MHz
  Compute Capability (NV)                         7.5
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Warp size (NV)                                  32
  Max sub-groups per work group                   0
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              11554717696 (10.76GiB)
  Error Correction support                        No
  Max memory allocation                           2888679424 (2.69GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        2228224 (2.125MiB)
  Global Memory cache line size                   128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             32768x32768 pixels
    Max 3D image size                             16384x16384x16384 pixels
    Max number of read image args                 256
    Max number of write image args                32
    Max number of read/write image args           0
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   No
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)                    
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  3
    IL version                                    (n/a)
    ILs with version                              <186: get cl_device_ils_with_version : error>
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Built-in kernels with version                   <190: get cl_device_built_in_kernels_with_version : error>
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info
  Device Extensions with Version                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_nv_compiler_options                                           0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_nv_pragma_unroll                                              0x400000 (1.0.0)
                                                  cl_nv_copy_opts                                                  0x400000 (1.0.0)
                                                  cl_nv_create_buffer                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)

  Device Name                                     NVIDIA GeForce RTX 2080 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 CUDA
  Device UUID                                     2565d076-c445-e440-0e9a-e13f125fc1b0
  Driver UUID                                     2565d076-c445-e440-0e9a-e13f125fc1b0
  Valid Device LUID                               No
  Device LUID                                     6d69-637300000000
  Device Node Mask                                0
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  470.63.01
  Device OpenCL C Version                         OpenCL C 1.2 
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_fp64                                                  0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
  Latest comfornace test passed                   v2021-02-01-00
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:1e:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               68
  Max clock frequency                             1545MHz
  Compute Capability (NV)                         7.5
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Warp size (NV)                                  32
  Max sub-groups per work group                   0
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              11554717696 (10.76GiB)
  Error Correction support                        No
  Max memory allocation                           2888679424 (2.69GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        2228224 (2.125MiB)
  Global Memory cache line size                   128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             32768x32768 pixels
    Max 3D image size                             16384x16384x16384 pixels
    Max number of read image args                 256
    Max number of write image args                32
    Max number of read/write image args           0
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   No
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)                    
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  3
    IL version                                    (n/a)
    ILs with version                              <186: get cl_device_ils_with_version : error>
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Built-in kernels with version                   <190: get cl_device_built_in_kernels_with_version : error>
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info
  Device Extensions with Version                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_nv_compiler_options                                           0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_nv_pragma_unroll                                              0x400000 (1.0.0)
                                                  cl_nv_copy_opts                                                  0x400000 (1.0.0)
                                                  cl_nv_create_buffer                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)

  Device Name                                     NVIDIA GeForce RTX 2080 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 CUDA
  Device UUID                                     dd14c85a-ee2e-f7e6-a37c-49b36785614a
  Driver UUID                                     dd14c85a-ee2e-f7e6-a37c-49b36785614a
  Valid Device LUID                               No
  Device LUID                                     6d69-637300000000
  Device Node Mask                                0
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  470.63.01
  Device OpenCL C Version                         OpenCL C 1.2 
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_fp64                                                  0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
  Latest comfornace test passed                   v2021-02-01-00
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:b1:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               68
  Max clock frequency                             1545MHz
  Compute Capability (NV)                         7.5
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Warp size (NV)                                  32
  Max sub-groups per work group                   0
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              11554717696 (10.76GiB)
  Error Correction support                        No
  Max memory allocation                           2888679424 (2.69GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        2228224 (2.125MiB)
  Global Memory cache line size                   128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             32768x32768 pixels
    Max 3D image size                             16384x16384x16384 pixels
    Max number of read image args                 256
    Max number of write image args                32
    Max number of read/write image args           0
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   No
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)                    
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  3
    IL version                                    (n/a)
    ILs with version                              <186: get cl_device_ils_with_version : error>
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Built-in kernels with version                   <190: get cl_device_built_in_kernels_with_version : error>
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info
  Device Extensions with Version                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_nv_compiler_options                                           0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_nv_pragma_unroll                                              0x400000 (1.0.0)
                                                  cl_nv_copy_opts                                                  0x400000 (1.0.0)
                                                  cl_nv_create_buffer                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)

  Device Name                                     NVIDIA GeForce RTX 2080 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 CUDA
  Device UUID                                     1da759ad-9638-9778-868a-07d61b27051e
  Driver UUID                                     1da759ad-9638-9778-868a-07d61b27051e
  Valid Device LUID                               No
  Device LUID                                     6d69-637300000000
  Device Node Mask                                0
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  470.63.01
  Device OpenCL C Version                         OpenCL C 1.2 
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_fp64                                                  0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
  Latest comfornace test passed                   v2021-02-01-00
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:b2:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               68
  Max clock frequency                             1545MHz
  Compute Capability (NV)                         7.5
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Warp size (NV)                                  32
  Max sub-groups per work group                   0
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              11554717696 (10.76GiB)
  Error Correction support                        No
  Max memory allocation                           2888679424 (2.69GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        2228224 (2.125MiB)
  Global Memory cache line size                   128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             32768x32768 pixels
    Max 3D image size                             16384x16384x16384 pixels
    Max number of read image args                 256
    Max number of write image args                32
    Max number of read/write image args           0
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   No
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)                    
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  3
    IL version                                    (n/a)
    ILs with version                              <186: get cl_device_ils_with_version : error>
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Built-in kernels with version                   <190: get cl_device_built_in_kernels_with_version : error>
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info
  Device Extensions with Version                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_nv_compiler_options                                           0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_nv_pragma_unroll                                              0x400000 (1.0.0)
                                                  cl_nv_copy_opts                                                  0x400000 (1.0.0)
                                                  cl_nv_create_buffer                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)

  Device Name                                     NVIDIA GeForce RTX 2080 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 CUDA
  Device UUID                                     99038e7d-7955-cec3-2d42-e71c9cfecd4d
  Driver UUID                                     99038e7d-7955-cec3-2d42-e71c9cfecd4d
  Valid Device LUID                               No
  Device LUID                                     6d69-637300000000
  Device Node Mask                                0
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  470.63.01
  Device OpenCL C Version                         OpenCL C 1.2 
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_fp64                                                  0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
  Latest comfornace test passed                   v2021-02-01-00
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:b3:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               68
  Max clock frequency                             1545MHz
  Compute Capability (NV)                         7.5
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Warp size (NV)                                  32
  Max sub-groups per work group                   0
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              11554717696 (10.76GiB)
  Error Correction support                        No
  Max memory allocation                           2888679424 (2.69GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        2228224 (2.125MiB)
  Global Memory cache line size                   128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             32768x32768 pixels
    Max 3D image size                             16384x16384x16384 pixels
    Max number of read image args                 256
    Max number of write image args                32
    Max number of read/write image args           0
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   No
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)                    
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  3
    IL version                                    (n/a)
    ILs with version                              <186: get cl_device_ils_with_version : error>
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Built-in kernels with version                   <190: get cl_device_built_in_kernels_with_version : error>
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info
  Device Extensions with Version                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_nv_compiler_options                                           0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_nv_pragma_unroll                                              0x400000 (1.0.0)
                                                  cl_nv_copy_opts                                                  0x400000 (1.0.0)
                                                  cl_nv_create_buffer                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)

  Device Name                                     NVIDIA GeForce RTX 2080 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 CUDA
  Device UUID                                     fa1b0740-e46e-4748-2f5e-dac0e0520b50
  Driver UUID                                     fa1b0740-e46e-4748-2f5e-dac0e0520b50
  Valid Device LUID                               No
  Device LUID                                     6d69-637300000000
  Device Node Mask                                0
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  470.63.01
  Device OpenCL C Version                         OpenCL C 1.2 
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_fp64                                                  0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
  Latest comfornace test passed                   v2021-02-01-00
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:b4:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               68
  Max clock frequency                             1545MHz
  Compute Capability (NV)                         7.5
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Warp size (NV)                                  32
  Max sub-groups per work group                   0
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              11554717696 (10.76GiB)
  Error Correction support                        No
  Max memory allocation                           2888679424 (2.69GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        2228224 (2.125MiB)
  Global Memory cache line size                   128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             32768x32768 pixels
    Max 3D image size                             16384x16384x16384 pixels
    Max number of read image args                 256
    Max number of write image args                32
    Max number of read/write image args           0
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   No
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)                    
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  3
    IL version                                    (n/a)
    ILs with version                              <186: get cl_device_ils_with_version : error>
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Built-in kernels with version                   <190: get cl_device_built_in_kernels_with_version : error>
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info
  Device Extensions with Version                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_nv_compiler_options                                           0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_nv_pragma_unroll                                              0x400000 (1.0.0)
                                                  cl_nv_copy_opts                                                  0x400000 (1.0.0)
                                                  cl_nv_create_buffer                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)

  Device Name                                     NVIDIA GeForce RTX 2080 Ti
  Device Vendor                                   NVIDIA Corporation
  Device Vendor ID                                0x10de
  Device Version                                  OpenCL 3.0 CUDA
  Device UUID                                     e602fca3-82c8-6d70-0141-f5c7904e48f9
  Driver UUID                                     e602fca3-82c8-6d70-0141-f5c7904e48f9
  Valid Device LUID                               No
  Device LUID                                     6d69-637300000000
  Device Node Mask                                0
  Device Numeric Version                          0xc00000 (3.0.0)
  Driver Version                                  470.63.01
  Device OpenCL C Version                         OpenCL C 1.2 
  Device OpenCL C all versions                    OpenCL C                                                         0x400000 (1.0.0)
                                                  OpenCL C                                                         0x401000 (1.1.0)
                                                  OpenCL C                                                         0x402000 (1.2.0)
                                                  OpenCL C                                                         0xc00000 (3.0.0)
  Device OpenCL C features                        __opencl_c_fp64                                                  0xc00000 (3.0.0)
                                                  __opencl_c_images                                                0xc00000 (3.0.0)
                                                  __opencl_c_int64                                                 0xc00000 (3.0.0)
                                                  __opencl_c_3d_image_writes                                       0xc00000 (3.0.0)
  Latest comfornace test passed                   v2021-02-01-00
  Device Type                                     GPU
  Device Topology (NV)                            PCI-E, 0000:b5:00.0
  Device Profile                                  FULL_PROFILE
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Max compute units                               68
  Max clock frequency                             1545MHz
  Compute Capability (NV)                         7.5
  Device Partition                                (core)
    Max number of sub-devices                     1
    Supported partition types                     None
    Supported affinity domains                    (n/a)
  Max work item dimensions                        3
  Max work item sizes                             1024x1024x64
  Max work group size                             1024
  Preferred work group size multiple (device)     32
  Preferred work group size multiple (kernel)     32
  Warp size (NV)                                  32
  Max sub-groups per work group                   0
  Preferred / native vector sizes                 
    char                                                 1 / 1       
    short                                                1 / 1       
    int                                                  1 / 1       
    long                                                 1 / 1       
    half                                                 0 / 0        (n/a)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (n/a)
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
  Address bits                                    64, Little-Endian
  Global memory size                              11554717696 (10.76GiB)
  Error Correction support                        No
  Max memory allocation                           2888679424 (2.69GiB)
  Unified memory for Host and Device              No
  Integrated memory (NV)                          No
  Shared Virtual Memory (SVM) capabilities        (core)
    Coarse-grained buffer sharing                 Yes
    Fine-grained buffer sharing                   No
    Fine-grained system sharing                   No
    Atomics                                       No
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       4096 bits (512 bytes)
  Preferred alignment for atomics                 
    SVM                                           0 bytes
    Global                                        0 bytes
    Local                                         0 bytes
  Atomic memory capabilities                      relaxed, work-group scope
  Atomic fence capabilities                       relaxed, acquire/release, work-group scope
  Max size for global variable                    0
  Preferred total size of global vars             0
  Global Memory cache type                        Read/Write
  Global Memory cache size                        2228224 (2.125MiB)
  Global Memory cache line size                   128 bytes
  Image support                                   Yes
    Max number of samplers per kernel             32
    Max size for 1D images from buffer            268435456 pixels
    Max 1D or 2D image array size                 2048 images
    Max 2D image size                             32768x32768 pixels
    Max 3D image size                             16384x16384x16384 pixels
    Max number of read image args                 256
    Max number of write image args                32
    Max number of read/write image args           0
  Pipe support                                    No
  Max number of pipe args                         0
  Max active pipe reservations                    0
  Max pipe packet size                            0
  Local memory type                               Local
  Local memory size                               49152 (48KiB)
  Registers per block (NV)                        65536
  Max number of constant args                     9
  Max constant buffer size                        65536 (64KiB)
  Generic address space support                   No
  Max size of kernel argument                     4352 (4.25KiB)
  Queue properties (on host)                      
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Device enqueue capabilities                     (n/a)
  Queue properties (on device)                    
    Out-of-order execution                        No
    Profiling                                     No
    Preferred size                                0
    Max size                                      0
  Max queues on device                            0
  Max events on device                            0
  Prefer user sync for interop                    No
  Profiling timer resolution                      1000ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    Non-uniform work-groups                       No
    Work-group collective functions               No
    Sub-group independent forward progress        No
    Kernel execution timeout (NV)                 Yes
  Concurrent copy and kernel execution (NV)       Yes
    Number of async copy engines                  3
    IL version                                    (n/a)
    ILs with version                              <186: get cl_device_ils_with_version : error>
  printf() buffer size                            1048576 (1024KiB)
  Built-in kernels                                (n/a)
  Built-in kernels with version                   <190: get cl_device_built_in_kernels_with_version : error>
  Device Extensions                               cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_fp64 cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_icd cl_khr_gl_sharing cl_nv_compiler_options cl_nv_device_attribute_query cl_nv_pragma_unroll cl_nv_copy_opts cl_nv_create_buffer cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_device_uuid cl_khr_pci_bus_info
  Device Extensions with Version                  cl_khr_global_int32_base_atomics                                 0x400000 (1.0.0)
                                                  cl_khr_global_int32_extended_atomics                             0x400000 (1.0.0)
                                                  cl_khr_local_int32_base_atomics                                  0x400000 (1.0.0)
                                                  cl_khr_local_int32_extended_atomics                              0x400000 (1.0.0)
                                                  cl_khr_fp64                                                      0x400000 (1.0.0)
                                                  cl_khr_3d_image_writes                                           0x400000 (1.0.0)
                                                  cl_khr_byte_addressable_store                                    0x400000 (1.0.0)
                                                  cl_khr_icd                                                       0x400000 (1.0.0)
                                                  cl_khr_gl_sharing                                                0x400000 (1.0.0)
                                                  cl_nv_compiler_options                                           0x400000 (1.0.0)
                                                  cl_nv_device_attribute_query                                     0x400000 (1.0.0)
                                                  cl_nv_pragma_unroll                                              0x400000 (1.0.0)
                                                  cl_nv_copy_opts                                                  0x400000 (1.0.0)
                                                  cl_nv_create_buffer                                              0x400000 (1.0.0)
                                                  cl_khr_int64_base_atomics                                        0x400000 (1.0.0)
                                                  cl_khr_int64_extended_atomics                                    0x400000 (1.0.0)
                                                  cl_khr_device_uuid                                               0x400000 (1.0.0)
                                                  cl_khr_pci_bus_info                                              0x400000 (1.0.0)

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [NV]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_DEFAULT)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No devices found in platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  Invalid device type for platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

Working with KCT CBCT 4 First cone beam CT projection

19.09.2021

In the post Working with KCT CBCT 1, we converted DICOM data to the DEN format and explained how volumes are encoded. Today we create a CT trajectory setup and encode it in a format that KCT recognizes. Finally, we use KCT to reproject the volume from that post using this trajectory.

Theoretical scanning trajectory

Let's reproject our volume using a theoretical C-arm FDCT trajectory. Let's say that the device rotates around an axis parallel to the $z$ axis. By $\omega$ we denote the polar angle between the $x$ axis and the detector normal pointing towards the source. We fix the source-to-isocenter and source-to-detector distances and create a setup in which the patient is irradiated from 360 different angles corresponding to integer values of $\omega$ in degrees. Let's create camera matrices (https://en.wikipedia.org/wiki/Camera_matrix) that encode this setup. We store them as float64 DEN files, where the z dimension represents the number of angles, so that we produce a file of dimensions (dimx, dimy, dimz)=(3,4,360).
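A minimal sketch of the source positions of this trajectory, assuming (as the dentk-matinfo output in the CBCT 3 post suggests) that the source lies at $I(\cos\omega, \sin\omega, 0)$ with $I = 749$ mm and that the 360 views are spaced by one degree. The helper name `source_positions` is hypothetical and not part of KCT.

```python
import numpy as np

def source_positions(source_to_isocenter=749.0, view_count=360, omega_zero=0.0):
    """Hypothetical helper: source positions of an ideal circular trajectory.

    View k uses omega = omega_zero + 2*pi*k/view_count (integer degrees for 360 views);
    the source lies on a circle of radius source_to_isocenter in the x1-x2 plane.
    """
    omegas = omega_zero + 2.0 * np.pi * np.arange(view_count) / view_count
    return source_to_isocenter * np.stack(
        [np.cos(omegas), np.sin(omegas), np.zeros_like(omegas)], axis=1
    )

S = source_positions()
print(S.shape)  # (360, 3)
print(S[:3])    # first three source positions in mm
```

The first rows can be compared with the source positions $S$ that dentk-matinfo prints for frames 0 to 5.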

Volume reprojection

Use the following command

cllin-projector --cvp --voxel-sizex 0.64453125 --voxel-sizey 0.64453125 --voxel-sizez 1.5 --projection-sizex 616 --projection-sizey 480 DEN_RAW/Series_00.den CM.den Series_00.proj

Working with KCT CBCT 3 Python implementation of circular CT trajectory

15.09.2021

In the last post we built the mathematical foundation for constructing camera matrices for the FDCT setup. Now we use this knowledge to create a DEN file with a circular geometry and show how to work with it in the KCT package.

System setup

Having mastered the theory, let's produce the set of camera matrices for a given trajectory. I have put a script that implements what has been said on GitHub. The denpy package must be at least version 1.1.2, which provides the function storeNdarrayAsDoubleDEN for storing double-precision DEN files.

For this demonstration to run on your computer, you first need to install the denpy package and clone the KCT scripts package. Let's suppose you are running Debian 11.

You can create the folders KCT and KCT_bin in your home directory. Go into the KCT folder and clone KCT scripts into the folder scripts.

git clone https://github.com/kulvait/KCT_scripts.git scripts

Then add the following line to your .bashrc

export PATH=$PATH:~/KCT_bin:~/KCT/scripts

Then install denpy as well

pip3 install --user git+https://github.com/kulvait/KCT_denpy.git

If you need to upgrade, run

pip3 install --user --upgrade git+https://github.com/kulvait/KCT_denpy.git

Now you have installed everything needed to produce the camera matrices. Optionally, you can also install the KCT dentk package, which is very useful for working with files in the DEN format.

Go to the KCT folder and clone it

git clone https://github.com/kulvait/KCT_dentk dentk

Then follow the instructions in README file at https://github.com/kulvait/KCT_dentk to build the package.

Python implementation

When the system is set up, we can just run

createCameraMatricesForCircularScanTrajectory.py --force --write-params-file CM.den

since I set all the default parameters of this script according to the last post. You can see that the DEN file CM.den and the text file CM.den.params were produced. The params file is not strictly needed for working with KCT, but it is good practice to record how your files were created.

Let's have a look at the main part of the script

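I=ARG.source_to_isocenter              # source to isocenter distance [mm]
A=ARG.source_to_detector               # source to detector distance [mm]
M=float(ARG.projection_sizey)          # detector size in y [pixels]
N=float(ARG.projection_sizex)          # detector size in x [pixels]
PX=ARG.pixel_sizex                     # pixel size in x [mm]
PY=ARG.pixel_sizey                     # pixel size in y [mm]
VIEWCOUNT = ARG.number_of_angles
OMEGA = ARG.omega_zero                 # angle of the first view [rad]
OMEGAINCREMENT = 2*np.pi/VIEWCOUNT     # equally spaced views over the full circle

# Stack of 3x4 projection matrices, one per view along the third axis
CameraMatrices = np.zeros((3,4,0), dtype=np.float64)

for viewIndex in range(VIEWCOUNT):
    s = sourcePosition(OMEGA, I)       # source position of the current view
    _A3=A3(ARG.pixel_offsetx, ARG.pixel_offsety)  # extra principal point offset
    _A2=A2(M, N)                       # principal point to the detector center
    _A1 = A1(PX, PY)                   # millimetres to pixel units
    _E = E(A)                          # projection onto the detector
    _X2 = X2(s)                        # shift of the origin to the source
    _X1 = X1(OMEGA)                    # rotation into the view coordinate system
    CM = _A3.dot(_A2).dot(_A1).dot(_E).dot(_X2).dot(_X1)
    CameraMatrices = np.dstack((CameraMatrices, CM))
    OMEGA += OMEGAINCREMENT

DEN.storeNdarrayAsDoubleDEN(ARG.outputMatrixFile, CameraMatrices, ARG.force)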

First we initialize the parameters.

Then we prepare the numpy matrices according to the last post, and finally we multiply them and concatenate the resulting camera matrices.

On the last line you can see how easy it is to convert a numpy.ndarray into a DEN file with a single call.
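The loop above grows the array with np.dstack. As a small self-contained illustration (random matrices stand in for the camera matrices; this is not a change to the KCT script), collecting the 3x4 matrices in a list and calling np.stack once along the last axis gives the same (3, 4, n) array.

```python
import numpy as np

def stack_by_dstack(matrices):
    # Pattern of the script: start from an empty (3, 4, 0) array and append one
    # 3x4 matrix per view; np.dstack turns each (3, 4) matrix into (3, 4, 1).
    out = np.zeros((3, 4, 0), dtype=np.float64)
    for m in matrices:
        out = np.dstack((out, m))
    return out

def stack_once(matrices):
    # Same layout built in one allocation: views along the last axis.
    return np.stack(matrices, axis=2)

rng = np.random.default_rng(0)
mats = [rng.standard_normal((3, 4)) for _ in range(360)]
a = stack_by_dstack(mats)
b = stack_once(mats)
print(a.shape, np.array_equal(a, b))  # (3, 4, 360) True
```

Each np.dstack call copies the whole accumulated array, so stacking once avoids quadratic copying; for 360 views both variants are fast.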

Using the script, we produced the camera matrices for the given trajectory setup. To recall the parameters, let's have a look at the CM.den.params file

{
  "_json_message": "Created using KCT script createCameraMatricesForCircularScanTrajectory.py",
  "force": true,
  "number_of_angles": 360,
  "omega_zero": 0,
  "outputMatrixFile": "CM.den",
  "pixel_offsetx": 0.0,
  "pixel_offsety": 0.0,
  "pixel_sizex": 0.616,
  "pixel_sizey": 0.616,
  "projection_sizex": 616,
  "projection_sizey": 480,
  "source_to_detector": 1198.0,
  "source_to_isocenter": 749.0,
  "write_params_file": true
}

This way we can check which parameters were used to create our trajectory. To dig deeper, I have written dentk-matinfo as part of the dentk package. It decomposes our matrices and lets us check properties such as the source position for the first few frames by running

dentk-matinfo CM.den -f 0-5

with the following output

Camera matrix from 0-th frame:
    |   -0.200     1.623     0.000   149.737|                     | 1944.805    -0.000   239.500| |    0.000     1.000     0.000|    0.000|
P = |   -0.257     0.000    -1.623   192.252| = C[Q|u] =    1198.0|    0.000  1944.805   307.500|.|   -0.000    -0.000    -1.000|    0.000|
    |   -0.001     0.000     0.000     0.625|                     |    0.000     0.000     1.000| |   -1.000     0.000    -0.000|  749.000|
S = [749.00, -0.00, -0.00], -Q^T u = [749.00,  0.00,  0.00].
Camera matrix from 1-th frame:
    |   -0.228     1.620     0.000   149.737|                     | 1944.805     0.000   239.500| |   -0.017     1.000    -0.000|    0.000|
P = |   -0.257    -0.004    -1.623   192.252| = C[Q|u] =    1198.0|    0.000  1944.805   307.500|.|   -0.000    -0.000    -1.000|    0.000|
    |   -0.001    -0.000     0.000     0.625|                     |    0.000     0.000     1.000| |   -1.000    -0.017    -0.000|  749.000|
S = [748.89, 13.07, -0.00], -Q^T u = [748.89, 13.07,  0.00].
Camera matrix from 2-th frame:
    |   -0.256     1.615     0.000   149.737|                     | 1944.805     0.000   239.500| |   -0.035     0.999    -0.000|    0.000|
P = |   -0.257    -0.009    -1.623   192.252| = C[Q|u] =    1198.0|    0.000  1944.805   307.500|.|   -0.000     0.000    -1.000|    0.000|
    |   -0.001    -0.000     0.000     0.625|                     |    0.000     0.000     1.000| |   -0.999    -0.035    -0.000|  749.000|
S = [748.54, 26.14, -0.00], -Q^T u = [748.54, 26.14,  0.00].
Camera matrix from 3-th frame:
    |   -0.285     1.611     0.000   149.737|                     | 1944.805     0.000   239.500| |   -0.052     0.999    -0.000|    0.000|
P = |   -0.256    -0.013    -1.623   192.252| = C[Q|u] =    1198.0|    0.000  1944.805   307.500|.|   -0.000     0.000    -1.000|    0.000|
    |   -0.001    -0.000     0.000     0.625|                     |    0.000     0.000     1.000| |   -0.999    -0.052    -0.000|  749.000|
S = [747.97, 39.20, -0.00], -Q^T u = [747.97, 39.20,  0.00].
Camera matrix from 4-th frame:
    |   -0.313     1.605     0.000   149.737|                     | 1944.805     0.000   239.500| |   -0.070     0.998    -0.000|   -0.000|
P = |   -0.256    -0.018    -1.623   192.252| = C[Q|u] =    1198.0|    0.000  1944.805   307.500|.|    0.000     0.000    -1.000|    0.000|
    |   -0.001    -0.000     0.000     0.625|                     |    0.000     0.000     1.000| |   -0.998    -0.070    -0.000|  749.000|
S = [747.18, 52.25, -0.00], -Q^T u = [747.18, 52.25,  0.00].
Camera matrix from 5-th frame:
    |   -0.341     1.600     0.000   149.737|                     | 1944.805     0.000   239.500| |   -0.087     0.996     0.000|   -0.000|
P = |   -0.256    -0.022    -1.623   192.252| = C[Q|u] =    1198.0|    0.000  1944.805   307.500|.|    0.000     0.000    -1.000|    0.000|
    |   -0.001    -0.000     0.000     0.625|                     |    0.000     0.000     1.000| |   -0.996    -0.087    -0.000|  749.000|
S = [746.15, 65.28, -0.00], -Q^T u = [746.15, 65.28,  0.00].

Here we can see that, among other things, it shows the source positions $S$ for the first views. They look reasonable when compared with the image of the geometry and with how we described the trajectory: the source starts aligned with the $x_1$ axis and rotates towards the $x_2$ axis.

In the next post we use the created camera matrices, the KCT cbct package and a CT volume downloaded from a public repository, and we show how to project CT volumes using the FDCT trajectory we have just created.
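To see what such a decomposition involves, here is a minimal NumPy sketch; it is not the dentk-matinfo implementation. It assumes $P \propto C[Q|u]$ with zero skew and $C_{33} = 1$, and builds one frame from values read off the output above: $f = 1944.805$, principal point $(239.5, 307.5)$, $I = 749$, $A = 1198$ and $Q$ rows $(-\sin\omega, \cos\omega, 0)$, $(0, 0, -1)$, $(-\cos\omega, -\sin\omega, 0)$. The source is recovered as the null vector of $P = [M|p_4]$, i.e. $S = -M^{-1}p_4$, and the intrinsic parameters from $MM^T$. For $\omega = 1°$ the constructed matrix appears to match the printed frame-1 matrix to the displayed precision.

```python
import numpy as np

def view_rotation(omega):
    # Q rows as they appear in the dentk-matinfo output above.
    s, c = np.sin(omega), np.cos(omega)
    return np.array([[-s, c, 0.0], [0.0, 0.0, -1.0], [-c, -s, 0.0]])

def camera_matrix(omega, I=749.0, A=1198.0, f=1944.805, cx=239.5, cy=307.5):
    """Build P = C [Q | u] / A with u = -Q S and S = I (cos w, sin w, 0)."""
    C = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
    Q = view_rotation(omega)
    S = I * np.array([np.cos(omega), np.sin(omega), 0.0])
    u = -Q @ S
    return C @ np.hstack([Q, u[:, None]]) / A

def source_position(P):
    # The source is the null vector of P: P @ (S, 1) = 0, hence S = -M^{-1} p4.
    M, p4 = P[:, :3], P[:, 3]
    return -np.linalg.solve(M, p4)

def intrinsics(P):
    # M = s C Q with Q orthonormal, so M M^T = s^2 C C^T. Assuming zero skew
    # and C[2, 2] = 1, normalise and read off focal lengths and principal point.
    M = P[:, :3]
    B = M @ M.T
    B = B / B[2, 2]
    cx, cy = B[0, 2], B[1, 2]
    fx = np.sqrt(B[0, 0] - cx**2)
    fy = np.sqrt(B[1, 1] - cy**2)
    return fx, fy, cx, cy

P1 = camera_matrix(np.deg2rad(1.0))
print(source_position(P1))         # close to [748.89, 13.07, 0.00]
print(source_position(-3.0 * P1))  # same point: the scale of P does not matter
print(intrinsics(P1))              # close to (1944.805, 1944.805, 239.5, 307.5)
```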

Working with KCT CBCT 2 Projective geometry and camera matrices to describe CT geometry

14.09.2021

Before we define a particular geometry corresponding to the flat panel detector CT trajectory, we need some theory about projective geometry and camera matrices. This is the content of this post.

CT projections geometry

In computed tomography, we project a 3D object in $ \mathbb{R}^3 $ onto the detector grid. Let's say it is a two-dimensional grid that consists of rectangular pixels. Coordinates on the detector can naturally be described as vectors in $ \mathbb{R}^2 $, since the projections are 2D images. The process of CBCT X-ray projection is analogous to the pinhole camera model, which projects a 3D scene onto a 2D plane, and therefore projective geometry is a good tool to study this correspondence.

Projective geometry

A projective space is a structure built on top of a vector space $\mathbf{V}$, but it is not a vector space itself. For an introduction to projective geometry, see the class notes of Nigel Hitchin. I follow some of their definitions.

The projective space $\mathcal{P}(\mathbf{V})$ of a vector space $\mathbf{V}$ is the set of one-dimensional subspaces of $\mathbf{V}$. The dimension of $\mathcal{P}(\mathbf{V})$ is $\dim(\mathbf{V}) - 1$. A projective space of dimension $1$ is called a projective line and a projective space of dimension $2$ is called a projective plane.

This definition has an interesting interpretation. Take the space $\mathbb{R}^3$ with the source placed at its origin. The "set of one-dimensional subspaces of $\mathbb{R}^3$" is then the set of all lines through the origin, which represent all the rays going from the source. These lines are in (almost) one-to-one correspondence with the points of the unit half sphere, and I use this property for the derivation of the 3D CBCT Cutting voxel projector. Here we first look at what uniquely defines the flat detector CT (FDCT) setup and which of its properties can be described using projection matrices.
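A small sketch of this correspondence, assuming that for every line through the origin we pick the unit direction whose first nonzero coordinate is positive. The tie-breaking rule is where the "(almost)" comes from: on the boundary circle of the half sphere, two antipodal points describe the same line and only one of them is kept. The helper name is hypothetical.

```python
import numpy as np

def line_representative(v, eps=1e-12):
    """Hypothetical helper: map a nonzero vector (a ray direction from the source
    at the origin) to one canonical unit vector of the line it spans.

    Every t * v with t != 0 spans the same one-dimensional subspace, so all of
    them get the same representative: normalise, then choose the sign so that
    the first coordinate that is not (numerically) zero is positive.
    """
    v = np.asarray(v, dtype=float)
    norm = np.linalg.norm(v)
    if norm < eps:
        raise ValueError("the zero vector does not span a line")
    u = v / norm
    first = np.flatnonzero(np.abs(u) > eps)[0]
    return u if u[first] > 0 else -u

a = line_representative([1.0, -2.0, 2.0])
b = line_representative([-3.0, 6.0, -6.0])  # same line, opposite orientation
print(a, b)  # both approximately (1/3, -2/3, 2/3)
```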

FDCT projection setup

Let's have the following FDCT setup

with a 2D sketch

There are world coordinates described by vectors $\mathbf{x} = (x_1, x_2, x_3)$. There is a source at the position $\mathbf{S} = (s_1, s_2, s_3)$. And there is a flat panel detector, which is described by the point $\chi^{(0,0)}$, the position of the point $(0,0)$ on the detector, and by two orthogonal vectors $\chi^1$ and $\chi^2$. Let's say that the spacing of the detector pixels is determined by the lengths of the vectors $\chi^1$ and $\chi^2$, so that the size of the pixels is $|\chi^1| \times |\chi^2|$. Let's also say that pixel boundaries have zero thickness, so where one pixel ends, another starts. Once we specify how many pixels there are in the directions $\chi^1$ and $\chi^2$, we have a complete FDCT setup.

By the convention also described in the previous post, the $x_3$ axis is parallel with the axis of rotation and the $x_2$ axis goes from the top (above the scanned subject) to the bottom (under the scanned subject).

Note that there is one special ray from the source, which is perpendicular to the detector. We usually call this ray the principal ray and the point where it hits the detector the principal point. In some applications it might be convenient to shift the principal ray away from the center of the detector, see for example the work on quarter detector offset shifting to improve resolution.

Common FDCT setup simplifications

The setup described is too general for many projectors implemented in the KCT package. The only exception is the Siddon projector, which can be used with geometries of this generality.

The TR and TT projectors by design, and the CVP projector by its current implementation, use the following simplification of the geometry. They expect that the vector $\chi^2$ is always parallel to the $x_3$ axis. This implies that the normal to the detector, which can be obtained e.g. as the normalized cross product $\chi^1 \times \chi^2$, is orthogonal to $x_3$.

This excludes trajectories where the device rotates around an axis that is not parallel with $x_3$.
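The condition can be checked directly from the detector vectors. The sketch below (hypothetical helper name, not a KCT function) tests whether $\chi^2$ is parallel to the $x_3$ axis and returns the unit detector normal $\chi^1 \times \chi^2 / |\chi^1 \times \chi^2|$. When the first condition holds, the normal is automatically orthogonal to $x_3$, because a cross product is orthogonal to both of its factors. The choice $\chi^1 = (-\sin\omega, \cos\omega, 0)$ in the usage is an assumption for illustration.

```python
import numpy as np

def check_chi2_parallel_to_x3(chi1, chi2, tol=1e-9):
    """Hypothetical helper: return (chi2 is parallel to x3, unit detector normal)."""
    chi1 = np.asarray(chi1, dtype=float)
    chi2 = np.asarray(chi2, dtype=float)
    e3 = np.array([0.0, 0.0, 1.0])
    # chi2 is parallel to e3 exactly when the cross product chi2 x e3 vanishes
    parallel = bool(np.linalg.norm(np.cross(chi2, e3)) <= tol * np.linalg.norm(chi2))
    normal = np.cross(chi1, chi2)
    return parallel, normal / np.linalg.norm(normal)

# Detector of a circular trajectory at omega = 30 degrees, chi1 in the x1-x2 plane
w = np.deg2rad(30.0)
ok, n = check_chi2_parallel_to_x3([-np.sin(w), np.cos(w), 0.0], [0.0, 0.0, 0.616])
print(ok, n)   # True, n = (cos w, sin w, 0)

# Tilted detector, chi2 leaning towards x2: excluded by TR, TT and current CVP
tilted, _ = check_chi2_parallel_to_x3([1.0, 0.0, 0.0], [0.0, 0.1, 1.0])
print(tilted)  # False
```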

Camera matrix to describe FDCT setup

By the camera matrix we describe the mapping from the point $\mathbf{x} = (x_1, x_2, x_3)$ to the point on the detector given by $\mathbf{p}=(p^1, p^2)$. The point $\mathbf{x}$ is projected onto the point $\chi^{(0,0)} + p^1 \chi^1 + p^2 \chi^2$. Interestingly, we don't need to know the precise values of the vectors $\chi^{(0,0)}, \chi^1, \chi^2$: when we move $\chi^{(0,0)}$ anywhere along the ray from the source through it and scale the vectors $\chi^1$ and $\chi^2$ accordingly, we obtain the same mapping between $\mathbf{x} = (x_1, x_2, x_3)$ and $\mathbf{p}=(p^1, p^2)$. On a real detector we cannot change the size of the pixels, so this is just a theoretical consideration. However, it implies that from a camera matrix, understood as a linear mapping between a projective space of dimension 3 and a projective space of dimension 2, we cannot tell how far the detector is from the source or how far apart the pixels are. Once we add to the camera matrix either the size of a pixel or the source-to-detector distance, the system is fully defined.

Why talk about abstract projective elements? At the end of the day, a camera matrix is a matrix from $\mathbb{R}^{3\times 4}$. Scaling it by a nonzero constant does not change its properties as a projective element. Using this fact, we can encode the source-to-detector distance into the scale of the matrix. On the other hand, we don't need this information to perform a CT projection or reconstruction anyway. The camera matrix does not tell us the size of the detector, so this information must be provided to the reconstruction software by other means. For example, to do a projection using the KCT framework, we need to specify
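A sketch of this mapping and of the invariance, assuming only that the ray from the source through $\mathbf{x}$ is not parallel to the detector plane. The helper `detector_coordinates` is hypothetical; it solves $\mathbf{S} + t(\mathbf{x}-\mathbf{S}) = \chi^{(0,0)} + p^1\chi^1 + p^2\chi^2$ for $(t, p^1, p^2)$. Moving $\chi^{(0,0)}$ along the ray from the source by a factor $\lambda$ and scaling $\chi^1, \chi^2$ by the same $\lambda$ leaves $(p^1, p^2)$ unchanged.

```python
import numpy as np

def detector_coordinates(x, S, chi00, chi1, chi2):
    """Hypothetical helper: (p1, p2) of the point where the ray S -> x hits the detector."""
    x, S, chi00, chi1, chi2 = (np.asarray(a, dtype=float) for a in (x, S, chi00, chi1, chi2))
    # Unknowns (t, p1, p2):  t*(x - S) - p1*chi1 - p2*chi2 = chi00 - S
    lhs = np.column_stack([x - S, -chi1, -chi2])
    t, p1, p2 = np.linalg.solve(lhs, chi00 - S)
    return np.array([p1, p2])

S = np.array([0.0, 0.0, 0.0])
chi00 = np.array([0.0, 10.0, 0.0])  # detector plane x2 = 10
chi1 = np.array([1.0, 0.0, 0.0])
chi2 = np.array([0.0, 0.0, 1.0])
x = np.array([1.0, 5.0, 2.0])
p = detector_coordinates(x, S, chi00, chi1, chi2)
print(p)  # approximately (2, 4)

# Same mapping for a detector moved along the ray S -> chi00 and rescaled
lam = 0.5
p_scaled = detector_coordinates(x, S, S + lam * (chi00 - S), lam * chi1, lam * chi2)
print(np.allclose(p, p_scaled))  # True
```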

    --projection-sizex UINT Needs: --projection-sizey
                                X dimension of detector in pixel count, defaults to 616.
    --projection-sizey UINT Needs: --projection-sizex
                                Y dimension of detector in pixel count, defaults to 480.
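To illustrate the scale invariance, here is a minimal projection routine: a 3D point is written in homogeneous coordinates $(x_1, x_2, x_3, 1)$, multiplied by the $3\times4$ matrix and dehomogenized. Multiplying the matrix by any nonzero constant gives the same detector coordinates. The matrix entries are the rounded frame-0 values printed by dentk-matinfo in the CBCT 3 post; nothing in the routine knows how many pixels the detector has, which is why that size has to be passed separately.

```python
import numpy as np

def project(P, x):
    """Detector coordinates of the 3D point x under the 3x4 camera matrix P."""
    h = P @ np.append(np.asarray(x, dtype=float), 1.0)  # homogeneous image point
    return h[:2] / h[2]

# Rounded frame-0 camera matrix from the dentk-matinfo output
P = np.array([[-0.200, 1.623, 0.000, 149.737],
              [-0.257, 0.000, -1.623, 192.252],
              [-0.001, 0.000, 0.000, 0.625]])
x = np.array([10.0, -20.0, 5.0])
print(project(P, x), project(1198.0 * P, x), project(-0.5 * P, x))  # equal up to rounding
```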

Creating camera matrices for particular setup

The camera matrix is described relatively well on Wikipedia. So let's construct a set of camera matrices to be used in the KCT framework. To store them, we use a float64 DEN file with $(dimx, dimy, dimz)=(4,3,n)$, where $n$ is the number of configurations in the given FDCT trajectory.

Let's have the following setup. The origin of the world coordinates coincides with the volume center. The source and detector rotate around the $x_3$ axis and $\chi^2$ is parallel with $x_3$. The principal ray hits the detector at every position of the trajectory. The distance from the source to the isocenter is $I = 749\,\mathrm{mm}$ and the distance from the source to the detector is $A = 1198\,\mathrm{mm}$. The trajectory consists of $360$ views, and in the view $\omega$ the normal to the detector pointing towards the source forms the polar angle $\omega$ with respect to the $x_1$ and $x_2$ axes, so that $n_\omega = (\cos\omega, \sin\omega)$. Finally, let's have $PX\times PY = 0.616\,\mathrm{mm}\times 0.616\,\mathrm{mm}$ pixels and an $M\times N = 616\times 480$ grid.

From this information we construct the camera matrix for the given projection setup. First we transform to a local coordinate system related to the view $\omega$, which is more convenient to work with. In the terminology of projection matrices, we understand a point in 3D as a point of the projective space of dimension 3, which can be represented by the vector $(x_1, x_2, x_3, 1)$. We use this representation just as a vehicle to encode the source position into the projection matrix, not to exploit any topological properties of $\mathbb{R}^4$, but let's respect this usual way of constructing projection matrices. Linear transformations in this space are then transformations of $\mathbb{R}^4$ encoded by matrices from $\mathbb{R}^{4\times4}$.

First we just perform a rotation of the axes $x_1,x_2,x_3$ so that $x_1',x_2',x_3'$ again form a Cartesian coordinate system, but rotated towards our setup. Let's identify the positive direction of $x_3'$ with the positive direction of $\chi^2$ by $x_3'=-x_3$. We rotate the remaining two axes so that the $x_2'$ unit vector is the normal vector $n_\omega$ and $x_1'$ is collinear with $\chi^1$ on the detector. So we use the following projective element $$ \mathbf{X}_1 = \begin{pmatrix} \sin{\omega}& -\cos{\omega} &0 &0 \\ \cos{\omega}& \sin{\omega}&0 &0 \\ 0&0&-1&0 \\ 0&0&0&1 \end{pmatrix}. $$
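A few quantities follow directly from these numbers. The short computation below (plain arithmetic; variable names chosen for illustration, square pixels assumed) gives the magnification $A/I$, the distance $A$ in pixel units, $A/PX \approx 1944.8$, which is the focal length on the diagonal of the intrinsic matrix in the dentk-matinfo output of the CBCT 3 post, and the size of one detector pixel back-projected to the isocenter.

```python
I = 749.0    # source to isocenter distance [mm]
A = 1198.0   # source to detector distance [mm]
PX = 0.616   # pixel size [mm], square pixels

magnification = A / I               # enlargement of an object at the isocenter
focal_length_px = A / PX            # source-detector distance in pixel units
pixel_at_isocenter = PX * I / A     # pixel footprint at the isocenter [mm]

print(f"magnification      {magnification:.4f}")
print(f"focal length [px]  {focal_length_px:.3f}")
print(f"pixel at isocenter {pixel_at_isocenter:.4f} mm")
```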
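A quick numerical check of $\mathbf{X}_1$ as written above (NumPy; the function name mirrors the symbol and is not taken from the KCT scripts): the homogeneous point at $n_\omega$ lands on the $x_2'$ axis, and the upper-left $3\times3$ block is orthogonal. Its determinant is $-1$, reflecting that the flip $x_3' = -x_3$ is combined with the rotation.

```python
import numpy as np

def X1(omega):
    """Projective element X_1 from the text for the view angle omega [rad]."""
    s, c = np.sin(omega), np.cos(omega)
    return np.array([[s, -c, 0.0, 0.0],
                     [c, s, 0.0, 0.0],
                     [0.0, 0.0, -1.0, 0.0],
                     [0.0, 0.0, 0.0, 1.0]])

omega = np.deg2rad(30.0)
n_omega = np.array([np.cos(omega), np.sin(omega), 0.0, 1.0])  # homogeneous point at n_omega
print(X1(omega) @ n_omega)  # approximately (0, 1, 0, 1): n_omega lies on the x2' axis
R = X1(omega)[:3, :3]
print(np.allclose(R @ R.T, np.eye(3)), np.linalg.det(R))  # True, approximately -1
```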

Now $\mathbf{X}_1$ has taken us into a new coordinate system; let's live in it and shift its origin to $S'$. First let's realize that the coordinates of the source in this system are $S' = (0, -\sqrt{s_1^2+s_2^2}, -s_3, 1)$. Now we can do a shift by means of the next projective element $$ \mathbf{X}_2 = \begin{pmatrix} 1&0&0&0\\ 0&1&0&\sqrt{s_1^2+s_2^2}\\ 0&0&1&s_3\\ 0&0&0&1 \end{pmatrix} $$

So we have constructed a projective element such that $$ \begin{pmatrix} x_1'' \\ x_2''\\x_3''\\1 \end{pmatrix} = \mathbf{X}_2 \mathbf{X}_1 \begin{pmatrix} x_1\\x_2\\x_3\\1 \end{pmatrix}. $$

Now let's construct the projective element from the transformed 3D space to the detector. We use a simpler detector with the same orientation as our flat panel, but with its zero positioned at the principal point, and such that the unit vector $x_3''$ from the principal point projects to 1 in the $\chi^2$ direction and the unit vector $x_1''$ projects to 1 in the $\chi^1$ direction. This detector must have focal length $A$, so that we can construct the following projective element $$ \mathbf{E} = \begin{pmatrix} 1&0&0&0\\ 0&0&1&0\\ 0&\frac{1}{A}&0&0 \end{pmatrix} $$ Notice that prior to Git commit f7baa25 of https://github.com/kulvait/KCT_scripts, the meaning of $\omega$ was different: it was the angle between the positive $x_1$ axis of the world geometry and the detector-to-source vector, and there was in turn a different sign convention to correct for the fact that $x_2''$ points from the detector to the source and not vice versa. We know that a unit distance along $\chi^1$ or $\chi^2$ corresponds to $1/PX$ or $1/PY$ pixels, respectively. Let's add another projective element to correct for the pixel sizes $$ \mathbf{A}_1 = \begin{pmatrix} \frac{1}{PX}&0&0\\ 0&\frac{1}{PY}&0\\ 0&0&1 \end{pmatrix} $$

Here it is obvious that cutting pixels into $k\times k$ subpixels can be realized by the matrix $$ \begin{pmatrix} k&0&0\\ 0&k&0\\ 0&0&1 \end{pmatrix} $$ and the inverse operation, merging $k\times k$ pixels into one, is realized by $$ \begin{pmatrix} \frac{1}{k}&0&0\\ 0&\frac{1}{k}&0\\ 0&0&1 \end{pmatrix}. $$
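A minimal check of these $3\times3$ matrices, assuming the subdivision is applied directly after $\mathbf{A}_1$ (before the centering shift, whose offset would otherwise also have to change): undoing the cutting gives the identity, and cutting after $\mathbf{A}_1$ is the same as using pixels of size $PX/k \times PY/k$. The helper names are illustrative.

```python
import numpy as np

def A1(PX, PY):
    """A_1 from the text: detector millimetres to pixel units."""
    return np.diag([1.0 / PX, 1.0 / PY, 1.0])

def cut(k):
    """Cut every pixel into k x k subpixels: pixel coordinates grow by factor k."""
    return np.diag([float(k), float(k), 1.0])

def merge(k):
    """Inverse operation: merge k x k pixels into one."""
    return np.diag([1.0 / k, 1.0 / k, 1.0])

k, PX, PY = 4, 0.616, 0.616
print(np.allclose(merge(k) @ cut(k), np.eye(3)))             # True
print(np.allclose(cut(k) @ A1(PX, PY), A1(PX / k, PY / k)))  # True
```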

To shift $\chi^{(0,0)}$ to its intended position, we use \begin{equation} \mathbf{A}_2 = \begin{pmatrix} 1&0&\frac{N-1}{2}\\ 0&1&\frac{M-1}{2}\\ 0&0&1 \end{pmatrix} \end{equation}

In KCT we use the convention that integer projection coordinates denote the center of a given pixel. Therefore zero lies at the center of the corner pixel. Now the principal point has coordinates $((N-1)/2, (M-1)/2)$, and we may need to shift it further. We might want to do something fancy such as quarter detector offset shifting to improve resolution, or we might just want to adjust the positioning to the real device we are trying to model.

\begin{equation} \mathbf{A}_3 = \begin{pmatrix} 1&0&PX_o\\ 0&1&PY_o\\ 0&0&1 \end{pmatrix} \end{equation}

This projective element performs the final offsetting, with $(PX_o,PY_o)=(0,0)$ in our demonstration example.

Finally, we obtain the camera matrix, in this context sometimes also called the projection matrix, which for a given position in the volume provides the position on the detector $$ \mathbf{P} = \mathbf{A}_3 \mathbf{A}_2 \mathbf{A}_1 \mathbf{E} \mathbf{X}_2 \mathbf{X}_1. $$

In the next post we use our newly gained knowledge to create a Python implementation that writes a circular trajectory into a DEN file.
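The centering and offset steps can be checked on a toy detector. The sketch below uses a hypothetical detector of $N = 5$ columns and $M = 3$ rows (not the detector of the example), with $N$ counted along $\chi^1$ and $M$ along $\chi^2$ as in $\mathbf{A}_2$: the principal point $(0,0)$ lands on $((N-1)/2, (M-1)/2) = (2, 1)$, the center of the middle pixel when integer coordinates denote pixel centers, and $\mathbf{A}_3$ then moves it by $(PX_o, PY_o)$, here a quarter-pixel shift chosen for illustration.

```python
import numpy as np

def A2(N, M):
    """Shift the principal point to the detector center ((N-1)/2, (M-1)/2)."""
    return np.array([[1.0, 0.0, (N - 1) / 2.0],
                     [0.0, 1.0, (M - 1) / 2.0],
                     [0.0, 0.0, 1.0]])

def A3(PXo, PYo):
    """Additional principal point offset in pixels."""
    return np.array([[1.0, 0.0, PXo], [0.0, 1.0, PYo], [0.0, 0.0, 1.0]])

def to_pixels(h):
    """Dehomogenize a 3-vector into detector coordinates."""
    return h[:2] / h[2]

N, M = 5, 3                            # toy detector: 5 columns, 3 rows
principal = np.array([0.0, 0.0, 1.0])  # principal point before A_2
print(to_pixels(A2(N, M) @ principal))                  # (2, 1): center of the middle pixel
print(to_pixels(A3(0.25, 0.0) @ A2(N, M) @ principal))  # (2.25, 1): quarter-pixel offset
# Pixel i covers [i - 0.5, i + 0.5], so the detector spans [-0.5, N - 0.5] along chi^1
```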
