diff options
| author | Daniil Kazantsev <dkazanc@hotmail.com> | 2019-11-28 23:01:03 +0000 | 
|---|---|---|
| committer | Daniil Kazantsev <dkazanc@hotmail.com> | 2019-11-28 23:01:03 +0000 | 
| commit | c65291e6b987283e4767a8ad2bd2d2433ca3782e (patch) | |
| tree | c3b660c9b2151f2ff1a12352daf73dfc90d1c3a3 | |
| parent | cdef6a981f1772ed04fe44bbe2b8251983a4ba7a (diff) | |
all work completed on gpu version of pdtv
22 files changed, 1099 insertions, 53 deletions
@@ -24,11 +24,12 @@  1. Rudin-Osher-Fatemi (ROF) Total Variation (explicit PDE minimisation scheme) **2D/3D CPU/GPU** (Ref. *1*)  2. Fast-Gradient-Projection (FGP) Total Variation **2D/3D CPU/GPU** (Ref. *2*)  3. Split-Bregman (SB) Total Variation **2D/3D CPU/GPU** (Ref. *5*) -4. Total Generalised Variation (TGV) model for higher-order regularisation **2D/3D CPU/GPU** (Ref. *6*) -5. Linear and nonlinear diffusion (explicit PDE minimisation scheme) **2D/3D CPU/GPU** (Ref. *8*) -6. Anisotropic Fourth-Order Diffusion (explicit PDE minimisation) **2D/3D CPU/GPU** (Ref. *9*) -7. A joint ROF-LLT (Lysaker-Lundervold-Tai) model for higher-order regularisation **2D/3D CPU/GPU** (Ref. *10,11*) -8. Nonlocal Total Variation regularisation (GS fixed point iteration) **2D CPU/GPU** (Ref. *12*) +4. Primal-Dual (PD) Total Variation **2D/3D CPU/GPU** (Ref. *13*) +5. Total Generalised Variation (TGV) model for higher-order regularisation **2D/3D CPU/GPU** (Ref. *6,13*) +6. Linear and nonlinear diffusion (explicit PDE minimisation scheme) **2D/3D CPU/GPU** (Ref. *8*) +7. Anisotropic Fourth-Order Diffusion (explicit PDE minimisation) **2D/3D CPU/GPU** (Ref. *9*) +8. A joint ROF-LLT (Lysaker-Lundervold-Tai) model for higher-order regularisation **2D/3D CPU/GPU** (Ref. *10,11*) +9. Nonlocal Total Variation regularisation (GS fixed point iteration) **2D CPU/GPU** (Ref. *12*)  ### Multi-channel (vectorial):  1. Fast-Gradient-Projection (FGP) Directional Total Variation **2D/3D CPU/GPU** (Ref. *3,4,2*) @@ -145,29 +146,32 @@ addpath(/path/to/library);  ```  ### References to implemented methods: -1. [Rudin, L.I., Osher, S. and Fatemi, E., 1992. Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena, 60(1-4), pp.259-268.](https://www.sciencedirect.com/science/article/pii/016727899290242F) +1. [Rudin, L.I., Osher, S. and Fatemi, E., 1992. Nonlinear total variation based noise removal algorithms. Physica D: nonlinear phenomena, 60(1-4)](https://www.sciencedirect.com/science/article/pii/016727899290242F) -2. [Beck, A. and Teboulle, M., 2009. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Transactions on Image Processing, 18(11), pp.2419-2434.](https://doi.org/10.1109/TIP.2009.2028250) +2. [Beck, A. and Teboulle, M., 2009. Fast gradient-based algorithms for constrained total variation image denoising and deblurring problems. IEEE Transactions on Image Processing, 18(11)](https://doi.org/10.1109/TIP.2009.2028250) -3. [Ehrhardt, M.J. and Betcke, M.M., 2016. Multicontrast MRI reconstruction with structure-guided total variation. SIAM Journal on Imaging Sciences, 9(3), pp.1084-1106.](https://doi.org/10.1137/15M1047325) +3. [Ehrhardt, M.J. and Betcke, M.M., 2016. Multicontrast MRI reconstruction with structure-guided total variation. SIAM Journal on Imaging Sciences, 9(3)](https://doi.org/10.1137/15M1047325)  4. [Kazantsev, D., Jørgensen, J.S., Andersen, M., Lionheart, W.R., Lee, P.D. and Withers, P.J., 2018. Joint image reconstruction method with correlative multi-channel prior for X-ray spectral computed tomography. Inverse Problems, 34(6)](https://doi.org/10.1088/1361-6420/aaba86) **Results can be reproduced using the following** [SOFTWARE](https://github.com/dkazanc/multi-channel-X-ray-CT) -5. [Goldstein, T. and Osher, S., 2009. The split Bregman method for L1-regularized problems. SIAM journal on imaging sciences, 2(2), pp.323-343.](https://doi.org/10.1137/080725891) +5. [Goldstein, T. and Osher, S., 2009. The split Bregman method for L1-regularized problems. SIAM journal on imaging sciences, 2(2)](https://doi.org/10.1137/080725891) -6. [Bredies, K., Kunisch, K. and Pock, T., 2010. Total generalized variation. SIAM Journal on Imaging Sciences, 3(3), pp.492-526.](https://doi.org/10.1137/090769521) +6. [Bredies, K., Kunisch, K. and Pock, T., 2010. Total generalized variation. SIAM Journal on Imaging Sciences, 3(3)](https://doi.org/10.1137/090769521) -7. [Duran, J., Moeller, M., Sbert, C. and Cremers, D., 2016. Collaborative total variation: a general framework for vectorial TV models. SIAM Journal on Imaging Sciences, 9(1), pp.116-151.](https://doi.org/10.1137/15M102873X) +7. [Duran, J., Moeller, M., Sbert, C. and Cremers, D., 2016. Collaborative total variation: a general framework for vectorial TV models. SIAM Journal on Imaging Sciences, 9(1)](https://doi.org/10.1137/15M102873X) -8. [Black, M.J., Sapiro, G., Marimont, D.H. and Heeger, D., 1998. Robust anisotropic diffusion. IEEE Transactions on image processing, 7(3), pp.421-432.](https://doi.org/10.1109/83.661192) +8. [Black, M.J., Sapiro, G., Marimont, D.H. and Heeger, D., 1998. Robust anisotropic diffusion. IEEE Transactions on image processing, 7(3)](https://doi.org/10.1109/83.661192) -9. [Hajiaboli, M.R., 2011. An anisotropic fourth-order diffusion filter for image noise removal. International Journal of Computer Vision, 92(2), pp.177-191.](https://doi.org/10.1007/s11263-010-0330-1) +9. [Hajiaboli, M.R., 2011. An anisotropic fourth-order diffusion filter for image noise removal. International Journal of Computer Vision, 92(2)](https://doi.org/10.1007/s11263-010-0330-1) -10. [Lysaker, M., Lundervold, A. and Tai, X.C., 2003. Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Transactions on image processing, 12(12), pp.1579-1590.](https://doi.org/10.1109/TIP.2003.819229) +10. [Lysaker, M., Lundervold, A. and Tai, X.C., 2003. Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Transactions on image processing, 12(12)](https://doi.org/10.1109/TIP.2003.819229) -11. [Kazantsev, D., Guo, E., Phillion, A.B., Withers, P.J. and Lee, P.D., 2017. Model-based iterative reconstruction using higher-order regularization of dynamic synchrotron data. Measurement Science and Technology, 28(9), p.094004.](https://doi.org/10.1088/1361-6501/aa7fa8) +11. [Kazantsev, D., Guo, E., Phillion, A.B., Withers, P.J. and Lee, P.D., 2017. Model-based iterative reconstruction using higher-order regularization of dynamic synchrotron data. Measurement Science and Technology, 28(9)](https://doi.org/10.1088/1361-6501/aa7fa8) + +12. [Abderrahim E., Lezoray O. and Bougleux S. 2008. Nonlocal discrete regularization on weighted graphs: a framework for image and manifold processing. IEEE Trans. Image Processing 17(7), pp. 1047-1060.](https://ieeexplore.ieee.org/document/4526700) + +13. [Chambolle, A. and Pock, T., 2010. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of mathematical imaging and vision 40(1)](https://doi.org/10.1007/s10851-010-0251-1) -12. [Abderrahim E., Lezoray O. and Bougleux S. 2008. Nonlocal discrete regularization on weighted graphs: a framework for image and manifold processing." IEEE Trans. Image Processing 17(7), pp. 1047-1060.](https://ieeexplore.ieee.org/document/4526700)  ### References to software (please cite if used):  * [Kazantsev, D., Pasca, E., Turner, M.J. and Withers, P.J., 2019. CCPi-Regularisation toolkit for computed tomographic image reconstruction with proximal splitting algorithms. SoftwareX, 9, pp.317-323.](https://www.sciencedirect.com/science/article/pii/S2352711018301912) diff --git a/build/run.sh b/build/run.sh index bbbbc6a..f0b3d3e 100755 --- a/build/run.sh +++ b/build/run.sh @@ -8,9 +8,9 @@ cd ../build_proj/  #make clean  export CIL_VERSION=19.10  # install Python modules without CUDA -cmake ../ -DBUILD_PYTHON_WRAPPER=ON -DBUILD_MATLAB_WRAPPER=OFF -DBUILD_CUDA=OFF -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=./install +#cmake ../ -DBUILD_PYTHON_WRAPPER=ON -DBUILD_MATLAB_WRAPPER=OFF -DBUILD_CUDA=OFF -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=./install  # install Python modules with CUDA -# cmake ../ -DBUILD_PYTHON_WRAPPER=ON -DBUILD_MATLAB_WRAPPER=OFF -DBUILD_CUDA=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=./install +cmake ../ -DBUILD_PYTHON_WRAPPER=ON -DBUILD_MATLAB_WRAPPER=OFF -DBUILD_CUDA=ON -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=./install  # install Matlab modules without CUDA  # cmake ../ -DBUILD_PYTHON_WRAPPER=OFF -DMatlab_ROOT_DIR=/dls_sw/apps/matlab/r2014a/ -DBUILD_MATLAB_WRAPPER=ON -DBUILD_CUDA=OFF -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=./install  # install Matlab modules with CUDA diff --git a/demos/Matlab_demos/demoMatlab_3Ddenoise.m b/demos/Matlab_demos/demoMatlab_3Ddenoise.m index f018327..b7f92cb 100644 --- a/demos/Matlab_demos/demoMatlab_3Ddenoise.m +++ b/demos/Matlab_demos/demoMatlab_3Ddenoise.m @@ -62,6 +62,25 @@ fprintf('Denoise a volume using the FGP-TV model (GPU) \n');  % fprintf('%s %f \n', 'RMSE error for FGP-TV is:', rmse_fgpG);  % figure; imshow(u_fgpG(:,:,7), [0 1]); title('FGP-TV denoised volume (GPU)');  %% +fprintf('Denoise a volume using the PD-TV model (CPU) \n'); +lambda_reg = 0.03; % regularsation parameter for all methods +iter_pd = 300; % number of FGP iterations +epsil_tol =  0.0; % tolerance +tic; [u_pd,infovec] = PD_TV(single(vol3D), lambda_reg, iter_pd, epsil_tol); toc;  +energyfunc_val_fgp = TV_energy(single(u_pd),single(vol3D),lambda_reg, 1); % get energy function value +rmse_pd = (RMSE(Ideal3D(:),u_pd(:))); +fprintf('%s %f \n', 'RMSE error for PD-TV is:', rmse_pd); +figure; imshow(u_pd(:,:,7), [0 1]); title('PD-TV denoised volume (CPU)'); +%% +% fprintf('Denoise a volume using the PD-TV model (GPU) \n'); +% lambda_reg = 0.03; % regularsation parameter for all methods +% iter_pd = 300; % number of FGP iterations +% epsil_tol =  0.0; % tolerance +% tic; u_pdG = PD_TV_GPU(single(vol3D), lambda_reg, iter_pd, epsil_tol); toc;  +% rmse_pdG = (RMSE(Ideal3D(:),u_pdG(:))); +% fprintf('%s %f \n', 'RMSE error for PD-TV is:', rmse_pdG); +% figure; imshow(u_pdG(:,:,7), [0 1]); title('PD-TV denoised volume (GPU)'); +%%  fprintf('Denoise a volume using the SB-TV model (CPU) \n');  iter_sb = 150; % number of SB iterations  epsil_tol =  0.0; % tolerance diff --git a/demos/Matlab_demos/demoMatlab_denoise.m b/demos/Matlab_demos/demoMatlab_denoise.m index b50eaf5..3d93cb6 100644 --- a/demos/Matlab_demos/demoMatlab_denoise.m +++ b/demos/Matlab_demos/demoMatlab_denoise.m @@ -46,6 +46,22 @@ figure; imshow(u_fgp, [0 1]); title('FGP-TV denoised image (CPU)');  % tic; u_fgpG = FGP_TV_GPU(single(u0), lambda_reg, iter_fgp, epsil_tol); toc;   % figure; imshow(u_fgpG, [0 1]); title('FGP-TV denoised image (GPU)');  %% +fprintf('Denoise using the PD-TV model (CPU) \n'); +lambda_reg = 0.03; +iter_pd = 500; % number of FGP iterations +epsil_tol =  0.0; % tolerance +tic; [u_pd,infovec] = PD_TV(single(u0), lambda_reg, iter_pd, epsil_tol); toc;  +energyfunc_val_pd = TV_energy(single(u_pd),single(u0),lambda_reg, 1); % get energy function value +rmsePD = (RMSE(u_pd(:),Im(:))); +fprintf('%s %f \n', 'RMSE error for PD-TV is:', rmsePD); +[ssimval] = ssim(u_pd*255,single(Im)*255); +fprintf('%s %f \n', 'MSSIM error for PD-TV is:', ssimval); +figure; imshow(u_pd, [0 1]); title('PD-TV denoised image (CPU)'); +%% +% fprintf('Denoise using the PD-TV model (GPU) \n'); +% tic; u_pdG = PD_TV_GPU(single(u0), lambda_reg, iter_pd, epsil_tol); toc;  +% figure; imshow(u_pdG, [0 1]); title('PD-TV denoised image (GPU)'); +%%  fprintf('Denoise using the SB-TV model (CPU) \n');  lambda_reg = 0.03;  iter_sb = 200; % number of SB iterations diff --git a/demos/demo_cpu_regularisers.py b/demos/demo_cpu_regularisers.py index 09781d5..7d66b7f 100644 --- a/demos/demo_cpu_regularisers.py +++ b/demos/demo_cpu_regularisers.py @@ -130,7 +130,7 @@ pars = {'algorithm' : FGP_TV, \          'input' : u0,\          'regularisation_parameter':0.02, \          'number_of_iterations' :1500 ,\ -        'tolerance_constant':1e-06,\ +        'tolerance_constant':1e-08,\          'methodTV': 0 ,\          'nonneg': 0} @@ -176,7 +176,7 @@ pars = {'algorithm' : PD_TV, \          'input' : u0,\          'regularisation_parameter':0.02, \          'number_of_iterations' :1500 ,\ -        'tolerance_constant':1e-06,\ +        'tolerance_constant':1e-08,\          'methodTV': 0 ,\          'nonneg': 1,          'lipschitz_const' : 8, diff --git a/demos/demo_cpu_regularisers3D.py b/demos/demo_cpu_regularisers3D.py index 349300f..cfdd2d4 100644 --- a/demos/demo_cpu_regularisers3D.py +++ b/demos/demo_cpu_regularisers3D.py @@ -185,10 +185,11 @@ pars = {'algorithm' : PD_TV, \          'input' : noisyVol,\          'regularisation_parameter':0.02, \          'number_of_iterations' :1000 ,\ -        'tolerance_constant': 1e-06,\ +        'tolerance_constant':1e-06,\          'methodTV': 0 ,\ -        'lipschitz_const' : 12,\ -        'nonneg': 0} +        'nonneg': 0, +        'lipschitz_const' : 8, +        'tau' : 0.0025}  print ("#############FGP TV GPU####################")  start_time = timeit.default_timer() @@ -198,7 +199,8 @@ start_time = timeit.default_timer()                pars['tolerance_constant'],                 pars['methodTV'],                pars['nonneg'], -              pars['lipschitz_const'], 'cpu') +              pars['lipschitz_const'], +              pars['tau'],'cpu')  Qtools = QualityTools(idealVol, pd_cpu3D)  pars['rmse'] = Qtools.rmse() diff --git a/demos/demo_cpu_vs_gpu_regularisers.py b/demos/demo_cpu_vs_gpu_regularisers.py index 21e3899..015dfc6 100644 --- a/demos/demo_cpu_vs_gpu_regularisers.py +++ b/demos/demo_cpu_vs_gpu_regularisers.py @@ -12,7 +12,7 @@ import matplotlib.pyplot as plt  import numpy as np  import os  import timeit -from ccpi.filters.regularisers import ROF_TV, FGP_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th +from ccpi.filters.regularisers import ROF_TV, FGP_TV, PD_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th  from ccpi.filters.regularisers import PatchSelect  from ccpi.supp.qualitymetrics import QualityTools  ############################################################################### @@ -220,6 +220,108 @@ if (diff_im.sum() > 1):  else:      print ("Arrays match") + +#%% +print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%") +print ("____________PD-TV bench___________________") +print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%") + +## plot  +fig = plt.figure() +plt.suptitle('Comparison of PD-TV regulariser using CPU and GPU implementations') +a=fig.add_subplot(1,4,1) +a.set_title('Noisy Image') +imgplot = plt.imshow(u0,cmap="gray") + +# set parameters +pars = {'algorithm' : PD_TV, \ +        'input' : u0,\ +        'regularisation_parameter':0.02, \ +        'number_of_iterations' :1500 ,\ +        'tolerance_constant':0.0,\ +        'methodTV': 0 ,\ +        'nonneg': 0, +        'lipschitz_const' : 8, +        'tau' : 0.0025} +         +print ("#############PD TV CPU####################") +start_time = timeit.default_timer() +(pd_cpu,info_vec_cpu) = PD_TV(pars['input'],  +              pars['regularisation_parameter'], +              pars['number_of_iterations'], +              pars['tolerance_constant'],  +              pars['methodTV'], +              pars['nonneg'], +              pars['lipschitz_const'], +              pars['tau'],'cpu') + +Qtools = QualityTools(Im, pd_cpu) +pars['rmse'] = Qtools.rmse() + +txtstr = printParametersToString(pars) +txtstr += "%s = %.3fs" % ('elapsed time',timeit.default_timer() - start_time) +print (txtstr) +a=fig.add_subplot(1,4,2) + +# these are matplotlib.patch.Patch properties +props = dict(boxstyle='round', facecolor='wheat', alpha=0.75) +# place a text box in upper left in axes coords +a.text(0.15, 0.25, txtstr, transform=a.transAxes, fontsize=14, +         verticalalignment='top', bbox=props) +imgplot = plt.imshow(pd_cpu, cmap="gray") +plt.title('{}'.format('CPU results')) + +# set parameters +pars = {'algorithm' : PD_TV, \ +        'input' : u0,\ +        'regularisation_parameter':0.02, \ +        'number_of_iterations' :1500 ,\ +        'tolerance_constant':0.0,\ +        'methodTV': 0 ,\ +        'nonneg': 0, +        'lipschitz_const' : 8, +        'tau' : 0.0025} +         +print ("#############PD TV CPU####################") +start_time = timeit.default_timer() +(pd_gpu,info_vec_gpu) = PD_TV(pars['input'],  +              pars['regularisation_parameter'], +              pars['number_of_iterations'], +              pars['tolerance_constant'],  +              pars['methodTV'], +              pars['nonneg'], +              pars['lipschitz_const'], +              pars['tau'],'gpu') + +Qtools = QualityTools(Im, pd_gpu) +pars['rmse'] = Qtools.rmse() + +txtstr = printParametersToString(pars) +txtstr += "%s = %.3fs" % ('elapsed time',timeit.default_timer() - start_time) +print (txtstr) +a=fig.add_subplot(1,4,3) + +# these are matplotlib.patch.Patch properties +props = dict(boxstyle='round', facecolor='wheat', alpha=0.75) +# place a text box in upper left in axes coords +a.text(0.15, 0.25, txtstr, transform=a.transAxes, fontsize=14, +         verticalalignment='top', bbox=props) +imgplot = plt.imshow(pd_gpu, cmap="gray") +plt.title('{}'.format('GPU results')) + + +print ("--------Compare the results--------") +tolerance = 1e-05 +diff_im = np.zeros(np.shape(pd_cpu)) +diff_im = abs(pd_cpu - pd_gpu) +diff_im[diff_im > tolerance] = 1 +a=fig.add_subplot(1,4,4) +imgplot = plt.imshow(diff_im, vmin=0, vmax=1, cmap="gray") +plt.title('{}'.format('Pixels larger threshold difference')) +if (diff_im.sum() > 1): +    print ("Arrays do not match!") +else: +    print ("Arrays match")  #%%  print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")  print ("____________SB-TV bench___________________") diff --git a/demos/demo_gpu_regularisers.py b/demos/demo_gpu_regularisers.py index 3efcfce..5131847 100644 --- a/demos/demo_gpu_regularisers.py +++ b/demos/demo_gpu_regularisers.py @@ -12,7 +12,7 @@ import matplotlib.pyplot as plt  import numpy as np  import os  import timeit -from ccpi.filters.regularisers import ROF_TV, FGP_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th +from ccpi.filters.regularisers import ROF_TV, FGP_TV, PD_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th  from ccpi.filters.regularisers import PatchSelect, NLTV  from ccpi.supp.qualitymetrics import QualityTools  ############################################################################### @@ -158,6 +158,55 @@ imgplot = plt.imshow(fgp_gpu, cmap="gray")  plt.title('{}'.format('GPU results'))  #%%  print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%") +print ("_______________PD-TV (2D)__________________") +print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%") + +## plot  +fig = plt.figure() +plt.suptitle('Performance of PD-TV regulariser using the GPU') +a=fig.add_subplot(1,2,1) +a.set_title('Noisy Image') +imgplot = plt.imshow(u0,cmap="gray") + +# set parameters +pars = {'algorithm' : PD_TV, \ +        'input' : u0,\ +        'regularisation_parameter':0.02, \ +        'number_of_iterations' :1500 ,\ +        'tolerance_constant':1e-06,\ +        'methodTV': 0 ,\ +        'nonneg': 1, +        'lipschitz_const' : 8, +        'tau' : 0.0025} +         +print ("#############PD TV CPU####################") +start_time = timeit.default_timer() +(pd_gpu,info_vec_gpu) = PD_TV(pars['input'],  +              pars['regularisation_parameter'], +              pars['number_of_iterations'], +              pars['tolerance_constant'],  +              pars['methodTV'], +              pars['nonneg'], +              pars['lipschitz_const'], +              pars['tau'],'gpu') + +Qtools = QualityTools(Im, pd_gpu) +pars['rmse'] = Qtools.rmse() + +txtstr = printParametersToString(pars) +txtstr += "%s = %.3fs" % ('elapsed time',timeit.default_timer() - start_time) +print (txtstr) +a=fig.add_subplot(1,2,2) + +# these are matplotlib.patch.Patch properties +props = dict(boxstyle='round', facecolor='wheat', alpha=0.75) +# place a text box in upper left in axes coords +a.text(0.15, 0.25, txtstr, transform=a.transAxes, fontsize=14, +         verticalalignment='top', bbox=props) +imgplot = plt.imshow(pd_gpu, cmap="gray") +plt.title('{}'.format('GPU results')) +#%% +print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")  print ("____________SB-TV regulariser______________")  print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%") diff --git a/demos/demo_gpu_regularisers3D.py b/demos/demo_gpu_regularisers3D.py index ccf9694..2c25d01 100644 --- a/demos/demo_gpu_regularisers3D.py +++ b/demos/demo_gpu_regularisers3D.py @@ -12,7 +12,7 @@ import matplotlib.pyplot as plt  import numpy as np  import os  import timeit -from ccpi.filters.regularisers import ROF_TV, FGP_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th +from ccpi.filters.regularisers import ROF_TV, FGP_TV, PD_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th  from ccpi.supp.qualitymetrics import QualityTools  ###############################################################################  def printParametersToString(pars): @@ -172,7 +172,54 @@ a.text(0.15, 0.25, txtstr, transform=a.transAxes, fontsize=14,           verticalalignment='top', bbox=props)  imgplot = plt.imshow(fgp_gpu3D[10,:,:], cmap="gray")  plt.title('{}'.format('Recovered volume on the GPU using FGP-TV')) +#%% +print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%") +print ("_______________PD-TV (3D)__________________") +print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%") + +## plot  +fig = plt.figure() +plt.suptitle('Performance of PD-TV regulariser using the GPU') +a=fig.add_subplot(1,2,1) +a.set_title('Noisy Image') +imgplot = plt.imshow(noisyVol[10,:,:],cmap="gray") +# set parameters +pars = {'algorithm' : PD_TV, \ +        'input' : noisyVol,\ +        'regularisation_parameter':0.02, \ +        'number_of_iterations' :1000 ,\ +        'tolerance_constant':1e-06,\ +        'methodTV': 0 ,\ +        'nonneg': 0, +        'lipschitz_const' : 8, +        'tau' : 0.0025} + +print ("#############PD TV GPU####################") +start_time = timeit.default_timer() +(pd_gpu3D, info_vec_gpu)  = PD_TV(pars['input'],  +              pars['regularisation_parameter'], +              pars['number_of_iterations'], +              pars['tolerance_constant'],  +              pars['methodTV'], +              pars['nonneg'], +              pars['lipschitz_const'], +              pars['tau'],'gpu') + +Qtools = QualityTools(idealVol, pd_gpu3D) +pars['rmse'] = Qtools.rmse() +txtstr = printParametersToString(pars) +txtstr += "%s = %.3fs" % ('elapsed time',timeit.default_timer() - start_time) +print (txtstr) +a=fig.add_subplot(1,2,2) + +# these are matplotlib.patch.Patch properties +props = dict(boxstyle='round', facecolor='wheat', alpha=0.75) +# place a text box in upper left in axes coords +a.text(0.15, 0.25, txtstr, transform=a.transAxes, fontsize=14, +         verticalalignment='top', bbox=props) +imgplot = plt.imshow(pd_gpu3D[10,:,:], cmap="gray") +plt.title('{}'.format('Recovered volume on the GPU using PD-TV'))  #%%  print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")  print ("_______________SB-TV (3D)__________________") diff --git a/src/CMakeLists.txt b/src/CMakeLists.txt index 5fe1a57..f93c19a 100644 --- a/src/CMakeLists.txt +++ b/src/CMakeLists.txt @@ -17,4 +17,12 @@ if (BUILD_MATLAB_WRAPPER)  endif()  if (BUILD_PYTHON_WRAPPER)      add_subdirectory(Python) -endif()
\ No newline at end of file +endif() +find_package(OpenMP REQUIRED) +if (OPENMP_FOUND) +    set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}") +    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}") +    set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS} ${OpenMP_CXX_FLAGS}") +   set (CMAKE_SHARED_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_SHARED_LINKER_FLAGS} ${OpenMP_CXX_FLAGS}") +   set (CMAKE_STATIC_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_STATIC_LINKER_FLAGS} ${OpenMP_CXX_FLAGS}") +endif() diff --git a/src/Core/CMakeLists.txt b/src/Core/CMakeLists.txt index c1bd7bb..07aae3c 100644 --- a/src/Core/CMakeLists.txt +++ b/src/Core/CMakeLists.txt @@ -12,17 +12,6 @@ set (CIL_VERSION $ENV{CIL_VERSION} CACHE INTERNAL "Core Imaging Library version"  message("CIL_VERSION ${CIL_VERSION}")  #include (GenerateExportHeader) - -find_package(OpenMP) -if (OPENMP_FOUND) -    set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}") -    set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}") -    set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS} ${OpenMP_CXX_FLAGS}") -   set (CMAKE_SHARED_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_SHARED_LINKER_FLAGS} ${OpenMP_CXX_FLAGS}") -   set (CMAKE_STATIC_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_STATIC_LINKER_FLAGS} ${OpenMP_CXX_FLAGS}") - -endif() -  ## Build the regularisers package as a library  message("Creating Regularisers as a shared library") @@ -115,6 +104,7 @@ if (BUILD_CUDA)        CUDA_ADD_LIBRARY(cilregcuda SHARED          ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_GPU/TV_ROF_GPU_core.cu          ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_GPU/TV_FGP_GPU_core.cu +        ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_GPU/TV_PD_GPU_core.cu          ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_GPU/TV_SB_GPU_core.cu          ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_GPU/LLT_ROF_GPU_core.cu          ${CMAKE_CURRENT_SOURCE_DIR}/regularisers_GPU/TGV_GPU_core.cu diff --git a/src/Core/regularisers_CPU/PD_TV_core.c b/src/Core/regularisers_CPU/PD_TV_core.c index 65b8711..534091b 100644 --- a/src/Core/regularisers_CPU/PD_TV_core.c +++ b/src/Core/regularisers_CPU/PD_TV_core.c @@ -81,6 +81,7 @@ float PDTV_CPU_main(float *Input, float *U, float *infovector, float lambdaPar,              /* copy U to U_old */              copyIm(U, U_old, (long)(dimX), (long)(dimY), 1l); +            /* calculate divergence */              DivProj2D(U, Input, P1, P2,(long)(dimX), (long)(dimY), lt, tau);              /* check early stopping criteria */ diff --git a/src/Core/regularisers_GPU/TV_PD_GPU_core.cu b/src/Core/regularisers_GPU/TV_PD_GPU_core.cu new file mode 100644 index 0000000..59fda3b --- /dev/null +++ b/src/Core/regularisers_GPU/TV_PD_GPU_core.cu @@ -0,0 +1,522 @@ +/* +This work is part of the Core Imaging Library developed by +Visual Analytics and Imaging System Group of the Science Technology +Facilities Council, STFC + +Copyright 2017 Daniil Kazantsev +Copyright 2017 Srikanth Nagella, Edoardo Pasca + +Licensed under the Apache License, Version 2.0 (the "License"); +you may not use this file except in compliance with the License. +You may obtain a copy of the License at +http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, software +distributed under the License is distributed on an "AS IS" BASIS, +WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +See the License for the specific language governing permissions and +limitations under the License. +*/ + +#include "TV_PD_GPU_core.h" +#include "shared.h" +#include <thrust/functional.h> +#include <thrust/device_vector.h> +#include <thrust/transform_reduce.h> + +/* CUDA implementation of Primal-Dual TV [1] by Chambolle Pock denoising/regularization model (2D/3D case) + * + * Input Parameters: + * 1. Noisy image/volume + * 2. lambdaPar - regularization parameter + * 3. Number of iterations + * 4. eplsilon: tolerance constant + * 5. lipschitz_const: convergence related parameter + * 6. TV-type: methodTV - 'iso' (0) or 'l1' (1) + * 7. nonneg: 'nonnegativity (0 is OFF by default, 1 is ON) + * 8. tau: time marching parameter + + * Output: + * [1] TV - Filtered/regularized image/volume + * [2] Information vector which contains [iteration no., reached tolerance] + * + * [1] Antonin Chambolle, Thomas Pock. "A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging", 2010 + */ + +#define BLKXSIZE2D 16 +#define BLKYSIZE2D 16 + +#define BLKXSIZE 8 +#define BLKYSIZE 8 +#define BLKZSIZE 8 + +#define idivup(a, b) ( ((a)%(b) != 0) ? (a)/(b)+1 : (a)/(b) ) +// struct square { __host__ __device__ float operator()(float x) { return x * x; } }; + +/************************************************/ +/*****************2D modules*********************/ +/************************************************/ + +__global__ void dualPD_kernel(float *U, float *P1, float *P2, float sigma, int N, int M) +{ + +   //calculate each thread global index +   const int xIndex=blockIdx.x*blockDim.x+threadIdx.x; +   const int yIndex=blockIdx.y*blockDim.y+threadIdx.y; + +   int index = xIndex + N*yIndex; + +   if ((xIndex < N) && (yIndex < M)) { +     if (xIndex == N-1) P1[index] += sigma*(U[(xIndex-1) + N*yIndex] - U[index]); +     else P1[index] += sigma*(U[(xIndex+1) + N*yIndex] - U[index]); +     if (yIndex == M-1) P2[index] += sigma*(U[xIndex + N*(yIndex-1)] - U[index]); +     else  P2[index] += sigma*(U[xIndex + N*(yIndex+1)] - U[index]); +   } +   return; +} +__global__ void Proj_funcPD2D_iso_kernel(float *P1, float *P2, int N, int M, int ImSize) +{ + +   float denom; +   //calculate each thread global index +   const int xIndex=blockIdx.x*blockDim.x+threadIdx.x; +   const int yIndex=blockIdx.y*blockDim.y+threadIdx.y; + +   int index = xIndex + N*yIndex; + +   if ((xIndex < N) && (yIndex < M)) { +       denom = pow(P1[index],2) +  pow(P2[index],2); +       if (denom > 1.0f) { +           P1[index] = P1[index]/sqrt(denom); +           P2[index] = P2[index]/sqrt(denom); +       } +   } +   return; +} +__global__ void Proj_funcPD2D_aniso_kernel(float *P1, float *P2, int N, int M, int ImSize) +{ + +   float val1, val2; +   //calculate each thread global index +   const int xIndex=blockIdx.x*blockDim.x+threadIdx.x; +   const int yIndex=blockIdx.y*blockDim.y+threadIdx.y; + +   int index = xIndex + N*yIndex; + +   if ((xIndex < N) && (yIndex < M)) { +               val1 = abs(P1[index]); +               val2 = abs(P2[index]); +               if (val1 < 1.0f) {val1 = 1.0f;} +               if (val2 < 1.0f) {val2 = 1.0f;} +               P1[index] = P1[index]/val1; +               P2[index] = P2[index]/val2; +   } +   return; +} +__global__ void DivProj2D_kernel(float *U, float *Input, float *P1, float *P2, float lt, float tau, int N, int M) +{ +   float P_v1, P_v2, div_var; + +   //calculate each thread global index +   const int xIndex=blockIdx.x*blockDim.x+threadIdx.x; +   const int yIndex=blockIdx.y*blockDim.y+threadIdx.y; + +   int index = xIndex + N*yIndex; + +   if ((xIndex < N) && (yIndex < M)) { +     if (xIndex == 0) P_v1 = -P1[index]; +     else P_v1 = -(P1[index] - P1[(xIndex-1) + N*yIndex]); +     if (yIndex == 0) P_v2 = -P2[index]; +     else  P_v2 = -(P2[index] - P2[xIndex + N*(yIndex-1)]); +     div_var = P_v1 + P_v2; +     U[index] = (U[index] - tau*div_var + lt*Input[index])/(1.0 + lt); +   } +   return; +} +__global__ void PDnonneg2D_kernel(float* Output, int N, int M, int num_total) +{ +   int xIndex = blockDim.x * blockIdx.x + threadIdx.x; +   int yIndex = blockDim.y * blockIdx.y + threadIdx.y; + +   int index = xIndex + N*yIndex; + +   if (index < num_total)	{ +       if (Output[index] < 0.0f) Output[index] = 0.0f; +   } +} +/************************************************/ +/*****************3D modules*********************/ +/************************************************/ +__global__ void dualPD3D_kernel(float *U, float *P1, float *P2, float *P3, float sigma, int N, int M, int Z) +{ + +  //calculate each thread global index +  int i = blockDim.x * blockIdx.x + threadIdx.x; +  int j = blockDim.y * blockIdx.y + threadIdx.y; +  int k = blockDim.z * blockIdx.z + threadIdx.z; + +  int index = (N*M)*k + i + N*j; + +  if ((i < N) && (j < M) && (k < Z)) { +     if (i == N-1) P1[index] += sigma*(U[(N*M)*k + (i-1) + N*j] - U[index]); +     else P1[index] += sigma*(U[(N*M)*k + (i+1) + N*j] - U[index]); +     if (j == M-1) P2[index] += sigma*(U[(N*M)*k + i + N*(j-1)] - U[index]); +     else  P2[index] += sigma*(U[(N*M)*k + i + N*(j+1)] - U[index]); +     if (k == Z-1) P3[index] += sigma*(U[(N*M)*(k-1) + i + N*j] - U[index]); +     else  P3[index] += sigma*(U[(N*M)*(k+1) + i + N*j] - U[index]); +   } +   return; +} +__global__ void Proj_funcPD3D_iso_kernel(float *P1, float *P2, float *P3, int N, int M, int Z, int ImSize) +{ + +   float denom,sq_denom; +   //calculate each thread global index +   int i = blockDim.x * blockIdx.x + threadIdx.x; +   int j = blockDim.y * blockIdx.y + threadIdx.y; +   int k = blockDim.z * blockIdx.z + threadIdx.z; + +   int index = (N*M)*k + i + N*j; + +   if ((i < N) && (j < M) && (k <  Z)) { +       denom = pow(P1[index],2) +  pow(P2[index],2) + pow(P3[index],2); +       if (denom > 1.0f) { +           sq_denom = 1.0f/sqrt(denom); +           P1[index] *= sq_denom; +           P2[index] *= sq_denom; +           P3[index] *= sq_denom; +       } +   } +   return; +} +__global__ void Proj_funcPD3D_aniso_kernel(float *P1, float *P2, float *P3, int N, int M, int Z, int ImSize) +{ + +   float val1, val2, val3; +   //calculate each thread global index +   int i = blockDim.x * blockIdx.x + threadIdx.x; +   int j = blockDim.y * blockIdx.y + threadIdx.y; +   int k = blockDim.z * blockIdx.z + threadIdx.z; + +   int index = (N*M)*k + i + N*j; + +   if ((i < N) && (j < M) && (k <  Z)) { +               val1 = abs(P1[index]); +               val2 = abs(P2[index]); +               val3 = abs(P3[index]); +               if (val1 < 1.0f) {val1 = 1.0f;} +               if (val2 < 1.0f) {val2 = 1.0f;} +               if (val3 < 1.0f) {val3 = 1.0f;} +               P1[index] /= val1; +               P2[index] /= val2; +               P3[index] /= val3; +   } +   return; +} +__global__ void DivProj3D_kernel(float *U, float *Input, float *P1, float *P2, float *P3, float lt, float tau, int N, int M, int Z) +{ +   float P_v1, P_v2, P_v3, div_var; + +   //calculate each thread global index +   int i = blockDim.x * blockIdx.x + threadIdx.x; +   int j = blockDim.y * blockIdx.y + threadIdx.y; +   int k = blockDim.z * blockIdx.z + threadIdx.z; + +   int index = (N*M)*k + i + N*j; + +   if ((i < N) && (j < M) && (k <  Z)) { +     if (i == 0) P_v1 = -P1[index]; +     else P_v1 = -(P1[index] - P1[(N*M)*k + (i-1) + N*j]); +     if (j == 0) P_v2 = -P2[index]; +     else  P_v2 = -(P2[index] - P2[(N*M)*k + i + N*(j-1)]); +     if (k == 0) P_v3 = -P3[index]; +     else  P_v3 = -(P3[index] - P3[(N*M)*(k-1) + i + N*j]); +     div_var = P_v1 + P_v2 + P_v3; +     U[index] = (U[index] - tau*div_var + lt*Input[index])/(1.0 + lt); +   } +   return; +} + +__global__ void PDnonneg3D_kernel(float* Output, int N, int M, int Z, int num_total) +{ +   int i = blockDim.x * blockIdx.x + threadIdx.x; +   int j = blockDim.y * blockIdx.y + threadIdx.y; +   int k = blockDim.z * blockIdx.z + threadIdx.z; + +   int index = (N*M)*k + i + N*j; + +   if (index < num_total)	{ +       if (Output[index] < 0.0f) Output[index] = 0.0f; +   } +} +__global__ void PDcopy_kernel2D(float *Input, float* Output, int N, int M, int num_total) +{ +   int xIndex = blockDim.x * blockIdx.x + threadIdx.x; +   int yIndex = blockDim.y * blockIdx.y + threadIdx.y; + +   int index = xIndex + N*yIndex; + +   if (index < num_total)	{ +       Output[index] = Input[index]; +   } +} + +__global__ void PDcopy_kernel3D(float *Input, float* Output, int N, int M, int Z, int num_total) +{ +   int i = blockDim.x * blockIdx.x + threadIdx.x; +   int j = blockDim.y * blockIdx.y + threadIdx.y; +   int k = blockDim.z * blockIdx.z + threadIdx.z; + +   int index = (N*M)*k + i + N*j; + +   if (index < num_total)	{ +       Output[index] = Input[index]; +   } +} + +__global__ void getU2D_kernel(float *Input, float *Input_old, float theta, int N, int M, int num_total) +{ +   int xIndex = blockDim.x * blockIdx.x + threadIdx.x; +   int yIndex = blockDim.y * blockIdx.y + threadIdx.y; + +   int index = xIndex + N*yIndex; + +   if (index < num_total)	{ +       Input[index] += theta*(Input[index] - Input_old[index]); +   } +} + +__global__ void getU3D_kernel(float *Input, float *Input_old, float theta, int N, int M, int Z, int num_total) +{ +  int i = blockDim.x * blockIdx.x + threadIdx.x; +  int j = blockDim.y * blockIdx.y + threadIdx.y; +  int k = blockDim.z * blockIdx.z + threadIdx.z; + +  int index = (N*M)*k + i + N*j; + +   if (index < num_total)	{ +       Input[index] += theta*(Input[index] - Input_old[index]); +   } +} + +__global__ void PDResidCalc2D_kernel(float *Input1, float *Input2, float* Output, int N, int M, int num_total) +{ +    int xIndex = blockDim.x * blockIdx.x + threadIdx.x; +    int yIndex = blockDim.y * blockIdx.y + threadIdx.y; + +    int index = xIndex + N*yIndex; + +    if (index < num_total)	{ +        Output[index] = Input1[index] - Input2[index]; +    } +} + +__global__ void PDResidCalc3D_kernel(float *Input1, float *Input2, float* Output, int N, int M, int Z, int num_total) +{ +   	int i = blockDim.x * blockIdx.x + threadIdx.x; +    int j = blockDim.y * blockIdx.y + threadIdx.y; +    int k = blockDim.z * blockIdx.z + threadIdx.z; + +    int index = (N*M)*k + i + N*j; + +    if (index < num_total)	{ +        Output[index] = Input1[index] - Input2[index]; +    } +} + + +/*%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%*/ + +////////////MAIN HOST FUNCTION /////////////// +extern "C" int TV_PD_GPU_main(float *Input, float *Output, float *infovector, float lambdaPar, int iter, float epsil, float lipschitz_const, int methodTV, int nonneg, float tau, int dimX, int dimY, int dimZ) +{ +   int deviceCount = -1; // number of devices +   cudaGetDeviceCount(&deviceCount); +   if (deviceCount == 0) { +       fprintf(stderr, "No CUDA devices found\n"); +       return -1; +   } +   int count = 0, i; +   float re, sigma, theta, lt; +   re = 0.0f; + +   sigma = 1.0/(lipschitz_const*tau); +   theta = 1.0f; +   lt = tau/lambdaPar; + +   if (dimZ <= 1) { +   /*2D verson*/ +     int ImSize = dimX*dimY; +     float *d_input, *d_update, *d_old=NULL, *P1=NULL, *P2=NULL; + +     dim3 dimBlock(BLKXSIZE2D,BLKYSIZE2D); +     dim3 dimGrid(idivup(dimX,BLKXSIZE2D), idivup(dimY,BLKYSIZE2D)); + +      /*allocate space for images on device*/ +      checkCudaErrors( cudaMalloc((void**)&d_input,ImSize*sizeof(float)) ); +      checkCudaErrors( cudaMalloc((void**)&d_update,ImSize*sizeof(float)) ); +      checkCudaErrors( cudaMalloc((void**)&d_old,ImSize*sizeof(float)) ); +      checkCudaErrors( cudaMalloc((void**)&P1,ImSize*sizeof(float)) ); +      checkCudaErrors( cudaMalloc((void**)&P2,ImSize*sizeof(float)) ); + +       checkCudaErrors( cudaMemcpy(d_input,Input,ImSize*sizeof(float),cudaMemcpyHostToDevice)); +       checkCudaErrors( cudaMemcpy(d_update,Input,ImSize*sizeof(float),cudaMemcpyHostToDevice)); +       cudaMemset(P1, 0, ImSize*sizeof(float)); +       cudaMemset(P2, 0, ImSize*sizeof(float)); + +       /********************** Run CUDA 2D kernel here ********************/ +       /* The main kernel */ +       for (i = 0; i < iter; i++) { + +           /* computing the the dual P variable */ +           dualPD_kernel<<<dimGrid,dimBlock>>>(d_update, P1, P2, sigma, dimX, dimY); +           checkCudaErrors( cudaDeviceSynchronize() ); +           checkCudaErrors(cudaPeekAtLastError() ); + +           if (nonneg != 0) { +           PDnonneg2D_kernel<<<dimGrid,dimBlock>>>(d_update, dimX, dimY, ImSize); +           checkCudaErrors( cudaDeviceSynchronize() ); +           checkCudaErrors(cudaPeekAtLastError() ); } + +           /* projection step */ +           if (methodTV == 0) Proj_funcPD2D_iso_kernel<<<dimGrid,dimBlock>>>(P1, P2, dimX, dimY, ImSize); /*isotropic TV*/ +           else Proj_funcPD2D_aniso_kernel<<<dimGrid,dimBlock>>>(P1, P2, dimX, dimY, ImSize); /*anisotropic TV*/ +           checkCudaErrors( cudaDeviceSynchronize() ); +           checkCudaErrors(cudaPeekAtLastError() ); + +           /* copy U to U_old */ +           PDcopy_kernel2D<<<dimGrid,dimBlock>>>(d_update, d_old, dimX, dimY, ImSize); +           checkCudaErrors( cudaDeviceSynchronize() ); +           checkCudaErrors(cudaPeekAtLastError() ); + +           /* calculate divergence */ +           DivProj2D_kernel<<<dimGrid,dimBlock>>>(d_update, d_input, P1, P2, lt, tau, dimX, dimY); +           checkCudaErrors( cudaDeviceSynchronize() ); +           checkCudaErrors(cudaPeekAtLastError() ); + +           if ((epsil != 0.0f) && (i % 5 == 0)) { +               /* calculate norm - stopping rules using the Thrust library */ +               PDResidCalc2D_kernel<<<dimGrid,dimBlock>>>(d_update, d_old, P1, dimX, dimY, ImSize); +               checkCudaErrors( cudaDeviceSynchronize() ); +               checkCudaErrors(cudaPeekAtLastError() ); + +               // setup arguments +               square<float>        unary_op; +               thrust::plus<float> binary_op; +               thrust::device_vector<float> d_vec(P1, P1 + ImSize); +               float reduction = std::sqrt(thrust::transform_reduce(d_vec.begin(), d_vec.end(), unary_op, 0.0f, binary_op)); +               thrust::device_vector<float> d_vec2(d_update, d_update + ImSize); +               float reduction2 = std::sqrt(thrust::transform_reduce(d_vec2.begin(), d_vec2.end(), unary_op, 0.0f, binary_op)); + +               // compute norm +               re = (reduction/reduction2); +               if (re < epsil)  count++; +               if (count > 3) break; +             } + +           getU2D_kernel<<<dimGrid,dimBlock>>>(d_update, d_old, theta, dimX, dimY, ImSize); +           checkCudaErrors( cudaDeviceSynchronize() ); +           checkCudaErrors(cudaPeekAtLastError() ); +          } +           //copy result matrix from device to host memory +           cudaMemcpy(Output,d_update,ImSize*sizeof(float),cudaMemcpyDeviceToHost); + +           cudaFree(d_input); +           cudaFree(d_update); +           cudaFree(d_old); +           cudaFree(P1); +           cudaFree(P2); + +   } +   else { +           /*3D verson*/ +           int ImSize = dimX*dimY*dimZ; +           float *d_input, *d_update, *d_old=NULL, *P1=NULL, *P2=NULL, *P3=NULL; + +           dim3 dimBlock(BLKXSIZE,BLKYSIZE,BLKZSIZE); +           dim3 dimGrid(idivup(dimX,BLKXSIZE), idivup(dimY,BLKYSIZE),idivup(dimZ,BLKZSIZE)); + +           /*allocate space for images on device*/ +           checkCudaErrors( cudaMalloc((void**)&d_input,ImSize*sizeof(float)) ); +           checkCudaErrors( cudaMalloc((void**)&d_update,ImSize*sizeof(float)) ); +           checkCudaErrors( cudaMalloc((void**)&d_old,ImSize*sizeof(float)) ); +           checkCudaErrors( cudaMalloc((void**)&P1,ImSize*sizeof(float)) ); +           checkCudaErrors( cudaMalloc((void**)&P2,ImSize*sizeof(float)) ); +           checkCudaErrors( cudaMalloc((void**)&P3,ImSize*sizeof(float)) ); + +            checkCudaErrors( cudaMemcpy(d_input,Input,ImSize*sizeof(float),cudaMemcpyHostToDevice)); +            checkCudaErrors( cudaMemcpy(d_update,Input,ImSize*sizeof(float),cudaMemcpyHostToDevice)); +            cudaMemset(P1, 0, ImSize*sizeof(float)); +            cudaMemset(P2, 0, ImSize*sizeof(float)); +            cudaMemset(P3, 0, ImSize*sizeof(float)); +           /********************** Run CUDA 3D kernel here ********************/ +       for (i = 0; i < iter; i++) { + +         /* computing the the dual P variable */ +         dualPD3D_kernel<<<dimGrid,dimBlock>>>(d_update, P1, P2, P3, sigma, dimX, dimY, dimZ); +         checkCudaErrors( cudaDeviceSynchronize() ); +         checkCudaErrors(cudaPeekAtLastError() ); + +         if (nonneg != 0) { +         PDnonneg3D_kernel<<<dimGrid,dimBlock>>>(d_update, dimX, dimY, dimZ, ImSize); +         checkCudaErrors( cudaDeviceSynchronize() ); +         checkCudaErrors(cudaPeekAtLastError() ); } + +         /* projection step */ +         if (methodTV == 0) Proj_funcPD3D_iso_kernel<<<dimGrid,dimBlock>>>(P1, P2, P3, dimX, dimY, dimZ, ImSize); /*isotropic TV*/ +         else Proj_funcPD3D_aniso_kernel<<<dimGrid,dimBlock>>>(P1, P2, P3, dimX, dimY, dimZ, ImSize); /*anisotropic TV*/ +         checkCudaErrors( cudaDeviceSynchronize() ); +         checkCudaErrors(cudaPeekAtLastError() ); + +         /* copy U to U_old */ +         PDcopy_kernel3D<<<dimGrid,dimBlock>>>(d_update, d_old, dimX, dimY, dimZ, ImSize); +         checkCudaErrors( cudaDeviceSynchronize() ); +         checkCudaErrors(cudaPeekAtLastError() ); + +         /* calculate divergence */ +         DivProj3D_kernel<<<dimGrid,dimBlock>>>(d_update, d_input, P1, P2, P3, lt, tau, dimX, dimY, dimZ); +         checkCudaErrors( cudaDeviceSynchronize() ); +         checkCudaErrors(cudaPeekAtLastError() ); + +         if ((epsil != 0.0f) && (i % 5 == 0)) { +         /* calculate norm - stopping rules using the Thrust library */ +         PDResidCalc3D_kernel<<<dimGrid,dimBlock>>>(d_update, d_old, P1, dimX, dimY, dimZ, ImSize); +         checkCudaErrors( cudaDeviceSynchronize() ); +         checkCudaErrors(cudaPeekAtLastError() ); + +        // setup arguments +         square<float>        unary_op; +         thrust::plus<float> binary_op; +         thrust::device_vector<float> d_vec(P1, P1 + ImSize); +         float reduction = std::sqrt(thrust::transform_reduce(d_vec.begin(), d_vec.end(), unary_op, 0.0f, binary_op)); +         thrust::device_vector<float> d_vec2(d_update, d_update + ImSize); +         float reduction2 = std::sqrt(thrust::transform_reduce(d_vec2.begin(), d_vec2.end(), unary_op, 0.0f, binary_op)); + +           // compute norm +           re = (reduction/reduction2); +           if (re < epsil)  count++; +           if (count > 3) break; +           } + +           /* get U*/ +           getU3D_kernel<<<dimGrid,dimBlock>>>(d_update, d_old, theta, dimX, dimY, dimZ, ImSize); +           checkCudaErrors( cudaDeviceSynchronize() ); +           checkCudaErrors(cudaPeekAtLastError() ); +         } +           /***************************************************************/ +           //copy result matrix from device to host memory +           cudaMemcpy(Output,d_update,ImSize*sizeof(float),cudaMemcpyDeviceToHost); + +           cudaFree(d_input); +           cudaFree(d_update); +           cudaFree(d_old); +           cudaFree(P1); +           cudaFree(P2); +           cudaFree(P3); + +   } +   //cudaDeviceReset(); +   /*adding info into info_vector */ +   infovector[0] = (float)(i);  /*iterations number (if stopped earlier based on tolerance)*/ +   infovector[1] = re;  /* reached tolerance */ +   return 0; +} diff --git a/src/Core/regularisers_GPU/TV_PD_GPU_core.h b/src/Core/regularisers_GPU/TV_PD_GPU_core.h new file mode 100644 index 0000000..2b123d9 --- /dev/null +++ b/src/Core/regularisers_GPU/TV_PD_GPU_core.h @@ -0,0 +1,9 @@ +#ifndef _TV_PD_GPU_ +#define _TV_PD_GPU_ + +#include "CCPiDefines.h" +#include <memory.h> + +extern "C" CCPI_EXPORT int TV_PD_GPU_main(float *Input, float *Output, float *infovector, float lambdaPar, int iter, float epsil, float lipschitz_const, int methodTV, int nonneg, float tau, int dimX, int dimY, int dimZ); + +#endif diff --git a/src/Matlab/mex_compile/compileGPU_mex.m b/src/Matlab/mex_compile/compileGPU_mex.m index 56fcd38..577630f 100644 --- a/src/Matlab/mex_compile/compileGPU_mex.m +++ b/src/Matlab/mex_compile/compileGPU_mex.m @@ -41,6 +41,11 @@ fprintf('%s \n', 'Compiling SB-TV...');  mex -g -I/usr/local/cuda-10.0/include -L/usr/local/cuda-10.0/lib64 -lcudart -lcufft -lmwgpu SB_TV_GPU.cpp TV_SB_GPU_core.o  movefile('SB_TV_GPU.mex*',Pathmove); +fprintf('%s \n', 'Compiling PD-TV...'); +!/usr/local/cuda/bin/nvcc -O0 -c TV_PD_GPU_core.cu -Xcompiler -fPIC -I~/SOFT/MATLAB9/extern/include/ +mex -g -I/usr/local/cuda-10.0/include -L/usr/local/cuda-10.0/lib64 -lcudart -lcufft -lmwgpu PD_TV_GPU.cpp TV_PD_GPU_core.o +movefile('PD_TV_GPU.mex*',Pathmove); +  fprintf('%s \n', 'Compiling TGV...');  !/usr/local/cuda/bin/nvcc -O0 -c TGV_GPU_core.cu -Xcompiler -fPIC -I~/SOFT/MATLAB9/extern/include/  mex -g -I/usr/local/cuda-10.0/include -L/usr/local/cuda-10.0/lib64 -lcudart -lcufft -lmwgpu TGV_GPU.cpp TGV_GPU_core.o @@ -72,6 +77,7 @@ mex -g -I/usr/local/cuda-10.0/include -L/usr/local/cuda-10.0/lib64 -lcudart -lcu  movefile('PatchSelect_GPU.mex*',Pathmove);  delete TV_ROF_GPU_core* TV_FGP_GPU_core* TV_SB_GPU_core* dTV_FGP_GPU_core* NonlDiff_GPU_core* Diffus_4thO_GPU_core* TGV_GPU_core* LLT_ROF_GPU_core* CCPiDefines.h +delete TV_PD_GPU_core*  delete PatchSelect_GPU_core* Nonlocal_TV_core* shared.h  fprintf('%s \n', 'All successfully compiled!'); diff --git a/src/Matlab/mex_compile/regularisers_CPU/PD_TV.c b/src/Matlab/mex_compile/regularisers_CPU/PD_TV.c index eac2d18..e5ab1e4 100644 --- a/src/Matlab/mex_compile/regularisers_CPU/PD_TV.c +++ b/src/Matlab/mex_compile/regularisers_CPU/PD_TV.c @@ -30,6 +30,7 @@   * 5. TV-type: methodTV - 'iso' (0) or 'l1' (1)   * 6. nonneg: 'nonnegativity (0 is OFF by default, 1 is ON)   * 7. lipschitz_const: convergence related parameter + * 8. tau: convergence related parameter   * Output:   * [1] TV - Filtered/regularized image/volume @@ -45,7 +46,7 @@ void mexFunction(      int number_of_dims, iter, methTV, nonneg;      mwSize dimX, dimY, dimZ;      const mwSize *dim_array; -    float *Input, *infovec=NULL, *Output=NULL, lambda, epsil, lipschitz_const; +    float *Input, *infovec=NULL, *Output=NULL, lambda, epsil, lipschitz_const, tau;      number_of_dims = mxGetNumberOfDimensions(prhs[0]);      dim_array = mxGetDimensions(prhs[0]); @@ -55,29 +56,30 @@ void mexFunction(      Input  = (float *) mxGetData(prhs[0]); /*noisy image (2D/3D) */      lambda =  (float) mxGetScalar(prhs[1]); /* regularization parameter */ -    iter = 400; /* default iterations number */ +    iter = 500; /* default iterations number */      epsil = 1.0e-06; /* default tolerance constant */      methTV = 0;  /* default isotropic TV penalty */      nonneg = 0; /* default nonnegativity switch, off - 0 */ -    lipschitz_const = 12.0; /* lipschitz_const */ +    lipschitz_const = 8.0; /* lipschitz_const */ +    tau = 0.0025; /* tau convergence const */      if (mxGetClassID(prhs[0]) != mxSINGLE_CLASS) {mexErrMsgTxt("The input image must be in a single precision"); } -    if ((nrhs == 3) || (nrhs == 4) || (nrhs == 5) || (nrhs == 6) || (nrhs == 7))  iter = (int) mxGetScalar(prhs[2]); /* iterations number */ -    if ((nrhs == 4) || (nrhs == 5) || (nrhs == 6) || (nrhs == 7))  epsil =  (float) mxGetScalar(prhs[3]); /* tolerance constant */ -    if ((nrhs == 5) || (nrhs == 6) || (nrhs == 7))  { +    if ((nrhs == 3) || (nrhs == 4) || (nrhs == 5) || (nrhs == 6) || (nrhs == 7) || (nrhs == 8))  iter = (int) mxGetScalar(prhs[2]); /* iterations number */ +    if ((nrhs == 4) || (nrhs == 5) || (nrhs == 6) || (nrhs == 7) || (nrhs == 8))  epsil =  (float) mxGetScalar(prhs[3]); /* tolerance constant */ +    if ((nrhs == 5) || (nrhs == 6) || (nrhs == 7) || (nrhs == 8))  {          char *penalty_type;          penalty_type = mxArrayToString(prhs[4]); /* choosing TV penalty: 'iso' or 'l1', 'iso' is the default */          if ((strcmp(penalty_type, "l1") != 0) && (strcmp(penalty_type, "iso") != 0)) mexErrMsgTxt("Choose TV type: 'iso' or 'l1',");          if (strcmp(penalty_type, "l1") == 0)  methTV = 1;  /* enable 'l1' penalty */          mxFree(penalty_type);      } -    if ((nrhs == 6) || (nrhs == 7))  { +    if ((nrhs == 6) || (nrhs == 7) || (nrhs == 8))  {          nonneg = (int) mxGetScalar(prhs[5]);          if ((nonneg != 0) && (nonneg != 1)) mexErrMsgTxt("Nonnegativity constraint can be enabled by choosing 1 or off - 0");      } -    if (nrhs == 7) lipschitz_const = (float) mxGetScalar(prhs[6]); -     +    if ((nrhs == 7) || (nrhs == 8)) lipschitz_const = (float) mxGetScalar(prhs[6]); +    if (nrhs == 8) tau = (float) mxGetScalar(prhs[7]);         /*Handling Matlab output data*/      dimX = dim_array[0]; dimY = dim_array[1]; dimZ = dim_array[2];     @@ -94,5 +96,5 @@ void mexFunction(      infovec = (float*)mxGetPr(plhs[1] = mxCreateNumericArray(1, vecdim, mxSINGLE_CLASS, mxREAL));      /* running the function */     -    PDTV_CPU_main(Input, Output, infovec, lambda, iter,  epsil, lipschitz_const, methTV, nonneg, dimX, dimY, dimZ); +    PDTV_CPU_main(Input, Output, infovec, lambda, iter,  epsil, lipschitz_const, methTV, nonneg, tau, dimX, dimY, dimZ);  } diff --git a/src/Matlab/mex_compile/regularisers_GPU/PD_TV_GPU.cpp b/src/Matlab/mex_compile/regularisers_GPU/PD_TV_GPU.cpp new file mode 100644 index 0000000..e853dd3 --- /dev/null +++ b/src/Matlab/mex_compile/regularisers_GPU/PD_TV_GPU.cpp @@ -0,0 +1,101 @@ +/* + * This work is part of the Core Imaging Library developed by + * Visual Analytics and Imaging System Group of the Science Technology + * Facilities Council, STFC + * + * Copyright 2019 Daniil Kazantsev + * Copyright 2019 Srikanth Nagella, Edoardo Pasca + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * http://www.apache.org/licenses/LICENSE-2.0 + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +#include "matrix.h" +#include "mex.h" +#include "TV_PD_GPU_core.h" + +/* GPU implementation of Primal-Dual TV [1] by Chambolle Pock denoising/regularization model (2D/3D case) + * + * Input Parameters: + * 1. Noisy image/volume + * 2. lambdaPar - regularization parameter + * 3. Number of iterations + * 4. eplsilon: tolerance constant + * 5. TV-type: methodTV - 'iso' (0) or 'l1' (1) + * 6. nonneg: 'nonnegativity (0 is OFF by default, 1 is ON) + * 7. lipschitz_const: convergence related parameter + * 8. tau: convergence related parameter +  + * Output: + * [1] TV - Filtered/regularized image/volume + * [2] Information vector which contains [iteration no., reached tolerance] + * + * [1] Antonin Chambolle, Thomas Pock. "A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging", 2010 + */ + +void mexFunction( +        int nlhs, mxArray *plhs[], +        int nrhs, const mxArray *prhs[]) +         +{ +    int number_of_dims, iter, methTV, nonneg; +    mwSize dimX, dimY, dimZ; +    const mwSize *dim_array; +    float *Input, *infovec=NULL, *Output=NULL, lambda, epsil, lipschitz_const, tau; +     +    number_of_dims = mxGetNumberOfDimensions(prhs[0]); +    dim_array = mxGetDimensions(prhs[0]); +     +    /*Handling Matlab input data*/ +    if ((nrhs < 2) || (nrhs > 7)) mexErrMsgTxt("At least 2 parameters is required, all parameters are: Image(2D/3D), Regularization parameter, iterations number, tolerance, penalty type ('iso' or 'l1'), nonnegativity switch, lipschitz_const"); +     +    Input  = (float *) mxGetData(prhs[0]); /*noisy image (2D/3D) */ +    lambda =  (float) mxGetScalar(prhs[1]); /* regularization parameter */ +    iter = 500; /* default iterations number */ +    epsil = 1.0e-06; /* default tolerance constant */ +    methTV = 0;  /* default isotropic TV penalty */ +    nonneg = 0; /* default nonnegativity switch, off - 0 */ +    lipschitz_const = 8.0; /* lipschitz_const */ +    tau = 0.0025; /* tau convergence const */ +     +    if (mxGetClassID(prhs[0]) != mxSINGLE_CLASS) {mexErrMsgTxt("The input image must be in a single precision"); } +     +    if ((nrhs == 3) || (nrhs == 4) || (nrhs == 5) || (nrhs == 6) || (nrhs == 7) || (nrhs == 8))  iter = (int) mxGetScalar(prhs[2]); /* iterations number */ +    if ((nrhs == 4) || (nrhs == 5) || (nrhs == 6) || (nrhs == 7) || (nrhs == 8))  epsil =  (float) mxGetScalar(prhs[3]); /* tolerance constant */ +    if ((nrhs == 5) || (nrhs == 6) || (nrhs == 7) || (nrhs == 8))  { +        char *penalty_type; +        penalty_type = mxArrayToString(prhs[4]); /* choosing TV penalty: 'iso' or 'l1', 'iso' is the default */ +        if ((strcmp(penalty_type, "l1") != 0) && (strcmp(penalty_type, "iso") != 0)) mexErrMsgTxt("Choose TV type: 'iso' or 'l1',"); +        if (strcmp(penalty_type, "l1") == 0)  methTV = 1;  /* enable 'l1' penalty */ +        mxFree(penalty_type); +    } +    if ((nrhs == 6) || (nrhs == 7) || (nrhs == 8))  { +        nonneg = (int) mxGetScalar(prhs[5]); +        if ((nonneg != 0) && (nonneg != 1)) mexErrMsgTxt("Nonnegativity constraint can be enabled by choosing 1 or off - 0"); +    } +    if ((nrhs == 7) || (nrhs == 8)) lipschitz_const = (float) mxGetScalar(prhs[6]); +    if (nrhs == 8) tau = (float) mxGetScalar(prhs[7]);    +     +    /*Handling Matlab output data*/ +    dimX = dim_array[0]; dimY = dim_array[1]; dimZ = dim_array[2];     +        +    if (number_of_dims == 2) { +        dimZ = 1; /*2D case*/ +        Output = (float*)mxGetPr(plhs[0] = mxCreateNumericArray(2, dim_array, mxSINGLE_CLASS, mxREAL));         +    } +    if (number_of_dims == 3) {     +        Output = (float*)mxGetPr(plhs[0] = mxCreateNumericArray(3, dim_array, mxSINGLE_CLASS, mxREAL)); +    }     +    mwSize vecdim[1]; +    vecdim[0] = 2; +    infovec = (float*)mxGetPr(plhs[1] = mxCreateNumericArray(1, vecdim, mxSINGLE_CLASS, mxREAL)); +     +    /* running the function */     +    TV_PD_GPU_main(Input, Output, infovec, lambda, iter,  epsil, lipschitz_const, methTV, nonneg, tau, dimX, dimY, dimZ); +} diff --git a/src/Python/ccpi/filters/regularisers.py b/src/Python/ccpi/filters/regularisers.py index bc745fe..5f4001a 100644 --- a/src/Python/ccpi/filters/regularisers.py +++ b/src/Python/ccpi/filters/regularisers.py @@ -4,7 +4,7 @@ script which assigns a proper device core function based on a flag ('cpu' or 'gp  from ccpi.filters.cpu_regularisers import TV_ROF_CPU, TV_FGP_CPU, TV_PD_CPU, TV_SB_CPU, dTV_FGP_CPU, TNV_CPU, NDF_CPU, Diff4th_CPU, TGV_CPU, LLT_ROF_CPU, PATCHSEL_CPU, NLTV_CPU  try: -    from ccpi.filters.gpu_regularisers import TV_ROF_GPU, TV_FGP_GPU, TV_SB_GPU, dTV_FGP_GPU, NDF_GPU, Diff4th_GPU, TGV_GPU, LLT_ROF_GPU, PATCHSEL_GPU +    from ccpi.filters.gpu_regularisers import TV_ROF_GPU, TV_FGP_GPU, TV_PD_GPU, TV_SB_GPU, dTV_FGP_GPU, NDF_GPU, Diff4th_GPU, TGV_GPU, LLT_ROF_GPU, PATCHSEL_GPU      gpu_enabled = True  except ImportError:      gpu_enabled = False @@ -64,7 +64,7 @@ def PD_TV(inputData, regularisation_parameter, iterations,                       lipschitz_const,                       tau)      elif device == 'gpu' and gpu_enabled: -        return TV_PD_CPU(inputData, +        return TV_PD_GPU(inputData,                       regularisation_parameter,                       iterations,                       tolerance_param, diff --git a/src/Python/setup-regularisers.py.in b/src/Python/setup-regularisers.py.in index 9a5b693..c4ad143 100644 --- a/src/Python/setup-regularisers.py.in +++ b/src/Python/setup-regularisers.py.in @@ -39,6 +39,7 @@ extra_include_dirs += [os.path.join(".." , "Core"),                         os.path.join(".." , "Core",  "inpainters_CPU"),                         os.path.join(".." , "Core",  "regularisers_GPU" , "TV_FGP" ) ,                         os.path.join(".." , "Core",  "regularisers_GPU" , "TV_ROF" ) , +                       os.path.join(".." , "Core",  "regularisers_GPU" , "TV_PD" ) ,                         os.path.join(".." , "Core",  "regularisers_GPU" , "TV_SB" ) ,                         os.path.join(".." , "Core",  "regularisers_GPU" , "TGV" ) ,                         os.path.join(".." , "Core",  "regularisers_GPU" , "LLTROF" ) , diff --git a/src/Python/src/gpu_regularisers.pyx b/src/Python/src/gpu_regularisers.pyx index 8cd8c93..b22d15e 100644 --- a/src/Python/src/gpu_regularisers.pyx +++ b/src/Python/src/gpu_regularisers.pyx @@ -22,6 +22,7 @@ CUDAErrorMessage = 'CUDA error'  cdef extern int TV_ROF_GPU_main(float* Input, float* Output, float *infovector, float *lambdaPar, int lambda_is_arr, int iter, float tau, float epsil, int N, int M, int Z);  cdef extern int TV_FGP_GPU_main(float *Input, float *Output, float *infovector, float lambdaPar, int iter, float epsil, int methodTV, int nonneg, int N, int M, int Z); +cdef extern int TV_PD_GPU_main(float *Input, float *Output, float *infovector, float lambdaPar, int iter, float epsil, float lipschitz_const, int methodTV, int nonneg, float tau, int dimX, int dimY, int dimZ);  cdef extern int TV_SB_GPU_main(float *Input, float *Output, float *infovector, float lambdaPar, int iter, float epsil, int methodTV, int N, int M, int Z);  cdef extern int LLT_ROF_GPU_main(float *Input, float *Output, float *infovector, float lambdaROF, float lambdaLLT, int iterationsNumb, float tau,  float epsil, int N, int M, int Z);  cdef extern int TGV_GPU_main(float *Input, float *Output, float *infovector, float lambdaPar, float alpha1, float alpha0, int iterationsNumb, float L2, float epsil, int dimX, int dimY, int dimZ); @@ -70,6 +71,75 @@ def TV_FGP_GPU(inputData,                       tolerance_param,                       methodTV,                       nonneg) +# Total-variation Primal-Dual (PD) +def TV_PD_GPU(inputData, regularisation_parameter, iterationsNumb, tolerance_param, methodTV, nonneg, lipschitz_const, tau): +    if inputData.ndim == 2: +        return TVPD2D(inputData, regularisation_parameter, iterationsNumb, tolerance_param, methodTV, nonneg, lipschitz_const, tau) +    elif inputData.ndim == 3: +        return TVPD3D(inputData, regularisation_parameter, iterationsNumb, tolerance_param, methodTV, nonneg, lipschitz_const, tau) + +def TVPD2D(np.ndarray[np.float32_t, ndim=2, mode="c"] inputData, +                     float regularisation_parameter, +                     int iterationsNumb, +                     float tolerance_param, +                     int methodTV, +                     int nonneg, +                     float lipschitz_const, +                     float tau): + +    cdef long dims[2] +    dims[0] = inputData.shape[0] +    dims[1] = inputData.shape[1] + +    cdef np.ndarray[np.float32_t, ndim=2, mode="c"] outputData = \ +            np.zeros([dims[0],dims[1]], dtype='float32') + +    cdef np.ndarray[np.float32_t, ndim=1, mode="c"] infovec = \ +            np.ones([2], dtype='float32') + +    if (TV_PD_GPU_main(&inputData[0,0], &outputData[0,0], &infovec[0], regularisation_parameter, +                       iterationsNumb, +                       tolerance_param, +                       lipschitz_const, +                       methodTV, +                       nonneg, +                       tau, +                       dims[1],dims[0], 1) ==0): +        return (outputData,infovec) +    else: +        raise ValueError(CUDAErrorMessage); + +def TVPD3D(np.ndarray[np.float32_t, ndim=3, mode="c"] inputData, +                     float regularisation_parameter, +                     int iterationsNumb, +                     float tolerance_param, +                     int methodTV, +                     int nonneg, +                     float lipschitz_const, +                     float tau): + +    cdef long dims[3] +    dims[0] = inputData.shape[0] +    dims[1] = inputData.shape[1] +    dims[2] = inputData.shape[2] + +    cdef np.ndarray[np.float32_t, ndim=3, mode="c"] outputData = \ +            np.zeros([dims[0], dims[1], dims[2]], dtype='float32') +    cdef np.ndarray[np.float32_t, ndim=1, mode="c"] infovec = \ +            np.zeros([2], dtype='float32') + +    if (TV_PD_GPU_main(&inputData[0,0,0], &outputData[0,0,0], &infovec[0], regularisation_parameter, +                       iterationsNumb, +                       tolerance_param, +                       lipschitz_const, +                       methodTV, +                       nonneg, +                       tau, +                       dims[2], dims[1], dims[0]) ==0): +        return (outputData,infovec) +    else: +        raise ValueError(CUDAErrorMessage); +  # Total-variation Split Bregman (SB)  def TV_SB_GPU(inputData,                       regularisation_parameter, @@ -195,7 +265,7 @@ def ROFTV2D(np.ndarray[np.float32_t, ndim=2, mode="c"] inputData,      if isinstance (regularisation_parameter, np.ndarray):          reg = regularisation_parameter.copy()          # Running CUDA code here -        if (TV_ROF_GPU_main(&inputData[0,0], &outputData[0,0], &infovec[0],  +        if (TV_ROF_GPU_main(&inputData[0,0], &outputData[0,0], &infovec[0],                         ®[0,0], 1,                         iterations,                         time_marching_parameter, diff --git a/test/test_CPU_regularisers.py b/test/test_CPU_regularisers.py index 95e2a3f..266ca8a 100644 --- a/test/test_CPU_regularisers.py +++ b/test/test_CPU_regularisers.py @@ -3,7 +3,7 @@ import unittest  import os  #import timeit  import numpy as np -from ccpi.filters.regularisers import FGP_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th, ROF_TV +from ccpi.filters.regularisers import FGP_TV, SB_TV, TGV, LLT_ROF, FGP_dTV, NDF, Diff4th, ROF_TV, PD_TV  from testroutines import BinReader, rmse   ############################################################################### @@ -39,6 +39,15 @@ class TestRegularisers(unittest.TestCase):          self.assertAlmostEqual(rms,0.02,delta=0.01) +    def test_PD_TV_CPU(self): +        Im,input,ref = self.getPars() + +        pd_cpu,info = PD_TV(input, 0.02, 300, 0.0, 0, 0, 8, 0.0025, 'cpu'); + +        rms = rmse(Im, pd_cpu) +         +        self.assertAlmostEqual(rms,0.02,delta=0.01) +      def test_TV_ROF_CPU(self):          # set parameters          Im, input,ref = self.getPars() diff --git a/test/test_run_test.py b/test/test_run_test.py index e693e03..1707aec 100755 --- a/test/test_run_test.py +++ b/test/test_run_test.py @@ -164,6 +164,94 @@ class TestRegularisers(unittest.TestCase):          self.assertLessEqual(diff_im.sum() , 1)
 +    def test_PD_TV_CPU_vs_GPU(self):
 +        print(__name__)
 +        #filename = os.path.join("test","lena_gray_512.tif")
 +        #plt = TiffReader()
 +        filename = os.path.join("test","test_imageLena.bin")
 +        plt = BinReader()
 +        # read image
 +        Im = plt.imread(filename)
 +        Im = np.asarray(Im, dtype='float32')
 +
 +        Im = Im/255
 +        perc = 0.05
 +        u0 = Im + np.random.normal(loc = 0 ,
 +                                          scale = perc * Im ,
 +                                          size = np.shape(Im))
 +        u_ref = Im + np.random.normal(loc = 0 ,
 +                                          scale = 0.01 * Im ,
 +                                          size = np.shape(Im))
 +
 +        # map the u0 u0->u0>0
 +        # f = np.frompyfunc(lambda x: 0 if x < 0 else x, 1,1)
 +        u0 = u0.astype('float32')
 +        u_ref = u_ref.astype('float32')
 +
 +        print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
 +        print ("____________PD-TV bench___________________")
 +        print ("%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%")
 +
 +
 +        pars = {'algorithm' : PD_TV, \
 +                'input' : u0,\
 +                'regularisation_parameter':0.02, \
 +                'number_of_iterations' :1500 ,\
 +                'tolerance_constant':0.0,\
 +                'methodTV': 0 ,\
 +                'nonneg': 0,
 +                'lipschitz_const' : 8,
 +                'tau' : 0.0025}
 +                
 +        print ("#############PD TV CPU####################")
 +        start_time = timeit.default_timer()
 +        (pd_cpu,info_vec_cpu) = PD_TV(pars['input'], 
 +                      pars['regularisation_parameter'],
 +                      pars['number_of_iterations'],
 +                      pars['tolerance_constant'], 
 +                      pars['methodTV'],
 +                      pars['nonneg'],
 +                      pars['lipschitz_const'],
 +                      pars['tau'],'cpu')
 +
 +        rms = rmse(Im, pd_cpu)
 +        pars['rmse'] = rms
 +
 +        txtstr = printParametersToString(pars)
 +        txtstr += "%s = %.3fs" % ('elapsed time',timeit.default_timer() - start_time)
 +        print (txtstr)
 +
 +        print ("##############PD TV GPU##################")
 +        start_time = timeit.default_timer()
 +        try:
 +            (pd_gpu,info_vec_gpu) = PD_TV(pars['input'], 
 +              pars['regularisation_parameter'],
 +              pars['number_of_iterations'],
 +              pars['tolerance_constant'], 
 +              pars['methodTV'],
 +              pars['nonneg'],
 +              pars['lipschitz_const'],
 +              pars['tau'],'gpu')
 +
 +        except ValueError as ve:
 +            self.skipTest("Results not comparable. GPU computing error.")
 +
 +        rms = rmse(Im, pd_gpu)
 +        pars['rmse'] = rms
 +        pars['algorithm'] = PD_TV
 +        txtstr = printParametersToString(pars)
 +        txtstr += "%s = %.3fs" % ('elapsed time',timeit.default_timer() - start_time)
 +        print (txtstr)
 +
 +        print ("--------Compare the results--------")
 +        tolerance = 1e-05
 +        diff_im = np.zeros(np.shape(pd_cpu))
 +        diff_im = abs(pd_cpu - pd_gpu)
 +        diff_im[diff_im > tolerance] = 1
 +
 +        self.assertLessEqual(diff_im.sum() , 1)
 +
 +
      def test_SB_TV_CPU_vs_GPU(self):
          print(__name__)
          #filename = os.path.join("test","lena_gray_512.tif")
  | 
