CUDA Application Design and Development

As the computer industry retools to leverage massively parallel graphics processing units (GPUs), this book is designed to meet the needs of working software developers who need to understand GPU programming with CUDA and increase efficiency in their projects. CUDA Application Design and Development starts with an introduction to parallel computing concepts for readers with no previous parallel experience, and focuses on issues of immediate importance to working software developers: achieving high performance, maintaining competitiveness, analyzing CUDA benefits versus costs, and determining application lifespan.

The book then details the thought behind CUDA and teaches how to create, analyze, and debug CUDA applications. Throughout, the focus is on software engineering issues: how to use CUDA in the context of existing application code, with existing compilers, languages, software tools, and industry-standard API libraries. Using an approach refined in a series of well-received articles in Dr. Dobb's Journal, author Rob Farber takes the reader step by step from fundamentals to implementation, moving from language theory to practical coding.

Includes multiple examples building from simple to more complex applications in four key areas: machine learning, visualization, vision recognition, and mobile computing.

Addresses the foundational issues for CUDA development: multi-threaded programming and the different memory hierarchy.

Includes teaching chapters designed to give a full understanding of CUDA tools, techniques, and structure.

Presents CUDA techniques in the context of the hardware they are implemented on, as well as other styles of programming that will help readers bridge into the new material.

Table of Contents

1. First Programs and How to Think in CUDA
2. CUDA for Machine Learning and Optimization
3. The CUDA Tool Suite: Profiling a PCA/NLPCA Functor
4. The CUDA Execution Model
5. CUDA Memory
6. Efficiently Using GPU Memory
7. Techniques to Increase Parallelism
8. CUDA for All GPU and CPU Applications
9. Mixing CUDA and Rendering
10. CUDA in a Cloud and Cluster Environments
11. CUDA for Real Problems: Monte Carlo, Modeling, and More
12. Application Focus on Live Streaming Video

Cited By

Zhang W, Liu F and Han C. Heterogeneous parallelism and performance optimization based on Flink+TornadoVM, Proceedings of the 3rd International Conference on Computer, Artificial Intelligence and Control Engineering, (152-158)

Zeng D, Zhu A, Gu L, Li P, Chen Q and Guo M (2023). Enabling Efficient Spatio-Temporal GPU Sharing for Network Function Virtualization, IEEE Transactions on Computers, 72:10, (2963-2977), Online publication date: 1-Oct-2023.

Zhao W, Wang W and Wang Q (2022). Optimization of cosmological N-body simulation with FMM-PM on SIMT accelerators, The Journal of Supercomputing, 78:5, (7186-7205), Online publication date: 1-Apr-2022.

Mitchell R, Stokes D, Frank E and Holmes G (2022). Bandwidth-Optimal Random Shuffling for GPUs, ACM Transactions on Parallel Computing, 9:1, (1-20), Online publication date: 31-Mar-2022.

Strubytska I and Strubytskyi P (2021). Efficiency of Parallelization Using GPU in Discrete Dynamic Models Construction Process, SN Computer Science, 2:3, Online publication date: 1-May-2021.

Bak D, Mazurek P and Oszutowska–Mazurek D. Optimization of Demodulation for Air–Gap Data Transmission Based on Backlight Modulation of Screen, Computational Science – ICCS 2019, (71-80)

Mazurek P and Krupinski R. Monte Carlo Analysis of Local Cross–Correlation ST–TBD Algorithm, Computational Science – ICCS 2019, (60-70)

Jurczuk K, Kretowski M and Bezy-Wendling J (2018). GPU-based computational modeling of magnetic resonance imaging of vascular structures, International Journal of High Performance Computing Applications, 32:4, (496-511), Online publication date: 1-Jul-2018.

Wynters E (2018). Parallel particle swarm optimization can solve many optimization problems quickly on GPUS, Journal of Computing Sciences in Colleges, 33:6, (114-123), Online publication date: 1-Jun-2018.

Wang Q, Chen D, Li S, Wu Q and Zhang Q (2017). An adaptive cartoon-like stylization for color video in real time, Multimedia Tools and Applications, 76:15, (16767-16782), Online publication date: 1-Aug-2017.

Chen L, Xu Y and Zeng Z (2017). Searching approximate global optimal Heilbronn configurations of nine points in the unit square via GPGPU computing, Journal of Global Optimization, 68:1, (147-167), Online publication date: 1-May-2017.

Torun M, Yilmaz O and Akansu A (2016). FPGA, GPU, and CPU implementations of Jacobi algorithm for eigenanalysis, Journal of Parallel and Distributed Computing, 96:C, (172-180), Online publication date: 1-Oct-2016.

Kuşcu Ö, Çetiner H and Çetin Ö (2016). Development of a web interface for performing morphological operations on CUDA platform, Computer Applications in Engineering Education, 24:5, (787-798), Online publication date: 1-Sep-2016.

Bistaffa F, Bombieri N and Farinelli A. CUBE, Proceedings of the Twenty-second European Conference on Artificial Intelligence, (125-132)

Dai Y, Fang Y, Yang L and Jeon G (2016). Graphics processing unit-accelerated joint-bitplane belief propagation algorithm in DSC, The Journal of Supercomputing, 72:6, (2351-2375), Online publication date: 1-Jun-2016.

Benner P, Dufrechou E, Ezzatti P, Quintana-Ortí E and Remón A (2015). Unleashing GPU acceleration for symmetric band linear algebra kernels and model reduction, Cluster Computing, 18:4, (1351-1362), Online publication date: 1-Dec-2015.

Huang P, Li X and Yuan B. A Parallel GPU-Based Approach to Clustering Very Fast Data Streams, Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, (23-32)

Borisenko A, Haidl M and Gorlatch S. Parallelizing Branch-and-Bound on GPUs for Optimization of Multiproduct Batch Plants, Proceedings of the 13th International Conference on Parallel Computing Technologies - Volume 9251, (324-337)

Pacevič R, Kačeniauskas A and Markauskas D (2015). Visualization of cracks by using the local Voronoi decompositions and distributed software, Advances in Engineering Software, 84:C, (85-94), Online publication date: 1-Jun-2015.

Reza H, Aguilar M and Jalal S. Regression testing of GPU/MIC systems for HPCC, Proceedings of the 2015 International Workshop on Software Engineering for High Performance Computing in Science, (30-37)

Abdellah M, Eldeib A and Sharawi A (2015). High performance GPU-Based fourier volume rendering, Journal of Biomedical Imaging, 2015, (2-2), Online publication date: 1-Jan-2015.

Dai Y, He D, Fang Y and Yang L (2014). Accelerating 2D orthogonal matching pursuit algorithm on GPU, The Journal of Supercomputing, 69:3, (1363-1381), Online publication date: 1-Sep-2014.

Olaya J and Romero R. Runtime Pipeline Scheduling System for Heterogeneous Architectures, Proceedings of the 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, (1-7)

Benner P, Dufrechou E, Ezzatti P, Igounet P, Quintana-Ortí E and Remón A. Accelerating Band Linear Algebra Operations on GPUs with Application in Model Reduction, Proceedings of the 14th International Conference on Computational Science and Its Applications - ICCSA 2014 - Volume 8584, (386-400)

Yu Y, He X, Guo H, Zhong S, Wang Y, Chen X and Xiao W. APR, Proceedings of Workshop on General Purpose Processing Using GPUs, (81-89)

de Melo Quintela B, Caldas D, Farage M and Lobosco M. Multiscale modeling of heterogeneous media applying AEH to 3d bodies, Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part I, (675-690)