This is an assortment of talks that I gave over the years on various topics for different occasions.
The Future of Accelerator Programming in C++
OpenCL, CUDA, C++ AMP, OpenACC, RenderScript, Thrust, Bolt, VexCL, Boost.Compute, ViennaCL, MTL4, NT2, ArrayFire — the list of tools, environments, frameworks, and libraries for accelerator programming is long. This talk will give an overview of a number of these tools and map them to different accelerator use cases. How do they compare in terms of developer productivity, generality, and performance?
With all these tools at our disposal, the problem of accelerator programming is far from solved. There must be a better way to describe data parallelism and concurrency in C++. Maybe the functional programming community can come to our rescue? Or, as Bret Victor put it so appropriately, we must simply “forget everything we think we know about computers. Forget that we think we know what a computer is” to find a good solution. Comments from the audience are welcome for this second part of the talk.
Mastering the Cell Broadband Engine architecture through a Boost-based parallel communication library
In this talk, given at the Boost conference in 2011, I present the library I worked on with Joel Falcou and Lionel Lacassagne during 2010. It is a library for the Cell Broadband Engine that greatly simplifies writing code for this architecture. It features an MPI implementation for an embedded system and asynchronous segmented iterators. The slides can be found here and a printable version is available here. The following is a short version of the abstract:
In this talk we will present our efforts to create a library that simplifies the development of efficient applications for the Cell architecture. We will show how we use modern C++ concepts and a number of Boost libraries (MPL, PP, Function, Spirit) to build a straightforward interface on top of high-performance algorithms. We will discuss the challenges that accompany the Cell architecture and how we mastered them.
General-Purpose Computation on Graphics Hardware
This is a two-hour talk I gave to graduate students at the University of Applied Sciences Regensburg on May 29th, 2010. The talk was followed by a tutorial session in which the students solved lab assignments I prepared for the occasion. The slides are available here and the lab assignment can be downloaded here.
The assignment comes with source code, and there is also a package containing the solutions. The second lab assignment is an excerpt of the reduction example in the CUDA SDK. Documentation and source code for this assignment can be obtained from the Nvidia code examples website (Nvidia frequently changes this link; the example code can be found by searching for “cuda sdk parallel reduction”).