Members of the team at Codeplay have been working to evolve the Khronos open standard SYCL for the past 7 years, and it’s exciting for the community to be able to see SYCL 2020, the latest version of the standard.
SYCL (pronounced ‘sickle’) is a royalty-free, cross-platform abstraction layer that enables code for heterogeneous processors to be written in a “single-source” style using completely standard C++. SYCL enables single source development where C++ template functions can contain both host and device code to construct complex algorithms that use the acceleration provided by processors such as GPUs, and then re-use them throughout their source code on different types of data.
The SYCL 2020 specification has been released for public review as a provisional specification and the group is looking for developers to provide their valuable feedback before the final version is published and ratified. To provide feedback on the SYCL 2020 specification, visit the Khronos Group SYCL Community Forum. You can also read the Khronos press release on their website for more background.
This version of SYCL brings a host of new features and benefits for the developer community. It reduces the amount of code that needs to be written to achieve the performance benefits that are brought by parallel processing architectures such as GPUs, and enables advanced hardware features to be used by developers.
In this blog engineers from Codeplay that are also contributors to the SYCL Working Group, the team that defines the standard within Khronos, talk about what they think will make developing with SYCL even better when using SYCL 2020.
Moving to C++17
Michael Wong, Distinguished Engineer at Codeplay and SYCL Working Group Chair
First Michael tells us about how SYCL has moved to support C++17.
“Our goal within the SYCL Working Group, as I previously explained in this blog post, is to align SYCL with ISO C++ and, at the same time, integrate features from SYCL into the C++ standard. SYCL 2020 has bumped the minimum required version of C++14 to C++17. This means that the SYCL interface can now take advantage of C++17 features such as class template argument deduction (CTAD) and deduction guides. It also means that developers will be able to use C++17 in their kernel functions, as all SYCL 2020 implementations will be required to support at least C++17. This is a step towards our larger goal and tightens the cadence with ISO C++.”
Unified Shared Memory
Peter Zuzek, Senior Software Engineer at Codeplay
A major new feature in SYCL 2020 is unified shared memory (USM), something that offers an alternative to the existing buffers and accessors in SYCL 1.2.1. This feature makes porting existing C++ code that makes use of a lot of pointers and arrays to SYCL easier. Peter talks more about this.
“SYCL 2020 introduces support for unified shared memory (USM), which provides a new memory management model alongside the existing buffer – accessor model. At its core USM provides a single unified address space, which means that pointers to USM memory allocations are consistent across the host application and within kernel functions executing on a device.
This is a power feature as it allows SYCL applications to make use of pointer-based structures and algorithms, which was not previously possible without workarounds, such as the legacy pointer from the ComputeCpp SDK, which emulated raw pointer semantics using buffers. Now applications can allocate memory on the device and simply pass that pointer directly to the kernels without creating accessors.
This is a major move in increasing the range of AI and HPC applications which can be accelerated by SYCL making it easier to port existing C++ applications to use SYCL and porting from other programming models such as CUDA. Furthermore, this addition to the standard also further aligns SYCL with ISO C++, where the machine model assumes a single unified address space.
To find out more about USM and how to use it watch this presentation from Intel at SYCLcon 2020.”
Sub-Groups and Group Algorithms
Peter also wanted to talk about sub-groups.
“SYCL 2020 introduces sub-groups, a new construct in the execution model which provides a way to represent groups of cooperative work-items. This is represented in the SYCL programming model in the form of a new hierarchy level within the iteration space over which kernel functions are invoked which can be mapped to more efficient cooperative operations.
As well as representing a work-item’s position within a sub-group, SYCL 2020 introduces a range of group-based algorithms which provide an abstraction for providing efficient cooperative operations over a work-group or sub-group including broadcast,
The introduction of sub-groups and group-based algorithms provides a much-needed support for abstraction for cooperative work-items, expressing the underlying hardware capabilities to provide further performance to SYCL applications.”
Gordon Brown, Principal Software Engineer, Codeplay
SYCL 2020 introduces a number of interface improvements to the accessor class template in order to make it more streamlined and less verbose, especially for the most common use cases. Gordon has some information on these.
“The template parameters are now all defaulted, so
accessor<T> now represents a 1D read-write accessor to global buffer data. Special attention has been given to the treatment of const and the use of read-only mode. That allows us to use
accessor<const T>, which represents a 1D read-only accessor to global buffer data. An important realization is that the accessor class serves two main functions: a) request access to data (by informing the SYCL runtime) and b) provide access to data (e.g. subscript operators). So for storage and kernel use,
accessor<const T> are the main ones to use. Other types of accessor are still important when requesting access, but now they can be implicitly converted to one of the above-mentioned types.
Accessors can also be placeholders by default. This means that an accessor can be constructed outside of a command group and bound to different command groups later. The type of a placeholder accessor is the same as a non-placeholder one, the placeholder template parameter has been deprecated. Additionally, an accessor can now be default constructed, without any arguments.
Accessors now provide a container-like interface, which means it’s possible to use a ranged based
for loop on an accessor, which iterates through the access range and provides access to each element. This also provides opportunities to use some standard C++ algorithms on accessors.
Moving the minimum required version of C++ in SYCL 2020 to C++17 enabled some more improvements, mainly through CTAD, where the accessor type can be deduced from the provided buffer, or with the help of deduction tags.
For host access, the host_buffer access target has been deprecated in favor of a new type, the host_accessor, which benefits from CTAD as well.”
Modules in SYCL
Gordon has also has a particular eye on “modules” with some compelling reasons why developers will benefit from this feature.
“SYCL 2020 introduces a new feature called modules, not to be confused with C++20 modules. This is a replacement for the SYCL 1.2.1 program class which provides a generalized interface for compiling and linking of kernel functions. This is a useful feature as it allows developers to pre-compile kernel functions ahead of enqueueing them within a command group to avoid compilation overhead runtime scheduling, and it also allows more control when doing just-in-time compilation and linking, including specifying compiler options and linking with external libraries.
Modules builds on the existing program class interface of SYCL 1.2.1 to provide a generalization of the interface which is now type-safe and no longer tied to OpenCL, support for specialization constants and incorporation of ahead of time (AoT) compilation into the interface to better support platforms which compile directly to the target ISA.
The introduction of modules in SYCL 2020 strengthens SYCL as a general-purpose programming model capable of supporting a wide range of heterogeneous platforms and architectures.
To find out more about modules and how to use them I presented on modules at SYCLcon 2020.”
Host tasks Bring Flexibility
The last topic Gordon wanted to talk about is another new feature in SYCL 2020.
“Host tasks are a new feature in SYCL which provides a way to enqueue native C++ code on the host within the SYCL scheduler using either USM or buffers – accessor memory management models. This means that standard C++ functions can be invoked within the data-dependency graph determined by the SYCL runtime, conforming to the same ordering rules as SYCL kernel functions and memory operations.
Host tasks can be used in couple of ways to enhance the flexibility of the SYCL programming model. Firstly, they can be used to execute native C++ code intermittently between kernel functions to perform computations on the host. Secondly, host tasks can take an
interop_handle parameter to perform in-place interoperability with a backend API such as OpenCL or CUDA.
The introduction of host tasks opens up the SYCL programing model to interoperability with both backend-specific APIs and existing C++ libraries, positioning SYCL as a extensible programming model for a wider range of platforms and libraries, further extending the range of compute applications which can be ported to SYCL.
To find out more about host tasks and how to use them I presented this at SYCLcon 2020.”
Alexander Johnston, Senior Software Engineer, Codeplay
Codeplay has worked on implementing SYCL to support a range of different types of processors, from GPUs to DSPs for embedded systems. Alexander talks about how this led to some enhancements to
Images in SYCL 2020.
“When adapting SYCL to some processors we found images in SYCL 1.2.1 tied too closely to OpenCL. As such, they did not map well to some potential backends. We took the opportunity to rework them and improve clarity for SYCL 2020 users.
SYCL 2020 reworks the image and sampler classes to improve image capability clarity and provide improved support across multiple backends. To begin separating from OpenCL, image functionality is now provided by the
unsampled_image and the
sampled_image classes instead of a single image class. The
unsampled_image class allows for reading and writing pixel data to an image. The
sampled_image class provides all the sampling capabilities of the former sampler class but contains the sampling details along with the image.
We have also removed some possible confusion about supported image types by reducing the
image_channel_type enums into a single enum,
image_format, which contains all available image data formats.
With these changes image functionality and intention should be clearer for both users and implementors.”
Generalization for CUDA Support and Beyond
Ruyman Reyes Castro, Principal Software Engineer, Codeplay
For Ruyman one of the most interesting changes to the new specification enables SYCL to target programming models other than OpenCL.
“Recently we have been working with Intel on a project to bring support for Nvidia GPUs to DPC++, which has been a fundamental learning experience on how SYCL could map to other programming models. We have brought these lessons learned to the SYCL working group, and presented the Generalization Proposal. Generalization takes SYCL beyond a high-level abstraction from OpenCL into a more complete programming model that can target a large variety of heterogeneous APIs and hardware. SYCL can now become the glue of C++ frameworks and libraries, such as machine learning frameworks. When these frameworks use SYCL, they can target a range of platforms without having to port and translate their code.”
Address space update
Victor Lomuller, Principal Software Engineer, Codeplay
SYCL 2020 made significant work for handling address space: better definition of address space for the programing model, remodeling of the multi_ptr class and the introduction of the generic address space concept. Victor talks more about how this works.
“This is something most users will probably not notice at this stage but is important to ensure integration with C++.
SYCL 2020 introduces more explicit rules when it comes to the interaction of SYCL address spaces (which physical memory is being addressed) and the C++ type system. Those new rules allow users to ensure their device code will compile with any header library they may use (providing they only use the features allowed in device) and allow the introduction of a generic address space which provide a unified view of the memory.
SYCL 2020 also introduces more strict address space qualifications which allow users to get a pointer type that carries the address memory region with the type. Knowing which memory is addressed allows the compiler to generate more efficient access on some platforms and this allow users drive some optimizations. Most of the behavior is for now implementation defined in the provisional specification but will be addressed in the final version.
multi_ptr class was updated to make a strong distinction between unqualified pointers and qualified pointers. An extra template parameter was added which allows user to control the types returned by the interface. The
multi_ptr interface allows the user to drive some optimizations by carrying the addressed memory information and choose its tradeoff between type preservation and C++ integration guarantees.
The final addition for address spaces is the introduction of the generic address space context. In SYCL 1.2.1, the memory addressed had to be inferred by the compiler and this added a series of limitations to users. The generic address space concept allows the device compiler to assume unqualified pointers are addressing any of the physical memory. This allows more flexibility when writing code and makes SYCL 2020 a true C++ programming model.”
The last topic we will cover is introduced by Victor.
“Specialization constants is a concept borrowed from the SPIR-V specification. The feature allows users to express some values as being constant but whose value will only be provided when the device code is being consumed by the runtime driver. This allows users to specialize a module for some values that can only be known at runtime.
The typical use case is tuning a kernel according to the hardware on which it is executed, and being able to do this without having to manually generate every possible variant during compilation. This enables developers to have more efficient, more adaptable kernels while reducing the application size.”
We are planning on going into more detail on some of these topics and features in future blogs, so follow our Twitter account to be the first to find out about these.