Immortalis-G715, Mali-G715, G615: Hardware ray tracing and VRS come to new Arm GPUs


In addition to the new CPU cores Cortex-X3, A715 and A510 Refresh, Arm today introduced three new GPUs: Immortalis-G715, Mali-G715 and Mali-G615. The flagship has a dedicated ray tracing unit that brings hardware-accelerated ray tracing to devices such as smartphones, tablets and Arm notebooks. All three GPUs also support Variable Rate Shading (VRS).

Table of Contents

  1. Fourth-generation Valhall architecture
  2. Immortalis comes exclusively with a ray tracing unit
  3. Efficient ray tracing for smartphones
    1. Ray tracing units grow with the number of shader cores
  4. Arm doubles FMA units
    1. Shader cores extensively revised
  5. All new GPUs support VRS
    1. Driver updates planned via Google Play

After the CXT GPU from Imagination Technologies and the Xclipse 920 GPU based on AMD RDNA 2 in the Exynos 2200, the Immortalis-G715 is the third mobile GPU for Arm-based processors to support hardware-accelerated ray tracing, which gives it a clear advantage over software implementations. Unsurprisingly, its lead over such software solutions is more than 300 percent, according to Arm's own internal benchmarks. The only competitor still missing in the Android world is Qualcomm, which has yet to offer an Adreno GPU with ray tracing support.

Fourth-generation Valhall architecture

With the Immortalis-G715, Arm continues to rely on the Valhall architecture, first introduced in May 2019 with the Mali-G77 and subsequently used in the Mali-G78 and Mali-G68 in 2020 as well as in last year's Mali-G710 and Mali-G610. According to a roadmap published by the company, the GPU architecture is likely to change next year.

Immortalis-G715, Mali-G715, Mali-G615 (Image: Arm)
New GPUs retain Valhall architecture (Image: Arm)

Immortalis comes exclusively with a ray tracing unit

Immortalis-G715, Mali-G715 and Mali-G615 all use the fourth generation of the Valhall architecture and share a very similar structure; they differ only in the configuration and number of their shader cores. Only the Immortalis-G715 has the new ray tracing unit (RTU) for hardware-accelerated ray tracing, located in the inner core of each shader core; the other two new GPUs lack it. According to Arm, the new ray tracing hardware takes up less than 4 percent of the shader core area. The Mali-G715 uses the same shader core design, but with fewer cores and without the RTU. The same applies to the Mali-G615, which is limited to even fewer shader cores. Arm describes the Immortalis-G715 as its new flagship and the Mali-G715 and Mali-G615 as its new premium GPUs.

New ray tracing unit in the inner core (Image: Arm)

Efficient ray tracing for smartphones

To make ray tracing as efficient as possible on a mobile GPU like the Immortalis-G715, not every primitive in a scene (i.e. the polygons that make up its objects) is tested against the ray. Instead, as with other well-known GPU manufacturers, an acceleration technique is used: the ray is tested against progressively smaller three-dimensional boxes, each enclosing a complex object, or part of one, built from polygons. If the ray does not intersect a box, it logically cannot intersect any of the primitives inside it either, so those do not have to be tested. The RTU repeats this procedure until it reaches a leaf of the so-called bounding volume hierarchy (BVH), the tree data structure that organizes these boxes, and only the primitives in that leaf are then tested against the ray. For these calculations, the RTU in the inner core of each Immortalis-G715 shader core has an RBOX_UNIT (RT_RAY_BOX) for traversing the BVH and an RTRI_UNIT (RT_RAY_TRI) for intersecting the ray with the polygons. Shading and denoising are then handled again by the regular shaders in the shader core.
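The traversal described above can be sketched as a small software model. This is a minimal illustration under our own assumptions, not Arm's hardware design: `ray_box` (a standard slab test) plays the role the article assigns to the RBOX_UNIT, `ray_tri` (the well-known Möller–Trumbore test) plays the role of the RTRI_UNIT, and all names as well as the two-leaf demo scene are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Triangle = Tuple[Vec3, Vec3, Vec3]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


@dataclass
class Node:
    lo: Vec3  # minimum corner of the axis-aligned bounding box
    hi: Vec3  # maximum corner
    children: List["Node"] = field(default_factory=list)     # inner nodes
    triangles: List[Triangle] = field(default_factory=list)  # leaves only


def ray_box(orig: Vec3, d: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Slab test (the job the article assigns to the RBOX unit)."""
    t_near, t_far = 0.0, float("inf")
    for i in range(3):
        if d[i] == 0.0:
            if orig[i] < lo[i] or orig[i] > hi[i]:
                return False  # ray parallel to this slab and outside it
            continue
        t1 = (lo[i] - orig[i]) / d[i]
        t2 = (hi[i] - orig[i]) / d[i]
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near <= t_far


def ray_tri(orig: Vec3, d: Vec3, tri: Triangle, eps: float = 1e-9) -> Optional[float]:
    """Moller-Trumbore test (the job the article assigns to the RTRI unit)."""
    v0, v1, v2 = tri
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(d, e2)
    det = dot(e1, p)
    if abs(det) < eps:
        return None  # ray parallel to the triangle's plane
    inv = 1.0 / det
    s = sub(orig, v0)
    u = dot(s, p) * inv
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(d, q) * inv
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv
    return t if t > eps else None


def closest_hit(root: Node, orig: Vec3, d: Vec3) -> Tuple[Optional[float], int]:
    """Traverse the BVH; return (closest hit distance or None, triangle tests done)."""
    best: Optional[float] = None
    tests = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if not ray_box(orig, d, node.lo, node.hi):
            continue  # ray misses this box, so it misses everything inside it
        stack.extend(node.children)
        for tri in node.triangles:
            tests += 1
            t = ray_tri(orig, d, tri)
            if t is not None and (best is None or t < best):
                best = t
    return best, tests


# Hypothetical demo scene: two leaves with one triangle each, both in the plane z = 5.
near = Node((-1.0, -1.0, 5.0), (1.0, 1.0, 5.0),
            triangles=[((-1.0, -1.0, 5.0), (1.0, -1.0, 5.0), (0.0, 1.0, 5.0))])
far = Node((9.0, -1.0, 5.0), (11.0, 1.0, 5.0),
           triangles=[((9.0, -1.0, 5.0), (11.0, -1.0, 5.0), (10.0, 1.0, 5.0))])
root = Node((-1.0, -1.0, 5.0), (11.0, 1.0, 5.0), children=[near, far])

print(closest_hit(root, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))  # (5.0, 1)
```

The second return value counts triangle tests: the far leaf's triangle is never tested because its bounding box is rejected first, which is the saving the BVH is meant to provide.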

Ray tracing uses a BVH for primitives (Image: Arm)

On the Immortalis-G715, ray tracing is supported only via the Vulkan API and, as of today, only on Android; it is not intended for Windows.
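An application targeting this Vulkan path would first check which ray tracing extensions the driver reports. The sketch below takes a plain list of device extension names, such as one collected with `vkEnumerateDeviceExtensionProperties` through whatever Vulkan binding the app uses (not shown). The three extension names are real Khronos extensions, but which of them the Immortalis-G715 driver actually exposes is not stated in the article and is an assumption here; `ray_tracing_paths` is a hypothetical helper.

```python
from typing import Iterable, List

# Real Khronos extension names; which ones the Immortalis-G715 driver exposes is an assumption.
ACCELERATION_STRUCTURE = "VK_KHR_acceleration_structure"
RAY_QUERY = "VK_KHR_ray_query"
RAY_TRACING_PIPELINE = "VK_KHR_ray_tracing_pipeline"


def ray_tracing_paths(device_extensions: Iterable[str]) -> List[str]:
    """Return the Vulkan ray tracing paths a device's extension list allows."""
    exts = set(device_extensions)
    if ACCELERATION_STRUCTURE not in exts:
        return []  # both ray queries and RT pipelines build on acceleration structures
    paths = []
    if RAY_QUERY in exts:
        paths.append("ray_query")  # ray queries issued from ordinary shader stages
    if RAY_TRACING_PIPELINE in exts:
        paths.append("ray_tracing_pipeline")  # dedicated raygen/hit/miss shader stages
    return paths


# Hypothetical extension list, e.g. gathered via vkEnumerateDeviceExtensionProperties.
print(ray_tracing_paths(["VK_KHR_acceleration_structure", "VK_KHR_ray_query"]))  # ['ray_query']
```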

Ray tracing units grow with the number of shader cores

How many ray tracing units an Immortalis-G715 has depends on how many shader cores the SoC vendor configures it with. To preserve its flagship positioning within the portfolio, Arm recommends configurations of 10 to 16 shader cores and 2 or 4 L2 cache slices of up to 1 MB. At most 16 RTUs are used. The new Mali-G715, which has no RTU, can be configured with 7 to 9 shader cores and, apart from ray tracing, offers the same features. The Mali-G615 is designed for 1 to 6 shader cores and can also be built with a single L2 slice instead of 2 or 4.
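The configuration rules above can be written down as a small validation table. This is an illustrative sketch: the shader core and L2 slice ranges are the recommended configurations quoted in the article, while the rule of exactly one RTU per shader core is our inference from the RTU sitting in each shader core's inner core and the maximum of 16 RTUs at 16 cores. `CONFIGS` and `rtu_count` are hypothetical names.

```python
# Recommended configurations as quoted in the article.
CONFIGS = {
    "Immortalis-G715": {"cores": range(10, 17), "l2_slices": {2, 4}, "has_rtu": True},
    "Mali-G715": {"cores": range(7, 10), "l2_slices": {2, 4}, "has_rtu": False},
    "Mali-G615": {"cores": range(1, 7), "l2_slices": {1, 2, 4}, "has_rtu": False},
}


def rtu_count(model: str, shader_cores: int, l2_slices: int) -> int:
    """Validate a configuration and return its RTU count.

    Assumption (not stated by Arm): exactly one RTU per Immortalis-G715 shader core.
    """
    cfg = CONFIGS[model]
    if shader_cores not in cfg["cores"]:
        raise ValueError(f"{model}: {shader_cores} shader cores is outside the recommended range")
    if l2_slices not in cfg["l2_slices"]:
        raise ValueError(f"{model}: {l2_slices} L2 slices is not a recommended option")
    return shader_cores if cfg["has_rtu"] else 0


print(rtu_count("Immortalis-G715", 16, 4))  # 16
print(rtu_count("Mali-G715", 9, 2))         # 0
```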

Immortalis-G715 (Image: Arm)
Mali-G715 (Image: Arm)
Mali-G615 (Image: Arm)

Arm doubles FMA units

Compared with the previous generation at the same number of shader cores, Arm promises 15 percent more performance and 15 percent lower power consumption, ray tracing aside. This is achieved in part by revising the execution engines. In the Valhall architecture, the inner core of each shader core contains two execution engines, each of which in turn has two processing units, among other things. Each processing unit contains a processing element, to which Arm has added, for the fourth Valhall generation, a second FMA (fused multiply-add) module together with an additional block for matrix multiplication (MMUL). This doubling replaces the previous design with only a single FMA module and is intended to double performance, especially for FMA operations, while the shader core area grows by only 25 percent.
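A fused multiply-add computes a × b + c as one operation, and a matrix multiplication is essentially a long chain of such steps, which is why FMA and matrix hardware go hand in hand. The sketch below expresses a naive matrix product as multiply-add steps and counts them, and it also works out the throughput-per-area ratio implied by the article's figures (twice the FMA throughput for 25 percent more area). It is illustrative only: plain Python performs a separate multiply and add with two roundings rather than a true fused operation, and how Arm's MMUL block actually schedules these steps is not described in the article. Both function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def matmul_fma(a: Matrix, b: Matrix) -> Tuple[Matrix, int]:
    """Naive matrix product built from multiply-add steps; also returns the step count.

    Each step here is a separate multiply and add (two roundings);
    a hardware FMA unit performs the same step with a single rounding.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    steps = 0
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc = acc + a[i][p] * b[p][j]  # one multiply-add: acc = a*b + acc
                steps += 1
            c[i][j] = acc
    return c, steps


def fma_gain_per_area(throughput_gain: float = 2.0, area_gain: float = 1.25) -> float:
    """Relative FMA throughput per unit area, using the article's 2x and +25% figures."""
    return throughput_gain / area_gain


product, steps = matmul_fma([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
print(product, steps)       # [[19.0, 22.0], [43.0, 50.0]] 8
print(fma_gain_per_area())  # 1.6
```

Under these figures, FMA throughput per unit of area rises by about 60 percent, which is the trade-off the doubling is meant to buy.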


Previous structure of the processing element of the processing unit (Image: Arm)
New structure of the processing element of the processing unit (Image: Arm)

New structure of the processing element of the processing unit in the execution engine (Image: Arm)

FMA performance gain compared to required area (Image: Arm)

Shader cores extensively revised

The two execution engines are part of each shader core, which receives further improvements in power, performance and area (PPA) with the fourth Valhall generation. The Command Stream Front-End (CSF) is said to work faster overall, the tiler's peak polygon throughput has tripled, the FP16 blender throughput has doubled, and a new hardware block for FP16 MSAA has been integrated. In addition, texture mapping runs at double speed for certain LODs, Arm Fixed Rate Compression (AFRC) has been implemented, the throughput of the varying unit has doubled, and the load/store efficiency of the caches has been increased.


Optimizations on the front end, tiler and shader core (Image: Arm)

All new GPUs support VRS

With the exception of the RTU, these front-end, tiler and shader core optimizations are included in all three GPUs presented today. The same applies to support for Variable Rate Shading (VRS), which lowers the shading quality in selected image areas without a visible loss of overall image quality. In one example in which Arm uses VRS to decouple the rasterization rate from the shading rate, shading runs only once per four pixels instead of once per pixel without VRS.
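The decoupling of rasterization rate and shading rate can be illustrated with a toy software model. In this sketch every pixel is still written, standing in for full-rate rasterization, but the hypothetical `shader` callback runs only once per block and its result is reused for every pixel in that block. Using a 2×2 block for "once per four pixels" is our assumption, and evaluating the shader at the block's top-left pixel is a simplification; this only shows the invocation counting, not Arm's implementation.

```python
from typing import Callable, List, Optional, Tuple

Color = Tuple[int, int]


def shade_frame(width: int, height: int,
                shader: Callable[[int, int], Color],
                rate: Tuple[int, int] = (1, 1)) -> Tuple[List[List[Optional[Color]]], int]:
    """Write every pixel, but invoke `shader` only once per rate[0] x rate[1] block.

    Returns the image and the number of shader invocations.
    Simplification: the shader is evaluated at the block's top-left pixel.
    """
    rx, ry = rate
    image: List[List[Optional[Color]]] = [[None] * width for _ in range(height)]
    invocations = 0
    for by in range(0, height, ry):
        for bx in range(0, width, rx):
            color = shader(bx, by)  # one invocation covers the whole block
            invocations += 1
            for y in range(by, min(by + ry, height)):
                for x in range(bx, min(bx + rx, width)):
                    image[y][x] = color  # every pixel still receives a value
    return image, invocations


def position_shader(x: int, y: int) -> Color:
    return (x, y)  # hypothetical stand-in for an expensive fragment shader


_, full_rate = shade_frame(8, 8, position_shader)
_, coarse = shade_frame(8, 8, position_shader, rate=(2, 2))
print(full_rate, coarse)  # 64 16
```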

All three new GPUs support VRS (Image: Arm)

VRS is particularly relevant for mobile devices because it does more than raise the frame rate, an increase Arm puts at up to 40 percent. Alternatively, at the same frame rate as before, power consumption can be reduced, improving the energy efficiency and battery life of the smartphone. VRS works on the Immortalis-G715, Mali-G715 and Mali-G615.
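The two ways of spending the VRS gain can be made concrete with simple frame-budget arithmetic. The sketch assumes, hypothetically, a GPU-bound game that exactly fills a 60 fps budget before VRS and takes Arm's best-case figure of 40 percent; whether the resulting idle time translates into a proportional energy saving depends on the SoC's power management and is not claimed here. The function names are illustrative.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame at a given frame rate."""
    return 1000.0 / fps


def vrs_headroom(target_fps: float = 60.0, speedup: float = 1.4):
    """Return (busy_ms, idle_ms) per frame at an unchanged target frame rate.

    Assumes the frame was GPU-bound and exactly filled its budget before VRS,
    and that VRS makes it `speedup` times faster (1.4 = Arm's best case of +40%).
    """
    budget = frame_budget_ms(target_fps)
    busy = budget / speedup
    return busy, budget - busy


busy, idle = vrs_headroom()
print(f"{busy:.1f} ms busy, {idle:.1f} ms idle per frame")  # 11.9 ms busy, 4.8 ms idle per frame
```

At 60 fps the budget is about 16.7 ms; a frame that renders 1.4 times faster needs about 11.9 ms, leaving roughly 4.8 ms per frame in which the GPU could idle or clock down instead of rendering more frames.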

Driver updates planned via Google Play

As Arm explained in a question-and-answer session, the three new GPUs will later receive drivers that can be updated via Google Play Services. With the introduction of hardware ray tracing in particular, there should still be considerable room for optimization in Android drivers.

ComputerBase received information for this article from Arm under NDA at a manufacturer event in Austin, Texas. The costs of travel and hotel accommodation were covered by the company. The manufacturer had no influence on the reporting, and there was no obligation to report. The only condition was the embargo date for publication.
