2017 LLVM Developers’ Meeting: “Apple LLVM GPU Compiler: Embedded Dragons”
- explanation of GPUs
- overview of GPU execution model
- latency hiding
- register usage
- many compiler internals
- optimizations and trade-offs that need to be considered
- uniformity hoisting
- compiler is able to extract constants …
OpenCL -> Vulkan: A Porting Guide [wayback-archive]
- why?
- better cross-platform support
- more frequent driver updates
- more tools
- less driver overhead, up to 3x less
- shows differences between the APIs
- clspv, a tool to compile OpenCL shaders to SPIR-V
VK_KHR_dedicated_allocation unofficial manual [wayback-archive]
- extension allows the driver to inform the application that separate allocations are preferred
- NVIDIA and Intel need this extension to allow extra optimizations
- how to use the extension
Material layering [wayback-archive]
- overview of di…
Forward+ decal rendering [wayback-archive]
- goals: avoid additional geometry, and don’t increase draw call count
- extend the light structure to also support decals
- cull the decals, sort them, and apply them in the object shader just like lights
- still has a few open problems, for example mip selection …
Lightmap optimizations for iOS [wayback-archive]
- lightmaps too big for the memory budget on iOS
- using ETC2 as a replacement for DXT5 (same size, slightly less quality)
- switched to per-vertex lightmaps
- needed some art fixes
- but reduced disk size to 17 % and most expensive runtime location to 25 %…
real-time-rendering: an overview for artists [wayback-archive]
- overview of
- PBR
- render pipeline
- draw calls
- culling
- optimizations
Design Patterns for Low-Level Real-Time Rendering [wayback-archive]
- overview
- GPU/CPU memory systems
- command lists
- GCN resource descriptors
- ring buffer
- both GPU and…
Real-time Global Illumination by Precomputed Local Reconstruction from Sparse Radiance Probes [wayback-archive]
- realtime global illumination technique
- uses local precomputed data so that no long-range interactions are required between probes and receivers
- receiver depends only on a constant numb…
About GPU Family 4 - Metal 2.0 [wayback-archive]
- tile-based deferred rendering (TBDR)
- imageblocks
- tile memory that can be accessed from the shader to create custom, tile-local storage
- can be shared between compute and rasterization
- tile shading
- new programmable stage, allows access to all data …
Which Compute ID for me? [wayback-archive]
- easy-to-read overview of how compute shader IDs are calculated for 1D and 2D processing
- provides examples
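
The ID calculation summarized above can be sketched in a few lines. This is not code from the linked article; it only illustrates the usual relationship between group ID, group size, and local (in-group) ID, the same one HLSL exposes as SV_DispatchThreadID = SV_GroupID * numthreads + SV_GroupThreadID. The function names and the row-major `width` parameter are assumptions made for illustration.

```python
# Minimal sketch of how a compute thread's global ID is derived.
# All names here are hypothetical; real shading languages provide these
# values as built-ins rather than as functions.

def global_id_1d(group_id, group_size, local_id):
    """Global thread index for a 1D dispatch."""
    return group_id * group_size + local_id


def global_id_2d(group_id, group_size, local_id):
    """Global (x, y) thread coordinate for a 2D dispatch.

    Each argument is an (x, y) tuple; the 1D rule is applied per axis.
    """
    return (
        group_id[0] * group_size[0] + local_id[0],
        group_id[1] * group_size[1] + local_id[1],
    )


def flat_index_2d(global_id, width):
    """Row-major buffer index for a 2D global ID (assumes `width` columns)."""
    x, y = global_id
    return y * width + x


if __name__ == "__main__":
    # Thread 5 of group 2, with 64 threads per group.
    print(global_id_1d(2, 64, 5))
    # Thread (3, 4) of group (1, 2), with 8x8 threads per group.
    gid = global_id_2d((1, 2), (8, 8), (3, 4))
    print(gid, flat_index_2d(gid, 64))
```

The flat index is only needed when a 2D dispatch addresses a linear buffer; when writing to a texture, the 2D global ID is typically used directly as the texel coordinate.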
- explanation of specular light calculations for area lights (Sphere, disk, rectangle, tube light)
- uses the Represen…