Notes on reading about the PowerVR GPU hardware architecture

Reading: GPU - The Architecture of High-End Mobile Graphics Hardware

Graphics Architectures Overview

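1) Rendering algorithm: Immediate Mode Renderer (IMR)

Introduction and statement of the problem.
Tiling is introduced next: in short, break the whole into small pieces.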

2) Rendering algorithm: Tile Based Renderer (TBR)

Tiling and an on-chip buffer (for example, one sized for a 32*32 tile) are introduced into the graphics pipeline.
Deferred rendering is introduced next.
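
As a rough illustration of "breaking the whole into small pieces", the sketch below counts how many 32*32 tiles cover a render target and how much on-chip storage one tile would need. The 1280x720 resolution and the 4 bytes each for colour and depth are assumptions chosen for the example, not figures from the talk.

```python
import math

def tile_grid(width, height, tile=32):
    """Return (columns, rows) of tiles needed to cover a width x height target."""
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    return cols, rows

def tile_buffer_bytes(tile=32, colour_bytes=4, depth_bytes=4):
    """On-chip storage for one tile: colour plus depth for every pixel.

    The per-pixel byte counts are assumed values for illustration.
    """
    return tile * tile * (colour_bytes + depth_bytes)

cols, rows = tile_grid(1280, 720)
print("tiles:", cols, "x", rows, "=", cols * rows)
print("bytes per tile buffer:", tile_buffer_bytes())
```

The point of the arithmetic is that a single tile's colour and depth data is small enough to keep on chip, while the full frame is not.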

3) Tile Based Deferred Renderer (TBDR)

A further optimization of TBR.

========= New topic

【】PowerVR Hardware Overview – In Depth
Analysis of the hardware blocks

1)Tiling Accelerator (TA)

Role: clips, projects, and culls geometry.

2)Parameter Buffer (PB)

- Data stored in system memory
  - Too much for on-chip memory
- Essential for deferring/tiling process
  - Allows geometry and fragment processing to be separated
- Stores Vertex Data
  - All data attached to each vertex passed from the TA
- Stores Primitive Lists
  - Lists of which primitives belong to which tile
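
To make the "primitive lists" idea concrete, here is a minimal sketch of binning triangles into per-tile lists. It uses a conservative bounding-box test, which is an assumption for simplicity; the slides do not say how the TA decides tile coverage, and `bin_triangles` is a hypothetical helper name.

```python
import math
from collections import defaultdict

def bin_triangles(triangles, width, height, tile=32):
    """Map (tile_x, tile_y) -> indices of triangles whose bounding box touches that tile."""
    lists = defaultdict(list)
    max_tx = (width - 1) // tile
    max_ty = (height - 1) // tile
    for index, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Cull triangles that lie entirely outside the render target.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        # Clamp the bounding box to the tile grid.
        tx0 = max(math.floor(min(xs)) // tile, 0)
        tx1 = min(math.floor(max(xs)) // tile, max_tx)
        ty0 = max(math.floor(min(ys)) // tile, 0)
        ty1 = min(math.floor(max(ys)) // tile, max_ty)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                lists[(tx, ty)].append(index)
    return lists

triangles = [
    ((5, 5), (20, 5), (5, 20)),       # fits inside tile (0, 0)
    ((30, 30), (70, 30), (30, 40)),   # spans several tiles
]
tile_lists = bin_triangles(triangles, width=128, height=64)
for key in sorted(tile_lists):
    print(key, tile_lists[key])
```

Each tile can later be rendered on its own by walking only its list, which is what lets geometry processing and fragment processing be separated.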

3) Parameter Buffer (PB) Management

- What happens when the PB is full? (A very good question; my personal feeling is that this is a lazy mechanism.)
  - A render is flushed
- What impact does this have?
  - Flushed renders benefit from HSR performed up to that point (Thought: the HSR result cannot be globally optimal; the size of the hardware buffer sets the limit.)
  - Previously flushed data must be retrieved from the frame buffer for successive tile renders
- How likely is filling the PB?
  - Highly unlikely
  - Big enough you should never hit it (To some extent the buffer can be viewed the way one views a cache: raising the hit rate raises efficiency.)
- PB size can be changed by the developer on some platforms
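
The flush-on-full behaviour can be sketched with a toy model. Everything here is an assumption made for illustration: a real PB is sized in bytes rather than in primitives, and `ParameterBuffer` is a hypothetical class, not a driver API.

```python
class ParameterBuffer:
    """Toy model: holds at most `capacity` primitives before a render must be flushed."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = []          # primitives binned but not yet rendered
        self.batches = []          # each entry is one render's worth of primitives
        self.forced_flushes = 0    # renders triggered only because the buffer filled up

    def submit(self, primitive):
        if len(self.pending) == self.capacity:
            self.forced_flushes += 1
            self._render()
        self.pending.append(primitive)

    def end_frame(self):
        """Normal end-of-frame render of whatever is still pending."""
        self._render()

    def _render(self):
        if self.pending:
            self.batches.append(list(self.pending))
            self.pending.clear()

pb = ParameterBuffer(capacity=4)
for prim in range(10):
    pb.submit(prim)
pb.end_frame()
print("forced flushes:", pb.forced_flushes)
print("batches:", pb.batches)
```

In this model HSR could only compare primitives within one batch, which matches the note above that a flushed render benefits from HSR only up to the point of the flush.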

4) Image Synthesis Processor (ISP)

- Performs HSR and other Depth/Stencil Operations
  - Very fast Read-Modify-Write (uses on-chip buffers)
- Passes visible fragments to the ‘Tag Buffer’
  - A buffer used to track visible fragments
- Visible fragments passed to the TSP
  - Fragments are grouped by primitive for cache efficiency
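
The HSR and tag-buffer steps can be sketched for a single tile as follows. This is a simplified model under stated assumptions: all fragments are opaque, the depth test is "less than", and the 4x4 tile size and the function names are made up for the example.

```python
def hidden_surface_removal(fragments, tile=4):
    """Return a tag buffer: for each pixel, the id of the nearest primitive (or None).

    `fragments` is an iterable of (x, y, depth, primitive_id) tuples.
    """
    depth = [[float("inf")] * tile for _ in range(tile)]
    tag = [[None] * tile for _ in range(tile)]
    for x, y, z, prim in fragments:
        # Read-modify-write on the per-tile depth buffer.
        if z < depth[y][x]:
            depth[y][x] = z
            tag[y][x] = prim
    return tag

def group_by_primitive(tag):
    """Collect the visible pixels of each primitive so they can be shaded together."""
    groups = {}
    for y, row in enumerate(tag):
        for x, prim in enumerate(row):
            if prim is not None:
                groups.setdefault(prim, []).append((x, y))
    return groups

fragments = [
    (0, 0, 0.5, "A"), (0, 0, 0.2, "B"),   # B is nearer at pixel (0, 0)
    (1, 0, 0.7, "A"), (1, 0, 0.9, "B"),   # A is nearer at pixel (1, 0)
]
tag = hidden_surface_removal(fragments)
print(group_by_primitive(tag))
```

Four fragments were submitted but only two pixels go on to shading, which is the saving that deferring the shading step is meant to give.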

5) (Important summary) Texture & Shading Processor (TSP)

- Interpolates vertex data for each fragment
  - ‘Varyings’ in a shader
- Fetches texture samples
  - “Non-dependent” texture reads only
- Submits fragment work to the USSE (unified??)
  - Along with any pre-fetched data (Is this where it ties in with the results of the tiling stage? See the TBDR pipeline for details.)
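
Interpolating a varying for one fragment can be sketched with barycentric weights. The version below is plain screen-space interpolation without perspective correction, which is a simplifying assumption; the function names and the sample triangle are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    area = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)
    w0 = ((bx - px) * (cy - py) - (cx - px) * (by - py)) / area
    w1 = ((cx - px) * (ay - py) - (ax - px) * (cy - py)) / area
    w2 = 1.0 - w0 - w1
    return w0, w1, w2

def interpolate_varying(weights, va, vb, vc):
    """Blend one per-vertex attribute (a tuple of floats) using the given weights."""
    w0, w1, w2 = weights
    return tuple(w0 * x + w1 * y + w2 * z for x, y, z in zip(va, vb, vc))

# Triangle in screen space with a texture coordinate attached to each vertex.
a, b, c = (0.0, 0.0), (4.0, 0.0), (0.0, 4.0)
uv_a, uv_b, uv_c = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)

weights = barycentric((1.0, 1.0), a, b, c)
print(weights)
print(interpolate_varying(weights, uv_a, uv_b, uv_c))
```

A texture coordinate produced this way depends only on vertex data, so the corresponding texture fetch could be issued before the shader runs, which is one reading of "non-dependent" texture reads.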

6)Arithmetic Logic Units (ALUs)

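- Unified architecture
  - Processes vertex, fragment, and compute tasks (mostly dedicated to geometry computation; no split is made here between scalar and vector operations)
- SIMD style execution (vector operations)
- Fed by the Coarse Grain Scheduler (CGS) (literally, a coarse-granularity scheduler; it schedules work onto the arithmetic units)

7) Unified Architecture (this was discussed at length during the live talk)
Most of the discussion concerned the scheduling of matrix operations and scalar operations.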

8)Pixel Back End (PBE)

- Series5/5XT: 4x MSAA
- Series6: 8x MSAA
In one sentence: an anti-aliasing algorithm.
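
As a small illustration of what 4x MSAA produces at an edge, the sketch below resolves the samples of one pixel by averaging them. A plain box-filter average is an assumption made for the example; the slides do not describe how the PBE resolves samples.

```python
def resolve_msaa(samples):
    """Average the (r, g, b) colour samples of one pixel into a single colour."""
    count = len(samples)
    return tuple(sum(sample[i] for sample in samples) / count for i in range(3))

white = (1.0, 1.0, 1.0)
black = (0.0, 0.0, 0.0)

# A pixel on a triangle edge: 3 of its 4 samples are covered by a white triangle.
edge_pixel = [white, white, white, black]
print(resolve_msaa(edge_pixel))
```

The intermediate grey is what softens the staircase along the edge; with 8 samples per pixel there are more possible intermediate values.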

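9) Micro Kernel

Specialised software running on the USSE (Series5) or its own core (Series6)

- Allows the GPU and CPU to operate with minimal synchronisation
- Improves performance by handling interrupts on the GPU
- Competing solutions handle interrupts on CPU (in the driver)
In one sentence: a layer of logic is added between the driver and the GPU to improve GPU performance.

10) Pipeline summary

(Because the connections between blocks are not simply linear, they are not laid out in detail here. This is a key point, though, and will be analysed in detail next time.)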

Bandwidth Saving
- Bandwidth usage is the biggest contributor to GPU power consumption
- Saving bandwidth means staying ‘on chip’ as much as possible
- It also means throwing away work you don’t need to do
- PowerVR is designed from the ground up to do all of these
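
A back-of-the-envelope model can show why staying on chip matters. The numbers and the model itself are assumptions for illustration only: it ignores parameter buffer traffic, texture fetches, early depth rejection, and compression, so it is not a measurement of any real GPU.

```python
def imr_framebuffer_bytes(width, height, overdraw, colour_bytes=4, depth_bytes=4):
    """Simplified IMR model: every drawn fragment reads depth, writes depth, writes colour
    in external memory."""
    per_fragment = colour_bytes + 2 * depth_bytes
    return width * height * overdraw * per_fragment

def tiled_framebuffer_bytes(width, height, colour_bytes=4):
    """Simplified tiled model: depth stays on chip and each pixel's final colour is
    written to external memory once."""
    return width * height * colour_bytes

width, height, overdraw = 1280, 720, 3   # assumed scene parameters
imr = imr_framebuffer_bytes(width, height, overdraw)
tiled = tiled_framebuffer_bytes(width, height)
print("IMR bytes per frame:  ", imr)
print("tiled bytes per frame:", tiled)
print("ratio:", imr / tiled)
```

Under these assumptions the gap grows linearly with overdraw, which fits the slide's two points: keep traffic on chip, and discard work that is not needed.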

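11) 【】PowerVR Series5 Implementation
(This figure is animated in the PPT and becomes static once converted to PDF. It contains a detailed data flow.
I did not catch it clearly during the talk, but there is a lot of substance in it! If you understand it and it is convenient, please add to this.)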

OpenGL ES 2.0 Shader Based GPU

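12) Architecture diagrams of the other models
These are worth understanding when selecting a part.

Attachment: the lecture file. An official link will be given later; see below.

Other: in the TBDR architecture diagram,
(I do not understand the connection labelled "*2" in the figure, which runs from vertex data to the texture and shading block. Does it mean that, once HSR is added, the remaining work is handled in the later stages?)

Time is limited, so this is only a simple outline. There is a lot of substance here that needs repeated thought.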

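Comments

I don't see the attachment, haha. Could you upload it through the attachment upload field?

Work happily! Live contentedly!

Sunday, 09/07/2014 - 08:32, 创新网小编

It has been added. Editing the formatting on the website is not very convenient; do you have any suggestions for editing?

Tuesday, 09/09/2014 - 20:22, titer1

If it is convenient, you can send it to us and we will upload it for you. The email address is editor@eetrend.com. Many thanks.

Tuesday, 09/09/2014 - 23:51, 创新网小编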