
Reading notes: GPU, the architecture of high-end mobile graphics hardware
Graphics Architectures Overview
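1) Rendering algorithm: IMR (Immediate Mode Renderer)
Introduction, and the problems that motivate what follows
Tiling is introduced next: in short, break the whole into small pieces
2) Rendering algorithm: Tile Based Renderer (TBR)
Tiling and an on-chip buffer (for example, one sized for a 32*32 tile) are introduced into the graphics pipeline
Deferred rendering is introduced next
3) Tile Based Deferred Renderer (TBDR)
A further optimization of TBR
=========New topic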
【】PowerVR Hardware Overview – In Depth
Analysis of the hardware modules
1)Tiling Accelerator (TA)
Covers: {clips, projects, and culls geometry}
2)Parameter Buffer (PB)
Data stored in system memory
Too much for on-chip memory
Essential for deferring/tiling process
Allows geometry and fragment processing to be separated
Stores Vertex Data
All data attached to each vertex passed from the TA
Stores Primitive Lists
Lists of which primitives belong to which tile
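The primitive lists described above can be sketched as a simple binning step. This is a minimal illustration, not the TA's actual algorithm: the function name `bin_triangles`, the bounding-box test, and the 32x32 tile size (borrowed from the TBR example earlier in these notes) are all assumptions.

```python
from collections import defaultdict

TILE = 32  # tile edge in pixels; 32x32 is the example size from these notes


def bin_triangles(triangles, width, height, tile=TILE):
    """Map each tile (tx, ty) to the indices of triangles that may touch it.

    Uses bounding boxes, so a triangle can be listed for a tile it does not
    actually cover, but it is never left out of a tile it does cover.
    """
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    primitive_lists = defaultdict(list)
    for index, tri in enumerate(triangles):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Cull triangles whose bounding box is entirely off screen.
        if max(xs) < 0 or max(ys) < 0 or min(xs) >= width or min(ys) >= height:
            continue
        # Clamp the bounding box to the tile grid.
        tx0 = max(int(min(xs)) // tile, 0)
        tx1 = min(int(max(xs)) // tile, tiles_x - 1)
        ty0 = max(int(min(ys)) // tile, 0)
        ty1 = min(int(max(ys)) // tile, tiles_y - 1)
        for ty in range(ty0, ty1 + 1):
            for tx in range(tx0, tx1 + 1):
                primitive_lists[(tx, ty)].append(index)
    return primitive_lists
```

A triangle that straddles a tile boundary ends up in several lists, which is one reason the PB holds the vertex data once and the per-tile lists only reference it.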
3)Parameter Buffer (PB) Management
What happens when the PB is full? (an excellent question; my personal impression is that this is a lazy mechanism)
A render is flushed
What impact does this have?
Flushed renders benefit from HSR performed up to that point (thought: the HSR result cannot be globally optimal; the size of the hardware buffer sets the limit)
Previously flushed data must be retrieved from the frame buffer for successive tile renders
How likely is filling the PB?
Highly unlikely
Big enough that you should never hit it (to some extent the buffer can be viewed the way one views a cache: raising the hit rate raises efficiency)
PB size can be changed by the developer on some platforms
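To make the flush trade-off concrete, here is a toy model for a single pixel. It is only a sketch under loud assumptions: the capacity is counted in primitives, every primitive is opaque and covers the pixel, and each flush runs HSR only over what the PB holds at that moment. The name `shaded_fragments` is hypothetical and the real hardware is certainly more involved.

```python
def shaded_fragments(depths, pb_capacity):
    """Count fragments shaded for one pixel across successive PB flushes.

    depths: depth of each opaque primitive covering the pixel, in submission
    order (smaller means closer). pb_capacity: primitives held per flush.
    """
    shaded = 0
    stored = float("inf")  # depth already resolved by earlier flushes
    for start in range(0, len(depths), pb_capacity):
        batch = depths[start:start + pb_capacity]
        nearest = min(batch)
        if nearest < stored:
            # HSR inside the batch: only its nearest fragment gets shaded.
            shaded += 1
            stored = nearest
    return shaded


back_to_front = [0.9, 0.7, 0.5, 0.3]
print(shaded_fragments(back_to_front, 4))  # whole frame fits: 1 fragment shaded
print(shaded_fragments(back_to_front, 2))  # one mid-frame flush: 2 shaded
```

With a PB large enough for the whole frame, HSR sees everything and shades once; each extra flush can let through a fragment that later geometry would have hidden.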
4)Image Synthesis Processor (ISP)
Performs HSR and other Depth/Stencil Operations
Very fast Read-Modify-Write (Uses On-Chip Buffers)
Passes visible fragments to the ‘Tag Buffer’
A buffer used to track visible fragments
Visible fragments passed to the TSP
Fragments are grouped by primitive for cache efficiency
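The ISP steps above can be sketched for one tile as follows. This is an illustration only: it assumes opaque fragments, a less-than depth test, and a tag buffer that stores one primitive id per pixel. The function names and the fragment tuple layout are hypothetical, not taken from the slides.

```python
def hidden_surface_removal(fragments, tile_w, tile_h):
    """Return a tag buffer: tags[y][x] is the id of the visible primitive.

    fragments: iterable of (x, y, depth, primitive_id) inside the tile.
    The depth and tag arrays stand in for the on-chip buffers.
    """
    depth = [[float("inf")] * tile_w for _ in range(tile_h)]
    tags = [[None] * tile_w for _ in range(tile_h)]
    for x, y, z, prim in fragments:
        if z < depth[y][x]:  # read-modify-write on the tile's depth buffer
            depth[y][x] = z
            tags[y][x] = prim
    return tags


def group_by_primitive(tags):
    """Group visible pixels by primitive before handing them on for shading."""
    groups = {}
    for y, row in enumerate(tags):
        for x, prim in enumerate(row):
            if prim is not None:
                groups.setdefault(prim, []).append((x, y))
    return groups
```

Only the pixels that survive in the tag buffer are passed on, so hidden fragments never reach texturing or shading.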
5) (Important summary) Texture & Shading Processor (TSP)
Interpolates vertex data for each fragment
‘Varyings’ in a shader
Fetches texture samples
“non-dependent” texture reads only
Submits fragment work to the USSE (Universal Scalable Shader Engine)
Along with any pre-fetched data (does this tie in with the results of the tiling stage? For details see the TBDR pipeline)
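The interpolation of varyings can be illustrated with barycentric weights. This is a minimal sketch that assumes plain screen-space (affine) interpolation of a scalar varying; perspective correction, which real hardware applies, is left out, and both function names are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle a, b, c."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    if det == 0:
        raise ValueError("degenerate triangle")
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b


def interpolate_varying(p, positions, values):
    """Blend the three per-vertex values using the weights at fragment p."""
    weights = barycentric(p, *positions)
    return sum(w * v for w, v in zip(weights, values))
```

For the triangle (0, 0), (4, 0), (0, 4) with vertex values 0.0, 1.0, 2.0, the fragment at (1, 1) has weights 0.5, 0.25, 0.25 and therefore receives 0.75.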
6)Arithmetic Logic Units (ALUs)
Unified architecture
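Processes vertex, fragment, and compute tasks (mostly dedicated to geometry computation; no split here between scalar and vector operations)
SIMD style execution (vector operations)
Fed by the Coarse Grain Scheduler (CGS) (the scheduler for the execution units)
7)Unified Architecture (discussed at length during the live lecture)
Most of the discussion was about scheduling matrix operations and scalar operations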
8)Pixel Back End (PBE)
Series5/5XT: 4x MSAA
Series6: 8x MSAA
In one sentence: an anti-aliasing algorithm
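What the sample counts mean can be shown with a resolve step: each pixel holds several colour samples that are combined into one output colour. The sketch below assumes a plain box filter (an unweighted average); the notes do not say which filter the hardware uses, and `resolve_msaa` is a hypothetical name.

```python
def resolve_msaa(samples):
    """Average the per-pixel colour samples into one (r, g, b) colour."""
    n = len(samples)
    return tuple(sum(s[ch] for s in samples) / n for ch in range(3))


# 4x MSAA on an edge pixel: three samples hit a white triangle, one misses.
edge_pixel = [(1.0, 1.0, 1.0)] * 3 + [(0.0, 0.0, 0.0)]
print(resolve_msaa(edge_pixel))  # (0.75, 0.75, 0.75)
```

With 8 samples per pixel the edge coverage can take finer steps (multiples of 1/8 instead of 1/4), which is what the higher Series6 figure buys.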
9)Micro Kernel
Specialised software running on the USSE (Series5) or its own core (Series6)
Allows the GPU and CPU to operate with minimal synchronisation
Improves performance by handling interrupts on the GPU
Competing solutions handle interrupts on CPU (in the driver)
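In one sentence: an extra layer of logic is added between the driver and the GPU to improve GPU performance.
10) Pipeline summary
(Because the connections are not purely linear, they are not given in detail here. This is a key point, though, and will be analysed in detail next time.)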
Bandwidth Saving
Bandwidth usage is the biggest contributor to GPU power consumption
Saving bandwidth means staying ‘on chip’ as much as possible
It also means throwing away work you don’t need to do
PowerVR is designed from the ground up to do all of these
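A rough back-of-the-envelope calculation shows why staying on chip matters. The model below is an assumption-heavy sketch: it counts only colour writes to external memory, ignores depth traffic, texture reads, and the parameter buffer's own traffic, and the overdraw factor and resolution are made-up example values.

```python
def colour_write_bytes(width, height, bytes_per_pixel, overdraw, on_chip):
    """Estimate colour bytes written to external memory for one frame.

    on_chip=True models a tile-based renderer that blends in on-chip memory
    and writes each pixel out once; on_chip=False models every shaded
    fragment writing straight to the external frame buffer.
    """
    pixels = width * height
    if on_chip:
        return pixels * bytes_per_pixel
    return int(pixels * bytes_per_pixel * overdraw)


off_chip = colour_write_bytes(1280, 720, 4, overdraw=3, on_chip=False)
tiled = colour_write_bytes(1280, 720, 4, overdraw=3, on_chip=True)
print(off_chip, tiled)  # 11059200 3686400
```

Under these assumptions the external colour traffic scales with overdraw in the first case and is independent of it in the second.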
11)【】PowerVR Series5 Implementation
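(In the PPT this diagram is animated; converted to PDF it becomes static. It contains a detailed data flow.
I could not hear this part clearly on site, but there is a lot of solid material in it! If you understand it, please add details if convenient.)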
OpenGL ES 2.0 Shader Based GPU
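12) Architecture diagrams of the other models
Worth understanding when choosing hardware.
Attachment: the lecture slides. An official link will be provided later.
Other: in the TBDR architecture diagram,
(I do not understand the connection labelled "*2" in the diagram, running from the vertex data to the texture and shading block. Could it be that, once HSR is added, the remaining work is handled in the later stages?)
Time was limited, so this is only a simple outline. There is a lot of solid material here that deserves repeated thought.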
- titer1's blog