Unity Tutorial: Optimizing Graphics Performance in Unity3D

I tried translating the graphics optimization part of the official documentation, Optimizing Graphics Performance.

Good performance is critical to the success of many games. Below are some simple guidelines for maximizing the speed of your game’s graphical rendering.

Where are the graphics costs

The graphical parts of your game primarily cost time on two systems of the computer: the GPU and the CPU. The first rule of any optimization is to find where the performance problem is, because strategies for optimizing for the GPU and for the CPU are quite different. They can even be opposite: it’s quite common to make the GPU do more work while optimizing for the CPU, and vice versa.

Typical bottlenecks and ways to check for them:

  • The GPU is often limited by fillrate or memory bandwidth.
    • Does running the game at a lower display resolution make it faster? If so, you’re most likely limited by fillrate on the GPU.
  • The CPU is often limited by the number of things that need to be rendered, also known as “draw calls”.
    • Check “draw calls” in the Rendering Statistics window; if the count is more than several thousand (for PCs) or several hundred (for mobile), you might want to optimize the object count.

Of course, these are only rules of thumb; the bottleneck could just as well be somewhere else. Less typical bottlenecks:

  • Rendering is not the problem, neither on the GPU nor on the CPU! For example, your scripts or physics might be the actual problem. Use the Profiler to figure this out.
  • The GPU has too many vertices to process. How many vertices are “ok” depends on the GPU and the complexity of the vertex shaders. Typical figures are “not more than 100 thousand” on mobile, and “not more than several million” on PC.
  • The CPU has too many vertices to process, for things that do vertex processing on the CPU. This could be skinned meshes, cloth simulation, particles etc.

CPU optimization: draw call count

In order to render any object on the screen, the CPU has some work to do: figuring out which lights affect that object, setting up the shader and shader parameters, and sending drawing commands to the graphics driver, which then prepares the commands to be sent off to the graphics card. All this “per object” CPU cost is not very cheap, so if you have lots of visible objects, it can add up.

So for example, if you have a thousand triangles, it will be much, much cheaper if they are all in one mesh rather than in a thousand individual meshes of one triangle each. The cost of both scenarios on the GPU will be very similar, but the work done by the CPU to render a thousand objects (instead of one) will be significant.

In order to make the CPU do less work, it’s good to reduce the visible object count:

  • Combine close objects together, either manually or using Unity’s draw call batching.
  • Use fewer materials in your objects, by putting separate textures into a larger texture atlas and so on.
  • Use fewer things that cause objects to be rendered multiple times (reflections, shadows, per-pixel lights etc., see below).

Combine objects together so that each mesh has at least several hundred triangles and uses only one Material for the entire mesh. It is important to understand that combining two objects which don’t share a material does not give you any performance increase at all. The most common reason for having multiple materials is that two meshes don’t share the same textures, so to optimize CPU performance, you should ensure that any objects you combine share the same textures.

However, when using many pixel lights in the Forward rendering path, there are situations where combining objects may not make sense, as explained below.
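To make the "one Material per combined mesh" rule concrete, here is a small Python sketch that groups a scene by material and counts draws before and after combining. It is only a counting model, not Unity code: the object names, material names and triangle counts are made up, it assumes one draw per object, and it ignores lights and the distance between objects.

```python
from collections import defaultdict

# Hypothetical scene description: (object name, material name, triangle count).
scene = [
    ("crate_a", "wood", 120),
    ("crate_b", "wood", 140),
    ("barrel", "wood", 200),
    ("lamp", "metal", 90),
    ("fence", "metal", 310),
]

def combine_by_material(objects):
    """Group objects that share a material into one combined mesh each."""
    combined = defaultdict(lambda: {"objects": [], "triangles": 0})
    for name, material, triangles in objects:
        combined[material]["objects"].append(name)
        combined[material]["triangles"] += triangles
    return dict(combined)

combined = combine_by_material(scene)
draws_before = len(scene)    # assumed: one draw per object
draws_after = len(combined)  # one draw per shared material
print(draws_before, "->", draws_after)
for material, group in combined.items():
    print(material, group["triangles"], group["objects"])
```

In this model, objects with different materials never end up in the same group, which mirrors the point that combining them would not reduce the draw count.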

GPU: Optimizing Model Geometry

When optimizing the geometry of a model, there are two basic rules:

  • Don’t use any more triangles than necessary.
  • Try to keep the number of UV mapping seams and hard edges (doubled-up vertices) as low as possible.

Note that the actual number of vertices that graphics hardware has to process is usually not the same as the number reported by a 3D application. Modeling applications usually display the geometric vertex count, i.e. the number of distinct corner points that make up a model. For a graphics card, however, some geometric vertices need to be split into two or more logical vertices for rendering purposes. A vertex must be split if it has multiple normals, UV coordinates or vertex colors. Consequently, the vertex count in Unity is almost always higher than the count given by the 3D application.

While the amount of geometry in the models is mostly relevant for the GPU, some features in Unity also process models on the CPU, for example mesh skinning.
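The vertex splitting described above can be checked on a cube. The sketch below is plain Python, not Unity code, and the helper names are my own. It counts the distinct corner positions and then the distinct (position, normal) pairs when every face keeps its own normal, which is what hard edges require.

```python
from itertools import product

# Eight corner positions of a unit cube: the "geometric" vertices a modeling tool reports.
corners = list(product((0, 1), repeat=3))

def face_corners(axis, value):
    """The four corners lying on the face where coordinate `axis` equals `value`."""
    return [c for c in corners if c[axis] == value]

def face_normal(axis, value):
    """Outward normal of that face."""
    normal = [0, 0, 0]
    normal[axis] = 1 if value == 1 else -1
    return tuple(normal)

# With hard edges every face keeps its own normal, so a corner shared by
# three faces needs three distinct (position, normal) pairs.
hard_edged = {
    (corner, face_normal(axis, value))
    for axis in range(3)
    for value in (0, 1)
    for corner in face_corners(axis, value)
}

geometric_count = len(corners)
logical_count = len(hard_edged)
print(geometric_count, logical_count)  # 8 24
```

UV seams and per-face vertex colors split vertices in the same way, so the same counting applies if you add those attributes to the key.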

Lighting Performance

Lighting which is not computed at all is always the fastest! Use Lightmapping to “bake” static lighting just once, instead of computing it each frame. The process of generating a lightmapped environment takes only a little longer than just placing a light in the scene in Unity, but:

  • It is going to run a lot faster (2–3 times for 2 per-pixel lights).
  • It will look a lot better, since you can bake global illumination and the lightmapper can smooth the results.

In a lot of cases, simple tricks in shaders and content can replace adding more lights all over the place. For example, instead of adding a light that shines straight into the camera to get a “rim lighting” effect, consider adding a dedicated “rim lighting” computation directly into your shaders.

Lights in forward rendering

Per-pixel dynamic lighting adds significant rendering overhead to every affected pixel and can lead to objects being rendered in multiple passes. On less powerful devices, like mobile or low-end PC GPUs, avoid having more than one Pixel Light illuminating any single object, and use lightmaps to light static objects instead of having their lighting calculated every frame. Per-vertex dynamic lighting can add significant cost to vertex transformations. Try to avoid situations where multiple lights illuminate any given object.

If you use pixel lighting, each mesh has to be rendered as many times as there are pixel lights illuminating it. If you combine two meshes that are very far apart, you increase the effective size of the combined object. All pixel lights that illuminate any part of this combined object are taken into account during rendering, so the number of rendering passes that need to be made can increase. Generally, the number of passes that must be made to render the combined object is the sum of the number of passes for each of the separate objects, and so nothing is gained by combining. For this reason, you should not combine meshes that are far enough apart to be affected by different sets of pixel lights.
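The pass-counting argument above can be written out as a small Python model. It is not Unity code: the mesh and light names are invented, and the model assumes one pass per pixel light touching a mesh, with every mesh lit by at least one pixel light.

```python
def passes(meshes):
    """Total render passes: each mesh is drawn once per pixel light that touches it.

    `meshes` maps a mesh name to the set of pixel lights illuminating it.
    """
    return sum(len(lights) for lights in meshes.values())

# Two meshes far apart, each reached by a different light (hypothetical names).
separate = {"house_west": {"lamp_west"}, "house_east": {"lamp_east"}}

# Combining the meshes merges the light sets: the combined mesh is touched by both lights.
combined = {"houses": set().union(*separate.values())}
print(passes(separate), passes(combined))  # 2 2

# When nearby meshes share the same light, combining does save a pass.
near = {"crate_a": {"lamp"}, "crate_b": {"lamp"}}
near_combined = {"crates": set().union(*near.values())}
print(passes(near), passes(near_combined))  # 2 1
```

In the first case the pass count stays the same after combining, so the combine buys nothing. In the second case the two meshes share their light, and combining them halves the pass count.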

During rendering, Unity finds all lights surrounding a mesh and calculates which of those lights affect it most. The Quality Settings are used to modify how many of the lights end up as pixel lights and how many as vertex lights. Each light calculates its importance based on how far away it is from the mesh and how intense its illumination is. Furthermore, some lights are more important than others purely from the game context. For this reason, every light has a Render Mode setting which can be set to Important or Not Important; lights marked as Not Important typically have a lower rendering overhead.

As an example, consider a driving game where the player’s car is driving in the dark with its headlights switched on. The headlights are likely to be the most visually significant light sources in the game, so their Render Mode would probably be set to Important. On the other hand, there may be other lights in the game that are less important (other cars’ rear lights, say) and which don’t improve the visual effect much by being pixel lights. The Render Mode for such lights can safely be set to Not Important so as to avoid wasting rendering capacity in places where it will give little benefit.

Optimizing per-pixel lighting saves work on both the CPU and the GPU: the CPU has fewer draw calls to issue, and the GPU has fewer vertices to process and fewer pixels to rasterize for all these additional object renders.

GPU: Texture Compression and Mipmaps

Using Compressed Textures will decrease the size of your textures (resulting in faster load times and a smaller memory footprint) and can also dramatically increase rendering performance. Compressed textures use only a fraction of the memory bandwidth needed for uncompressed 32-bit RGBA textures.
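To put a rough number on "only a fraction", here is a short Python calculation. The 1024×1024 texture size is an arbitrary example, and the 4 bits per pixel rate is an assumption (block formats such as DXT1 and ETC1 use that rate). The format your platform actually picks may differ.

```python
def texture_bytes(width, height, bits_per_pixel):
    """Size of a single texture level in bytes (no mipmaps)."""
    return width * height * bits_per_pixel // 8

uncompressed = texture_bytes(1024, 1024, 32)  # 32-bit RGBA
compressed = texture_bytes(1024, 1024, 4)     # assumed 4 bits per pixel block format
print(uncompressed // 1024, "KiB vs", compressed // 1024, "KiB")  # 4096 KiB vs 512 KiB
print(uncompressed // compressed)                                 # 8
```

Under these assumptions the compressed texture is one eighth of the size, and the data the GPU has to fetch while sampling it shrinks by the same factor.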

Use Texture Mip Maps

As a rule of thumb, always have Generate Mip Maps enabled for textures used in a 3D scene. In the same way that Texture Compression helps limit the amount of texture data transferred when the GPU is rendering, a mip mapped texture enables the GPU to use a lower-resolution texture for smaller triangles.

The only exception to this rule is when a texel (texture pixel) is known to map 1:1 to the rendered screen pixel, as with UI elements or in a 2D game.
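For a feel of what a mip chain contains, the Python sketch below lists the levels of a 256×256 texture and adds up their texel counts. The halving rule is the standard mip construction; the texture size is just an example.

```python
def mip_chain(width, height):
    """Return the (width, height) of every mip level down to 1x1."""
    levels = [(width, height)]
    while width > 1 or height > 1:
        width, height = max(1, width // 2), max(1, height // 2)
        levels.append((width, height))
    return levels

levels = mip_chain(256, 256)
base_texels = 256 * 256
total_texels = sum(w * h for w, h in levels)
print(len(levels))                           # 9 levels: 256, 128, ..., 1
print(round(total_texels / base_texels, 3))  # 1.333
```

The whole chain costs about one third more memory than the base level alone. In exchange, a small or distant triangle can be textured from a level with far fewer texels.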

LOD and Per-Layer Cull Distances

In some games, it may be appropriate to cull small objects more aggressively than large ones, in order to reduce both the CPU and GPU load. For example, small rocks and debris could be made invisible at long distances while large buildings would still be visible.

This can be achieved either with the Level Of Detail system or by setting manual per-layer culling distances on the camera. You could put small objects into a separate layer and set up per-layer cull distances using the Camera.layerCullDistances script property.
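The per-layer idea can be modeled in a few lines of Python. This is a simplified stand-in, not Unity's implementation and not C# script code. The layer index, the distances and the helper name are assumptions. The array of 32 entries, where 0 means "use the far plane distance", follows how the scripting reference describes Camera.layerCullDistances.

```python
FAR_CLIP_PLANE = 1000.0  # assumed camera far plane
SMALL_PROPS_LAYER = 10   # hypothetical layer index for rocks and debris

# One distance per layer, 32 layers; 0 means "fall back to the far clip plane".
layer_cull_distances = [0.0] * 32
layer_cull_distances[SMALL_PROPS_LAYER] = 15.0

def is_distance_culled(layer, distance):
    """True if an object on `layer` at `distance` from the camera is skipped."""
    limit = layer_cull_distances[layer] or FAR_CLIP_PLANE
    return distance > limit

print(is_distance_culled(SMALL_PROPS_LAYER, 40.0))  # True: small rock beyond 15 units
print(is_distance_culled(0, 40.0))                  # False: default layer uses the far plane
```

Large buildings stay on a layer with no override and remain visible out to the far plane, while the small props layer disappears much earlier.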

Realtime Shadows

Realtime shadows are nice, but they can cost quite a lot of performance, both in terms of extra draw calls for the CPU and extra processing on the GPU. For further details, see the Shadows page.

GPU: Tips for writing high-performance shaders

A high-end PC GPU and a low-end mobile GPU can literally be hundreds of times apart in performance. The same is true even on a single platform: on a PC, a fast GPU is dozens of times faster than a slow integrated GPU, and on mobile platforms you can see just as large a difference between GPUs.

So keep in mind that GPU performance on mobile platforms and low-end PCs will be much lower than on your development machines. Typically, shaders need to be hand optimized to reduce calculations and texture reads in order to get good performance. For example, some built-in Unity shaders have “mobile” equivalents that are much faster, at the price of some limitations or approximations (that’s what makes them faster).

Below are some guidelines that are most important for mobile and low-end PC graphics cards:

Complex mathematical operations

Transcendental mathematical functions (such as pow, exp, log, cos, sin, tan, etc.) are quite expensive, so a good rule of thumb is to have no more than one such operation per pixel. Consider using lookup textures as an alternative where applicable.

It is not advisable to attempt to write your own normalize, dot or inversesqrt operations, however. If you use the built-in ones, the driver will generate much better code for you.

Keep in mind that the alpha test (discard) operation will make your fragment processing slower.
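The lookup-texture suggestion above amounts to precomputing a function into a table and interpolating between entries at run time. The Python sketch below shows the idea on the CPU. The table size, the pow(x, 16) falloff and the function names are illustrative assumptions. In a real shader the table would be a small texture and the interpolation would come from bilinear filtering.

```python
import math

TABLE_SIZE = 256

def build_table(fn, size=TABLE_SIZE):
    """Sample fn at evenly spaced points in [0, 1], like filling a 1D texture."""
    return [fn(i / (size - 1)) for i in range(size)]

def sample(table, x):
    """Look up x in [0, 1] with linear interpolation between neighbouring entries."""
    x = min(max(x, 0.0), 1.0)
    pos = x * (len(table) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(table) - 1)
    t = pos - lo
    return table[lo] * (1.0 - t) + table[hi] * t

# Hypothetical use: a specular-style falloff pow(x, 16), baked once.
falloff = build_table(lambda x: math.pow(x, 16))

# Worst absolute error against the exact function over 1001 test points.
worst = max(abs(sample(falloff, i / 1000) - math.pow(i / 1000, 16)) for i in range(1001))
print(worst)
```

With 256 entries the worst error in this example stays below one thousandth, which is normally invisible in a lighting term. Whether the texture read is actually cheaper than the math depends on the GPU, so it is worth measuring.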

Floating point operations

You should always specify the precision of floating point variables when writing custom shaders. It is critical to pick the smallest possible floating point format in order to get the best performance. Precision of operations is completely ignored on many desktop GPUs, but is critical for performance on many mobile GPUs.

If the shader is written in Cg/HLSL then precision is specified as follows:

  • float: full 32-bit floating point format, suitable for vertex transformations but with the slowest performance.
  • half: reduced 16-bit floating point format, suitable for texture UV coordinates and roughly twice as fast as float.
  • fixed: 10-bit fixed point format, suitable for colors, lighting calculations and other high-performance operations, and roughly four times faster than float.

If the shader is written in GLSL ES then the floating point precision is specified as highp, mediump and lowp respectively.
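To see why half is recommended for UV coordinates but float for positions, the Python sketch below rounds values through 16-bit and 32-bit IEEE 754 formats using the standard struct module. Treating the shader half type as IEEE 754 binary16 is an assumption, since real GPUs may implement it differently, and the fixed format has no equivalent here.

```python
import struct

def to_half(value):
    """Round a Python float to the nearest IEEE 754 16-bit float and back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

def to_single(value):
    """Round a Python float to the nearest IEEE 754 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

uv = 0.1
print(abs(to_half(uv) - uv))  # small error: fine for a UV coordinate in [0, 1]

position = 4097.0
print(to_half(position))      # 4096.0: neighbouring half values are 4 apart here
print(to_single(position))    # 4097.0: a 32-bit float still represents it exactly
```

Values in the 0 to 1 range lose very little at 16 bits, while a coordinate in the thousands is already off by a whole unit.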

For further details about shader performance, please read the Shader Performance page.

Simple Checklist to make Your Game Faster

  • Keep the vertex count below 200K to 3M per frame when targeting PCs, depending on the target GPU.
  • If you’re using built-in shaders, pick ones from the Mobile or Unlit category. They work on non-mobile platforms as well, but are simplified and approximated versions of the more complex shaders.
  • Keep the number of different materials per scene low: share as many materials between different objects as possible.
  • Set the Static property on non-moving objects to allow internal optimizations like static batching.
  • Do not use Pixel Lights when it is not necessary: choose to have only a single (preferably directional) pixel light affecting your geometry.
  • Do not use dynamic lights when it is not necessary: choose to bake lighting instead.
  • Use compressed texture formats when possible, otherwise prefer 16-bit textures over 32-bit.
  • Do not use fog when it is not necessary.
  • Learn the benefits of Occlusion Culling and use it to reduce the amount of visible geometry and the number of draw calls in complex static scenes with lots of occlusion. Plan your levels to benefit from occlusion culling.
  • Use skyboxes to “fake” distant geometry.
  • Use pixel shaders or texture combiners to mix several textures instead of a multi-pass approach.
  • If writing custom shaders, always use the smallest possible floating point format:
    • fixed / lowp: for colors, lighting information and normals,
    • half / mediump: for texture UV coordinates,
    • float / highp: avoid in pixel shaders, fine to use in the vertex shader for position calculations.
  • Minimize the use of complex mathematical operations such as pow, sin, cos etc. in pixel shaders.
  • Choose to use fewer textures per fragment.