I improved the IQM CPU vertex skinning performance and added GPU vertex skinning to the opengl2 renderer. IQM may actually be usable now.
CPU vertex skinning now calculates each unique combination of vertex bone indexes and weights only once per model surface. This was inspired by the Cube 2: Sauerbraten documentation:
http://sauerbraten.org/docs/models.html
To optimize animation of the model on both CPU and GPU, keep the number of blend weights per vertex to a minimum. Also, similar combinations of blend weights are cached while animating, especially on the CPU, such that if two or more vertices use the same blend weights, blending calculations only have to be done once for all the vertices - so try and minimize the number of distinct combinations of blend weights if possible.
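Here is a minimal sketch of the idea in C, not the actual renderer code. The names `blend_t`, `FindUniqueBlends`, `BlendMatrix` and `TransformPoint` are made up for illustration, and it assumes up to four bones per vertex with 0..255 weights and row-major 3x4 bone matrices. Each vertex is mapped to a slot for its bone index/weight combination, so the blended matrix only has to be built once per slot and can then be reused by every vertex that shares it.

```c
#include <string.h>

#define MAX_BLENDS 4

/* Hypothetical per-vertex blend data: up to four bones, weights 0..255. */
typedef struct {
    unsigned char index[MAX_BLENDS];
    unsigned char weight[MAX_BLENDS];
} blend_t;

/* Map every vertex to the slot of its (indexes, weights) combination.
 * unique[] must have room for numVerts entries. Returns the number of
 * distinct combinations. A linear search keeps the sketch short. */
static int FindUniqueBlends(const blend_t *verts, int numVerts,
                            blend_t *unique, int *vertToUnique)
{
    int numUnique = 0;
    int i, j;

    for (i = 0; i < numVerts; i++) {
        for (j = 0; j < numUnique; j++) {
            if (memcmp(&verts[i], &unique[j], sizeof(blend_t)) == 0)
                break;
        }
        if (j == numUnique)
            unique[numUnique++] = verts[i];
        vertToUnique[i] = j;
    }
    return numUnique;
}

/* Build the weighted sum of bone matrices for one combination.
 * bones holds 12 floats (row-major 3x4) per bone. */
static void BlendMatrix(const blend_t *b, const float *bones, float out[12])
{
    int k, m;

    memset(out, 0, 12 * sizeof(float));
    for (k = 0; k < MAX_BLENDS; k++) {
        float w = b->weight[k] / 255.0f;

        if (b->weight[k] == 0)
            continue;
        for (m = 0; m < 12; m++)
            out[m] += w * bones[b->index[k] * 12 + m];
    }
}

/* Transform a position by a row-major 3x4 matrix. */
static void TransformPoint(const float m[12], const float in[3], float out[3])
{
    out[0] = m[0] * in[0] + m[1] * in[1] + m[2]  * in[2] + m[3];
    out[1] = m[4] * in[0] + m[5] * in[1] + m[6]  * in[2] + m[7];
    out[2] = m[8] * in[0] + m[9] * in[1] + m[10] * in[2] + m[11];
}
```

Per frame, one would call `BlendMatrix` once for each of the unique combinations and then `TransformPoint` for each vertex using the matrix at `vertToUnique[i]`. The fewer distinct combinations a surface has, the less blending work there is.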
opengl1
Vertex skinning for my turtle IQM used to take 3.6 times as long as MD3. Now it takes 1.6 times as long as MD3.
Old IQM CPU vertex skinning:
(each model is drawn 100 times; time is measured in nanoseconds but displayed in milliseconds)
(100x turtle) total IQM frame time 43 msec, vertex 38 msec, skeleton 4 msec
(100x mrfixit) total IQM frame time 17 msec, vertex 16 msec, skeleton 1 msec
(100x qshambler) total IQM frame time 42 msec, vertex 41 msec, skeleton 1 msec
(100x qvore) total IQM frame time 10 msec, vertex 9 msec, skeleton 1 msec
New IQM CPU vertex skinning:
(100x turtle) total IQM frame time 20 msec, vertex 16 msec, skeleton 4 msec
(100x mrfixit) total IQM frame time 7 msec, vertex 6 msec, skeleton 1 msec
(100x qshambler) total IQM frame time 38 msec, vertex 37 msec, skeleton 1 msec
(100x qvore) total IQM frame time 7 msec, vertex 6 msec, skeleton 1 msec
MD3: The performance goal
(100x turtle) MD3 frame time 12 msec
opengl2
I didn’t know CPU vertex skinning in the opengl2 renderer was so much slower than in opengl1.
Old IQM CPU vertex skinning:
(100x turtle) total IQM frame time 127 msec, vertex 117 msec, skeleton 10 msec
(100x mrfixit) total IQM frame time 51 msec, vertex 50 msec, skeleton 1 msec
(100x qshambler) total IQM frame time 121 msec, vertex 120 msec, skeleton 1 msec
(100x qvore) total IQM frame time 30 msec, vertex 29 msec, skeleton 1 msec
New IQM CPU vertex skinning:
(100x turtle) total IQM frame time 81 msec, vertex 71 msec, skeleton 10 msec
(100x mrfixit) total IQM frame time 30 msec, vertex 29 msec, skeleton 1 msec
(100x qshambler) total IQM frame time 108 msec, vertex 107 msec, skeleton 1 msec
(100x qvore) total IQM frame time 22 msec, vertex 21 msec, skeleton 1 msec
When drawing 100 turtle IQM models, 40 msec are spent each frame in R_VaoPackNormal() / R_VaoPackTangent(), called by RB_IQMSurfaceAnim(). Setting r_depthPrepass to 0 cuts the total time in half because r_depthPrepass 1 draws the models a second time. Switching to VBO avoids the R_VaoPackNormal() slowness and reduces the impact of r_depthPrepass.
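To give a rough idea of the kind of per-vertex work involved, here is a sketch of packing a float normal into 16-bit signed integers. This is not the renderer's actual code: `PackNormal16` and the four-component 16-bit layout are assumptions for illustration. The point is that a conversion like this has to run for every CPU-skinned normal and tangent, every frame.

```c
#include <stdint.h>

/* Hypothetical helper: pack a unit-length float normal into four signed
 * 16-bit integers. The fourth component is only padding here. */
static void PackNormal16(const float v[3], int16_t out[4])
{
    int i;

    for (i = 0; i < 3; i++) {
        /* Scale [-1, 1] to [-32767, 32767], rounding to nearest. */
        out[i] = (int16_t)(v[i] * 32767.0f + (v[i] > 0.0f ? 0.5f : -0.5f));
    }
    out[3] = 0;
}
```

With GPU skinning, the normals and tangents stay in their static, already packed form, so none of this runs per frame.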
opengl2 GPU vertex skinning compared to the new CPU vertex skinning and the opengl1 renderer:
(100x turtle) GPU: 60fps, CPU: 7fps, opengl1: 25fps
(100x mrfixit) GPU: 160fps, CPU: 18fps, opengl1: 60fps
(100x qshambler) GPU: 200fps, CPU: 6fps, opengl1: 18fps
(100x qvore) GPU: 200fps, CPU: 30fps, opengl1: 81fps