Rendering performance / OpenGL interaction

Post by zakalawe on Tue, 31st Mar 2009, 11:10

A little discussion about rendering performance: on the Mac, even with moderately complex routes (and the default visual settings, draw distance, etc.), I regularly experience single-digit frame rates, despite a very powerful machine (a quad-core Mac Pro, i.e. two dual-core 3 GHz Xeons with 3 GB RAM, and an nVidia 7300GT). The ECML route is bad leaving Newcastle, the West Midlands routes are again bad at larger stations, and the LU routes are almost unusable. Performance is fine when scene complexity drops, and seems to vary roughly in proportion to scene complexity, so this is not some constant overhead at work. Keep in mind that the machine runs many other OpenGL applications very well, including the Tao examples.

I've run both the debug and release versions of the .exe, and also the official pre-built .exe, with the same results, so I don't believe this is a compilation problem. It may be a general problem with the Mono runtime on the Mac, but from inspecting the code I see one other obvious explanation, which I wanted to ask about: the rendering code submits all the main scene geometry as GL_POLYGON primitives, using individual glVertex calls. I've always been taught that this can lead to very poor performance, since the driver can't optimise command transfers, pipelining on the card, and so on. GL_POLYGON in particular worries me, since I presume it falls back to a software tessellation path. (There's an additional concern that each glXXX call is a managed-to-native transition with some overhead, which may also add up to a problem.)
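A rough way to see why per-call overhead would grow with scene complexity is to count the OpenGL entry points called per frame. This is a back-of-the-envelope model only: it assumes (hypothetically) that every vertex submits a texture coordinate, a normal and a position, and that a batched renderer issues one state change plus one draw call per material. Real batched code also needs pointer setup and other state calls.

```python
# Back-of-the-envelope count of OpenGL calls per frame. Each call is one
# managed-to-native transition when made through a binding such as Tao.
# Assumption (hypothetical): each vertex sends glTexCoord2d, glNormal3d, glVertex3d.
CALLS_PER_VERTEX = 3


def immediate_mode_calls(polygon_sizes):
    """glBegin(GL_POLYGON) + per-vertex calls + glEnd() for every polygon."""
    return sum(2 + CALLS_PER_VERTEX * n for n in polygon_sizes)


def batched_calls(num_materials, calls_per_batch=2):
    """Assumed cost of a batched renderer: e.g. glBindTexture + glDrawElements
    per material, independent of how many polygons each batch contains."""
    return num_materials * calls_per_batch


if __name__ == "__main__":
    quads = [4] * 10_000              # a scene of 10,000 quads
    print(immediate_mode_calls(quads))  # 140000
    print(batched_calls(20))            # 40
```

The point is the shape of the two functions: the immediate-mode count grows with every vertex in view, while the batched count grows only with the number of materials, which would be consistent with frame time scaling with scene complexity.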

I've looked into modifying the code myself to use the approaches I was taught (vertex arrays or, ideally, VBOs, possibly with separate index buffers, although I confess I'm out of date on how useful those are). The problem I then run into is the Z-ordering model: the current rendering technique splits all geometry into faces and renders them by depth, whereas I'm used to collecting all opaque geometry by material (to minimise state changes) and then rendering it in arbitrary order with z-writes and z-tests enabled. Transparent geometry does need to be rendered with a depth sort, usually in a post-render stage with z-test enabled but z-writes disabled.
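The material-then-depth ordering described above can be sketched as follows. The `Face` type and its fields are hypothetical; `depth` stands for whatever distance-from-camera measure the renderer uses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Face:
    material: str      # hypothetical key for texture/material state
    depth: float       # distance from the camera; larger means farther away
    transparent: bool


def build_render_order(faces):
    """Return (opaque faces grouped by material, transparent faces back to front).

    Opaque groups can be drawn in any order with z-test and z-writes enabled,
    so the only goal is to minimise material changes. Transparent faces are
    drawn afterwards, farthest first, with z-test on and z-writes off.
    """
    opaque_by_material = defaultdict(list)
    transparent = []
    for face in faces:
        if face.transparent:
            transparent.append(face)
        else:
            opaque_by_material[face.material].append(face)
    transparent.sort(key=lambda f: f.depth, reverse=True)  # farthest first
    return dict(opaque_by_material), transparent
```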

I'm still considering modifying the rendering code, but it would be a bigger project than I had hoped, so I wondered if anyone here has any comments on what I've said, before I spend many evenings on it.
zakalawe
 
Posts: 7
Joined: Fri, 30th Jan 2009, 11:26

Re: Rendering performance / OpenGL interaction

Post by michelle on Tue, 31st Mar 2009, 12:17

I get up to 600 fps on some routes, so it is more likely a problem specific to OS X, Mono and the device drivers available.

First of all, the renderer splits all faces into opaque (1), color-key (2), alpha (3) and overlay (4) polygons. Depth sorting only occurs on the alpha and overlay (2D cab) lists, and optionally also on the color-key list (when smooth transparency mode is enabled). Most polygons are opaque, and these are not depth-sorted. Reading from the z-buffer is enabled for all but the overlay list, and writing to the z-buffer is disabled for both the alpha and the overlay lists.
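The four lists and their depth-buffer settings, as described in this post, can be restated as a small table in code. This is only a restatement for clarity; the names `ListState` and `render_states` are hypothetical and not taken from the actual source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListState:
    depth_sorted: bool  # is the list sorted by depth before drawing?
    depth_test: bool    # is the z-buffer read?
    depth_write: bool   # is the z-buffer written?


def render_states(smooth_transparency):
    """Per-list state, numbered as in the post: opaque (1), color-key (2),
    alpha (3), overlay (4)."""
    return {
        "opaque": ListState(depth_sorted=False, depth_test=True, depth_write=True),
        "color_key": ListState(depth_sorted=smooth_transparency,
                               depth_test=True, depth_write=True),
        "alpha": ListState(depth_sorted=True, depth_test=True, depth_write=False),
        "overlay": ListState(depth_sorted=True, depth_test=False, depth_write=False),
    }
```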

"Transparent geometry does need to be rendered with a depth sort": If transparency means on/off, then you are almost right, but as soon as full alpha channels are used, or bilinear/trilinear/anisotropic filtering is used, then you are wrong.

Using GL_POLYGON is the only option I have, as I cannot use GL_TRIANGLES or similar primitives when the incoming data is stored in a different format, and arbitrary polygons are commonly used.
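For comparison, a common workaround (not what the renderer described here does) is to convert each polygon into triangles at load time so that it can be drawn with GL_TRIANGLES from a vertex array. For convex polygons a triangle fan is enough, and OpenGL only guarantees correct results for convex polygons with GL_POLYGON anyway. Concave polygons would need a real triangulation algorithm such as ear clipping, which this sketch does not attempt. The function name is hypothetical.

```python
def fan_triangulate(num_vertices, first_index=0):
    """Index list for GL_TRIANGLES covering a convex polygon whose vertices are
    stored consecutively starting at first_index.

    An n-gon becomes n - 2 triangles that all share the first vertex:
    (v0, v1, v2), (v0, v2, v3), ... Winding order is preserved.
    """
    if num_vertices < 3:
        return []  # degenerate: nothing to draw
    indices = []
    for i in range(1, num_vertices - 1):
        indices.extend((first_index, first_index + i, first_index + i + 1))
    return indices


# A quad stored at vertices 0..3 and a pentagon stored at vertices 4..8,
# appended into one index buffer.
index_buffer = fan_triangulate(4) + fan_triangulate(5, first_index=4)
print(index_buffer)  # [0, 1, 2, 0, 2, 3, 4, 5, 6, 4, 6, 7, 4, 7, 8]
```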

I am not yet using OpenGL display lists, and render each polygon individually instead, for a simple reason: theoretically, I could assume that every object has a static position in 3D space in OpenGL and let the camera move through the scene. However, given the length of some routes, the single-precision (32-bit) floating-point numbers used by the hardware quickly reach their precision limits, and the geometry starts to become imprecise after some dozens of kilometers. More importantly, the camera very soon starts to drift out of sync with the train movement.
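To put numbers on the precision limit: a 32-bit float has a 24-bit significand, so the gap between adjacent representable values grows with distance from the origin. The sketch below uses Python's `struct` module to emulate single-precision rounding; the helper names are hypothetical.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_spacing(x):
    """Distance from x (rounded to float32) to the next larger float32.
    Valid for positive, finite values."""
    f = to_float32(x)
    bits = struct.unpack("<I", struct.pack("<f", f))[0]
    next_f = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_f - f


for metres in (1_000.0, 50_000.0, 100_000.0):
    print(metres, float32_spacing(metres))
# 1000.0 6.103515625e-05
# 50000.0 0.00390625
# 100000.0 0.0078125

# A 1 mm movement 50 km from the origin is lost entirely in single precision:
print(to_float32(50_000.0 + 0.001) == 50_000.0)  # True
```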

As such, I currently keep the camera static at (0,0,0) and move all the polygons instead. As I store all polygons as double-precision (64-bit) floating-point numbers, this compensates for the loss of precision imposed by the hardware. This is probably not much of an issue in first-person shooters, as their scenery is nowhere near as long.
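The camera-at-origin approach can be sketched as follows: positions stay in double precision, the camera position is subtracted in double precision, and only the small camera-relative result is rounded to 32 bits for the hardware. The names are hypothetical, and `to_float32` emulates single-precision rounding with the `struct` module.

```python
import struct


def to_float32(x):
    """Round a Python float (64-bit double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def camera_relative(vertex, camera):
    """Subtract the camera position in double precision, then round the
    (small) result to float32 as it would be when sent to the GPU."""
    return tuple(to_float32(v - c) for v, c in zip(vertex, camera))


camera = (50_000.0, 0.0, 20_000.0)   # 50 km along the route
vertex = (50_000.001, 1.5, 20_000.0)  # a point 1 mm ahead of the camera

# Rounding world coordinates to float32 first loses the 1 mm offset ...
naive_dx = to_float32(vertex[0]) - to_float32(camera[0])
# ... while subtracting in double precision first keeps it.
relative_dx = camera_relative(vertex, camera)[0]
print(naive_dx, relative_dx)
```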

In the future, though, I will use a hybrid technique that can incorporate OpenGL display lists: move the objects rather than the camera in blocks (e.g. on 100 m boundaries), and within each block, move the camera and keep the objects static. This way, I will only need to update the display lists every 100 m or so, while in between I can benefit from the higher performance that display lists give. However, this is not possible for the lists that use depth sorting. And if I consider depth peeling, things get more complicated anyway.
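The block-wise hybrid described above resembles what is sometimes called a floating origin. A minimal sketch, assuming axis-aligned 100 m blocks; the class and method names are hypothetical, and where the comment says geometry would be rebuilt, a real renderer would recompile its display lists relative to the new origin.

```python
import math

BLOCK_SIZE = 100.0  # metres, the example block size from the post


class FloatingOrigin:
    """Objects are positioned relative to a block origin that only changes when
    the camera crosses a block boundary; in between, only the camera moves."""

    def __init__(self, block_size=BLOCK_SIZE):
        self.block_size = block_size
        self.origin = None
        self.rebuilds = 0  # how often static geometry had to be re-based

    def update(self, camera_pos):
        """Return the camera position relative to the current block origin."""
        new_origin = tuple(math.floor(c / self.block_size) * self.block_size
                           for c in camera_pos)
        if new_origin != self.origin:
            self.origin = new_origin
            self.rebuilds += 1  # here: rebuild display lists relative to new origin
        return tuple(c - o for c, o in zip(camera_pos, self.origin))


fo = FloatingOrigin()
for x in range(0, 1000, 10):  # drive 1 km along x in 10 m steps
    local = fo.update((float(x), 0.0, 0.0))
print(fo.rebuilds)  # 10: one re-base per 100 m block
print(local)        # (90.0, 0.0, 0.0)
```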

If you want to rewrite the renderer yourself, then I wish you luck, but you will need to account for the loss of precision that will ultimately occur if you keep the objects static; otherwise you will end up with high-performance but unusable graphics.
michelle
Site Admin
 
Posts: 1147
Joined: Mon, 14th Apr 2008, 20:36

