Archive for March 2008

Temporary break

This week I’ve started writing the thesis document, so over the next few weeks I’ll be busier with writing than with anything else.

I hope to continue the research in my spare time, and to carry on with it after I get my degree too. In any case, all further news about the project will be posted on this blog.


Real-time thermal erosion much faster now!

Introduction

“A picture is worth a thousand words” so I suppose a video should be much better ;-)

The real-time erosion has gained a lot from the latest optimization work: the frame rate has gone from 1.8 to 71.4 FPS for my first thermal erosion and from 3.1 to 71.4 FPS for my second thermal erosion. That means the first erosion is about 40 times faster and the second one about 23 times faster, a huge improvement for a real-time program!
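The speedup factors follow directly from the frame rates. A minimal Python sketch of the arithmetic (the helper names are only illustrative):

```python
def speedup(fps_before, fps_after):
    """How many times faster the new version is, given two frame rates."""
    return fps_after / fps_before


def frame_time_ms(fps):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


# Frame rates quoted in this post (256x256 heightmap).
first = speedup(1.8, 71.4)   # about 39.7, i.e. roughly 40 times faster
second = speedup(3.1, 71.4)  # about 23.0

print(f"first erosion:  {first:.1f}x faster")
print(f"second erosion: {second:.1f}x faster")
print(f"frame time now: {frame_time_ms(71.4):.1f} ms")
```

In terms of frame time, the first erosion went from roughly 556 ms per frame to about 14 ms.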

Even though the second algorithm is faster than the first one, with a small heightmap (256×256) both take the same execution time.

First thermal erosion

This video shows a terrain generated by the sum of 6 octaves of Perlin Noise and eroded by 100 iterations (double the number used in the previous video) of my first thermal erosion algorithm.

The video is available on:

This erosion algorithm reaches an average frame rate (computation + visualization) of 71.4 FPS using a 256×256 16-bit floating-point texture as heightmap.

Second thermal erosion

This video shows the same terrain, generated by the sum of 6 octaves of Perlin Noise and eroded by 100 iterations (double the number used in the previous video) of my second thermal erosion algorithm.

The video is available on:

This erosion algorithm reaches an average frame rate (computation + visualization) of 71.4 FPS using a 256×256 16-bit floating-point texture as heightmap.


Benchmarks: Thermal Erosion algorithms

Introduction

I’ve done new benchmarks to test how the latest optimizations have improved the performance of the thermal erosion shaders.

All the benchmarks were made using a 1024×1024 16-bit floating-point texture.

I’ve run the benchmarks on my graphics card, a GeForce 7600 GT, and on the graphics card of my friend Encelo, a GeForce 8600 GT.

The following data are the execution times needed to complete different numbers of iterations of the erosion. For each number of iterations you can see the minimum (fastest), average and maximum (slowest) time over 10 tests.
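A min/avg/max measurement of this kind can be sketched in Python as follows. This is not the benchmark code used for these results: `benchmark` and `fake_erosion` are hypothetical names, and the workload is a CPU stand-in for the shader passes. With real OpenGL calls the timed function would also have to wait for the GPU (for example with `glFinish`) before the clock is read, because GL commands are queued asynchronously.

```python
import time


def benchmark(run, repeats=10):
    """Call run() `repeats` times; return (min, avg, max) in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run()  # must block until the work is really finished
        samples.append((time.perf_counter() - start) * 1000.0)
    return min(samples), sum(samples) / len(samples), max(samples)


def fake_erosion(iterations=10):
    """Hypothetical stand-in for 'run N iterations of the erosion shader'."""
    total = 0.0
    for i in range(iterations * 1000):
        total += i * 0.5
    return total


fastest, average, slowest = benchmark(fake_erosion)
print(f"iterations = 10 -> min = {fastest:.2f} ms. - avg = {average:.2f} ms. - max = {slowest:.2f} ms.")
```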

First thermal erosion

These are the results for the 7600 GT:

iterations = 10 -> min = 135 ms. - avg = 136 ms. - max = 137 ms.

iterations = 30 -> min = 406 ms. - avg = 407 ms. - max = 408 ms.

iterations = 50 -> min = 676 ms. - avg = 678 ms. - max = 679 ms.

iterations = 70 -> min = 948 ms. - avg = 949 ms. - max = 950 ms.

iterations = 100 -> min = 1354 ms. - avg = 1356 ms. - max = 1357 ms.

The optimized shader is about 47% faster than the previous version.

These are the results for the 8600 GT:

iterations = 10 -> min = 49 ms. - avg = 50 ms. - max = 51 ms.

iterations = 30 -> min = 148 ms. - avg = 149 ms. - max = 150 ms.

iterations = 50 -> min = 246 ms. - avg = 247 ms. - max = 249 ms.

iterations = 70 -> min = 346 ms. - avg = 347 ms. - max = 348 ms.

iterations = 100 -> min = 492 ms. - avg = 493 ms. - max = 495 ms.

The shader is about 64% faster on this graphics card than on the 7600 GT.
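The percentage for the 8600 GT is consistent with comparing the average times of the two cards for the same number of iterations. A small Python check using the 100-iteration averages listed above (the function name is only illustrative):

```python
def time_reduction_percent(slow_ms, fast_ms):
    """Percentage of execution time saved when going from slow_ms to fast_ms."""
    return (slow_ms - fast_ms) / slow_ms * 100.0


# First thermal erosion, 100 iterations, average times in ms.
avg_7600 = 1356
avg_8600 = 493

print(f"{time_reduction_percent(avg_7600, avg_8600):.0f}% less time")  # prints: 64% less time
print(f"{avg_7600 / 100:.2f} ms vs {avg_8600 / 100:.2f} ms per iteration")
```

The same numbers give the cost of a single iteration: about 13.6 ms on the 7600 GT against about 4.9 ms on the 8600 GT.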

Second thermal erosion

These are the results for the 7600 GT:

iterations = 10 -> min = 124 ms. - avg = 126 ms. - max = 127 ms.

iterations = 30 -> min = 376 ms. - avg = 377 ms. - max = 378 ms.

iterations = 50 -> min = 627 ms. - avg = 628 ms. - max = 630 ms.

iterations = 70 -> min = 879 ms. - avg = 880 ms. - max = 881 ms.

iterations = 100 -> min = 1255 ms. - avg = 1256 ms. - max = 1258 ms.

The optimized shader is about 48% faster than the previous version.

These are the results for the 8600 GT:

iterations = 10 -> min = 49 ms. - avg = 51 ms. - max = 52 ms.

iterations = 30 -> min = 150 ms. - avg = 151 ms. - max = 151 ms.

iterations = 50 -> min = 248 ms. - avg = 250 ms. - max = 252 ms.

iterations = 70 -> min = 347 ms. - avg = 349 ms. - max = 351 ms.

iterations = 100 -> min = 495 ms. - avg = 496 ms. - max = 498 ms.

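The shader is about 60% faster on this graphics card than on the 7600 GT (496 ms against 1256 ms on average for 100 iterations, i.e. about 39% of the time).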

My first thermal erosion

These are the results for the 7600 GT:

iterations = 10 -> min = 120 ms. - avg = 121 ms. - max = 121 ms.

iterations = 30 -> min = 361 ms. - avg = 363 ms. - max = 364 ms.

iterations = 50 -> min = 603 ms. - avg = 604 ms. - max = 605 ms.

iterations = 70 -> min = 845 ms. - avg = 846 ms. - max = 847 ms.

iterations = 100 -> min = 1207 ms. - avg = 1208 ms. - max = 1209 ms.

The optimized shader is about 49% faster than the previous version.

These are the results for the 8600 GT:

iterations = 10 -> min = 50 ms. - avg = 51 ms. - max = 52 ms.

iterations = 30 -> min = 151 ms. - avg = 153 ms. - max = 153 ms.

iterations = 50 -> min = 252 ms. - avg = 254 ms. - max = 255 ms.

iterations = 70 -> min = 353 ms. - avg = 356 ms. - max = 358 ms.

iterations = 100 -> min = 508 ms. - avg = 509 ms. - max = 511 ms.

The shader is about 58% faster on this graphics card than on the 7600 GT.

My second thermal erosion

These are the results for the 7600 GT:

iterations = 10 -> min = 55 ms. - avg = 56 ms. - max = 57 ms.

iterations = 30 -> min = 165 ms. - avg = 166 ms. - max = 167 ms.

iterations = 50 -> min = 275 ms. - avg = 277 ms. - max = 277 ms.

iterations = 70 -> min = 386 ms. - avg = 387 ms. - max = 388 ms.

iterations = 100 -> min = 552 ms. - avg = 553 ms. - max = 554 ms.

The optimized shader is about 36% faster than the previous version.

These are the results for the 8600 GT:

iterations = 10 -> min = 24 ms. - avg = 24 ms. - max = 25 ms.

iterations = 30 -> min = 72 ms. - avg = 73 ms. - max = 73 ms.

iterations = 50 -> min = 120 ms. - avg = 122 ms. - max = 122 ms.

iterations = 70 -> min = 169 ms. - avg = 170 ms. - max = 172 ms.

iterations = 100 -> min = 241 ms. - avg = 242 ms. - max = 243 ms.

The shader is about 57% faster on this graphics card than on the 7600 GT.


Benchmarks: Generation algorithms

Introduction

I’ve done new benchmarks to test how the latest optimizations have improved the performance of the generation shaders.

All the benchmarks were made using a 1024×1024 16-bit floating-point texture.

I’ve run the benchmarks on my graphics card, a GeForce 7600 GT, and on the graphics card of my friend Encelo, a GeForce 8600 GT.

The following data are the execution times needed to complete different numbers of iterations of the generation phase. For each number of iterations you can see the minimum (fastest), average and maximum (slowest) time over 10 tests.

Fault formation

These are the results for the 7600 GT:

iterations = 250 -> min = 271 ms. - avg = 273 ms. - max = 276 ms.

iterations = 500 -> min = 543 ms. - avg = 545 ms. - max = 548 ms.

iterations = 1000 -> min = 1089 ms. - avg = 1090 ms. - max = 1093 ms.

iterations = 2000 -> min = 2179 ms. - avg = 2180 ms. - max = 2183 ms.

The optimized shader is about 12% faster than the previous version.

These are the results for the 8600 GT:

iterations = 250 -> min = 235 ms. - avg = 237 ms. - max = 239 ms.

iterations = 500 -> min = 471 ms. - avg = 473 ms. - max = 477 ms.

iterations = 1000 -> min = 942 ms. - avg = 945 ms. - max = 951 ms.

iterations = 2000 -> min = 1883 ms. - avg = 1885 ms. - max = 1889 ms.

The shader is about 14% faster on this graphics card than on the 7600 GT.

Circles

These are the results for the 7600 GT:

iterations = 250 -> min = 338 ms. - avg = 340 ms. - max = 343 ms.

iterations = 500 -> min = 676 ms. - avg = 678 ms. - max = 682 ms.

iterations = 1000 -> min = 1355 ms. - avg = 1356 ms. - max = 1359 ms.

iterations = 2000 -> min = 2710 ms. - avg = 2712 ms. - max = 2715 ms.

The optimized shader is about 1% faster than the previous version.

These are the results for the 8600 GT:

iterations = 250 -> min = 238 ms. - avg = 239 ms. - max = 242 ms.

iterations = 500 -> min = 475 ms. - avg = 477 ms. - max = 481 ms.

iterations = 1000 -> min = 951 ms. - avg = 952 ms. - max = 955 ms.

iterations = 2000 -> min = 1900 ms. - avg = 1902 ms. - max = 1906 ms.

The shader is about 30% faster on this graphics card than on the 7600 GT.

Perlin Noise

These are the results for the 7600 GT:

octaves = 2 -> min = 31 ms. - avg = 32 ms. - max = 32 ms.

octaves = 4 -> min = 61 ms. - avg = 62 ms. - max = 63 ms.

octaves = 6 -> min = 92 ms. - avg = 92 ms. - max = 93 ms.

octaves = 8 -> min = 121 ms. - avg = 122 ms. - max = 123 ms.

The optimized shader is about 61% faster than the previous version.

These are the results for the 8600 GT:

octaves = 2 -> min = 12 ms. - avg = 13 ms. - max = 13 ms.

octaves = 4 -> min = 23 ms. - avg = 24 ms. - max = 24 ms.

octaves = 6 -> min = 36 ms. - avg = 36 ms. - max = 36 ms.

octaves = 8 -> min = 47 ms. - avg = 48 ms. - max = 49 ms.

The shader is about 60% faster on this graphics card than on the 7600 GT.
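The Perlin Noise timings grow almost linearly with the number of octaves. A short Python check on the average times listed above (the function name is only illustrative):

```python
# Average times in ms, copied from the lists above: octaves -> time.
avg_7600 = {2: 32, 4: 62, 6: 92, 8: 122}
avg_8600 = {2: 13, 4: 24, 6: 36, 8: 48}


def ms_per_octave(results):
    """Average cost of a single octave for each measurement."""
    return {octaves: ms / octaves for octaves, ms in results.items()}


for name, results in (("7600 GT", avg_7600), ("8600 GT", avg_8600)):
    costs = ms_per_octave(results)
    print(name, ", ".join(f"{o} octaves: {c:.2f} ms" for o, c in costs.items()))
```

The cost per octave stays between 15 and 16 ms on the 7600 GT and between 6 and 6.5 ms on the 8600 GT.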


Shaders optimization

Introduction

Thanks to a night at Encelo’s flat, which meant some discussions (and several beers… :-) ) and a bit of testing, I’ve greatly improved the performance of almost all the shaders!

I’ve continued to work on optimization for another day, and now I’m really satisfied with the results I’ve obtained. In most cases the improvements are amazing, as you will see in the benchmarks I’m going to publish in the next posts.

In this post I’m going to talk about a couple of optimizations for shader coding.

Optimizing the two-phase shaders

As I’ve described in my previous posts, the first three thermal erosion shaders are based on two different phases.

In my first implementation I put the code of both phases in a single shader, and the shader decided which phase to execute according to a uniform variable “phase” and an if branch.

I’ve discovered this is a bad way of coding: the best way is to split the two phases into two different shaders, attach them to two different shader programs, and call the appropriate program from the GL application, keeping the logic branch there.
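The structure of the split can be sketched in Python. This is only a simulation of the control flow, not the project's code: `FakeGL` is a hypothetical stand-in that records calls instead of talking to a GPU, the program names are invented, and one draw call per pass is an assumption. In a real application `use_program` would correspond to `glUseProgram`.

```python
class FakeGL:
    """Hypothetical stand-in for a GL binding; it only records the calls."""

    def __init__(self):
        self.calls = []

    def use_program(self, program):  # glUseProgram in real code
        self.calls.append(("use", program))

    def draw_pass(self):  # assumed: one draw call per shader pass
        self.calls.append(("draw",))


def erode(gl, phase1_program, phase2_program, iterations):
    """Run the two phases with one shader program each.

    The choice of phase is made here, in the application, so neither
    shader needs a "phase" uniform or an if branch.
    """
    for _ in range(iterations):
        for phase in (1, 2):
            program = phase1_program if phase == 1 else phase2_program
            gl.use_program(program)
            gl.draw_pass()


gl = FakeGL()
erode(gl, "erosion_phase1", "erosion_phase2", iterations=2)
print(gl.calls)
```

The branch is evaluated once per pass on the CPU, instead of inside the shader for every fragment.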

Optimizing shader execution

Following the previous optimization, I’ve started to attach every shader to its own shader program and to (re)call that program when needed, i.e. when the associated key is pressed.

Before that I used a single shader program and attached a different shader to it every time one was needed. This approach required a link at every execution, and I suppose it also triggers several other operations hidden inside the GL/graphics pipeline, so executing the shader took much more time.
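The difference between the two approaches can be illustrated by counting link operations. The Python sketch below is a simulation under stated assumptions: `FakeGL`, the shader names and the sequence of key presses are all hypothetical, and no real GL call is made.

```python
class FakeGL:
    """Hypothetical stand-in for a GL binding; it only counts link operations."""

    def __init__(self):
        self.link_count = 0

    def link_program(self, shader_name):
        # In real code: attach the shader to a program object and link it.
        self.link_count += 1
        return f"program({shader_name})"


def run_relinking(gl, key_presses):
    """Old approach: a single program, relinked with a different shader on every use."""
    for shader_name in key_presses:
        program = gl.link_program(shader_name)
        # ... use the program ...


def run_prelinked(gl, shader_names, key_presses):
    """New approach: link every shader once, then only look the program up."""
    programs = {name: gl.link_program(name) for name in shader_names}
    for shader_name in key_presses:
        program = programs[shader_name]
        # ... use the program ...


presses = ["thermal1", "thermal2", "thermal1", "thermal1", "perlin"]
old, new = FakeGL(), FakeGL()
run_relinking(old, presses)
run_prelinked(new, ["thermal1", "thermal2", "perlin"], presses)
print(old.link_count, new.link_count)  # prints: 5 3
```

With the prelinked approach the number of links depends only on how many shaders exist, not on how often they are executed.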

Some minor optimizations

Some minor suggestions to keep in mind are:

  • use the GLSL built-in functions even for simple code such as the cross product
  • avoid type casting and use floats wherever computations are required (GLSL functions usually require floats, and the built-in variables are floats too)
  • use the latest video drivers: updating my NVIDIA video drivers from version 100 to version 169 improved the thermal erosion shaders by about 10%.