As you might know, I'm working on getting Evas running as fast as possible on Nokia N800 but for the last 2 weeks I was being annoyed by a problem related to selecting areas to get updated by X11... I think some improvements now!
Every scene manager or drawing canvas, in order to be fast, must not redraw or ask the underlying system to redraw parts that didn't change since last iteration and we shouldn't draw the same area twice (if it's opaque drawing).
This can be solved by splitting rectangles so overlaps are removed, but this can generate a lot of rectangles, and since you have to communicate with X11 and even do some operation on this rectangles yourself, the more rectangles, the worse...
So you also need to merge neighbor rectangles, maybe let some rectangles overlap, if this area is not that big... or even merge rectangles that are not exact neighbors but have an acceptable error if you get their bounding box.
Problem: complexity of these algoritms are quadratic, at least if you compare on 2-by-2 basis.
In order to avoid this, Evas did a really nice trick: use a boolean matrix to represent the screen segmented in tiles (8x8 by default), when you add some rectangle, mark with "True", when you delete, mark with "False", when you need to discover rectangles, walk the matrix and as soon as you find a horizontal "True" span, you walk next lines to see if you can make this rectangle taller. This makes things really easy and complexity is linear on screen size ((width * height) / TILESIZE²).
On PCs, with huge data caches and enhanced branch prediction, this is really fast, but on N800, OProfile points both
evas_common_tilebuf_get_render_rects() as time consuming. Problem is that although complexity is O(n), n = (800 * 480) / 8² = 6000, and we walk this amount even if there is no rectangles to paint! If you have the full screen to render, you'll walk it twice, so 12000 iterations.
As usually we have small amount of rectangles, and usually these overlap, we can get these merged and n will be reduced, with O(n²) worst case but something around O(nlogn) for common cases not being so bad.
So after discussing and drawing many ideas with Chenca, we found some good speedup cases and I wrote a small test in Python (pygame test), then in C (sdl test) and then moved it to Evas. Rasterman did some tests and he liked it, so he accepted my patch and code is already in CVS, it's used by default but you can have the old code by commenting out "
#define EVAS_RECT_SPLIT" from evas_common.h.
I'm still looking for improvements and speed up changes to this algorithms, ideas?
Code can be downloaded from: http://barbieri-playground.googlecode.com/svn/rectangular-areas/ (SVN)
- I keep rectangles in an unordered list, so yes I do compare rectangles that are distant on screen;
- We've already considered using some smarter data structure, like R-Tree or Quad-Tree, but probably they would add more overhead than it worth. Probaby finding ways to simplify rectangles and keeping number of rectangles low is better (try to use bounding box more often);
- I want to implement some checks to avoid generating 3 splits on fuzzy split, if one split is to generate 3 new rectangles, it's better to invert the order and generate just 2;
- I want to avoid rectangles with Height/Width >> 1.0 to overlap others, overlaps just worth to rectangles with small height;
- Evas implementation downsize resolution in order to help number of rectangles being small, to avoid 1px rectangles and to keep buffers aligned (my focus is 16bpp system, so 2 bytes per pixel, 2 pixels is 4 bytes, N800 memory alignment).