Hi all,
I've written a simple test application to measure gdiplus performance. It currently tests only GdipDrawImage(), but it could easily be extended to test anything else if desired.
I'm attaching the test source along with the results of running the 32-bit test binary with gdiplus=b and gdiplus=n (native taken from win7). As the results show, built-in gdiplus is currently up to 17 times slower than native gdiplus even in the identity case. With a scale/rotate graphics transform it is up to 20 times slower.
Considering that built-in gdiplus support for various complex things is pretty good, perhaps it's time to spend some time optimizing its performance.
I'm still primarily concerned with getting all of the features in place and behaving correctly first. Performance isn't a concern for me, but if someone points out a real app where it is an issue then I may be more interested.
I agree that image drawing is far enough along that we can start to optimize it, and I'm not surprised that the performance is bad. I imagine that convert_pixels is very slow in most cases, and optimized codepaths for the common cases could lead to some general improvement (even those cases that are treated specially have not been profiled, so they may not be ideal). The fact that alpha_blend_pixels works using GdipBitmapGetPixel/GdipBitmapSetPixel is very bad, and will hurt performance of any drawing operation on a bitmap.
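To illustrate the per-pixel accessor point above, here is a minimal sketch of blending a whole row through direct buffer access rather than one get/set call per pixel. It is not Wine's code: `blend_channel` and `blend_row_argb_over_rgb` are hypothetical names, and the sketch assumes 32-bit 0xAARRGGBB pixels with a non-premultiplied source and a fully opaque destination.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not Wine's code): blend one 8-bit channel of a
 * non-premultiplied source over an opaque destination.
 * Adding 127 before dividing by 255 rounds to the nearest integer. */
static uint32_t blend_channel(uint32_t src, uint32_t dst, uint32_t alpha)
{
    return (src * alpha + dst * (255 - alpha) + 127) / 255;
}

/* Blend `count` ARGB source pixels over an opaque row, touching the
 * buffers directly instead of calling an accessor for every pixel. */
void blend_row_argb_over_rgb(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++)
    {
        uint32_t s = src[i], d = dst[i];
        uint32_t a = s >> 24;

        if (a == 0) continue;                    /* transparent: keep dst */
        if (a == 255) { dst[i] = s; continue; }  /* opaque: plain copy */

        uint32_t r = blend_channel((s >> 16) & 0xff, (d >> 16) & 0xff, a);
        uint32_t g = blend_channel((s >> 8) & 0xff, (d >> 8) & 0xff, a);
        uint32_t b = blend_channel(s & 0xff, d & 0xff, a);
        dst[i] = 0xff000000u | (r << 16) | (g << 8) | b;
    }
}
```

The early exits for alpha 0 and 255 are an assumption about typical content, not something that has been profiled here.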
When drawing to an HDC (or a Bitmap with premultiplied alpha), the fact that we convert from the bitmap format to non-premultiplied alpha, then convert back to premultiplied alpha in alpha_blend_pixels, is probably not good. In cases where we have no ImageAttributes object, we may be able to use premultiplied alpha for the whole process.
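As a rough sketch of what "premultiplied alpha for the whole process" could look like: the source is premultiplied once, and the blend then needs no conversion back and forth. The function names (`premultiply`, `blend_pargb_over`, `blend_pargb_row`) are hypothetical and the code assumes 32-bit 0xAARRGGBB pixels; it is not how gdiplus is currently structured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rounded division by 255, valid for 0 <= x <= 255 * 255. */
static uint32_t div255(uint32_t x)
{
    return (x + 127) / 255;
}

/* Convert one non-premultiplied ARGB pixel to premultiplied ARGB. */
uint32_t premultiply(uint32_t argb)
{
    uint32_t a = argb >> 24;
    uint32_t r = div255(((argb >> 16) & 0xff) * a);
    uint32_t g = div255(((argb >> 8) & 0xff) * a);
    uint32_t b = div255((argb & 0xff) * a);
    return (a << 24) | (r << 16) | (g << 8) | b;
}

/* "Source over" with both pixels premultiplied:
 * out = src + dst * (255 - src_alpha) / 255, for all four channels. */
uint32_t blend_pargb_over(uint32_t src, uint32_t dst)
{
    uint32_t inv = 255 - (src >> 24);
    uint32_t out = 0;
    for (int shift = 0; shift <= 24; shift += 8)
    {
        uint32_t c = ((src >> shift) & 0xff) + div255(((dst >> shift) & 0xff) * inv);
        if (c > 255) c = 255;  /* guard against rounding overshoot */
        out |= c << shift;
    }
    return out;
}

/* Blend a row of premultiplied pixels over a premultiplied destination. */
void blend_pargb_row(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++)
        dst[i] = blend_pargb_over(src[i], dst[i]);
}
```

Note that the blend itself contains no per-pixel division by alpha, which is the step an un-premultiply conversion would need.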
Vincent Povirk <madewokherd@gmail.com> wrote:
> I'm still primarily concerned with getting all of the features in place and behaving correctly first. Performance isn't a concern for me, but if someone points out a real app where it is an issue then I may be more interested.
I have an application which generates its final image using a 1 bpp TIFF as a background, placing some additional pieces here and there from different sources with different color formats and scaling transforms. Under Wine, generating the final image takes tens of seconds, while with native gdiplus it's almost instant.
> I agree that image drawing is far enough along that we can start to optimize it, and I'm not surprised that the performance is bad. I imagine that convert_pixels is very slow in most cases, and optimized codepaths for the common cases could lead to some general improvement (even those cases that are treated specially have not been profiled, so they may not be ideal). The fact that alpha_blend_pixels works using GdipBitmapGetPixel/GdipBitmapSetPixel is very bad, and will hurt performance of any drawing operation on a bitmap.
> When drawing to an HDC (or a Bitmap with premultiplied alpha), the fact that we convert from the bitmap format to non-premultiplied alpha, then convert back to premultiplied alpha in alpha_blend_pixels, is probably not good. In cases where we have no ImageAttributes object, we may be able to use premultiplied alpha for the whole process.
I suspect that generating an intermediate RGBA image in the process of drawing should be avoided when source or destination bitmaps have no alpha. Also, the way gdiplus currently scales the source is far from optimal; using GDI instead is much faster and produces similar results.
I should've known you'd have an app where it matters.
> I suspect that generating an intermediate RGBA image in the process of drawing should be avoided when source or destination bitmaps have no alpha.
I think we can get away with that if the source has no alpha, which isn't the case for indexed color, and then only if there's nothing that can add an alpha channel to it (such as a color transform, or interpolation when part of the source area pulls from the outside of the image).
In that case, we'd need a masked copy operation, which is equivalent to alpha blend when we know alpha is either 0 or 255. We can't just copy a rectangle because we can draw an image to an arbitrary parallelogram (unless we add to our list of conditions for an optimized codepath that the destination is a rectangle). There are many places where we could use such a thing, but I don't know how we would implement it so that it is faster than the fastest we can reasonably do alpha blending (which is definitely NOT the way we are doing it now).
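For reference, a minimal sketch of the masked copy described above, together with a general blend it should agree with whenever alpha is 0 or 255. Both function names are hypothetical, the pixel layout is assumed to be 32-bit 0xAARRGGBB over an opaque destination, and whether this beats a well-optimized blend is exactly the open question; nothing here has been measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Masked copy: take the source pixel where its alpha is 255, keep the
 * destination where it is 0. Assumes no other alpha values occur. */
void masked_copy_row(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; i++)
    {
        uint32_t s = src[i];
        /* All ones if the top alpha bit is set, otherwise zero. */
        uint32_t mask = 0u - (s >> 31);
        dst[i] = (s & mask) | (dst[i] & ~mask);
    }
}

/* Reference: general non-premultiplied blend over an opaque destination. */
uint32_t reference_blend(uint32_t s, uint32_t d)
{
    uint32_t a = s >> 24;
    uint32_t out = 0xff000000u;
    for (int shift = 0; shift <= 16; shift += 8)
    {
        uint32_t sc = (s >> shift) & 0xff;
        uint32_t dc = (d >> shift) & 0xff;
        out |= ((sc * a + dc * (255 - a) + 127) / 255) << shift;
    }
    return out;
}
```

For alpha 255 the reference blend reduces to the source color and for alpha 0 to the destination color, which is why the two operations coincide on such input.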
It does not matter whether the destination has alpha.
In general, I would prefer to see optimizations to the things we are already doing, before we start adding special cases.
> Also, the way gdiplus currently scales the source is far from optimal; using GDI instead is much faster and produces similar results.
Well, GDI is using a DIB engine so we should be able to match whatever it's doing.