I was reading through dlls/dsound/mixer.c and came across the function DSOUND_MixerVol(), which really stood out. The purpose of the code is to apply a volume amplification by multiplying the channel data by the amplification factor. What *really* struck me was the parallelism that could be achieved using SIMD instructions (I am the author of the SIMDx86 project on SourceForge):
    for (i = 0; i < len; i += 2) {
        *bps = (*bps * dsb->cvolpan.dwTotalLeftAmpFactor) >> 16;
        bps++;
    }

The segment below it, for stereo sound, is not shown but is basically the same thing.
This could be done extremely easily using MMX, which could process 32 samples per loop iteration, or SSE2, which could process 64 samples per loop (actually more, but cache lines aren't that big yet). With the addition of some aggressive prefetching, there could be a significant speedup. Now, here is the question:
What is WINE's policy toward:
1) Optimizations, rather than bugfix patches?
2) Inline assembly language (allowed, disallowed)? Inline assembly language for the purpose of optimization (instead of, say, fixing up the stack)?
3) Function pointers to select optimal code paths?
4) Detecting/using enhanced x86 instruction sets (e.g. MMX/SSE2)?

Is there still an effort to make WINE work with non-x86? (DarWINE or something)?
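For readers unfamiliar with what questions 3 and 4 would look like in practice, here is a minimal sketch of selecting a code path through a function pointer after a runtime CPU check. It is not taken from Wine. All names (`mix_vol_fn`, `mix_vol_c`, `mix_vol_sse2`, `cpu_has_sse2`, `select_mix_vol`) are hypothetical, the SSE2 routine is only a stand-in that forwards to the C version, and the detection relies on the GCC/Clang built-in `__builtin_cpu_supports`, which is an assumption about the compiler in use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical signature shared by every volume-scaling variant. */
typedef void (*mix_vol_fn)(int16_t *samples, size_t count, uint16_t amp);

/* Portable C path: same arithmetic as the loop quoted above.
 * Assumes >> on a negative int is an arithmetic shift, which holds on
 * mainstream compilers but is implementation-defined in ISO C. */
static void mix_vol_c(int16_t *samples, size_t count, uint16_t amp)
{
    for (size_t i = 0; i < count; i++)
        samples[i] = (int16_t)(((int32_t)samples[i] * (int32_t)amp) >> 16);
}

/* Stand-in for a hand-tuned SSE2 routine. In this sketch it simply
 * forwards to the C version so the example stays portable. */
static void mix_vol_sse2(int16_t *samples, size_t count, uint16_t amp)
{
    mix_vol_c(samples, count, amp);
}

/* Runtime feature check; always reports 0 on non-x86 targets or on
 * compilers without the GCC-style built-ins. */
static int cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return 0;
#endif
}

/* Pick the implementation once; callers keep the returned pointer. */
mix_vol_fn select_mix_vol(void)
{
    return cpu_has_sse2() ? mix_vol_sse2 : mix_vol_c;
}
```

A caller would invoke `select_mix_vol()` once at start-up and call through the stored pointer afterwards, so the feature check stays out of the mixing loop.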
Hi,
Using MMX/SSE2 means it would be platform/architecture-specific code. We should be as platform-independent as possible.
Thanks, VJ
Which currently supported platform does not have MMX instructions? Is it a problem to detect whether the CPU has MMX and use it when possible? A speed improvement is always desirable.
Mirek
Think about non-x86 CPUs on which Wine(lib) is used too.
Roderick
On Sunday, 8 October 2006, 10:47, Mirek wrote:
Which currently supported platform does not have MMX instructions? Is it a problem to detect whether the CPU has MMX and use it when possible? A speed improvement is always desirable.
Shouldn't C code be writeable in a way that the compiler recognises what the code is up to and uses MMX and others automatically?
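As an illustration of that idea, the sketch below writes the volume loop in a shape that auto-vectorizers tend to handle well: a plain indexed loop over a contiguous array, a trip count known on entry, and a loop-invariant factor. The function name `apply_volume` and its signature are hypothetical, not Wine's. Whether a given compiler actually emits SIMD instructions for it depends on the compiler version, target and optimization flags, so treat this as a sketch rather than a guarantee.

```c
#include <stddef.h>
#include <stdint.h>

/* Scale 16-bit signed PCM samples by a 0..0xFFFF fixed-point factor.
 * Assumes >> on a negative int is an arithmetic shift (true on
 * mainstream compilers, implementation-defined in ISO C). */
void apply_volume(int16_t *samples, size_t count, uint16_t amp)
{
    const int32_t factor = amp;  /* loop-invariant, widened once */

    for (size_t i = 0; i < count; i++)
        samples[i] = (int16_t)((samples[i] * factor) >> 16);
}
```

With GCC, a vectorization report such as the one produced by `-fopt-info-vec` can show whether the loop was vectorized for a particular build.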
As mentioned in another part of this thread, asm optimizations in Wine aren't preferred.
If you want to optimize Wine's audio performance, this isn't really the area where you should look. The main bottlenecks are located in the interaction of dsound with Wine's audio layer (oss/alsa) and the native sound drivers (oss/alsa). That is the area which is far from optimal and causes bad sound quality, latencies and other issues.
Roderick
Isn't this just a matter of #ifdefs? I don't really think Wine should stick to the lowest common denominator.
I do think that such optimizations should be very well documented and in sync with the original code.
Stephen
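A minimal sketch of the #ifdef approach suggested above, assuming GCC or Clang (which define `__SSE2__` when SSE2 code generation is enabled). The function names `scale_scalar` and `scale_samples` are hypothetical and this is not Wine code. The SSE2 path uses intrinsics rather than inline assembly, and the portable loop doubles as both the fallback and the tail handler, which keeps the two paths in sync.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Portable reference path. Assumes arithmetic >> on negative ints. */
static void scale_scalar(int16_t *samples, size_t count, uint16_t amp)
{
    for (size_t i = 0; i < count; i++)
        samples[i] = (int16_t)(((int32_t)samples[i] * (int32_t)amp) >> 16);
}

void scale_samples(int16_t *samples, size_t count, uint16_t amp)
{
    size_t i = 0;

#if defined(__SSE2__)
    const __m128i vamp = _mm_set1_epi16((short)amp);

    /* Eight samples per iteration. The sample is signed but the factor
     * is unsigned, so take the unsigned high half of the product and
     * subtract the factor wherever the sample was negative. */
    for (; i + 8 <= count; i += 8) {
        __m128i v   = _mm_loadu_si128((const __m128i *)(samples + i));
        __m128i hi  = _mm_mulhi_epu16(v, vamp);
        __m128i neg = _mm_srai_epi16(v, 15);      /* all ones if v < 0 */
        __m128i fix = _mm_and_si128(neg, vamp);
        _mm_storeu_si128((__m128i *)(samples + i), _mm_sub_epi16(hi, fix));
    }
#endif

    /* Remaining samples, or everything when SSE2 is not compiled in. */
    scale_scalar(samples + i, count - i, amp);
}
```

On a build without SSE2 the preprocessor removes the vector loop entirely, so non-x86 targets compile only the portable code.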