[PATCH 0/2] MR8226: d3dx9: Replace libtxc_dxtn.

List overview All Threads

newer

older

[PATCH v2 0/2] MR8245:...

[PATCH v3 0/1] MR8228: mshtml: Fix...

Connor McAdams (＠cmcadams)

6 Jun 2025 6 Jun '25

1:18 p.m.

These patches replace the existing libtxc_dxtn texture compression/decompression source files with new libraries that can handle more formats. This is will be helpful when we begin to share code with d3dx10/d3dx11, as we will be able to use bcdec to decode all supported compressed formats, and stb_dxt to compress BC1-BC5.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/8226

Show replies by date

Connor McAdams

6 Jun 6 Jun

1:18 p.m.

New subject: [PATCH 1/2] d3dx9: Replace txc_fetch_dxtn with bcdec.

From: Connor McAdams cmcadams@codeweavers.com

diff --git a/dlls/d3dx9_24/Makefile.in b/dlls/d3dx9_24/Makefile.in index 3f3fc57bfb5..000bf47af30 100644 --- a/dlls/d3dx9_24/Makefile.in +++ b/dlls/d3dx9_24/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_25/Makefile.in b/dlls/d3dx9_25/Makefile.in index 93691cfb23d..6796f90b1c4 100644 --- a/dlls/d3dx9_25/Makefile.in +++ b/dlls/d3dx9_25/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_26/Makefile.in b/dlls/d3dx9_26/Makefile.in index b7369e7810e..cb6ff05ccbb 100644 --- a/dlls/d3dx9_26/Makefile.in +++ b/dlls/d3dx9_26/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_27/Makefile.in b/dlls/d3dx9_27/Makefile.in index da4e396db86..339bccaae96 100644 --- a/dlls/d3dx9_27/Makefile.in +++ b/dlls/d3dx9_27/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_28/Makefile.in b/dlls/d3dx9_28/Makefile.in index b321e3605af..d25dc56a7e7 100644 --- a/dlls/d3dx9_28/Makefile.in +++ b/dlls/d3dx9_28/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_29/Makefile.in b/dlls/d3dx9_29/Makefile.in index c1bee84ca0c..fe06e9b4a31 100644 --- a/dlls/d3dx9_29/Makefile.in +++ b/dlls/d3dx9_29/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_30/Makefile.in b/dlls/d3dx9_30/Makefile.in index da162dae334..882c2355521 100644 --- a/dlls/d3dx9_30/Makefile.in +++ b/dlls/d3dx9_30/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_31/Makefile.in b/dlls/d3dx9_31/Makefile.in index d973244f620..a48b829a26c 100644 --- a/dlls/d3dx9_31/Makefile.in +++ b/dlls/d3dx9_31/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_32/Makefile.in b/dlls/d3dx9_32/Makefile.in index 1b2fb450b92..e7b902765d4 100644 --- a/dlls/d3dx9_32/Makefile.in +++ b/dlls/d3dx9_32/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_33/Makefile.in b/dlls/d3dx9_33/Makefile.in index 038280385e3..34f63cf1819 100644 --- a/dlls/d3dx9_33/Makefile.in +++ b/dlls/d3dx9_33/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_34/Makefile.in b/dlls/d3dx9_34/Makefile.in index 1cb2deace74..23faf212b25 100644 --- a/dlls/d3dx9_34/Makefile.in +++ b/dlls/d3dx9_34/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_35/Makefile.in b/dlls/d3dx9_35/Makefile.in index f08b2ebc52e..e7c6cb997b7 100644 --- a/dlls/d3dx9_35/Makefile.in +++ b/dlls/d3dx9_35/Makefile.in @@ -24,7 +24,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_36/Makefile.in b/dlls/d3dx9_36/Makefile.in index 798ae4c2c1d..e21a80122ae 100644 --- a/dlls/d3dx9_36/Makefile.in +++ b/dlls/d3dx9_36/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_36/bcdec.h b/dlls/d3dx9_36/bcdec.h new file mode 100644 index 00000000000..4e4c7360507 --- /dev/null +++ b/dlls/d3dx9_36/bcdec.h @@ -0,0 +1,1328 @@ +/* bcdec.h - v0.96 + provides functions to decompress blocks of BC compressed images + written by Sergii "iOrange" Kudlai in 2022 + + This library does not allocate memory and is trying to use as less stack as possible + + The library was never optimized specifically for speed but for the overall size + it has zero external dependencies and is not using any runtime functions + + Supported BC formats: + BC1 (also known as DXT1) + it's "binary alpha" variant BC1A (DXT1A) + BC2 (also known as DXT3) + BC3 (also known as DXT5) + BC4 (also known as ATI1N) + BC5 (also known as ATI2N) + BC6H (HDR format) + BC7 + + BC1/BC2/BC3/BC7 are expected to decompress into 4*4 RGBA blocks 8bit per component (32bit pixel) + BC4/BC5 are expected to decompress into 4*4 R/RG blocks 8bit per component (8bit and 16bit pixel) + BC6H is expected to decompress into 4*4 RGB blocks of either 32bit float or 16bit "half" per + component (96bit or 48bit pixel) + + For more info, issues and suggestions please visit https://github.com/iOrange/bcdec + + CREDITS: + Aras Pranckevicius (@aras-p) - BC1/BC3 decoders optimizations (up to 3x the speed) + - BC6H/BC7 bits pulling routines optimizations + - optimized BC6H by moving unquantize out of the loop + - Split BC6H decompression function into 'half' and + 'float' variants + + bugfixes: + @linkmauve + + LICENSE: See end of file for license information. +*/ + +#ifndef BCDEC_HEADER_INCLUDED +#define BCDEC_HEADER_INCLUDED + +/* if BCDEC_STATIC causes problems, try defining BCDECDEF to 'inline' or 'static inline' */ +#ifndef BCDECDEF +#ifdef BCDEC_STATIC +#define BCDECDEF static +#else +#ifdef __cplusplus +#define BCDECDEF extern "C" +#else +#define BCDECDEF extern +#endif +#endif +#endif + +/* Used information sources: + https://docs.microsoft.com/en-us/windows/win32/direct3d10/d3d10-graphics-pro... + https://docs.microsoft.com/en-us/windows/win32/direct3d11/bc6h-format + https://docs.microsoft.com/en-us/windows/win32/direct3d11/bc7-format + https://docs.microsoft.com/en-us/windows/win32/direct3d11/bc7-format-mode-re... + + ! WARNING ! Khronos's BPTC partitions tables contain mistakes, do not use them! + https://www.khronos.org/registry/DataFormat/specs/1.1/dataformat.1.1.html#BP... + + ! Use tables from here instead ! + https://www.khronos.org/registry/OpenGL/extensions/ARB/ARB_texture_compressi... + + Leaving it here as it's a nice read + https://fgiesen.wordpress.com/2021/10/04/gpu-bcn-decoding/ + + Fast half to float function from here + https://gist.github.com/rygorous/2144712 +*/ + +#define BCDEC_BC1_BLOCK_SIZE 8 +#define BCDEC_BC2_BLOCK_SIZE 16 +#define BCDEC_BC3_BLOCK_SIZE 16 +#define BCDEC_BC4_BLOCK_SIZE 8 +#define BCDEC_BC5_BLOCK_SIZE 16 +#define BCDEC_BC6H_BLOCK_SIZE 16 +#define BCDEC_BC7_BLOCK_SIZE 16 + +#define BCDEC_BC1_COMPRESSED_SIZE(w, h) ((((w)>>2)*((h)>>2))*BCDEC_BC1_BLOCK_SIZE) +#define BCDEC_BC2_COMPRESSED_SIZE(w, h) ((((w)>>2)*((h)>>2))*BCDEC_BC2_BLOCK_SIZE) +#define BCDEC_BC3_COMPRESSED_SIZE(w, h) ((((w)>>2)*((h)>>2))*BCDEC_BC3_BLOCK_SIZE) +#define BCDEC_BC4_COMPRESSED_SIZE(w, h) ((((w)>>2)*((h)>>2))*BCDEC_BC4_BLOCK_SIZE) +#define BCDEC_BC5_COMPRESSED_SIZE(w, h) ((((w)>>2)*((h)>>2))*BCDEC_BC5_BLOCK_SIZE) +#define BCDEC_BC6H_COMPRESSED_SIZE(w, h) ((((w)>>2)*((h)>>2))*BCDEC_BC6H_BLOCK_SIZE) +#define BCDEC_BC7_COMPRESSED_SIZE(w, h) ((((w)>>2)*((h)>>2))*BCDEC_BC7_BLOCK_SIZE) + +BCDECDEF void bcdec_bc1(const void* compressedBlock, void* decompressedBlock, int destinationPitch); +BCDECDEF void bcdec_bc2(const void* compressedBlock, void* decompressedBlock, int destinationPitch); +BCDECDEF void bcdec_bc3(const void* compressedBlock, void* decompressedBlock, int destinationPitch); +BCDECDEF void bcdec_bc4(const void* compressedBlock, void* decompressedBlock, int destinationPitch); +BCDECDEF void bcdec_bc5(const void* compressedBlock, void* decompressedBlock, int destinationPitch); +BCDECDEF void bcdec_bc6h_float(const void* compressedBlock, void* decompressedBlock, int destinationPitch, int isSigned); +BCDECDEF void bcdec_bc6h_half(const void* compressedBlock, void* decompressedBlock, int destinationPitch, int isSigned); +BCDECDEF void bcdec_bc7(const void* compressedBlock, void* decompressedBlock, int destinationPitch); + +#endif /* BCDEC_HEADER_INCLUDED */ + +#ifdef BCDEC_IMPLEMENTATION + +static void bcdec__color_block(const void* compressedBlock, void* decompressedBlock, int destinationPitch, int onlyOpaqueMode) { + unsigned short c0, c1; + unsigned int refColors[4]; /* 0xAABBGGRR */ + unsigned char* dstColors; + unsigned int colorIndices; + int i, j, idx; + unsigned int r0, g0, b0, r1, g1, b1, r, g, b; + + c0 = ((unsigned short*)compressedBlock)[0]; + c1 = ((unsigned short*)compressedBlock)[1]; + + /* Expand 565 ref colors to 888 */ + r0 = (((c0 >> 11) & 0x1F) * 527 + 23) >> 6; + g0 = (((c0 >> 5) & 0x3F) * 259 + 33) >> 6; + b0 = ((c0 & 0x1F) * 527 + 23) >> 6; + refColors[0] = 0xFF000000 | (b0 << 16) | (g0 << 8) | r0; + + r1 = (((c1 >> 11) & 0x1F) * 527 + 23) >> 6; + g1 = (((c1 >> 5) & 0x3F) * 259 + 33) >> 6; + b1 = ((c1 & 0x1F) * 527 + 23) >> 6; + refColors[1] = 0xFF000000 | (b1 << 16) | (g1 << 8) | r1; + + if (c0 > c1 || onlyOpaqueMode) { /* Standard BC1 mode (also BC3 color block uses ONLY this mode) */ + /* color_2 = 2/3*color_0 + 1/3*color_1 + color_3 = 1/3*color_0 + 2/3*color_1 */ + r = (2 * r0 + r1 + 1) / 3; + g = (2 * g0 + g1 + 1) / 3; + b = (2 * b0 + b1 + 1) / 3; + refColors[2] = 0xFF000000 | (b << 16) | (g << 8) | r; + + r = (r0 + 2 * r1 + 1) / 3; + g = (g0 + 2 * g1 + 1) / 3; + b = (b0 + 2 * b1 + 1) / 3; + refColors[3] = 0xFF000000 | (b << 16) | (g << 8) | r; + } else { /* Quite rare BC1A mode */ + /* color_2 = 1/2*color_0 + 1/2*color_1; + color_3 = 0; */ + r = (r0 + r1 + 1) >> 1; + g = (g0 + g1 + 1) >> 1; + b = (b0 + b1 + 1) >> 1; + refColors[2] = 0xFF000000 | (b << 16) | (g << 8) | r; + + refColors[3] = 0x00000000; + } + + colorIndices = ((unsigned int*)compressedBlock)[1]; + + /* Fill out the decompressed color block */ + dstColors = (unsigned char*)decompressedBlock; + for (i = 0; i < 4; ++i) { + for (j = 0; j < 4; ++j) { + idx = colorIndices & 0x03; + ((unsigned int*)dstColors)[j] = refColors[idx]; + colorIndices >>= 2; + } + + dstColors += destinationPitch; + } +} + +static void bcdec__sharp_alpha_block(const void* compressedBlock, void* decompressedBlock, int destinationPitch) { + unsigned short* alpha; + unsigned char* decompressed; + int i, j; + + alpha = (unsigned short*)compressedBlock; + decompressed = (unsigned char*)decompressedBlock; + + for (i = 0; i < 4; ++i) { + for (j = 0; j < 4; ++j) { + decompressed[j * 4] = ((alpha[i] >> (4 * j)) & 0x0F) * 17; + } + + decompressed += destinationPitch; + } +} + +static void bcdec__smooth_alpha_block(const void* compressedBlock, void* decompressedBlock, int destinationPitch, int pixelSize) { + unsigned char* decompressed; + unsigned char alpha[8]; + int i, j; + unsigned long long block, indices; + + block = *(unsigned long long*)compressedBlock; + decompressed = (unsigned char*)decompressedBlock; + + alpha[0] = block & 0xFF; + alpha[1] = (block >> 8) & 0xFF; + + if (alpha[0] > alpha[1]) { + /* 6 interpolated alpha values. */ + alpha[2] = (6 * alpha[0] + alpha[1] + 1) / 7; /* 6/7*alpha_0 + 1/7*alpha_1 */ + alpha[3] = (5 * alpha[0] + 2 * alpha[1] + 1) / 7; /* 5/7*alpha_0 + 2/7*alpha_1 */ + alpha[4] = (4 * alpha[0] + 3 * alpha[1] + 1) / 7; /* 4/7*alpha_0 + 3/7*alpha_1 */ + alpha[5] = (3 * alpha[0] + 4 * alpha[1] + 1) / 7; /* 3/7*alpha_0 + 4/7*alpha_1 */ + alpha[6] = (2 * alpha[0] + 5 * alpha[1] + 1) / 7; /* 2/7*alpha_0 + 5/7*alpha_1 */ + alpha[7] = ( alpha[0] + 6 * alpha[1] + 1) / 7; /* 1/7*alpha_0 + 6/7*alpha_1 */ + } + else { + /* 4 interpolated alpha values. */ + alpha[2] = (4 * alpha[0] + alpha[1] + 1) / 5; /* 4/5*alpha_0 + 1/5*alpha_1 */ + alpha[3] = (3 * alpha[0] + 2 * alpha[1] + 1) / 5; /* 3/5*alpha_0 + 2/5*alpha_1 */ + alpha[4] = (2 * alpha[0] + 3 * alpha[1] + 1) / 5; /* 2/5*alpha_0 + 3/5*alpha_1 */ + alpha[5] = ( alpha[0] + 4 * alpha[1] + 1) / 5; /* 1/5*alpha_0 + 4/5*alpha_1 */ + alpha[6] = 0x00; + alpha[7] = 0xFF; + } + + indices = block >> 16; + for (i = 0; i < 4; ++i) { + for (j = 0; j < 4; ++j) { + decompressed[j * pixelSize] = alpha[indices & 0x07]; + indices >>= 3; + } + + decompressed += destinationPitch; + } +} + +typedef struct bcdec__bitstream { + unsigned long long low; + unsigned long long high; +} bcdec__bitstream_t; + +static int bcdec__bitstream_read_bits(bcdec__bitstream_t* bstream, int numBits) { + unsigned int mask = (1 << numBits) - 1; + /* Read the low N bits */ + unsigned int bits = (bstream->low & mask); + + bstream->low >>= numBits; + /* Put the low N bits of "high" into the high 64-N bits of "low". */ + bstream->low |= (bstream->high & mask) << (sizeof(bstream->high) * 8 - numBits); + bstream->high >>= numBits; + + return bits; +} + +static int bcdec__bitstream_read_bit(bcdec__bitstream_t* bstream) { + return bcdec__bitstream_read_bits(bstream, 1); +} + +/* reversed bits pulling, used in BC6H decoding + why ?? just why ??? */ +static int bcdec__bitstream_read_bits_r(bcdec__bitstream_t* bstream, int numBits) { + int bits = bcdec__bitstream_read_bits(bstream, numBits); + /* Reverse the bits. */ + int result = 0; + while (numBits--) { + result <<= 1; + result |= (bits & 1); + bits >>= 1; + } + return result; +} + + + +BCDECDEF void bcdec_bc1(const void* compressedBlock, void* decompressedBlock, int destinationPitch) { + bcdec__color_block(compressedBlock, decompressedBlock, destinationPitch, 0); +} + +BCDECDEF void bcdec_bc2(const void* compressedBlock, void* decompressedBlock, int destinationPitch) { + bcdec__color_block(((char*)compressedBlock) + 8, decompressedBlock, destinationPitch, 1); + bcdec__sharp_alpha_block(compressedBlock, ((char*)decompressedBlock) + 3, destinationPitch); +} + +BCDECDEF void bcdec_bc3(const void* compressedBlock, void* decompressedBlock, int destinationPitch) { + bcdec__color_block(((char*)compressedBlock) + 8, decompressedBlock, destinationPitch, 1); + bcdec__smooth_alpha_block(compressedBlock, ((char*)decompressedBlock) + 3, destinationPitch, 4); +} + +BCDECDEF void bcdec_bc4(const void* compressedBlock, void* decompressedBlock, int destinationPitch) { + bcdec__smooth_alpha_block(compressedBlock, decompressedBlock, destinationPitch, 1); +} + +BCDECDEF void bcdec_bc5(const void* compressedBlock, void* decompressedBlock, int destinationPitch) { + bcdec__smooth_alpha_block(compressedBlock, decompressedBlock, destinationPitch, 2); + bcdec__smooth_alpha_block(((char*)compressedBlock) + 8, ((char*)decompressedBlock) + 1, destinationPitch, 2); +} + +/* http://graphics.stanford.edu/~seander/bithacks.html#VariableSignExtend */ +static int bcdec__extend_sign(int val, int bits) { + return (val << (32 - bits)) >> (32 - bits); +} + +static int bcdec__transform_inverse(int val, int a0, int bits, int isSigned) { + /* If the precision of A0 is "p" bits, then the transform algorithm is: + B0 = (B0 + A0) & ((1 << p) - 1) */ + val = (val + a0) & ((1 << bits) - 1); + if (isSigned) { + val = bcdec__extend_sign(val, bits); + } + return val; +} + +/* pretty much copy-paste from documentation */ +static int bcdec__unquantize(int val, int bits, int isSigned) { + int unq, s = 0; + + if (!isSigned) { + if (bits >= 15) { + unq = val; + } else if (!val) { + unq = 0; + } else if (val == ((1 << bits) - 1)) { + unq = 0xFFFF; + } else { + unq = ((val << 16) + 0x8000) >> bits; + } + } else { + if (bits >= 16) { + unq = val; + } else { + if (val < 0) { + s = 1; + val = -val; + } + + if (val == 0) { + unq = 0; + } else if (val >= ((1 << (bits - 1)) - 1)) { + unq = 0x7FFF; + } else { + unq = ((val << 15) + 0x4000) >> (bits - 1); + } + + if (s) { + unq = -unq; + } + } + } + return unq; +} + +static int bcdec__interpolate(int a, int b, int* weights, int index) { + return (a * (64 - weights[index]) + b * weights[index] + 32) >> 6; +} + +static unsigned short bcdec__finish_unquantize(int val, int isSigned) { + int s; + + if (!isSigned) { + return (unsigned short)((val * 31) >> 6); /* scale the magnitude by 31 / 64 */ + } else { + val = (val < 0) ? -(((-val) * 31) >> 5) : (val * 31) >> 5; /* scale the magnitude by 31 / 32 */ + s = 0; + if (val < 0) { + s = 0x8000; + val = -val; + } + return (unsigned short)(s | val); + } +} + +/* modified half_to_float_fast4 from https://gist.github.com/rygorous/2144712 */ +static float bcdec__half_to_float_quick(unsigned short half) { + typedef union { + unsigned int u; + float f; + } FP32; + + static const FP32 magic = { 113 << 23 }; + static const unsigned int shifted_exp = 0x7c00 << 13; /* exponent mask after shift */ + FP32 o; + unsigned int exp; + + o.u = (half & 0x7fff) << 13; /* exponent/mantissa bits */ + exp = shifted_exp & o.u; /* just the exponent */ + o.u += (127 - 15) << 23; /* exponent adjust */ + + /* handle exponent special cases */ + if (exp == shifted_exp) { /* Inf/NaN? */ + o.u += (128 - 16) << 23; /* extra exp adjust */ + } else if (exp == 0) { /* Zero/Denormal? */ + o.u += 1 << 23; /* extra exp adjust */ + o.f -= magic.f; /* renormalize */ + } + + o.u |= (half & 0x8000) << 16; /* sign bit */ + return o.f; +} + +BCDECDEF void bcdec_bc6h_half(const void* compressedBlock, void* decompressedBlock, int destinationPitch, int isSigned) { + static char actual_bits_count[4][14] = { + { 10, 7, 11, 11, 11, 9, 8, 8, 8, 6, 10, 11, 12, 16 }, /* W */ + { 5, 6, 5, 4, 4, 5, 6, 5, 5, 6, 10, 9, 8, 4 }, /* dR */ + { 5, 6, 4, 5, 4, 5, 5, 6, 5, 6, 10, 9, 8, 4 }, /* dG */ + { 5, 6, 4, 4, 5, 5, 5, 5, 6, 6, 10, 9, 8, 4 } /* dB */ + }; + + /* There are 32 possible partition sets for a two-region tile. + Each 4x4 block represents a single shape. + Here also every fix-up index has MSB bit set. */ + static unsigned char partition_sets[32][4][4] = { + { {128, 0, 1, 1}, {0, 0, 1, 1}, { 0, 0, 1, 1}, {0, 0, 1, 129} }, /* 0 */ + { {128, 0, 0, 1}, {0, 0, 0, 1}, { 0, 0, 0, 1}, {0, 0, 0, 129} }, /* 1 */ + { {128, 1, 1, 1}, {0, 1, 1, 1}, { 0, 1, 1, 1}, {0, 1, 1, 129} }, /* 2 */ + { {128, 0, 0, 1}, {0, 0, 1, 1}, { 0, 0, 1, 1}, {0, 1, 1, 129} }, /* 3 */ + { {128, 0, 0, 0}, {0, 0, 0, 1}, { 0, 0, 0, 1}, {0, 0, 1, 129} }, /* 4 */ + { {128, 0, 1, 1}, {0, 1, 1, 1}, { 0, 1, 1, 1}, {1, 1, 1, 129} }, /* 5 */ + { {128, 0, 0, 1}, {0, 0, 1, 1}, { 0, 1, 1, 1}, {1, 1, 1, 129} }, /* 6 */ + { {128, 0, 0, 0}, {0, 0, 0, 1}, { 0, 0, 1, 1}, {0, 1, 1, 129} }, /* 7 */ + { {128, 0, 0, 0}, {0, 0, 0, 0}, { 0, 0, 0, 1}, {0, 0, 1, 129} }, /* 8 */ + { {128, 0, 1, 1}, {0, 1, 1, 1}, { 1, 1, 1, 1}, {1, 1, 1, 129} }, /* 9 */ + { {128, 0, 0, 0}, {0, 0, 0, 1}, { 0, 1, 1, 1}, {1, 1, 1, 129} }, /* 10 */ + { {128, 0, 0, 0}, {0, 0, 0, 0}, { 0, 0, 0, 1}, {0, 1, 1, 129} }, /* 11 */ + { {128, 0, 0, 1}, {0, 1, 1, 1}, { 1, 1, 1, 1}, {1, 1, 1, 129} }, /* 12 */ + { {128, 0, 0, 0}, {0, 0, 0, 0}, { 1, 1, 1, 1}, {1, 1, 1, 129} }, /* 13 */ + { {128, 0, 0, 0}, {1, 1, 1, 1}, { 1, 1, 1, 1}, {1, 1, 1, 129} }, /* 14 */ + { {128, 0, 0, 0}, {0, 0, 0, 0}, { 0, 0, 0, 0}, {1, 1, 1, 129} }, /* 15 */ + { {128, 0, 0, 0}, {1, 0, 0, 0}, { 1, 1, 1, 0}, {1, 1, 1, 129} }, /* 16 */ + { {128, 1, 129, 1}, {0, 0, 0, 1}, { 0, 0, 0, 0}, {0, 0, 0, 0} }, /* 17 */ + { {128, 0, 0, 0}, {0, 0, 0, 0}, {129, 0, 0, 0}, {1, 1, 1, 0} }, /* 18 */ + { {128, 1, 129, 1}, {0, 0, 1, 1}, { 0, 0, 0, 1}, {0, 0, 0, 0} }, /* 19 */ + { {128, 0, 129, 1}, {0, 0, 0, 1}, { 0, 0, 0, 0}, {0, 0, 0, 0} }, /* 20 */ + { {128, 0, 0, 0}, {1, 0, 0, 0}, {129, 1, 0, 0}, {1, 1, 1, 0} }, /* 21 */ + { {128, 0, 0, 0}, {0, 0, 0, 0}, {129, 0, 0, 0}, {1, 1, 0, 0} }, /* 22 */ + { {128, 1, 1, 1}, {0, 0, 1, 1}, { 0, 0, 1, 1}, {0, 0, 0, 129} }, /* 23 */ + { {128, 0, 129, 1}, {0, 0, 0, 1}, { 0, 0, 0, 1}, {0, 0, 0, 0} }, /* 24 */ + { {128, 0, 0, 0}, {1, 0, 0, 0}, {129, 0, 0, 0}, {1, 1, 0, 0} }, /* 25 */ + { {128, 1, 129, 0}, {0, 1, 1, 0}, { 0, 1, 1, 0}, {0, 1, 1, 0} }, /* 26 */ + { {128, 0, 129, 1}, {0, 1, 1, 0}, { 0, 1, 1, 0}, {1, 1, 0, 0} }, /* 27 */ + { {128, 0, 0, 1}, {0, 1, 1, 1}, {129, 1, 1, 0}, {1, 0, 0, 0} }, /* 28 */ + { {128, 0, 0, 0}, {1, 1, 1, 1}, {129, 1, 1, 1}, {0, 0, 0, 0} }, /* 29 */ + { {128, 1, 129, 1}, {0, 0, 0, 1}, { 1, 0, 0, 0}, {1, 1, 1, 0} }, /* 30 */ + { {128, 0, 129, 1}, {1, 0, 0, 1}, { 1, 0, 0, 1}, {1, 1, 0, 0} } /* 31 */ + }; + + static int aWeight3[8] = { 0, 9, 18, 27, 37, 46, 55, 64 }; + static int aWeight4[16] = { 0, 4, 9, 13, 17, 21, 26, 30, 34, 38, 43, 47, 51, 55, 60, 64 }; + + bcdec__bitstream_t bstream; + int mode, partition, numPartitions, i, j, partitionSet, indexBits, index, ep_i, actualBits0Mode; + int r[4], g[4], b[4]; /* wxyz */ + unsigned short* decompressed; + int* weights; + + decompressed = (unsigned short*)decompressedBlock; + + bstream.low = ((unsigned long long*)compressedBlock)[0]; + bstream.high = ((unsigned long long*)compressedBlock)[1]; + + r[0] = r[1] = r[2] = r[3] = 0; + g[0] = g[1] = g[2] = g[3] = 0; + b[0] = b[1] = b[2] = b[3] = 0; + + mode = bcdec__bitstream_read_bits(&bstream, 2); + if (mode > 1) { + mode |= (bcdec__bitstream_read_bits(&bstream, 3) << 2); + } + + /* modes >= 11 (10 in my code) are using 0 one, others will read it from the bitstream */ + partition = 0; + + switch (mode) { + /* mode 1 */ + case 0b00: { + /* Partitition indices: 46 bits + Partition: 5 bits + Color Endpoints: 75 bits (10.555, 10.555, 10.555) */ + g[2] |= bcdec__bitstream_read_bit(&bstream) << 4; /* gy[4] */ + b[2] |= bcdec__bitstream_read_bit(&bstream) << 4; /* by[4] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 4; /* bz[4] */ + r[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* rw[9:0] */ + g[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* gw[9:0] */ + b[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* bw[9:0] */ + r[1] |= bcdec__bitstream_read_bits(&bstream, 5); /* rx[4:0] */ + g[3] |= bcdec__bitstream_read_bit(&bstream) << 4; /* gz[4] */ + g[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* gy[3:0] */ + g[1] |= bcdec__bitstream_read_bits(&bstream, 5); /* gx[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream); /* bz[0] */ + g[3] |= bcdec__bitstream_read_bits(&bstream, 4); /* gz[3:0] */ + b[1] |= bcdec__bitstream_read_bits(&bstream, 5); /* bx[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 1; /* bz[1] */ + b[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* by[3:0] */ + r[2] |= bcdec__bitstream_read_bits(&bstream, 5); /* ry[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 2; /* bz[2] */ + r[3] |= bcdec__bitstream_read_bits(&bstream, 5); /* rz[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 3; /* bz[3] */ + partition = bcdec__bitstream_read_bits(&bstream, 5); /* d[4:0] */ + mode = 0; + } break; + + /* mode 2 */ + case 0b01: { + /* Partitition indices: 46 bits + Partition: 5 bits + Color Endpoints: 75 bits (7666, 7666, 7666) */ + g[2] |= bcdec__bitstream_read_bit(&bstream) << 5; /* gy[5] */ + g[3] |= bcdec__bitstream_read_bit(&bstream) << 4; /* gz[4] */ + g[3] |= bcdec__bitstream_read_bit(&bstream) << 5; /* gz[5] */ + r[0] |= bcdec__bitstream_read_bits(&bstream, 7); /* rw[6:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream); /* bz[0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 1; /* bz[1] */ + b[2] |= bcdec__bitstream_read_bit(&bstream) << 4; /* by[4] */ + g[0] |= bcdec__bitstream_read_bits(&bstream, 7); /* gw[6:0] */ + b[2] |= bcdec__bitstream_read_bit(&bstream) << 5; /* by[5] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 2; /* bz[2] */ + g[2] |= bcdec__bitstream_read_bit(&bstream) << 4; /* gy[4] */ + b[0] |= bcdec__bitstream_read_bits(&bstream, 7); /* bw[6:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 3; /* bz[3] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 5; /* bz[5] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 4; /* bz[4] */ + r[1] |= bcdec__bitstream_read_bits(&bstream, 6); /* rx[5:0] */ + g[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* gy[3:0] */ + g[1] |= bcdec__bitstream_read_bits(&bstream, 6); /* gx[5:0] */ + g[3] |= bcdec__bitstream_read_bits(&bstream, 4); /* gz[3:0] */ + b[1] |= bcdec__bitstream_read_bits(&bstream, 6); /* bx[5:0] */ + b[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* by[3:0] */ + r[2] |= bcdec__bitstream_read_bits(&bstream, 6); /* ry[5:0] */ + r[3] |= bcdec__bitstream_read_bits(&bstream, 6); /* rz[5:0] */ + partition = bcdec__bitstream_read_bits(&bstream, 5); /* d[4:0] */ + mode = 1; + } break; + + /* mode 3 */ + case 0b00010: { + /* Partitition indices: 46 bits + Partition: 5 bits + Color Endpoints: 72 bits (11.555, 11.444, 11.444) */ + r[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* rw[9:0] */ + g[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* gw[9:0] */ + b[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* bw[9:0] */ + r[1] |= bcdec__bitstream_read_bits(&bstream, 5); /* rx[4:0] */ + r[0] |= bcdec__bitstream_read_bit(&bstream) << 10; /* rw[10] */ + g[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* gy[3:0] */ + g[1] |= bcdec__bitstream_read_bits(&bstream, 4); /* gx[3:0] */ + g[0] |= bcdec__bitstream_read_bit(&bstream) << 10; /* gw[10] */ + b[3] |= bcdec__bitstream_read_bit(&bstream); /* bz[0] */ + g[3] |= bcdec__bitstream_read_bits(&bstream, 4); /* gz[3:0] */ + b[1] |= bcdec__bitstream_read_bits(&bstream, 4); /* bx[3:0] */ + b[0] |= bcdec__bitstream_read_bit(&bstream) << 10; /* bw[10] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 1; /* bz[1] */ + b[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* by[3:0] */ + r[2] |= bcdec__bitstream_read_bits(&bstream, 5); /* ry[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 2; /* bz[2] */ + r[3] |= bcdec__bitstream_read_bits(&bstream, 5); /* rz[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 3; /* bz[3] */ + partition = bcdec__bitstream_read_bits(&bstream, 5); /* d[4:0] */ + mode = 2; + } break; + + /* mode 4 */ + case 0b00110: { + /* Partitition indices: 46 bits + Partition: 5 bits + Color Endpoints: 72 bits (11.444, 11.555, 11.444) */ + r[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* rw[9:0] */ + g[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* gw[9:0] */ + b[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* bw[9:0] */ + r[1] |= bcdec__bitstream_read_bits(&bstream, 4); /* rx[3:0] */ + r[0] |= bcdec__bitstream_read_bit(&bstream) << 10; /* rw[10] */ + g[3] |= bcdec__bitstream_read_bit(&bstream) << 4; /* gz[4] */ + g[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* gy[3:0] */ + g[1] |= bcdec__bitstream_read_bits(&bstream, 5); /* gx[4:0] */ + g[0] |= bcdec__bitstream_read_bit(&bstream) << 10; /* gw[10] */ + g[3] |= bcdec__bitstream_read_bits(&bstream, 4); /* gz[3:0] */ + b[1] |= bcdec__bitstream_read_bits(&bstream, 4); /* bx[3:0] */ + b[0] |= bcdec__bitstream_read_bit(&bstream) << 10; /* bw[10] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 1; /* bz[1] */ + b[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* by[3:0] */ + r[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* ry[3:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream); /* bz[0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 2; /* bz[2] */ + r[3] |= bcdec__bitstream_read_bits(&bstream, 4); /* rz[3:0] */ + g[2] |= bcdec__bitstream_read_bit(&bstream) << 4; /* gy[4] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 3; /* bz[3] */ + partition = bcdec__bitstream_read_bits(&bstream, 5); /* d[4:0] */ + mode = 3; + } break; + + /* mode 5 */ + case 0b01010: { + /* Partitition indices: 46 bits + Partition: 5 bits + Color Endpoints: 72 bits (11.444, 11.444, 11.555) */ + r[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* rw[9:0] */ + g[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* gw[9:0] */ + b[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* bw[9:0] */ + r[1] |= bcdec__bitstream_read_bits(&bstream, 4); /* rx[3:0] */ + r[0] |= bcdec__bitstream_read_bit(&bstream) << 10; /* rw[10] */ + b[2] |= bcdec__bitstream_read_bit(&bstream) << 4; /* by[4] */ + g[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* gy[3:0] */ + g[1] |= bcdec__bitstream_read_bits(&bstream, 4); /* gx[3:0] */ + g[0] |= bcdec__bitstream_read_bit(&bstream) << 10; /* gw[10] */ + b[3] |= bcdec__bitstream_read_bit(&bstream); /* bz[0] */ + g[3] |= bcdec__bitstream_read_bits(&bstream, 4); /* gz[3:0] */ + b[1] |= bcdec__bitstream_read_bits(&bstream, 5); /* bx[4:0] */ + b[0] |= bcdec__bitstream_read_bit(&bstream) << 10; /* bw[10] */ + b[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* by[3:0] */ + r[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* ry[3:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 1; /* bz[1] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 2; /* bz[2] */ + r[3] |= bcdec__bitstream_read_bits(&bstream, 4); /* rz[3:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 4; /* bz[4] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 3; /* bz[3] */ + partition = bcdec__bitstream_read_bits(&bstream, 5); /* d[4:0] */ + mode = 4; + } break; + + /* mode 6 */ + case 0b01110: { + /* Partitition indices: 46 bits + Partition: 5 bits + Color Endpoints: 72 bits (9555, 9555, 9555) */ + r[0] |= bcdec__bitstream_read_bits(&bstream, 9); /* rw[8:0] */ + b[2] |= bcdec__bitstream_read_bit(&bstream) << 4; /* by[4] */ + g[0] |= bcdec__bitstream_read_bits(&bstream, 9); /* gw[8:0] */ + g[2] |= bcdec__bitstream_read_bit(&bstream) << 4; /* gy[4] */ + b[0] |= bcdec__bitstream_read_bits(&bstream, 9); /* bw[8:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 4; /* bz[4] */ + r[1] |= bcdec__bitstream_read_bits(&bstream, 5); /* rx[4:0] */ + g[3] |= bcdec__bitstream_read_bit(&bstream) << 4; /* gz[4] */ + g[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* gy[3:0] */ + g[1] |= bcdec__bitstream_read_bits(&bstream, 5); /* gx[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream); /* bz[0] */ + g[3] |= bcdec__bitstream_read_bits(&bstream, 4); /* gx[3:0] */ + b[1] |= bcdec__bitstream_read_bits(&bstream, 5); /* bx[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 1; /* bz[1] */ + b[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* by[3:0] */ + r[2] |= bcdec__bitstream_read_bits(&bstream, 5); /* ry[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 2; /* bz[2] */ + r[3] |= bcdec__bitstream_read_bits(&bstream, 5); /* rz[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 3; /* bz[3] */ + partition = bcdec__bitstream_read_bits(&bstream, 5); /* d[4:0] */ + mode = 5; + } break; + + /* mode 7 */ + case 0b10010: { + /* Partitition indices: 46 bits + Partition: 5 bits + Color Endpoints: 72 bits (8666, 8555, 8555) */ + r[0] |= bcdec__bitstream_read_bits(&bstream, 8); /* rw[7:0] */ + g[3] |= bcdec__bitstream_read_bit(&bstream) << 4; /* gz[4] */ + b[2] |= bcdec__bitstream_read_bit(&bstream) << 4; /* by[4] */ + g[0] |= bcdec__bitstream_read_bits(&bstream, 8); /* gw[7:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 2; /* bz[2] */ + g[2] |= bcdec__bitstream_read_bit(&bstream) << 4; /* gy[4] */ + b[0] |= bcdec__bitstream_read_bits(&bstream, 8); /* bw[7:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 3; /* bz[3] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 4; /* bz[4] */ + r[1] |= bcdec__bitstream_read_bits(&bstream, 6); /* rx[5:0] */ + g[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* gy[3:0] */ + g[1] |= bcdec__bitstream_read_bits(&bstream, 5); /* gx[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream); /* bz[0] */ + g[3] |= bcdec__bitstream_read_bits(&bstream, 4); /* gz[3:0] */ + b[1] |= bcdec__bitstream_read_bits(&bstream, 5); /* bx[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 1; /* bz[1] */ + b[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* by[3:0] */ + r[2] |= bcdec__bitstream_read_bits(&bstream, 6); /* ry[5:0] */ + r[3] |= bcdec__bitstream_read_bits(&bstream, 6); /* rz[5:0] */ + partition = bcdec__bitstream_read_bits(&bstream, 5); /* d[4:0] */ + mode = 6; + } break; + + /* mode 8 */ + case 0b10110: { + /* Partitition indices: 46 bits + Partition: 5 bits + Color Endpoints: 72 bits (8555, 8666, 8555) */ + r[0] |= bcdec__bitstream_read_bits(&bstream, 8); /* rw[7:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream); /* bz[0] */ + b[2] |= bcdec__bitstream_read_bit(&bstream) << 4; /* by[4] */ + g[0] |= bcdec__bitstream_read_bits(&bstream, 8); /* gw[7:0] */ + g[2] |= bcdec__bitstream_read_bit(&bstream) << 5; /* gy[5] */ + g[2] |= bcdec__bitstream_read_bit(&bstream) << 4; /* gy[4] */ + b[0] |= bcdec__bitstream_read_bits(&bstream, 8); /* bw[7:0] */ + g[3] |= bcdec__bitstream_read_bit(&bstream) << 5; /* gz[5] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 4; /* bz[4] */ + r[1] |= bcdec__bitstream_read_bits(&bstream, 5); /* rx[4:0] */ + g[3] |= bcdec__bitstream_read_bit(&bstream) << 4; /* gz[4] */ + g[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* gy[3:0] */ + g[1] |= bcdec__bitstream_read_bits(&bstream, 6); /* gx[5:0] */ + g[3] |= bcdec__bitstream_read_bits(&bstream, 4); /* zx[3:0] */ + b[1] |= bcdec__bitstream_read_bits(&bstream, 5); /* bx[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 1; /* bz[1] */ + b[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* by[3:0] */ + r[2] |= bcdec__bitstream_read_bits(&bstream, 5); /* ry[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 2; /* bz[2] */ + r[3] |= bcdec__bitstream_read_bits(&bstream, 5); /* rz[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 3; /* bz[3] */ + partition = bcdec__bitstream_read_bits(&bstream, 5); /* d[4:0] */ + mode = 7; + } break; + + /* mode 9 */ + case 0b11010: { + /* Partitition indices: 46 bits + Partition: 5 bits + Color Endpoints: 72 bits (8555, 8555, 8666) */ + r[0] |= bcdec__bitstream_read_bits(&bstream, 8); /* rw[7:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 1; /* bz[1] */ + b[2] |= bcdec__bitstream_read_bit(&bstream) << 4; /* by[4] */ + g[0] |= bcdec__bitstream_read_bits(&bstream, 8); /* gw[7:0] */ + b[2] |= bcdec__bitstream_read_bit(&bstream) << 5; /* by[5] */ + g[2] |= bcdec__bitstream_read_bit(&bstream) << 4; /* gy[4] */ + b[0] |= bcdec__bitstream_read_bits(&bstream, 8); /* bw[7:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 5; /* bz[5] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 4; /* bz[4] */ + r[1] |= bcdec__bitstream_read_bits(&bstream, 5); /* bw[4:0] */ + g[3] |= bcdec__bitstream_read_bit(&bstream) << 4; /* gz[4] */ + g[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* gy[3:0] */ + g[1] |= bcdec__bitstream_read_bits(&bstream, 5); /* gx[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream); /* bz[0] */ + g[3] |= bcdec__bitstream_read_bits(&bstream, 4); /* gz[3:0] */ + b[1] |= bcdec__bitstream_read_bits(&bstream, 6); /* bx[5:0] */ + b[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* by[3:0] */ + r[2] |= bcdec__bitstream_read_bits(&bstream, 5); /* ry[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 2; /* bz[2] */ + r[3] |= bcdec__bitstream_read_bits(&bstream, 5); /* rz[4:0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 3; /* bz[3] */ + partition = bcdec__bitstream_read_bits(&bstream, 5); /* d[4:0] */ + mode = 8; + } break; + + /* mode 10 */ + case 0b11110: { + /* Partitition indices: 46 bits + Partition: 5 bits + Color Endpoints: 72 bits (6666, 6666, 6666) */ + r[0] |= bcdec__bitstream_read_bits(&bstream, 6); /* rw[5:0] */ + g[3] |= bcdec__bitstream_read_bit(&bstream) << 4; /* gz[4] */ + b[3] |= bcdec__bitstream_read_bit(&bstream); /* bz[0] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 1; /* bz[1] */ + b[2] |= bcdec__bitstream_read_bit(&bstream) << 4; /* by[4] */ + g[0] |= bcdec__bitstream_read_bits(&bstream, 6); /* gw[5:0] */ + g[2] |= bcdec__bitstream_read_bit(&bstream) << 5; /* gy[5] */ + b[2] |= bcdec__bitstream_read_bit(&bstream) << 5; /* by[5] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 2; /* bz[2] */ + g[2] |= bcdec__bitstream_read_bit(&bstream) << 4; /* gy[4] */ + b[0] |= bcdec__bitstream_read_bits(&bstream, 6); /* bw[5:0] */ + g[3] |= bcdec__bitstream_read_bit(&bstream) << 5; /* gz[5] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 3; /* bz[3] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 5; /* bz[5] */ + b[3] |= bcdec__bitstream_read_bit(&bstream) << 4; /* bz[4] */ + r[1] |= bcdec__bitstream_read_bits(&bstream, 6); /* rx[5:0] */ + g[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* gy[3:0] */ + g[1] |= bcdec__bitstream_read_bits(&bstream, 6); /* gx[5:0] */ + g[3] |= bcdec__bitstream_read_bits(&bstream, 4); /* gz[3:0] */ + b[1] |= bcdec__bitstream_read_bits(&bstream, 6); /* bx[5:0] */ + b[2] |= bcdec__bitstream_read_bits(&bstream, 4); /* by[3:0] */ + r[2] |= bcdec__bitstream_read_bits(&bstream, 6); /* ry[5:0] */ + r[3] |= bcdec__bitstream_read_bits(&bstream, 6); /* rz[5:0] */ + partition = bcdec__bitstream_read_bits(&bstream, 5); /* d[4:0] */ + mode = 9; + } break; + + /* mode 11 */ + case 0b00011: { + /* Partitition indices: 63 bits + Partition: 0 bits + Color Endpoints: 60 bits (10.10, 10.10, 10.10) */ + r[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* rw[9:0] */ + g[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* gw[9:0] */ + b[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* bw[9:0] */ + r[1] |= bcdec__bitstream_read_bits(&bstream, 10); /* rx[9:0] */ + g[1] |= bcdec__bitstream_read_bits(&bstream, 10); /* gx[9:0] */ + b[1] |= bcdec__bitstream_read_bits(&bstream, 10); /* bx[9:0] */ + mode = 10; + } break; + + /* mode 12 */ + case 0b00111: { + /* Partitition indices: 63 bits + Partition: 0 bits + Color Endpoints: 60 bits (11.9, 11.9, 11.9) */ + r[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* rw[9:0] */ + g[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* gw[9:0] */ + b[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* bw[9:0] */ + r[1] |= bcdec__bitstream_read_bits(&bstream, 9); /* rx[8:0] */ + r[0] |= bcdec__bitstream_read_bit(&bstream) << 10; /* rw[10] */ + g[1] |= bcdec__bitstream_read_bits(&bstream, 9); /* gx[8:0] */ + g[0] |= bcdec__bitstream_read_bit(&bstream) << 10; /* gw[10] */ + b[1] |= bcdec__bitstream_read_bits(&bstream, 9); /* bx[8:0] */ + b[0] |= bcdec__bitstream_read_bit(&bstream) << 10; /* bw[10] */ + mode = 11; + } break; + + /* mode 13 */ + case 0b01011: { + /* Partitition indices: 63 bits + Partition: 0 bits + Color Endpoints: 60 bits (12.8, 12.8, 12.8) */ + r[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* rw[9:0] */ + g[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* gw[9:0] */ + b[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* bw[9:0] */ + r[1] |= bcdec__bitstream_read_bits(&bstream, 8); /* rx[7:0] */ + r[0] |= bcdec__bitstream_read_bits_r(&bstream, 2) << 10;/* rx[10:11] */ + g[1] |= bcdec__bitstream_read_bits(&bstream, 8); /* gx[7:0] */ + g[0] |= bcdec__bitstream_read_bits_r(&bstream, 2) << 10;/* gx[10:11] */ + b[1] |= bcdec__bitstream_read_bits(&bstream, 8); /* bx[7:0] */ + b[0] |= bcdec__bitstream_read_bits_r(&bstream, 2) << 10;/* bx[10:11] */ + mode = 12; + } break; + + /* mode 14 */ + case 0b01111: { + /* Partitition indices: 63 bits + Partition: 0 bits + Color Endpoints: 60 bits (16.4, 16.4, 16.4) */ + r[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* rw[9:0] */ + g[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* gw[9:0] */ + b[0] |= bcdec__bitstream_read_bits(&bstream, 10); /* bw[9:0] */ + r[1] |= bcdec__bitstream_read_bits(&bstream, 4); /* rx[3:0] */ + r[0] |= bcdec__bitstream_read_bits_r(&bstream, 6) << 10;/* rw[10:15] */ + g[1] |= bcdec__bitstream_read_bits(&bstream, 4); /* gx[3:0] */ + g[0] |= bcdec__bitstream_read_bits_r(&bstream, 6) << 10;/* gw[10:15] */ + b[1] |= bcdec__bitstream_read_bits(&bstream, 4); /* bx[3:0] */ + b[0] |= bcdec__bitstream_read_bits_r(&bstream, 6) << 10;/* bw[10:15] */ + mode = 13; + } break; + + default: { + /* Modes 10011, 10111, 11011, and 11111 (not shown) are reserved. + Do not use these in your encoder. If the hardware is passed blocks + with one of these modes specified, the resulting decompressed block + must contain all zeroes in all channels except for the alpha channel. */ + for (i = 0; i < 4; ++i) { + for (j = 0; j < 4; ++j) { + decompressed[j * 3 + 0] = 0; + decompressed[j * 3 + 1] = 0; + decompressed[j * 3 + 2] = 0; + } + decompressed += destinationPitch; + } + + return; + } + } + + numPartitions = (mode >= 10) ? 0 : 1; + + actualBits0Mode = actual_bits_count[0][mode]; + if (isSigned) { + r[0] = bcdec__extend_sign(r[0], actualBits0Mode); + g[0] = bcdec__extend_sign(g[0], actualBits0Mode); + b[0] = bcdec__extend_sign(b[0], actualBits0Mode); + } + + /* Mode 11 (like Mode 10) does not use delta compression, + and instead stores both color endpoints explicitly. */ + if ((mode != 9 && mode != 10) || isSigned) { + for (i = 1; i < (numPartitions + 1) * 2; ++i) { + r[i] = bcdec__extend_sign(r[i], actual_bits_count[1][mode]); + g[i] = bcdec__extend_sign(g[i], actual_bits_count[2][mode]); + b[i] = bcdec__extend_sign(b[i], actual_bits_count[3][mode]); + } + } + + if (mode != 9 && mode != 10) { + for (i = 1; i < (numPartitions + 1) * 2; ++i) { + r[i] = bcdec__transform_inverse(r[i], r[0], actualBits0Mode, isSigned); + g[i] = bcdec__transform_inverse(g[i], g[0], actualBits0Mode, isSigned); + b[i] = bcdec__transform_inverse(b[i], b[0], actualBits0Mode, isSigned); + } + } + + for (i = 0; i < (numPartitions + 1) * 2; ++i) { + r[i] = bcdec__unquantize(r[i], actualBits0Mode, isSigned); + g[i] = bcdec__unquantize(g[i], actualBits0Mode, isSigned); + b[i] = bcdec__unquantize(b[i], actualBits0Mode, isSigned); + } + + weights = (mode >= 10) ? aWeight4 : aWeight3; + for (i = 0; i < 4; ++i) { + for (j = 0; j < 4; ++j) { + partitionSet = (mode >= 10) ? ((i|j) ? 0 : 128) : partition_sets[partition][i][j]; + + indexBits = (mode >= 10) ? 4 : 3; + /* fix-up index is specified with one less bit */ + /* The fix-up index for subset 0 is always index 0 */ + if (partitionSet & 0x80) { + indexBits--; + } + partitionSet &= 0x01; + + index = bcdec__bitstream_read_bits(&bstream, indexBits); + + ep_i = partitionSet * 2; + decompressed[j * 3 + 0] = bcdec__finish_unquantize( + bcdec__interpolate(r[ep_i], r[ep_i+1], weights, index), isSigned); + decompressed[j * 3 + 1] = bcdec__finish_unquantize( + bcdec__interpolate(g[ep_i], g[ep_i+1], weights, index), isSigned); + decompressed[j * 3 + 2] = bcdec__finish_unquantize( + bcdec__interpolate(b[ep_i], b[ep_i+1], weights, index), isSigned); + } + + decompressed += destinationPitch; + } +} + +BCDECDEF void bcdec_bc6h_float(const void* compressedBlock, void* decompressedBlock, int destinationPitch, int isSigned) { + unsigned short block[16*3]; + float* decompressed; + const unsigned short* b; + int i, j; + + bcdec_bc6h_half(compressedBlock, block, 4*3, isSigned); + b = block; + decompressed = (float*)decompressedBlock; + for (i = 0; i < 4; ++i) { + for (j = 0; j < 4; ++j) { + decompressed[j * 3 + 0] = bcdec__half_to_float_quick(*b++); + decompressed[j * 3 + 1] = bcdec__half_to_float_quick(*b++); + decompressed[j * 3 + 2] = bcdec__half_to_float_quick(*b++); + } + decompressed += destinationPitch; + } +} + +static void bcdec__swap_values(int* a, int* b) { + a[0] ^= b[0], b[0] ^= a[0], a[0] ^= b[0]; +} + +BCDECDEF void bcdec_bc7(const void* compressedBlock, void* decompressedBlock, int destinationPitch) { + static char actual_bits_count[2][8] = { + { 4, 6, 5, 7, 5, 7, 7, 5 }, /* RGBA */ + { 0, 0, 0, 0, 6, 8, 7, 5 }, /* Alpha */ + }; + + /* There are 64 possible partition sets for a two-region tile. + Each 4x4 block represents a single shape. + Here also every fix-up index has MSB bit set. */ + static unsigned char partition_sets[2][64][4][4] = { + { /* Partition table for 2-subset BPTC */ + { {128, 0, 1, 1}, {0, 0, 1, 1}, { 0, 0, 1, 1}, {0, 0, 1, 129} }, /* 0 */ + { {128, 0, 0, 1}, {0, 0, 0, 1}, { 0, 0, 0, 1}, {0, 0, 0, 129} }, /* 1 */ + { {128, 1, 1, 1}, {0, 1, 1, 1}, { 0, 1, 1, 1}, {0, 1, 1, 129} }, /* 2 */ + { {128, 0, 0, 1}, {0, 0, 1, 1}, { 0, 0, 1, 1}, {0, 1, 1, 129} }, /* 3 */ + { {128, 0, 0, 0}, {0, 0, 0, 1}, { 0, 0, 0, 1}, {0, 0, 1, 129} }, /* 4 */ + { {128, 0, 1, 1}, {0, 1, 1, 1}, { 0, 1, 1, 1}, {1, 1, 1, 129} }, /* 5 */ + { {128, 0, 0, 1}, {0, 0, 1, 1}, { 0, 1, 1, 1}, {1, 1, 1, 129} }, /* 6 */ + { {128, 0, 0, 0}, {0, 0, 0, 1}, { 0, 0, 1, 1}, {0, 1, 1, 129} }, /* 7 */ + { {128, 0, 0, 0}, {0, 0, 0, 0}, { 0, 0, 0, 1}, {0, 0, 1, 129} }, /* 8 */ + { {128, 0, 1, 1}, {0, 1, 1, 1}, { 1, 1, 1, 1}, {1, 1, 1, 129} }, /* 9 */ + { {128, 0, 0, 0}, {0, 0, 0, 1}, { 0, 1, 1, 1}, {1, 1, 1, 129} }, /* 10 */ + { {128, 0, 0, 0}, {0, 0, 0, 0}, { 0, 0, 0, 1}, {0, 1, 1, 129} }, /* 11 */ + { {128, 0, 0, 1}, {0, 1, 1, 1}, { 1, 1, 1, 1}, {1, 1, 1, 129} }, /* 12 */ + { {128, 0, 0, 0}, {0, 0, 0, 0}, { 1, 1, 1, 1}, {1, 1, 1, 129} }, /* 13 */ + { {128, 0, 0, 0}, {1, 1, 1, 1}, { 1, 1, 1, 1}, {1, 1, 1, 129} }, /* 14 */ + { {128, 0, 0, 0}, {0, 0, 0, 0}, { 0, 0, 0, 0}, {1, 1, 1, 129} }, /* 15 */ + { {128, 0, 0, 0}, {1, 0, 0, 0}, { 1, 1, 1, 0}, {1, 1, 1, 129} }, /* 16 */ + { {128, 1, 129, 1}, {0, 0, 0, 1}, { 0, 0, 0, 0}, {0, 0, 0, 0} }, /* 17 */ + { {128, 0, 0, 0}, {0, 0, 0, 0}, {129, 0, 0, 0}, {1, 1, 1, 0} }, /* 18 */ + { {128, 1, 129, 1}, {0, 0, 1, 1}, { 0, 0, 0, 1}, {0, 0, 0, 0} }, /* 19 */ + { {128, 0, 129, 1}, {0, 0, 0, 1}, { 0, 0, 0, 0}, {0, 0, 0, 0} }, /* 20 */ + { {128, 0, 0, 0}, {1, 0, 0, 0}, {129, 1, 0, 0}, {1, 1, 1, 0} }, /* 21 */ + { {128, 0, 0, 0}, {0, 0, 0, 0}, {129, 0, 0, 0}, {1, 1, 0, 0} }, /* 22 */ + { {128, 1, 1, 1}, {0, 0, 1, 1}, { 0, 0, 1, 1}, {0, 0, 0, 129} }, /* 23 */ + { {128, 0, 129, 1}, {0, 0, 0, 1}, { 0, 0, 0, 1}, {0, 0, 0, 0} }, /* 24 */ + { {128, 0, 0, 0}, {1, 0, 0, 0}, {129, 0, 0, 0}, {1, 1, 0, 0} }, /* 25 */ + { {128, 1, 129, 0}, {0, 1, 1, 0}, { 0, 1, 1, 0}, {0, 1, 1, 0} }, /* 26 */ + { {128, 0, 129, 1}, {0, 1, 1, 0}, { 0, 1, 1, 0}, {1, 1, 0, 0} }, /* 27 */ + { {128, 0, 0, 1}, {0, 1, 1, 1}, {129, 1, 1, 0}, {1, 0, 0, 0} }, /* 28 */ + { {128, 0, 0, 0}, {1, 1, 1, 1}, {129, 1, 1, 1}, {0, 0, 0, 0} }, /* 29 */ + { {128, 1, 129, 1}, {0, 0, 0, 1}, { 1, 0, 0, 0}, {1, 1, 1, 0} }, /* 30 */ + { {128, 0, 129, 1}, {1, 0, 0, 1}, { 1, 0, 0, 1}, {1, 1, 0, 0} }, /* 31 */ + { {128, 1, 0, 1}, {0, 1, 0, 1}, { 0, 1, 0, 1}, {0, 1, 0, 129} }, /* 32 */ + { {128, 0, 0, 0}, {1, 1, 1, 1}, { 0, 0, 0, 0}, {1, 1, 1, 129} }, /* 33 */ + { {128, 1, 0, 1}, {1, 0, 129, 0}, { 0, 1, 0, 1}, {1, 0, 1, 0} }, /* 34 */ + { {128, 0, 1, 1}, {0, 0, 1, 1}, {129, 1, 0, 0}, {1, 1, 0, 0} }, /* 35 */ + { {128, 0, 129, 1}, {1, 1, 0, 0}, { 0, 0, 1, 1}, {1, 1, 0, 0} }, /* 36 */ + { {128, 1, 0, 1}, {0, 1, 0, 1}, {129, 0, 1, 0}, {1, 0, 1, 0} }, /* 37 */ + { {128, 1, 1, 0}, {1, 0, 0, 1}, { 0, 1, 1, 0}, {1, 0, 0, 129} }, /* 38 */ + { {128, 1, 0, 1}, {1, 0, 1, 0}, { 1, 0, 1, 0}, {0, 1, 0, 129} }, /* 39 */ + { {128, 1, 129, 1}, {0, 0, 1, 1}, { 1, 1, 0, 0}, {1, 1, 1, 0} }, /* 40 */ + { {128, 0, 0, 1}, {0, 0, 1, 1}, {129, 1, 0, 0}, {1, 0, 0, 0} }, /* 41 */ + { {128, 0, 129, 1}, {0, 0, 1, 0}, { 0, 1, 0, 0}, {1, 1, 0, 0} }, /* 42 */ + { {128, 0, 129, 1}, {1, 0, 1, 1}, { 1, 1, 0, 1}, {1, 1, 0, 0} }, /* 43 */ + { {128, 1, 129, 0}, {1, 0, 0, 1}, { 1, 0, 0, 1}, {0, 1, 1, 0} }, /* 44 */ + { {128, 0, 1, 1}, {1, 1, 0, 0}, { 1, 1, 0, 0}, {0, 0, 1, 129} }, /* 45 */ + { {128, 1, 1, 0}, {0, 1, 1, 0}, { 1, 0, 0, 1}, {1, 0, 0, 129} }, /* 46 */ + { {128, 0, 0, 0}, {0, 1, 129, 0}, { 0, 1, 1, 0}, {0, 0, 0, 0} }, /* 47 */ + { {128, 1, 0, 0}, {1, 1, 129, 0}, { 0, 1, 0, 0}, {0, 0, 0, 0} }, /* 48 */ + { {128, 0, 129, 0}, {0, 1, 1, 1}, { 0, 0, 1, 0}, {0, 0, 0, 0} }, /* 49 */ + { {128, 0, 0, 0}, {0, 0, 129, 0}, { 0, 1, 1, 1}, {0, 0, 1, 0} }, /* 50 */ + { {128, 0, 0, 0}, {0, 1, 0, 0}, {129, 1, 1, 0}, {0, 1, 0, 0} }, /* 51 */ + { {128, 1, 1, 0}, {1, 1, 0, 0}, { 1, 0, 0, 1}, {0, 0, 1, 129} }, /* 52 */ + { {128, 0, 1, 1}, {0, 1, 1, 0}, { 1, 1, 0, 0}, {1, 0, 0, 129} }, /* 53 */ + { {128, 1, 129, 0}, {0, 0, 1, 1}, { 1, 0, 0, 1}, {1, 1, 0, 0} }, /* 54 */ + { {128, 0, 129, 1}, {1, 0, 0, 1}, { 1, 1, 0, 0}, {0, 1, 1, 0} }, /* 55 */ + { {128, 1, 1, 0}, {1, 1, 0, 0}, { 1, 1, 0, 0}, {1, 0, 0, 129} }, /* 56 */ + { {128, 1, 1, 0}, {0, 0, 1, 1}, { 0, 0, 1, 1}, {1, 0, 0, 129} }, /* 57 */ + { {128, 1, 1, 1}, {1, 1, 1, 0}, { 1, 0, 0, 0}, {0, 0, 0, 129} }, /* 58 */ + { {128, 0, 0, 1}, {1, 0, 0, 0}, { 1, 1, 1, 0}, {0, 1, 1, 129} }, /* 59 */ + { {128, 0, 0, 0}, {1, 1, 1, 1}, { 0, 0, 1, 1}, {0, 0, 1, 129} }, /* 60 */ + { {128, 0, 129, 1}, {0, 0, 1, 1}, { 1, 1, 1, 1}, {0, 0, 0, 0} }, /* 61 */ + { {128, 0, 129, 0}, {0, 0, 1, 0}, { 1, 1, 1, 0}, {1, 1, 1, 0} }, /* 62 */ + { {128, 1, 0, 0}, {0, 1, 0, 0}, { 0, 1, 1, 1}, {0, 1, 1, 129} } /* 63 */ + }, + { /* Partition table for 3-subset BPTC */ + { {128, 0, 1, 129}, {0, 0, 1, 1}, { 0, 2, 2, 1}, { 2, 2, 2, 130} }, /* 0 */ + { {128, 0, 0, 129}, {0, 0, 1, 1}, {130, 2, 1, 1}, { 2, 2, 2, 1} }, /* 1 */ + { {128, 0, 0, 0}, {2, 0, 0, 1}, {130, 2, 1, 1}, { 2, 2, 1, 129} }, /* 2 */ + { {128, 2, 2, 130}, {0, 0, 2, 2}, { 0, 0, 1, 1}, { 0, 1, 1, 129} }, /* 3 */ + { {128, 0, 0, 0}, {0, 0, 0, 0}, {129, 1, 2, 2}, { 1, 1, 2, 130} }, /* 4 */ + { {128, 0, 1, 129}, {0, 0, 1, 1}, { 0, 0, 2, 2}, { 0, 0, 2, 130} }, /* 5 */ + { {128, 0, 2, 130}, {0, 0, 2, 2}, { 1, 1, 1, 1}, { 1, 1, 1, 129} }, /* 6 */ + { {128, 0, 1, 1}, {0, 0, 1, 1}, {130, 2, 1, 1}, { 2, 2, 1, 129} }, /* 7 */ + { {128, 0, 0, 0}, {0, 0, 0, 0}, {129, 1, 1, 1}, { 2, 2, 2, 130} }, /* 8 */ + { {128, 0, 0, 0}, {1, 1, 1, 1}, {129, 1, 1, 1}, { 2, 2, 2, 130} }, /* 9 */ + { {128, 0, 0, 0}, {1, 1, 129, 1}, { 2, 2, 2, 2}, { 2, 2, 2, 130} }, /* 10 */ + { {128, 0, 1, 2}, {0, 0, 129, 2}, { 0, 0, 1, 2}, { 0, 0, 1, 130} }, /* 11 */ + { {128, 1, 1, 2}, {0, 1, 129, 2}, { 0, 1, 1, 2}, { 0, 1, 1, 130} }, /* 12 */ + { {128, 1, 2, 2}, {0, 129, 2, 2}, { 0, 1, 2, 2}, { 0, 1, 2, 130} }, /* 13 */ + { {128, 0, 1, 129}, {0, 1, 1, 2}, { 1, 1, 2, 2}, { 1, 2, 2, 130} }, /* 14 */ + { {128, 0, 1, 129}, {2, 0, 0, 1}, {130, 2, 0, 0}, { 2, 2, 2, 0} }, /* 15 */ + { {128, 0, 0, 129}, {0, 0, 1, 1}, { 0, 1, 1, 2}, { 1, 1, 2, 130} }, /* 16 */ + { {128, 1, 1, 129}, {0, 0, 1, 1}, {130, 0, 0, 1}, { 2, 2, 0, 0} }, /* 17 */ + { {128, 0, 0, 0}, {1, 1, 2, 2}, {129, 1, 2, 2}, { 1, 1, 2, 130} }, /* 18 */ + { {128, 0, 2, 130}, {0, 0, 2, 2}, { 0, 0, 2, 2}, { 1, 1, 1, 129} }, /* 19 */ + { {128, 1, 1, 129}, {0, 1, 1, 1}, { 0, 2, 2, 2}, { 0, 2, 2, 130} }, /* 20 */ + { {128, 0, 0, 129}, {0, 0, 0, 1}, {130, 2, 2, 1}, { 2, 2, 2, 1} }, /* 21 */ + { {128, 0, 0, 0}, {0, 0, 129, 1}, { 0, 1, 2, 2}, { 0, 1, 2, 130} }, /* 22 */ + { {128, 0, 0, 0}, {1, 1, 0, 0}, {130, 2, 129, 0}, { 2, 2, 1, 0} }, /* 23 */ + { {128, 1, 2, 130}, {0, 129, 2, 2}, { 0, 0, 1, 1}, { 0, 0, 0, 0} }, /* 24 */ + { {128, 0, 1, 2}, {0, 0, 1, 2}, {129, 1, 2, 2}, { 2, 2, 2, 130} }, /* 25 */ + { {128, 1, 1, 0}, {1, 2, 130, 1}, {129, 2, 2, 1}, { 0, 1, 1, 0} }, /* 26 */ + { {128, 0, 0, 0}, {0, 1, 129, 0}, { 1, 2, 130, 1}, { 1, 2, 2, 1} }, /* 27 */ + { {128, 0, 2, 2}, {1, 1, 0, 2}, {129, 1, 0, 2}, { 0, 0, 2, 130} }, /* 28 */ + { {128, 1, 1, 0}, {0, 129, 1, 0}, { 2, 0, 0, 2}, { 2, 2, 2, 130} }, /* 29 */ + { {128, 0, 1, 1}, {0, 1, 2, 2}, { 0, 1, 130, 2}, { 0, 0, 1, 129} }, /* 30 */ + { {128, 0, 0, 0}, {2, 0, 0, 0}, {130, 2, 1, 1}, { 2, 2, 2, 129} }, /* 31 */ + { {128, 0, 0, 0}, {0, 0, 0, 2}, {129, 1, 2, 2}, { 1, 2, 2, 130} }, /* 32 */ + { {128, 2, 2, 130}, {0, 0, 2, 2}, { 0, 0, 1, 2}, { 0, 0, 1, 129} }, /* 33 */ + { {128, 0, 1, 129}, {0, 0, 1, 2}, { 0, 0, 2, 2}, { 0, 2, 2, 130} }, /* 34 */ + { {128, 1, 2, 0}, {0, 129, 2, 0}, { 0, 1, 130, 0}, { 0, 1, 2, 0} }, /* 35 */ + { {128, 0, 0, 0}, {1, 1, 129, 1}, { 2, 2, 130, 2}, { 0, 0, 0, 0} }, /* 36 */ + { {128, 1, 2, 0}, {1, 2, 0, 1}, {130, 0, 129, 2}, { 0, 1, 2, 0} }, /* 37 */ + { {128, 1, 2, 0}, {2, 0, 1, 2}, {129, 130, 0, 1}, { 0, 1, 2, 0} }, /* 38 */ + { {128, 0, 1, 1}, {2, 2, 0, 0}, { 1, 1, 130, 2}, { 0, 0, 1, 129} }, /* 39 */ + { {128, 0, 1, 1}, {1, 1, 130, 2}, { 2, 2, 0, 0}, { 0, 0, 1, 129} }, /* 40 */ + { {128, 1, 0, 129}, {0, 1, 0, 1}, { 2, 2, 2, 2}, { 2, 2, 2, 130} }, /* 41 */ + { {128, 0, 0, 0}, {0, 0, 0, 0}, {130, 1, 2, 1}, { 2, 1, 2, 129} }, /* 42 */ + { {128, 0, 2, 2}, {1, 129, 2, 2}, { 0, 0, 2, 2}, { 1, 1, 2, 130} }, /* 43 */ + { {128, 0, 2, 130}, {0, 0, 1, 1}, { 0, 0, 2, 2}, { 0, 0, 1, 129} }, /* 44 */ + { {128, 2, 2, 0}, {1, 2, 130, 1}, { 0, 2, 2, 0}, { 1, 2, 2, 129} }, /* 45 */ + { {128, 1, 0, 1}, {2, 2, 130, 2}, { 2, 2, 2, 2}, { 0, 1, 0, 129} }, /* 46 */ + { {128, 0, 0, 0}, {2, 1, 2, 1}, {130, 1, 2, 1}, { 2, 1, 2, 129} }, /* 47 */ + { {128, 1, 0, 129}, {0, 1, 0, 1}, { 0, 1, 0, 1}, { 2, 2, 2, 130} }, /* 48 */ + { {128, 2, 2, 130}, {0, 1, 1, 1}, { 0, 2, 2, 2}, { 0, 1, 1, 129} }, /* 49 */ + { {128, 0, 0, 2}, {1, 129, 1, 2}, { 0, 0, 0, 2}, { 1, 1, 1, 130} }, /* 50 */ + { {128, 0, 0, 0}, {2, 129, 1, 2}, { 2, 1, 1, 2}, { 2, 1, 1, 130} }, /* 51 */ + { {128, 2, 2, 2}, {0, 129, 1, 1}, { 0, 1, 1, 1}, { 0, 2, 2, 130} }, /* 52 */ + { {128, 0, 0, 2}, {1, 1, 1, 2}, {129, 1, 1, 2}, { 0, 0, 0, 130} }, /* 53 */ + { {128, 1, 1, 0}, {0, 129, 1, 0}, { 0, 1, 1, 0}, { 2, 2, 2, 130} }, /* 54 */ + { {128, 0, 0, 0}, {0, 0, 0, 0}, { 2, 1, 129, 2}, { 2, 1, 1, 130} }, /* 55 */ + { {128, 1, 1, 0}, {0, 129, 1, 0}, { 2, 2, 2, 2}, { 2, 2, 2, 130} }, /* 56 */ + { {128, 0, 2, 2}, {0, 0, 1, 1}, { 0, 0, 129, 1}, { 0, 0, 2, 130} }, /* 57 */ + { {128, 0, 2, 2}, {1, 1, 2, 2}, {129, 1, 2, 2}, { 0, 0, 2, 130} }, /* 58 */ + { {128, 0, 0, 0}, {0, 0, 0, 0}, { 0, 0, 0, 0}, { 2, 129, 1, 130} }, /* 59 */ + { {128, 0, 0, 130}, {0, 0, 0, 1}, { 0, 0, 0, 2}, { 0, 0, 0, 129} }, /* 60 */ + { {128, 2, 2, 2}, {1, 2, 2, 2}, { 0, 2, 2, 2}, {129, 2, 2, 130} }, /* 61 */ + { {128, 1, 0, 129}, {2, 2, 2, 2}, { 2, 2, 2, 2}, { 2, 2, 2, 130} }, /* 62 */ + { {128, 1, 1, 129}, {2, 0, 1, 1}, {130, 2, 0, 1}, { 2, 2, 2, 0} } /* 63 */ + } + }; + + static int aWeight2[] = { 0, 21, 43, 64 }; + static int aWeight3[] = { 0, 9, 18, 27, 37, 46, 55, 64 }; + static int aWeight4[] = { 0, 4, 9, 13, 17, 21, 26, 30, 34, 38, 43, 47, 51, 55, 60, 64 }; + + static unsigned char sModeHasPBits = 0b11001011; + + bcdec__bitstream_t bstream; + int mode, partition, numPartitions, numEndpoints, i, j, k, rotation, partitionSet; + int indexSelectionBit, indexBits, indexBits2, index, index2; + int endpoints[6][4]; + char indices[4][4]; + int r, g, b, a; + int* weights, * weights2; + unsigned char* decompressed; + + decompressed = (unsigned char*)decompressedBlock; + + bstream.low = ((unsigned long long*)compressedBlock)[0]; + bstream.high = ((unsigned long long*)compressedBlock)[1]; + + for (mode = 0; mode < 8 && (0 == bcdec__bitstream_read_bit(&bstream)); ++mode); + + /* unexpected mode, clear the block (transparent black) */ + if (mode >= 8) { + for (i = 0; i < 4; ++i) { + for (j = 0; j < 4; ++j) { + decompressed[j * 4 + 0] = 0; + decompressed[j * 4 + 1] = 0; + decompressed[j * 4 + 2] = 0; + decompressed[j * 4 + 3] = 0; + } + decompressed += destinationPitch; + } + + return; + } + + partition = 0; + numPartitions = 1; + rotation = 0; + indexSelectionBit = 0; + + if (mode == 0 || mode == 1 || mode == 2 || mode == 3 || mode == 7) { + numPartitions = (mode == 0 || mode == 2) ? 3 : 2; + partition = bcdec__bitstream_read_bits(&bstream, (mode == 0) ? 4 : 6); + } + + numEndpoints = numPartitions * 2; + + if (mode == 4 || mode == 5) { + rotation = bcdec__bitstream_read_bits(&bstream, 2); + + if (mode == 4) { + indexSelectionBit = bcdec__bitstream_read_bit(&bstream); + } + } + + /* Extract endpoints */ + /* RGB */ + for (i = 0; i < 3; ++i) { + for (j = 0; j < numEndpoints; ++j) { + endpoints[j][i] = bcdec__bitstream_read_bits(&bstream, actual_bits_count[0][mode]); + } + } + /* Alpha (if any) */ + if (actual_bits_count[1][mode] > 0) { + for (j = 0; j < numEndpoints; ++j) { + endpoints[j][3] = bcdec__bitstream_read_bits(&bstream, actual_bits_count[1][mode]); + } + } + + /* Fully decode endpoints */ + /* First handle modes that have P-bits */ + if (mode == 0 || mode == 1 || mode == 3 || mode == 6 || mode == 7) { + for (i = 0; i < numEndpoints; ++i) { + /* component-wise left-shift */ + for (j = 0; j < 4; ++j) { + endpoints[i][j] <<= 1; + } + } + + /* if P-bit is shared */ + if (mode == 1) { + i = bcdec__bitstream_read_bit(&bstream); + j = bcdec__bitstream_read_bit(&bstream); + + /* rgb component-wise insert pbits */ + for (k = 0; k < 3; ++k) { + endpoints[0][k] |= i; + endpoints[1][k] |= i; + endpoints[2][k] |= j; + endpoints[3][k] |= j; + } + } else if (sModeHasPBits & (1 << mode)) { + /* unique P-bit per endpoint */ + for (i = 0; i < numEndpoints; ++i) { + j = bcdec__bitstream_read_bit(&bstream); + for (k = 0; k < 4; ++k) { + endpoints[i][k] |= j; + } + } + } + } + + for (i = 0; i < numEndpoints; ++i) { + /* get color components precision including pbit */ + j = actual_bits_count[0][mode] + ((sModeHasPBits >> mode) & 1); + + for (k = 0; k < 3; ++k) { + /* left shift endpoint components so that their MSB lies in bit 7 */ + endpoints[i][k] = endpoints[i][k] << (8 - j); + /* Replicate each component's MSB into the LSBs revealed by the left-shift operation above */ + endpoints[i][k] = endpoints[i][k] | (endpoints[i][k] >> j); + } + + /* get alpha component precision including pbit */ + j = actual_bits_count[1][mode] + ((sModeHasPBits >> mode) & 1); + + /* left shift endpoint components so that their MSB lies in bit 7 */ + endpoints[i][3] = endpoints[i][3] << (8 - j); + /* Replicate each component's MSB into the LSBs revealed by the left-shift operation above */ + endpoints[i][3] = endpoints[i][3] | (endpoints[i][3] >> j); + } + + /* If this mode does not explicitly define the alpha component */ + /* set alpha equal to 1.0 */ + if (!actual_bits_count[1][mode]) { + for (j = 0; j < numEndpoints; ++j) { + endpoints[j][3] = 0xFF; + } + } + + /* Determine weights tables */ + indexBits = (mode == 0 || mode == 1) ? 3 : ((mode == 6) ? 4 : 2); + indexBits2 = (mode == 4) ? 3 : ((mode == 5) ? 2 : 0); + weights = (indexBits == 2) ? aWeight2 : ((indexBits == 3) ? aWeight3 : aWeight4); + weights2 = (indexBits2 == 2) ? aWeight2 : aWeight3; + + /* Quite inconvenient that indices aren't interleaved so we have to make 2 passes here */ + /* Pass #1: collecting color indices */ + for (i = 0; i < 4; ++i) { + for (j = 0; j < 4; ++j) { + partitionSet = (numPartitions == 1) ? ((i | j) ? 0 : 128) : partition_sets[numPartitions - 2][partition][i][j]; + + indexBits = (mode == 0 || mode == 1) ? 3 : ((mode == 6) ? 4 : 2); + /* fix-up index is specified with one less bit */ + /* The fix-up index for subset 0 is always index 0 */ + if (partitionSet & 0x80) { + indexBits--; + } + + indices[i][j] = bcdec__bitstream_read_bits(&bstream, indexBits); + } + } + + /* Pass #2: reading alpha indices (if any) and interpolating & rotating */ + for (i = 0; i < 4; ++i) { + for (j = 0; j < 4; ++j) { + partitionSet = (numPartitions == 1) ? ((i|j) ? 0 : 128) : partition_sets[numPartitions - 2][partition][i][j]; + partitionSet &= 0x03; + + index = indices[i][j]; + + if (!indexBits2) { + r = bcdec__interpolate(endpoints[partitionSet * 2][0], endpoints[partitionSet * 2 + 1][0], weights, index); + g = bcdec__interpolate(endpoints[partitionSet * 2][1], endpoints[partitionSet * 2 + 1][1], weights, index); + b = bcdec__interpolate(endpoints[partitionSet * 2][2], endpoints[partitionSet * 2 + 1][2], weights, index); + a = bcdec__interpolate(endpoints[partitionSet * 2][3], endpoints[partitionSet * 2 + 1][3], weights, index); + } else { + index2 = bcdec__bitstream_read_bits(&bstream, (i|j) ? indexBits2 : (indexBits2 - 1)); + /* The index value for interpolating color comes from the secondary index bits for the texel + if the mode has an index selection bit and its value is one, and from the primary index bits otherwise. + The alpha index comes from the secondary index bits if the block has a secondary index and + the block either doesn’t have an index selection bit or that bit is zero, and from the primary index bits otherwise. */ + if (!indexSelectionBit) { + r = bcdec__interpolate(endpoints[partitionSet * 2][0], endpoints[partitionSet * 2 + 1][0], weights, index); + g = bcdec__interpolate(endpoints[partitionSet * 2][1], endpoints[partitionSet * 2 + 1][1], weights, index); + b = bcdec__interpolate(endpoints[partitionSet * 2][2], endpoints[partitionSet * 2 + 1][2], weights, index); + a = bcdec__interpolate(endpoints[partitionSet * 2][3], endpoints[partitionSet * 2 + 1][3], weights2, index2); + } else { + r = bcdec__interpolate(endpoints[partitionSet * 2][0], endpoints[partitionSet * 2 + 1][0], weights2, index2); + g = bcdec__interpolate(endpoints[partitionSet * 2][1], endpoints[partitionSet * 2 + 1][1], weights2, index2); + b = bcdec__interpolate(endpoints[partitionSet * 2][2], endpoints[partitionSet * 2 + 1][2], weights2, index2); + a = bcdec__interpolate(endpoints[partitionSet * 2][3], endpoints[partitionSet * 2 + 1][3], weights, index); + } + } + + switch (rotation) { + case 1: { /* 01 – Block format is Scalar(R) Vector(AGB) - swap A and R */ + bcdec__swap_values(&a, &r); + } break; + case 2: { /* 10 – Block format is Scalar(G) Vector(RAB) - swap A and G */ + bcdec__swap_values(&a, &g); + } break; + case 3: { /* 11 - Block format is Scalar(B) Vector(RGA) - swap A and B */ + bcdec__swap_values(&a, &b); + } break; + } + + decompressed[j * 4 + 0] = r; + decompressed[j * 4 + 1] = g; + decompressed[j * 4 + 2] = b; + decompressed[j * 4 + 3] = a; + } + + decompressed += destinationPitch; + } +} + +#endif /* BCDEC_IMPLEMENTATION */ + +/* LICENSE: + +This software is available under 2 licenses -- choose whichever you prefer. + +------------------------------------------------------------------------------ +ALTERNATIVE A - MIT License + +Copyright (c) 2022 Sergii Kudlai + +Permission is hereby granted, free of charge, to any person obtaining a copy of +this software and associated documentation files (the "Software"), to deal in +the Software without restriction, including without limitation the rights to +use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies +of the Software, and to permit persons to whom the Software is furnished to do +so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. + +------------------------------------------------------------------------------ +ALTERNATIVE B - The Unlicense + +This is free and unencumbered software released into the public domain. + +Anyone is free to copy, modify, publish, use, compile, sell, or +distribute this software, either in source code form or as a compiled +binary, for any purpose, commercial or non-commercial, and by any +means. + +In jurisdictions that recognize copyright laws, the author or authors +of this software dedicate any and all copyright interest in the +software to the public domain. We make this dedication for the benefit +of the public at large and to the detriment of our heirs and +successors. We intend this dedication to be an overt act of +relinquishment in perpetuity of all present and future rights to this +software under copyright law. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. +IN NO EVENT SHALL THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR +OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, +ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR +OTHER DEALINGS IN THE SOFTWARE. + +For more information, please refer to https://unlicense.org + +*/ diff --git a/dlls/d3dx9_36/surface.c b/dlls/d3dx9_36/surface.c index 7ead6ae3e78..70ee7bdbd01 100644 --- a/dlls/d3dx9_36/surface.c +++ b/dlls/d3dx9_36/surface.c @@ -25,6 +25,8 @@ #include "ole2.h" #include "wincodec.h"

+#define BCDEC_IMPLEMENTATION +#include "bcdec.h" #include "txc_dxtn.h" #include <assert.h>

@@ -2697,32 +2699,100 @@ static void point_filter_argb_pixels(const BYTE *src, UINT src_row_pitch, UINT s } }

+struct d3dx_bcn_decompression_context +{ + void (*decompress_bcn_block)(const void *, void *, int); + const struct pixel_format_desc *compressed_format_desc; + struct d3dx_pixels *compressed_pixels; + + const struct pixel_format_desc *decompressed_format_desc; + uint8_t cur_block_decompressed[64]; + uint32_t cur_block_row_pitch; + int cur_block_x; + int cur_block_y; + int cur_block_z; +}; + +static void d3dx_init_bcn_decompression_context(struct d3dx_bcn_decompression_context *context, + struct d3dx_pixels *pixels, const struct pixel_format_desc *desc, const struct pixel_format_desc *dst_desc) +{ + memset(context, 0, sizeof(*context)); + switch (desc->format) + { + case D3DX_PIXEL_FORMAT_DXT1_UNORM: + context->decompress_bcn_block = bcdec_bc1; + break; + + case D3DX_PIXEL_FORMAT_DXT2_UNORM: + case D3DX_PIXEL_FORMAT_DXT3_UNORM: + context->decompress_bcn_block = bcdec_bc2; + break; + + case D3DX_PIXEL_FORMAT_DXT4_UNORM: + case D3DX_PIXEL_FORMAT_DXT5_UNORM: + context->decompress_bcn_block = bcdec_bc3; + break; + + default: + assert(0); + break; + } + + context->compressed_format_desc = desc; + context->compressed_pixels = pixels; + + context->decompressed_format_desc = dst_desc; + context->cur_block_row_pitch = dst_desc->bytes_per_pixel * desc->block_width; + context->cur_block_x = context->cur_block_y = context->cur_block_z = -1; +} + +static void d3dx_fetch_bcn_texel(struct d3dx_bcn_decompression_context *context, int x, int y, int z, void *texel) +{ + const struct pixel_format_desc *decomp_fmt_desc = context->decompressed_format_desc; + const struct pixel_format_desc *comp_fmt_desc = context->compressed_format_desc; + const int y_aligned = (y & ~(comp_fmt_desc->block_height - 1)); + const int x_aligned = (x & ~(comp_fmt_desc->block_width - 1)); + uint32_t pixel_offset; + + /* Check if we can't re-use the currently decompressed block. */ + if (z != context->cur_block_z || (x_aligned != context->cur_block_x) || (y_aligned != context->cur_block_y)) + { + const BYTE *block_ptr = context->compressed_pixels->data; + + block_ptr += z * context->compressed_pixels->slice_pitch; + block_ptr += (y / comp_fmt_desc->block_height) * context->compressed_pixels->row_pitch; + block_ptr += (x / comp_fmt_desc->block_width) * comp_fmt_desc->block_byte_count; + context->decompress_bcn_block(block_ptr, context->cur_block_decompressed, context->cur_block_row_pitch); + context->cur_block_x = x_aligned; + context->cur_block_y = y_aligned; + context->cur_block_z = z; + } + + pixel_offset = (y & (comp_fmt_desc->block_height - 1)) * context->cur_block_row_pitch; + pixel_offset += (x & (comp_fmt_desc->block_width - 1)) * decomp_fmt_desc->bytes_per_pixel; + memcpy(texel, context->cur_block_decompressed + pixel_offset, decomp_fmt_desc->bytes_per_pixel); +} + static HRESULT d3dx_pixels_decompress(struct d3dx_pixels *pixels, const struct pixel_format_desc *desc, BOOL is_dst, void **out_memory, uint32_t *out_row_pitch, uint32_t *out_slice_pitch, const struct pixel_format_desc **out_desc) { - void (*fetch_dxt_texel)(int srcRowStride, const BYTE *pixdata, int i, int j, void *texel); - uint32_t x, y, z, tmp_pitch, uncompressed_slice_pitch, uncompressed_row_pitch; + uint32_t x, y, z, uncompressed_slice_pitch, uncompressed_row_pitch; const struct pixel_format_desc *uncompressed_desc = NULL; + struct d3dx_bcn_decompression_context context; const struct volume *size = &pixels->size; BYTE *uncompressed_mem;

switch (desc->format) { case D3DX_PIXEL_FORMAT_DXT1_UNORM: - uncompressed_desc = get_format_info(D3DFMT_A8B8G8R8); - fetch_dxt_texel = fetch_2d_texel_rgba_dxt1; - break; case D3DX_PIXEL_FORMAT_DXT2_UNORM: case D3DX_PIXEL_FORMAT_DXT3_UNORM: - uncompressed_desc = get_format_info(D3DFMT_A8B8G8R8); - fetch_dxt_texel = fetch_2d_texel_rgba_dxt3; - break; case D3DX_PIXEL_FORMAT_DXT4_UNORM: case D3DX_PIXEL_FORMAT_DXT5_UNORM: - uncompressed_desc = get_format_info(D3DFMT_A8B8G8R8); - fetch_dxt_texel = fetch_2d_texel_rgba_dxt5; + uncompressed_desc = get_d3dx_pixel_format_info(D3DX_PIXEL_FORMAT_R8G8B8A8_UNORM); break; + default: FIXME("Unexpected compressed texture format %u.\n", desc->format); return E_NOTIMPL; @@ -2752,11 +2822,9 @@ static HRESULT d3dx_pixels_decompress(struct d3dx_pixels *pixels, const struct p }

TRACE("Decompressing pixels.\n"); - tmp_pitch = pixels->row_pitch * desc->block_width / desc->block_byte_count; + d3dx_init_bcn_decompression_context(&context, pixels, desc, uncompressed_desc); for (z = 0; z < size->depth; ++z) { - const BYTE *slice_data = ((BYTE *)pixels->data) + (pixels->slice_pitch * z); - for (y = 0; y < size->height; ++y) { BYTE *ptr = &uncompressed_mem[(z * uncompressed_slice_pitch) + (y * uncompressed_row_pitch)]; @@ -2765,10 +2833,9 @@ static HRESULT d3dx_pixels_decompress(struct d3dx_pixels *pixels, const struct p const POINT pt = { x, y };

if (!is_dst) - fetch_dxt_texel(tmp_pitch, slice_data, x + pixels->unaligned_rect.left, - y + pixels->unaligned_rect.top, ptr); + d3dx_fetch_bcn_texel(&context, x + pixels->unaligned_rect.left, y + pixels->unaligned_rect.top, z, ptr); else if (!PtInRect(&pixels->unaligned_rect, pt)) - fetch_dxt_texel(tmp_pitch, slice_data, x, y, ptr); + d3dx_fetch_bcn_texel(&context, x, y, z, ptr); ptr += uncompressed_desc->bytes_per_pixel; } } diff --git a/dlls/d3dx9_36/txc_dxtn.h b/dlls/d3dx9_36/txc_dxtn.h index 3c1e3d46c28..9bd59037a4b 100644 --- a/dlls/d3dx9_36/txc_dxtn.h +++ b/dlls/d3dx9_36/txc_dxtn.h @@ -36,15 +36,6 @@ typedef GLubyte GLchan; #define BCOMP 2 #define ACOMP 3

-void fetch_2d_texel_rgb_dxt1(GLint srcRowStride, const GLubyte *pixdata, - GLint i, GLint j, GLvoid *texel); -void fetch_2d_texel_rgba_dxt1(GLint srcRowStride, const GLubyte *pixdata, - GLint i, GLint j, GLvoid *texel); -void fetch_2d_texel_rgba_dxt3(GLint srcRowStride, const GLubyte *pixdata, - GLint i, GLint j, GLvoid *texel); -void fetch_2d_texel_rgba_dxt5(GLint srcRowStride, const GLubyte *pixdata, - GLint i, GLint j, GLvoid *texel); - void tx_compress_dxtn(GLint srccomps, GLint width, GLint height, const GLubyte *srcPixData, GLenum destformat, GLubyte *dest, GLint dstRowStride); diff --git a/dlls/d3dx9_36/txc_fetch_dxtn.c b/dlls/d3dx9_36/txc_fetch_dxtn.c deleted file mode 100644 index 7f0db56a155..00000000000 --- a/dlls/d3dx9_36/txc_fetch_dxtn.c +++ /dev/null @@ -1,243 +0,0 @@ -/* - * libtxc_dxtn - * Version: 1.0 - * - * Copyright (C) 2004 Roland Scheidegger All Rights Reserved. - * - * Permission is hereby granted, free of charge, to any person obtaining a - * copy of this software and associated documentation files (the "Software"), - * to deal in the Software without restriction, including without limitation - * the rights to use, copy, modify, merge, publish, distribute, sublicense, - * and/or sell copies of the Software, and to permit persons to whom the - * Software is furnished to do so, subject to the following conditions: - * - * The above copyright notice and this permission notice shall be included - * in all copies or substantial portions of the Software. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS - * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL - * BRIAN PAUL BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN - * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. - */ - -#include <stdio.h> -#include "txc_dxtn.h" - -#define EXP5TO8R(packedcol) \ - ((((packedcol) >> 8) & 0xf8) | (((packedcol) >> 13) & 0x7)) - -#define EXP6TO8G(packedcol) \ - ((((packedcol) >> 3) & 0xfc) | (((packedcol) >> 9) & 0x3)) - -#define EXP5TO8B(packedcol) \ - ((((packedcol) << 3) & 0xf8) | (((packedcol) >> 2) & 0x7)) - -#define EXP4TO8(col) \ - ((col) | ((col) << 4)) - -/* inefficient. To be efficient, it would be necessary to decode 16 pixels at once */ - -static void dxt135_decode_imageblock ( const GLubyte *img_block_src, - GLint i, GLint j, GLuint dxt_type, GLvoid *texel ) { - GLchan *rgba = (GLchan *) texel; - const GLushort color0 = img_block_src[0] | (img_block_src[1] << 8); - const GLushort color1 = img_block_src[2] | (img_block_src[3] << 8); - const GLuint bits = img_block_src[4] | (img_block_src[5] << 8) | - (img_block_src[6] << 16) | (img_block_src[7] << 24); - /* What about big/little endian? */ - GLubyte bit_pos = 2 * (j * 4 + i) ; - GLubyte code = (GLubyte) ((bits >> bit_pos) & 3); - - rgba[ACOMP] = CHAN_MAX; - switch (code) { - case 0: - rgba[RCOMP] = UBYTE_TO_CHAN( EXP5TO8R(color0) ); - rgba[GCOMP] = UBYTE_TO_CHAN( EXP6TO8G(color0) ); - rgba[BCOMP] = UBYTE_TO_CHAN( EXP5TO8B(color0) ); - break; - case 1: - rgba[RCOMP] = UBYTE_TO_CHAN( EXP5TO8R(color1) ); - rgba[GCOMP] = UBYTE_TO_CHAN( EXP6TO8G(color1) ); - rgba[BCOMP] = UBYTE_TO_CHAN( EXP5TO8B(color1) ); - break; - case 2: - if ((dxt_type > 1) || (color0 > color1)) { - rgba[RCOMP] = UBYTE_TO_CHAN( ((EXP5TO8R(color0) * 2 + EXP5TO8R(color1)) / 3) ); - rgba[GCOMP] = UBYTE_TO_CHAN( ((EXP6TO8G(color0) * 2 + EXP6TO8G(color1)) / 3) ); - rgba[BCOMP] = UBYTE_TO_CHAN( ((EXP5TO8B(color0) * 2 + EXP5TO8B(color1)) / 3) ); - } - else { - rgba[RCOMP] = UBYTE_TO_CHAN( ((EXP5TO8R(color0) + EXP5TO8R(color1)) / 2) ); - rgba[GCOMP] = UBYTE_TO_CHAN( ((EXP6TO8G(color0) + EXP6TO8G(color1)) / 2) ); - rgba[BCOMP] = UBYTE_TO_CHAN( ((EXP5TO8B(color0) + EXP5TO8B(color1)) / 2) ); - } - break; - case 3: - if ((dxt_type > 1) || (color0 > color1)) { - rgba[RCOMP] = UBYTE_TO_CHAN( ((EXP5TO8R(color0) + EXP5TO8R(color1) * 2) / 3) ); - rgba[GCOMP] = UBYTE_TO_CHAN( ((EXP6TO8G(color0) + EXP6TO8G(color1) * 2) / 3) ); - rgba[BCOMP] = UBYTE_TO_CHAN( ((EXP5TO8B(color0) + EXP5TO8B(color1) * 2) / 3) ); - } - else { - rgba[RCOMP] = 0; - rgba[GCOMP] = 0; - rgba[BCOMP] = 0; - if (dxt_type == 1) rgba[ACOMP] = UBYTE_TO_CHAN(0); - } - break; - default: - /* CANNOT happen (I hope) */ - break; - } -} - - -void fetch_2d_texel_rgb_dxt1(GLint srcRowStride, const GLubyte *pixdata, - GLint i, GLint j, GLvoid *texel) -{ - /* Extract the (i,j) pixel from pixdata and return it - * in texel[RCOMP], texel[GCOMP], texel[BCOMP], texel[ACOMP]. - */ - - const GLubyte *blksrc = (pixdata + ((srcRowStride + 3) / 4 * (j / 4) + (i / 4)) * 8); - dxt135_decode_imageblock(blksrc, (i&3), (j&3), 0, texel); -} - - -void fetch_2d_texel_rgba_dxt1(GLint srcRowStride, const GLubyte *pixdata, - GLint i, GLint j, GLvoid *texel) -{ - /* Extract the (i,j) pixel from pixdata and return it - * in texel[RCOMP], texel[GCOMP], texel[BCOMP], texel[ACOMP]. - */ - - const GLubyte *blksrc = (pixdata + ((srcRowStride + 3) / 4 * (j / 4) + (i / 4)) * 8); - dxt135_decode_imageblock(blksrc, (i&3), (j&3), 1, texel); -} - -void fetch_2d_texel_rgba_dxt3(GLint srcRowStride, const GLubyte *pixdata, - GLint i, GLint j, GLvoid *texel) { - - /* Extract the (i,j) pixel from pixdata and return it - * in texel[RCOMP], texel[GCOMP], texel[BCOMP], texel[ACOMP]. - */ - - GLchan *rgba = (GLchan *) texel; - const GLubyte *blksrc = (pixdata + ((srcRowStride + 3) / 4 * (j / 4) + (i / 4)) * 16); -#if 0 - /* Simple 32bit version. */ -/* that's pretty brain-dead for a single pixel, isn't it? */ - const GLubyte bit_pos = 4 * ((j&3) * 4 + (i&3)); - const GLuint alpha_low = blksrc[0] | (blksrc[1] << 8) | (blksrc[2] << 16) | (blksrc[3] << 24); - const GLuint alpha_high = blksrc[4] | (blksrc[5] << 8) | (blksrc[6] << 16) | (blksrc[7] << 24); - - dxt135_decode_imageblock(blksrc + 8, (i&3), (j&3), 2, texel); - if (bit_pos < 32) - rgba[ACOMP] = UBYTE_TO_CHAN( (GLubyte)(EXP4TO8((alpha_low >> bit_pos) & 15)) ); - else - rgba[ACOMP] = UBYTE_TO_CHAN( (GLubyte)(EXP4TO8((alpha_high >> (bit_pos - 32)) & 15)) ); -#endif -#if 1 -/* TODO test this! */ - const GLubyte anibble = (blksrc[((j&3) * 4 + (i&3)) / 2] >> (4 * (i&1))) & 0xf; - dxt135_decode_imageblock(blksrc + 8, (i&3), (j&3), 2, texel); - rgba[ACOMP] = UBYTE_TO_CHAN( (GLubyte)(EXP4TO8(anibble)) ); -#endif - -} - -void fetch_2d_texel_rgba_dxt5(GLint srcRowStride, const GLubyte *pixdata, - GLint i, GLint j, GLvoid *texel) { - - /* Extract the (i,j) pixel from pixdata and return it - * in texel[RCOMP], texel[GCOMP], texel[BCOMP], texel[ACOMP]. - */ - - GLchan *rgba = (GLchan *) texel; - const GLubyte *blksrc = (pixdata + ((srcRowStride + 3) / 4 * (j / 4) + (i / 4)) * 16); - const GLubyte alpha0 = blksrc[0]; - const GLubyte alpha1 = blksrc[1]; -#if 0 - const GLubyte bit_pos = 3 * ((j&3) * 4 + (i&3)); - /* simple 32bit version */ - const GLuint bits_low = blksrc[2] | (blksrc[3] << 8) | (blksrc[4] << 16) | (blksrc[5] << 24); - const GLuint bits_high = blksrc[6] | (blksrc[7] << 8); - GLubyte code; - - if (bit_pos < 30) - code = (GLubyte) ((bits_low >> bit_pos) & 7); - else if (bit_pos == 30) - code = (GLubyte) ((bits_low >> 30) & 3) | ((bits_high << 2) & 4); - else - code = (GLubyte) ((bits_high >> (bit_pos - 32)) & 7); -#endif -#if 1 -/* TODO test this! */ - const GLubyte bit_pos = ((j&3) * 4 + (i&3)) * 3; - const GLubyte acodelow = blksrc[2 + bit_pos / 8]; - const GLubyte acodehigh = blksrc[3 + bit_pos / 8]; - const GLubyte code = (acodelow >> (bit_pos & 0x7) | - (acodehigh << (8 - (bit_pos & 0x7)))) & 0x7; -#endif - dxt135_decode_imageblock(blksrc + 8, (i&3), (j&3), 2, texel); -#if 0 - if (alpha0 > alpha1) { - switch (code) { - case 0: - rgba[ACOMP] = UBYTE_TO_CHAN( alpha0 ); - break; - case 1: - rgba[ACOMP] = UBYTE_TO_CHAN( alpha1 ); - break; - case 2: - case 3: - case 4: - case 5: - case 6: - case 7: - rgba[ACOMP] = UBYTE_TO_CHAN( ((alpha0 * (8 - code) + (alpha1 * (code - 1))) / 7) ); - break; - } - } - else { - switch (code) { - case 0: - rgba[ACOMP] = UBYTE_TO_CHAN( alpha0 ); - break; - case 1: - rgba[ACOMP] = UBYTE_TO_CHAN( alpha1 ); - break; - case 2: - case 3: - case 4: - case 5: - rgba[ACOMP] = UBYTE_TO_CHAN( ((alpha0 * (6 - code) + (alpha1 * (code - 1))) / 5) ); - break; - case 6: - rgba[ACOMP] = 0; - break; - case 7: - rgba[ACOMP] = CHAN_MAX; - break; - } - } -#endif -/* not sure. Which version is faster? */ -#if 1 -/* TODO test this */ - if (code == 0) - rgba[ACOMP] = UBYTE_TO_CHAN( alpha0 ); - else if (code == 1) - rgba[ACOMP] = UBYTE_TO_CHAN( alpha1 ); - else if (alpha0 > alpha1) - rgba[ACOMP] = UBYTE_TO_CHAN( ((alpha0 * (8 - code) + (alpha1 * (code - 1))) / 7) ); - else if (code < 6) - rgba[ACOMP] = UBYTE_TO_CHAN( ((alpha0 * (6 - code) + (alpha1 * (code - 1))) / 5) ); - else if (code == 6) - rgba[ACOMP] = 0; - else - rgba[ACOMP] = CHAN_MAX; -#endif -} diff --git a/dlls/d3dx9_37/Makefile.in b/dlls/d3dx9_37/Makefile.in index b61c8ceddf5..88bc6e43589 100644 --- a/dlls/d3dx9_37/Makefile.in +++ b/dlls/d3dx9_37/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_38/Makefile.in b/dlls/d3dx9_38/Makefile.in index 4d9a11580bc..56cc69cf193 100644 --- a/dlls/d3dx9_38/Makefile.in +++ b/dlls/d3dx9_38/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_39/Makefile.in b/dlls/d3dx9_39/Makefile.in index 6ff59e7b98e..b0b539fccf7 100644 --- a/dlls/d3dx9_39/Makefile.in +++ b/dlls/d3dx9_39/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_40/Makefile.in b/dlls/d3dx9_40/Makefile.in index ae63a814403..0c23fced19e 100644 --- a/dlls/d3dx9_40/Makefile.in +++ b/dlls/d3dx9_40/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_41/Makefile.in b/dlls/d3dx9_41/Makefile.in index 9fdc2884e1b..994f8e2e385 100644 --- a/dlls/d3dx9_41/Makefile.in +++ b/dlls/d3dx9_41/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_42/Makefile.in b/dlls/d3dx9_42/Makefile.in index 0e2bf79aeff..a96721a35b0 100644 --- a/dlls/d3dx9_42/Makefile.in +++ b/dlls/d3dx9_42/Makefile.in @@ -24,7 +24,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_43/Makefile.in b/dlls/d3dx9_43/Makefile.in index 0ba2e5e8fe0..27122d33617 100644 --- a/dlls/d3dx9_43/Makefile.in +++ b/dlls/d3dx9_43/Makefile.in @@ -24,7 +24,6 @@ SOURCES = \ surface.c \ texture.c \ txc_compress_dxtn.c \ - txc_fetch_dxtn.c \ util.c \ version.rc \ volume.c \

-- GitLab https://gitlab.winehq.org/wine/wine/-/merge_requests/8226

Connor McAdams

1:18 p.m.

New subject: [PATCH 2/2] d3dx9: Replace txc_compress_dxtn with stb_dxt.

From: Connor McAdams cmcadams@codeweavers.com

diff --git a/dlls/d3dx9_24/Makefile.in b/dlls/d3dx9_24/Makefile.in index 000bf47af30..e5329f14ab0 100644 --- a/dlls/d3dx9_24/Makefile.in +++ b/dlls/d3dx9_24/Makefile.in @@ -22,7 +22,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_25/Makefile.in b/dlls/d3dx9_25/Makefile.in index 6796f90b1c4..5ee141cddca 100644 --- a/dlls/d3dx9_25/Makefile.in +++ b/dlls/d3dx9_25/Makefile.in @@ -22,7 +22,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_26/Makefile.in b/dlls/d3dx9_26/Makefile.in index cb6ff05ccbb..5ca36cfbd55 100644 --- a/dlls/d3dx9_26/Makefile.in +++ b/dlls/d3dx9_26/Makefile.in @@ -22,7 +22,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_27/Makefile.in b/dlls/d3dx9_27/Makefile.in index 339bccaae96..09172cf7e94 100644 --- a/dlls/d3dx9_27/Makefile.in +++ b/dlls/d3dx9_27/Makefile.in @@ -22,7 +22,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_28/Makefile.in b/dlls/d3dx9_28/Makefile.in index d25dc56a7e7..3cd59749900 100644 --- a/dlls/d3dx9_28/Makefile.in +++ b/dlls/d3dx9_28/Makefile.in @@ -22,7 +22,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_29/Makefile.in b/dlls/d3dx9_29/Makefile.in index fe06e9b4a31..c7a5b1aacc9 100644 --- a/dlls/d3dx9_29/Makefile.in +++ b/dlls/d3dx9_29/Makefile.in @@ -22,7 +22,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_30/Makefile.in b/dlls/d3dx9_30/Makefile.in index 882c2355521..2c618fd1113 100644 --- a/dlls/d3dx9_30/Makefile.in +++ b/dlls/d3dx9_30/Makefile.in @@ -22,7 +22,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_31/Makefile.in b/dlls/d3dx9_31/Makefile.in index a48b829a26c..7f41c5cf369 100644 --- a/dlls/d3dx9_31/Makefile.in +++ b/dlls/d3dx9_31/Makefile.in @@ -22,7 +22,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_32/Makefile.in b/dlls/d3dx9_32/Makefile.in index e7b902765d4..2bdc91c4c0f 100644 --- a/dlls/d3dx9_32/Makefile.in +++ b/dlls/d3dx9_32/Makefile.in @@ -22,7 +22,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_33/Makefile.in b/dlls/d3dx9_33/Makefile.in index 34f63cf1819..9b4e7671c21 100644 --- a/dlls/d3dx9_33/Makefile.in +++ b/dlls/d3dx9_33/Makefile.in @@ -22,7 +22,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_34/Makefile.in b/dlls/d3dx9_34/Makefile.in index 23faf212b25..446454e6bae 100644 --- a/dlls/d3dx9_34/Makefile.in +++ b/dlls/d3dx9_34/Makefile.in @@ -22,7 +22,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_35/Makefile.in b/dlls/d3dx9_35/Makefile.in index e7c6cb997b7..bc90efda4ef 100644 --- a/dlls/d3dx9_35/Makefile.in +++ b/dlls/d3dx9_35/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_36/Makefile.in b/dlls/d3dx9_36/Makefile.in index e21a80122ae..a0100daf666 100644 --- a/dlls/d3dx9_36/Makefile.in +++ b/dlls/d3dx9_36/Makefile.in @@ -22,7 +22,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_36/stb_dxt.h b/dlls/d3dx9_36/stb_dxt.h new file mode 100644 index 00000000000..6150a87f08d --- /dev/null +++ b/dlls/d3dx9_36/stb_dxt.h @@ -0,0 +1,719 @@ +// stb_dxt.h - v1.12 - DXT1/DXT5 compressor - public domain +// original by fabian "ryg" giesen - ported to C by stb +// use '#define STB_DXT_IMPLEMENTATION' before including to create the implementation +// +// USAGE: +// call stb_compress_dxt_block() for every block (you must pad) +// source should be a 4x4 block of RGBA data in row-major order; +// Alpha channel is not stored if you specify alpha=0 (but you +// must supply some constant alpha in the alpha channel). +// You can turn on dithering and "high quality" using mode. +// +// version history: +// v1.12 - (ryg) fix bug in single-color table generator +// v1.11 - (ryg) avoid racy global init, better single-color tables, remove dither +// v1.10 - (i.c) various small quality improvements +// v1.09 - (stb) update documentation re: surprising alpha channel requirement +// v1.08 - (stb) fix bug in dxt-with-alpha block +// v1.07 - (stb) bc4; allow not using libc; add STB_DXT_STATIC +// v1.06 - (stb) fix to known-broken 1.05 +// v1.05 - (stb) support bc5/3dc (Arvids Kokins), use extern "C" in C++ (Pavel Krajcevski) +// v1.04 - (ryg) default to no rounding bias for lerped colors (as per S3TC/DX10 spec); +// single color match fix (allow for inexact color interpolation); +// optimal DXT5 index finder; "high quality" mode that runs multiple refinement steps. +// v1.03 - (stb) endianness support +// v1.02 - (stb) fix alpha encoding bug +// v1.01 - (stb) fix bug converting to RGB that messed up quality, thanks ryg & cbloom +// v1.00 - (stb) first release +// +// contributors: +// Rich Geldreich (more accurate index selection) +// Kevin Schmidt (#defines for "freestanding" compilation) +// github:ppiastucki (BC4 support) +// Ignacio Castano - improve DXT endpoint quantization +// Alan Hickman - static table initialization +// +// LICENSE +// +// See end of file for license information. + +#ifndef STB_INCLUDE_STB_DXT_H +#define STB_INCLUDE_STB_DXT_H + +#ifdef __cplusplus +extern "C" { +#endif + +#ifdef STB_DXT_STATIC +#define STBDDEF static +#else +#define STBDDEF extern +#endif + +// compression mode (bitflags) +#define STB_DXT_NORMAL 0 +#define STB_DXT_DITHER 1 // use dithering. was always dubious, now deprecated. does nothing! +#define STB_DXT_HIGHQUAL 2 // high quality mode, does two refinement steps instead of 1. ~30-40% slower. + +STBDDEF void stb_compress_dxt_block(unsigned char *dest, const unsigned char *src_rgba_four_bytes_per_pixel, int alpha, int mode); +STBDDEF void stb_compress_bc4_block(unsigned char *dest, const unsigned char *src_r_one_byte_per_pixel); +STBDDEF void stb_compress_bc5_block(unsigned char *dest, const unsigned char *src_rg_two_byte_per_pixel); + +#define STB_COMPRESS_DXT_BLOCK + +#ifdef __cplusplus +} +#endif +#endif // STB_INCLUDE_STB_DXT_H + +#ifdef STB_DXT_IMPLEMENTATION + +// configuration options for DXT encoder. set them in the project/makefile or just define +// them at the top. + +// STB_DXT_USE_ROUNDING_BIAS +// use a rounding bias during color interpolation. this is closer to what "ideal" +// interpolation would do but doesn't match the S3TC/DX10 spec. old versions (pre-1.03) +// implicitly had this turned on. +// +// in case you're targeting a specific type of hardware (e.g. console programmers): +// NVidia and Intel GPUs (as of 2010) as well as DX9 ref use DXT decoders that are closer +// to STB_DXT_USE_ROUNDING_BIAS. AMD/ATI, S3 and DX10 ref are closer to rounding with no bias. +// you also see "(a*5 + b*3) / 8" on some old GPU designs. +// #define STB_DXT_USE_ROUNDING_BIAS + +#include <stdlib.h> + +#if !defined(STBD_FABS) +#include <math.h> +#endif + +#ifndef STBD_FABS +#define STBD_FABS(x) fabs(x) +#endif + +static const unsigned char stb__OMatch5[256][2] = { + { 0, 0 }, { 0, 0 }, { 0, 1 }, { 0, 1 }, { 1, 0 }, { 1, 0 }, { 1, 0 }, { 1, 1 }, + { 1, 1 }, { 1, 1 }, { 1, 2 }, { 0, 4 }, { 2, 1 }, { 2, 1 }, { 2, 1 }, { 2, 2 }, + { 2, 2 }, { 2, 2 }, { 2, 3 }, { 1, 5 }, { 3, 2 }, { 3, 2 }, { 4, 0 }, { 3, 3 }, + { 3, 3 }, { 3, 3 }, { 3, 4 }, { 3, 4 }, { 3, 4 }, { 3, 5 }, { 4, 3 }, { 4, 3 }, + { 5, 2 }, { 4, 4 }, { 4, 4 }, { 4, 5 }, { 4, 5 }, { 5, 4 }, { 5, 4 }, { 5, 4 }, + { 6, 3 }, { 5, 5 }, { 5, 5 }, { 5, 6 }, { 4, 8 }, { 6, 5 }, { 6, 5 }, { 6, 5 }, + { 6, 6 }, { 6, 6 }, { 6, 6 }, { 6, 7 }, { 5, 9 }, { 7, 6 }, { 7, 6 }, { 8, 4 }, + { 7, 7 }, { 7, 7 }, { 7, 7 }, { 7, 8 }, { 7, 8 }, { 7, 8 }, { 7, 9 }, { 8, 7 }, + { 8, 7 }, { 9, 6 }, { 8, 8 }, { 8, 8 }, { 8, 9 }, { 8, 9 }, { 9, 8 }, { 9, 8 }, + { 9, 8 }, { 10, 7 }, { 9, 9 }, { 9, 9 }, { 9, 10 }, { 8, 12 }, { 10, 9 }, { 10, 9 }, + { 10, 9 }, { 10, 10 }, { 10, 10 }, { 10, 10 }, { 10, 11 }, { 9, 13 }, { 11, 10 }, { 11, 10 }, + { 12, 8 }, { 11, 11 }, { 11, 11 }, { 11, 11 }, { 11, 12 }, { 11, 12 }, { 11, 12 }, { 11, 13 }, + { 12, 11 }, { 12, 11 }, { 13, 10 }, { 12, 12 }, { 12, 12 }, { 12, 13 }, { 12, 13 }, { 13, 12 }, + { 13, 12 }, { 13, 12 }, { 14, 11 }, { 13, 13 }, { 13, 13 }, { 13, 14 }, { 12, 16 }, { 14, 13 }, + { 14, 13 }, { 14, 13 }, { 14, 14 }, { 14, 14 }, { 14, 14 }, { 14, 15 }, { 13, 17 }, { 15, 14 }, + { 15, 14 }, { 16, 12 }, { 15, 15 }, { 15, 15 }, { 15, 15 }, { 15, 16 }, { 15, 16 }, { 15, 16 }, + { 15, 17 }, { 16, 15 }, { 16, 15 }, { 17, 14 }, { 16, 16 }, { 16, 16 }, { 16, 17 }, { 16, 17 }, + { 17, 16 }, { 17, 16 }, { 17, 16 }, { 18, 15 }, { 17, 17 }, { 17, 17 }, { 17, 18 }, { 16, 20 }, + { 18, 17 }, { 18, 17 }, { 18, 17 }, { 18, 18 }, { 18, 18 }, { 18, 18 }, { 18, 19 }, { 17, 21 }, + { 19, 18 }, { 19, 18 }, { 20, 16 }, { 19, 19 }, { 19, 19 }, { 19, 19 }, { 19, 20 }, { 19, 20 }, + { 19, 20 }, { 19, 21 }, { 20, 19 }, { 20, 19 }, { 21, 18 }, { 20, 20 }, { 20, 20 }, { 20, 21 }, + { 20, 21 }, { 21, 20 }, { 21, 20 }, { 21, 20 }, { 22, 19 }, { 21, 21 }, { 21, 21 }, { 21, 22 }, + { 20, 24 }, { 22, 21 }, { 22, 21 }, { 22, 21 }, { 22, 22 }, { 22, 22 }, { 22, 22 }, { 22, 23 }, + { 21, 25 }, { 23, 22 }, { 23, 22 }, { 24, 20 }, { 23, 23 }, { 23, 23 }, { 23, 23 }, { 23, 24 }, + { 23, 24 }, { 23, 24 }, { 23, 25 }, { 24, 23 }, { 24, 23 }, { 25, 22 }, { 24, 24 }, { 24, 24 }, + { 24, 25 }, { 24, 25 }, { 25, 24 }, { 25, 24 }, { 25, 24 }, { 26, 23 }, { 25, 25 }, { 25, 25 }, + { 25, 26 }, { 24, 28 }, { 26, 25 }, { 26, 25 }, { 26, 25 }, { 26, 26 }, { 26, 26 }, { 26, 26 }, + { 26, 27 }, { 25, 29 }, { 27, 26 }, { 27, 26 }, { 28, 24 }, { 27, 27 }, { 27, 27 }, { 27, 27 }, + { 27, 28 }, { 27, 28 }, { 27, 28 }, { 27, 29 }, { 28, 27 }, { 28, 27 }, { 29, 26 }, { 28, 28 }, + { 28, 28 }, { 28, 29 }, { 28, 29 }, { 29, 28 }, { 29, 28 }, { 29, 28 }, { 30, 27 }, { 29, 29 }, + { 29, 29 }, { 29, 30 }, { 29, 30 }, { 30, 29 }, { 30, 29 }, { 30, 29 }, { 30, 30 }, { 30, 30 }, + { 30, 30 }, { 30, 31 }, { 30, 31 }, { 31, 30 }, { 31, 30 }, { 31, 30 }, { 31, 31 }, { 31, 31 }, +}; +static const unsigned char stb__OMatch6[256][2] = { + { 0, 0 }, { 0, 1 }, { 1, 0 }, { 1, 1 }, { 1, 1 }, { 1, 2 }, { 2, 1 }, { 2, 2 }, + { 2, 2 }, { 2, 3 }, { 3, 2 }, { 3, 3 }, { 3, 3 }, { 3, 4 }, { 4, 3 }, { 4, 4 }, + { 4, 4 }, { 4, 5 }, { 5, 4 }, { 5, 5 }, { 5, 5 }, { 5, 6 }, { 6, 5 }, { 6, 6 }, + { 6, 6 }, { 6, 7 }, { 7, 6 }, { 7, 7 }, { 7, 7 }, { 7, 8 }, { 8, 7 }, { 8, 8 }, + { 8, 8 }, { 8, 9 }, { 9, 8 }, { 9, 9 }, { 9, 9 }, { 9, 10 }, { 10, 9 }, { 10, 10 }, + { 10, 10 }, { 10, 11 }, { 11, 10 }, { 8, 16 }, { 11, 11 }, { 11, 12 }, { 12, 11 }, { 9, 17 }, + { 12, 12 }, { 12, 13 }, { 13, 12 }, { 11, 16 }, { 13, 13 }, { 13, 14 }, { 14, 13 }, { 12, 17 }, + { 14, 14 }, { 14, 15 }, { 15, 14 }, { 14, 16 }, { 15, 15 }, { 15, 16 }, { 16, 14 }, { 16, 15 }, + { 17, 14 }, { 16, 16 }, { 16, 17 }, { 17, 16 }, { 18, 15 }, { 17, 17 }, { 17, 18 }, { 18, 17 }, + { 20, 14 }, { 18, 18 }, { 18, 19 }, { 19, 18 }, { 21, 15 }, { 19, 19 }, { 19, 20 }, { 20, 19 }, + { 20, 20 }, { 20, 20 }, { 20, 21 }, { 21, 20 }, { 21, 21 }, { 21, 21 }, { 21, 22 }, { 22, 21 }, + { 22, 22 }, { 22, 22 }, { 22, 23 }, { 23, 22 }, { 23, 23 }, { 23, 23 }, { 23, 24 }, { 24, 23 }, + { 24, 24 }, { 24, 24 }, { 24, 25 }, { 25, 24 }, { 25, 25 }, { 25, 25 }, { 25, 26 }, { 26, 25 }, + { 26, 26 }, { 26, 26 }, { 26, 27 }, { 27, 26 }, { 24, 32 }, { 27, 27 }, { 27, 28 }, { 28, 27 }, + { 25, 33 }, { 28, 28 }, { 28, 29 }, { 29, 28 }, { 27, 32 }, { 29, 29 }, { 29, 30 }, { 30, 29 }, + { 28, 33 }, { 30, 30 }, { 30, 31 }, { 31, 30 }, { 30, 32 }, { 31, 31 }, { 31, 32 }, { 32, 30 }, + { 32, 31 }, { 33, 30 }, { 32, 32 }, { 32, 33 }, { 33, 32 }, { 34, 31 }, { 33, 33 }, { 33, 34 }, + { 34, 33 }, { 36, 30 }, { 34, 34 }, { 34, 35 }, { 35, 34 }, { 37, 31 }, { 35, 35 }, { 35, 36 }, + { 36, 35 }, { 36, 36 }, { 36, 36 }, { 36, 37 }, { 37, 36 }, { 37, 37 }, { 37, 37 }, { 37, 38 }, + { 38, 37 }, { 38, 38 }, { 38, 38 }, { 38, 39 }, { 39, 38 }, { 39, 39 }, { 39, 39 }, { 39, 40 }, + { 40, 39 }, { 40, 40 }, { 40, 40 }, { 40, 41 }, { 41, 40 }, { 41, 41 }, { 41, 41 }, { 41, 42 }, + { 42, 41 }, { 42, 42 }, { 42, 42 }, { 42, 43 }, { 43, 42 }, { 40, 48 }, { 43, 43 }, { 43, 44 }, + { 44, 43 }, { 41, 49 }, { 44, 44 }, { 44, 45 }, { 45, 44 }, { 43, 48 }, { 45, 45 }, { 45, 46 }, + { 46, 45 }, { 44, 49 }, { 46, 46 }, { 46, 47 }, { 47, 46 }, { 46, 48 }, { 47, 47 }, { 47, 48 }, + { 48, 46 }, { 48, 47 }, { 49, 46 }, { 48, 48 }, { 48, 49 }, { 49, 48 }, { 50, 47 }, { 49, 49 }, + { 49, 50 }, { 50, 49 }, { 52, 46 }, { 50, 50 }, { 50, 51 }, { 51, 50 }, { 53, 47 }, { 51, 51 }, + { 51, 52 }, { 52, 51 }, { 52, 52 }, { 52, 52 }, { 52, 53 }, { 53, 52 }, { 53, 53 }, { 53, 53 }, + { 53, 54 }, { 54, 53 }, { 54, 54 }, { 54, 54 }, { 54, 55 }, { 55, 54 }, { 55, 55 }, { 55, 55 }, + { 55, 56 }, { 56, 55 }, { 56, 56 }, { 56, 56 }, { 56, 57 }, { 57, 56 }, { 57, 57 }, { 57, 57 }, + { 57, 58 }, { 58, 57 }, { 58, 58 }, { 58, 58 }, { 58, 59 }, { 59, 58 }, { 59, 59 }, { 59, 59 }, + { 59, 60 }, { 60, 59 }, { 60, 60 }, { 60, 60 }, { 60, 61 }, { 61, 60 }, { 61, 61 }, { 61, 61 }, + { 61, 62 }, { 62, 61 }, { 62, 62 }, { 62, 62 }, { 62, 63 }, { 63, 62 }, { 63, 63 }, { 63, 63 }, +}; + +static int stb__Mul8Bit(int a, int b) +{ + int t = a*b + 128; + return (t + (t >> 8)) >> 8; +} + +static void stb__From16Bit(unsigned char *out, unsigned short v) +{ + int rv = (v & 0xf800) >> 11; + int gv = (v & 0x07e0) >> 5; + int bv = (v & 0x001f) >> 0; + + // expand to 8 bits via bit replication + out[0] = (rv * 33) >> 2; + out[1] = (gv * 65) >> 4; + out[2] = (bv * 33) >> 2; + out[3] = 0; +} + +static unsigned short stb__As16Bit(int r, int g, int b) +{ + return (unsigned short)((stb__Mul8Bit(r,31) << 11) + (stb__Mul8Bit(g,63) << 5) + stb__Mul8Bit(b,31)); +} + +// linear interpolation at 1/3 point between a and b, using desired rounding type +static int stb__Lerp13(int a, int b) +{ +#ifdef STB_DXT_USE_ROUNDING_BIAS + // with rounding bias + return a + stb__Mul8Bit(b-a, 0x55); +#else + // without rounding bias + // replace "/ 3" by "* 0xaaab) >> 17" if your compiler sucks or you really need every ounce of speed. + return (2*a + b) / 3; +#endif +} + +// lerp RGB color +static void stb__Lerp13RGB(unsigned char *out, unsigned char *p1, unsigned char *p2) +{ + out[0] = (unsigned char)stb__Lerp13(p1[0], p2[0]); + out[1] = (unsigned char)stb__Lerp13(p1[1], p2[1]); + out[2] = (unsigned char)stb__Lerp13(p1[2], p2[2]); +} + +/****************************************************************************/ + +static void stb__EvalColors(unsigned char *color,unsigned short c0,unsigned short c1) +{ + stb__From16Bit(color+ 0, c0); + stb__From16Bit(color+ 4, c1); + stb__Lerp13RGB(color+ 8, color+0, color+4); + stb__Lerp13RGB(color+12, color+4, color+0); +} + +// The color matching function +static unsigned int stb__MatchColorsBlock(unsigned char *block, unsigned char *color) +{ + unsigned int mask = 0; + int dirr = color[0*4+0] - color[1*4+0]; + int dirg = color[0*4+1] - color[1*4+1]; + int dirb = color[0*4+2] - color[1*4+2]; + int dots[16]; + int stops[4]; + int i; + int c0Point, halfPoint, c3Point; + + for(i=0;i<16;i++) + dots[i] = block[i*4+0]*dirr + block[i*4+1]*dirg + block[i*4+2]*dirb; + + for(i=0;i<4;i++) + stops[i] = color[i*4+0]*dirr + color[i*4+1]*dirg + color[i*4+2]*dirb; + + // think of the colors as arranged on a line; project point onto that line, then choose + // next color out of available ones. we compute the crossover points for "best color in top + // half"/"best in bottom half" and then the same inside that subinterval. + // + // relying on this 1d approximation isn't always optimal in terms of euclidean distance, + // but it's very close and a lot faster. + // http://cbloomrants.blogspot.com/2008/12/12-08-08-dxtc-summary.html + + c0Point = (stops[1] + stops[3]); + halfPoint = (stops[3] + stops[2]); + c3Point = (stops[2] + stops[0]); + + for (i=15;i>=0;i--) { + int dot = dots[i]*2; + mask <<= 2; + + if(dot < halfPoint) + mask |= (dot < c0Point) ? 1 : 3; + else + mask |= (dot < c3Point) ? 2 : 0; + } + + return mask; +} + +// The color optimization function. (Clever code, part 1) +static void stb__OptimizeColorsBlock(unsigned char *block, unsigned short *pmax16, unsigned short *pmin16) +{ + int mind,maxd; + unsigned char *minp, *maxp; + double magn; + int v_r,v_g,v_b; + static const int nIterPower = 4; + float covf[6],vfr,vfg,vfb; + + // determine color distribution + int cov[6]; + int mu[3],min[3],max[3]; + int ch,i,iter; + + for(ch=0;ch<3;ch++) + { + const unsigned char *bp = ((const unsigned char *) block) + ch; + int muv,minv,maxv; + + muv = minv = maxv = bp[0]; + for(i=4;i<64;i+=4) + { + muv += bp[i]; + if (bp[i] < minv) minv = bp[i]; + else if (bp[i] > maxv) maxv = bp[i]; + } + + mu[ch] = (muv + 8) >> 4; + min[ch] = minv; + max[ch] = maxv; + } + + // determine covariance matrix + for (i=0;i<6;i++) + cov[i] = 0; + + for (i=0;i<16;i++) + { + int r = block[i*4+0] - mu[0]; + int g = block[i*4+1] - mu[1]; + int b = block[i*4+2] - mu[2]; + + cov[0] += r*r; + cov[1] += r*g; + cov[2] += r*b; + cov[3] += g*g; + cov[4] += g*b; + cov[5] += b*b; + } + + // convert covariance matrix to float, find principal axis via power iter + for(i=0;i<6;i++) + covf[i] = cov[i] / 255.0f; + + vfr = (float) (max[0] - min[0]); + vfg = (float) (max[1] - min[1]); + vfb = (float) (max[2] - min[2]); + + for(iter=0;iter<nIterPower;iter++) + { + float r = vfr*covf[0] + vfg*covf[1] + vfb*covf[2]; + float g = vfr*covf[1] + vfg*covf[3] + vfb*covf[4]; + float b = vfr*covf[2] + vfg*covf[4] + vfb*covf[5]; + + vfr = r; + vfg = g; + vfb = b; + } + + magn = STBD_FABS(vfr); + if (STBD_FABS(vfg) > magn) magn = STBD_FABS(vfg); + if (STBD_FABS(vfb) > magn) magn = STBD_FABS(vfb); + + if(magn < 4.0f) { // too small, default to luminance + v_r = 299; // JPEG YCbCr luma coefs, scaled by 1000. + v_g = 587; + v_b = 114; + } else { + magn = 512.0 / magn; + v_r = (int) (vfr * magn); + v_g = (int) (vfg * magn); + v_b = (int) (vfb * magn); + } + + minp = maxp = block; + mind = maxd = block[0]*v_r + block[1]*v_g + block[2]*v_b; + // Pick colors at extreme points + for(i=1;i<16;i++) + { + int dot = block[i*4+0]*v_r + block[i*4+1]*v_g + block[i*4+2]*v_b; + + if (dot < mind) { + mind = dot; + minp = block+i*4; + } + + if (dot > maxd) { + maxd = dot; + maxp = block+i*4; + } + } + + *pmax16 = stb__As16Bit(maxp[0],maxp[1],maxp[2]); + *pmin16 = stb__As16Bit(minp[0],minp[1],minp[2]); +} + +static const float stb__midpoints5[32] = { + 0.015686f, 0.047059f, 0.078431f, 0.111765f, 0.145098f, 0.176471f, 0.207843f, 0.241176f, 0.274510f, 0.305882f, 0.337255f, 0.370588f, 0.403922f, 0.435294f, 0.466667f, 0.5f, + 0.533333f, 0.564706f, 0.596078f, 0.629412f, 0.662745f, 0.694118f, 0.725490f, 0.758824f, 0.792157f, 0.823529f, 0.854902f, 0.888235f, 0.921569f, 0.952941f, 0.984314f, 1.0f +}; + +static const float stb__midpoints6[64] = { + 0.007843f, 0.023529f, 0.039216f, 0.054902f, 0.070588f, 0.086275f, 0.101961f, 0.117647f, 0.133333f, 0.149020f, 0.164706f, 0.180392f, 0.196078f, 0.211765f, 0.227451f, 0.245098f, + 0.262745f, 0.278431f, 0.294118f, 0.309804f, 0.325490f, 0.341176f, 0.356863f, 0.372549f, 0.388235f, 0.403922f, 0.419608f, 0.435294f, 0.450980f, 0.466667f, 0.482353f, 0.500000f, + 0.517647f, 0.533333f, 0.549020f, 0.564706f, 0.580392f, 0.596078f, 0.611765f, 0.627451f, 0.643137f, 0.658824f, 0.674510f, 0.690196f, 0.705882f, 0.721569f, 0.737255f, 0.754902f, + 0.772549f, 0.788235f, 0.803922f, 0.819608f, 0.835294f, 0.850980f, 0.866667f, 0.882353f, 0.898039f, 0.913725f, 0.929412f, 0.945098f, 0.960784f, 0.976471f, 0.992157f, 1.0f +}; + +static unsigned short stb__Quantize5(float x) +{ + unsigned short q; + x = x < 0 ? 0 : x > 1 ? 1 : x; // saturate + q = (unsigned short)(x * 31); + q += (x > stb__midpoints5[q]); + return q; +} + +static unsigned short stb__Quantize6(float x) +{ + unsigned short q; + x = x < 0 ? 0 : x > 1 ? 1 : x; // saturate + q = (unsigned short)(x * 63); + q += (x > stb__midpoints6[q]); + return q; +} + +// The refinement function. (Clever code, part 2) +// Tries to optimize colors to suit block contents better. +// (By solving a least squares system via normal equations+Cramer's rule) +static int stb__RefineBlock(unsigned char *block, unsigned short *pmax16, unsigned short *pmin16, unsigned int mask) +{ + static const int w1Tab[4] = { 3,0,2,1 }; + static const int prods[4] = { 0x090000,0x000900,0x040102,0x010402 }; + // ^some magic to save a lot of multiplies in the accumulating loop... + // (precomputed products of weights for least squares system, accumulated inside one 32-bit register) + + float f; + unsigned short oldMin, oldMax, min16, max16; + int i, akku = 0, xx,xy,yy; + int At1_r,At1_g,At1_b; + int At2_r,At2_g,At2_b; + unsigned int cm = mask; + + oldMin = *pmin16; + oldMax = *pmax16; + + if((mask ^ (mask<<2)) < 4) // all pixels have the same index? + { + // yes, linear system would be singular; solve using optimal + // single-color match on average color + int r = 8, g = 8, b = 8; + for (i=0;i<16;++i) { + r += block[i*4+0]; + g += block[i*4+1]; + b += block[i*4+2]; + } + + r >>= 4; g >>= 4; b >>= 4; + + max16 = (stb__OMatch5[r][0]<<11) | (stb__OMatch6[g][0]<<5) | stb__OMatch5[b][0]; + min16 = (stb__OMatch5[r][1]<<11) | (stb__OMatch6[g][1]<<5) | stb__OMatch5[b][1]; + } else { + At1_r = At1_g = At1_b = 0; + At2_r = At2_g = At2_b = 0; + for (i=0;i<16;++i,cm>>=2) { + int step = cm&3; + int w1 = w1Tab[step]; + int r = block[i*4+0]; + int g = block[i*4+1]; + int b = block[i*4+2]; + + akku += prods[step]; + At1_r += w1*r; + At1_g += w1*g; + At1_b += w1*b; + At2_r += r; + At2_g += g; + At2_b += b; + } + + At2_r = 3*At2_r - At1_r; + At2_g = 3*At2_g - At1_g; + At2_b = 3*At2_b - At1_b; + + // extract solutions and decide solvability + xx = akku >> 16; + yy = (akku >> 8) & 0xff; + xy = (akku >> 0) & 0xff; + + f = 3.0f / 255.0f / (xx*yy - xy*xy); + + max16 = stb__Quantize5((At1_r*yy - At2_r * xy) * f) << 11; + max16 |= stb__Quantize6((At1_g*yy - At2_g * xy) * f) << 5; + max16 |= stb__Quantize5((At1_b*yy - At2_b * xy) * f) << 0; + + min16 = stb__Quantize5((At2_r*xx - At1_r * xy) * f) << 11; + min16 |= stb__Quantize6((At2_g*xx - At1_g * xy) * f) << 5; + min16 |= stb__Quantize5((At2_b*xx - At1_b * xy) * f) << 0; + } + + *pmin16 = min16; + *pmax16 = max16; + return oldMin != min16 || oldMax != max16; +} + +// Color block compression +static void stb__CompressColorBlock(unsigned char *dest, unsigned char *block, int mode) +{ + unsigned int mask; + int i; + int refinecount; + unsigned short max16, min16; + unsigned char color[4*4]; + + refinecount = (mode & STB_DXT_HIGHQUAL) ? 2 : 1; + + // check if block is constant + for (i=1;i<16;i++) + if (((unsigned int *) block)[i] != ((unsigned int *) block)[0]) + break; + + if(i == 16) { // constant color + int r = block[0], g = block[1], b = block[2]; + mask = 0xaaaaaaaa; + max16 = (stb__OMatch5[r][0]<<11) | (stb__OMatch6[g][0]<<5) | stb__OMatch5[b][0]; + min16 = (stb__OMatch5[r][1]<<11) | (stb__OMatch6[g][1]<<5) | stb__OMatch5[b][1]; + } else { + // first step: PCA+map along principal axis + stb__OptimizeColorsBlock(block,&max16,&min16); + if (max16 != min16) { + stb__EvalColors(color,max16,min16); + mask = stb__MatchColorsBlock(block,color); + } else + mask = 0; + + // third step: refine (multiple times if requested) + for (i=0;i<refinecount;i++) { + unsigned int lastmask = mask; + + if (stb__RefineBlock(block,&max16,&min16,mask)) { + if (max16 != min16) { + stb__EvalColors(color,max16,min16); + mask = stb__MatchColorsBlock(block,color); + } else { + mask = 0; + break; + } + } + + if(mask == lastmask) + break; + } + } + + // write the color block + if(max16 < min16) + { + unsigned short t = min16; + min16 = max16; + max16 = t; + mask ^= 0x55555555; + } + + dest[0] = (unsigned char) (max16); + dest[1] = (unsigned char) (max16 >> 8); + dest[2] = (unsigned char) (min16); + dest[3] = (unsigned char) (min16 >> 8); + dest[4] = (unsigned char) (mask); + dest[5] = (unsigned char) (mask >> 8); + dest[6] = (unsigned char) (mask >> 16); + dest[7] = (unsigned char) (mask >> 24); +} + +// Alpha block compression (this is easy for a change) +static void stb__CompressAlphaBlock(unsigned char *dest,unsigned char *src, int stride) +{ + int i,dist,bias,dist4,dist2,bits,mask; + + // find min/max color + int mn,mx; + mn = mx = src[0]; + + for (i=1;i<16;i++) + { + if (src[i*stride] < mn) mn = src[i*stride]; + else if (src[i*stride] > mx) mx = src[i*stride]; + } + + // encode them + dest[0] = (unsigned char)mx; + dest[1] = (unsigned char)mn; + dest += 2; + + // determine bias and emit color indices + // given the choice of mx/mn, these indices are optimal: + // http://fgiesen.wordpress.com/2009/12/15/dxt5-alpha-block-index-determination... + dist = mx-mn; + dist4 = dist*4; + dist2 = dist*2; + bias = (dist < 8) ? (dist - 1) : (dist/2 + 2); + bias -= mn * 7; + bits = 0,mask=0; + + for (i=0;i<16;i++) { + int a = src[i*stride]*7 + bias; + int ind,t; + + // select index. this is a "linear scale" lerp factor between 0 (val=min) and 7 (val=max). + t = (a >= dist4) ? -1 : 0; ind = t & 4; a -= dist4 & t; + t = (a >= dist2) ? -1 : 0; ind += t & 2; a -= dist2 & t; + ind += (a >= dist); + + // turn linear scale into DXT index (0/1 are extremal pts) + ind = -ind & 7; + ind ^= (2 > ind); + + // write index + mask |= ind << bits; + if((bits += 3) >= 8) { + *dest++ = (unsigned char)mask; + mask >>= 8; + bits -= 8; + } + } +} + +void stb_compress_dxt_block(unsigned char *dest, const unsigned char *src, int alpha, int mode) +{ + unsigned char data[16][4]; + if (alpha) { + int i; + stb__CompressAlphaBlock(dest,(unsigned char*) src+3, 4); + dest += 8; + // make a new copy of the data in which alpha is opaque, + // because code uses a fast test for color constancy + memcpy(data, src, 4*16); + for (i=0; i < 16; ++i) + data[i][3] = 255; + src = &data[0][0]; + } + + stb__CompressColorBlock(dest,(unsigned char*) src,mode); +} + +void stb_compress_bc4_block(unsigned char *dest, const unsigned char *src) +{ + stb__CompressAlphaBlock(dest,(unsigned char*) src, 1); +} + +void stb_compress_bc5_block(unsigned char *dest, const unsigned char *src) +{ + stb__CompressAlphaBlock(dest,(unsigned char*) src,2); + stb__CompressAlphaBlock(dest + 8,(unsigned char*) src+1,2); +} +#endif // STB_DXT_IMPLEMENTATION + +// Compile with STB_DXT_IMPLEMENTATION and STB_DXT_GENERATE_TABLES +// defined to generate the tables above. +#ifdef STB_DXT_GENERATE_TABLES +#include <stdio.h> + +int main() +{ + int i, j; + const char *omatch_names[] = { "stb__OMatch5", "stb__OMatch6" }; + int dequant_mults[2] = { 33*4, 65 }; // .4 fixed-point dequant multipliers + + // optimal endpoint tables + for (i = 0; i < 2; ++i) { + int dequant = dequant_mults[i]; + int size = i ? 64 : 32; + printf("static const unsigned char %s[256][2] = {\n", omatch_names[i]); + for (int j = 0; j < 256; ++j) { + int mn, mx; + int best_mn = 0, best_mx = 0; + int best_err = 256 * 100; + for (mn=0;mn<size;mn++) { + for (mx=0;mx<size;mx++) { + int mine = (mn * dequant) >> 4; + int maxe = (mx * dequant) >> 4; + int err = abs(stb__Lerp13(maxe, mine) - j) * 100; + + // DX10 spec says that interpolation must be within 3% of "correct" result, + // add this as error term. Normally we'd expect a random distribution of + // +-1.5% error, but nowhere in the spec does it say that the error has to be + // unbiased - better safe than sorry. + err += abs(maxe - mine) * 3; + + if(err < best_err) { + best_mn = mn; + best_mx = mx; + best_err = err; + } + } + } + if ((j % 8) == 0) printf(" "); // 2 spaces, third is done below + printf(" { %2d, %2d },", best_mx, best_mn); + if ((j % 8) == 7) printf("\n"); + } + printf("};\n"); + } + + return 0; +} +#endif + +/* +------------------------------------------------------------------------------ +This software is available under 2 licenses -- choose whichever you prefer. +------------------------------------------------------------------------------ +ALTERNATIVE A - MIT License +Copyright (c) 2017 Sean Barrett +Permission is hereby granted, free of charge, to any person obtaining a copy of +this software and associated documentation files (the "Software"), to deal in +the Software without restriction, including without limitation the rights to +use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies +of the Software, and to permit persons to whom the Software is furnished to do +so, subject to the following conditions: +The above copyright notice and this permission notice shall be included in all +copies or substantial portions of the Software. +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER +LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, +OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +SOFTWARE. +------------------------------------------------------------------------------ +ALTERNATIVE B - Public Domain (www.unlicense.org) +This is free and unencumbered software released into the public domain. +Anyone is free to copy, modify, publish, use, compile, sell, or distribute this +software, either in source code form or as a compiled binary, for any purpose, +commercial or non-commercial, and by any means. +In jurisdictions that recognize copyright laws, the author or authors of this +software dedicate any and all copyright interest in the software to the public +domain. We make this dedication for the benefit of the public at large and to +the detriment of our heirs and successors. We intend this dedication to be an +overt act of relinquishment in perpetuity of all present and future rights to +this software under copyright law. +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR +IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, +FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE +AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN +ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION +WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. +------------------------------------------------------------------------------ +*/ diff --git a/dlls/d3dx9_36/surface.c b/dlls/d3dx9_36/surface.c index 70ee7bdbd01..28567c0fe75 100644 --- a/dlls/d3dx9_36/surface.c +++ b/dlls/d3dx9_36/surface.c @@ -27,7 +27,8 @@

#define BCDEC_IMPLEMENTATION #include "bcdec.h" -#include "txc_dxtn.h" +#define STB_DXT_IMPLEMENTATION +#include "stb_dxt.h" #include <assert.h>

WINE_DEFAULT_DEBUG_CHANNEL(d3dx); @@ -2917,6 +2918,106 @@ static HRESULT d3dx_pixels_unpack_index(struct d3dx_pixels *pixels, const struct return S_OK; }

+static void d3dx_compress_block(enum d3dx_pixel_format_id fmt, uint8_t *block_buf, void *dst_buf) +{ + switch (fmt) + { + case D3DX_PIXEL_FORMAT_DXT1_UNORM: + stb_compress_dxt_block(dst_buf, block_buf, FALSE, 0); + break; + + case D3DX_PIXEL_FORMAT_DXT2_UNORM: + case D3DX_PIXEL_FORMAT_DXT3_UNORM: + { + uint8_t *dst_data_offset = dst_buf; + unsigned int y; + + /* STB doesn't do DXT2/DXT3, we'll do the alpha part ourselves. */ + for (y = 0; y < 4; ++y) + { + uint8_t *tmp_row = &block_buf[y * 4 * 4]; + + dst_data_offset[0] = (tmp_row[7] & 0xf0); + dst_data_offset[0] |= (tmp_row[3] >> 4); + dst_data_offset[1] = (tmp_row[15] & 0xf0); + dst_data_offset[1] |= (tmp_row[11] >> 4); + + /* + * Set all alpha values to 0xff so they aren't considered during + * compression. + */ + tmp_row[3] = tmp_row[7] = tmp_row[11] = tmp_row[15] = 0xff; + dst_data_offset += 2; + } + stb_compress_dxt_block(dst_data_offset, block_buf, FALSE, 0); + break; + } + + case D3DX_PIXEL_FORMAT_DXT4_UNORM: + case D3DX_PIXEL_FORMAT_DXT5_UNORM: + stb_compress_dxt_block(dst_buf, block_buf, TRUE, 0); + break; + + default: + assert(0); + break; + } +} + +static HRESULT d3dx_pixels_compress(struct d3dx_pixels *src_pixels, + const struct pixel_format_desc *src_desc, struct d3dx_pixels *dst_pixels, + const struct pixel_format_desc *dst_desc) +{ + unsigned int x, y, z, block_buf_row_pitch; + uint8_t block_buf[64]; + + switch (dst_desc->format) + { + case D3DX_PIXEL_FORMAT_DXT1_UNORM: + case D3DX_PIXEL_FORMAT_DXT2_UNORM: + case D3DX_PIXEL_FORMAT_DXT3_UNORM: + case D3DX_PIXEL_FORMAT_DXT4_UNORM: + case D3DX_PIXEL_FORMAT_DXT5_UNORM: + assert(src_desc->format == D3DX_PIXEL_FORMAT_R8G8B8A8_UNORM); + break; + + default: + FIXME("Unexpected compressed texture format %u.\n", dst_desc->format); + return E_NOTIMPL; + } + + TRACE("Compressing pixels.\n"); + block_buf_row_pitch = src_desc->bytes_per_pixel * dst_desc->block_width; + for (z = 0; z < src_pixels->size.depth; ++z) + { + const uint8_t *src_slice = &((const uint8_t *)src_pixels->data)[z * src_pixels->slice_pitch]; + uint8_t *dst_slice = &((uint8_t *)dst_pixels->data)[z * dst_pixels->slice_pitch]; + + for (y = 0; y < src_pixels->size.height; y += dst_desc->block_height) + { + const unsigned int tmp_src_height = min(dst_desc->block_height, src_pixels->size.height - y); + uint8_t *dst_ptr = &dst_slice[(y / dst_desc->block_height) * dst_pixels->row_pitch]; + const uint8_t *src_ptr = &src_slice[y * src_pixels->row_pitch]; + + for (x = 0; x < src_pixels->size.width; x += dst_desc->block_width) + { + const unsigned int tmp_src_width = min(dst_desc->block_width, src_pixels->size.width - x); + struct volume block_buf_size = { tmp_src_width, tmp_src_height, 1 }; + + if (tmp_src_width != dst_desc->block_width || tmp_src_height != dst_desc->block_height) + memset(block_buf, 0, sizeof(block_buf)); + copy_pixels(src_ptr, src_pixels->row_pitch, src_pixels->slice_pitch, block_buf, block_buf_row_pitch, 0, + &block_buf_size, src_desc); + d3dx_compress_block(dst_desc->format, block_buf, dst_ptr); + src_ptr += (src_desc->bytes_per_pixel * dst_desc->block_width); + dst_ptr += dst_desc->block_byte_count; + } + } + } + + return S_OK; +} + HRESULT d3dx_pixels_init(const void *data, uint32_t row_pitch, uint32_t slice_pitch, const PALETTEENTRY *palette, enum d3dx_pixel_format_id format, uint32_t left, uint32_t top, uint32_t right, uint32_t bottom, uint32_t front, uint32_t back, struct d3dx_pixels *pixels) @@ -3095,35 +3196,13 @@ HRESULT d3dx_load_pixels_from_pixels(struct d3dx_pixels *dst_pixels, color_key); if (SUCCEEDED(hr)) { - GLenum gl_format = 0; - uint32_t i; - - TRACE("Compressing DXTn surface.\n"); - switch (dst_desc->format) - { - case D3DX_PIXEL_FORMAT_DXT1_UNORM: - gl_format = GL_COMPRESSED_RGBA_S3TC_DXT1_EXT; - break; - case D3DX_PIXEL_FORMAT_DXT2_UNORM: - case D3DX_PIXEL_FORMAT_DXT3_UNORM: - gl_format = GL_COMPRESSED_RGBA_S3TC_DXT3_EXT; - break; - case D3DX_PIXEL_FORMAT_DXT4_UNORM: - case D3DX_PIXEL_FORMAT_DXT5_UNORM: - gl_format = GL_COMPRESSED_RGBA_S3TC_DXT5_EXT; - break; - default: - ERR("Unexpected destination compressed format %u.\n", dst_desc->format); - } - - for (i = 0; i < dst_size_aligned.depth; ++i) - { - BYTE *uncompressed_mem_slice = (BYTE *)uncompressed_mem + (i * uncompressed_slice_pitch); - BYTE *dst_memory_slice = ((BYTE *)dst_pixels->data) + (i * dst_pixels->slice_pitch); + d3dx_pixels_init(uncompressed_mem, uncompressed_row_pitch, uncompressed_slice_pitch, NULL, + uncompressed_desc->format, 0, 0, dst_size_aligned.width, dst_size_aligned.height, 0, + dst_pixels->size.depth, &uncompressed_pixels);

- tx_compress_dxtn(4, dst_size_aligned.width, dst_size_aligned.height, uncompressed_mem_slice, gl_format, - dst_memory_slice, dst_pixels->row_pitch); - } + hr = d3dx_pixels_compress(&uncompressed_pixels, uncompressed_desc, dst_pixels, dst_desc); + if (FAILED(hr)) + WARN("Failed to compress pixels, hr %#lx.\n", hr); } free(uncompressed_mem); goto exit; diff --git a/dlls/d3dx9_36/txc_compress_dxtn.c b/dlls/d3dx9_36/txc_compress_dxtn.c deleted file mode 100644 index 7f10de8076b..00000000000 --- a/dlls/d3dx9_36/txc_compress_dxtn.c +++ /dev/null @@ -1,841 +0,0 @@ -/* - * libtxc_dxtn - * Version: 1.0 - * - * Copyright (C) 2004 Roland Scheidegger All Rights Reserved. - * - * Permission is hereby granted, free of charge, to any person obtaining a - * copy of this software and associated documentation files (the "Software"), - * to deal in the Software without restriction, including without limitation - * the rights to use, copy, modify, merge, publish, distribute, sublicense, - * and/or sell copies of the Software, and to permit persons to whom the - * Software is furnished to do so, subject to the following conditions: - * - * The above copyright notice and this permission notice shall be included - * in all copies or substantial portions of the Software. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS - * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL - * BRIAN PAUL BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN - * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. - */ - -#include <stdio.h> -#include <stdlib.h> -#include "txc_dxtn.h" - -/* weights used for error function, basically weights (unsquared 2/4/1) according to rgb->luminance conversion - not sure if this really reflects visual perception */ -#define REDWEIGHT 4 -#define GREENWEIGHT 16 -#define BLUEWEIGHT 1 - -#define ALPHACUT 127 - -static void fancybasecolorsearch( GLubyte *blkaddr, GLubyte srccolors[4][4][4], GLubyte *bestcolor[2], - GLint numxpixels, GLint numypixels, GLint type, GLboolean haveAlpha) -{ - /* use same luminance-weighted distance metric to determine encoding as for finding the base colors */ - - /* TODO could also try to find a better encoding for the 3-color-encoding type, this really should be done - if it's rgba_dxt1 and we have alpha in the block, currently even values which will be mapped to black - due to their alpha value will influence the result */ - GLint i, j, colors, z; - GLuint pixerror, pixerrorred, pixerrorgreen, pixerrorblue, pixerrorbest; - GLint colordist, blockerrlin[2][3]; - GLubyte nrcolor[2]; - GLint pixerrorcolorbest[3]; - GLubyte enc = 0; - GLubyte cv[4][4]; - GLubyte testcolor[2][3]; - -/* fprintf(stderr, "color begin 0 r/g/b %d/%d/%d, 1 r/g/b %d/%d/%d\n", - bestcolor[0][0], bestcolor[0][1], bestcolor[0][2], bestcolor[1][0], bestcolor[1][1], bestcolor[1][2]);*/ - if (((bestcolor[0][0] & 0xf8) << 8 | (bestcolor[0][1] & 0xfc) << 3 | bestcolor[0][2] >> 3) < - ((bestcolor[1][0] & 0xf8) << 8 | (bestcolor[1][1] & 0xfc) << 3 | bestcolor[1][2] >> 3)) { - testcolor[0][0] = bestcolor[0][0]; - testcolor[0][1] = bestcolor[0][1]; - testcolor[0][2] = bestcolor[0][2]; - testcolor[1][0] = bestcolor[1][0]; - testcolor[1][1] = bestcolor[1][1]; - testcolor[1][2] = bestcolor[1][2]; - } - else { - testcolor[1][0] = bestcolor[0][0]; - testcolor[1][1] = bestcolor[0][1]; - testcolor[1][2] = bestcolor[0][2]; - testcolor[0][0] = bestcolor[1][0]; - testcolor[0][1] = bestcolor[1][1]; - testcolor[0][2] = bestcolor[1][2]; - } - - for (i = 0; i < 3; i ++) { - cv[0][i] = testcolor[0][i]; - cv[1][i] = testcolor[1][i]; - cv[2][i] = (testcolor[0][i] * 2 + testcolor[1][i]) / 3; - cv[3][i] = (testcolor[0][i] + testcolor[1][i] * 2) / 3; - } - - blockerrlin[0][0] = 0; - blockerrlin[0][1] = 0; - blockerrlin[0][2] = 0; - blockerrlin[1][0] = 0; - blockerrlin[1][1] = 0; - blockerrlin[1][2] = 0; - - nrcolor[0] = 0; - nrcolor[1] = 0; - - for (j = 0; j < numypixels; j++) { - for (i = 0; i < numxpixels; i++) { - pixerrorbest = 0xffffffff; - for (colors = 0; colors < 4; colors++) { - colordist = srccolors[j][i][0] - (cv[colors][0]); - pixerror = colordist * colordist * REDWEIGHT; - pixerrorred = colordist; - colordist = srccolors[j][i][1] - (cv[colors][1]); - pixerror += colordist * colordist * GREENWEIGHT; - pixerrorgreen = colordist; - colordist = srccolors[j][i][2] - (cv[colors][2]); - pixerror += colordist * colordist * BLUEWEIGHT; - pixerrorblue = colordist; - if (pixerror < pixerrorbest) { - enc = colors; - pixerrorbest = pixerror; - pixerrorcolorbest[0] = pixerrorred; - pixerrorcolorbest[1] = pixerrorgreen; - pixerrorcolorbest[2] = pixerrorblue; - } - } - if (enc == 0) { - for (z = 0; z < 3; z++) { - blockerrlin[0][z] += 3 * pixerrorcolorbest[z]; - } - nrcolor[0] += 3; - } - else if (enc == 2) { - for (z = 0; z < 3; z++) { - blockerrlin[0][z] += 2 * pixerrorcolorbest[z]; - } - nrcolor[0] += 2; - for (z = 0; z < 3; z++) { - blockerrlin[1][z] += 1 * pixerrorcolorbest[z]; - } - nrcolor[1] += 1; - } - else if (enc == 3) { - for (z = 0; z < 3; z++) { - blockerrlin[0][z] += 1 * pixerrorcolorbest[z]; - } - nrcolor[0] += 1; - for (z = 0; z < 3; z++) { - blockerrlin[1][z] += 2 * pixerrorcolorbest[z]; - } - nrcolor[1] += 2; - } - else if (enc == 1) { - for (z = 0; z < 3; z++) { - blockerrlin[1][z] += 3 * pixerrorcolorbest[z]; - } - nrcolor[1] += 3; - } - } - } - if (nrcolor[0] == 0) nrcolor[0] = 1; - if (nrcolor[1] == 0) nrcolor[1] = 1; - for (j = 0; j < 2; j++) { - for (i = 0; i < 3; i++) { - GLint newvalue = testcolor[j][i] + blockerrlin[j][i] / nrcolor[j]; - if (newvalue <= 0) - testcolor[j][i] = 0; - else if (newvalue >= 255) - testcolor[j][i] = 255; - else testcolor[j][i] = newvalue; - } - } - - if ((abs(testcolor[0][0] - testcolor[1][0]) < 8) && - (abs(testcolor[0][1] - testcolor[1][1]) < 4) && - (abs(testcolor[0][2] - testcolor[1][2]) < 8)) { - /* both colors are so close they might get encoded as the same 16bit values */ - GLubyte coldiffred, coldiffgreen, coldiffblue, coldiffmax, factor, ind0, ind1; - - coldiffred = abs(testcolor[0][0] - testcolor[1][0]); - coldiffgreen = 2 * abs(testcolor[0][1] - testcolor[1][1]); - coldiffblue = abs(testcolor[0][2] - testcolor[1][2]); - coldiffmax = coldiffred; - if (coldiffmax < coldiffgreen) coldiffmax = coldiffgreen; - if (coldiffmax < coldiffblue) coldiffmax = coldiffblue; - if (coldiffmax > 0) { - if (coldiffmax > 4) factor = 2; - else if (coldiffmax > 2) factor = 3; - else factor = 4; - /* Won't do much if the color value is near 255... */ - /* argh so many ifs */ - if (testcolor[1][1] >= testcolor[0][1]) { - ind1 = 1; ind0 = 0; - } - else { - ind1 = 0; ind0 = 1; - } - if ((testcolor[ind1][1] + factor * coldiffgreen) <= 255) - testcolor[ind1][1] += factor * coldiffgreen; - else testcolor[ind1][1] = 255; - if ((testcolor[ind1][0] - testcolor[ind0][1]) > 0) { - if ((testcolor[ind1][0] + factor * coldiffred) <= 255) - testcolor[ind1][0] += factor * coldiffred; - else testcolor[ind1][0] = 255; - } - else { - if ((testcolor[ind0][0] + factor * coldiffred) <= 255) - testcolor[ind0][0] += factor * coldiffred; - else testcolor[ind0][0] = 255; - } - if ((testcolor[ind1][2] - testcolor[ind0][2]) > 0) { - if ((testcolor[ind1][2] + factor * coldiffblue) <= 255) - testcolor[ind1][2] += factor * coldiffblue; - else testcolor[ind1][2] = 255; - } - else { - if ((testcolor[ind0][2] + factor * coldiffblue) <= 255) - testcolor[ind0][2] += factor * coldiffblue; - else testcolor[ind0][2] = 255; - } - } - } - - if (((testcolor[0][0] & 0xf8) << 8 | (testcolor[0][1] & 0xfc) << 3 | testcolor[0][2] >> 3) < - ((testcolor[1][0] & 0xf8) << 8 | (testcolor[1][1] & 0xfc) << 3 | testcolor[1][2]) >> 3) { - for (i = 0; i < 3; i++) { - bestcolor[0][i] = testcolor[0][i]; - bestcolor[1][i] = testcolor[1][i]; - } - } - else { - for (i = 0; i < 3; i++) { - bestcolor[0][i] = testcolor[1][i]; - bestcolor[1][i] = testcolor[0][i]; - } - } - -/* fprintf(stderr, "color end 0 r/g/b %d/%d/%d, 1 r/g/b %d/%d/%d\n", - bestcolor[0][0], bestcolor[0][1], bestcolor[0][2], bestcolor[1][0], bestcolor[1][1], bestcolor[1][2]);*/ -} - - - -static void storedxtencodedblock( GLubyte *blkaddr, GLubyte srccolors[4][4][4], GLubyte *bestcolor[2], - GLint numxpixels, GLint numypixels, GLuint type, GLboolean haveAlpha) -{ - /* use same luminance-weighted distance metric to determine encoding as for finding the base colors */ - - GLint i, j, colors; - GLuint testerror, testerror2, pixerror, pixerrorbest; - GLint colordist; - GLushort color0, color1, tempcolor; - GLuint bits = 0, bits2 = 0; - GLubyte *colorptr; - GLubyte enc = 0; - GLubyte cv[4][4]; - - bestcolor[0][0] = bestcolor[0][0] & 0xf8; - bestcolor[0][1] = bestcolor[0][1] & 0xfc; - bestcolor[0][2] = bestcolor[0][2] & 0xf8; - bestcolor[1][0] = bestcolor[1][0] & 0xf8; - bestcolor[1][1] = bestcolor[1][1] & 0xfc; - bestcolor[1][2] = bestcolor[1][2] & 0xf8; - - color0 = bestcolor[0][0] << 8 | bestcolor[0][1] << 3 | bestcolor[0][2] >> 3; - color1 = bestcolor[1][0] << 8 | bestcolor[1][1] << 3 | bestcolor[1][2] >> 3; - if (color0 < color1) { - tempcolor = color0; color0 = color1; color1 = tempcolor; - colorptr = bestcolor[0]; bestcolor[0] = bestcolor[1]; bestcolor[1] = colorptr; - } - - - for (i = 0; i < 3; i++) { - cv[0][i] = bestcolor[0][i]; - cv[1][i] = bestcolor[1][i]; - cv[2][i] = (bestcolor[0][i] * 2 + bestcolor[1][i]) / 3; - cv[3][i] = (bestcolor[0][i] + bestcolor[1][i] * 2) / 3; - } - - testerror = 0; - for (j = 0; j < numypixels; j++) { - for (i = 0; i < numxpixels; i++) { - pixerrorbest = 0xffffffff; - for (colors = 0; colors < 4; colors++) { - colordist = srccolors[j][i][0] - cv[colors][0]; - pixerror = colordist * colordist * REDWEIGHT; - colordist = srccolors[j][i][1] - cv[colors][1]; - pixerror += colordist * colordist * GREENWEIGHT; - colordist = srccolors[j][i][2] - cv[colors][2]; - pixerror += colordist * colordist * BLUEWEIGHT; - if (pixerror < pixerrorbest) { - pixerrorbest = pixerror; - enc = colors; - } - } - testerror += pixerrorbest; - bits |= enc << (2 * (j * 4 + i)); - } - } - /* some hw might disagree but actually decoding should always use 4-color encoding - for non-dxt1 formats */ - if (type == GL_COMPRESSED_RGB_S3TC_DXT1_EXT || type == GL_COMPRESSED_RGBA_S3TC_DXT1_EXT) { - for (i = 0; i < 3; i++) { - cv[2][i] = (bestcolor[0][i] + bestcolor[1][i]) / 2; - /* this isn't used. Looks like the black color constant can only be used - with RGB_DXT1 if I read the spec correctly (note though that the radeon gpu disagrees, - it will decode 3 to black even with DXT3/5), and due to how the color searching works - it won't get used even then */ - cv[3][i] = 0; - } - testerror2 = 0; - for (j = 0; j < numypixels; j++) { - for (i = 0; i < numxpixels; i++) { - pixerrorbest = 0xffffffff; - if ((type == GL_COMPRESSED_RGBA_S3TC_DXT1_EXT) && (srccolors[j][i][3] <= ALPHACUT)) { - enc = 3; - pixerrorbest = 0; /* don't calculate error */ - } - else { - /* we're calculating the same what we have done already for colors 0-1 above... */ - for (colors = 0; colors < 3; colors++) { - colordist = srccolors[j][i][0] - cv[colors][0]; - pixerror = colordist * colordist * REDWEIGHT; - colordist = srccolors[j][i][1] - cv[colors][1]; - pixerror += colordist * colordist * GREENWEIGHT; - colordist = srccolors[j][i][2] - cv[colors][2]; - pixerror += colordist * colordist * BLUEWEIGHT; - if (pixerror < pixerrorbest) { - pixerrorbest = pixerror; - /* need to exchange colors later */ - if (colors > 1) enc = colors; - else enc = colors ^ 1; - } - } - } - testerror2 += pixerrorbest; - bits2 |= enc << (2 * (j * 4 + i)); - } - } - } else { - testerror2 = 0xffffffff; - } - - /* finally we're finished, write back colors and bits */ - if ((testerror > testerror2) || (haveAlpha)) { - *blkaddr++ = color1 & 0xff; - *blkaddr++ = color1 >> 8; - *blkaddr++ = color0 & 0xff; - *blkaddr++ = color0 >> 8; - *blkaddr++ = bits2 & 0xff; - *blkaddr++ = ( bits2 >> 8) & 0xff; - *blkaddr++ = ( bits2 >> 16) & 0xff; - *blkaddr = bits2 >> 24; - } - else { - *blkaddr++ = color0 & 0xff; - *blkaddr++ = color0 >> 8; - *blkaddr++ = color1 & 0xff; - *blkaddr++ = color1 >> 8; - *blkaddr++ = bits & 0xff; - *blkaddr++ = ( bits >> 8) & 0xff; - *blkaddr++ = ( bits >> 16) & 0xff; - *blkaddr = bits >> 24; - } -} - -static void encodedxtcolorblockfaster( GLubyte *blkaddr, GLubyte srccolors[4][4][4], - GLint numxpixels, GLint numypixels, GLuint type ) -{ -/* simplistic approach. We need two base colors, simply use the "highest" and the "lowest" color - present in the picture as base colors */ - - /* define lowest and highest color as shortest and longest vector to 0/0/0, though the - vectors are weighted similar to their importance in rgb-luminance conversion - doesn't work too well though... - This seems to be a rather difficult problem */ - - GLubyte *bestcolor[2]; - GLubyte basecolors[2][3]; - GLubyte i, j; - GLuint lowcv, highcv, testcv; - GLboolean haveAlpha = GL_FALSE; - - lowcv = highcv = srccolors[0][0][0] * srccolors[0][0][0] * REDWEIGHT + - srccolors[0][0][1] * srccolors[0][0][1] * GREENWEIGHT + - srccolors[0][0][2] * srccolors[0][0][2] * BLUEWEIGHT; - bestcolor[0] = bestcolor[1] = srccolors[0][0]; - for (j = 0; j < numypixels; j++) { - for (i = 0; i < numxpixels; i++) { - /* don't use this as a base color if the pixel will get black/transparent anyway */ - if ((type != GL_COMPRESSED_RGBA_S3TC_DXT1_EXT) || (srccolors[j][i][3] > ALPHACUT)) { - testcv = srccolors[j][i][0] * srccolors[j][i][0] * REDWEIGHT + - srccolors[j][i][1] * srccolors[j][i][1] * GREENWEIGHT + - srccolors[j][i][2] * srccolors[j][i][2] * BLUEWEIGHT; - if (testcv > highcv) { - highcv = testcv; - bestcolor[1] = srccolors[j][i]; - } - else if (testcv < lowcv) { - lowcv = testcv; - bestcolor[0] = srccolors[j][i]; - } - } - else haveAlpha = GL_TRUE; - } - } - /* make sure the original color values won't get touched... */ - for (j = 0; j < 2; j++) { - for (i = 0; i < 3; i++) { - basecolors[j][i] = bestcolor[j][i]; - } - } - bestcolor[0] = basecolors[0]; - bestcolor[1] = basecolors[1]; - - /* try to find better base colors */ - fancybasecolorsearch(blkaddr, srccolors, bestcolor, numxpixels, numypixels, type, haveAlpha); - /* find the best encoding for these colors, and store the result */ - storedxtencodedblock(blkaddr, srccolors, bestcolor, numxpixels, numypixels, type, haveAlpha); -} - -static void writedxt5encodedalphablock( GLubyte *blkaddr, GLubyte alphabase1, GLubyte alphabase2, - GLubyte alphaenc[16]) -{ - *blkaddr++ = alphabase1; - *blkaddr++ = alphabase2; - *blkaddr++ = alphaenc[0] | (alphaenc[1] << 3) | ((alphaenc[2] & 3) << 6); - *blkaddr++ = (alphaenc[2] >> 2) | (alphaenc[3] << 1) | (alphaenc[4] << 4) | ((alphaenc[5] & 1) << 7); - *blkaddr++ = (alphaenc[5] >> 1) | (alphaenc[6] << 2) | (alphaenc[7] << 5); - *blkaddr++ = alphaenc[8] | (alphaenc[9] << 3) | ((alphaenc[10] & 3) << 6); - *blkaddr++ = (alphaenc[10] >> 2) | (alphaenc[11] << 1) | (alphaenc[12] << 4) | ((alphaenc[13] & 1) << 7); - *blkaddr++ = (alphaenc[13] >> 1) | (alphaenc[14] << 2) | (alphaenc[15] << 5); -} - -static void encodedxt5alpha(GLubyte *blkaddr, GLubyte srccolors[4][4][4], - GLint numxpixels, GLint numypixels) -{ - GLubyte alphabase[2], alphause[2]; - GLshort alphatest[2]; - GLuint alphablockerror1, alphablockerror2, alphablockerror3; - GLubyte i, j, aindex, acutValues[7]; - GLubyte alphaenc1[16], alphaenc2[16], alphaenc3[16]; - GLboolean alphaabsmin = GL_FALSE; - GLboolean alphaabsmax = GL_FALSE; - GLshort alphadist; - - /* find lowest and highest alpha value in block, alphabase[0] lowest, alphabase[1] highest */ - alphabase[0] = 0xff; alphabase[1] = 0x0; - for (j = 0; j < numypixels; j++) { - for (i = 0; i < numxpixels; i++) { - if (srccolors[j][i][3] == 0) - alphaabsmin = GL_TRUE; - else if (srccolors[j][i][3] == 255) - alphaabsmax = GL_TRUE; - else { - if (srccolors[j][i][3] > alphabase[1]) - alphabase[1] = srccolors[j][i][3]; - if (srccolors[j][i][3] < alphabase[0]) - alphabase[0] = srccolors[j][i][3]; - } - } - } - - - if ((alphabase[0] > alphabase[1]) && !(alphaabsmin && alphaabsmax)) { /* one color, either max or min */ - /* shortcut here since it is a very common case (and also avoids later problems) */ - /* || (alphabase[0] == alphabase[1] && !alphaabsmin && !alphaabsmax) */ - /* could also test for alpha0 == alpha1 (and not min/max), but probably not common, so don't bother */ - - *blkaddr++ = srccolors[0][0][3]; - blkaddr++; - *blkaddr++ = 0; - *blkaddr++ = 0; - *blkaddr++ = 0; - *blkaddr++ = 0; - *blkaddr++ = 0; - *blkaddr++ = 0; -/* fprintf(stderr, "enc0 used\n");*/ - return; - } - - /* find best encoding for alpha0 > alpha1 */ - /* it's possible this encoding is better even if both alphaabsmin and alphaabsmax are true */ - alphablockerror1 = 0x0; - alphablockerror2 = 0xffffffff; - alphablockerror3 = 0xffffffff; - if (alphaabsmin) alphause[0] = 0; - else alphause[0] = alphabase[0]; - if (alphaabsmax) alphause[1] = 255; - else alphause[1] = alphabase[1]; - /* calculate the 7 cut values, just the middle between 2 of the computed alpha values */ - for (aindex = 0; aindex < 7; aindex++) { - /* don't forget here is always rounded down */ - acutValues[aindex] = (alphause[0] * (2*aindex + 1) + alphause[1] * (14 - (2*aindex + 1))) / 14; - } - - for (j = 0; j < numypixels; j++) { - for (i = 0; i < numxpixels; i++) { - /* maybe it's overkill to have the most complicated calculation just for the error - calculation which we only need to figure out if encoding1 or encoding2 is better... */ - if (srccolors[j][i][3] > acutValues[0]) { - alphaenc1[4*j + i] = 0; - alphadist = srccolors[j][i][3] - alphause[1]; - } - else if (srccolors[j][i][3] > acutValues[1]) { - alphaenc1[4*j + i] = 2; - alphadist = srccolors[j][i][3] - (alphause[1] * 6 + alphause[0] * 1) / 7; - } - else if (srccolors[j][i][3] > acutValues[2]) { - alphaenc1[4*j + i] = 3; - alphadist = srccolors[j][i][3] - (alphause[1] * 5 + alphause[0] * 2) / 7; - } - else if (srccolors[j][i][3] > acutValues[3]) { - alphaenc1[4*j + i] = 4; - alphadist = srccolors[j][i][3] - (alphause[1] * 4 + alphause[0] * 3) / 7; - } - else if (srccolors[j][i][3] > acutValues[4]) { - alphaenc1[4*j + i] = 5; - alphadist = srccolors[j][i][3] - (alphause[1] * 3 + alphause[0] * 4) / 7; - } - else if (srccolors[j][i][3] > acutValues[5]) { - alphaenc1[4*j + i] = 6; - alphadist = srccolors[j][i][3] - (alphause[1] * 2 + alphause[0] * 5) / 7; - } - else if (srccolors[j][i][3] > acutValues[6]) { - alphaenc1[4*j + i] = 7; - alphadist = srccolors[j][i][3] - (alphause[1] * 1 + alphause[0] * 6) / 7; - } - else { - alphaenc1[4*j + i] = 1; - alphadist = srccolors[j][i][3] - alphause[0]; - } - alphablockerror1 += alphadist * alphadist; - } - } -/* for (i = 0; i < 16; i++) { - fprintf(stderr, "%d ", alphaenc1[i]); - } - fprintf(stderr, "cutVals "); - for (i = 0; i < 8; i++) { - fprintf(stderr, "%d ", acutValues[i]); - } - fprintf(stderr, "srcVals "); - for (j = 0; j < numypixels; j++) - for (i = 0; i < numxpixels; i++) { - fprintf(stderr, "%d ", srccolors[j][i][3]); - } - - fprintf(stderr, "\n"); - }*/ - /* it's not very likely this encoding is better if both alphaabsmin and alphaabsmax - are false but try it anyway */ - if (alphablockerror1 >= 32) { - - /* don't bother if encoding is already very good, this condition should also imply - we have valid alphabase colors which we absolutely need (alphabase[0] <= alphabase[1]) */ - alphablockerror2 = 0; - for (aindex = 0; aindex < 5; aindex++) { - /* don't forget here is always rounded down */ - acutValues[aindex] = (alphabase[0] * (10 - (2*aindex + 1)) + alphabase[1] * (2*aindex + 1)) / 10; - } - for (j = 0; j < numypixels; j++) { - for (i = 0; i < numxpixels; i++) { - /* maybe it's overkill to have the most complicated calculation just for the error - calculation which we only need to figure out if encoding1 or encoding2 is better... */ - if (srccolors[j][i][3] == 0) { - alphaenc2[4*j + i] = 6; - alphadist = 0; - } - else if (srccolors[j][i][3] == 255) { - alphaenc2[4*j + i] = 7; - alphadist = 0; - } - else if (srccolors[j][i][3] <= acutValues[0]) { - alphaenc2[4*j + i] = 0; - alphadist = srccolors[j][i][3] - alphabase[0]; - } - else if (srccolors[j][i][3] <= acutValues[1]) { - alphaenc2[4*j + i] = 2; - alphadist = srccolors[j][i][3] - (alphabase[0] * 4 + alphabase[1] * 1) / 5; - } - else if (srccolors[j][i][3] <= acutValues[2]) { - alphaenc2[4*j + i] = 3; - alphadist = srccolors[j][i][3] - (alphabase[0] * 3 + alphabase[1] * 2) / 5; - } - else if (srccolors[j][i][3] <= acutValues[3]) { - alphaenc2[4*j + i] = 4; - alphadist = srccolors[j][i][3] - (alphabase[0] * 2 + alphabase[1] * 3) / 5; - } - else if (srccolors[j][i][3] <= acutValues[4]) { - alphaenc2[4*j + i] = 5; - alphadist = srccolors[j][i][3] - (alphabase[0] * 1 + alphabase[1] * 4) / 5; - } - else { - alphaenc2[4*j + i] = 1; - alphadist = srccolors[j][i][3] - alphabase[1]; - } - alphablockerror2 += alphadist * alphadist; - } - } - - - /* skip this if the error is already very small - this encoding is MUCH better on average than #2 though, but expensive! */ - if ((alphablockerror2 > 96) && (alphablockerror1 > 96)) { - GLshort blockerrlin1 = 0; - GLshort blockerrlin2 = 0; - GLubyte nralphainrangelow = 0; - GLubyte nralphainrangehigh = 0; - alphatest[0] = 0xff; - alphatest[1] = 0x0; - /* if we have large range it's likely there are values close to 0/255, try to map them to 0/255 */ - for (j = 0; j < numypixels; j++) { - for (i = 0; i < numxpixels; i++) { - if ((srccolors[j][i][3] > alphatest[1]) && (srccolors[j][i][3] < (255 -(alphabase[1] - alphabase[0]) / 28))) - alphatest[1] = srccolors[j][i][3]; - if ((srccolors[j][i][3] < alphatest[0]) && (srccolors[j][i][3] > (alphabase[1] - alphabase[0]) / 28)) - alphatest[0] = srccolors[j][i][3]; - } - } - /* shouldn't happen too often, don't really care about those degenerated cases */ - if (alphatest[1] <= alphatest[0]) { - alphatest[0] = 1; - alphatest[1] = 254; -/* fprintf(stderr, "only 1 or 0 colors for encoding!\n");*/ - } - for (aindex = 0; aindex < 5; aindex++) { - /* don't forget here is always rounded down */ - acutValues[aindex] = (alphatest[0] * (10 - (2*aindex + 1)) + alphatest[1] * (2*aindex + 1)) / 10; - } - - /* find the "average" difference between the alpha values and the next encoded value. - This is then used to calculate new base values. - Should there be some weighting, i.e. those values closer to alphatest[x] have more weight, - since they will see more improvement, and also because the values in the middle are somewhat - likely to get no improvement at all (because the base values might move in different directions)? - OTOH it would mean the values in the middle are even less likely to get an improvement - */ - for (j = 0; j < numypixels; j++) { - for (i = 0; i < numxpixels; i++) { - if (srccolors[j][i][3] <= alphatest[0] / 2) { - } - else if (srccolors[j][i][3] > ((255 + alphatest[1]) / 2)) { - } - else if (srccolors[j][i][3] <= acutValues[0]) { - blockerrlin1 += (srccolors[j][i][3] - alphatest[0]); - nralphainrangelow += 1; - } - else if (srccolors[j][i][3] <= acutValues[1]) { - blockerrlin1 += (srccolors[j][i][3] - (alphatest[0] * 4 + alphatest[1] * 1) / 5); - blockerrlin2 += (srccolors[j][i][3] - (alphatest[0] * 4 + alphatest[1] * 1) / 5); - nralphainrangelow += 1; - nralphainrangehigh += 1; - } - else if (srccolors[j][i][3] <= acutValues[2]) { - blockerrlin1 += (srccolors[j][i][3] - (alphatest[0] * 3 + alphatest[1] * 2) / 5); - blockerrlin2 += (srccolors[j][i][3] - (alphatest[0] * 3 + alphatest[1] * 2) / 5); - nralphainrangelow += 1; - nralphainrangehigh += 1; - } - else if (srccolors[j][i][3] <= acutValues[3]) { - blockerrlin1 += (srccolors[j][i][3] - (alphatest[0] * 2 + alphatest[1] * 3) / 5); - blockerrlin2 += (srccolors[j][i][3] - (alphatest[0] * 2 + alphatest[1] * 3) / 5); - nralphainrangelow += 1; - nralphainrangehigh += 1; - } - else if (srccolors[j][i][3] <= acutValues[4]) { - blockerrlin1 += (srccolors[j][i][3] - (alphatest[0] * 1 + alphatest[1] * 4) / 5); - blockerrlin2 += (srccolors[j][i][3] - (alphatest[0] * 1 + alphatest[1] * 4) / 5); - nralphainrangelow += 1; - nralphainrangehigh += 1; - } - else { - blockerrlin2 += (srccolors[j][i][3] - alphatest[1]); - nralphainrangehigh += 1; - } - } - } - /* shouldn't happen often, needed to avoid div by zero */ - if (nralphainrangelow == 0) nralphainrangelow = 1; - if (nralphainrangehigh == 0) nralphainrangehigh = 1; - alphatest[0] = alphatest[0] + (blockerrlin1 / nralphainrangelow); -/* fprintf(stderr, "block err lin low %d, nr %d\n", blockerrlin1, nralphainrangelow); - fprintf(stderr, "block err lin high %d, nr %d\n", blockerrlin2, nralphainrangehigh);*/ - /* again shouldn't really happen often... */ - if (alphatest[0] < 0) { - alphatest[0] = 0; -/* fprintf(stderr, "adj alpha base val to 0\n");*/ - } - alphatest[1] = alphatest[1] + (blockerrlin2 / nralphainrangehigh); - if (alphatest[1] > 255) { - alphatest[1] = 255; -/* fprintf(stderr, "adj alpha base val to 255\n");*/ - } - - alphablockerror3 = 0; - for (aindex = 0; aindex < 5; aindex++) { - /* don't forget here is always rounded down */ - acutValues[aindex] = (alphatest[0] * (10 - (2*aindex + 1)) + alphatest[1] * (2*aindex + 1)) / 10; - } - for (j = 0; j < numypixels; j++) { - for (i = 0; i < numxpixels; i++) { - /* maybe it's overkill to have the most complicated calculation just for the error - calculation which we only need to figure out if encoding1 or encoding2 is better... */ - if (srccolors[j][i][3] <= alphatest[0] / 2) { - alphaenc3[4*j + i] = 6; - alphadist = srccolors[j][i][3]; - } - else if (srccolors[j][i][3] > ((255 + alphatest[1]) / 2)) { - alphaenc3[4*j + i] = 7; - alphadist = 255 - srccolors[j][i][3]; - } - else if (srccolors[j][i][3] <= acutValues[0]) { - alphaenc3[4*j + i] = 0; - alphadist = srccolors[j][i][3] - alphatest[0]; - } - else if (srccolors[j][i][3] <= acutValues[1]) { - alphaenc3[4*j + i] = 2; - alphadist = srccolors[j][i][3] - (alphatest[0] * 4 + alphatest[1] * 1) / 5; - } - else if (srccolors[j][i][3] <= acutValues[2]) { - alphaenc3[4*j + i] = 3; - alphadist = srccolors[j][i][3] - (alphatest[0] * 3 + alphatest[1] * 2) / 5; - } - else if (srccolors[j][i][3] <= acutValues[3]) { - alphaenc3[4*j + i] = 4; - alphadist = srccolors[j][i][3] - (alphatest[0] * 2 + alphatest[1] * 3) / 5; - } - else if (srccolors[j][i][3] <= acutValues[4]) { - alphaenc3[4*j + i] = 5; - alphadist = srccolors[j][i][3] - (alphatest[0] * 1 + alphatest[1] * 4) / 5; - } - else { - alphaenc3[4*j + i] = 1; - alphadist = srccolors[j][i][3] - alphatest[1]; - } - alphablockerror3 += alphadist * alphadist; - } - } - } - } - /* write the alpha values and encoding back. */ - if ((alphablockerror1 <= alphablockerror2) && (alphablockerror1 <= alphablockerror3)) { -/* if (alphablockerror1 > 96) fprintf(stderr, "enc1 used, error %d\n", alphablockerror1);*/ - writedxt5encodedalphablock( blkaddr, alphause[1], alphause[0], alphaenc1 ); - } - else if (alphablockerror2 <= alphablockerror3) { -/* if (alphablockerror2 > 96) fprintf(stderr, "enc2 used, error %d\n", alphablockerror2);*/ - writedxt5encodedalphablock( blkaddr, alphabase[0], alphabase[1], alphaenc2 ); - } - else { -/* fprintf(stderr, "enc3 used, error %d\n", alphablockerror3);*/ - writedxt5encodedalphablock( blkaddr, (GLubyte)alphatest[0], (GLubyte)alphatest[1], alphaenc3 ); - } -} - -static void extractsrccolors( GLubyte srcpixels[4][4][4], const GLchan *srcaddr, - GLint srcRowStride, GLint numxpixels, GLint numypixels, GLint comps) -{ - GLubyte i, j, c; - const GLchan *curaddr; - for (j = 0; j < numypixels; j++) { - curaddr = srcaddr + j * srcRowStride * comps; - for (i = 0; i < numxpixels; i++) { - for (c = 0; c < comps; c++) { - srcpixels[j][i][c] = *curaddr++ / (CHAN_MAX / 255); - } - } - } -} - - -void tx_compress_dxtn(GLint srccomps, GLint width, GLint height, const GLubyte *srcPixData, - GLenum destFormat, GLubyte *dest, GLint dstRowStride) -{ - GLubyte *blkaddr = dest; - GLubyte srcpixels[4][4][4]; - const GLchan *srcaddr = srcPixData; - GLint numxpixels, numypixels; - GLint i, j; - GLint dstRowDiff; - - switch (destFormat) { - case GL_COMPRESSED_RGB_S3TC_DXT1_EXT: - case GL_COMPRESSED_RGBA_S3TC_DXT1_EXT: - /* hmm we used to get called without dstRowStride... */ - dstRowDiff = dstRowStride >= (width * 2) ? dstRowStride - (((width + 3) & ~3) * 2) : 0; -/* fprintf(stderr, "dxt1 tex width %d tex height %d dstRowStride %d\n", - width, height, dstRowStride); */ - for (j = 0; j < height; j += 4) { - if (height > j + 3) numypixels = 4; - else numypixels = height - j; - srcaddr = srcPixData + j * width * srccomps; - for (i = 0; i < width; i += 4) { - if (width > i + 3) numxpixels = 4; - else numxpixels = width - i; - extractsrccolors(srcpixels, srcaddr, width, numxpixels, numypixels, srccomps); - encodedxtcolorblockfaster(blkaddr, srcpixels, numxpixels, numypixels, destFormat); - srcaddr += srccomps * numxpixels; - blkaddr += 8; - } - blkaddr += dstRowDiff; - } - break; - case GL_COMPRESSED_RGBA_S3TC_DXT3_EXT: - dstRowDiff = dstRowStride >= (width * 4) ? dstRowStride - (((width + 3) & ~3) * 4) : 0; -/* fprintf(stderr, "dxt3 tex width %d tex height %d dstRowStride %d\n", - width, height, dstRowStride); */ - for (j = 0; j < height; j += 4) { - if (height > j + 3) numypixels = 4; - else numypixels = height - j; - srcaddr = srcPixData + j * width * srccomps; - for (i = 0; i < width; i += 4) { - if (width > i + 3) numxpixels = 4; - else numxpixels = width - i; - extractsrccolors(srcpixels, srcaddr, width, numxpixels, numypixels, srccomps); - *blkaddr++ = (srcpixels[0][0][3] >> 4) | (srcpixels[0][1][3] & 0xf0); - *blkaddr++ = (srcpixels[0][2][3] >> 4) | (srcpixels[0][3][3] & 0xf0); - *blkaddr++ = (srcpixels[1][0][3] >> 4) | (srcpixels[1][1][3] & 0xf0); - *blkaddr++ = (srcpixels[1][2][3] >> 4) | (srcpixels[1][3][3] & 0xf0); - *blkaddr++ = (srcpixels[2][0][3] >> 4) | (srcpixels[2][1][3] & 0xf0); - *blkaddr++ = (srcpixels[2][2][3] >> 4) | (srcpixels[2][3][3] & 0xf0); - *blkaddr++ = (srcpixels[3][0][3] >> 4) | (srcpixels[3][1][3] & 0xf0); - *blkaddr++ = (srcpixels[3][2][3] >> 4) | (srcpixels[3][3][3] & 0xf0); - encodedxtcolorblockfaster(blkaddr, srcpixels, numxpixels, numypixels, destFormat); - srcaddr += srccomps * numxpixels; - blkaddr += 8; - } - blkaddr += dstRowDiff; - } - break; - case GL_COMPRESSED_RGBA_S3TC_DXT5_EXT: - dstRowDiff = dstRowStride >= (width * 4) ? dstRowStride - (((width + 3) & ~3) * 4) : 0; -/* fprintf(stderr, "dxt5 tex width %d tex height %d dstRowStride %d\n", - width, height, dstRowStride); */ - for (j = 0; j < height; j += 4) { - if (height > j + 3) numypixels = 4; - else numypixels = height - j; - srcaddr = srcPixData + j * width * srccomps; - for (i = 0; i < width; i += 4) { - if (width > i + 3) numxpixels = 4; - else numxpixels = width - i; - extractsrccolors(srcpixels, srcaddr, width, numxpixels, numypixels, srccomps); - encodedxt5alpha(blkaddr, srcpixels, numxpixels, numypixels); - encodedxtcolorblockfaster(blkaddr + 8, srcpixels, numxpixels, numypixels, destFormat); - srcaddr += srccomps * numxpixels; - blkaddr += 16; - } - blkaddr += dstRowDiff; - } - break; - default: - /* fprintf(stderr, "libdxtn: Bad dstFormat %d in tx_compress_dxtn\n", destFormat); */ - return; - } -} diff --git a/dlls/d3dx9_36/txc_dxtn.h b/dlls/d3dx9_36/txc_dxtn.h deleted file mode 100644 index 9bd59037a4b..00000000000 --- a/dlls/d3dx9_36/txc_dxtn.h +++ /dev/null @@ -1,43 +0,0 @@ -/* - * libtxc_dxtn - * Version: 1.0 - * - * Copyright (C) 2004 Roland Scheidegger All Rights Reserved. - * - * Permission is hereby granted, free of charge, to any person obtaining a - * copy of this software and associated documentation files (the "Software"), - * to deal in the Software without restriction, including without limitation - * the rights to use, copy, modify, merge, publish, distribute, sublicense, - * and/or sell copies of the Software, and to permit persons to whom the - * Software is furnished to do so, subject to the following conditions: - * - * The above copyright notice and this permission notice shall be included - * in all copies or substantial portions of the Software. - * - * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS - * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, - * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL - * BRIAN PAUL BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN - * AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN - * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. - */ - -#ifndef _TXC_DXTN_H -#define _TXC_DXTN_H - -#include "winternl.h" -#include "wine/wgl.h" - -typedef GLubyte GLchan; -#define UBYTE_TO_CHAN(b) (b) -#define CHAN_MAX 255 -#define RCOMP 0 -#define GCOMP 1 -#define BCOMP 2 -#define ACOMP 3 - -void tx_compress_dxtn(GLint srccomps, GLint width, GLint height, - const GLubyte *srcPixData, GLenum destformat, - GLubyte *dest, GLint dstRowStride); - -#endif /* _TXC_DXTN_H */ diff --git a/dlls/d3dx9_37/Makefile.in b/dlls/d3dx9_37/Makefile.in index 88bc6e43589..6b7da8e56a3 100644 --- a/dlls/d3dx9_37/Makefile.in +++ b/dlls/d3dx9_37/Makefile.in @@ -22,7 +22,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_38/Makefile.in b/dlls/d3dx9_38/Makefile.in index 56cc69cf193..223b1165135 100644 --- a/dlls/d3dx9_38/Makefile.in +++ b/dlls/d3dx9_38/Makefile.in @@ -22,7 +22,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_39/Makefile.in b/dlls/d3dx9_39/Makefile.in index b0b539fccf7..63cfbba0672 100644 --- a/dlls/d3dx9_39/Makefile.in +++ b/dlls/d3dx9_39/Makefile.in @@ -22,7 +22,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_40/Makefile.in b/dlls/d3dx9_40/Makefile.in index 0c23fced19e..e76ef6de8a0 100644 --- a/dlls/d3dx9_40/Makefile.in +++ b/dlls/d3dx9_40/Makefile.in @@ -22,7 +22,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_41/Makefile.in b/dlls/d3dx9_41/Makefile.in index 994f8e2e385..0951206c6eb 100644 --- a/dlls/d3dx9_41/Makefile.in +++ b/dlls/d3dx9_41/Makefile.in @@ -22,7 +22,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_42/Makefile.in b/dlls/d3dx9_42/Makefile.in index a96721a35b0..697fa194d51 100644 --- a/dlls/d3dx9_42/Makefile.in +++ b/dlls/d3dx9_42/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \ diff --git a/dlls/d3dx9_43/Makefile.in b/dlls/d3dx9_43/Makefile.in index 27122d33617..7257b021dcb 100644 --- a/dlls/d3dx9_43/Makefile.in +++ b/dlls/d3dx9_43/Makefile.in @@ -23,7 +23,6 @@ SOURCES = \ sprite.c \ surface.c \ texture.c \ - txc_compress_dxtn.c \ util.c \ version.rc \ volume.c \

-- GitLab https://gitlab.winehq.org/wine/wine/-/merge_requests/8226

Matteo Bruni (＠Mystral)

12 Jun 12 Jun

1:55 p.m.

Matteo Bruni (@Mystral) commented about dlls/d3dx9_36/surface.c:

...

 }
}

+struct d3dx_bcn_decompression_context +{

void (*decompress_bcn_block)(const void *, void *, int);

Do we need to support BC6H / BC7 eventually?

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/8226#note_106338

Matteo Bruni (＠Mystral)

1:55 p.m.

Matteo Bruni (@Mystral) commented about dlls/d3dx9_36/surface.c:

...

}
TRACE("Compressing pixels.\n");
block_buf_row_pitch = src_desc->bytes_per_pixel * dst_desc->block_width;
for (z = 0; z < src_pixels->size.depth; ++z)
{

   const uint8_t *src_slice = &((const uint8_t *)src_pixels->data)[z * src_pixels->slice_pitch];

   uint8_t *dst_slice = &((uint8_t *)dst_pixels->data)[z * dst_pixels->slice_pitch];

   for (y = 0; y < src_pixels->size.height; y += dst_desc->block_height)

```
   {
```

       const unsigned int tmp_src_height = min(dst_desc->block_height, src_pixels->size.height - y);

       uint8_t *dst_ptr = &dst_slice[(y / dst_desc->block_height) * dst_pixels->row_pitch];

       const uint8_t *src_ptr = &src_slice[y * src_pixels->row_pitch];

       for (x = 0; x < src_pixels->size.width; x += dst_desc->block_width)

Okay, something like this is what I meant WRT the previous patch :grin:

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/8226#note_106340

Matteo Bruni (＠Mystral)

1:55 p.m.

Matteo Bruni (@Mystral) commented about dlls/d3dx9_36/surface.c:

...

       /* STB doesn't do DXT2/DXT3, we'll do the alpha part ourselves. */

```
       for (y = 0; y < 4; ++y)
```
```
       {
```

           uint8_t *tmp_row = &block_buf[y * 4 * 4];

           dst_data_offset[0]  = (tmp_row[7] & 0xf0);

           dst_data_offset[0] |= (tmp_row[3] >> 4);

           dst_data_offset[1]  = (tmp_row[15] & 0xf0);

           dst_data_offset[1] |= (tmp_row[11] >> 4);

```
           /*
```

            * Set all alpha values to 0xff so they aren't considered during

```
            * compression.
```
```
            */
```

           tmp_row[3] = tmp_row[7] = tmp_row[11] = tmp_row[15] = 0xff;

Is it okay to modify the source data here? I think at least a comment on top of the function might be in order.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/8226#note_106341

Matteo Bruni (＠Mystral)

1:55 p.m.

Matteo Bruni (@Mystral) commented about dlls/d3dx9_36/surface.c:

...

             const POINT pt = { x, y };

             if (!is_dst)

               fetch_dxt_texel(tmp_pitch, slice_data, x + pixels->unaligned_rect.left,

                       y + pixels->unaligned_rect.top, ptr);

               d3dx_fetch_bcn_texel(&context, x + pixels->unaligned_rect.left, y + pixels->unaligned_rect.top, z, ptr);

I think it's probably better to rework `d3dx_pixels_decompress()` to make use of the bcdec API more naturally, instead of essentially introducing an adapter for bcdec offering the old txc_dxtn interface. I haven't thought it through by any means but I imagine we could iterate block by block, decompressing directly into `uncompressed_mem` if it fits, or into a temporary buffer if we only need a part (and then copy the subset we want).

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/8226#note_106339

Connor McAdams (＠cmcadams)

2:06 p.m.

On Thu Jun 12 13:55:16 2025 +0000, Matteo Bruni wrote:

...

Do we need to support BC6H / BC7 eventually?

Yep, d3dx11 supports those formats.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/8226#note_106346

Connor McAdams (＠cmcadams)

2:10 p.m.

On Thu Jun 12 13:55:17 2025 +0000, Matteo Bruni wrote:

...

I think it's probably better to rework `d3dx_pixels_decompress()` to make use of the bcdec API more naturally, instead of essentially introducing an adapter for bcdec offering the old txc_dxtn interface. I haven't thought it through by any means but I imagine we could iterate block by block, decompressing directly into `uncompressed_mem` if it fits, or into a temporary buffer if we only need a part (and then copy the subset we want).

Yes, it's possible to do this, I was just trying to keep things simple :) The code gets a bit messier because of the need to classify whether a particular block should be fully decoded, partially decoded, or skipped all together. It should be more performant though.

-- https://gitlab.winehq.org/wine/wine/-/merge_requests/8226#note_106347

150

Age (days ago)

156

Last active (days ago)

wine-gitlab@winehq.org

8 comments

3 participants

tags (0)

participants (3)

Connor McAdams
Connor McAdams (＠cmcadams)
Matteo Bruni (＠Mystral)