On Fri Jan 5 19:28:26 2024 +0000, Giovanni Mascellani wrote:
I haven't read the code yet, but I'm not yet sold on the idea of reimplementing our own serialization format. AFAIU getting a database right (correct, stable, performant, etc) is quite tricky, and since there are already a lot of time-tested alternatives around I'd like to have a discussion about why cooking our own is our best way forward. We've already had some discussion internally, but maybe it's a good idea to also have it here and possibly go a bit deeper.
I started with the same idea, for the same reasons you mention. An on-disk file is a potential attack vector, so we need to tread carefully, and I didn't put a lot of validation into the serialization I wrote there.
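For illustration, the kind of validation meant here could look roughly like the sketch below. The header layout, the magic value and the name entry_is_valid are made up for this example and are not taken from the patch. It also assumes the file is read back with the same endianness it was written with.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical on-disk entry header; NOT the layout used in the patch. */
struct entry_header
{
    uint32_t magic;
    uint32_t key_size;
    uint32_t value_size;
};

/* Arbitrary marker chosen for this example. */
#define ENTRY_MAGIC 0x43484356u

/* Returns 1 if a complete entry (header, key and value) lies within
 * data[offset..size), 0 otherwise. Sizes read from the file are compared
 * against the remaining byte count, so an untrusted size is never added
 * to an offset and cannot overflow it. */
static int entry_is_valid(const uint8_t *data, size_t size, size_t offset,
        struct entry_header *out)
{
    struct entry_header h;
    size_t remaining;

    if (offset > size || size - offset < sizeof(h))
        return 0;
    /* memcpy avoids unaligned access into the mapped or read buffer. */
    memcpy(&h, data + offset, sizeof(h));
    if (h.magic != ENTRY_MAGIC)
        return 0;

    remaining = size - offset - sizeof(h);
    if (h.key_size > remaining)
        return 0;
    if (h.value_size > remaining - h.key_size)
        return 0;

    *out = h;
    return 1;
}
```

A reader built this way rejects a truncated or tampered file instead of reading past the end of the buffer. It does not detect corrupted payload bytes; that would need a checksum on top.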
Pre-existing libraries I investigated:
LevelDB: The compiled binary is about 4 times the size of vkd3d. The last commit was in April 2023.
RocksDB: A LevelDB fork, 10 times the size of vkd3d. It takes about 30 minutes to build on my system.
Fossilize: A library by Valve that is pretty close to what we need. See below.
Berkeley DB, gdbm, etc.: Afaiu a copy of those hangs around on every Unix system, but not on Win32. They are either unmaintained or have incompatible licenses.
memcached, and a few others from web-related environments: Client-server architectures, even more overkill than RocksDB, although potentially smaller.
The one realistic choice is Mesa's C-only reimplementation of the Fossilize serializer. It looks reasonably small at first sight, but it depends on Mesa's hash table. I spent a few days trying to make sense of it, but failed. It mixes various hash formats (truncate_hash_to_64bits, and somewhere I saw 32-bit hashes too).
Afaics the foz backend is not the default backend in Mesa (the default one populates a directory with thousands of files), so I am not sure how much testing it gets. The populate-a-directory approach is feasible for a cache on the user's machine (if you assume a post-FAT32 file system), but makes shipping a prepared cache awkward.
After some time of staring at the Mesa code I needed a feeling of progress and decided to roll my own 500 lines of code. Is it NIH syndrome? Certainly. But copy-pasting Mesa code I don't understand and hoping it does a good job may or may not be better.