I've played around with dbghelp performance. My test case was breaking at an unknown symbol (break gaga) while WoW was loaded in the debugger (wine winedbg WoW.exe). The time was stopped by hand; memory usage was measured with ps -AF, looking at the RSS column.
Test                       Time(s)  Memory Usage(MB)
current git                    4.5                54
pool_heap.patch                4.5                63
process_heap.patch             4.5               126
insert_first.patch             4.5                54
current git, r300              115               146
pool_heap.patch, r300           17               119
process_heap.patch, r300        17               260
insert_first.patch, r300        27               167
insert_first is the patch from Eric Pouech. r300 means with the debug version of Mesa's r300_dri.so, which has a total compilation unit size of around 9.2M (compared to Wine's second biggest, user32, at 1.1M).
Conclusions:
- current git wins with small debug files (<2M or so), pool_heap wins with bigger files. insert_first and process_heap are out.
- small pools have less memory overhead than small heaps.
- big pools have more memory overhead than big heaps.
- big pools are a lot slower than big heaps.
IMO the best results would come from removing the pools (as in process_heap) and freeing unused memory manually, in the reverse order it was allocated. But at first glance that looks like quite a bit of work, and I'm not sure it's worth the result. I think the best approach would be to add some destroy functions in storage.c that free the allocated vector, sparse_array and hash_table memory, and then gradually replace pool_alloc calls with HeapAlloc/HeapFree pairs.
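A minimal sketch of such a destroy function, assuming a simplified vector layout (the field names buckets and num_buckets are illustrative, and standard malloc/free stand in for Wine's HeapAlloc/HeapFree):

```c
#include <stdlib.h>

/* Hypothetical simplified vector: an array of fixed-size buckets,
 * loosely mirroring dbghelp's storage.c layout (field names are
 * assumptions, not the real struct). */
struct vector
{
    void   **buckets;      /* array of bucket pointers */
    unsigned num_buckets;  /* buckets currently allocated */
};

/* Free every bucket, then the bucket array itself: the reverse
 * of the order in which the memory was obtained. */
static void vector_destroy(struct vector *v)
{
    unsigned i;
    for (i = 0; i < v->num_buckets; i++)
        free(v->buckets[i]);
    free(v->buckets);
    v->buckets = NULL;
    v->num_buckets = 0;
}
```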
Markus
Markus Amsler wrote:
I've played around with dbghelp performance. My test case was breaking at an unknown symbol (break gaga) while WoW was loaded in the debugger (wine winedbg WoW.exe). The time was stopped by hand; memory usage was measured with ps -AF, looking at the RSS column.

Test                       Time(s)  Memory Usage(MB)
current git                    4.5                54
pool_heap.patch                4.5                63
process_heap.patch             4.5               126
insert_first.patch             4.5                54
current git, r300              115               146
pool_heap.patch, r300           17               119
process_heap.patch, r300        17               260
insert_first.patch, r300        27               167

insert_first is the patch from Eric Pouech. r300 means with the debug version of Mesa's r300_dri.so, which has a total compilation unit size of around 9.2M (compared to Wine's second biggest, user32, at 1.1M).
Conclusions:
- current git wins with small debug files (<2M or so), pool_heap wins
with bigger files. insert_first, process_heap are out.
- small pools have less memory overhead than small heaps
- big pools have more memory overhead than big heaps.
- big pools are a lot slower than big heaps.
thanks for the tests & timings!
you're also missing a couple of elements:
- for the memory overhead, in the first case you consider 50 MB (roughly) spread over 10 or 20 modules, while in your r300 case the impact (and memory difference) is on a single module
- the time to unload a module hasn't been measured (it's needed less often than loading a module)
what's also strange is that pool_heap gets lower memory consumption than the process_heap one, which is rather not a natural result... I wonder if some data has been swapped out and isn't accounted for in RSS
A+
Eric Pouech wrote:
Markus Amsler wrote:
I've played around with dbghelp performance. My test case was breaking at an unknown symbol (break gaga) while WoW was loaded in the debugger (wine winedbg WoW.exe). The time was stopped by hand; memory usage was measured with ps -AF, looking at the RSS column.

Test                       Time(s)  Memory Usage(MB)
current git                    4.5                54
pool_heap.patch                4.5                63
process_heap.patch             4.5               126
insert_first.patch             4.5                54
current git, r300              115               146
pool_heap.patch, r300           17               119
process_heap.patch, r300        17               260
insert_first.patch, r300        27               167

insert_first is the patch from Eric Pouech. r300 means with the debug version of Mesa's r300_dri.so, which has a total compilation unit size of around 9.2M (compared to Wine's second biggest, user32, at 1.1M).
Conclusions:
- current git wins with small debug files (<2M or so), pool_heap wins
with bigger files. insert_first, process_heap are out.
- small pools have less memory overhead than small heaps
- big pools have more memory overhead than big heaps.
- big pools are a lot slower than big heaps.
thanks for the tests & timings!
you're also missing a couple of elements:
- for the memory overhead, in the first case you consider 50 MB (roughly) spread over 10 or 20 modules, while in your r300 case the impact (and memory difference) is on a single module
I'm not sure what your point is.
- the time to unload a module hasn't been measured (it's needed less often than loading a module)
Unloading is more or less instant in all cases.
what's also strange is that pool_heap gets lower memory consumption than the process_heap one, which is rather not a natural result... I wonder if some data has been swapped out and isn't accounted for in RSS
The process_heap is the one I sent to wine-patches, which never frees any memory. I've also tested an improved process_heap, which stores the allocated memory pointers in an array and frees them afterwards. No luck: it's slower and uses more memory than pool_heap.
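The improved process_heap bookkeeping described above could look roughly like this; the struct and function names are hypothetical, and standard malloc/free stand in for HeapAlloc/HeapFree:

```c
#include <stdlib.h>

/* Sketch of the "improved process_heap" idea: every allocation is
 * remembered in a growable array so a module can release all of its
 * memory at unload time. Names are illustrative, not from storage.c. */
struct alloc_list
{
    void   **ptrs;
    unsigned count, capacity;
};

static void *tracked_alloc(struct alloc_list *list, size_t size)
{
    void *p = malloc(size);
    if (!p) return NULL;
    if (list->count == list->capacity)
    {
        unsigned new_cap = list->capacity ? list->capacity * 2 : 16;
        void   **tmp = realloc(list->ptrs, new_cap * sizeof(void *));
        if (!tmp) { free(p); return NULL; }
        list->ptrs = tmp;
        list->capacity = new_cap;
    }
    list->ptrs[list->count++] = p;
    return p;
}

/* Release everything that was handed out, then the tracking array. */
static void tracked_free_all(struct alloc_list *list)
{
    unsigned i;
    for (i = 0; i < list->count; i++) free(list->ptrs[i]);
    free(list->ptrs);
    list->ptrs = NULL;
    list->count = list->capacity = 0;
}
```

The extra pointer array is one plausible reason this variant costs more memory than the pools in Markus's measurements.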
So I don't see a simple solution that only affects storage.c, is equal to or better than the current code, and is significantly faster with big debug files. Any ideas?
Markus
Markus Amsler wrote:
Eric Pouech wrote:
Markus Amsler wrote:
I've played around with dbghelp performance. My test case was breaking at an unknown symbol (break gaga) while WoW was loaded in the debugger (wine winedbg WoW.exe). The time was stopped by hand; memory usage was measured with ps -AF, looking at the RSS column.

Test                       Time(s)  Memory Usage(MB)
current git                    4.5                54
pool_heap.patch                4.5                63
process_heap.patch             4.5               126
insert_first.patch             4.5                54
current git, r300              115               146
pool_heap.patch, r300           17               119
process_heap.patch, r300        17               260
insert_first.patch, r300        27               167

insert_first is the patch from Eric Pouech. r300 means with the debug version of Mesa's r300_dri.so, which has a total compilation unit size of around 9.2M (compared to Wine's second biggest, user32, at 1.1M).
Conclusions:
- current git wins with small debug files (<2M or so), pool_heap
wins with bigger files. insert_first, process_heap are out.
- small pools have less memory overhead than small heaps
- big pools have more memory overhead than big heaps.
- big pools are a lot slower than big heaps.
thanks for the tests & timings!
you're also missing a couple of elements:
- for the memory overhead, in the first case you consider 50 MB (roughly) spread over 10 or 20 modules, while in your r300 case the impact (and memory difference) is on a single module
I'm not sure what your point is.
- the time to unload a module hasn't been measured (it's needed less often than loading a module)
Unloading is more or less instant in all cases.
what's also strange is that pool_heap gets lower memory consumption than the process_heap one, which is rather not a natural result... I wonder if some data has been swapped out and isn't accounted for in RSS
The process_heap is the one I sent to wine-patches, which never frees any memory. I've also tested an improved process_heap, which stores the allocated memory pointers in an array and frees them afterwards. No luck: it's slower and uses more memory than pool_heap.
So I don't see a simple solution that only affects storage.c, is equal to or better than the current code, and is significantly faster with big debug files. Any ideas?
Markus
Hi Markus, does the slightly modified version of pool_heap improve your performance? It shouldn't change the numbers for large files (or only a bit), but should reduce memory consumption for small pools (by 1 to 2M depending on your configuration).
A+
Eric Pouech wrote:
Markus Amsler wrote:
Eric Pouech wrote:
Markus Amsler wrote:
I've played around with dbghelp performance. My test case was breaking at an unknown symbol (break gaga) while WoW was loaded in the debugger (wine winedbg WoW.exe). The time was stopped by hand; memory usage was measured with ps -AF, looking at the RSS column.

Test                       Time(s)  Memory Usage(MB)
current git                    4.5                54
pool_heap.patch                4.5                63
process_heap.patch             4.5               126
insert_first.patch             4.5                54
current git, r300              115               146
pool_heap.patch, r300           17               119
process_heap.patch, r300        17               260
insert_first.patch, r300        27               167

insert_first is the patch from Eric Pouech. r300 means with the debug version of Mesa's r300_dri.so, which has a total compilation unit size of around 9.2M (compared to Wine's second biggest, user32, at 1.1M).
Hi Markus, does the slightly modified version of pool_heap improve your performance? It shouldn't change the numbers for large files (or only a bit), but should reduce memory consumption for small pools (by 1 to 2M depending on your configuration).
A+
No, performance is exactly the same as pool_heap :( . I analyzed why your original insert_first version was slower and more memory hungry than pool_heap. It turned out pool_realloc is the problem, not pool_alloc. First, there's a memory leak: if the memory is moved, the old block is not freed. Second, pool_realloc is O(n); that's the reason for the speed hits. Directly using heap functions for reallocs solves both problems (but it looks too hackish to get committed; perhaps you have a better idea).
Here are the results for pool_realloc on top of insert_first:
pool_realloc              4.5s   54M
pool_realloc, r300         17s  104M
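The leak described above can be illustrated with a simplified realloc done by hand; this is only a sketch (malloc/free stand in for the heap functions, and all the pool bookkeeping is omitted):

```c
#include <stdlib.h>
#include <string.h>

/* Simplified grow-only realloc that fixes the leak: when the block
 * has to move, the old allocation is freed instead of being abandoned
 * inside the pool. */
static void *fixed_realloc(void *old, size_t old_size, size_t new_size)
{
    void *new_ptr;

    if (new_size <= old_size) return old;   /* still fits in place */
    new_ptr = malloc(new_size);
    if (!new_ptr) return NULL;
    if (old)
    {
        memcpy(new_ptr, old, old_size);
        free(old);                          /* this free was missing */
    }
    return new_ptr;
}
```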
The next problem is vector_iter_[up|down], because vector_position is O(n). Explicitly storing the current iter position speeds r300 up to 8s (from the original 115s)! But I'm not sure how to implement it cleanly: directly use for() instead of vector_iter_*(), use an iterator, ...
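One possible shape for such an iterator, sketched over a toy flat vector (the real dbghelp vector is an array of buckets; all names here are illustrative):

```c
#include <stddef.h>

/* Toy vector (a plain array here) just to show the iterator shape. */
struct vector
{
    int     *data;
    unsigned count;
};

/* Iterator that carries its current index: advancing is O(1) instead
 * of recomputing the position from the element pointer each step, the
 * way an O(n) vector_position() would. */
struct vector_iter
{
    const struct vector *v;
    unsigned             pos;   /* cached current position */
};

static void vector_iter_init(struct vector_iter *it, const struct vector *v)
{
    it->v = v;
    it->pos = 0;
}

/* Returns the next element, or NULL when the end is reached. */
static int *vector_iter_up(struct vector_iter *it)
{
    if (it->pos >= it->v->count) return NULL;
    return &it->v->data[it->pos++];
}
```

A plain for(i = 0; i < vector_length(); i++) loop would achieve the same O(1) stepping, just with the index managed at each call site instead.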
Markus
Markus Amsler wrote:
Eric Pouech wrote:
Markus Amsler wrote:
Eric Pouech wrote:
Markus Amsler wrote:
I've played around with dbghelp performance. My test case was breaking at an unknown symbol (break gaga) while WoW was loaded in the debugger (wine winedbg WoW.exe). The time was stopped by hand; memory usage was measured with ps -AF, looking at the RSS column.

Test                       Time(s)  Memory Usage(MB)
current git                    4.5                54
pool_heap.patch                4.5                63
process_heap.patch             4.5               126
insert_first.patch             4.5                54
current git, r300              115               146
pool_heap.patch, r300           17               119
process_heap.patch, r300        17               260
insert_first.patch, r300        27               167

insert_first is the patch from Eric Pouech. r300 means with the debug version of Mesa's r300_dri.so, which has a total compilation unit size of around 9.2M (compared to Wine's second biggest, user32, at 1.1M).
Hi Markus, does the slightly modified version of pool_heap improve your performance? It shouldn't change the numbers for large files (or only a bit), but should reduce memory consumption for small pools (by 1 to 2M depending on your configuration).
A+
No, performance is exactly the same as pool_heap :( .
even for memory consumption ???
I analyzed why your original insert_first version was slower and more memory hungry than pool_heap. It turned out pool_realloc is the problem, not pool_alloc. First, there's a memory leak: if the memory is moved, the old block is not freed. Second, pool_realloc is O(n); that's the reason for the speed hits. Directly using heap functions for reallocs solves both problems (but it looks too hackish to get committed; perhaps you have a better idea).
We could try not to realloc the array of arrays but rather use a tree of arrays, which should solve most of the issues, but that would make the code more complicated. Another way is to double the bucket size each time we need to grow (instead of adding one bucket).
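The bucket-doubling idea could be sketched like this, with standard realloc and illustrative names; doubling means n insertions trigger only O(log n) reallocations instead of one per added bucket:

```c
#include <stdlib.h>

/* Grow the array of bucket pointers geometrically: double its capacity
 * until it can hold `needed` buckets. Returns 1 on success, 0 on
 * allocation failure. (Function and parameter names are hypothetical.) */
static int ensure_bucket_capacity(void ***buckets, unsigned *capacity,
                                  unsigned needed)
{
    unsigned new_cap = *capacity ? *capacity : 4;
    void   **tmp;

    while (new_cap < needed) new_cap *= 2;
    if (new_cap == *capacity) return 1;     /* already big enough */
    tmp = realloc(*buckets, new_cap * sizeof(void *));
    if (!tmp) return 0;
    *buckets  = tmp;
    *capacity = new_cap;
    return 1;
}
```

The trade-off is up to 2x slack in the bucket array, which is usually small next to the buckets themselves.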
Here are the results for pool_realloc on top of insert_first:
pool_realloc              4.5s   54M
pool_realloc, r300         17s  104M

The next problem is vector_iter_[up|down], because vector_position is O(n). Explicitly storing the current iter position speeds r300 up to 8s (from the original 115s)! But I'm not sure how to implement it cleanly: directly use for() instead of vector_iter_*(), use an iterator, ...
likely use an iterator which keeps track of the current position (as we do for the hash tables)
A+
Eric Pouech wrote:
Markus Amsler wrote:
No, performance is exactly the same as pool_heap :( .
even for memory consumption ???
Yes, it looks like HeapCreate has a default size of 64k.
I analyzed why your original insert_first version was slower and more memory hungry than pool_heap. It turned out pool_realloc is the problem, not pool_alloc. First, there's a memory leak: if the memory is moved, the old block is not freed. Second, pool_realloc is O(n); that's the reason for the speed hits. Directly using heap functions for reallocs solves both problems (but it looks too hackish to get committed; perhaps you have a better idea).
We could try not to realloc the array of arrays but rather use a tree of arrays, which should solve most of the issues, but that would make the code more complicated. Another way is to double the bucket size each time we need to grow (instead of adding one bucket).
I'll have a look at doubling the bucket size.
Here are the results for pool_realloc on top of insert_first:
pool_realloc              4.5s   54M
pool_realloc, r300         17s  104M

The next problem is vector_iter_[up|down], because vector_position is O(n). Explicitly storing the current iter position speeds r300 up to 8s (from the original 115s)! But I'm not sure how to implement it cleanly: directly use for() instead of vector_iter_*(), use an iterator, ...
likely use an iterator which keeps track of the current position (as we do for the hash tables)
An iterator for a vector looks a bit like overkill; I was in favor of for(i=0; i<vector_length(); i++). Either solution will add some code on the caller side.
Markus