On 10 August 2016 at 16:01, Jacek Caban jacek@codeweavers.com wrote:
include/wine/rbtree.h | 376 +++++++++++++++++++++++++++----------------------- 1 file changed, 200 insertions(+), 176 deletions(-)
Since this patch was assigned to me...
I think this patch has quite a number of unrelated changes, to the point that the patch touches over half of the lines in the file and I think deletion for example ends up just being a different algorithm. And while the aforementioned means I didn't look at everything in quite the amount of detail that would otherwise be appropriate, I do think some of those changes make things overly complicated. For example, a straightforward modification to wine_rb_put() would turn it into something like the following:
static inline int wine_rb_put(struct wine_rb_tree *tree, const void *key, struct wine_rb_entry *entry) { struct wine_rb_entry **node = &tree->root; struct wine_rb_entry *parent = *node;
while (*node) { int c;
if (!(c = tree->functions->compare(key, *node))) return -1; parent = *node; if (c < 0) node = &(*node)->left; else node = &(*node)->right; }
entry->flags = WINE_RB_FLAG_RED; entry->parent = parent; entry->left = NULL; entry->right = NULL; *node = entry;
wine_rb_fixup(&tree->root, entry); tree->root->flags &= ~WINE_RB_FLAG_RED;
return 0; }
I think that's much clearer than what this patch does with wine_rb_put().
Hi Henri,
On 8/11/16 2:50 PM, Henri Verbeet wrote:
On 10 August 2016 at 16:01, Jacek Caban jacek@codeweavers.com wrote:
include/wine/rbtree.h | 376 +++++++++++++++++++++++++++----------------------- 1 file changed, 200 insertions(+), 176 deletions(-)
Since this patch was assigned to me...
I think this patch has quite a number of unrelated changes,
Implementation details that I change in helper functions affect other parts in such a way, that one patch felt appropriate. Anyway, I sent a split version (at the cost of temporary arguments).
to the point that the patch touches over half of the lines in the file
Yes, that's more or less a rewrite, so it's not surprising.
and I think deletion for example ends up just being a different algorithm.
Yes, that's intentional and there are reasons for that. The algorithm that I used (see implementation from Introduction to Algorithms book for reference) needs only bottom-up traversal once you have the node, which is perfect for following parent chain. Instead of doing rotations on way down and up, we do them just when going up. And, although asymptotically it's the same complexity, it usually needs fewer steps since it stops at the first red parent. Further, the traversal down can be a simple wine_rb_get call (and I intend to change it further so that wine_rb_remove takes wine_rb_entry as an argument, but that's something for future discussion). All that wasn't possible when we had to record the path to root on the stack, but when I had to make significant changes to the function anyway, I think that's a good moment to do full rewrite and take those little advantages.
And while the aforementioned means I didn't look at everything in quite the amount of detail that would otherwise be appropriate, I do think some of those changes make things overly complicated. For example, a straightforward modification to wine_rb_put() would turn it into something like the following:
static inline int wine_rb_put(struct wine_rb_tree *tree, const
void *key, struct wine_rb_entry *entry) { struct wine_rb_entry **node = &tree->root; struct wine_rb_entry *parent = *node;
while (*node) { int c; if (!(c = tree->functions->compare(key, *node))) return -1; parent = *node; if (c < 0) node = &(*node)->left; else node = &(*node)->right; } entry->flags = WINE_RB_FLAG_RED; entry->parent = parent; entry->left = NULL; entry->right = NULL; *node = entry; wine_rb_fixup(&tree->root, entry); tree->root->flags &= ~WINE_RB_FLAG_RED; return 0; }
I think that's much clearer than what this patch does with wine_rb_put().
Well, I don't see why one or another is cleaner, but it indeed slightly reduces the diff, so I changed that.
Thanks, Jacek
On 15 August 2016 at 21:57, Jacek Caban jacek@codeweavers.com wrote:
Yes, that's intentional and there are reasons for that. The algorithm that I used (see implementation from Introduction to Algorithms book for reference) needs only bottom-up traversal once you have the node, which is perfect for following parent chain. Instead of doing rotations on way down and up, we do them just when going up. And, although asymptotically it's the same complexity, it usually needs fewer steps since it stops at the first red parent. Further, the traversal down can be a simple wine_rb_get call (and I intend to change it further so that wine_rb_remove takes wine_rb_entry as an argument, but that's something for future discussion). All that wasn't possible when we had to record the path to root on the stack, but when I had to make significant changes to the function anyway, I think that's a good moment to do full rewrite and take those little advantages.
Well, the original code was written the way it was because of simplicity more than anything else. The part that's performance critical is wine_rb_get(), but that depends largely on the performance of the compare function. Inserts and removals shouldn't be slow of course, but I think it's reasonable to balance performance against simplicity there. If in practice there are concrete performance advantages to doing it a different way we could of course reconsider that.
One of the things I don't like about these patches is that I think full-blown parent pointers make it harder to reason about the code. Similarly, I don't think the amount of branching and fiddling with flags in e.g. wine_rb_remove() is an improvement. I realise those are pretty subjective considerations, and the patches do work as intended in my tests. Any performance differences I found seemed to be within the margin of error. Nevertheless, would the attached patch work for you?
Hi Henri,
On 8/23/16 5:54 PM, Henri Verbeet wrote:
On 15 August 2016 at 21:57, Jacek Caban jacek@codeweavers.com wrote: Well, the original code was written the way it was because of simplicity more than anything else. The part that's performance critical is wine_rb_get(), but that depends largely on the performance of the compare function. Inserts and removals shouldn't be slow of course, but I think it's reasonable to balance performance against simplicity there. If in practice there are concrete performance advantages to doing it a different way we could of course reconsider that.
Sure, my patches address mostly friendly and more flexible API (more about that bellow), so let's concentrate on that.
One of the things I don't like about these patches is that I think full-blown parent pointers make it harder to reason about the code. Similarly, I don't think the amount of branching and fiddling with flags in e.g. wine_rb_remove() is an improvement. I realise those are pretty subjective considerations,
Yes, those are subjective. I find the new code actually cleaner and easier to follow and I have some subjective thoughts about the other solution as well. How about we both skip those subjective considerations? If you have concrete improvements suggestions, I'm happy to address them.
and the patches do work as intended in my tests. Any performance differences I found seemed to be within the margin of error. Nevertheless, would the attached patch work for you?
Not really. I obviously considered something along those lines, you could have asked before writing the patch. This solution wouldn't allow something I'm planning as a follow up, which is friendly (callback free) iterations. Something like WINE_RB_FOR_EACH_SAFE/WINE_RB_FOR_EACH_ENTRY_SAFE macros (just like their list.h counterparts) is just a matter of adding those defines with my patches.
Also ordered iterations would be an useful addition. I had usage example in my very recent jscript work, when I missed something like that and I ended up having simplicity-oriented ad-hoc solution. It would be nice to change that. Macros WINE_RB_FOR_EACH/WINE_RB_FOR_EACH_ENTRY doing ordered, mutation-safe, iterations would be straightforward with my patches. They are exactly what I needed.
Also, being able to efficiently iterate to the next node (something like wine_rb_next(), which could cope with tree mutations) is important if we want to use it for any of IDispatchEx implementations.
Jacek
On 24.08.2016 12:49, Jacek Caban wrote:
and the patches do work as intended in my tests. Any performance differences I found seemed to be within the margin of error. Nevertheless, would the attached patch work for you?
Not really. I obviously considered something along those lines, you could have asked before writing the patch. This solution wouldn't allow something I'm planning as a follow up, which is friendly (callback free) iterations. Something like WINE_RB_FOR_EACH_SAFE/WINE_RB_FOR_EACH_ENTRY_SAFE macros (just like their list.h counterparts) is just a matter of adding those defines with my patches.
Also ordered iterations would be an useful addition. I had usage example in my very recent jscript work, when I missed something like that and I ended up having simplicity-oriented ad-hoc solution. It would be nice to change that. Macros WINE_RB_FOR_EACH/WINE_RB_FOR_EACH_ENTRY doing ordered, mutation-safe, iterations would be straightforward with my patches. They are exactly what I needed.
Also, being able to efficiently iterate to the next node (something like wine_rb_next(), which could cope with tree mutations) is important if we want to use it for any of IDispatchEx implementations.
Jacek
(Non-safe) iterator macros with different orders could also be added without bigger code changes, but I assume that alone would still not be sufficient for your requirements. In order to implement safe iterator callbacks (and to get rid of the stack) those parent pointers would really be useful, although they have the disadvantage that they increase memory usage a bit. This makes me wonder if it wouldn't be useful have multiple tree implementations targeting different use-cases and performance requirements.
I assume that there are currently no plans to exchange the RBTree with something else, nevertheless it shouldn't hurt to share some code. I've attached a proof-of-concept patch to replace the RBTree with a Scapegoat tree I wrote a while ago. In contrast to other trees a nice property is that it is not really required to keep the tree balanced all the time, instead we have rebuild operations which will automatically detect it if the performance gets too bad and build a binary search tree. When deleting lots of elements for example we could disable rebuild temporarily to avoid too much overhead. If someone has code to measure performance differences in practice, feel free to give it a try. ;)
Regards, Sebastian
Hi Sebastian,
On 25.08.2016 03:57, Sebastian Lackner wrote:
On 24.08.2016 12:49, Jacek Caban wrote:
and the patches do work as intended in my tests. Any performance differences I found seemed to be within the margin of error. Nevertheless, would the attached patch work for you?
Not really. I obviously considered something along those lines, you could have asked before writing the patch. This solution wouldn't allow something I'm planning as a follow up, which is friendly (callback free) iterations. Something like WINE_RB_FOR_EACH_SAFE/WINE_RB_FOR_EACH_ENTRY_SAFE macros (just like their list.h counterparts) is just a matter of adding those defines with my patches.
Also ordered iterations would be an useful addition. I had usage example in my very recent jscript work, when I missed something like that and I ended up having simplicity-oriented ad-hoc solution. It would be nice to change that. Macros WINE_RB_FOR_EACH/WINE_RB_FOR_EACH_ENTRY doing ordered, mutation-safe, iterations would be straightforward with my patches. They are exactly what I needed.
Also, being able to efficiently iterate to the next node (something like wine_rb_next(), which could cope with tree mutations) is important if we want to use it for any of IDispatchEx implementations.
Jacek
(Non-safe) iterator macros with different orders could also be added without bigger code changes, but I assume that alone would still not be sufficient for your requirements. In order to implement safe iterator callbacks (and to get rid of the stack) those parent pointers would really be useful, although they have the disadvantage that they increase memory usage a bit.
As discussed earlier, if memory becomes a real problem, we may reduce it to three pointers at the cost of requiring aligned pointers.
This makes me wonder if it wouldn't be useful have multiple tree implementations targeting different use-cases and performance requirements.
I guess we could if we had convincing reason. I think that improved existing implementation should be enough so far.
I assume that there are currently no plans to exchange the RBTree with something else, nevertheless it shouldn't hurt to share some code. I've attached a proof-of-concept patch to replace the RBTree with a Scapegoat tree I wrote a while ago. In contrast to other trees a nice property is that it is not really required to keep the tree balanced all the time, instead we have rebuild operations which will automatically detect it if the performance gets too bad and build a binary search tree. When deleting lots of elements for example we could disable rebuild temporarily to avoid too much overhead.
Sure. Since you posted it, I had a quick look. That implementation requires allocators, which is what we should avoid to simplify usage of the tree. You could get rid of the array by storing another pointer in nodes that would be used to link them into ordered list instead of an array (that would require a bit more tricky wine_rb_build, but it should be doable without much perf cost). However, then you'd give up memory advantage.
Jacek
On 25.08.2016 15:30, Jacek Caban wrote:
Hi Sebastian,
[...]
I assume that there are currently no plans to exchange the RBTree with something else, nevertheless it shouldn't hurt to share some code. I've attached a proof-of-concept patch to replace the RBTree with a Scapegoat tree I wrote a while ago. In contrast to other trees a nice property is that it is not really required to keep the tree balanced all the time, instead we have rebuild operations which will automatically detect it if the performance gets too bad and build a binary search tree. When deleting lots of elements for example we could disable rebuild temporarily to avoid too much overhead.
Sure. Since you posted it, I had a quick look. That implementation requires allocators, which is what we should avoid to simplify usage of the tree. You could get rid of the array by storing another pointer in nodes that would be used to link them into ordered list instead of an array (that would require a bit more tricky wine_rb_build, but it should be doable without much perf cost). However, then you'd give up memory advantage.
Well, I do not think allocators in general are a problem. It is only a problem if important operations can fail because of out of memory errors. The Scapegoat tree uses it only for the rebuild operation which is not critical (performance will get worse, but everything will continue to work as expected). Also, I do not think its possible to replace the array with an ordered list - the rebuild operation requires computation of the median which would be way too expensive for regular lists.
Jacek
On 25.08.2016 15:59, Sebastian Lackner wrote:
On 25.08.2016 15:30, Jacek Caban wrote:
Hi Sebastian,
[...]
I assume that there are currently no plans to exchange the RBTree with something else, nevertheless it shouldn't hurt to share some code. I've attached a proof-of-concept patch to replace the RBTree with a Scapegoat tree I wrote a while ago. In contrast to other trees a nice property is that it is not really required to keep the tree balanced all the time, instead we have rebuild operations which will automatically detect it if the performance gets too bad and build a binary search tree. When deleting lots of elements for example we could disable rebuild temporarily to avoid too much overhead.
Sure. Since you posted it, I had a quick look. That implementation requires allocators, which is what we should avoid to simplify usage of the tree. You could get rid of the array by storing another pointer in nodes that would be used to link them into ordered list instead of an array (that would require a bit more tricky wine_rb_build, but it should be doable without much perf cost). However, then you'd give up memory advantage.
Well, I do not think allocators in general are a problem.It is only a problem if important operations can fail because of out of memory errors. The Scapegoat tree uses it only for the rebuild operation which is not critical (performance will get worse, but everything will continue to work as expected).
Sure, failing operations is the most annoying part of allocators, so it's a step forward. Still, avoiding overhead of adding allocators in each tree user would be nice.
Also, I do not think its possible to replace the array with an ordered list - the rebuild operation requires computation of the median which would be way too expensive for regular lists.
I think that should be doable, but with a different algorithm in wine_rb_build. That's what I meant by tricky. Here is first that comes first into my mind:
let's say that nodes are in list1
while(list1 length > 1) { while(list1 length > 0) { take first 3 elements node1, node2, node3 from list1 (*) node2->left = node1 node2->right = node3 append node2 into list2 } list1 = list2 }
root = the only element in list1
(*) if list length 2, use NULL for node3, if length is 1, node1=node3=NULL, node2=the only node in list
Since on every outer loop iteration length of new list is 3 times shorter than previous length, the complexity is O(n), just like with the array.
Jacek
On 25.08.2016 16:33, Jacek Caban wrote:
On 25.08.2016 15:59, Sebastian Lackner wrote:
On 25.08.2016 15:30, Jacek Caban wrote:
Hi Sebastian,
[...]
I assume that there are currently no plans to exchange the RBTree with something else, nevertheless it shouldn't hurt to share some code. I've attached a proof-of-concept patch to replace the RBTree with a Scapegoat tree I wrote a while ago. In contrast to other trees a nice property is that it is not really required to keep the tree balanced all the time, instead we have rebuild operations which will automatically detect it if the performance gets too bad and build a binary search tree. When deleting lots of elements for example we could disable rebuild temporarily to avoid too much overhead.
Sure. Since you posted it, I had a quick look. That implementation requires allocators, which is what we should avoid to simplify usage of the tree. You could get rid of the array by storing another pointer in nodes that would be used to link them into ordered list instead of an array (that would require a bit more tricky wine_rb_build, but it should be doable without much perf cost). However, then you'd give up memory advantage.
Well, I do not think allocators in general are a problem.It is only a problem if important operations can fail because of out of memory errors. The Scapegoat tree uses it only for the rebuild operation which is not critical (performance will get worse, but everything will continue to work as expected).
Sure, failing operations is the most annoying part of allocators, so it's a step forward. Still, avoiding overhead of adding allocators in each tree user would be nice.
Also, I do not think its possible to replace the array with an ordered list - the rebuild operation requires computation of the median which would be way too expensive for regular lists.
I think that should be doable, but with a different algorithm in wine_rb_build. That's what I meant by tricky. Here is first that comes first into my mind:
let's say that nodes are in list1
while(list1 length > 1) { while(list1 length > 0) { take first 3 elements node1, node2, node3 from list1 (*) node2->left = node1 node2->right = node3 append node2 into list2 } list1 = list2 }
root = the only element in list1
(*) if list length 2, use NULL for node3, if length is 1, node1=node3=NULL, node2=the only node in list
Since on every outer loop iteration length of new list is 3 times shorter than previous length, the complexity is O(n), just like with the array.
Jacek
I see what you mean. In fact, since we know how many nodes are in the subtree which has to be rebuild (and thus the height of the balanced subtree), this could probably even be implemented recursively without using any allocators and O(log n) stack usage. Nevertheless, before spending more time on an alternative implementation it might be useful to decide in which direction we want to proceed. ;)
Regards, Sebastian
On 25.08.2016 16:44, Sebastian Lackner wrote:
Nevertheless, before spending more time on an alternative implementation it might be useful to decide in which direction we want to proceed. ;)
Agreed ;) I still hope that my patches will get in. After last Henri's reply I was expecting a sign off, but it's still not there. Henri?
Jacek
On 25 August 2016 at 17:01, Jacek Caban jacek@codeweavers.com wrote:
Agreed ;) I still hope that my patches will get in. After last Henri's reply I was expecting a sign off, but it's still not there. Henri?
You may have misread my reply somewhat. Bottom line, and what I told Alexandre a while ago, is that while these changes aren't something I'd put my Signed-off-by on, I don't care enough to do it myself or to want to block them from going in.