https://bugs.winehq.org/show_bug.cgi?id=45023
Bug ID: 45023 Summary: TestBot Object Relational Mapper issues Product: Wine-Testbot Version: unspecified Hardware: x86 OS: Linux Status: NEW Severity: normal Priority: P2 Component: unknown Assignee: wine-bugs@winehq.org Reporter: fgouget@codeweavers.com Distribution: ---
The TestBot implements its own perl Object Relational Mapper (ORM) to access the database. It is implemented by the modules in lib/ObjectModel. The are two main concepts:
* Items An Item is an object representing a table row. The classes such as Job, Step and Task all inherit from Item.
* Collections A collection represents a table and is thus a collection of Items. To allow for fast access the items are stored in a perl hashtable indexed by their primary key. A Collection may not contain all of the table's rows. For instance a Step Collection only contains the Steps of the parent Job object used to create it. So $Job->Steps is a collection that contains only the $Job steps.
The items in a Collection can be further filtered out by specifying that one of their property must be in a set of values. For instance $Jobs->AddFilter("Status", ["queued", "running"]) will cause the $Jobs collection to only return the queued and running jobs.
Unfortunately this implementation has a number of limitations and design issues.
1. Care must be taken when creating multiple objects.
my $Job1 = $Jobs->Add(); ... Set mandatory fields ... my $Job2 = $Jobs->Add(); ... Set mandatory fields ... # Here $Job1 != $Job2 $Jobs->Save();
Only $Job2 got saved to the database! The reason is that a Job's primary key is an auto-increment field. So the Job Id is set when it is saved to the database. Thus when created in memory that field is unset and an empty string is used when adding it to the $Jobs items hashtable. Thus $Job2 overwrites $Jobs->{Items}->{""} and is the only one that gets saved.
Note that there is no such issue for Steps for instance because their key is not an auto-increment field and is set as soon as the object is created.
Fortunately the workaround is simple: systematically save an object before creating the next one but the difference in behavior between $Jobs->Add() and $Steps->Add() is error prone.
2. Auto-increment values on new Items are not ignored
If an auto-increment field is modified on an object, that value is used when saving the object to the database.
This means fixing the previous issue is not as simple as setting $Job->Id(--$Unique): handling of auto-increment fields also needs to be changed when saving new objects.
3. Items retrieved from Detailref collections are missing fields
'Detailref' properties are used to represent 'one to many' relationships. They are used for instance to access the Steps that belong to a Job:
my $Step = $Job->Steps->GetItem(1);
Here Steps is a collection that gets created based on the information of the job's DetailRef property.
However objects gotten in this way don't provide access to all of the corresponding database fields. In the case of Steps, although the Steps table has a JobId column, the $Step->JobId property does not exist. This presents difficulties in a number of cases such as when trying to figure out the name of the directory containing the Step's files since it is called "jobs/$JobId/$StepNo" (see Step::GetDir()).
It also means there is no $Step->Job property to get back to the parent Job object (this would also cause a perl reference loop which would be troublesome).
The reason for this issue is that when a Collection or Item has a parent object (called a 'master' object in the TestBot ORM) the parent's foreign key fields (Step.JobId here) are stored separately in a 'master cols' structure and are thus not accessible through the normal field methods.
4. "Child" objects can only be accessed from their parent
Any collection referenced through a Detailref property can only be accessed through the corresponding 'parent' object.
This is the case for the Steps collections. The only way to get a collection of steps is through $Job->Steps and this only returns the steps that belong to $Job. So it's not possible to iterate over all the rows in the Steps table.
There is a (somewhat incorrect) workaround for this for the Records table. This involves having the CreateRecords() method picking one of two PropertyDescriptors list depending on the presence of a parent object.
However the corresponding Record objects have an extra field compared to the ones obtained through $RecordGroup->Records: this time the foreign key identifying the parent object is accessible since it's not not tucked away in the 'master cols' structure. This is not an issue unless you start mixing both types of objects, for instance through scope inheritance.
5. The 'master cols' terminology is ambiguous.
It's never clear if a 'master cols' method treats $self as a child of its parent object, or as the parent to other child objects.
* $Step->GetMasterCols() treats $Step as a child object and returns the key columns of its $Job master object, that is (JobId).
* Similarly $Step->MasterKeyChanged() means the parent object's key changed, so here that the value of $JobId changed (which happens on new objects due to the auto-increment field btw).
* Despite the unchanged 'Master' moniker, $Step->GetMasterKey() considers $Step to be the master object of the Tasks it contains and thus includes the step's key in the returned columns. This means $Step->GetMasterKey() returns (JobId, StepNo).
* Despite not having the 'Master' moniker, $Step->GetFullKey() works the same as $Step->GetMasterKey() but returns the column values as a single string '$JobId#@#$StepNo'.
6. Filters on Detailref collections
It's possible to put a filter on a collection to only get a subset of the corresponding rows. For instance $Job->Steps->AddFilter('Status', ['queued']) ensures that one will only get the queued steps of $Job.
But Detailref collections are stored as a field of the Item they belong to. This means $Job->Steps always returns the same collection object. So once a piece of code has set a filter on it, to only return 'queued' steps for instance, all other parts of the code will only get the 'queued' steps, no matter what they need. Also there is no method for cloning a collection (so one cannot do $Job->Steps->Clone()->AddFilter()), or for removing a filter.
For instance: $Job->Steps->AddFilter('Status', ['queued']); ... # Here UpdateStatus() will only take into account the queued steps! $Job->UpdateStatus();
Note also that it's not entirely trivial to work around. It's tempting to do something like: $Job->Steps->AddFilter('Status', ['queued']); ... # Make sure the database contains an up-to-date view of all the Steps and # Tasks in $Job. $Job->Save(); # Then create a new job object from the database my $TmpJob = CreateJobs()->GetItem($Job->Id); # Now we can update the status without fearing bad filters. $TmpJob->UpdateStatus(); # But here $Job->Status could still be wrong so any code that still uses # $Job will be lead astray. So update $Job. $Job->Status($TmpJob->Status) # But that still leaves out every Step and Task object referenced by $Job # Also saving $Job will save $Job->Status, again. What could go wrong?
7. Filters vs. loading from the database
Once a collection has been loaded from the database, adding a filter to it has no effect. That's because filtering happens by tweaking the SQL query and changing the filter does not mark the collection for reloading.
Again this issue is made more severe because of the way Detailref collections are handled. If a piece of code has caused the $Job->Steps collection to be loaded, then adding a filter elsewhere will have no effect.
For instance: $Job->UpdateStatus(); # calls $Job->GetItems(), loading it from DB ... $Job->Steps->AddFilter('Status', ['queued']); foreach my $Step ($Job->Steps->GetItem()) # Here we will get all steps, not just the queued ones!
8. No "or" operator in filters
Originally the filters only allowed one to request that a field be in a set of admissible values. This was extended to allow requesting for a field to be less than, greater than or like some value.
However it is still not possible to use an "or" operator in filters. So for instance one cannot retrieve all Steps that completed recently or correspond to a WineTest run (WHERE Ended > 5 minutes ago OR User == 'batch').
9. No Itemref support for multiple key fields.
A Task has a VMName field which is a foreign key referencing a VM. Through the use of an Itemref property one can directly access the VM with $Task->VM. But we cannot do the same thing for steps.
The primary key of a step has two fields, (JobId, No). A step also has a PreviousNo field which identifies the step it depends on. So we would want to be able to access the previous step through $Step->Previous by using an Itemref property:
CreateItemrefPropertyDescriptor("Previous", "Previous step", !1, !1, &CreateSteps, ["JobId", "PreviousNo"]),
However this fails with a "Multiple key components not supported" error. Note that for this to work we would also need a way to tell the Itemref that the Step.PreviousNo field should be mapped to Step.No. But there is no way to do so.
10. Collection::GetItem() blindly adds the item to the collection
Collections have a GetItem() method that returns an item when given a string containing that object's key.
So using the Step example above, if we still have the right $Job->Steps collection accessible, we can easily get the Previous step through:
$Job->Steps->GetItem($Step->PreviousNo)
However this also adds $Step->PreviousNo to the $Job->Steps collection, regardless of what the filter on that collection is. So when the scheduler analyzes the queued and running steps to figure out whether the one they depend on, $Step->PreviousNo, has completed, it also unwittingly adds the previous step to the $Job->Step collection. So the next iteration on the collection may return completed steps despite expectations.
11. Save order issues
The TestBot ORM automatically takes care of saving parent objects before the child objects they contain. In practice, if you create a Job > Step > Task hierarchy you can count on the TestBot ORM to save the Job before saving the Steps that use the JobId as part of their key and so on.
But this breaks down for $Step->PreviousNo. There you have to make sure to save the steps in the right order otherwise you get an SQL error.
Similarly deletions must be performed in the right order. Fortunately this is mostly transparent since we only delete whole Jobs and Job::OnDelete() takes care of blanking the Step::PreviousNo fields before recursing.
12. No Order-by support
There are a many cases where we retrieve a number of rows from the database and then do a simple alphabetical or numerical sort on them. That's the kind of thing that the database would do much faster than Perl. This is mostly an issue for the activity and statistics web pages because of the number of RecordGroups they handle.
So it would be nice to be able to specify an Order-By SQL directive when retrieving the objects. However this would run into the same issues as the filters for Detailref collections with regards to already loaded collections, and GetItem() not knowing where to insert freshly loaded objects.
13. No support for joins
There are a few cases where doing a join could be useful.
For instance when reporting the activity what we really want is all the Records rows corresponding to a RecordGroup that is less than 12 hours old. For now we have to proceed indirectly: we query all the RecordGroup objects that are less than 12 hours old, take the lowest GroupId we find and then retrieve all the Records that have a Group Id greater than that.
A different type of join could also be useful in many other places: currently we first retrieve the jobs and then we do one more SQL request per job to retrieve its steps and then one per step to retrieve the tasks. A join could let us load it all in one request. Fortunately we don't have many jobs so unlike in the activity case this does not have a significant performance impact.
14. Performance
Most parts of the TestBot don't have much processing to do so that performance is not an issue (though there is not really any hard data).
Things are different for the activity and statistics page. The statistics page is the worst and generating it takes over a dozen seconds. That's annoying because of the load it puts on the TestBot web server. It's also really long for a mere couple dozen thousand Records. Not all of it comes from the ORM but over 50% of it does.
There are some relatively obvious paths for improvement. For instance accessing a field like $Job->Ended causes the ORM to loop over all of a job's property descriptors until it finds one called 'Ended', and then it returns the value from a hash table. It should be possible to make it so that most of the time the value is returned directly from the hash table. Detailref and Itemref properties have their own hash tables and so must be handled separately. But hat could likely also be changed.
https://bugs.winehq.org/show_bug.cgi?id=45023
--- Comment #1 from François Gouget fgouget@codeweavers.com --- I have patches for some of these issues. However there are enough of them that I don't think it makes sense to really try to fix the TestBot's ORM. Rather the most promising approach is to replace it with an existing perl ORM such as Rose::DB or DBIx::Class.
https://bugs.winehq.org/show_bug.cgi?id=45023
--- Comment #2 from François Gouget fgouget@codeweavers.com --- Yet another issue.
15. New items must be saved through the collection
It's possible to call an item's Save() method to save it to the database after modifying it. So the following code works:
my $Job = CreateJobs()->GetItem(1234); $Job->Status("queued"); ... $Job->Save();
But it's not possible to do the same thing on a new item. So the following does not work:
my $Job = CreateJobs()->Add(); $Job->Status("queued"); ... $Job->Save();
Instead one must do:
my $Jobs = CreateJobs(); my $Job = $Jobs->Add(); $Job->Status("queued"); ... $Jobs->Save(); # Save **every new and modified item** in the collection
https://bugs.winehq.org/show_bug.cgi?id=45023
tokktokk fdsfgs@krutt.org changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |fdsfgs@krutt.org
https://bugs.winehq.org/show_bug.cgi?id=45023
--- Comment #3 from François Gouget fgouget@codeweavers.com --- And for issue number 16!
16. Item::Save() and Collection::Save() have inconsistent return values
The returned values depend on whether Save() is called on an Item or a Collection:
($ErrProperty, $ErrMessage) = $Job->Save(); ($ErrKey, $ErrProperty, $ErrMessage) = $Jobs->Save();
Notice also that the corret way to detect errors is through an if (defined $ErrMessage). However if you use the 3 values form on an Item, $ErrMessage will always be undefined, causing errors to not be detected.
Finally the same issue exists for the Validate() methods but those are more rarely used.
https://bugs.winehq.org/show_bug.cgi?id=45023
--- Comment #4 from François Gouget fgouget@codeweavers.com ---
17. Optional enum properties are not really supported
CreateEnumPropertyDescriptor() takes an $IsRequired parameter. But although it is possible to set it to False, various DBI/Collection/Item places generate errors when such an enum field is set to undef.
https://bugs.winehq.org/show_bug.cgi?id=45023
--- Comment #5 from François Gouget fgouget@codeweavers.com --- 18. $Item->Xxx("foo") does not go through PutColValue()!
Instead AUTOLOAD sets the specified property directly, maybe for performance reasons, and PutColValue() is only used by DBIBackend.pm.
The consequence is that there is no way to take special action when a specific property is modified. It is however still possible to detect whether a given property was modified by overriding ResetModified(), saving its initial value there and then comparing the current value to the saved one.
https://bugs.winehq.org/show_bug.cgi?id=45023
--- Comment #6 from François Gouget fgouget@codeweavers.com --- 19. Only one Detailref can point to a table (otherwise expect inconsistencies)
The TestBot hasa Users table (e.g. fgouget), a Roles table (e.g. wine-devel), and a UserRoles table which describes the roles that each user has (e.g. [fgouget, wine-devel]).
One can access a user's roles, for instance: $User->Roles->GetItem("wine-devel"). But there is no support for accessing a role's users: $Role->Users->GetItem("fgouget") fails because there is no Users field.
Adding a Users field is possible but it does not work right. For instance:
my $UR = CreateUsers()->GetItem("fgouget")->Roles->GetItem("wine-devel"); print "Full key=", $UR->GetFullKey(), "\n"; -> fgouget#@#wine-devel # This is correct
my $UR = CreateRoles()->GetItem("wine-devel")->Users->GetItem("fgouget"); print "RU=<undef>\n" if (!defined $RU); -> RU=<undef> # Wrong!!! [1]
my $Users = CreateRoles()->GetItem("wine-devel")->Users; map { print $_->GetFullKey(), "\n" } @{$Users->GetItems()}; -> wine-devel#@#wine-devel # Wrong!!! [2]
The trouble is that whenever a UserRole is retrieved from a Collection its declaration is:
my @PropertyDescriptors = ( CreateItemrefPropertyDescriptor("Role", "Role", 1, 1, &CreateRoles, ["RoleName"]), );
That's because it is assumed that in that case the UserName field in comes from the parent User Item and thus will be present in as master column.
* This is why trying to retrieve the UserRole item by its UserName field in [1] does not work. There is no UserName field, neither as a regular field, nor as a master column.
* This will also break the database saves since the SQL query will not include the required UserName field. (Note: reads work, the SQL query just returns all the entries with the specified RoleName field, and we create a bunch of UserRole objects where only the RoleName field is set).
* This is also why the full key is wrong in [2]: RoleName ends up as both a master column and a regular field so the full key is 'wine-devel' (master column) + '#@#' + 'wine-devel' (regular field).
* Note that even if the UserName field was somehow added back, the full key would end up being 'wine-devel#@#fgouget' instead of 'fgouget#@#wine-devel'. This issue will remain as long as the first component of the full key comes from the parent Item.
* This also breaks the scope lookups since they depend on the full key. The consequence is code that expects the same object to be returned no matter which path is taken will get two different objects with all the implied state inconsistencies.