On 8/2/21 3:04 PM, Zebediah Figura (she/her) wrote:
On 8/2/21 11:59 AM, Torge Matthies wrote:
Fixes a crash when shutting down a prefix.
Wine-Bug: https://bugs.winehq.org/show_bug.cgi?id=51479 Signed-off-by: Torge Matthies openglfreak@googlemail.com
dlls/wineusb.sys/wineusb.c | 3 +++ 1 file changed, 3 insertions(+)
diff --git a/dlls/wineusb.sys/wineusb.c b/dlls/wineusb.sys/wineusb.c index fae297915fc..28a36b9f380 100644 --- a/dlls/wineusb.sys/wineusb.c +++ b/dlls/wineusb.sys/wineusb.c @@ -413,12 +413,15 @@ static NTSTATUS pdo_pnp(DEVICE_OBJECT *device_obj, IRP *irp) break; case IRP_MN_REMOVE_DEVICE: + EnterCriticalSection(&wineusb_cs); remove_pending_irps(device); libusb_unref_device(device->libusb_device); libusb_close(device->handle); + list_remove(&device->entry); IoDeleteDevice(device->device_obj); + LeaveCriticalSection(&wineusb_cs); ret = STATUS_SUCCESS; break;
This looks wrong; the device could have been removed by remove_usb_device().
[1] implies that we actually shouldn't remove the device at all unless we reported it missing via IRP_MN_QUERY_DEVICE_RELATIONS, which probably implies that we shouldn't remove and delete the device here, but should instead remove (and delete?) it from the parent IRP_MN_SURPRISE_REMOVAL/IRP_MN_REMOVE_DEVICE. It's not particularly clear though. I'll have to run some tests and check native drivers, but I won't have access to those until next week.
[1] https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/removing-a-...
(TL;DR: Lots of research and observation follows. I'm currently writing patches that solve the initial bug.)
I've done more research, and the situation is horrible. I think this is not only the worst API I've ever seen, but it would be the worst even if I wasn't considering the fact that it's a kernel API.
The first, horrible thing, is that despite what I said in [2], IRP_MN_REMOVE_DEVICE isn't actually guaranteed to be the last IRP received. It's only guaranteed to be the last IRP under certain conditions, which is why [1] instructs to retain the PDO if it hasn't actually been physically removed (*and* that this fact has been reported to the system).
In particular, apparently if you expose a child PDO with RawDeviceOK set to 1 but which has a function driver, Windows will send IRP_MN_START_DEVICE, then IRP_MN_QUERY_REMOVE_DEVICE, then IRP_MN_REMOVE_DEVICE, then attach a function driver and restart the same device object. Which means that you can't actually delete the device.
What then becomes unclear is what you're supposed to do if the parent device was deleted (which is, essentially, what we do at shutdown as well. I don't know if Windows does this at shutdown; I remember trying to test and being unable to conclude that Windows does or doesn't try to tear down the device tree—if it does, it does so late that it's impossible to log this fact anywhere).
According to [3] a function driver is supposed to defer deleting the child PDO even when the child PDO has received IRP_MN_REMOVE_DEVICE, and instead delete it in the parent FDO's IRP_MN_REMOVE_DEVICE. You'd think that the child could take its cue from IRP_MN_SURPRISE_REMOVAL that it can safely delete itself, but no.
The "toaster" driver sample from the Windows SDK, which has been my source of a lot of this, corroborates this—it deletes all child PDOs in the parent FDO's IRP_MN_REMOVE_DEVICE. However, it also says this:
// Typically the system removes all the children before // removing the parent FDO. If for any reason child Pdos are // still present we will destroy them explicitly, with one exception - // we will not delete the PDOs that are in SurpriseRemovePending state.
And it also does something which I can't find documented anywhere: in the parent FDO's IRP_MN_SURPRISE_REMOVAL it marks all child devices as "reported missing" so that they *are* deleted in the (child PDO) IRP_MN_REMOVE_DEVICE. It also removes them from the device list so that they aren't reported in a subsequent call to IRP_MN_QUERY_DEVICE_RELATIONS. It doesn't actually call IoInvalidateDeviceRelations() so this happens the "usual" way—no, it just pretends that the PnP manager already knows.
And then I tried writing some tests with native hidclass to see what that does, and as far as I can tell it doesn't do that thing: it keeps reporting the child in response to IRP_MN_QUERY_DEVICE_RELATIONS, even if I call that IRP from the (child PDO) IRP_MN_REMOVE_DEVICE. And it doesn't seem to delete the child PDO either, even as late as the minidriver FDO's IRP_MN_REMOVE_DEVICE. (Maybe it calls the minidriver IRP_MJ_PNP handler and then deletes children? It's basically impossible to confirm that it does delete the children and doesn't leak them...)
A different driver sample, "gameenum" from the Windows XP DDK, seems to match the documentation and hidclass behaviour. It doesn't delete anything in FDO IRP_MN_SURPRISE_REMOVAL, and doesn't delete anything in the PDO removal IRPs, but deletes all of the child PDOs in the FDO IRP_MN_REMOVE_DEVICE. It also keeps reporting devices in IRP_MN_QUERY_DEVICE_RELATIONS until the FDO IRP_MN_REMOVE_DEVICE.
Ultimately I think it might not even matter all that much. I think we could get away with removing the device in the PDO IRP_MN_REMOVE_DEVICE and things would still work. I also checked and Windows seems to ignore if IoInvalidateDeviceRelations() is called while removing a device; which is especially important for Wine as we can reasonably get hotplug events while trying to shut down the system. That does mean that we won't always get IRP_MN_REMOVE_DEVICE for a child device, though, which means that we will always need to clean up leftover child devices in the FDO IRP_MN_REMOVE_DEVICE. Given that it may not be a bad idea to just always clean up child devices there.
That's roughly how we handled things before 84c9206ca. That commit reused "removed" for a second purpose—to finish and prevent any pending IRPs so that a child or grandchild device wouldn't get stuck waiting for them in IRP_MN_REMOVE_DEVICE (which of course executes after parent SURPRISE_REMOVAL but before parent REMOVE_DEVICE). I think it would be reasonable to also unlink the device in the child (PDO) IRP_MN_SURPRISE_REMOVAL, and just continue to reuse "removed" for multiple purposes.
We could also follow closer to the documentation and the hidgame sample, though, and use separate per-device flags to track whether the device has been unplugged and whether it is delete-pending (as it were). It's not really clear to me that we need to, though maybe it's better for clarity anyway...
ἔρρωσθε, Zebediah
[1] https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/removing-a-...
[2] https://www.winehq.org/pipermail/wine-devel/2021-June/189773.html
[3] https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/removing-a-...