On Aug 27, 2015, at 2:08 PM, Matteo Bruni mbruni@codeweavers.com wrote:
The man page for getdirentries mentions that only positions previously returned from lseek or getdirentries should be used, I'm arguably abusing the pointer to store the index and then restoring the position using SEEK_CUR. It seems to work fine for me, not sure if that might cause issues with some specific filesystem setups.
You're not restoring it, you're setting it to a computed value. The computation assumes that the file position is the index of the entry that was read. For example, you get the current position and then you're setting the position to that original position minus 2 (for the synthesized . and .. entries). That doesn't seem like a reasonable assumption to me.
In some quick testing (with different code), I get results like these:
rc 8172 base 0/0x00000000 filepos 469762300/0x1c0000fc entries 252/0x000000fc total 252/0x000000fc rc 8188 base 469762300/0x1c0000fc filepos 469762640/0x1c000250 entries 340/0x00000154 total 592/0x00000250 rc 2364 base 469762640/0x1c000250 filepos 469762732/0x1c0002ac entries 92/0x0000005c total 684/0x000002ac
The file positions seem to be 0x1c000000 with the entry index added to it. However, each run of my test program gives a different fixed baseline. The file positions are definitely not _just_ entry indices.
It does seem to work to seek to an entry index in some quick testing, but it makes me very nervous.
dlls/ntdll/directory.c | 143 ++++++++++++++++++------------------------------- 1 file changed, 53 insertions(+), 90 deletions(-)
diff --git a/dlls/ntdll/directory.c b/dlls/ntdll/directory.c index f9562d2..eb6cd45 100644 --- a/dlls/ntdll/directory.c +++ b/dlls/ntdll/directory.c @@ -1820,27 +1820,27 @@ static int read_directory_getdirentries( int fd, IO_STATUS_BLOCK *io, void *buff BOOLEAN restart_scan, FILE_INFORMATION_CLASS class ) { long restart_pos;
- ULONG_PTR restart_info_pos = 0;
- size_t size, initial_size = length;
- int res, fake_dot_dot = 1;
- char *data, local_buffer[8192];
- off_t newcursor;
This newcursor variable is only ever set and then never used.
- int total, res, i;
- char data[8192]; struct dirent *de;
- union file_directory_info *info, *last_info = NULL, *restart_last_info = NULL;
- union file_directory_info *info, *last_info = NULL;
- size = initial_size;
- data = local_buffer;
- if (size > sizeof(local_buffer) && !(data = RtlAllocateHeap( GetProcessHeap(), 0, size )))
- io->u.Status = STATUS_SUCCESS;
- if (restart_scan) {
io->u.Status = STATUS_NO_MEMORY;
return io->u.Status;
lseek( fd, 0, SEEK_SET );
i = 0;
- }
- else
- {
i = lseek( fd, 0, SEEK_CUR );
lseek( fd, 0, SEEK_SET );
newcursor = lseek( fd, i < 2 ? 0 : i - 2, SEEK_CUR );
Here's where you're seeking to an arbitrary position. Also, the last two seeks can be combined (if they were right). That is, seeking to the beginning (0, SEEK_SET) and then seeking relative to the current position is just seeking to the relative position as though it were absolute.
res = 0;
done:
- if (data != local_buffer) RtlFreeHeap( GetProcessHeap(), 0, data ); return res;
This wasn't changed by your patch, but it appears that this function can only return 0 at this point, so it should just do so. res is set to 0 just before the "done" label. All "goto done" statements are immediately preceded by setting res to 0. So, the conflation of "res" with a return value should be eliminated. Setting it to 0 in those various places is unnecessary.
Once that's cleared up, I would rename res and total to something like bytes_left and bytes_read, respectively. To me, on my first scan through your patch, I was expecting total to accumulate the total amount read over several iterations. (Not that that would have been useful, but it confused me.)
-Ken
2015-08-27 23:27 GMT+02:00 Ken Thomases ken@codeweavers.com:
On Aug 27, 2015, at 2:08 PM, Matteo Bruni mbruni@codeweavers.com wrote:
The man page for getdirentries mentions that only positions previously returned from lseek or getdirentries should be used, I'm arguably abusing the pointer to store the index and then restoring the position using SEEK_CUR. It seems to work fine for me, not sure if that might cause issues with some specific filesystem setups.
You're not restoring it, you're setting it to a computed value. The computation assumes that the file position is the index of the entry that was read. For example, you get the current position and then you're setting the position to that original position minus 2 (for the synthesized . and .. entries). That doesn't seem like a reasonable assumption to me.
In some quick testing (with different code), I get results like these:
rc 8172 base 0/0x00000000 filepos 469762300/0x1c0000fc entries 252/0x000000fc total 252/0x000000fc rc 8188 base 469762300/0x1c0000fc filepos 469762640/0x1c000250 entries 340/0x00000154 total 592/0x00000250 rc 2364 base 469762640/0x1c000250 filepos 469762732/0x1c0002ac entries 92/0x0000005c total 684/0x000002ac
The file positions seem to be 0x1c000000 with the entry index added to it. However, each run of my test program gives a different fixed baseline. The file positions are definitely not _just_ entry indices.
It does seem to work to seek to an entry index in some quick testing, but it makes me very nervous.
Yes, I didn't explain it clearly but that's what I meant with the commit message. More below.
dlls/ntdll/directory.c | 143 ++++++++++++++++++------------------------------- 1 file changed, 53 insertions(+), 90 deletions(-)
diff --git a/dlls/ntdll/directory.c b/dlls/ntdll/directory.c index f9562d2..eb6cd45 100644 --- a/dlls/ntdll/directory.c +++ b/dlls/ntdll/directory.c @@ -1820,27 +1820,27 @@ static int read_directory_getdirentries( int fd, IO_STATUS_BLOCK *io, void *buff BOOLEAN restart_scan, FILE_INFORMATION_CLASS class ) { long restart_pos;
- ULONG_PTR restart_info_pos = 0;
- size_t size, initial_size = length;
- int res, fake_dot_dot = 1;
- char *data, local_buffer[8192];
- off_t newcursor;
This newcursor variable is only ever set and then never used.
Right, will fix.
- int total, res, i;
- char data[8192]; struct dirent *de;
- union file_directory_info *info, *last_info = NULL, *restart_last_info = NULL;
- union file_directory_info *info, *last_info = NULL;
- size = initial_size;
- data = local_buffer;
- if (size > sizeof(local_buffer) && !(data = RtlAllocateHeap( GetProcessHeap(), 0, size )))
- io->u.Status = STATUS_SUCCESS;
- if (restart_scan) {
io->u.Status = STATUS_NO_MEMORY;
return io->u.Status;
lseek( fd, 0, SEEK_SET );
i = 0;
- }
- else
- {
i = lseek( fd, 0, SEEK_CUR );
lseek( fd, 0, SEEK_SET );
newcursor = lseek( fd, i < 2 ? 0 : i - 2, SEEK_CUR );
Here's where you're seeking to an arbitrary position. Also, the last two seeks can be combined (if they were right). That is, seeking to the beginning (0, SEEK_SET) and then seeking relative to the current position is just seeking to the relative position as though it were absolute.
That's true only assuming that a lseek(SEEK_SET) and a lseek(SEEK_CUR) turn out to be in practice the same operation. I instead thought that they would be intrinsically different and that, while a SEEK_SET to an arbitrary position is not fine, SEEK_SET to 0 + SEEK_CUR is okay. I guess I'm wrong.
A "safer" option would be to store the position in the fd unchanged but reserve two values (-1 and -2?) for the '.' and '..'. I'm not sure that's entirely safe but I don't see anything better.
res = 0;
done:
- if (data != local_buffer) RtlFreeHeap( GetProcessHeap(), 0, data ); return res;
This wasn't changed by your patch, but it appears that this function can only return 0 at this point, so it should just do so. res is set to 0 just before the "done" label. All "goto done" statements are immediately preceded by setting res to 0. So, the conflation of "res" with a return value should be eliminated. Setting it to 0 in those various places is unnecessary.
That's right, I missed it.
Once that's cleared up, I would rename res and total to something like bytes_left and bytes_read, respectively. To me, on my first scan through your patch, I was expecting total to accumulate the total amount read over several iterations. (Not that that would have been useful, but it confused me.)
Good point, I'll probably take your suggested names.
Thanks for the review!
On Aug 27, 2015, at 5:05 PM, Matteo Bruni matteo.mystral@gmail.com wrote:
2015-08-27 23:27 GMT+02:00 Ken Thomases ken@codeweavers.com:
On Aug 27, 2015, at 2:08 PM, Matteo Bruni mbruni@codeweavers.com wrote:
The man page for getdirentries mentions that only positions previously returned from lseek or getdirentries should be used, I'm arguably abusing the pointer to store the index and then restoring the position using SEEK_CUR. It seems to work fine for me, not sure if that might cause issues with some specific filesystem setups.
You're not restoring it, you're setting it to a computed value. The computation assumes that the file position is the index of the entry that was read. For example, you get the current position and then you're setting the position to that original position minus 2 (for the synthesized . and .. entries). That doesn't seem like a reasonable assumption to me.
In some quick testing (with different code), I get results like these:
rc 8172 base 0/0x00000000 filepos 469762300/0x1c0000fc entries 252/0x000000fc total 252/0x000000fc rc 8188 base 469762300/0x1c0000fc filepos 469762640/0x1c000250 entries 340/0x00000154 total 592/0x00000250 rc 2364 base 469762640/0x1c000250 filepos 469762732/0x1c0002ac entries 92/0x0000005c total 684/0x000002ac
The file positions seem to be 0x1c000000 with the entry index added to it. However, each run of my test program gives a different fixed baseline. The file positions are definitely not _just_ entry indices.
It does seem to work to seek to an entry index in some quick testing, but it makes me very nervous.
Yes, I didn't explain it clearly but that's what I meant with the commit message. More below.
I did some poking through the OS X kernel code. Different file systems seem to interpret the file position differently when reading directory entries. So, I'm more convinced than before that assuming you can compute a file position from the desired directory entry index won't work in the general case. The observed scheme seems to be specific to the HFS(+) driver.
A "safer" option would be to store the position in the fd unchanged but reserve two values (-1 and -2?) for the '.' and '..'. I'm not sure that's entirely safe but I don't see anything better.
Negative values are rejected by lseek() with EINVAL. (Didn't actually try it, but the man page says so and the kernel code seemed to agree. ;)
We would need to treat offset 0 as indicating that the next entry to be read is ".". OFF_MAX - 1 would indicate that the next entry to be read is "..". OFF_MAX - 2 would indicate that we should seek to 0 and read real entries. Or something like that.
I think we should not use OFF_MAX itself as a magic value. In my testing, the NFS driver sets the position to that at EOF. Of course, that raises the possibility that there are other special values that we'll collide with.
-Ken
2015-08-28 2:40 GMT+02:00 Ken Thomases ken@codeweavers.com:
On Aug 27, 2015, at 5:05 PM, Matteo Bruni matteo.mystral@gmail.com wrote:
A "safer" option would be to store the position in the fd unchanged but reserve two values (-1 and -2?) for the '.' and '..'. I'm not sure that's entirely safe but I don't see anything better.
Negative values are rejected by lseek() with EINVAL. (Didn't actually try it, but the man page says so and the kernel code seemed to agree. ;)
Empirical tests agree too, lseek(fd, -1, SEEK_SET) returns -1 i.e. error and a subsequent lseek(SEEK_CUR) gives a different value ;)
We would need to treat offset 0 as indicating that the next entry to be read is ".". OFF_MAX - 1 would indicate that the next entry to be read is "..". OFF_MAX - 2 would indicate that we should seek to 0 and read real entries. Or something like that.
I think we should not use OFF_MAX itself as a magic value. In my testing, the NFS driver sets the position to that at EOF. Of course, that raises the possibility that there are other special values that we'll collide with.
Yeah, that's a concern. Actually there is another possibility though. We could swap the "." and ".." entries with the first two entries returned from getdirentries(), in the same vein as read_directory_getdents(). That should work but I'm not sure it's much of an improvement overall. Also it looks like HFS+ at least lists directory entries in-order and it would be somewhat unfortunate to lose that "feature".
I'm attaching a v2 of the patch with your points addressed. I think I'll also try to come up with an alternative version following the aforementioned approach and see how ugly it looks like compared to this...
On Aug 28, 2015, at 2:56 PM, Matteo Bruni matteo.mystral@gmail.com wrote:
We could swap the "." and ".." entries with the first two entries returned from getdirentries(), in the same vein as read_directory_getdents(). That should work but I'm not sure it's much of an improvement overall.
Yikes! I just had a look at read_directory_getdents(). It uses a static variable to store the information needed to know if it should enumerate ".." or a real entry. That's terrible. It's not thread safe. It's not even safe against enumerating two directories item by item in alternation.
I'm attaching a v2 of the patch with your points addressed.
- /* Store the updated position in the fd. */
- if (i == OFF_MAX - 2 || i == OFF_MAX - 1)
- {
i = lseek( fd, i, SEEK_SET );
- }
- else
- {
lseek( fd, restart_pos, SEEK_SET );
if (bytes_left != bytes_read)
}wine_getdirentries( fd, data, bytes_read - bytes_left, &restart_pos );
Hmm. This raises another problem. I knew about it, but had forgotten. You can't rely on getdirentries() reading a specified number of entries (or bytes) and leaving the file position such that the next read will resume with the next entry.
On some file systems, getdirentries() reads entries from the directory in whole blocks. It may read fewer entries than the supplied buffer could hold if it couldn't fit all of the next block's worth of entries into the buffer. On the affected file systems, getdirentries() will always leave the file position on a multiple of the block size and there's nothing you can do to change that. (This is CodeWeavers bug 12271, FYI.)
So, that last call to wine_getdirentries() won't do what you hope. And, in general, there's no good way to make read_directory_getdirentries() leave the file position in a place such that it can resume at the next entry it would have enumerated.
This is a nasty problem. Sorry to have wasted your time working on this before remembering this fundamental flaw.
It may be that the only reliable fix that can be implemented in ntdll is to use a directory index as the file position, after all. Except that we'd never use that as the place to read from. Instead, we'd always seek to the beginning and read entries, throwing away any that were before the index stored in the file position. Of course, this causes tons of redundant reading of directory entries.
Alexandre had a vague suggestion for a different approach in a previous thread https://www.winehq.org/pipermail/wine-devel/2014-February/102688.html that I didn't entirely follow. Part of it seems to have been to read the whole directory listing into memory when it is first requested (or a restart is requested) and then return entries from this cache on subsequent calls. His scheme may have required additional support from the server to track metadata for handles. I don't know how reads of the same handle in multiple processes would be coordinated.
Perhaps he can be persuaded to describe the solution he has in mind more fully. Or, if you can understand what he wrote in that previous thread better than I, you can try your hand at it.
-Ken
2015-08-31 21:09 GMT+02:00 Ken Thomases ken@codeweavers.com:
On Aug 28, 2015, at 2:56 PM, Matteo Bruni matteo.mystral@gmail.com wrote:
We could swap the "." and ".." entries with the first two entries returned from getdirentries(), in the same vein as read_directory_getdents(). That should work but I'm not sure it's much of an improvement overall.
Yikes! I just had a look at read_directory_getdents(). It uses a static variable to store the information needed to know if it should enumerate ".." or a real entry. That's terrible. It's not thread safe. It's not even safe against enumerating two directories item by item in alternation.
Actually it should be safe for both. About thread safety, check NtQueryDirectoryFile(), the various read_directory_* helpers are protected by the "dir_section" critical section. BTW notice that fd is opened by changing the current directory with fchdir(). As for the enumerating multiple directories by alternation, look at the last_dir_id checks and related code.
Yes, all this is not nice by any means.
I'm attaching a v2 of the patch with your points addressed.
- /* Store the updated position in the fd. */
- if (i == OFF_MAX - 2 || i == OFF_MAX - 1)
- {
i = lseek( fd, i, SEEK_SET );
- }
- else
- {
lseek( fd, restart_pos, SEEK_SET );
if (bytes_left != bytes_read)
}wine_getdirentries( fd, data, bytes_read - bytes_left, &restart_pos );
Hmm. This raises another problem. I knew about it, but had forgotten. You can't rely on getdirentries() reading a specified number of entries (or bytes) and leaving the file position such that the next read will resume with the next entry.
On some file systems, getdirentries() reads entries from the directory in whole blocks. It may read fewer entries than the supplied buffer could hold if it couldn't fit all of the next block's worth of entries into the buffer. On the affected file systems, getdirentries() will always leave the file position on a multiple of the block size and there's nothing you can do to change that. (This is CodeWeavers bug 12271, FYI.)
So, that last call to wine_getdirentries() won't do what you hope. And, in general, there's no good way to make read_directory_getdirentries() leave the file position in a place such that it can resume at the next entry it would have enumerated.
This is a nasty problem. Sorry to have wasted your time working on this before remembering this fundamental flaw.
Eh, right, that's real nasty. No problem, I should have realized that too.
It may be that the only reliable fix that can be implemented in ntdll is to use a directory index as the file position, after all. Except that we'd never use that as the place to read from. Instead, we'd always seek to the beginning and read entries, throwing away any that were before the index stored in the file position. Of course, this causes tons of redundant reading of directory entries.
I think I'll try this just in case, as a sort of "experiment", even though it doesn't seem to be the correct way forward overall.
Alexandre had a vague suggestion for a different approach in a previous thread https://www.winehq.org/pipermail/wine-devel/2014-February/102688.html that I didn't entirely follow. Part of it seems to have been to read the whole directory listing into memory when it is first requested (or a restart is requested) and then return entries from this cache on subsequent calls. His scheme may have required additional support from the server to track metadata for handles. I don't know how reads of the same handle in multiple processes would be coordinated.
Perhaps he can be persuaded to describe the solution he has in mind more fully. Or, if you can understand what he wrote in that previous thread better than I, you can try your hand at it.
Uh, I didn't recall this wine-devel thread at all but I'm sure I followed it back then. FWIW my understanding pretty much matches yours and I see why that would be the "right" fix. I guess I'll just try to implement Alexandre's suggestion and see what happens. Expect either a new set of patches or some kind of request for help from me in the somewhat near future :P
-Ken
Thanks again for the feedback!