Enforce proper atomic update so that other threads do not read stale data from IO_STATUS_BLOCK.
Signed-off-by: Jinoh Kang jinoh.kang.kr@gmail.com
-- v17: ntdll: Fix reading stale Information from IOSB.
From: Jinoh Kang jinoh.kang.kr@gmail.com
Make it clear that we're accessing an IO_STATUS_BLOCK.
Don't rely on the little endian representation of the Internal field on 64-bit build, when we're in fact accessing the Status field. --- dlls/kernelbase/file.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/dlls/kernelbase/file.c b/dlls/kernelbase/file.c index 661bc0c2778..3f3c3649c31 100644 --- a/dlls/kernelbase/file.c +++ b/dlls/kernelbase/file.c @@ -3151,12 +3151,16 @@ BOOL WINAPI DECLSPEC_HOTPATCH GetOverlappedResult( HANDLE file, LPOVERLAPPED ove BOOL WINAPI DECLSPEC_HOTPATCH GetOverlappedResultEx( HANDLE file, OVERLAPPED *overlapped, DWORD *result, DWORD timeout, BOOL alertable ) { + union { + IO_STATUS_BLOCK iosb; + OVERLAPPED ovl; + } const *io = (void *)overlapped; NTSTATUS status; DWORD ret;
TRACE( "(%p %p %p %lu %d)\n", file, overlapped, result, timeout, alertable );
- status = overlapped->Internal; + status = io->iosb.u.Status; if (status == STATUS_PENDING) { if (!timeout) @@ -3164,7 +3168,7 @@ BOOL WINAPI DECLSPEC_HOTPATCH GetOverlappedResultEx( HANDLE file, OVERLAPPED *ov SetLastError( ERROR_IO_INCOMPLETE ); return FALSE; } - ret = WaitForSingleObjectEx( overlapped->hEvent ? overlapped->hEvent : file, timeout, alertable ); + ret = WaitForSingleObjectEx( io->ovl.hEvent ? io->ovl.hEvent : file, timeout, alertable ); if (ret == WAIT_FAILED) return FALSE; else if (ret) @@ -3173,11 +3177,11 @@ BOOL WINAPI DECLSPEC_HOTPATCH GetOverlappedResultEx( HANDLE file, OVERLAPPED *ov return FALSE; }
- status = overlapped->Internal; + status = io->iosb.u.Status; if (status == STATUS_PENDING) status = STATUS_SUCCESS; }
- *result = overlapped->InternalHigh; + *result = io->iosb.Information; return set_ntstatus( status ); }
From: Jinoh Kang jinoh.kang.kr@gmail.com
Make it clear that we're accessing an IO_STATUS_BLOCK.
Don't rely on the little endian representation of the Internal field on 64-bit build, when we're in fact accessing the Status field. --- dlls/ws2_32/socket.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-)
diff --git a/dlls/ws2_32/socket.c b/dlls/ws2_32/socket.c index 2f115f02e48..be491a32b97 100644 --- a/dlls/ws2_32/socket.c +++ b/dlls/ws2_32/socket.c @@ -3684,6 +3684,10 @@ BOOL WINAPI WSAGetOverlappedResult( SOCKET s, LPWSAOVERLAPPED lpOverlapped, LPDWORD lpcbTransfer, BOOL fWait, LPDWORD lpdwFlags ) { + union { + IO_STATUS_BLOCK iosb; + OVERLAPPED ovl; + } const *io = (void *)lpOverlapped; NTSTATUS status;
TRACE( "socket %#Ix, overlapped %p, transfer %p, wait %d, flags %p\n", @@ -3702,7 +3706,7 @@ BOOL WINAPI WSAGetOverlappedResult( SOCKET s, LPWSAOVERLAPPED lpOverlapped, return FALSE; }
- status = lpOverlapped->Internal; + status = io->iosb.u.Status; if (status == STATUS_PENDING) { if (!fWait) @@ -3711,17 +3715,17 @@ BOOL WINAPI WSAGetOverlappedResult( SOCKET s, LPWSAOVERLAPPED lpOverlapped, return FALSE; }
- if (WaitForSingleObject( lpOverlapped->hEvent ? lpOverlapped->hEvent : SOCKET2HANDLE(s), + if (WaitForSingleObject( io->ovl.hEvent ? io->ovl.hEvent : SOCKET2HANDLE(s), INFINITE ) == WAIT_FAILED) return FALSE; - status = lpOverlapped->Internal; + status = io->iosb.u.Status; }
if ( lpcbTransfer ) - *lpcbTransfer = lpOverlapped->InternalHigh; + *lpcbTransfer = io->iosb.Information;
if ( lpdwFlags ) - *lpdwFlags = lpOverlapped->u.s.Offset; + *lpdwFlags = io->ovl.u.s.Offset;
SetLastError( NtStatusToWSAError(status) ); TRACE( "status %#lx.\n", status );
From: Jinoh Kang jinoh.kang.kr@gmail.com
Ensure proper atomic update, so that the application do not read stale information from IO_STATUS_BLOCK.
Also, ensure IOSB Status loads are not reordered with Information loads from the consumer side as well, specifically in the following functions:
- GetOverlappedResultEx - WSAGetOverlappedResult --- dlls/kernelbase/file.c | 62 ++++++++++++++++++++++++++++- dlls/ntdll/unix/unix_private.h | 71 ++++++++++++++++++++++++++++++++-- dlls/ws2_32/socket.c | 62 ++++++++++++++++++++++++++++- 3 files changed, 190 insertions(+), 5 deletions(-)
diff --git a/dlls/kernelbase/file.c b/dlls/kernelbase/file.c index 3f3c3649c31..6034e744d81 100644 --- a/dlls/kernelbase/file.c +++ b/dlls/kernelbase/file.c @@ -3160,7 +3160,67 @@ BOOL WINAPI DECLSPEC_HOTPATCH GetOverlappedResultEx( HANDLE file, OVERLAPPED *ov
TRACE( "(%p %p %p %lu %d)\n", file, overlapped, result, timeout, alertable );
- status = io->iosb.u.Status; + /* + * When a system procedure that peforms an overlapped I/O operation returns + * before the I/O has completed, the system sets the Status field of the + * associated (passed as an argument) IO_STATUS_BLOCK to STATUS_PENDING to + * indicate that it is still pending. After the I/O operation is + * completed, the system notifies the application of the I/O completion + * through the following steps: + * + * 1. The output buffer, if any, is populated with the data retrieved from + * the I/O operation. + * + * 2. The Information field of the associated IO_STATUS_BLOCK is written to + * indicate e.g. the number of bytes transferred. + * + * 3. The Status field of the IO_STATUS_BLOCK is written to indicate the + * status of the I/O operation. + * + * 4. Finally, the system signals the application via synchronization APIs + * (waitable objects), I/O completion ports, and/or APCs. + * + * Therefore, if the Status field is set to any value other than + * STATUS_PENDING, we can safely assume that the I/O operation has + * completed. + * + * Here is a catch: memory accesses (read and write) can be reordered by the + * compiler or a processor using weak memory ordering, which affects the + * memory values observed by other threads running on different processors. + * Specifically, it's possible that the reads from the Status field gets + * reordered *after* the reads from the Information field. If we race with + * I/O completion, this results in the application reading a stale + * Information value. In fact, any memory operation (read or write) in the + * current thread could be reordered before the Status read in theory. + * + * CPU 0 (set_async_iosb) CPU 1 (GetOverlappedResultEx) + * =========================== =========================== + * READ(Information) <-- (reordered!) + * WRITE(Information) + * WRITE(Status) + * READ(Status) + * + * In order to prevent such data races, we place appropriate barriers to + * limit reordering so that overlapped I/O operations work correctly in + * multi-threaded applications. A release barrier placed just before the + * Status write is paired with an acquire barrier placed just after the + * Status read. This ensures that all writes from the completed I/O + * operation are seen after GetOverlappedResultEx observes the completion + * (i.e. Status != STATUS_PENDING). + * + * CPU 0 (set_async_iosb) CPU 1 (GetOverlappedResultEx) + * =========================== =========================== + * WRITE(Information) + * <release barrier> <-----------. + * WRITE(Status) | + * | + * | READ(Status) + * (paired with) `-> <acquire barrier> + * READ(Information) + * + * Paired with write-release of IOSB Status in ntdll's set_async_iosb. + */ + status = ReadAcquire( &io->iosb.u.Status ); if (status == STATUS_PENDING) { if (!timeout) diff --git a/dlls/ntdll/unix/unix_private.h b/dlls/ntdll/unix/unix_private.h index 73b9ed76de0..46854e2135c 100644 --- a/dlls/ntdll/unix/unix_private.h +++ b/dlls/ntdll/unix/unix_private.h @@ -361,6 +361,8 @@ static inline BOOL in_wow64_call(void)
static inline void set_async_iosb( client_ptr_t iosb, NTSTATUS status, ULONG_PTR info ) { + NTSTATUS *status_ptr; + if (!iosb) return;
if (in_wow64_call()) @@ -370,19 +372,82 @@ static inline void set_async_iosb( client_ptr_t iosb, NTSTATUS status, ULONG_PTR NTSTATUS Status; ULONG Information; } *io = wine_server_get_ptr( iosb ); - io->Status = status; + status_ptr = &io->Status; io->Information = info; } else { IO_STATUS_BLOCK *io = wine_server_get_ptr( iosb ); #ifdef NONAMELESSUNION - io->u.Status = status; + status_ptr = &io->u.Status; #else - io->Status = status; + status_ptr = &io->Status; #endif io->Information = info; } + + /* + * When a system procedure that peforms an overlapped I/O operation returns + * before the I/O has completed, the system sets the Status field of the + * associated (passed as an argument) IO_STATUS_BLOCK to STATUS_PENDING to + * indicate that it is still pending. After the I/O operation is + * completed, the system notifies the application of the I/O completion + * through the following steps: + * + * 1. The output buffer, if any, is populated with the data retrieved from + * the I/O operation. + * + * 2. The Information field of the associated IO_STATUS_BLOCK is written to + * indicate e.g. the number of bytes transferred. + * + * 3. The Status field of the IO_STATUS_BLOCK is written to indicate the + * status of the I/O operation. + * + * 4. Finally, the system signals the application via synchronization APIs + * (waitable objects), I/O completion ports, and/or APCs. + * + * Therefore, if the Status field is set to any value other than + * STATUS_PENDING, GetOverlappedResultEx can safely assume that the I/O + * operation has completed. + * + * Here is a catch: memory accesses (read and write) can be reordered by the + * compiler or a processor using weak memory ordering, which affects the + * memory values observed by other threads running on different processors. + * Specifically, it's possible that the writes to the Status field gets + * reordered *before* the writes to the Information field. If we race with + * calls to GetOverlappedResultEx or WSAGetOverlappedResultx, this results + * in the application reading a stale Information value. In fact, any + * memory operation (read or write) in the current thread could be reordered + * after the Status write in theory. + * + * CPU 0 (set_async_iosb) CPU 1 (GetOverlappedResult) + * =========================== =========================== + * WRITE(Status) + * READ(Status) + * READ(Information) <-- (stale read!) + * WRITE(Information) <-- (reordered from above) + * + * In order to prevent such data races, we place appropriate barriers to + * limit reordering so that overlapped I/O operations work correctly in + * multi-threaded applications. A release barrier placed just before the + * Status write is paired with an acquire barrier placed just after the + * Status read. This ensures that all writes from the completed I/O + * operation are seen after GetOverlappedResultEx observes the completion + * (i.e. Status != STATUS_PENDING). + * + * CPU 0 (set_async_iosb) CPU 1 (GetOverlappedResult) + * =========================== =========================== + * WRITE(Information) + * <release barrier> <-----------. + * WRITE(Status) | + * | + * | READ(Status) + * (paired with) `-> <acquire barrier> + * READ(Information) + * + * Paired with GetOverlappedResultEx and WSAGetOverlappedResult. + */ + WriteRelease( status_ptr, status ); }
static inline client_ptr_t iosb_client_ptr( IO_STATUS_BLOCK *io ) diff --git a/dlls/ws2_32/socket.c b/dlls/ws2_32/socket.c index be491a32b97..ef65f887c6a 100644 --- a/dlls/ws2_32/socket.c +++ b/dlls/ws2_32/socket.c @@ -3706,7 +3706,67 @@ BOOL WINAPI WSAGetOverlappedResult( SOCKET s, LPWSAOVERLAPPED lpOverlapped, return FALSE; }
- status = io->iosb.u.Status; + /* + * When a system procedure that peforms an overlapped I/O operation returns + * before the I/O has completed, the system sets the Status field of the + * associated (passed as an argument) IO_STATUS_BLOCK to STATUS_PENDING to + * indicate that it is still pending. After the I/O operation is + * completed, the system notifies the application of the I/O completion + * through the following steps: + * + * 1. The output buffer, if any, is populated with the data retrieved from + * the I/O operation. + * + * 2. The Information field of the associated IO_STATUS_BLOCK is written to + * indicate e.g. the number of bytes transferred. + * + * 3. The Status field of the IO_STATUS_BLOCK is written to indicate the + * status of the I/O operation. + * + * 4. Finally, the system signals the application via synchronization APIs + * (waitable objects), I/O completion ports, and/or APCs. + * + * Therefore, if the Status field is set to any value other than + * STATUS_PENDING, we can safely assume that the I/O operation has + * completed. + * + * Here is a catch: memory accesses (read and write) can be reordered by the + * compiler or a processor using weak memory ordering, which affects the + * memory values observed by other threads running on different processors. + * Specifically, it's possible that the reads from the Status field gets + * reordered *after* the reads from the Information field. If we race with + * I/O completion, this results in the application reading a stale + * Information value. In fact, any memory operation (read or write) in the + * current thread could be reordered before the Status read in theory. + * + * CPU 0 (set_async_iosb) CPU 1 (WSAGetOverlappedResult) + * =========================== =========================== + * READ(Information) <-- (reordered!) + * WRITE(Information) + * WRITE(Status) + * READ(Status) + * + * In order to prevent such data races, we place appropriate barriers to + * limit reordering so that overlapped I/O operations work correctly in + * multi-threaded applications. A release barrier placed just before the + * Status write is paired with an acquire barrier placed just after the + * Status read. This ensures that all writes from the completed I/O + * operation are seen after WSAGetOverlappedResult observes the completion + * (i.e. Status != STATUS_PENDING). + * + * CPU 0 (set_async_iosb) CPU 1 (WSAGetOverlappedResult) + * =========================== =========================== + * WRITE(Information) + * <release barrier> <-----------. + * WRITE(Status) | + * | + * | READ(Status) + * (paired with) `-> <acquire barrier> + * READ(Information) + * + * Paired with write-release of IOSB Status in ntdll's set_async_iosb. + */ + status = ReadAcquire( &io->iosb.u.Status ); if (status == STATUS_PENDING) { if (!fWait)
Sorry for the long delay in reviewing this. I've submitted a merge request !1342, which is closer to what I was envisioning in terms of documentation.
This merge request was closed by Jinoh Kang.