Wine has many unreliable tests, which makes it hard to know whether a patch causes new failures. So one proposal was to mark unreliable tests as flaky, and have a mode where flaky failures can be ignored by setting $WINETEST_ALLOW_FLAKY.
So one would be able to apply a patch, run 'WINETEST_ALLOW_FLAKY=1 make test', and if that fails it means the patch introduces a new failure.
The way it works is that when a flaky test fails the message is 'Flaky failed...' instead of 'Test failed...'. I also added the count of flaky failures to the summary line. So the flaky failures are not totally swept under the rug. The main difference is that if $WINETEST_ALLOW_FLAKY is set, the exit code is 0 if the only failures are flaky ones.
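For illustration, here is roughly what this looks like in a test (a made-up check, using the macros from the patch below):

    #include <windows.h>
    #include "wine/test.h"

    START_TEST(example)
    {
        HANDLE event = CreateEventW(NULL, FALSE, FALSE, NULL);
        DWORD ret = WaitForSingleObject(event, 0);

        /* Prefixing an intermittently failing check with 'flaky' makes a
         * failure print 'Flaky failed: ...' instead of 'Test failed: ...',
         * and with WINETEST_ALLOW_FLAKY=1 it no longer affects the process
         * exit code. */
        flaky
        ok(ret == WAIT_TIMEOUT, "WaitForSingleObject returned %lu\n", ret);

        CloseHandle(event);
    }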
That still leaves some open questions though:
* Should the message be 'Test failed' instead of 'Flaky failed' when $WINETEST_ALLOW_FLAKY is not set? I opted for the latter because it adds information that the failed test has been marked as flaky.
* Should tests that fail systematically on some platforms but not others (for instance specific Windows versions, specific locales, specific GPUs) be marked as flaky? Or should flaky be reserved for tests that fail randomly?
* What about failures where the message is different every time, so that the failure always looks new? Should these not be marked as flaky, since the problem only impacts the TestBot?
* On a related note the patch below has flaky_windows and flaky_wine macros. Is that too fine grained? Maybe we don't want to run the risk of marking a test as flaky_windows only to discover later that it can also fail in Wine? But conversely, wouldn't it be valuable to know that a test is only flaky in Wine and not in Windows?
Note: flaky_windows is a bit long but I was worried that flaky_win would be too similar to flaky_wine.
* I decided having flaky_{windows,wine}_if() macros would be overkill.
* How does one know a flaky directive can be removed?
* Is it allowed to submit a patch that contains new flaky directives?
* Also the flaky directive cannot deal with tests that randomly crash, prematurely exit, or time out. These issues are rarer so maybe we can hope they can all be fixed?
* How does this impact bug 48912, aka "Allow blacklisting unreliable and always new failures"? https://bugs.winehq.org/show_bug.cgi?id=48912
* test.winehq.org and the TestBot will need to be updated to highlight the new messages and, more importantly, to recognize the new summary line.
The patch is below and I bundled it with tweaks to a few tests for illustration purposes:

* d3d8:device - Failure happens on Windows (rarely) and Wine.
* dinput:mouse - Failure happens on Windows 7 and Wine.
* mmdevapi:render - Timing issue.
* schedsvc:atsvcapi - Wine-only failure.

---
 dlls/d3d8/tests/device.c       |  2 +-
 dlls/dinput/tests/mouse.c      |  2 +-
 dlls/mmdevapi/tests/render.c   |  2 +-
 dlls/schedsvc/tests/atsvcapi.c |  2 +-
 include/wine/test.h            | 78 +++++++++++++++++++++++++++++-----
 5 files changed, 72 insertions(+), 14 deletions(-)
diff --git a/dlls/d3d8/tests/device.c b/dlls/d3d8/tests/device.c
index cb1e83b64b3..f0e8c27fafd 100644
--- a/dlls/d3d8/tests/device.c
+++ b/dlls/d3d8/tests/device.c
@@ -3356,7 +3356,7 @@ static void test_wndproc(void)

     expect_messages = sc_maximize_messages;
     SendMessageA(focus_window, WM_SYSCOMMAND, SC_MAXIMIZE, 0);
-    ok(!expect_messages->message, "Expected message %#x for window %#x, but didn't receive it.\n",
+    flaky ok(!expect_messages->message, "Expected message %#x for window %#x, but didn't receive it.\n",
             expect_messages->message, expect_messages->window);
     expect_messages = NULL;
     flush_events();
diff --git a/dlls/dinput/tests/mouse.c b/dlls/dinput/tests/mouse.c
index 8be808e5da6..dcb5453a825 100644
--- a/dlls/dinput/tests/mouse.c
+++ b/dlls/dinput/tests/mouse.c
@@ -156,7 +156,7 @@ static void test_acquire(IDirectInputA *pDI, HWND hwnd)

     SetActiveWindow( hwnd );
     hr = IDirectInputDevice_Acquire(pMouse);
-    ok(hr == S_OK, "Acquire() failed: %#lx\n", hr);
+    flaky ok(hr == S_OK, "Acquire() failed: %#lx\n", hr);

     mouse_event(MOUSEEVENTF_MOVE, 10, 10, 0, 0);
     cnt = 1;
diff --git a/dlls/mmdevapi/tests/render.c b/dlls/mmdevapi/tests/render.c
index 4e19b91c5f5..882be89b7ff 100644
--- a/dlls/mmdevapi/tests/render.c
+++ b/dlls/mmdevapi/tests/render.c
@@ -1098,7 +1098,7 @@ static void test_clock(int share)
     ok(pos == 0, "GetPosition returned non-zero pos before being started\n");

     hr = IAudioClient_Start(ac); /* #1 */
-    ok(hr == S_OK, "Start failed: %08lx\n", hr);
+    flaky ok(hr == S_OK, "Start failed: %08lx\n", hr);

     Sleep(100);
     slept += 100;
diff --git a/dlls/schedsvc/tests/atsvcapi.c b/dlls/schedsvc/tests/atsvcapi.c
index 3aaad8b5cfe..83774804b40 100644
--- a/dlls/schedsvc/tests/atsvcapi.c
+++ b/dlls/schedsvc/tests/atsvcapi.c
@@ -168,7 +168,7 @@ START_TEST(atsvcapi)

 skip_tests_delete:
     ret = NetrJobDel(server_name, jobid, jobid);
-    ok(ret == ERROR_SUCCESS, "NetrJobDel error %lu\n", ret);
+    flaky_wine ok(ret == ERROR_SUCCESS, "NetrJobDel error %lu\n", ret);

 skip_tests:
     SetUnhandledExceptionFilter(old_exception_filter);
diff --git a/include/wine/test.h b/include/wine/test.h
index 01ba81f4857..422ba3dce38 100644
--- a/include/wine/test.h
+++ b/include/wine/test.h
@@ -50,6 +50,9 @@ extern int winetest_time;
 /* running in interactive mode? */
 extern int winetest_interactive;

+/* always count flaky tests as successful (BOOL) */
+extern int winetest_allow_flaky;
+
 /* report successful tests (BOOL) */
 extern int winetest_report_success;

@@ -117,6 +120,13 @@ extern void winetest_pop_context(void);
 #define trace trace_(__FILE__, __LINE__)
 #define wait_child_process wait_child_process_(__FILE__, __LINE__)

+#define flaky_if(is_flaky) for (winetest_start_flaky(is_flaky); \
+                                winetest_loop_flaky(); \
+                                winetest_end_flaky())
+#define flaky flaky_if(1)
+#define flaky_windows flaky_if(!strcmp(winetest_platform, "windows"))
+#define flaky_wine flaky_if(!strcmp(winetest_platform, "wine"))
+
 #define todo_if(is_todo) for (winetest_start_todo(is_todo); \
                               winetest_loop_todo(); \
                               winetest_end_todo())
@@ -200,6 +210,9 @@ int winetest_interactive = 0;
 /* current platform */
 const char *winetest_platform = "windows";

+/* always count flaky tests as successful (BOOL) */
+int winetest_allow_flaky;
+
 /* report successful tests (BOOL) */
 int winetest_report_success = 0;

@@ -214,6 +227,7 @@ static const struct test *current_test; /* test currently being run */

 static LONG successes;       /* number of successful tests */
 static LONG failures;        /* number of failures */
+static LONG flaky_failures;  /* number of failures inside flaky block */
 static LONG skipped;         /* number of skipped test chunks */
 static LONG todo_successes;  /* number of successful tests inside todo block */
 static LONG todo_failures;   /* number of failures inside todo block */
@@ -229,6 +243,8 @@ struct tls_data
 {
     const char* current_file;        /* file of current check */
     int current_line;                /* line of current check */
+    unsigned int flaky_level;        /* current flaky nesting level */
+    int flaky_do_loop;
     unsigned int todo_level;         /* current todo nesting level */
     int todo_do_loop;
     char *str_pos;                   /* position in debug buffer */
@@ -355,9 +371,18 @@ int winetest_vok( int condition, const char *msg, va_list args )
    {
        if (condition)
        {
-            winetest_print_context( "Test succeeded inside todo block: " );
-            vprintf(msg, args);
-            InterlockedIncrement(&todo_failures);
+            if (data->flaky_level)
+            {
+                winetest_print_context( "Flaky succeeded inside todo block: " );
+                vprintf(msg, args);
+                InterlockedIncrement(&flaky_failures);
+            }
+            else
+            {
+                winetest_print_context( "Test succeeded inside todo block: " );
+                vprintf(msg, args);
+                InterlockedIncrement(&todo_failures);
+            }
            return 0;
        }
        else
@@ -381,9 +406,18 @@ int winetest_vok( int condition, const char *msg, va_list args )
    {
        if (!condition)
        {
-            winetest_print_context( "Test failed: " );
-            vprintf(msg, args);
-            InterlockedIncrement(&failures);
+            if (data->flaky_level)
+            {
+                winetest_print_context( "Flaky failed: " );
+                vprintf(msg, args);
+                InterlockedIncrement(&flaky_failures);
+            }
+            else
+            {
+                winetest_print_context( "Test failed: " );
+                vprintf(msg, args);
+                InterlockedIncrement(&failures);
+            }
            return 0;
        }
        else
@@ -456,6 +490,27 @@ void winetest_win_skip( const char *msg, ... )
     va_end(valist);
 }

+void winetest_start_flaky( int is_flaky )
+{
+    struct tls_data *data = get_tls_data();
+    data->flaky_level = (data->flaky_level << 1) | (is_flaky != 0);
+    data->flaky_do_loop=1;
+}
+
+int winetest_loop_flaky(void)
+{
+    struct tls_data *data = get_tls_data();
+    int do_flaky=data->flaky_do_loop;
+    data->flaky_do_loop=0;
+    return do_flaky;
+}
+
+void winetest_end_flaky(void)
+{
+    struct tls_data *data = get_tls_data();
+    data->flaky_level >>= 1;
+}
+
 void winetest_start_todo( int is_todo )
 {
     struct tls_data *data = get_tls_data();
@@ -601,14 +656,16 @@ static int run_test( const char *name )
         printf( "%04x:%s:%s Silenced %d todos, %d skips and %d traces.\n",
                 (UINT)GetCurrentProcessId(), test->name, winetest_elapsed(),
                 (UINT)muted_todo_successes, (UINT)muted_skipped, (UINT)muted_traces);
-        printf( "%04x:%s:%s %d tests executed (%d marked as todo, %d %s), %d skipped.\n",
+        printf( "%04x:%s:%s %d tests executed (%d marked as todo, %d as flaky, %d %s), %d skipped.\n",
                 (UINT)GetCurrentProcessId(), test->name, winetest_elapsed(),
-                (UINT)(successes + failures + todo_successes + todo_failures),
-                (UINT)todo_successes, (UINT)(failures + todo_failures),
+                (UINT)(successes + failures + flaky_failures + todo_successes + todo_failures),
+                (UINT)todo_successes, (UINT)flaky_failures, (UINT)(failures + todo_failures),
                 (failures + todo_failures != 1) ? "failures" : "failure",
                 (UINT)skipped );
     }
-    status = (failures + todo_failures < 255) ? failures + todo_failures : 255;
+    status = failures + todo_failures;
+    if (!winetest_allow_flaky) status += flaky_failures;
+    if (status > 255) status = 255;
     return status;
 }
@@ -665,6 +722,7 @@ int main( int argc, char **argv )

     if (GetEnvironmentVariableA( "WINETEST_DEBUG", p, sizeof(p) )) winetest_debug = atoi(p);
     if (GetEnvironmentVariableA( "WINETEST_INTERACTIVE", p, sizeof(p) )) winetest_interactive = atoi(p);
+    if (GetEnvironmentVariableA( "WINETEST_ALLOW_FLAKY", p, sizeof(p) )) winetest_allow_flaky = atoi(p);
     if (GetEnvironmentVariableA( "WINETEST_REPORT_SUCCESS", p, sizeof(p) )) winetest_report_success = atoi(p);
     if (GetEnvironmentVariableA( "WINETEST_TIME", p, sizeof(p) )) winetest_time = atoi(p);
     winetest_last_time = winetest_start_time = GetTickCount();
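As an aside, the reason winetest_start_flaky() shifts flaky_level one bit per nesting level is that a check should count as flaky whenever any enclosing flaky_if() condition is true. A contrived illustration (not part of the patch):

    #include "wine/test.h"

    START_TEST(nesting)
    {
        flaky_if(1)                /* flaky_level: 0b1 while inside */
        {
            flaky_if(0)            /* flaky_level: 0b10, still nonzero */
            ok(0, "counted as a flaky failure\n");
        }
        ok(0, "counted as a regular failure\n");
    }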
On 6/1/22 15:26, Francois Gouget wrote:
> Wine has many unreliable tests, which makes it hard to know whether a patch causes new failures. So one proposal was to mark unreliable tests as flaky, and have a mode where flaky failures can be ignored by setting $WINETEST_ALLOW_FLAKY.
>
> So one would be able to apply a patch, run 'WINETEST_ALLOW_FLAKY=1 make test', and if that fails it means the patch introduces a new failure.
>
> The way it works is that when a flaky test fails the message is 'Flaky failed...' instead of 'Test failed...'. I also added the count of flaky failures to the summary line. So the flaky failures are not totally swept under the rug. The main difference is that if $WINETEST_ALLOW_FLAKY is set, the exit code is 0 if the only failures are flaky ones.
>
> That still leaves some open questions though:
>
> * Should the message be 'Test failed' instead of 'Flaky failed' when $WINETEST_ALLOW_FLAKY is not set? I opted for the latter because it adds information that the failed test has been marked as flaky.
Imho the message should keep the 'Test failed' / 'Test succeeded' prefixes. For instance I think it would look more consistent with something like:
  Test succeeded inside todo block:
  Test succeeded inside flaky block:
  Test marked todo:
  Test marked flaky:
  Test failed:
  Test succeeded:
> * On a related note the patch below has flaky_windows and flaky_wine macros. Is that too fine grained? Maybe we don't want to run the risk of marking a test as flaky_windows only to discover later that it can also fail in Wine? But conversely, wouldn't it be valuable to know that a test is only flaky in Wine and not in Windows?
>
> Note: flaky_windows is a bit long but I was worried that flaky_win would be too similar to flaky_wine.
>
> * I decided having flaky_{windows,wine}_if() macros would be overkill.
I think flaky_if should be enough, and then there could be some shortcut global variables for `!strcmp(winetest_platform, "windows" / "wine")`, so it's not too long to write.
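For example (a sketch of the idea with made-up names, not code from the patch):

    #include <string.h>
    #include "wine/test.h"

    /* Hypothetical shortcut globals, initialized once by the framework from
     * winetest_platform so the conditions stay short: */
    int is_windows, is_wine;

    static void winetest_init_platform_shortcuts(void)
    {
        is_windows = !strcmp(winetest_platform, "windows");
        is_wine = !strcmp(winetest_platform, "wine");
    }

    /* so that tests can simply write:
     *     flaky_if(is_wine)
     *     ok(ret == ERROR_SUCCESS, "NetrJobDel error %lu\n", ret);
     */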
Cheers,
On Wed, 1 Jun 2022, Rémi Bernon wrote:
[...]
> Imho the message should keep the 'Test failed' / 'Test succeeded' prefixes. For instance I think it would look more consistent with something like:
>
>   Test succeeded inside todo block:
This is a non-flaky todo test which succeeds.
>   Test succeeded inside flaky block:
The message does not reflect that such a test is both flaky *and* todo. Maybe something like this would be clearer:

  Test succeeded inside flaky todo block:
>   Test marked todo:
>   Test marked flaky:
>   Test failed:
>   Test succeeded:
That can work.
[...]
> I think flaky_if should be enough, and then there could be some shortcut global variables for `!strcmp(winetest_platform, "windows" / "wine")`, so it's not too long to write.
So: flaky_if( is_windows ), flaky_if( is_wine ), and plain flaky().
I worry that we would risk name collisions but winetest_is_windows would defeat the purpose. (plus we already have some short potentially collision-prone names and it worked out so far)
It's tempting to have things like is_win10, but again that's likely either too fine-grained or not fine-grained enough, which probably makes it useless (is_win10_2009_or_greater).
On 8/31/22 18:20, Francois Gouget wrote:
> On Wed, 1 Jun 2022, Rémi Bernon wrote:
> [...]
>> Imho the message should keep the 'Test failed' / 'Test succeeded' prefixes. For instance I think it would look more consistent with something like:
>>
>>   Test succeeded inside todo block:
>
> This is a non-flaky todo test which succeeds.
>
>>   Test succeeded inside flaky block:
>
> The message does not reflect that such a test is both flaky *and* todo. Maybe something like this would be clearer:
>
>   Test succeeded inside flaky todo block:
>
>>   Test marked todo:
>>   Test marked flaky:
>>   Test failed:
>>   Test succeeded:
>
> That can work.
>
> [...]
>> I think flaky_if should be enough, and then there could be some shortcut global variables for `!strcmp(winetest_platform, "windows" / "wine")`, so it's not too long to write.
>
> So: flaky_if( is_windows ), flaky_if( is_wine ), and plain flaky().
>
> I worry that we would risk name collisions but winetest_is_windows would defeat the purpose. (plus we already have some short potentially collision-prone names and it worked out so far)
>
> It's tempting to have things like is_win10, but again that's likely either too fine-grained or not fine-grained enough, which probably makes it useless (is_win10_2009_or_greater).
Yeah I don't think we need something finer for the most common cases. And I believe we're trying to avoid having tests checking Windows versions in general, using feature checks instead.
Francois Gouget <fgouget@codeweavers.com> writes:
> On Wed, 1 Jun 2022, Rémi Bernon wrote:
> [...]
>> I think flaky_if should be enough, and then there could be some shortcut global variables for `!strcmp(winetest_platform, "windows" / "wine")`, so it's not too long to write.
>
> So: flaky_if( is_windows ), flaky_if( is_wine ), and plain flaky().
I think flaky_wine is better. I don't see a need for flaky_windows though; there's no reason to enforce that a test should always succeed on Wine if it doesn't on Windows.
Hi,
While running your changed tests, I think I found new failures. Being a bot and all I'm not very good at pattern recognition, so I might be wrong, but could you please double-check?
Full results can be found at: https://testbot.winehq.org/JobDetails.pl?Key=115928
Your paranoid android.
=== w7u_2qxl (32 bit report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for skip messages

=== w7u_adm (32 bit report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for skip messages

=== w7u_el (32 bit report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for skip messages

=== w8 (32 bit report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for skip messages

=== w8adm (32 bit report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for skip messages

=== w864 (32 bit report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for skip messages

=== w1064v1507 (32 bit report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for skip messages

=== w1064v1809 (32 bit report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for skip messages

=== w1064 (32 bit report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for skip messages

=== w1064_tsign (32 bit report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for skip messages

=== w10pro64 (32 bit report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for skip messages
=== w7u_2qxl (32 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w7u_adm (32 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w7u_el (32 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w8 (32 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w8adm (32 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w864 (32 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w1064v1507 (32 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w1064v1809 (32 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w1064 (32 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w1064_tsign (32 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w10pro64 (32 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w864 (64 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w1064v1507 (64 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w1064v1809 (64 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w1064 (64 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w1064_2qxl (64 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w1064_adm (64 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w1064_tsign (64 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w10pro64 (64 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w10pro64_en_AE_u8 (64 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w10pro64_ar (64 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w10pro64_ja (64 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)

=== w10pro64_zh_CN (64 bit report) ===

Report validation errors:
dinput:mouse has no test summary line (early exit of the main process?)
=== w7u_2qxl (32 bit report) ===

mmdevapi:
render.c:1114: Test failed: Position 6495 too far after playing 100ms

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w7u_adm (32 bit report) ===

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for skip messages

=== w7u_el (32 bit report) ===

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for skip messages

=== w8 (32 bit report) ===

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w8adm (32 bit report) ===

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for skip messages

=== w864 (32 bit report) ===

mmdevapi:
render.c:1340: Test failed: GetBuffer large (20671) at iteration 2
render.c:1114: Test failed: Position 6656 too far after playing 100ms

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w1064v1507 (32 bit report) ===

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for skip messages

=== w1064v1809 (32 bit report) ===

mmdevapi:
render.c:1114: Test failed: Position 54464 too far after playing 100ms
render.c:1114: Test failed: Position 6494 too far after playing 100ms

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w1064 (32 bit report) ===

mmdevapi:
render.c:1114: Test failed: Position 58280 too far after playing 100ms
render.c:1114: Test failed: Position 6866 too far after playing 100ms

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w1064_tsign (32 bit report) ===

mmdevapi:
render.c:1114: Test failed: Position 58568 too far after playing 100ms
render.c:1340: Test failed: GetBuffer large (22500) at iteration 6
render.c:1114: Test failed: Position 7323 too far after playing 100ms

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w10pro64 (32 bit report) ===

mmdevapi:
render.c:1114: Test failed: Position 57944 too far after playing 100ms
render.c:1114: Test failed: Position 6922 too far after playing 100ms

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w864 (64 bit report) ===

mmdevapi:
render.c:1340: Test failed: GetBuffer large (20671) at iteration 5

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w1064v1507 (64 bit report) ===

mmdevapi:
render.c:1340: Test failed: GetBuffer large (20671) at iteration 6
render.c:1114: Test failed: Position 6728 too far after playing 100ms

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w1064v1809 (64 bit report) ===

mmdevapi:
render.c:1114: Test failed: Position 6421 too far after playing 100ms

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w1064 (64 bit report) ===

mmdevapi:
render.c:1340: Test failed: GetBuffer large (22500) at iteration 5
render.c:1114: Test failed: Position 6878 too far after playing 100ms

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w1064_2qxl (64 bit report) ===

mmdevapi:
render.c:1114: Test failed: Position 57872 too far after playing 100ms
render.c:1114: Test failed: Position 7054 too far after playing 100ms

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w1064_adm (64 bit report) ===

mmdevapi:
render.c:1114: Test failed: Position 58688 too far after playing 100ms
render.c:1340: Test failed: GetBuffer large (22500) at iteration 7
render.c:1340: Test failed: GetBuffer large (22500) at iteration 8
render.c:1114: Test failed: Position 7091 too far after playing 100ms
render.c:1340: Test failed: GetBuffer large (22500) at iteration 5

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w1064_tsign (64 bit report) ===

mmdevapi:
render.c:1114: Test failed: Position 58664 too far after playing 100ms
render.c:1114: Test failed: Position 6820 too far after playing 100ms

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w10pro64 (64 bit report) ===

mmdevapi:
render.c:1114: Test failed: Position 58624 too far after playing 100ms
render.c:1114: Test failed: Position 7522 too far after playing 100ms

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w10pro64_en_AE_u8 (64 bit report) ===

mmdevapi:
render.c:1114: Test failed: Position 56680 too far after playing 100ms
render.c:1114: Test failed: Position 8151 too far after playing 100ms

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w10pro64_ar (64 bit report) ===

mmdevapi:
render.c:1114: Test failed: Position 55992 too far after playing 100ms

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w10pro64_ja (64 bit report) ===

mmdevapi:
render.c:1114: Test failed: Position 55992 too far after playing 100ms
render.c:1114: Test failed: Position 7804 too far after playing 100ms

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== w10pro64_zh_CN (64 bit report) ===

mmdevapi:
render.c:1114: Test failed: Position 57280 too far after playing 100ms
render.c:1114: Test failed: Position 7404 too far after playing 100ms

Report validation errors:
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages
=== w7u_2qxl (32 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)

=== w7u_adm (32 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w7u_el (32 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)

=== w8 (32 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w8adm (32 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w864 (32 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w1064v1507 (32 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w1064v1809 (32 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w1064 (32 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w1064_tsign (32 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w10pro64 (32 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w864 (64 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w1064v1507 (64 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w1064v1809 (64 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w1064 (64 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w1064_2qxl (64 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w1064_adm (64 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w1064_tsign (64 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w10pro64 (64 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w10pro64_en_AE_u8 (64 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w10pro64_ar (64 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w10pro64_ja (64 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages

=== w10pro64_zh_CN (64 bit report) ===

Report validation errors:
schedsvc:atsvcapi has no test summary line (early exit of the main process?)
schedsvc:atsvcapi has unaccounted for skip messages
=== debian11 (32 bit report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for failure messages
d3d8:device has unaccounted for todo messages
d3d8:device has unaccounted for skip messages
dinput:mouse has no test summary line (early exit of the main process?)
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for skip messages

=== debian11 (32 bit Arabic:Morocco report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for failure messages
d3d8:device has unaccounted for todo messages
d3d8:device has unaccounted for skip messages
dinput:mouse has no test summary line (early exit of the main process?)
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for skip messages

=== debian11 (32 bit German report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for failure messages
d3d8:device has unaccounted for todo messages
d3d8:device has unaccounted for skip messages
dinput:mouse has no test summary line (early exit of the main process?)
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for skip messages

=== debian11 (32 bit French report) ===

mmdevapi:
render.c:1114: Test failed: Position 54248 too far after playing 100ms

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for failure messages
d3d8:device has unaccounted for todo messages
d3d8:device has unaccounted for skip messages
dinput:mouse has no test summary line (early exit of the main process?)
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for failure messages
mmdevapi:render has unaccounted for skip messages

=== debian11 (32 bit Hebrew:Israel report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for failure messages
d3d8:device has unaccounted for todo messages
d3d8:device has unaccounted for skip messages
dinput:mouse has no test summary line (early exit of the main process?)
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for skip messages

=== debian11 (32 bit Hindi:India report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for failure messages
d3d8:device has unaccounted for todo messages
d3d8:device has unaccounted for skip messages
dinput:mouse has no test summary line (early exit of the main process?)
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for skip messages

=== debian11 (32 bit Japanese:Japan report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for failure messages
d3d8:device has unaccounted for todo messages
d3d8:device has unaccounted for skip messages
dinput:mouse has no test summary line (early exit of the main process?)
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for skip messages

=== debian11 (32 bit Chinese:China report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for failure messages
d3d8:device has unaccounted for todo messages
d3d8:device has unaccounted for skip messages
dinput:mouse has no test summary line (early exit of the main process?)
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for skip messages

=== debian11 (32 bit WoW report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for failure messages
d3d8:device has unaccounted for todo messages
d3d8:device has unaccounted for skip messages
dinput:mouse has no test summary line (early exit of the main process?)
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for skip messages

=== debian11 (64 bit WoW report) ===

Report validation errors:
d3d8:device has no test summary line (early exit of the main process?)
d3d8:device has unaccounted for failure messages
d3d8:device has unaccounted for todo messages
d3d8:device has unaccounted for skip messages
dinput:mouse has no test summary line (early exit of the main process?)
mmdevapi:render has no test summary line (early exit of the main process?)
mmdevapi:render has unaccounted for skip messages
It might be better to use something like broken() to encourage more precision in marking the failures. For example, if something fails with ACCESS_DENIED sometimes, we'd still like to know if it starts failing with FILE_NOT_FOUND. That does make it trickier to add flaky tests to the summary line, though. I guess you could set it up so that if flaky() is called with non-zero and the next ok() succeeds, it's marked as flaky.
Having the failures in the summary line also opens up the option of handling them by retrying, and can be implemented without making that decision first. Of course, retrying leaves open the possibility that you will be unlucky and all the retries will also fail. Whether that's worth it would probably depend on how often a regression on a flaky test slips through and causes the tests to fail for Alexandre.
On Wed, 1 Jun 2022, Esme Povirk (she/they) wrote:
> It might be better to use something like broken() to encourage more precision in marking the failures. For example, if something fails with ACCESS_DENIED sometimes, we'd still like to know if it starts failing with FILE_NOT_FOUND.
So something like this?
    ok(success || flaky(gle == ACCESS_DENIED), "Call failed: gle=%lu\n", gle);
I suspect we'd get a lot of "|| flaky(1)", for instance for all the "always new" error messages.
Also the goal is to make the tests more "reliable" but if we are too specific about which unexpected conditions we allow there's a risk we'll spend our time expanding the condition inside the flaky() call. But maybe it would be okay to give it a few tries and fall back to flaky(1).
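One possible shape for such a condition-style flaky() (a sketch of the idea only; the actual merge request may differ): the macro reports whether the tolerated failure matched, and latches a per-thread flag that the next ok() would consume to count the result as flaky rather than as a plain failure. Note that a function-like flaky(condition) could not coexist with the statement-prefix flaky macro from the patch above; one of the two would have to be renamed.

    /* Sketch only: the flaky_hit field and helper name are hypothetical. */
    static inline int winetest_flaky_condition( int condition )
    {
        struct tls_data *data = get_tls_data();
        if (condition) data->flaky_hit = 1;  /* consumed by the next ok() */
        return condition;
    }
    #define flaky(condition) winetest_flaky_condition( (condition) )

    /* usage, as in the example above:
     *     ok(success || flaky(gle == ACCESS_DENIED), "Call failed: gle=%lu\n", gle);
     */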
I pushed a merge request with the latest version so you can also comment there: https://gitlab.winehq.org/wine/wine/-/merge_requests/764
See also my flaky-tests branch for some use cases: https://gitlab.winehq.org/fgouget/wine/-/commits/flaky-tests/
On to the next question: do we need to take special steps to make sure one cannot just (sneakily?) add bad tests by prefixing them with flaky*?
One option is to do nothing and let reviewers deal with it. Certainly if they spot a flaky addition they would scrutinize the patch and ask questions as needed. And if bad flaky calls slip through, the developer's trustworthiness would be adjusted accordingly.
But I think we could have a CI check that requires specific explanations for commits with flaky additions:
- If there are more '^+.*flaky' lines in a commit than there are '^-.*flaky' ones, then that commit has a flaky addition.
- If so the CI check would require a 'Flaky-Tests' line in the commit message where the submitter explains why the additional flaky is needed.
I mostly see a CI check as having three benefits:
- Reduce the risk of reviewers missing a flaky addition.
- Save a step for the reviewers by providing them with an explanation of why the flaky is needed from the get go.
- Raise the inconvenience of using flaky so in most cases it's easier / less embarrassing to fix the test.
Flaky tests are really buggy tests. So we could also require a Wine-Bug line in addition to the Flaky-Tests one for tracking progress on fixing that bug.
That is unless Wine-Bug lines are only supposed to be used when fixing a bug. Another approach would be to put the bug URL on the Flaky-Tests line and leave the explanation in the bug. But that makes the commit message less directly useful and not self-sufficient.
Also that line could be called Flaky-Test instead of Flaky-Tests. Depends if it's about the test unit or the individual ok() calls.
On 9/1/22 12:05, Francois Gouget wrote:
> I pushed a merge request with the latest version so you can also comment there: https://gitlab.winehq.org/wine/wine/-/merge_requests/764
>
> See also my flaky-tests branch for some use cases: https://gitlab.winehq.org/fgouget/wine/-/commits/flaky-tests/
>
> On to the next question: do we need to take special steps to make sure one cannot just (sneakily?) add bad tests by prefixing them with flaky*?
>
> One option is to do nothing and let reviewers deal with it. Certainly if they spot a flaky addition they would scrutinize the patch and ask questions as needed. And if bad flaky calls slip through, the developer's trustworthiness would be adjusted accordingly.
Why would we allow (intentionally) adding any flaky tests at all? My interpretation of "flaky" is that anything that currently has it is a bug and should be fixed.
On Thu, 1 Sep 2022, Zebediah Figura wrote:
[...]
> Why would we allow (intentionally) adding any flaky tests at all? My interpretation of "flaky" is that anything that currently has it is a bug and should be fixed.
Initially we will need to add a bunch to deal with the current ~170 test units (~20%) that fail on a regular basis. But yes, beyond that we may ban adding more flaky tests entirely. In that case it would still be useful to identify commits that add flaky tests but we would not need the Flaky-Test mechanism.
And should we need to add a flaky call anyway (maybe because a test is restructured so that a single flaky call is split in two) then maybe the author can ask Alexandre to manually cherry-pick from a GitLab repository (or such cases fall back to email).
On Fri, Sep 02, 2022 at 12:41:42PM +0200, Francois Gouget wrote:
> On Thu, 1 Sep 2022, Zebediah Figura wrote:
> [...]
>> Why would we allow (intentionally) adding any flaky tests at all? My interpretation of "flaky" is that anything that currently has it is a bug and should be fixed.
>
> Initially we will need to add a bunch to deal with the current ~170 test units (~20%) that fail on a regular basis. But yes, beyond that we may ban adding more flaky tests entirely. In that case it would still be useful to identify commits that add flaky tests but we would not need the Flaky-Test mechanism.
>
> And should we need to add a flaky call anyway (maybe because a test is restructured so that a single flaky call is split in two) then maybe the author can ask Alexandre to manually cherry-pick from a GitLab repository (or such cases fall back to email).
The issue is that we don't know that the tests are flaky when they get committed; that only becomes apparent at some later stage.
Huw.
On Fri, 2 Sep 2022, Huw Davies wrote:
[...]
> The issue is that we don't know that the tests are flaky when they get committed; that only becomes apparent at some later stage.
Whatever CI is used, it should detect most flaky tests before they are committed. That's part of the point of running the test not once but multiple times (the other being to test multiple configurations).