From: Francois Gouget fgouget@codeweavers.com
Remove the redundant is_wine variables in the tests. --- dlls/ddraw/tests/ddraw4.c | 3 --- dlls/ddraw/tests/ddraw7.c | 3 --- dlls/user32/tests/win.c | 1 - include/wine/test.h | 14 +++++++++----- 4 files changed, 9 insertions(+), 12 deletions(-)
diff --git a/dlls/ddraw/tests/ddraw4.c b/dlls/ddraw/tests/ddraw4.c index 471f6fc3f09..8347d712400 100644 --- a/dlls/ddraw/tests/ddraw4.c +++ b/dlls/ddraw/tests/ddraw4.c @@ -18086,7 +18086,6 @@ static void test_surface_format_conversion_alpha(void) DWORD supported_fmts; IDirectDraw4 *ddraw; ULONG refcount; - BOOL is_wine; HWND window; BOOL passed; HRESULT hr; @@ -18108,8 +18107,6 @@ static void test_surface_format_conversion_alpha(void) &supported_fmts); ok(hr == DD_OK, "Got unexpected hr %#lx.\n", hr);
- is_wine = !strcmp(winetest_platform, "wine"); - memset(&surface_desc, 0, sizeof(surface_desc)); surface_desc.dwSize = sizeof(surface_desc); surface_desc.dwFlags = DDSD_CAPS | DDSD_WIDTH | DDSD_HEIGHT | DDSD_PIXELFORMAT; diff --git a/dlls/ddraw/tests/ddraw7.c b/dlls/ddraw/tests/ddraw7.c index 15ab548ebc6..03a2bb408a2 100644 --- a/dlls/ddraw/tests/ddraw7.c +++ b/dlls/ddraw/tests/ddraw7.c @@ -18106,7 +18106,6 @@ static void test_surface_format_conversion_alpha(void) DWORD supported_fmts; IDirectDraw7 *ddraw; ULONG refcount; - BOOL is_wine; HWND window; BOOL passed; HRESULT hr; @@ -18128,8 +18127,6 @@ static void test_surface_format_conversion_alpha(void) &supported_fmts); ok(hr == DD_OK, "Got unexpected hr %#lx.\n", hr);
- is_wine = !strcmp(winetest_platform, "wine"); - memset(&surface_desc, 0, sizeof(surface_desc)); surface_desc.dwSize = sizeof(surface_desc); surface_desc.dwFlags = DDSD_CAPS | DDSD_WIDTH | DDSD_HEIGHT | DDSD_PIXELFORMAT; diff --git a/dlls/user32/tests/win.c b/dlls/user32/tests/win.c index 4c2af09d0c6..c2765a8810c 100644 --- a/dlls/user32/tests/win.c +++ b/dlls/user32/tests/win.c @@ -11611,7 +11611,6 @@ static void reset_window_state(HWND *state, int n) static void test_topmost(void) { HWND owner, hwnd, hwnd2, hwnd_child, hwnd_child2, hwnd_grandchild, state[6] = { 0 }; - BOOL is_wine = !strcmp(winetest_platform, "wine");
owner = create_tool_window(WS_CAPTION | WS_SYSMENU | WS_MINIMIZEBOX | WS_MAXIMIZEBOX | WS_POPUP | WS_VISIBLE, 0); diff --git a/include/wine/test.h b/include/wine/test.h index 1690c5613d8..0d9e4d52267 100644 --- a/include/wine/test.h +++ b/include/wine/test.h @@ -52,6 +52,7 @@ extern int winetest_mute_threshold;
/* current platform */ extern const char *winetest_platform; +extern int is_windows, is_wine;
extern void winetest_set_location( const char* file, int line ); extern void winetest_subtest( const char* name ); @@ -114,8 +115,8 @@ extern void winetest_pop_context(void); #define todo_if(is_todo) for (winetest_start_todo(is_todo); \ winetest_loop_todo(); \ winetest_end_todo()) -#define todo_wine todo_if(!strcmp(winetest_platform, "wine")) -#define todo_wine_if(is_todo) todo_if((is_todo) && !strcmp(winetest_platform, "wine")) +#define todo_wine todo_if(is_wine) +#define todo_wine_if(is_todo) todo_if((is_todo) && is_wine)
#ifndef ARRAY_SIZE @@ -193,6 +194,7 @@ int winetest_interactive = 0;
/* current platform */ const char *winetest_platform = "windows"; +int is_windows, is_wine;
/* report successful tests (BOOL) */ int winetest_report_success = 0; @@ -321,7 +323,7 @@ void winetest_ignore_exceptions( BOOL ignore )
int broken( int condition ) { - return (strcmp(winetest_platform, "windows") == 0) && condition; + return is_windows && condition; }
static LONG winetest_add_line( void ) @@ -463,7 +465,7 @@ void winetest_win_skip( const char *msg, ... ) { va_list valist; va_start(valist, msg); - if (strcmp(winetest_platform, "windows") == 0) + if (is_windows) winetest_vskip(msg, valist); else winetest_vok(0, msg, valist); @@ -676,6 +678,8 @@ int main( int argc, char **argv ) winetest_platform = strdup(p); else if (running_under_wine()) winetest_platform = "wine"; + is_windows = !strcmp( winetest_platform, "windows" ); + is_wine = !strcmp( winetest_platform, "wine" );
if (GetEnvironmentVariableA( "WINETEST_COLOR", p, sizeof(p) )) winetest_color = !strcasecmp(p, "auto") ? isatty(fileno(stdout)) : atoi(p); @@ -685,7 +689,7 @@ int main( int argc, char **argv ) if (GetEnvironmentVariableA( "WINETEST_TIME", p, sizeof(p) )) winetest_time = atoi(p); winetest_last_time = winetest_start_time = GetTickCount();
- if (!strcmp( winetest_platform, "windows" )) SetUnhandledExceptionFilter( exc_filter ); + if (is_windows) SetUnhandledExceptionFilter( exc_filter ); if (!winetest_interactive) SetErrorMode( SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX );
if (!argv[1])
Hi,
It looks like your patch introduced the new failures shown below. Please investigate and fix them before resubmitting your patch. If they are not new, fixing them anyway would help a lot. Otherwise please ask for the known failures list to be updated.
The tests also ran into some preexisting test failures. If you know how to fix them that would be helpful. See the TestBot job for the details:
The full results can be found at: https://testbot.winehq.org/JobDetails.pl?Key=122406
Your paranoid android.
=== w864 (32 bit report) ===
ddraw: ddraw7.c:18935: Test failed: Got unexpected color 0x00000040.
=== w1064v1507 (32 bit report) ===
ddraw: ddraw7.c:18935: Test failed: Got unexpected color 0x00000040.
=== w1064v1809 (32 bit report) ===
ddraw: ddraw7.c:18935: Test failed: Got unexpected color 0x00000040.
=== w10pro64_en_AE_u8 (64 bit report) ===
user32: win.c:4579: Test failed: hwnd 0000000000080078/0000000000080078 message 0200 win.c:4583: Test failed: hwnd 0000000000080078/0000000000080078 message 0201 win.c:4592: Test failed: hwnd 00000000002102DA/00000000002102DA message 0202 win.c:4595: Test failed: hwnd 00000000002102DA/00000000002102DA message 0201
From: Francois Gouget fgouget@codeweavers.com
--- * The is_wine variable means we don't have to duplicate the full platform string comparison yet again. * No flaky_windows(): use plain flaky() instead. * If a test is both flaky and todo it is only reported as flaky in the summary stats. * The messages preserve the "Test failed" and "Test succeeded" prefix. --- include/wine/test.h | 81 +++++++++++++++++++++++++++++++++++++++------ 1 file changed, 71 insertions(+), 10 deletions(-)
diff --git a/include/wine/test.h b/include/wine/test.h index 0d9e4d52267..366fc0a62e4 100644 --- a/include/wine/test.h +++ b/include/wine/test.h @@ -44,6 +44,9 @@ extern int winetest_time; /* running in interactive mode? */ extern int winetest_interactive;
+/* always count flaky tests as successful (BOOL) */ +extern int winetest_allow_flaky; + /* report successful tests (BOOL) */ extern int winetest_report_success;
@@ -60,6 +63,9 @@ extern void winetest_ignore_exceptions( BOOL ignore ); extern void winetest_start_todo( int is_todo ); extern int winetest_loop_todo(void); extern void winetest_end_todo(void); +extern void winetest_start_flaky( int is_flaky ); +extern int winetest_loop_flaky(void); +extern void winetest_end_flaky(void); extern int winetest_get_mainargs( char*** pargv ); extern LONG winetest_get_failures(void); extern void winetest_add_failures( LONG new_failures ); @@ -112,6 +118,13 @@ extern void winetest_pop_context(void); #define trace trace_(__FILE__, __LINE__) #define wait_child_process wait_child_process_(__FILE__, __LINE__)
+#define flaky_if(is_flaky) for (winetest_start_flaky(is_flaky); \ + winetest_loop_flaky(); \ + winetest_end_flaky()) +#define flaky flaky_if(TRUE) +#define flaky_wine flaky_if(is_wine) +#define flaky_wine_if(is_flaky) flaky_if((is_flaky) && is_wine) + #define todo_if(is_todo) for (winetest_start_todo(is_todo); \ winetest_loop_todo(); \ winetest_end_todo()) @@ -196,6 +209,9 @@ int winetest_interactive = 0; const char *winetest_platform = "windows"; int is_windows, is_wine;
+/* always count flaky tests as successful (BOOL) */ +int winetest_allow_flaky; + /* report successful tests (BOOL) */ int winetest_report_success = 0;
@@ -213,6 +229,7 @@ static const struct test *current_test; /* test currently being run */
static LONG successes; /* number of successful tests */ static LONG failures; /* number of failures */ +static LONG flaky_failures; /* number of failures inside flaky block */ static LONG skipped; /* number of skipped test chunks */ static LONG todo_successes; /* number of successful tests inside todo block */ static LONG todo_failures; /* number of failures inside todo block */ @@ -228,6 +245,8 @@ struct tls_data { const char* current_file; /* file of current check */ int current_line; /* line of current check */ + unsigned int flaky_level; /* current flaky nesting level */ + int flaky_do_loop; unsigned int todo_level; /* current todo nesting level */ int todo_do_loop; char *str_pos; /* position in debug buffer */ @@ -362,10 +381,19 @@ int winetest_vok( int condition, const char *msg, va_list args ) if (condition) { if (winetest_color) printf( color_dark_red ); - winetest_print_context( "Test succeeded inside todo block: " ); - vprintf(msg, args); + if (data->flaky_level) + { + winetest_print_context( "Test succeeded inside flaky todo block: " ); + vprintf(msg, args); + InterlockedIncrement(&flaky_failures); + } + else + { + winetest_print_context( "Test succeeded inside todo block: " ); + vprintf(msg, args); + InterlockedIncrement(&todo_failures); + } if (winetest_color) printf( color_reset ); - InterlockedIncrement(&todo_failures); return 0; } else @@ -392,10 +420,19 @@ int winetest_vok( int condition, const char *msg, va_list args ) if (!condition) { if (winetest_color) printf( color_bright_red ); - winetest_print_context( "Test failed: " ); - vprintf(msg, args); + if (data->flaky_level) + { + winetest_print_context( "Test marked flaky: " ); + vprintf(msg, args); + InterlockedIncrement(&flaky_failures); + } + else + { + winetest_print_context( "Test failed: " ); + vprintf(msg, args); + InterlockedIncrement(&failures); + } if (winetest_color) printf( color_reset ); - InterlockedIncrement(&failures); return 0; } else @@ -472,6 +509,27 @@ void winetest_win_skip( 
const char *msg, ... ) va_end(valist); }
+void winetest_start_flaky( int is_flaky ) +{ + struct tls_data *data = get_tls_data(); + data->flaky_level = (data->flaky_level << 1) | (is_flaky != 0); + data->flaky_do_loop=1; +} + +int winetest_loop_flaky(void) +{ + struct tls_data *data = get_tls_data(); + int do_flaky=data->flaky_do_loop; + data->flaky_do_loop=0; + return do_flaky; +} + +void winetest_end_flaky(void) +{ + struct tls_data *data = get_tls_data(); + data->flaky_level >>= 1; +} + void winetest_start_todo( int is_todo ) { struct tls_data *data = get_tls_data(); @@ -617,14 +675,16 @@ static int run_test( const char *name ) printf( "%04x:%s:%s Silenced %d todos, %d skips and %d traces.\n", (UINT)GetCurrentProcessId(), test->name, winetest_elapsed(), (UINT)muted_todo_successes, (UINT)muted_skipped, (UINT)muted_traces); - printf( "%04x:%s:%s %d tests executed (%d marked as todo, %d %s), %d skipped.\n", + printf( "%04x:%s:%s %d tests executed (%d marked as todo, %d as flaky, %d %s), %d skipped.\n", (UINT)GetCurrentProcessId(), test->name, winetest_elapsed(), - (UINT)(successes + failures + todo_successes + todo_failures), - (UINT)todo_successes, (UINT)(failures + todo_failures), + (UINT)(successes + failures + flaky_failures + todo_successes + todo_failures), + (UINT)todo_successes, (UINT)flaky_failures, (UINT)(failures + todo_failures), (failures + todo_failures != 1) ? "failures" : "failure", (UINT)skipped ); } - status = (failures + todo_failures < 255) ? failures + todo_failures : 255; + status = failures + todo_failures; + if (!winetest_allow_flaky) status += flaky_failures; + if (status > 255) status = 255; return status; }
@@ -685,6 +745,7 @@ int main( int argc, char **argv ) winetest_color = !strcasecmp(p, "auto") ? isatty(fileno(stdout)) : atoi(p); if (GetEnvironmentVariableA( "WINETEST_DEBUG", p, sizeof(p) )) winetest_debug = atoi(p); if (GetEnvironmentVariableA( "WINETEST_INTERACTIVE", p, sizeof(p) )) winetest_interactive = atoi(p); + if (GetEnvironmentVariableA( "WINETEST_ALLOW_FLAKY", p, sizeof(p) )) winetest_allow_flaky = atoi(p); if (GetEnvironmentVariableA( "WINETEST_REPORT_SUCCESS", p, sizeof(p) )) winetest_report_success = atoi(p); if (GetEnvironmentVariableA( "WINETEST_TIME", p, sizeof(p) )) winetest_time = atoi(p); winetest_last_time = winetest_start_time = GetTickCount();
for what it's worth, I don't like the idea of having three redundant global variables (winetest_platform, is_wine, is_windows) - it lacks IMO for the two is_??? a winetest_ prefix (to limit potential name conflicts) - winetest_platform should no longer be a global variable (kept local to main()) (but ~50 changes to be done)
"Flaky" is effectively equivalent to "skip" except that the test itself is always run.
Ideally we could have ~~testbot~~ CI detect noticeable changes in success/failure frequency over the last _N_ test runs, and flag such occurrences as unexpected success/failure. Otherwise, tests marked flaky risk going stale; this applies the same for skipped tests.
On Thu Sep 1 16:28:33 2022 +0000, eric pouech wrote:
for what it's worth, I don't like the idea of having three redundant global variables (winetest_platform, is_wine, is_windows)
- it lacks IMO for the two is_??? a winetest_ prefix (to limit potential
names conflict)
- winetest_platform should no longer be a global variable (kept local to
main()) (but ~50 changes to be done)
The original scheme assumes there can be other platforms besides wine and windows: after all we explicitly check for wine and windows, not wine and !wine. So that justifies preserving access to winetest_platform.
But I admit it's hard to figure out what any other platform would be: some other windows reimplementation? It clearly cannot be Wine's underlying platform (macOS, Linux, BSD) even if that could be useful sometimes. Maybe that would justify changing the whole scheme to a simple boolean.
The issue with a winetest_is_ prefix is that it makes the variable names really long and annoying. So I think it's fine to let them join the other short names like ok(), skip(), etc.
On Thu Sep 1 16:30:06 2022 +0000, Jinoh Kang wrote:
"Flaky" is effectively equivalent to "skip" except that the test itself is always run. Ideally we could have ~~testbot~~ CI detect noticeable changes in success/failure frequency over the last _N_ test runs, and flag such occurrences as unexpected success/failure. Otherwise, tests marked flaky risk going stale; this applies the same for skipped tests.
skip() would imply that we know beforehand whether the test is going to fail or succeed. But that's not the case.
skip() would imply that we know beforehand whether the test is going to fail or succeed.
If we know that a test is going to fail, we mark it as todo or broken. If it succeeds, we have no reason to skip it at all.
We use skip() _because_ checking the outcome of the test would not be meaningful, whether it is success, failure, or outright crashing of the test program. An example of this would be missing API, lack of implementation, or unavailable test target (e.g. Internet connectivity).
The same applies to flaky tests in that their results are unreliable and thus not effectively meaningful for the purpose of conformance verification. What makes the flaky tests different from skipped ones is that the tests themselves are required not to crash (or lead to undefined state) since they actually run, and the developer is sometimes interested in whether they actually succeed or not.
Anyway, my idea is that we should prevent the flaky mark from going stale when the test gets fixed _but_ the corresponding patch fails to remove the (now outdated) mark, especially if the author is unaware that the patch actually fixes the flakiness. We can't do much for skipped tests; however, a flaky test has an increasing probability of being useful (i.e. not flaky) as its success/failure streak increases.
On Thu Sep 1 17:13:35 2022 +0000, Jinoh Kang wrote:
skip() would imply that we know beforehand whether the test is going
to fail or succeed. If we know that a test is going to fail, we mark it as todo or broken. If it succeeds, we have no reason to skip it at all. We use skip() _because_ checking the outcome of the test would not be meaningful, whether it is success, failure, or outright crashing of the test program. An example of this would be missing API, lack of implementation, or unavailable test target (e.g. Internet connectivity). The same applies to flaky tests in that their results are unreliable and thus not effectively meaningful for the purpose of conformance verification. What makes the flaky tests different from skipped ones is that the tests themselves are required not to crash (or lead to undefined state) since they actually run, and the developer is sometimes interested in whether they actually succeed or not. Anyway, my idea is that we should prevent the flaky mark from going stale when the test gets fixed _but_ the corresponding patch fails to remove the (now outdated) mark, especially if the author is unaware that the patch actually fixes the flakiness. We can't do much for skipped tests; however, a flaky test has an increasing probability of being useful (i.e. not flaky) as its success/failure streak increases.
To clarify, I'm not against this patch at all; the original remark was just a comment on how the test infrastructure could be improved post this MR.