On 4/28/20 12:11 PM, Rémi Bernon wrote:
On 4/28/20 6:40 PM, Zebediah Figura wrote:
On 4/28/20 10:11 AM, Rémi Bernon wrote:
Pages are created on demand, depending on the initial values the process requests. These values usually include NT version information, processor features and number of physical pages.
The server maps the pages write-only and the clients map them read-only, then the server updates the timestamps on every page using a 16 ms timeout.
Note that it is hard to split the timestamp updates into a separate patch: that would require mapping the pages RW on the client side to keep the ntoskrnl updates working, and could cause issues with RW pages being shared across clients.
Signed-off-by: Rémi Bernon <rbernon@codeweavers.com>
Note that we could also do one page per process if simpler is better, but that would also increase the work to do on every update.
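To make the mapping scheme described in the commit message concrete, here is a minimal sketch, not taken from the patch. It maps one file-backed page twice in the same process, once writable (standing in for the server view) and once read-only (standing in for a client view), and checks that a store through the first is visible through the second. The function name is hypothetical, and tmpfile() is only a stand-in for the server's create_temp_file().

```c
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper, not part of the patch.
 * Returns 1 if the value written through the writable mapping is seen
 * through the read-only mapping, 0 if not, -1 on error. */
int sketch_shared_page_roundtrip( unsigned int value )
{
    long page = sysconf( _SC_PAGESIZE );
    FILE *tmp = tmpfile();   /* anonymous temp file, removed on close */
    void *wr, *rd;
    int fd, ret;

    if (!tmp) return -1;
    if (page <= 0) { fclose( tmp ); return -1; }
    fd = fileno( tmp );
    if (ftruncate( fd, page ) == -1) { fclose( tmp ); return -1; }

    /* "server" view: writable, shared with every other mapping of the file */
    wr = mmap( NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0 );
    if (wr == MAP_FAILED) { fclose( tmp ); return -1; }

    /* "client" view: same file, read-only */
    rd = mmap( NULL, page, PROT_READ, MAP_SHARED, fd, 0 );
    if (rd == MAP_FAILED) { munmap( wr, page ); fclose( tmp ); return -1; }

    *(volatile unsigned int *)wr = value;
    ret = (*(const volatile unsigned int *)rd == value);

    munmap( rd, page );
    munmap( wr, page );
    fclose( tmp );
    return ret;
}
```

In the real design the read-only mapping would live in a different process, which receives the file descriptor from the server; the sketch keeps both views in one process only so that it can be run standalone.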
One thing that occurs to me as being potentially desirable, if only potentially, is to actually use these counters to implement our tick count APIs, in which case we'd absolutely want to have one page per process. I know we've already had applications that want GetTickCount() to be extremely fast, and that'd allow us to get a little more Unix code out of kernel32. Maybe there's something I'm missing, though.
I'm not sure I see why that would make one page per process absolutely necessary. Is there any timer here that is process-specific? But yes, I had the same thing in mind for later, and it seems that on Windows GetTickCount and related functions just read from the USD, or at least the values match perfectly in the test.
Oh, I see, I was assuming you were trying to avoid creating pages for processes that didn't need them, rather than trying to reuse pages where they're identical. In retrospect that was dumb of me; we already depend on KUSER_SHARED_DATA for some things.
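For illustration, a tick count read from the shared data could look roughly like the sketch below. The struct and function names are hypothetical stand-ins, not the real KUSER_SHARED_DATA layout, and the client-side scaling by TickCountMultiplier / 2^24 is an assumption about how the multiplier is meant to be applied. The server side mirrors what the patch does: it divides the 100 ns monotonic time by 10000 and sets the multiplier to 1 << 24.

```c
#include <stdint.h>

/* Hypothetical stand-in for the few shared-data fields used here. */
struct sketch_usd
{
    uint32_t tick_count_multiplier;  /* the patch sets this to 1 << 24 */
    uint32_t tick_low;               /* TickCount.LowPart */
    int32_t  tick_high;              /* TickCount.High1Time */
};

/* Server side, as in the patch: 100 ns units -> milliseconds. */
void sketch_set_ticks( struct sketch_usd *usd, uint64_t monotonic_time )
{
    uint64_t ticks = monotonic_time / 10000;

    usd->tick_count_multiplier = 1u << 24;
    usd->tick_low  = (uint32_t)(ticks & 0xffffffff);
    usd->tick_high = (int32_t)(ticks >> 32);
}

/* Client side (assumption): scale each half by multiplier / 2^24.
 * Splitting the product avoids overflowing 64 bits for large counts. */
uint64_t sketch_get_tick_count64( const struct sketch_usd *usd )
{
    uint64_t low  = (uint64_t)usd->tick_low * usd->tick_count_multiplier;
    uint64_t high = (uint64_t)(uint32_t)usd->tick_high * usd->tick_count_multiplier;

    return (low >> 24) + (high << 8);
}
```

With the multiplier fixed at 1 << 24 the scaling is an identity, so the client simply gets back the millisecond count the server stored, with no system call involved.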
+static void kusd_mapping_set_current_time( struct kusd_mapping *kusd )
+{
+    kusd->data->SystemTime.High2Time = (current_time >> 32);
+    kusd->data->SystemTime.LowPart = (current_time & 0xffffffff);
+    kusd->data->SystemTime.High1Time = (current_time >> 32);
+
+    kusd->data->InterruptTime.High2Time = (monotonic_time >> 32);
+    kusd->data->InterruptTime.LowPart = (monotonic_time & 0xffffffff);
+    kusd->data->InterruptTime.High1Time = (monotonic_time >> 32);
+
+    kusd->data->TickCount.High2Time = ((monotonic_time / 10000) >> 32);
+    kusd->data->TickCount.LowPart = ((monotonic_time / 10000) & 0xffffffff);
+    kusd->data->TickCount.High1Time = ((monotonic_time / 10000) >> 32);
+    kusd->data->TickCountLowDeprecated = ((monotonic_time / 10000) & 0xffffffff);
+}
Do we want to use atomic writes here, to make sure the compiler does the right thing?
Possibly, and it's probably not a bad thing to add to be safe. The members are already volatile, so there should be no reordering between each write/read, and the correct way to read the data, from what I could see, is to compare High1Time with High2Time, but I guess there's some application out there not doing it right.
The problem is that as far as I understand, "volatile" doesn't actually specify that operations are atomic or prevent reordering. $COMPILER might do that in practice, but it's probably better to make explicit what you actually want.
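As a rough illustration of making the ordering explicit, here is a sketch using C11 atomics. It is not what the patch does: the struct is a hypothetical stand-in for KSYSTEM_TIME (whose real members are plain volatile integers), and the choice of <stdatomic.h> with release/acquire ordering is an assumption; the server could equally use compiler builtins. The writer stores High2Time, LowPart, High1Time in that order, as in the patch, and the reader retries until High1Time and High2Time agree.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for KSYSTEM_TIME with explicitly atomic members. */
struct sketch_ksystem_time
{
    _Atomic uint32_t low;    /* LowPart */
    _Atomic int32_t  high1;  /* High1Time */
    _Atomic int32_t  high2;  /* High2Time */
};

/* Writer: same store order as the patch, with release ordering so the
 * stores cannot be reordered by the compiler or the CPU. */
void sketch_write_time( struct sketch_ksystem_time *t, uint64_t value )
{
    int32_t high = (int32_t)(value >> 32);

    atomic_store_explicit( &t->high2, high, memory_order_release );
    atomic_store_explicit( &t->low, (uint32_t)value, memory_order_release );
    atomic_store_explicit( &t->high1, high, memory_order_release );
}

/* Reader: load in the opposite order and retry while the two high parts
 * differ, which means an update was in progress. */
uint64_t sketch_read_time( struct sketch_ksystem_time *t )
{
    int32_t high1, high2;
    uint32_t low;

    do
    {
        high1 = atomic_load_explicit( &t->high1, memory_order_acquire );
        low   = atomic_load_explicit( &t->low, memory_order_acquire );
        high2 = atomic_load_explicit( &t->high2, memory_order_acquire );
    } while (high1 != high2);

    return ((uint64_t)(uint32_t)high1 << 32) | low;
}
```

The sketch only covers readers that follow the compare-and-retry protocol; an application that reads LowPart and one high part without checking can still observe a torn value, whatever the writer does.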
+static void update_kusd_mappings( void *private )
+{
+    struct kusd_mapping *kusd;
+
+    add_timeout_user( kusd_timeout, update_kusd_mappings, NULL );
+
+    LIST_FOR_EACH_ENTRY( kusd, &kusd_mappings, struct kusd_mapping, entry )
+        kusd_mapping_set_current_time( kusd );
+}
+
+int get_user_shared_data_fd( const usd_init_t *init, unsigned int *size )
+{
+    static const WCHAR default_windirW[] = {'C',':','\\','w','i','n','d','o','w','s',0};
+    struct kusd_mapping *kusd;
+    int i;
+
+    LIST_FOR_EACH_ENTRY(kusd, &kusd_mappings, struct kusd_mapping, entry)
+    {
+        if (!memcmp(init, &kusd->init, sizeof(usd_init_t)))
+            goto done;
+    }
+
+    if (!(kusd = mem_alloc( sizeof(struct kusd_mapping) )))
+        return -1;
+
+    kusd->fd = create_temp_file( sizeof(*kusd->data) );
+    kusd->data = mmap( NULL, sizeof(*kusd->data), PROT_WRITE, MAP_SHARED, kusd->fd, 0 );
+    kusd->init = *init;
+
+    kusd->data->NumberOfPhysicalPages = init->number_of_physical_pages;
+    kusd->data->NtMajorVersion = init->nt_major_version;
+    kusd->data->NtMinorVersion = init->nt_minor_version;
+    kusd->data->NtProductType = init->nt_product_type;
+    for (i = 0; i < 64; ++i)
+        kusd->data->ProcessorFeatures[i] = init->processor_features[i];
+
+    memcpy( kusd->data->NtSystemRoot, default_windirW, sizeof(default_windirW) );
+    kusd->data->TickCountMultiplier = 1 << 24;
+
+    kusd_mapping_set_current_time( kusd );
+
+    if (list_empty( &kusd_mappings )) add_timeout_user( kusd_timeout, update_kusd_mappings, NULL );
+    list_add_tail( &kusd_mappings, &kusd->entry );
+
+done:
+    *size = sizeof(*kusd->data);
+    return kusd->fd;
+}
 /* create a file mapping */
 DECL_HANDLER(create_mapping)
 {
diff --git a/server/process.c b/server/process.c
index 73984f363f59..6bb18ec6617a 100644
--- a/server/process.c
+++ b/server/process.c
@@ -1385,6 +1385,8 @@ DECL_HANDLER(init_process_done)
     if (req->gui) process->idle_event = create_event( NULL, NULL, 0, 1, 0, NULL );
     if (process->debugger) set_process_debug_flag( process, 1 );
     reply->suspend = (current->suspend || process->suspend);
+
+    send_client_fd( process, get_user_shared_data_fd( get_req_data(), &reply->usd_size ), -1 );
 }

 /* open a handle to a process */
diff --git a/server/protocol.def b/server/protocol.def
index 06a29b153ea0..4a935fd7efcf 100644
--- a/server/protocol.def
+++ b/server/protocol.def
@@ -787,6 +787,16 @@ struct rawinput_device
     user_handle_t target;
 };

+/* Initial values for user shared data */
+typedef struct
+{
+    unsigned int   number_of_physical_pages;
+    unsigned int   nt_major_version;
+    unsigned int   nt_minor_version;
+    unsigned short nt_product_type;
+    unsigned char  processor_features[64];
+} usd_init_t;
+
 /****************************************************************/
 /* Request declarations */

@@ -854,8 +864,10 @@ struct rawinput_device
     mod_handle_t module;       /* main module base address */
     client_ptr_t ldt_copy;     /* address of LDT copy (in thread address space) */
     client_ptr_t entry;        /* process entry point */
+    VARARG(usd,usd_init);      /* initial USD values */
 @REPLY
     int          suspend;      /* is process suspended? */
+    unsigned int usd_size;     /* size of USD mapping */
 @END
Maybe it would make more sense to instead pass the whole initial KUSER_SHARED_DATA structure as a VARARG to the server. That lets the server be entirely agnostic as to what's in it aside from timestamps, and makes it even easier to set new fields later.
Yeah, it seemed a little bit overkill at first. And I didn't want to bother including wdm.h here, but why not. As I was doing page de-duplication, it felt nicer to know which information was actually used.
For that matter I don't think you need to include wdm.h here; you could just specify it as "bytes" instead.
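A minimal sketch of how page de-duplication could still work if the initial data arrived as an opaque byte blob: compare sizes, then bytes. Everything here is hypothetical (struct, list, and function names), and plain malloc and a singly linked list stand in for the server's own allocation and list helpers.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical mapping record keyed by the raw initial bytes. */
struct sketch_mapping
{
    struct sketch_mapping *next;
    size_t                 size;  /* length of the initial blob */
    unsigned char         *init;  /* private copy of the initial blob */
};

static struct sketch_mapping *sketch_mappings;  /* list head */

/* Return the existing mapping whose initial bytes match, or create a new
 * one. init must be non-NULL and size non-zero. Returns NULL on failure. */
struct sketch_mapping *sketch_get_mapping( const void *init, size_t size )
{
    struct sketch_mapping *m;

    if (!init || !size) return NULL;

    for (m = sketch_mappings; m; m = m->next)
        if (m->size == size && !memcmp( m->init, init, size )) return m;

    if (!(m = malloc( sizeof(*m) ))) return NULL;
    if (!(m->init = malloc( size )))
    {
        free( m );
        return NULL;
    }
    memcpy( m->init, init, size );
    m->size = size;
    m->next = sketch_mappings;
    sketch_mappings = m;
    return m;
}
```

One caveat applies to any byte-wise comparison, including the memcmp on usd_init_t in the patch: it only finds duplicates if clients fill padding and unused fields consistently, for example by zeroing the buffer before setting the values.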
diff --git a/server/trace.c b/server/trace.c
index 2b58ed9fd2c0..31560eaad827 100644
--- a/server/trace.c
+++ b/server/trace.c
@@ -860,6 +860,28 @@ static void dump_varargs_startup_info( const char *prefix, data_size_t size )
     remove_data( size );
 }

+static void dump_varargs_usd_init( const char *prefix, data_size_t size )
+{
+    usd_init_t init;
+    int i;
+
+    memset( &init, 0, sizeof(init) );
+    memcpy( &init, cur_data, min( size, sizeof(init) ));
+
+    fprintf( stderr, "%s{number_of_physical_pages=%u,nt_major_version=%04x,nt_minor_version=%04x,"
+             "nt_product_type=%02x,processor_features={",
+             prefix, init.number_of_physical_pages, init.nt_major_version, init.nt_minor_version,
+             init.nt_product_type );
+    for (i = 0; i < ARRAY_SIZE(init.processor_features); ++i)
+    {
+        if (i > 0) fputc( ',', stderr );
+        fprintf( stderr, "%02x", init.processor_features[i] );
+    }
+    fprintf( stderr, "}}" );
+    remove_data( size );
+}
+
In which case you'd probably just get rid of this.
 static void dump_varargs_input_records( const char *prefix, data_size_t size )
 {
     const INPUT_RECORD *rec = cur_data;
If one page per process is preferable, then it can actually be passed to the client earlier, with possibly the version lookup done on the server side, and maybe the cpuinfo and page count as well if the values are identical for 32bit and 64bit processes?
So if it's possible, there would be no need to pass anything from the client to the server. If not, then it's just a matter of passing cpu features and page count, which I feel is better than passing the whole initial USD.
Maybe. I get the (perhaps faulty) impression from working on the server that a goal is to keep as much out of the server as possible, so it's desirable to keep all of that information (version, processor features, etc.) computed by the client. In that case we'd certainly want to pass information from client to server, and passing the whole page makes things simpler.
(It's not clear to me why just passing more data would be a problem, either. I'm not sure there's necessarily overhead to begin with, but even if there is, it's only once, at process initialization.)