the requirements for message-mode named pipe semantics on top of unix / wine bring some... interesting limitations on how it can be implemented, and i believe that i have finally come up with something that would fit the requirements: the socketpair equivalent of "double-buffering".
as the implementation of this idea would involve changes to (a doubling-up of) the wineserver namedpipe state machine, and after the significant amount of time and resources already spent (resulting in a loss of personal income somewhere in excess of £3,000), i'm happy to commit further time _with_ active cooperation and engagement from wine developers, to adding this... rather strategically important and complex functionality.
so - here's a description of the idea and its background: your input into poking at it with a stick and discussing how it can be achieved would be greatly appreciated.
background -------------------
the first (successful) attempt at adding "technically correct" named pipes message-mode semantics brought up several issues:
1) due to multiple threads being able to read from the same pipe, messages cannot be received atomically by reading a "header" indicating the length of the message (unless there is a per-pipe mutex, which is a bit too much to be adding and would slow things down). one thread would read the header, indicating the beginning of the message; another thread would then read the beginning of the message data and _treat_ it as the next header.
2) the data cannot be read inside wineserver itself due to the requirement that wineserver never blocks on read (which would be fatal). writing isn't really ok either: handling asynchronous conditions such as EPIPE would require either a loop (delaying further responses) or a state machine (messy).
3) breaking individual messages into separate socketpair() pipes is "technically correct" but quickly results in wineserver running out of filedescriptors!
4) changing the filedescriptor from blocking to O_NONBLOCK in order to do a poll() inside wineserver (in order to do a read()) is fundamentally incompatible with having a userspace thread perform a blocking read operation on the same socket, waiting for a new message.
logical deduction of design ------------------------------------------
so, the socketpair()-queue experiment made it clear that messaging semantics _can_ be achieved without any read/write operations being performed by wineserver itself, and so the next logical step is to limit the number of socketpairs used to just two.
i.e. four filedescriptors per named pipe: two for the server-side (CreateNamedPipe), and two for the client-side (NtCreateFile).
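to make the fd layout concrete, here is a minimal sketch of setting up those four descriptors. all the names (struct pipe_fds, make_pipe_fds, the field names) are hypothetical, invented purely for illustration - the real wineserver state machine would carry these in its own structures:

```c
#include <sys/socket.h>
#include <unistd.h>

/* hypothetical per-pipe state: two socketpairs, four descriptors.
 * the "first" (staging) pair carries framed messages from the writer
 * to the piper; the "second" (delivery) pair carries the bare message
 * bodies from the piper to the reader. */
struct pipe_fds {
    int writer_fd;   /* writing end of the first socketpair   */
    int piper_in;    /* piper's end of the first socketpair   */
    int piper_out;   /* piper's end of the second socketpair  */
    int reader_fd;   /* reading end of the second socketpair  */
};

/* returns 0 on success, -1 on failure */
static int make_pipe_fds(struct pipe_fds *p)
{
    int a[2], b[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, a) < 0) return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, b) < 0) {
        close(a[0]); close(a[1]);
        return -1;
    }
    p->writer_fd = a[0];
    p->piper_in  = a[1];
    p->piper_out = b[0];
    p->reader_fd = b[1];
    return 0;
}
```

a server-side pipe instance (CreateNamedPipe) and a client-side one (NtCreateFile) would each get one such set.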
the use of this "double-buffering" will, i believe, make the "blocking on read" semantics achievable. effectively, the "first" socketpair replaces the "tail" of the socketpair()-queue experiment.
however, in order to make _that_ work (i.e. to preserve the message boundaries), writes to the "first" socketpair will require a header (4-byte length) to be prefixed to the message data.
BUT - and this is the really important bit - when the data is transferred from the first socketpair to the second, the header is NOT sent, for the recipient to read.
otherwise, exactly as is done in the current socketpair-queue experiment: the data structures are locked, marked with the "available data", unlocked, and _then_ the write to the secondary socketpair is done. on a read, the data structures are locked, a wineserver message sent requesting that the amount read is to be subtracted from "available data", and then unlocked.
when the "available data" reaches zero, that's the signal indicating that it's time to try reading from the first socketpair to fetch the next message (again, stripping its header).
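the crucial transfer step - read the 4-byte length header from the first socketpair, then forward the body _without_ the header to the second - could be sketched roughly like this. again, the function names (read_all, transfer_one_message) and the fixed-size buffer are illustrative assumptions, not a definitive implementation:

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* read exactly "count" bytes (hypothetical helper; a real piper
 * would also have to handle EINTR and report EOF distinctly) */
static int read_all(int fd, void *buf, size_t count)
{
    char *p = buf;
    while (count) {
        ssize_t n = read(fd, p, count);
        if (n <= 0) return -1;
        p += n; count -= n;
    }
    return 0;
}

/* ferry one message from the first socketpair to the second:
 * read the 4-byte length header, then the body, then forward the
 * body ONLY - the recipient never sees the header, which is what
 * preserves the message boundary without exposing the framing.
 * returns the body length, or -1 on error. */
static int transfer_one_message(int from, int to)
{
    uint32_t len;
    char body[65536];   /* sketch only: assumes a bounded message size */

    if (read_all(from, &len, sizeof(len)) < 0) return -1;
    if (len > sizeof(body)) return -1;
    if (read_all(from, body, len) < 0) return -1;
    if (write(to, body, len) != (ssize_t)len) return -1;
    return (int)len;
}
```

because only one message's body sits in the second socketpair at a time, "available data" dropping to zero is an unambiguous end-of-message marker for the reader.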
the piper --------------
you may have noticed that there's nothing mentioned so far about _how_ the data is to be transferred between the pairs of socketpairs.
we've already established that it cannot be wineserver that does the transfer (because of the mutually-exclusive blocking / nonblocking issue and the asynchronous-write issues already outlined).
this is actually good news in disguise.
conclusion: a *SEPARATE APPLICATION* must be responsible for ferrying the data between the two socketpairs. for all named pipes.
a diagram outlining this all together is here:
http://lkcl.net/namedpipes/wine-piper-design.pdf
exactly how the client and server "signal" to the piper that a new message is required is yet to be determined, but the simplest way it could be achieved would be, instead of sending <length-header> <data>, to send:
<2-byte command> <length-header> <data>
where in the case of "i want piper to wake up!" you send <0x0001> <0x00000000> (no more data)
and in the case of "i want to write a message" you send <0x0002> <0x0000NNNN> <data>
a ping, effectively.
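a hedged sketch of what that framing could look like in code - the command values match the ones above, but the function name, the host-byte-order encoding, and the buffer convention are all assumptions for illustration only:

```c
#include <stdint.h>
#include <string.h>

/* hypothetical wire format for messages sent to the piper:
 *   <2-byte command> <4-byte length> <data>
 * command 0x0001 = wake-up ping      (length 0, no data follows)
 * command 0x0002 = message write     (length = NNNN bytes of data)
 */
#define PIPER_CMD_WAKEUP  0x0001
#define PIPER_CMD_MESSAGE 0x0002

/* encode one frame into buf; returns the total frame size,
 * or -1 if buf is too small to hold the 6-byte header plus data.
 * (sketch: uses host byte order; a real protocol would pick one.) */
static int piper_encode(uint16_t cmd, const void *data, uint32_t datalen,
                        unsigned char *buf, size_t buflen)
{
    if (buflen < 6 + (size_t)datalen) return -1;
    memcpy(buf, &cmd, 2);
    memcpy(buf + 2, &datalen, 4);
    if (datalen) memcpy(buf + 6, data, datalen);
    return (int)(6 + datalen);
}
```

a wake-up is then just a 6-byte frame with a zero length, and a write is the same frame with the message body appended.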
The Good News ------------------------
here's the good bit about using the piper to transfer data in userspace: you don't _have_ to have the client-side (or the server-side) come _exclusively_ from wine.
in other words, you have an opportunity to put in "break-out" mechanisms! yippeee, the piper is the perfect place to put in SMB Client library usage, or links to SMB IPC$ "proxy" pipe mechanisms for communication with SMB servers :)
and the even _better_ news is that it doesn't have to be done straight away. all that the "first version" of the piper needs to do is "data in, data out. data in, data out". nothing more.
the even better news than _that_ is that the "first version" of the piper can be utterly, utterly simple: single-process, not even any threading, one message at a time. this simple implementation provides absolute guaranteed avoidance of race conditions, overlaps between reads and writes by the client or server threads, and more.
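just to show how little that "first version" would be, here is a rough single-process sketch of one iteration of its main loop: poll every pipe's incoming end, ferry at most one message per ready pipe, repeat. everything here (struct names, the inline ferry step, the buffer size) is a hypothetical illustration, not the proposed final code:

```c
#include <poll.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* one entry per named pipe the piper is ferrying for:
 * "from" is the piper's end of the first socketpair,
 * "to" is the piper's end of the second. */
struct piper_pipe { int from; int to; };

/* ferry one message: read the 4-byte header, then the body, and
 * forward the body only (sketch: assumes the whole frame arrives
 * in single reads and fits the buffer) */
static void ferry(int from, int to)
{
    uint32_t len;
    char body[65536];
    if (read(from, &len, 4) != 4 || len > sizeof(body)) return;
    if (read(from, body, len) != (ssize_t)len) return;
    write(to, body, len);
}

/* one iteration of the single-process piper loop: poll every "from"
 * end, move at most one message per ready pipe, and return how many
 * messages were moved. a real piper would call this forever. */
static int piper_step(struct piper_pipe *pipes, int npipes, int timeout_ms)
{
    int i, moved = 0;
    struct pollfd *pfds = calloc(npipes, sizeof(*pfds));
    if (!pfds) return -1;
    for (i = 0; i < npipes; i++) {
        pfds[i].fd = pipes[i].from;
        pfds[i].events = POLLIN;
    }
    if (poll(pfds, npipes, timeout_ms) > 0)
        for (i = 0; i < npipes; i++)
            if (pfds[i].revents & POLLIN) {
                ferry(pipes[i].from, pipes[i].to);
                moved++;
            }
    free(pfds);
    return moved;
}
```

since only one message is in flight at any instant across _all_ pipes, there is nothing for two transfers to race on - which is exactly the guarantee claimed above.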
_later on_ the possibility can be investigated to provide a more sophisticated version of the piper that is multi-threaded and has per-pipe mutexes to ensure that it is _per pipe_ that there is guaranteed avoidance of race conditions etc.
Another Possibility ----------------------------
once the need for the piper is made clear, there exists a potential solution, providing named pipe message-mode on top of the _existing_ infrastructure with very few changes to ntdll, kernel32 and wineserver. it goes something like this:
* make the existing namedpipe infrastructure the "layer below" NamedPipes
* utilise the existing infrastructure, creating pipe-pairs (one for client, one for server) where the piper makes the back-to-back connection between the two
* i.e. "Real" Named Pipes are *only* allowed to connect to the piper; the piper is the *only* application that is allowed to send data onwards to "real" clients.
* the provision of "Real" message-mode Named Pipes, on top of this infrastructure, can then be done by doing a (very) mini RPC system over the pairs of NamedPipes.
the reason for doing a mini RPC system is that this method is a proven entity: MSRPC has already been implemented on top of the existing namedpipes infrastructure, even though that infrastructure has no support for MessageMode!
the reason _why_ it works is that DCE/RPC has the PDU as its fundamental "synchronisation" unit, thus doing away with the need for message-boundary encapsulation. but hey, that's what MS implemented, that's what we get... *sigh*.
so - what do people think? would you agree that a user-space pipe "proxy" is an effective solution?
l.