I would love any and all feedback on updated versions of my "Reparse Point" patches, which are now available in staging: https://github.com/wine-staging/wine-staging/tree/master/patches/ntdll-Junct...
Major changes since the last RFC version are:
1) Deletion of symlinks is now atomic (where supported).
2) Regular Unix symlinks are now reported as WSL Unix symlinks.
3) FILE_OPEN_REPARSE_POINT is now properly supported for applications that wish to access reparse points directly (symlink operations by file descriptor instead of by path).
4) Wine's cmd "dir" command now shows reparse point types and targets.
5) Wine's cmd "mklink" command now supports creating junction points and NT symlinks.
6) Absolute reparse points that are outside of the prefix no longer contain the path to drive Z (or appropriate drive letter).
7) Absolute reparse points now contain a string identifying the end of the Wine prefix that is used to modify the target path on file access if a prefix has been relocated (if the target is within the prefix).
8) Many, many bug fixes.
Thank you in advance to those who look these patches over, and to those who reviewed previous iterations; your feedback is greatly appreciated! I know that these patches have been a long time coming, but I believe that we're finally getting to a place where we've figured out all the gotchas. Hopefully other folks agree ;)
===
Context: For those of you who wonder "why do these patches exist?", I have been putting these together over the past few years for several reasons:
1) More modern Windows programs use NT symlinks to avoid duplicating files and taking up extra disk space (particularly .NET).
2) Wine currently has no mechanism to make symlinks on the "PE" side of the dividing line. These patches allow this for Windows file paths, and a small additional patch (creating WSL Unix symlinks) allows the same behavior for Unix paths.
3) Wine currently takes up a lot of space with system DLLs that are duplicates of the system-wide files. A logical extension to these patches is a Wine-specific reparse tag that allows these system DLLs to be "copy-on-write" so that Wine prefixes don't take up unnecessary disk space. Please note that this feature is not a focus of the patchset (not included), as I would like to get the infrastructure in place before tackling that problem.
Not currently supported:
1) Reparse points of type other than junction points, NT symlinks, and WSL Unix symlinks.
2) Creating WSL Unix symlinks.
3) Changing ownership permissions (lrwxrwxrwx) on the symlink instead of the target (currently a Linux limitation).
Implementation overview:
• Reparse points are implemented as Unix symlinks where relative symlinks always start with "./" and absolute symlinks start with "/".
• Following the relative/absolute flag, the 32-bit reparse point tag (junction point or NT symlink) is encoded as a relative path where each bit is encoded as "/" (0) or "./" (1). When creating WSL Unix symlinks (future feature) this tag encoding will be skipped.
• For NT symlinks this is followed by a file/directory flag ("./" for directories, "/" for files) so that dangling symlinks can still be deleted properly with DeleteFile or RemoveDirectory (as appropriate).
• After the reparse tag (and file/directory flag) the Unix path is appended: relative paths are included unmodified and absolute paths have some minor modifications.
• If the path is an absolute path and points outside the prefix then the portion of the path to drive Z (or appropriate drive letter) is removed.
• If the path is an absolute path that points inside the prefix then after the prefix path an 8-bit 'P' is encoded the same way that the reparse point tag is ("/"=0, "./"=1).
• If an absolute symlink is opened by the wineserver then the prefix path is checked (if appropriate) and if it doesn't match the path of the current Wine prefix then the symlink is modified to replace the old prefix path with the current prefix path.
• Because reparse points involve a modification to an existing empty file/directory, symlink creation and "removal" is implemented by creating the symlink at a temporary location and then replacing the original with an atomic renameat2 call (on supported systems), equivalent call (BSD/Mac), or by unlinking the original and moving the symlink into place (for legacy systems that do not support atomic replacement).
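To make the slash/dot encoding described above concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the patches: the function names are hypothetical, the bit order (most significant bit first) is an assumption that the overview does not state, and the drive Z stripping and the 'P' prefix marker are left out. The two tag values are the standard Windows constants for junction points and NT symlinks.

```python
import posixpath

IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003  # junction point
IO_REPARSE_TAG_SYMLINK = 0xA000000C      # NT symlink


def encode_bits(value, width):
    """Encode `value` as `width` path segments: "/" for 0, "./" for 1.

    Assumption: most significant bit first (not specified in the thread).
    """
    return "".join(
        "./" if (value >> shift) & 1 else "/"
        for shift in range(width - 1, -1, -1)
    )


def decode_bits(encoded, width):
    """Decode `width` bits from the front of `encoded`; return (value, rest)."""
    value, pos = 0, 0
    for _ in range(width):
        if encoded.startswith("./", pos):
            value, pos = (value << 1) | 1, pos + 2
        elif encoded.startswith("/", pos):
            value, pos = value << 1, pos + 1
        else:
            raise ValueError("not a slash/dot encoded bit string")
    return value, encoded[pos:]


def build_target(tag, unix_path, is_dir=False):
    """Hypothetical helper: flag + tag bits + optional dir flag + path."""
    absolute = unix_path.startswith("/")
    parts = ["/" if absolute else "./", encode_bits(tag, 32)]
    if tag == IO_REPARSE_TAG_SYMLINK:
        parts.append("./" if is_dir else "/")
    # Every encoded bit ends in "/", so the leading slash of an absolute
    # path is dropped to avoid emitting an extra (meaningless) bit.
    parts.append(unix_path.lstrip("/") if absolute else unix_path)
    return "".join(parts)


target = build_target(IO_REPARSE_TAG_SYMLINK, "/opt/data", is_dir=True)
tag, rest = decode_bits(target[1:], 32)   # skip the absolute flag "/"
dir_flag, path = decode_bits(rest, 1)
print(hex(tag), bool(dir_flag), path, posixpath.normpath(target))
```

Because "." components and repeated slashes are ignored during path resolution, the encoded string still resolves to the plain target, which is what lets the OS follow these links without knowing about the tag.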
Best, Erich
"Erich E. Hoover" erich.e.hoover@gmail.com writes:
I would love any and all feedback on updated versions of my "Reparse Point" patches, which are now available in staging: https://github.com/wine-staging/wine-staging/tree/master/patches/ntdll-Junct...
My general impression is that both the design and the implementation are trying to be too clever for their own good.
I would first question the assumption that these have to be actual, resolvable symlinks. What if they weren't?
On Sat, Aug 28, 2021 at 6:14 AM Alexandre Julliard julliard@winehq.org wrote:
"Erich E. Hoover" erich.e.hoover@gmail.com writes:
I would love any and all feedback on updated versions of my "Reparse Point" patches, which are now available in staging: https://github.com/wine-staging/wine-staging/tree/master/patches/ntdll-Junct...
My general impression is that both the design and the implementation are trying to be too clever for their own good.
Well, that's obviously not what I was hoping to hear. I proposed an alternative approach a couple years ago ( https://www.winehq.org/pipermail/wine-devel/2019-March/142536.html ) that stored the reparse tag in the symlink's access time, which is obviously a lot simpler than encoding the tag in the filename, but that approach has some downsides and Jacek suggested this approach as a way to work around them.
I would first question the assumption that these have to be actual, resolvable symlinks. What if they weren't?
Funny you ask, at one point I actually put together an attempt at this and it became pretty messy very quickly. It's not difficult for symlinks that are files, but directory symlinks mean that we essentially need to replace path resolution at every single place where we want to work with a unix path. For example, let's say that we have a symlink at C:\test that points to C:\windows. An attempt to access C:\test\system32 would nominally be converted to ${WINEPREFIX}/drive_c/test/system32 and we would pass this path down to the OS. If we have a "custom" symlink implementation then we need to parse the path and resolve each element in order to get the real path to the file on the system, whereas the OS takes care of this for us if we use "proper" symlinks. Where this gets really nasty is when you start trying to deal with the parent directory (".."). Parent directory handling actually has different meaning depending on the context, most of the time it means wrt. the target (which makes resolving each element important) but in the context of commands that operate on the working directory you need to work with the "original" path. This is all _possible_ to take care of, but at that point we're implementing so much custom path handling that you're talking about a substantial undertaking (that needs to be called everywhere there's a unix path that we pass down to the OS). But maybe there's a solution you're seeing here that I'm not?
Best, Erich
"Erich E. Hoover" erich.e.hoover@gmail.com writes:
On Sat, Aug 28, 2021 at 6:14 AM Alexandre Julliard julliard@winehq.org wrote:
I would first question the assumption that these have to be actual, resolvable symlinks. What if they weren't?
Funny you ask, at one point I actually put together an attempt at this and it became pretty messy very quickly. It's not difficult for symlinks that are files, but directory symlinks mean that we essentially need to replace path resolution at every single place where we want to work with a unix path. For example, let's say that we have a symlink at C:\test that points to C:\windows. An attempt to access C:\test\system32 would nominally be converted to ${WINEPREFIX}/drive_c/test/system32 and we would pass this path down to the OS. If we have a "custom" symlink implementation then we need to parse the path and resolve each element in order to get the real path to the file on the system, whereas the OS takes care of this for us if we use "proper" symlinks. Where this gets really nasty is when you start trying to deal with the parent directory (".."). Parent directory handling actually has different meaning depending on the context, most of the time it means wrt. the target (which makes resolving each element important) but in the context of commands that operate on the working directory you need to work with the "original" path. This is all _possible_ to take care of, but at that point we're implementing so much custom path handling that you're talking about a substantial undertaking (that needs to be called everywhere there's a unix path that we pass down to the OS). But maybe there's a solution you're seeing here that I'm not?
We are already resolving the path element by element because of case insensitivity, it seems to me that reparse point resolving would fit right in there. All you need is to make sure the initial stat() shortcut fails, which you can do by creating a dangling symlink for instance. Though my feeling is that using a normal file would make things easier, say by appending some magic filename suffix.
On Tue, Aug 31, 2021 at 4:14 AM Alexandre Julliard julliard@winehq.org wrote:
"Erich E. Hoover" erich.e.hoover@gmail.com writes: ... We are already resolving the path element by element because of case insensitivity, it seems to me that reparse point resolving would fit right in there. All you need is to make sure the initial stat() shortcut fails, which you can do by creating a dangling symlink for instance. Though my feeling is that using a normal file would make things easier, say by appending some magic filename suffix.
The difficulty isn't resolving element by element, it's doing that in two different ways depending upon the context. The parent directory inside a symlink works differently depending on the context, where that's not true for files with a different case ( chdir("..") behaves the same no matter what case the directory is ). A Linux example (Windows works the same way):
===
~/tmplnk$ mkdir -p test/target
~/tmplnk$ ln -s test/target link
~/tmplnk$ cd link/
~/tmplnk/link$ ls ..
target
~/tmplnk/link$ cd ..
~/tmplnk$ ls
link test
===
The problem I ran into is that if you treat this incorrectly then it will cause subtle breakages all over the place. How I attempted to solve this was to keep the "original" path around and resolve it locally everywhere that it needs to be resolved, this approach proved to be very messy due to the large number of ways that we use unix paths. I believe that it's also possible to instead immediately resolve the path and pass the "resolved" path around, but then we would need to change the working directory handling to _not_ use the normal path processing ( otherwise you will break SetCurrentDirectory("..") ). I say "believe" because, to my knowledge, the working directory handling is the only special case that requires the "original" path. This approach might be more viable, but I didn't get around to trying it. I mostly did this for fun, but either way I didn't think you would be a fan of these approaches due to the unnecessary duplication of the OS path resolution behavior of symlinks.
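The two meanings of ".." in the transcript can be reproduced outside a shell. The sketch below is only an illustration (the helper name is made up): it rebuilds the same test/target and link layout in a temporary directory, then lists "link/.." once after collapsing the ".." textually, as the shell's `cd ..` does, and once by handing the path to the OS unchanged, as `ls ..` does. It assumes a POSIX system where symlinks can be created.

```python
import os
import posixpath
import tempfile


def parent_views(base):
    """Create test/target and link -> test/target, then resolve "link/.." two ways."""
    os.makedirs(os.path.join(base, "test", "target"))
    os.symlink(os.path.join("test", "target"), os.path.join(base, "link"))
    path = os.path.join(base, "link", "..")
    return {
        # Textual collapse: "link/.." cancels out, so we land next to the link.
        "lexical": sorted(os.listdir(posixpath.normpath(path))),
        # OS resolution: the symlink is followed first, so ".." is the
        # parent of the target directory.
        "physical": sorted(os.listdir(path)),
    }


with tempfile.TemporaryDirectory() as tmp:
    views = parent_views(tmp)
    print(views)
```

Any custom symlink layer has to pick one of these two behaviours for each call site, which is the source of the context dependence described above.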
Best, Erich
On 8/31/21 11:25 AM, Erich E. Hoover wrote:
On Tue, Aug 31, 2021 at 4:14 AM Alexandre Julliard julliard@winehq.org wrote:
"Erich E. Hoover" erich.e.hoover@gmail.com writes: ... We are already resolving the path element by element because of case insensitivity, it seems to me that reparse point resolving would fit right in there. All you need is to make sure the initial stat() shortcut fails, which you can do by creating a dangling symlink for instance. Though my feeling is that using a normal file would make things easier, say by appending some magic filename suffix.
The difficulty isn't resolving element by element, it's doing that in two different ways depending upon the context. The parent directory inside a symlink works differently depending on the context, where that's not true for files with a different case ( chdir("..") behaves the same no matter what case the directory is ). A Linux example (Windows works the same way):
===
~/tmplnk$ mkdir -p test/target
~/tmplnk$ ln -s test/target link
~/tmplnk$ cd link/
~/tmplnk/link$ ls ..
target
~/tmplnk/link$ cd ..
~/tmplnk$ ls
link test
===
The problem I ran into is that if you treat this incorrectly then it will cause subtle breakages all over the place. How I attempted to solve this was to keep the "original" path around and resolve it locally everywhere that it needs to be resolved, this approach proved to be very messy due to the large number of ways that we use unix paths. I believe that it's also possible to instead immediately resolve the path and pass the "resolved" path around, but then we would need to change the working directory handling to _not_ use the normal path processing ( otherwise you will break SetCurrentDirectory("..") ). I say "believe" because, to my knowledge, the working directory handling is the only special case that requires the "original" path. This approach might be more viable, but I didn't get around to trying it. I mostly did this for fun, but either way I didn't think you would be a fan of these approaches due to the unnecessary duplication of the OS path resolution behavior of symlinks.
Wait, but by the time we do path resolution, aren't we guaranteed to have an absolute path that's free of . or ..? I thought that RtlDosPathNameToNtPathName() got rid of all of those for us.
On Tue, Aug 31, 2021 at 10:29 AM Zebediah Figura (she/her) zfigura@codeweavers.com wrote:
... Wait, but by the time we do path resolution, aren't we guaranteed to have an absolute path that's free of . or ..? I thought that RtlDosPathNameToNtPathName() got rid of all of those for us.
Yes, that's part of the problem - if you resolve the path to the absolute location then ".." brings you to the parent of the symlink target.
Best, Erich
On 8/31/21 11:33 AM, Erich E. Hoover wrote:
On Tue, Aug 31, 2021 at 10:29 AM Zebediah Figura (she/her) zfigura@codeweavers.com wrote:
... Wait, but by the time we do path resolution, aren't we guaranteed to have an absolute path that's free of . or ..? I thought that RtlDosPathNameToNtPathName() got rid of all of those for us.
Yes, that's part of the problem - if you resolve the path to the absolute location then ".." brings you to the parent of the symlink target.
Is there a problem with collapse_path(), or is it just a matter of asking for the parent of the current directory? If the latter, wouldn't it just be a matter of storing the unresolved path in the TEB?
On Tue, Aug 31, 2021 at 10:52 AM Zebediah Figura (she/her) zfigura@codeweavers.com wrote:
... Is there a problem with collapse_path(), or is it just a matter of asking for the parent of the current directory? If the latter, wouldn't it just be a matter of storing the unresolved path in the TEB?
That's necessary but not sufficient, you also need to change the path resolution. It's completely valid (and common) to pass a relative path that does something "stupid". An extension of the Linux example:
===
~/tmplnk$ cd link/..
~/tmplnk$ ls
link test
~/tmplnk$ ls link/..
target
===
Best, Erich
On 8/31/21 12:00 PM, Erich E. Hoover wrote:
On Tue, Aug 31, 2021 at 10:52 AM Zebediah Figura (she/her) zfigura@codeweavers.com wrote:
... Is there a problem with collapse_path(), or is it just a matter of asking for the parent of the current directory? If the latter, wouldn't it just be a matter of storing the unresolved path in the TEB?
That's necessary but not sufficient, you also need to change the path resolution. It's completely valid (and common) to pass a relative path that does something "stupid". An extension of the Linux example:
===
~/tmplnk$ cd link/..
~/tmplnk$ ls
link test
~/tmplnk$ ls link/..
target
===
Doesn't this end up being a problem regardless of how we store symlinks internally, though? As far as I can tell, all of this logic should happen on the PE side.
On Tue, Aug 31, 2021 at 11:11 AM Zebediah Figura (she/her) zfigura@codeweavers.com wrote:
... Doesn't this end up being a problem regardless of how we store symlinks internally, though? As far as I can tell, all of this logic should happen on the PE side.
When we let the OS handle symlinks for us then everything seems to work properly, though it's possible that I've missed something.
Best, Erich
On Tue, Aug 31, 2021 at 11:00 AM Erich E. Hoover erich.e.hoover@gmail.com wrote:
On Tue, Aug 31, 2021 at 10:52 AM Zebediah Figura (she/her) zfigura@codeweavers.com wrote:
... Is there a problem with collapse_path(), or is it just a matter of asking for the parent of the current directory? If the latter, wouldn't it just be a matter of storing the unresolved path in the TEB?
That's necessary but not sufficient, you also need to change the path resolution. It's completely valid (and common) to pass a relative path that does something "stupid". An extension of the Linux example:
===
~/tmplnk$ cd link/..
~/tmplnk$ ls
link test
~/tmplnk$ ls link/..
target
===
Something about this didn't sit right with me, so I went back over my notes and this is incorrect at the API level. The working directory is only handled in this special way at the shell/console level, at the API level it is handled the same way as other paths (target-relative). This means that it is a lot easier to implement reparsing on a per-element level than I had attempted long-ago where I stupidly tried to resolve the paths right before passing them to the standard C routines.
So, back to the "how do we handle this?" question. I would like to keep my approach _mostly_ the same (using Unix symlinks) for these reasons:
1) it is easy to detect the existence of a symlink programmatically and work with data stored in the symlink (whether that data is interpreted properly by the OS or not)
2) it allows easier traversal of the directory structure when not in Wine for paths that are "convertible" (some tags, such as IO_REPARSE_TAG_APPEXECLINK, are not possibly convertible by the OS)
3) if/when we need to treat certain key folders as Junction Points then these folders will still work properly outside Wine (e.g. My Documents -> Documents)
What I would like to treat differently now is how the Wine prefix is stored inside the symlink. The current implementation rewrites the prefix path if it is incorrect (doesn't match the current prefix) and this solution is ... not ideal. So, I would like to suggest that instead of (<REPARSE-TAG> and <P> are slash/dot encoded):
<REPARSE-TAG>/path/to/prefix<P>/remainder/of/path
that we can, instead, use a BSD-style variadic symlink that looks like so:
<REPARSE-TAG>${WINEPREFIX}/remainder/of/path
I did not use this solution before for several reasons:
1) This would break non-Wine usage of the symlinks
2) Linux does not support BSD-style variadic symlinks
3) I had misremembered that the relative path handling at the API level was complicated, so I thought that implementing this in Wine would be too difficult
However, it looks like I can now work around all of these issues:
1) On BSD systems 'symvar' can be used to set the value of ${WINEPREFIX}
2) On Linux an unprivileged namespace can be configured to effectively allow the same thing (note that this _only_ works for an absolute path*, such as is the case with ${WINEPREFIX})
3) Wine can perform the reinterpretation of ${WINEPREFIX} when it parses paths, so it doesn't need to worry about #1 or #2
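Point 3 amounts to a string substitution done at lookup time instead of a rewrite of the link on disk. A rough Python sketch of that step follows. The function name is hypothetical, the stored target is assumed to have already had its slash/dot encoded tag stripped, and the example prefix paths are invented for illustration.

```python
PLACEHOLDER = "${WINEPREFIX}"


def expand_prefix(stored_target, current_prefix):
    """Replace a leading ${WINEPREFIX} placeholder with the current prefix path.

    Targets that do not start with the placeholder are returned unchanged.
    """
    if stored_target.startswith(PLACEHOLDER):
        return current_prefix.rstrip("/") + stored_target[len(PLACEHOLDER):]
    return stored_target


stored = "${WINEPREFIX}/drive_c/windows"
print(expand_prefix(stored, "/home/user/.wine"))
# The same stored link keeps working after the prefix is moved:
print(expand_prefix(stored, "/mnt/backup/wine-prefix/"))
```

Since the link text never contains the real prefix path, relocating the prefix needs no fix-up pass over existing links.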
If this doesn't sound too crazy then I can put together the modifications to make this happen. As part of that I'd like to introduce a tool (command 'wineprefix'?) that configures the Unix environment properly for Linux/BSD to allow the shell to function with this variadic symlink so that users like Martin Storsjö (and myself) can just run the tool to be dropped into a shell where ${WINEPREFIX} inside a symlink will be treated appropriately. (Part of why it's taken me a while to respond to this thread has been putting together a "proof of concept" of this tool to make sure that the idea works, which I can now confirm.) Please let me know what you guys think, hopefully this sounds better to folks.
Best, Erich
* If this worked for relative paths then I would propose storing the reparse tag the same way (e.g. "symvar xA000000C=.")
"Erich E. Hoover" erich.e.hoover@gmail.com writes:
If this doesn't sound too crazy then I can put together the modifications to make this happen. As part of that I'd like to introduce a tool (command 'wineprefix'?) that configures the Unix environment properly for Linux/BSD to allow the shell to function with this variadic symlink so that users like Martin Storsjö (and myself) can just run the tool to be dropped into a shell where ${WINEPREFIX} inside a symlink will be treated appropriately. (Part of why it's taken me a while to respond to this thread has been putting together a "proof of concept" of this tool to make sure that the idea works, which I can now confirm.) Please let me know what you guys think, hopefully this sounds better to folks.
You are still trying too hard to shoehorn the complexity of reparse points into working Unix symlinks, which makes it necessary to depend on all kinds of exotic non-portable functionality.
You should forget about making these transparent at the Unix level. Focus on implementing the full semantics of reparse points, including arbitrary tags and data, in a generic, easily extendable way, using only standard Posix APIs.
Once you have this working for a range of different tags, with tests to confirm compatibility, we can think about what subset of reparse points would make sense as pure Unix symlinks, and add a special case for these.
On Thu, Sep 30, 2021 at 2:31 AM Alexandre Julliard julliard@winehq.org wrote:
"Erich E. Hoover" erich.e.hoover@gmail.com writes:
If this doesn't sound too crazy then I can put together the modifications to make this happen. As part of that I'd like to introduce a tool (command 'wineprefix'?) that configures the Unix environment properly for Linux/BSD to allow the shell to function with this variadic symlink so that users like Martin Storsjö (and myself) can just run the tool to be dropped into a shell where ${WINEPREFIX} inside a symlink will be treated appropriately. (Part of why it's taken me a while to respond to this thread has been putting together a "proof of concept" of this tool to make sure that the idea works, which I can now confirm.) Please let me know what you guys think, hopefully this sounds better to folks.
You are still trying too hard to shoehorn the complexity of reparse points into working Unix symlinks, which makes it necessary to depend on all kinds of exotic non-portable functionality.
I think that it's a good idea to make it possible to interpret the links outside of Wine for the majority of users, there are a lot of ways to solve this problem that would make it impossible to do this using any combination of standard tools.
You should forget about making these transparent at the Unix level. Focus on implementing the full semantics of reparse points, including arbitrary tags and data, in a generic, easily extendable way, using only standard Posix APIs.
This is a pretty trivial update to the existing implementation, that will make symlinks that look like (<REPARSE-TAG> is encoded): <REPARSE-TAG>arbitrary data
Once you have this working for a range of different tags, with tests to confirm compatibility, we can think about what subset of reparse points would make sense as pure Unix symlinks, and add a special case for these.
Okay, what exactly are you looking for here? That you get the same thing back that you put in? Just like Unix symlinks, you can store any sort of arbitrary text/unicode/garbage in a reparse point for any tag (if you are so inclined). The OS may not know what to do with that tag unless you write a filter for it, but you can store anything that you want.
Best, Erich
"Erich E. Hoover" erich.e.hoover@gmail.com writes:
On Thu, Sep 30, 2021 at 2:31 AM Alexandre Julliard julliard@winehq.org wrote:
Once you have this working for a range of different tags, with tests to confirm compatibility, we can think about what subset of reparse points would make sense as pure Unix symlinks, and add a special case for these.
Okay, what exactly are you looking for here? That you get the same thing back that you put in? Just like Unix symlinks, you can store any sort of arbitrary text/unicode/garbage in a reparse point for any tag (if you are so inclined). The OS may not know what to do with that tag unless you write a filter for it, but you can store anything that you want.
Yes, basically you need a simple storage mechanism to store/retrieve arbitrary data, and then some hook in the existing path lookup code that can use that data to modify the searched path in all sorts of interesting ways.
Note that all the interesting stuff must happen at path lookup time, not at reparse point creation time. If you insist on making these resolvable at the Unix level, you have to do all sorts of path mangling at creation time, which is going to break very quickly.
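As a rough picture of what "store arbitrary data, interpret it at lookup time" could look like, here is a small in-memory Python model. It is a sketch only and says nothing about how Wine would actually store the data: the `store` dictionary, the handler table, and all function names are hypothetical, and paths are simplified to Unix-style strings.

```python
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003  # junction point

# Hypothetical store: path -> (tag, opaque bytes). Nothing is interpreted
# or mangled when a reparse point is created.
store = {}


def set_reparse_point(path, tag, data):
    store[path] = (tag, data)


def get_reparse_point(path):
    return store[path]


# Per-tag hooks, consulted only during lookup. Each returns a new path.
handlers = {
    IO_REPARSE_TAG_MOUNT_POINT: lambda data: data.decode("utf-8"),
}


def resolve(path, max_hops=32):
    """Walk `path` element by element, applying reparse handlers as we go."""
    parts = [p for p in path.split("/") if p]
    resolved, hops = "", 0
    while parts:
        resolved += "/" + parts.pop(0)
        entry = store.get(resolved)
        if entry is None:
            continue
        tag, data = entry
        handler = handlers.get(tag)
        if handler is None:
            raise LookupError(f"no handler for reparse tag {tag:#x}")
        hops += 1
        if hops > max_hops:
            raise RecursionError("too many reparse points in path")
        # Restart from the handler's target, keeping the unvisited tail.
        parts = [p for p in handler(data).split("/") if p] + parts
        resolved = ""
    return resolved or "/"


set_reparse_point("/drive_c/test", IO_REPARSE_TAG_MOUNT_POINT, b"/drive_c/windows")
print(resolve("/drive_c/test/system32"))
```

A tag without a handler can still be stored and read back byte for byte; it only fails when a lookup tries to traverse it.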
On Sat, 28 Aug 2021, Alexandre Julliard wrote:
I would first question the assumption that these have to be actual, resolvable symlinks. What if they weren't?
I guess one drawback is interoperability with cases where you run commands in a mixed wine/native environment, i.e. mostly operate directly on the filesystem with native tools, but run some subcommands that use wine for executing foreign binaries. (I e.g. have setups for using MSVC in this fashion.)
I don't have a concrete case where such tools would need to create symlinks though, just raising it as a potential use case.
(With the current patchset, with the extra bits encoded in the symlink path, they do look quite weird, but if they resolve and work as regular symlinks, that's always a plus.)
// Martin
On 8/31/21 6:16 AM, Martin Storsjö wrote:
On Sat, 28 Aug 2021, Alexandre Julliard wrote:
I would first question the assumption that these have to be actual, resolvable symlinks. What if they weren't?
I guess one drawback is interoperability with cases where you run commands in a mixed wine/native environment, i.e. mostly operate directly on the filesystem with native tools, but run some subcommands that use wine for executing foreign binaries. (I e.g. have setups for using MSVC in this fashion.)
I don't have a concrete case where such tools would need to create symlinks though, just raising it as a potential use case.
(With the current patchset, with the extra bits encoded in the symlink path, they do look quite weird, but if they resolve and work as regular symlinks, that's always a plus.)
How useful are Windows symlinks, though? As far as I understand normal users aren't allowed to create them by default, and only a few applications even bother using them for any reason. Granted, I might not have a particularly accurate picture of the situation.
Many applications break if there are symlinks in parts of the path they don't expect—in theory use of symlinks should be transparent if you're not asking for them, but they try to be symlink-aware and screw it up in one way or another (I can pull out bug reports if necessary). We have to kind of weirdly half-hide Unix symlinks to Windows applications as a result.
(Disclaimer: I'm not attached to either solution.)
On Tue, Aug 31, 2021 at 10:24 AM Zebediah Figura (she/her) zfigura@codeweavers.com wrote:
... How useful are Windows symlinks, though? As far as I understand normal users aren't allowed to create them by default, and only a few applications even bother using them for any reason. Granted, I might not have a particularly accurate picture of the situation.
Junction Points (effectively the same from our perspective) do not require administrator privileges, so they're more common.
Many applications break if there are symlinks in parts of the path they don't expect—in theory use of symlinks should be transparent if you're not asking for them, but they try to be symlink-aware and screw it up in one way or another (I can pull out bug reports if necessary). We have to kind of weirdly half-hide Unix symlinks to Windows applications as a result.
Yup, that's been super fun :)
Best, Erich