Hi, my name is Thomas Mullaly and I am an undergraduate in the Computer Science department at Kent State University, and I would very much like to participate in this year's GSoC. I saw on your project ideas page that the IUri API still needs to be implemented and I thought that this would be a good project for me, but before I submit a proposal on it I have a few questions about the project itself.
Firstly, the project page says that the main goal is to have the IUri interface and the CreateUri function implemented, but MSDN also documents functions and interfaces for creating and manipulating IUriBuilder objects, and I was wondering whether these are also part of the project goals. If not, could they be, or would that be too ambitious to finish by the end of the summer?
Secondly (more of a design question), I see that the Uri structure and functions are already stubbed out in the "dlls/urlmon/uri.c" file. For my implementation I was thinking of adding another BSTR* member to the Uri struct, which will point to the encoded version of the URI (generated during the CreateUri() call). Since most of the IUri functions return components of the URI (e.g. scheme, host, query, etc.), I was also thinking about adding more data members to the Uri struct that store the location in the encoded URI string where each component starts (or -1 if it does not exist). This would reduce the runtime of the IUri functions, since each function would already know where to look inside the encoded string for the component it needs. A drawback to this design is that each Uri struct would be bloated with a fair number of ints, which may or may not be used depending on the type of URI that the IUri represents.
The second approach I was thinking of is to not store any locations inside the Uri struct and to compute them on the fly every time the IUri is queried for one of its components. This would give the Uri structure a smaller memory footprint but would increase the runtime of all the functions that access the URI. I was wondering if anyone has suggestions as to which way might be better.
Any input will be greatly appreciated!
-Thomas Mullaly
On 3/31/2010 02:57, Thomas Mullaly wrote:
Hi, my name is Thomas Mullaly and I am an undergraduate in the Computer Science department at Kent State University, and I would very much like to participate in this year's GSoC. I saw on your project ideas page that the IUri API still needs to be implemented and I thought that this would be a good project for me, but before I submit a proposal on it I have a few questions about the project itself.
Hi, Thomas, and welcome.
Firstly, the project page says that the main goal is to have the IUri interface and the CreateUri function implemented, but MSDN also documents functions and interfaces for creating and manipulating IUriBuilder objects, and I was wondering whether these are also part of the project goals. If not, could they be, or would that be too ambitious to finish by the end of the summer?
Right, a complete IUri with corresponding tests will be enough for a summer project, I think. After a brief look at IUriBuilder, I think it doesn't depend much on IUri implementation details. For IUriBuilder, one way I see is to track changed properties and store only the new data, using the unchanged properties from the supplied IUri, but this needs some tests (whether it keeps a reference to the IUri or not, for example).
Secondly (more of a design question), I see that the Uri structure and functions are already stubbed out in the "dlls/urlmon/uri.c" file. For my implementation I was thinking of adding another BSTR* member to the Uri struct, which will point to the encoded version of the URI (generated during the CreateUri() call). Since most of the IUri functions return components of the URI (e.g. scheme, host, query, etc.), I was also thinking about adding more data members to the Uri struct that store the location in the encoded URI string where each component starts (or -1 if it does not exist). This would reduce the runtime of the IUri functions, since each function would already know where to look inside the encoded string for the component it needs. A drawback to this design is that each Uri struct would be bloated with a fair number of ints, which may or may not be used depending on the type of URI that the IUri represents.
The second approach I was thinking of is to not store any locations inside the Uri struct and to compute them on the fly every time the IUri is queried for one of its components. This would give the Uri structure a smaller memory footprint but would increase the runtime of all the functions that access the URI. I was wondering if anyone has suggestions as to which way might be better.
You could use a dynamic array for that, or a list with a Uri_PROPERTY value as the key and an offset and length as the data, for example. Another way is to compute each property's offset only when it's requested and then store it. An obvious bad case for that is a long URI. So a one-pass property computation while building the IUri instance is probably not bad.
Waiting for Jacek's comments.
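The "compute only when requested, then store" alternative could look roughly like the sketch below. It is only an illustration under stated assumptions: LazyUri, lazy_init and lazy_query_offset are hypothetical names, plain C strings stand in for BSTRs, and only the query component is handled. The scans counter exists purely to show that the string is walked once.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical structure: caches the query location after the first lookup. */
typedef struct {
    const char *uri;
    int query_known;   /* 0 until the query has been looked up once */
    int query_offset;  /* -1 if the URI has no query */
    int query_length;
    int scans;         /* number of scans performed, for illustration only */
} LazyUri;

void lazy_init(LazyUri *u, const char *uri)
{
    u->uri = uri;
    u->query_known = 0;
    u->query_offset = -1;
    u->query_length = 0;
    u->scans = 0;
}

/* Finds the query on first use, then reuses the stored offset. */
int lazy_query_offset(LazyUri *u)
{
    if (!u->query_known) {
        size_t start = strcspn(u->uri, "?#");  /* first '?' or '#', or end */
        u->scans++;
        if (u->uri[start] == '?') {
            u->query_offset = (int)start;
            u->query_length = (int)strcspn(u->uri + start, "#");
        }
        u->query_known = 1;
    }
    return u->query_offset;
}
```

The first call still has to walk the string from the start, which is the long-URI bad case mentioned above. Later calls return the cached value.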
Any input will be greatly appreciated!
-Thomas Mullaly
You could use a dynamic array for that, or a list with a Uri_PROPERTY value as the key and an offset and length as the data, for example. Another way is to compute each property's offset only when it's requested and then store it. An obvious bad case for that is a long URI. So a one-pass property computation while building the IUri instance is probably not bad.
I like the idea of making a lightweight data structure which stores the offset and length for each component property. I'd imagine it would look something like this:
typedef struct {
    DWORD offset;
    DWORD length;
} UriComponent;
It becomes a little trickier to decide how to store the UriComponents, though. I have a few ideas, but I'm open to suggestions.
I do like the idea of using an array inside the Uri struct to store the UriComponents, but not all of the values in the Uri_PROPERTY enum actually mean anything (at least that's what I have gathered from reading the MSDN docs). For example, Uri_PROPERTY_STRING_START and Uri_PROPERTY_STRING_LAST are just there to say that all the enum values >= START and <= LAST correspond to the string components of the URI.
So I'm thinking the Uri struct should have a constant-size array of UriComponents of length Uri_PROPERTY_STRING_LAST (which would be 15; correct me if I'm wrong).
So it would look something like...
typedef struct {
    /* The other stuff */
    BSTR *uri;
    UriComponent components[15];
} Uri;
And then for the GetPropertyBSTR(BSTR *component, Uri_PROPERTY prop) function you could just have something like:
if(prop >= Uri_PROPERTY_STRING_START && prop <= Uri_PROPERTY_STRING_LAST) {
    UriComponent comp;
    comp = uri->components[prop];
    /* Parse the component out */
}
And that should get you the necessary offsets and lengths for the component you need.
I also like the idea suggested before using a one-pass solution to find everything when the Uri is constructed.
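To make the one-pass idea concrete, here is a minimal, portable sketch. It makes several assumptions that are not part of urlmon: plain C strings replace BSTRs, the COMP_* values are hypothetical stand-ins for a few Uri_PROPERTY string values, the names ParsedUri, parse_uri and get_component are made up for illustration, and the parser only splits "scheme://host/path?query#fragment" without any canonicalization.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for a few Uri_PROPERTY string values. */
enum { COMP_SCHEME, COMP_HOST, COMP_PATH, COMP_QUERY, COMP_FRAGMENT, COMP_COUNT };

typedef struct {
    int offset;  /* start of the component in the string, or -1 if absent */
    int length;  /* number of characters in the component */
} UriComponent;

typedef struct {
    const char *uri;                      /* assumed to be already encoded */
    UriComponent components[COMP_COUNT];  /* indexed by the COMP_* values */
} ParsedUri;

static void set_comp(ParsedUri *p, int which, const char *start, const char *end)
{
    p->components[which].offset = (int)(start - p->uri);
    p->components[which].length = (int)(end - start);
}

/* Single left-to-right pass that records where each component lives. */
void parse_uri(ParsedUri *p, const char *uri)
{
    const char *s = uri, *q;
    int i;

    p->uri = uri;
    for (i = 0; i < COMP_COUNT; i++) {
        p->components[i].offset = -1;
        p->components[i].length = 0;
    }

    q = s + strcspn(s, ":/?#");
    if (q > s && strncmp(q, "://", 3) == 0) {
        set_comp(p, COMP_SCHEME, s, q);
        s = q + 3;
        q = s + strcspn(s, "/?#");
        set_comp(p, COMP_HOST, s, q);
        s = q;
    }
    q = s + strcspn(s, "?#");
    if (q > s) set_comp(p, COMP_PATH, s, q);
    s = q;
    if (*s == '?') {
        q = s + strcspn(s, "#");
        set_comp(p, COMP_QUERY, s, q);    /* stored with the leading '?' */
        s = q;
    }
    if (*s == '#') set_comp(p, COMP_FRAGMENT, s, s + strlen(s));
}

/* Copies one component into buf. Returns its length, or -1 if the
 * component is absent, the index is invalid, or buf is too small. */
int get_component(const ParsedUri *p, int which, char *buf, size_t size)
{
    const UriComponent *c;

    if (which < 0 || which >= COMP_COUNT) return -1;
    c = &p->components[which];
    if (c->offset < 0 || (size_t)c->length + 1 > size) return -1;
    memcpy(buf, p->uri + c->offset, (size_t)c->length);
    buf[c->length] = '\0';
    return c->length;
}
```

With this layout every getter becomes an array lookup plus a copy. The cost is one offset/length pair per component in every instance, whether or not the URI has that component.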
Thank you for the quick responses and suggestions. I hope to have a proposal ready in the next few days.
Hi Thomas,
On 03/31/2010 04:15 AM, Thomas Mullaly wrote:
You could use a dynamic array for that, or a list with a Uri_PROPERTY value as the key and an offset and length as the data, for example. Another way is to compute each property's offset only when it's requested and then store it. An obvious bad case for that is a long URI. So a one-pass property computation while building the IUri instance is probably not bad.
I like the idea of making a lightweight data structure which stores the offset and length for each component property. I'd imagine it would look something like this:
typedef struct {
    DWORD offset;
    DWORD length;
} UriComponent;
It becomes a little trickier to decide how to store the UriComponents, though. I have a few ideas, but I'm open to suggestions.
I do like the idea of using an array inside the Uri struct to store the UriComponents, but not all of the values in the Uri_PROPERTY enum actually mean anything (at least that's what I have gathered from reading the MSDN docs). For example, Uri_PROPERTY_STRING_START and Uri_PROPERTY_STRING_LAST are just there to say that all the enum values >= START and <= LAST correspond to the string components of the URI.
So I'm thinking the Uri struct should have a constant-size array of UriComponents of length Uri_PROPERTY_STRING_LAST (which would be 15; correct me if I'm wrong).
So it would look something like...
typedef struct {
    /* The other stuff */
    BSTR *uri;
    UriComponent components[15];
} Uri;
And then for the GetPropertyBSTR(BSTR *component, Uri_PROPERTY prop) function you could just have something like:
if(prop >= Uri_PROPERTY_STRING_START && prop <= Uri_PROPERTY_STRING_LAST) {
    UriComponent comp;
    comp = uri->components[prop];
    /* Parse the component out */
}
And that should get you the necessary offsets and lengths for the component you need.
I also like the idea suggested before using a one-pass solution to find everything when the Uri is constructed.
Thank you for the quick responses and suggestions. I hope to have a proposal ready in the next few days.
In general, the idea looks right; that's probably how it should be implemented. This is an implementation detail, though. The bigger and more important problem is URI parsing and canonicalization. That's where most of the work needs to be done. In this case tests will also be very important: you don't know how it should work until you have a test. The first step would be to write a test infrastructure and some tests (adding a new test shouldn't be harder than filling a table with more data). Once that's done, you'll be able to decide on the best way to implement the parser and the IUri interface. The project should result in many tests and good support for at least the more useful flags and IUri functions.
Thanks, Jacek
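As an illustration of the "filling a table with more data" style of test described above, here is a small self-contained sketch. It is not Wine test code: scheme_length is a hypothetical function under test, uri_test_t and run_uri_tests are made-up names, and the expected values are what this toy function should return, not results recorded from Windows. Real Wine tests would report failures through the ok() macro from wine/test.h instead of printf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical function under test: length of the scheme, or -1 if none. */
int scheme_length(const char *uri)
{
    size_t n = strcspn(uri, ":/?#");
    return (n > 0 && uri[n] == ':') ? (int)n : -1;
}

/* One row per test case; adding a test means adding a row. */
typedef struct {
    const char *uri;
    int expected_scheme_len;
} uri_test_t;

static const uri_test_t uri_tests[] = {
    { "http://www.winehq.org/",      4 },
    { "file:///c:/dir/file.txt",     4 },
    { "mailto:someone@example.com",  6 },
    { "/relative/path",             -1 },
};

/* Runs every row and returns the number of failures. */
int run_uri_tests(void)
{
    int failures = 0;
    size_t i;

    for (i = 0; i < sizeof(uri_tests) / sizeof(uri_tests[0]); i++) {
        int got = scheme_length(uri_tests[i].uri);
        if (got != uri_tests[i].expected_scheme_len) {
            printf("test %u (%s): expected %d, got %d\n", (unsigned)i,
                   uri_tests[i].uri, uri_tests[i].expected_scheme_len, got);
            failures++;
        }
    }
    return failures;
}
```

The loop never changes as coverage grows. Each new URI, flag combination, or expected property is one more row in the table.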
Hi Jacek,
Sorry for my delayed response. Thank you for your suggestions.
For the testing infrastructure, I was thinking about writing a few Windows programs that use Microsoft's IUri implementation to generate the expected results, which my testing infrastructure would then use to check that my implementation works correctly. Is this the right approach, or would you recommend doing it another way?
Also, I have finished a rough draft of my proposal and I was wondering if it would be appropriate to post it to the mailing list in order to receive feedback from you and others.
Hi Thomas,
On 4/8/10 3:43 AM, Thomas Mullaly wrote:
In general, the idea looks right; that's probably how it should be implemented. This is an implementation detail, though. The bigger and more important problem is URI parsing and canonicalization. That's where most of the work needs to be done. In this case tests will also be very important: you don't know how it should work until you have a test. The first step would be to write a test infrastructure and some tests (adding a new test shouldn't be harder than filling a table with more data). Once that's done, you'll be able to decide on the best way to implement the parser and the IUri interface. The project should result in many tests and good support for at least the more useful flags and IUri functions.
Hi Jacek,
Sorry for my delayed response. Thank you for your suggestions.
For the testing infrastructure, I was thinking about writing a few Windows programs that use Microsoft's IUri implementation to generate the expected results, which my testing infrastructure would then use to check that my implementation works correctly. Is this the right approach, or would you recommend doing it another way?
Tests should be integrated with the Wine tests. See dlls/shlwapi/tests/url.c and dlls/wininet/tests/url.c for an idea of how it should be done.
Also, I have finished a rough draft of my proposal and I was wondering if it would be appropriate to post it to the mailing list in order to receive feedback from you and others.
If you have specific questions, feel free to ask here. The proposal itself should be posted to the GSoC app. It supports editing proposals and getting feedback.
Jacek