On Mon, Aug 2, 2010 at 19:26, Max TenEyck Woodbury max@mtew.isa-geek.net wrote:
On 08/02/2010 01:04 PM, Gert van den Berg wrote:
There seem to be several interfaces to retrieve MSDN articles... Some of those interfaces might be more stable / provide a way to retrieve a current link? (Framing the content would probably not be allowed, but retrieving links should be...)
If it *looks* like copying, it should be avoided.
What I meant is that it would allow for things like some really nice side-by-side postings of Wine's documentation and MSDN... It is unlikely to be legal though... (And should therefore be avoided)
I think it will be necessary to regenerate articles as the Wine project evolves. I regenerated the 'dlls' page after Alexandre's CVS run today. There was some new content so I saved the result. I've regenerated the individual DLL pages several times as I improved the generating scripts. Both activities put a load on me that I am trying to automate. In fact, if you look at the original post before it got hijacked into a discussion of the project name, it asks about a way to improve that automation.
Since there will be fairly frequent semi-intelligent reviews of pages, such queries can probably be incorporated into that process.
Can you help me with this, please?
I can give an overview of what I think... I might do a bit of implementation, but finishing something bigger than a short script when I have other things to do isn't always easy...
SourceForge provides quite decent hosting in the project hosting space as well... Automating things from there might be a good idea.
To handle MSDN, I can see the following (very) high-level process:
1. Posted MSDN links get converted to point to a redirect system in the project space. (The conversion retrieves and saves some additional data.)
2. The redirect system redirects the user to the relevant content on current MSDN (allowing the user to configure some parameters, such as view type / language).
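A rough sketch of the link conversion step, in Python. Everything specific here is an assumption on my side: the redirector address (REDIRECTOR_BASE) is a placeholder, and the MSDN link layout (locale, then "library", then an identifier with an optional version in parentheses) is only what such links appear to look like, not something taken from documentation.

```python
import re
from urllib.parse import urlencode, urlsplit

# Hypothetical redirector location in the project web space (placeholder).
REDIRECTOR_BASE = "http://example.org/msdn-redirect"

# Assumed link layout: /<locale>/library/<identifier>[(<version>)].aspx
_MSDN_PATH = re.compile(
    r"^/(?P<locale>[a-z]{2}-[a-z]{2})/library/"
    r"(?P<content_id>[a-z0-9]+)(?:\((?P<version>[^)]+)\))?\.aspx$",
    re.IGNORECASE,
)


def to_redirector_link(msdn_url):
    """Return a redirector link for a posted MSDN URL, or None if unrecognised."""
    parts = urlsplit(msdn_url)
    if parts.netloc.lower() != "msdn.microsoft.com":
        return None
    match = _MSDN_PATH.match(parts.path)
    if match is None:
        return None
    # The locale is deliberately dropped: the redirector lets the user pick it.
    query = {"id": match.group("content_id").lower()}
    if match.group("version"):
        query["version"] = match.group("version")
    return REDIRECTOR_BASE + "?" + urlencode(query)
```

Links that do not match are returned as None, so they can be left alone or queued for a manual look rather than being rewritten wrongly.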
What I have managed to figure out so far about MSDN content from the Web services documentation:
1. Content is identified by a content identifier, which can be any of the following:
   a. GUID
   b. Short ID (a short (~8 character) unique identifier such as ms224917)
   c. Alias ("friendly string")
   d. Asset ID (documented here: http://msdn.microsoft.com/en-us/magazine/cc163541.aspx; can be used to retrieve other (non-URL) identifiers)
   e. Content URL (the URL to an MSDN page)
2. A content key uniquely identifies an article and consists of a content identifier, locale and version.
3. The version identifies the various versions of the documented item, e.g. the .NET version / VC++ version.
4. The locale identifies the preferred language.
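To make the content key a bit more concrete, here is a small Python sketch. The class and function names are mine, and the pattern tests are guesses: a short ID is assumed to be exactly 8 letters/digits, and asset IDs are not told apart from aliases because I do not know their format.

```python
import re
from dataclasses import dataclass
from typing import Optional

_GUID = re.compile(r"^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$", re.IGNORECASE)
_SHORT_ID = re.compile(r"^[a-z0-9]{8}$", re.IGNORECASE)  # assumed shape


def identifier_kind(identifier):
    """Guess which kind of content identifier a string is."""
    if identifier.lower().startswith(("http://", "https://")):
        return "url"
    if _GUID.match(identifier):
        return "guid"
    if _SHORT_ID.match(identifier):
        return "short_id"
    # Aliases and asset IDs both end up here (asset ID format unknown to me).
    return "alias"


@dataclass(frozen=True)
class ContentKey:
    """Content identifier plus locale and version; the last two may be unset."""
    identifier: str
    locale: Optional[str] = None
    version: Optional[str] = None
```

An 8-character alias would be misread as a short ID by this guess, so a real version should confirm the kind against MSDN before storing it.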
More detailed version:
1. Find the MSDN URL and retrieve the other identifiers. (GetContent without a locale seems to retrieve almost all the metadata: a "partial match" in the documentation.)
2. Store the data in a database, with GUID / short ID / asset ID as primary key (they are always-present unique identifiers). Other fields include alias and current URL. (content table)
3. Other tables: locales and versions. Mapping tables between content and its possible locales and versions should also exist. The content table should record when the data was last updated, and an entry should be updated automatically once it reaches a certain age and a user wants to retrieve the page / select a non-default locale / version.
4. Generate a new link pointing to the redirector system. The redirector system needs the unique content identifier (GUID / short ID / asset ID) and preferably a version. It should be able to take a parameter to preview the URL (probably a smaller "options link") and allow the user to choose other locales / versions and generate links to those. This "content options" page should also allow problems with the redirector's link to be reported. The reports should be used to update the information for that link by querying it again from MSDN.
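The tables from steps 2 and 3 could look roughly like this, using Python's sqlite3 module. Table and column names are my own invention, SQLite is just a convenient stand-in for whatever database the hosting offers, and picking the short ID as the primary key (with GUID and asset ID merely unique) is an arbitrary choice among the three candidates.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS content (
    short_id     TEXT PRIMARY KEY,   -- chosen key; GUID / asset ID would also do
    guid         TEXT UNIQUE,
    asset_id     TEXT UNIQUE,
    alias        TEXT,
    current_url  TEXT,
    last_updated TEXT NOT NULL       -- ISO 8601 date of the last MSDN query
);
CREATE TABLE IF NOT EXISTS locales  (locale  TEXT PRIMARY KEY);
CREATE TABLE IF NOT EXISTS versions (version TEXT PRIMARY KEY);
-- Mapping tables: which locales / versions exist for which content.
CREATE TABLE IF NOT EXISTS content_locales (
    short_id TEXT NOT NULL REFERENCES content(short_id),
    locale   TEXT NOT NULL REFERENCES locales(locale),
    PRIMARY KEY (short_id, locale)
);
CREATE TABLE IF NOT EXISTS content_versions (
    short_id TEXT NOT NULL REFERENCES content(short_id),
    version  TEXT NOT NULL REFERENCES versions(version),
    PRIMARY KEY (short_id, version)
);
"""


def open_store(path=":memory:"):
    """Open (and if needed create) the link database."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.executescript(SCHEMA)
    return conn
```

With foreign keys switched on, a mapping row for content that was never stored is rejected, which should catch some scripting mistakes early.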
It might require several trips to MSDN to initially add an item / to update versions and locales. This should possibly be scheduled once the data gets really old (somewhere between 180 days and 1 year) and be run on demand, at about half the scheduled update age, when someone requests the content from somewhere where they might want to see the other versions / locales. This should keep the request volumes to MSDN low...
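The age rule is simple enough to write down directly. A minimal Python sketch, assuming the low end of the range (180 days) for the scheduled update; the names are made up for illustration.

```python
from datetime import datetime, timedelta

# Scheduled refresh age: anything from 180 days to 1 year would fit the plan.
SCHEDULED_MAX_AGE = timedelta(days=180)
# On-demand refresh kicks in at about half the scheduled age.
ON_DEMAND_MAX_AGE = SCHEDULED_MAX_AGE / 2


def needs_refresh(last_updated, now, on_demand=False):
    """Decide whether an entry should be queried again from MSDN.

    on_demand is True when a user is looking at the options page and may
    want other versions / locales; the scheduled job passes False.
    """
    limit = ON_DEMAND_MAX_AGE if on_demand else SCHEDULED_MAX_AGE
    return now - last_updated >= limit
```

For example, an entry last updated 100 days ago is refreshed when a user asks for its options, but is left alone by the scheduled run.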
MSDN changes and updates required:
1. Changes to the web services interface: update the relevant parts. (This shouldn't be too serious as long as the entire interface is not redesigned.)
2. Link changes: change how the URLs are generated from the identifiers in the database. (I don't see an easy way to request a URL from the web services interface.)
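For the link changes, keeping the URL layout in a single template means only one line needs editing when MSDN moves things around. A Python sketch; the template below is my guess at the current layout (version in parentheses after the identifier), and the default locale is just a placeholder.

```python
# Assumed URL layout. If MSDN changes its links, only this template changes.
URL_TEMPLATE = "http://msdn.microsoft.com/{locale}/library/{short_id}{version}.aspx"


def build_msdn_url(short_id, locale="en-us", version=None, template=URL_TEMPLATE):
    """Build an MSDN URL from the identifiers stored in the database."""
    version_part = "({})".format(version) if version else ""
    return template.format(locale=locale, short_id=short_id, version=version_part)
```

The redirector would call this with the stored short ID plus whatever locale / version the user selected, and fall back to a fresh MSDN query if the resulting link is reported as broken.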
Other possible extensions: the "redirector" can build a more complete view of what is available, to allow documentation to be found more easily (the information should only be retrieved on demand and saved) and to prevent duplicate trips to MSDN. (This should eventually provide a nice "tree" of MSDN with reliable links to the content.)
Gert