Hi all, this message was originally sent to Alexandre Julliard; I am resending it to the list at his suggestion.
I am a Wine user and a regular lurker on this mailing list. Besides that, however, I earn my living as an academic econometrician (halfway between an economist and a statistician), and I have been considering doing some econometric research on open source development.
At least initially, I would focus on Wine, for a number of reasons, not least the fact that the Wine project is largely unknown in the economics community. My problem is that the data I need for my analysis could, in theory, be extracted from the mailing list archives, but doing so definitely exceeds my VERY limited Perl skills.
Ideally, the data would be in the form of a CSV file, a typical record of which would be:
DATE,PN,PLA,PLD,PCN,CN,CLA,CLD
where
PN  = number of patches received on day DATE
PLA = number of code lines added in patches
PLD = number of code lines deleted in patches
PCN = number of patch contributors (i.e. number of coders submitting patches on that particular day)
CN  = number of CVS commits
CLA = number of lines added in CVS
CLD = number of lines deleted in CVS
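To make the request concrete, here is a rough Python sketch (not Perl, and not a working tool) of how such records could be assembled. It rests on assumptions that are mine, not facts about the archives: the patches have already been pulled out of the archives as hypothetical (day, sender, diff_text) tuples, each patch is a unified diff (diff -u), and each CVS commit is a hypothetical (day, lines_added, lines_deleted) tuple, for example built from the per-revision line counts in `cvs log` output. The names count_diff_lines, build_rows and to_csv are illustrative only.

```python
import csv
import io
import re
from collections import defaultdict
from datetime import date

# Unified-diff hunk header, e.g. "@@ -10,7 +10,8 @@"; a missing count means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def count_diff_lines(diff_text):
    """Return (added, deleted) code lines in a unified diff (assumed format).

    Only lines inside hunks are counted, so "--- a/file" / "+++ b/file"
    headers and the patch description are ignored.
    """
    added = deleted = 0
    old_left = new_left = 0  # lines still expected in the current hunk
    for line in diff_text.splitlines():
        if old_left <= 0 and new_left <= 0:
            m = HUNK_RE.match(line)
            if m:
                old_left = int(m.group(1) or 1)
                new_left = int(m.group(2) or 1)
            continue  # outside a hunk
        if line.startswith("+"):
            added += 1
            new_left -= 1
        elif line.startswith("-"):
            deleted += 1
            old_left -= 1
        elif line.startswith("\\"):
            pass  # "\ No newline at end of file" marker
        else:
            # context line (mailers sometimes strip it to an empty line)
            old_left -= 1
            new_left -= 1
    return added, deleted


def build_rows(patches, commits):
    """Aggregate per-day records in the DATE,PN,PLA,PLD,PCN,CN,CLA,CLD layout.

    patches: hypothetical (day, sender, diff_text) tuples
    commits: hypothetical (day, lines_added, lines_deleted) tuples
    """
    days = defaultdict(lambda: {"PN": 0, "PLA": 0, "PLD": 0, "senders": set(),
                                "CN": 0, "CLA": 0, "CLD": 0})
    for day, sender, diff_text in patches:
        rec = days[day]
        added, deleted = count_diff_lines(diff_text)
        rec["PN"] += 1
        rec["PLA"] += added
        rec["PLD"] += deleted
        rec["senders"].add(sender)  # PCN = distinct senders that day
    for day, lines_added, lines_deleted in commits:
        rec = days[day]
        rec["CN"] += 1
        rec["CLA"] += lines_added
        rec["CLD"] += lines_deleted
    return [
        [day.isoformat(), r["PN"], r["PLA"], r["PLD"], len(r["senders"]),
         r["CN"], r["CLA"], r["CLD"]]
        for day, r in sorted(days.items())
    ]


def to_csv(rows):
    """Render the rows as CSV text with the header line first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["DATE", "PN", "PLA", "PLD", "PCN", "CN", "CLA", "CLD"])
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    example_diff = ("--- a/foo.c\n+++ b/foo.c\n@@ -1,2 +1,2 @@\n"
                    "-int y;\n+int y = 0;\n int w;\n")
    patches = [(date(2002, 3, 1), "alice@example.com", example_diff)]
    commits = [(date(2002, 3, 1), 5, 2)]
    print(to_csv(build_rows(patches, commits)), end="")
```

Two caveats of this sketch: days with no activity produce no row, so a gap-free daily series would need zero-filled rows added afterwards; and PCN counts distinct sender addresses, so one person posting from two addresses would be counted twice.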
Of course, the longer the time span the data cover, the larger my sample and the happier I am.
I understand this isn't trivial. Is there anyone out there willing to help me? I know some Perl, so perhaps all I need is a push to get started.
Thanks in advance, and keep up the good work!
----------------------------------------------------------
Riccardo `Jack' Lucchetti Dipartimento di Economia Università di Ancona
jack@dea.unian.it http://www.econ.unian.it/lucchetti