On 8 Jan 2003, Paul Millar <paulm@astro.gla.ac.uk> wrote:
Hi Martin,
On a similar note, I've been investigating OpenMOSIX on and off (mostly off) for doing distributed compiling. In general, I've found it not particularly stable (although it's improving) and the performance not particularly great, although I've not had time to investigate the causes of these problems. Besides, compiling code is quite a tricky task for it to schedule/migrate.
I haven't tried it myself but I would have expected the short, intense jobs generated by compilation to be a problem.
A reduction in compilation time by a factor of 2.6 (for three machines) isn't bad at all!
If you have the time, could you repeat this with different values for the -j option? Timing how long it takes with -j2 (and perhaps -j3 and -j4) would be interesting.
Yes, I've been meaning to add that to the benchmark tool. I'll let you know.
Also, repeating the compilation several times for each value of -j and taking the mean and standard deviation would be interesting too.
In informal measurements it seems to be quite reproducible. The sd is typically only a few percent of the overall time. But that might be an interesting option to add too.
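[Editorial sketch] The measurement suggested above (several runs per -j value, then the mean and standard deviation) could look roughly like the following. This is only a minimal sketch, not the benchmark tool mentioned in the thread: the function names, the `make` command line, and the need for a clean step between runs are all assumptions made for illustration.

```python
import statistics
import subprocess
import time


def time_build(command, repeats=3):
    """Run `command` `repeats` times and return wall-clock times in seconds.

    Hypothetical helper: `command` is an argument list such as
    ["make", "-j4"]. A real benchmark would also need to clean the
    build tree between runs, which is not done here.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(command, check=True)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Return (mean, sample standard deviation, sd as a percentage of the mean)."""
    mean = statistics.mean(timings)
    sd = statistics.stdev(timings)  # needs at least two timings
    return mean, sd, 100.0 * sd / mean


if __name__ == "__main__":
    # Illustrative only: assumes a Makefile in the current directory.
    for jobs in (2, 3, 4):
        mean, sd, pct = summarize(time_build(["make", f"-j{jobs}"]))
        print(f"-j{jobs}: mean {mean:.1f}s, sd {sd:.1f}s ({pct:.1f}% of mean)")
```

Reporting the standard deviation as a percentage of the mean makes it easy to check the informal observation that it is only a few percent of the overall time.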
I've just tried distcc and it's wonderful! I haven't benchmarked it yet, but subjectively (from watching the nodes' load averages) distcc seems to give *much* more even loading than just running OpenMOSIX. If anyone has more than one machine at their disposal, I'd strongly recommend investigating distcc.
On Thu, 9 Jan 2003, Martin Pool wrote:
I haven't tried it myself but I would have expected the short, intense jobs generated by compilation to be a problem [for OpenMOSIX]
Yes, I agree. But, I think OM still has a role to play. Running OM underneath distcc should help improve the mean performance (in a heterogeneous cluster). Whenever a faster node is unloaded whilst a slower node is busy compiling (and this situation lasts for any length of time) OM should migrate that process to the faster node, speeding up compilation. That might occur just before linking, for example.
I guess the improvement will depend strongly on the composition of the cluster and the code you're compiling. The improvement (from running OM) might be marginal in certain cases, but I don't think it would make things worse.
Hmmm, time for some more experiments.
---- Paul Millar
On 14 Jan 2003, Paul Millar <paulm@astro.gla.ac.uk> wrote:
I've just tried distcc and it's wonderful!
Thanks. :-)
I haven't benchmarked it yet, but subjectively (from watching the nodes' load averages) distcc seems to give *much* more even loading than just running OpenMOSIX. If anyone has more than one machine at their disposal, I'd strongly recommend investigating distcc.
On Thu, 9 Jan 2003, Martin Pool wrote:
I haven't tried it myself but I would have expected the short, intense jobs generated by compilation to be a problem [for OpenMOSIX]
Yes, I agree. But, I think OM still has a role to play. Running OM underneath distcc should help improve the mean performance (in a heterogeneous cluster). Whenever a faster node is unloaded whilst a slower node is busy compiling (and this situation lasts for any length of time) OM should migrate that process to the faster node, speeding up compilation. That might occur just before linking, for example.
The argument against it is this: compiler processes have a large working set (>20MB, say, though it varies), do a lot of I/O, and only run for a few seconds. On a 100Mbps network, migrating a running process will take a few seconds, by which time many other processes may have started and stopped, so the load pattern may be very different. I think it may be difficult for OpenMOSIX to react fast enough to handle the condition you describe.
I have not personally benchmarked OpenMOSIX for this, so you should take the above with a pinch of salt.
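[Editorial sketch] The rough numbers in the argument above can be checked with a back-of-the-envelope calculation. The sketch below only estimates the time to push a working set across the wire; the function name and the `efficiency` parameter (the fraction of nominal bandwidth actually achieved) are assumptions for illustration, and it ignores whatever else a migration involves.

```python
def transfer_seconds(working_set_mb, link_mbps, efficiency=1.0):
    """Estimate the seconds needed to send a working set over a network link.

    working_set_mb: process working set in megabytes (10**6 bytes)
    link_mbps:      nominal link speed in megabits per second
    efficiency:     assumed fraction of nominal bandwidth achieved
    """
    bits_to_send = working_set_mb * 8 * 1_000_000
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits_to_send / usable_bits_per_second


if __name__ == "__main__":
    # 20 MB working set over 100 Mbps, at ideal and at an assumed 80% throughput.
    print(f"{transfer_seconds(20, 100):.1f} s")
    print(f"{transfer_seconds(20, 100, efficiency=0.8):.1f} s")
```

Even under ideal conditions the transfer alone is comparable to the few seconds a compiler process runs for, which is the heart of the concern.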
I'd very much like to work with somebody with a >8 machine cluster to run distcc benchmarks and comparisons to SSI clusters.
The "grainy" nature of the workload makes it difficult for distcc to schedule optimally, although it should improve somewhat in the next few months.
This paper describes good results using MOSIX for software building:
http://www.mosix.cs.huji.ac.il/ftps/usenix.ps.gz
Though they did spend USD 390,000, which is more than I can manage. :-)
I guess the improvement will depend strongly on the composition of the cluster and the code you're compiling. The improvement (from running OM) might be marginal in certain cases, but I don't think it would make things worse.
Well, it may use network bandwidth and CPU cycles for migration or overhead that might be better spent on either cc or distcc.
It's great that there is good free single-image clustering software. The point of distcc is just that you can distribute the particular task of compilation with a much simpler and less intrusive program.