Hi Kuba,
On Thursday 06 Oct 2005 23:23, Kuba Ober wrote:
we can probably do better than inb() / outb().
You can't do any better than that. [It] is the only one that makes sense (when you run things on ia32).
... and when you're not on an ia32 platform with a superIO chip?
Advantages of using ppdev over simple inb() / outb() are:
[*] cross-architecture support (arm, alpha, powerpc, ...)
That'd be good for winelib only or wine-with-emulator (bochs? qemu?).
Yup, both. A ported application (via winelib or qemu) should work under any Linux architecture. Unfortunately, it would be a Linux-specific solution; the *BSDs have their own interface.
[*] support for some esoteric devices (USB-parallel converters, ...)
At a huge performance penalty ;)
But it would work; that's my point. The performance of parallel-over-USB is a separate issue.
Legacy devices (such as parallel ports) are gradually being phased out, so writing code that requires a SuperIO chip is not the best approach.
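To be concrete, here's roughly what the ppdev route looks like from user space (a minimal sketch, assuming the ppdev module is loaded and the port shows up as /dev/parport0; most error handling trimmed for brevity):

/* Minimal sketch of driving the port through ppdev rather than raw
 * inb()/outb().  Assumes the ppdev module is loaded and the port is
 * visible as /dev/parport0. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/ppdev.h>
#include <linux/parport.h>

int main(void)
{
    unsigned char out = 0x55, status;
    int fd = open("/dev/parport0", O_RDWR);

    if (fd < 0) { perror("open /dev/parport0"); return 1; }
    if (ioctl(fd, PPCLAIM) < 0) { perror("PPCLAIM"); return 1; }

    ioctl(fd, PPWDATA, &out);        /* the outb() on the data register   */
    ioctl(fd, PPRSTATUS, &status);   /* the inb() on the status register  */
    printf("status register: 0x%02x\n", status);

    ioctl(fd, PPRELEASE);
    close(fd);
    return 0;
}

The same program runs unchanged on arm, alpha, powerpc and friends, which is rather the point.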
The overhead in doing a syscall isn't significant, as any outb() operation takes ~1us anyway.
AFAIK, the overhead stems from the fact that instead of a machine instruction you have to:
- process an exception in the kernel, which then signals SIGSEGV to the
process
- invoke the signal handler
- determine what's up and disassemble the instruction at CS:EIP
- invoke a function/syscall based on the disassembled instruction
If this isn't dog slow, I don't know what is. I wasn't entirely clear: the syscall is the least of our worries, in fact :)
I think you may be confusing this with some other activity (maybe handling an invalid memory access?). A syscall is pretty simple. The application does some bookkeeping and calls int(errupt) 0x80, triggering the switch from user-land to kernel-land. The kernel then picks up the request and carries on. It's described here[1], although the details may have changed slightly with more recent kernels. There's no signalling (in the Unix user-land sense) going on.
[1] http://www.tldp.org/LDP/khg/HyperNews/get/syscall/syscall86.html
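To make that concrete, the "bookkeeping" on ia32 amounts to little more than this (a sketch only; 20 is __NR_getpid on ia32, and other architectures use a different trap instruction):

/* What a syscall boils down to on ia32: put the syscall number (and any
 * arguments) in registers, then trap into the kernel with int 0x80.
 * No SIGSEGV or signal handler is involved at any point.
 * ia32-only sketch; 20 is __NR_getpid on that architecture. */
#include <stdio.h>

int main(void)
{
    long pid;

    __asm__ volatile ("int $0x80"
                      : "=a" (pid)   /* result comes back in eax   */
                      : "a" (20));   /* syscall number goes in eax */
    printf("getpid() via int 0x80: %ld\n", pid);
    return 0;
}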
Overhead is "currently" (measured for 2.4.0) slightly under 0.4us (see [2]). For 2.6-series kernels it may have gone down slightly further, but 0.4us seems a reasonable upper bound. Assuming the kernel driver is reasonably written, I'd make a complete guess that the total overhead is between 0.4 and 0.6us (although I should benchmark that number :^).
[2] http://cs.nmu.edu/~benchmark/index.php?page=null_call
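It's easy enough to measure on a given box rather than guess; something along these lines (timing getppid(), which glibc doesn't cache the way it does getpid()) gives the per-syscall cost directly:

/* Quick-and-dirty measurement of null-syscall cost: time a tight loop
 * of getppid() calls.  Loop overhead is in the noise at this scale. */
#include <stdio.h>
#include <unistd.h>
#include <sys/time.h>

int main(void)
{
    const long iterations = 1000000;
    struct timeval start, end;
    long i;
    double elapsed_us;

    gettimeofday(&start, NULL);
    for (i = 0; i < iterations; i++)
        getppid();
    gettimeofday(&end, NULL);

    elapsed_us = (end.tv_sec - start.tv_sec) * 1e6
               + (end.tv_usec - start.tv_usec);
    printf("%.3f us per syscall\n", elapsed_us / iterations);
    return 0;
}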
I suspect most programs designed to work under Win98 just hit the hardware directly, so obtaining permissions (doing ioperm() as root, for example) should work. If we have some mechanism for catching the program doing either inb() or outb(), then we could provide a better implementation via the ppdev interface.
At the cost of slowing things down. For devices that bit bang data (like programmers), this makes things unacceptably slow.
I can't say I share that experience (about being unacceptably slow, that is). A 40-60% increase in overhead for a single instruction would certainly be noticeable, but only if that instruction is the bottleneck in the program. Other activity takes longer (cf. context-switching in [2], for example); even just calling a function takes on the order of 100ns (on my ~700MHz laptop). The time between successive changes of parallel port state may well be (much) larger than the 400-600ns overhead of going through the kernel, in which case the overhead becomes less significant. Of course, this is application-specific.
The worst case would be something driving the parallel port as a square-wave generator: per-operation time would go up by the full 40-60% (assuming all the above numbers). Perhaps slightly more realistically, the PLIP interface is reckoned[3] to have a bandwidth of 1.2Mbit/s, corresponding to a ~3.33us turn-around time per transfer. Adding a 0.4-0.6us overhead would reduce the bandwidth to between roughly 1.1Mbit/s and 1.0Mbit/s (a 10-15% performance drop). Would this matter? No, because if it did you'd go out and buy 100baseT cards and achieve far greater performance (or Myrinet, or ...).
[3] http://yara.ecn.purdue.edu/~pplinux/ppcluster.html
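For what it's worth, the arithmetic behind those figures (the 4 bits per PLIP transfer is my assumption about nibble mode; the rest follows from the quoted 1.2Mbit/s):

/* The arithmetic behind the PLIP estimate: 1.2 Mbit/s at (assumed)
 * 4 bits per transfer gives ~3.33us per transfer; add the 0.4-0.6us
 * syscall overhead and see what bandwidth is left. */
#include <stdio.h>

int main(void)
{
    double bits_per_transfer = 4.0;            /* nibble-mode assumption */
    double base_us = bits_per_transfer / 1.2;  /* ~3.33 us per transfer  */
    double overhead_us[] = { 0.4, 0.6 };       /* added per-syscall cost */
    int i;

    for (i = 0; i < 2; i++)
        printf("+%.1f us -> %.2f Mbit/s\n",
               overhead_us[i],
               bits_per_transfer / (base_us + overhead_us[i]));
    return 0;
}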
For the particular use-case you have in mind, my understanding is that programmers often require some additional delay mechanism so the EPROM can keep up (certainly for writes, probably for reads too). This would reduce the impact of the performance hit, perhaps to an acceptable (or even imperceptible) degree.
Does all this matter? Probably not. I would bet you this Smartie here that if a program is worrying about the nanosecond response of some function, then that function is already good enough, and that some "higher-level" algorithmic optimisation would have a much larger benefit (e.g. ethernet vs PLIP).
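Coming back to the ioperm() route mentioned above, for completeness: this is roughly what "just hitting the hardware" looks like from user space (a sketch; 0x378 is only the usual LPT1 base, ioperm() needs root, and it's ia32/x86-specific):

/* The "just hit the hardware" route, for comparison: grant the process
 * access to the three LPT1 registers with ioperm() and poke them with
 * the glibc inb()/outb() wrappers from sys/io.h.  0x378 is the usual
 * LPT1 base address but that is an assumption about the machine.
 * Compile with -O so the inb()/outb() inlines are actually used. */
#include <stdio.h>
#include <sys/io.h>

#define LPT1_BASE 0x378

int main(void)
{
    unsigned char status;

    /* data, status and control registers live at base, base+1, base+2 */
    if (ioperm(LPT1_BASE, 3, 1) < 0) {
        perror("ioperm (are you root?)");
        return 1;
    }

    outb(0x55, LPT1_BASE);            /* write the data register    */
    status = inb(LPT1_BASE + 1);      /* read the status register   */
    printf("status register: 0x%02x\n", status);

    ioperm(LPT1_BASE, 3, 0);          /* drop the permissions again */
    return 0;
}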
Cheers,
Paul.
(apologies for the overly long email!)