Hi again,
Sorry for the double post and the stupid default From-Header (not my machine). I replaced my modified comm.c with the latest CVS, made the changes as described below and still got communication errors (the device replies that commands are malformed). So I attempted to apply the changes I made one-by-one to see when the errors disappear. And the first (and easiest) change helped: When replacing both tcflush(fd,TCOFLUSH) by tcdrain(fd) in the PurgeComm function all errors disappear and the program works as expected. Now, I know that the two functions do different things and according to the API specification of PurgeComm tcflush seems more appropriate. So I wonder if this is not really a fix but only a workaround for a different problem (maybe a race condition as mentioned earlier?).
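For readers following along, the difference between the two calls can be sketched as below. This is a minimal illustration, not the actual comm.c code: `purge_tx` and `enum tx_purge_mode` are hypothetical names, and only `tcflush()` and `tcdrain()` are real POSIX calls.

```c
#include <assert.h>
#include <termios.h>

/* Hypothetical selector, for illustration only. */
enum tx_purge_mode { TX_DISCARD, TX_DRAIN };

/* Returns 0 on success, -1 on error (errno is set by the termios call). */
int purge_tx(int fd, enum tx_purge_mode mode)
{
    assert(mode == TX_DISCARD || mode == TX_DRAIN);
    if (mode == TX_DRAIN)
        return tcdrain(fd);        /* block until all queued output has been transmitted */
    return tcflush(fd, TCOFLUSH);  /* throw away output written but not yet transmitted */
}
```

With `TX_DISCARD`, bytes still queued for transmission are dropped; with `TX_DRAIN`, the call waits for them to leave the port, which is why the second variant can hide a timing problem rather than fix it.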
I know that G-Ware does some funny (stupid?) things and there are always at least 2 threads running that poll for input/output. One effect is that it still opens ~200 file handles to the port but, more importantly, maybe they interact in a way it shouldn't happen?
Anyone?
----- Original Message -----
From: Mr Cihan ALTINAY cihan@uq.edu.au
Date: Tuesday, November 8, 2005 10:13 am
Subject: kernel/comm.c - page fault in thread
Hi,
I finally got G-Ware to work flawlessly under wine after some tweaks in the comm.c file. One problem was in the (thread) function COMM_WaitCommEventService(). When I 'disconnect' in the client program it calls purge and sets the event mask to 0 (ie. it is no longer interested in events from the port). But there are still threads running that poll the port.
The program crashes when the thread exits the loop (with abort!=0) and tries to set *commio->buffer = rc; (line 2010) but this buffer is already freed by the client. Replacing
if (abort) rc = 0; *commio->buffer = rc;
by
if (!abort) *commio->buffer = rc;
fixes this problem.
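The guarded store can be illustrated in isolation. This is only a sketch under assumptions: `struct wait_ctx` and `finish_wait` are hypothetical stand-ins, and only the `buffer` member and the `abort` test mirror names used above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-wait state. */
struct wait_ctx {
    unsigned long *buffer;  /* owned by the client; may be freed once the wait is aborted */
};

/* Store the result only when the wait was not aborted.
 * Returns 1 if the result was stored, 0 if it was skipped. */
int finish_wait(struct wait_ctx *ctx, int abort, unsigned long rc)
{
    assert(ctx != NULL);
    if (abort)
        return 0;           /* the client may already have freed ctx->buffer */
    *ctx->buffer = rc;
    return 1;
}
```

Note that this only avoids the write after free; it does not report a zero event mask to the client, which is the point raised later in the thread.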
There were other communication errors which I managed to fix but I have to find out first which changes are really necessary to make things work because I tried a lot of things in comm.c. It looks as if there are race conditions when more than one polling thread is running. Currently I check if a thread is already running and don't start a second one if so. But I will find out if that is responsible for the errors or something else and report it on the list.
Cheers, Cihan
"Cihan" == Cihan ALTINAY cihan@uq.edu.au writes:
Cihan> Hi again, Sorry for the double post and the stupid default From-Header (not my machine). I replaced my modified comm.c with the latest CVS, made the changes as described below and still got communication errors (the device replies that commands are malformed). So I attempted to apply the changes I made one-by-one to see when the errors disappear. And the first (and easiest) change helped: When replacing both tcflush(fd,TCOFLUSH) by tcdrain(fd) in the PurgeComm function all errors disappear and the program works as expected. Now, I know that the two functions do different things and according to the API specification of PurgeComm tcflush seems more appropriate. So I wonder if this is not really a fix but only a workaround for a different problem (maybe a race condition as mentioned earlier?).
Cihan> I know that G-Ware does some funny (stupid?) things and there are always at least 2 threads running that poll for input/output. One effect is that it still opens ~200 file handles to the port but, more importantly, maybe they interact in a way it shouldn't happen?
Obviously my WaitCommEvent implementation is not right for G-Ware (by the way, any pointers to a downloadable version?). Can you write a test case to show where the current implementation is at fault? Maybe the server needs to be involved...
Uwe Bonnes wrote:
Obviously my WaitCommEvent implementation is not right for G-Ware (by the way, any pointers to a downloadable version?). Can you write a test case to show where the current implementation is at fault? Maybe the server needs to be involved...
You can find G-Ware 5.0.6 here as posted in the first message: http://www.clearone.com/docs/downloads/G-Ware5.0.6.zip [20MB]
However, without a suitable echo canceller device there is probably not much you can see. If you like I can send a short log of the input/output behaviour under windows and under wine to show the difference (basically the input seems to accumulate more under wine and it happens more often that we read >200 bytes at once whereas this doesn't happen under windows). I would like to write a test case but we don't have 'real' Windows [here at university]. Instead, we were using VMWare up till now just to get this program running (that's why it's so important for us to be able to use wine).
In regards to this I found out something else today while testing: When I run the program under wine with the changes I mentioned before applied, everything works fine. Starting vmware with a serial port enabled, closing it down again and then trying to run G-Ware under wine breaks things: I get 'Timeout' errors and the like. It took me a while to see the reason: Under wine I get extra bytes inserted in the data and a check of the serial parameters shows why - vmware enables the INPCK flag of the serial port when exiting (who knows why) and wine doesn't reset the flags but only ORs and ANDs them with the flags it needs. I would say vmware is not behaving correctly but on the other hand we never set the input flags to a fixed state, which makes things unpredictable. Any comments?
Cheers, Cihan
"Cihan" == Cihan Altinay cihan@uq.edu.au writes:
Cihan> You can find G-Ware 5.0.6 here as posted in the first message: http://www.clearone.com/docs/downloads/G-Ware5.0.6.zip [20MB]
Cihan> However, without a suitable echo canceller device there is probably not much you can see. If you like I can send a short log of the input/output behaviour under windows and under wine to show the difference (basically the input seems to accumulate more under wine and it happens more often that we read >200 bytes at once whereas this doesn't happen under windows). I would like to write a test case but we don't have 'real' Windows [here at university]. Instead, we were using VMWare up till now just to get this program running (that's why it's so important for us to be able to use wine).
If tests show a difference in behaviour between Wine and VMWare, I guess wine will be wrong. I can then test with my XP machine, if you send me the test.
Cihan> In regards to this I found out something else today while testing: When I run the program under wine with the changes applied I mentioned before, then everything works fine. Starting vmware with a serial port enabled and closing it down again and trying to run G-Ware under wine again breaks things: I get 'Timeout' errors and the like. It took me a while to see the reason: Under wine I get extra bytes inserted in the data and a check of the serial parameters shows why - vmware enables the INPCK flag of the serial port when exiting (who knows why) and wine doesn't reset the flags but only OR's and AND's it with flags needed. I would say vmware is not behaving correctly but on the other hand we never set the input flags to a fixed state which makes things unpredictable. Any comments?
Here the test would be: Open the port, set the INPCK flag. Then close it and then try to provoke the error you observe.
I guess wine should reset the parity check flag...
Bye
Uwe Bonnes wrote:
If tests show a difference in behaviour between Wine and VMWare, I guess wine will be wrong. I can then test with my XP machine, if you send me the test.
I studied the test cases in tests/comm.c but I am not sure how to implement a test that requires input from the serial port. I saw the loopback possibility but I cannot test it. Do I need to write a test case for the first issue as well (where *commio->buffer is written to after it is already freed)? It seems quite obvious that the thread may still be running after the client frees its buffers.
Cihan> In regards to this I found out something else today while testing: When I run the program under wine with the changes applied I mentioned before, then everything works fine. Starting vmware with a serial port enabled and closing it down again and trying to run G-Ware under wine again breaks things: I get 'Timeout' errors and the like. It took me a while to see the reason: Under wine I get extra bytes inserted in the data and a check of the serial parameters shows why - vmware enables the INPCK flag of the serial port when exiting (who knows why) and wine doesn't reset the flags but only OR's and AND's it with flags needed. I would say vmware is not behaving correctly but on the other hand we never set the input flags to a fixed state which makes things unpredictable. Any comments?
Here the test would be: Open the port, set the INPCK flag. Then close it and then try to provoke the error you observe.
I guess wine should reset the parity check flag...
Sorry I just found out that the extra bytes are actually caused by the PARMRK input flag. Interestingly vmware leaves this (and the INPCK) flag on while G-Ware is running and it still works so it seems vmware is handling the mark character without passing it on.
I wrote a little test program to confirm that this time. All it does is:
- Get current tty settings via ioctl
- Toggle the PARMRK flag
- Write back settings

(1) Run vmware -> run wine -> fail (getting bogus 0xff in data stream)
(2) Toggle flag -> run wine -> success
(3) Toggle flag -> run wine -> fail
etc.
I found in the documentation that the PARMRK flag duplicates 0xff in the stream to avoid confusion with the actual error character. I verified that by inspecting the stream with/without flag set. Can we simply clear the PARMRK flag in wine or is there something similar under windows?
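To make the PARMRK behaviour concrete, here is a small sketch of how a reader of the stream would have to undo the marking. It assumes the termios(3) semantics with IGNPAR and ISTRIP clear: a literal 0xFF arrives as 0xFF 0xFF, and a byte received with a parity or framing error arrives as 0xFF 0x00 followed by the byte. The function name `parmrk_decode` is made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Undo PARMRK marking. `out` must hold at least `n` bytes.
 * Returns the number of data bytes written; *errors counts marked bytes. */
size_t parmrk_decode(const unsigned char *in, size_t n,
                     unsigned char *out, size_t *errors)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL && errors != NULL);
    *errors = 0;
    while (i < n) {
        if (in[i] == 0xFF && i + 1 < n && in[i + 1] == 0xFF) {
            out[o++] = 0xFF;          /* doubled 0xFF is one literal data byte */
            i += 2;
        } else if (in[i] == 0xFF && i + 2 < n && in[i + 1] == 0x00) {
            out[o++] = in[i + 2];     /* byte that was received with an error */
            (*errors)++;
            i += 3;
        } else {
            out[o++] = in[i++];       /* ordinary data byte */
        }
    }
    return o;
}
```

An application that does not expect the marking, as seems to be the case here, sees every 0xFF twice, which matches the bogus 0xff bytes described above.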
Cheers, Cihan
"Cihan" == Cihan Altinay cihan@uq.edu.au writes:
...
Cihan> I studied the test cases in tests/comm.c but I am not sure how to implement a test that requires input from the serial port. I saw the loopback possibility but I cannot test it. Do I need to write a test case for the first issue as well (where *commio->buffer is written to after it is already freed)? It seems quite obvious that the thread may still be running after the client frees its buffers.
If you need input into the serial port, consider using some kind of loopback. Either use the plug with the appropriate pins shorted, or use two serial lines with a crossover cable.
Where do you live? I could consider sending you the plug.
Cihan> I found in the documentation that the PARMRK flag duplicates 0xff in the stream to avoid confusion with the actual error character. I verified that by inspecting the stream with/without flag set. Can we simply clear the PARMRK flag in wine or is there something similar under windows?
Clear all those offending flags and write a comment that somebody looking in the code can understand.
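A minimal sketch of what such a reset could look like on the POSIX side is shown below. It is not the patch from this thread: the function names are hypothetical, and whether INPCK should be cleared unconditionally or re-enabled when the client asks for parity checking is an assumption left open here.

```c
#include <assert.h>
#include <stddef.h>
#include <termios.h>

/* Put the input flags that alter the byte stream into a known state.
 * INPCK  - input parity checking (another program may have left it on)
 * PARMRK - marks parity/framing errors and doubles literal 0xFF bytes */
void clear_input_marking(struct termios *tio)
{
    assert(tio != NULL);
    tio->c_iflag &= ~(tcflag_t)(INPCK | PARMRK);
}

/* Apply the reset to an open port. Returns 0 on success, -1 on error. */
int reset_port_input_flags(int fd)
{
    struct termios tio;

    if (tcgetattr(fd, &tio) != 0)
        return -1;
    clear_input_marking(&tio);
    return tcsetattr(fd, TCSANOW, &tio);
}
```

The point of the design is that the flags are set to a fixed state rather than only ORed and ANDed with whatever a previous program left behind.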
Uwe Bonnes wrote:
If you need input into the serial port, consider using some kind of loopback. Either use the plug with the appropriate pins shorted, or use two serial lines with a crossover cable.
Where do you live? I could consider sending you the plug.
I am currently in Australia so I guess it wouldn't be possible although it would help a lot.
Cihan> I found in the documentation that the PARMRK flag duplicates 0xff in the stream to avoid confusion with the actual error character. I verified that by inspecting the stream with/without flag set. Can we simply clear the PARMRK flag in wine or is there something similar under windows?
Clear all those offending flags and write a comment that somebody looking in the code can understand.
Can I just do that around line 1106 where c_iflag is initialized? I will send a patch out shortly.
I am still trying to figure out what the problem might be when using purge to clear the output buffer. When G-Ware writes subsequently to the port it looks like this:
1) Purge (input/output)
2) Write
3) SetWaitMask (RXCHAR|RXFLAG|CTS|DSR|RLSD|BRK|ERR)
4) WaitCommEvent
4a) (Read input if there's something)
5) Goto 1)
As you can see there is no TXEMPTY flag so G-Ware seems to rely on the buffers to be emptied beforehand and then just calls Purge before writing again. Just to make sure I wrote a short program that does the following:
1) Write command
2) Purge
3) Write another command (without any delay)
Obviously some bytes from the first command are flushed away and thus the device indeed returns the same error value I get under G-Ware. I can prevent it by using tcdrain() or insert a sleep with a large enough value. I also tried to _see_ it by using a tty instead of the serial port but interestingly that does work, ie. no bytes are lost. Uwe, could you try and see what happens when you use the loopback and do a Write-Purge-Write sequence under both, wine and windows?
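The Write-Purge-Write sequence described above might look roughly like this on the POSIX side. This is a sketch, not the test program from the thread: `write_purge_write` and its `drain` switch are hypothetical, and short writes are simply treated as errors.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <termios.h>
#include <unistd.h>

/* Write `first`, purge the output queue, then write `second` without delay.
 * drain != 0 waits for `first` to be transmitted instead of discarding it.
 * Returns 0 on success, -1 on any error or short write. */
int write_purge_write(int fd, const void *first, size_t n1,
                      const void *second, size_t n2, int drain)
{
    assert(first != NULL && second != NULL);
    if (write(fd, first, n1) != (ssize_t)n1)
        return -1;
    /* With tcflush, any part of `first` still queued is lost here. */
    if ((drain ? tcdrain(fd) : tcflush(fd, TCOFLUSH)) != 0)
        return -1;
    if (write(fd, second, n2) != (ssize_t)n2)
        return -1;
    return 0;
}
```

Run against a loopback plug, comparing what is read back with `drain` set to 0 and to 1 would show whether bytes of the first command are dropped.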
One more thing regarding the page fault in the EventService: We obviously have to set the buffer to 0 when the event mask changes because that's what the API spec says. But maybe we have to monitor which threads exist and wake them up when SetCommMask is called so that they finish their work before SetCommMask returns. MSDN says: "...WaitCommEvent returns immediately"
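One possible shape for the "wake the waiters when the mask changes" idea is sketched below with a pthread condition variable and a generation counter. All names are hypothetical and this is not how wine implements it; it only illustrates a waiter returning 0 as soon as the mask is changed under it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

struct port_events {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    unsigned mask;        /* events the client is interested in */
    unsigned pending;     /* masked events seen since the last wait */
    unsigned generation;  /* incremented on every mask change */
};

#define PORT_EVENTS_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0, 0 }

/* Change the mask and wake every waiter so it can return at once. */
void port_set_mask(struct port_events *p, unsigned mask)
{
    assert(p != NULL);
    pthread_mutex_lock(&p->lock);
    p->mask = mask;
    p->pending &= mask;
    p->generation++;
    pthread_cond_broadcast(&p->changed);
    pthread_mutex_unlock(&p->lock);
}

/* Record events reported by the polling side; unmasked ones are dropped. */
void port_post(struct port_events *p, unsigned events)
{
    assert(p != NULL);
    pthread_mutex_lock(&p->lock);
    p->pending |= events & p->mask;
    pthread_cond_broadcast(&p->changed);
    pthread_mutex_unlock(&p->lock);
}

/* Block until a masked event arrives or the mask changes.
 * Returns the events, or 0 if the mask changed while waiting. */
unsigned port_wait(struct port_events *p)
{
    unsigned result, gen;

    assert(p != NULL);
    pthread_mutex_lock(&p->lock);
    gen = p->generation;
    while (p->pending == 0 && gen == p->generation)
        pthread_cond_wait(&p->changed, &p->lock);
    if (gen != p->generation) {
        result = 0;
    } else {
        result = p->pending;
        p->pending = 0;
    }
    pthread_mutex_unlock(&p->lock);
    return result;
}
```

The waiter returns a value instead of writing through a client pointer, so the caller decides whether the client's buffer is still valid. The sketch does not make `port_set_mask` wait until the woken threads have actually finished.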
Sorry for my long posts, I hope we can close this thread soon...
Cheers, Cihan
"Cihan" == Cihan Altinay cihan@uq.edu.au writes:
Cihan> Uwe Bonnes wrote:
>> If you need input into the serial port, consider using some kind of loopback. Either use the plug with the appropriate pins shorted, or use two serial lines with a crossover cable.
>> Where do you live? I could consider sending you the plug.
Cihan> I am currently in Australia so I guess it wouldn't be possible although it would help a lot.
It's only an RS232 9-pin connector with 3 solder blobs. If you have a soldering iron, it is easily done yourself.
Cihan> I found in the documentation that the PARMRK flag duplicates 0xff in the stream to avoid confusion with the actual error character. I verified that by inspecting the stream with/without flag set. Can we simply clear the PARMRK flag in wine or is there something similar under windows?
>> Clear all those offending flags and write a comment that somebody looking in the code can understand.
Cihan> Can I just do that around line 1106 where c_iflag is initialized? I will send a patch out shortly.
Looks reasonable
Cihan> I am still trying to figure out what the problem might be when using purge to clear the output buffer. When G-Ware writes subsequently to the port it looks like this:
Cihan> 1) Purge (input/output)
Cihan> 2) Write
Cihan> 3) SetWaitMask (RXCHAR|RXFLAG|CTS|DSR|RLSD|BRK|ERR)
Cihan> 4) WaitCommEvent
Cihan> 4a) (Read input if there's something)
Cihan> 5) Goto 1)
Cihan> As you can see there is no TXEMPTY flag so G-Ware seems to rely on the buffers to be emptied beforehand and then just calls Purge before writing again. Just to make sure I wrote a short program that does the following:
Cihan> 1) Write command
Cihan> 2) Purge
Cihan> 3) Write another command (without any delay)
Cihan> Obviously some bytes from the first command are flushed away and thus the device indeed returns the same error value I get under G-Ware.
Here you forgot the behaviour of the actual device. It probably only reacts when a complete command is written and terminated in some way, perhaps a CR or something like that. If G-Ware sends this "magic" byte last, there is no need for tcdrain or such...
Cihan> I can prevent it by using tcdrain() or insert a sleep with a large enough value. I also tried to _see_ it by using a tty instead of the serial port but interestingly that does work, ie. no bytes are lost. Uwe, could you try and see what happens when you use the loopback and do a Write-Purge-Write sequence under both, wine and windows?
I am sure that bytes will get lost without any kind of line discipline...
Cihan> One more thing regarding the page fault in the EventService: We obviously have to set the buffer to 0 when the event mask changes because that's what the API spec says. But maybe we have to monitor which threads exist and wake them up when SetCommMask is called so that they finish their work before SetCommMask returns. MSDN says: "...WaitCommEvent returns immediately"
Bye
Uwe Bonnes wrote:
Cihan> I am currently in Australia so I guess it wouldn't be possible although it would help a lot.
It's only an RS232 9-pin connector with 3 solder blobs. If you have a soldering iron, it is easily done yourself.
Ok, I'll see if it is still necessary.
Cihan> I am still trying to figure out what the problem might be when using purge to clear the output buffer. When G-Ware writes subsequently to the port it looks like this:
Cihan> 1) Purge (input/output)
Cihan> 2) Write
Cihan> 3) SetWaitMask (RXCHAR|RXFLAG|CTS|DSR|RLSD|BRK|ERR)
Cihan> 4) WaitCommEvent
Cihan> 4a) (Read input if there's something)
Cihan> 5) Goto 1)
Cihan> As you can see there is no TXEMPTY flag so G-Ware seems to rely on the buffers to be emptied beforehand and then just calls Purge before writing again. Just to make sure I wrote a short program that does the following:
Cihan> 1) Write command
Cihan> 2) Purge
Cihan> 3) Write another command (without any delay)
Cihan> Obviously some bytes from the first command are flushed away and thus the device indeed returns the same error value I get under G-Ware.
Here you forgot the behaviour of the actual device. It probably only reacts when a complete command is written and terminated in some way, perhaps a CR or something like that. If G-Ware sends this "magic" byte last, there is no need for tcdrain or such...
It's a bit more complicated. When connecting to the device G-Ware reads out all stored values (quite a lot of traffic). Most of the commands to do that are 16 or 96 bytes long and are not terminated by any character (the command length is fixed depending on the command code). It happens quite frequently that commands (mostly the shorter ones) are sent out one after another in one string (ie. 32/192 bytes) with no problem. So the device actually decodes the command and interprets the correct amount of bytes that follow as parameters or whatever and then continues with the next command after sending a reply. What seems to happen under wine is that a command string is put in the transmit buffer and before it is actually sent out completely the buffer is flushed and a new command is sent out resulting in an error because the parameters are wrong (or sometimes 'command too long').
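Why a partly flushed command leads to 'malformed' replies can be shown with a toy decoder for fixed-length commands. The command codes 0x10 and 0x60 below are invented for the illustration (the real codes are not given in this thread); only the 16 and 96 byte lengths come from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical code-to-length table: total command length in bytes,
 * or 0 for an unknown command code. */
size_t command_length(unsigned char code)
{
    switch (code) {
    case 0x10: return 16;
    case 0x60: return 96;
    default:   return 0;
    }
}

/* Walk a byte stream of back-to-back commands.
 * Returns the number of complete commands decoded; *malformed is set to 1
 * if an unknown code or a truncated command is encountered. */
size_t split_commands(const unsigned char *buf, size_t n, int *malformed)
{
    size_t pos = 0, count = 0;

    assert(buf != NULL && malformed != NULL);
    *malformed = 0;
    while (pos < n) {
        size_t len = command_length(buf[pos]);
        if (len == 0 || len > n - pos) {
            *malformed = 1;
            break;
        }
        pos += len;
        count++;
    }
    return count;
}
```

If the tail of one command is flushed before it is sent, the decoder consumes the start of the next command as parameters and then meets a byte that is not a valid command code, so everything after the gap is misread.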
What I don't know is why this happens under wine and I tried a lot to find out... any help would be appreciated.
Cihan> I can prevent it by using tcdrain() or insert a sleep with a large enough value. I also tried to _see_ it by using a tty instead of the serial port but interestingly that does work, ie. no bytes are lost. Uwe, could you try and see what happens when you use the loopback and do a Write-Purge-Write sequence under both, wine and windows?
I am sure that bytes will get lost without any kind of line discipline...
I would think so as well but maybe something that works differently under windows prevents this loss? Therefore I thought an actual test would clarify.
Cihan> One more thing regarding the page fault in the EventService: We obviously have to set the buffer to 0 when the event mask changes because that's what the API spec says. But maybe we have to monitor which threads exist and wake them up when SetCommMask is called so that they finish their work before SetCommMask returns. MSDN says: "...WaitCommEvent returns immediately"
Sorry, forgot to ask for comments here. Is it possible to do what I suggest? Does it sound right at all?
Thanks a lot, Cihan