Watchdog failures #171
I am already looking into this issue, but I don't know the root cause yet.
From my logs I can see that the protocol is correct, but the response is significantly delayed:
14:38:49.051 - Sending VR on init

It is interesting that I don't have this issue in my testing environment, only in production.
I have no knowledge of the module internals, but my best theory is that it puts all requests in a queue. On startup, we do a lot of attribute reads, which take time (especially if a remote device is off or has poor connection quality), so even the response to a local command gets delayed. The lock didn't help because it only covered commands, not all possible frames. I think the module is capable of processing several requests in parallel, but when their number gets high enough, the queue saturates. So I would suggest two things to address it:
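To make the queueing theory concrete, here is a toy model (not the actual XBee firmware or zigpy-xbee code) in which a single worker drains a FIFO queue. The names worker, request and main are hypothetical, and the 0.1 s per remote read is an arbitrary stand-in for a slow or unreachable device. It shows how a fast local command queued behind a burst of slow reads inherits their combined latency:

```python
import asyncio
import time


async def worker(queue: asyncio.Queue) -> None:
    # Hypothetical single consumer: requests are served strictly in FIFO order.
    while True:
        duration, done = await queue.get()
        await asyncio.sleep(duration)  # stand-in for the radio handling the request
        done.set_result(time.monotonic())
        queue.task_done()


async def request(queue: asyncio.Queue, duration: float) -> float:
    """Enqueue one request and return how long the caller waited for its reply."""
    start = time.monotonic()
    done = asyncio.get_running_loop().create_future()
    await queue.put((duration, done))
    finished = await done
    return finished - start


async def main() -> float:
    queue: asyncio.Queue = asyncio.Queue()
    consumer = asyncio.create_task(worker(queue))
    # Five slow remote attribute reads (e.g. to a device that is switched off)...
    slow = [asyncio.create_task(request(queue, 0.1)) for _ in range(5)]
    await asyncio.sleep(0)  # let the slow reads reach the queue first
    # ...followed by one fast local command, such as the VR watchdog probe.
    local_wait = await request(queue, 0.01)
    await asyncio.gather(*slow)
    consumer.cancel()
    return local_wait


if __name__ == "__main__":
    print(f"local command waited {asyncio.run(main()):.2f}s")
```

With five slow reads ahead of it, the local command waits roughly half a second even though it needs only 0.01 s itself. Scaled up to many reads against offline devices, the same effect could plausibly push a local reply past a watchdog timeout.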
@Shulyaka If you have time to test, see if zigpy==0.60.1 (https://github.com/zigpy/zigpy/releases/tag/0.60.1) fixes things for you. I think pausing the watchdog during high-load periods will be enough of a stopgap to fix the current issue.
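A minimal sketch of what pausing the watchdog during high-load periods could look like, assuming a periodic probe coroutine. The Watchdog class, its pause() context manager and the probe callable are hypothetical illustrations, not the zigpy 0.60.1 implementation:

```python
import asyncio
import contextlib
from typing import Awaitable, Callable, Tuple


class Watchdog:
    """Hypothetical watchdog that probes the radio periodically but skips
    probes while a high-load section (e.g. startup attribute reads) is active."""

    def __init__(self, probe: Callable[[], Awaitable[None]], interval: float) -> None:
        self._probe = probe        # e.g. a coroutine that sends the VR command
        self._interval = interval
        self._paused = 0           # counter, so pause() sections may nest
        self.probes_sent = 0

    @contextlib.contextmanager
    def pause(self):
        self._paused += 1
        try:
            yield
        finally:
            self._paused -= 1

    async def run(self) -> None:
        while True:
            await asyncio.sleep(self._interval)
            if self._paused:
                continue  # radio is busy; a slow reply now would be a false alarm
            await self._probe()
            self.probes_sent += 1


async def demo() -> Tuple[int, int]:
    async def probe() -> None:
        pass  # stand-in for a successful VR round trip

    watchdog = Watchdog(probe, interval=0.05)
    task = asyncio.create_task(watchdog.run())
    with watchdog.pause():
        await asyncio.sleep(0.3)  # simulated burst of startup requests
    during_load = watchdog.probes_sent
    await asyncio.sleep(0.3)      # normal operation: probes resume
    task.cancel()
    return during_load, watchdog.probes_sent
```

No probes go out while the pause is active, and they resume on the next tick afterwards. The catch is that a genuinely dead radio would also go unnoticed for the length of the pause.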
Will test today.
Nope, it does not solve the issue for XBee.
This seems like a bug in the radio library. Perhaps there should be a global concurrency limit for requests? It should be possible to multiplex Zigbee requests along with normal radio traffic without completely locking up the XBee like this for 60+ seconds.
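One way to express a global concurrency limit in asyncio is an asyncio.Semaphore around every outgoing request. This is a sketch under assumptions: LimitedSender, fake_send and the limit of 4 are hypothetical, and it is not necessarily how any actual fix implements the limit:

```python
import asyncio
from typing import Any, Awaitable, Callable


class LimitedSender:
    """Hypothetical wrapper that caps how many requests are in flight at once."""

    def __init__(self, send: Callable[..., Awaitable[Any]], max_concurrent: int) -> None:
        self._send = send
        self._sem = asyncio.Semaphore(max_concurrent)
        self.in_flight = 0
        self.peak = 0  # highest number of simultaneous requests observed

    async def request(self, *args: Any) -> Any:
        # Callers beyond the limit wait here instead of flooding the radio.
        async with self._sem:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await self._send(*args)
            finally:
                self.in_flight -= 1


async def demo() -> int:
    async def fake_send(n: int) -> int:
        await asyncio.sleep(0.01)  # stand-in for a radio round trip
        return n

    sender = LimitedSender(fake_send, max_concurrent=4)
    results = await asyncio.gather(*(sender.request(i) for i in range(20)))
    assert results == list(range(20))
    return sender.peak
```

A limit of 1 degenerates into a plain lock; larger values keep some parallelism while stopping a burst of startup reads from flooding the module. The right value for real XBee hardware would have to be found by testing.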
The specs don't mention such a limit; at least, I could not find one.
I've merged the PR to disable the watchdog for now as we can always re-enable it later. I think #173 may be the proper fix for this problem but I don't have a real XBee network to test it with. If you have time to try it out some time in the future, let me know how it goes (and what concurrency limit allows you to control a lot of devices at once without any retrying from ZHA).
#170 added a watchdog command (VR) that is sent every 30s to ensure the radio is still alive. It seems that the XBee serial protocol can't handle this and the command times out, causing a restart. If I add an asyncio.Lock around _at_partial, it doesn't seem to help either.

@Shulyaka Are you familiar with the serial protocol? Do you happen to know why this would be the case?
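The failure mode described here boils down to a probe with a deadline. The following sketch assumes hypothetical send_command and on_failure callables (not the real zigpy-xbee API) and shows that a radio which is alive but slow to answer is indistinguishable from a dead one once the reply misses the timeout:

```python
import asyncio
from typing import Awaitable, Callable, Tuple


async def watchdog_tick(
    send_command: Callable[[str], Awaitable[object]],
    on_failure: Callable[[], Awaitable[None]],
    timeout: float,
) -> bool:
    """One hypothetical watchdog step: send VR and require a reply within `timeout`."""
    try:
        await asyncio.wait_for(send_command("VR"), timeout)
        return True
    except asyncio.TimeoutError:
        await on_failure()  # in the real setup this is where the restart happens
        return False


async def demo(reply_delay: float) -> Tuple[bool, int]:
    restarts = 0

    async def send_command(name: str) -> None:
        await asyncio.sleep(reply_delay)  # the radio answers, just late

    async def on_failure() -> None:
        nonlocal restarts
        restarts += 1

    ok = await watchdog_tick(send_command, on_failure, timeout=0.1)
    return ok, restarts
```

A reply after 0.01 s passes; one after 0.3 s counts as a failure and triggers a restart, even though the simulated radio never stopped working.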
Below is my patchset to enable a send lock: