Author Topic: Odd behavior of Gateway after last F/W update  (Read 60 times)

Nick_W

  • Full Member
  • ***
  • Posts: 211
    • View Profile
Odd behavior of Gateway after last F/W update
« on: January 13, 2017, 02:38:21 pm »
I would like to report some strange thing I am seeing after the last couple of F/W updates:

First, after the last upgrade, my previous problems connecting to WiFI have disappeared! My gateway is now connecting to WiFi properly again. I think this confirms that something in one of the recent upgrades broke WiFi - probably something to do with the gateway registry, as that was erased in the last upgrade.

After reconnecting to each module, mqtt data started to flow again.

Now, since I have implemented my timestamping system, I can see the delays from module to gateway to broker to me. Usually in the range of 3 seconds (but some at 140s), which could just be clock synchronization offset (I'm looking at delivery variation, not absolute values). This compares the timestamp from my server (from NTP) with the module timestamp (from the onboard RTC, set via the gateway).

However, from time to time, there will be a period when no mqtt data arrives on schedule, but then later, 2 or more messages arrive at the same time. I can see from the timestamps that they were sent on time correctly, but the gateway seems to be caching the messages, then delivers 2 or sometimes 3 messages all at once.

See this example output from my logging program: The timestamp on the left is from my server receiving the data from the broker (running on the server), the decoded timestamp is from the module.

Code: [Select]
[I 2017-01-13 15:22:37,548] MCThings/000111BC/BatteryVoltage : 3596.0
[I 2017-01-13 15:22:37,548] MCThings/000111BC/Rssi           : -79
[I 2017-01-13 15:22:38,917] MCThings/000111BC/Data           : Decoded timestamp: 2017-01-13 15:19:48.727000
                                                               WARNING: Publishing Delay of : 170.191 s
                                                               {
                                                                 "time": "1484338788727",
                                                                 "Uptime": 73684,
                                                                 "UptimeString": "0:20:28:4",
                                                                 "BatteryVoltage": 3596,
                                                                 "Temperature": 23.125,
                                                                 "LowBattery": false
                                                               }
[I 2017-01-13 15:22:39,048] MCThings/000111BC/Data           : Decoded timestamp: 2017-01-13 15:20:18.776000
                                                               WARNING: Publishing Delay of : 140.273 s
                                                               {
                                                                 "time": "1484338818776",
                                                                 "Uptime": 73714,
                                                                 "UptimeString": "0:20:28:34",
                                                                 "BatteryVoltage": 3596,
                                                                 "Temperature": 23.125,
                                                                 "LowBattery": false
                                                               }
[I 2017-01-13 15:23:07,973] MCThings/000111BC/LowBattery     : 0.0
[I 2017-01-13 15:23:07,974] MCThings/000111BC/Rssi           : -79
[I 2017-01-13 15:23:13,294] MCThings/000111BC/LowBattery     : 0.0
[I 2017-01-13 15:23:13,294] MCThings/000111BC/Rssi           : -79
[I 2017-01-13 15:23:23,293] MCThings/000111BC/LowBattery     : 0.0
[I 2017-01-13 15:23:23,293] MCThings/000111BC/Rssi           : -79
[I 2017-01-13 15:23:37,898] MCThings/000111BC/LowBattery     : 0.0
[I 2017-01-13 15:23:37,898] MCThings/000111BC/Rssi           : -79
[I 2017-01-13 15:23:40,309] MCThings/000111BC/LowBattery     : 0.0
[I 2017-01-13 15:23:40,309] MCThings/000111BC/Rssi           : -79
[I 2017-01-13 15:23:41,114] MCThings/000111BC/Data           : Decoded timestamp: 2017-01-13 15:20:48.825000
                                                               WARNING: Publishing Delay of : 172.290 s
                                                               {
                                                                 "time": "1484338848825",
                                                                 "Uptime": 73744,
                                                                 "UptimeString": "0:20:29:4",
                                                                 "BatteryVoltage": 3596,
                                                                 "Temperature": 23.125,
                                                                 "LowBattery": false
                                                               }
[I 2017-01-13 15:23:41,243] MCThings/000111BC/Data           : Decoded timestamp: 2017-01-13 15:21:18.874000
                                                               WARNING: Publishing Delay of : 142.369 s
                                                               {
                                                                 "time": "1484338878874",
                                                                 "Uptime": 73774,
                                                                 "UptimeString": "0:20:29:34",
                                                                 "BatteryVoltage": 3596,
                                                                 "Temperature": 23.125,
                                                                 "LowBattery": false
                                                               }
[I 2017-01-13 15:24:07,943] MCThings/000111BC/BatteryVoltage : 3596.0
[I 2017-01-13 15:24:07,944] MCThings/000111BC/Rssi           : -79
[I 2017-01-13 15:24:09,936] MCThings/000111BC/Data           : Decoded timestamp: 2017-01-13 15:21:48.923000
                                                               WARNING: Publishing Delay of : 141.013 s
                                                               {
                                                                 "time": "1484338908923",
                                                                 "Uptime": 73804,
                                                                 "UptimeString": "0:20:30:4",
                                                                 "BatteryVoltage": 3596,
                                                                 "Temperature": 23.125,
                                                                 "LowBattery": false
                                                               }
[I 2017-01-13 15:24:23,292] MCThings/000111BC/BatteryVoltage : 3596.0
[I 2017-01-13 15:24:23,292] MCThings/000111BC/Rssi           : -77

This is from a module set to publish json data every 30 seconds (the other numbers are decoded beacons). I'm assuming that the 140s baseline is just clock offset (but another module gives a delay of 3 seconds - they were both uploaded today - so why the difference?).

Anyway, as you can see, the internal timestamps show that they are indeed sent 30s apart, but they are received in groups of 1,2 or sometimes 3.

My gateway is back in it's central location, and QoS is set to default (QoS.AtMostOnce I think.)

Is my gateway still messed up? it didn't use to do this before. All my modules are doing this, not just one. The only ones that seem to be OK, are the ones not sending json (but they don't have timestamps, so it's hard to tell).

Some modules mostly report correctly, with occasional "make up" publishing, others mostly publish in groups. They are all the same distance (with rssi in the -70 range) from the gateway.

I do seem to be missing some beacons also. This sort of thing was definitely not happening before the latest upgrades. For example, my door sensor would report door open/closed within 0.2 seconds of my other door sensors reporting. Now, I get a sequence of door open/close events minutes after to door has opened/closed.

This is a log from my door sensor (which has not updated it's RTC - not sure why...):

Code: [Select]
[I 2017-01-13 15:49:06,687] subscribing to MCThings/000111C2/#
[I 2017-01-13 15:49:08,820] MCThings/000111C2/Door : 0.0
[I 2017-01-13 15:49:08,820] MCThings/000111C2/Rssi : -72
[I 2017-01-13 15:49:10,075] MCThings/000111C2/Data : Decoded timestamp: 1999-12-31 17:20:07.264000
                                                     Date/time not updated on module
                                                     {
                                                       "time": "946678807264",
                                                       "Uptime": 8407,
                                                       "UptimeString": "0:2:20:7",
                                                       "BatteryVoltage": 2819,
                                                       "Temperature": 14.875,
                                                       "Door": false,
                                                       "KnockEnable": true,
                                                       "PublishEnable": true
                                                     }
[I 2017-01-13 15:49:20,132] MCThings/000111C2/Door : 1.0
[I 2017-01-13 15:49:20,132] MCThings/000111C2/Rssi : -81
[I 2017-01-13 15:49:23,778] MCThings/000111C2/Door : 0.0
[I 2017-01-13 15:49:23,778] MCThings/000111C2/Rssi : -72
[I 2017-01-13 15:50:01,183] MCThings/000111C2/Door : 0.0
[I 2017-01-13 15:50:01,183] MCThings/000111C2/Rssi : -72
[I 2017-01-13 15:50:08,991] MCThings/000111C2/Uptime : 8467.0
[I 2017-01-13 15:50:08,992] MCThings/000111C2/Rssi   : -72
[I 2017-01-13 15:50:09,898] MCThings/000111C2/Data   : Decoded timestamp: 1999-12-31 17:20:18.144000
                                                       Date/time not updated on module
                                                       {
                                                         "time": "946678818144",
                                                         "Doorknock": true
                                                       }
[I 2017-01-13 15:50:10,019] MCThings/000111C2/Data   : Decoded timestamp: 1999-12-31 17:20:18.259000
                                                       Date/time not updated on module
                                                       {
                                                         "time": "946678818259",
                                                         "Door": true
                                                       }
[I 2017-01-13 15:50:10,144] MCThings/000111C2/Data   : Decoded timestamp: 1999-12-31 17:20:20.850000
                                                       Date/time not updated on module
                                                       {
                                                         "time": "946678820850",
                                                         "Door": false
                                                       }
[I 2017-01-13 15:50:10,327] MCThings/000111C2/Data   : Decoded timestamp: 1999-12-31 17:21:07.311000
                                                       Date/time not updated on module
                                                       {
                                                         "time": "946678867311",
                                                         "Uptime": 8467,
                                                         "UptimeString": "0:2:21:7",
                                                         "BatteryVoltage": 2819,
                                                         "Temperature": 14.9375,
                                                         "Door": false,
                                                         "KnockEnable": true,
                                                         "PublishEnable": true
                                                       }

Note the "[I 2017-01-13 15:49:20,132] MCThings/000111C2/Door : 1.0" this is a beacon reporting the door open, then "[I 2017-01-13 15:49:23,778] MCThings/000111C2/Door : 0.0" (0.0 is closed). So the door opened at 15:49:20,132 and closed 3.5 seconds later. The MQTT doesn't show up until "[I 2017-01-13 15:50:10,019]" ie 50 seconds later, but the internal timestamps show the same 3.5 seconds difference, so they were sent correctly, at the correct time, but all arrive at once.

This is NOT what used to happen before the latest upgrades, and obviously makes a door sensor useless...

Any idea's what is going on? this is messing with my real-time data collection.
« Last Edit: January 13, 2017, 03:04:21 pm by Nick_W »

Social Buttons


Nick_W

  • Full Member
  • ***
  • Posts: 211
    • View Profile
Re: Odd behavior of Gateway after last F/W update
« Reply #1 on: January 13, 2017, 07:45:18 pm »
Update,

I can now confirm that all modules and all data types published by MQTT suffer from random delays of up to several minutes (not just json).

Sometimes they arrive immediately, sometimes delayed by seconds to minutes. Beacons always arrive immediately.

Please take a look at this, I have posed my code to github https://github.com/NickWaterton/mcScript The two projects that show this problem the easiest are DoorSensor (the 110 module version) and BatteryMonitor.

Please let me know I'm not living in a time warp...

Nick_W

  • Full Member
  • ***
  • Posts: 211
    • View Profile
Re: Odd behavior of Gateway after last F/W update
« Reply #2 on: January 14, 2017, 10:49:21 am »
Update from Nick stuck in his time loop.

After monitoring this overnight (to see how frequent it is), I have a synopsis of my log files:
Note I cut out all the good data, this is just showing delays, so it's not as bad as it looks!

The first entry is example typical good data.

Code: [Select]
[I 2017-01-14 10:14:41,196] MCThings/000111C2/Data           : Decoded timestamp: 2017-01-14 10:14:38.556000
                                                               WARNING: Publishing Delay of : 2.640 s
--
[I 2017-01-14 10:14:59,743] MCThings/000111BC/Data           : Decoded timestamp: 2017-01-14 10:14:29.870000
                                                               WARNING: Publishing Delay of : 29.873 s
--
[I 2017-01-14 10:14:59,860] MCThings/000111BC/Data           : Decoded timestamp: 2017-01-14 10:14:59.919000
                                                               Publishing Delay of          : -0.059 s
--
[I 2017-01-14 10:15:40,380] MCThings/000111B4/Data           : Decoded timestamp: 2017-01-14 10:14:38.728000
                                                               WARNING: Publishing Delay of : 61.652 s
--
[I 2017-01-14 10:15:40,621] MCThings/000111B4/Data           : Decoded timestamp: 2017-01-14 10:15:38.777000
                                                               Publishing Delay of          : 1.844 s
--
[I 2017-01-14 10:16:59,822] MCThings/000111BC/Data           : Decoded timestamp: 2017-01-14 10:15:29.968000
                                                               WARNING: Publishing Delay of : 89.854 s
--
[I 2017-01-14 10:17:00,074] MCThings/000111BC/Data           : Decoded timestamp: 2017-01-14 10:15:59.017000
                                                               WARNING: Publishing Delay of : 61.057 s
--
[I 2017-01-14 10:17:00,184] MCThings/000111BC/Data           : Decoded timestamp: 2017-01-14 10:16:30.074000
                                                               WARNING: Publishing Delay of : 30.110 s
--
[I 2017-01-14 10:17:00,318] MCThings/000111BC/Data           : Decoded timestamp: 2017-01-14 10:17:00.123000
                                                               Publishing Delay of          : 0.195 s
--
.
.
. long period of the same good/delayed data
.
.
.
[I 2017-01-14 11:00:33,974] MCThings/000111BC/Data           : Decoded timestamp: 2017-01-14 11:00:04.399000
                                                               WARNING: Publishing Delay of : 29.575 s
--
[I 2017-01-14 11:00:34,099] MCThings/000111BC/Data           : Decoded timestamp: 2017-01-14 11:00:34.448000
                                                               Publishing Delay of          : -0.349 s
--
[I 2017-01-14 11:02:04,186] MCThings/000111BC/Data           : Decoded timestamp: 2017-01-14 11:01:34.546000
                                                               WARNING: Publishing Delay of : 29.640 s
--
[I 2017-01-14 11:02:04,315] MCThings/000111BC/Data           : Decoded timestamp: 2017-01-14 11:02:04.599000
                                                               Publishing Delay of          : -0.284 s
--
[I 2017-01-14 11:02:41,863] MCThings/000111B4/Data           : Decoded timestamp: 2017-01-14 11:01:40.128000
                                                               WARNING: Publishing Delay of : 61.735 s
--
[I 2017-01-14 11:02:41,987] MCThings/000111B4/Data           : Decoded timestamp: 2017-01-14 11:02:40.178000
                                                               Publishing Delay of          : 1.809 s
--
[I 2017-01-14 11:02:42,912] MCThings/000111C2/Data           : Decoded timestamp: 2017-01-14 11:01:40.924000
                                                               WARNING: Publishing Delay of : 61.988 s
--
[I 2017-01-14 11:02:43,046] MCThings/000111C2/Data           : Decoded timestamp: 2017-01-14 11:02:40.973000
                                                               WARNING: Publishing Delay of : 2.073 s

Module BC is a battery monitor, so delayed messages don't really matter, so much. C2 and B4 are door monitors (running the same code), delays here are a problem (but the beacons arrive immediately, so there is a sort of workaround).

It seems to me there are a couple of possibilities (OK more than a couple):
  • This is behavior that was always happening, but messages were being lost, not cached, so in fact this is an improvement, in that the messages do eventually arrive.
  • This is some sort of continuation of my Gateway WiFi issues, that have not been properly explained yet, and messages that can't be sent immediately (via WiFi) are cached for later retry. I really need a second gateway to see if this has the same problem.
  • A fix for the "stops publishing" problem has introduced another problem where messages are now delayed (instead of the module stopping publishing) which is better, but still something of an issue.
  • This is a QoS issue, which I could fix if I understood QoS better. So far QoS seems to make no difference to the delay problem.
  • This is a LP network issue, which is now showing up as I'm sending longer (json) messages, and the retry mechanism is "resend failed messages on the next connection".

What I don't understand is whether the delayed message is cached on the module, or the gateway. If it was the gateway, it could resend immediately (and QoS would help). If it's cached on the module, does this indicate a LP network connectivity issue (in which case why do beacons arrive immediately, and mqtt messages don't?). Is this a new thing I'm seeing since I switched to sending json (with timestamps), because json is much longer to send than beacons/short data - ie just publishing a value works, but sending a long json string fails partway through and is queued for retry the next time the module sends? I'm tending to think this is what is happening (ie option 5 above).

Uptime continues to be fantastic, literally weeks without a reset (well done whoever figured that out).
Battery life on the 120 module is holding good for now at 2.881V after 3 days (I need a few weeks to be able to say for sure this problem is fixed), but so far so good.

I may actually be able to get my 120 module accelerator library finished, and unlimber my 205 device...

Keep up the good work guys, and sorry to be the reporter of "weird stuff".
« Last Edit: January 14, 2017, 11:02:31 am by Nick_W »