Update 22nd Jan

VK3KYY
Posts: 7481
Joined: Sat Nov 16, 2019 3:25 am
Location: Melbourne, Australia

Re: Update 22nd Jan

Post by VK3KYY » Sat Jan 25, 2020 8:33 am

One thing that we could change, to see if it makes a difference, is the Watchdog timer, to allow a longer processing time before the radio is rebooted.

The current timeout value is

config.timeoutValue = 0x3ffU;

But I don't know what the units are for this.

F1RMB
Posts: 2518
Joined: Sat Nov 16, 2019 5:42 am
Location: Grenoble, France

Re: Update 22nd Jan

Post by F1RMB » Sat Jan 25, 2020 9:19 am

VK3KYY wrote:
Sat Jan 25, 2020 8:33 am
One thing that we could change, to see if it makes a difference, is the Watchdog timer, to allow a longer processing time before the radio is rebooted.

The current timeout value is

config.timeoutValue = 0x3ffU;

But I don't know what the units are for this.
According to the datasheet (and if I'm not wrong), it's currently using the LPO, running at 1kHz.

VK3KYY
Posts: 7481
Joined: Sat Nov 16, 2019 3:25 am
Location: Melbourne, Australia

Re: Update 22nd Jan

Post by VK3KYY » Sat Jan 25, 2020 9:32 am

OK.

So that would be just over 1 second.

Ummm. If the radio has locked up for that long, I think it's got a big problem ;-)

So, I don't think adjusting the watchdog is the solution to this.
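
For reference, the arithmetic behind "just over 1 second" can be sketched as below. This assumes the watchdog counts the 1 kHz LPO directly with no prescaler, which the thread suggests but does not confirm; `wdog_timeout_ms` is a hypothetical helper, not a function from the firmware or the NXP SDK.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a watchdog timeout value (in clock ticks) to milliseconds.
 * clock_hz is the watchdog clock frequency; 1000 Hz is assumed for the LPO. */
static uint32_t wdog_timeout_ms(uint32_t timeout_ticks, uint32_t clock_hz)
{
    assert(clock_hz != 0U);
    /* Widen to 64 bits so the multiplication cannot overflow. */
    return (uint32_t)(((uint64_t)timeout_ticks * 1000U) / clock_hz);
}
```

With `timeout_ticks = 0x3ffU` (1023 decimal) and `clock_hz = 1000U`, this gives 1023 ms.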

F6GVE
Posts: 84
Joined: Sat Nov 16, 2019 8:52 am

Re: Update 22nd Jan

Post by F6GVE » Sat Jan 25, 2020 10:05 am

F6GVE wrote:
Fri Jan 24, 2020 5:20 pm
Some hours ago, it crashed repetitively, it doesn't any longer...
I tried commercial repeaters, amateur repeaters, when the frequency is free of DMR, or not, and I could not make it crash again.
The only thing I can say is that the scan speed is a bit higher if the screen indicates TG99 instead of TG20883...
(I used 2 openGD77 to compare, to monitor the frequency, etc.)
I'll try again with 100% charged battery ...
Sorry, the CC scan no longer crashes the firmware. Maybe there was a special condition at that moment.
The only difference that I know I have made is adding a channel to my codeplug.

VK3KYY
Posts: 7481
Joined: Sat Nov 16, 2019 3:25 am
Location: Melbourne, Australia

Re: Update 22nd Jan

Post by VK3KYY » Sat Jan 25, 2020 10:15 am

OK.

Sometimes the crashes can be random or have many factors which are difficult to re-create.

G4EML
Posts: 919
Joined: Sat Nov 16, 2019 10:01 am

Re: Update 22nd Jan

Post by G4EML » Sat Jan 25, 2020 11:12 am

From my observation of the radio crashing, it first appears to freeze (for probably about a second), then resets. So it looks like the watchdog is doing its job.
The problem is more likely in the task scheduling code. I have not looked closely at that. Was that something written by Kai or is it part of the NXP code? Do we know how the task time allocation is done and what happens if a task overruns?

VK3KYY
Posts: 7481
Joined: Sat Nov 16, 2019 3:25 am
Location: Melbourne, Australia

Re: Update 22nd Jan

Post by VK3KYY » Sat Jan 25, 2020 9:00 pm

Hi Colin

The reboot after 1 second is because the Watchdog timer has not been fed.
The timeout on the watchdog seems to be 1023 ms (I have no idea why such a strange number is used).

The firmware uses FreeRTOS, and runs things in at least 4 separate tasks.

I have not looked at the RTOS in detail, but it appears to be pre-emptive, and can cause task switching in the middle of any block of code (except ISRs), unless the code is wrapped in the Critical entry and exit commands.


In this case, I think the code must have hung somehow, possibly inside a Critical block, because if the code had hung in a non-critical block, the task scheduler should still be running the watchdog task to feed the watchdog.
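
That reasoning can be illustrated with a toy model. This is not FreeRTOS code and not the firmware's scheduler: it assumes one tick is 1 ms, that the watchdog task gets to run on every tick while task switching is possible, and that a hang inside a Critical block stops all task switching. `sim_watchdog` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WDOG_TIMEOUT_TICKS 0x3ffU  /* value from the thread; 1 tick assumed to be 1 ms */

/* Toy model: returns the tick at which the watchdog would reset the radio,
 * or -1 if the radio survives for run_ticks.
 * hang_tick:        tick at which one task stops making progress.
 * hang_in_critical: true if the hang happens with task switching disabled. */
static int32_t sim_watchdog(uint32_t run_ticks, uint32_t hang_tick,
                            bool hang_in_critical)
{
    uint32_t wdog_counter = 0U;

    for (uint32_t tick = 0U; tick < run_ticks; tick++) {
        bool scheduler_blocked = hang_in_critical && (tick >= hang_tick);

        if (!scheduler_blocked) {
            /* The scheduler can still switch to the watchdog task, which feeds it. */
            wdog_counter = 0U;
        } else {
            /* Nothing else can run, so the watchdog counts up unfed. */
            wdog_counter++;
            if (wdog_counter >= WDOG_TIMEOUT_TICKS) {
                return (int32_t)tick;
            }
        }
    }
    return -1;
}
```

In this model a hang outside a Critical block never triggers a reset, while a hang inside one at tick 100 triggers a reset 1023 ticks after the hang begins, which would match the freeze of about a second followed by a reboot.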

The problem is possibly bus contention on the I2C bus, although I put in some code to attempt to prevent simultaneous access to the bus.
I don't think all the code that uses the I2C devices checks the return code from the function to confirm that the function succeeded.
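
As a minimal sketch of what checking the return code could look like: the function type `i2c_read_fn`, the wrapper `i2c_read_checked`, and the stand-in `fake_i2c_read` are all hypothetical and are not the firmware's actual I2C API. The idea is only that a failed transfer is reported to the caller rather than silently ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transfer signature: returns true on success. */
typedef bool (*i2c_read_fn)(uint8_t dev_addr, uint8_t reg, uint8_t *out);

/* Try the read up to max_attempts times and report failure to the caller
 * instead of carrying on with whatever happens to be in *out. */
static bool i2c_read_checked(i2c_read_fn read, uint8_t dev_addr, uint8_t reg,
                             uint8_t *out, unsigned max_attempts)
{
    assert(read != NULL && out != NULL);

    for (unsigned attempt = 0U; attempt < max_attempts; attempt++) {
        if (read(dev_addr, reg, out)) {
            return true;
        }
    }
    return false;  /* caller must handle this, e.g. skip the update */
}

/* Stand-in for a bus that is busy for the first two attempts. */
static unsigned fake_calls = 0U;

static bool fake_i2c_read(uint8_t dev_addr, uint8_t reg, uint8_t *out)
{
    (void)dev_addr;
    (void)reg;
    fake_calls++;
    if (fake_calls <= 2U) {
        return false;  /* simulated contention */
    }
    *out = 0x5AU;      /* arbitrary test value */
    return true;
}
```

A caller would then test the result, for example `if (!i2c_read_checked(fake_i2c_read, 0x10U, 0x00U, &value, 3U)) { /* handle the error */ }`, where the address and register are arbitrary placeholders.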

kd2lh
Posts: 312
Joined: Mon Dec 02, 2019 2:44 pm

Node lockup Re: Update 22nd Jan

Post by kd2lh » Sat Jan 25, 2020 10:46 pm

My GD77 just dropped out of hotspot mode. Firmware is built 1/23/2020 [ 3bab51e ]

It had been operating for several hours, and PiStar shows nothing abnormal (including the last conversation in the live log).

When I checked the GD77 display just now, it had defaulted back to normal mode.

Restarting MMDVMHost (by using PiStar update) does NOT return the radio to Hotspot mode.

Radio Info Trx shows green "Listening" and Fw: OpenGD77:v0.0.82

DMR and DMR Net now are RED.

I'll power cycle the radio next. It does not switch back to Hotspot mode automatically. Logs show:
Cannot open device - /dev/ttyACM0

Now restarting MMDVMHost on PiStar.

Same problem. Perhaps USB is blocked at the PiStar Raspberry Pi end?

Rebooting PiStar.

This resolved the problem without having to power cycle the GD77 again.

This node had been running for at least 48 hours at the time things locked up.

F1RMB
Posts: 2518
Joined: Sat Nov 16, 2019 5:42 am
Location: Grenoble, France

Re: Update 22nd Jan

Post by F1RMB » Sat Jan 25, 2020 10:54 pm

Hi Marc,

Did you touch the USB A connector, or hit the table?
I found the USB connector really loose. I have experienced a lot of communication problems when the GD is connected to my laptop and I'm typing on the keyboard.

In MMDVM mode, the hotspot will go back to radio mode if MMDVM stops communicating for 10 seconds. Did you see the GD rebooting? If it didn't and the /dev/ttyACM_ changed, it's typically a USB connection problem.


Cheers.

kd2lh
Posts: 312
Joined: Mon Dec 02, 2019 2:44 pm

Re: Update 22nd Jan

Post by kd2lh » Sun Jan 26, 2020 12:04 am

The connectors were solidly connected, and the room was isolated with nobody in there until I came in later this afternoon and noticed the GD77 display was back on 146.52 analog (what the radio defaults to).

I did not witness the GD77 reboot, only the aftermath.
