"ASCII-fying" DMRId database
Posted: Wed Jan 10, 2024 4:26 pm
Hi!
I've run some experiments to find a better way of converting the data in the DMRId.net user database to the limited (ASCII alphanumeric?) character set used by the "compressed" OpenGD77 in-radio format. The current "RemoveDiactritics" function isn't great at converting non-Latin characters to ASCII (e.g. the Greek alphabet), so I thought I would contribute a substitute that does a better job.
The attached program, when run against the user.csv database file, successfully ASCII-fies all but ~500 entries. Those entries are mostly in Asian scripts that cannot easily be transliterated into the Latin alphabet, plus some instances of incorrect UTF-8 (e.g. double-encoded German characters). The characters that cannot be converted are ignored/removed, since the radio cannot display them anyway.
I hope this can be included in a future CPS release, to better represent European languages in the limited available character set.
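For readers without the attachment, here is a minimal Python sketch of this kind of conversion. It is not the attached program: the function name `asciify` and both lookup tables are hypothetical and deliberately incomplete, and the Greek romanization is only one possible choice. It decomposes accented letters, drops the combining marks, maps a few letters that have no decomposition, and removes whatever is still non-ASCII.

```python
import unicodedata

# Hypothetical, incomplete Greek-to-Latin table (illustration only).
GREEK_TO_LATIN = {
    "α": "a", "β": "b", "γ": "g", "δ": "d", "ε": "e", "ζ": "z",
    "η": "i", "θ": "th", "ι": "i", "κ": "k", "λ": "l", "μ": "m",
    "ν": "n", "ξ": "x", "ο": "o", "π": "p", "ρ": "r", "σ": "s",
    "ς": "s", "τ": "t", "υ": "y", "φ": "f", "χ": "ch", "ψ": "ps",
    "ω": "o",
}

# Latin letters that Unicode normalization does not decompose (also incomplete).
SPECIAL = {
    "ß": "ss", "ø": "o", "Ø": "O", "æ": "ae", "Æ": "AE",
    "ł": "l", "Ł": "L", "đ": "d", "Đ": "D",
}


def asciify(text):
    """Return an ASCII-only approximation of text (assumed helper name)."""
    out = []
    # NFKD splits e.g. "ö" into "o" followed by a combining diaeresis.
    for ch in unicodedata.normalize("NFKD", text):
        if unicodedata.combining(ch):
            continue  # drop accents, tonos, and other combining marks
        if ch.isascii():
            out.append(ch)
        elif ch in SPECIAL:
            out.append(SPECIAL[ch])
        elif ch.lower() in GREEK_TO_LATIN:
            latin = GREEK_TO_LATIN[ch.lower()]
            out.append(latin.capitalize() if ch.isupper() else latin)
        # Anything else (e.g. CJK) is removed, as the radio cannot show it.
    return "".join(out)


if __name__ == "__main__":
    for name in ("Åsa Öberg", "Γιώργος", "Straße"):
        print(name, "->", asciify(name))
```

A real implementation would need much larger tables (Cyrillic, for example) and a decision on how to handle entries that end up empty after conversion.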
73 de SA0ASM