Invalid UTF-8 data is not always being rejected
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
AirDC++ | Fix Released | Undecided | Unassigned |
DC++ | New | Undecided | Unassigned |
Bug Description
There are various cases where invalid UTF-8 data is being consumed by the core:
1. Text::convert returns the original string when conversion fails (Linux only; the corresponding Windows-specific functions return an empty string on error)
2. When the "utf-8" encoding is used in NMDC hubs, the conversion functions always return the original string without validating it (generally Linux only, since "utf-8" can't be selected from DC++'s GUI)
3. UTF-8 validation is not performed for strings parsed from XML (specifically file/directory names in filelists)
This will cause issues especially when the data is processed by external sources/libraries that expect to receive valid UTF-8 data (https:/
Another note: messages that fail UTF-8 validation in ADC hubs are silently ignored. At least Flexhub seems to have problems with data validation, and these currently go unnoticed.
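For illustration, here is a minimal sketch of the kind of strict check the cases above are missing. It is not the code used in DC++ or AirDC++; `isValidUtf8` and `utf8OrEmpty` are hypothetical names, and the wrapper simply assumes that returning an empty string on failure (as the Windows conversion functions are described to do) is the desired policy.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of DC++/AirDC++): returns true only if `s`
// is well-formed UTF-8 as defined by RFC 3629, i.e. no overlong encodings,
// no UTF-16 surrogates and no code points above U+10FFFF.
bool isValidUtf8(const std::string& s) {
    const auto* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;

    while (i < n) {
        const unsigned char c = p[i];
        if (c < 0x80) { ++i; continue; }   // plain ASCII byte

        std::size_t len;      // total number of bytes in this sequence
        std::uint32_t cp;     // code point decoded so far
        std::uint32_t minCp;  // smallest code point allowed for this length
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; minCp = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; minCp = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; minCp = 0x10000; }
        else return false;    // stray continuation byte or invalid lead byte

        if (n - i < len) return false;     // sequence is truncated

        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = p[i + k];
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < minCp) return false;                     // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate range
        if (cp > 0x10FFFF) return false;                  // beyond Unicode
        i += len;
    }
    return true;
}

// Hypothetical wrapper: instead of passing the original bytes through on
// failure, hand back an empty string so invalid data never reaches the core.
std::string utf8OrEmpty(const std::string& s) {
    return isValidUtf8(s) ? s : std::string();
}
```

A check like this would have to run on every path listed above (failed conversions, the "utf-8" passthrough in NMDC hubs, and names parsed from XML) to be effective; whether to drop, replace or sanitize the rejected string is a separate design choice.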
Changed in airdcpp:
status: New → Fix Released
I've implemented UTF-8 validation for XML parsing in AirDC++: https://github.com/airdcpp/airgit/commit/82e02ff9f7023a4aa5a35b61c4cd313b9fed59a6
The only issue I've noticed is that the default hublist (http://dchublist.com/hublist.xml.bz2) won't pass the validation.