#INTRO
After digging around for a while, I've found where the issue comes from for both `.html` and `.py` files (bug #1857824).
#SHORT
The culprit responsible for the misidentification is the `.xml` database that specifies how MIME types are matched against input data. It can be found here [2].
#LONG
The `kmimetypefinder.cpp` tool calls [0] the `QMimeDatabase db` APIs via `db.mimeTypeForFile(...)`, which in turn bootstraps `QMimeDatabasePrivate ...` from the `.xml` database files [1].
If we look carefully at the `"text/x-perl"` entry, we see the following:
```xml
<alias type="text/x-perl" />
<magic priority="50">
...
<match value="use strict" type="string" offset="0:256"/>
...
</magic>
```
Did you notice the offset attribute `"0:256"`? It means the rule fires whenever `use strict` starts at any byte offset from 0 to 256 (inclusive). Running the following two cases confirms it: a file whose `use strict` starts within that range is identified as a Perl script (`application/x-perl`, of which `text/x-perl` is an alias), while the same file with `use strict` pushed one byte past the range is correctly identified as `text/html`. Check it out:
```
💲 tee "index.html" <<eol ; echo -e "\n"; kmimetypefinder5 index.html
`printf "_"%.0s {1..256}`use strict
eol
application/x-perl # <- OUTPUT IS WRONG ⚠️
💲 tee "index.html" <<eol ; echo -e "\n"; kmimetypefinder5 index.html
`printf "_"%.0s {1..257}`use strict
eol
text/html # <- OUTPUT IS CORRECT!!! ✅ - Surprising, huh? 😏
```
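To make the offset semantics concrete, here is a minimal bash sketch that emulates a `string` match with an `offset="START:END"` range: the value matches if it begins at any byte offset from START to END inclusive, as the shared-mime-info format defines ranges. It is not Qt's implementation; the helper name `magic_string_match` is hypothetical, and the sketch assumes a plain-text file without NUL bytes. It reproduces the 256/257 boundary without `kmimetypefinder5`:

```shell
#!/usr/bin/env bash
# Minimal sketch, NOT Qt's implementation: emulate a shared-mime-info
#   <match type="string" value="VALUE" offset="START:END"/>
# rule. The rule matches if VALUE begins at any byte offset START..END (inclusive).
# Assumption: FILE is plain text without NUL bytes (bash strings cannot hold NUL).
export LC_ALL=C  # make ${#var} and ${var:offset:length} count bytes

# magic_string_match FILE VALUE START END  (hypothetical helper name)
magic_string_match() {
  local file=$1 value=$2 start=$3 end=$4
  local len=${#value} window o
  # Read only the bytes a match starting at END could touch; the trailing "x"
  # stops command substitution from stripping trailing newlines.
  window=$(head -c "$((end + len))" -- "$file"; printf x)
  window=${window%x}
  # Try every start offset in the inclusive range.
  for ((o = start; o <= end; o++)); do
    [[ ${window:o:len} == "$value" ]] && return 0
  done
  return 1
}

# Reproduce the two cases above: "use strict" at byte 256, then at byte 257.
for n in 256 257; do
  f=$(mktemp)
  { printf '_%.0s' $(seq "$n"); printf 'use strict\n'; } > "$f"
  if magic_string_match "$f" 'use strict' 0 256; then
    echo "$n underscores: Perl rule matches"
  else
    echo "$n underscores: Perl rule does not match"
  fi
  rm -f -- "$f"
done
```

Running it reports a match for 256 underscores and no match for 257, mirroring the `kmimetypefinder5` results above. Checking each offset explicitly (rather than searching for the first occurrence) keeps the sketch correct for values that can overlap themselves.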
#CONCLUSION
This shows that the bug comes from the qtbase MIME database, whose Perl magic rule wrongly matches the `use strict` keyword found in JavaScript. In JS, the `'use strict'` directive has to be placed at the top of a script (or function body) to take effect, so it often lands within the first 256 bytes, exactly where the Perl rule looks. The keyword simply overlaps between the two languages. I think an appropriate bug should be opened in the Qt bug tracker.
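To see which files in a project this rule could claim, a quick bash sketch can report the byte offset of the first `use strict` in each file and flag those starting within 0..256. The helper name `perl_magic_risk` is hypothetical, and it assumes GNU grep (`-b` combined with `-o` prints the byte offset of each match). A hit only means the Perl magic rule matches; the final type still depends on how the database resolves globs and priorities.

```shell
#!/usr/bin/env bash
# Minimal sketch: flag files whose first "use strict" starts at byte 0..256,
# i.e. where the Perl magic rule quoted above would match.
# Assumes GNU grep: with -o, -b prints the byte offset of each match itself;
# -a treats binary data as text, -F matches the literal string.

# perl_magic_risk FILE...  (hypothetical helper name)
perl_magic_risk() {
  local f off
  for f in "$@"; do
    # The first output line is the leftmost match, formatted "OFFSET:use strict".
    off=$(grep -aboF -- 'use strict' "$f" | head -n 1 | cut -d: -f1)
    if [[ -n $off && $off -le 256 ]]; then
      printf '%s: "use strict" at byte %s (within 0:256)\n' "$f" "$off"
    fi
  done
}
```

For example, `perl_magic_risk *.html *.js *.py` lists only the files whose `use strict` sits inside the range the Perl rule inspects.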
[0]: https://github.com/KDE/kde-cli-tools/blob/master/kmimetypefinder/kmimetypefinder.cpp
[1]: https://github.com/qt/qtbase/blob/03dfd4199deb4a0f5123fb1eead42f7e1f85e9e3/src/corelib/mimetypes/qmimedatabase.cpp#L102
[2]: https://github.com/qt/qtbase/tree/03dfd4199deb4a0f5123fb1eead42f7e1f85e9e3/src/corelib/mimetypes/mime/packages