Ubuntu
robot-detection package

[MIR] robot-detection as dependency of mailman3

Bug #1820225 reported by Christian Ehrhardt  on 2019-03-15

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	robot-detection (Ubuntu)	In Progress	Undecided	Unassigned

Bug Description

[Availability]
The package is already universe for quite a while and build/works fine so far.
It is for example already used for https://lists.canonical.com/mailman3/postorius/lists/
OTOH it is a library that can/could be used for much more than just the mailman3 stack.

It builds on amd64 only (arch:all)

This package builds python2 and python3 binaries, but the transition to mailman3 will only pull in the python3 binaries.

[Rationale]
This is part of the MIR activity for all dependencies of mailman3
The "main" MIR of it is at bug 1775427:

Mailman (2) has only python2 support, but we strive for python3,
therefore Mailman3 which has python3 support should be promoted to main.

[Security]

No known CVEs found.

The purpose is to detect robot usage (just on user agent, no heuristics fortunately).
That still means remotely crafted UA which means a higher level check on security might be needed.
OTOH the code is very trivial evtually is it s strcmp against a known list and that is it.
So I assume the checks should be really fast and simple.

[Quality assurance]

As part of the mailman3 stacks as of now (Disco) this installs fine and works fine.
On itself it is useful to (many) other dependencies and does not need a post install configuration on its own.

The package does not ask debconf questions.

No bugs reported against Ubuntu, Debian or Upstream.

There was no activity upstream or in Debian since its initial packaging.
Yet given its trivial funtion that might be ok still.

No exotic HW involved.

The package utilizes build time self tests.

d/watch is set up and ok.

No Lintian warning except no GPG checks on d/watch and minor old VCS/Compat - nothing severe.

The package does not rely on demoted or obsolete packages.
py2 packages in this src, but as mentioned we won't pull them into main.
No new gt2k dependencies

[UI standards]

This is a low level library without (a lot) of user visible strings - no translations (needed).
No End-user applications that needs a standard conformant desktop file.

[Dependencies]

Some dependencies are not in main, but we drive MIR for all related packages
that are not in main at the same time.
Please check the list of bugs from the main Mailman3 MIR in bug 1775427 to get an overview.

[Standards compliance]
The package meets the FHS and Debian Policy standards.
The packaging itself is very straight forward and uses dh_* as much as possible - the d/rules fits on one screen.

[Maintenance]

The Server team will subscribe for the package for maintenance, but in
general it seems low on updates and currently is a sync from Debian.

[Background]
The package description explains the general purpose and context of the package well.

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2019-03-20:

[Duplication]
No duplication of that functionality in the Archive in general or main in particular.

[Embedded sources and static linking]
This package does not contain embedded library sources.
This package doe not statically link to libraries.
No Go package

[Security]
I can confirm that there seems to be no CVE/Security history for this package.
It Does not:
- run a daemon as root
- uses old webkit
- uses lib*v8 directly
- open a port
- integrates arbitrary javascript into the desktop
- deals with system authentication
- uses centralized online accounts

But it does:
- processes arbitrary web content
- parse data formats

This is essentially just a strcmp on the User Agent, but since it is craftable content lets add a security review to be sure (maybe the smallest you have ever had).

[Common blockers]
- builds fine at the moment
- server Team committed to subscribe once this gets promoted (enough for now)
- code is not user visible, no translation needed
- dh_python is used
- package produces python2 bits, but they are not pulled into main by mailman3
- utilizes build time self tests

[Packaging red flags]
- no current ubuntu Delta to evaluate
- no library with classic symbol tracking
- watch file is present
- Lintian warnings are present bug ok
- debian/rules is rather clean
- no usage of Built-Using
- no golang package that would make things harder

[Upstream red flags]
- no suspicious errors during build
- there is an intresting warning thou "UserWarning: Unknown distribution option: 'summary'". That is a feature mismatch of setup.py/distutils/... but newer distutils have that supported and it has no major effect
- it is pure python, so no incautious use of malloc/sprintf
- no use of sudo, gksu
- no use of pkexec
- no use of LD_LIBRARY_PATH
- no important open bugs
- no Dependency on webkit, qtwebkit, libgoa-*
- no embedded copies in upstream either

[Summary]
Ack from the MIR-Teams POV, but as outlined above a security review is recommended.
Assigning the security Team.

[Duplication]
No duplication of that functionality in the Archive in general or main in particular.

[Embedded sources and static linking]
This package does not contain embedded library sources.
This package doe not statically link to libraries.
No Go package

But it does:
- processes arbitrary web content
- parse data formats

This is essentially just a strcmp on the User Agent, but since it is craftable content lets add a security review to be sure (maybe the smallest you have ever had).

[Upstream red flags]
- no suspicious errors during build
  - there is an intresting warning thou "UserWarning: Unknown distribution option: 'summary'". That is a feature mismatch of setup.py/distutils/... but newer distutils have that supported and it has no major effect
- it is pure python, so no incautious use of malloc/sprintf
- no use of sudo, gksu
- no use of pkexec
- no use of LD_LIBRARY_PATH
- no important open bugs
- no Dependency on webkit, qtwebkit, libgoa-*
- no embedded copies in upstream either

[Summary]
Ack from the MIR-Teams POV, but as outlined above a security review is recommended.
Assigning the security Team.

Changed in robot-detection (Ubuntu):
assignee:	nobody → Ubuntu Security Team (ubuntu-security)

Revision history for this message

Mike Salvatore (mikesalvatore) wrote on 2019-04-22:

I reviewed robot-detection version 0.4.0-1 as checked into disco. This
should not be considered a full security audit, but rather a quick gauge
of maintainability.

- There are no prior CVEs against robot-detection
- Build depends:
           debhelper (>= 9),
           dh-python,
           python-all,
           python-setuptools,
           python-six,
           python3-all,
           python3-setuptools,
           python3-s

- does not daemonize
- no initscripts
- no dbus services
- no setuid files
- no sudo fragments
- no udev rules
- does not fork
- Test suite performs some simple "feel-good" tests but does not test
for error cases. These tests are run during build.
- no cronjobs
- no logging (not applicable)
- This project has had no activity in the past 2.5 years.
- does not use WebKit
- does not use PolicyKit
- does not use Javascript
- no memory management concerns

The meat of this package is a single 10 line function that compares a
string to a predefined set of user agent strings (stored as regexes).
While this is very simple, I have two main concerns about this package:
    1) The mechanism for updating the set of known user agent strings is
       to modify the source code. I would like to see a less error-prone
       mechanism for updating the user agent definitions, such as a
       configuration file. This could require a significant change in
       the way this library is used. This concern is more about quality
       and supportability than security.
    2) The is_robot() function can raise a TypeError or ValueError, but
       provides no documentation that informs the user of this behavior.
       The user would have to audit the code themselves to become aware
       of the possible exceptions. If a user is unaware that exceptions
       may be raised, the unchecked use of is_robot() may lead to a
       crashed application. A pull request has been submitted upstream
       to update the README.md.
       https://github.com/rory/robot-detection/pull/1

As used in hyperkitty, I have concerns that is_robot() could potentially
raise a TypeError, causing hyperkitty to crash.
https://sources.debian.org/src/hyperkitty/1.2.2-1/hyperkitty/views/thread.py/?hl=154#L154
This risk may be mitigated by 2 factors: Based on a cursory audit of
django, I could not find any condition where the user agent would be
returned as anything other than a string. Secondly, it's possible that
the django framework would catch any exceptions raised by hyperkitty's
thread_index() function.

As I only briefly investigated these mitigations, a further
investigation is recommended during the MIR for hyperkitty. Caveats
against either of these mitigations could potentially make hyperkitty's
use of is_robot() result in a DoS. Regardless of these mitigations,
hyperkitty's reliance on django to catch or avoid exceptions in
robot_detection is fragile and ill-advised.

Since I did not find any security concerns in robot-detection itself,
ACK from the security team for promoting to main.

I reviewed robot-detection version 0.4.0-1 as checked into disco. This
should not be considered a full security audit, but rather a quick gauge
of maintainability.

- does not daemonize
- no initscripts
- no dbus services
- no setuid files
- no sudo fragments
- no udev rules
- does not fork
- Test suite performs some simple "feel-good" tests but does not test
  for error cases. These tests are run during build.
- no cronjobs
- no logging (not applicable)
- This project has had no activity in the past 2.5 years.
- does not use WebKit
- does not use PolicyKit
- does not use Javascript
- no memory management concerns

The meat of this package is a single 10 line function that compares a
string to a predefined set of user agent strings (stored as regexes).
While this is very simple, I have two main concerns about this package:
    1) The mechanism for updating the set of known user agent strings is
       to modify the source code. I would like to see a less error-prone
       mechanism for updating the user agent definitions, such as a
       configuration file.  This could require a significant change in
       the way this library is used. This concern is more about quality
       and supportability than security.
    2) The is_robot() function can raise a TypeError or ValueError, but
       provides no documentation that informs the user of this behavior.
       The user would have to audit the code themselves to become aware
       of the possible exceptions. If a user is unaware that exceptions
       may be raised, the unchecked use of is_robot() may lead to a
       crashed application. A pull request has been submitted upstream
       to update the README.md.
       https://github.com/rory/robot-detection/pull/1

Since I did not find any security concerns in robot-detection itself,
ACK from the security team for promoting to main.

Revision history for this message

Mike Salvatore (mikesalvatore) wrote on 2019-04-23:

I confirmed with a django developer that django does, indeed, catch exceptions in the views.
https://github.com/django/django/blob/master/django/core/handlers/exception.py#L18

Hyperkitty should be safe from any crash, however, it may still be wise to catch any TypeErrors raised by is_robot().

Alex Murray (alexmurray) on 2019-05-07

Changed in robot-detection (Ubuntu):
assignee:	Ubuntu Security Team (ubuntu-security) → nobody

Christian Ehrhardt  (paelzer) on 2019-05-07

Changed in robot-detection (Ubuntu):
status:	New → In Progress

Revision history for this message

Bryce Harrington (bryce) wrote on 2019-05-13:

0001-Add-python-docstring-documentation-for-the-is_robot-.patch Edit (1.1 KiB, text/plain)

Regarding the point made about the lack of documentation, would it make sense to document the is_robot() routine itself? Maybe something along the lines of the attached?

Revision history for this message

Bryce Harrington (bryce) wrote on 2019-05-13:

0001-Treat-all-exceptions-from-is_robot-as-a-robot.patch Edit (1.6 KiB, text/plain)

"Based on a cursory audit of django, I could not find any condition where the user agent would be
returned as anything other than a string."

I confirm this assessment - in examining the code there does not appear to be any means for user input to trigger the exceptions in is_robot(). The only situation I can envision is if some day the code changes, either by adding a new assertion/exception in is_robot(), or if hyperkitty tweaks the useragent input parameter before the call. It would be very surprising if either of those things ever came to pass.

However, for sake of defensive programming it probably would hurt little to paper over this case via a trivial patch such as attached.

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2019-05-14:

Yeah that seems lake a good safety net, you might submit that to upstream hyperkitty I'd think.

Revision history for this message

Christian Ehrhardt  (paelzer) wrote on 2019-07-03:

After evaluating dependencies, required further changes and mostly maintainability for security and packaging it was decided there are too many concerns - not about any single package in particular, but the overall Mailman3 stack - about the ability to maintain and monitor it as well as we need it for support in main.

We have closed the primary LP bug already, the MIRs that are already approved - like this one - will stay that way, but we will make no seed change to pull things in for now. Yet if other needs come up for those they have a prepared MIR already.
Other bugs which are not yet completed in terms of review will be closed as Won't Fix.

Even thou it ended being aborted, I think that is a valid outcome of the MIR evaluations. Never the less I want to thank everybody involved for all the work spent in what was nearly a year working through these MIRs.

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Patches

Add patch

Remote bug watches

Bug watches keep track of this bug in other bug trackers.

Ubunturobot-detection package

[MIR] robot-detection as dependency of mailman3

Bug Description

Other bug subscribers

Patches

Remote bug watches

Ubuntu
robot-detection package