PBR

Huge delays on: [pbr] In git context, generating filelist from git

Bug #1933311 reported by Sorin Sbarnea
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| PBR | New | Undecided | Unassigned | |
| Python | Won't Fix | Unknown | | |

Bug Description

On macOS I observed huge delays (10min+) between these two log lines:
```
[pbr] In git context, generating filelist from git
# 5-10mins delay with top cpu usage
adding license file 'AUTHORS'
```

I tried to reproduce the same behavior on Fedora 34 but I was not able to get the same delays.

I also ran it with PYTHONUNBUFFERED=1 to be sure that the delay really happens between these two lines.

After some extra hacks I was able to narrow down the call that takes all this unexpected time: `self.filelist.include_pattern("*", prefix=ei_cmd.egg_info)`

Sorin Sbarnea (ssbarnea) wrote :

The snippet below is what causes the excessive delays. There are indeed some symlinks inside the repository, but that is no real reason for this to take minutes to run. A simple run of `ncdu` reported 1.0 GB across 91,000 items, most of them inside the .tox folder.
```python
from distutils.filelist import findall
findall(".")
```

I think using findall without any filters is inappropriate here, as it returns a huge amount of noise from temp files (.tox, .pytest_cache, mypy).
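To illustrate what a filtered walk could look like, here is a minimal sketch built on `os.walk`. It is not what pbr or setuptools does; the function name `find_files_pruned` and the exclusion list are hypothetical and would need adjusting per project.

```python
import os

# Hypothetical exclusion list (an assumption, not taken from pbr or setuptools).
EXCLUDED_DIRS = {".tox", ".pytest_cache", ".mypy_cache", ".git"}


def find_files_pruned(root=".", excluded=EXCLUDED_DIRS):
    """Return regular files under root, skipping excluded directory names."""
    found = []
    # followlinks defaults to False, so symlinked directories are not entered.
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in excluded]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):
                found.append(path)
    return found
```

Calling `len(find_files_pruned("."))` in the affected checkout and comparing it with the unfiltered count should show how much of the result comes from the temp directories.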

Sorin Sbarnea (ssbarnea) wrote :

Apparently the cause of this issue is the presence of recursive symlinks inside the source tree, which confuse findall().

I observed that a single symlink can easily double the execution time from 5s to 10s and boost the number of returned entries from 90k to 160k:

`time python -c "from distutils.filelist import findall; print(len(findall('.')))"`
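The effect of a symlink on the entry count can be reproduced in isolation with a small sketch using `os.walk` directly. It assumes that findall() follows directory symlinks while walking, which matches the behavior described above but is not verified here against every Python version; `count_files` is a hypothetical helper name.

```python
import os


def count_files(root, followlinks):
    """Count regular files under root, optionally following directory symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=followlinks):
        for name in filenames:
            if os.path.isfile(os.path.join(dirpath, name)):
                total += 1
    return total
```

In a tree where a symlink points at a sibling directory, `count_files(root, True)` counts that directory's files twice, while `count_files(root, False)` counts them once. A symlink that points back at one of its own parent directories is worse, since every level of the loop is walked again until the OS refuses to resolve the path.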

Changed in python:
status: Unknown → Won't Fix