haveged consumes 100% cpu when used with -w > 4067

Bug #1318678 reported by ben thielsen
This bug affects 5 people
Affects: haveged (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

note: i'm not certain this is the right package to file this against.

upon upgrading from 13.10 to 14.04, i noticed that on a number of computers, haveged was consuming 100% cpu. in short, although the system claims that the pool size is 4096:

>cat /proc/sys/kernel/random/poolsize
4096

entropy never gets above 4067:

>cat /proc/sys/kernel/random/entropy_avail
4067

when using -w <= 4067, haveged does not consume 100% cpu. with -w > 4067, it continuously consumes 100% cpu.
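The gap between the two readings above can be turned into a quick check. This is only a sketch: the 29-bit margin (4096 - 4067) is taken from the numbers observed in this report, not from any documented kernel constant.

```shell
#!/bin/sh
# Sketch: derive a watermark for haveged's -w that stays at or below the
# observed entropy ceiling. The 29-bit margin (4096 - 4067) matches what
# this report observes; it is an assumption, not a documented constant.
poolsize=$(cat /proc/sys/kernel/random/poolsize 2>/dev/null || echo 4096)
margin=29
safe_w=$((poolsize - margin))
echo "poolsize=$poolsize, suggested -w: $safe_w"
```

With the default 4096-bit pool this suggests -w 4067, matching the ceiling above.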

1] >lsb_release -rd
Description: Ubuntu 14.04 LTS
Release: 14.04

2] >apt-cache policy haveged
haveged:
  Installed: 1.7c-1
  Candidate: 1.7c-1
  Version table:
 *** 1.7c-1 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages
        100 /var/lib/dpkg/status

3] expected haveged to fill the entropy pool with 4096 bits of entropy

4] haveged is only able to fill the entropy pool with 4067 bits of entropy, and consumes 100% cpu in trying to supply more

here is a sample of haveged output during symptomatic behavior:

>haveged --Foreground --verbose=-1 --run=0 --write=4068
haveged: Address 0=0x7fb285ea2e42
haveged: Address 1=0x7fb285ea29d1
haveged: Loop 1: offset=60538, delta=1137
haveged: Address 2=0x7fb285ea23cd
haveged: Loop 2: offset=58998, delta=1540
haveged: Address 3=0x7fb285ea1dc9
haveged: Loop 3: offset=57458, delta=1540
haveged: Address 4=0x7fb285ea17c5
haveged: Loop 4: offset=55918, delta=1540
haveged: Address 5=0x7fb285ea11c1
haveged: Loop 5: offset=54378, delta=1540
haveged: Address 6=0x7fb285ea0bbd
haveged: Loop 6: offset=52838, delta=1540
haveged: Address 7=0x7fb285ea05b9
haveged: Loop 7: offset=51298, delta=1540
haveged: Address 8=0x7fb285e9ffb5
haveged: Loop 8: offset=49758, delta=1540
haveged: Address 9=0x7fb285e9f9b1
haveged: Loop 9: offset=48218, delta=1540
haveged: Address 10=0x7fb285e9f3ab
haveged: Loop 10: offset=46676, delta=1542
haveged: Address 11=0x7fb285e9eda7
haveged: Loop 11: offset=45136, delta=1540
haveged: Address 12=0x7fb285e9e7a3
haveged: Loop 12: offset=43596, delta=1540
haveged: Address 13=0x7fb285e9e193
haveged: Loop 13: offset=42044, delta=1552
haveged: Address 14=0x7fb285e9db83
haveged: Loop 14: offset=40492, delta=1552
haveged: Address 15=0x7fb285e9d574
haveged: Loop 15: offset=38941, delta=1551
haveged: Address 16=0x7fb285e9cf64
haveged: Loop 16: offset=37389, delta=1552
haveged: Address 17=0x7fb285e9c94c
haveged: Loop 17: offset=35829, delta=1560
haveged: Address 18=0x7fb285e9c334
haveged: Loop 18: offset=34269, delta=1560
haveged: Address 19=0x7fb285e9bd1c
haveged: Loop 19: offset=32709, delta=1560
haveged: Address 20=0x7fb285e9b704
haveged: Loop 20: offset=31149, delta=1560
haveged: Address 21=0x7fb285e9b0ec
haveged: Loop 21: offset=29589, delta=1560
haveged: Address 22=0x7fb285e9aad4
haveged: Loop 22: offset=28029, delta=1560
haveged: Address 23=0x7fb285e9a4bc
haveged: Loop 23: offset=26469, delta=1560
haveged: Address 24=0x7fb285e99ea4
haveged: Loop 24: offset=24909, delta=1560
haveged: Address 25=0x7fb285e9988c
haveged: Loop 25: offset=23349, delta=1560
haveged: Address 26=0x7fb285e99274
haveged: Loop 26: offset=21789, delta=1560
haveged: Address 27=0x7fb285e98c5c
haveged: Loop 27: offset=20229, delta=1560
haveged: Address 28=0x7fb285e98644
haveged: Loop 28: offset=18669, delta=1560
haveged: Address 29=0x7fb285e9802c
haveged: Loop 29: offset=17109, delta=1560
haveged: Address 30=0x7fb285e97a14
haveged: Loop 30: offset=15549, delta=1560
haveged: Address 31=0x7fb285e973fc
haveged: Loop 31: offset=13989, delta=1560
haveged: Address 32=0x7fb285e96de4
haveged: Loop 32: offset=12429, delta=1560
haveged: Address 33=0x7fb285e967d4
haveged: Loop 33: offset=10877, delta=1552
haveged: Address 34=0x7fb285e961c4
haveged: Loop 34: offset=9325, delta=1552
haveged: Address 35=0x7fb285e95bb4
haveged: Loop 35: offset=7773, delta=1552
haveged: Address 36=0x7fb285e955a4
haveged: Loop 36: offset=6221, delta=1552
haveged: Address 37=0x7fb285e94f94
haveged: Loop 37: offset=4669, delta=1552
haveged: Address 38=0x7fb285e94984
haveged: Loop 38: offset=3117, delta=1552
haveged: Address 39=0x7fb285e9436c
haveged: Loop 39: offset=1557, delta=1560
haveged: Address 40=0x7fb285e93d57
haveged: Loop 40: offset=0, delta=1557
haveged starting up
haveged: procedure B: test6a:1/1, test6b:1/1, test7a:1/1, test7b:1/1, test8: 1/1
haveged: Completed tot procedure B after 1171766 bytes
haveged: procedure A: test0:1/1, test1:257/257, test2:257/257, test3:257/257, test4:257/257, test5:2/2
haveged: Completed tot procedure A after 1035716 bytes
haveged: ver: 1.7c; arch: x86; vend: GenuineIntel; opts: (T); collect: 128K
haveged: cpu: (L4 VC); data: 32K (L4 V); inst: 32K (L4 V); idx: 21/40; sz: 32709/60538
haveged: tot tests: BA8: A:1/0 B: 1/0; continuous tests: B: A:0/0 B: 0/0; last entropy estimate 7.99918
haveged: fills: 0, generated: 0
haveged: procedure B: test6a:1/1, test6b:1/1, test7a:1/1, test7b:1/1, test8: 1/1
haveged: Completed continuous procedure B after 1171766 bytes
haveged: 0 fill 1.58984 ms
haveged: 0 fill 11.248 ms
haveged: 0 fill 8.1792 ms
haveged: procedure B: test6a:1/1, test6b:1/1, test7a:1/1, test7b:1/1, test8: 1/1
haveged: Completed continuous procedure B after 1099276 bytes
haveged: 0 fill 2.23584 ms
haveged: 0 fill 11.4819 ms
haveged: 0 fill 8.19312 ms
haveged: procedure B: test6a:1/1, test6b:1/1, test7a:1/1, test7b:1/1, test8: 1/1
haveged: Completed continuous procedure B after 1097496 bytes
haveged: 0 fill 2.22119 ms
haveged: 0 fill 11.2661 ms
haveged: 0 fill 8.32812 ms
haveged: procedure B: test6a:1/1, test6b:1/1, test7a:1/1, test7b:1/1, test8: 1/1
haveged: Completed continuous procedure B after 1097876 bytes
haveged: 0 fill 2.23389 ms
haveged: 0 fill 11.3398 ms
haveged: 0 fill 8.28198 ms
haveged: procedure B: test6a:1/1, test6b:1/1, test7a:1/1, test7b:1/1, test8: 1/1
haveged: Completed continuous procedure B after 1099127 bytes
haveged: 0 fill 2.25903 ms
haveged: 0 fill 11.3379 ms
^Chaveged Stopping due to signal 2

Tags: patch
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in haveged (Ubuntu):
status: New → Confirmed
Revision history for this message
Manuel Kraus (1-ubuntukne-f) wrote :

I have exactly the same issue.

Ubuntu Server 14.04.1 LTS
haveged 1.7c-1 (amd64)
kernel 3.13.0-40-generic (64-bit)

As a workaround I've set it to 3072, which works fine.
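A sketch of that workaround as it would be applied on Ubuntu's haveged package. The /etc/default/haveged path and DAEMON_ARGS variable are what the Debian/Ubuntu packaging ships; verify against your installed version.

```shell
# Lower haveged's watermark below the 4067 ceiling via the package's
# defaults file, then restart the daemon so it picks the value up.
echo 'DAEMON_ARGS="-w 3072"' | sudo tee /etc/default/haveged
sudo service haveged restart
# Afterwards, entropy_avail should settle near 3072 without pegging a CPU:
cat /proc/sys/kernel/random/entropy_avail
```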

Revision history for this message
Manuel Kraus (1-ubuntukne-f) wrote :

Maybe worth mentioning:

After setting it to 3072, "entropy_avail" now oscillates between 3150 and 3300 in very short cycles (15-30 seconds), which shouldn't be expected either.

Revision history for this message
Michal Smereczynski (michal-s) wrote :

Same on 14.04 with haveged 1.9.1 and the 3.19.0-31-generic #36~14.04.1-Ubuntu kernel. 4067 is the maximum.

Revision history for this message
Richard Collins (rcollins0618) wrote :

This also affects me. It's 2016, still no changes, it seems.
 - Ubuntu 14.04.4 LTS (Server) (32-bit i686)
 - haveged 1.7c-1
 - 4067 entropy_avail, 4096 poolsize
 - 4.2.0-27-generic

Revision history for this message
openand (source-h) wrote :

Hi,

There is a possible race condition involving the device file and the select() call.

We have made a rough attempt at a fix (this is not a complete fix so much as a workaround for the observed issue):

https://github.com/Openand-I/haveged/tree/oi

The main changes can be summarized as:

Generic Linux-based OS:

- Create a new random device at /dev/entropy/random to mitigate
possible file-based locking issues.

- Simply restrict write_wakeup_threshold to 4064 in the code itself.

- Change select() to epoll. This has the added advantage of blocking
until there is actual entropy depletion, immediately replenishing the
pool upon every use; it therefore does not loop when no entropy is
required.

- Two epolls, one for reading and one for writing.

- Overflow the pool with an extra byte. The reasoning behind this is
vague, but it seems to release the entropy the kernel has gathered so
far.

Android specific:

- Sleep upon device sleep

- Every 4 hours, a simple touch of the threshold parameters file again
seems to release the entropy the kernel has gathered so far. On Linux
(which doesn't sleep), this can possibly be achieved by setting
read_wakeup_threshold to 8.
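A sketch of that last suggestion on the 3.x/4.x kernels discussed in this report, where these sysctls still exist (later kernels removed them); root is required:

```shell
# Wake /dev/random readers as soon as even 8 bits are available, so the
# pool is drained and replenished promptly instead of sitting full.
sudo sysctl -w kernel.random.read_wakeup_threshold=8
# Confirm the value took effect:
cat /proc/sys/kernel/random/read_wakeup_threshold
```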

Works excellently; verified using strace.
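One way to run such a check yourself, assuming haveged is running as a daemon: a spinning select()-based build shows a tight syscall loop in the trace, while a properly blocking build should sit in epoll_wait() (or select()) until entropy is actually consumed.

```shell
# Attach to the running daemon and trace only the wait/ioctl syscalls.
sudo strace -f -e trace=select,poll,epoll_wait,ioctl -p "$(pidof haveged)"
```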

There are some very distinct usability improvements on both Android and
Linux, including GUI responsiveness and network connectivity, thanks to
faster random number generation. It will almost certainly help virtual
machines too, as they are entropy-starved.

We have invited the author to take a look at this.

There is still the problem of a very slow (u)random device that slows
the entire Linux ecosystem down. It is recommended instead to replace
the (u)random device with the Fortuna-based implementation (at least 5x
faster) from the patch at:

http://jlcooke.ca/random

Thanks.

Revision history for this message
nicoo (nicolas+ubuntu1) wrote :

Hi,

This has been fixed, along with a number of other issues, in the Debian package.
I attached the relevant patch, but you might want to consider tracking the upstream Debian package.

Best,

  nicoo

Revision history for this message
nicoo (nicolas+ubuntu1) wrote :

PS: The solved issues include launchpad bugs #1540978 (and its duplicate #915218).

Revision history for this message
Ubuntu Foundations Team Bug Bot (crichton) wrote :

The attachment "0003-Don-t-set-a-watermark-higher-than-pool-size.patch" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
Revision history for this message
openand (source-h) wrote :

Maybe one can try setting the read and write thresholds to the same value.

The select() high-CPU issue didn't seem to kick in for values > 4064 when the read threshold and write threshold were set to the same value > 4064, e.g. even 4096!
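A sketch of that experiment via the procfs files present on the 3.x/4.x kernels in this report (these tunables were removed from later kernels); root is required:

```shell
# Set the read and write wakeup thresholds to the same value, here the
# full pool size, as suggested above.
echo 4096 | sudo tee /proc/sys/kernel/random/read_wakeup_threshold
echo 4096 | sudo tee /proc/sys/kernel/random/write_wakeup_threshold
```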
