juju machine agent runtime panic

Bug #1903202 reported by Colin Misare on 2020-11-05
Affects   Status   Importance   Assigned to     Milestone
juju               High         John A Meinel
2.8                High         John A Meinel

Bug Description

We observed a juju machine agent panic:

2020-11-05 19:14:49 WARNING juju.apiserver.common.networkingcommon types.go:226 ignoring address for device {DeviceIndex:0 MACAddress: CIDR: ProviderId: ProviderSubnetId: ProviderNetworkId: ProviderSpaceId: ProviderVLANId: ProviderAddressId: AvailabilityZones:[] VLANTag:0 InterfaceName:lo ParentInterfaceName: InterfaceType: Disabled:false NoAutoStart:false ConfigType: Addresses:[local-cloud:10.48.128.164] ShadowAddresses:[] DNSServers:[] MTU:0 DNSSearchDomains:[] GatewayAddress: Routes:[] IsDefaultGateway:false Origin:}: invalid CIDR address:
2020-11-05 19:14:51 ERROR juju.worker.dependency engine.go:671 "firewaller" manifold worker returned unexpected error: machine 4 not provisioned
2020-11-05 19:14:51 WARNING juju.apiserver.common.networkingcommon types.go:226 ignoring address for device {DeviceIndex:0 MACAddress: CIDR: ProviderId: ProviderSubnetId: ProviderNetworkId: ProviderSpaceId: ProviderVLANId: ProviderAddressId: AvailabilityZones:[] VLANTag:0 InterfaceName:wg0 ParentInterfaceName: InterfaceType: Disabled:false NoAutoStart:false ConfigType: Addresses:[local-cloud:10.48.129.165] ShadowAddresses:[] DNSServers:[] MTU:0 DNSSearchDomains:[] GatewayAddress: Routes:[] IsDefaultGateway:false Origin:}: invalid CIDR address:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xdbd38d]

goroutine 8771166 [running]:
github.com/juju/txn.(*IncrementalPruner).removeTxns.func1(0xc00507b1d0, 0xc0066c8000, 0x3e8, 0x3e8, 0xc0022abb80, 0xc00b408540, 0xc00c34da40, 0xc009e3dd60)
        /workspace/_build/src/github.com/juju/juju/vendor/github.com/juju/txn/incrementalprune.go:722 +0x1ed
created by github.com/juju/txn.(*IncrementalPruner).removeTxns
        /workspace/_build/src/github.com/juju/juju/vendor/github.com/juju/txn/incrementalprune.go:717 +0x161
2020-11-05 19:14:53 INFO juju.cmd supercommand.go:54 running jujud [2.8.3 0 ab69570b38fbc746e54184e4c3274612bcbb8327 gc go1.14.9]
2020-11-05 19:14:53 DEBUG juju.cmd supercommand.go:55 args: []string{"/var/lib/juju/tools/machine-0/jujud", "machine", "--data-dir", "/var/lib/juju", "--machine-id", "0", "--debug"}

Let me know if there is any additional information I can provide.

John A Meinel (jameinel) wrote :

https://github.com/juju/txn/pull/57 is a possible fix in the lower library.

Essentially, an error at exactly the right time while removing old transactions bubbles up, and we end up dereferencing a nil pointer.
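The failure mode described above can be illustrated with a minimal Go sketch. This is not the juju/txn code: `changeInfo`, `removeAll`, `unsafeCount`, and `safeCount` are hypothetical names, and the assumption is only that the removal call returns a nil result pointer together with its error, as many Go APIs do. Reading a field from that nil result before checking the error produces the same class of panic seen in the trace.

```go
package main

import (
	"errors"
	"fmt"
)

// changeInfo is a hypothetical stand-in for a driver's result type.
type changeInfo struct {
	Updated int
	Removed int
}

// removeAll is a hypothetical removal call. On failure it returns a nil
// result together with the error.
func removeAll(ids []int, fail bool) (*changeInfo, error) {
	if fail {
		return nil, errors.New("simulated removal failure")
	}
	return &changeInfo{Removed: len(ids)}, nil
}

// unsafeCount ignores the error and reads the result directly. When
// removeAll fails, res is nil and res.Removed panics. The deferred
// recover is here only so the demo can report the panic instead of
// crashing the process.
func unsafeCount(ids []int, fail bool) (n int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	res, _ := removeAll(ids, fail)
	return res.Removed, nil
}

// safeCount checks the error before touching the result.
func safeCount(ids []int, fail bool) (int, error) {
	res, err := removeAll(ids, fail)
	if err != nil {
		return 0, err
	}
	return res.Removed, nil
}

func main() {
	ids := []int{1, 2, 3}
	fmt.Println(unsafeCount(ids, true))
	fmt.Println(safeCount(ids, true))
	fmt.Println(safeCount(ids, false))
}
```

In the unguarded version the panic only appears when the removal call itself fails, which is why it can go unnoticed until an error happens at just the right moment.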

Changed in juju:
status: New → Triaged
importance: Undecided → High
John A Meinel (jameinel) wrote :
Changed in juju:
milestone: none → 2.9-rc3
assignee: nobody → John A Meinel (jameinel)
status: Triaged → In Progress
Colin Misare (cmisare) wrote :

We saw a second instance of this issue on a different machine in the same environment. Is it possible something in this environment is exacerbating the transaction behavior causing these nil pointers to appear more frequently?

2020-11-06 18:51:01 ERROR juju.worker.dependency engine.go:671 "firewaller" manifold worker returned unexpected error: machine 5 not provisioned
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0xdbd38d]

goroutine 27078232 [running]:
github.com/juju/txn.(*IncrementalPruner).removeTxns.func1(0xc00cd80c90, 0xc011a7a000, 0x3e8, 0x3e8, 0xc010992780, 0xc01ac30fc0, 0xc0114cd0e0, 0xc00c522200)
        /workspace/_build/src/github.com/juju/juju/vendor/github.com/juju/txn/incrementalprune.go:722 +0x1ed
created by github.com/juju/txn.(*IncrementalPruner).removeTxns
        /workspace/_build/src/github.com/juju/juju/vendor/github.com/juju/txn/incrementalprune.go:717 +0x161
2020-11-06 18:51:13 INFO juju.cmd supercommand.go:54 running jujud [2.8.3 0 ab69570b38fbc746e54184e4c3274612bcbb8327 gc go1.14.9]
2020-11-06 18:51:13 DEBUG juju.cmd supercommand.go:55 args: []string{"/var/lib/juju/tools/machine-2/jujud", "machine", "--data-dir", "/var/lib/juju", "--machine-id", "2", "--debug"}
2020-11-06 18:51:13 DEBUG juju.utils gomaxprocs.go:24 setting GOMAXPROCS to 4
2020-11-06 18:51:13 DEBUG juju.agent agent.go:583 read agent config, format "2.0"
2020-11-06 18:51:13 INFO juju.cmd.jujud agent.go:138 setting logging config to "<root>=WARNING;unit=DEBUG"

John A Meinel (jameinel) wrote :

It happens if we get an error while issuing a removeAll command. It is
possible that the list of transactions has grown large enough that pruning
has trouble cleaning it up, or that some other interaction is causing
more problems.


John A Meinel (jameinel) on 2020-11-10
Changed in juju:
status: In Progress → Fix Committed