init: failure to respawn finishes the start event, even if other jobs reference it later

Reported by Slawomir Czarko on 2009-06-18
Affects           Status        Importance  Assigned to
upstart           Fix Released  High        Unassigned
upstart 0.3       Fix Released  High        Unassigned
upstart (Fedora)  Fix Released  Unknown

Bug Description

I'm running upstart 0.3.9

After changing the date to the previous day and using stop/start commands to restart some upstart jobs, I get a segfault in the init task. Below is the backtrace from gdb. I can reproduce this pretty much 100% of the time, and the backtrace always looks the same (only some pointer values are different).

(gdb) bt full
#0 0x00a9b416 in __kernel_vsyscall ()
No symbol table info available.
#1 0x00c318ff in sigprocmask () from /lib/libc.so.6
No symbol table info available.
#2 0x0804a4d5 in crash_handler (signum=11) at main.c:400
 act = {__sigaction_handler = {sa_handler = 0, sa_sigaction = 0}, sa_mask = {__val = {0 <repeats 32 times>}}, sa_flags = 0, sa_restorer = 0x4}
 limit = {rlim_cur = 4294967295, rlim_max = 4294967295}
 mask = {__val = {2147482623, 4294967294, 4294967295 <repeats 30 times>}}
 err = <value optimized out>
 loglevel = <value optimized out>
 mask = {__val = {0, 0, 11121686, 115, 582, 3217000600, 123, 0, 0, 0, 3087986688, 15, 3087987304, 128, 1211428, 1748533395, 54641668, 3, 12627208, 13109375, 142491099, 142490795, 0,
    0, 1, 1797, 3087987408, 3087986688, 134515571, 12663656, 13521706, 0}}
 oldmask = {__val = {14, 4294967295, 0, 12611584, 134618112, 3217001336, 3217001336, 4, 13455331, 2, 134598413, 4, 134615746, 1, 11121664, 17, 51, 3217000744, 11121664, 17, 51,
    3228041216, 123, 123, 14073844, 8, 3217000744, 3217000600, 2, 0, 3217000780, 0}}
 __FUNCTION__ = "crash_handler"
#3 <signal handler called>
No symbol table info available.
#4 nih_str_array_append (array=0xbfbf8a48, parent=0x0, len=0xbfbf8a44, args=0x41) at string.c:534
 c_len = 5
 o_len = <value optimized out>
 arg = (char * const *) 0x5
 __FUNCTION__ = "nih_str_array_append"
#5 0x0804bc61 in job_run_process (job=0x87e7f30, process=PROCESS_MAIN) at job.c:1368
 statbuf = {st_dev = 59308975770263300, __pad1 = 61930, st_ino = 134616700, st_mode = 11, st_nlink = 0, st_uid = 13894081, st_gid = 142479368, st_rdev = 21617315872, __pad2 = 35248,
  st_size = -1077966368, st_blksize = 0, st_blocks = 13808947, st_atim = {tv_sec = 13890026, tv_nsec = 134616700}, st_mtim = {tv_sec = 11, tv_nsec = 14073844}, st_ctim = {
    tv_sec = 142508104, tv_nsec = 142507824}, __unused4 = 3217001048, __unused5 = 12760163}
 proc = (JobProcess *) 0x87e8560
 argv = (char **) 0x887cea8
 script = <value optimized out>
 argc = 5
 error = <value optimized out>
 fds = {142479392, 134616700}
 __FUNCTION__ = "job_run_process"
#6 0x0804c53c in job_change_state (job=0x87e7f30, state=JOB_SPAWNED) at job.c:856
 old_state = <value optimized out>
 __FUNCTION__ = "job_change_state"
#7 0x0804cff2 in job_child_reaper (data=0x0, pid=6764, killed=0, status=0) at job.c:1729
 job = <value optimized out>
 process = PROCESS_PRE_START
 failed = 0
 stop = 0
 state = 1
 __FUNCTION__ = "job_child_reaper"
#8 0x0805959e in nih_child_poll () at child.c:158
 iter = (NihList *) 0x87e1600
 _iter = (NihList *) 0x87e15d0
 pid = 6764
 killed = 0
 status = 0
 info = {si_signo = 17, si_errno = 0, si_code = 1, _sifields = {_pad = {6764, 0 <repeats 28 times>}, _kill = {si_pid = 6764, si_uid = 0}, _timer = {si_tid = 6764, si_overrun = 0,
      si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _rt = {si_pid = 6764, si_uid = 0, si_sigval = {sival_int = 0, sival_ptr = 0x0}}, _sigchld = {si_pid = 6764, si_uid = 0, si_status = 0,
      si_utime = 0, si_stime = 0}, _sigfault = {si_addr = 0x1a6c}, _sigpoll = {si_band = 6764, si_fd = 0}}}
#9 0x0805d09d in nih_main_loop () at main.c:606
 timeout = {tv_sec = 4, tv_usec = 992000}
 readfds = {fds_bits = {8, 0 <repeats 31 times>}}
 writefds = {fds_bits = {32, 0 <repeats 31 times>}}
 buf = ""
 ret = <value optimized out>
 next_timer = (NihTimer *) 0x0
 exceptfds = {fds_bits = {0 <repeats 32 times>}}
 nfds = 7
#10 0x08049e42 in main (argc=5, argv=0x887cea8) at main.c:302
 args = <value optimized out>
 ret = 5

Could you confirm whether it is initctl or init segfaulting here? It looks a lot like the init daemon itself is segfaulting.

Could you supply the exact sequence of commands that you run, including any appropriate upstart jobs?

If you have a core file, could you gzip it and attach it to the bug?

Thanks

Changed in upstart:
importance: Undecided → High
status: New → Incomplete
summary: - initctl segfault after changing date
+ init: segfault after changing date
Changed in upstart (Fedora):
status: Unknown → Confirmed

It is init segfaulting. In the distribution I use (Fedora 9), init is part of the upstart RPM.

Unfortunately I can't provide the exact sequence of commands plus the upstart jobs right now, since that would involve quite a large amount of my employer's code.

The general idea is that I've got about 20 tasks which are started by upstart when the main task (a watchdog) starts (I'm using the "start on" stanza). While the tasks are running, I change the date to the previous day using the "date" command. Then I stop all of these tasks (including the watchdog), and once they have all stopped I start the watchdog task again. At this point the init daemon segfaults while starting one of the 20 tasks (it happens randomly, with a different task each time). The same problem happens, though a bit less often (around 50% of the time), when I change the date after all the tasks have stopped.
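[Editor's note: to make the shape of this setup concrete, here is a hypothetical reconstruction in upstart 0.3 job-file syntax. The names and paths are invented; the reporter's actual job files are only attached later in the thread.]

# /etc/event.d/watchdog  (hypothetical sketch)
description "main watchdog task"
service
respawn
exec /opt/sw/bin/watchdog

# /etc/event.d/task01  (hypothetical; task02..task20 along the same lines)
start on started watchdog
service
respawn
exec /opt/sw/bin/task01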

Is there anything else I can provide to help with fixing this bug?

I'm attaching three core files from different crashes.

Which version of Upstart are those core files against? I've tried opening them against the binaries from fc9 (upstart-0.3.9-22.fc9.i386.rpm) but haven't had any luck.

It _looks_ like the crash is due to bad data in the job->cause event information:

#4 nih_str_array_append (array=0xbfbf8a48, parent=0x0, len=0xbfbf8a44, args=0x41) at string.c:534
 c_len = 5
 o_len = <value optimized out>
 arg = (char * const *) 0x5
 __FUNCTION__ = "nih_str_array_append"

I'm not sure why args would be 0x41, or why arg would be 0x5 there - that looks wrong to me. I need to examine the data to figure it out.

Unfortunately I can't seem to replicate the issue here; I'm not convinced it is related to the date change, though.

This could simply be a "start 20 tasks with the same event" bug ;)

(Though a test here with 50 is fine)

The core files are again from upstart-0.3.9-22.fc9.i386.

I opened them like this:

gdb /sbin/init core.6764

I had to install upstart-debuginfo-0.3.9-22.fc9.i386 first to see anything useful.

Right - I'm obviously trying to read them on Ubuntu with the RPM and debug info unpacked alongside - not getting anywhere with that

I may have to try and install fc9 in a VM

Aha, found a gdb incantation that worked:

wing-commander /tmp% gdb ./upstart/usr/lib/debug/sbin/init.debug ./upstart-segfault/core.10726
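[Editor's note: an alternative that may also work, assuming the stripped binary and its debuginfo are unpacked from the RPMs side by side and gdb can match them via the .gnu_debuglink/build-id, is to set the debug directory before loading the files (paths here are illustrative):

(gdb) set debug-file-directory ./upstart/usr/lib/debug
(gdb) file ./upstart/sbin/init
(gdb) core-file ./upstart-segfault/core.10726

Setting debug-file-directory first matters, because gdb resolves the separate debug file at the moment the executable is loaded.]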

The job's cause (the event that caused it to start/stop) is invalid; this is what causes the segfault.

(gdb) p job->cause
$26 = (EventEmission *) 0x9a0eb48

(gdb) p *job->cause
$17 = {event = {entry = {prev = 0x9a0eb10, next = 0x9a0dd18},
    name = 0x8 <Address 0x8 out of bounds>, args = 0x9a0eb00,
    env = 0x9a0eb58}, id = 161540952, progress = 134517312, jobs = -1,
  failed = 1918989427}

(gdb) p *job
$18 = {entry = {prev = 0x9a0ed50, next = 0x9a1a508}, id = 22,
  name = 0x9a18848 "ezono-cyclades-t_ltl", description = 0x0, author = 0x0,
  version = 0x0, replacement = 0x0, replacement_for = 0x0, instance_of = 0x0,
  goal = JOB_START, state = JOB_SPAWNED, cause = 0x9a0eb48, blocked = 0x0,
  failed = 0, failed_process = 4294967295, exit_status = 0, start_events = {
    prev = 0x9a18f10, next = 0x9a18f10}, stop_events = {prev = 0x9a19208,
    next = 0x9a18fe8}, emits = {prev = 0x9a18e14, next = 0x9a18e14},
  process = 0x9a18ed8, normalexit = 0x9a192d8, normalexit_len = 7,
  kill_timeout = 30, kill_timer = 0x0, instance = 0, service = 1, respawn = 1,
  respawn_limit = 2, respawn_interval = 30, respawn_count = 2,
  respawn_time = 1239344371, daemon = 0, pid_file = 0x0, pid_binary = 0x0,
  pid_timeout = 10, pid_timer = 0x0, console = CONSOLE_OUTPUT, env = 0x0,
  umask = 18, nice = 0, limits = {0x0 <repeats 15 times>}, chroot = 0x0,
  chdir = 0x9a19050 "/tmp"}

There are two events in the list:

(gdb) p *((Event *)events->next)
$24 = {entry = {prev = 0x9a0e1f0, next = 0x9a0ec98},
  name = 0x9a0d938 "stopped", args = 0x9a0e400, env = 0x0}
(gdb) p *((Event *)events->next->next)
$25 = {entry = {prev = 0x9a0dc68, next = 0x9a0d9a0},
  name = 0x9a0e2e8 "restart-bgr", args = 0x0, env = 0x0}

(gdb) p *((EventEmission *)events->next)
$27 = {event = {entry = {prev = 0x9a0e1f0, next = 0x9a0ec98},
    name = 0x9a0d938 "stopped", args = 0x9a0e400, env = 0x0}, id = 242,
  progress = EVENT_HANDLING, jobs = 1, failed = 0}
(gdb) p *((EventEmission *)events->next->next)
$28 = {event = {entry = {prev = 0x9a0dc68, next = 0x9a0d9a0},
    name = 0x9a0e2e8 "restart-bgr", args = 0x0, env = 0x0}, id = 245,
  progress = EVENT_HANDLING, jobs = 1, failed = 0}

The restart-bgr event is emitted by one of the job files after the "watchdog" task has stopped, if the auto-restart option is configured.
The problem also happens with auto-restart disabled.

The odd thing is that job->cause looks like it has been freed; that shouldn't happen, because job->cause is only ever set inside job_change_cause(), which takes care to increment and decrement the emission's ->jobs reference count.
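[Editor's note: the invariant being relied on here is plain reference counting. A minimal sketch of the pattern job_change_cause() is described as implementing, with stand-in types reconstructed from the gdb dumps above - not upstart's actual job.c:

typedef struct event_emission {
        int jobs;               /* jobs currently using this emission */
        /* ... event name, args, progress ... */
} EventEmission;

typedef struct job {
        EventEmission *cause;   /* event that caused the goal change */
        /* ... */
} Job;

static void
job_change_cause (Job *job, EventEmission *emission)
{
        if (job->cause == emission)
                return;

        /* Put back the reference held on the old cause ... */
        if (job->cause)
                job->cause->jobs--;

        job->cause = emission;

        /* ... and take a reference on the new one. */
        if (job->cause)
                job->cause->jobs++;
}

As long as every set goes through this function, ->jobs should never be wrong, which is why a freed emission with live references is surprising.]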

The cause address is greater than those of the "restart-bgr" and "stopped" events, which implies (but doesn't guarantee) that it was emitted later. It's certainly not in the events list anymore.

I'll take a look through the code to see whether this event is attached to any other jobs.

It shouldn't be possible to free the event emission while its jobs count is greater than 0. It's possible, for example, that I'm incorrectly freeing an event emission elsewhere (e.g. from job->blocked).

I've marked this bug Private for now, since my debugging may reveal details that you don't wish revealed (plus your core file contains them)

If you're ok for the information to be revealed, please feel free to mark it un-private.

Feel free to delete the core file attachment if you're worried about that, but not worried about my debugging traces

visibility: public → private

Does the core file contain anything other than the state of the init daemon? If not, then this can be marked un-private.


First up, here's a dump of the jobs that you have registered. I've put the names, followed by the goal and state in brackets, and then looked at the "cause" and "blocked" pointers; NULL means there wasn't one.

Please shout if any of this system state looks unusual to you:

rc jobs, all stopped:
  rcS (stop/waiting) NULL NULL
  rcS-sulogin (stop/waiting) NULL NULL
  rc0 (stop/waiting) NULL NULL
  rc1 (stop/waiting) NULL NULL
  rc2 (stop/waiting) NULL NULL
  rc3 (stop/waiting) NULL NULL
  rc4 (stop/waiting) NULL NULL
  rc5 (stop/waiting) NULL NULL
  rc6 (stop/waiting) NULL NULL

This is what I'd expect; the system has booted, so there's no runlevel change going on right now.

tty jobs, both running:
  tty1 (start/running) NULL NULL
  tty2 (start/running) NULL NULL

Again, what we'd expect.

Now, a bunch of system services, some running, some not:
  autofs (start/running) NULL NULL
  acpid (start/running) NULL NULL
  haldaemon (start/running) NULL NULL
  messagebus (start/running) NULL NULL
  netplugd (start/running) NULL NULL
  prefdm (start/running) NULL NULL
  rsyslog (start/running) NULL NULL

  logd (stop/waiting) NULL NULL
  microcode_ctl (stop/waiting) NULL NULL
  netfs (stop/waiting) NULL NULL
  network (stop/waiting) NULL NULL
  serial (stop/waiting) NULL NULL
  sshd (stop/waiting) NULL NULL
  sulogin (stop/waiting) NULL NULL

Now your custom bits:

some jobs are stopped:
  ezono-cyclades-calibration (stop/waiting) NULL NULL
  ezono-cyclades-pre-start (stop/waiting) NULL NULL

  ezono-cyclades-t_ane (stop/waiting) NULL NULL
  ezono-cyclades-t_ccs (stop/waiting) NULL NULL
  ezono-cyclades-t_emg (stop/waiting) NULL NULL
  ezono-cyclades-t_mco (stop/waiting) NULL NULL
  ezono-cyclades-t_mse (stop/waiting) NULL NULL
  ezono-cyclades-t_par (stop/waiting) NULL NULL
  ezono-cyclades-t_pro (stop/waiting) NULL NULL
  ezono-cyclades-t_ssm (stop/waiting) NULL NULL
  ezono-cyclades-t_upg (stop/waiting) NULL NULL
  ezono-cyclades-t_xss (stop/waiting) NULL NULL

These might be "due to be started" but not yet reached; that's ok - if Upstart hasn't reached them, they're not relevant yet.

some are running:
  ezono-cyclades-ac-adapter (start/running) NULL NULL
  ezono-cyclades-list-tasks (start/running) NULL NULL
  ezono-cyclades-stop-software-control (start/running) NULL NULL
  ezono-cyclades-suspend-control (start/running) NULL NULL
  ezono-cyclades-xss-control (start/running) NULL NULL
  ezono-malta-fpga-control (start/running) NULL NULL

  ezono-cyclades-t_bbd (start/running) NULL NULL
  ezono-cyclades-t_cal (start/running) NULL NULL
  ezono-cyclades-t_dso (start/running) NULL NULL
  ezono-cyclades-t_fcl (start/running) NULL NULL
  ezono-cyclades-t_fem (start/running) NULL NULL
  ezono-cyclades-t_emg-control (start/running) NULL NULL
  ezono-cyclades-t_mco-control (start/running) NULL NULL
  ezono-cyclades-t_prd (start/running) NULL NULL
  ezono-cyclades-t_scp (start/running) NULL NULL
  ezono-cyclades-t_smr (start/running) NULL NULL
  ezono-cyclades-t_ssc (start/running) NULL NULL
  ezono-cyclades-t_upg-control (start/running) NULL NUL...


(For the record, I'm looking at core.10726)

Slawomir: The core file contains the complete state of the init daemon, which includes the parsed result of your job files and their names, as well as their pids and suchlike.

The start conditions for the other jobs are all:

  start on started ezono-cyclades-t_smr

The state shows this has started.

Thus we can theorise that when ezono-cyclades-t_smr started, its event was created and attached to these jobs, and they began starting - but for whatever reason, the event was discarded prematurely.

Could you confirm that these four jobs all have this start condition, and no other jobs have that?

I see you have rsyslogd running

Could you run "initctl log-priority debug", make sure that the output appears in your logs ("initctl version" is a good test for that), and then try doing your date change again?

("initctl list" output before would be nice too)

Thanks

Yes, all the jobs are defined as services.
The four jobs which have an invalid pointer to their cause all use the same stanza, "start on started ezono-cyclades-t_smr", so I guess this event should still be in the list.

I'm attaching /var/log/messages, the output of "initctl list" from before the SW was started, and the new core file.

Thanks for all your information so far.

Out of interest, at which point in the log did you change the date?

The log only contains what happened while the SW was being started. Do you also want to see what happens when the date is changed and the SW is stopped?

Yes, please - it's helpful to see the whole picture

Here it is.
In /var/log/messages I marked the point where the date was changed with "Changing date to previous day" (although it's clear anyway, because the syslog timestamps change) and the point where upstart finished stopping my software with "SW stopped".

I got permission to attach the upstart job files. I zipped the whole /etc/event.d/ directory. You can mark this bug un-private.

visibility: private → public

Thanks very much for all your help so far.

From the logs, this looks like the key moment:

May 17 15:13:32 sonostation-usb26-eth init: Handling started event
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_ssm goal changed from stop to start
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_ssm state changed from waiting to starting
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_ssm respawning too fast, stopped
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_ssm respawning too fast, stopped
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_ssm goal changed from start to stop
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_ssm state changed from starting to waiting
May 17 15:13:32 sonostation-usb26-eth init: event_emit: Pending stopped event

This starting event is "started ezono-cyclades-t_smr", which we've established is the one that is unexpectedly freed.

Two things strike me about this:

 - "respawning too fast" - this wrongly uses the system time, so is one of the very few things to be broken by spooling the clock back a day

 - there's two of them printed, that's odd...
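[Editor's note: to see why winding the clock back matters here, consider a simplified model of a wall-clock-based respawn-rate check. This is an illustration of the failure mode only, not upstart's actual code; the field names mirror the Job fields visible in the core dump above:

#include <stdio.h>
#include <time.h>

typedef struct {
        int    respawn_count, respawn_limit;
        time_t respawn_time, respawn_interval;
} JobModel;                     /* hypothetical stand-in for Job */

static int
respawning_too_fast (JobModel *job)
{
        time_t now = time (NULL);       /* wall clock; jumps with date(1) */

        if (now < job->respawn_time + job->respawn_interval) {
                /* Apparently still inside the rate-limit window. */
                if (++job->respawn_count > job->respawn_limit)
                        return 1;       /* "respawning too fast, stopped" */
        } else {
                job->respawn_time = now;
                job->respawn_count = 1;
        }
        return 0;
}

int
main (void)
{
        /* respawn_time a day "in the future", as after winding the
         * date back; limit/interval match the job dump above */
        JobModel job = { .respawn_limit = 2, .respawn_interval = 30,
                         .respawn_time = time (NULL) + 86400 };

        printf ("%d\n", respawning_too_fast (&job));  /* 0 */
        printf ("%d\n", respawning_too_fast (&job));  /* 0 */
        printf ("%d\n", respawning_too_fast (&job));  /* 1: limit hit */
        return 0;
}

Because "now" is roughly 86400 seconds earlier than the recorded respawn_time, the window test stays true for a whole day and the counter accumulates until the limit trips spuriously.]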

The next sequence of log messages is everything else starting from this event:

May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-stop-software-control goal changed from stop to start
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-stop-software-control state changed from waiting to starting
May 17 15:13:32 sonostation-usb26-eth init: event_emit: Pending starting event
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_dso goal changed from stop to start
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_dso state changed from waiting to starting
May 17 15:13:32 sonostation-usb26-eth init: event_emit: Pending starting event
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_ssc goal changed from stop to start
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_ssc state changed from waiting to starting
May 17 15:13:32 sonostation-usb26-eth init: event_emit: Pending starting event
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_fem goal changed from stop to start
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_fem state changed from waiting to starting
May 17 15:13:32 sonostation-usb26-eth init: event_emit: Pending starting event
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_cal goal changed from stop to start
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_cal state changed from waiting to starting
May 17 15:13:32 sonostation-usb26-eth init: event_emit: Pending starting event
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_prd goal changed from stop to start
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_prd state changed from waiting to starting
May 17 15:13:32 sonostation-usb26-eth init: event_emit: Pending starting event
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_scp goal changed from stop to start
May 17 15:13:32 sonostation-usb26-eth init: ezono-cyclades-t_scp state changed from waiting to starting
May 17 15:13:32 sonostation-usb26-eth ...


Thank you again for all your help.

This bug is indeed caused by the first process failing to respawn.

It takes a reference to the emitted event and attempts to start; this fails because the respawn limit has been hit, so the job stops again and puts back its reference to the emitted event. Because it was the first process to do so, the emitted event once again has zero references - and so is marked EVENT_FINISHED.

Even though further processes take a reference to the event, it's already too late - the event has been scheduled for deletion, and Upstart deletes it at the next opportunity.
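[Editor's note: this sequence is easy to model in isolation. The following self-contained sketch uses simplified stand-in types, not upstart's code, to demonstrate the zero-crossing problem just described:

#include <stdio.h>

typedef enum { EVENT_HANDLING, EVENT_FINISHED } Progress;

typedef struct {
        int      jobs;          /* jobs referencing this emission */
        Progress progress;
} EventEmission;

/* A job putting back its reference; marking the emission finished as
 * soon as the count touches zero is the behaviour described above. */
static void
emission_unref (EventEmission *em)
{
        if (--em->jobs == 0)
                em->progress = EVENT_FINISHED;  /* scheduled for deletion */
}

int
main (void)
{
        EventEmission em = { 0, EVENT_HANDLING };

        em.jobs++;              /* first job takes a reference ...      */
        emission_unref (&em);   /* ... hits the respawn limit and puts
                                 * it back: the count is zero again, so
                                 * the emission is marked finished      */

        em.jobs += 3;           /* later jobs take references, too late */

        printf ("jobs=%d finished=%d\n", em.jobs,
                em.progress == EVENT_FINISHED);
        /* prints "jobs=3 finished=1": the emission will still be freed
         * even though three jobs now hold it as their ->cause */
        return 0;
}
]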

Changed in upstart:
status: Incomplete → Triaged
summary: - init: segfault after changing date
+ init: failure to respawn finishes the start event, even if other jobs
+ reference it later

Will need a new upload of 0.3 for this

Here's a test case for this:

        TEST_FEATURE ("with respawn limit and second job");
        event = (Event *)job2->stop_events.next;
        nih_list_add (&job2->start_events, &event->entry);

        em = event_emit ("wibble", NULL, NULL);
        em->progress = EVENT_HANDLING;

        TEST_ALLOC_FAIL {
                em->jobs = 0;

                job1->goal = JOB_STOP;
                job1->state = JOB_WAITING;
                job1->cause = NULL;

                job2->goal = JOB_STOP;
                job2->state = JOB_WAITING;
                job2->cause = NULL;

                /* job2 hashes first */
                job2->respawn_limit = 10;
                job2->respawn_interval = 1000;
                job2->respawn_time = time (NULL);
                job2->respawn_count = 10;

                TEST_DIVERT_STDERR (output) {
                        job_handle_event (em);
                }
                rewind (output);

                TEST_EQ (em->jobs, 1);
                TEST_EQ (em->progress, EVENT_HANDLING);

                TEST_EQ (job1->goal, JOB_STOP);
                TEST_EQ (job1->state, JOB_WAITING);
                TEST_EQ_P (job1->cause, NULL);

                TEST_EQ (job2->goal, JOB_START);
                TEST_EQ (job2->state, JOB_STARTING);
                TEST_EQ_P (job2->cause, em);

                TEST_FILE_EQ (output, "test: foo respawning too fast, stopped\n");
                TEST_FILE_END (output);
                TEST_FILE_RESET (output);
        }

        nih_list_free (&em->event.entry);

        nih_list_free (&job2->entry);

If you could try applying the job.c change in this patch, it should hopefully fix the crasher: http://bazaar.launchpad.net/~scott/upstart/0.3/revision/706
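[Editor's note: for readers who can't chase the link, the change implied by the analysis above is roughly the guard below - a paraphrase reusing the toy types from the earlier sketches, not the literal revision-706 diff:

/* Put back a job's reference to the emission; only finish it once it
 * is no longer being handled, since jobs later in the iteration may
 * still take a reference. */
static void
emission_unref_fixed (EventEmission *em)
{
        em->jobs--;

        if (em->jobs == 0 && em->progress != EVENT_HANDLING)
                em->progress = EVENT_FINISHED;
}
]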

The fix solves the crash. Thanks a lot for such a quick response.

This bug has been fixed in Upstart 0.3.11

0.3.11 2009-06-19 "For Friday, June 19th 2009, I'm Jon Masters"

        * Fixed crash when the first job started by an event hits the
          respawn limit and frees the event, even though other jobs are
          also started by the same event. (Bug: #388873)

        * Added temporary support for "telinit u" until we have true re-exec
          support. This will be replaced by an initctl command in future.
          (Bug: #388742)

        * Corrected formatting of initctl(8) manpage. (Bug: #388745)

The fix also works with 0.3.9.

I've gone through the Upstart 0.5 and trunk code, and am confident that they are not affected by this bug.

Changed in upstart:
status: Triaged → Fix Released
Changed in upstart (Fedora):
status: Confirmed → Won't Fix
Changed in upstart (Fedora):
status: Won't Fix → In Progress
Changed in upstart (Fedora):
status: In Progress → Fix Committed
Changed in upstart (Fedora):
status: Fix Committed → Fix Released