GLibC

fseek(…, …, SEEK_SET) causes reading over the skipped range

Bug #1861776 reported by Konstantin on 2020-02-04

This bug affects 1 person

Affects	Status	Importance	Assigned to
GLibC	New	Medium	sourceware-bugs #25497
glibc (Ubuntu)	Fix Released	Undecided	Unassigned
Bionic	New	Low	Unassigned
Focal	Fix Released	Undecided	Unassigned

Bug Description

When fseek() is called, it in turn calls (1) lseek() and (2) read(). In glibc 2.29 (and possibly earlier), read() is called only for the last block. However, in glibc 2.27, which Ubuntu 18.04 uses, the read covers the whole skipped range, which may hang an application that tries to skip too large a range.

There is a related upstream report: https://sourceware.org/bugzilla/show_bug.cgi?id=25497 Note that, per the comments there, in at least glibc 2.29 read() only happens for the *last block*. This suggests that fseek() was fixed at some point so that it no longer reads everything it skips, and that Ubuntu didn't backport the fix to the older glibc it ships.

# Steps to reproduce

In the command below, replace `/dev/sda`, if necessary, with a device that is larger than 2 GiB.

Run `sudo hexdump -C /dev/sda -s 0x80000000 -n 1`. This command uses `hexdump` to print one byte of the disk's content at a large offset.

## Expected

The command returns immediately and prints the byte at that offset.

## Actual

The command hangs with high CPU load. If you run `strace hexdump …`, you'll see a long series of reads. These reads come from the glibc 2.27 implementation of `fseek()`.
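
A possible workaround for affected programs, sketched under assumptions: when only a few bytes at a large offset are needed, the read can bypass stdio entirely with POSIX `pread()`, so no `FILE` buffer has to be filled. The helper name `read_byte_at` is hypothetical and is not taken from hexdump or glibc.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t on 32-bit systems */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read the single byte at `offset` in `path` using
 * pread(), without going through a stdio FILE.
 * Returns 0 on success, -1 on error or if `offset` is at or past EOF. */
static int read_byte_at(const char *path, off_t offset, unsigned char *out)
{
    assert(path != NULL && out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        perror("open");
        return -1;
    }

    /* pread() reads at an absolute offset and does not move the file position. */
    ssize_t n = pread(fd, out, 1, offset);
    if (n < 0)
        perror("pread");

    close(fd);
    return n == 1 ? 0 : -1;
}
```

Calling `read_byte_at("/dev/sda", 0x80000000, &b)` would correspond to the `hexdump` invocation above, minus the formatting. It should issue a single one-byte read regardless of the glibc version, since stdio is not involved.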

In Sourceware.org Bugzilla #25497, Konstantin (hi-angel-z) wrote on 2020-02-03:

When fseek() is called, it in turn calls lseek() (as expected), and then calls read() over the skipped range (not expected). In the best case, this wastes CPU and I/O resources. In the worst case, an application that tries to skip too large a range simply hangs in fseek().

This is a follow-up to the discussion at https://sourceware.org/ml/libc-help/2020-01/threads.html#00046

# Steps to reproduce (in terms of terminal commands)

    $ cat test.c
    #include <fcntl.h>
    #include <stdio.h>

    int main() {
        FILE* f = fopen("/tmp/test.c", "r");
        if (!f)
            perror("");
        fseek(f, 30, SEEK_SET);
    }
    $ gcc test.c -o a
    $ strace ./a 2>&1 | tail
    mprotect(0x7fd2c36c1000, 4096, PROT_READ) = 0
    munmap(0x7fd2c3628000, 451693) = 0
    brk(NULL) = 0x557c9e900000
    brk(0x557c9e921000) = 0x557c9e921000
    openat(AT_FDCWD, "/tmp/test.c", O_RDONLY) = 3
    fstat(3, {st_mode=S_IFREG|0644, st_size=155, ...}) = 0
    lseek(3, 0, SEEK_SET) = 0
    read(3, "#include <fcntl.h>\n#include <s", 30) = 30
    exit_group(0) = ?
    +++ exited with

## Expected

There's no read() call after lseek()

## Actual

Both lseek() and read() are called.

In Sourceware.org Bugzilla #25497, Carlos-0 (carlos-0) wrote on 2020-02-03:

I'm not sure what the consequences are for optimizing away the read as part of the FILE buffer management. That is the question that would need to be answered here before we could do something like this.

In Sourceware.org Bugzilla #25497, Andreas Schwab (schwab-linux-m68k) wrote on 2020-02-03:

The read is required to synchronize the underlying file position, while keeping the stdio buffer aligned on a block boundary.
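
The arithmetic this comment implies can be sketched as follows. This is an illustration only, not glibc's code: the struct and function names are hypothetical, and the 4096-byte block size used in the note below is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of what a block-aligned seek would do. */
struct seek_plan {
    uint64_t lseek_to;  /* block-aligned offset passed to lseek() */
    uint64_t read_len;  /* bytes to read so the file position reaches the target */
};

/* Round `target` down to a multiple of `block_size` (which must be a power
 * of two), then read forward the remainder within that one block. */
static struct seek_plan plan_aligned_seek(uint64_t target, uint64_t block_size)
{
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);

    struct seek_plan p;
    p.lseek_to = target & ~(block_size - 1);
    p.read_len = target - p.lseek_to;
    return p;
}
```

For the test program above, `plan_aligned_seek(30, 4096)` gives `lseek_to = 0` and `read_len = 30`, which matches the `lseek(3, 0, SEEK_SET)` and the 30-byte `read()` in the strace output. Under this scheme the read never exceeds one block, however far the seek goes.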

In Sourceware.org Bugzilla #25497, Konstantin (hi-angel-z) wrote on 2020-02-03:

(In reply to Andreas Schwab from comment #2)
> The read is required to synchronize the underlying file position, while
> keeping the stdio buffer aligned on a block boundary.

I don't know why it's necessary, but would it be possible in this case to read just one block, namely the last block before the position the program is trying to set with fseek()? Then, when a program calls fseek(…,0x80000000, SEEK_SET), it at least wouldn't hang in fseek() trying to read half a terabyte of data.

In Sourceware.org Bugzilla #25497, Andreas Schwab (schwab-linux-m68k) wrote on 2020-02-03:

Where do you see it reading more than one block?

In Sourceware.org Bugzilla #25497, Konstantin (hi-angel-z) wrote on 2020-02-03:

(In reply to Andreas Schwab from comment #4)
> Where do you see it reading more than one block?

Oh, I stand corrected: on glibc 2.30 this is no longer reproducible. It is reproducible on glibc 2.27, though, just 3 versions ago. Reproducing it simply requires running something like `sudo hexdump -C /dev/sda -s 0xa8000f9000 -n 1`: if it hangs, it's because the `fseek()` that hexdump uses tries to read 0xa8000f9000 bytes of data.
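
For anyone without a spare block device or root access, a sparse file might serve as a stand-in. The sketch below is an assumption-laden variant of the reproducer, not what hexdump does internally; the helper names are hypothetical. Running a program built from it under `strace` should show whether `read()` covers the skipped range or only the last block.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64   /* 64-bit off_t on 32-bit systems */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create (or truncate) `path` as a sparse file of
 * `size` bytes. Returns 0 on success, -1 on error. */
static int make_sparse_file(const char *path, off_t size)
{
    assert(path != NULL && size >= 0);

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    int rc = ftruncate(fd, size);  /* extends the file with a hole */
    close(fd);
    return rc;
}

/* Seek through stdio, as in the report, then read one byte.
 * Returns the byte value, or EOF on error or end of file. */
static int byte_after_fseeko(const char *path, off_t offset)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return EOF;

    int c = EOF;
    if (fseeko(f, offset, SEEK_SET) == 0)
        c = fgetc(f);

    fclose(f);
    return c;
}
```

A sparse file reads back as zeros, so a seek into the hole should return the byte 0 on any glibc version; only the number and size of the `read()` calls seen in `strace` would differ.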

Konstantin (hi-angel-z) wrote on 2020-02-05:

UPD: replaced 2.30 → 2.29 in the description. Per my colleague's test, 2.29 works too.

description:	updated

Balint Reczey (rbalint) wrote on 2020-12-07:

I've marked the bug as forwarded so that it follows the upstream discussion. The fix _may_ be backported if upstream backports it to the 2.27 branch.

Changed in glibc (Ubuntu):
status:	New → Fix Released
Changed in glibc (Ubuntu Focal):
status:	New → Fix Released
Changed in glibc (Ubuntu Bionic):
importance:	Undecided → Low

Bug Watch Updater (bug-watch-updater) on 2020-12-08

Changed in glibc:
importance:	Unknown → Medium
status:	Unknown → New
