fseek(…, …, SEEK_SET) causes reading over the skipped range

Bug #1861776 reported by Konstantin
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| GLibC | New | Medium | | |
| glibc (Ubuntu) | Fix Released | Undecided | Unassigned | |
| Bionic | New | Low | Unassigned | |
| Focal | Fix Released | Undecided | Unassigned | |

Bug Description

When fseek() is called, it in turn calls (1) lseek() and (2) read(). In glibc 2.29 (and maybe earlier), read() is called only for the last block. However, in glibc 2.27, which Ubuntu 18.04 uses, the read covers the whole skipped range, which can hang an application that tries to skip too large a range.

There is a related report: https://sourceware.org/bugzilla/show_bug.cgi?id=25497 Note that, per the comments there, in at least glibc 2.29 read() only happens for the *last block*. This means fseek() was fixed at some point to not read over everything it skipped, and Ubuntu didn't backport that fix to the older glibc it ships.

# Steps to reproduce

In the command below, replace `/dev/sda` if necessary with a device that is larger than 2 GiB.

Run `sudo hexdump -C /dev/sda -s 0x80000000 -n 1`. This command uses `hexdump` to print one byte of the disk's content at a large offset (0x80000000 bytes, i.e. 2 GiB).

## Expected

The command returns immediately and prints the requested byte.

## Actual

The command hangs with high CPU load. If you run it under `strace hexdump …`, you'll see that a bunch of reads happen. These reads come from the glibc 2.27 implementation of `fseek()`.
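
For reference, the `hexdump -s … -n 1` invocation above can be approximated by the following minimal C sketch. It assumes hexdump performs its skip through stdio's `fseeko()`, as this report implies; `read_byte_at` is a hypothetical helper name, not something taken from hexdump or glibc.

```c
#define _FILE_OFFSET_BITS 64 /* ask for a 64-bit off_t on 32-bit systems */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: read the single byte at `offset` in `path` via stdio.
 * Returns 0 on success, -1 on open/seek failure or end of file. */
int read_byte_at(const char *path, off_t offset, unsigned char *out)
{
    assert(out != NULL);

    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    int rc = -1;
    /* On an affected glibc, this is the call that reads over the skipped range. */
    if (fseeko(f, offset, SEEK_SET) == 0) {
        int c = fgetc(f);
        if (c != EOF) {
            *out = (unsigned char)c;
            rc = 0;
        }
    }
    fclose(f);
    return rc;
}
```

Running a program that calls `read_byte_at("/dev/sda", 0x80000000, &b)` under `strace` should show the same lseek()/read() pattern as hexdump does, which makes it easier to compare glibc versions without involving hexdump itself.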

Konstantin (hi-angel-z) wrote :

When fseek() is called, it in turn calls lseek() (as expected) and then calls read() over the skipped range (which is not expected). In the best case, this wastes CPU and I/O resources. In the worst case, an application that tries to skip too large a range simply hangs in fseek().

This is a follow up to discussion at https://sourceware.org/ml/libc-help/2020-01/threads.html#00046

# Steps to reproduce (in terms of terminal commands)

    $ cat test.c
    #include <fcntl.h>
    #include <stdio.h>

    int main() {
        FILE* f = fopen("/tmp/test.c", "r");
        if (!f)
            perror("");
        fseek(f, 30, SEEK_SET);
    }
    $ gcc test.c -o a
    $ strace ./a 2>&1 | tail
    mprotect(0x7fd2c36c1000, 4096, PROT_READ) = 0
    munmap(0x7fd2c3628000, 451693) = 0
    brk(NULL) = 0x557c9e900000
    brk(0x557c9e921000) = 0x557c9e921000
    openat(AT_FDCWD, "/tmp/test.c", O_RDONLY) = 3
    fstat(3, {st_mode=S_IFREG|0644, st_size=155, ...}) = 0
    lseek(3, 0, SEEK_SET) = 0
    read(3, "#include <fcntl.h>\n#include <s", 30) = 30
    exit_group(0) = ?
    +++ exited with

## Expected

There's no read() call after lseek()

## Actual

Both lseek() and read() are called.

Carlos-0 (carlos-0) wrote :

I'm not sure what the consequences are for optimizing away the read as part of the FILE buffer management. That is the question that would need to be answered here before we could do something like this.

Andreas Schwab (schwab-linux-m68k) wrote :

The read is required to synchronize the underlying file position, while keeping the stdio buffer aligned on a block boundary.
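
The arithmetic behind this can be sketched as follows. This is only an illustration, not glibc's code: `struct seek_plan` and `plan_aligned_seek` are hypothetical names, and the 4096-byte block size used in the examples is an assumption. The idea is that lseek() goes to the block boundary at or below the target, and read() then consumes the remaining bytes within that one block.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration of a block-aligned seek; not glibc's implementation. */
struct seek_plan {
    uint64_t lseek_to; /* block-aligned offset handed to lseek() */
    uint64_t read_len; /* bytes to read() so the position reaches the target */
};

struct seek_plan plan_aligned_seek(uint64_t target, uint64_t block_size)
{
    /* The mask trick below only works for a power-of-two block size. */
    assert(block_size != 0 && (block_size & (block_size - 1)) == 0);

    struct seek_plan p;
    p.lseek_to = target & ~(block_size - 1); /* round down to a block boundary */
    p.read_len = target - p.lseek_to;        /* always less than block_size */
    return p;
}
```

With a block size of 4096, a target of 30 gives `lseek_to = 0` and `read_len = 30`, which matches the `lseek(3, 0, SEEK_SET)` followed by a 30-byte `read()` in the strace output above. A target of 0x80000000 is already block-aligned, so under this scheme the read length would be 0.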

Konstantin (hi-angel-z) wrote :

(In reply to Andreas Schwab from comment #2)
> The read is required to synchronize the underlying file position, while
> keeping the stdio buffer aligned on a block boundary.

I don't know why it's necessary, but would it be possible in this case to read just one block, namely the last block before the position the program is trying to set with fseek()? Then, when a program does fseek(…, 0x80000000, SEEK_SET), it at least wouldn't hang in fseek() trying to read 2 GiB of data.

Andreas Schwab (schwab-linux-m68k) wrote :

Where do you see it reading more than one block?

Konstantin (hi-angel-z) wrote :

(In reply to Andreas Schwab from comment #4)
> Where do you see it reading more than one block?

Oh, I stand corrected: on glibc 2.30 this is no longer reproducible. It is reproducible on glibc 2.27, though, which is just 3 versions older. To reproduce it, simply run something like `sudo hexdump -C /dev/sda -s 0xa8000f9000 -n 1`: if it hangs, that's because the `fseek()` call hexdump uses tries to read 0xa8000f9000 bytes of data.
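
For applications that have to run on an affected glibc, one possible workaround (not something proposed in this thread) is to bypass stdio for the far read and use POSIX `pread()` on a file descriptor, which takes the offset directly and involves no stdio buffer. A minimal sketch, with `pread_byte_at` as a hypothetical helper name:

```c
#define _FILE_OFFSET_BITS 64 /* ask for a 64-bit off_t on 32-bit systems */
#include <assert.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read the single byte at `offset` in `path` without stdio.
 * Returns 0 on success, -1 on open/read failure or end of file. */
int pread_byte_at(const char *path, off_t offset, unsigned char *out)
{
    assert(out != NULL);

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* pread() reads at the given offset and does not move the file position. */
    ssize_t n = pread(fd, out, 1, offset);
    close(fd);
    return n == 1 ? 0 : -1;
}
```

This trades stdio's buffering for one system call per read, so it fits occasional reads at far-apart offsets better than sequential scanning.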

Konstantin (hi-angel-z) wrote :

UPD: replaced 2.30 → 2.29 in the description. Per my colleague's test, 2.29 works too.

description: updated
Balint Reczey (rbalint) wrote :

I've marked the bug as forwarded and will follow how the upstream discussion goes. The fix _may_ be backported if upstream backports it to the 2.27 branch.

Changed in glibc (Ubuntu):
status: New → Fix Released
Changed in glibc (Ubuntu Focal):
status: New → Fix Released
Changed in glibc (Ubuntu Bionic):
importance: Undecided → Low
Changed in glibc:
importance: Unknown → Medium
status: Unknown → New