bash HISTCONTROL=erasedups should erase duplicates from history file before saving

Bug #189881 reported by Manuel López-Ibáñez
This bug affects 12 people
Affects        Status   Importance  Assigned to  Milestone
Gnu Bash       New      Undecided   Unassigned
bash (Ubuntu)  Triaged  Wishlist    Unassigned

Bug Description

Binary package hint: bash

The bash option HISTCONTROL=erasedups deletes all previous lines matching the current line from the history, in order to avoid duplicates. However, it is of little use with multiple terminal sessions, because bash does not check for duplicate lines when saving or loading the history file.

Steps to reproduce:

1. Set up ~/.bashrc as follows:
# don't put duplicate lines in the history. See bash(1) for more options
export HISTCONTROL=erasedups
export HISTSIZE=1
export HISTIGNORE="history *:cd *:df *:exit:fg:bg:file *:ll:ls:mc:top:clear"
export HISTFILESIZE=2
#avoid overwriting history
shopt -s histappend

2. Open Terminal 1 and write "echo hola"
3. Open Terminal 2 and write "echo hola"
4. Exit Terminal 1 and 2
5. Open Terminal 3 and execute "history"
Output is:
1 echo hola
2 echo hola
Expected output is:
1 echo hola

When you have more terminals and larger histories the number of duplicates is far larger, thus defeating the purpose of using 'erasedups'.
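
To get a rough idea of how many redundant entries have piled up, something like the following can be used. This is only a sketch: count_redundant_lines is a made-up helper name, and it assumes a plain history file with one command per line (no HISTTIMEFORMAT timestamp lines, no multi-line entries).

```shell
# count_redundant_lines is a hypothetical helper, not part of bash.
# It prints how many lines repeat a line that already appeared earlier
# in the given file.
count_redundant_lines() {
  awk 'seen[$0]++ { n++ } END { print n + 0 }' "$1"
}

# Demo on a throwaway file that mimics the result of the steps above.
demo=$(mktemp)
printf '%s\n' 'echo hola' 'echo hola' 'ls' > "$demo"
count_redundant_lines "$demo"   # prints 1
rm -f "$demo"
```

On a real setup it would be called as: count_redundant_lines ~/.bash_history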

Matthias Klose (doko)
Changed in bash:
importance: Undecided → Wishlist
status: New → Confirmed

Mika Fischer (zoop)
Changed in bash:
status: Confirmed → Triaged

Nos (7-launchpad-bleaksky-net) wrote :

This can be solved by:
export PROMPT_COMMAND="history -a;history -r;$PROMPT_COMMAND"

Nos (7-launchpad-bleaksky-net) wrote :

Actually maybe that doesn't work here, but it does help with synchronizing the history across multiple sessions.

Manuel López-Ibáñez (manuellopezibanez) wrote :

I am using the following as a workaround:

# don't put duplicate lines in the history. See bash(1) for more options
export HISTCONTROL=erasedups:ignorespace
export HISTSIZE=1000
export HISTIGNORE="history *:cd *:df *:exit:fg:bg:file *:ll:ls:mc:top:clear"
export HISTFILESIZE=10000
#avoid overwriting history
shopt -s histappend
#smart handling of multi-line commands
shopt -s cmdhist
# append every command to history
PROMPT_COMMAND="history -a;$PROMPT_COMMAND"

But I still need to remove duplicates from time to time:

#!/usr/bin/perl
use strict;

# Build the path directly; `echo ~/.bash_history` would leave a trailing newline in it.
my $histfile = "$ENV{HOME}/.bash_history";

open(INPUT, "<", $histfile) or die "Can't open $histfile: $!\n";
my @lines = reverse <INPUT>;
close(INPUT);

print "Before: ". scalar(@lines). " lines\n";
my @buffer = ();

for (my $i = 0; $i < @lines; $i++) {
    $lines[$i] =~ s/\s+\n$/\n/;
}

while (@lines) {
    my $line = shift @lines;
    push (@buffer, $line) unless $line =~ /^mplayer|^rm/;
    @lines = grep { $_ ne $line } @lines;
}

open(OUTPUT, ">$histfile") or die "Can't open $histfile: $!\n";
print OUTPUT reverse @buffer;
close(OUTPUT);

print "After: ". scalar(@buffer). " lines\n";

exit(0);

Not sure why you have "history -r". Is it fast enough?

Nos (7-launchpad-bleaksky-net) wrote :

The "history -r" ensures that every terminal stays synced with the latest commands from all other terminals; without it, the history file is only read when a terminal is started.

You are right, it can be slow if the history file grows too large; in which case just settle for "history -a" - everything is kept but recent commands typed in other terminals are not available.

I like to keep the full history for analysis purposes, see
http://www.oreillynet.com/onlamp/blog/2007/01/whats_in_your_bash_history.html
but in order to keep the file size small for quicker loading, I am using the following script as a workaround to archive and remove duplicate entries:

#!/bin/bash

let count=1

#Backup history so far
if [[ ! -e ~/bash_history.bck.1 ]]; then
  cat ~/.bash_history > ~/bash_history.bck.1
else #Only backup new bits.
  while [[ -e ~/bash_history.bck.$count ]]; do
    let "count += 1"
  done
  let "last=count-1"
  new_line=`nl -n rz ~/.bash_history | grep "==== bck" | tail -n 1 | cut -c1-6`
  #make sure it is interpreted in decimal
  new_line=$(( 10#$new_line ))
  echo $new_line
  split -a1 -l $new_line ~/.bash_history ~/bash_history.bck.$count
  rm ~/bash_history.bck.${count}a
  mv ~/bash_history.bck.${count}b ~/bash_history.bck.${count}
fi

#Remove duplicates from history but retain ordering
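# Write to a temporary file first: redirecting straight into ~/.bash_history
# can truncate it before nl has read it.
nl -n rz ~/.bash_history | sort -k2 -u | sort | cut -f2- > ~/.bash_history.tmp && mv -f ~/.bash_history.tmp ~/.bash_history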
#history | sort -k2 -u | sort -n | cut -f2-
# Add a marker line to separate new history from compressed history.
echo ===================================== bck.$count === `date` >> ~/.bash_history

exit

Nos (7-launchpad-bleaksky-net) wrote :

Actually
PROMPT_COMMAND="history -a;history -c; history -r;$PROMPT_COMMAND"
is better because it doesn't cause your history numbers to double!
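
The reason is that "history -r" appends the contents of the history file to the in-memory list instead of replacing it, so without "history -c" the list grows at every prompt. The same sequence can be written as a small function, which is a bit easier to read in ~/.bashrc. This is only a sketch: sync_history is a made-up name, and the function does nothing beyond the three commands above.

```shell
# sync_history is a hypothetical helper name; the one-liner above inlines
# the same three commands.
sync_history() {
  history -a   # append lines entered in this session to $HISTFILE
  history -c   # clear the in-memory history list
  history -r   # re-read $HISTFILE into the now-empty list
}

# Run it before each prompt, keeping any PROMPT_COMMAND that was already set.
PROMPT_COMMAND="sync_history${PROMPT_COMMAND:+;$PROMPT_COMMAND}"
```

Note that this only synchronizes sessions; it does not remove duplicates already stored in the file.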

SilverWave (silverwave) wrote :

@manu's perl script works fine, thanks.

But I use this:

awk '!x[$0]++' .bash_history > .bash.tmp && mv -f .bash.tmp .bash_history

Manuel López-Ibáñez (manuellopezibanez) wrote :

@SilverWave

Nice, does that method keep the latest of the duplicates? Have you compared speed with, say, 10000 lines? The perl script is quite slow (several seconds!).
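
For reference, awk '!x[$0]++' prints a line only the first time it is seen, so it keeps the earliest copy of each duplicate. A possible variant that keeps the latest copy instead is to reverse the file before and after filtering. This is a sketch under some assumptions: dedup_keep_latest is a made-up name, tac from GNU coreutils is available, and the history file has one command per line (no HISTTIMEFORMAT timestamp lines).

```shell
# dedup_keep_latest is a hypothetical helper, not something from this thread.
dedup_keep_latest() {
  local histfile=$1 tmp
  tmp=$(mktemp) || return 1
  # Reverse the file so the newest copy of each line is seen first,
  # keep only first occurrences, then restore the original order.
  tac "$histfile" | awk '!seen[$0]++' | tac > "$tmp" && mv -f "$tmp" "$histfile"
}

# Demo on a throwaway file rather than the real ~/.bash_history.
demo=$(mktemp)
printf '%s\n' 'echo hola' 'ls -l' 'echo hola' 'pwd' > "$demo"
dedup_keep_latest "$demo"
cat "$demo"   # prints: ls -l, echo hola, pwd (one per line)
rm -f "$demo"
```

The awk filter makes a single pass over the file with a hash lookup per line, so it avoids rescanning the remaining lines for every entry the way the perl loop does.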

Rolf Leggewie (r0lf) wrote :

This has been discussed upstream in

http://lists.gnu.org/archive/html/bug-bash/2008-06/msg00050.html ff
http://lists.gnu.org/archive/html/bug-bash/2007-05/msg00016.html ff

Unfortunately, I'm not sure that upstream intends to take any action or even considers it in need of fixing. I'm not aware of upstream even tracking bugs. As such, it's probably fallen off the radar.
