Ubuntu
bash package

bash HISTCONTROL=erasedups should erase duplicates from history file before saving

Bug #189881 reported by Manuel López-Ibáñez on 2008-02-07

This bug affects 12 people

Affects        Status   Importance  Assigned to  Milestone
Gnu Bash       New      Undecided   Unassigned
bash (Ubuntu)  Triaged  Wishlist    Unassigned

Bug Description

Binary package hint: bash

The bash option HISTCONTROL=erasedups can be used to delete all previous lines matching the current line from the history in order to avoid duplicates. However, this is basically useless if you use multiple terminal sessions because bash doesn't check duplicated lines before saving or loading the history file.

Steps to reproduce:

1. Set up ~/.bashrc as follows:
# don't put duplicate lines in the history. See bash(1) for more options
export HISTCONTROL=erasedups
export HISTSIZE=1
export HISTIGNORE="history *:cd *:df *:exit:fg:bg:file *:ll:ls:mc:top:clear"
export HISTFILESIZE=2
#avoid overwriting history
shopt -s histappend

2. Open Terminal 1 and write "echo hola"
3. Open Terminal 2 and write "echo hola"
4. Exit Terminal 1 and 2
5. Open Terminal 3 and execute "history"
Output is:
1 echo hola
2 echo hola
Expected output is:
1 echo hola

When you have more terminals and larger histories the number of duplicates is far larger, thus defeating the purpose of using 'erasedups'.
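
One rough way to see how many duplicates have piled up in a history file is to compare its total line count with its unique line count. The snippet below is only a sketch: `count_dups` is a made-up helper name, and the demo runs on a scratch file rather than on the real ~/.bash_history.

```shell
# count_dups is a hypothetical helper, not part of bash.
# It prints how many lines of a file repeat an earlier line.
count_dups() {
    total=$(wc -l < "$1")
    unique=$(sort -u "$1" | wc -l)
    echo $(( total - unique ))
}

# Demo on a scratch file: 'echo hola' appears twice, so one line is a duplicate.
demo=$(mktemp)
printf '%s\n' 'echo hola' 'echo hola' 'ls' > "$demo"
dups=$(count_dups "$demo")
echo "$dups"
rm -f "$demo"
```

Running `count_dups ~/.bash_history` after a few multi-terminal sessions should show the effect described above.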

Matthias Klose (doko) on 2008-02-08

Changed in bash:
importance:	Undecided → Wishlist
status:	New → Confirmed

Mika Fischer (zoop) on 2008-06-07

Changed in bash:
status:	Confirmed → Triaged

Nos (7-launchpad-bleaksky-net) wrote on 2009-04-29:

This can be solved by:
export PROMPT_COMMAND="history -a;history -r;$PROMPT_COMMAND"

Nos (7-launchpad-bleaksky-net) wrote on 2009-04-29:

Actually maybe that doesn't work here, but it does help with synchronizing the history across multiple sessions.

Manuel López-Ibáñez (manuellopezibanez) wrote on 2009-04-29:

I am using the following as a workaround:

# don't put duplicate lines in the history. See bash(1) for more options
export HISTCONTROL=erasedups:ignorespace
export HISTSIZE=1000
export HISTIGNORE="history *:cd *:df *:exit:fg:bg:file *:ll:ls:mc:top:clear"
export HISTFILESIZE=10000
#avoid overwriting history
shopt -s histappend
#smart handling of multi-line commands
shopt -s cmdhist
# append every command to history
PROMPT_COMMAND="history -a;$PROMPT_COMMAND"

But I still need to remove duplicates from time to time:

#!/usr/bin/perl
use strict;
use warnings;

my $histfile = "$ENV{HOME}/.bash_history";

open(my $in, '<', $histfile) or die "Can't open $histfile: $!\n";
# Reverse so that the most recent copy of each line is seen first.
my @lines = reverse <$in>;
close($in);

print "Before: ". scalar(@lines). " lines\n";
my @buffer = ();

# Strip trailing whitespace so lines differing only in that compare equal.
for (my $i = 0; $i < @lines; $i++) {
    $lines[$i] =~ s/\s+\n$/\n/;
}

while (@lines) {
    my $line = shift @lines;
    # Keep the line, except for mplayer and rm commands.
    push (@buffer, $line) unless $line =~ /^mplayer|^rm/;
    # Drop all older copies of this line.
    @lines = grep { $_ ne $line } @lines;
}

open(my $out, '>', $histfile) or die "Can't open $histfile: $!\n";
print {$out} reverse @buffer;
close($out);

print "After: ". scalar(@buffer). " lines\n";

exit(0);

Not sure why you have "history -r". Is it fast enough?

Nos (7-launchpad-bleaksky-net) wrote on 2009-04-29:

The "history -r" ensures that all terminals are always synced with the latest history commands from all other terminals, because otherwise the history file is only read when a terminal is started.

You are right, it can be slow if the history file grows too large; in which case just settle for "history -a" - everything is kept but recent commands typed in other terminals are not available.

I like to keep the full history for analysis purposes, see
http://www.oreillynet.com/onlamp/blog/2007/01/whats_in_your_bash_history.html
but in order to keep the file size small for quicker loading, I am using the following script as a workaround to archive and remove duplicate entries:

#!/bin/bash

hist=~/.bash_history
count=1

# Back up the history so far
if [[ ! -e ~/bash_history.bck.1 ]]; then
  cat "$hist" > ~/bash_history.bck.1
else # Only back up the new bits
  while [[ -e ~/bash_history.bck.$count ]]; do
    let "count += 1"
  done
  # Line number of the last marker line (nl -b a also numbers empty lines)
  new_line=`nl -b a -n rz "$hist" | grep "==== bck" | tail -n 1 | cut -c1-6`
  # make sure it is interpreted in decimal (0 if there is no marker yet)
  new_line=$(( 10#${new_line:-0} ))
  echo $new_line
  # Everything after the marker is new since the last backup
  tail -n +$(( new_line + 1 )) "$hist" > ~/bash_history.bck.$count
fi

# Remove duplicates from history but retain ordering. Write to a temporary
# file first: redirecting straight to the input file would truncate it
# before nl gets to read it.
nl -b a -n rz "$hist" | sort -k2 -u | sort | cut -f2- > "$hist.tmp" && mv -f "$hist.tmp" "$hist"
#history | sort -k2 -u | sort -n | cut -f2-
# Add a marker line to separate new history from compressed history.
echo "===================================== bck.$count === `date`" >> "$hist"

exit

Nos (7-launchpad-bleaksky-net) wrote on 2009-04-29:

Actually
PROMPT_COMMAND="history -a;history -c; history -r;$PROMPT_COMMAND"
is better because it doesn't cause your history numbers to double!
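
For readers following along, the three steps in that PROMPT_COMMAND can be spelled out as a function with one comment per step. This is only a sketch for ~/.bashrc; the name `sync_history` is made up for illustration and is not a bash builtin.

```shell
# Sketch for ~/.bashrc. sync_history is a hypothetical name, not part of bash.
sync_history() {
    history -a   # append this session's new lines to the history file
    history -c   # clear the in-memory list so the re-read does not double it
    history -r   # read the history file back, including other terminals' lines
}

# Run it before every prompt, keeping whatever PROMPT_COMMAND already held.
PROMPT_COMMAND="sync_history${PROMPT_COMMAND:+;$PROMPT_COMMAND}"
```

As noted earlier in this thread, this keeps sessions in sync but does not by itself remove duplicates that are already in the history file.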

SilverWave (silverwave) wrote on 2010-04-17:

@manu's perl script works fine, thanks.

But I use this:

awk '!x[$0]++' .bash_history > .bash.tmp && mv -f .bash.tmp .bash_history

Manuel López-Ibáñez (manuellopezibanez) wrote on 2010-04-17:

@SilverWave

Nice, does that method keep the latest of the duplicates? Have you compared speed with, say, 10000 lines? The perl script is quite slow (several seconds!).
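
On the first question: `awk '!x[$0]++'` prints a line only the first time it is seen, so it keeps the earliest copy of each duplicate, not the latest. One possible way to keep the latest copy instead is to reverse the file before and after filtering. The sketch below assumes GNU `tac` is available; `dedup_keep_last` is an illustrative name only, and the demo runs on a scratch file.

```shell
# Keep only the most recent copy of each line, preserving the order.
# dedup_keep_last is a hypothetical helper name; tac is from GNU coreutils.
dedup_keep_last() {
    tac "$1" | awk '!seen[$0]++' | tac
}

# Demo on a scratch file rather than the real ~/.bash_history.
demo=$(mktemp)
printf '%s\n' 'echo hola' 'ls -l' 'echo hola' 'make' > "$demo"
result=$(dedup_keep_last "$demo")
printf '%s\n' "$result"   # the first 'echo hola' is dropped, the later one stays
rm -f "$demo"
```

To apply it to the real history file, the output would need to go to a temporary file that is then moved into place, as in the awk one-liner above.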

Rolf Leggewie (r0lf) wrote on 2010-06-04:

This has been discussed upstream in

http://lists.gnu.org/archive/html/bug-bash/2008-06/msg00050.html ff
http://lists.gnu.org/archive/html/bug-bash/2007-05/msg00016.html ff

Unfortunately, I'm not sure that upstream intends to take any action or even considers it in need of fixing. I'm not aware of upstream even tracking bugs. As such, it's probably fallen off the radar.

Duplicates of this bug

Bug #415217
