3 ways to remove duplicate lines from a text file
One of the more difficult jobs when working in the shell is filtering the content of text files.
Below you will find three different ways to remove all the duplicate lines from a text file.
First of all, here are the three commands used below. They are available on almost every Linux distribution and, because all three are specified by POSIX, on practically every Unix-like system:
uniq: discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).
sort: sort lines of text files; write the sorted concatenation of all FILE(s) to standard output.
awk: interprets a special-purpose programming language that makes it possible to handle simple data-reformatting jobs easily with just a few lines of code.
Script number one
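If the file is already sorted (uniq only removes duplicates that sit on adjacent lines):
$ uniq file > file.new
If the file is not sorted, sort it first so that identical lines become adjacent: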
$ sort file | uniq > file.new
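To see why the sort step matters, here is a small sketch you can run in a scratch directory (it overwrites `file`). The sample lines are made up for illustration, and `file.uniq` is a hypothetical name used only to keep the unsorted result around for comparison.

```shell
# Hypothetical sample input: "apple" repeats, but not on adjacent lines.
printf 'apple\nbanana\napple\nbanana\nbanana\n' > file

# uniq alone collapses only the adjacent "banana" pair at the end.
uniq file > file.uniq
cat file.uniq
# apple
# banana
# apple
# banana

# Sorting first puts identical lines next to each other, so uniq removes them all.
sort file | uniq > file.new
cat file.new
# apple
# banana
```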
Script number two
$ sort -u file > file.new
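The `-u` flag makes sort output only one copy of each group of equal lines, so for ordinary input it should give the same result as piping sort into uniq. A minimal sketch, assuming made-up sample data and a scratch directory:

```shell
# Hypothetical sample input with repeated, unsorted lines.
printf 'pear\napple\npear\nfig\napple\n' > file

# Sort and drop duplicates in one step.
sort -u file > file.new
cat file.new
# apple
# fig
# pear

# For input like this, the result matches "sort | uniq".
sort file | uniq | cmp -s - file.new && echo "same as sort | uniq"
```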
Script number three
$ awk '!x[$0]++' file > file.new
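The awk one-liner works because `x[$0]++` is a post-increment: it returns the old count for the current line (0 the first time that line is seen) and then adds 1. Negating it with `!` makes the pattern true only on the first occurrence, and a pattern with no action prints the line. The sketch below spells the same logic out in long form; the array name `x` comes from the one-liner, while the sample data is made up.

```shell
printf 'pear\napple\npear\nfig\napple\n' > file

# Long form of: awk '!x[$0]++' file
awk '{
    if (x[$0] == 0)   # x[$0] is still unset (0) for a line not seen before
        print         # so print the first occurrence
    x[$0]++           # count the line; later copies are no longer 0
}' file > file.new
cat file.new
# pear
# apple
# fig

# The one-liner produces the same output.
awk '!x[$0]++' file | cmp -s - file.new && echo "same as the one-liner"
```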
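I think the third solution is the best one, because it keeps the lines in their original order (only the first occurrence of each line is printed), while the first and second solutions sort the content of the file. The trade-off is that awk has to keep every distinct line in memory.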
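A quick side-by-side run illustrates the ordering difference. This is a sketch on made-up input; `sorted.txt` and `awk.txt` are hypothetical output names chosen only for the comparison.

```shell
printf 'zebra\napple\nzebra\nmango\n' > file

sort -u file > sorted.txt        # duplicates removed, lines re-ordered
awk '!x[$0]++' file > awk.txt    # duplicates removed, original order kept

cat sorted.txt
# apple
# mango
# zebra
cat awk.txt
# zebra
# apple
# mango
```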