One of the more difficult jobs when using the shell is working with text files and filtering their content.


In the following few lines you will find three different ways to remove all the duplicate lines from a text file.


First of all, I'll introduce three commands that are available on almost all Linux distributions, and probably in every Unix variant:

uniq

Discard all but one of successive identical lines from INPUT (or standard input), writing to OUTPUT (or standard output).
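
Note that uniq only removes identical lines that sit next to each other. For example:

$ printf 'apple\napple\nbanana\napple\n' | uniq
apple
banana
apple

The last apple survives because it is not adjacent to the other two, which is why the file usually has to be sorted first.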

 

sort 

Sort lines of text files. Write sorted concatenation of all FILE(s) to standard output.
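
Sorting groups identical lines together, which is exactly what uniq needs. For example:

$ printf 'banana\napple\nbanana\n' | sort
apple
banana
banana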


awk

The awk utility interprets a special-purpose programming language that makes it possible to handle simple data-reformatting jobs easily with just a few lines of code.
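
For instance, printing only the first column of every line is a one-liner:

$ printf 'one 1\ntwo 2\n' | awk '{print $1}'
one
two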



Script number one

$ uniq file > file.new


If the file is not already sorted, then:


$ sort file | uniq > file.new


Script number two


$ sort -u filename > filename.new



 

Script number three


$ awk '!x[$0]++' file > file.new
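
Here awk keeps an associative array x, keyed by the whole line ($0): the first time a line shows up, x[$0] is still zero, so the pattern !x[$0]++ is true and the line is printed; the post-increment then marks it as seen, and every later copy is skipped. A quick test:

$ printf 'pear\napple\npear\nbanana\n' | awk '!x[$0]++'
pear
apple
banana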

 

I think that the third solution is the best one because it doesn't alter the original order of the lines, while the first and second solutions sort the content of the file.

 

If you found this article useful, please share it using the social buttons below. Thank you in advance.

Gg1