Linux Command uniq and 6 Options

The Linux command uniq is used to remove duplicate content or find duplicated entries when processing files or data. In this post, we will explain how to use the uniq command, its options, and how to utilize it efficiently.

What is the Linux Command uniq?

The uniq command is used in Linux to check for duplicate content and filter out repeated lines in files or data. However, uniq only handles consecutive duplicate lines, so if duplicated content is not adjacent, it won’t produce the expected result. For this reason, it is common to sort the data with the sort command first and then remove the duplicates with uniq.

How to Use uniq

The basic syntax of the uniq command is as follows:

uniq [options] [input file] [output file]
  • [options]: Various options used to filter duplicate entries.
  • [input file]: The file from which duplicates will be removed.
  • [output file]: The file where the filtered results will be saved.

By default, the uniq command removes consecutive duplicate lines from the input file and saves the result in the output file. If no input file is specified, uniq reads from standard input (for example, keyboard input or a pipe), and if no output file is specified, it writes the result to standard output (the terminal screen).

Here’s an example file:

apple
apple
raspberry
banana
banana
apple
raspberry
raspberry
watermelon

When using the uniq command, you can see that consecutive duplicates are removed:

Figure 1. Removing duplicate text using the Linux command uniq
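
As a rough sketch, assuming the example lines above are saved as data.txt (unique.txt is just an illustrative output name), the basic usage looks like this, with the expected output shown as comments:

uniq data.txt
# Expected output:
# apple
# raspberry
# banana
# apple
# raspberry
# watermelon

# Write the filtered result to a file instead of the screen
uniq data.txt unique.txt

Note that apple and raspberry still appear twice because their repeated lines are not consecutive.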

The next example uses the sort command together with uniq to remove duplicate lines. This gives the same result as using the -u option of sort (sort -u).

sort data.txt | uniq > result.txt
  • sort data.txt: Sorts the content of the file data.txt.
  • uniq: Removes consecutive duplicate lines from the sorted file.
  • > result.txt: Saves the result into the result.txt file.
Figure 2. Removing duplicates using the Linux command uniq with the sort command
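
With the example data.txt above, result.txt would then contain each value exactly once; here is a quick sketch of the same pipeline without the redirection:

sort data.txt | uniq
# Expected output:
# apple
# banana
# raspberry
# watermelon

# The single command "sort -u data.txt > result.txt" produces the same result in one step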

The reason for using the sort command first is that uniq only removes consecutive duplicates. If the data is not sorted, even if there are duplicate lines, uniq may not detect them.

uniq Command Options

The uniq command offers various options. Let’s take a look at some important ones.

-c (Count the number of duplicate lines)

The -c option prints how many times each line appears. This option allows you to check not only if there are duplicates but also how many times they occur.

uniq -c sorted.txt

In this example, you can see that apple appears 3 times, banana 2 times, and raspberry 3 times.

Figure 3. Printing duplicate counts using the Linux command uniq with the -c option
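
Assuming sorted.txt is a sorted copy of the example file, the counts would come out roughly as follows (the exact spacing may differ):

sort data.txt > sorted.txt
uniq -c sorted.txt
# Expected output:
#   3 apple
#   2 banana
#   3 raspberry
#   1 watermelon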

-d (Print only duplicated lines)

The -d option prints only duplicated lines. It doesn’t print unique lines, making it useful when you want to find duplicated data.

uniq -d data.txt

As shown below, only the duplicated values apple, banana, and raspberry are printed.

Figure 4. Printing only duplicates using the Linux command uniq with the -d option
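
With the example data.txt, a sketch of the expected output looks like this; only lines that repeat consecutively count as duplicates:

uniq -d data.txt
# Expected output:
# apple
# banana
# raspberry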

-u (Print only unique lines)

Conversely, the -u option prints only lines that are not duplicated. This option is helpful when you want to filter out unique data.

uniq -u data.txt

Keep in mind that on unsorted data, lines that are duplicated but not consecutive are also treated as unique, so run uniq after sorting to get the desired result. With the sorted example data, only the truly unique line watermelon is printed.

Figure 5. Printing only unique lines using the Linux command uniq with the -u option
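
A quick sketch with the example file illustrates the difference sorting makes:

# On the unsorted file, non-consecutive duplicates still look unique
uniq -u data.txt
# Expected output:
# raspberry
# apple
# watermelon

# After sorting, only the truly unique line remains
sort data.txt | uniq -u
# Expected output:
# watermelon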

-i (Ignore case differences)

The -i option ignores case when checking for duplicates. This means that Apple and apple will be treated as the same string.

uniq -i sorted.txt

As seen in the result, Apple and apple are treated as duplicates.

Figure 6. Removing duplicates case-insensitively using the Linux command uniq with the -i option
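
As a small, hypothetical illustration (the mixed-case lines below are not part of the example file), only the first line of each case-insensitive group is kept:

printf 'Apple\napple\nAPPLE\nbanana\n' | uniq -i
# Expected output:
# Apple
# banana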

-f N (Ignore the first N fields)

The -f option ignores the first N fields when comparing lines for duplicates; fields are separated from each other by spaces or tabs. This is useful when only certain parts of the data are important.

uniq -f 2 sorted.txt

In this example, the command ignores the first two fields of each line and compares the remaining fields for duplicates.

Figure 7. Ignoring fields using the Linux command uniq with the -f option
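
Because the fruit list has only one field per line, a hypothetical multi-field example makes the option clearer (the file log.txt and its contents are invented for illustration):

# Date and time differ, but everything after the first two fields is identical
printf '2024-01-01 10:00 disk full\n2024-01-02 11:30 disk full\n2024-01-03 09:15 backup done\n' > log.txt

uniq -f 2 log.txt
# Expected output:
# 2024-01-01 10:00 disk full
# 2024-01-03 09:15 backup done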

-s N (Ignore the first N characters)

The -s option ignores the first N characters of each line when checking for duplicates.

uniq -s 4 sorted.txt

This command ignores the first 4 characters of each line and compares the remaining part to determine whether lines are duplicates.
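
Again as a hypothetical sketch (the ID-prefixed lines are invented for illustration), skipping a fixed-width prefix lets uniq compare only the rest of each line:

printf 'A001-server-1\nB002-server-1\nC003-server-2\n' | uniq -s 4
# Expected output:
# A001-server-1
# C003-server-2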

Precautions When Using the uniq Command

  • Use with the sort command: As mentioned earlier, the uniq command only processes consecutive duplicate lines. To handle data correctly, it is recommended to use the sort command along with uniq. If the data is not sorted, even if there are duplicate lines, uniq may not detect them.
  • Processing speed with large files: When dealing with large files, running sort and uniq together can take some time. In such cases, you may want to split the file or choose options that use less memory; one common approach is sketched below.
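
For example, GNU sort can deduplicate in a single pass and also lets you cap its memory buffer (the 512M value below is only an illustrative choice):

# One pass instead of sort | uniq
sort -u data.txt > result.txt

# Limit the in-memory buffer; sort falls back to temporary files beyond this size
sort -u -S 512M data.txt > result.txt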

Summary

The Linux command uniq is a powerful tool for filtering duplicate content when processing files or data. It is especially useful for checking and removing duplicates when analyzing or organizing data, and it offers a variety of options to filter data in the desired way. Beyond simply removing duplicates, uniq can count occurrences, ignore case, and skip certain fields or characters. However, remember that it only processes consecutive duplicate lines, so it is best used together with the sort command.

Use the uniq command appropriately to increase efficiency when processing data in Linux!
