How to Find Duplicate Files in Linux and Remove Them


Brief: FSlint is a great GUI tool to find duplicate files in Linux and remove them. FDUPES also finds duplicate files in Linux, but from the command line.

If you have this habit of downloading everything from the web like me, you will end up having multiple duplicate files. Most often, I find the same songs or a bunch of images in different directories, or end up backing up some files in two different places. It’s a pain to locate these duplicate files manually and delete them to recover disk space.

If you want to save yourself from this pain, there are various Linux applications that will help you in locating these duplicate files and removing them. In this article, we will cover how you can find and remove these files in Ubuntu.

Note: You should know what you are doing. If you are using a new tool, it’s always better to try it on a test directory structure to figure out what it does before pointing it at the root or home folder. Also, it’s always better to back up your Linux system!
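One way to follow this advice is to build a throwaway directory tree with a few known duplicates and point the tool at that first. Here is a minimal sketch; the directory and file names are made up for illustration.

```shell
# Build a disposable sandbox containing known duplicates.
# All paths and file names below are hypothetical examples.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/downloads" "$sandbox/backup"

printf 'holiday photo bytes\n' > "$sandbox/downloads/photo.jpg"
cp "$sandbox/downloads/photo.jpg" "$sandbox/backup/photo-copy.jpg"   # exact duplicate
printf 'some other file\n' > "$sandbox/downloads/notes.txt"          # unique file

# cmp -s exits 0 only when two files are byte-for-byte identical.
if cmp -s "$sandbox/downloads/photo.jpg" "$sandbox/backup/photo-copy.jpg"; then
    echo "sandbox ready at $sandbox"
fi
```

Once you are done experimenting, remove the sandbox with rm -rf "$sandbox".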

FSlint: GUI tool to find and remove duplicate files

FSlint helps you search and remove duplicate files, empty directories or files with incorrect names. It has a command-line as well as GUI mode with a set of tools to perform a variety of tasks.

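To install FSlint, type the command below in a terminal. If apt reports that the fslint package has no installation candidate, the package is not available in the repositories of your release.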

sudo apt install fslint

Open FSlint from the Dash search.

FSlint dashboard

FSlint includes a number of options to choose from. There are options to find duplicate files, installed packages, bad names, name clashes, temp files, empty directories, etc. Choose the Search Path and the task you want to perform from the left panel, then click Find to locate the files. Once done, you can select the files you want to remove and delete them.
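If you are curious what a task like "empty directories" boils down to, plain find can do a rough equivalent. This is only a sketch with GNU find's -empty test on a made-up sandbox; it is not how FSlint itself is implemented.

```shell
# Sandbox with one empty directory and one zero-byte file (hypothetical names).
demo=$(mktemp -d)
mkdir -p "$demo/empty-dir" "$demo/full-dir"
printf 'data\n' > "$demo/full-dir/file.txt"
: > "$demo/full-dir/zero-bytes.tmp"

# -empty matches directories with no entries and files with no content.
empty_dirs=$(find "$demo" -type d -empty)
empty_files=$(find "$demo" -type f -empty)

printf 'empty directories:\n%s\n' "$empty_dirs"
printf 'empty files:\n%s\n' "$empty_files"
```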

You can click on any file or directory in the search results to open it if you are not sure and want to double-check before deleting it.

Under Advanced search parameters, you can define rules to exclude certain file types or directories that you don’t want to search.
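The same kind of exclusion rule can be written on the command line with find. The sketch below skips .git directories and *.iso files; both patterns and the helper name list_candidates are arbitrary examples, not FSlint settings.

```shell
# list_candidates DIR: files under DIR, skipping .git directories and *.iso files.
# Hypothetical helper for illustration only.
list_candidates() {
    find "$1" -type d -name '.git' -prune -o -type f ! -name '*.iso' -print
}

demo=$(mktemp -d)
mkdir -p "$demo/project/.git" "$demo/images"
printf 'x\n' > "$demo/project/.git/config"
printf 'x\n' > "$demo/project/readme.txt"
printf 'x\n' > "$demo/images/distro.iso"

candidates=$(list_candidates "$demo")
printf '%s\n' "$candidates"
```

Here -prune stops find from descending into the excluded directory at all, which is cheaper than filtering its contents afterwards.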


FDUPES: CLI tool to find and remove duplicate files

FDUPES is a command line utility to find and remove duplicate files in Linux. It identifies duplicates by comparing file contents rather than file names. It can list the duplicate files in a particular folder or recursively within a folder. It asks which file to preserve before deletion, and the noprompt option lets you delete all the duplicate files, keeping the first one, without asking you.
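To see what "duplicate by content" means in practice, here is a rough approximation with standard GNU tools: hash every file, sort by hash, and print the lines whose hash repeats. The helper name list_dupes is hypothetical, and this is a simplified stand-in rather than FDUPES's own code.

```shell
# list_dupes DIR: print groups of files under DIR whose SHA-256 digests match.
# Hypothetical helper; assumes file names contain no newlines.
list_dupes() {
    find "$1" -type f -exec sha256sum {} + |
        sort |
        uniq -w64 --all-repeated=separate   # compare only the 64-char digest
}

# Demo on a throwaway directory so nothing real is touched.
demo=$(mktemp -d)
mkdir -p "$demo/music" "$demo/backup"
printf 'same bytes\n' > "$demo/music/track.mp3"
cp "$demo/music/track.mp3" "$demo/backup/track-copy.mp3"
printf 'different bytes\n' > "$demo/music/other.mp3"

dupes=$(list_dupes "$demo")
printf '%s\n' "$dupes"
```

Note that the two copies are reported even though their names differ, while other.mp3 is left out because its content is different.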

Installation on Debian / Ubuntu

sudo apt install fdupes

Installation on Fedora

sudo dnf install fdupes
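If you want to confirm that the install worked before going further, a small check like the following will do. The helper name is_installed is made up for this sketch.

```shell
# is_installed NAME: succeed if NAME resolves to a command in the current shell.
is_installed() {
    command -v "$1" >/dev/null 2>&1
}

if is_installed fdupes; then
    echo "fdupes found at $(command -v fdupes)"
else
    echo "fdupes not found; install it with your package manager first"
fi
```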

Once installed, you can search for duplicate files using the command below:

fdupes /path/to/folder

To search recursively within a folder, use the -r option:

fdupes -r /home

This will only list the duplicate files; it does not delete them by itself. You can manually delete the duplicate files or use the -d option to delete them.

fdupes -d /path/to/folder

This won’t delete anything on its own, but will display all the duplicate files and give you the option to either delete files one by one or select a range to delete. If you want to delete all duplicates without being asked, preserving the first file in each set, you can combine -d with the noprompt option, -N.

FDUPES: finding and removing duplicate files

In the above screenshot, you can see the -d option showing all the duplicate files within the folder and asking you to select the file which you want to preserve.
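To get a feel for the "keep the first one, delete the rest" behaviour before trying it on real data, here is a sketch that does something similar with standard tools on a throwaway directory. It is an approximation: "first" here means first in sort order of the path, which is an assumption of this sketch and not necessarily the order FDUPES uses. The helper name delete_all_but_first is hypothetical.

```shell
# delete_all_but_first DIR: within each group of identical files under DIR,
# keep the first path in sort order and delete the rest.
# Assumes file names contain no newlines or backslashes.
delete_all_but_first() {
    find "$1" -type f -exec sha256sum {} + |
        sort |
        awk 'seen[substr($0, 1, 64)]++ { print substr($0, 67) }' |
        while IFS= read -r path; do
            echo "deleting $path"
            rm -- "$path"
        done
}

demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b"
printf 'same\n' > "$demo/a/song.mp3"
cp "$demo/a/song.mp3" "$demo/b/song.mp3"
cp "$demo/a/song.mp3" "$demo/b/song-again.mp3"
printf 'unique\n' > "$demo/b/other.mp3"

delete_all_but_first "$demo"
remaining=$(find "$demo" -type f | sort)
printf '%s\n' "$remaining"
```

Only run something like this on a test directory until you are sure which copy it keeps.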

Final Words

There are many other ways and tools to find and delete duplicate files in Linux. Personally, I prefer the FDUPES command line tool; it’s simple and uses very few resources.

How do you deal with finding and removing duplicate files on your Linux system? Do tell us in the comment section.


  • Have a look at the tool rmlint, it is such a gem!

    I have tried rdfind and fdupes extensively. Although I found them to be useful and robust, they were limited in that they consider files to be duplicates based mainly on their content, without any option to also take the name of the file into account. Then I discovered rmlint recently, and it offers this as an optional argument. And so much more! Very flexible and user-friendly. Different output formats, even duplicate folders … Just what I needed. Plus GUI.

    Handy tutorial at: https://rmlint.readthedocs.io/en/latest/tutorial.html
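The idea of treating files as duplicates only when both content and name match can be sketched with standard tools. The helper same_name_dupes below is hypothetical and only illustrates the grouping rule; rmlint's own option for this is described in its man page.

```shell
# same_name_dupes DIR: print groups of files under DIR that share both
# content (SHA-256) and base name. Assumes no newlines in file names.
same_name_dupes() {
    find "$1" -type f -exec sha256sum {} + | sort | awk '
        {
            hash = substr($0, 1, 64)       # digest
            path = substr($0, 67)          # path after the two-space separator
            n = split(path, parts, "/")
            key = hash "/" parts[n]        # group by digest plus base name
            count[key]++
            members[key] = members[key] path "\n"
        }
        END {
            for (key in count)
                if (count[key] > 1)
                    printf "%s\n", members[key]
        }'
}

demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b"
printf 'same\n' > "$demo/a/song.mp3"
cp "$demo/a/song.mp3" "$demo/b/song.mp3"    # same content, same name
cp "$demo/a/song.mp3" "$demo/b/copy.mp3"    # same content, different name

same_named=$(same_name_dupes "$demo")
printf '%s\n' "$same_named"
```

In this demo copy.mp3 is left out of the group even though its content matches, because its name differs.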

    • I have separate hard drives that have many of the same songs. I would like to eliminate from one drive the songs that exist on both w/o doing it manually. Can rmlint identify the dupes between drives and eliminate them from only one drive?

      • Hello bob,

        From my experience thus far, this should definitely work. To have the duplicate songs eliminated on drive_E and kept on drive_K, be sure to specify them on the command line like "… Drive_E // Drive_K …", where the drive after the double slash is the preferred/tagged ‘original’ that has priority.

        For example, a composition of a terminal/bash command which I found useful:

        rmlint --type="f" --add-output=csv:rmlint.csv MapOnDrive_E // MapOnDrive_K | tee output.txt

        where:
        --type="f" specifies files only (not other ‘lint’ like broken symlinks, empty files, (empty) dirs etc.);
        --add-output=csv:rmlint.csv produces an extra .csv (spreadsheet) file named rmlint.csv with the names of the candidates to be removed. This is optional, but I like it;
        | tee output.txt generates an extra .txt output file, in addition to the screen output. Also optional, and handy;

        Perhaps superfluous to say: don’t worry that this command will delete anything, it does only a “dry-run” providing you with all the means to run the “real thing” next.

        Try it out first with a small sample set, e.g. with both intended directories on another drive, in order to fine-tune things for your particular use case. And glance through the options given in the man page for rmlint (man rmlint). It is also possible to filter/search for certain filenames/extensions.

        The same should work through the GUI version of rmlint, but I have not yet used that.

        Good luck!

        Paul

          • Hello Bob,

            Not so easy to answer, as your configuration undoubtedly differs from mine. And this comment box is not that suited to code or screenshots. Note that this comment form may replace the double dashes in front of rmlint options such as '--types=' with a single dash, as happened after I submitted my reply! Therefore it remains essential to consult the man page and tutorial, and to have some confidence with the command line in order to experiment a little.

            More to the point: I realize I made a bad typo in the hurry … I wrongly listed the main option as '--type="f"', but that should have been '--types="df"'. Sorry.

            In order to simulate your particular scenario, I have used two USB sticks, named ‘USB_E’ (eliminate) and ‘USB_K’ (keep). On each I created a directory named ‘Songs’. In ‘USB_E/Songs’ I placed a set of 4 audio files: Song1.wav, Song2.wav, Song1.mp3, Song2.mp3. In ‘USB_K/Songs’ I placed an overlapping set of 4 audio files Song2.wav, Song3.wav, Song2.mp3, Song3.mp3. Thus, the dupes between these folders are: Song2.wav and Song2.mp3.

            For this scenario, my basic commands to eliminate the dupes from the external ‘USB_E’, while keeping them on the external ‘USB_K’ are:
            $ cd /home/paul/Tmp [enter]
            $ rmlint --types="df" "/media/paul/USB_E/Songs" // "/media/paul/USB_K/Songs" [enter]

            Note the double dashes before 'types=', and the use of quotation marks around the paths should these contain spaces. "/media/paul" is where my external drives are attached, as indicated in the file manager.

            After running these commands, rmlint will show in its screen output exactly which files are the duplicates, and which of these it intends to keep, and which it intends to remove. Then, in order to actually carry out the job, run the executable script ‘rmlint.sh’ which rmlint has generated in the current working directory. Run this script as follows:
            $ ./rmlint.sh [enter]

            The script gives some brief info first; you can still easily abort at this stage. In order to proceed, just type any key (e.g. a single ‘c’) followed by [enter] at the keyboard. After execution, the script will be removed automatically.

            Checking the outcome, ‘USB_E/Songs’ now contains only Song1.wav and Song1.mp3, whereas ‘USB_K/Songs’ still contains Song2.wav, Song3.wav, Song2.mp3, Song3.mp3.

            That’s it. It sounds more complicated than it is, really.

            rmlint has a lot of options for fine-tuning, most of which I have not explored myself. Hope this will now work in your case. Good luck!

            Paul
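The two-stick scenario described above can also be rehearsed without any USB drives. The sketch below recreates the same eight files in a temporary directory and lists which files in the "eliminate" folder also exist, by content, in the "keep" folder. It uses standard tools rather than rmlint, deletes nothing, and the helper name dupes_in_first is made up for illustration.

```shell
# dupes_in_first ELIMINATE KEEP: print files under ELIMINATE whose content
# also exists somewhere under KEEP. Nothing is deleted.
dupes_in_first() {
    keep_hashes=$(find "$2" -type f -exec sha256sum {} + | cut -c1-64 | sort -u)
    find "$1" -type f -exec sha256sum {} + |
        while IFS= read -r line; do
            hash=${line%% *}      # digest: everything before the first space
            path=${line#*  }      # path: everything after the two-space separator
            if printf '%s\n' "$keep_hashes" | grep -qxF "$hash"; then
                printf '%s\n' "$path"
            fi
        done
}

# Recreate the layout from the comment in a throwaway directory.
demo=$(mktemp -d)
mkdir -p "$demo/USB_E/Songs" "$demo/USB_K/Songs"
for name in Song1.wav Song2.wav Song1.mp3 Song2.mp3; do
    printf 'audio data for %s\n' "$name" > "$demo/USB_E/Songs/$name"
done
for name in Song2.wav Song3.wav Song2.mp3 Song3.mp3; do
    printf 'audio data for %s\n' "$name" > "$demo/USB_K/Songs/$name"
done

to_remove=$(dupes_in_first "$demo/USB_E/Songs" "$demo/USB_K/Songs" | sort)
printf '%s\n' "$to_remove"
```

As in the scenario above, only Song2.wav and Song2.mp3 under USB_E are reported, and everything under USB_K is left alone.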

    • I was going to go with rdfind, but your recommendation of rmlint is a wonderful alternative. I found it very useful, very well documented for what I required, and easily available from the package manager.
      Thanks!

      • Hello Karosuo,

        Nice to read. To be fair, I find rdfind also very practical, and slightly prefer it over fdupes, but for some extra options/flexibility rmlint is hard to beat.

        And I realized that simple deduplication can often be achieved using a GUI file comparison program (like Meld or Double Commander) by manually selecting the duplicates and then clicking Delete.

        Paul

        • Yeah, I know, but in my case, where I just assumed I had good manual control over my repeated files and there are a ton of them, rmlint was a very good choice. The way it separates telling me what’s repeated from building the script, in case I want to use it, is a new perspective that I liked a lot for the bulky part.

  • Hi from Tübingen,S.Germany…………

    When i try to install FSLINT :-

    sudo apt-get install fslint
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    Package fslint is not available, but is referred to by another package.
    This may mean that the package is missing, has been obsoleted, or
    is only available from another source

    E: Package 'fslint' has no installation candidate

    Many thanks…….. Dhan
    DIASPORA
    https://despora.de/people/6d39a7e04a610132027a42cdb1fcde73