
How to Remove Duplicate Rows from a CSV File

4 methods — browser tool, Excel, Google Sheets, and Python

Duplicate rows are one of the most common CSV problems. They inflate counts, skew analytics, and cause double-imports in CRMs and email tools. Here are four ways to remove them, from fastest to most flexible.

Method 1: Using Tabular (fastest, no software needed)

The quickest way — works in your browser, no Excel or code required.

  1. Go to the Remove Duplicate Rows tool on Tabular.
  2. Upload your CSV or XLSX file.
  3. Select which column to deduplicate on (e.g. 'Email', 'ID'). Leave blank to match on all columns.
  4. Click Run and download the cleaned file.

Tabular keeps the first occurrence of each duplicate and removes the rest. The row order of your original file is preserved.

Method 2: Using Excel

Built into Excel — good for one-off cleanups if you already have the file open.

  1. Open your CSV in Excel.
  2. Select the data range, or click any cell in your data.
  3. Go to the Data tab and click Remove Duplicates.
  4. Check the columns you want to deduplicate on, then click OK.
  5. Excel shows how many rows were removed. Save as CSV via File > Save As.

Excel's Remove Duplicates is case-insensitive — 'John' and 'JOHN' will be treated as the same value.
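If you later move the same cleanup to pandas, note that it does not share this behavior: drop_duplicates is case-sensitive. A minimal sketch of both behaviors, using a hypothetical 'name' column:

```python
import pandas as pd

df = pd.DataFrame({"name": ["John", "JOHN", "Jane"]})

# pandas is case-sensitive by default, so 'John' and 'JOHN' both survive:
case_sensitive = df.drop_duplicates(subset=["name"])

# To mimic Excel's case-insensitive matching, drop rows whose
# lowercased value has already been seen (keeps the first occurrence):
case_insensitive = df[~df["name"].str.lower().duplicated()]

print(len(case_sensitive))    # 3 rows
print(len(case_insensitive))  # 2 rows
```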

Method 3: Using Google Sheets

Works directly in your browser if your data is already in Google Sheets.

  1. Open your file in Google Sheets (File > Import if starting from a CSV).
  2. Select your data range.
  3. Go to Data > Data cleanup > Remove duplicates.
  4. Choose which columns to check, then click Remove duplicates.
  5. Download the result as CSV via File > Download > CSV.

Method 4: Using Python (pandas)

Best for large files or automated pipelines.

  1. Install pandas if you haven't: pip install pandas
  2. Run the script below, replacing the filename and column name as needed.

```python
import pandas as pd

df = pd.read_csv("input.csv")

# Remove duplicates across all columns
df_clean = df.drop_duplicates()

# Or deduplicate on a specific column (e.g. email)
# df_clean = df.drop_duplicates(subset=["email"])

df_clean.to_csv("output.csv", index=False)
print(f"Removed {len(df) - len(df_clean)} duplicate rows")
```

By default, pandas keeps the first occurrence. Pass keep='last' to keep the last occurrence instead, or keep=False to drop every row that has a duplicate.
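A quick sketch of the three keep modes, using hypothetical 'email' and 'signup' columns:

```python
import pandas as pd

df = pd.DataFrame({
    "email": ["a@x.com", "b@x.com", "a@x.com"],
    "signup": ["2023-01-01", "2023-02-01", "2023-03-01"],
})

first = df.drop_duplicates(subset=["email"])              # keeps the 2023-01-01 row for a@x.com
last = df.drop_duplicates(subset=["email"], keep="last")  # keeps the 2023-03-01 row for a@x.com
none = df.drop_duplicates(subset=["email"], keep=False)   # drops both a@x.com rows

print(first["signup"].tolist())  # ['2023-01-01', '2023-02-01']
print(last["signup"].tolist())   # ['2023-02-01', '2023-03-01']
print(none["signup"].tolist())   # ['2023-02-01']
```

Note that keep='last' keeps the last occurrence by row order; that is only "most recent" if your file happens to be sorted chronologically.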

Frequently asked questions

Which occurrence is kept when removing duplicates?

By default, most tools (including Tabular, Excel, and pandas) keep the first occurrence and remove subsequent duplicates. In pandas you can change this with the keep parameter.

Can I deduplicate on just one column, not the whole row?

Yes. In Tabular, select the specific column (e.g. 'Email') when running the tool. In Excel, uncheck all columns except the one you want. In pandas, use drop_duplicates(subset=['column_name']).

How do I remove near-duplicates (slightly different values)?

Near-duplicate detection (fuzzy matching) is more complex and requires normalizing your data first. Use Tabular's Trim Whitespace and Normalize Casing tools to standardize values before deduplicating — this catches duplicates that differ only by spacing or capitalization.
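In pandas, the same normalize-then-deduplicate idea can be sketched by building a cleaned key (the 'email' column name is an assumption for illustration):

```python
import pandas as pd

df = pd.DataFrame({"email": [" Ann@X.com", "ann@x.com ", "bob@x.com"]})

# Normalize: strip surrounding whitespace and lowercase,
# then drop rows whose normalized key was already seen (keeps the first).
key = df["email"].str.strip().str.lower()
df_clean = df[~key.duplicated()]

print(len(df_clean))  # 2 rows: ' Ann@X.com' and 'ann@x.com ' collapse to one
```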

What's the fastest way to remove duplicates from a large CSV?

For files under 50 MB, Tabular handles it in seconds in your browser. For very large files (hundreds of MB or more), use the Python pandas approach. Note that pd.read_csv loads the whole file into memory by default; for files bigger than your available RAM, pass the chunksize parameter to process the file in pieces and keep memory use bounded.

Ready to try the fastest method?

Instantly remove duplicate rows from any CSV or spreadsheet. Keep your data clean and analysis-ready.

Remove Duplicates — free