How to Validate a CSV File Before Importing
3 methods — browser tool, Python, and csvlint
Importing a CSV with formatting errors can corrupt your data, break your import job, or cause silent failures. Validating before you import takes 30 seconds and saves hours of cleanup. Here's how.
Method 1: Using Tabular (browser, instant feedback)
The easiest way — upload your file and get a human-readable validation report.
1. Go to the CSV Validator tool on Tabular.
2. Upload your CSV file.
3. Review the validation report. Tabular checks for inconsistent column counts, encoding issues, blank rows, malformed values, and structural problems.
4. Fix any flagged issues (use Tabular's other tools to clean the file), then re-validate.
Run validation as the last step after cleaning — after removing duplicates, trimming whitespace, and fixing blank rows. A clean file should pass validation with no warnings.
Method 2: Using Python
Good for adding validation to an automated data pipeline.
1. Install pandas: pip install pandas
2. Run the script below to check for common CSV issues.
```python
import csv

import pandas as pd

df = pd.read_csv("input.csv")
issues = []

# Check for completely empty rows
empty_rows = df.isna().all(axis=1).sum()
if empty_rows > 0:
    issues.append(f"Found {empty_rows} completely empty rows")

# Check for duplicate rows
dupes = df.duplicated().sum()
if dupes > 0:
    issues.append(f"Found {dupes} duplicate rows")

# Check for columns with all missing values
empty_cols = df.columns[df.isna().all()].tolist()
if empty_cols:
    issues.append(f"Columns with no data: {empty_cols}")

# Check for inconsistent column counts on the raw file.
# (pandas raises an error on ragged rows during read_csv, but this
# check reports them explicitly.) csv.reader respects quoting, so
# commas inside quoted fields are not miscounted as extra columns.
with open("input.csv", newline="") as f:
    col_counts = {len(row) for row in csv.reader(f)}
if len(col_counts) > 1:
    issues.append("Inconsistent number of columns across rows")

if issues:
    print("Validation issues found:")
    for issue in issues:
        print(f"  - {issue}")
else:
    print("No issues found — file looks clean!")
```

Method 3: Using csvlint (command line)
A dedicated CSV linting tool for terminal users.
1. Install csvlint: gem install csvlint (requires Ruby)
2. Run: csvlint input.csv
3. Review the output. csvlint checks encoding, structure, and RFC 4180 compliance.
```bash
csvlint input.csv

# Or validate against a schema
csvlint --schema=schema.json input.csv
```

csvlint follows RFC 4180, the closest thing CSV has to an official specification. Some valid-looking CSVs use non-standard quoting or line endings that fail strict validation; for practical import purposes, these warnings are usually safe to ignore.
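If Ruby isn't available, the same schema idea can be sketched in Python with the standard csv module. This is a minimal illustration, not csvlint's actual behavior; the column names and required fields below are placeholders for your own schema:

```python
import csv

EXPECTED_COLUMNS = ["id", "name", "email"]  # placeholder header schema
REQUIRED = {"id", "email"}                  # fields that must be non-empty

def check_schema(path):
    """Return a list of schema problems found in the CSV at `path`."""
    problems = []
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames != EXPECTED_COLUMNS:
            problems.append(f"Header mismatch: {reader.fieldnames}")
        # Data starts on line 2, after the header row
        for i, row in enumerate(reader, start=2):
            for col in REQUIRED:
                if not (row.get(col) or "").strip():
                    problems.append(f"Line {i}: missing required value in '{col}'")
    return problems
```

An empty return list means the file matches the expected header and every required field is populated.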
Frequently asked questions
What does CSV validation check for?
A thorough CSV validation checks: consistent column count across all rows, valid UTF-8 encoding, no malformed quoted fields, no null bytes, consistent line endings, a header row present, and optionally — expected column names, data types per column, and required fields being non-empty.
My CSV looks fine in Excel but fails on import. Why?
Common causes: encoding issues (Excel shows characters fine but the file is not valid UTF-8), hidden characters like null bytes or BOM markers at the start of the file, inconsistent quoting of fields containing commas, or Windows-style line endings (CRLF) that some importers don't handle. Tabular's validator catches all of these.
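One way to surface these hidden byte-level issues is to inspect the raw bytes before parsing. A minimal sketch (the function name is illustrative):

```python
def inspect_raw(path):
    """Report byte-level issues a spreadsheet view won't show."""
    with open(path, "rb") as f:
        data = f.read()
    report = {
        "utf8_bom": data.startswith(b"\xef\xbb\xbf"),  # BOM marker
        "null_bytes": b"\x00" in data,
        "crlf_endings": b"\r\n" in data,  # Windows-style line endings
    }
    try:
        data.decode("utf-8")
        report["valid_utf8"] = True
    except UnicodeDecodeError:
        report["valid_utf8"] = False
    return report
```

A BOM or CRLF endings are not errors per se, but knowing they are present tells you why a strict importer might choke.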
What is RFC 4180?
RFC 4180 is the informal standard that defines how CSV files should be formatted: comma-separated, optional double-quote field wrapping, CRLF line endings, and a header row in the first line. Most software produces RFC 4180-compliant CSVs, but not all — especially exports from older systems.
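Python's built-in csv module writes RFC 4180-style output by default: CRLF line endings, fields quoted only when they contain a comma or quote, and embedded double quotes doubled. A quick illustration:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)  # default dialect: CRLF endings, minimal quoting
writer.writerow(["id", "note"])
writer.writerow([1, 'contains, a comma and a "quote"'])

# The second field is wrapped in quotes and the inner quotes are doubled
print(repr(buf.getvalue()))
```

Reading the output back with csv.reader round-trips the values exactly, which is the practical benefit of sticking to the RFC 4180 conventions.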
Should I validate before or after cleaning?
Both. Validate first to understand what problems exist, then clean (remove duplicates, trim whitespace, fix blank rows), then validate again to confirm the file is ready. Think of it as a before and after check.
Ready to try the fastest method?
Validate your CSV file and get a detailed report of formatting issues, encoding problems, and inconsistencies.
CSV Validator — free