Two different URLs. The same content. To you they're obviously the same page, just reached by slightly different addresses. To Google, they're two separate pages competing for the same spot, splitting the credit between them, and forcing the search engine to guess which one you actually wanted people to find. Most of the time you never see this happening. You just notice, vaguely, that a page you thought should rank never quite does, and you can't work out why.
This is the duplicate content problem, and it's one of the most common and least understood issues in technical SEO. The fix is usually a single line of HTML called a canonical tag. But canonicals are easy to get subtly wrong, and a wrong one can do more damage than no tag at all. This guide explains what duplicate content actually is, what a canonical tag does, and how to use it without shooting yourself in the foot.
What is duplicate content?
Duplicate content is any substantive block of content that appears at more than one URL, either within your own site or across different sites. The important word is URL: from Google's perspective, a page is defined by its address, so the same content served at two addresses is two pages, even if a human would call them one.
Here's the part that surprises people: most duplicate content isn't created on purpose. It's a side effect of how websites and servers work. Common sources include:
- www vs non-www and http vs https versions of the same page, all resolving separately.
- URL parameters from filters, sorting, tracking, or sessions, like ?sort=price or ?utm_source=newsletter, each creating a distinct URL with identical content.
- Trailing slash differences, where /page and /page/ both load.
- Printer-friendly or AMP versions of an article.
- Pagination and faceted navigation on category and product listings.
- Syndicated content, where your article is legitimately republished on another site.
- Product descriptions repeated across many similar product pages, or pulled from a manufacturer.
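To make the list above concrete, here is a minimal Python sketch of how several of those variations collapse into one address once you normalize them. The IGNORED_PARAMS set and the normalize function are hypothetical names for illustration, and the rules (force https, drop www, strip the trailing slash) are assumptions about one particular site, not a description of how Google normalizes URLs.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical list of parameters that change the URL but not the content.
IGNORED_PARAMS = {"sort", "utm_source", "utm_medium", "utm_campaign", "sessionid"}


def normalize(url: str) -> str:
    """Collapse common duplicate-producing variations into one form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]  # treat www and non-www as the same host
    path = parts.path.rstrip("/") or "/"  # treat /page and /page/ as the same
    query = urlencode(
        [(k, v) for k, v in parse_qsl(parts.query) if k not in IGNORED_PARAMS]
    )
    # Force https and drop any fragment.
    return urlunsplit(("https", host, path, query, ""))


variants = [
    "http://www.yourdomain.com/page",
    "https://yourdomain.com/page/",
    "https://yourdomain.com/page?sort=price",
    "https://www.yourdomain.com/page?utm_source=newsletter",
]
unique = {normalize(u) for u in variants}
print(unique)  # {'https://yourdomain.com/page'}
```

Four addresses, one page. A search engine sees the four; the sketch shows the one you probably meant.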
Is duplicate content a penalty? (No, but)
Let's clear up the biggest myth right away: there is no "duplicate content penalty" in the way people imagine. Google does not hand out manual penalties simply because the same content exists at two URLs. This is one of those persistent SEO myths worth dropping.
What actually happens is more subtle and, in its own way, just as costly:
- Google picks one version and ignores the rest. When it finds duplicates, it chooses a single URL to index and show in results. If it picks a different one than you intended, your preferred page effectively disappears.
- Ranking signals get split. Backlinks and authority that should consolidate on one strong page get scattered across several weak duplicates, so none of them ranks as well as a single consolidated page would.
- Crawl budget gets wasted. Google spends time crawling near-identical URLs instead of discovering your genuinely new content.
So it's not a penalty. It's a quiet dilution, and the symptom is usually a page that underperforms for no obvious reason. If you've ever dug into why Google is ignoring some of your pages, duplicate content and canonical confusion are often hiding behind the "Duplicate without user-selected canonical" reason in the indexing report.
What is a canonical tag?
The canonical tag is how you tell Google which version of a set of duplicate or near-duplicate pages is the "real" one, the one you want indexed and ranked. It's a single line placed in the <head> of a page:
<link rel="canonical" href="https://yourdomain.com/preferred-page" />
This tells search engines: "this content may appear at other URLs, but treat this address as the master copy." Google then consolidates the ranking signals from the duplicates onto the canonical URL and indexes that one.
Importantly, the canonical tag is a hint, not a directive. Google usually respects it, but it weighs it alongside other signals like internal links, sitemaps, and redirects, and can occasionally choose a different canonical if your signals contradict each other. That's why consistency across all your signals matters so much.
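If you want to see what a page is actually declaring, a short script can read the tag for you. The following is a rough sketch using Python's standard html.parser module; CanonicalFinder and find_canonicals are made-up names, and the sample HTML is illustrative. It reports every canonical link element it finds, wherever it sits in the document, so a result with more than one entry is itself a sign of conflicting signals.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.canonicals = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <link ... />.
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.canonicals.append(attr["href"])


def find_canonicals(html: str) -> list:
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonicals


sample = """
<html><head>
<title>Preferred page</title>
<link rel="canonical" href="https://yourdomain.com/preferred-page" />
</head><body>...</body></html>
"""
found = find_canonicals(sample)
print(found)  # ['https://yourdomain.com/preferred-page']
```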
Self-referencing canonicals
A best practice that trips people up: every page should generally have a canonical tag, even pages with no duplicates. In that case the tag simply points to the page's own URL, a "self-referencing canonical." This removes ambiguity and protects against duplicates that get created later by stray parameters or tracking links.
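As an illustration, a template could build the self-referencing tag from the requested address. This sketch assumes, loudly, that query strings and fragments never change the content on your site, so it drops them; a site with meaningful parameters would need a narrower rule. The function name self_canonical_tag is hypothetical.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def self_canonical_tag(requested_url: str) -> str:
    """Build a self-referencing canonical tag for the clean form of a URL.

    Assumption: query strings and fragments never change the content,
    so both are dropped before the tag is written.
    """
    parts = urlsplit(requested_url)
    clean = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(clean, quote=True)}" />'


tag = self_canonical_tag(
    "https://yourdomain.com/preferred-page?utm_source=newsletter"
)
print(tag)  # <link rel="canonical" href="https://yourdomain.com/preferred-page" />
```

Because the tag always names the clean address, a visitor arriving through a tracking link still gets a page that points back to the version you want indexed.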
Canonical tag vs 301 redirect vs noindex
Canonicals are one of three tools that deal with duplicate or unwanted pages, and choosing the wrong one is a common mistake. The difference comes down to whether you want both URLs to stay accessible.
Use a canonical tag when
Both URLs need to remain reachable, but only one should be indexed. Classic case: a product available in three colors at three URLs, all of which need to work for shoppers, but which you want consolidated into one indexed page. The duplicates stay live; the ranking signals consolidate.
Use a 301 redirect when
The old URL should genuinely go away and everyone should land on the new one. If you've permanently moved or merged a page and there's no reason for the old address to keep serving content, a 301 redirect is the right tool. It forwards both visitors and ranking signals and removes the duplicate entirely, which a canonical doesn't do.
Use noindex when
You want a page accessible to visitors but completely out of the index, and it isn't really a duplicate of anything. Note that noindex and canonical send mixed signals when combined on the same page, so as a rule, don't put both on one URL.
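The three rules above can be restated as a small decision function. This is only a sketch of the reasoning in this section, with hypothetical parameter names, and real cases often need more judgment than three yes-or-no questions.

```python
def choose_tool(
    url_must_stay_reachable: bool,
    duplicates_another_page: bool,
    keep_out_of_index: bool,
) -> str:
    """Pick between a 301 redirect, a canonical tag, and noindex."""
    if not url_must_stay_reachable:
        # The old address should go away entirely.
        return "301 redirect"
    if duplicates_another_page:
        # Both URLs stay live; signals consolidate on the preferred one.
        return "canonical tag"
    if keep_out_of_index:
        # Reachable, not a duplicate, but should not appear in search.
        return "noindex"
    # An ordinary page: point the canonical at itself.
    return "self-referencing canonical"


print(choose_tool(True, True, False))    # canonical tag
print(choose_tool(False, True, False))   # 301 redirect
print(choose_tool(True, False, True))    # noindex
```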
The canonical mistakes that quietly backfire
A wrong canonical can be worse than none, because you're actively telling Google to do the wrong thing. The common errors:
Canonicalizing to the wrong page
If page B canonicalizes to page A, you're telling Google "don't index B, index A instead." Do that by accident, for example pointing every paginated page back to page one, or pointing a unique page at an unrelated one, and you can deindex pages you wanted to keep. Double-check that each canonical points where you actually intend.
Blocking the canonical in robots.txt
For Google to honor a canonical, it has to be able to crawl the page and read the tag. If you block the duplicate in robots.txt, Google may never see the canonical pointing to your preferred version, so the consolidation never happens. This is the same crawl-versus-index trap that catches people with noindex: the instruction only works if Google can reach the page to read it.
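You can test for this trap before it bites. The sketch below uses Python's standard urllib.robotparser module as a stand-in for a search engine's robots.txt handling; it may not match Google's interpretation in every edge case, and the robots.txt rules, the URLs, and the canonical_is_readable helper are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt that blocks the printer-friendly duplicates.
robots_txt = """\
User-agent: *
Disallow: /print/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def canonical_is_readable(duplicate_url: str, user_agent: str = "Googlebot") -> bool:
    """True if a crawler may fetch the duplicate and so read its canonical tag."""
    return parser.can_fetch(user_agent, duplicate_url)


print(canonical_is_readable("https://yourdomain.com/print/article"))  # False
print(canonical_is_readable("https://yourdomain.com/article"))        # True
```

A False for a duplicate URL means the canonical tag on that page is effectively invisible, however correct it is.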
Chains and conflicting signals
If A canonicalizes to B, and B canonicalizes to C, you've created a canonical chain that muddies the signal. Point duplicates directly at the final canonical. Likewise, make sure your canonical tags, internal links, sitemap entries, and any redirects all agree on which URL is the master. When they contradict each other, Google has to guess, and it may not guess your way.
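Chains are easy to spot once you have a map of which URL each page declares as its canonical, for example from a crawl. Here is a minimal sketch assuming such a map already exists as a Python dictionary; resolve_canonical and the example URLs are hypothetical. Any page that needs more than one hop to reach its final canonical is part of a chain and should be repointed directly.

```python
def resolve_canonical(url: str, canonical_of: dict) -> tuple:
    """Follow canonical pointers from url to the final target.

    Returns (final_url, hops). Pages missing from the map are treated as
    self-referencing. Raises ValueError if the pointers form a loop.
    """
    seen = [url]
    current = url
    while True:
        target = canonical_of.get(current, current)
        if target == current:
            return current, len(seen) - 1
        if target in seen:
            raise ValueError("canonical loop: " + " -> ".join(seen + [target]))
        seen.append(target)
        current = target


canonical_of = {
    "https://yourdomain.com/a": "https://yourdomain.com/b",
    "https://yourdomain.com/b": "https://yourdomain.com/c",
    "https://yourdomain.com/c": "https://yourdomain.com/c",
}

final, hops = resolve_canonical("https://yourdomain.com/a", canonical_of)
print(final, hops)  # https://yourdomain.com/c 2
```

Here page a takes two hops to reach c, so its tag should be changed to point at c directly.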
Relative URLs and protocol mismatches
Always use the full absolute URL in a canonical tag, including the https:// and the correct www or non-www form. A relative URL or a mismatched protocol can point Google somewhere you didn't intend.
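To see why relative values are risky, it helps to resolve one the way a browser or crawler would. This sketch uses urljoin from Python's standard library to show where a relative href ends up, and flags protocol and host mismatches; canonical_problems, its messages, and the expected_host argument are illustrative assumptions.

```python
from urllib.parse import urljoin, urlsplit


def canonical_problems(page_url: str, href: str, expected_host: str) -> list:
    """List the ways a canonical href deviates from a full https URL."""
    parts = urlsplit(href)
    if not parts.scheme or not parts.netloc:
        # Relative or protocol-relative: show where it actually resolves.
        return ["not absolute; resolves to " + urljoin(page_url, href)]
    problems = []
    if parts.scheme != "https":
        problems.append("protocol is not https")
    if parts.netloc.lower() != expected_host:
        problems.append("host differs from " + expected_host)
    return problems


page = "https://yourdomain.com/blog/post"
print(canonical_problems(page, "preferred-page", "yourdomain.com"))
# ['not absolute; resolves to https://yourdomain.com/blog/preferred-page']
print(canonical_problems(page, "http://www.yourdomain.com/preferred-page", "yourdomain.com"))
# ['protocol is not https', 'host differs from yourdomain.com']
print(canonical_problems(page, "https://yourdomain.com/preferred-page", "yourdomain.com"))
# []
```

Note how the relative value silently picked up the /blog/ directory of the page it sat on, which is rarely what the author intended.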
Mixed signals with noindex or pagination
Combining noindex with a canonical on the same URL, or canonicalizing paginated pages incorrectly (for example, pointing every page in a series back to page one), sends Google contradictory instructions. Give each page one clear instruction.
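A quick way to catch the first of these conflicts is to look for both signals in the same document. The sketch below assumes noindex is set with a robots meta tag in the HTML; it does not look at HTTP headers, so a noindex sent via an X-Robots-Tag header would be missed. SignalCollector and has_mixed_signals are hypothetical names.

```python
from html.parser import HTMLParser


class SignalCollector(HTMLParser):
    """Records whether a page declares a canonical and a robots noindex."""

    def __init__(self):
        super().__init__()
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = {k: (v or "") for k, v in attrs}
        if tag == "link" and "canonical" in attr.get("rel", "").lower().split():
            self.canonical = attr.get("href") or None
        elif tag == "meta" and attr.get("name", "").lower() == "robots":
            directives = [d.strip() for d in attr.get("content", "").lower().split(",")]
            if "noindex" in directives:
                self.noindex = True


def has_mixed_signals(html: str) -> bool:
    """True if the page carries both a noindex and a canonical tag."""
    collector = SignalCollector()
    collector.feed(html)
    return collector.noindex and collector.canonical is not None


conflicted = """
<head>
<meta name="robots" content="noindex, follow">
<link rel="canonical" href="https://yourdomain.com/preferred-page" />
</head>
"""
print(has_mixed_signals(conflicted))  # True
```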
How to find duplicate content and canonical issues
- Google Search Console. The Page Indexing report flags pages excluded as "Duplicate without user-selected canonical" or "Alternate page with proper canonical tag," and URL Inspection shows you which canonical Google actually chose for any page, which may differ from the one you set.
- The site: search. Searching site:yourdomain.com for a snippet of content can reveal multiple URLs serving the same text.
- Crawl your own site. A site crawler shows you every URL, its canonical tag, and where parameters or duplicates are multiplying pages.
- Check the obvious culprits. Confirm that www/non-www and http/https consistently resolve to one version, and look at how filters, sorting, and tracking parameters generate URLs.
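If you already have crawl output, grouping pages by a fingerprint of their text is a simple way to surface exact duplicates. This is a rough sketch, assuming the crawl has been reduced to a dictionary of URL to extracted body text; fingerprint and group_duplicates are made-up names, and hashing like this only catches copies that match after case and whitespace are normalized, not near-duplicates.

```python
import hashlib
from collections import defaultdict


def fingerprint(text: str) -> str:
    """Hash the text after lowercasing and collapsing whitespace."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def group_duplicates(pages: dict) -> list:
    """Return lists of URLs whose text is identical after normalization."""
    groups = defaultdict(list)
    for url, text in pages.items():
        groups[fingerprint(text)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


pages = {
    "https://yourdomain.com/page": "Blue widget. Ships in two days.",
    "https://yourdomain.com/page?sort=price": "Blue widget.   Ships in two days.",
    "https://yourdomain.com/about": "We make widgets.",
}
print(group_duplicates(pages))
# [['https://yourdomain.com/page', 'https://yourdomain.com/page?sort=price']]
```

Each group it returns is a set of URLs that should share one canonical, or be collapsed with redirects.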
Where Steterly fits in
Duplicate content and canonical problems belong to the same family as the other issues that quietly undermine a site: they're invisible to you while you browse, but plain to a search engine. A self-referencing canonical that got dropped during a redesign, a tracking parameter spawning hundreds of duplicate URLs, a canonical pointing at a page that 404s, none of these show up when you look at your site the normal way. They surface only when you crawl it the way Google does.
Steterly is a whole-site quality scanner that does exactly that, crawling your pages and surfacing the structural and on-page problems that erode rankings and trust. Alongside the deeper technical issues, it catches the everyday rot that compounds them: broken links that strand and orphan pages, missing or broken images, typos, outdated copyright years, leftover placeholder text, missing meta titles and descriptions, and Core Web Vitals issues. Running a scan after a migration or redesign, exactly when canonicals and redirects tend to break, is the fastest way to catch a problem before it costs you.
You can start with a free scan of up to 50 pages, no credit card required. Create a free account, run a scan, and get a clear, prioritized report of what's diluting or breaking your pages, so the version of each page you want Google to rank is the one that actually wins.
Frequently asked questions
Is duplicate content penalized by Google?
No. There is no duplicate content penalty in the sense of a manual punishment. What actually happens is that Google picks one version to index and ignores the others, and ranking signals get split across the duplicates instead of consolidating on one strong page. The result is diluted performance rather than a penalty, but it still costs you.
What does a canonical tag do?
A canonical tag tells search engines which URL is the master version among a set of duplicate or near-identical pages. Google then consolidates ranking signals onto that preferred URL and indexes it instead of the duplicates. It is placed in the <head> of the page as a link element with rel="canonical" and an href pointing to the chosen address.
What is the difference between a canonical tag and a 301 redirect?
A canonical tag keeps both URLs accessible while telling Google which one to index, which suits cases like product variants that all need to work. A 301 redirect removes the old URL entirely and sends everyone, along with the ranking signals, to the new one. Use a canonical when both pages must stay live, and a 301 when the old page should go away.
Should every page have a canonical tag?
As a best practice, yes. Even a page with no duplicates benefits from a self-referencing canonical that points to its own URL, because it removes ambiguity and protects against duplicates created later by tracking parameters or stray links. Just make sure the tag points to the correct, full, absolute URL of the page itself.
Why is Google ignoring the canonical tag I set?
A canonical is a hint rather than a strict directive, so Google weighs it against your internal links, sitemap, and redirects, and can override it if those signals conflict. Common reasons it gets ignored include the duplicate being blocked in robots.txt so Google never reads the tag, canonical chains, or a canonical pointing somewhere that contradicts your other signals.
Can URL parameters cause duplicate content?
Yes, very commonly. Parameters from filtering, sorting, sessions, and tracking each create a distinct URL that often serves the same content as the clean version, multiplying duplicates without you realizing. A self-referencing canonical on the clean URL, consistent internal linking, and careful parameter handling are the usual ways to keep this under control.