Product Taxonomy: A Complete Guide for Ecommerce Data
A product taxonomy is the tree of categories and attributes that turns raw product listings into comparable data. Here's how it works, which standard to use, and why it matters for ecommerce intelligence.
A product taxonomy is a hierarchical classification that groups products into categories and assigns each category its own set of attributes. In ecommerce, it is the tree that turns a pile of unstructured listings into a dataset you can query. Without one, "trail running shoes under $150" is a wish. With one, it is a SQL filter.
Every serious ecommerce operation runs on a taxonomy. Most teams do not call it that, and some do not realize they have one until it breaks. The interesting questions are which taxonomy to use, how deep to go, and how to actually classify real products against it without paying a human to do each one.
What is a product taxonomy?
A product taxonomy is a tree. Broad verticals sit at the top. Narrower categories branch out underneath. The leaves are the most specific level at which a product can be classified. Every category carries its own set of attributes, which are the properties that matter for things in that category and usually not for anything else.
Here is one typical path:
Apparel & Accessories
└── Clothing
    └── Activewear
        └── Activewear Pants
            └── Joggers ← leaf category
At the Joggers leaf, the attached attributes include things like color, fit, fabric, closure, and inseam length. A sofa does not have an inseam. A USB cable does not have a fit. Attributes live at the category level because the properties that describe a product are not universal.
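The path above can be sketched as a small tree structure. This is a minimal illustration, not any standard's actual schema: the `Category` class and `find_path` helper are hypothetical names, and only the one branch shown above is modeled.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in the category tree (illustrative structure only)."""
    name: str
    attributes: list[str] = field(default_factory=list)
    children: list[Category] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # A leaf is a node with nothing underneath it.
        return not self.children


def find_path(node: Category, target: str, trail: tuple[str, ...] = ()) -> tuple[str, ...] | None:
    """Depth-first search returning the names from the root down to `target`."""
    trail = trail + (node.name,)
    if node.name == target:
        return trail
    for child in node.children:
        found = find_path(child, target, trail)
        if found:
            return found
    return None


# Attributes hang off the leaf, as described in the text.
joggers = Category("Joggers", attributes=["color", "fit", "fabric", "closure", "inseam-length"])
root = Category("Apparel & Accessories", children=[
    Category("Clothing", children=[
        Category("Activewear", children=[
            Category("Activewear Pants", children=[joggers]),
        ]),
    ]),
])

path = find_path(root, "Joggers")
print(" > ".join(path))
```

A product classified as Joggers carries the whole path with it, so it can be filtered at any level of the tree.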
Two things separate a useful taxonomy from a useless one:
- Depth. Five or six levels of specificity is the minimum. Two levels ("Apparel → Clothing") tells you almost nothing about the product.
- Attributes per category. A flat list of 10,000 categories with no attached properties is not a taxonomy, it is a dictionary. The value comes from the link between Joggers and the fields that describe one.
The categories are the obvious part when you first open a taxonomy file. The attributes are what make the thing actually work.
Why product taxonomy matters for ecommerce
Unstructured product data is useless for anything past rendering the page. The moment you want to do something comparative, or aggregated, or automated, you need a shared vocabulary that both machines and humans agree on.
Start with search. A shopper looking for "waterproof hiking boots, men's size 11, under $200" is filtering on a category (hiking boots), an attribute (waterproof), a size, and a price. Every one of those inputs needs a classified product on the other end.
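A rough sketch of that shopper query as a structured filter, assuming products have already been classified. The `Product` shape, the attribute names, and the sample catalog are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Product:
    title: str
    category: str               # leaf category name
    attributes: dict[str, str]  # category-specific attribute values
    price: float


# Invented sample data; real records would come from a classified catalog.
catalog = [
    Product("Summit GTX Boot", "Hiking Boots", {"waterproof": "yes", "gender": "men", "size": "11"}, 179.0),
    Product("Ridge Canvas Boot", "Hiking Boots", {"waterproof": "no", "gender": "men", "size": "11"}, 120.0),
    Product("Alpine Pro Boot", "Hiking Boots", {"waterproof": "yes", "gender": "men", "size": "11"}, 240.0),
    Product("Storm Shell Jacket", "Rain Jackets", {"waterproof": "yes", "gender": "men", "size": "L"}, 150.0),
]


def search(products, category, required, max_price):
    """Keep products in `category` whose attributes match `required` and cost under `max_price`."""
    return [
        p for p in products
        if p.category == category
        and all(p.attributes.get(k) == v for k, v in required.items())
        and p.price < max_price
    ]


hits = search(catalog, "Hiking Boots", {"waterproof": "yes", "gender": "men", "size": "11"}, 200.0)
print([p.title for p in hits])
```

Every clause in the filter reads a field that only exists because the product was classified first.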
Cross-seller comparison has the same requirement, sharper. Two listings on two different sites are only comparable if both resolve to the same category with the same attributes. Otherwise you are matching strings and hoping. This is where ecommerce intelligence lives or dies. "What is the average price of mid-tier trail running shoes in France?" is a taxonomy question. Without categories and attributes, there is no answer, just a pile of product pages.
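Once listings from different sellers resolve to the same category, the pricing question becomes a plain aggregation. The sketch below assumes invented listings and field names, and it leaves out the "mid-tier" part of the question, which would need an extra attribute.

```python
from collections import defaultdict
from statistics import mean

# Invented listings from three hypothetical sellers, already classified.
listings = [
    {"seller": "site-a", "country": "FR", "category": "Trail Running Shoes", "price": 120.0},
    {"seller": "site-b", "country": "FR", "category": "Trail Running Shoes", "price": 140.0},
    {"seller": "site-b", "country": "FR", "category": "Road Running Shoes", "price": 90.0},
    {"seller": "site-c", "country": "DE", "category": "Trail Running Shoes", "price": 160.0},
]


def average_price_by_category(rows, country):
    """Group one country's listings by leaf category and average the prices."""
    prices = defaultdict(list)
    for row in rows:
        if row["country"] == country:
            prices[row["category"]].append(row["price"])
    return {category: mean(values) for category, values in prices.items()}


averages = average_price_by_category(listings, "FR")
print(averages["Trail Running Shoes"])
```

Without the shared `category` field, the grouping key would have to be guessed from titles.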
Catalog enrichment cuts the same way. A merchant fills gaps in their own data by pulling from a structured source, but that only works if both catalogs use the same taxonomy or can be mapped to one.
Then there is the new one: AI agents. Agentic commerce platforms need structured product data to reason about what the user asked for. "Find me an alternative to this shoe" depends on the agent knowing that "this" is in the Trail Running Shoes category with specific attributes, so it can search within that category. A shopping agent without a shared taxonomy is a chatbot with opinions.
The underlying job is always the same. A product taxonomy turns descriptions into data.
The major product taxonomies
Four standards come up in any serious ecommerce conversation. They overlap, but they were built for different jobs and their shapes reflect that.
Google Product Taxonomy
Google publishes a product taxonomy used primarily for Google Shopping and the Merchant Center [1]. It runs a few levels deep with a small set of attributes per category. Its purpose is classifying merchant feeds for Google's ad and shopping surfaces, not powering a rich product database. If you sell on Google Shopping, you map to it. For anything broader, it is usually not deep enough on its own.
Amazon Browse Nodes
Amazon uses a proprietary classification system called browse nodes [2]. It is internal to Amazon, not published as a public open standard, and maps roughly to the category structure in Amazon's navigation. Useful if you sell on Amazon. Not portable to anywhere else.
Shopify Standard Product Taxonomy
Shopify publishes an open-source taxonomy covering 26 top-level verticals and more than 10,000 leaf categories, each with attached attributes and enumerated values [3]. The current release is v2026-02, and the whole thing lives on GitHub with a predictable update cadence. Its structure is deeper and more attribute-rich than Google's. Teams building product intelligence across multiple sites tend to pick it for that reason.
UNSPSC and GS1 GPC
The United Nations Standard Products and Services Code (UNSPSC) and GS1's Global Product Classification (GPC) are older, formal industry standards [4]. UNSPSC is used heavily in procurement and B2B contracting. GPC is tied to the GTIN ecosystem. Both are thorough. Both also feel like they were built for a different era than modern DTC ecommerce, because they were.
Which one should you use?
Whichever matches the channels your data will touch. If you sell on Google Shopping, you are going to map to Google's taxonomy at the feed boundary no matter what. Amazon sellers live with browse nodes. Procurement teams end up with UNSPSC because that is what their buyers want.
For anything that crosses sites or channels, an open, deep, attribute-rich standard is the right foundation. Most teams in that position pick Shopify's, or use it as the base and customize. The shape of your data matters more than the brand on the file.
Anatomy of a product taxonomy
It helps to see the full shape. A working taxonomy has three layers.
Categories (the tree itself):
Apparel & Accessories
└── Clothing
    └── Activewear
        └── Activewear Pants
            └── Joggers
Each node has a stable ID and a display name. Leaf categories are the level at which you actually classify a product.
Attributes (what describes a product in that category):
At the Joggers leaf, the attached attributes include:
- color (primary color or pattern)
- fabric (the material, e.g., cotton, polyester, fleece)
- fit (relaxed, slim, athletic)
- closure (elastic, drawstring)
- inseam-length (short, regular, long)
- size (a size-like attribute, which typically nests inside a variant rather than creating a new one)
Attributes come in two flavors. Some are universal to a vertical and inherited (like color across all apparel). Some are specific to the category and only defined there (like inseam-length for pants).
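One way to picture the two flavors is to resolve a category's effective attributes by walking up the tree. This is a sketch under assumptions: the `PARENT` and `DEFINED_AT` tables are hypothetical, and where each attribute is defined here is chosen for illustration, not taken from any published taxonomy.

```python
# Child -> parent links for the one branch used in this article.
PARENT = {
    "Joggers": "Activewear Pants",
    "Activewear Pants": "Activewear",
    "Activewear": "Clothing",
    "Clothing": "Apparel & Accessories",
    "Apparel & Accessories": None,
}

# Where each attribute is defined (an assumption for this example).
DEFINED_AT = {
    "Apparel & Accessories": ["color"],       # universal to the vertical, inherited below
    "Activewear Pants": ["inseam-length"],    # specific to pants
    "Joggers": ["fit", "closure"],            # specific to the leaf
}


def effective_attributes(category):
    """Collect attributes from the root down to `category`, inherited ones first."""
    chain = []
    node = category
    while node is not None:
        chain.append(node)
        node = PARENT[node]
    attrs = []
    for node in reversed(chain):  # root first, so inherited attributes lead
        for name in DEFINED_AT.get(node, []):
            if name not in attrs:
                attrs.append(name)
    return attrs


print(effective_attributes("Joggers"))
print(effective_attributes("Clothing"))
```

Under these assumptions, Joggers picks up color from the vertical and inseam-length from its pants ancestor, while Clothing only sees color.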
Values (the enumerated options):
Each attribute has a set of valid values. color is not a free-text field, it is one of Beige, Black, Blue, Brown, Gray, Green, Multicolor, Orange, Pink, Purple, Red, White, Yellow. Enumerated values are what make the data comparable across sources. Free-text attributes are the enemy of cross-seller intelligence.
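A minimal sketch of enforcing enumerated values, using the color list above. The `ALLOWED` table and `normalize` helper are hypothetical names, and returning `None` for anything outside the list is one possible policy among several.

```python
# Enumerated values per attribute; the color list is the one given in the text.
ALLOWED = {
    "color": {
        "Beige", "Black", "Blue", "Brown", "Gray", "Green", "Multicolor",
        "Orange", "Pink", "Purple", "Red", "White", "Yellow",
    },
}


def normalize(attribute, raw):
    """Map a raw string to its canonical enumerated value, or None if it is not allowed."""
    by_lower = {value.lower(): value for value in ALLOWED[attribute]}
    return by_lower.get(raw.strip().lower())


print(normalize("color", " black "))
print(normalize("color", "Midnight Onyx"))
```

A merchant's marketing name like "Midnight Onyx" fails the check here. Mapping it to Black is the classifier's job, not the schema's.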
That three-layer structure, categories → attributes → values, is what separates a working taxonomy from a decorative one.
Classifying products against a taxonomy
Having a taxonomy is the easy part. The hard part is getting every product in your dataset mapped to the right leaf category, with the right attributes filled in, without paying a person to do it.
There are three approaches people actually use.
The first is manual classification: a team maps each product by hand. This is accurate if your team knows the taxonomy cold, and unworkable past a few thousand products.
The second is rule-based mapping. Regex and keyword rules push titles and descriptions into categories. Fast and cheap. Also brittle. A single merchant who calls joggers "sweatpants" breaks your rules, and across thousands of merchants you end up maintaining a giant pile of special cases that drift out of sync with the source sites.
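The brittleness is easy to reproduce. Below is a minimal sketch of keyword rules, with made-up patterns and titles. The `RULES` list and `classify_by_rules` function are illustrative, not a recommended rule set.

```python
import re

# Ordered (pattern, category) rules; the first match wins.
RULES = [
    (re.compile(r"\bjoggers?\b", re.IGNORECASE), "Joggers"),
    (re.compile(r"\bhiking boots?\b", re.IGNORECASE), "Hiking Boots"),
]


def classify_by_rules(title):
    """Return the category of the first matching rule, or None when nothing matches."""
    for pattern, category in RULES:
        if pattern.search(title):
            return category
    return None


print(classify_by_rules("Men's Fleece Joggers"))
print(classify_by_rules("Men's Fleece Sweatpants"))
```

The second title falls through because no rule knows that this merchant says "sweatpants". Each such miss becomes another special case to maintain.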
The third is AI-assisted classification. A model reads the product page, picks the right leaf category, and pulls category-specific attributes out of the text and images. This is what works at scale across the open web, where product descriptions are inconsistent and multilingual and occasionally just bad. The tradeoff is cost. Running a model at query time, every time someone asks a question about a product, is expensive and slow. The economics work when classification happens upfront, at ingestion, so downstream queries hit structured data instead of invoking the model again.
This is what we built Enrich to do. After Extralt extracts raw product data from a page, Enrich runs AI once per capture. It classifies against Shopify's Standard Product Taxonomy, pinned to v2026-02. It translates titles and descriptions to English when needed, and pulls attributes from whatever the page exposes. The output is a Variant: a product record with a leaf category and a filled-in attribute set that is directly comparable to variants from any other site. One credit per capture, no matter how many variants come out.
You do not need Enrich to use a product taxonomy. You do need something that classifies, and the real decision is whether to build and maintain that yourself.
Frequently asked questions
What is the difference between a category and an attribute?
A category is a position in the tree. Joggers is a category. An attribute is a property that describes a product within that category. Color and fabric are attributes on Joggers. Two products are in the same category if they share a leaf. They are comparable along attributes if those attributes are defined for their shared category.
How many categories should an ecommerce taxonomy have?
Enough depth to be useful, not so much that no human or model can place products accurately. Open, published standards land between a few thousand and more than 10,000 leaf categories. The right number for your team depends on the breadth of your catalog. A single-category shop does not need 10,000 leaves. A cross-category marketplace does.
Should I build my own taxonomy or use an existing one?
Use an existing one unless you have a specific reason not to. Building and maintaining a deep taxonomy with attributes is a significant ongoing cost, and the whole point of a taxonomy is shared vocabulary. If your data ever needs to be compared to anyone else's (competitors, partners, agents), a proprietary taxonomy is a liability.
What is the difference between Google's and Shopify's product taxonomy?
Google's is shallower, has fewer attributes, and is purpose-built for classifying merchant feeds into Google Shopping and ad surfaces. Shopify's is deeper and attribute-rich, with more than 10,000 leaf categories and structured attribute values per category. They are not interchangeable. Teams that need both usually keep a deep internal standard and map to Google's at the feed boundary.
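Mapping at the feed boundary can be as simple as a lookup that falls back to the nearest mapped ancestor. This is a sketch under assumptions: both tables are hypothetical, and the Google-side string is a placeholder that has not been checked against Google's published file.

```python
# Internal (deep) taxonomy: child -> parent links for one branch.
INTERNAL_PARENT = {
    "Joggers": "Activewear Pants",
    "Activewear Pants": "Activewear",
    "Activewear": "Clothing",
    "Clothing": "Apparel & Accessories",
    "Apparel & Accessories": None,
}

# Hand-maintained mapping to the shallower feed taxonomy (placeholder value).
TO_GOOGLE = {
    "Activewear": "Apparel & Accessories > Clothing > Activewear",
}


def google_category(internal_category):
    """Walk up from the internal category until a mapped ancestor is found."""
    node = internal_category
    while node is not None:
        if node in TO_GOOGLE:
            return TO_GOOGLE[node]
        node = INTERNAL_PARENT.get(node)
    return None


print(google_category("Joggers"))
```

The internal record keeps its full depth. Only the exported feed row is flattened to what the shallower taxonomy can express.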
How often does a product taxonomy need to update?
Ecommerce categories shift as new product types appear and old ones fade. Serious open standards cut versioned releases a few times a year. Pin to a specific version in your pipeline and upgrade deliberately, rather than tracking a moving head.
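Pinning can be as light as stamping each record with the release it was classified under. The sketch below is illustrative: the field name and helper functions are hypothetical, and the version string is the Shopify release named earlier in this article.

```python
PINNED_TAXONOMY_VERSION = "v2026-02"  # the release this pipeline is pinned to


def stamp(record: dict) -> dict:
    """Attach the pinned version to a newly classified record."""
    return {**record, "taxonomy_version": PINNED_TAXONOMY_VERSION}


def needs_reclassification(record: dict) -> bool:
    """True when a stored record was classified under a different release."""
    return record.get("taxonomy_version") != PINNED_TAXONOMY_VERSION


fresh = stamp({"title": "Fleece Joggers", "category": "Joggers"})
stale = {"title": "Old Joggers", "category": "Joggers", "taxonomy_version": "an-older-release"}

print(needs_reclassification(fresh), needs_reclassification(stale))
```

An upgrade then becomes a deliberate step: change the constant, find the records that no longer match, and reclassify them.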
What is a product ontology, and how is it different from a taxonomy?
A taxonomy is a tree of categories with attributes. An ontology adds relationships between categories and products beyond "is a kind of." Ontologies describe how things relate semantically (a keyboard is an input device that connects to a computer). Most ecommerce systems do not need a full ontology. A well-designed taxonomy with attributes covers the practical work.
What to do with a taxonomy
Pick a standard that matches the channels your data will touch, and stick with it. Classify every product in your dataset to a leaf category, with attributes filled in at ingestion rather than later. Cache the results, because classification is expensive if you pay for it twice.
If you are building the pipeline yourself, start with whichever open standard fits, read the schema, and write the mapping code. If you would rather not, Enrich does it for you. Either way, the job is the same: raw product pages in, classified product data out. That is the precondition for everything downstream: competitive pricing analysis, competitive catalog work, and eventually agent-facing discovery when that market settles into a shape.
Want to see classified product data from your target sites? Sign up or read about how Extralt extracts and enriches ecommerce data.
Footnotes
[1] Google, Google Product Taxonomy, Merchant Center Help.
[2] Amazon, Browse Tree Guides, Seller Central.
[3] Shopify, Standard Product Taxonomy v2026-02, GitHub.