
Sense coarsening

WordNet splits many words into very fine senses that are hard to tell apart — even for expert lexicographers. Coarsening groups those narrow senses into broader, more application-oriented concepts, and SenseBench can score every run at the fine level or under one of two coarse inventories. This page explains what coarsening is, how it changes WSD results, and how each inventory — Glite and CSI — works.

What is sense coarsening?

A coarsening is a map from WordNet 3.0 sense keys to broader concepts. Several WordNet senses of a word can map to one concept, so a prediction is coarse-correct when its WordNet sense lands in the same concept as a gold sense — even if it is not the exact fine sense. Coarsening matters for two reasons:

How coarsening affects WSD results

Examples

The four words below each have five or six WordNet senses. In every case Glite groups them exactly the way the consensus of five professional dictionaries (Merriam-Webster, Collins, Cambridge, Oxford Learner's, Longman) does. Within a Glite group, a fine-grained miss still counts as a coarse hit; crossing a group boundary is wrong at both levels. The note under each table says whether CSI, a domain-based inventory, groups the senses differently.

party (noun) — 5 senses

WordNet sense | Glite group = dictionary consensus
an organization to gain political power | political party
a group of people gathered together for pleasure | a party (social)
an occasion for social interaction and entertainment | a party (social)
a band of people associated temporarily in an activity | search / working party
a person involved in legal proceedings | legal party

Glite and the dictionaries merge the social group and the social occasion into one everyday "party" and keep political, search-party, and legal apart. CSI groups them the same way here.
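The coarse-correct rule can be sketched with the party grouping above. This is a minimal illustration, not SenseBench's implementation: the sense identifiers are made-up placeholders (real WordNet 3.0 sense keys look different), the function names are hypothetical, and treating an unmapped sense as wrong is an assumption.

```python
# Placeholder sense identifiers mapped to the Glite groups from the table above.
# These are NOT real WordNet 3.0 sense keys; they only stand in for them.
PARTY_COARSENING = {
    "party.political_org": "political party",
    "party.pleasure_group": "a party (social)",
    "party.social_occasion": "a party (social)",
    "party.temporary_band": "search / working party",
    "party.legal_person": "legal party",
}


def is_fine_correct(predicted, gold_senses):
    """Fine-level hit: the prediction is exactly one of the gold senses."""
    return predicted in gold_senses


def is_coarse_correct(predicted, gold_senses, coarsening):
    """Coarse-level hit: the prediction shares a concept with some gold sense."""
    predicted_concept = coarsening.get(predicted)
    if predicted_concept is None:
        # Assumption: a sense missing from the inventory never counts as a hit.
        return False
    return any(coarsening.get(gold) == predicted_concept for gold in gold_senses)


gold = {"party.social_occasion"}

# Fine-grained miss inside one Glite group: still a coarse hit.
print(is_fine_correct("party.pleasure_group", gold))                       # False
print(is_coarse_correct("party.pleasure_group", gold, PARTY_COARSENING))   # True

# Crossing a group boundary: wrong at both levels.
print(is_fine_correct("party.political_org", gold))                        # False
print(is_coarse_correct("party.political_org", gold, PARTY_COARSENING))    # False
```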

volume (noun) — 6 senses

WordNet sense | Glite group = dictionary consensus
the amount of 3-D space occupied by an object | quantity / amount
a relative amount | quantity / amount
a physical book (pages bound together) | a book
a publication that is one of a set | a book (in a set)
the property of being great in magnitude (bulk) | bulk
the magnitude of sound | loudness

Glite and the dictionaries merge the two amount senses and the two book senses. CSI instead pairs "3-D space" with "bulk" (both size) and splits the books apart.

care (verb) — 5 senses

WordNet sense | Glite group = dictionary consensus
feel concern or interest | care about
be concerned with | care about
provide care for | look after / be in charge of
be in charge of, act on, or dispose of | look after / be in charge of
prefer or wish to do something | care to (wish)

Glite and the dictionaries pair "feel concern" with "be concerned with", and "provide care for" with "be in charge of". CSI instead lumps the "wish" sense in with the concern senses.

replacement (noun) — 6 senses

WordNet sense | Glite group = dictionary consensus
the act of furnishing an equivalent | the act of replacing
an event in which one thing is substituted for another | the act of replacing
filling again by supplying what has been used up | the act of replacing
someone who takes the place of another (surrogate) | the replacing person / thing
a person or thing that can take the place of another | the replacing person / thing
a person who follows next in order (successor) | the replacing person / thing

Glite and the dictionaries make one clean cut — the act of replacing vs. the person or thing that does the replacing. CSI scatters the six senses across five domains.

These are typical: across 100 polysemous words, Glite reproduces the professional-dictionary grouping more closely than any other public inventory. Coarse inventories differ mainly in how aggressively and along which lines they merge — which is why SenseBench offers more than one and reports them side by side.
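To see why the choice of inventory changes the score, the volume example can be scored three ways. The sketch below is illustrative only: the sense identifiers, the group labels, and the four prediction/gold pairs are invented for the example. The CSI labels are placeholders, not real CSI domain names; they encode only the pairing described above (3-D space with bulk, the two books apart), plus the assumption that the relative-amount sense sits in a group of its own.

```python
# Placeholder identifiers for the six senses of "volume" (not real sense keys).
GLITE_VOLUME = {
    "volume.space": "quantity",
    "volume.relative_amount": "quantity",
    "volume.physical_book": "book",
    "volume.book_in_set": "book",
    "volume.bulk": "bulk",
    "volume.loudness": "loudness",
}

# Placeholder labels: only the grouping pattern is taken from the text.
CSI_VOLUME = {
    "volume.space": "size",
    "volume.bulk": "size",
    "volume.relative_amount": "placeholder-amount",
    "volume.physical_book": "placeholder-book-1",
    "volume.book_in_set": "placeholder-book-2",
    "volume.loudness": "placeholder-sound",
}

# Invented (predicted, gold) pairs, chosen to exercise each kind of merge.
PAIRS = [
    ("volume.space", "volume.relative_amount"),
    ("volume.space", "volume.bulk"),
    ("volume.physical_book", "volume.book_in_set"),
    ("volume.loudness", "volume.loudness"),
]


def accuracy(pairs, coarsening=None):
    """Share of pairs that match exactly (fine) or by concept (coarse)."""
    if coarsening is None:
        hits = sum(predicted == gold for predicted, gold in pairs)
    else:
        hits = sum(coarsening[predicted] == coarsening[gold] for predicted, gold in pairs)
    return hits / len(pairs)


def side_by_side(pairs):
    """Score the same predictions at the fine level and under both inventories."""
    return {
        "fine": accuracy(pairs),
        "glite": accuracy(pairs, GLITE_VOLUME),
        "csi": accuracy(pairs, CSI_VOLUME),
    }


print(side_by_side(PAIRS))  # {'fine': 0.25, 'glite': 0.75, 'csi': 0.5}
```

The same four predictions score differently under each inventory because the merges fall along different lines. The numbers describe these invented pairs only and say nothing about real benchmark results.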

Glite coarse-grained

Glite is a learner-dictionary sense inventory built by Glite. Its concepts are the entries of the Glite Dictionary, a dictionary designed for English learners in which senses are grouped at the granularity a reader actually needs. The groupings therefore come from a real, externally published dictionary rather than an automatic rule, and across 100 polysemous words they reproduce the sense grouping of major professional dictionaries more closely than any other public inventory.

Learn more at the Glite Dictionary and glite.ai.

CSI coarse-grained (Lacerra 2020)

CSI, the Coarse Sense Inventory of Lacerra, Bevilacqua, Pasini & Navigli (AAAI 2020), is an independent, third-party coarsening that groups every WordNet synset into one of 45 broad semantic domains (for example: biology; law & crime; art, architecture & archaeology). It is included so that SenseBench's coarse results can be audited against an inventory that the dataset's authors do not control.