Commonsense-cum-Linguistic constraint dataset

All language-technology systems need a general dataset of the following kind:

We need to build a commonsense-cum-logical constraint set: a dataset that links every word in the dictionary (starting with common nouns) to a general type of constraint word or phrase that is most likely to appear near that word in any text where it occurs, i.e. somewhere in the same sentence or paragraph, or at least somewhere in the write-up as a whole.

For example (common nouns):

  • Smartest (or any superlative word): being a superlative, it needs a phrase of the form ‘in something’, i.e. ‘in some set/pool’, somewhere around it.
  • Name: there has to be an entity nearby whose name is being talked about or mentioned.
  • Fund: there has to be some number and a currency unit nearby.
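
To make the idea concrete, here is a minimal sketch in Python of how such entries might be stored and checked against a piece of text. Everything in it is an assumption made for illustration: the `Constraint` record, the `CONSTRAINTS` table, the `unmet_constraints` helper, and above all the regular expressions, which are only crude stand-ins for the hand-written constraint types described above. A real dataset would need far richer constraint descriptions than a regex.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    description: str  # human-readable constraint type, as a contributor might write it
    pattern: str      # hypothetical regex that roughly approximates the constraint


# Hypothetical entries mirroring the three examples above.
CONSTRAINTS = {
    "smartest": Constraint(
        "a set or pool, introduced by 'in', 'of' or 'among'",
        r"\b(in|of|among)\s+\w+",
    ),
    "name": Constraint(
        "an entity whose name is being talked about",
        r"\b\w+'s\s+name\b|\bname\s+of\b",
    ),
    "fund": Constraint(
        "a number together with a currency unit",
        r"\d[\d,.]*\s*(dollars|rupees|euros|usd|inr|eur)\b|[$€₹]\s*\d",
    ),
}


def unmet_constraints(text):
    """Return the listed words in `text` whose constraint is not found anywhere in `text`."""
    words = {w.lower() for w in re.findall(r"[A-Za-z]+", text)}
    missing = []
    for word in sorted(words):
        constraint = CONSTRAINTS.get(word)
        if constraint is None:
            continue  # word has no entry in the (tiny) sample dataset
        if not re.search(constraint.pattern, text, flags=re.IGNORECASE):
            missing.append(word)
    return missing


if __name__ == "__main__":
    print(unmet_constraints("She is the smartest student."))
    print(unmet_constraints("She is the smartest student in the class."))
    print(unmet_constraints("The fund holds 5,000 dollars."))
```

The check is made over the whole text passed in, so the caller decides whether "around" means the sentence, the paragraph, or the full write-up. The sketch matches exact word forms only, so an inflected form such as "funds" would need its own entry or a lemmatisation step.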

(This will have to be done manually for the words in the dictionary; the project could be crowdsourced to school students over the web.)

