Natural Language Processing in AI

People communicate in many different ways: through speaking and listening, making gestures, using specialised hand signals (such as when driving or directing traffic), using sign languages, or through various forms of text.

By text we mean words that are written or printed on a flat surface (paper, card, street signs and so on) or displayed on a screen or electronic device in order to be read by their intended recipient (or by whoever happens to be passing by).

Basic concepts

Tokenised text and pattern matching

One of the more basic operations that can be applied to a text is tokenising: breaking up a stream of characters into words, punctuation marks, numbers and other discrete items. For example, the character string

“Dr. Watson, Mr. Sherlock Holmes”, said Stamford, introducing us.

can be tokenised as follows, where each token is enclosed in single quotation marks:

‘“’ ‘Dr.’ ‘Watson’ ‘,’ ‘Mr.’ ‘Sherlock’ ‘Holmes’ ‘”’ ‘,’ ‘said’ ‘Stamford’ ‘,’ ‘introducing’ ‘us’ ‘.’
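
A tokenisation like this one can be approximated with a regular expression. The following is only a minimal sketch in Python: the function name tokenise and the short list of titles (Dr, Mr, Mrs, Ms) are assumptions made for illustration, and a real tokeniser would need to handle many more abbreviations, numbers and special cases.

```python
import re

# Illustrative token pattern; the list of titles is an assumption, not a standard.
TOKEN_PATTERN = re.compile(
    r"(?:Dr|Mr|Mrs|Ms)\."  # a title keeps its full stop, e.g. 'Dr.'
    r"|\w+"                # a run of letters, digits or underscores
    r"|[^\w\s]"            # any other single non-space character (punctuation)
)


def tokenise(text):
    """Split a character string into a list of tokens."""
    return TOKEN_PATTERN.findall(text)


sentence = "“Dr. Watson, Mr. Sherlock Holmes”, said Stamford, introducing us."
print(tokenise(sentence))
```

The order of the alternatives matters: the title pattern is tried first, so the full stop in 'Dr.' stays attached to the title, while the full stop at the end of the sentence falls through to the punctuation alternative and becomes a token of its own.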

Parts of speech

A further stage in analysing text is to associate every token with a grammatical category or part of speech (POS). A number of different POS classifications have been developed within computational linguistics, and we will see some examples in subsequent chapters. The following is a list of categories that are often encountered in general linguistics. You will be familiar with many of them already from learning the grammar of English or other languages, though some terms such as Determiner or Conjunction may be new to you.

Noun: fish, book, house, pen, procrastination, language
Proper noun: John, France, Barack, Goldsmiths, Python
Verb: loves, hates, studies, sleeps, thinks, is, has
Adjective: grumpy, sleepy, happy, bashful
Adverb: slowly, quickly, now, here, there
Pronoun: I, you, he, she, we, us, it, they
Preposition: in, on, at, by, around, with, without
Conjunction: and, but, or, unless
Determiner: the, a, an, some, many, few, 100
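
One simple way to associate tokens with these categories is to look each token up in a word list. The sketch below, in Python, uses only a few of the example words listed above; the names LEXICON, WORD_TO_POS and tag are hypothetical, and the label "Unknown" for unlisted words is an assumption of this example. It also ignores ambiguity: a word such as "book" can be a noun or a verb, and a plain lookup cannot choose between the two.

```python
# A toy lexicon built from some of the example words above (illustrative only).
LEXICON = {
    "Noun": ["fish", "book", "house", "pen"],
    "Proper noun": ["John", "France", "Python"],
    "Verb": ["loves", "hates", "studies", "sleeps"],
    "Adjective": ["grumpy", "sleepy", "happy"],
    "Preposition": ["in", "on", "at", "by"],
    "Determiner": ["the", "a", "an", "some"],
}

# Invert the lexicon so that each lower-cased word maps to its category.
WORD_TO_POS = {
    word.lower(): pos
    for pos, words in LEXICON.items()
    for word in words
}


def tag(tokens):
    """Pair each token with its category, or 'Unknown' if it is not listed."""
    return [(token, WORD_TO_POS.get(token.lower(), "Unknown")) for token in tokens]


print(tag(["John", "loves", "the", "book"]))
```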

Constituent structure

You will have noticed several recurring patterns in the above examples: Det Noun, Prep Det Noun and so on. You may also have noticed that some types of phrase can occur in similar contexts. For instance, in (John | the cat) sat, either a Proper Noun or a sequence Det Noun can come before a Verb. Some of these possibilities can be captured using the pattern-matching notation introduced above, for example:

(((the | a)(cat | dog)) | (John | Jack | Susan))(barked | slept)
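
The idea that either a Det Noun sequence or a Proper Noun can come before a Verb can be tried out with Python's re module. This is a rough sketch rather than a faithful rendering of the notation used here: the names SENTENCE and is_sentence are hypothetical, and the sketch assumes that words are separated by exactly one space.

```python
import re

# Subject: either Det Noun ("the cat", "a dog") or a Proper Noun; then a Verb.
SENTENCE = re.compile(
    r"(?:(?:the|a) (?:cat|dog)|John|Jack|Susan) (?:barked|slept)"
)


def is_sentence(text):
    """Return True if the whole string matches the pattern."""
    return SENTENCE.fullmatch(text) is not None


print(is_sentence("the cat slept"))    # True
print(is_sentence("John barked"))      # True
print(is_sentence("the John barked"))  # False: Det cannot precede a Proper Noun here
```

Using fullmatch rather than search means the pattern must account for the entire string, so a string like "the John barked" is rejected instead of being accepted because it happens to contain "John barked".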
