Unit 3

Techniques of Semantic Analysis

We can use either of the two semantic analysis techniques below, depending on the type of information you would like to obtain from the given data: a text classification model (which assigns predefined categories to text) or a text extractor (which pulls out particular information from the text).

Semantic Classification Models 

Topic Classification

Based on its content, this model sorts text into predefined categories. In a company, customer service teams may want to classify support tickets as they drop into the help desk and then distribute the work based on the category.

With the help of semantic analysis, machine learning tools can recognize a ticket as either a “Payment issue” or a “Shipping problem”.
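As a rough illustration of how a machine learning tool might separate the two ticket categories, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one smoothing. Naive Bayes is only one possible choice and is not prescribed by these notes; the class name `NaiveBayesTopicClassifier`, the helper `tokenize`, and the six training tickets are all made up for this example.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTopicClassifier:
    """Tiny multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word frequencies
        self.doc_counts = Counter()              # label -> number of tickets
        self.vocabulary = set()

    def train(self, labeled_tickets):
        for text, label in labeled_tickets:
            self.doc_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocabulary.add(token)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label in self.doc_counts:
            # Start from the log prior of the category.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                # Add-one smoothing keeps unseen words from zeroing the score.
                count = self.word_counts[label][token] + 1
                score += math.log(count / (total_words + len(self.vocabulary)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data, invented for illustration only.
training_tickets = [
    ("My card was charged twice for one order", "Payment issue"),
    ("The refund has not reached my bank account", "Payment issue"),
    ("Payment failed but money was deducted", "Payment issue"),
    ("My parcel has not arrived yet", "Shipping problem"),
    ("The package was delivered to the wrong address", "Shipping problem"),
    ("Tracking shows the parcel is stuck in transit", "Shipping problem"),
]

classifier = NaiveBayesTopicClassifier()
classifier.train(training_tickets)
print(classifier.predict("I was charged twice and need a refund"))
print(classifier.predict("Where is my parcel, tracking has not updated"))
```

A real help desk would train on far more tickets; the point here is only that the predicted category can then drive how the work is distributed.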

Sentiment Analysis

In sentiment analysis, the aim is to detect whether the emotion expressed in a text is positive, negative, or neutral, which can also signal urgency.

For example, you can tag Twitter mentions by sentiment to get a sense of how customers feel about your product and to identify unhappy customers in real time.

Intent Classification

We can classify the text based on the user's requirement (intent).

You can use these types of models to tag sales emails as either “Interested” or “Not Interested” to proactively reach out to those users who may want to try your product.

Semantic Extraction Models

Entity Extraction

The idea of entity extraction is to identify named entities in text, such as names of people, companies, places, etc.

This might be useful for a customer service team to automatically extract names of products, shipping numbers, emails, and any other relevant data from customer support tickets.

Keyword Extraction

For example, you could analyze the keywords in a bunch of tweets that have been categorized as “negative” and detect which words or topics are mentioned most often.

Keyword extraction is used to find relevant words and expressions in a text.
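A minimal sketch of keyword extraction by simple frequency counting is shown here. It assumes that "relevant" can be approximated by "frequent after removing stop words", which is a simplification; the function name `extract_keywords`, the stop-word list, and the sample tweets are all hypothetical.

```python
import re
from collections import Counter

# Small, hand-picked stop-word list (an assumption for this example).
STOP_WORDS = {"the", "was", "and", "no", "from", "a", "to", "is"}


def extract_keywords(texts, top_n=3):
    """Return the top_n most frequent non-stop-words across all texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)


# Made-up tweets that an earlier step has already labeled as negative.
negative_tweets = [
    "The delivery was late and the package arrived damaged",
    "Late delivery again, support never replied",
    "Damaged box and no refund from support",
]

keywords = extract_keywords(negative_tweets, top_n=4)
print(keywords)
```

Production systems usually weight terms (for example with TF-IDF) rather than relying on raw counts, but the counting version is enough to show which topics recur in a set of texts.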

NLP – Word Sense Disambiguation

We understand that words have different meanings based on the context of their usage in the sentence.

Human languages are ambiguous too, because many words can be interpreted in multiple ways depending upon the context of their occurrence.

Word sense disambiguation (WSD), in natural language processing (NLP), may be defined as the ability to determine which meaning of a word is activated by the use of the word in a particular context.


Lexical ambiguity, syntactic or semantic, is one of the very first problems that any NLP system faces.

Part-of-speech (POS) taggers with a high level of accuracy can resolve a word's syntactic ambiguity.

On the other hand, the problem of resolving semantic ambiguity is called WSD (word sense disambiguation).

Resolving semantic ambiguity is harder than resolving syntactic ambiguity.

For example, consider the two distinct senses that exist for the word “bass”:

I can hear bass sound.

He likes to eat grilled bass.

In the first sentence the word refers to a low-pitched sound, and in the second it refers to a fish.
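One classic way to pick between such senses is the simplified Lesk algorithm, which chooses the sense whose dictionary gloss shares the most words with the surrounding sentence. These notes do not name a specific algorithm, so treat the following as an illustrative sketch only. The two glosses in `SENSES` and the stop-word list are written by hand for this example and do not come from a real dictionary.

```python
import re

# Hypothetical mini sense inventory with hand-written glosses.
SENSES = {
    "bass": {
        "bass (sound)": "low pitched tone sound in music heard from a voice or instrument",
        "bass (fish)": "freshwater or sea fish caught and cooked to eat as food",
    }
}

# Function words that should not count as evidence for a sense.
STOP_WORDS = {"a", "or", "to", "as", "and", "in", "from", "i", "he", "can"}


def content_words(text):
    """Return the set of lowercase words in text, minus stop words."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS


def simplified_lesk(word, sentence):
    """Pick the sense of `word` whose gloss overlaps most with the sentence."""
    context = content_words(sentence) - {word}
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & content_words(gloss))
        # Strict '>' means the first listed sense wins a tie.
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense


print(simplified_lesk("bass", "I can hear bass sound."))
print(simplified_lesk("bass", "He likes to eat grilled bass"))
```

Because the sketch matches exact word forms only, "hear" does not match "heard" in the gloss; real systems normally add stemming or lemmatization and use a full lexical resource for the glosses.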

Machine Translation:

Machine translation (MT) is the most obvious application of WSD. In MT, lexical choice for words that have distinct translations for different senses is done by WSD. The senses in MT are represented as words in the target language. Most machine translation systems do not use an explicit WSD module.

Information Retrieval (IR):

Information retrieval (IR) may be defined as a software program that deals with the

organization, storage, retrieval and evaluation of information from document

repositories, particularly textual information.

The system assists users in finding the information they require, but it does not explicitly return the answers to their questions.

WSD is used to resolve the ambiguities of the queries provided to the IR system.

As with MT, current IR systems do not explicitly use a WSD module; they rely on the assumption that the user will type enough context in the query to retrieve only relevant documents.

Text Mining and Information Extraction (IE):

In most applications, WSD is necessary for accurate analysis of text. For example, WSD helps an intelligent gathering system flag the correct words: a medical intelligent system might need to flag “illegal drugs” rather than “medical drugs”.


Lexicography:

WSD and lexicography can work together in a loop because modern lexicography is corpus-based. For lexicography, WSD provides rough empirical sense groupings as well as statistically significant contextual indicators of sense.

Lexical WordNet:

WordNet is a lexical database of semantic relations between words in more than 200 languages. In the field of natural language processing, there are a variety of tasks such as automatic text classification, sentiment analysis, text summarization, etc. These tasks are partially based on the pattern of the sentence and the meaning of the words in different contexts.

Two different words may be similar to some degree. For example, the words ‘jog’ and ‘run’ are partially different and also partially similar to each other. To perform specific NLP-based tasks, it is required to understand the intuition of words in different positions and hold the similarity between the words as well.
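To make the idea of partial similarity concrete, here is a small sketch of a path-based similarity score over a toy "is a kind of" hierarchy. The `HYPERNYM` table is a hypothetical hand-built tree, not data taken from the real WordNet database, and the formula 1 / (1 + shortest path length) is one common path-based measure rather than the only option.

```python
# Toy hierarchy: each word maps to its more general parent (hypothetical data).
HYPERNYM = {
    "jog": "run",
    "sprint": "run",
    "run": "travel",
    "walk": "travel",
    "travel": "move",
    "move": "act",
    "eat": "consume",
    "consume": "act",
}


def ancestors(word):
    """Return the chain [word, parent, grandparent, ..., root]."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def path_distance(a, b):
    """Number of edges on the shortest path between a and b in the tree."""
    chain_a, chain_b = ancestors(a), ancestors(b)
    steps_from_b = {node: i for i, node in enumerate(chain_b)}
    for steps_a, node in enumerate(chain_a):
        if node in steps_from_b:
            # The first shared node is the lowest common ancestor.
            return steps_a + steps_from_b[node]
    return None  # the two words share no ancestor


def path_similarity(a, b):
    """Similarity in (0, 1]; 1.0 means the two words are identical."""
    distance = path_distance(a, b)
    return None if distance is None else 1 / (1 + distance)


for pair in [("jog", "run"), ("jog", "walk"), ("jog", "eat")]:
    print(pair, path_similarity(*pair))
```

In this toy tree ‘jog’ sits directly under ‘run’, so the pair scores higher than ‘jog’ and ‘walk’, which in turn score higher than ‘jog’ and ‘eat’.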

The Distinction Between WordNet and Thesaurus

Thesaurus: A thesaurus is a book of words or of information about a particular field or set of concepts, especially (a) a book of words and their synonyms, or (b) a list of subject headings or descriptors, usually with a cross-reference system, for use in the organization of a collection of documents for reference and retrieval.

❖ Where a thesaurus helps us find the synonyms and antonyms of words, WordNet helps us do more than that.

❖ WordNet interlinks the specific senses of words, whereas a thesaurus links words by their meaning only.

❖ In WordNet, the words are semantically disambiguated if they are in close proximity to each other.

❖ A thesaurus provides a level to the words in the network if the words have

similar meaning, but in the case of WordNet, we get levels of words

according to their semantic relations which is a better way of grouping the