Unit 3
Techniques of Semantic Analysis
We can use either of the two semantic analysis techniques below, depending on the type of
information we would like to obtain from the given data.
❖ A text classification model, which assigns predefined categories to text.
❖ A text extractor, which pulls out particular information from the text.
Semantic Classification Models
Topic Classification
Based on the content, this model sorts the text into predefined categories. In a
company, customer service teams may want to classify support tickets as they arrive
at the help desk and distribute the work based on the category.
With the help of semantic analysis, machine learning tools can recognize a ticket
as either a “Payment issue” or a “Shipping problem”.
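The ticket routing described above can be sketched with a very small bag-of-words Naive Bayes classifier. This is only an illustration under stated assumptions: the training tickets, the `train` and `classify` helper names, and the choice of Naive Bayes are all hypothetical and are not taken from any particular help desk tool.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled tickets; the texts and the tiny size are illustrative only.
TRAINING_TICKETS = [
    ("I was charged twice on my credit card", "Payment issue"),
    ("My card payment was declined at checkout", "Payment issue"),
    ("Please refund the extra charge on my invoice", "Payment issue"),
    ("My package has not arrived yet", "Shipping problem"),
    ("The parcel was delivered to the wrong address", "Shipping problem"),
    ("Tracking shows my order is stuck in transit", "Shipping problem"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Count words per category and how many tickets each category has."""
    word_counts = defaultdict(Counter)
    category_counts = Counter()
    vocabulary = set()
    for text, category in examples:
        tokens = tokenize(text)
        word_counts[category].update(tokens)
        category_counts[category] += 1
        vocabulary.update(tokens)
    return word_counts, category_counts, vocabulary


def classify(text, model):
    """Return the category with the highest log-probability (add-one smoothing)."""
    word_counts, category_counts, vocabulary = model
    total_tickets = sum(category_counts.values())
    best_category, best_score = None, float("-inf")
    for category in category_counts:
        # Start from the prior: how common this category is among the tickets.
        score = math.log(category_counts[category] / total_tickets)
        total_words = sum(word_counts[category].values())
        for token in tokenize(text):
            count = word_counts[category][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


model = train(TRAINING_TICKETS)
print(classify("I need a refund for a double charge", model))
print(classify("Where is my package", model))
```

With so few training tickets the result depends heavily on word overlap; a real system would need far more labeled data and better text preprocessing.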
Sentiment Analysis
In sentiment analysis, the aim is to detect whether the emotion in a text is positive, negative, or neutral, which can also denote urgency.
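A minimal sketch of this three-way labeling, assuming a simple word-list (lexicon) approach. The word lists, the sample mentions, and the `sentiment` function name are hypothetical; practical systems usually rely on much larger lexicons or trained models.

```python
# Hypothetical mini-lexicon; real systems use far larger lexicons or trained models.
POSITIVE_WORDS = {"love", "great", "good", "happy", "excellent", "fast"}
NEGATIVE_WORDS = {"hate", "bad", "terrible", "broken", "slow", "unhappy"}


def sentiment(text):
    """Label text by comparing counts of positive and negative lexicon words."""
    tokens = [word.strip(".,!?").lower() for word in text.split()]
    positives = sum(token in POSITIVE_WORDS for token in tokens)
    negatives = sum(token in NEGATIVE_WORDS for token in tokens)
    if positives > negatives:
        return "positive"
    if negatives > positives:
        return "negative"
    return "neutral"


# Hypothetical mentions used only to exercise the function.
mentions = [
    "I love the new update, great job!",
    "The app is broken and support is slow.",
    "Just installed version 2.",
]
for mention in mentions:
    print(sentiment(mention), "|", mention)
```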
For example, tagging Twitter mentions by sentiment gives a sense of how customers feel about your product and helps identify unhappy customers in real time.
Intent Classification
We can classify the text based on the user's requirement, that is, their intent.
You can use these types of models to tag sales emails as either “Interested” or “Not
Interested” to proactively reach out to those users who may want to try your product.
Semantic Extraction Models
Entity Extraction
The idea of entity extraction is to identify named entities in text, such as names of
people, companies, places, etc.
This might be useful for a customer service team to automatically extract names of
products, shipping numbers, emails, and any other relevant data from customer
support tickets.
Keyword Extraction
It is used to find relevant words and expressions in a text.
For example, you could analyze the keywords in a bunch of tweets that have been
categorized as “negative” and detect which words or topics are mentioned most often.
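A minimal sketch of frequency-based keyword extraction, assuming the tweets have already been labeled “negative” by some sentiment model. The tweets, the stop-word list, and the `top_keywords` helper are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical tweets already labeled "negative" by a sentiment model.
NEGATIVE_TWEETS = [
    "The delivery was late and the box was damaged",
    "Late delivery again, very disappointed",
    "Support never answered my question about the late delivery",
]

# A small illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"the", "was", "and", "my", "about", "very", "again", "never", "a"}


def top_keywords(texts, n=3):
    """Return the n most frequent non-stop-words across the texts."""
    counts = Counter()
    for text in texts:
        for word in text.lower().split():
            word = word.strip(".,!?")
            if word and word not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common(n)


print(top_keywords(NEGATIVE_TWEETS, n=2))
```

Raw frequency is the simplest possible scoring; weighting schemes such as TF-IDF are a common next step when frequent but uninformative words dominate.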
NLP – Word Sense Disambiguation
Words have different meanings depending on the context in which they are used in a
sentence.
Human languages are ambiguous for the same reason: many words can be
interpreted in multiple ways depending on the context in which they occur.
Word sense disambiguation, in natural language processing (NLP), may be defined as the
ability to determine which meaning of a word is activated by its use in a particular
context.
Lexical ambiguity, syntactic or semantic, is one of the very first problems that any NLP system faces.
Part-of-speech (POS) taggers with a high level of accuracy can resolve a word's syntactic ambiguity.
On the other hand, the problem of resolving semantic ambiguity is called WSD (word sense disambiguation).
Resolving semantic ambiguity is harder than resolving syntactic ambiguity.
For example, consider two sentences that show distinct senses of the word “bass”:
I can hear bass sound.
He likes to eat grilled bass.
In the first sentence “bass” refers to low-frequency sound, and in the second it refers to a fish.
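One common baseline for telling such senses apart is a simplified Lesk-style approach: pick the sense whose dictionary gloss shares the most words with the sentence. The sketch below assumes a hypothetical two-sense inventory with hand-written glosses; the `SENSES` data and the helper names are illustrative, not drawn from a real dictionary.

```python
# Hypothetical sense inventory for "bass"; the glosses are written for illustration.
SENSES = {
    "bass (sound)": "low frequency tone sound in music that you hear from a speaker or instrument",
    "bass (fish)": "a fish that people catch cook and eat often grilled or fried",
}

# A small illustrative stop-word list.
STOP_WORDS = {"a", "i", "he", "to", "the", "in", "or", "and", "that", "you", "from", "can", "of"}


def content_words(text):
    """Lowercase, strip punctuation, and drop stop words."""
    words = (w.strip(".,!?").lower() for w in text.split())
    return {w for w in words if w and w not in STOP_WORDS}


def disambiguate(sentence, senses):
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence)
    return max(senses, key=lambda sense: len(context & content_words(senses[sense])))


print(disambiguate("I can hear bass sound.", SENSES))
print(disambiguate("He likes to eat grilled bass", SENSES))
```

Gloss overlap only works when the sentence and the gloss happen to share vocabulary, which is one reason semantic ambiguity is harder to resolve than syntactic ambiguity.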
Machine Translation:
Machine translation (MT) is the most obvious application of WSD. In MT, WSD performs lexical choice for words that have distinct translations for different senses.
The senses in MT are represented as words in the target language. Most machine translation systems do not use explicit WSD modules.
Information Retrieval (IR):
Information retrieval (IR) may be defined as a software program that deals with the
organization, storage, retrieval and evaluation of information from document
repositories, particularly textual information.
The system assists users in finding the information they require, but it does not
explicitly return the answers to their questions.
WSD is used to resolve the ambiguities of the queries provided to the IR system.
As with MT, current IR systems do not explicitly use WSD modules; they rely on
the assumption that the user will type enough context in the query to retrieve only
relevant documents.
Text Mining and Information Extraction (IE):
In most applications, WSD is necessary for accurate analysis of text. For example, WSD helps an intelligence gathering system flag the correct words. A medical intelligent system, for instance, might need to flag “illegal drugs” rather than “medical drugs”.
Lexicography:
WSD and lexicography can work together in a loop because modern lexicography is
corpus based. For lexicography, WSD provides rough empirical sense groupings as well as statistically significant contextual indicators of sense.
Lexical word net:
WordNet is a lexical database of semantic relations between words in more than 200 languages.
In the field of natural language processing, there are a variety of tasks such as automatic text classification, sentiment analysis, text summarization, etc. These tasks are partially based on the pattern of the sentence and the meaning of the words in different contexts.
Two different words may be similar to some degree. For example, the words ‘jog’ and ‘run’ are partially different and also partially similar to each other. To perform specific NLP-based tasks, it is required to understand the intuition of words in different positions and to hold the similarity between the words as well.
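The idea that ‘jog’ and ‘run’ are partially similar can be sketched with a path-based score over a small hierarchy of “is a kind of” links. The hierarchy below is a hypothetical toy, not real lexical database content, and the `path_similarity` function here is a hand-written illustration rather than any library's API.

```python
# Hypothetical hypernym ("is a kind of") links; a real lexical database is far richer.
HYPERNYM = {
    "jog": "run",
    "sprint": "run",
    "run": "travel",
    "walk": "travel",
    "travel": "move",
    "eat": "consume",
    "consume": "act",
    "move": "act",
}


def path_to_root(word):
    """Return the chain from a word up to the top of the toy hierarchy."""
    path = [word]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


def path_similarity(word_a, word_b):
    """Score 1 / (1 + number of links) through the closest shared ancestor."""
    path_a, path_b = path_to_root(word_a), path_to_root(word_b)
    shared = [node for node in path_a if node in path_b]
    if not shared:
        return 0.0
    closest = shared[0]
    distance = path_a.index(closest) + path_b.index(closest)
    return 1 / (1 + distance)


print(path_similarity("jog", "run"))
print(path_similarity("jog", "walk"))
print(path_similarity("jog", "eat"))
```

Words that sit closer together in the hierarchy get a higher score, so in this toy data ‘jog’ is more similar to ‘run’ than to ‘walk’, and least similar to ‘eat’.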
The Distinction Between WordNet and Thesaurus
Thesaurus: a book of words or of information about a particular field
or set of concepts, especially (a) a book of words and their synonyms, or (b) a
list of subject headings or descriptors, usually with a cross-reference
system, for use in the organization of a collection of documents for
reference and retrieval.
❖ Where a thesaurus helps us find the synonyms and antonyms of
words, WordNet helps us do more than that.
❖ WordNet interlinks the specific senses of words, whereas a thesaurus
links words by their meaning only.
❖ In WordNet, words are semantically disambiguated if they are in
close proximity to each other.
❖ A thesaurus provides a level to the words in the network if the words have
similar meanings, but in the case of WordNet, we get levels of words
according to their semantic relations, which is a better way of grouping
words.