Text Mining and Analytics: Concepts, Techniques, and Applications

Chapter 7: Text Mining and Analytics

Understanding Text Mining and Analytics

IBM’s Research Challenge

IBM Research embarked on a journey to explore new ways for computer technology to impact science, business, and society, aiming to advance computer science while aligning with IBM’s business interests.

Defining Text Analytics

Text analytics encompasses a broad range of techniques, including information retrieval, information extraction, data mining, and Web mining, to extract meaningful insights from textual data.

Information Extraction in Text Mining

Information extraction identifies key phrases and relationships within text by looking for predefined objects and sequences through pattern matching.
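As a rough illustration of pattern-based extraction, the sketch below uses regular expressions to pull email addresses and a simple "X acquired Y" relationship out of a sentence. The two patterns, the `extract` helper, and the sample text are all hypothetical and chosen only for illustration; production systems rely on much richer patterns or trained models.

```python
import re

# Hypothetical patterns for illustration only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
ACQUISITION = re.compile(r"([A-Z]\w+) acquired ([A-Z]\w+)")


def extract(text):
    """Return the key phrases and relationships matched by the predefined patterns."""
    return {
        "emails": EMAIL.findall(text),
        # Each match is an (acquirer, acquired) pair.
        "acquisitions": ACQUISITION.findall(text),
    }


sample = "Acme acquired Globex in 2019. Contact press@acme.example for details."
result = extract(sample)
print(result)
```

The output is a small structured record (a dictionary of lists) built from unstructured text, which is the basic goal of information extraction.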

Natural Language Processing (NLP)

NLP, a subfield of artificial intelligence and computational linguistics, studies how computers can understand natural human language, with the aim of converting text into more formal, structured representations that programs can manipulate.
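One of the simplest computer-readable representations of text is a bag of words, which counts how often each term occurs and ignores word order. The sketch below assumes a naive tokenizer (lowercase alphabetic runs only); the `bag_of_words` name is hypothetical, and real NLP pipelines typically add steps such as stemming and stop-word removal.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text, split it into alphabetic tokens, and count each token."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


bow = bag_of_words("The cat sat on the mat. The mat was red.")
print(bow.most_common(2))
```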

ECHELON Surveillance System

The ECHELON system is believed to be capable of intercepting and analyzing various forms of communication, including telephone calls, faxes, emails, and satellite transmissions.

Clustering Techniques

Query-Specific Clustering

This method takes a hierarchical clustering approach: the documents most relevant to the posed query appear in small, tight clusters, which are nested within larger clusters containing less similar documents.
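The sketch below gives one possible reading of this idea, not a specific published algorithm: it keeps only the documents that share a term with the query, then runs a generic average-link agglomerative clustering over them using cosine similarity on word counts. The function names, the sample query, and the sample documents are assumptions made for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Represent a text as a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def agglomerate(docs):
    """Average-link agglomerative clustering.

    Returns the merges in order as (similarity, cluster_a, cluster_b),
    where each cluster is a tuple of document indices.
    """
    vectors = [vectorize(d) for d in docs]
    clusters = [(i,) for i in range(len(docs))]
    merges = []

    def link(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the most similar pair of current clusters and merge it.
        sim, a, b = max(
            ((link(a, b), a, b)
             for idx, a in enumerate(clusters)
             for b in clusters[idx + 1:]),
            key=lambda m: m[0],
        )
        merges.append((sim, a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


query = "jaguar speed"
docs = [
    "jaguar top speed on land",
    "jaguar cat top speed",
    "jaguar car engine price",
    "banana bread recipe",
]
# Keep only documents that share at least one term with the query.
retrieved = [d for d in docs if cosine(vectorize(d), vectorize(query)) > 0]
merges = agglomerate(retrieved)
for sim, a, b in merges:
    print(f"{sim:.3f}", a, b)
```

In this toy run, the two documents about the animal's speed merge first at a high similarity, forming a tight cluster. The document about the car joins later at a much lower similarity, producing the larger, looser enclosing cluster.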

Popular Text Mining Software Tools

  • ClearForest: Text analysis and visualization tools
  • IBM SPSS Modeler: Data and text analytics toolkits
  • Megaputer Text Analyst: Semantic analysis, summarization, clustering, and retrieval
  • SAS Text Miner: Comprehensive text processing and analysis tools
  • KXEN Text Coder: Text analytics solution for structured representation
  • Statistica Text Mining: User-friendly text mining with visualization capabilities
  • VantagePoint: Interactive graphical views and analysis tools
  • WordStat: Analysis of textual information from open-ended questions and interviews
  • Clarabridge: End-to-end solutions for customer experience management

Sentiment Analysis

Alternative Names

Sentiment analysis is also known as opinion mining, subjectivity analysis, and appraisal extraction.

Sentiment Analysis Process

  1. Sentiment Detection: Distinguishing facts from opinions
  2. N-P Polarity Classification: Classifying an opinion as negative or positive (N-P), or locating it on the continuum between those two polarities
  3. Target Identification: Identifying the subject of the expressed sentiment
  4. Collection and Aggregation: Combining sentiment data points into a single measure
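
The four steps above can be sketched with a tiny lexicon-based pipeline. Everything here is a simplifying assumption: the word lists, the `TARGETS` set of product aspects, and the function names are hypothetical, and real systems use much larger lexicons or trained classifiers.

```python
import re

# Hypothetical mini-lexicon and target list, for illustration only.
POSITIVE = {"great", "love", "excellent"}
NEGATIVE = {"poor", "hate", "terrible"}
TARGETS = {"battery", "screen", "camera"}


def tokens(sentence):
    return re.findall(r"[a-z']+", sentence.lower())


def is_opinion(words):
    """Step 1: treat a sentence as an opinion if it contains a sentiment word."""
    return any(w in POSITIVE or w in NEGATIVE for w in words)


def polarity(words):
    """Step 2: score from -1 (negative) to +1 (positive).

    Assumes at least one sentiment word is present (checked in step 1).
    """
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / (pos + neg)


def target(words):
    """Step 3: pick the first known target mentioned, if any."""
    return next((w for w in words if w in TARGETS), None)


def analyze(sentences):
    """Step 4: collect scores per target and average them into one measure."""
    scores = {}
    for sentence in sentences:
        words = tokens(sentence)
        if not is_opinion(words):
            continue  # factual sentence, skipped
        scores.setdefault(target(words), []).append(polarity(words))
    return {t: sum(v) / len(v) for t, v in scores.items()}


reviews = [
    "The battery is excellent.",
    "I hate the screen.",
    "The phone weighs 180 grams.",
    "The battery drains fast, which is terrible.",
    "I love the camera.",
]
summary = analyze(reviews)
print(summary)
```

The factual sentence about the phone's weight is dropped in step 1, and the two opposing battery opinions average to a neutral score in step 4.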

Speech Analytics: Linguistic Approach

The linguistic approach in speech analytics focuses on explicit sentiment indicators and the context of spoken content within audio data.