Week 7:
Competency 7.1: Describe prominent areas of text
mining.
Unstructured text mining is seeing a sudden spurt in adoption for business
applications. The spurt is triggered by heightened awareness of text mining
and the reduced price points at which text mining tools are available today.
Text mining is being applied to answer business questions, to optimize
day-to-day operational efficiency, and to improve long-term strategic
decisions. The objective of this article is to demystify the text mining
process and examine its ROI by exploring practical real-world instances where
text mining has been successfully applied in three industries:
1. Automotive industry (warranty management)
2. Healthcare industry
3. Credit card industry
Text Mining in the Automotive Industry
It’s been estimated that warranties cost automotive
companies more than $35 billion in the U.S. annually. Considering this tough
environment, it is imperative that auto companies explore all opportunities for
reducing costs. Optimizing warranty cost is a very important lever in the cost
equation for automobile manufacturers. Even a marginal reduction in warranty
spending can have a multiplier effect on the overall bottom line. One of the
most underutilized inputs for optimizing warranty cost is service technicians’
comments. From those comments, the text mining process can surface insights
into component defects, yielding interventions for preventing those defects in
the future.
Text Mining in the Healthcare Industry
Most countries typically spend between 3% and 10% of their GDP on healthcare.
The healthcare industry is a huge spender on technology and, with the
proliferation of hospital management systems and low-cost devices to log
patient statistics, there is a sudden increase in the breadth and depth of
patient data. Mining the comments in doctors’ diagnosis transcripts can yield
information that benefits the healthcare industry in numerous ways, such as:
1. Isolating the top 10 diseases by keyword frequency per region and using
the findings to optimize the mix of tablets/medicines to stock on limited
outlet shelf space, keeping in mind changes in the frequency of
disease-related keywords.
2. An early warning system can be built on text mining outputs to detect
sudden changes in “chatter” from doctors regarding specific diseases. For
example, if the frequency of the keyword “lungs” or “breathing” exceeds 45
appearances in the last 30 days for a given ZIP code or region, it can be a
clue to adverse environmental conditions that are causing respiratory
problems. A proactive intervention can be activated to remedy the situation.
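The early-warning rule in item 2 can be sketched in a few lines of Python.
This is a minimal illustration, not the system described in the article: the
record layout (date, region, note text), the function name keyword_alerts, and
the naive whitespace tokenizer are all assumptions. Only the keywords, the
45-appearance threshold, and the 30-day window come from the example above.

```python
from collections import Counter
from datetime import date, timedelta

# Values taken from the example in the text; everything else is hypothetical.
WATCH_KEYWORDS = {"lungs", "breathing"}
THRESHOLD = 45
WINDOW_DAYS = 30


def keyword_alerts(notes, today, keywords=WATCH_KEYWORDS,
                   threshold=THRESHOLD, window_days=WINDOW_DAYS):
    """Return {region: count} for regions whose watched-keyword count
    within the last `window_days` days exceeds `threshold`.

    `notes` is an iterable of (note_date, region, text) tuples; this record
    layout is an assumption made for the sketch.
    """
    cutoff = today - timedelta(days=window_days)
    counts = Counter()
    for note_date, region, text in notes:
        if not (cutoff < note_date <= today):
            continue  # outside the rolling window
        # Naive tokenization: lowercase, split on whitespace, strip punctuation.
        tokens = (tok.strip(".,;:!?()\"'") for tok in text.lower().split())
        counts[region] += sum(1 for tok in tokens if tok in keywords)
    return {region: n for region, n in counts.items() if n > threshold}
```

A production version would likely need stemming, negation handling ("no
breathing problems"), and a baseline per region rather than one fixed
threshold.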
The components of such a successful text mining
solution can be found in Figure 1 below.
Figure 1
Text Mining in the Credit Card Industry
With the proliferation of credit cards, companies must perform a difficult
balancing act: identifying which card features (e.g., line of credit, billing
cycle, outlet points and coverage) resonate with customers while, at the same
time, minimizing the number of defaults and recovery-related interventions.
Text mining can help optimize both the collection process and the customer
experience.
1. A top ten complaint keyword watch list can be generated by mining the
inbound customer service rep (CSR) call transcripts on a daily basis. From
this, you can isolate the keywords expressed by high-value customers. For
example, if the keyword “billing error” occurs for customers with a credit
limit over $200,000, relationship managers can call the customer and put
interventions into the billing process to help prevent recurrence.
2. Text mining can also be used to rate call center staff performance. As an
example, a large credit card company in the U.S. had about 600 call center
reps receiving inbound calls. Every rep was expected to enter verbose comments
to record the nature of the call, but not all were entering detailed text. At
one end of the spectrum, some call center representatives entered an average
of 5 to 6 lines, whereas at the other end, a few entered just 3 to 5 words. As
a result, the organization was missing out on valuable intelligence wherever
only sparse text was recorded. A text mining process was built that gave
keyword frequency counts by call center representative. The bottom decile had
to undergo additional training to ensure that they entered detailed text,
which is valuable to the credit card company. Please see Figure 2 below.
Figure 2
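The call center example in item 2 can be approximated as follows. The article
does not describe how the ranking was computed, so this sketch swaps in a
simpler measure: average words per comment, with the lowest 10% of reps
flagged. The function name bottom_decile_reps and the (rep_id, comment) input
layout are hypothetical.

```python
from collections import defaultdict


def bottom_decile_reps(comments):
    """Rank reps by average words per comment and return the bottom 10%.

    `comments` is an iterable of (rep_id, comment_text) pairs. Using word
    count as the measure of detail is an assumption of this sketch.
    """
    totals = defaultdict(lambda: [0, 0])  # rep_id -> [total_words, n_comments]
    for rep_id, text in comments:
        totals[rep_id][0] += len(text.split())
        totals[rep_id][1] += 1
    averages = {rep: words / n for rep, (words, n) in totals.items()}
    # Sort by average length; break ties by rep id for a stable result.
    ranked = sorted(averages, key=lambda rep: (averages[rep], rep))
    cutoff = max(1, len(ranked) // 10)  # flag at least one rep when any exist
    return ranked[:cutoff]
```

With roughly 600 reps, as in the example, this would flag about 60 of them for
follow-up training.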
In a diverse set of industries ranging from credit cards to
auto to healthcare and beyond, the text mining process is slowly being adopted
to mine gigabytes of unstructured data. In this tough economic environment, as
the pressure to optimize the efficiency of business processes increases, using
unstructured text mining techniques on previously ignored data such as comments
from technicians, doctors and call center representatives can provide
competitive differentiation. This competitive advantage can be in terms of
optimizing internal business processes and managing external customer-facing
experiences which, in turn, can have a multiplier effect on the overall bottom
line. As Marcel Proust said, “The real voyage of discovery consists not in
seeking new landscapes, but in having new eyes.” Unstructured data has always
been lying around, but was never “discovered.” All it takes is “new eyes”
within the organization to look at the same unstructured data and gain new
insights that affect the bottom line.
Competency 7.2: Detail subareas of text mining
such as collaborative learning process analysis.
Data and Text Mining - overview
Data and text mining (DM/TM) is a technique that consists of applying data
analysis and discovery algorithms that, under acceptable computational
efficiency limitations, produce a particular enumeration of patterns (or
models) over the data (Fayyad et al., 1996). Data mining searches for patterns
in a data set using methods such as neural networks, symbolic machine learning
algorithms, and probabilistic reasoning. Symbolic algorithms are characterized
by the incorporation of background knowledge through labeled examples placed
in an otherwise unlabeled data set, which then guide learning on the unlabeled
data. There is no predefined number of labeled examples that should be
inserted into the database; however, the more labeled examples a database
contains, the easier and more accurate the learning will be. Semi-supervised
learning was chosen because of its flexibility and accuracy in using
incorporated knowledge (the ideal state), represented by labeled examples in
the data set, to classify the students’ performance, represented by unlabeled
examples, in the collaborative process. For each classification made, it is
possible to know its accuracy level and the patterns used to determine the
value. Another reason is the ability to work with an undetermined number of
examples, although it is important to provide a minimum quantity of data.
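To make the semi-supervised idea concrete, here is a small self-training
sketch. The text above does not name a specific algorithm, so a
nearest-centroid classifier is swapped in purely for illustration (it is not a
symbolic learner). The function names, the min_margin parameter, and the use
of the distance margin as a stand-in for the "accuracy level" of each
classification are all assumptions.

```python
import math


def centroid(points):
    """Mean of a non-empty list of equal-length numeric vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def self_train(labeled, unlabeled, min_margin=0.5, max_rounds=10):
    """Self-training with a nearest-centroid classifier.

    labeled:   list of (vector, label) pairs (the incorporated knowledge)
    unlabeled: list of vectors to classify
    Returns {index_of_unlabeled_vector: (predicted_label, margin)}; a larger
    margin means the nearest centroid won by a wider distance gap.
    """
    pool = list(labeled)
    pending = dict(enumerate(unlabeled))
    results = {}
    for _ in range(max_rounds):
        if not pending:
            break
        by_label = {}
        for vec, label in pool:
            by_label.setdefault(label, []).append(vec)
        centroids = {lab: centroid(vecs) for lab, vecs in by_label.items()}
        promoted = []
        for idx, vec in pending.items():
            dists = sorted((math.dist(vec, c), lab) for lab, c in centroids.items())
            margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
            results[idx] = (dists[0][1], margin)
            if margin >= min_margin:
                promoted.append(idx)
        if not promoted:
            break  # nothing confident enough to add; stop early
        # Confident predictions join the labeled pool for the next round.
        for idx in promoted:
            pool.append((pending.pop(idx), results[idx][0]))
    return results
```

Each unlabeled example ends up with a label and a confidence-like margin,
which mirrors the point that every classification comes with an indication of
how reliable it is.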
Competency 7.3: Use tools such as LightSIDE in a
very simple way to run a text classification experiment.
Training and evaluating a predictive model on the newsgroup topic dataset
The evaluation was configured to use 20 folds in the
cross-validation.
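For readers unfamiliar with cross-validation, the sketch below shows one way
the data can be divided into 20 folds, each held out once for testing while
the rest is used for training. It is a generic illustration, not a description
of how LightSIDE assigns folds internally; the function name k_fold_indices
and the round-robin assignment are assumptions.

```python
def k_fold_indices(n_items, k=20):
    """Yield (train_indices, test_indices) for each of k folds.

    Items are assigned to folds round-robin. Real tools typically shuffle
    or stratify first, which this sketch omits.
    """
    if not 2 <= k <= n_items:
        raise ValueError("k must be between 2 and n_items")
    for fold in range(k):
        test = [i for i in range(n_items) if i % k == fold]
        train = [i for i in range(n_items) if i % k != fold]
        yield train, test
```

The reported metrics are then computed over the held-out predictions from all
folds.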
Evaluation metrics:
Accuracy = 0.5796 ≈ 58.0%
Kappa = 0.4414 ≈ 0.44
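The two metrics are related: kappa rescales accuracy by the agreement expected
from chance alone, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
accuracy and p_e is the chance agreement. The sketch below computes both from
lists of actual and predicted labels. It uses the standard Cohen's kappa
definition; the function name accuracy_and_kappa is hypothetical, and this is
not LightSIDE's own code.

```python
from collections import Counter


def accuracy_and_kappa(actual, predicted):
    """Return (accuracy, Cohen's kappa) for two equal-length label lists."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and the same length")
    n = len(actual)
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    actual_freq = Counter(actual)
    predicted_freq = Counter(predicted)
    # Agreement expected by chance, from the marginal label frequencies.
    expected = sum(actual_freq[lab] * predicted_freq[lab]
                   for lab in actual_freq) / (n * n)
    if expected == 1.0:
        return observed, float("nan")  # kappa is undefined in this case
    return observed, (observed - expected) / (1 - expected)
```

Rearranging the formula with the values reported above gives
p_e = (0.5796 - 0.4414) / (1 - 0.4414) ≈ 0.25, so the model's 58% accuracy
should be read against a chance level of roughly 25%.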
Competency 7.4: Describe how models might be
used in Learning Analytics research, specifically for the problem of assessing
some reasons for attrition along the way in MOOCs.
This endeavor (text mining, collaborative learning process
analysis) holds the potential for enabling substantially improved on-line
instruction both by providing teachers and facilitators with reports about the
groups they are moderating and by triggering context sensitive collaborative
learning support on an as-needed basis.