1 Orientation Program
1.1 Introduction
1.2 First Timer chase
1.3 Product Introductions
1.4 Development environment
1.5 Project Introductions and Allocation
2 Internship Project
2.1 Improving WSO2 Support Knowledge Base Version 1.0
2.2 Releasing WSO2 Support Knowledge Base Version 2.0
2.3 Creating a conversational chat bot
2.4 Documentation Knowledge Base
2.5 Releasing WSO2 Support Knowledge Base Version 3.0
2.6 Introductory Sessions
3 Other Activities
3.1 Training Sessions
3.2 Recreational Activities
3.3 Inter-house badminton tournament
3.4 “Smart@ss” Quiz
3.5 Inter-floor basketball tournament
3.6 Tea time Championship
4 Conclusion
ACKNOWLEDGEMENTS
REFERENCES
1 Orientation Program
1.1 Introduction
We joined WSO2 as interns on 25th July 2016. During the first five days there
was an orientation program to familiarize us with the WSO2 environment, and
several information sessions were conducted to make us aware of various aspects
of the company. On the very first day of the internship period we stepped into
the main office of WSO2 at 20, Palm Grove, Colombo 03. We were warmly welcomed
by Mr. Charitha Bandara, Senior HR Lead, and Ms. Pramila Rajapaksha, Director of
HR & Admin. The one-week orientation program was conducted by Mr. Charitha
Bandara as a friendly discussion which resolved our doubts and introduced us to
the WSO2 culture.
He explained how a flat hierarchy is maintained within the company and how each
member is treated equally. They emphasized that we should address everyone by
first name, even the CEO of the company. In addition, they briefly explained the
history of WSO2 and the business model it uses. The company has no dress code,
and we were free to wear casual outfits.
Mr. Charitha Bandara further explained that office hours are flexible and that
there would be no deadlines for our work, although we had to be responsible for
ourselves. Permanent employees of WSO2 have the option of working from home, but
interns do not get that chance. From the instructions given in the induction
program we realized that WSO2 is a pleasant place to work.
After the session concluded, we were asked to collect our corporate laptops from
Mr. Akila Basnayaka, and we were divided into the four houses of WSO2 (‘Cloud
Bots’, ‘Legions’, ‘Wild Boars’ and ‘Titans’). I was allocated to the house
‘Titans’.
1.2 First Timer chase
On the third day we took part in a quite interesting event called the ‘First
Timer Chase’. The intention of this event was to make the interns familiar with
the WSO2 premises and to let them meet and interact with the employees. It was
great fun. We were divided into the four houses of WSO2 and further divided into
groups of three. Each team had to complete the list of tasks given on a sheet of
paper within a limited time. Since it was a competition, all of us tried to
complete them as quickly as possible.
1.3 Product Introductions
During the 4th and 5th days of the orientation program a few technical sessions
were held to introduce some of the main products of WSO2, namely WSO2 Carbon,
WSO2 Application Server, WSO2 API Manager, WSO2 Enterprise Service Bus, WSO2
Identity Server, WSO2 Enterprise Mobility Manager & IoT Server and WSO2 Data
Analytics Server. These sessions were mostly conducted by the product team lead
or a team member.
1.4 Development environment
After getting the laptops we were asked to set them up with the development
environment and to register their MAC addresses in order to access the
organization’s Wi-Fi network.
Ubuntu - We were instructed to install the Ubuntu OS, since it provides more
flexibility to developers and thousands of free and open source development
packages are available for it. Since I had previous experience in working with
Ubuntu, it was easier for me to get used to it.
Java - Since almost all of the WSO2 products are Java applications, the default
programming language used in WSO2 is Java.
Eclipse and IntelliJ - Eclipse and IntelliJ are the two main Java IDEs used by
the developers in WSO2.
1.5 Project Introductions and Allocation
On the first day of the second week the mentors held introduction sessions for
all the available internship projects. Mr. Selvaratnam Uthaiyashankar asked us
to send our top 5 project preferences by the end of the day. After I sent my
preferences I was assigned to the project “Intelligent Support Bot on top of NLP
based KB”, which was mentored by Dr. Srinath Perera (Vice President - Research),
Mr. Samisa Abeysinghe (Vice President - Delivery) and Mr. Nirmal Fernando
(Associate Technical Lead). Each of us was also assigned a direct mentor for the
training period, and mine was Mr. Samisa Abeysinghe (Vice President - Delivery).
2 Internship Project
The next day Madhawa and I (Madhawa Vidanapathirana worked on the same project
as me during the training period) met Mr. Samisa Abeysinghe, and he explained
the objectives of the project to us. He asked us to read the documentation
created by the previous two interns who worked on the project “WSO2 Support
Knowledge Base”. I went through the documentation, got an idea of the current
state of the system, and created a milestone plan and a weekly plan for the
project.
We identified the two main phases of the project as improving the current system
and developing an intelligent bot on top of the improved knowledge base.
2.1 Improving WSO2 Support Knowledge Base Version 1.0
After discussing with our mentor we
identified 2 main areas to improve in the existing version of the Support KB
system.
1. Improving the summarization of the
results
2. Improving the solution classification
2.1.1 Improving the summarization
We implemented a sentence tagging
scheme to tag sentences to improve summarization. I undertook the following
tasks.
1. Implementing the Tagger to introduce / remove tags
2. Implementing an algorithm to untag tagged sentences while escaping tags that
occur in the original text
3. Implementing the TagFilter to filter a given type of tagged sentence
Sentence Tagging Scheme
Sentences in Issue Description and
Issue Comments of an issue are tagged as Plain Text, Question, Answer,
Ignorable, Code or Log.
This tagging scheme is based on the
classification obtained using the Stanford Classifier [2] trained separately
for QAI (question, answer and ignorables) and CLN (code, log and normal text)
tagging schemes.
The Tagger class was implemented to introduce / remove tags. The TagFilter was
implemented to filter a given type of tagged sentence and to return the
filtered, untagged text. The following tags are introduced by this scheme.
Tag | Description
\qs | Question sentence start
\qe | Question sentence end
\as | Answer sentence start
\ae | Answer sentence end
\is | Ignorable sentence start
\ie | Ignorable sentence end
\cs | Code sentence start
\ce | Code sentence end
\ls | Log sentence start
\le | Log sentence end
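The tagging and filtering steps can be illustrated with a minimal Python sketch. The tag pairs come from the table above; the function names `tag` and `filter_tagged`, the single-space separators and the escaping rule (doubling every backslash in the original text) are assumptions made for illustration, not the actual Tagger / TagFilter implementation.

```python
import re

# Start/end tag pairs taken from the table above.
TAGS = {
    "question": ("\\qs", "\\qe"),
    "answer": ("\\as", "\\ae"),
    "ignorable": ("\\is", "\\ie"),
    "code": ("\\cs", "\\ce"),
    "log": ("\\ls", "\\le"),
}


def tag(sentence, kind):
    """Wrap a sentence in the tag pair for `kind`."""
    start, end = TAGS[kind]
    # Assumed escaping rule: double every backslash in the original text so
    # that literal text such as "\qs" can never be mistaken for a real tag.
    escaped = sentence.replace("\\", "\\\\")
    return f"{start} {escaped} {end}"


def filter_tagged(text, kind):
    """Return the untagged, unescaped sentences of the given kind."""
    start, end = TAGS[kind]
    # A real tag starts with a backslash that is NOT preceded by a backslash.
    pattern = (r"(?<!\\)" + re.escape(start) + r" (.*?) "
               + r"(?<!\\)" + re.escape(end))
    matches = re.findall(pattern, text, flags=re.S)
    # Undo the escaping applied in tag().
    return [m.replace("\\\\", "\\") for m in matches]
```

For example, tagging a question and an answer, concatenating them, and calling `filter_tagged(text, "question")` returns only the question sentence, even if the answer text itself contains a literal `\qs`.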
Stanford Column Classifier
The Stanford Column Classifier is a machine learning based classifier developed
by Stanford University and distributed freely under the GNU GPL Version 2.0
License for non-commercial use. Given single-column or multi-column records
labelled with tags as a training set, the Stanford Classifier can guess the
label of an arbitrary record with the same number of columns. The column data
types must match and can be either textual or numeric. The classifier has to be
trained in advance on a training set, and if the training set changes the
classifier has to be re-trained. However, the training process is very fast. [2]
A separate Java application was developed to expose the Stanford Classifier
features to the Support KB Python application. The two applications were
interfaced using the Py4J framework. [3]
The Stanford Column Classifier was
extensively used in the Support Knowledge Base System.
fastText by Facebook
fastText is a library for efficient learning of word representations and
sentence classification made by Facebook Research. It provides a text
classification system written in C++ which is similar in accuracy to the
Stanford Column Classifier. Additionally, fastText claims to be faster than the
Stanford Classifier for learning. [4]
I trained and tested the fastText classifier using the same training set which
we used for the Stanford Classifier, and it gave the same results, around 70%
accuracy.
However, since we had already started development using the Stanford Classifier
and it provided sufficient performance, the switch to fastText was not made.
2.1.2 Improving the solution classification
We then started the next phase of improving the Support KB: improving the
Solution Classifier. We made an initial design of a keyword weighting scheme for
the Solution Classifier. The scheme gives a weight to each keyword according to
the place where it matched (title, questions in the description, rest of the
description, or answer comments) and the number of times it occurred. We decided
to store the weights in two places.
1. Summaries collection – each issue
will have a keyword list with weights
2. Keywords collection – the mapped
issues will have the total weight they obtain from the mapped keywords
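A weighting scheme of this kind can be sketched as follows. The report does not state the actual weights, so the numbers in `LOCATION_WEIGHTS`, the location names and the function name `keyword_weights` are hypothetical and only illustrate the idea of combining match location with occurrence count.

```python
from collections import Counter

# Hypothetical location weights; the report does not give the real values.
LOCATION_WEIGHTS = {
    "title": 4.0,
    "description_questions": 3.0,
    "description_rest": 1.0,
    "answer_comments": 2.0,
}


def keyword_weights(keywords_by_location):
    """Map each keyword to the sum of (location weight * occurrences)."""
    weights = Counter()
    for location, keywords in keywords_by_location.items():
        for keyword, count in Counter(keywords).items():
            weights[keyword] += LOCATION_WEIGHTS[location] * count
    return dict(weights)
```

With these assumed weights, a keyword that occurs once in the title and twice in the rest of the description scores 4.0 + 2 × 1.0 = 6.0. The per-issue result corresponds to the keyword list stored in the summaries collection.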
The initial weighting scheme did not
show expected results. We changed the weighting scheme several times. The
changes will be documented. We are still in the process of finding a good
scheme.
I wrote the AnswerCommentsKeywordExtractor class to extract keywords from the
answer comments of issues. It calls the old Java text classifier application
twice for each issue, which made the KB generation very slow, and this became a
problem.
I investigated using the NLTK WordNet lemmatizer instead of the Porter stemmer
to get the root words when extracting keywords, but it did not seem to bring
much improvement. With the WordNet lemmatizer it is not possible to find a
common root word for all related words. E.g. the lemmatizer gives 'generalize'
for 'generalized' and 'generalizing' when the part of speech (POS) tag parameter
is used, but not for 'generalization'. I posted a question on StackOverflow [5]
regarding this. In the end I used lemmatization [6] for keywords along with
stemming. Since the WordNet lemmatizer does not recognize the Treebank POS tags
given by NLTK, I had to use a POS tag converter.
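A POS tag converter of the kind mentioned above can be as small as the following sketch. The function name `treebank_to_wordnet` is hypothetical; the returned letters are the WordNet POS constants that NLTK exposes as `wordnet.ADJ`, `wordnet.VERB`, `wordnet.ADV` and `wordnet.NOUN`.

```python
def treebank_to_wordnet(treebank_tag):
    """Convert a Penn Treebank tag (e.g. from nltk.pos_tag) to a WordNet POS."""
    if treebank_tag.startswith("J"):
        return "a"  # adjective (wordnet.ADJ)
    if treebank_tag.startswith("V"):
        return "v"  # verb (wordnet.VERB)
    if treebank_tag.startswith("R"):
        return "r"  # adverb (wordnet.ADV)
    # Fall back to noun, which is also the lemmatizer's default POS.
    return "n"
```

The result can be passed as the `pos` argument of `WordNetLemmatizer().lemmatize(word, pos)`, assuming NLTK and its WordNet corpus are installed.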
I investigated using synsets to identify synonyms of the search keywords in
order to improve the search accuracy. I then decided to take the synonyms of the
keywords extracted from the search text and include them in the keyword matching
algorithm. I implemented the SynonymFinder and integrated it with the
SolutionGenerator for synonym matching of search keywords. Once synonym matching
was implemented, the system took a long time to generate the solution. The
reason was that the SynonymFinder tried each combination of all synonyms of all
keywords. Therefore I limited the number of synsets used to 4. Since the WordNet
synsets are ordered with the most common synonyms first, this produced good
results.
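The effect of the limit can be seen in a small sketch. It is a simplification: it caps the number of synonyms per keyword instead of the number of synsets, and the synonym dictionary passed in is a hypothetical stand-in for a WordNet lookup. The function name `keyword_combinations` is also an assumption.

```python
from itertools import product

MAX_SYNONYMS = 4  # the limit mentioned in the text


def keyword_combinations(keywords, synonyms, limit=MAX_SYNONYMS):
    """Return every combination that picks one alternative per keyword."""
    alternatives = []
    for keyword in keywords:
        # Keep the keyword itself plus at most `limit` synonyms, preserving
        # the given (most-common-first) order.
        alternatives.append([keyword] + synonyms.get(keyword, [])[:limit])
    return list(product(*alternatives))
```

Without a cap the number of combinations is the product of all synonym counts, so it grows multiplicatively with every extra keyword; with the cap each keyword contributes at most 5 alternatives.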
After improving the system we started making the system ready
for production.
2.2 Releasing WSO2 Support Knowledge Base Version 2.0
We started to prepare the current system for deployment on the production
server. I started fixing the Admin-Full-Answer view to comply with the new
summarizing scheme. While doing this I encountered a problem with the Angular
app: it could not get the result data (the full answer of a ticket) that was
passed when the page was rendered by the Python app. As a solution I added a
hidden field containing the issue ID to the HTML. Then, in the Angular app, I
used a jQuery call to read the issue ID, and I added a new route in the Python
app which returns the full answer as JSON.
After preparing the Support KB code for production deployment we tested the
deployment on the staging server. To be deployed on the server, the Support KB
needs to run as a WSGI [7] application. To test this I created a test Flask
application [8] in a virtual environment and deployed it on an Apache server
using mod_wsgi. [9]
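For reference, the smallest form of a WSGI application is a callable that takes the request environment and a `start_response` function. The sketch below uses only the standard library rather than Flask, and the response text is made up; mod_wsgi looks for a callable named `application` in the configured script file by default.

```python
def application(environ, start_response):
    """Minimal WSGI app: answer every request with a fixed plain-text body."""
    body = b"Support KB test app is running\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    # A WSGI app returns an iterable of byte strings.
    return [body]
```

A Flask application object is itself such a callable, which is why a Flask app can be exposed to Apache by importing it under the name `application` in the WSGI script.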
Then I created a WSGI application for the Support KB and configured the staging
machine to run it. A few problems occurred while making this work on the server.
At first the WSGI app did not work on the server at all. I spent some time
looking for a configuration error but could not find the reason.
I then reinstalled Apache on the server and configured it from the beginning,
after which it worked. Next, the search
of the Support KB was not working. After checking for the problem
Then I deployed the Support KB on the production instance using the same
procedure as on the staging machine. We set up the DNS to point the existing
domain name to the IP address of the new production instance and finished
deploying the “WSO2 Support Knowledge Base Version 2.0”. We then asked the
support-dev mail group to try it out and give us feedback.
2.3 Creating a conversational chat bot
2.3.1 Secondary Solution Generation and Classification
After a discussion with Mr. Samisa we started working on the secondary solution
classification/generation which would lead to the development of the bot. The
plan was to take the top 5 solutions for a search from the existing Support KB,
identify the questions asked in those tickets and create a mapping to the
answers provided in the ticket comments. The identified questions would then be
ranked according to their relevance to the initial search text in order to find
the best answer or answers and compile an answer for the search problem.
I started implementing the Question Identifier, which is the first part of the
secondary solution generation. The Question Identifier takes the top 5 solutions
from the Support KB search results, identifies the question sentences of each
ticket and returns a list of questions. A problem we faced here was that there
are many types of tickets which are described in various ways and answered in
various ways, which makes it hard to identify questions and create the mapping
to answers. We decided to identify several types of descriptions and implement
an Answer Matcher for each type.
We identified 6 types of tickets which needed Answer Matchers in order to
implement the question-answer mapping for the Secondary Solution Generation.
1. Question list in description, QA
List in comments
2. Direct Question in descriptions
3. Short Question with a scenario
described
4. Short scenario description without
question
5. Several short questions with
scenarios
6. Several short scenario descriptions
without questions
I implemented the Answer Matcher for types 3 and 4. For type 1 we can directly
extract the mapped questions and answers (implemented by Madhawa). For types 2,
3 and 4 the answer can be found in the comments (a reasonable answer can be
assumed to be in the first comment), and a proper answer should be compiled
using all comments. For types 5 and 6, however, it is not possible to identify
proper questions and answers since there is no standard structure.
We had a meeting with the mentors and discussed the difficulty of extracting
information from unstructured tickets and presenting it properly, and the
difficulty of generating answers. The conclusion was that a significant
technological advancement would be needed to achieve it.
Natural language answer generation, which is needed to generate the answers of
the bot, is a complex field which is still in development. With the currently
available natural language generation resources we were stuck. Therefore we
decided to research conversational bot building platforms.
2.3.2 Conversational Bot
There are two types of chat bots:
- Bots that function based on a set of rules and respond only to specific commands.
- More advanced bots that use machine learning.
From my research I found that we can build bots on platforms like Facebook
Messenger and Slack, and that services like wit.ai and Howdy’s Botkit (open
source) can be used to build a bot. I started making a test chat bot on Slack
using Howdy’s Botkit. [10]
The Slack bot was implemented in Node.js using the open source tool Howdy’s
Botkit. When the bot is asked a question, it retrieves results from the Support
KB and answers using the solutions.
The bot was then integrated with the Slack platform. [11]
First I implemented an endpoint in the Support KB’s app.py to provide search
results to the bot. This endpoint returns the top 5 search results for the
search text (the question asked) in JSON form.
Then I implemented a bot function which gets the search results from the Support
KB and answers with the comments of the top search result. The answer provided
was not user friendly. Therefore I improved the presentation of the solution by
giving the title and short description first and then answering with the first
comment only.
Then I implemented an interactive way of narrowing down the search using the
prominent keywords of the top search results. The Support KB endpoint response
contains a mapping from prominent keywords to issue IDs. With this in place, the
bot presents the keyword list provided by the endpoint, asks the user for the
most relevant keyword and, according to the user's reply, provides the narrowed
down answer.
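The narrowing step can be sketched in Python as below. The actual bot was written in Node.js, and the field names `results`, `issue_id`, `title` and `keyword_to_issues` are assumptions about the endpoint's JSON response, chosen only for illustration.

```python
import json


def narrow_results(payload, chosen_keyword):
    """Keep only the search results mapped to the keyword the user picked."""
    issue_ids = set(payload["keyword_to_issues"].get(chosen_keyword, []))
    return [r for r in payload["results"] if r["issue_id"] in issue_ids]


# Hypothetical endpoint response for a search text.
response_text = """
{
  "results": [
    {"issue_id": "T-1", "title": "Proxy service timeout"},
    {"issue_id": "T-2", "title": "Endpoint suspension"},
    {"issue_id": "T-3", "title": "Timeout on large payloads"}
  ],
  "keyword_to_issues": {
    "timeout": ["T-1", "T-3"],
    "endpoint": ["T-2"]
  }
}
"""

payload = json.loads(response_text)
narrowed = narrow_results(payload, "timeout")
```

Here the bot would list "timeout" and "endpoint" as choices, and a reply of "timeout" leaves the two matching tickets.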
At this point the bot code had become complicated, with plenty of nested
callbacks. This situation is called ‘callback hell’: callbacks are nested within
other callbacks several levels deep, which makes asynchronous JavaScript code
difficult to understand and maintain. It had occurred in our bot code, which is
written in Node.js. The usual remedies are to keep the code shallow, modularize
it and handle errors. Therefore I refactored the code, making it shallower by
extracting the nested callbacks into named functions. Further refactoring was
needed to avoid the problem altogether by using the PromisesJS [12] structure.
Therefore I started modifying the code to use Promises, which are a good way to
handle asynchronous calls. After adding promises to the HTTP GET calls, there
was a problem when adding promises to the ask() method of the bot. I tried
several approaches but none of them worked, so I had to wrap the ask() methods
in new functions that return promises.
After adding promises to the existing bot, I started implementing a feedback
system for the bot. The modification was to give the solution first, ask
questions to get feedback on the relevance and completeness of the answer, and
then provide more solutions based on the feedback.
Implementing the feedback system was not successful, because conversations in
the bot ended prematurely due to a race condition in the asynchronous callback
handling with promises. I found some temporary solutions on the web but they
were not successful. I tried holding back the convo.next() calls until the
asynchronous call finished, but that did not work.
2.4 Documentation Knowledge Base
After a discussion with Mr. Samisa we planned to start implementing a new
knowledge base built on the WSO2 documentation. For this we first chose a single
article of the ESB documentation, in order to extract its HTML source and parse
it into a tree structure from which knowledge can be extracted. I started
implementing the WebPageTreeGenerator to generate a hierarchical tree from the
web article.
First I implemented filtering of an HTML document by tag ID when parsing it into
the web page tree. Then I implemented fetching the HTML content of a web page
when its URL is given.
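The idea of turning an article into a hierarchical tree can be sketched with Python's standard `html.parser`. This is not the WebPageTreeGenerator itself: the class name `PageTreeBuilder`, the node layout and the choice of heading tags as the hierarchy are assumptions, and filtering by tag ID is left out for brevity.

```python
from html.parser import HTMLParser

HEADING_LEVELS = {"h1": 1, "h2": 2, "h3": 3, "h4": 4, "h5": 5, "h6": 6}


class PageTreeBuilder(HTMLParser):
    """Builds a nested tree of {title, level, text, children} from headings."""

    def __init__(self):
        super().__init__()
        self.root = {"title": "ROOT", "level": 0, "text": "", "children": []}
        self.stack = [self.root]  # path from the root to the current node
        self.in_heading = False

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_LEVELS:
            level = HEADING_LEVELS[tag]
            # Climb up until the top of the stack is shallower than this heading.
            while self.stack[-1]["level"] >= level:
                self.stack.pop()
            node = {"title": "", "level": level, "text": "", "children": []}
            self.stack[-1]["children"].append(node)
            self.stack.append(node)
            self.in_heading = True

    def handle_endtag(self, tag):
        if tag in HEADING_LEVELS:
            self.in_heading = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Heading text becomes the node title; everything else is body text.
        key = "title" if self.in_heading else "text"
        node = self.stack[-1]
        node[key] = (node[key] + " " + text).strip()


def build_tree(html):
    builder = PageTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root
```

Calling `build_tree` on a page with one `h1` followed by two `h2` sections yields a root with one child whose `children` list holds the two sections, each carrying the paragraph text that follows its heading.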
Up to this point we had been working on a sample page of the documentation and
extracting knowledge from that page. I then looked for a way to download the
full WSO2 documentation to extract knowledge for the Knowledge Base. I found a
Python framework named Scrapy [13] which can be used to implement web crawlers
that extract knowledge from web pages. Using this framework I implemented a web
crawler named ‘DocScraper’ which recursively follows the URLs of a given page
and extracts the HTML content.
The main page of the WSO2 documentation [14], https://docs.wso2.com/, does not
have the ‘wiki-content’ div which I used to extract HTML content. Therefore I
could not use it as the starting URL for the DocScraper, since the DocScraper
extracts URLs from the ‘wiki-content’ div. Instead I started extracting from the
ESB documentation.
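The restriction to the ‘wiki-content’ div can be illustrated without Scrapy. The sketch below swaps in Python's standard `html.parser` for Scrapy's selectors, so it is not the DocScraper code; the class and function names are hypothetical. It collects only the links that occur inside the div with the given ID, which also shows why a page without that div yields no URLs to follow.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ContentLinkExtractor(HTMLParser):
    """Collects absolute URLs of <a> tags inside <div id=container_id>."""

    def __init__(self, base_url, container_id="wiki-content"):
        super().__init__()
        self.base_url = base_url
        self.container_id = container_id
        self.div_depth = 0  # greater than 0 while inside the target div
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.div_depth > 0:
                self.div_depth += 1  # nested div inside the target
            elif attrs.get("id") == self.container_id:
                self.div_depth = 1   # entering the target div
        elif tag == "a" and self.div_depth > 0 and attrs.get("href"):
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1


def extract_content_links(html, base_url):
    extractor = ContentLinkExtractor(base_url)
    extractor.feed(html)
    extractor.close()
    return extractor.links
```

A crawler built on this would fetch each returned URL in turn and repeat the extraction, skipping URLs it has already visited.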
Then I integrated the DocScraper with the database generator which Madhawa had
implemented, and we generated the documentation knowledge base by running the
application on the staging machine.
I implemented an ArticleGenerator to generate the articles collection from the
doc_pages collection of the database, which was populated by scraping the WSO2
documentation with the DocScraper. Then I modified it to generate keywords as
well while generating the articles collection. An error occurred because of
binary files in the doc_pages collection, so I added a try/catch block to handle
non-HTML documents.
I added an app.py to the documentation_kb, with a route which displays an
article and its information when the article ID is given. This is the basic bot
for the documentation KB. I added the front end HTML to show the article
information. Since we stored the HTML content of the articles, I rendered that
HTML in this view. Then I added access to the parent nodes of the article and to
the sub articles of the title type. I filtered out the other types of sub
articles, since their ‘content’ field is too long and the sub articles we
actually want to navigate to are the title type ones.
The ArticleGenerator, which generates the Knowledge Base from the scraped HTML,
missed some pages. I modified the scraper to start from the WSO2 documentation
root page instead of starting from the ESB page. I added a separate parse method
for the root page since it does not have a ‘wiki-content’ div.
I added a check against the doc_pages collection to the doc_spider, to make sure
that a page which has already been generated is not generated again even if the
application is stopped and re-run. Then I added a new collection which keeps
track of the pages already processed when generating articles, so that the
generation can be resumed from the place where it stopped.
The search in the documentation knowledge base gives poor results when the
important keywords are not in the title structure. The reason is that the
current search only takes the title hierarchy into account for keyword
extraction. We started implementing a secondary filter which takes the top
results and checks the TFIDF [15] value of the unmatched keywords in the
content. I added a method to get the unmatched keywords when the best matching
articles are given, and a method to get the best matching article by checking
the TFIDF of the content.
I modified the secondary keyword matching method to use the TFIDF scores given
by the simple_tfidf calculation for the unmatched keywords of the search text.
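A secondary filter of this kind can be sketched as follows. This is a textbook TF-IDF (term frequency times the logarithm of inverse document frequency), not the project's simple_tfidf, and the function names and the token-list representation of articles are assumptions.

```python
import math


def tf_idf(term, doc_tokens, corpus):
    """TF-IDF of `term` in one document, relative to a list of documents."""
    if not doc_tokens:
        return 0.0
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for tokens in corpus if term in tokens)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


def best_article(unmatched_keywords, candidates):
    """Pick the candidate whose content scores highest for the unmatched keywords.

    `candidates` maps an article ID to the list of tokens in its content.
    """
    corpus = list(candidates.values())

    def score(article_id):
        tokens = candidates[article_id]
        return sum(tf_idf(k, tokens, corpus) for k in unmatched_keywords)

    return max(candidates, key=score)
```

A keyword that appears in every candidate gets an IDF of log(1) = 0 and therefore does not influence the ranking, while a keyword concentrated in one article's content pulls that article to the top.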
I implemented the ArticleClassifier to classify articles by search question
type. This was implemented after a discussion in which we decided to create a
classification by the type of the question ('what', 'how', 'is', 'does', 'why',
'which') and to classify matching articles accordingly. We expect this
classification to improve the search, since it classifies articles in a way that
identifies what kind of answer the user needs.
I modified the ArticleClassifier to POS-tag the article topics and save the
resulting pattern in the database. Then I implemented a training data set
generator for the Stanford Classifier which uses the saved pos_tag patterns from
the database. To create the training set, I generated the article_type
collection by manually classifying the topics using the ArticleClassifier.
I added the generate_test_set method to generate the test data set for the
Stanford Classifier which classifies article topic types. Then I added the
update_with_types method to classify the articles in the articles collection
using the trained Stanford Classifier.
Then I separated the keyword mapping collection generation from the article
collection generation, since the combined process was too slow and crashed
during generation. We isolated the article generation and ran it on its own.
Then I added the keyword mapping collection generation as a separate method.
Then I trained the topic type classifier and tested its accuracy. The
classification accuracy averaged 80.7% for classifying articles by question type
(how, what, why, does). The next step will be applying the topic type
classification to the search.
I added the SearchQuestionClassifier
to get the question type of the search text. This takes the search text and
using the trained article_type classifier returns the relevant question type.
This is used in the smart_search algorithm to identify the user’s question
type.
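To make the notion of a question type concrete, here is a rule-based stand-in. It is not the SearchQuestionClassifier, which uses the trained article_type classifier; it simply returns the first word of the search text that is one of the question types listed earlier, and the function name and the `None` fallback are assumptions.

```python
QUESTION_TYPES = ("what", "how", "is", "does", "why", "which")


def question_type(search_text):
    """Return the first question word found in the text, or None."""
    for word in search_text.lower().split():
        token = word.strip("?.,!:;\"'")
        if token in QUESTION_TYPES:
            return token
    return None
```

A trained classifier is still needed for keyword-only searches, where no question word is present and this heuristic has nothing to go on.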
Then I added the base layout
template for the frontend and then added front end templates for show_articles
page and search page with the javascript files.
Then I added the route in the app.py
to get the best results from the smart_search which are generated using a
combination of simple_search (which takes all the keywords from article parents
and content) and learned data from vote up function. Then I showed the best
articles from the smart_search on the search page using an ajax call to get the
data.
Then I added article links to the
search results. Next I will do some improvements to the frontend such as
correction for the links from documentation, adding topic hierarchy path of the
article and highlighting the first article.
The Web Application
The system had to be exposed to external users over the internet, and thus we
had to expose the functionality via a web app. The web app was expected to let
the user specify a query or a set of keywords and retrieve a concise view of
articles from the documentation knowledge base or issues from the support
knowledge base. In addition, it was expected to provide certain capabilities to
the admins of the system.
Since the system was implemented in Python we used the Flask framework which,
although a microframework, is powerful and simple to use.
Although the web app was a simple one, we paid special attention to the design
of the interfaces, as we wanted to make them as simple and user-friendly as
possible so that engineers could find what they want as quickly as possible. The
app had the following main interfaces:
· Search Interface - The Search or Main Interface shown in Fig. 9.1 provides the
user with a search box to enter a query or a set of keywords, and two buttons:
‘Search Docs’ to search the documentation KB and ‘Search Tickets’ to search the
Support KB.
· Documentation Search Results Interface - The Documentation Search Results
interface, which is displayed to all users, shows the top solutions obtained
from the intelligent search on the documentation KB once the user specifies a
set of keywords. A recommend button is given for each displayed result. If the
user wants to know more about an article, he can click on its topic, which
displays the full article page rather than the concise view given in the search
results page, together with the link to the original source (WSO2 documentation
page).
· Tickets Search Results Interface - The Tickets Search Results interface, which
is displayed to all users, shows the top solutions obtained from the intelligent
search on the Support KB once the user specifies a set of keywords. For each
displayed result the fields shown are the issue summary, view count and short
description. A more detailed view is available by clicking ‘view additional
information’. Code/log line blocks are collapsed and can be viewed by clicking
‘view code/log’. Question sentences and matched keywords are shown in bold to
improve the presentation. A recommend button is given for each displayed result.
If the user wants to view the answer, he can click on the “View Solution” button
of an issue, which displays the summarized answer and the link to the original
source (JIRA Ticket [16]). Clicking on the “View More” button also increments
the view count for the issue.
· Article/Issue Interface - The full article/issue page can be viewed by
clicking on the article topic or the issue title.
· Admin Interface - The admin interface, accessible only to admins, provides
several capabilities as described below:
§ View Usage - As shown in Figure 2.6, it is possible to get an idea about the
number of logins and searches performed.
§ View Most Recent Searches - To view the most recent search queries made in the
knowledge base.
§ View Most Recent Votes - To view the most recent recommendations made by the
users.
§ View Most Viewed Issues - To get an idea about the most popular topics that
are searched in the knowledge base.
§ Update Knowledge Base -
  o JIRA - To specify a time period, update the knowledge base with Jira tickets
  from that period and view the progress of the update.
  o Documentation - This can be done in 3 steps.
    · Update Knowledge Sources (extract the WSO2 documentation) - When the
    content of the documentation has changed, it is necessary to re-scan the
    documentation and re-generate the Knowledge Base.
    · Update Knowledge Base - Generate the articles for the knowledge base using
    the extracted doc pages.
    · Keyword Indexing - Keywords are indexed and the keyword-to-article mapping
    is created.
§ Manage Admins - To add/remove admins.
2.5 Releasing WSO2 Support Knowledge Base Version 3.0
Having implemented the above, we deployed the Support Knowledge Base Version 3.0
to get feedback from actual users. Users with a wso2 email address can access
the deployed system via https://supportkb.wso2.com/. Initially, the system was
only available through the private network of WSO2. Later on, it was made
accessible over the internet as well, provided that the user is authenticated by
the AppM login system.
We used Apache mod_wsgi to host the web app, considering its reliability and the
fact that Apache was widely used in the company.
2.5.1 Support Knowledge Base Version 3.0
Requirement
In addition to the public Jira, all paid customers of WSO2 who receive WSO2
Support Services are entitled to individual Jira accounts through which they can
raise the issues they identify as Jira tickets. WSO2 engineers working in support
then work on these tickets and provide solutions through back-and-forth
communication via comments on the ticket. If the answers provided by WSO2 resolve
the customer’s query, the ticket is marked as resolved/closed. Furthermore,
internal discussions are carried out on the “support-dev” mailing group, where
engineers discuss among themselves what the possible solutions for an issue could
be. Currently, there is no central database maintaining this information, so when
a new issue is raised WSO2 engineers have to manually search through
documentation, past resolved/closed issues and mail threads, among other
resources, to identify whether a solution already exists.
The WSO2 Support Knowledge Base was expected to deliver an internal knowledge
base that maintains past resolved issues in terms of Jira tickets, documentation
and Gmail threads*, and eventually other resources such as OSQA and Stack
Overflow questions. When a user (a WSO2 engineer) specifies an issue, either in
natural language or as a set of keywords, the system should identify similar
solved issues from the past and other relevant resources, and suggest them to the
user as possible solutions, along with summarized answers built from the comments
on the Jira ticket, the replies to the Gmail thread* or the content of the WSO2
documentation. The weights used for the Term Frequency calculation use the
criteria as follows.
Advantage
The main advantage of the WSO2 Support Knowledge Base is that it significantly
reduces the time an engineer spends identifying the solution for an issue that
has already been resolved. Without the WSO2 Support Knowledge Base, a user would
manually search the Jira system, Gmail threads and WSO2 Docs for possible
solutions, whereas the WSO2 Support Knowledge Base is a one-stop solution that
instantly suggests similar issues by searching across all the resources. As this
reduces the time the engineer spends identifying possible solutions, it leaves
them more time to focus on the actual solution.
Furthermore, this helps ensure that no similar issue is missed through human
error, so that the support engineer does not reinvent the wheel or spend valuable
time on an already resolved issue.
These advantages not only ensure that support engineers use their valuable time
and effort effectively but also help improve the efficiency and effectiveness of
the WSO2 Support System.
Architecture
A high-level architectural view of the system is shown in Figure 2.8.
Support Tickets Knowledge Base
The Support Tickets Knowledge Base is a knowledge base built on the Support JIRA
tickets of WSO2. Using this system, a user can search for past issues that are
similar to a current issue.
Knowledge Base Creation for Support Tickets KB - The TicketsKBGenerator extracts
tickets from the WSO2 Support JIRA system into the backend database of the KB
through the API exposed by the JIRA system.
Afterwards, the sentences in the ticket description and ticket comments are
classified and tagged as “Question”, “Plain Text”, “Code”, “Log” or “Ignorable”
using the Stanford Classifier. Additionally, the comments of tickets are
classified as “Question Comment”, “Answer Comment” or “Ignorable Comment” using
the Stanford Classifier.
Then, the tickets are further processed to identify keywords, and an index of
keywords is generated. The index contains a Weighted Term Frequency calculated
for each keyword-ticket combination, based on a customized version of the Term
Frequency Inverse Document Frequency (TF-IDF) algorithm.
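A weighted keyword index of this kind can be sketched as follows. This is only a
minimal illustration under stated assumptions, not the customized algorithm used
in the system: the per-tag weights in TAG_WEIGHTS, the function name build_index
and the smoothed IDF formula are all hypothetical choices made for the example.

```python
import math
from collections import defaultdict

# Hypothetical weights per sentence tag; the criteria actually used are not
# reproduced here. Tags that are missing from the table contribute nothing.
TAG_WEIGHTS = {"Question": 2.0, "Plain Text": 1.0}


def build_index(tickets):
    """Build {keyword: {ticket_id: weighted TF-IDF score}}.

    tickets maps a ticket id to a list of (tag, [keyword, ...]) pairs,
    one pair per tagged sentence.
    """
    weighted_tf = defaultdict(lambda: defaultdict(float))
    for ticket_id, sentences in tickets.items():
        for tag, keywords in sentences:
            weight = TAG_WEIGHTS.get(tag, 0.0)
            for keyword in keywords:
                weighted_tf[keyword][ticket_id] += weight

    total = len(tickets)
    index = {}
    for keyword, per_ticket in weighted_tf.items():
        scored = {t: tf for t, tf in per_ticket.items() if tf > 0}
        if not scored:
            continue  # keyword only occurred in zero-weight sentences
        # Smoothed inverse document frequency (an assumption of this sketch).
        idf = math.log(total / len(scored)) + 1.0
        index[keyword] = {t: tf * idf for t, tf in scored.items()}
    return index
```

For example, a keyword that appears once in a “Question” sentence of one ticket
out of two receives a higher score than a keyword that appears in both tickets,
because its inverse document frequency is larger.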
Solution Generation for Support Tickets KB - The Solution Generator of the
Support Tickets KB implements the “Search Tickets” functionality of the WSO2
Knowledge Base. This is basically a keyword-based search with enhancements based
on user recommendations of search results (Continuous Learning System).
The Solution Generator for Support Tickets first extracts the keywords of the
search text using the RAKE [17] algorithm. It then uses the keywords collection
in the database to identify the Support Tickets that match the query. The results
are sorted based on the total TF-IDF value calculated for each search result, and
the top 10 results are chosen to be displayed to the user.
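The lookup and ranking steps described above can be sketched as follows. This is
an illustrative sketch only: extract_keywords is a naive stand-in for RAKE (plain
stopword removal, not the RAKE algorithm), and the in-memory dictionary stands in
for the keywords collection in the database. All names here are hypothetical.

```python
from collections import defaultdict

# Tiny stopword list for the stand-in keyword extractor (an assumption).
STOPWORDS = {"how", "to", "the", "in", "a", "an", "of", "for", "with", "is", "are", "what"}


def extract_keywords(text):
    """Naive stand-in for RAKE: lowercase tokens with stopwords removed."""
    tokens = (word.strip("?.,!'\"") for word in text.lower().split())
    return [token for token in tokens if token and token not in STOPWORDS]


def search(query, index, limit=10):
    """Rank tickets by the total score of the query keywords they match.

    index has the shape {keyword: {ticket_id: score}}.
    """
    totals = defaultdict(float)
    for keyword in extract_keywords(query):
        for ticket_id, score in index.get(keyword, {}).items():
            totals[ticket_id] += score
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]
```

With an index of {"esb": {"T1": 2.0, "T2": 1.0}, "cluster": {"T2": 3.0}}, the
query "How to cluster ESB?" ranks T2 first because it matches both keywords.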
Next, the Continuous Learning System boosts the scores of the solutions that
users have recommended. The results are then re-sorted based on the new scores.
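The re-sorting step can be illustrated with the sketch below. The boosting rule
(each recommendation adds 10% to the score) and the names rerank and boost are
assumptions made for the example; the rule actually used by the Continuous
Learning System is not described here.

```python
def rerank(results, recommendations, boost=0.1):
    """Re-sort search results after boosting user-recommended tickets.

    results is a list of (ticket_id, score) pairs; recommendations maps a
    ticket id to the number of times users recommended it for this query.
    """
    boosted = [
        (ticket_id, score * (1.0 + boost * recommendations.get(ticket_id, 0)))
        for ticket_id, score in results
    ]
    return sorted(boosted, key=lambda item: item[1], reverse=True)
```

A multiplicative boost keeps unrecommended results in their original order while
letting a frequently recommended ticket overtake a slightly better keyword match.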
Finally, the summarizer summarizes the search results based on the tags on the
sentences of the description and comments. The results are then displayed to the
user.
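One way a tag-based summary could work is sketched below. The selection rule is
an assumption made for illustration (questions are taken from the description,
and question or plain-text sentences from answer comments); the function name
summarize and the max_sentences limit are hypothetical.

```python
# Tags assumed to be worth showing in an answer summary (illustrative choice).
SUMMARY_TAGS = {"Question", "Plain Text"}


def summarize(description, comments, max_sentences=3):
    """Build a short question/answer summary from tagged sentences.

    description is a list of (tag, sentence) pairs; comments is a list of
    (comment_tag, [(tag, sentence), ...]) pairs.
    """
    question = [sentence for tag, sentence in description if tag == "Question"]
    answer = []
    for comment_tag, sentences in comments:
        if comment_tag != "Answer Comment":
            continue  # skip question and ignorable comments
        answer.extend(sentence for tag, sentence in sentences if tag in SUMMARY_TAGS)
    return {
        "question": question[:max_sentences],
        "answer": answer[:max_sentences],
    }
```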
Documentation Knowledge Base
The Documentation Knowledge Base is a knowledge base built on the official WSO2
documentation available at https://docs.wso2.com. This KB contains sections of
documentation pages (whole pages, headings and subheadings of pages, paragraphs)
arranged in a single large hierarchical tree structure. A user can search this
system with any inquiry related to a WSO2 product and obtain the sections
relevant to the inquiry.
Knowledge Base Creation for Documentation KB - Knowledge base creation on the
WSO2 documentation is a lengthy process, which is subdivided into the 3 processes
explained below.
1. Documentation Extraction - Extract the pages in WSO2 Docs using the Scrapy web
crawler (refer to DocumentationExtractor.py for the implementation).
2. Articles Generation - Divide the extracted pages into sections of headings and
paragraphs (referred to as Articles in this document). Then, arrange the headings
and paragraphs into a hierarchical structure representing the structure of the
documentation page (e.g. paragraphs listed under a heading are identified as
sub-items of the heading, and sub-headings listed under a heading are identified
as sub-headings of the heading). This hierarchical structure of the page is then
connected to the overall hierarchical tree generated for WSO2 Docs by attaching
it to the appropriate parent page node.
3. Keywords Generation - Keywords are generated for every article recognized by
the Article Generator. The keywords of an article are extracted from its own text
and the titles of its ancestor headings. Afterwards, these keywords are indexed
into a keywords collection (known as article_keywords) with Term Frequency
values, in a similar manner to the index of JIRA ticket keywords in the DB.
Additionally, a separate index named description_keywords is generated in the DB
for headings. This includes the keywords from article_keywords with the addition
of keywords extracted from immediate child paragraphs/headings.
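Steps 2 and 3 above can be sketched as follows. This is a simplified illustration
and not the implementation in the project: the Article class, the PARAGRAPH level
constant, the helper names and the naive keyword extraction (lowercased words
minus a few stopwords) are all assumptions. The names article_keywords and
description_keywords are reused from the text only to show which keyword set each
index would hold.

```python
class Article:
    """One node of the documentation tree: a page, a heading or a paragraph."""

    def __init__(self, title, text="", level=0):
        self.title = title      # heading or page title ("" for a paragraph)
        self.text = text        # paragraph body ("" for a heading)
        self.level = level      # 0 = page, 1 = heading, 2 = sub-heading, ...
        self.parent = None
        self.children = []


PARAGRAPH = 99  # hypothetical level: deeper than any heading


def build_page_tree(page_title, items):
    """Arrange ordered (level, title, text) items into a tree under the page."""
    root = Article(page_title)
    stack = [root]
    for level, title, text in items:
        # Climb back up until the top of the stack is a shallower heading.
        while stack[-1].level >= level:
            stack.pop()
        node = Article(title, text, level)
        node.parent = stack[-1]
        stack[-1].children.append(node)
        stack.append(node)
    return root


def words(text):
    """Naive keyword extraction used only for this sketch."""
    cleaned = (word.strip(".,?!").lower() for word in text.split())
    return {word for word in cleaned if len(word) > 2 and word not in {"the", "and", "for"}}


def article_keywords(node):
    """Keywords from the node's own text plus the titles of its ancestors."""
    keywords = words(node.title) | words(node.text)
    ancestor = node.parent
    while ancestor is not None:
        keywords |= words(ancestor.title)
        ancestor = ancestor.parent
    return keywords


def description_keywords(node):
    """article_keywords plus keywords of immediate child paragraphs/headings."""
    keywords = article_keywords(node)
    for child in node.children:
        keywords |= words(child.title) | words(child.text)
    return keywords
```

The stack always holds the chain of open headings, so each new item is attached
to the nearest heading above it that has a smaller level.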
Solution Generation for Documentation Knowledge Base - The Solution Generator of
the Documentation KB implements the “Search Docs” functionality of the Support
Knowledge Base. This is basically a keyword-based search with enhancements based
on user recommendations of search results (Continuous Learning System).
The Solution Generator first extracts the keywords in the user inquiry using
RAKE. The keywords are then searched in the “description_keywords” index to
obtain the matching articles. The articles are sorted based on the total TF-IDF
value calculated for each article. Then, the top 10 solutions are selected and
re-sorted based on user recommendations using the Continuous Learning System.
Finally, the results are displayed to the user.
2.6 Introductory Sessions
We discussed and planned introductory sessions for the support teams and prepared
a script for the sessions, which included a hands-on session to try the system
with current support issues. We conducted 5 sessions, for the Virtusa support
team, ESB support team, IS support team, APIM support team and permanent support
team. The objective of these sessions was to familiarize the employees with the
“Support Knowledge Base” system and get them to use it.
Session Outline
1. Giving a basic introduction to the system, covering both the documentation
search and the tickets search and the 2 knowledge bases behind the 2 searches.
2. Introducing the learning feature we implemented in the application, through
which users can recommend and promote solutions so that the system improves over
time.
3. Asking the participants to access the system and demonstrating how the system
works, showing some example questions for both searches.
4. Explaining the scenarios that best fit each search. Users can use 'Search
Docs' to search for a WSO2 product related question.
Eg:
· How to configure ESB with Identity Server?
· What are the system requirements for Identity Server?
· How to cluster ESB?
5. Users can use 'Search Tickets' to search for support questions related to bugs
and issues in products.
Eg:
· How to control the hostname that gets set for accessing the WSDL in ESB?
· How to preserve header in SOAP content sent by ESB?
6. Asking the participants to try out the system with their current ticket
issues.
7. Doing a Q & A session.
8. Taking feedback.
Feedback from the sessions
1. Extend the ticket-based search to include matches from all ticket comments.
2. Extend the ticket-based search to include sections identified as Logs or Code.
3. Extend the KB to blogs, articles, white papers, Stack Overflow, etc.
4. Extract knowledge from the OSQA system as well.
5. Bring all sources for support engineers into one place.
6. Remove the limit on the number of tickets shown as solutions.
7. Automate the Knowledge Base update.
8. Improve the performance of the Doc KB search.
9. Add a spell checker for the search.
10. Add a suggestions feature to the search.
11. Use the Metadata Document on Blogs
3 Other Activities
3.1 Training Sessions
I participated in several training sessions for WSO2 employees, which were
organized by the HR team and conducted by senior employees. I participated in the
following sessions.
1. GIT beyond basics
2. Docker training
3. Server administration training
3.2 Recreational Activities
During lunch time and tea time we took part in several recreational activities
such as table tennis, carrom, pool, foosball and playing musical instruments.
This was a great opportunity to have fun and to get to know employees whom we
could not meet during work.
3.3 Inter-house badminton tournament
An inter-house badminton tournament was organized and we took part in it. My
doubles partner was Mr. Jithendra from the Infrastructure team. It was a great
chance to meet new people and have fun.
3.4 “Smart@ss” Quiz
This was a general knowledge quiz organized by the HR team for the employees.
Each house had 2 teams. I was in the ‘Titans-2’ team.
3.5 Inter-floor basketball tournament
An inter-floor basketball tournament was held and I played for the 2nd floor
team. We emerged as 1st runners-up, losing only the final to the 7th floor team.
We got Pizza Hut vouchers as gifts and our team went out for lunch later.
3.6 Tea time Championship
A tea time championship was organized for table tennis, carrom, foosball and
pool. I took part in all activities. I lost in foosball and carrom. In table
tennis I reached the semifinals in men’s singles, men’s doubles and mixed
doubles, but could not finish because the internship period ended before the
tournament concluded.
4 Conclusion
On 23rd of December 2016 we concluded our projects and finished our internship at
WSO2. We handed over our office equipment to Mr. Dinesh. On the same day we took
part in the WSO2 year-end party at Eagle Lakeside Hotel and had a lot of fun.
We had our internship presentations at WSO2 on 2nd March 2017, and after all the
presentations were finished WSO2 took us on a go-karting and dinner outing.
Pramila and Nirshan from HR joined us and we had great fun racing.
We are very grateful for the way WSO2 treats its interns, in addition to the
superb learning experience it provides.
A BIG THANKS to WSO2!
ACKNOWLEDGEMENTS
I would like to express my gratitude to everyone who supported me in making my internship a success and a memorable experience in my life.
First of all, I would like to thank Dr. Chathura R. De Silva, Head of the Department of Computer Science and Engineering, University of Moratuwa, and Dr. Dilum Bandara, Industrial Training Coordinator of the Department of Computer Science and Engineering, for the immense effort taken to provide us with the best training establishments and for the commitment shown in making sure each and every student was selected by a well-established software development company.
I would like to extend my gratitude to Mr. Nihal Wijeyewickrema the director of Industrial Training Division, University of Moratuwa and all the members of Industrial Training Division for their efforts to make our stay at these training establishments a pleasant one. Also I must thank all the members at National Apprentice and Industrial Training Authority (NAITA) for guiding us from the very beginning, and for the work done throughout the internship period to make it a success.
I am also grateful to Dr. Sanjiva Weerawarana, Founder, Chairman and CEO of WSO2, for allowing me to be a part of the WSO2 family and giving us an invaluable opportunity to learn world-renowned technologies within a great working environment. I would like to thank Ms Pramila Rajapaksa, Director of Human Resources and Administration at WSO2, and Mr. Samisa Abeysinghe, Vice President-Delivery, who was my mentor during the internship period. I would also like to thank all my colleagues in the WSO2 Support team for being there for me whenever I needed guidance. Furthermore, I would like to thank each and every person at WSO2 Lanka (Pvt) Ltd who helped me in various ways to have an awesome internship experience at WSO2.
Last but not least, I would like to take this opportunity to thank my fellow interns at WSO2 who were there with me during my internship period, sharing joy as well as work. This experience would not have been the same without your presence in it.
REFERENCES
[1] "WSO2 Wikipedia," [Online]. Available: https://en.wikipedia.org/wiki/WSO2.
[2] "Stanford Column Data Classifier," [Online]. Available: http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/classify/ColumnDataClassifier.html.
[3] "Py4J," [Online]. Available: https://www.py4j.org.
[4] "Facebook Fast Text Classifier," [Online]. Available: https://github.com/facebookresearch/fastText.
[5] "StackOverflow Lemmatization," [Online]. Available: http://stackoverflow.com/questions/39302880/getting-the-root-word-using-the-wordnet-lemmatizer.
[6] "Lemmatisation," [Online]. Available: https://en.wikipedia.org/wiki/Lemmatisation.
[7] "WSGI," [Online]. Available: https://en.wikipedia.org/wiki/Web_Server_Gateway_Interface.
[8] "Flask," [Online]. Available: http://flask.pocoo.org.
[9] "mod_wsgi," [Online]. Available: http://flask.pocoo.org/docs/0.12/deploying/mod_wsgi/.
[10] "Howdy’s botkit," [Online]. Available: https://github.com/howdyai/botkit.
[11] "Slack Bot Integration," [Online]. Available: https://my.slack.com/services/new/bot.
[12] "PromisesJS," [Online]. Available: https://www.promisejs.org/.
[13] "Scrapy," [Online]. Available: https://scrapy.org.
[14] "WSO2 documentation," [Online]. Available: https://docs.wso2.com/.
[15] "TFIDF," [Online]. Available: https://en.wikipedia.org/wiki/Tf%E2%80%93idf.
[16] "JIRA," [Online]. Available: http://wso2.com/library/1146/.
[17] "RAKE," [Online]. Available: https://github.com/zelandiya/RAKE-tutorial.